Browser Components & the DOM
In the previous sections, we spoke in general terms about how browsers are organized into functional components.
In this module, we will explore how the browsers we've been learning about actually perform their job of browsing websites.
Normal Browser Use
The easiest way to begin discussing the DOM is to consider the normal use-case of a web browser.
A typical workflow goes something like this:
1. User opens tab/window
   - New renderer process created
2. User navigates to webpage
3. Browser network stack makes HTTP request
   - HTML content is returned from the server
4. Renderer processes HTML content and displays it
   - Builds the DOM
   - Executes any scripts
Let's focus on step 3: HTML content is returned from the server.
The server will return an HTML document that looks something like this:
<!DOCTYPE html>
<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    <h1>Example HTML</h1>
    <script>console.log("Hello world")</script>
  </body>
</html>
The Networking Process sends this document over to our Renderer Process. The actual processing, parsing, and then rendering of that data is what we are going to be focusing on.
DOM Introduction
DOM stands for Document Object Model:
- Representation of structure and hierarchy of HTML content
- Provides an interface for HTML content
- Provides model for processing document events
The DOM standard can be found at https://dom.spec.whatwg.org/, and the full HTML spec at https://html.spec.whatwg.org/.
We can summarize the main goals of the DOM as follows:
- Build an element tree following HTML semantics
- Process and provide interface to DOM tree events
- Provide overall interface to JavaScript
The DOM Tree is the data structure that facilitates those goals. It is a fairly standard "Tree of Nodes" data structure, where each node has at most one parent and can have multiple child nodes.
DOM Tree Interfaces
The DOM Tree provides the following interfaces:
- Node: Supports the overall tree structure
- Document: Provides document context to the tree; usually the root node
- Element: A node that represents an HTML element (tag)
- Text: A node that simply contains some text
All nodes in the DOM also inherit from the EventTarget interface.
The overall "tree structure" is made up of Nodes, while the root (the Document) provides things like encoding, URL, origin, etc. that apply to the webpage itself. Elements contain the actual HTML content; these are a translated representation of the various tags in the original HTML document downloaded from a server.
Finally, Text Nodes simply contain text. Eventually, all text in an HTML document ends up inside a text node deep inside the DOM Tree.
DOM Implementations
We will take a brief look at how Blink (Chromium) and WebCore (WebKit) choose to implement the DOM.
At this point, we are not overly concerned with the specifics, but you can see how the DOM specification we've talked about maps directly to the C++ code in each browser engine.
DOM Implementations - Blink
class CORE_EXPORT EventTarget : public ScriptWrappable {...}
// A Node is a base class for all objects in the DOM tree.
// The spec governing this interface can be found here:
// https://dom.spec.whatwg.org/#interface-node
class CORE_EXPORT Node : public EventTarget {...}
// HTMLElement class defined for Chromium
class CORE_EXPORT ContainerNode : public Node { ... }
class CORE_EXPORT Element : public ContainerNode {...}
class CORE_EXPORT HTMLElement : public Element { ... }
// All the HTML tags are subclasses of HTMLElement
// Root document class for Chromium
class CORE_EXPORT Document : public ContainerNode, ... { ... }
// The Text class defined for Chromium
class CORE_EXPORT CharacterData : public Node { ... }
class CORE_EXPORT Text : public CharacterData { ... }
DOM Implementations - WebCore
class EventTarget : public ScriptWrappable { ... }
// Top level Node classes
class Node : public EventTarget { ... }
// HTMLElement class defined for WebKit
class ContainerNode : public Node { ... }
class Element : public ContainerNode, public CanMakeWeakPtr<Element> { ... }
class StyledElement : public Element { ... }
class HTMLElement : public StyledElement { ... }
// All the HTML tags are subclasses of HTMLElement
// Root document class in WebKit
class Document : public ContainerNode ... { ... }
// The Text class defined for WebKit
class CharacterData : public Node { ... }
class Text : public CharacterData { ... }
This is a very similar structure to Blink.
Parsing DOM
Now that we've introduced the DOM at a high level, let's see an example of how HTML is translated into DOM elements.
Below, we have a relatively simple HTML document:
<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    <h1 id="header">Example HTML</h1>
    <script>console.log("Hello world")</script>
  </body>
</html>
Translated to the DOM representation we've discussed:
- Document
  - HTMLHtmlElement
    - HTMLHeadElement
      - HTMLTitleElement
        - Text: "Example"
    - HTMLBodyElement
      - HTMLHeadingElement: id="header"
        - Text: "Example HTML"
      - HTMLScriptElement
        - Text: console.log("Hello world")
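To make the shape of this tree concrete, here is a minimal C++ sketch that builds the same hierarchy by hand. The Node struct and the helpers buildExampleTree, dump, and countNodes are hypothetical names invented for illustration; real engines use a separate class per node type and build the tree from the HTML parser rather than manually.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified node: one struct stands in for every node type.
struct Node {
    std::string label;                            // e.g. "HTMLBodyElement"
    std::vector<std::unique_ptr<Node>> children;  // each parent owns its children

    explicit Node(std::string l) : label(std::move(l)) {}

    // Append a child and return a non-owning pointer to it for chaining.
    Node* append(std::string childLabel) {
        children.push_back(std::make_unique<Node>(std::move(childLabel)));
        return children.back().get();
    }
};

// Build the tree for the example document by hand.
std::unique_ptr<Node> buildExampleTree() {
    auto document = std::make_unique<Node>("Document");
    Node* html = document->append("HTMLHtmlElement");
    Node* head = html->append("HTMLHeadElement");
    head->append("HTMLTitleElement")->append("Text: \"Example\"");
    Node* body = html->append("HTMLBodyElement");
    body->append("HTMLHeadingElement: id=\"header\"")->append("Text: \"Example HTML\"");
    body->append("HTMLScriptElement")->append("Text: console.log(\"Hello world\")");
    return document;
}

// Depth-first dump, indenting two spaces per tree level.
std::string dump(const Node& node, int depth = 0) {
    std::string out(static_cast<std::size_t>(depth) * 2, ' ');
    out += "- " + node.label + "\n";
    for (const auto& child : node.children)
        out += dump(*child, depth + 1);
    return out;
}

// Count every node in the subtree rooted at `node`.
std::size_t countNodes(const Node& node) {
    std::size_t total = 1;
    for (const auto& child : node.children)
        total += countNodes(*child);
    return total;
}
```

Calling dump(*buildExampleTree()) produces the same indented outline as the list above. Ownership here is deliberately naive (each parent uniquely owns its children), which is exactly the simplification that stops working once scripts can hold references into the tree.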
Next, we will explore some properties of each of these elements.
HTML Elements
HTML Elements extend the functionality of "regular" DOM nodes. Conceptually, they are the equivalent of HTML tags, and act as a sort of "container" for other content. In the tree hierarchy, they are the "parent nodes" of all the content in-between a pair of HTML tags.
HTML Elements can be the target of events (like all Nodes), can have attributes, and can be named (then referenced by that name).
There are a number of special attributes that HTML Elements have:
- id - Unique id for the element (only one should exist per document)
- name - Name of the element
- class - Space-separated list of class names, typically used to apply styles
- Other element-specific attributes
- Data attributes - Custom attributes beginning with data-
DOM API
The DOM's interfaces are described in an interface definition language (Web IDL), and browsers use those definitions to expose an API to JavaScript. This allows for programmatic interaction and modification of the overall document:
// Select current elements
> let h = document.getElementById("header"); h
<h1 id="header">Example HTML</h1>
> h.firstChild
"Example HTML"
// Modify current elements
> document.body.removeChild(h)
// Create new elements
> let a = document.createElement('a');
> a.setAttribute('href','http://example.com')
// Append elements to the DOM
> document.body.appendChild(a)
<a href="http://example.com"></a>
Developer Console
You can hit F12 or right-click and select "Inspect Element" to open up the developer console in most browsers.
From here, there are a myriad of useful tools for exploring the DOM, interacting with or debugging JavaScript, and a whole host of other functions. Most features on this screen are aimed at Web Developers, but there are lots of useful tools for us here as well.
DOM Memory Management
As you may imagine, the DOM can quickly become a very complex data structure. It is made up of various types of nodes, many of which have differing lifetimes and themselves can have dynamic behaviors. On top of this, the DOM is often manipulated by JavaScript which can change just about anything, at any time.
All this to say, node relationships can become extremely complex, and this complexity in turn makes managing the underlying memory a nontrivial task as well.
If all we had to deal with was well-formed and relatively simple HTML, we might be able to represent our node relationships as a clean tree in which each node links only to its parent and its children.
The relationships would be simple enough to manage using even naive tree-traversal algorithms and a bit of bookkeeping. Unfortunately, the richness of the DOM paired with the complexity of JavaScript means we are usually dealing with a far more tangled graph of references.
These are the types of relationships browsers have to deal with on the modern web. Rather than a clean tree-like structure, we have a mishmash of:
- Parent Nodes
- Child Nodes
- Sibling Nodes
- JavaScript References
- Stack / other references
To top it all off, these tend to be dynamic relationships.
There are two major consequences of implementing poor memory management for the DOM:
- Memory Leaks
  - Bad for performance; direct user-facing negative outcomes
- Use-After-Free Vulnerabilities
  - Bad for security, users, and everyone who is NOT a vulnerability researcher
Memory Management Strategies
There are two primary "types" of strategies that browsers use for memory management. Broadly speaking, these are reference counting and garbage collection. Each has its own benefits and downsides, and in some cases, a hybridized approach emerges out of necessity.
Roughly, the browsers implement the following:
- WebKit: Reference Counting
- Chrome: Garbage Collection and Reference Counting
- Firefox: Reference Counting
Reference Counting
In reference counting strategies, each object keeps track of how many other objects "reference" it. In practice, a "reference" usually means a pointer that the other object is treating as valid. Technically however, it may be more appropriate to think of a reference as a generalized dependency: each reference implies another object depends on the continued existence & functioning of the current object.
When all is said and done, reference counting comes down to two "golden rules":
- Whenever an object's reference count reaches zero:
  - No other objects reference it, so it can be freed
- Whenever an object's reference count is greater than zero:
  - At least one other object holds a reference, so it cannot be freed
Whenever we have a reference counting system that breaks either of those rules, a potential Use-After-Free vulnerability (UAF) exists!
Reference Counting - Example
Below is a trimmed-down excerpt from WebKit's RefCountedBase class:
class RefCountedBase {
public:
    // Increment internal reference count
    void ref() const {
        ++m_refCount;
    }

protected:
    // Returns whether the pointer should be freed or not.
    bool derefBase() const {
        unsigned tempRefCount = m_refCount - 1;
        if (!tempRefCount) {
            return true;
        }
        m_refCount = tempRefCount;
        return false;
    }

private:
    mutable unsigned m_refCount;
};
An interesting thing to note: when the last reference is removed, m_refCount is never actually set to zero; the function returns true instead.
Below is where the object is actually freed:
template<typename T> class RefCounted : public RefCountedBase {
    ...
    void deref() const
    {
        if (derefBase())
            delete static_cast<const T*>(this);
    }
    ...
};
As soon as the final deref() is called, the object is deleted.
Overall, reference counting is a fairly common and pretty good solution for memory management. Memory leaks become rare, UAFs are mitigated, and there is relatively little overhead in terms of performance.
Unfortunately, the efficacy and security of this scheme relies entirely on programmers remembering to call ref() and deref() at the appropriate times, with no exceptions.
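To illustrate what that manual bookkeeping looks like, here is a minimal sketch of paired ref() and deref() calls. The Counted class, the Trace struct, and manualBookkeeping are hypothetical names for this example, and the sketch assumes the count starts at 1 for the creator; it is modelled loosely on the excerpt above rather than copied from WebKit.

```cpp
#include <cassert>

// Hypothetical intrusive ref-counted object (not WebKit's actual class).
class Counted {
public:
    // `liveCounter` lets the caller observe construction and destruction.
    explicit Counted(int* liveCounter) : m_live(liveCounter) { ++*m_live; }

    void ref() { ++m_refCount; }

    void deref() {
        assert(m_refCount > 0);
        if (--m_refCount == 0)
            delete this;  // last reference gone: free the object
    }

    unsigned refCount() const { return m_refCount; }

private:
    ~Counted() { --*m_live; }  // private: only deref() may destroy the object

    unsigned m_refCount = 1;   // assumption: the creator holds the first reference
    int* m_live;
};

// Number of live objects observed at each step of the walk-through.
struct Trace {
    int afterCreate;
    int afterFirstDeref;
    int afterLastDeref;
};

Trace manualBookkeeping() {
    int live = 0;
    Trace trace{};

    Counted* object = new Counted(&live);  // count == 1 (creator)
    object->ref();                         // a second owner appears: count == 2
    trace.afterCreate = live;

    object->deref();                       // second owner done: count == 1
    trace.afterFirstDeref = live;

    object->deref();                       // creator done: count == 0, object deleted
    trace.afterLastDeref = live;           // `object` must not be touched after this
    return trace;
}
```

Every ref() needs exactly one matching deref(). One deref() too many frees the object while someone still points at it, and one too few leaks it, which is why the next section looks at automating the pairing.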
Reference Counting - RefPtrs & Refs
Keeping track of every instance where one should call ref() or deref() is both bug-prone and a major hassle. Two mechanisms exist which try to abstract away a majority of this bookkeeping.
This post from the WebKit blog is a good resource for more information, but we will go over the main ideas it describes.
Reference Counting - RefPtrs
A RefPtr is a smart pointer purpose-built for the WebKit project. It:
- Automatically calls ref() and deref() when appropriate
- Has mechanisms for "adopting" a raw pointer
- Is compatible with all classes that implement ref and deref
  - Usually RefCountedBase
We can see part of its implementation below:
template<typename T, typename PtrTraits>
class RefPtr {
    ...
    ALWAYS_INLINE RefPtr(T* ptr) : m_ptr(ptr) { refIfNotNull(ptr); }
    ...
    ALWAYS_INLINE ~RefPtr() { derefIfNotNull(PtrTraits::exchange(m_ptr, nullptr)); }
    ...
};
As you can see, RefPtr can be used in conjunction with templating to "wrap" other classes. Then, by overloading certain operations, RefPtr can automatically perform most required bookkeeping.
Below we can see RefPtr being used:
void HTMLFormElement::submit(...) {
    RefPtr<FrameView> view = document().view();
    RefPtr<Frame> frame = document().frame();
    ...
    RefPtr<HTMLFormControlElement> firstSuccessfulSubmitButton;
    ...
}
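The idea behind RefPtr can be sketched in a few lines: a wrapper that calls ref() when it acquires a pointer and deref() when it lets go. SketchRefPtr, Widget, Observed, and scopedOwnership below are hypothetical names for illustration only; WebKit's real RefPtr has more machinery (pointer traits, conversions, and so on), and this sketch again assumes the count starts at 1 for the creator.

```cpp
#include <cassert>
#include <utility>

// Hypothetical ref-counted class; the count starts at 1 for the creator.
class Widget {
public:
    explicit Widget(int* live) : m_live(live) { ++*m_live; }
    void ref() { ++m_refCount; }
    void deref() { if (--m_refCount == 0) delete this; }
    unsigned refCount() const { return m_refCount; }
private:
    ~Widget() { --*m_live; }
    unsigned m_refCount = 1;
    int* m_live;
};

// Simplified stand-in for RefPtr: ref() on acquire, deref() on release.
template <typename T>
class SketchRefPtr {
public:
    SketchRefPtr() = default;
    explicit SketchRefPtr(T* ptr) : m_ptr(ptr) { if (m_ptr) m_ptr->ref(); }
    SketchRefPtr(const SketchRefPtr& other) : m_ptr(other.m_ptr) { if (m_ptr) m_ptr->ref(); }
    SketchRefPtr(SketchRefPtr&& other) noexcept : m_ptr(std::exchange(other.m_ptr, nullptr)) {}
    ~SketchRefPtr() { if (T* old = std::exchange(m_ptr, nullptr)) old->deref(); }

    // Copy-and-swap: the by-value parameter releases the old pointer on exit.
    SketchRefPtr& operator=(SketchRefPtr other) noexcept {
        std::swap(m_ptr, other.m_ptr);
        return *this;
    }

    // "Adopt" the creator's initial reference without incrementing the count.
    static SketchRefPtr adopt(T* ptr) {
        SketchRefPtr result;
        result.m_ptr = ptr;
        return result;
    }

    T* get() const { return m_ptr; }
    T* operator->() const { return m_ptr; }

private:
    T* m_ptr = nullptr;
};

struct Observed {
    unsigned countInside;
    int liveInside;
    int liveAfter;
};

Observed scopedOwnership() {
    int live = 0;
    Observed observed{};
    {
        auto first = SketchRefPtr<Widget>::adopt(new Widget(&live));  // count == 1
        SketchRefPtr<Widget> second = first;                           // count == 2
        observed.countInside = first->refCount();
        observed.liveInside = live;
    }  // both destructors run: count 2 -> 1 -> 0, object deleted
    observed.liveAfter = live;
    return observed;
}
```

No explicit ref() or deref() call appears in scopedOwnership; constructors and destructors do the pairing, so leaving a scope early cannot skip a deref().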
Reference Counting - Refs
A Ref is almost identical to RefPtr, but is analogous to a reference rather than a pointer.
A Ref cannot be null, but otherwise behaves essentially the same as a RefPtr. The WebKit blog justifies it as a way of making it clear to the caller that a function will never return null.
Reference Counting - Reference Cycles
From what we've written so far, it may seem like "problem solved": simply use Ref and RefPtr. However, there are some situations which require exceptions.
Reference cycles are a fairly common issue when trying to implement this strategy. A cycle forms when two or more nodes hold counted references to each other, so each keeps the other's count above zero even after everything else has let go.
The nodes in the cycle are now a memory leak! Given the strategy we outlined above, neither node will ever be freed.
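A cycle like this is easy to reproduce. The sketch below uses the standard library's std::shared_ptr as a stand-in for a ref-counted pointer (a substitution for illustration, not WebKit's RefPtr), and CycleNode and cycleKeepsNodesAlive are hypothetical names. A non-owning std::weak_ptr is used only to observe whether the node is still alive.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node that holds a counted reference to a peer.
struct CycleNode {
    std::shared_ptr<CycleNode> other;
};

// Returns true if the nodes are still alive after every outside reference
// has been dropped, i.e. the cycle would have leaked them.
bool cycleKeepsNodesAlive() {
    std::weak_ptr<CycleNode> observer;  // watches one node without owning it
    {
        auto a = std::make_shared<CycleNode>();
        auto b = std::make_shared<CycleNode>();
        a->other = b;  // a -> b
        b->other = a;  // b -> a: the cycle is closed
        observer = a;
    }  // a and b go out of scope, but each node still holds the other

    bool leaked = !observer.expired();

    // Break the cycle by hand so this demo itself does not leak.
    if (auto a = observer.lock())
        a->other.reset();

    return leaked;
}
```

The function returns true: with both outside references gone, each node's count is still 1 because of its peer, so neither destructor would ever run without the manual reset at the end.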
Reference Counting - Weak Pointers
A potential solution to the reference cycle problem is to simply use a "raw pointer" in situations that can result in a reference cycle. That way, one "end" of the cycle is broken and reference counting works as expected again.
This "raw pointer" is often called a weak pointer or weak reference, while pointers that are "RefCounted" are called strong pointers or references.
Although this solves our issue of cycles, introducing weak pointers comes with some baggage. Most critically:
- The object holding the weak reference must not outlive the object being referenced
There is also some additional bookkeeping that pops up as a result:
- Objects must know which other objects hold weak references to them
  - When freed, mark all their weak-refs as cleared
The "mixing" of strong and weak pointers in this manner is risky for all of the "classical exploitation" reasons. Using them in situations where object or memory lifetime is ephemeral (such as on the stack, or when queuing up JavaScript callbacks) can be especially tricky:
- If a pointer lasts longer than the object it references => UAF
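As a rough illustration of both ideas (breaking the cycle and clearing weak references on free), here is a sketch using std::shared_ptr and std::weak_ptr from the standard library. This is a substitution for illustration: std::weak_ptr does the "mark as cleared" bookkeeping automatically, which a plain raw pointer does not. Parent, Child, WeakResult, and weakBackPointer are hypothetical names.

```cpp
#include <cassert>
#include <memory>

struct Child;

struct Parent {
    std::shared_ptr<Child> child;  // strong: the parent keeps the child alive
};

struct Child {
    std::weak_ptr<Parent> parent;  // weak: back-pointer that does not count
};

struct WeakResult {
    bool parentFreed;
    bool backPointerCleared;
};

WeakResult weakBackPointer() {
    auto child = std::make_shared<Child>();
    std::weak_ptr<Parent> observer;
    {
        auto parent = std::make_shared<Parent>();
        parent->child = child;
        child->parent = parent;  // no cycle of strong references is formed
        observer = parent;
    }  // the last strong reference to the parent is dropped here

    WeakResult result{};
    result.parentFreed = observer.expired();
    // lock() yields nullptr instead of a dangling pointer once the parent is gone.
    result.backPointerCleared = (child->parent.lock() == nullptr);
    return result;
}
```

Both fields come back true. With a plain raw pointer in place of std::weak_ptr, the parent would still be freed, but child->parent would silently dangle, which is the UAF scenario described above.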
Reference Counting - Overflowing RefCounts
Let's take a look at WebKit's RefCountedBase class:
private:
    ...
    mutable unsigned m_refCount;
    ...
What would happen if we had a lot of references?
- Make 4294967295 refs to object
- Add two more, overflows to 1 ref
- Trigger a deref()
- UAF!
Although a bit "funny", this is a completely valid tactic which has been exploited in the real world. These days, most implementations will check for overflows and abort, as no reasonable webpage will have that number of references.
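The wraparound arithmetic can be checked directly. The sketch below assumes a 32-bit counter (std::uint32_t) to match the numbers above; addRefsUnchecked and tryAddRef are hypothetical helpers, and returning false on overflow is just one possible policy (as noted, real implementations tend to abort instead).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Unchecked increment: unsigned arithmetic wraps around modulo 2^32.
std::uint32_t addRefsUnchecked(std::uint32_t count, std::uint32_t extra) {
    return static_cast<std::uint32_t>(count + extra);
}

// Checked increment: refuse to wrap. Returns false (and leaves the count
// untouched) if adding one more reference would overflow.
bool tryAddRef(std::uint32_t& count) {
    if (count == std::numeric_limits<std::uint32_t>::max())
        return false;
    ++count;
    return true;
}
```

addRefsUnchecked(4294967295u, 2u) evaluates to 1, so a single deref() afterwards would drop the count to zero and free an object that billions of references still point to. tryAddRef stops at the maximum instead of wrapping.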
Garbage Collection
Another method for avoiding reference counting issues is to use a garbage collection strategy instead.
Blink takes this approach with Oilpan, "... a project to replace reference counting in Blink with a GC".
- Much better at eliminating memory leaks
- Less "manual" input from programmers
  - No worrying about weak pointers, adopting refs, etc.
- Fairly standard "mark & sweep" GC
  - We'll talk more about GC algorithms in subsequent training modules
You can read this page for a more in-depth look at the GC's design.
At a high level, Oilpan performs the following:
- Trigger GC, wait for all threads to enter a "safepoint"
- Starting from the root-set, call Trace() on all reachable nodes
- Any objects that are not reachable are deleted
Oilpan also scans the stack for references to objects and adds those to the root-set (this is called a conservative GC).
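The mark and sweep phases can be sketched in a small, single-threaded toy heap. This is not Oilpan's design: there are no safepoints, no Trace() methods, and no conservative stack scanning, and roots are registered by hand. GcObject, SketchHeap, CollectResult, and collectUnreachableCycle are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical GC-managed object holding plain pointers to other objects.
struct GcObject {
    std::vector<GcObject*> edges;  // outgoing references, followed while marking
    bool marked = false;
};

class SketchHeap {
public:
    GcObject* allocate() {
        m_objects.push_back(std::make_unique<GcObject>());
        return m_objects.back().get();
    }

    void addRoot(GcObject* object) { m_roots.push_back(object); }

    // Mark everything reachable from the roots, then sweep the rest.
    // Returns the number of objects freed.
    std::size_t collect() {
        for (GcObject* root : m_roots)
            mark(root);

        const std::size_t before = m_objects.size();
        m_objects.erase(
            std::remove_if(m_objects.begin(), m_objects.end(),
                           [](const std::unique_ptr<GcObject>& object) {
                               return !object->marked;  // unreachable: sweep it
                           }),
            m_objects.end());

        for (auto& object : m_objects)
            object->marked = false;  // reset marks for the next collection
        return before - m_objects.size();
    }

    std::size_t size() const { return m_objects.size(); }

private:
    // Iterative graph walk with an explicit worklist.
    static void mark(GcObject* start) {
        std::vector<GcObject*> worklist{start};
        while (!worklist.empty()) {
            GcObject* current = worklist.back();
            worklist.pop_back();
            if (current->marked)
                continue;  // already visited; this also terminates cycles
            current->marked = true;
            for (GcObject* next : current->edges)
                worklist.push_back(next);
        }
    }

    std::vector<std::unique_ptr<GcObject>> m_objects;  // the heap owns every object
    std::vector<GcObject*> m_roots;
};

struct CollectResult {
    std::size_t freedWhileReachable;
    std::size_t freedAfterUnlink;
    std::size_t remaining;
};

CollectResult collectUnreachableCycle() {
    SketchHeap heap;
    GcObject* root = heap.allocate();
    GcObject* a = heap.allocate();
    GcObject* b = heap.allocate();
    heap.addRoot(root);
    root->edges.push_back(a);
    a->edges.push_back(b);
    b->edges.push_back(a);  // a <-> b form a cycle

    CollectResult result{};
    result.freedWhileReachable = heap.collect();  // everything is reachable
    root->edges.clear();                          // drop the only path to the cycle
    result.freedAfterUnlink = heap.collect();     // a and b are swept together
    result.remaining = heap.size();
    return result;
}
```

Unlike reference counting, the cycle between a and b poses no problem here: reachability from the roots is all that matters, so both objects are freed as soon as the root stops pointing at them.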
Garbage Collection - Object Ownership Issues
Some objects "own" their pointers. This ownership can be reset, potentially freeing resources.
Example: postMessage (which sends messages to other contexts, e.g. iframes / workers) "transfers" ArrayBuffers:
> a = new ArrayBuffer(10)
ArrayBuffer(10) {}
> postMessage('','*',[a])
> a
ArrayBuffer(0) {}
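The same "transfer" idea can be sketched in C++ with move semantics: the backing store moves to a new owner and the original handle is left empty. SketchBuffer and its transfer() method are hypothetical and only model the observable effect (length 10 becomes length 0); they are not how Blink implements ArrayBuffer transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a transferable buffer.
class SketchBuffer {
public:
    explicit SketchBuffer(std::size_t length) : m_bytes(length) {}

    std::size_t byteLength() const { return m_bytes.size(); }

    // Hand the backing store to a new owner and leave this handle detached.
    SketchBuffer transfer() {
        SketchBuffer receiver(0);
        receiver.m_bytes = std::move(m_bytes);
        m_bytes.clear();  // a moved-from vector is valid but unspecified; force it empty
        return receiver;
    }

private:
    std::vector<std::uint8_t> m_bytes;
};
```

After SketchBuffer a(10); auto moved = a.transfer();, a reports a length of 0 and moved reports 10, mirroring the console session above. The scheme is only safe as long as exactly one handle believes it owns the storage.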
Let's take a look at how this kind of scenario can create an exploitable bug.
Blink: CVE-2019-5786
This CVE was found as a 0-day "in the wild" by Google. At a high level, the vulnerability worked like this:
- FileReaderLoader can create two DOMArrayBuffers with the same underlying reference
- Each DOMArrayBuffer "owns" its underlying reference
- A UAF is caused by freeing the underlying ArrayBuffer object from one DOMArrayBuffer while the other still has a stale reference!
Let's take a look at a relevant code snippet, which could be invoked from a JavaScript event callback:
DOMArrayBuffer* FileReaderLoader::ArrayBufferResult() {
    ...
    // Copy reference to current array buffer to DOMArrayBuffer
    DOMArrayBuffer* result = DOMArrayBuffer::Create(raw_data_->ToArrayBuffer());
    ...
    return result;
}
Under certain conditions, it was possible for raw_data_->ToArrayBuffer() to return the same WTF::ArrayBuffer for consecutive calls.
If this occurs, the result, a DOMArrayBuffer *, can be created multiple times from the same underlying WTF::ArrayBuffer.
As hinted previously, object ownership can become a tricky issue to handle properly.
By abusing postMessage, we can "transfer ownership" of these DOMArrayBuffers:
// arr1 and arr2 wrap DOMArrayBuffers with same underlying WTF::ArrayBuffer
window.postMessage([], '*', [arr1, arr2]);
This causes the following to occur:
- arr1 is transferred, WTF::ArrayBuffer marked as neutered (cleared)
- Tries to transfer arr2, can't since already neutered
  - Throws exception
- Frees ArrayBuffer in arr1, since it's neutered and no longer used
  - But arr2 still has a reference! UAF
In the "bugged" version, the function technically cloned the reference, not the data. To fix this, it now clones the data before returning it in the DOMArrayBuffer:
DOMArrayBuffer* FileReaderLoader::ArrayBufferResult() {
    ...
    // Copy underlying data instead of copying reference
    return DOMArrayBuffer::Create(
        ArrayBuffer::Create(raw_data_->Data(), raw_data_->ByteLength()));
    ...
}
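The difference between cloning the reference and cloning the data can be modelled safely in a few lines. This is a simplified model, not Blink code: Store, Handle, wrapByReference, and wrapByCopy are hypothetical names, and std::shared_ptr keeps the shared storage alive so the sketch never actually touches freed memory. In the real bug, the second handle was left pointing at freed storage rather than at an emptied one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Store = std::vector<std::uint8_t>;  // hypothetical backing store

// Hypothetical wrapper, loosely analogous to a DOMArrayBuffer handle.
struct Handle {
    std::shared_ptr<Store> store;

    std::size_t byteLength() const { return store->size(); }

    // Model a "transfer": the contents leave, and the store is emptied.
    Store transferOut() {
        Store out = std::move(*store);
        store->clear();
        return out;
    }
};

// Buggy shape: every handle aliases the same store.
Handle wrapByReference(const std::shared_ptr<Store>& raw) {
    return Handle{raw};
}

// Fixed shape: every handle gets its own copy of the data.
Handle wrapByCopy(const std::shared_ptr<Store>& raw) {
    return Handle{std::make_shared<Store>(*raw)};
}
```

With two handles from wrapByReference, calling transferOut() on one changes what the other sees, which is the stale-reference situation in miniature. With wrapByCopy, transferring one handle leaves the other's 8 bytes (or however many were copied) untouched.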
You can find more details on the Chrome Bug Tracker, or read an in-depth writeup from Exodus Intelligence.