🔗 Browser Components & the DOM

In the previous sections, we've spoken in general terms about how browsers are organized in terms of their functional components.

In this module, we will explore how the browsers we've been learning about actually perform their job of browsing websites.

🔗 Normal Browser Use

The easiest way to begin discussing the DOM is to consider the normal use-case of a web browser.

A typical workflow goes something like this:

1. User opens tab/window
   - New renderer process created
2. User navigates to webpage
3. Browser network stack makes HTTP request
   - HTML content is returned from the server
4. Renderer processes HTML content and displays it
   - Builds the DOM
   - Executes any scripts

Let's focus in on Number 3: HTML content is returned from the server.

The server will return an HTML document that looks something like this:

HTML
<!DOCTYPE html>
<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    <h1>Example HTML</h1>
    <script>console.log("Hello world")</script>
  </body>
</html>

The Networking Process sends this document over to our Renderer Process. The actual processing, parsing, and then rendering of that data is what we are going to be focusing on.

🔗 DOM Introduction

DOM stands for Document Object Model:

Representation of structure and hierarchy of HTML content
Provides an interface for HTML content
Provides model for processing document events

The standard can be found here, and the full HTML spec here.

We can summarize the main goals of the DOM as follows:

Build an element tree following HTML semantics
Process and provide interface to DOM tree events
Provide overall interface to JavaScript

The DOM Tree is the data structure that facilitates those goals. It is a fairly standard "Tree of Nodes" data structure, where each node can have multiple child nodes, each of which can have children of its own.

🔗 DOM Tree Interfaces

The DOM Tree provides the following interfaces:

Node: Support overall tree structure
Document: Provide document context to the tree, usually the root node
Element: A node that represents an HTML element (tags)
Text: A node that simply contains some text

All the nodes in the DOM also inherit from the EventTarget interface.

The overall "tree structure" is made up of Nodes, while the root (the Document) provides things like encoding, URL, origin, etc that apply to the webpage itself. Elements contain the actual HTML content; these are a translated representation of various tags in the original HTML document downloaded from a server.

Finally, Text Nodes simply contain text. Eventually, all text in an HTML document ends up inside a text node deep inside the DOM Tree.
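As a rough illustration of how these interfaces relate to each other, the C++ sketch below models them as a small class hierarchy. Every name in it (`children`, `tagName`, `nodeName()`, `exampleDepth()`) is a simplified stand-in chosen for this example; it is not the definition used by any browser engine.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the DOM interfaces (not engine code).
struct EventTarget {
    virtual ~EventTarget() = default;
};

// Node: supports the overall tree structure.
struct Node : EventTarget {
    Node* parent = nullptr;
    std::vector<std::unique_ptr<Node>> children;

    virtual std::string nodeName() const = 0;

    // Takes ownership of the child and links it into the tree.
    Node* appendChild(std::unique_ptr<Node> child) {
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Document: the root node, carrying page-wide context.
struct Document : Node {
    std::string url;
    std::string encoding = "UTF-8";
    std::string nodeName() const override { return "#document"; }
};

// Element: a node that represents an HTML tag.
struct Element : Node {
    std::string tagName;
    explicit Element(std::string tag) : tagName(std::move(tag)) {}
    std::string nodeName() const override { return tagName; }
};

// Text: a leaf node that only holds character data.
struct Text : Node {
    std::string data;
    explicit Text(std::string d) : data(std::move(d)) {}
    std::string nodeName() const override { return "#text"; }
};

// Builds <h1>Example HTML</h1> under a document and counts the levels
// walked from the text node up to the root.
inline int exampleDepth() {
    Document doc;
    Node* h1 = doc.appendChild(std::make_unique<Element>("h1"));
    Node* text = h1->appendChild(std::make_unique<Text>("Example HTML"));
    int depth = 0;
    for (Node* n = text; n != nullptr; n = n->parent) ++depth;
    return depth;  // text -> h1 -> document
}
```

In this sketch the text lives in a leaf `Text` node, the `Element` is its parent, and the `Document` sits at the root, which mirrors the roles described above.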

🔗 DOM Implementations

We will take a brief look at how Blink (Chromium) and WebCore (WebKit) choose to implement the DOM.

At this point, we are not overly concerned with the specifics, but you can see how the DOM specification we've talked about maps directly to the C++ code in each browser engine.

🔗 DOM Implementations - Blink

C++
class CORE_EXPORT EventTarget : public ScriptWrappable {...}

// A Node is a base class for all objects in the DOM tree.
// The spec governing this interface can be found here:
// https://dom.spec.whatwg.org/#interface-node
class CORE_EXPORT Node : public EventTarget {...}

// HTMLElement class defined for Chromium
class CORE_EXPORT ContainerNode : public Node { ... }
class CORE_EXPORT Element : public ContainerNode {...}
class CORE_EXPORT HTMLElement : public Element { ... }
// All the HTML tags are subclasses of HTMLElement

// Root document class for Chromium
class CORE_EXPORT Document : public ContainerNode, ... { ... }

// The Text class defined for Chromium
class CORE_EXPORT CharacterData : public Node { ... }
class CORE_EXPORT Text : public CharacterData { ... }

/src/third_party/blink/renderer/core/dom/events/event_target.h

🔗 DOM Implementations - WebCore

C++
class EventTarget : public ScriptWrappable { ... }

// Top level Node classes
class Node : public EventTarget { ... }

// HTMLElement class defined for WebKit
class ContainerNode : public Node { ... }
class Element : public ContainerNode, public CanMakeWeakPtr<Element> { ... }
class StyledElement : public Element { ... }
class HTMLElement : public StyledElement { ... }
// All the HTML tags are subclasses of HTMLElement

// Root document class in WebKit
class Document : public ContainerNode ... { ... }

// The Text class defined for WebKit
class CharacterData : public Node { ... }
class Text : public CharacterData { ... }

/Source/WebCore/dom/EventTarget.h

This is a very similar structure to Blink.

Various other Node class definitions:

🔗 Parsing DOM

Now that we've introduced the DOM at a high level, let's see an example of how HTML is translated into DOM elements.

Below, we have a relatively simple HTML document:

HTML
<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    <h1 id="header">Example HTML</h1>
    <script>console.log("Hello world")</script>
  </body>
</html>

Translated to the DOM representation we've discussed:

- Document
  - HTMLHtmlElement
    - HTMLHeadElement
      - HTMLTitleElement
        - Text: "Example"
    - HTMLBodyElement
      - HTMLHeadingElement: id="header"
        - Text: "Example HTML"
      - HTMLScriptElement
        - Text: console.log("Hello world")
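To make the translation a little more concrete, here is a minimal C++ sketch that builds the same tree by hand and renders it as an indented outline. The `DomNode` type and the `dump`, `countNodes`, and `buildExampleTree` helpers are hypothetical names for this example only; a real engine builds the tree incrementally inside its HTML parser rather than from a hard-coded structure like this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flattened node type: a label plus its child nodes.
struct DomNode {
    std::string label;
    std::vector<DomNode> children;
};

// Depth-first walk that renders the tree as an indented outline.
inline void dump(const DomNode& node, std::size_t depth, std::string& out) {
    out += std::string(depth * 2, ' ') + "- " + node.label + "\n";
    for (const DomNode& child : node.children) dump(child, depth + 1, out);
}

// Counts a node plus all of its descendants.
inline std::size_t countNodes(const DomNode& node) {
    std::size_t total = 1;
    for (const DomNode& child : node.children) total += countNodes(child);
    return total;
}

// Hand-built equivalent of the example document.
inline DomNode buildExampleTree() {
    DomNode title{"HTMLTitleElement", {DomNode{"Text: \"Example\"", {}}}};
    DomNode head{"HTMLHeadElement", {title}};
    DomNode heading{"HTMLHeadingElement: id=\"header\"",
                    {DomNode{"Text: \"Example HTML\"", {}}}};
    DomNode script{"HTMLScriptElement",
                   {DomNode{"Text: console.log(\"Hello world\")", {}}}};
    DomNode body{"HTMLBodyElement", {heading, script}};
    DomNode html{"HTMLHtmlElement", {head, body}};
    return DomNode{"Document", {html}};
}
```

Calling `dump(buildExampleTree(), 0, out)` fills `out` with one line per node, indented by depth in the same way as the outline above.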

Next, we will explore some properties of each of these elements.

🔗 HTML Elements

HTML Elements extend the functionality of "regular" DOM nodes. Conceptually, they are the equivalent of HTML tags, and act as a sort of "container" for other content. In the tree hierarchy, they are the "parent nodes" of all the content in-between a pair of HTML tags.

HTML Elements can be the target of events (like all Nodes), can have attributes, and can be named (then referenced by that name).

There are a number of special attributes that HTML Elements have:

id - Unique id for element (only one should exist per document)
name - Name of the element
class - One or more space-separated class names, typically used for styling and for selecting groups of elements
Other Element specific attributes
Data Attributes - Custom attributes beginning with data-

🔗 DOM API

Using IDLs, the DOM exposes an API to JavaScript. This allows for programmatic interaction and modification of the overall document:

JavaScript
// Select current elements
> let h = document.getElementById("header"); h
<h1 id="header">Example HTML</h1>
> h.firstChild
"Example HTML"

// Remove an existing element from the document
> h.remove()

// Create new elements
> let a = document.createElement('a');
> a.setAttribute('href','http://example.com')

// Append elements to the DOM
> document.body.appendChild(a)
<a href="http://example.com"></a>

🔗 Developer Console

You can hit F12 or right-click and select "Inspect Element" to open up the developer console in most browsers.

From here, there are a myriad of useful tools for exploring the DOM, interacting with or debugging JavaScript, and a whole host of other functions. Most features on this screen are aimed at Web Developers, but there are lots of useful tools for us here as well.

🔗 DOM Memory Management

As you may imagine, the DOM can quickly become a very complex data structure. It is made up of various types of nodes, many of which have differing lifetimes and themselves can have dynamic behaviors. On top of this, the DOM is often manipulated by JavaScript which can change just about anything, at any time.

All this to say, node relationships can become extremely complex, and this complexity in turn makes managing the underlying memory a nontrivial task as well.

If all we had to deal with was well formed and relatively simple HTML, we may be able to represent our node-relationships with something like this:

The relationships would be simple enough to manage using even naive tree-traversal algorithms and a bit of bookkeeping. Unfortunately, the richness of the DOM paired with the complexity of JavaScript means we are usually dealing with something like this:

These are the types of relationships browsers have to deal with on the modern web. Rather than a clean tree-like structure, we have a mishmash of:

Parent Nodes
Child Nodes
Sibling Nodes
JavaScript References
Stack / other references

To top it all off, these tend to be dynamic relationships.

There are two major consequences of implementing poor memory management for the DOM:

Memory Leaks
- Bad for performance; direct user-facing negative outcomes

Use-After-Free Vulnerabilities
- Bad for security, users, and everyone who is NOT a vulnerability researcher

🔗 Memory Management Strategies

There are two primary "types" of strategies that browsers use for memory management. Broadly speaking, these are reference counting and garbage collection. Each has its own benefits and downsides, and in some cases, a hybridized approach emerges out of necessity.

Roughly, the browsers implement the following:

WebKit: Reference Counting
Chrome: Garbage Collection and Reference Counting
Firefox: Reference Counting (paired with a cycle collector)

🔗 Reference Counting

In reference counting strategies, each object keeps track of how many other objects "reference" it. In practice, a "reference" usually means a pointer that the other object is treating as valid. Technically however, it may be more appropriate to think of a reference as a generalized dependency: each reference implies another object depends on the continued existence & functioning of the current object.

When all is said and done, reference counting comes down to two "golden rules":

Whenever this reference count reaches zero:

No other objects reference this one; it can be free'd

Whenever this reference count is greater than zero:

At least one other object holds a reference; it cannot be free'd

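Whenever a reference counting system breaks the second rule and frees an object that is still referenced, a potential Use-After-Free vulnerability (UAF) exists! Breaking the first rule, by never freeing an object whose count has reached zero, results in a memory leak instead.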

🔗 Reference Counting - Example

Below is a trimmed down excerpt from WebKit's RefCountedBase class:

C++
class RefCountedBase {
public:
    // Increment internal reference count
    void ref() const {
        ++m_refCount;
    }

protected:
    // Returns true when the last reference was just dropped,
    // i.e. when the caller should free the object.
    bool derefBase() const {
        unsigned tempRefCount = m_refCount - 1;
        if (!tempRefCount) {
            return true;
        }
        m_refCount = tempRefCount;
        return false;
    }

private:
    mutable unsigned m_refCount;
};

/Source/WTF/wtf/RefCounted.h

An interesting thing to note: when the last reference is removed, m_refCount is never actually set to zero; the function simply returns true instead.

Below is where the actual delete happens:

C++
template<typename T> class RefCounted : public RefCountedBase {
    ...
    void deref() const
    {
        if (derefBase())
            delete static_cast<const T*>(this);
    }
    ...
};

/Source/WTF/wtf/RefCounted.h

As soon as the final deref() is called, the object is deleted.
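To see the lifetime rule in action, the following sketch performs a balanced sequence of manual `ref()` and `deref()` calls on a toy object. `Counted` and `destroyedOnlyAtLastDeref` are hypothetical names for this example. The sketch assumes the count starts at 1 for the creator, and unlike the excerpt above it does let the count reach zero before deleting.

```cpp
#include <cassert>

// Minimal stand-in for a ref-counted object; names are illustrative only.
class Counted {
public:
    explicit Counted(bool* destroyedFlag) : m_destroyed(destroyedFlag) {}
    ~Counted() { *m_destroyed = true; }

    void ref() { ++m_refCount; }
    void deref() {
        if (--m_refCount == 0)
            delete this;  // final reference dropped
    }

private:
    unsigned m_refCount = 1;  // assumption: the creator holds the first reference
    bool* m_destroyed;        // lets the caller observe destruction safely
};

// Walks through a balanced ref()/deref() sequence and reports whether the
// object stayed alive until, and was destroyed by, the very last deref().
inline bool destroyedOnlyAtLastDeref() {
    bool destroyed = false;
    Counted* obj = new Counted(&destroyed);  // count == 1
    obj->ref();                              // second owner, count == 2
    obj->deref();                            // count == 1, still alive
    bool aliveAfterFirstDeref = !destroyed;
    obj->deref();                            // count == 0, object deleted
    return aliveAfterFirstDeref && destroyed;
}
```

Any use of `obj` after the second `deref()` would be a use-after-free, which is why the sketch only inspects the external `destroyed` flag from that point on.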

Overall, reference counting is a fairly common and pretty good solution for memory management. Memory leaks become rare, UAFs are mitigated, and there is relatively little overhead in terms of performance.

Unfortunately, the efficacy and security of this scheme relies entirely on programmers remembering to call ref() and deref() at the appropriate times, with no exceptions.

🔗 Reference Counting - RefPtrs & Refs

Keeping track of every instance where one should call ref() or deref() is both bug-prone and a major hassle. Two mechanisms exist which try to abstract away a majority of this bookkeeping.

This post from the WebKit blog is a good resource for more information, but we will go over the main ideas it describes.

🔗 Reference Counting - RefPtrs

A RefPtr is a smart pointer purpose-built for the WebKit project. It:

Automatically calls ref() and deref() when appropriate
Has mechanisms for "adopting" a raw pointer
Compatible with all classes that implement ref and deref
- Usually RefCountedBase

We can see part of its implementation below:

C++
template<typename T, typename PtrTraits>
class RefPtr {
    ...
    ALWAYS_INLINE RefPtr(T* ptr) : m_ptr(ptr) { refIfNotNull(ptr); }
    ...
    ALWAYS_INLINE ~RefPtr() { derefIfNotNull(PtrTraits::exchange(m_ptr, nullptr)); }
    ...
};

/Source/WTF/wtf/RefPtr.h

As you can see, RefPtr can be used in conjunction with templating to "wrap" other classes. Then, by overloading certain operations, RefPtr can automatically perform most required bookkeeping.

Below we can see RefPtr being used:

C++
void HTMLFormElement::submit(...) {
    RefPtr<FrameView> view = document().view();
    RefPtr<Frame> frame = document().frame();
    ...
    RefPtr<HTMLFormControlElement> firstSuccessfulSubmitButton;
    ...
}

/Source/WebCore/html/HTMLFormElement.cpp
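The idea behind RefPtr can be sketched in a few dozen lines. The `SimpleRefPtr` and `Tracked` types below are hypothetical and heavily simplified: they are not WebKit's implementation, they omit "adopting" entirely, and they assume the count starts at zero and is raised by the first smart pointer.

```cpp
#include <cassert>
#include <utility>

// Hypothetical ref-counted type. The count lives in an external int so it
// can still be inspected after the object has been deleted.
class Tracked {
public:
    explicit Tracked(int* liveRefs) : m_liveRefs(liveRefs) {}
    void ref() { ++*m_liveRefs; }
    void deref() {
        if (--*m_liveRefs == 0)
            delete this;
    }
private:
    int* m_liveRefs;
};

// Simplified RefPtr-like wrapper: ref() on acquire, deref() on release.
template <typename T>
class SimpleRefPtr {
public:
    SimpleRefPtr() = default;
    explicit SimpleRefPtr(T* ptr) : m_ptr(ptr) { if (m_ptr) m_ptr->ref(); }
    SimpleRefPtr(const SimpleRefPtr& other) : m_ptr(other.m_ptr) { if (m_ptr) m_ptr->ref(); }
    SimpleRefPtr(SimpleRefPtr&& other) noexcept
        : m_ptr(std::exchange(other.m_ptr, nullptr)) {}
    ~SimpleRefPtr() {
        if (T* old = std::exchange(m_ptr, nullptr)) old->deref();
    }

    // Copy-and-swap: the by-value parameter releases the old pointee.
    SimpleRefPtr& operator=(SimpleRefPtr other) {
        std::swap(m_ptr, other.m_ptr);
        return *this;
    }

    T* get() const { return m_ptr; }
    T* operator->() const { return m_ptr; }

private:
    T* m_ptr = nullptr;
};

// Returns the count observed while two handles exist, and reports the
// count after both handles have gone out of scope.
inline int refsWhileTwoHandlesAlive(int& refsAfterScope) {
    int liveRefs = 0;
    int observed = 0;
    {
        SimpleRefPtr<Tracked> first(new Tracked(&liveRefs));  // count == 1
        SimpleRefPtr<Tracked> second = first;                 // count == 2
        observed = liveRefs;
    }  // both destructors run; the last one deletes the object
    refsAfterScope = liveRefs;
    return observed;
}
```

No explicit `ref()` or `deref()` call appears in `refsWhileTwoHandlesAlive`; the constructors and destructor do the bookkeeping, which is the point of the abstraction.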

🔗 Reference Counting - Refs

A Ref is almost identical to RefPtr, but is analogous to a reference rather than a pointer.

A Ref cannot be NULL, but otherwise behaves essentially the same as a RefPtr. The WebKit blog justifies it as a way of making it clear to the caller that a function returning a Ref will never return null.

🔗 Reference Counting - Reference Cycles

From what we've written so far, it may seem like the problem is solved: simply use Ref and RefPtr. However, there are some situations which require exceptions.

Reference cycles are a fairly common issue when trying to implement this strategy:

The red nodes are now a memory leak! Given the strategy we outlined above, this scenario will result in a situation where neither red node will ever be free()'d.
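A reference cycle is easy to reproduce with any counted pointer. The sketch below uses the standard library's `std::shared_ptr` as a stand-in for a ref-counted pointer (it is not WebKit's RefPtr), and `CycleNode` and `destroyedAfterHandlesDropped` are hypothetical names. It shows that neither node is destroyed once the outside handles are gone, then breaks the cycle by hand so the demo itself does not leak.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Counts destructor calls so the leak can be observed. Illustrative only.
struct CycleNode {
    explicit CycleNode(int* destroyed) : destroyedCount(destroyed) {}
    ~CycleNode() { ++*destroyedCount; }
    std::shared_ptr<CycleNode> partner;  // strong (counted) reference
    int* destroyedCount;
};

// Returns how many nodes had been destroyed right after every external
// handle was dropped, and reports the total once the cycle is broken.
inline int destroyedAfterHandlesDropped(int& destroyedAfterCycleBroken) {
    int destroyed = 0;
    CycleNode* rawA = nullptr;
    {
        auto a = std::make_shared<CycleNode>(&destroyed);
        auto b = std::make_shared<CycleNode>(&destroyed);
        a->partner = b;  // a -> b
        b->partner = a;  // b -> a, closing the cycle
        rawA = a.get();
    }  // a and b go out of scope, but each node still has one reference
    int destroyedWhileCycleIntact = destroyed;

    // Break the cycle by hand: moving the pointer out of node a leaves
    // a->partner empty, and dropping it then frees b, which frees a.
    std::shared_ptr<CycleNode> released = std::move(rawA->partner);
    released.reset();
    destroyedAfterCycleBroken = destroyed;
    return destroyedWhileCycleIntact;
}
```

Without the manual clean-up at the end, both nodes would stay allocated for the rest of the program even though nothing outside the cycle can reach them.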

🔗 Reference Counting - Weak Pointers

A potential solution to the reference cycle problem is to simply use a "raw pointer" in situations that can result in a reference cycle. That way, one "end" of the cycle is broken and reference counting works as expected again.

This "raw pointer" is often called a weak pointer or weak reference, while pointers that are "RefCounted" are called strong pointers or references.

Although this solves our issue of cycles, introducing weak pointers comes with some baggage. Most critically:

The object holding the weak reference must not outlive the object being referenced

There is also some additional book-keeping that pops up as a result:

Objects must know which other objects hold weak references to them
When freed, mark all their weak-refs as cleared

The "mixing" of strong and weak pointers in this manner is risky for all of the "classical exploitation" reasons. Using them in situations where object or memory lifetime is ephemeral (such as on the stack, or when queuing up JavaScript callbacks) can be especially tricky:

If a pointer lasts longer than the object it references => UAF
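The bookkeeping described above (clearing weak references when their target is freed) is what separates a checked weak reference from a plain raw pointer. As a sketch, the standard library's `std::weak_ptr` behaves this way; it is used here only as a stand-in and is not the mechanism any particular browser uses. `TreeNode` and both helper functions are hypothetical names.

```cpp
#include <cassert>
#include <memory>

struct TreeNode {
    explicit TreeNode(int* destroyed) : destroyedCount(destroyed) {}
    ~TreeNode() { ++*destroyedCount; }
    std::shared_ptr<TreeNode> child;  // strong: the parent keeps its child alive
    std::weak_ptr<TreeNode> parent;   // weak: does not keep the parent alive
    int* destroyedCount;
};

// With a weak back-pointer there is no cycle, so both nodes are freed.
inline int destroyedWithWeakBackPointer() {
    int destroyed = 0;
    {
        auto parent = std::make_shared<TreeNode>(&destroyed);
        auto child = std::make_shared<TreeNode>(&destroyed);
        parent->child = child;
        child->parent = parent;  // weak link back up the tree
    }
    return destroyed;
}

// A checked weak reference reports that its target is gone instead of
// handing back a dangling pointer.
inline bool weakRefClearedAfterFree() {
    int destroyed = 0;
    std::weak_ptr<TreeNode> weak;
    {
        auto node = std::make_shared<TreeNode>(&destroyed);
        weak = node;
        if (weak.lock() == nullptr) return false;  // target is alive here
    }
    return weak.expired() && weak.lock() == nullptr;
}
```

A raw pointer in the same position would give no such signal: it would keep its old value after the target was freed, which is exactly the UAF condition stated above.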

🔗 Reference Counting - Overflowing RefCounts

Let's take a look at WebKit's RefCountedBase class:

C++
private:
    ...
    mutable unsigned m_refCount;
    ...

/Source/WTF/wtf/RefCounted.h

What would happen if we had a lot of references...

Make 4294967295 refs to object
Add two more, overflows to 1 ref
Trigger a deref()
UAF!

Although a bit "funny", this is a completely valid tactic which has been exploited in the real world. These days, most implementations will check for overflows and abort, as no reasonable webpage will have that number of references.
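The arithmetic behind this is plain unsigned wraparound, which the sketch below reproduces with a 32-bit counter. The function names are hypothetical, and `checkedRef` returns false where a real implementation would more likely abort.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Unchecked increment: wraps around silently, as unsigned arithmetic does.
inline std::uint32_t uncheckedRef(std::uint32_t count) {
    return count + 1;
}

// Checked increment: refuses to go past the maximum value.
// Returns false instead of aborting so the behaviour is easy to test.
inline bool checkedRef(std::uint32_t& count) {
    if (count == std::numeric_limits<std::uint32_t>::max())
        return false;
    ++count;
    return true;
}

// Starts at the maximum count and adds two more references.
inline std::uint32_t countAfterTwoExtraRefs() {
    std::uint32_t count = 4294967295u;  // 2^32 - 1
    count = uncheckedRef(count);        // wraps to 0
    count = uncheckedRef(count);        // now 1
    return count;
}
```

After the wrap the object believes it has a single reference, so the next `deref()` frees it while billions of pointers to it still exist.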

🔗 Garbage Collection

Another method for avoiding reference counting issues is to use a garbage collection strategy instead.

Blink takes this approach with Oilpan, "... a project to replace reference counting in Blink with a GC".

Much better at eliminating memory leaks
Less "manual" input from programmers
- No worrying about weak pointers, adopting refs, etc
Fairly standard "mark & sweep" GC
- We'll talk more about GC algorithms in subsequent training modules

You can read this page for a more in-depth look at the GC's design.

At a high level, Oilpan performs the following:

Trigger GC, wait for all threads to enter a "safepoint"
Starting from the root-set, call Trace() on all reachable nodes
Any objects that are not reachable are deleted

Oilpan also scans the stack for references to objects and adds those to the root-set (this is called a conservative GC).
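The mark and sweep steps above can be sketched with a toy heap. This is a generic illustration under simplifying assumptions, not Oilpan: there are no threads or safepoints, no stack scanning, and roots are registered explicitly. `ToyHeap`, `GcObject`, and `collectExample` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct GcObject {
    bool marked = false;
    std::vector<GcObject*> references;  // outgoing edges to other objects
};

class ToyHeap {
public:
    GcObject* allocate() {
        m_objects.push_back(std::make_unique<GcObject>());
        return m_objects.back().get();
    }
    void addRoot(GcObject* object) { m_roots.push_back(object); }

    // Runs one collection and returns the number of objects freed.
    std::size_t collect() {
        for (auto& object : m_objects) object->marked = false;

        // Mark: visit everything reachable from the root set.
        std::vector<GcObject*> worklist(m_roots);
        while (!worklist.empty()) {
            GcObject* current = worklist.back();
            worklist.pop_back();
            if (current->marked) continue;
            current->marked = true;
            for (GcObject* next : current->references) worklist.push_back(next);
        }

        // Sweep: keep marked objects; the rest are destroyed when the
        // old vector's contents are replaced.
        std::size_t before = m_objects.size();
        std::vector<std::unique_ptr<GcObject>> survivors;
        for (auto& object : m_objects)
            if (object->marked) survivors.push_back(std::move(object));
        m_objects = std::move(survivors);
        return before - m_objects.size();
    }

private:
    std::vector<std::unique_ptr<GcObject>> m_objects;
    std::vector<GcObject*> m_roots;
};

// One root with one reachable child, plus an unreachable two-object cycle.
inline std::size_t collectExample() {
    ToyHeap heap;
    GcObject* root = heap.allocate();
    GcObject* reachable = heap.allocate();
    GcObject* cycleA = heap.allocate();
    GcObject* cycleB = heap.allocate();
    heap.addRoot(root);
    root->references.push_back(reachable);
    cycleA->references.push_back(cycleB);
    cycleB->references.push_back(cycleA);
    return heap.collect();
}
```

The unreachable cycle is collected without any weak pointers, because reachability from the roots is what counts here, not how many references an object has.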

🔗 Garbage Collection - Object Ownership Issues

Some objects "own" their pointers. This ownership can be reset, potentially freeing resources.

Example: postMessage (which sends messages to other contexts e.g. iframes / workers) "transfers" ArrayBuffers:

JavaScript
a = new ArrayBuffer(10)
> ArrayBuffer(10) {}
postMessage('','*',[a])
a
> ArrayBuffer(0) {}
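A loose C++ analogy for this transfer behaviour is moving a uniquely owned backing store out of its wrapper. The `OwnedBuffer` class below is hypothetical and only mirrors the observable effect (the source reports a length of zero afterwards); it says nothing about how a browser implements transferable objects.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical buffer wrapper that exclusively owns its backing store.
class OwnedBuffer {
public:
    explicit OwnedBuffer(std::size_t length)
        : m_data(std::make_unique<std::vector<unsigned char>>(length)) {}

    // Reports 0 once the backing store has been transferred away.
    std::size_t byteLength() const { return m_data ? m_data->size() : 0; }

    // Hands the backing store to the caller and detaches this wrapper.
    std::unique_ptr<std::vector<unsigned char>> transfer() {
        return std::move(m_data);
    }

private:
    std::unique_ptr<std::vector<unsigned char>> m_data;
};

// Transfers a 10-byte buffer and returns the length the source wrapper
// reports afterwards; the receiver's length is written to receivedLength.
inline std::size_t lengthAfterTransfer(std::size_t& receivedLength) {
    OwnedBuffer a(10);
    auto received = a.transfer();
    receivedLength = received->size();
    return a.byteLength();
}
```

Because exactly one owner exists at any time, this model is safe. The trouble starts when two wrappers each believe they own the same backing store.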

Let's take a look at how this kind of scenario can create an exploitable bug.

🔗 Blink: CVE-2019-5786

This CVE was found as a 0-day "in the wild" by Google. At a high level, the vulnerability worked like this:

FileReaderLoader can create two DOMArrayBuffers with the same underlying reference
Each DOMArrayBuffer "owns" its underlying reference
A UAF occurs when the underlying ArrayBuffer object is freed through one DOMArrayBuffer while the other still holds a stale reference to it!

Let's take a look at a relevant code snippet, which could be invoked from a JavaScript event callback:

C++
DOMArrayBuffer* FileReaderLoader::ArrayBufferResult() {
    ...
    // Copy reference to current array buffer to DOMArrayBuffer
    DOMArrayBuffer* result = DOMArrayBuffer::Create(raw_data_->ToArrayBuffer());
    ...
    return result;
}

Under certain conditions, it was possible for raw_data_->ToArrayBuffer() to return the same WTF::ArrayBuffer for consecutive calls. If this occurs, the result, a DOMArrayBuffer *, can be created multiple times from the same underlying WTF::ArrayBuffer.

As hinted previously, object ownership can become a tricky issue to handle properly. By abusing postMessage, we can "transfer ownership" of these DOMArrayBuffers:

JavaScript
// arr1 and arr2 wrap DOMArrayBuffers with same underlying WTF::ArrayBuffer
window.postMessage([], '*', [arr1, arr2]);

This causes the following to occur:

arr1 is transferred, WTF::ArrayBuffer marked as neutered (cleared)
Tries to transfer arr2, can't since already neutered
Throws exception
- Frees ArrayBuffer in arr1, since it's neutered and no longer used
- But arr2 still has a reference! UAF

To fix this, the function now clones data before returning it in the DOMArrayBuffer. In the "bugged" version, it technically cloned the reference, not the data:

C++
DOMArrayBuffer* FileReaderLoader::ArrayBufferResult() {
    ...
    // Copy underlying data instead of copying reference
    return DOMArrayBuffer::Create(
        ArrayBuffer::Create(raw_data_->Data(), raw_data_->ByteLength()));
    ...
}
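The transfer sequence described earlier, and the effect of the fix, can be modelled with a deliberately memory-safe toy. Everything in this sketch is hypothetical: `BackingStore`, `BufferWrapper`, `transferAll`, and `secondWrapperDangles` are invented for illustration, "freed" is only a flag, and the control flow is a simplification of the steps listed above rather than Blink's actual code.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy backing store: flags stand in for real memory state.
struct BackingStore {
    bool neutered = false;
    bool freed = false;  // models the memory having been released
};

// Toy wrapper that believes it owns its backing store. A shared_ptr is
// used only so that this model itself never touches freed memory.
struct BufferWrapper {
    std::shared_ptr<BackingStore> store;
};

// Transfers each wrapper in order. If a store is already neutered, the
// transfer fails and the stores transferred so far are released.
inline bool transferAll(std::vector<BufferWrapper*> wrappers) {
    std::vector<BackingStore*> transferred;
    for (BufferWrapper* wrapper : wrappers) {
        if (wrapper->store->neutered) {
            for (BackingStore* store : transferred) store->freed = true;
            return false;  // models the thrown exception
        }
        wrapper->store->neutered = true;
        transferred.push_back(wrapper->store.get());
    }
    return true;
}

// copyData == false: both wrappers alias one store (the buggy behaviour).
// copyData == true: the second wrapper gets its own store (the fix).
inline bool secondWrapperDangles(bool copyData) {
    auto raw = std::make_shared<BackingStore>();
    BufferWrapper arr1{raw};
    BufferWrapper arr2{copyData ? std::make_shared<BackingStore>() : raw};
    transferAll({&arr1, &arr2});
    return arr2.store->freed;
}
```

With aliasing, `arr2` ends up pointing at a store that the model marks as freed; with a separate copy per wrapper, it does not.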

You can find more details on the Chrome Bug Tracker, or read an in-depth writeup from Exodus Intelligence.