🔗 Browser Overview and Components

🔗 Browser Overview

Web browsers are among the most essential and widely used software applications that people interact with. Today, almost every device that connects to the internet uses a browser engine to transform remote data into a human-friendly format (webpages). There are an estimated 3+ billion devices with browsers active monthly.

This is a snapshot of a browser usage table. Chrome accounts for roughly 60% of users, Safari roughly 17%.

Simultaneously, web browsers are among the largest and most complex pieces of software ever created. This makes intuitive sense, as in many cases, these projects have grown alongside web technologies or even pioneered them. JavaScript itself, today an underpinning of the modern tech ecosystem, was invented as a dynamic feature for Netscape Navigator in the mid-1990s.

For a general sense of scale, WebKit (the browser engine used by Safari) contains roughly 32 million lines of code:

-------------------------------------------------------------------
Language         files          blank        comment           code
-------------------------------------------------------------------
C++              15196        1284740        1328729        8034688
JavaScript       53230         862614        1730736        5006027
C                16042         879110         823000        4916567
HTML             62948         369151         127415        3844171
C/C++ Header     27221         709481        1262889        3338078
Assembly          3331         244464         391867        1066578
IDL               1243           6379              0          63047
-------------------------------------------------------------------
SUM:            208684        5662886        7434005       32370387
-------------------------------------------------------------------

If that weren't large enough, Chromium clocks in at around 44 million lines:

---------------------------------------------------------------------
Language           files          blank        comment           code
---------------------------------------------------------------------
C++                65013        2861088        2151479       16291130
C/C++ Header       61959        1531706        2518555        6163851
C                   8276         524194         774031        3347402
JavaScript         29559         572627         953514        3518706
Assembly            5132         318772         549951        1242603
IDL                 2043          13551              1          98681
...
---------------------------------------------------------------------
SUM:              302467        7436973        9118176       44645556
---------------------------------------------------------------------

These are enormous pieces of software that contain incalculable complexity. In fact, browsers can rival the operating systems they run on in many cases:

Browsers                      Operating systems
Chromium: 44 million LOC      Windows: 60-100 million LOC (est.)
WebKit: 32 million LOC        Linux: 17 million LOC

While the scale may be intimidating, this degree of complexity cuts both ways: a codebase this large all but guarantees exploitable bugs and flaws. Next, we will begin to break the browser down into its conceptual parts. Reading all 44 million lines of code would be neither enjoyable nor practical, so learning where to direct our focus will be invaluable in the long run.

🔗 Browser Components

Although we commonly think of a web browser as a single, monolithic entity, it is often more appropriate to think of it as a collection of layered components:

Each browser will differ in its specific implementation, but this general architecture provides a useful mental model for thinking about browsers at a technical level.

When users interact with a web browser, they will typically see something like this:

We can see all the familiar bits and pieces of a modern desktop browser:

URL Bar
Various Tabs
Bookmarks
Website Content
etc...

We can already draw an important distinction between two major browser subsystems:

The "broker", which handles the frontend UI and interactions between the address bar, bookmarks, tabs, and other parts of the native browser application.
The "renderer", which handles everything related to displaying the actual web content: parsing HTML, applying CSS styles, running JavaScript, etc.

This logical separation between "broker" and "renderer" is also a security boundary. By its nature, the renderer must process untrusted, essentially arbitrary, remote blobs of HTML and JavaScript data, both of which contain the potential for significant complexity.

It is for this reason that the vast majority of this training will focus on finding and exploiting vulnerabilities within the renderer. In general, we will be writing specially crafted web content to trigger bugs within the renderer and subsequently leveraging those bugs to obtain full-blown remote code execution.

🔗 Browser Processes

As mentioned in the previous section, the "broker" (the native browser application) and the "renderer" define both a subsystem separation and a security boundary. This raises the question of how that boundary is enforced, since both components appear to live within a single program.

The most common modern approach is to create a hard separation by placing dangerous components (such as the renderer) into their own process. We can see this with Task Manager quite easily:

Despite only running a single instance of Chrome, we can clearly see more than just "one" chrome.exe process running. Each of these processes contains an isolated component of the overall browser.

On Linux, we can more easily see a hierarchy emerge using ps fx:

itszn  87743  /bin/bash
itszn  87746   \_ /usr/lib/chromium-browser/chromium-browser --enable-pinch
itszn  87754       \_ /usr/lib/chromium-browser/chromium-browser --type=zygote
itszn  87756       |   \_ /usr/lib/chromium-browser/chromium-browser --type=zygote
itszn  87837       |       \_ /usr/lib/chromium-browser/chromium-browser --type=renderer 
itszn  87878       |       \_ /usr/lib/chromium-browser/chromium-browser --type=renderer 
itszn  87889       |       \_ /usr/lib/chromium-browser/chromium-browser --type=renderer 
itszn  87966       |       \_ /usr/lib/chromium-browser/chromium-browser --type=renderer 
itszn  87985       |       \_ /usr/lib/chromium-browser/chromium-browser --type=renderer 
itszn  88019       |       \_ /usr/lib/chromium-browser/chromium-browser --type=utility
itszn  87778       \_ /usr/lib/chromium-browser/chromium-browser --type=gpu-process
itszn  87836           \_ /usr/lib/chromium-browser/chromium-browser --type=broker

We can immediately pull out a few different "process types" that make up Chrome:

zygote
renderer
broker
utility
gpu-process

Most notably, the majority of these processes are of type=renderer. This makes intuitive sense: the renderer is responsible for displaying and handling (nearly) all web-content. It is sometimes also referred to as the "Content Process" for this reason.
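As a rough illustration, the process types can be tallied straight from a listing like the one above. The Python sketch below works on an abbreviated, hard-coded copy of that output. The function name count_process_types and the "(no --type flag)" label for the top-level process are made up for this example.

```python
import re
from collections import Counter

# Abbreviated copy of the `ps fx` listing shown earlier (tree art kept as-is).
PS_OUTPUT = r"""
itszn  87746   \_ /usr/lib/chromium-browser/chromium-browser --enable-pinch
itszn  87754       \_ /usr/lib/chromium-browser/chromium-browser --type=zygote
itszn  87756       |   \_ /usr/lib/chromium-browser/chromium-browser --type=zygote
itszn  87837       |       \_ /usr/lib/chromium-browser/chromium-browser --type=renderer
itszn  87878       |       \_ /usr/lib/chromium-browser/chromium-browser --type=renderer
itszn  87889       |       \_ /usr/lib/chromium-browser/chromium-browser --type=renderer
itszn  87966       |       \_ /usr/lib/chromium-browser/chromium-browser --type=renderer
itszn  87985       |       \_ /usr/lib/chromium-browser/chromium-browser --type=renderer
itszn  88019       |       \_ /usr/lib/chromium-browser/chromium-browser --type=utility
itszn  87778       \_ /usr/lib/chromium-browser/chromium-browser --type=gpu-process
"""

# Matches the value of the --type= command-line flag, e.g. "renderer".
TYPE_RE = re.compile(r"--type=([\w-]+)")


def count_process_types(ps_output: str) -> Counter:
    """Count browser processes per --type= value in a ps-style listing."""
    counts = Counter()
    for line in ps_output.splitlines():
        if "chromium-browser" not in line:
            continue  # skip blank lines and unrelated processes
        match = TYPE_RE.search(line)
        # The top-level process in this listing carries no --type flag.
        counts[match.group(1) if match else "(no --type flag)"] += 1
    return counts


if __name__ == "__main__":
    for process_type, count in count_process_types(PS_OUTPUT).most_common():
        print(f"{process_type:18} {count}")
```

On a live system the same function could be fed the real output of ps instead of the hard-coded string.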

Furthermore, this suggests that a separate renderer is typically spawned for each webpage, browser tab, iframe, etc. Placing this untrusted content in a separate process also increases the overall stability of the browser: if a bug is triggered, only one browser tab crashes rather than the entire browser. You've likely seen this in action if you've ever run into one of the infamous "Aw, Snap!" screens in Chrome.
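The stability benefit can be sketched with ordinary operating-system processes. The Python example below is only an analogy, not how Chrome is implemented: a parent process (standing in for the broker) hands each page to a child interpreter (standing in for a renderer). The names render_in_child and RENDERER_CODE, and the "trigger-bug" marker, are invented for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for a renderer: a child interpreter that "renders"
# a page and dies abruptly if the page contains a bug-triggering marker.
RENDERER_CODE = """
import os, sys
page = sys.argv[1]
if "trigger-bug" in page:
    os.abort()  # simulate a hard crash inside the renderer
print("rendered: " + page)
"""


def render_in_child(page: str) -> tuple:
    """Render `page` in a separate process; return (ok, detail)."""
    proc = subprocess.run(
        [sys.executable, "-c", RENDERER_CODE, page],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Only the child died; the parent is unaffected and can show
        # an error page for this one tab.
        return False, "tab crashed"
    return True, proc.stdout.strip()


if __name__ == "__main__":
    # The parent keeps going after the second page takes its child down.
    for page in ["news article", "page with trigger-bug", "webmail"]:
        ok, detail = render_in_child(page)
        print(f"{page!r}: {detail}")
```

Had the "rendering" happened inside the parent process itself, the abort would have ended the whole program, which is the situation the multi-process design avoids.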

Note
Although not an immediate concern for us, it should also be noted that 'renderer' processes are typically sandboxed. This is meant to provide another layer of protection for the overall system in the event that a renderer is successfully compromised by an attacker.

🔗 General Browser Architecture

Now that we've broken the browser down into a few logical components and have taken a quick look at how they use processes to compartmentalize complexity, we can draw a diagram that is more faithful to the technical reality of modern web browsers:

This time, we can also see the major parts that make up the renderer. The components shown are essentially self-explanatory in their purpose, but as we can see, they each break into additional sub-components that achieve specific goals.

However, as complexity explodes every time we peel away another layer, it raises the question of how multiple browser vendors coordinate to provide a consistent experience. It would be an unimaginable mess if let a = 1 + b * 0 had different order-of-operations rules applied on Chrome versus Safari.
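To make the stakes concrete, here is the same expression evaluated under two groupings. Python is used purely as a stand-in (it gives multiplication higher precedence than addition, just as JavaScript does), and the value of b is arbitrary.

```python
b = 5  # arbitrary value chosen for the example

# Standard grouping: multiplication binds tighter, so this is 1 + (b * 0).
standard_result = 1 + b * 0

# A hypothetical engine that grouped strictly left to right instead.
left_to_right_result = (1 + b) * 0

print(standard_result, left_to_right_result)  # prints "1 0"
```

Two engines disagreeing on this point would compute different values for a from identical source code.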

🔗 Web Standards and Specifications

This problem is generally solved by following various web standards and specifications. Nearly every "sub-box" that we drew into our diagram has a large, exhaustive document associated with it that describes the exact behavior browsers are expected to implement. By adhering to this common set of standards, the vendors can all be confident in their compatibility.

As you may imagine, there are many standards to abide by. Some of the noteworthy ones:

W3C - HTML related standards, CSS

WHATWG - HTML, DOM, Fetch, URL, etc

Ecma International (TC39) - ECMAScript, the JavaScript language standard

Although many of these documents are far too dense to be useful "most of the time", it is important to be aware of their existence. Vulnerabilities often hide in the edge cases, and if you ever need the "ground truth" on how something is supposed to be implemented, the standards are where you will find that information.

🔗 WebIDL: Web Interface Definition Language

A particularly important specification to be familiar with is the Web Interface Definition Language (WebIDL).

WebIDL provides a standardized way to define APIs between various browser components. Conceptually, you can think of it as a blueprint for the "glue" that allows the components we've seen to interface with each other. During the build process, WebIDLs are automatically converted into C++ code and other components can include the resulting header files to interface with a particular component.

Below is a snippet of an example WebIDL file:

[Constructor,
 Exposed=Window]
interface Document : Node {
  [SameObject] readonly attribute DOMImplementation implementation;
  readonly attribute USVString URL;
  readonly attribute USVString documentURI;
  readonly attribute USVString origin;
  readonly attribute DOMString compatMode;
  readonly attribute DOMString characterSet;
  readonly attribute DOMString charset; // historical alias of .characterSet
  readonly attribute DOMString inputEncoding; // historical alias of .characterSet
  readonly attribute DOMString contentType;

  readonly attribute DocumentType? doctype;
  readonly attribute Element? documentElement;
  HTMLCollection getElementsByTagName(DOMString qualifiedName);
  HTMLCollection getElementsByTagNameNS(DOMString? namespace, DOMString localName);
  HTMLCollection getElementsByClassName(DOMString classNames);
  ...

Both Chrome and Safari use WebIDLs.
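To give a feel for the IDL-to-C++ step, the toy generator below turns a few readonly attributes from the Document interface above into a C++-style class declaration. It is a deliberately simplified sketch, not the generator either browser ships: the function generate_header, the CPP_TYPES mapping, and the shape of the emitted getters are all assumptions made for illustration. Only the "JS" class-name prefix mirrors the WebKit output shown later in this section.

```python
import re

# A few lines borrowed from the Document interface shown above.
IDL_SNIPPET = """
interface Document : Node {
  readonly attribute USVString URL;
  readonly attribute DOMString compatMode;
  readonly attribute Element? documentElement;
};
"""

# Illustrative type mapping only; real generators use far larger tables.
CPP_TYPES = {"USVString": "String", "DOMString": "String", "Element": "Element*"}

INTERFACE_RE = re.compile(r"interface\s+(\w+)\s*:\s*(\w+)")
ATTRIBUTE_RE = re.compile(r"readonly\s+attribute\s+(\w+)(\?)?\s+(\w+)\s*;")


def generate_header(idl: str) -> str:
    """Emit a C++-style wrapper class declaration for one IDL interface."""
    name, parent = INTERFACE_RE.search(idl).groups()
    lines = [f"class JS{name} : public JS{parent} {{", "public:"]
    for idl_type, nullable, attribute in ATTRIBUTE_RE.findall(idl):
        cpp_type = CPP_TYPES.get(idl_type, idl_type)
        # A trailing "?" in WebIDL marks the type as nullable.
        comment = "  // may be null" if nullable else ""
        lines.append(f"    {cpp_type} {attribute}() const;{comment}")
    lines.append("};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_header(IDL_SNIPPET))
```

The point is the direction of the data flow: the interface is declared once in IDL, and the repetitive binding code is derived from it mechanically at build time.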

Note
The generated C++ code will be in the build directory. Usually reading it is not worth the time. If you want to look for bugs in autogenerated code, you should audit the generators.

Another example of a WebIDL definition and some of its generated C++ code:

[
    LegacyUnenumerableNamedProperties,
    LegacyOverrideBuiltIns,
    JSGenerateToNativeObject,
    Exposed=Window
] interface HTMLFormElement : HTMLElement {
    [CEReactions=NotNeeded, Reflect=accept_charset] attribute DOMString acceptCharset;
    [CEReactions=NotNeeded] attribute [AtomString] USVString action;
    [CEReactions=NotNeeded] attribute [AtomString] DOMString autocomplete;
    [CEReactions=NotNeeded] attribute [AtomString] DOMString enctype;
    [CEReactions=NotNeeded, ImplementedAs=enctype] attribute [AtomString] DOMString encoding;
    [CEReactions=NotNeeded] attribute [AtomString] DOMString method;
    [CEReactions=NotNeeded, Reflect] attribute DOMString name;
    [CEReactions=NotNeeded, Reflect] attribute boolean noValidate;
    [CEReactions=NotNeeded, Reflect] attribute DOMString target;
    [CEReactions=NotNeeded, Reflect] attribute DOMString rel;
    [SameObject, PutForwards=value] readonly attribute DOMTokenList relList;

    readonly attribute HTMLFormControlsCollection elements;
    readonly attribute unsigned long length;
    getter Element? (unsigned long index);
    getter (RadioNodeList or Element)? ([RequiresExistingAtomString] DOMString name);

    [ImplementedAs=submitFromJavaScript] undefined submit();
    [EnabledBySetting=RequestSubmitEnabled] undefined
            requestSubmit(optional HTMLElement? submitter);
    [CEReactions=Needed] undefined reset();
    boolean checkValidity();
    [EnabledBySetting=InteractiveFormValidationEnabled] boolean reportValidity();
};

/Source/WebCore/html/HTMLFormElement.idl

C++
class JSHTMLFormElement : public JSHTMLElement {
public:
    using Base = JSHTMLElement;
    using DOMWrapped = HTMLFormElement;
    static JSHTMLFormElement* create(JSC::Structure* structure,
        JSDOMGlobalObject* globalObject, Ref<HTMLFormElement>&& impl)
    {
        JSHTMLFormElement* ptr =
            new (NotNull, JSC::allocateCell<JSHTMLFormElement>(globalObject->vm().heap))
                JSHTMLFormElement(structure, *globalObject, WTFMove(impl));
        ptr->finishCreation(globalObject->vm());
        return ptr;
    }

    static JSC::JSObject* createPrototype(JSC::VM&, JSDOMGlobalObject&);
    static JSC::JSObject* prototype(JSC::VM&, JSDOMGlobalObject&);
    static HTMLFormElement* toWrapped(JSC::VM&, JSC::JSValue);
    static bool getOwnPropertySlot(JSC::JSObject*,
        JSC::ExecState*, JSC::PropertyName, JSC::PropertySlot&);
    static bool getOwnPropertySlotByIndex(JSC::JSObject*,
        JSC::ExecState*, unsigned propertyName, JSC::PropertySlot&);
    static void getOwnPropertyNames(JSC::JSObject*,
        JSC::ExecState*, JSC::PropertyNameArray&,
        JSC::EnumerationMode = JSC::EnumerationMode());
    ...
protected:
    void finishCreation(JSC::VM&);
};

In the next training module, we will focus on actually compiling a modern web browser. Although we will not be directly using or modifying WebIDLs, having a rough idea of what they're like helps demystify the mechanics of how various components integrate with each other.

🔗 Meet the Browsers

Up to now, we have been hopping around several different topics in order to paint a rough mental picture of typical browser architectures as they exist today. Before we dive into the practicalities of building these engines, we will take some time to introduce the two web browsers this training will focus on.

🔗 Safari (WebKit)

Safari is the default browser on Apple products and is developed and maintained in-house by Apple. Safari itself is a wrapper around WebKit with macOS/iOS specific tweaks applied where necessary. WebKit makes up close to the entirety of Safari, and that is where we will focus our attention.

WebKit has its origins in KHTML, the open-source rendering engine from the KDE project, which Apple forked internally circa 2001. Apple open-sourced the full project in 2005, and it has grown into the WebKit project since then. The two primary components of WebKit are:

WebCore - This is the Rendering Engine
JavaScriptCore - This is the JavaScript Engine

It's not uncommon for other software projects to embed all or parts of WebKit. For example, WebKit is used in the browsers built into Nintendo consoles, Sony PlayStation consoles, and the Amazon Kindle. Just as Safari is a WebKit wrapper with Apple-platform-specific tweaks, these projects similarly wrap WebKit with hardware-specific implementations.

🔗 Chrome (Chromium)

Chrome is perhaps the best-known web browser in the world, and certainly the most used (at least on desktop systems). Chrome is built on top of Chromium, the open-source browser project that Google released in 2008.

The primary components of Chrome are:

Blink - The Rendering Engine
- Originally used WebCore from WebKit, forked in 2013

V8 - The JavaScript Engine

Like WebKit, Chromium is used by many other projects, including the browsers in Tesla vehicles and Samsung smart TVs.