Browser Overview and Components
Browser Overview
Web browsers are among the most essential and widely used software applications that people interact with. Today, almost every device that connects to the internet uses a browser engine to transform remote data into a human-friendly format (webpages). An estimated 3 billion or more devices actively use a browser each month.
A snapshot of current browser usage share shows Chrome at roughly 60% of users and Safari at roughly 17%.
At the same time, web browsers are among the largest and most complex pieces of software ever created. This makes intuitive sense, as in many cases these projects have grown alongside web technologies or even pioneered them. JavaScript itself, today an underpinning of the modern tech ecosystem, was invented as a dynamic feature for Netscape Navigator in 1995.
For a general sense of scale, WebKit (the browser engine used by Safari) contains roughly 32 million lines of code:
-------------------------------------------------------------------
Language            files         blank       comment          code
-------------------------------------------------------------------
C++                 15196       1284740       1328729       8034688
JavaScript          53230        862614       1730736       5006027
C                   16042        879110        823000       4916567
HTML                62948        369151        127415       3844171
C/C++ Header        27221        709481       1262889       3338078
Assembly             3331        244464        391867       1066578
IDL                  1243          6379             0         63047
-------------------------------------------------------------------
SUM:               208684       5662886       7434005      32370387
-------------------------------------------------------------------
If that weren't large enough, Chromium clocks in at around 44 million:
---------------------------------------------------------------------
Language            files         blank       comment          code
---------------------------------------------------------------------
C++                 65013       2861088       2151479      16291130
C/C++ Header        61959       1531706       2518555       6163851
C                    8276        524194        774031       3347402
JavaScript          29559        572627        953514       3518706
Assembly             5132        318772        549951       1242603
IDL                  2043         13551             1         98681
...
---------------------------------------------------------------------
SUM:               302467       7436973       9118176      44645556
---------------------------------------------------------------------
These are enormous pieces of software with staggering complexity. In fact, browsers can rival the operating systems they run on in many cases:
Browsers | OS |
---|---|
Chromium: 44 million LOC | Windows: 60-100 million LOC (est.) |
WebKit: 32 million LOC | Linux: 17 million LOC |
While the scale may be intimidating, this degree of complexity is a double-edged sword: it all but guarantees that exploitable bugs and flaws exist. Next, we will begin breaking the browser down into its conceptual parts. Reading all 44 million lines of code would be neither enjoyable nor practical, so learning where to direct our focus will be invaluable in the long run.
Browser Components
Although we commonly think of a web browser as a single, monolithic entity, it is often more appropriate to think of it as a collection of layered components:
Each browser will differ in its specific implementation, but this general architecture provides a useful mental model for thinking about browsers at a technical level.
When users interact with a web browser, they will typically see something like this:
We can see all the familiar bits and pieces of a modern desktop browser:
- URL Bar
- Various Tabs
- Bookmarks
- Website Content
- etc.
We can already draw an important distinction between two major browser subsystems:
- The "broker", which handles the frontend UI and the interactions between the address bar, bookmarks, tabs, and other parts of the native browser application.
- The "renderer", which handles everything related to displaying the actual web content: parsing HTML, applying CSS styles, running JavaScript, etc.
This logical separation between "broker" and "renderer" is also a security boundary. By its nature, the renderer must process untrusted, essentially arbitrary, remote blobs of HTML and JavaScript data, both of which contain the potential for significant complexity.
It is for this reason that the vast majority of this training will focus on finding and exploiting vulnerabilities within the renderer. In general, we will be writing specially crafted web content to trigger bugs within the renderer and then leveraging those bugs to obtain full-blown remote code execution.
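To make the idea of a security boundary more concrete, here is a minimal C++ sketch of a privileged broker that treats every request coming from a renderer as untrusted input. The `RendererRequest` type, the action names, and the allowed-scheme list are hypothetical simplifications for illustration only; real browsers use far richer IPC systems (Chromium's Mojo, for example) and much more elaborate policy checks.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical request sent from an (untrusted) renderer to the (privileged) broker.
struct RendererRequest {
    int renderer_id;       // which renderer sent the request (unused in this sketch)
    std::string action;    // e.g. "navigate" or "read_file"
    std::string argument;  // e.g. a URL or a file path
};

// Hypothetical broker-side policy: the renderer may ask, but the broker decides.
class Broker {
public:
    // Returns true only if the request is allowed by the broker's policy.
    bool handle(const RendererRequest& req) const {
        if (req.action == "navigate") {
            return hasAllowedScheme(req.argument);
        }
        // Default-deny: anything not explicitly permitted (such as file access) is refused.
        return false;
    }

private:
    // Accept only URLs whose scheme (text before the first ':') is allowlisted.
    static bool hasAllowedScheme(const std::string& url) {
        static const std::set<std::string> allowed = {"http", "https"};
        auto colon = url.find(':');
        if (colon == std::string::npos) {
            return false;
        }
        return allowed.count(url.substr(0, colon)) > 0;
    }
};
```

The design choice worth noticing is the default-deny stance: the broker never trusts what the renderer claims and refuses anything it does not explicitly recognize. Since a compromised renderer can send arbitrary messages, a mistake in checks like these can become a path out of the renderer's confinement.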
Browser Processes
As mentioned in the previous section, the 'native browser' (broker) and the 'renderer' define both a subsystem separation and a security boundary. This raises the question of how that security boundary is enforced or implemented, since both of these components appear to exist within a single program.
The most common modern approach is to create a hard separation by placing dangerous components (such as the renderer) into their own processes. We can see this quite easily in the Windows Task Manager:
Despite running only a single instance of Chrome, we can clearly see more than just "one" chrome.exe process running. Each of these processes contains an isolated component of the overall browser.
On Linux, we can more easily see a hierarchy emerge using ps fx:
itszn 87743 /bin/bash
itszn 87746 \_ /usr/lib/chromium-browser/chromium-browser --enable-pinch
itszn 87754 \_ /usr/lib/chromium-browser/chromium-browser --type=zygote
itszn 87756 | \_ /usr/lib/chromium-browser/chromium-browser --type=zygote
itszn 87837 | \_ /usr/lib/chromium-browser/chromium-browser --type=renderer
itszn 87878 | \_ /usr/lib/chromium-browser/chromium-browser --type=renderer
itszn 87889 | \_ /usr/lib/chromium-browser/chromium-browser --type=renderer
itszn 87966 | \_ /usr/lib/chromium-browser/chromium-browser --type=renderer
itszn 87985 | \_ /usr/lib/chromium-browser/chromium-browser --type=renderer
itszn 88019 | \_ /usr/lib/chromium-browser/chromium-browser --type=utility
itszn 87778 \_ /usr/lib/chromium-browser/chromium-browser --type=gpu-process
itszn 87836 \_ /usr/lib/chromium-browser/chromium-browser --type=-broker
We can immediately pull out a few different "process types" that make up Chrome:
- zygote
- renderer
- broker
- utility
- gpu-process
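The `--type=` switch is what distinguishes these processes from one another; in Chromium, the main browser process is the one launched without it (the top-level `--enable-pinch` process above). As a rough illustration, the C++ sketch below classifies command lines like those in the `ps fx` output by their `--type=` value. The `processType` and `countTypes` helpers, and the choice to label a missing switch as "browser", are our own assumptions rather than Chromium code.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Extract the value of the first "--type=" switch from a command line.
// Hypothetical helper: a command line without the switch is labeled "browser".
std::string processType(const std::string& commandLine) {
    const std::string prefix = "--type=";
    std::istringstream tokens(commandLine);
    std::string token;
    while (tokens >> token) {
        if (token.rfind(prefix, 0) == 0) {  // token starts with "--type="
            return token.substr(prefix.size());
        }
    }
    return "browser";
}

// Count how many processes of each type appear in a list of command lines.
std::map<std::string, int> countTypes(const std::vector<std::string>& commandLines) {
    std::map<std::string, int> counts;
    for (const auto& line : commandLines) {
        ++counts[processType(line)];
    }
    return counts;
}
```

Fed the ten Chromium command lines from the listing above, a helper like this would report the renderer processes as the largest group, matching the observation that follows.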
Most notably, the majority of these processes are of type=renderer. This makes intuitive sense: the renderer is responsible for displaying and handling (nearly) all web content. It is sometimes also referred to as the "Content Process" for this reason.
Furthermore, this implies that separate renderers are created for different webpages, browser tabs, and even iframes (the exact policy varies by browser and configuration). Placing this untrusted content in separate processes also increases the overall stability of the browser: if a bug is triggered, only the affected tab crashes rather than the entire browser. You've likely seen this in action if you've ever encountered Chrome's infamous "Aw, Snap!" error page.
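The stability benefit is easy to demonstrate in miniature. The POSIX-only C++ sketch below runs a piece of "untrusted" work in a child process via `fork()` and then inspects how the child ended with `waitpid()`. If the child crashes, the parent simply observes the terminating signal and carries on, much like the browser showing an error page for a single tab. The `runIsolated` function and `Outcome` enum are hypothetical illustrations; real browsers launch child processes quite differently (Chromium's zygote on Linux, for example) and add sandboxing on top.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Outcome of running a task in an isolated child process.
enum class Outcome { Exited, Crashed, Error };

// Hypothetical helper: run `task` in a forked child so that a crash in the
// child cannot take down the calling (parent) process.
Outcome runIsolated(const std::function<void()>& task) {
    pid_t pid = fork();
    if (pid < 0) {
        return Outcome::Error;  // fork failed
    }
    if (pid == 0) {
        task();    // child: do the risky work
        _exit(0);  // leave immediately, without running the parent's cleanup code
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return Outcome::Error;
    }
    // WIFSIGNALED: the child was killed by a signal (e.g. SIGSEGV or SIGABRT).
    return WIFSIGNALED(status) ? Outcome::Crashed : Outcome::Exited;
}
```

For example, `runIsolated([] { std::abort(); })` reports `Outcome::Crashed`, and the calling process is still free to run further tasks afterwards.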
Note: Although not an immediate concern for us, 'renderer' processes are typically sandboxed. This provides another layer of protection for the overall system in the event that an attacker successfully compromises a renderer.
General Browser Architecture
Now that we've broken the browser down into a few logical components and have taken a quick look at how they use processes to compartmentalize complexity, we can draw a diagram that is more faithful to the technical reality of modern web browsers:
This time, we can also see the major parts that make up the renderer. The purpose of each component shown is essentially self-explanatory, but as we can see, each one breaks down into additional sub-components that achieve specific goals.
However, as complexity explodes every time we peel away another layer, it raises the question of how multiple browser vendors coordinate to provide a consistent experience. It would be an unimaginable mess if let a = 1 + b * 0 had different order-of-operations rules applied on Chrome versus Safari.
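Even this tiny expression hides more than operator precedence. JavaScript numbers are IEEE 754 double-precision values, so `b * 0` is not always zero: if `b` is `Infinity` or `NaN`, the product is `NaN`, and so is the whole expression. Because C++ `double` follows the same IEEE 754 arithmetic on mainstream platforms (the `static_assert` below checks this), a short C++ sketch can show cases an engine must get exactly right to match the specification. The function name `evaluate` is only illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// This sketch assumes IEEE 754 doubles, which is also what JavaScript numbers are.
static_assert(std::numeric_limits<double>::is_iec559, "sketch assumes IEEE 754 doubles");

// Evaluates 1 + b * 0 the way both C++ and JavaScript parse it:
// multiplication binds tighter, so this is 1 + (b * 0), not (1 + b) * 0.
double evaluate(double b) {
    return 1 + b * 0;
}
```

For ordinary finite values such as `5.0` or `-5.0` the result is `1`, while `evaluate(std::numeric_limits<double>::infinity())` is `NaN`. If precedence were applied the other way, `(1 + b) * 0` would give a zero for any finite `b`, which is exactly the kind of divergence a shared specification prevents.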
Web Standards and Specifications
This problem is generally solved by following various web standards and specifications. Nearly every "sub-box" that we drew into our diagram has a large, exhaustive document associated with it that describes the exact behavior browsers are expected to implement. By adhering to this common set of standards, vendors can be confident in their compatibility.
As you may imagine, there are many standards to abide by. Some of the noteworthy ones:
- W3C - HTML-related standards, CSS
- WHATWG - HTML, DOM, Fetch, URL, etc.
- ECMA - JavaScript (ECMAScript) standards
Although many of these documents are far too dense to be useful "most of the time", it is important to be aware of their existence. Vulnerabilities often hide in the edge cases, and if you ever need the "ground truth" on how something is supposed to be implemented, the standards are where you will find that information.
WebIDL: Web Interface Definition Language
A particularly important specification to be familiar with is the Web Interface Definition Language (WebIDL).
WebIDL provides a standardized way to describe the interfaces that browser components expose, most notably the DOM and other Web APIs made available to JavaScript. Conceptually, you can think of it as a blueprint for the "glue" that allows the components we've seen to interface with each other. During the build process, WebIDL files are automatically converted into C++ code, and other components can include the resulting header files to interface with a particular component.
Below is a snippet of an example WebIDL file:
[Constructor,
 Exposed=Window]
interface Document : Node {
  [SameObject] readonly attribute DOMImplementation implementation;
  readonly attribute USVString URL;
  readonly attribute USVString documentURI;
  readonly attribute USVString origin;
  readonly attribute DOMString compatMode;
  readonly attribute DOMString characterSet;
  readonly attribute DOMString charset; // historical alias of .characterSet
  readonly attribute DOMString inputEncoding; // historical alias of .characterSet
  readonly attribute DOMString contentType;
  readonly attribute DocumentType? doctype;
  readonly attribute Element? documentElement;
  HTMLCollection getElementsByTagName(DOMString qualifiedName);
  HTMLCollection getElementsByTagNameNS(DOMString? namespace, DOMString localName);
  HTMLCollection getElementsByClassName(DOMString classNames);
  ...
Both Chrome and Safari use WebIDL.
Note: The generated C++ code will be in the build directory. Reading it is usually not worth the time; if you want to look for bugs in autogenerated code, you should audit the generators instead.
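To get a feel for what a bindings generator does, here is a deliberately tiny C++ sketch that turns one simplified `readonly attribute` line into a C++ getter declaration. The `generateGetter` function, the type mapping (`DOMString` to `String`, `unsigned long` to `uint32_t`), and the output format are our own assumptions; the real generators in Chromium and WebKit handle a far larger grammar, extended attributes, and the JavaScript-engine glue. Still, the sketch hints at why auditing the generator is attractive: one mistake there is replicated into every interface it processes.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy "bindings generator": converts a simplified WebIDL line of the form
//   readonly attribute <Type> <name>;
// into a C++ getter declaration. Anything else is rejected.
std::optional<std::string> generateGetter(const std::string& idlLine) {
    const std::string prefix = "readonly attribute ";
    if (idlLine.empty() || idlLine.back() != ';' || idlLine.rfind(prefix, 0) != 0) {
        return std::nullopt;
    }
    // Everything between the prefix and the trailing ';' is "<Type> <name>".
    std::string body = idlLine.substr(prefix.size(), idlLine.size() - prefix.size() - 1);
    auto lastSpace = body.rfind(' ');
    if (lastSpace == std::string::npos) {
        return std::nullopt;
    }
    std::string idlType = body.substr(0, lastSpace);  // may contain spaces, e.g. "unsigned long"
    std::string name = body.substr(lastSpace + 1);

    // Hypothetical IDL-to-C++ type mapping, for this sketch only.
    static const std::map<std::string, std::string> typeMap = {
        {"DOMString", "String"},
        {"USVString", "String"},
        {"boolean", "bool"},
        {"unsigned long", "uint32_t"},
    };
    auto it = typeMap.find(idlType);
    if (it == typeMap.end()) {
        return std::nullopt;  // unknown type: refuse rather than guess
    }
    return it->second + " " + name + "() const;";
}
```

Applied to `readonly attribute DOMString contentType;` from the `Document` snippet above, this sketch produces `String contentType() const;`, while it rejects nullable types such as `Element?` that it does not know how to map.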
Another example of a WebIDL definition (from WebKit), followed by some of the C++ binding code generated from it:
[
    LegacyUnenumerableNamedProperties,
    LegacyOverrideBuiltIns,
    JSGenerateToNativeObject,
    Exposed=Window
] interface HTMLFormElement : HTMLElement {
    [CEReactions=NotNeeded, Reflect=accept_charset] attribute DOMString acceptCharset;
    [CEReactions=NotNeeded] attribute [AtomString] USVString action;
    [CEReactions=NotNeeded] attribute [AtomString] DOMString autocomplete;
    [CEReactions=NotNeeded] attribute [AtomString] DOMString enctype;
    [CEReactions=NotNeeded, ImplementedAs=enctype] attribute [AtomString] DOMString encoding;
    [CEReactions=NotNeeded] attribute [AtomString] DOMString method;
    [CEReactions=NotNeeded, Reflect] attribute DOMString name;
    [CEReactions=NotNeeded, Reflect] attribute boolean noValidate;
    [CEReactions=NotNeeded, Reflect] attribute DOMString target;
    [CEReactions=NotNeeded, Reflect] attribute DOMString rel;
    [SameObject, PutForwards=value] readonly attribute DOMTokenList relList;

    readonly attribute HTMLFormControlsCollection elements;
    readonly attribute unsigned long length;
    getter Element? (unsigned long index);
    getter (RadioNodeList or Element)? ([RequiresExistingAtomString] DOMString name);

    [ImplementedAs=submitFromJavaScript] undefined submit();
    [EnabledBySetting=RequestSubmitEnabled] undefined requestSubmit(optional HTMLElement? submitter);
    [CEReactions=Needed] undefined reset();
    boolean checkValidity();
    [EnabledBySetting=InteractiveFormValidationEnabled] boolean reportValidity();
};
class JSHTMLFormElement : public JSHTMLElement {
public:
    using Base = JSHTMLElement;
    using DOMWrapped = HTMLFormElement;
    static JSHTMLFormElement* create(JSC::Structure* structure, JSDOMGlobalObject* globalObject, Ref<HTMLFormElement>&& impl)
    {
        JSHTMLFormElement* ptr = new (NotNull, JSC::allocateCell<JSHTMLFormElement>(globalObject->vm().heap)) JSHTMLFormElement(structure, *globalObject, WTFMove(impl));
        ptr->finishCreation(globalObject->vm());
        return ptr;
    }

    static JSC::JSObject* createPrototype(JSC::VM&, JSDOMGlobalObject&);
    static JSC::JSObject* prototype(JSC::VM&, JSDOMGlobalObject&);
    static HTMLFormElement* toWrapped(JSC::VM&, JSC::JSValue);
    static bool getOwnPropertySlot(JSC::JSObject*, JSC::ExecState*, JSC::PropertyName, JSC::PropertySlot&);
    static bool getOwnPropertySlotByIndex(JSC::JSObject*, JSC::ExecState*, unsigned propertyName, JSC::PropertySlot&);
    static void getOwnPropertyNames(JSC::JSObject*, JSC::ExecState*, JSC::PropertyNameArray&, JSC::EnumerationMode = JSC::EnumerationMode());
    ...
protected:
    void finishCreation(JSC::VM&);
};
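The generated class above follows a common pattern: a JavaScript-visible wrapper (`JSHTMLFormElement`) holds a reference to the C++ object that implements the actual behavior (`HTMLFormElement`), and helpers such as `toWrapped` convert a JavaScript value back into that underlying C++ object, failing if the value is the wrong kind. The following C++ sketch models this pattern using only standard-library types. `JSWrapper`, `JSFormWrapper`, `JSOtherWrapper`, `FormElementImpl`, and `toWrappedForm` are hypothetical names, and `dynamic_cast` merely stands in for the engine's own type checks. The important part is the check in `toWrappedForm`: if bindings skip or botch such a check, script could pass one kind of object where another is expected, a classic source of type-confusion bugs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// C++ "implementation" object, standing in for something like HTMLFormElement.
struct FormElementImpl {
    std::string method = "get";
};

// Base class for every JavaScript-visible wrapper object in this model.
struct JSWrapper {
    virtual ~JSWrapper() = default;
};

// Wrapper that exposes a FormElementImpl to script (hypothetical).
struct JSFormWrapper : JSWrapper {
    explicit JSFormWrapper(std::shared_ptr<FormElementImpl> wrapped) : impl(std::move(wrapped)) {}
    std::shared_ptr<FormElementImpl> impl;  // keeps the C++ object alive
};

// Some other wrapper type that script could pass in by mistake (or on purpose).
struct JSOtherWrapper : JSWrapper {};

// Model of a "toWrapped"-style helper: return the underlying C++ object only if
// the value really is a form wrapper; otherwise return nullptr.
FormElementImpl* toWrappedForm(JSWrapper* value) {
    auto* form = dynamic_cast<JSFormWrapper*>(value);
    return form ? form->impl.get() : nullptr;
}
```

A caller is expected to test the result for `nullptr` (and, in a real engine, throw a `TypeError` back to script) before touching the implementation object.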
In the next training module, we will focus on actually compiling a modern web browser. Although we will not be directly using or modifying WebIDLs, having a rough idea of what they're like helps demystify the mechanics of how various components integrate with each other.
Meet the Browsers
Up to now, we have been hopping around several different topics in order to paint a rough mental picture of typical browser architectures as they exist today. Before we dive into the practicalities of building these engines, we will take some time to introduce the two web browsers this training will focus on.
Safari (WebKit)
Safari is the default browser on Apple products and is developed and maintained in-house by Apple. Safari itself is a wrapper around WebKit with macOS/iOS-specific tweaks applied where necessary. WebKit makes up close to the entirety of Safari, and that is where we will focus our attention.
WebKit has its origins in KHTML, the HTML engine of the KDE project, which Apple forked as an internal project circa 2001. It was open-sourced around 2005 and has grown into the WebKit project since then. The two primary components of WebKit are:
- WebCore - This is the Rendering Engine
- JavaScriptCore - This is the JavaScript Engine
It's not uncommon for other software projects to embed all or parts of WebKit. For example, WebKit is used in the browsers built into Nintendo consoles, Sony PlayStation consoles, and the Amazon Kindle. Just as Safari is a WebKit wrapper with Apple-platform-specific tweaks, these projects similarly wrap WebKit with hardware-specific implementations.
Chrome (Chromium)
Chrome is perhaps the most well-known web browser in the world, and certainly the most used (at least on desktop systems). Chrome is built on top of the Chromium browser project. Like WebKit, Chromium is open source; Google open-sourced it in 2008.
The primary components of Chrome are:
- Blink - The Rendering Engine
  - Originally used WebCore from WebKit, forked in 2013
- V8 - The JavaScript Engine
Like WebKit, Chromium is used by many other projects, for instance the browsers in Tesla vehicles and Samsung smart TVs.