🔗 JavaScriptCore Engine Internals

As we did with V8, this chapter will explain how JavaScriptCore (usually abbreviated JSC) chooses to implement core features of JavaScript.

🔗 JSC NaN Boxing

Among the first things we explored in V8 was how it differentiates between pointers and doubles. There, a method called "pointer-tagging" was used, but JSC uses a different approach, called NaN Boxing.

The main idea behind NaN Boxing is to exploit the fact that NaN has many different bit-pattern representations. JavaScript cannot tell these NaNs apart, so they can all be normalized to one true NaN, which frees the remaining NaN bit patterns to encode other kinds of values.

An IEEE 754 double is NaN if the following is true:
[sign]  [exponent] [significand]
     * 11111111111 ****************************************************
                   significand != 0
JSC normalized NaN:
     0 11111111111 1000000000000000000000000000000000000000000000000000

JSC range of non-NaN doubles:
     0 00000000000 0000000000000000000000000000000000000000000000000000
                               ...                                      Example: 0x3ff199999999999a
     1 11111111111 0000000000000000000000000000000000000000000000000000
          (anything greater than this would become NaN)
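The bit-level rule above can be written as two mask checks. Below is a minimal C++ sketch that classifies a raw 64-bit pattern as NaN and collapses every NaN to the normalized pattern. The helper names are our own, not JSC's:

```cpp
#include <cstdint>

// Field masks for an IEEE 754 binary64 value (the sign is bit 63).
constexpr uint64_t kExponentMask    = 0x7ff0000000000000ULL; // 11 exponent bits
constexpr uint64_t kSignificandMask = 0x000fffffffffffffULL; // 52 significand bits
constexpr uint64_t kNormalizedNaN   = 0x7ff8000000000000ULL; // JSC's normalized NaN

// NaN: every exponent bit set and a non-zero significand (the sign is ignored).
constexpr bool isNaNBits(uint64_t bits) {
    return (bits & kExponentMask) == kExponentMask && (bits & kSignificandMask) != 0;
}

// Collapse every NaN pattern to the normalized one; other patterns pass through.
constexpr uint64_t normalizeNaN(uint64_t bits) {
    return isNaNBits(bits) ? kNormalizedNaN : bits;
}
```

For example, 0xfff0000000000000 (-Infinity) has all exponent bits set but a zero significand, so it is not NaN; it is the upper end of the non-NaN range shown above.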

Now suppose that for every double we add 1<<48 (0x0001000000000000) to its bit pattern, with the goal of giving doubles and pointers distinct high bit patterns. To top it off, we encode 32-bit integers using a high-bit pattern that is still unused:

JSC Objects (pointers):
     0 00000000000 0000************************************************ Example: 0x00007f48991e4158

[BOXED] JSC normalized NaN:
     0 11111111111 1001000000000000000000000000000000000000000000000000

[BOXED] JSC range of non-NaN doubles:
     0 00000000000 0001000000000000000000000000000000000000000000000000
                               ...                                      Example: 0x3ff299999999999a
     1 11111111111 0001000000000000000000000000000000000000000000000000

[BOXED] JSC integers:
     1 11111111111 11110000000000000000******************************** Example: 0xffff000041424344

We've ended up with three types representable in 64 bits, with the type implied by the high 16 bits:

switch (val>>48):
         0 -> ptr
    0xffff -> 32-bit int
      else -> double (subtract 1<<48 to get actual value)

Pointer {  0000:PPPP:PPPP:PPPP
         / 0001:****:****:****
Double  {         ...
         \ FFFE:****:****:****
Integer {  FFFF:0000:IIII:IIII

Note that the only edge case when adding 1<<48 to a double's bit pattern is that the result could "become" an integer if its high bits end up as 0xffff (or, more dangerously, a pointer if they wrap around to 0x0000 due to overflow). However, every such unsafe bit pattern (high bits 0xfffe or 0xffff before the addition) is one of the various representations of NaN, and all of them are normalized to the single NaN pattern, which can be boxed safely.
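The type switch above, together with the NaN normalization just described, can be sketched in C++. This is an illustration of the older 1<<48 scheme only, assuming 64-bit user-space pointers whose high 16 bits are zero; it is not JSC's JSValue code, and all names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Constants for the older scheme described above (hypothetical names).
constexpr uint64_t kDoubleOffset  = 1ULL << 48;            // added to every double
constexpr uint64_t kInt32Tag      = 0xffff000000000000ULL; // high 16 bits of a boxed int
constexpr uint64_t kNormalizedNaN = 0x7ff8000000000000ULL; // the one "true" NaN

enum class Kind { Pointer, Int32, Double };

// Box a double: normalize NaN first, then add 1<<48.
inline uint64_t boxDouble(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    if (std::isnan(d))
        bits = kNormalizedNaN; // NaNs with high bits 0xfffe/0xffff can no longer wrap
    return bits + kDoubleOffset;
}

// Box a 32-bit integer into the 0xffff high-bit range.
inline uint64_t boxInt32(int32_t i) {
    return kInt32Tag | static_cast<uint32_t>(i);
}

// Box a pointer as-is; assumes its high 16 bits are already zero.
inline uint64_t boxPointer(const void* p) {
    uint64_t bits = reinterpret_cast<uintptr_t>(p);
    assert((bits >> 48) == 0);
    return bits;
}

// The type switch from the text: the high 16 bits decide the type.
inline Kind kindOf(uint64_t v) {
    switch (v >> 48) {
    case 0:      return Kind::Pointer;
    case 0xffff: return Kind::Int32;
    default:     return Kind::Double;
    }
}

// Keep only the low 32 bits and reinterpret them as signed.
inline int32_t unboxInt32(uint64_t v) {
    return static_cast<int32_t>(static_cast<uint32_t>(v));
}

// Undo the offset to recover the original double.
inline double unboxDouble(uint64_t v) {
    uint64_t bits = v - kDoubleOffset;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

With these definitions, boxDouble(1.1) yields 0x3ff299999999999a, the boxed example from the diagram above, and any NaN input boxes to 0x7ff9000000000000. Normalizing before the addition is what keeps the 0xfffe/0xffff NaN patterns out of the integer and pointer ranges.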

Note

WebKit has since changed the NaN-boxing scheme slightly to use one fewer bit, so 1<<49 (0x0002000000000000) is now added to doubles when boxing and subtracted when unboxing. This doesn't affect exploit development beyond changing a constant or two.

🔗 Objects in JSC

The most basic object class in JSC is JSCell; both JSObject and Structure, covered below, derive from it.

C++
class JSCell : public HeapCell {
    ...
    StructureID m_structureID;
    IndexingType m_indexingType;
    JSType m_type;
    TypeInfo::InlineTypeFlags m_flags;
    CellState m_cellState;
    ...
};

You may notice that, unlike V8, JSC has no Map pointer. Instead, it keeps track of an object's type using Structures and StructureIDs.

Below is a diagram showing some more details about the JSCell Header:

JSCell 8 Byte header
00: [  StructureID  ] <-- Index into StructureIDTable for Structure object
04: [ Indexing Type ] <-- Storage mode for elements
05: [   Cell Type   ] <-- JS Type (e.g. object, string, function, array, ...)
06: [  Type Flags   ] <-- Some inline flags for object type
07: [  Cell State   ] <-- Garbage collector flags
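To make the offsets concrete, here is a hypothetical C++ mirror of the 8-byte header, plus a decoder for a raw header value read as one little-endian 64-bit word (as a debugger's x/gx shows it on x86-64). The struct and function names are illustrative, not JSC's:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative mirror of the JSCell header layout (not JSC's real declaration).
struct JSCellHeaderSketch {
    uint32_t structureID;  // 0x00: index (plus entropy bits) into StructureIDTable
    uint8_t  indexingType; // 0x04: storage mode for elements
    uint8_t  cellType;     // 0x05: JS type
    uint8_t  typeFlags;    // 0x06: inline type flags
    uint8_t  cellState;    // 0x07: garbage collector state
};

static_assert(sizeof(JSCellHeaderSketch) == 8, "header is 8 bytes");
static_assert(offsetof(JSCellHeaderSketch, indexingType) == 4, "indexing type at +4");
static_assert(offsetof(JSCellHeaderSketch, cellState) == 7, "cell state at +7");

// Split a header read as one little-endian 64-bit word into its fields:
// the byte at offset N occupies bits 8*N .. 8*N+7 of the word.
constexpr JSCellHeaderSketch decodeHeader(uint64_t raw) {
    return JSCellHeaderSketch{
        static_cast<uint32_t>(raw),       // bytes 0-3
        static_cast<uint8_t>(raw >> 32),  // byte 4
        static_cast<uint8_t>(raw >> 40),  // byte 5
        static_cast<uint8_t>(raw >> 48),  // byte 6
        static_cast<uint8_t>(raw >> 56)}; // byte 7
}
```

For instance, decodeHeader(0x0108240700000123), an arbitrary value chosen for illustration, reports a StructureID of 0x123 and an indexing type of 0x07.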

🔗 JSC Structure IDs

Instead of a pointer to another object that holds type information, JSC stores a 32-bit StructureID. Once its low entropy bits are shifted out, the StructureID is an index into the StructureIDTable, whose entries encode pointers to Structure objects. Those Structure objects are what actually contain the type information for the object.

Below are some snippets of relevant code from the JSC source:

C++
typedef uint32_t StructureID;

class StructureIDTable {
    UniqueArray<StructureOrOffset> m_table;
};

inline Structure* StructureIDTable::get(StructureID structureID)
{
    ASSERT_WITH_SECURITY_IMPLICATION(structureID);
    ASSERT_WITH_SECURITY_IMPLICATION(!isNuked(structureID));
    ASSERT_WITH_SECURITY_IMPLICATION(structureID < m_capacity);
    uint32_t structureIndex = structureID >> s_numberOfEntropyBits;
    RELEASE_ASSERT_WITH_SECURITY_IMPLICATION(structureIndex < m_capacity);
    return decode(table()[structureIndex].encodedStructureBits, structureID);
}
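The lookup above boils down to "shift out the entropy bits, bounds-check, index". Below is a simplified C++ sketch of that idea. The entropy-bit count of 7 is an assumption, the nuke check is omitted, and unlike the real table (note the decode call above) it stores raw pointers with no encoding. All names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Structure { int typeInfo; }; // hypothetical stand-in for JSC's Structure

// Simplified StructureIDTable: stores raw pointers (no pointer encoding).
class StructureIDTableSketch {
public:
    static constexpr uint32_t kEntropyBits = 7; // assumed value

    // The table index goes in the high bits of the ID; the low bits hold entropy.
    uint32_t allocateID(Structure* structure, uint32_t entropy) {
        uint32_t index = static_cast<uint32_t>(m_table.size());
        assert(index < (1u << (32 - kEntropyBits)) && "index must fit above the entropy bits");
        m_table.push_back(structure);
        return (index << kEntropyBits) | (entropy & ((1u << kEntropyBits) - 1));
    }

    // Mirrors get() above: shift out the entropy bits, bounds-check, then index.
    Structure* get(uint32_t structureID) const {
        uint32_t index = structureID >> kEntropyBits;
        if (index >= m_table.size())
            std::abort(); // fail hard, like a release assert
        return m_table[index];
    }

private:
    std::vector<Structure*> m_table;
};
```

Flipping only the low bits of an ID still selects the same slot in this sketch. The real get() also passes the full structureID to decode, presumably so those bits participate in recovering the pointer.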

🔗 JSC Structure

Now that we've described the machinery involved in storing type information, let's take a look at JSC's Structure class:

C++
class Structure final : public JSCell {
    ...
    uint8_t m_inlineCapacity;
    WriteBarrier<Unknown> m_prototype;
    const ClassInfo* m_classInfo;
    StructureTransitionTable m_transitionTable;
    ...
};

There are many interesting properties here, but we will focus on m_transitionTable.

🔗 JSC Transitions

In V8, we saw how the engine keeps track of the "shape" of objects via their Map. When an object's shape changes, the object moves to another Map by following a Map Transition, or an entirely new Map is created if no suitable one exists.

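JSC solves this in a similar way using property transitions: adding a property first tries to follow an existing transition and only creates a new Structure if none exists: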

C++
Structure* Structure::addPropertyTransition(VM& vm, Structure* structure, PropertyName propertyName, unsigned attributes, PropertyOffset& offset)
{
    Structure* newStructure = addPropertyTransitionToExistingStructure(structure, propertyName, attributes, offset);
    if (newStructure)
        return newStructure;

    return addNewPropertyTransition(
        vm, structure, propertyName, attributes, offset, PutPropertySlot::UnknownContext);
}
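The "reuse an existing transition, otherwise create one" logic above can be illustrated with a toy structure tree in C++. This is a hypothetical simplification: it keys transitions on the property name only (the real function also takes attributes into account) and ignores the VM, property attributes, and garbage collection:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a Structure: the ordered property names it describes, plus
// cached outgoing transitions (property name -> child structure). Hypothetical.
class StructureSketch {
public:
    std::vector<std::string> properties;

    // Follow the transition for `name` if one already exists; otherwise create
    // and cache a new child. `offset` receives the new property's slot index.
    StructureSketch* addPropertyTransition(const std::string& name, unsigned& offset) {
        assert(std::find(properties.begin(), properties.end(), name) == properties.end()
               && "adding an existing property is not a transition");
        auto it = m_transitions.find(name);
        if (it == m_transitions.end()) {
            std::unique_ptr<StructureSketch> child(new StructureSketch);
            child->properties = properties;
            child->properties.push_back(name);
            it = m_transitions.emplace(name, std::move(child)).first;
        }
        offset = static_cast<unsigned>(properties.size());
        return it->second.get();
    }

private:
    std::map<std::string, std::unique_ptr<StructureSketch>> m_transitions;
};
```

Two objects that receive the same properties in the same order therefore share one Structure, while adding them in a different order leads to a different one.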

An important difference is that JSC does not perform a transition when the type of a property's value changes, because properties are always stored as generic JSValues.

In the past (until this was removed in 2019), these transitions were also used to track type generalizations for the JIT compiler.

🔗 JSC JSObject

As in V8, Objects in JSC are represented by a class called JSObject. We can see part of its definition below:

C++
class JSObject : public JSCell {
    ...
    AuxiliaryBarrier<Butterfly*> m_butterfly;
    ...
};

inline size_t JSObject::offsetOfInlineStorage()
{
    return sizeof(JSObject);
}

Beyond a normal JSCell, JSObjects also contain a Butterfly pointer and (optionally) inline properties:

00: [   JSCell Header   ]
08: [     Butterfly*    ] <-- Pointer to butterfly structure
10: [ Inline Properties ] <-- Fast property values stored inline

🔗 JSC Butterfly

The Butterfly is a somewhat exotic data structure unique to JSC. It holds both the (out-of-line) properties and the elements of an object.

[ <------------- Properties ] [ Elements Length ] [ Element Array -----------> ]
                                                 /|\
                                                  | 
                                             m_butterfly

The Butterfly can grow dynamically, to the left for more properties or to the right for more elements, so the butterfly pointer (usually m_butterfly) points into the middle of the structure. The name comes from this left/right expansion, which resembles a butterfly opening its wings.

We can see some relevant code below:

C++
class Butterfly {
    IndexingHeader* indexingHeader() { return IndexingHeader::from(this); }
    ...
};

class IndexingHeader {
    ...
    union {
        struct {
            // The meaning of this field depends on the array type, but for all
            // JSArrays we rely on this being the publicly visible length (array.length)
            uint32_t publicLength;
            // The length of the indexed property storage. The actual size of the
            // storage depends on this, and the type.
            uint32_t vectorLength;
        } lengths;

        struct {
            ArrayBuffer* buffer;
        } typedArray;
    } u;
};

Importantly, there are two different lengths to pay attention to:

  • publicLength - the semantic length of the element array (i.e. maximum filled index + 1)
  • vectorLength - the allocated capacity of the element array

As mentioned above, m_butterfly itself points to the start of the element array, rather than to the "start" of the structure as one normally thinks of it.
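Putting the layout and the two lengths together, here is a toy C++ model of a butterfly living in one contiguous allocation. It assumes a little-endian 64-bit layout where the IndexingHeader occupies the 8 bytes just before m_butterfly with publicLength in the low half, which matches the header word 0x0000000500000003 (publicLength 3, vectorLength 5) in the memory dumps later in this chapter. The ordering of out-of-line property slots is an assumption, and all names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Lengths {
    uint32_t publicLength; // low 32 bits of the header word
    uint32_t vectorLength; // high 32 bits of the header word
};

// Decode the 8-byte IndexingHeader read from m_butterfly - 8 (little-endian).
constexpr Lengths decodeIndexingHeader(uint64_t raw) {
    return Lengths{static_cast<uint32_t>(raw), static_cast<uint32_t>(raw >> 32)};
}

// Toy butterfly in one allocation:
//   [ properties ... ][ header ][ elements ... ]
//                               ^ m_butterfly
class ToyButterfly {
public:
    ToyButterfly(std::size_t propertyCapacity, uint32_t vectorLength)
        : m_storage(propertyCapacity + 1 + vectorLength, 0),
          m_butterfly(m_storage.data() + propertyCapacity + 1),
          m_propertyCapacity(propertyCapacity) {
        setLengths(0, vectorLength);
    }
    ToyButterfly(const ToyButterfly&) = delete;            // m_butterfly points into m_storage
    ToyButterfly& operator=(const ToyButterfly&) = delete;

    uint64_t* butterfly() { return m_butterfly; }
    uint64_t& header() { return m_butterfly[-1]; } // IndexingHeader, just left of m_butterfly

    void setLengths(uint32_t publicLength, uint32_t vectorLength) {
        header() = (static_cast<uint64_t>(vectorLength) << 32) | publicLength;
    }
    Lengths lengths() { return decodeIndexingHeader(header()); }

    // Elements grow to the right of m_butterfly.
    uint64_t& element(std::size_t i) {
        assert(i < lengths().vectorLength);
        return m_butterfly[i];
    }

    // Out-of-line properties grow to the left of the header (slot order assumed).
    uint64_t& property(std::size_t i) {
        assert(i < m_propertyCapacity);
        return m_butterfly[-2 - static_cast<std::ptrdiff_t>(i)];
    }

private:
    std::vector<uint64_t> m_storage;
    uint64_t* m_butterfly;
    std::size_t m_propertyCapacity;
};
```

Note that the bounds check in element() trusts vectorLength from the header itself, so anything that can overwrite that word changes how far the elements are allowed to reach.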

🔗 JSC JSArray

Arrays in JSC follow a similar pattern to V8: they are essentially regular objects with a few changes. The function that returns an "array-like" object's length is defined on JSObject and works the same way for arrays and for any other object with elements (indexed properties):

C++
class JSObject : public JSCell {
    unsigned getArrayLength() const
    {
        if (!hasIndexedProperties(indexingType()))
            return 0;
        return m_butterfly->publicLength();
    }
};

// JSNonFinalObject is a type of JSObject that has some internal storage,
// but also preserves some space in the collector cell for additional
// data members in derived types.
class JSNonFinalObject : public JSObject {
    ...
};

class JSArray : public JSNonFinalObject {
    ...
    unsigned length() const { return getArrayLength(); }
    ...
};

Notably, this means JSC does not store the length inline in the array object, as V8 does. Instead, the length is the butterfly's publicLength.

🔗 Indexing Type

The IndexingType (part of the JSCell header) determines how the engine accesses the elements in m_butterfly, analogous to the Elements Kind in V8. Knowing the type of the elements lets the engine specialize how they are stored, for example by storing doubles unboxed.

C++
class JSCell : public HeapCell {
    ...
    IndexingType m_indexingTypeAndMisc;
    ...
};

Below is a list of some values for IndexingType:

C++
typedef uint8_t IndexingType;
...
static const IndexingType Int32Shape      = 0x04;
static const IndexingType DoubleShape     = 0x06;
static const IndexingType ContiguousShape = 0x08;
...
static const IndexingType NonArrayWithInt32      = Int32Shape;
static const IndexingType NonArrayWithDouble     = DoubleShape;
static const IndexingType NonArrayWithContiguous = ContiguousShape;
...
static const IndexingType ArrayWithInt32      = IsArray | Int32Shape;
static const IndexingType ArrayWithDouble     = IsArray | DoubleShape;
static const IndexingType ArrayWithContiguous = IsArray | ContiguousShape;
...

These indexing types are similar to the element kinds we've seen in V8:

  • *WithInt32 for integer elements, like *_SMI_ELEMENTS
  • *WithDouble for native doubles, like *_DOUBLE_ELEMENTS
  • *WithContiguous for generic JSValue elements, like *_ELEMENTS
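Since the shapes are small bit patterns combined with an IsArray flag, an IndexingType byte can be decoded with a mask. In the C++ sketch below, the three shape constants come from the excerpt above, but the values of IsArray (0x01) and the shape mask (0x0E) are assumptions not shown in the excerpt; the helper functions are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

typedef uint8_t IndexingType;

// Shape values from the excerpt above.
constexpr IndexingType Int32Shape      = 0x04;
constexpr IndexingType DoubleShape     = 0x06;
constexpr IndexingType ContiguousShape = 0x08;

// ASSUMED values (not shown in the excerpt): the IsArray flag and the shape mask.
constexpr IndexingType IsArray           = 0x01;
constexpr IndexingType IndexingShapeMask = 0x0E;

// Keep only the shape bits.
constexpr IndexingType shapeOf(IndexingType t) {
    return static_cast<IndexingType>(t & IndexingShapeMask);
}

constexpr bool isArray(IndexingType t) { return (t & IsArray) != 0; }

// Hypothetical helper producing names like those printed by describe().
inline std::string describeIndexingType(IndexingType t) {
    const std::string prefix = isArray(t) ? "ArrayWith" : "NonArrayWith";
    switch (shapeOf(t)) {
    case Int32Shape:      return prefix + "Int32";
    case DoubleShape:     return prefix + "Double";
    case ContiguousShape: return prefix + "Contiguous";
    default:              return prefix + "OtherShape";
    }
}
```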

We can create various kinds of Arrays and watch how the IndexingType changes:

>>> a=[1.1,1.1,1.1]
>>> describe(a)
Object: 0x7f4c0ecb4340 ... Array, {}, ArrayWithDouble, ...

>>> a=[1.1]; a[3]=1.1
>>> describe(a)
Object: 0x7f4c0ecb4350 ... Array, {}, ArrayWithDouble, ...

>>> a=[{},1.1]
>>> describe(a)
Object: 0x7f4c0ecb4360 ... Array, {}, ArrayWithContiguous, ...

>>> a=[0x41424344, 0x51525354]
>>> describe(a)
Object: 0x7f4c0ecb4390 ... Array, {}, CopyOnWriteArrayWithInt32, ...

🔗 WithDouble

The WithDouble indexing type allows doubles to be stored un-NaN-boxed (as raw IEEE 754 bit patterns), for the same reason V8 can store doubles without boxing them: the engine knows that all the elements will be doubles:

>>> a = [1.1,1.1,{}]
>>> describe(a)
Object: ... with butterfly 0x7f48991e4158 ... Array, {}, ArrayWithContiguous, ...
pwndbg> x/6xg 0x7f48991e4158-8
0x7f48991e4150: 0x0000000500000003  0x3ff299999999999a
0x7f48991e4160: 0x3ff299999999999a  0x00007f4c0ecb00c0
0x7f48991e4170: 0x0000000000000000  0x0000000000000000

>>> a = [1.1,1.1,1.1]
>>> describe(a)
Object: ... with butterfly 0x7f48991e4128 ... Array, {}, ArrayWithDouble, ... 
pwndbg> x/6xg 0x7f48991e4128-8
0x7f48991e4120: 0x0000000500000003  0x3ff199999999999a
0x7f48991e4130: 0x3ff199999999999a  0x3ff199999999999a
0x7f48991e4140: 0x7ff8000000000000  0x7ff8000000000000

Boxed 0x3ff299999999999a vs unboxed 0x3ff199999999999a

🔗 Element Array Holes

Unlike V8, JSC does not distinguish between "packed" and "holey" elements. Instead, the encoding of a "hole" in the element array depends on the IndexingType: NaN for doubles, zero otherwise:

>>> a = [1.1]; a[3] = 1.1
>>> describe(a)
Object: ... with butterfly 0x7f48991e4188 ... Array, {}, ArrayWithDouble, ...
pwndbg> x/6xg 0x7f48991e4188-8
0x7f48991e4180: 0x0000000500000004  0x3ff199999999999a
0x7f48991e4190: 0x7ff8000000000000  0x7ff8000000000000
0x7f48991e41a0: 0x3ff199999999999a  0x7ff8000000000000

>>> a = [1.1]; a[3] = {}
>>> describe(a)
Object: ... with butterfly 0x7f48991e41b8 ... Array, {}, ArrayWithContiguous, ...
pwndbg> x/6xg 0x7f48991e41b8-8
0x7f48991e41b0: 0x0000000500000004  0x3ff299999999999a
0x7f48991e41c0: 0x0000000000000000  0x0000000000000000
0x7f48991e41d0: 0x00007f4c0ecb0100  0x0000000000000000

WithDouble: 0x7ff8000000000000 vs WithContiguous: 0x0000000000000000

The same value is used for unused space at the end of the vector.
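The hole encoding can be expressed as a small C++ predicate over raw 8-byte element slots. This sketch takes the encodings shown above at face value, and because unused capacity uses the same values, it counts those slots too. The names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Shape { Int32, Double, Contiguous };

constexpr uint64_t kDoubleHole = 0x7ff8000000000000ULL; // NaN marks holes in WithDouble
constexpr uint64_t kValueHole  = 0;                     // zero marks holes otherwise

// Is this raw element slot a hole (or unused capacity) for the given shape?
constexpr bool isHole(Shape shape, uint64_t slot) {
    return shape == Shape::Double ? slot == kDoubleHole : slot == kValueHole;
}

// Count hole/unused slots in a run of raw 8-byte element slots.
inline std::size_t countHoles(Shape shape, const uint64_t* slots, std::size_t n) {
    std::size_t holes = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (isHole(shape, slots[i]))
            ++holes;
    return holes;
}
```

Applied to the five element slots of each dump above, both arrays report three hole-or-unused slots: indices 1 and 2 (the holes) plus index 4 (unused capacity).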

🔗 Array Storage

As in V8, if we try to store elements in a way that is too sparse, the engine will "switch modes" and use a hashmap implementation rather than continuing to use a simple vector.

JavaScript
>>> a=[1.1];a[10000]=1.1
>>> describe(a)
Object: 0x7f4c0ecb43a0 ... Array, {}, ArrayWithArrayStorage, ...

Here the object uses ArrayWithArrayStorage. ArrayStorage is essentially a wrapper around a hashmap, the SparseArrayValueMap:

C++
struct ArrayStorage {
    ...
    WriteBarrier<SparseArrayValueMap> m_sparseMap;
    ...
};
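The "switch modes" behaviour can be illustrated with a toy C++ container that keeps elements in a dense vector until a write lands too far past the end, and then moves everything into a hash map standing in for SparseArrayValueMap. The kMaxGap threshold is invented for illustration and does not model JSC's actual heuristics; all names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

class ToyElements {
public:
    static constexpr uint32_t kMaxGap = 1000; // hypothetical sparseness threshold

    void put(uint32_t index, double value) {
        if (!m_sparse && index >= m_dense.size() + kMaxGap)
            convertToSparse();                  // "switch modes"
        if (m_sparse) {
            m_map[index] = value;               // hashmap mode
        } else {
            if (index >= m_dense.size())
                m_dense.resize(index + 1, NAN); // NaN marks holes, as in WithDouble
            m_dense[index] = value;             // vector mode
        }
    }

    // Missing elements read back as NaN in both modes.
    double get(uint32_t index) const {
        if (m_sparse) {
            auto it = m_map.find(index);
            return it == m_map.end() ? NAN : it->second;
        }
        return index < m_dense.size() ? m_dense[index] : NAN;
    }

    bool isSparse() const { return m_sparse; }

private:
    // Move every non-hole element from the vector into the map.
    void convertToSparse() {
        for (uint32_t i = 0; i < m_dense.size(); ++i)
            if (!std::isnan(m_dense[i]))
                m_map[i] = m_dense[i];
        m_dense.clear();
        m_sparse = true;
    }

    bool m_sparse = false;
    std::vector<double> m_dense;
    std::unordered_map<uint32_t, double> m_map;
};
```

In this toy, a[10000]=1.1 on a one-element array crosses the threshold, mirroring the switch to ArrayWithArrayStorage in the transcript above.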

🔗 JSC JSArrayBuffer

In JSC, a JSArrayBuffer is essentially just a container class holding a pointer to an ArrayBuffer:

C++
class JSArrayBuffer final : public JSNonFinalObject {
    ...
    Poisoned<JSArrayBufferPoison, ArrayBuffer*> m_impl;
    ...
};

The corresponding memory layout:

00: [   JSCell Header   ]
08: [     Butterfly*    ]
10: [    ArrayBuffer*   ] <--- This is a pointer to the enclosed array buffer
18: [ Inline Properties ]

🔗 JSC JSArrayBufferView

JSArrayBufferView (the base class of typed arrays such as Uint8Array) is more interesting, because its data pointer is stored inline:

C++
class JSArrayBufferView : public JSNonFinalObject {
    ...
    VectorPtr m_vector;
    uint32_t m_length;
    TypedArrayMode m_mode;
    ...
};

An inline data pointer like this makes the object potentially useful for exploitation: if m_vector can be overwritten, ordinary typed-array reads and writes access whatever address it now holds.

00: [   JSCell Header   ]
08: [     Butterfly*    ]
10: [   Backing Store*  ] <--- Pointer to the backing data buffer
18: [       Length      ] <--- Number of elements in the view
1c: [     Array Mode    ] <--- What type of backing allocation
20: [ Inline Properties ]
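To see why the inline pointer matters, here is a hypothetical C++ mirror of the layout above (assuming 64-bit pointers), plus a typed-array-style read that is bounds-checked against the length but otherwise trusts the inline data pointer. Field and function names are illustrative only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");

// Illustrative mirror of the JSArrayBufferView layout (names are ours).
struct ArrayBufferViewSketch {
    uint64_t cellHeader; // 0x00: JSCell header
    void*    butterfly;  // 0x08
    uint8_t* vector;     // 0x10: backing store pointer, stored inline
    uint32_t length;     // 0x18
    uint32_t mode;       // 0x1c
};

static_assert(offsetof(ArrayBufferViewSketch, vector) == 0x10, "vector at +0x10");
static_assert(offsetof(ArrayBufferViewSketch, length) == 0x18, "length at +0x18");
static_assert(offsetof(ArrayBufferViewSketch, mode) == 0x1c, "mode at +0x1c");
static_assert(sizeof(ArrayBufferViewSketch) == 0x20, "inline properties start at +0x20");

// Typed-array-style read: checked against `length`, then made through `vector`.
// Nothing validates `vector` itself, so whoever controls it controls the address.
inline bool readByte(const ArrayBufferViewSketch& view, uint32_t index, uint8_t& out) {
    if (index >= view.length)
        return false;
    out = view.vector[index];
    return true;
}
```

Simulating a corrupted vector field in this sketch (pointing it at a different buffer) makes the same in-bounds read return bytes from that other buffer.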

🔗 Key Points

JSC objects store type information in a Structure, found via the StructureID in the JSCell header

  • Largely performs the function of Map from V8

JSC Structure transitions do not track value types

Out-of-line properties are stored alongside Elements in the Butterfly

  • Length of Elements stored in Butterfly
  • Switches to ArrayStorage for sparse arrays

Indexing Type controls how elements are stored

  • Analogous to Element Kind in V8