Protocols, Generics, and Existential Containers — Wait What?
For the longest time now, I thought that the two functions above were the same.
But in actuality, while they may do exactly the same thing between open and closed braces (which in this case is nothing at all), what’s going on behind the scenes is different. To understand what’s going on we’ll first have to talk about the Container.
The Container
Revealed in more detail in Session 416 of WWDC 2016, the container is a wrapper around parameters adhering to a protocol and is used non-generically. The container functions as a box of fixed size (we’ll get back to this in a sec), thus allowing all adherers of a protocol to be of the same size, which is necessary for them to be used interchangeably.
var vehicles: [Drivable]
The fixed size of the container also allows us to store classes/structs that adhere to a protocol in an array of type protocol (as seen above), since the elements are now of the same size and can be stored in contiguous memory.
So what goes into the container?
The container is more or less a box with 5 rows:
1. payload_data_0 = 0x0000000000000004,
2. payload_data_1 = 0x0000000000000000,
3. payload_data_2 = 0x0000000000000000,
4. instance_type = 0x000000010d6dc408 ExistentialContainers`type
metadata for ExistentialContainers.Car,
5. protocol_witness_0 = 0x000000010d6dc1c0
ExistentialContainers`protocol witness table for
ExistentialContainers.Car : ExistentialContainers.Drivable
in ExistentialContainers
The first 3 rows labeled payload_data 0–3, respectively, represent the Value Buffer. The value buffer holds 3 words, each word is a chunk of memory representing 8 bytes. If your struct has just 3 properties and each property has a size within that 8 byte range, then the values are offloaded to the Value Buffer.
If your struct has more than 3 properties or has properties not within the 8 byte range, say a Character (9 bytes) or a String (24 bytes), then the values are stored in a separate value table allocated on the heap. In this case payload_data_0 would hold a pointer to the value table on the heap and the other two payload variables would remain uninitialized. This indirection is what maintains the sizing of the Container.
For clarity here are a few structs, adhering to the Drivable protocol, and their respective payloads:
Structs adhering to the Drivable protocol
car =
payload_data_0 = 0x0000000000000004,
payload_data_1 = 0x0000000000000000,
payload_data_2 = 0x0000000000000000,
instance_type = 0x000000010b50e410
ExistentialContainers`type metadata for
ExistentialContainers.Car,
protocol_witness_0 = 0x000000010b50e1c8
ExistentialContainers`protocol witness table for
ExistentialContainers.Car: ExistentialContainers.Drivable
in ExistentialContainers)
motorcycle =
payload_data_0 = 0x0000608000036820,
payload_data_1 = 0x0000000000000000,
payload_data_2 = 0x0000000000000000,
instance_type = 0x000000010b50e4d8
ExistentialContainers`type metadata for
ExistentialContainers.Motorcycle,
protocol_witness_0 = 0x000000010b50e1d8
ExistentialContainers`protocol witness table for
ExistentialContainers.Motorcycle:
ExistentialContainers.Drivable in ExistentialContainers
bus =
payload_data_0 = 0x00006000000364a0,
payload_data_1 = 0x0000000000000000,
payload_data_2 = 0x0000000000000000,
instance_type = 0x000000010b50e5a8
ExistentialContainers`type metadata for
ExistentialContainers.Bus,
protocol_witness_0 = 0x000000010b50e1e8
ExistentialContainers`protocol witness table for
ExistentialContainers.Bus: ExistentialContainers.Drivable
in ExistentialContainers
As you can see, Car has the expected payload, but Motorcycle has only one payload entry, even though it has two properties. As mentioned before, String variables are 24 bytes, so the licensePlate property causes all of the properties to be stored on the heap, thus having only one payload entry — the pointer to the values on the heap. Bus has 4 properties, so as expected, there is just one payload entry.
Now for the final two rows.
The instance_type variable (4th row) is a pointer to the Value Witness Table (VWT), which is another table structure that contains Type specific information on how to Allocate, Copy, and Destroy the value represented by the container.
The protocol_witness_0 variable (5th row) holds a pointer to the Protocol Witness Table (PWT). The PWT is another table structure that holds references to the implementation of protocol functions defined by an object adhering to the protocol. The PWT is the reason why if we called drive() on a Drivable that happened to be a car object, it knows to execute the Car objects drive function and not, say, the Bus’s implementation.
Function Parameters
So what does all of this have to do with the original question? What’s the difference between our two functions?
Functions in question
Well, there are actually quite a few things — how they’re dispatched, how local variables are instantiated, accessing of associated types for generic return types, compiler optimizations, dynamic behavior … the list goes on.
But for now we’ll focus on how instantiation occurs and the accessing of associated types. Links will be provide below for more details on most of these.
On to how local variable instantiation occurs: The protocol based function on line 6 receives its input in the form of an container since it must support multiple types. A local variable, transportation, is then created using the VWT and PWT of the container.
On the other hand, the generic based function will receive its input without the container, despite also supporting multiple Drivable types. Why is that?
Instead of passing an container to the generic function so that the local variable can be instantiated, the generic function becomes specialized at compile time, aware of type specific information generated at the function’s call site. So, suppose a Car object were passed into startTraveling(), swift will generate a Car specific version of the function, say:
func startTravelingWithCar(transportation: Car) { }
Behind the scenes the function also receives the car’s PWT and VWT, giving the function the necessary information to be able to set up a value buffer if necessary and determine the car object’s protocol specific function implementation of drive(). This newly generated function is now type specific, giving us access to any associated types of the Car object and all of this type information is determined at compile time — which is part of the reason why we can have an associated type be the return type of a generic function, but can’t do the same for protocol based functions.
protocol Returnable {
associateType ReturnType
}
//This will compile
func returnTheType<T: Returnable>(object: T) -> T.ReturnType { } ✅
//This won't compile
func returnTheType(object: Returnable) -> object.ReturnType { } ❌
However protocols based functions aren’t bad, despite the fact that we can’t utilize associated types as return types. Protocol based functions, unlike their generic counterparts, offer a higher degree of dynamism and flexability at runtime. But, this post is long enough as is 😅, so check out these awesome articles and videos to find out more about everything here and more!
- Airspeed Velocity — Protocols and Generics
- WWDC 2016 Session 416 (Also linked to further up above ☝🏽)
- Why Swift Is Swift — Gwendolyn Weston
https://medium.com/@vhart/protocols-generics-and-existential-containers-wait-what-e2e698262ab1