Protocols, Generics, and Existential Containers — Wait What?

For the longest time now, I thought that a function taking a protocol-typed parameter and a generic function constrained to that same protocol were the same.

But in actuality, while they may do exactly the same thing between open and closed braces (which in this case is nothing at all), what’s going on behind the scenes is different. To understand what’s going on we’ll first have to talk about the Container.

The Container

Described in more detail in Session 416 of WWDC 2016, the existential container is a wrapper around a value whose static type is a protocol, that is, a value used non-generically. The container is a box of fixed size (we’ll get back to this in a sec), so every value conforming to a protocol takes up the same amount of space, which is necessary for them to be used interchangeably.

var vehicles: [Drivable]

The fixed size of the container also allows us to store classes/structs that adhere to a protocol in an array of the protocol type (as seen above), since the elements are now all the same size and can be stored in contiguous memory.
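As a rough illustration of the fixed size, the sketch below stores two differently sized structs in one array and prints the size of the container itself. The property names and the drive() bodies are assumptions made for this example; only Drivable, Car, and Bus come from the text.

```swift
protocol Drivable {
    func drive()
}

struct Car: Drivable {
    var numberOfWheels = 4              // hypothetical property
    func drive() { print("Car driving") }
}

struct Bus: Drivable {
    // Four hypothetical word-sized properties.
    var numberOfWheels = 6
    var seats = 40
    var doors = 2
    var routeNumber = 12
    func drive() { print("Bus driving") }
}

// Both values are wrapped in an existential container when stored.
let vehicles: [Drivable] = [Car(), Bus()]

// The concrete types differ in size...
print(MemoryLayout<Car>.size, MemoryLayout<Bus>.size)

// ...but the container is always 5 words:
// 3 for the value buffer, 1 for the type metadata, 1 for the witness table.
print(MemoryLayout<Drivable>.size, 5 * MemoryLayout<Int>.size)
```

On a 64-bit platform the last line should print 40 twice, which is the per-element size the array relies on.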

So what goes into the container?

The container is more or less a box with 5 rows:

1. payload_data_0 = 0x0000000000000004,

2. payload_data_1 = 0x0000000000000000,

3. payload_data_2 = 0x0000000000000000,

4. instance_type = 0x000000010d6dc408 ExistentialContainers`type metadata for ExistentialContainers.Car,

5. protocol_witness_0 = 0x000000010d6dc1c0 ExistentialContainers`protocol witness table for ExistentialContainers.Car : ExistentialContainers.Drivable in ExistentialContainers

The first 3 rows, labeled payload_data_0 through payload_data_2, make up the Value Buffer. The value buffer holds 3 words, each word being a chunk of memory of 8 bytes on a 64-bit platform. If your struct’s stored properties fit within those 24 bytes (for example, 3 properties of at most 8 bytes each), then the values are stored inline in the Value Buffer.

If your struct needs more than 24 bytes, because it has more than 3 word-sized properties or combines a larger property with others (when this was written a Character was 9 bytes and a String 24 bytes), then the values are stored in a separate box allocated on the heap. In this case payload_data_0 holds a pointer to that heap storage and the other two payload words go unused. This indirection is what keeps the size of the Container fixed.
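The inline-versus-heap decision can be approximated with MemoryLayout. This is only a sketch: fitsInline is a hypothetical helper, not a standard library API, and it checks size and alignment only, while the runtime’s real rule has further conditions.

```swift
// Size of the container's inline value buffer: three machine words.
let valueBufferSize = 3 * MemoryLayout<Int>.size

// Hypothetical helper: a rough check based on size and alignment only.
func fitsInline<T>(_ type: T.Type) -> Bool {
    return MemoryLayout<T>.size <= valueBufferSize
        && MemoryLayout<T>.alignment <= MemoryLayout<Int>.alignment
}

struct ThreeWords { var a = 0, b = 0, c = 0 }        // three Ints
struct FourWords { var a = 0, b = 0, c = 0, d = 0 }  // four Ints

print(fitsInline(ThreeWords.self))  // true: exactly fills the buffer
print(fitsInline(FourWords.self))   // false: one word too many

// Sizes of standard library types have changed between Swift versions,
// so it is worth measuring rather than assuming.
print(MemoryLayout<String>.size, MemoryLayout<Character>.size)
```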

For clarity here are a few structs, adhering to the Drivable protocol, and their respective payloads:

 

 

Structs adhering to the Drivable protocol
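Structs along these lines would be consistent with the payloads shown next. This is a hypothetical sketch: only the type names and licensePlate appear in the text, so every other property name and all values are assumptions. Sizes are printed rather than asserted because they depend on the Swift version (String in particular is smaller on recent toolchains than the 24 bytes quoted here).

```swift
protocol Drivable {
    func drive()
}

struct Car: Drivable {
    var numberOfWheels: Int = 4           // assumed; would explain payload_data_0 = 4
    func drive() {}
}

struct Motorcycle: Drivable {
    var numberOfWheels: Int = 2           // assumed
    var licensePlate: String = "ABC-123"  // name from the text, value made up
    func drive() {}
}

struct Bus: Drivable {
    // Four assumed word-sized properties.
    var numberOfWheels: Int = 6
    var seats: Int = 40
    var doors: Int = 2
    var routeNumber: Int = 12
    func drive() {}
}

// Compare each struct's size against the three-word value buffer.
let bufferSize = 3 * MemoryLayout<Int>.size
print("buffer:", bufferSize)
print("Car:", MemoryLayout<Car>.size)
print("Motorcycle:", MemoryLayout<Motorcycle>.size)
print("Bus:", MemoryLayout<Bus>.size)
```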

car

    payload_data_0 = 0x0000000000000004,

    payload_data_1 = 0x0000000000000000,

    payload_data_2 = 0x0000000000000000,

    instance_type = 0x000000010b50e410 ExistentialContainers`type metadata for ExistentialContainers.Car,

    protocol_witness_0 = 0x000000010b50e1c8 ExistentialContainers`protocol witness table for ExistentialContainers.Car: ExistentialContainers.Drivable in ExistentialContainers

motorcycle

    payload_data_0 = 0x0000608000036820,

    payload_data_1 = 0x0000000000000000,

    payload_data_2 = 0x0000000000000000,

    instance_type = 0x000000010b50e4d8 ExistentialContainers`type metadata for ExistentialContainers.Motorcycle,

    protocol_witness_0 = 0x000000010b50e1d8 ExistentialContainers`protocol witness table for ExistentialContainers.Motorcycle: ExistentialContainers.Drivable in ExistentialContainers

bus

    payload_data_0 = 0x00006000000364a0,

    payload_data_1 = 0x0000000000000000,

    payload_data_2 = 0x0000000000000000,

    instance_type = 0x000000010b50e5a8 ExistentialContainers`type metadata for ExistentialContainers.Bus,

    protocol_witness_0 = 0x000000010b50e1e8 ExistentialContainers`protocol witness table for ExistentialContainers.Bus: ExistentialContainers.Drivable in ExistentialContainers

As you can see, Car has the expected payload, but Motorcycle has only one payload entry, even though it has two properties. As mentioned before, a String was 24 bytes at the time, so the licensePlate property fills the value buffer by itself and, together with the other property, pushes the struct past 24 bytes. All of the properties are therefore stored on the heap, leaving only one payload entry: the pointer to the values on the heap. Bus has 4 properties, so as expected, there is just one payload entry.

Now for the final two rows.

The instance_type variable (4th row) is a pointer to the type’s metadata, which in turn leads to the Value Witness Table (VWT), another table structure that contains type-specific information on how to allocate, copy, and destroy the value represented by the container.

The protocol_witness_0 variable (5th row) holds a pointer to the Protocol Witness Table (PWT). The PWT is another table structure that holds references to the implementations of the protocol’s functions provided by the conforming type. The PWT is the reason why, if we call drive() on a Drivable that happens to be a Car, it knows to execute Car’s drive() implementation and not, say, Bus’s.
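The effect of the PWT is easy to observe from the outside. In this minimal sketch drive() returns a String purely so the result can be inspected; that return type is an assumption for illustration, not the signature used in the original project.

```swift
protocol Drivable {
    func drive() -> String
}

struct Car: Drivable {
    func drive() -> String { return "Car.drive()" }
}

struct Bus: Drivable {
    func drive() -> String { return "Bus.drive()" }
}

// The static type of each element is Drivable, so every call goes
// through the witness table stored in that element's container.
let vehicles: [Drivable] = [Car(), Bus()]
let calls = vehicles.map { $0.drive() }

print(calls)  // ["Car.drive()", "Bus.drive()"]
```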

Function Parameters

So what does all of this have to do with the original question? What’s the difference between our two functions?

 

 

Functions in question
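Based on the names used in the rest of this post, the two functions would look roughly like the sketch below. The name startTravelingWithProtocol and the Car property are assumptions, introduced only so both functions can sit side by side and be called without ambiguity.

```swift
protocol Drivable {
    func drive()
}

struct Car: Drivable {
    var numberOfWheels = 4   // hypothetical property
    func drive() {}
}

// Protocol-based: the argument arrives wrapped in an existential container.
// (Hypothetical name, chosen to keep the two functions distinct.)
func startTravelingWithProtocol(transportation: Drivable) { }

// Generic: D is bound to one concrete type at each call site.
func startTraveling<D: Drivable>(transportation: D) { }

let car = Car()
startTravelingWithProtocol(transportation: car)
startTraveling(transportation: car)
```

Both calls accept the same Car and both bodies are empty, which is why the two are so easy to mistake for each other.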

Well, there are actually quite a few things — how they’re dispatched, how local variables are instantiated, accessing of associated types for generic return types, compiler optimizations, dynamic behavior … the list goes on.

But for now we’ll focus on how instantiation occurs and the accessing of associated types. Links will be provided below for more details on most of these.

On to how local variable instantiation occurs: the protocol-based function receives its input in the form of a container, since it must support multiple types. A local variable, transportation, is then created using the VWT and PWT of the container.

On the other hand, the generic based function will receive its input without the container, despite also supporting multiple Drivable types. Why is that?

Instead of passing a container to the generic function so that the local variable can be instantiated, the generic function can be specialized at compile time, using the type-specific information available at the function’s call site. So, suppose a Car were passed into startTraveling(); Swift can generate a Car-specific version of the function, say:

func startTravelingWithCar(transportation: Car) { }

Behind the scenes, an unspecialized generic function also receives the type’s PWT and VWT as hidden arguments, giving it the information needed to set up a value buffer if necessary and to find the Car’s implementation of drive(). A specialized version doesn’t need them, since the concrete type is already known. Either way, the generic parameter is bound to one concrete type at the call site, which gives us access to any associated types of that type, and all of this type information is determined at compile time. That is part of the reason why we can have an associated type be the return type of a generic function, but can’t do the same for protocol-based functions.

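protocol Returnable {

    associatedtype ReturnType

}

// This will compile ✅ (fatalError() stands in for a real body)

func returnTheType<T: Returnable>(object: T) -> T.ReturnType { fatalError("not implemented") }

// This won't compile ❌ (left commented out so the rest still builds)

// func returnTheType(object: Returnable) -> object.ReturnType { }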
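To see the generic version do something, here is a small sketch. The value requirement and the IntBox and NameBox types are hypothetical additions made for this example; the snippet above declares only the associated type.

```swift
protocol Returnable {
    associatedtype ReturnType
    var value: ReturnType { get }   // hypothetical requirement, added for illustration
}

// ReturnType is inferred as Int from the type of `value`.
struct IntBox: Returnable {
    let value: Int
}

// ReturnType is inferred as String.
struct NameBox: Returnable {
    let value: String
}

// T is a concrete type at each call site, so T.ReturnType is known statically.
func returnTheType<T: Returnable>(object: T) -> T.ReturnType {
    return object.value
}

let number: Int = returnTheType(object: IntBox(value: 42))
let name: String = returnTheType(object: NameBox(value: "bus"))
print(number, name)  // 42 bus
```

The explicit type annotations on number and name are there only to show that the compiler already knows each result type without any casting.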

However, protocol-based functions aren’t bad, despite the fact that we can’t use associated types as return types. Protocol-based functions, unlike their generic counterparts, offer a higher degree of dynamism and flexibility at runtime. But this post is long enough as is 😅, so check out these awesome articles and videos to find out more about everything here and more!

https://medium.com/@vhart/protocols-generics-and-existential-containers-wait-what-e2e698262ab1

posted @ 2019-01-23 16:52 zzfx