Understanding C++ Modules In C++20 (1)
Compiling evironment: linux (ubuntu 16.04)+ gcc-10.2.
The Post will clarify and discuss what modules are,what they can do,and what they intended to do ,what they cannot do and how they are used.
1 Using Modules
Here is a hello-world program using c++ modules:
//speech.cpp export module speech; export const char* get_phrase() { return "Hello, world!"; } // main.cpp import speech; #include <iostream>;
using namespace std; int main() { cout << get_phrase() <<endl; }
This is an extremely straightforward example. We have a single source file that exposes a module speech. main.cpp imports speech and uses the single function get_phrase() defined in speech.cpp.
The effect of importing a module is to make the exported entities declared within that module to visible by the importing translation unit.
2 Module Names
A module is identitfied by the aptly named module-name. The following is the grammar of mudule-name:
module-name: [module-name-qualifier] identifier ; module-name-qualifier: identifier "." | module-name-qualifier identifier "." ;
This means thar a module's name is some non-zero number of identifiers joined by a literal dot. The identifier rules are the same as the rest of the language,except that the identifiers export and module may not be used in a module name,obviously.
What's the significance of the dot? Literally nothing. It's purely for the benefit of the developer. A module named "bost.asio.async_completion" make it easier to understanding the logical hierarchy than a module named "bost_asio_async_completion" ,but there is no semantic difference between the two naming styles according to the standar.
3 A New C++ Source Entity
For all C++'s history, there has been one standard concept to encapsulate the idea of a C++ source unit: The translation unit.
C++ modules introduce a new type of translation unit called a module unit. The definition is fairly simple:
A module unit is a translation unit that contains a module-declaration.
What is a "module declaration"? We've already seen one in our example. The grammer is very simple:
module-declaration: ["export"] "module" module-name [module-partition] [attribute-specifier-seq] ";" ; module-partition: ":" module-name ;
Basically, any file that contains a module line at the top level is a module unit. (The next section will cover the meaning of module-partition).
Important subdivisions: There are serveral different types of module units, and it is important to understand the meaning of each of them:
- A module interface unit is a module unit where the module-declaration contains the export keyword. There can be any number of these in a module.
- A module implementation unit is any module unit that is not a module interface unit (Does not have the export keyward in the module-declaration).
- A module partition is a module unit where the module-declaration contains the module-partition component.
- A module interface partition is a module interface unit that is also a module partition (Contains both the export keyword and the module-partition component).
- A module implementation partition is a module implementation unit that is also a module patition.(Contains the module-partition component but no export keyword).
- The primary module interface unit is the non-partition module unit that is also a module interface unit. There must be exactly one primary interface unit in a module. All other module interface units must be module partitions.
It is not named explicitly, and it is not entirely clear from the above reading,but it is possible to have a module implementation unit that is not a partition. A module-declaration without export and without a module-partition lable is a module implementation unit. There is no way to import this module unit form another file, but it is useful to provide the implementation of entities declared in different module units.
4 Modules Partitions
Just like with C++ headers, there is no requirement that modules be split and subdivided into mutiple files. Nevertheless, large source files can become intractable,so C++ modules also have a way to subdivide a single module into distinct translation units that are merged together to form the total module. These subdivisions are known as partitions.
Suppose we have two enormous, cumbersome, and unwieldy functions that we do not want to include in the same module:
//Image that there is a ton of code below export module speech;
export const char* get_phrase_en() { return "Hello, world!"; } export const char* get_phrase_es() { return "¡Hola Mundo!"; }
Then,let's subdivide it using partitions:
// speech.cpp export module speech; export import :english; export import :spanish;
// speech_english.cpp export module speech:english; export const char* get_phrase_en() { return "Hello, world!"; }
// speech_spanish.cpp export module speech:spanish; export const char* get_phrase_es() { return "¡Hola Mundo!"; }
// main.cpp import speech; import <iostream>; import <cstdlib>; int main() { if (std::rand() % 2) { std::cout << get_phrase_en() << '\n'; } else { std::cout << get_phrase_es() << '\n'; } }
What's going on here?
- We have one module named speech.
- Speech has two partitions: english and spanish.
- The syntax export module <module-name>:<part-name> declares that the given module unit is a module interface partition belongs to <module-name> with the partition named by <part-name>.
- The syntax import :<part-name>(with a leading colon) imports the patition named by <part-name>. The given <part-name> must belong to the import module. Translation units that are not module units of a module A are not allowed to import partitions from A.
- The syntax export import :<part-name> make the exported entities in the module partition visible as part of the module interface.
The name of a module partition follows the same rules as module names, except private is not allowed.
When a user imports a module, then all entities described in all module interface unitis for that module become visible in the importing file. module interface partitions are module interface units.
the module unit that contains export module <module-name>(with no partition name) is known as the primary module interface unit. There must be exactly one primary module interface unit, but any number of module interface partitions.
In the above example, get_phrase_en and get_phrase_es both live in the speech module. The subdivision into partitions is not exposed to users.
The module interface is defined as the union of all module interface units within that module.
We can classify the above source files as such:
- speech.cpp is the primary module interface unit.
- speech_english.cpp and speech_spanish.cpp are module interface partitions.
- main.cpp is a regular traslation unit.
Important: The primary interface unit for a module must export all of the interface partions for the module(either directly or indirectly) via export import :<part-name>. Otherwise, the program is ill-formed, no diagnostic requied.
5 "Submodules" are not a Thing(Technically)
Another way we could have subdivided the prior example might took like this:
// speech.cpp export module speech; export import speech.english; export import speech.spanish;
// speech_english.cpp export module speech.english; export const char* get_phrase_en() { return "Hello, world!"; }
// speech_spanish.cpp export module speech.spanish; export const char* get_phrase_es() { return "¡Hola Mundo!"; }
// main.cpp import speech; import <iostream>; import <cstdlib>; int main() { if (std::rand() % 2) { std::cout << get_phrase_en() << '\n'; } else { std::cout << get_phrase_es() << '\n'; } }
Intead of using partitions, we move the declarations of get_phrase_en and get_phrase_es into their own modules, and the speech module export imports them. The export import <name> syntax declares that users who import the module will transitively import the module of the given name.
The conten of main.cpp is unaffected between the two layouts. It implicitly imports speech.english and speech.spanish by its import of speech.
If you familiar with some other languages' module designs, it should be note that is not valid:
// speech.cpp export module speech; // NOT OK: export import .english; export import .spanish;
Python users might be familiar with such syntax as a qualified relative import,where the leading dot tells the module mechanism to look for a sibling module with the given name. C++ Modules do not work like this as their is no intrinsic hierarchy. The compiler sees no relationship between modules speech, speech.english , and speech.spanish. The above snippet is completely nosesical in the eyes of the language.
speech.english and speech.spanish are not submodules of speech. They are completely disjoint modules.
When using partitions,every entity in the interface partitions is part of the same module. The module that owns an entity is intended to be part of that entity's ABI. This means that moving an entity from one module to another is potentially ABI breaking.
When using "submodules",you give user the ability to be more granular in what they import. Despite the potential speed-up from modules, an import boost; that imports the entirety of Boost could be deathly expensive to compile times.
6 Module Implementation Units
So far we've poked at module interface unit, but there is another type of module unit: The module implementation uinit.
A module implementation unit is a module unit which does not have the export keyword before the module keyword in its module-declaration. The implementation units belong to a named module. Entities declared within an implementation unit are visible only to the module of which they belong to. This has the probable advantage of keeping details hidden and possible benefit of helping accelerate incremental builds as modification to the implementation units might not effect downstream modules.
Here's what our example would look like if we use implementation units(Acually, it compiled error by using following codes in ubuntu and gcc-10.2, and wonder what compiling tools the author used):
// speech.cpp export module speech; import :english; import :spanish; export const char* get_phrase_en(); export const char* get_phrase_es();
// speech_english.cpp module speech:english; const char* get_phrase_en() { return "Hello, world!"; }
// speech_spanish.cpp module speech:spanish; const char* get_phrase_es() { return "¡Hola Mundo!"; }
This looks similar to before,but has a few changes:
- The import of the partitions in the primary interface unit no longer has the export keyword.
- The module declarations in the partitions are also missing the export keyword. This makes the partitions implementation partitions.
- The functions are definded in the implementation unit without the export keyword. Their expoted-ness is not relevant in the implementation units.
- The functions are declared in the primary interface unit with the export keyword. This make the function s part of the module interface, even if not defined.
This is nanalogous to having a header file that declares two functions and two source files that define those functions. Changes to the implementation units do not effect the interface, and there is no way to modify the module interface by make changes to the implementation units.
I've been thinking about it ,and it does not seem like we have too much code to divide up into multiple files. Still, I'd like to be able to save on incremental build times when I only modify the implementation. so we can do it as below:
// speech.cpp export module speech; export const char* get_phrase_en(); export const char* get_phrase_es();
// speech_impl_english.cpp module speech; const char* get_phrase_en() { return "Hello, world!"; }
// speech_impl_spanish.cpp module speech; const char* get_phrase_es() { return "¡Hola Mundo!"; }
The partition syntax has disappeared, and we only have three source files (You can combine three source files into two,either) . The module speech delcaration (without export) delcares that module unit to be an implementation unit(not a partition). There is no way to import this file separately, nor is there reason to do so. These two files will be used to create the module speech.
7 Restrictions on [export] {import,module} and Partitions
meaning is just what it sounds like: Import the given module, and then export that import so that downstream importers will also import the same module transitively.
Because of the way they are defined, there are a few important restrictions on how you can import and export module partitions.
1) export import is only allowed for interface partitions
The following is not allowed:
module A:Foo;
export module A; export import :Foo; // NO! :Foo is not an interface unit!
The reasoning is fairly simple: Because A:Foo does not contribute to the interface of A, it is nonsensical to propagate the entities of A:Foo to importers.
2) export module must appear once per module
Suppose we are defining a module Cats. In order to define it, we’ll need at least one module interface unit that is not a partition of Cats.
export module Cats; export void meow(); export void purr();
What happens if we add another module unit that extends Cats?
export module Cats; export void hiss();
Not allowed! The export module without a partition name is the primary interface unit, and there can only be one.
both files simultaneously and do the merge, but then you have questions about how the two files interact. Supporting such a design would be very complex with little benefit.
Additionally, supporting multiple primary interface units raises another question: What stops a user from injecting things into someone else’s module by defining another Cats file?
3) export is not allowed in implementation units
This one is self-explanatory. Allowing an implementation unit to export an entity (or module import) is senseless.
4) All interface partitions must be re-exported from the primary interface unit
Because an interface partition may extend the interface of a module (i.e. introduce entities not declared in the primary interface unit), it is essential that a compiler be able to see the entirety of a module’s interface just by looking into the primary interface unit.
export module Cats:Sounds; export void meow(); export void hiss();
export module Cats:Behaviors; export void eat(); export void sleep();
export module Cats;
export import :Sounds;
// importer.cpp import Cats; void foo() { meow(); // Okay hiss(); // Okay eat(); // Er... }
importer.cpp uses eat, and eat is defined with export in an interface partition, but how is the compiler supposed to know that eat is a member of Cats? The primary Cats unit does not export-import :Behaviors!
The program above is ill-formed, no diagnostic required! NDR is one of the most frightening terms in the C++ standard. If often means undefined behavior.
However: You can reasonably expect that this issue will most often result in a compile error, as the definition of eat is not visible. It may simply be unclear why eat is not visible when you can clearly see export void eat(); defined in export module Cats:Behaviors. So you’ll most likely see a diagnostic, just not a diagnostic related to the missing re-export.
5) Module implementation units are spooky beasts
One can define a module interface and implementation in separate files without the need for partitions. The may be unintuitive, but is perfectly reasonable upon inspection.
export module Cats; export void dream(); export void sleep();
// cats_sleep.cpp module Cats; import sleep_info; void sleep() { if (is_rem_sleep()) { dream(); } }
In the above, cats_sleep.cpp is an implementation unit for Cats. The primary interface unit makes no mention of it, so you might expect this to be an issue. What’s going on?
The answer is that the Cats interface unit contains sufficient information that downstream users can import and use the module successfully. They need not see the implementation of sleep and dream, so there is no need to mention where it is defined. It can be assumed that the definitions will be resolved by the linker.
Another thing you may notice: cats_sleep.cpp calls dream, but we never import or declare it! This is perfectly fine: non-partition module implementation units implicitly import the module of which they are a member. Since there is no way for the interface to import this anonymous implementation unit, there is no risk of cyclic imports.
This begs a question: Why make a module implementation partition at all? You could just make anonymous implementation units instead, right?
Not always. Even though the entities in implementation units are not visible to importers, they are visible to other members of the module as long as they are imported. This is one possible way you might use PIMPL with modules (and before you ask: There are still good reasons to use PIMPL with C++ modules)
module Gadgets:PrivWidget; struct PrivWidget { // ... };
export module Gadgets; import :PrivWidget; export class Widget { std::unique_ptr<PrivWidget> _data; // ... }; export Widget get_widget() { auto priv = std::make_unique<PrivWidget>(1, 2, 3); return Widget{std::move(priv)}; }