Why is gcc required to build the kernel?
https://www.reddit.com/r/linux/comments/627kgl/why_is_gcc_required_to_build_the_kernel/
Q:
Hey, quick question - might seem daft to some more experienced people here but my experience with Linux ends at actually using it - I've never played around with the source code or needed to build it myself. I read on a thread today about how Linux couldn't exist without gcc and the related compilation toolchain. I understand this historically, as gcc was the only FOSS compiler that was capable of doing the job, or the only one that the Linux devs had access to. Why is it still required now, however? There are several standards-compliant C compilers that anyone can use, clang being the obvious alternative. If you have access to it, the Intel C compiler is also standards compliant, as far as I'm aware. Surely C is C, no? Why are other compilers unable to compile Linux? Is it a licensing issue (e.g. you can't compile a GPL product with a more permissively licensed compiler)? Does gcc support something that other compilers don't, like outputting non-ELF executables? Which bit of the kernel relies on gcc-specific features? I tried to look for answers here on reddit and elsewhere but I could only find answers on how to use GCC to compile the kernel, which isn't what I'm looking for.
Cheers. :)
A:
Surely C is C, no?
No. Linux uses various language extensions provided by gcc, but not other compilers. Even without that though, there are so many implementation-specific parts of the language, compiler and all tooling that switching is a non-trivial task. There are projects aiming at changing this (http://llvm.linuxfoundation.org/index.php/Main_Page) but don't hold your breath. For most people gcc works fine, and even if you'd managed to compile it with something else, you'd struggle to get any support for it in case of problems.
There are projects aiming at changing this (http://llvm.linuxfoundation.org/index.php/Main_Page)
The bugs page there gives a pretty good idea of what sort of things are the pain points. The llvm meta issue dependency tree is also quite insightful.
There are projects aiming at changing this (http://llvm.linuxfoundation.org/index.php/Main_Page) but don't hold your breath
I've recently been researching clang and the llvmlinux project seems completely dead. The last commit to their git is from January 2015. I doubt we'll see a clang-compilable kernel any time soon (unless they have some kind of secret clubhouse somewhere working on new patches).
For most people gcc works fine, and even if you'd managed to compile it with something else, you'd struggle to get any support for it in case of problems.
This is a very real, fundamental truth of software development tucked away in this statement.
In practice, theory and practice just don't work out to be the same. Sure, two tools might provide the same set of features and functionality, but there are always nuances and gotchas. Two libraries might each be implementations of the same specification, but will vary in the ambiguous gray areas where the specification did a poor job of being, well... specific. So, as a developer building an app that needs that functionality, I pick one of the two libraries. The degree to which my app relies on the library's full suite of functionality dictates how locked in I am to that particular implementation.
I could write my app so that it could use whichever of the two libraries is available on a particular user's system, but that's extra work and increases the likelihood of bugs (and thus unhappy users). The cost-benefit equation just doesn't come up positive in many cases like this.
Of course the other library does the same thing, and of course it could be drop-in replaced... in theory. But I'm not going to spend my time doing that. I'm going to spend my time adding that new whiz bang button the users are clamoring for and fixing that stupid crash bug I accidentally introduced in the release a few months back.
Library, compiler, doesn't matter what alternative tool is under question. They're tools, and all tools have nuances and gotchas, meaning "drop in replacement" actually means "quite nearly, but not exactly, drop in replacement" in the best case. The size of the project dictates how costly that "but not exactly" part is.
Plus, the bigger the project, the more the choice of build toolchain matters in very real terms of developers spending hours on the project to make it build instead of actually being in the code fixing bugs and adding features.
Well, C is C, but gcc has all sorts of options to let you do things like embed assembler and control how code is generated and how variables are laid out in memory.
A quick example would be structures. A typical C program doesn't care how structure items are arranged in memory, and in fact, most compilers pad structures so that items fall on some sort of natural word boundary.
An operating system, on the other hand, may be using a structure to access elements of some physical memory mapped hardware. In that case, you have to be able to tell the compiler exactly how you want the items in the structure aligned.
Those precise memory layout examples you mention won't happen unbidden; you have to explicitly declare the byte alignment you want with compiler pragma directives.
Of course, the way each compiler specifies packing differs.
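To make the packing point concrete, here is a minimal sketch, assuming GCC or clang. The register block and its field names are hypothetical, invented purely for illustration. It shows two spellings of the same request: __attribute__((packed)), which is the GNU form, and #pragma pack, which GCC documents as supported for compatibility with Microsoft compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device register block; the field names are made up. */
struct regs_default {
    uint8_t  status;
    uint32_t data;     /* the compiler normally pads before this member */
    uint16_t control;
};

/* GNU spelling: no padding between members. */
struct regs_packed {
    uint8_t  status;
    uint32_t data;
    uint16_t control;
} __attribute__((packed));

/* Pragma spelling: same layout, different syntax. */
#pragma pack(push, 1)
struct regs_pragma {
    uint8_t  status;
    uint32_t data;
    uint16_t control;
};
#pragma pack(pop)

/* Aborts if either packed variant does not have the exact 7-byte layout. */
static void check_layout(void)
{
    assert(sizeof(struct regs_packed) == 7);
    assert(offsetof(struct regs_packed, data) == 1);
    assert(offsetof(struct regs_packed, control) == 5);
    assert(sizeof(struct regs_pragma) == sizeof(struct regs_packed));
    /* The unpacked struct is at least as large; its padding is up to the ABI. */
    assert(sizeof(struct regs_default) >= sizeof(struct regs_packed));
}
```

On common ABIs sizeof(struct regs_default) typically comes out as 12 rather than 7, but that number is the ABI's choice, which is exactly why code that mirrors hardware has to ask for the layout explicitly.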
There are several standards-compliant C compilers that anyone can use...
This is your answer...though maybe not the one you want to hear. ;)
The linux kernel makes use of lots of non-standard extensions (getting less I think...but as far as I know it still does).
A bit dated..but should still work to give a general idea: https://www.ibm.com/developerworks/library/l-gcc-hacks/
Most of those described in the document are "style" problems...where sticking to standard C would make the code harder to read so gcc extensions are used. But some are actually functional additions that are simply needed to write kernel code. And some are additions that allow programmers to tell the compiler exactly what kind of code it should produce.
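For a feel of what such extensions look like, here is a small sketch, assuming GCC or clang in their default GNU mode. The names demo_min, demo_unlikely and classify are hypothetical; they are loosely modeled on common kernel idioms, not copied from the kernel. The block uses statement expressions and __typeof__ (a "style" extension that makes a safe min macro possible), case ranges, and __builtin_expect (a hint about what code the compiler should produce).

```c
#include <assert.h>

/* Statement expression + __typeof__: each argument is evaluated exactly once. */
#define demo_min(a, b) ({            \
    __typeof__(a) _a = (a);          \
    __typeof__(b) _b = (b);          \
    _a < _b ? _a : _b;               \
})

/* Branch prediction hint: tells the compiler the condition is rarely true. */
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* Case ranges ('0' ... '9') are a GNU extension, not ISO C. */
static int classify(int c)
{
    if (demo_unlikely(c < 0))
        return -1;
    switch (c) {
    case '0' ... '9': return 1;   /* digit */
    case 'a' ... 'z': return 2;   /* lowercase letter (assumes ASCII) */
    default:          return 0;
    }
}

static void demo_extensions(void)
{
    int i = 3;
    int m = demo_min(i++, 10);    /* i++ runs once, unlike a naive macro */
    assert(m == 3);
    assert(i == 4);
    assert(classify('7') == 1);
}
```

None of this is valid ISO C, so a compiler has to implement the same extensions with the same semantics before it can even parse code written this way.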
Another thing that people don't seem to be mentioning is Undefined Behavior. Even if you use no extensions whatsoever, and two compilers follow the C spec to the letter, each compiler may produce different results. This is because the C standard leaves a lot of behavior undefined or implementation-defined, at the discretion of the compiler. So with two standards-compliant compilers, one could produce a working program, and the other could produce a pile of segfaults, from the same code.
(I don't know how much UB the kernel relies upon, if any).
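A commonly cited case is signed integer overflow. The sketch below is only an illustration with hypothetical function names, not something taken from the kernel. The first function has undefined behavior for one input, so a conforming compiler is allowed to assume that input never occurs and compile the test down to "always false"; the second asks the same question without ever overflowing.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Undefined behavior when x == INT_MAX: x + 1 overflows a signed int.
 * One compiler may wrap around and return true, another may fold the
 * whole expression to false. Both are conforming.
 */
static bool next_overflows_ub(int x)
{
    return x + 1 < x;
}

/* Well-defined for every int: no arithmetic that can overflow. */
static bool next_overflows_safe(int x)
{
    return x == INT_MAX;
}

static void demo_overflow_check(void)
{
    assert(next_overflows_safe(INT_MAX));
    assert(!next_overflows_safe(41));
    /* For inputs that do not overflow, the first version is well-defined too. */
    assert(!next_overflows_ub(41));
}
```

The demo deliberately never calls next_overflows_ub(INT_MAX), because what that call returns depends on the compiler and optimization level.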
[–]Quackmatic[S] 2 points 1 year ago
Good point. I wonder if there's a linter to highlight undefined behaviour? Or does gcc already catch that if you compile with -pedantic maybe?
this gives you an idea why even gcc at times is not good enough to compile the kernel. If you google for Linus complaining about gcc you'll find plenty of other examples over the years.
Move to non-gcc and issues will only increase (without even going into gcc extensions). The kernel is a complex piece of software and many of the bugs are extremely difficult to diagnose and fix. You just don't want to add the noise introduced by a different compiler just for the sake of it, especially when the one you use is open source and freely available to everybody.
If there were a need, Intel could put in the effort to make the kernel compilable w/ ICC and Apple/Google could do the same for LLVM.
And then there's the issue of GCC extensions, but I don't think that is the biggest problem.
[–]bumblebritches57 1 point 1 year ago
If there were a need, Intel could put in the effort to make the kernel compilable w/ ICC and Apple/Google could do the same for LLVM.
That's some Microsoft-tier bullshit.
You don't change a damn compiler so an app (ANY app) will build, you fix the fucking app.
See, there's this thing called the real world where applications are complex enough that you cannot make them correct, just usable.
And often it is not even your application that is complex enough; it interacts with other applications that you don't have control over and have little or no visibility into.
And last, the compiler itself is an "application" by your definition. Fix the compiler, fix the kernel, both, neither: all trade-offs that every company and every employee in every company have to make daily, not only Microsoft. And sometimes it is not your application or the compiler that is wrong. Sometimes it is the CPU itself or other hardware devices that are wrong, and in those cases you have to work around the problems.
[–] 2 points 1 year ago*
This might be an interesting read for you (How to Build the Linux Kernel with ICC - Intel's C Compiler)
https://software.intel.com/sites/default/files/article/146679/linuxkernelbuildwhitepaper.pdf
(it goes into some detail about why it is difficult to build with things other than gcc)
[–]holgerschurig 1 point 1 year ago
You used to find the gory details (e.g. why it doesn't work with LLVM) on http://llvm.linuxfoundation.org/index.php/Main_Page
Unfortunately, this page is now outdated. The web page says "get the latest version of clang" and then states that this is clang 3.5, which hasn't been true for around 2 years.
[–]t_hunger 156 points 1 year ago
The kernel is not C standard compliant. It uses a lot of extensions that are only in GCC.
So you need a compiler that has all the necessary extensions.
Apparently the situation is slowly improving: people are working on building the Linux kernel with clang, and apparently they got pretty far with that effort.
[–]Quackmatic[S] 45 points 1 year ago
That answers the question perfectly, thank you. Do you know why the kernel wasn't made with standard C to begin with? Does standards compliant C not support some stuff that linux needed to do, or was it just easier to add extensions to the language to do what needed to be done rather than trying to work around what C didn't have?
[–]gregkh Verified 88 points 1 year ago
It's a bit of both. But most of the gcc-specific things happened because that was the only compiler that ever worked for the kernel, so naturally gcc-isms snuck into the code without anyone really noticing.
The kernel developer community has always accepted patches to support other compilers. I think Intel's compiler can still build the kernel, and as others have mentioned, clang almost can do so for some platforms and configurations.
Note that the kernel does stress a C compiler in some very odd ways, as it is a unique program. There have been some "interesting" bugs found in clang, and in gcc, thanks to how the kernel does things, that never show up in any other "normal" program.
[–]minimim 7 points 1 year ago
GCC hackers also write extensions on demand for Linus.
[–]hackingdreams 21 points 1 year ago
GCC lets the kernel devs do a lot of "cheating" when writing code - they can write simpler, easier to maintain code with less hassle. The trade-off is that they're stuck using GCC. And that trade-off was explicitly accepted by Linus very long ago - gcc was the free compiler for decades, so it was a no-brainer.
But clang is doing its best to be "I can't believe it's not GCC" today, so it's making progress at building the kernel. We've built frankenkernels that have clang built modules in regular kernels at my job a time or two for $REASONS, but I don't think any of that will escape into the wild... or at least I seriously, seriously hope it is never necessary...
[–]Sukrim 1 point 1 year ago
There was nearly no work done in that area for over 2 years now.
[–]holgerschurig 1 point 1 year ago
Except that his answer is probably mostly wrong.
Clang adopted all of the GCC extensions with very minor exceptions. And for those that they didn't adopt, there are patches for the Linux kernel; they are even in the process of getting incorporated into the kernel.
Currently, the major roadblock is that the LLVM linker is still in its infancy compared with GNU ld or GNU gold. The Linux kernel makes heavy use of linker scripts and other linker tricks. E.g. when you see in a kernel source file something like MODULE_LICENSE(), module_init(), then this boils down to linker script tricks. The module license is placed into special sections of the resulting .ko file (where the kernel and modinfo can find them). The module_init becomes either the entry point of the module, or if the module gets compiled into the kernel, then all entry points are placed into a linker section and code in the kernel calls them one after the other. A bit like __attribute__((__constructor__)) (also a GCC extension).
[–]t_hunger 1 point 1 year ago
How am I wrong in what I wrote? If the kernel was standard C, then clang should be able to build the kernel sources without having to adopt all of the GCC extensions first :-).
I also said that people are pretty far along with building the kernel with clang nowadays.
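The "entry points collected in a linker section" idea from the module_init() description above can be sketched in user space. This is only an illustration under stated assumptions: the kernel uses its own linker scripts, whereas this sketch relies on GNU ld (and compatible ELF linkers) automatically defining __start_ and __stop_ symbols for a section whose name is a valid C identifier. The macro demo_init, the section name demo_initcalls and the init functions are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>

typedef int (*demo_initcall_t)(void);

/* Places a pointer to fn into the "demo_initcalls" section (GNU attribute). */
#define demo_init(fn)                                         \
    static demo_initcall_t demo_initcall_##fn                 \
        __attribute__((used, section("demo_initcalls"))) = fn

static int demo_init_count;

static int init_a(void) { puts("init_a"); return ++demo_init_count; }
static int init_b(void) { puts("init_b"); return ++demo_init_count; }

demo_init(init_a);
demo_init(init_b);

/* Section boundary symbols provided by the linker, not defined in C. */
extern demo_initcall_t __start_demo_initcalls[];
extern demo_initcall_t __stop_demo_initcalls[];

/* Calls every registered entry point; returns how many were called. */
static int run_demo_initcalls(void)
{
    int n = 0;
    for (demo_initcall_t *p = __start_demo_initcalls;
         p < __stop_demo_initcalls; p++) {
        (*p)();
        n++;
    }
    return n;
}
```

Nothing in the C source lists the init functions in one place; the table only exists because the compiler honors the section attribute and the linker lays the section out and provides its bounds. That is why a different linker can be a bigger hurdle here than a different compiler front end.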
[–]cym13 42 points 1 year ago
To build on that, very few real-world C compilers implement the C standard to perfection and nothing more. So C is never really C; it is an abstract convention that is mostly followed by most.
[–]slavik262 28 points 1 year ago
Yes and no. Most compilers support -pedantic or a similar flag, which disables extensions, and there are lots of good arguments for doing so. (Chief among them: if your code is standard ISO C, you can compile it for any platform with a compiler that supports ISO C.)
[–]huge_in_eeeevil 7 points 1 year ago
-pedantic in GCC still allows some extensions because they would have to have two different parsers and essentially a somewhat different compilation route to offer a true -pedantic option. How it works is: with -std=c99 it will disable GNU C extensions only where they contradict C99, as in any valid C99 program should compile with it, but a lot of programs that are not valid C99 at all will still compile. -pedantic will try to disable all extensions where it can but leaves some in place, as these would be too much work to disable.
This is similar to the misconception many people have that calling bash as sh via a symlink or setting POSIXLY_CORRECT will disable extensions; it will not; it only disables those that contradict POSIX. And even that is a loose term, because local as a builtin for instance remains, which means that if you have /bin/local in your PATH, in a strict POSIX shell you would call that instead.
In the end though "the POSIX sh" is not a standard. The real standard is that + local + test -o + echo -n
[–]calrogman 6 points 1 year ago
The result of invoking a simple command local is explicitly unspecified in POSIX. test -o is marked obsolescent; anything using it is explicitly not a strictly conforming POSIX application. echo -n is silly, nobody should ever use it, and the POSIX standard encourages the use of printf instead of echo in any new applications (I would have liked for the echo command to be marked obsolescent in its entirety and disagree with The Open Group's rationale for not doing so).
[–]bilog78 1 point 1 year ago
Wait, what's wrong with test -o?
[–]calrogman 1 point 1 year ago
There's nothing wrong with test -o. I think it was marked obsolescent just because any POSIX compliant shell can handle boolean OR conditions.
[–]bilog78 1 point 1 year ago
I would assume test something -o somethingelse would be faster than test something || test somethingelse though.
[–]calrogman 1 point 1 year ago
I suppose that depends on whether or not test is a builtin.
[–]bilog78 1 point 1 year ago
Even if it would be equally fast as a built-in (which it probably wouldn't be anyway, even if the difference in that case would be much smaller), the possibility that it might not be a built-in should still be kept in mind. And of course the -o form is also shorter to write 8-)
[–]strncat 3 points 1 year ago
Using -pedantic in Clang or GCC only forbids most of the language extensions though. It still permits most extensions because they don't actually change the language but rather add things like intrinsic functions or attributes.
[–]kernelhoops 2 points 1 year ago
Can you please give some examples?
[–]EliteTK 12 points 1 year ago
Try building the kernel with clang and you will probably get a pretty exhaustive list of examples.
[–]mad_drill 2 points 1 year ago
Someone a while back linked the Linux kernel with clang. I remember reading the TCC Wikipedia page saying that a modified version of it was able to compile Linux
[–]t_hunger 1 point 1 year ago
Yes, there is progress on that front. But we are still far from the average Linux user being able to build a kernel using clang.
[–]atomicxblue 2 points 1 year ago
As someone who has been trying to learn cpp on his own, I like clang a little bit better. The error messages are easier to understand, for one.