Sunday, 26 February 2017

Why does the C preprocessor interpret the word "linux" as the constant "1"?



Why does the C preprocessor in GCC interpret the word linux (small letters) as the constant 1?




test.c:



#include <stdio.h>
int main(void)
{
int linux = 5;
return 0;
}



Result of $ gcc -E test.c (stop after the preprocessing stage):



....
int main(void)
{
int 1 = 5;
return 0;
}



Which, of course, yields an error.



(BTW: There is no #define linux in the stdio.h file.)


Answer



In the Old Days (pre-ANSI), predefining symbols such as unix and vax was a way to allow code to detect at compile time what system it was being compiled for. There was no official language standard back then (beyond the reference material at the back of the first edition of K&R), and C code of any complexity was typically a tangled maze of #ifdefs to allow for differences between systems. These macro definitions were generally set by the compiler itself, not defined in a library header file. Since there were no real rules about which identifiers could be used by the implementation and which were reserved for programmers, compiler writers felt free to use simple names like unix and assumed that programmers would simply avoid using those names for their own purposes.



The 1989 ANSI C standard introduced rules restricting what symbols an implementation could legally predefine. A macro predefined by the compiler could only have a name starting with two underscores, or with an underscore followed by an uppercase letter, leaving programmers free to use identifiers not matching that pattern and not used in the standard library.
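The naming rule above is mechanical enough to express as a check. The sketch below uses a hypothetical helper, `is_implementation_reserved` (not part of any library), that tests whether a macro name matches one of the two patterns an implementation may use for its own predefined macros. It covers only those two patterns, not every reserved-identifier rule in the standard.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if `name` starts with two underscores, or
   with an underscore followed by an uppercase letter, and 0 otherwise. */
int is_implementation_reserved(const char *name)
{
    if (name[0] != '_')
        return 0;               /* e.g. "linux", "unix": free for programmers */
    if (name[1] == '_')
        return 1;               /* e.g. "__linux__" */
    /* Cast to unsigned char: passing a negative char to isupper() is undefined. */
    return isupper((unsigned char)name[1]) != 0;   /* e.g. "_WIN32" */
}
```

With this check, "__linux__" and "_WIN32" come back as reserved while "linux" and "unix" do not, which is why a conforming compiler may predefine the former but not the latter.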



As a result, any compiler that predefines unix or linux is non-conforming, since it will fail to compile perfectly legal code that uses something like int linux = 5;.




As it happens, gcc is non-conforming by default, but it can be made to conform (reasonably well) with the right command-line options:



gcc -std=c90 -pedantic ... # or -std=c89 or -ansi
gcc -std=c99 -pedantic
gcc -std=c11 -pedantic


See the gcc manual for more details.
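One way to see the effect of these options is a small probe that asks the preprocessor which names are predefined in the current mode. This is a sketch: the function names are made up for illustration, and the expectations in the comments assume gcc on a Linux host (other compilers and targets may behave differently).

```c
#include <stdio.h>

/* Hypothetical probe: 1 if the plain name `linux` is predefined as a macro
   in this compilation mode, 0 if not. On a Linux host, plain `gcc` is
   expected to give 1 and `gcc -std=c11 -pedantic` to give 0. */
int linux_macro_predefined(void)
{
#ifdef linux
    return 1;
#else
    return 0;
#endif
}

/* Same probe for __linux__, which lies in the reserved namespace and so is
   expected to stay defined for Linux targets even in the strict modes. */
int reserved_linux_macro_predefined(void)
{
#ifdef __linux__
    return 1;
#else
    return 0;
#endif
}

/* Prints both results so that two builds can be compared side by side. */
void report_linux_macros(void)
{
    printf("linux:     %s\n",
           linux_macro_predefined() ? "predefined" : "not predefined");
    printf("__linux__: %s\n",
           reserved_linux_macro_predefined() ? "predefined" : "not predefined");
}
```

Calling report_linux_macros() from a main function and building the file once with plain gcc and once with gcc -std=c11 -pedantic should show the first line change between the two builds while the second stays the same.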




gcc will be phasing out these definitions in future releases, so you shouldn't write code that depends on them. If your program needs to know whether it's being compiled for a Linux target or not, it can check whether __linux__ is defined (assuming you're using gcc or a compiler that's compatible with it). See the GNU C preprocessor manual for more information.
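A minimal sketch of that check follows. The function name `build_target` is hypothetical, and the `_WIN32` and `__unix__` branches are illustrative additions beyond the __linux__ check described above; all three names sit in the reserved namespace, so the checks should behave the same with or without -std=c11 -pedantic.

```c
/* Hypothetical helper: reports the build target using only reserved
   predefined macro names, never the plain `linux` or `unix`. */
const char *build_target(void)
{
#if defined(__linux__)
    return "Linux";
#elif defined(_WIN32)
    return "Windows";
#elif defined(__unix__)
    return "other Unix-like system";
#else
    return "unknown";   /* the list above is illustrative, not exhaustive */
#endif
}
```

Because none of these names collide with ordinary identifiers, a file using this function can also declare int linux = 5; as long as it is built in a conforming mode such as -std=c11.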



A largely irrelevant aside: the "Best One Liner" winner of the 1987 International Obfuscated C Code Contest, by David Korn (yes, the author of the Korn Shell), took advantage of the predefined unix macro:



main() { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}


It prints "unix", but for reasons that have absolutely nothing to do with the spelling of the macro name.
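The trick can be unpacked into ordinary C. The sketch below is a de-obfuscated reading of the one-liner, not Korn's code: the macro is replaced by a hypothetical `UNIX_VALUE` defined as 1, the result goes into a buffer instead of stdout, the pointer arithmetic is regrouped so it stays within the string literal, and ASCII character codes are assumed.

```c
#include <stdio.h>

/* Hypothetical stand-in for the 1 that gcc substitutes for `unix` on Unix
   hosts; any macro equal to 1 gives the same result. */
#define UNIX_VALUE 1

/* Writes the one-liner's output into buf and returns snprintf's result. */
int korn_one_liner(char *buf, size_t size)
{
    /* a[i] means *(a + i), so 1["..."] is "..."[1]. Taking its address skips
       the leading '\021' byte and leaves the format string "%six\n". */
    const char *format = &UNIX_VALUE["\021%six\012\0"];

    /* 1["have"] is 'a' (0x61 in ASCII) and 0x61 - 0x60 is 1, so the argument
       is "fun" + 1, i.e. "un". The original adds 'a' to the pointer before
       subtracting 0x60; grouping the subtraction first avoids an
       out-of-bounds intermediate pointer. */
    const char *arg = "fun" + ((UNIX_VALUE)["have"] - 0x60);

    /* "%six\n" applied to "un" produces "unix\n". */
    return snprintf(buf, size, format, arg);
}
```

Since the value 1 is all that matters, renaming the macro changes nothing about the output; the word "unix" is assembled from "un" and the literal "ix" in the format string.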

