The New ISO Standard for C (C9X)

ISO C

What is C9X?

C9X is the name for an effort undertaken in the late nineteen-nineties to produce a new improved standard for the programming language C. An ISO committee has been working to define this new improved language. In december 1997, they released their first public draft version of the new standard for public review. Later, they also released the C9X FCD. (I suppose that "FCD" stands for "Final Committee Draft".) This document can also be obtained in gzipped Postscript form.

The new ISO standard for the programming language was ratified by ISO in 1999. Its ISO document number should therefore be ISO/IEC 9899:1999 (I didn't verify this.)

Please note that the final standard as it has been ratified may differ substantially from the working document referenced by the links above!

What is new in C9X?

Clearly, the first question any C programmer will ask now is "And what's new in C?"

Back in 1998, I compiled a list of changes based on the above final committe draft of the new ISO standard. I have never gotten any complaint saying that the information here was incorrect, so I guess the final standard did not deviate too much from the FCD. The ISO work group has published its own list of changes, but it's not exactly verbose...

I still offer this list of changes because it seems still to be reasonably accurate, and it will give you a first impression of what the new C is like. Of course, for the full story, you'll have to get hold of a copy of the new ISO standard. Although I tried my best, I do not guarantee the information provided here to be correct. (As I wrote above, this list is based on the FCD, not the final new ISO standard.)

In what follows, I'll refer to C as it is defined by its original ISO standard (ISO/IEC 9899:1990, including technical corrigenda TC1 and TC2) as C89, the language as defined by C89 plus the normative addendum 1 is referred to as C94.

Changes from C89 to C9X

The list below is split in several sections:


Environment
  1. Most of the translation limits (5.2.4.1) have been increased, most notably, the implementation must support:
    • 63 significant initial characters in an internal identifier or macro name (universal or extended characters count as one),
    • 31 significant initial characters in an external identifier (counting "short" UCNs as 6 and "long" UCNs as 10, and counting extended characters like the corresponding UCN), and
    • 4095 characters in a logical source line.
    These values were 31, 6, and 509, respectively, in C89. Note that all identifiers are case sensitive now, even external ones. (In C89, an implementation was allowed to ignore case for external identifiers.)

Preprocessor
  1. The #pragma directive has three reserved forms, all starting with the pp-token STDC right after "pragma". These are used to specify certain characteristics of the floating point support to comply with IEC 559.

  2. The _Pragma unary operator allows the construction of pragmas through macro expansion.

  3. Predefined macro __STDC_VERSION__ has now the value 199901L. (In C94, it's value was 199409L, C89 didn't have it at all.) I suppose this value will be fixed in the final version of the new standard to reflect the date of its actual acceptance by ISO.

  4. There are two conditionally defined macros, __STDC_IEC_559__ and __STDC_IEC_559_COMPLEX__, indicating IEC 559 conformance for floating point and complex arithmetic, respectively. If defined, they're defined to the decimal constant 1. A third conditionally defined macro called __STDC_ISO_10646__ shall indicate that wchar_t is in accordance with ISO/IEC 10646. If defined, this macro has a value of the form yyyymmL.

  5. Macro expansion: empty arguments are explicitly allowed. (In C89, this resulted in undefined behavior.) Stringification (the # operator) of an empty argument yields the empty string, concatenation (##) of an empty argument with a non-empty argument produces the non-empty argument, and concatenation of two empty arguments produces nothing at all.

  6. Function-like macros with variable arguments, uses the ellipsis (...) notation. For replacement, the variable arguments (including the separating commas) are "collected" into one single extra argument that can be referenced as __VA_ARGS__ within the macro's replacement list. __VA_ARGS__ may occur only within the replacement list of a function-like macro having a variable argument list. It's possible to have only variable arguments, as in
     #define My_Macro(...) __VA_ARGS__
  7. The #line directive allows the specification of a line number up to 2**31-1. (In C89, the limit was 2**15-1, i.e. 32767.)

  8. The syntax of preprocessing numbers has been changed to allow for the new binary exponents present in hexadecimal floating point constants.

  9. Line-comments (starting with the pp-token "//" and extending up to the end of the line). As with normal comments, it's not possible to construct a comment as the result of macro replacement.


Syntax
  1. New keywords: restrict, inline, _Complex, _Imaginary, _Bool.

  2. Within a compound statement ("block"), declarations and statements can be freely mixed.

  3. Digraph tokens (<: :> <% %> %: %:%:, synonym to [ ] { } # ##, from C94) are part of the language.

  4. Array declarations may have a '*' between the square brackets (used for variable arrays in parameter lists).

  5. In a for-loop, the first expression may be a declaration, with a scope encompassing only the loop.
    for (decl; pred; inc)
      stmt;
    is equivalent to:
    {
      decl;
      for (; pred; inc)
        stmt;
    }
  6. Compound literals (anonymous aggregates) can be created using the notation
    ( type-name ) { initializer-list }
    (possibly with a trailing comma before the closing brace). Compound literals are primary expressions.

  7. Initializers (and anonymous aggregates) have a named notation for initializing members. For array elements, the element is designated by [const-expression], for struct and union members using a dot notation .member-name. E.g.,
    struct {int a[3], b;} w[] =
      { [0].a = {1}, [1].a = 2 };
    or
    struct {int a, b, c, d;} s =
      { .a = 1, .c = 3, 4, .b = 5};
    Note: the '4' in the above initializer list initializes s.d.

    As usual, global data is by default set to zero (or to NULL in the case of pointers). If an initializer is present, any members not explicitly set also are zeroed out. (As in C89; the clarifications from TC2 are retained in C9X.)

  8. Notation for "universal characters":
    universal-character-name:
      \u hex-quad
      \U hex-quad hex-quad
    
    hex-quad:
      hexadecimal-digit hexadecimal-digit
      hexadecimal-digit hexadecimal-digit
    Note that universal characters may appear even in the midst of an identifier! (An implementation is allowed to do some name mangling if the linker cannot deal with universal characters.) I suppose this is intended to let e.g. the Japanese write their identifiers using their Japanese characters (or symbols, or glyphs, or whatever the linguistically correct term would be).

  9. Notation for hexadecimal floating point constants with binary exponent, i.e., the exponent is given as a decimal power of two.

  10. New suffix "LL" or "ll" (and "ULL" and "ull", of course) for constants of the new long long types.

Semantics
  1. Floating point arithmetic defined such that it can comply with the IEC 559 standard ("Binary floating-point arithmetic"), also known as IEEE 754 (and IEEE 854).

  2. New type long long (signed and unsigned), at least 64 bits wide.

  3. New identifier __func__, which is declared implicitly if used within a function as
    static const char __func__[] = "function-name";
    where function-name is the unadorned name of the function the identifier is used in. (Provides a means to obtain the name of the current function, similar to the __FILE__ macro. It's a variable instead of a macro because the preprocessor doesn't know about functions.)

  4. Initializers for auto aggregates can be non-constant expressions.

  5. The integer division and modulus operators are defined to perform truncation towards zero. (In C89, it was implementation-defined whether truncation was done towards zero or -infinity. This is (obviously) important only if one or both operands are negative. Consider:
    -22 / 7 = -3
    -22 % 7 = -1
    truncation towards zero
    -22 / 7 = -4
    -22 % 7 =  6
    truncation towards -infinity
    Both satisfy the required equation (a/b)*b + a%b == a. The second has the advantage that the modulus is always positive -- but they decided on the other (more Fortran-like, less Pascal-like) variant...)

  6. Type specifiers: new combinations added for:
    • _Bool
    • float _Complex, double _Complex, long double _Complex
    • signed and unsigned long long int.
    Note: it seems that these type specifiers may occur in any order, e.g, _Complex double long or signed long int long would be legal.

    The implementation of the complex types is defined by the standard (6.2.5(13)) to use cartesian coordinates (real and imaginary part), i.e. forbids an implementation using polar coordinates (distance from [0,0] and an angle). Furthermore, the same paragraph also specifies that a complex type has the same alignment requirements as an array of two elements of the corresponding floating types, the first must be the real part and the second the imaginary part.

    Objects of the new boolean type _Bool may have one of the two values zero or one.

  7. In a declaration, there must be at least one type specifier, i.e., the default to int has been thrown out. E.g., the declaration
    f();
    was equivalent to int f(); in C89, but is illegal in C9X.

  8. Structs: the last member may have an incomplete array type. (This is a way to codify the well-known "struct hack" that was widely used and in practice worked on nearly every compiler.) The idea is illustrated by the following piece of code:
    struct s {int n; double d[];};
    struct s *p1, *p2;
    size_t   sz;
    
    sz = sizeof (struct s);  // sz == offsetof (struct s, d)
    
    p1 = malloc (sizeof (struct s) + 8 * sizeof (double));
    p2 = malloc (sizeof (struct s) + 5 * sizeof (double));
    
    /* p1 behaves now as if it had been declared as
    
       struct {int n; double d[8];} *p1;
    
       p2 behaves now as if it had been declared as
    
       struct {int n; double d[5];} *p2;
    */
    Note that the specification as given in the Committee Draft implies that there be no padding before the variable last member, or, if there is, that it be included in sizeof (struct s).

  9. Type qualifiers are idempotent, i.e., if a type qualifier appears several times (either directly or indirectly through typedefs) in a type specification, it's treated as if it appeared only once. E.g. const const int i; is equivalent to const int i;. (Note that in const int * const p;, this doesn't apply as the second const qualifies the pointer!)

  10. There's a new type qualifier, called restrict. It's intended to be used only for pointer types (6.5.3(2)). Its semantics is that two restrict-qualified pointers cannot be aliases of the same object. A restricted pointer and a non-restricted point can be aliases, though. This is intended to facilitate alias analysis in compilers, allowing more aggressive optimizations to be employed. For more information on this new feature, see the original proposal X3J11 94-009, "Restricted Pointers".

  11. There's a new function specifier inline, giving the compiler a hint that such a function should be inlined.

  12. A compiler must parse and accept both restrict and inline, but is free to ignore the hints given by them.

  13. There are variable-length arrays, whose size depends not upon a constant expression but on a computed value. Variable-length arrays must not be global or members of a struct or union. Multi-dimensional variable-length arrays are allowed.

  14. The goto statement is not allowed to jump into the scope of a variable-length array. Jumps within such a scope are allowed.


Library
  1. New <stdbool.h>, containg a typedef for bool and macros for true and false.

  2. The <iso646.h> header from C94 is also in C9X.

  3. <errno.h> contains a new predefined macro EILSEQ. Used to report errors in wide-character conversion. (This macro was introduced in C94.)

  4. New <inttypes.h>, giving typedefs specifying integer types with
    • exactly n bits
    • at least n bits
    • the fastest (whatever that means) type having at least n bits
    where n in [8, 16, 32, 64]. Also defines for each of these types macros expanding to the correct format specifiers for the printf and scanf families, as well as macros expanding to the correct suffixes for constants (e.g., UINT64_C (0x123) might expand to 0x123ULL) and for the maximum and minimum values of these types.

  5. New file <fenv.h>, providing access to the floating point state. (To conform to IEC 559.)

  6. <math.h> contains some new low-level functions (e.g. is_nan or copysign) as well as some configuration macros and a pragma to comply with IEC 559. Also contains new high-level functions, e.g. gamma.

  7. <complex.h> provides mathematical functions for the new complex types.

  8. <tgmath.h> stands for "type-generic math" and defines some macros that automagically call the right function from <math.h> or from <complex.h> depending upon the type of their arguments.

  9. <stdarg.h> has a new function va_copy to copy a variable argument list.

  10. The file model of <stdio.h> has been extended to cover also files with multi-byte or wide characters. There are some additional functions, most notably snprintf (like sprintf, but allows the programmer to specify the length of the result buffer) and a vscanf family (in analogy to vprintf).

  11. <stdlib.h> has a few new routines for conversions of long long, e.g. atoll (which doesn't describe a Pacific island).

  12. <time.h> has a new type struct tmx, which is like struct tm but contains a few more fields dealing with leap seconds. There are also a few new routines operating on this new structure.

  13. <wctype.h> contains a lot of wide-character handling functions, including formatted I/O and numeric conversions. (If I recall correctly, this is basically what was defined in C94.)


Annexes
  1. Annexes C and D (informative) detail the model of sequence points.

  2. New annex F (normative) details the IEC 559 floating point model and its support in C.

  3. Annex G (informative) describes IEC 559 conformant complex arithmetic.

  4. Annex H (informative) describes to what extent C conforms to the ISO/IEC 10967-1 standard on language-independent arithmetic.

  5. Annex I (normative) defines the ranges of legal values for universal character names in the source character set.