C9X is the name for an effort undertaken in the late nineteen-nineties
to produce a new improved standard for the programming language C. An ISO
committee has been working to define this new improved language. In December
1997, they released their first public draft version of the new standard for
public review. Later, they also released the C9X FCD ("FCD" stands for
"Final Committee Draft"). This document can also be obtained in gzipped
PostScript form.
The new ISO standard for the programming language was ratified by ISO in
1999. Its ISO document number is ISO/IEC 9899:1999.
Clearly, the first question any C programmer will ask now is "And what's
new in C?"
Back in 1998, I compiled a list of changes based on the above final
committee draft of the new ISO standard. I have never gotten any complaint
saying that the information here was incorrect, so I guess the final standard
did not deviate too much from the FCD. The ISO work group has published its own
list of changes, but it's not exactly verbose...
I still offer this list of changes because it seems to remain reasonably accurate,
and it will give you a first impression of what the new C is like. Of course, for
the full story, you'll have to get hold of a copy of the new ISO standard. Although
I tried my best, I do not guarantee the information provided here to be correct.
(As I wrote above, this list is based on the FCD, not the final new ISO
standard.)
In what follows, I'll refer to C as it is defined by its original ISO
standard (ISO/IEC 9899:1990, including technical corrigenda TC1 and TC2) as
C89. The language as defined by C89 plus the Normative Addendum 1 is referred
to as C94.
Environment
- Most of the translation limits (§5.2.4.1) have been increased.
Most notably, the implementation must support:
- 63 significant initial characters in an internal identifier
or macro name (universal or extended characters count as one),
- 31 significant initial characters in an external identifier
(counting "short" UCNs as 6 and "long" UCNs as 10, and
counting extended characters like the corresponding UCN),
and
- 4095 characters in a logical source line.
These values were 31, 6, and 509, respectively, in C89. Note that
all identifiers are case sensitive now, even external
ones. (In C89, an implementation was allowed to ignore case for
external identifiers.)
Preprocessor
- The #pragma directive has three reserved forms, all
starting with the pp-token STDC right after "pragma".
These are used to specify certain characteristics of the
floating point support to comply with IEC 559.
- The _Pragma unary operator allows the construction of
pragmas through macro expansion.
- The predefined macro __STDC_VERSION__ now has the value
199901L. (In C94, its value was 199409L;
C89 didn't have it at all.) I suppose this value will be fixed
in the final version of the new standard to reflect the date of
its actual acceptance by ISO.
- There are two conditionally defined macros,
__STDC_IEC_559__ and __STDC_IEC_559_COMPLEX__,
indicating IEC 559 conformance for floating point and complex
arithmetic, respectively. If defined, they're defined to the
decimal constant 1. A third conditionally defined macro called
__STDC_ISO_10646__ shall indicate that wchar_t
is in accordance with ISO/IEC 10646. If defined, this macro has
a value of the form yyyymmL.
- Macro expansion: empty arguments are explicitly allowed. (In C89,
this resulted in undefined behavior.) Stringification (the
# operator) of an empty argument yields the empty string,
concatenation (##) of an empty argument with a non-empty
argument produces the non-empty argument, and concatenation
of two empty arguments produces nothing at all.
- Function-like macros can take a variable number of arguments, using
the ellipsis (...) notation. For replacement, the variable arguments
(including the separating commas) are "collected" into one single
extra argument that can be referenced as __VA_ARGS__
within the macro's replacement list. __VA_ARGS__ may occur
only within the replacement list of a function-like macro
having a variable argument list. It's possible to have only
variable arguments, as in
    #define My_Macro(...) __VA_ARGS__
- The #line directive allows the specification of a line
number up to 2**31-1. (In C89, the limit was 2**15-1, i.e. 32767.)
- The syntax of preprocessing numbers has been changed to allow for
the new binary exponents present in hexadecimal floating point
constants.
- Line comments (starting with the pp-token "//" and
extending up to the end of the line) are now part of the language.
As with normal comments,
it's not possible to construct a comment as the result of macro
replacement.
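To give a feel for how some of these preprocessor additions look in practice, here is a small sketch. The macro names STR, CAT and FORMAT and all function names are my own inventions for illustration, and I assume a compiler running in C9X (C99) mode. It is not taken from the draft.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper macros, for illustration only. */
#define STR(x)     #x        /* stringification */
#define CAT(a, b)  a ## b    /* concatenation */

/* Variadic macro: everything after buf lands in __VA_ARGS__. */
#define FORMAT(buf, ...)  snprintf((buf), sizeof (buf), __VA_ARGS__)

/* Stringifying an empty argument gives "", so this returns 0. */
size_t empty_string_length(void)
{
    return strlen(STR());
}

/* Pasting an empty argument onto "value" leaves just "value". */
int paste_with_empty(void)
{
    int value = 7;
    return CAT(, value);
}

/* Formats "x=42" through the variadic macro and checks the result. */
int format_matches(void)
{
    char buf[16];
    int n = FORMAT(buf, "%s=%d", "x", 42);
    assert(n >= 0);              // a line comment, also new in C9X
    return strcmp(buf, "x=42") == 0;
}

/* Nonzero if the compiler claims conformance to the 1999 revision or later. */
int is_c9x(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    return 1;
#else
    return 0;
#endif
}
```

Note that FORMAT relies on buf being a real array (not a pointer), since it uses sizeof on it.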
Syntax
- New keywords: restrict, inline, _Complex,
_Imaginary, _Bool.
- Within a compound statement ("block"), declarations and statements
can be freely mixed.
- Digraph tokens (<: :> <% %> %: %:%:, synonyms for
[ ] { } # ##, from C94) are part of the language.
- Array declarations may have a '*' between the square
brackets (used for variable arrays in parameter lists).
- In a for-loop, the first expression may be a declaration,
with a scope encompassing only the loop.
    for (decl; pred; inc)
        stmt;
is equivalent to:
    {
        decl;
        for (; pred; inc)
            stmt;
    }
- Compound literals (anonymous aggregates) can be created using the
notation
    ( type-name ) { initializer-list }
(possibly with a trailing comma before the closing brace).
Compound literals are postfix expressions.
- Initializers (and anonymous aggregates) have a named notation for
initializing members. For array elements, the element is designated
by [const-expression], for struct and union members
using a dot notation .member-name. E.g.,
    struct {int a[3], b;} w[] =
        { [0].a = {1}, [1].a = 2 };
or
    struct {int a, b, c, d;} s =
        { .a = 1, .c = 3, 4, .b = 5};
Note: the '4' in the above initializer list initializes
s.d.
As usual, global data is by default set to zero (or to NULL in the
case of pointers). If an initializer is present, any members not
explicitly set also are zeroed out. (As in C89; the
clarifications from TC2 are retained
in C9X.)
- Notation for "universal characters":
    universal-character-name:
        \u hex-quad
        \U hex-quad hex-quad
    hex-quad:
        hexadecimal-digit hexadecimal-digit
            hexadecimal-digit hexadecimal-digit
Note that universal characters may appear even in the midst of
an identifier! (An implementation is allowed to do some name
mangling if the linker cannot deal with universal characters.)
I suppose this is intended to let e.g. the Japanese write their
identifiers using their Japanese characters (or symbols, or
glyphs, or whatever the linguistically correct term would be).
- Notation for hexadecimal floating point constants with binary
exponent, i.e., the exponent is written in decimal and denotes a power of two.
- New suffix "LL" or "ll" (and "ULL"
and "ull", of course) for constants of the
new long long types.
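The syntax changes above are easier to appreciate in running code, so here is a minimal sketch that combines several of them. All function names are hypothetical and chosen by me for illustration. I assume a compiler in C9X (C99) mode.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints; shows a declaration after a statement and a loop-scoped i. */
static int sum(const int *v, int n)
{
    assert(v != NULL && n >= 0);
    int total = 0;                  /* declaration following a statement */
    for (int i = 0; i < n; i++)     /* i exists only inside the loop */
        total += v[i];
    return total;
}

/* Passes an anonymous array (a compound literal) straight to sum(). */
int sum_of_literal(void)
{
    return sum((int[]){1, 2, 3, 4}, 4);
}

/* Same initializer as in the list above: the bare 4 follows .c, so it sets d. */
int designated_d(void)
{
    struct {int a, b, c, d;} s = { .a = 1, .c = 3, 4, .b = 5 };
    return s.d;
}

/* Only element 3 is named; all other elements are zeroed. */
int designated_array(void)
{
    int a[5] = { [3] = 7 };
    return a[0] + a[3];
}

/* Hexadecimal floating constant: 0x1.8 is 1.5, and p1 scales it by 2^1. */
double hex_float(void)
{
    return 0x1.8p1;
}

/* The LL suffix gives the constant type long long before shifting. */
long long two_to_the_40(void)
{
    return 1LL << 40;
}
```

With these definitions, sum_of_literal() gives 10, designated_d() gives 4, and hex_float() gives 3.0.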
Semantics
- Floating point arithmetic is defined such that it can comply with the
IEC 559 standard ("Binary
floating-point arithmetic"), also known as IEEE 754 (and IEEE 854).
- New type long long (signed and unsigned), at least 64 bits
wide.
- New identifier __func__, which is declared implicitly if
used within a function as
    static const char __func__[] = "function-name";
where function-name is the unadorned name of
the function the identifier is used in. (Provides a means to
obtain the name of the current function, similar to the
__FILE__ macro. It's a variable instead of a macro
because the preprocessor doesn't know about functions.)
- Initializers for auto aggregates can be non-constant
expressions.
- The integer division and modulus operators are defined to perform
truncation towards zero. (In C89, it was implementation-defined
whether truncation was done towards zero or -infinity.) This is
(obviously) important only if one or both operands are negative.
Consider:
    -22 / 7 = -3    -22 % 7 = -1    (truncation towards zero)
    -22 / 7 = -4    -22 % 7 =  6    (truncation towards -infinity)
Both satisfy the required equation (a/b)*b + a%b == a.
The second has the advantage that the modulus is never negative for
a positive divisor, but they decided on the other (more Fortran-like,
less Pascal-like) variant...
- Type specifiers: new combinations added for:
    _Bool
    float _Complex, double _Complex, long double _Complex
    signed and unsigned long long int.
Note: it seems that these type specifiers may occur in any order,
e.g., _Complex double long or signed long int long
would be legal.
The implementation of the complex types is defined by the standard
(6.2.5(13)) to use cartesian coordinates (real and imaginary
part), i.e. forbids an implementation using polar coordinates
(distance from [0,0] and an angle). Furthermore, the same
paragraph also specifies that a complex type has the same alignment
requirements as an array of two elements of the corresponding
floating types, the first must be the real part and the second
the imaginary part.
Objects of the new boolean type _Bool may have one of the
two values zero or one.
- In a declaration, there must be at least one type specifier, i.e.,
the default to int has been thrown out. E.g., a declaration
of f that omits the type specifier
was equivalent to int f(); in C89, but is illegal in C9X.
- Structs: the last member may have an incomplete array type. (This
is a way to codify the well-known "struct hack" that was widely
used and in practice worked on nearly every compiler.) The idea is
illustrated by the following piece of code:
    struct s {int n; double d[];};
    struct s *p1, *p2;
    size_t sz;
    sz = sizeof (struct s);   // sz == offsetof (struct s, d)
    p1 = malloc (sizeof (struct s) + 8 * sizeof (double));
    p2 = malloc (sizeof (struct s) + 5 * sizeof (double));
    /* p1 behaves now as if it had been declared as
           struct {int n; double d[8];} *p1;
       p2 behaves now as if it had been declared as
           struct {int n; double d[5];} *p2;
    */
Note that the specification as given in the Committee Draft
implies that there be no padding before the variable last member,
or, if there is, that it be included in sizeof (struct s).
- Type qualifiers are idempotent, i.e., if a type qualifier
appears several times (either directly or indirectly through
typedefs) in a type specification, it's treated as if it
appeared only once. E.g., const const int i; is equivalent
to const int i;. (Note that in
const int * const p;, this doesn't apply as the second
const qualifies the pointer!)
- There's a new type qualifier, called restrict. It's
intended to be used only for pointer types (6.5.3(2)). Its
semantics is that two restrict-qualified pointers cannot be
aliases of the same object. A restricted pointer and a
non-restricted pointer can be aliases, though. This is
intended to facilitate alias analysis in compilers, allowing
more aggressive optimizations to be employed. For more information
on this new feature, see the original
proposal X3J11 94-009, "Restricted Pointers".
- There's a new function specifier inline, giving the
compiler a hint that such a function should be inlined.
- A compiler must parse and accept both restrict and
inline, but is free to ignore the hints given by them.
- There are variable-length arrays, whose size depends not upon
a constant expression but on a computed value.
Variable-length arrays must not be global or members of a struct
or union. Multi-dimensional variable-length arrays are allowed.
- The goto statement is not allowed to jump into
the scope of a variable-length array. Jumps within such
a scope are allowed.
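Here is a small sketch exercising several of the semantic changes listed above. The struct s is the one from the flexible-array item. Every function name is hypothetical and made up by me for illustration. I assume a compiler in C9X (C99) mode with variable-length array support.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct s { int n; double d[]; };    /* last member has incomplete array type */

/* Allocates a struct s with room for n doubles; the caller frees it. */
struct s *make_s(int n)
{
    assert(n >= 0);
    struct s *p = malloc(sizeof (struct s) + (size_t)n * sizeof (double));
    if (p != NULL) {
        p->n = n;
        for (int i = 0; i < n; i++)
            p->d[i] = (double)i;
    }
    return p;
}

/* __func__ names the enclosing function, so this returns nonzero. */
int func_name_matches(void)
{
    return strcmp(__func__, "func_name_matches") == 0;
}

/* Checks the identity (a/b)*b + a%b == a; b must be nonzero. */
int division_identity_holds(int a, int b)
{
    return (a / b) * b + a % b == a;
}

/* Conversion to _Bool yields 0 or 1, whatever nonzero value went in. */
int as_bool(int v)
{
    _Bool b = v;
    return b;
}

/* Fills an n-by-n variable-length array with k and sums it. */
int vla_sum(int n, int k)
{
    assert(n > 0);
    int m[n][n];                    /* size computed at run time */
    int total = 0;
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++) {
            m[r][c] = k;
            total += m[r][c];
        }
    return total;
}

/* restrict tells the compiler that dst and src do not overlap. */
void add_into(int n, double *restrict dst, const double *restrict src)
{
    for (int i = 0; i < n; i++)
        dst[i] += src[i];
}
```

Under the C9X rules, -22 / 7 evaluates to -3 and -22 % 7 to -1, and division_identity_holds(-22, 7) is nonzero. Calling add_into with overlapping arrays would break the promise made by restrict, so callers have to make sure the two arrays are distinct.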
Library
- New <stdbool.h>, containing a typedef for
bool and macros for true and false.
- The <iso646.h> header from C94 is also in C9X.
- <errno.h> contains a new predefined macro
EILSEQ, used to report errors in wide-character
conversion. (This macro was introduced in C94.)
- New <inttypes.h>, giving typedefs specifying
integer types with
- exactly n bits
- at least n bits
- the fastest (whatever that means) type having at least
n bits
where n is one of 8, 16, 32, or 64. It also defines for each of
these types macros expanding to the correct format specifiers for
the printf and scanf families, as well as macros
expanding to the correct suffixes for constants (e.g.,
UINT64_C (0x123) might expand to 0x123ULL) and
for the maximum and minimum values of these types.
- New file <fenv.h>, providing access to the floating
point state. (To conform to IEC 559.)
- <math.h> contains some new low-level functions
(e.g. is_nan or copysign) as well as some
configuration macros and a pragma to comply with
IEC 559. Also
contains new high-level functions, e.g. gamma.
- <complex.h> provides mathematical functions for
the new complex types.
- <tgmath.h> stands for "type-generic math" and defines
some macros that automagically call the right function from
<math.h> or from <complex.h>
depending upon the type of their arguments.
- <stdarg.h> has a new macro va_copy
to copy a variable argument list.
- The file model of <stdio.h> has been extended to
cover also files with multi-byte or wide characters. There are
some additional functions, most notably snprintf
(like sprintf, but allows the programmer to specify the
length of the result buffer) and a vscanf family (in
analogy to vprintf).
- <stdlib.h> has a few new routines for conversions
of long long, e.g. atoll (which doesn't describe
a Pacific island).
- <time.h> has a new type struct tmx, which
is like struct tm but contains a few more fields dealing
with leap seconds. There are also a few new routines operating on
this new structure.
- <wchar.h> and <wctype.h> contain a lot of wide-character handling
functions, including formatted I/O and numeric conversions (both in
<wchar.h>). (If I
recall correctly, this is basically what was defined in C94.)
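To round this off, here is a minimal sketch using a few of the library additions: bool from <stdbool.h>, the types and format macros from <inttypes.h>, snprintf, va_copy and atoll. The function names are hypothetical, invented by me for illustration. I also use vsnprintf, which the list above doesn't mention by name, so treat that as my assumption about what a C9X (C99) library provides.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes v in decimal into buf; returns true only if it fit completely. */
bool format_i64(char *buf, size_t size, int64_t v)
{
    assert(buf != NULL && size > 0);
    int needed = snprintf(buf, size, "%" PRId64, v);
    return needed >= 0 && (size_t)needed < size;
}

/* Returns a malloc'ed formatted string, or NULL; the caller frees it.
   The argument list is walked twice, hence the va_copy. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                       /* second, independent cursor */
    int len = vsnprintf(NULL, 0, fmt, ap);  /* first pass: measure only */
    va_end(ap);

    char *out = NULL;
    if (len >= 0) {
        out = malloc((size_t)len + 1);
        if (out != NULL)
            vsnprintf(out, (size_t)len + 1, fmt, ap2);  /* second pass */
    }
    va_end(ap2);
    return out;
}

/* Parses a decimal long long, e.g. "1099511627776" (2 to the 40th). */
long long parse_ll(const char *text)
{
    return atoll(text);
}
```

For example, format_i64 with a 4-byte buffer and the value 123456 returns false, because snprintf reports that 6 characters would have been needed.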
Annexes
- Annexes C and D (informative) detail the model of sequence points.
- New annex F (normative) details the IEC 559 floating point model
and its support in C.
- Annex G (informative) describes IEC 559 conformant complex
arithmetic.
- Annex H (informative) describes to what extent C conforms to the
ISO/IEC 10967-1
standard on language-independent arithmetic.
- Annex I (normative) defines the ranges of legal values for universal
character names in the source character set.