A C Programming Contest

Home

This is the summary of a programming "contest" from the newsgroup comp.programming.contests. The original specification for this came from Seika Abe (abe@hallab.co.jp). I found this contest quite interesting for a couple of reasons:
  • The development of solutions was more or less public. As soon as one solution was published, somebody else would jump in and refine the work.
  • We tried to come up with ANSI conforming solutions.
  • The "winner" exhibits an interesting paradigm shift as compared to the other solutions.
If you have any comments on this page, feel free to send them to me.

The Problem

The original problem statement was from Seika Abe (abe@hallab.co.jp). I'll paraphrase only here:

Write the shortest C function that takes an int as argument and returns an int with all the bits of the argument reversed, i.e. the ith bit of the result is the (w-i)th bit of the argument, where w is the width of an int in bits.

The function must not make any assumptions about the size of int or char (i.e. it must work for arbitrary widths).

The measure for the length of the function is the number of characters.

Entries

The first algorithm to appear uses an unsigned counter that is left shifted by 1 in each iteration of the loop; it terminates when this counter wraps around to zero:
b(n){unsigned r=0,j=1;while(r=r*2+(n&1),j*=2)n>>=1;return r;}
which can be shortened a little bit by using "|" instead of "+", thereby doing away with a pair of parentheses:
b(n){unsigned r=0,j=1;while(r=r*2|n&1,j*=2)n>>=1;return r;}
Then, a somewhat modified algorithm turned up: instead of shifting n and alwas doing a n&1, this algorithm uses j directly to get single bits. Two variants were proposed:
b(n){unsigned r=0,j=1;while(r=r*2+(n&j)/j,j*=2);return r;}
b(n){unsigned r=0,j=1;while(r=r*2|!!(n&j),j*=2);return r;}
This also could be shortened by one character:
b(n){unsigned r=0,j=1;while(r+=r+!!(n&j),j*=2);return r;}
But the big surprise was the appearance of the following gem...
b(n){return n?b(n<<1)<<1|n<0:0;}
This was finally considered the "winner".

ANSI Conformance

In the introduction I said that we tried to come up with ANSI conforming solutions. Did we succeed? No. Let me explain why.

What does "ANSI Conforming" mean?

The ANSI/ISO standard for C (ISO/IEC 9899:1990(E)) does not specify the language in all details. There are a lot of places where the standard leaves it to the compiler writer to choose some behavior, and there are even more places where the standard explicitly sais, that the "behavior is undefined".

The standard makes the following distinctions:

Implementation-defined behavior
The implementation must choose some (sensible) behavior and must document its choice. E.g. what the result of the modulus operator "%" is if one of the operands is negative.

Unspecified behavior
The standard explicitly does not impose any restrictions. An example would be the order of evaluation of function arguments.

Undefined behavior
Violations of constraints laid forth by the standard, and anything the standard explicitly declares as undefined. An example is the behavior on integer overflow. If undefined behavior is involved, a compiler or the compiled program may do nearly anything - in particular, it is allowed to crash or not to compile. So, undefined behavior is really a Bad Thing (TM)!

A conforming program is a program that does not invoke any of above problem categories, i.e. that does not contain constructs labeled as either undefined, unspecified or implementation-defined behavior. The intent of this is that an ANSI conforming program should be maximally portable.

Are above solutions ANSI conforming?

Let's start with an obvious improvement: use int instead of unsigned for the local variables:

b(n){int r=0,j=1;while(r+=r+!!(n&j),j*=2);return r;}

What's wrong with this? On the last iteration, r will overflow if the input was e.g. 1. Furthermore, j always overflows in the last two iterations. This is undefined behavior! An attempt to correct this is

b(n){int r=0,j=1;while(r=r<<1|!!(n&j),j<<=1);return r;}

This looks much better: neither r nor j overflow because "<<" is not the same as "*2". But for a sign magnitude representation, it'll still fail: consider the case where the most significant bit of n is 1. The (n&j) will yield the bit pattern "100...000" of type int in the last iteration. But this bit pattern is the value zero! Thus, the function will always assume that the MSB is zero.

Well then, is the "winner" ANSI conforming?

b(n){return n?b(n<<1)<<1|n<0:0;}

The "<<" is no problem: contrary to "*2", which would produce overflow (and thus undefined behavior), left shifting an int is ok. True, the resulting value is implementation-defined but the operation itself is well defined.

The problem with this "solution" is another: it works only for implementations using 2's complement to represent int! For 1's complement or sign magnitude, it fails.

In these number representations, there are two ways to express 0. Both these bit patterns mean 0, and the compiler is required to treat them both as zero. These patterns have some 1-bits, though. As a result, the function cannot correctly invert these bit patterns: it just returns zero.

What about the other solutions?

None of the other solutions are ANSI conforming. The reason is that the standard explicitly states (6.2.1.2):

When a signed integer is converted to an unsigned integer with equal or greater size, if the value of the signed integer is nonnegative, its value is unchanged.
...
When ... an unsigned integer is converted to its corresponding signed integer, if the value cannot be represented the result is implementation-defined.

This means: if we have an assignment unsigned=signed, and sign magnitude or 1's complement is used to represent the signed values, the second bit pattern for zero will be replaced by an unsigned zero, i.e. all 0-bits.

It also means: if the result r has the most significant bit set, its value cannot be represented in a signed int since it is certainly greater than INT_MAX. Thus, the implementation has the right to do what it deems appropriate: raise an exception, or silently assign INT_MAX.

Furthermore, all solutions involving (n&j) with j an unsigned int are not ANSI conforming because in the conversion of n to unsigned int, we lose the second representation of zero in 1's complement or sign magnitude again.

What would an ANSI conforming solution look like?

With all the above in mind, we can list the following criteria:

  • It doesn't convert between unsigned int and int anywhere.
  • Therefore, r must be int.
  • Therefore, we must build r using only bit operations, i.e. without multiplications or additions.
  • We cannot use j to extract successive bits from n.
  • Therefore, we must shift n.

All this leads to something that looks remarkably similar to the very first entry:

b(n){int r=0;unsigned j=1;while(r=r<<1|n&1,j*=2)n>>=1;return r;}

Unfortunately, this is quite a bit longer than the other entries and not half as elegant! (Note that j must be unsigned, otherwise we'd get an overflow in the last iteration.)


Conclusion

Most of the problems with ANSI conformance stem from the careless specification of the problem. In real life, bit operations should be done on unsigned exactly to avoid all these problems. An interface like
unsigned b(unsigned n);
is far more natural for a function that's supposed to do bit manipulations. The surprising recursive solution then becomes
unsigned b(unsigned n){return n?b(n*2)*2|n*2/2!=n:0;}
(Note that "*2" on unsigned does not overflow, the standard defines modulo-arithmetic for unsigned, i.e. the argument silently wraps around to zero.)