Integer Overflow is a Menace

by Benjamin Peterson
published 2015-02-15

Consider the following C function, which allocates an array of 64-bit integers:

uint64_t* allocate_list(size_t nelms) {
    size_t memsize = nelms*sizeof(uint64_t);
    return malloc(memsize);
}

This simple function has a serious bug. If nelms is larger than SIZE_MAX/4, the multiplication will overflow and wrap around modulo SIZE_MAX + 1. As a result, the function will allocate much less memory than expected. The unlucky caller will probably end up writing to memory it doesn’t own, a serious security vulnerability.

Integer overflow behavior in most system programming languages makes every arithmetical operation a potential foot gun. In addition to security problems, integer overflows can arise during computations with even relatively small values and cause easily-overlooked bugs in bread-and-butter algorithms.

In most programming languages with fixed-width integers, unsigned integers overflow by wrapping around modulo their maximum value. Signed integer overflow behavior varies more. In the case of C and C++, signed overflow is infamously undefined. In theory, this means a program can cause small demons to fly out of your nose if signed overflow happens. The practical effect these days is that clever C compilers optimize by assuming signed overflow doesn’t occur, which causes bugs in technically-incorrect C code.

The undefined behavior of signed overflow also means detecting it is difficult to accomplish in a standards-compliant way. Instead of doing an (illegal) signed arithmetic operation and then checking if it overflowed, one must write obscure code like this:

int add_safely(int a, int b) {
    /* Perform arithmetic with unsigned integers. */
    int res = (int)((unsigned int)a + (unsigned int)b);
    if (res^a < 0 && res^b < 0) {
        /* Operation overflowed. */
        fputs("result overflows", stderr);
        abort();
    }
    return res;
}

(Spend some time convincing yourself that’s correct in all cases.) Detecting overflow for signed multiplication is even trickier. CPython currently uses double multiplication. Frustratingly, many architectures have native ways of detecting overflow, which are inaccessible to standard C. Standards-compliant overflow detection generally yields suboptimal assembly. Some relief may be coming as GCC and Clang have recently added extensions that make it easier to detect overflow.

Modern systems languages like Rust and Go tend to a define a signed overflow to wrap around like unsigned overflow does. This is only a slight improvement over undefined behavior. Although it prevents eager compilers from optimizing by assuming signed overflow does not happen, hardly any code is prepared to deal with wraparound arithmetic. Consequently, most of the nasty integer overflow bugs remain possible.

A preferable default overflow behavior for both signed and unsigned integers would be to simply abort the program. Overflows would still be bugs—crashes certainly aren’t desirable—but at least they would’t lead to security holes or quiet data corruption.

There is the issue of performance with the crash-on-overflow scheme. Checking for overflow after every arithmetic operation isn’t free, though the overhead could likely be brought down to only a few percent assuming the compiler cooperates. Adding systemic overhead for safety has precedent in the C world. For example, several Linux distributions including Ubuntu and Fedora enable buffer overflow protection, which adds several instructions to many functions. The performance degradation for that appears to have been minimal.

Wraparound overflow behavior does have its uses. If overflow crashed by default, there should be syntax to explicitly opt in to wraparound overflow behavior. Something like this could work:

overflow(wrap) {
    /* Operations are done with wrapping. */
    result = some*fancy + arithmetic - expr;
}

This syntax forces the programmer to explicitly request and think about handling wraparound behavior. More importantly, it alerts readers that the rules of arithmetic have been altered. Performance sensitive code could remove overflow checking with this syntax once it handles overflows correctly.

There’s, of course, no real hope of changing the integer overflow behavior of established languages. (Though, the silver lining of C signed overflow being undefined behavior is that GCC and Clang would be perfectly within their rights to start crashing programs with signed overflow. In fact, the -ftrapv flag does exactly this.) The next generation of system languages can learn from the sins of C and make integer overflow fail loudly by default.