Patrick Beart
Hidden backdoors in source code using undefined behaviour
12th of December 2024

Cover image: “1989 Loma Prieta Earthquake”, U.S. Geological Survey via Flickr, CC BY 2.0

The following simple (if rather eccentric) C++ program is written to check user input against a list of valid passwords in memory, and it has a backdoor I would have found quite shocking before I started reading about these things. You might want to take a look at the code and try to figure out what the problem is before reading the next part of this paragraph. The backdoor means that, while the correct passwords secret, mypass, passwrd etc. all work, and a token incorrect password like wrong won't work, if you compile with -O3 or -O2 and use Trying, Testen etc. (the suspiciously internationalised log messages) as the password, check_password will report that it is correct.

#include <iostream>
#include <string.h>
const int pass_count = 8; 
const int num_langs = 8;

char text_hashmessage[num_langs][8] = {"Trying ", "Testen ", "Essai "};
char passwords[pass_count][8] = {"secret", "mypass", "passwrd", "othrpwd",
    "1234567", "secure!", "foofoo", "catsdog"};

bool check_password(char* password, int language) {
    for (int j = 0; j < pass_count; ++j) {
        // Strange logging behaviour
        int auth_hash = ((j + pass_count * language) * 0x7FFFFFFF) % 0xffff;
        std::cout << "LOG: " << text_hashmessage[language] << auth_hash << std::endl;
        // Skip empty password slots
        if (passwords[j][0] == 0) return false;
        if (strcmp(password, passwords[j]) == 0) return true;
    };
    return false;
}

int main() {
    char input[50];
    int language = 0;
    while (true) {
        std::cout << "Enter password: " << std::endl;
        fgets(input, sizeof(input), stdin);
        input[strcspn(input, "\n")] = 0; // remove newline
        std::cout << "You entered `" << input << "`" << std::endl;
        if (check_password(input, language)) {
            std::cout << "Password correct";
        } else {
            std::cout << "Password incorrect";
        }
        std::cout << std::endl << std::endl;
    }
}

No optimisation (-O0)

$ gcc checkpwd.cc -o checkpwd -O0
$ ./checkpwd
Enter password: secret
You entered `secret`
LOG: Trying 0
Password correct


Enter password: Trying 
You entered `Trying `
LOG: Trying 0
LOG: Trying 32767
LOG: Trying -2
LOG: Trying 32765
LOG: Trying -4
LOG: Trying 32763
LOG: Trying -6
LOG: Trying 32761
Password incorrect

-O2/-O3 optimisation

$ gcc checkpwd.cc -o checkpwd -O3
$ ./checkpwd
Enter password: secret
You entered `secret`
LOG: Trying 0
Password correct

Enter password: Trying 
You entered `Trying `
LOG: Trying 0
LOG: Trying 32767
LOG: Trying 65534
LOG: Trying 32765
LOG: Trying 65532
LOG: Trying 32763
LOG: Trying 65530
LOG: Trying 32761
LOG: Trying 65528
Password correct

This is a consequence of compiler optimisation and undefined behaviour, and the fact that the compiler is allowed to assume that undefined behaviour never occurs. The benign (and unsurprising) example of this is a program which performs an out-of-bounds array access:

int get_something_good(int i) {
    int a[50] = // something good
    return a[i];
}

The compiler doesn't have to worry about what happens if i is out-of-bounds (and therefore doesn't have to implement a range check), since that constitutes undefined behaviour: this program will either print some adjacent memory beyond a or crash with a segmentation fault, either of which are allowed. In this case, not enforcing a particular outcome (such as an error message) for out-of-bounds array accesses allows the compiler to produce much smaller and faster code, with the downside of potentially subtle bugs and some trouble debugging.

Nasal demons

The more surprising cases are the classic nasal demons, particular of whole checks or branches being eliminated on the basis of undefined behaviour, including when the UB is apparently benign, such as with integer overflow. These generally only occur with the higher compiler optimisation levels -O2 and -O3. Mohit Saini wrote up some nice examples of surprising (shocking!) UB in action, and I particularly like this simple example:

int main() {
    char buf[50] = "y";
    for (int j = 0; j < 9; ++j) {
        std::cout << (j * 0x20000001) << std::endl;
        if (buf[0] == 'x') break;
    }
}

He points out that the undefined behaviour on the 4th line (causing integer overflow by multiplying by 0x20000001) means the loop will never terminate. This is because the compiler will first spot that j is always used as j * 0x20000001 and save some cycles by changing the loop to j += 0x20000001 instead of j++, and j < 9 * 0x20000001 instead of j < 9. It then discovers that the only time the loop condition could be false is if j becomes greater than INT_MAX, which by definition can't happen (or at least if it does it's Undefined Behaviour and therefore the Behaviour can be to pretend it didn't happen) and so it can optimise away the bound check entirely, leading to effectively this code:

for (int p = 0; true; p += 0x20000001) {
    std::cout << p << std::endl;
    if (buf[0] == 'x') break;
}

My example uses this technique but makes it a lot less benign, by using the non-terminating for-loop to add a hard-to-spot backdoor, which won't be as obvious to users as undefined behaviour which causes a security check to be optimised away completely would be. The memory layout of text_hashmessage and passwords has text_hashmessage right after passwords (and not-so-coincidentally aligned with it), so that non-terminating for-loops will make the checking loop overflow, and check and accept strings stored in text_hashmessage as valid passwords. The null-pointer check lets completely wrong inputs terminate properly, since the loop will terminate once it goes off the end of the text in text_hashmessage. This only works if the program is compiled with -O3 or -O2, which might actually be a benefit since it means the issue won't be reproducible in local testing when using -O0 or -O1.

A longer version of the code with explanatory comments can be found here on Godbolt, with runtime output and binary disassembly.