Saturday, July 4, 2015

Format String Exploitation

Format String vulnerabilities are a class of software bug which allows an attacker to perform writes or reads to arbitrary memory addresses. This tutorial will focus on the C programming language, and exploitation of the format string functionality.

Before we begin to understand the nature of this software flaw, we must first know what a format string is. A format string is an ASCII string that contains text and format parameters. For example,
printf("My name is: %s", "nops");
This function call returns the string,
My name is: nops 
The first parameter to this printf function is the format string. This is basically a specifier which tells the program how to format the output. There are several format specifiers which can be used in the format string, with any subsequent parameters serving to populate the format specifiers. Specifiers are always prefixed by the "%" character. Many specifiers exist for differing data types, but the most common include
  • %d - Decimal (signed int) - Output decimal number
  • %s - String - Reads string from memory
  • %x - Hexadecimal - Output hexadecimal number
  • %c - Character - Output character
  • %p - Pointer - Pointer address
  • %n - Number of bytes written so far - Writes the number of bytes till the format string to memory
Functions that are vulnerable to format string exploits include (but are not limited to), fprintf, printf, sprintf, snprintf, etc. The vulnerability comes in when the programmer does not sanitize any user-supplied data which may be used as the format string. The best way to explain the vulnerability is through an example.
    
    #include <stdio.h>
    int main(int argc, char * argv[])
    {
        char a[1024];
        strcpy(a, argv[1]);
        printf(a);
        printf("\n");
    }

This code takes in a string as a parameter, create a 1024 character buffer, copies the string into the buffer, then outputs that string via two formatted printf calls. When compiled and run under normal circumstances, the first parameter to this program gets argued out as expected. (Bonus points if you noticed the buffer overflow vulnerability.)
[email protected]:~/#gcc test.c -o test
[email protected]:~/# ./test blah
blah
But if we look at the printf documentation, we see that the first parameter of that call is the special format string specifier. And in our sample test code, we can see that argv[1] is eventually passed to printf as that first parameter. So we have user-supplied (ie. hacker supplied) data being interpreted as the format string - Dangerous. Now let's see some samples of this type of attack

    [email protected]:~/# ./test %s
    TERM_PROGRAM=Apple_Terminal

So we entered %s as the attack parameter, and it spit out something about our terminal. Why is that? What's happening is printf thinks that it needs to print the next address on the stack, and interpret that data as a string. This is because we supplied the "%s" as a format string (the 'a' variable in the code). Again, we pass more format strings to the vulnerable program to see what happens.

    [email protected]:~/# ./test %s.%s
    TERM_PROGRAM=Apple_Terminal.(null)

Now we added a second format string parameter, delimited by a "." The next value in the stack happened to be null, so we get the same terminal message from before, plus a null value for the second parameter. This attack is reading values straight off of the stack that may have otherwise been private. This, in of itself, is dangerous as the stack might contains passwords, keys, or other secrets that weren't intended to be released. If we try to read too far using this technique, the program will segfault as it tries to read illegal memory entries.

    [email protected]:~/# ./test %s.%s.%s
    Segmentation fault: 11

But, we can take this vulnerability one step further, and actually write values onto the stack. To understand how this works, we must know about two features used in the printf specification. First thing, "%n" is a special feature that will store the number of characters written so far, into the integer indicated variable named in the corresponding argument. So,

    int i;
    printf("ABCDE%n", &i);

Would cause printf to write "5" (the number of characters it just wrote) into the variable i;

The second feature we need to understand is the "$" operator. This will allow us to select a specific argument from the format string. This operator will be followed by a code which will allow us to select one of the format string arguments. For example,

    printf("%3$s", 1, "b", "c", 4);

Will display "c". This is because the format string "%3$s" is basically saying "give me the 3rd parameter after this format string, then interpret that parameter as a string." So if we can do something like this,

    printf("AAA%3$n");

then printf will write the value "3" (the number of A's written) to the address pointed to by the 3rd argument to this printf function. But wait, there is no 3rd parameter. Exactly! Remember that printf will use parameters straight off the stack, because it has no inherent knowledge that it shouldn't. (Recent format string exploit mitigations made printf smarter in this way.) So what will happen that is that printf will write "3" to whatever is located at that address on the stack.

Ok, cool. We've got a stack data leak, and an uncontrolled write-what-where primitive. We already control what is written, we just have to control where it is written in order to leverage this bug for code execution. At this point, we can only write to the arbitrary locations that happen to be on the stack. Not super useful. But the next section is called shellcode development, and will give us an attack payload for this bug to use, which will allow us to control where our data is written. If we can overwrite some security flag, the EIP, or some other sensitive variable, then we can compromise the security of the system.

(Continued in Part 2 - Coming Soon - Subscribe)