“Signed Integer” behavior of “char” data type in C

Examining the data stored in a program's memory is a good way to understand the low-level mechanics of variable storage and type conversions. Let's dump the content of an integer bytewise:
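A minimal sketch of such a byte dump, not the exact listing: the helper names "byte_via_char" and "dump_int_bytes" are assumptions made for illustration, and any test value such as 0x12ABCDEF is hypothetical. Only the names "a" and "cptr" come from the text.

```c
#include <stddef.h>
#include <stdio.h>

/* Read byte i of an object through a plain "char" pointer and widen it.
 * Whether "char" is signed is implementation-defined, so on platforms
 * where it is signed, bytes >= 0x80 come back sign-extended. */
unsigned int byte_via_char(const void *obj, size_t i)
{
    const char *cptr = (const char *)obj;  /* object pointer cast to char pointer */
    return (unsigned int)cptr[i];          /* char -> unsigned int conversion */
}

/* Print the bytes of the integer "a" in memory order, one per line. */
void dump_int_bytes(int a)
{
    for (size_t i = 0; i < sizeof a; i++)
        printf("%X\n", byte_via_char(&a, i));
}
```

Calling "dump_int_bytes(0x12ABCDEF)" from "main" prints one line per byte. On a platform with a signed "char" and a 32-bit "int", every byte at or above 0x80 is printed with "FFFFFF" in front of it.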

A pointer to an integer is cast to a pointer to a character, so dereferencing it should yield a single byte. The 4 bytes of the integer "a" can then be printed as hexadecimal numbers, one byte at a time, by incrementing "cptr". Compiling with debug information embedded and executing the program yields:

Oops, this is not what we expected. Bytes with "1" at the most significant bit (MSB) position now have "FFFFFF" prepended to them. It turns out that the "char" data type is signed on the Linux amd64 platform and behaves as a signed 8-bit integer. When "cptr" is dereferenced, the value is widened to the size of an integer. Because the MSB is "1", the compiler treats the byte as a negative number and fills the upper 24 bits with ones (the "FFFFFF") to preserve the sign.
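The widening described above can be reproduced by hand. The following sketch (the function names are hypothetical, and a 32-bit result is assumed) checks whether "char" is signed on the current platform and mimics the sign extension of a single byte:

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if plain "char" is signed on this platform, 0 otherwise. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Sign-extend an 8-bit value to 32 bits by hand: if bit 7 (the MSB) is set,
 * fill the upper 24 bits with ones, otherwise leave them zero. */
uint32_t sign_extend8(uint8_t byte)
{
    if (byte & 0x80u)
        return 0xFFFFFF00u | byte;
    return byte;
}
```

For example, "sign_extend8(0xEF)" returns 0xFFFFFFEF, while "sign_extend8(0x12)" stays 0x12.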

Let's look at the assembly code in radare2:

Running the "analyze all and autoname functions" command ("aaa") improves the readability of the code, and renaming the local variables on the stack makes them easier to track:

The most interesting instruction is at 0x00400501 (similarly at 0x0040050f, 0x0040051d, 0x00400527):

It moves the content of the one-byte register "AL" into the double-word register "ESI" with sign extension. This is where "FFFFFF" is prepended.

The issue can be fixed by using the "uint8_t" data type defined in "stdint.h":
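A minimal sketch of a byte dump through a "uint8_t" pointer. The helper names "byte_via_u8" and "dump_int_bytes_u8" are assumptions for illustration, not the original listing:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read byte i of an object through a "uint8_t" pointer. The value is always
 * in the range 0..255, so widening it never introduces leading FF bytes. */
unsigned int byte_via_u8(const void *obj, size_t i)
{
    const uint8_t *cptr = (const uint8_t *)obj;
    return cptr[i];  /* uint8_t -> unsigned int: upper bits are zero */
}

/* Print the bytes of the integer "a" in memory order, one per line. */
void dump_int_bytes_u8(int a)
{
    for (size_t i = 0; i < sizeof a; i++)
        printf("%X\n", byte_via_u8(&a, i));
}
```

With this version a byte such as 0xEF is printed as "EF" regardless of whether plain "char" is signed on the platform.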

Compiling with debug information embedded and executing the program yields:

The code now works as expected. Analyzing it in radare2 reveals one important detail:

The instruction at 0x00400501 (similarly at 0x0040050f, 0x0040051d, 0x00400527) is replaced with a move instruction with zero extension:
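The two kinds of move correspond to two different C conversions. In the sketch below (hypothetical function names), x86-64 compilers typically implement the first conversion with a sign-extending move ("movsx") and the second with a zero-extending one ("movzx"), although the exact instructions depend on the compiler and the optimization level:

```c
#include <stdint.h>

/* Signed 8-bit source: the value is preserved, so a set MSB yields a
 * negative int (upper bits filled with ones). */
int widen_signed(int8_t value)
{
    return value;
}

/* Unsigned 8-bit source: the value is preserved, so the upper bits of
 * the int are always zero. */
int widen_unsigned(uint8_t value)
{
    return value;
}
```

Here "widen_signed(-17)" keeps the value -17 (bit pattern 0xFFFFFFEF in a 32-bit "int"), while "widen_unsigned(0xEF)" gives 239 (0x000000EF).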

So it makes sense to use fixed-width data types with explicit signedness, which do not depend on the platform, to avoid such unexpected behavior.