NiNi's Den

# 2021::QEMU::AFL++ and TCG

2021/05/29

Rewriting some of the notes in my Obsidian vault into a post seems like a good idea.

## TCG

TCG stands for Tiny Code Generator. The TCG frontend lifts target instructions into TCG-IR, and the TCG backend lowers the TCG-IR into host instructions. The tcg/README file is a good place to start.

Because TCG-IR is relatively simple, some target instructions are hard to implement in pure TCG-IR. Helper functions provide another way to implement these instructions: the generated code calls out to an ordinary C function.
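To make the idea concrete, here is a minimal sketch of an IR with a "call helper" op. It is a toy, not QEMU's real TCG-IR: the names `ToyOp`, `toy_run`, `toy_helper_popcnt` and `toy_demo` are all made up for illustration. The point is only that an operation which is awkward to express with simple ops can be delegated to a plain C function.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy IR, NOT QEMU's real TCG-IR: two simple ops plus a "call helper" op. */
typedef enum { OP_MOVI, OP_ADD, OP_CALL } ToyOpcode;

typedef uint32_t (*ToyHelper)(uint32_t arg);

typedef struct {
    ToyOpcode opc;
    int dst;            /* destination register index */
    int src;            /* source register index (OP_ADD, OP_CALL) */
    uint32_t imm;       /* immediate value (OP_MOVI) */
    ToyHelper helper;   /* C function to call (OP_CALL) */
} ToyOp;

/* A "hard" instruction implemented in C instead of in IR: population count. */
static uint32_t toy_helper_popcnt(uint32_t x)
{
    uint32_t n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Execute a block of toy ops over a small register file. */
static void toy_run(const ToyOp *ops, size_t count, uint32_t *regs)
{
    for (size_t i = 0; i < count; i++) {
        const ToyOp *op = &ops[i];
        switch (op->opc) {
        case OP_MOVI:
            regs[op->dst] = op->imm;
            break;
        case OP_ADD:
            regs[op->dst] += regs[op->src];
            break;
        case OP_CALL:
            regs[op->dst] = op->helper(regs[op->src]);
            break;
        }
    }
}

/* Build and run: r0 = 0xF0; r1 = popcnt(r0); r1 += r0. Returns r1. */
static uint32_t toy_demo(void)
{
    uint32_t regs[2] = {0, 0};
    const ToyOp block[] = {
        { OP_MOVI, 0, 0, 0xF0u, NULL },
        { OP_CALL, 1, 0, 0, toy_helper_popcnt },
        { OP_ADD,  1, 0, 0, NULL },
    };
    toy_run(block, sizeof(block) / sizeof(block[0]), regs);
    return regs[1];
}
```

Real TCG compiles the ops to host code instead of interpreting them, but the division of labour is the same: simple ops stay in the IR, everything else becomes a call.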

## Code Review

Take QEMU-5.0.0-rc4 as an example. The code that emulates the i386 syscall instruction is at line 7381 of target/i386/tcg/translate.c:

Functions with the gen prefix are responsible for generating the corresponding TCG ops, but gen_helper_syscall is a special case.

The function gen_helper_syscall inserts a call op into the TCG code. The target of the call is the function helper_syscall, at line 979 of target/i386/tcg/seg_helper.c. (CONFIG_USER_ONLY selects between the implementations for user-mode emulation and full-system emulation.)

So calling a custom helper function from TCG is easy. Let's take AFL++ as an example.

## AFL++ and TCG

AFL++ is the community-maintained version of the well-known coverage-guided greybox fuzzer AFL. To track code coverage during fuzzing, it inserts, at compile time, an instruction that calls the function afl_maybe_log at the beginning of each basic block.

The function afl_maybe_log increments the corresponding element of the bitmap shared with AFL++. The index into the bitmap is a hash of the tokens of the current and previous basic blocks (cur_loc and afl_prev_loc); each token is assigned randomly at compile time.
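As a rough sketch of that bookkeeping, the snippet below follows the classic AFL edge-coverage scheme: XOR the two tokens to get the index, then store the shifted current token as the new "previous" one. The map size, the local array standing in for shared memory, and the name `afl_maybe_log_sketch` are assumptions for illustration, not the actual AFL++ code.

```c
#include <stdint.h>

#define MAP_SIZE (1u << 16)          /* assumed bitmap size for this sketch */

static uint8_t  afl_area[MAP_SIZE];  /* stands in for the shared-memory bitmap */
static uint32_t afl_prev_loc;        /* token of the previously executed block */

/* Record the edge (previous block -> current block) in the bitmap. */
static void afl_maybe_log_sketch(uint32_t cur_loc)
{
    cur_loc &= MAP_SIZE - 1;             /* keep the token inside the map */
    afl_area[cur_loc ^ afl_prev_loc]++;  /* one counter per (prev, cur) pair */
    afl_prev_loc = cur_loc >> 1;         /* shift so A->B differs from B->A */
}
```

The shift is what makes the scheme track edges instead of blocks: without it, the transitions A to B and B to A would land on the same counter.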

When the source code is unavailable, which is very common, it is not possible to insert afl_maybe_log at compile time, so AFL++ provides a QEMU mode that performs so-called dynamic instrumentation.

AFL++ uses its own fork of QEMU, qemuafl, which inserts a call to afl_maybe_log into the TCG-IR while generating the TCG-IR of each basic block, so it can still benefit from block chaining.

To add a new helper, we need to declare the helper function in accel/tcg/tcg-runtime.h or, if the helper only applies to a specific architecture, in target/<arch>/helper.h.

The N in the macro name DEF_HELPER_FLAGS_N is the number of arguments. This macro declares a function helper_afl_maybe_log that returns nothing and takes one argument.

TCG_CALL_NO_RWG is an alias of TCG_CALL_NO_READ_GLOBALS. It means the helper does not read globals (either directly or through an exception), and it implies TCG_CALL_NO_WRITE_GLOBALS.

The definition of DEF_HELPER_FLAGS_1 in exec/helper-proto.h is:

But why do we need a macro just for a declaration? And where is the flags argument used? There is actually a cool macro trick here!

In fact, the macro DEF_HELPER_FLAGS_1 is undefined again at the end of exec/helper-proto.h:

Let’s take a look at include/tcg/tcg-op.h, which includes two files:

Surprisingly, there is another definition of DEF_HELPER_FLAGS_1 in exec/helper-gen.h!

As we can see, this definition is used to define another function, gen_helper_afl_maybe_log. The first argument of tcg_gen_callN is the target of the call instruction, which is HELPER(name) here. The macro HELPER prepends helper_ to the given name. So what gen_helper_afl_maybe_log does is generate a call instruction in the TCG code that calls helper_afl_maybe_log!
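The trick of expanding one declaration line several times with different macro bodies can be reproduced in a few lines of standalone C. The sketch below is a toy, not QEMU's headers: `TOY_HELPER_LIST`, `toy_gen_call1`, `ToyHelperInfo`, `toy_exec` and the flag value are invented for illustration, and `toy_gen_call1` merely records the call instead of emitting IR. The third expansion is my guess at how the flags can end up in a per-helper info table; treat it as an assumption about the mechanism, not a copy of it.

```c
#include <stdint.h>
#include <stddef.h>

#define TOY_CALL_NO_RWG 0x1          /* hypothetical flag value */
#define HELPER(name) helper_##name   /* prepend helper_ to the name */

/* One line per helper; re-expanded below with different macro bodies. */
#define TOY_HELPER_LIST \
    DEF_HELPER_FLAGS_1(afl_maybe_log, TOY_CALL_NO_RWG, void, uint32_t)

/* Expansion 1: a plain prototype. The flags argument is simply dropped. */
#define DEF_HELPER_FLAGS_1(name, flags, ret, t1) ret HELPER(name)(t1);
TOY_HELPER_LIST
#undef DEF_HELPER_FLAGS_1

/* Toy stand-in for the TCG op stream: a list of pending helper calls. */
typedef struct {
    void (*func)(uint32_t);
    uint32_t arg;
} ToyCall;

static ToyCall toy_calls[8];
static size_t toy_ncalls;

/* Toy replacement for tcg_gen_callN: record a call instead of emitting IR. */
static void toy_gen_call1(void (*func)(uint32_t), uint32_t arg)
{
    if (toy_ncalls < sizeof(toy_calls) / sizeof(toy_calls[0])) {
        toy_calls[toy_ncalls].func = func;
        toy_calls[toy_ncalls].arg = arg;
        toy_ncalls++;
    }
}

/* Expansion 2: a gen_helper_<name>() wrapper that emits the call. */
#define DEF_HELPER_FLAGS_1(name, flags, ret, t1)      \
    static inline void gen_helper_##name(t1 arg1)     \
    {                                                 \
        toy_gen_call1(HELPER(name), arg1);            \
    }
TOY_HELPER_LIST
#undef DEF_HELPER_FLAGS_1

/* Expansion 3: a table entry, which is where the flags finally land. */
typedef struct {
    void (*func)(uint32_t);
    const char *name;
    int flags;
} ToyHelperInfo;

#define DEF_HELPER_FLAGS_1(name, flags, ret, t1) { HELPER(name), #name, flags },
static const ToyHelperInfo toy_helpers[] = { TOY_HELPER_LIST };
#undef DEF_HELPER_FLAGS_1

/* The helper body itself, written once as ordinary C. */
static uint32_t toy_last_logged;
void helper_afl_maybe_log(uint32_t cur_loc)
{
    toy_last_logged = cur_loc;
}

/* "Execute" the recorded calls, as the generated host code would. */
static void toy_exec(void)
{
    for (size_t i = 0; i < toy_ncalls; i++) {
        toy_calls[i].func(toy_calls[i].arg);
    }
}
```

Calling `gen_helper_afl_maybe_log(0x42)` only queues the call; the helper body runs later, when `toy_exec()` plays the role of the translated block being executed. That separation between translation time and execution time is the part worth remembering.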

After the declaration, the next step is the definition. qemuafl defines the function helper_afl_maybe_log in accel/tcg/translate-all.c:

Now it’s possible to call the custom function from TCG. As mentioned, AFL++ inserts afl_maybe_log at the beginning of each basic block to gather coverage information during fuzzing. Because QEMU also uses the basic block as its unit when translating TCG code into host code, it’s easy to insert afl_maybe_log at the beginning of each basic block.

In the function tb_gen_code, we can see that qemuafl calls afl_gen_trace(pc) before generating the machine code:

Let’s inspect the definition of afl_gen_trace and highlight the last three lines:

To use gen_helper_afl_maybe_log to generate a call in TCG code, the parameter has to be converted to a TCG variable. tcg_const_tl(cur_loc) generates a TCG temporary holding the value cur_loc. Although it is called “const”, it does create a temporary, so we have to free it after use to reduce memory usage.
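The create, use, free pattern described above can be sketched without QEMU at all. Everything below is a stand-in: `ToyTemp`, `toy_const`, `toy_temp_free` and `toy_gen_helper_afl_maybe_log` imitate the roles of the TCG temporary, tcg_const_tl, the temporary-free call and gen_helper_afl_maybe_log. The range check and the address mangling are written from my recollection of AFL's QEMU mode and should be read as assumptions, as should the example code range.

```c
#include <stdint.h>
#include <stddef.h>

#define MAP_SIZE (1u << 16)                 /* assumed bitmap size */

static uint64_t afl_start_code = 0x400000;  /* hypothetical traced range */
static uint64_t afl_end_code   = 0x500000;

/* Toy temporary: counts how many are alive so leaks are visible. */
typedef struct { uint64_t value; } ToyTemp;
static int toy_live_temps;

static ToyTemp toy_const(uint64_t v)
{
    ToyTemp t = { v };
    toy_live_temps++;
    return t;
}

static void toy_temp_free(ToyTemp t)
{
    (void)t;
    toy_live_temps--;
}

/* Toy code generator: remember which block tokens got a logging call. */
static uint64_t toy_emitted[8];
static size_t toy_nemitted;

static void toy_gen_helper_afl_maybe_log(ToyTemp t)
{
    if (toy_nemitted < sizeof(toy_emitted) / sizeof(toy_emitted[0])) {
        toy_emitted[toy_nemitted++] = t.value;
    }
}

/* Called once per translated block, before its code is generated. */
static void afl_gen_trace_sketch(uint64_t cur_loc)
{
    /* Skip blocks outside the range we want to trace. */
    if (cur_loc > afl_end_code || cur_loc < afl_start_code) {
        return;
    }

    /* Mangle the block address into a quasi-uniform token within the map. */
    cur_loc = (cur_loc >> 4) ^ (cur_loc << 8);
    cur_loc &= MAP_SIZE - 1;

    /* Wrap the token in a temporary, emit the call, then release it. */
    ToyTemp cur_loc_v = toy_const(cur_loc);
    toy_gen_helper_afl_maybe_log(cur_loc_v);
    toy_temp_free(cur_loc_v);
}
```

Note that the token is computed at translation time and baked into the emitted call as a constant, so the helper does no address arithmetic when the block actually runs.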

## End

And that’s it. qemuafl inserts a call instruction at the beginning of each basic block’s TCG code, so whenever a basic block is executed, it first calls the custom helper function, HELPER(afl_maybe_log), to record the code coverage. AFL actually uses the same strategy, but since I was reviewing the source code of AFL++, I use it as the example here.

Original Author: Terrynini