It is a while ago, when I last posted to this blog. Lately I’m focusing my interest more on mathematics (statistical inference, machine learning) and software engineering (functional reactive programming), than on software security. (Maybe I will post in those topics too in the future.)
Today, I’ve decided to spend the day doing something “binary-related”, so I’ve picked up an old “todo” item, and decided to try out DynamoRIO, a Dynamic Binary Instrumentation tool.
Dynamic Binary Instrumentation
Dynamic Binary Instrumentation (DBI) is a method of analyzing the behavior of a binary application at runtime through the injection of instrumentation code. This code executes as part of the normal instruction flow, but usually it is transparent to the application. The main differences between static and dynamic binary analysis are, that instead of exercising all codepaths, what may occur, dynamic analysis provides detailed insight into the concrete execution state of the application. (If you are interested more deeply in this topic here is a good and detailed overview.)
DynamoRIO
There are several DBI tools/frameworks including but not limited to:
DynamoRIO is a process virtualization system. Process virtualization provides an environment, within which an unmodified application can be monitored and controlled while it executes. It exports an interface for building dynamic tools for analysis, profiling, instrumentation, optimization, translation, sandboxing, etc. The API abstracts away the details of the virtualization process and focuses on monitoring or modifying the dynamic code stream of the program.
Some features of DynamoRIO (copied from wikipedia):
- A tool can insert trampolines into the program that invoke tool actions at specific program points.
- A tool can also insert instrumentation at the assembly language level, which provides fine-grained control over tool actions and tool performance.
- It supports adaptive optimization and adaptive instrumentation by allowing a tool to remove or modify its instrumentation at any point throughout the execution of the target program.
- It invokes tool-registered callbacks at a number of common program event points, such as thread creation, library loading, system calls, signals, or exceptions. Its API also allows inspecting the program’s libraries and address space in addition to its code.
For more detailed informations about DynamoRIO, check the project’s documentation and this article.
Hit tracing
I found a good post about fuzzing and hit tracing, which gave me the idea originally. I’m doing security research under GNU/Linux, which has less security tools available out of the box, compared to other operating systems. Not to mention, that I don’t own an IDA Pro, so implementing a hit tracer in DynamoRIO (as also mentioned in the post) seemed straightforward. And it took only 150 lines of code, and just a few hours. (As I got more familiar with DynamoRIO it turned out it has a hit tracer tool (‘drcov’) already implemented in a more sophisticated manner, but my “naive” implementation of hittracer also works like a charm.)
Implementation
To create a simple module, we need to define the dr_init function with the DR_EXPORT modifier. This function will be the entrypoint of our tool. In the following example, we see a module, which registers an ‘exit’ event listener, what will be called if the program exits. This is the simplest module available:
#include <stddef.h> #include <dr_api.h> static void event_exit(void); DR_EXPORT void dr_init(client_id_t id) { dr_printf("init"); dr_register_exit_event(event_exit); } static void event_exit(void) { dr_printf("exit"); }
For the next step, we will add event listeners for module load/unload events. Since module loads/unloads can happen from multiple simultaneous threads, we need to protect the global structure, which stores the information about the modules loaded by the application:
#include <stddef.h> #include <dr_api.h> #define MAX_NUM_MODULES 256 /* module info structure */ typedef struct _module_array_t { unsigned long base; unsigned long end; unsigned int count; size_t size; bool loaded; module_data_t *info; byte* bitmap; } module_array_t; /* modules loaded by the application */ static module_array_t mod_array[MAX_NUM_MODULES]; /* number of modules in the mod_array */ static int mod_count = 0; /* mutex for accessing mod_array structure */ static void *mod_lock; static void event_exit(void); static void event_module_load(void *drcontext, const module_data_t *info, bool loaded); static void event_module_unload(void *drcontext, const module_data_t *info); DR_EXPORT void dr_init(client_id_t id) { mod_lock = dr_mutex_create(); dr_register_module_load_event(event_module_load); dr_register_module_unload_event(event_module_unload); dr_register_exit_event(event_exit); dr_log(NULL, LOG_ALL, 1, "Client 'bbhit' initializing\n"); } static void event_exit(void) { dr_mutex_destroy(mod_lock); } static bool module_data_same(const module_data_t *d1, const module_data_t *d2) { if (d1->start == d2->start && d1->end == d2->end && d1->entry_point == d2->entry_point && dr_module_preferred_name(d1) != NULL && dr_module_preferred_name(d2) != NULL && strcmp(dr_module_preferred_name(d1), dr_module_preferred_name(d2)) == 0) return true; return false; } static void event_module_load(void *drcontext, const module_data_t *info, bool loaded) { int i; dr_log(drcontext, LOG_ALL, 1, "Module load event: %s [%s]\n", dr_module_preferred_name(info) , info->full_path); dr_mutex_lock(mod_lock); for (i = 0; i < mod_count; i++) { if (!mod_array[i].loaded && module_data_same(mod_array[i].info, info)) { /* Module found, but unloaded. Toggle the loaded flag. */ mod_array[i].loaded = true; break; } } if (i == mod_count) { /* Module not found, register new one. */ mod_array[i].base = (unsigned long)info->start; mod_array[i].end = (unsigned long)info->end; mod_array[i].loaded = true; mod_array[i].info = dr_copy_module_data(info); mod_array[i].count = 0; mod_array[i].size = mod_array[i].end-mod_array[i].base; mod_array[i].bitmap = (byte*) malloc(mod_array[i].size); mod_count++; } dr_mutex_unlock(mod_lock); } static void event_module_unload(void *drcontext, const module_data_t *info) { int i; dr_log(drcontext, LOG_ALL, 1, "Module unload event: %s [%s]\n", dr_module_preferred_name(info) , info->full_path); dr_mutex_lock(mod_lock); for (i = 0; i < mod_count; i++) { if (mod_array[i].loaded && module_data_same(mod_array[i].info, info)) { mod_array[i].loaded = false; break; } } dr_mutex_unlock(mod_lock); }
At first we need to define the module_array_t type, which will contain all the necessary module information. The field base, end and size indicate the beginning address, the end address, and the size of the module’s memory area, count will be used to indicate, how many time basic blocks were executed in this module, and loaded indicates if the module is active currently or not. We also save the original module information structure, and a bitmap, which will be used for basic block hit counting later. Currently the number of loaded modules is limited by the MAX_NUM_MODULES constant. This can optimized away in the future, by replacing the module array with a self-balancing tree, or similar structure.
The dr_init function creates mod_lock mutex, the registers event_module_load, event_module_unload event listeners. The event_module_load listener tries to search for a previously loaded, but currently unloaded module, and if founds one, it will set the loaded flag to true, otherwise creates a new module entry in the mod_array structure. The event_module_unload listener finds the module which is getting unloaded, and sets its loaded flag to false. The module_data_same function is a comparator of the module_data structure.
The next step is to add basic block tracking:
DR_EXPORT void dr_init(client_id_t id) { [...] dr_register_bb_event(event_basic_block); [...] } static dr_emit_flags_t event_basic_block(void *drcontext, void *tag, instrlist_t *bb, bool for_trace, bool translating) { int i; for (i = 0; i < mod_count; i++) { unsigned long tag_pc = (unsigned long)tag; if (mod_array[i].base <= tag_pc && mod_array[i].end >= tag_pc) { mod_array[i].count++; if (mod_array[i].bitmap != NULL) mod_array[i].bitmap[tag_pc - mod_array[i].base] = 1; break; } } return DR_EMIT_DEFAULT; }
The code is simple. First we search for the module this basic block belongs to, and if that module descriptors bitmap field is not empty we set the bitmap[PC-modulebase] byte to 1 indicating that a basic block got the execution control. Since we only write here, there is no additional synchronization used. (Note: in this naive implementation, searching for the corresponding module is O(n), where n indicates the number of loaded modules, which means the more modules the application used, the more overhead the tool will cause. It is possible to replace the array to a rebalancing tree to reduce this lookup to O(log(n)) and add some additional heuristics (to try first with the module which contained the last basic block) in an improved future release.)
The latest step is to display the gathered information. We need to iterate through the modules, and check the positions, where the modules bitmap array is not zero. These positions indicate basic blocks that had executed while the tool was running.
I’ve added some symbol resolution too, to make it a bit more fancier.
static drsym_info_t* create_drsym_info() { drsym_info_t* info; info = malloc(sizeof(drsym_info_t)); info->struct_size = sizeof(drsym_info_t); info->debug_kind = DRSYM_SYMBOLS; info->name_size = 256; info->file_size = 256; info->file=malloc(256); info->name=malloc(256); return info; } static void free_drsmy_info(drsym_info_t * info) { if (info->file != NULL) free(info->file); if (info->name != NULL) free(info->name); free(info); } static void event_exit(void) { int i,j; drsym_init(0); drsym_info_t* syminfo = create_drsym_info(); drsym_error_t err; dr_mutex_lock(mod_lock); dr_printf("\n===\nModules:\n"); for (i = 0; i < mod_count; i++) { syminfo->start_offs = 0; syminfo->end_offs = 0; dr_printf("\t- [%c] (%8d) %s [%s]\n", (mod_array[i].loaded ? '+':'-'), mod_array[i].count, dr_module_preferred_name(mod_array[i].info) , mod_array[i].info->full_path); if (mod_array[i].bitmap != NULL) { for(j=0; j < mod_array[i].size; j++) { if (mod_array[i].bitmap[j] != 0) { int old = (syminfo->start_offs <= j && syminfo->end_offs >= j); if (!old) err = drsym_lookup_address (mod_array[i].info->full_path, j, syminfo, DRSYM_LEAVE_MANGLED); if (old || err == DRSYM_SUCCESS || err == DRSYM_ERROR_LINE_NOT_AVAILABLE) { dr_printf("\t\t- basic_block " PFX " - [%08x -- %08x] %s\n", mod_array[i].base + j, syminfo->start_offs, syminfo->end_offs, syminfo->name); } else { dr_printf("\t\t- basic_block " PFX "\n", mod_array[i].base + j); } } } } dr_free_module_data(mod_array[i].info); if (mod_array[i].bitmap != NULL) free(mod_array[i].bitmap); } dr_mutex_unlock(mod_lock); dr_mutex_destroy(mod_lock); free_drsmy_info(syminfo); drsym_exit(); dr_printf("Instrumentation results:\n%10d basic block executions\n", global_count); }
You can download the source code of the tool, and a sample output for a simple program.
Conclusion
I’ve only used just a limited subset of DynamoRIO’s API, but it seems for me that it is a powerfull tool for dynamic binary analysis.