Analyzing code reuse in binaries
Goal
Analyze code reuse in binaries to attribute unknown samples to malware families or threat actors. To do this, we can list the functions in an executable and hash each function's opcodes with a fuzzy hashing algorithm (similar to what Diaphora does). I will be using ssdeep for the fuzzy hashing and r2 (through r2pipe) for the opcode extraction.
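The approach described above can be sketched roughly as follows. This is a minimal illustration, not the exact script used later: the function names (`mnemonic_string`, `hash_functions`, `shared_functions`) are hypothetical, hashing only the instruction mnemonics (operands dropped) is an assumption about what "opcodes" should mean here, and the similarity threshold of 80 is an arbitrary starting point. It relies on the `aflj` and `pdfj` r2 commands and on `ssdeep.hash` / `ssdeep.compare` from the Python ssdeep bindings.

```python
def mnemonic_string(ops):
    """Join the mnemonic (first token) of each instruction, dropping operands.

    `ops` is the "ops" list from r2's `pdfj` output. Entries without an
    "opcode" field (e.g. invalid instructions) are skipped.
    """
    return " ".join(op["opcode"].split()[0] for op in ops if op.get("opcode"))


def hash_functions(path):
    """Return {function_name: ssdeep_hash} for every function r2 finds in `path`."""
    # Imported here so the pure helpers can be used without r2 or ssdeep installed.
    import r2pipe
    import ssdeep

    r2 = r2pipe.open(path)
    try:
        r2.cmd("aaa")  # run r2's analysis so functions are discovered
        hashes = {}
        for fn in r2.cmdj("aflj") or []:
            body = r2.cmdj("pdfj @ {}".format(fn["offset"])) or {}
            text = mnemonic_string(body.get("ops", []))
            if text:
                hashes[fn["name"]] = ssdeep.hash(text)
        return hashes
    finally:
        r2.quit()


def shared_functions(unknown, known, threshold=80, compare=None):
    """List (unknown_fn, known_fn, score) pairs scoring at or above `threshold`.

    `compare` defaults to ssdeep.compare, which returns a score from 0 to 100.
    The threshold value is an assumption and needs tuning per sample set.
    """
    if compare is None:
        import ssdeep
        compare = ssdeep.compare
    matches = []
    for u_name, u_hash in unknown.items():
        for k_name, k_hash in known.items():
            score = compare(u_hash, k_hash)
            if score >= threshold:
                matches.append((u_name, k_name, score))
    return matches
```

Calling `hash_functions` on a known family sample and on an unknown sample, then passing both dictionaries to `shared_functions`, gives a list of candidate reused functions. Very short functions produce weak ssdeep hashes, so it may be worth filtering them out before comparing.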
Data preparation
First, we need some data to work on (hashes + samples). I will be using a QuantLoader sample set, since it is a rather simple piece of malware.
With that in mind, we need to identify some landmark functions in a Quant