Exploring GNN Based Program Embedding Technologies for Binary Related Tasks

Document Type

Conference Proceeding

Publication Date



With the rapid growth of program scale, program analysis, mainte-nance and optimization become increasingly diverse and complex. Applying learning-assisted methodologies onto program analysis has attracted ever-increasing attention. However, a large number of program factors including syntax structures, semantics, running platforms and compilation configurations block the effective re-alization of these methods. To overcome these obstacles, existing works prefer to be on a basis of source code or abstract syntax tree, but unfortunately are sub-optimal for binary-oriented analysis tasks closely related to the compilation process. To this end, we propose a new program analysis approach that aims at solving program-level and procedure-level tasks with one model, by taking advantage of the great power of graph neural networks from the level of binary code. By fusing the semantics of control flow graphs, data flow graphs and call graphs into one model, and embedding instructions and values simultaneously, our method can effectively work around emerging compilation-related problems. By testing the proposed method on two tasks, binary similarity detection and dead store prediction, the results show that our method is able to achieve as high accuracy as 83.25%, and 82.77%.

Publication Title

IEEE International Conference on Program Comprehension