Learning Execution Semantics from Micro-Traces for Binary Similarity
- Attended CS Colloquium on 2023-03-28.
- [2012.08680] Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity
- In Natural Language, semantics usually release info about surroundings
- In Binary, not so much.
- We can't do Dynamic Analysis due to Scalability problems.
- Teach AI approximate learning on binary code instructions and make mental executions.
- Dynamic Execution has coverage problems.
- So here, we set arbitrary registers to run small code. This is for pre-training purposes, not to make accurate extrapolations.
- Masked Language Modeling. Mask certain portions of binaries and make the AI guess.
- Questions may arise in Microexecutions and Logical reasoning. Employing Higher level programming language mapping is possible often, but not researched through.
- Now, fine-tune this to use static analysis.
- Can analyze Semantically Similar Binary Functions
- Function Dependencies and Signatures are also predictable
- Can find vulnerabilities
- Limitations: in-code only. Cannot find multimodal programming vulnerabilities. i.e., involving many interacting components.