Machine-learning-guided typestate analysis for static use-After-free detection

Publication Type:
Conference Proceeding
Citation:
ACM International Conference Proceeding Series, 2017, Part F132521 pp. 42 - 54
Issue Date:
2017-12-04
Filename Description Size
acsac17.pdfPublished version1.33 MB
Adobe PDF
Full metadata record
Typestate analysis relies on pointer analysis for detecting temporal memory safety errors, such as use-After-free (UAF). For large programs, scalable pointer analysis is usually imprecise in analyzing their hard "corner cases", such as infeasible paths, recursion cycles, loops, arrays, and linked lists. Due to a sound over-Approximation of the points-To information, a large number of spurious aliases will be reported conservatively, causing the corresponding typestate analysis to report a large number of false alarms. Thus, the usefulness of typestate analysis for heap-intensive clients, like UAF detection, becomes rather limited, in practice. We introduce Tac, a static UAF detector that bridges the gap between typestate and pointer analyses by machine learning. Tac learns the correlations between program features and UAF-related aliases by using a Support Vector Machine (SVM) and applies this knowledge to further disambiguate the UAF-related aliases reported imprecisely by the pointer analysis so that only the ones validated by its SVM classifier are further investigated by the typestate analysis. Despite its unsoundness, Tac represents a practical typestate analysis approach for UAF detection. We have implemented Tac in LLVM-3.8.0 and evaluated it using a set of eight open-source C/C++ programs. The results show that Tac is effective (in terms of finding 5 known CVE vulnerabilities, 1 known bug, and 8 new bugs with a low false alarm rate) and scalable (in terms of analyzing a large codebase with 2,098 KLOC in just over 4 hours).
Please use this identifier to cite or link to this item: