Epicure: Distilling Sequence Model Predictions into Patterns

Miltiadis Allamanis, Earl T. Barr. 2023
[arXiv]

TLDR: Distill model predictions into interpretable patterns that can be used for anomaly detection.

Most machine learning models predict a probability distribution over concrete outputs and struggle to accurately predict names over high-entropy sequence distributions. Here, we explore finding abstract, high-precision patterns intrinsic to these predictions in order to make abstract predictions that usefully capture rare sequences. In this short paper, we present Epicure, a method that distils the predictions of a sequence model, such as the output of beam search, into simple patterns. Epicure maps a model’s predictions into a lattice that represents increasingly general patterns that subsume the concrete model predictions.

On the tasks of predicting a descriptive name of a function given the source code of its body and detecting anomalous names given a function, we show that Epicure yields accurate naming patterns that match the ground truth more often than the single highest-probability model prediction does. For a false alarm rate of 10%, Epicure predicts patterns that match 61% more ground-truth names than the best model prediction, making Epicure well-suited for scenarios that require high precision.
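
To make the idea of distilling beam-search predictions into a lattice of patterns more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm from the paper: the `WILDCARD` symbol (matching exactly one subtoken), the helper names `generalizations`, `matches`, and `distill`, the selection rule (the most specific pattern whose summed probability reaches a `threshold`), and the example beam are all hypothetical.

```python
from itertools import combinations

WILDCARD = "*"  # hypothetical placeholder that matches exactly one subtoken


def generalizations(tokens):
    """Yield every pattern obtained by replacing a subset of positions with WILDCARD."""
    n = len(tokens)
    for k in range(n + 1):
        for positions in combinations(range(n), k):
            yield tuple(WILDCARD if i in positions else t for i, t in enumerate(tokens))


def matches(pattern, tokens):
    """True if the concrete token sequence is subsumed by the pattern."""
    return len(pattern) == len(tokens) and all(
        p == WILDCARD or p == t for p, t in zip(pattern, tokens)
    )


def distill(predictions, threshold):
    """Pick the most specific pattern whose matching predictions carry enough mass.

    predictions: list of (tokens, probability) pairs, e.g. from beam search.
    Returns (pattern, mass), or None if no pattern reaches the threshold.
    This selection rule is an assumption made for illustration.
    """
    candidates = set()
    for tokens, _ in predictions:
        candidates.update(generalizations(tokens))

    scored = []
    for pattern in candidates:
        # Probability mass of all concrete predictions the pattern subsumes.
        mass = sum(p for tokens, p in predictions if matches(pattern, tokens))
        if mass >= threshold:
            specificity = sum(1 for t in pattern if t != WILDCARD)
            scored.append((specificity, mass, pattern))

    if not scored:
        return None
    _, mass, pattern = max(scored)  # prefer more concrete tokens, then more mass
    return pattern, mass


# Hypothetical beam for a function-naming model (subtokens, probability).
beam = [
    (("get", "user", "name"), 0.30),
    (("get", "user", "id"), 0.25),
    (("get", "account", "name"), 0.15),
    (("set", "user", "name"), 0.05),
]

result = distill(beam, threshold=0.5)
pattern = result[0]  # ("get", "user", "*")
```

In this toy setting no single concrete name reaches the threshold, but the pattern `get user *` does, which mirrors the trade of specificity for precision described above. One possible use for anomaly detection, again as an assumption, is to flag an existing name when `matches(pattern, name_tokens)` is false for the distilled pattern.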