To write code, developers stitch together patterns, such as API protocols or data
structure traversals. Discovering these patterns can reveal
inconsistencies in code or opportunities to replace these patterns with an
API or a language construct. We present coiling, a
technique for automatically mining code for semantic idioms: surprisingly
probable, semantic patterns. We specialize coiling for loop idioms, semantic
idioms of loops. First, through a large-scale empirical study of loops in
25 MLOC, we show that automatically identifiable patterns exist in great numbers.
We find that most loops in this corpus are simple and predictable:
90% have fewer than 15 LOC and 90% have no nesting and very simple control flow.
Encouraged by this result, we then mine loop idioms over a second, buildable corpus.
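Once idioms have been mined, one simple way to quantify how much of a corpus they explain is coverage: the fraction of concrete loops matched by the k most frequent idioms. The Python sketch below is illustrative only and is not the coiling implementation. It assumes, hypothetically, that each loop is already labelled with at most one idiom (None for unmatched loops), and the function name `idioms_needed_for_coverage` and the toy labels are invented for this example.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def idioms_needed_for_coverage(
    loop_idioms: Sequence[Optional[Hashable]], target: float
) -> int:
    """Smallest k such that the k most frequent idioms match >= target of all loops.

    Hypothetical simplification: each loop matches at most one idiom,
    and None marks a loop that no mined idiom matches.
    """
    total = len(loop_idioms)
    if total == 0 or target <= 0:
        return 0
    counts = Counter(label for label in loop_idioms if label is not None)
    covered = 0
    # Greedily add idioms from most to least frequent until the target is reached.
    for k, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return k
    raise ValueError("target coverage is not reachable with the mined idioms")


# Toy, made-up idiom labels for eight loops (not data from the study).
loops = ["sum", "sum", "map", "sum", "filter", None, "map", "sum"]
print(idioms_needed_for_coverage(loops, 0.5))
```

Here one idiom ("sum", 4 of 8 loops) already reaches 50% coverage; with real data, idioms can overlap, so a faithful measure would need to count each loop once even if several idioms match it.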
In this corpus, only 50 loop idioms cover
50% of the concrete loops. Our framework opens the
door to data-driven tool and language design, revealing opportunities to
introduce new API calls and language constructs. For example, loop idioms
suggest that LINQ would benefit from an Enumerate operator, a conclusion
supported by a StackOverflow question with 542k views that requests
precisely this feature.
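The idiom behind the Enumerate suggestion is a foreach-style loop that maintains its own index counter. The Python sketch below contrasts that hand-written idiom with a single operator call. It is only an analogy: Python's built-in `enumerate` stands in for the proposed LINQ operator, whose exact signature is not given here, and the function names are hypothetical. In current LINQ, the closest existing construct is the `Select` overload whose selector also receives the element index.

```python
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def indexed_manual(items: Iterable[T]) -> List[Tuple[int, T]]:
    # The loop idiom: iterate over elements while manually maintaining an index.
    result: List[Tuple[int, T]] = []
    i = 0
    for item in items:
        result.append((i, item))
        i += 1
    return result


def indexed_with_operator(items: Iterable[T]) -> List[Tuple[int, T]]:
    # The same computation as one operator call; Python's enumerate plays
    # the role of the hypothetical LINQ Enumerate operator.
    return list(enumerate(items))


print(indexed_with_operator(["a", "b", "c"]))
```

Both functions produce the same (index, element) pairs; the operator version removes the counter variable, which is the kind of boilerplate a mined loop idiom makes visible.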