codemining-* is a suite of Java-based tools for tokenizing, parsing and analyzing Java code. The repository also contains code to analyze Git-based repositories.

  • codemining-core contains code for tokenizing Java, JavaScript, Python, C and C++ in the JVM.
  • codemining-treelm contains Java AST parsing and tree-level language models. Its most interesting feature is that ASTs are converted to a uniform data format (useful for machine learning algorithms) that can be converted back, allowing roundtrip generation.
  • commitmining-tools contains tools for traversing a Git repository, its history and possibly its files.



A small Python library that uses Git tags to record the exact state of the code at the moment a machine learning experiment starts, together with the experiment's results. The goal of the library is to encourage reproducible experimentation in machine learning. For more information visit the GitHub page of the project.
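
The idea of recording an experiment as a Git tag can be illustrated with a minimal sketch. This is not the library's API: the function names, the `exp/<name>/<timestamp>` tag layout, and the choice of storing results as JSON in the tag message are all assumptions made for illustration. Only the standard `git tag -a <name> -m <message>` command is relied on.

```python
import json
import subprocess
from datetime import datetime, timezone


def experiment_tag_name(name, when):
    """Build a tag name such as 'exp/baseline/20240102-030405' (hypothetical layout)."""
    return f"exp/{name}/{when.strftime('%Y%m%d-%H%M%S')}"


def tag_command(tag, results):
    """Return the git command that creates an annotated tag whose message holds the results as JSON."""
    message = json.dumps(results, sort_keys=True)
    return ["git", "tag", "-a", tag, "-m", message]


def record_experiment(name, results, repo_dir="."):
    """Tag the current HEAD commit of repo_dir with the experiment results (hypothetical helper)."""
    tag = experiment_tag_name(name, datetime.now(timezone.utc))
    # check=True raises CalledProcessError if git fails, e.g. when the tag already exists.
    subprocess.run(tag_command(tag, results), cwd=repo_dir, check=True)
    return tag
```

Calling `record_experiment("baseline", {"accuracy": 0.9})` inside a Git working copy would create an annotated tag on the current commit. Note that a tag only points at a commit, so in this sketch uncommitted changes are not captured; they would have to be committed first for the tag to reflect the exact code that ran.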