
Details

LLMs are data-hungry, and when it comes to source code, the text alone isn't enough to make large-scale inferences about a codebase. Code has a unique structure and strict grammar, as well as dependencies and type information that must be deterministically resolved by a compiler. This information could be incredibly useful for AI, but it is not visible in the text of the source code.

For example, suppose you try to answer a simple question, such as where Guava or a particular logging library is used. You’ll find that a use can occur in the code even though the code-as-text has no reference to the library you are looking for. Imagine a logger instance inherited as a protected field from a base class defined in a binary dependency. The import statement that identifies which logging library that logger comes from is in the binary dependency, not in the text of the call site. A human reading only that text would do no better.
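
The inherited-logger case can be sketched in a few lines of Java. This is a minimal illustration, not OpenRewrite code: `BaseService`, `OrderService`, and the helper methods are hypothetical names, `java.util.logging` stands in for whichever logging library is in question, and reflection stands in for the type resolution a compiler performs. The base class is nested in the same file only so the sketch is self-contained; in the scenario described above it would be available only as compiled bytecode.

```java
import java.util.logging.Logger;

class InheritedLoggerDemo {
    // Hypothetical stand-in for a class shipped in a binary dependency.
    // The import that names the logging library belongs to this class.
    static abstract class BaseService {
        protected final Logger logger = Logger.getLogger(getClass().getName());
    }

    // Hypothetical call site: it uses the logger without naming the library.
    static class OrderService extends BaseService {
        void placeOrder(String id) {
            logger.info("Placing order " + id);
        }
    }

    // The call site as it would read in its own source file (assumed layout).
    static final String CALL_SITE_SOURCE = String.join("\n",
        "class OrderService extends BaseService {",
        "    void placeOrder(String id) {",
        "        logger.info(\"Placing order \" + id);",
        "    }",
        "}");

    // Text-based check: does the call-site source mention the package at all?
    static boolean textMentions(String source, String packageName) {
        return source.contains(packageName);
    }

    // Type-based check: walk up the class hierarchy until the field is found,
    // then report its declared type. Returns null if no such field exists.
    static String resolveFieldType(Class<?> type, String fieldName) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(fieldName).getType().getName();
            } catch (NoSuchFieldException e) {
                // Not declared on this class; continue with the superclass.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The text search misses the usage entirely.
        System.out.println(textMentions(CALL_SITE_SOURCE, "java.util.logging")); // false
        // Resolved type information identifies the library.
        System.out.println(resolveFieldType(OrderService.class, "logger")); // java.util.logging.Logger
    }
}
```

The text of the call site never mentions the logging package, so a search over source text reports no usage, while the resolved type of the inherited field names the library directly.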

This talk addresses improving AI accuracy for large-scale code refactoring by enhancing the data source. We’ll explore a state-of-the-art code data model called the Lossless Semantic Tree (LST), which is part of the open-source OpenRewrite auto-refactoring project. We’re finding that the LST and recipes are remarkably easy-to-use tools for equipping LLMs with the data they need to make accurate decisions.

The common excuse for inaccurate or incomplete output is that LLMs will improve, but I think the models are already good enough. What they too often lack is the data to make inferences. We’ll show why, when evaluating LLMs for large-scale automated refactoring:

  • If it's based on text, you don't want it
  • If it's based on AST, you don't want it

