Musing 18: NL2KQL: From Natural Language to Kusto Query
A paper out of Microsoft Research on using LLMs to turn natural-language questions into formal database queries
Today’s paper: NL2KQL: From Natural Language to Kusto Query. Tang et al. Apr 3 2024. https://arxiv.org/pdf/2404.02933.pdf
As data grows in both size and complexity, it's becoming increasingly difficult to manage, even for those skilled in query languages. While these languages offer powerful features, their complexity can be a barrier to many users. To lower these barriers and make data more accessible to a broader audience, researchers are focusing on developing Natural Language Interfaces for Databases (NLIDB). This technology allows users, whether they are tech-savvy or not, to interact with databases using natural language. SQL, being the most common query language, has seen numerous innovations in this area, with solutions evolving from parsing-based methods to the latest Neural Machine Translation techniques. Two key datasets driving this research, WikiSQL and Spider, have inspired nearly a thousand solutions that convert natural language queries into SQL.
In this paper, the authors consider a different target: the Kusto Query Language (KQL). KQL is tailored for handling vast amounts of semi-structured information like logs, telemetry, and time-series data, typical in big-data analytics environments. It is built to manage data that doesn’t conform to a single structure, often varying within the same dataset. KQL includes a comprehensive suite of functions and operators for dealing with text and time-series data, making it exceptionally capable of uncovering complex patterns. While these features make KQL a potent querying tool, they also make translating natural language into KQL effectively a real challenge.
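To make this concrete, here is a minimal sketch of what a natural-language question and a KQL query answering it might look like. The question, the query, and the client usage are my own illustration (not taken from the paper), assuming the public Azure Data Explorer ‘help’ cluster, its Samples database with the StormEvents table, and the azure-kusto-data Python package.

```python
# A hypothetical NLQ and a KQL query answering it, run against the public
# "help" cluster's Samples database (StormEvents table). This is an
# illustration, not an example from the paper.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

nlq = "How many flood events did each US state record per day in the first week of 2007?"

kql = """
StormEvents
| where StartTime between (datetime(2007-01-01) .. datetime(2007-01-08))
| where EventType has "Flood"                                // term-level text operator
| summarize Events = count() by State, bin(StartTime, 1d)    // daily time bins
| order by StartTime asc
"""

# Requires `pip install azure-kusto-data` and a prior `az login`.
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://help.kusto.windows.net"
)
client = KustoClient(kcsb)

response = client.execute("Samples", kql)
for row in response.primary_results[0]:
    print(row["State"], row["StartTime"], row["Events"])
```

Even this small query leans on time binning, term-level text matching, and a pipeline of tabular operators, which is exactly the kind of surface area that makes automatic translation from natural language non-trivial.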
The authors present an approach called NL2KQL for automatically converting natural-language ‘queries’ into KQL. The paper includes an architectural overview, and the approach involves several contributions: it uses cluster-specific synthetic few-shot learning and a Semantic Data Catalog. The meanings behind these become clearer in the technical portion of the paper, but the key point is that the paper offers new techniques for adapting to cluster-specific schemas and refining queries in real time. It also presents a strategy for generating a wide variety of NLQ-KQL pairs specifically designed for Kusto clusters. Another contribution, more related to the experiments, is the release of the first benchmark for evaluating KQL generation, including a Semantic Data Catalog with 400 expert-crafted NLQ-KQL pairs across two Kusto clusters, categorized by difficulty (easy and hard).
Let’s dive briefly into some of the technical components, which I’ll try to break down into their essence (a rough sketch of how they might fit together follows the list):
Schema Refiner: Narrows down the database schema to its most relevant elements for a given query.
Few-shot Selector: Dynamically selects relevant examples from a few-shot dataset to aid in the translation process.
Query Refiner: Repairs syntactic and semantic errors in the generated KQL queries.
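To make the division of labor concrete, here is a minimal sketch of how these three components might be wired together around an LLM call. Everything in it is an assumption on my part rather than the paper’s implementation: the embedding-based retrieval, the prompt layout, and the helpers embed() and llm_complete() are hypothetical placeholders.

```python
# Rough sketch of an NL2KQL-style pipeline: Schema Refiner -> Few-shot Selector
# -> LLM -> Query Refiner. The helpers embed() and llm_complete() are
# hypothetical placeholders, not APIs from the paper.
from dataclasses import dataclass

import numpy as np


@dataclass
class FewShotExample:
    nlq: str   # natural-language question
    kql: str   # reference KQL query


def embed(text: str) -> np.ndarray:
    """Placeholder for any text-embedding model (e.g., a sentence encoder)."""
    raise NotImplementedError


def llm_complete(prompt: str) -> str:
    """Placeholder for a call to whatever LLM completion API you use."""
    raise NotImplementedError


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def refine_schema(nlq: str, schema: dict[str, list[str]], top_k: int = 3) -> dict:
    """Schema Refiner: keep only the tables whose name and columns look most
    relevant to the question, so the prompt stays small and focused."""
    q = embed(nlq)
    scored = {
        table: cosine(q, embed(table + " " + " ".join(cols)))
        for table, cols in schema.items()
    }
    keep = sorted(scored, key=scored.get, reverse=True)[:top_k]
    return {t: schema[t] for t in keep}


def select_few_shots(nlq: str, catalog: list[FewShotExample], k: int = 5):
    """Few-shot Selector: pick the k catalog examples most similar to the NLQ."""
    q = embed(nlq)
    return sorted(catalog, key=lambda ex: cosine(q, embed(ex.nlq)), reverse=True)[:k]


def refine_query(kql: str, schema: dict[str, list[str]]) -> str:
    """Query Refiner: repair obvious slips, e.g. references to columns that do
    not exist in the refined schema. A real refiner would use a KQL parser and
    possibly a second LLM pass."""
    known_columns = {c for cols in schema.values() for c in cols}
    # ... validate kql against known_columns, patch or re-prompt as needed ...
    return kql


def nl2kql(nlq: str, schema: dict[str, list[str]], catalog: list[FewShotExample]) -> str:
    small_schema = refine_schema(nlq, schema)
    shots = select_few_shots(nlq, catalog)
    prompt = "\n\n".join(
        ["Translate the question into a KQL query.", f"Schema: {small_schema}"]
        + [f"Q: {ex.nlq}\nKQL: {ex.kql}" for ex in shots]
        + [f"Q: {nlq}\nKQL:"]
    )
    draft = llm_complete(prompt)
    return refine_query(draft, small_schema)
```

The point of the sketch is the shape rather than the specifics: the schema refiner and few-shot selector shrink and specialize the prompt for a particular cluster, and the query refiner catches errors after generation.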
The details behind each of these seem complex, so the real question, before you start to implement or dive into them with any specificity, is how they perform experimentally. Before we get into that, I should note that the metrics for evaluating automated NL2KQL are not the familiar ones from other problems, such as plain accuracy. The authors make a contribution here by proposing a comprehensive set of metrics, both offline and online, to measure the effectiveness of the method, with potential implications beyond just KQL generation.
To give you a sense of how complex even these metrics can be, the paper lays out a whole battery of ‘offline’ metrics alone.
There are also three ‘online’ metrics. You don’t really need to understand all of these (unless, of course, you’re doing research in this area), but the main point is that, just like accuracy, higher is better.
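To ground what an ‘offline’ metric might look like without reproducing the paper’s exact definitions, here is an illustrative sketch of two metrics in the same spirit: the share of generated queries that would even parse, and the overlap between the columns a generated query touches and those in the reference query. Both are my stand-ins, not the paper’s formulas; extract_columns and my_parser in the usage note are hypothetical helpers.

```python
# Illustrative offline-style metrics (not the paper's definitions):
#  - parse_rate: fraction of generated queries a KQL parser accepts
#  - column_f1:  F1 overlap between columns used by the generated query
#                and by the expert-written reference query
from typing import Callable


def column_f1(pred_cols: set[str], ref_cols: set[str]) -> float:
    if not pred_cols or not ref_cols:
        return 0.0
    tp = len(pred_cols & ref_cols)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_cols)
    recall = tp / len(ref_cols)
    return 2 * precision * recall / (precision + recall)


def parse_rate(generated: list[str], kql_parses: Callable[[str], bool]) -> float:
    """`kql_parses` is any callable that returns True if a KQL string parses."""
    return sum(kql_parses(q) for q in generated) / len(generated)


# Usage sketch (extract_columns and my_parser are hypothetical helpers):
# f1s = [column_f1(extract_columns(p), extract_columns(r)) for p, r in pairs]
# print(sum(f1s) / len(f1s), parse_rate(preds, my_parser))
```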
Results on these metrics show that NL2KQL is very promising compared to plain use of LLMs and other baselines. However, the problem is not ‘completely’ solved, especially on the Filter queries, which are more difficult. So database administrators are not (yet) out of a job.
Because this paper comes out of an important research group, it’s always interesting to see what they say (even if only a little) about planned future efforts, which appear toward the end of the paper. The authors see LLMs eventually operating as ‘agent-like’ figures that are iterative and interactive, correcting errors in query formulation when they see them. A benign application would be a kind of ‘spell check’ for database administrators, but I suspect the technology is much more powerful than that, given the results we saw earlier.
The authors do acknowledge some challenges and drawbacks; for example, a potential increase in response times, which could move interactions away from being real-time and negatively impact the user experience. Further delays could be caused by the inherently slow decoding of decoder-only LLMs. Given the importance of quick responses in the NL2KQL framework, keeping response times low is crucial in design decisions. The authors’ goal is to expand NL2KQL to include a chat-based text interface where users can ask follow-up questions and receive responses that consider additional context. This would greatly improve the usability of Kusto clusters for data analysts, allowing for more natural and intuitive interactions, but it is currently too slow.
As we know from experience, however, software speedups and (exponential) hardware improvements go hand in hand in applied computing, so it’s only a matter of time before this system is ripe for real-time use in the kinds of applications the authors envision. On a more technical front, it would be interesting to know what other kinds of difficult queries give LLM-powered systems like this trouble, and why such queries are difficult. Is it something fundamental, or just a matter of prompt engineering (which would eventually be discovered)? And how would such systems handle ambiguous, under-specified queries? These are all interesting fronts for future technical research in this very exciting field of using NL to formally query databases, minus the burden of taking a database course.