Natural Language to SQL · Built for Production
We translate business intent into SQL.
Most NL-to-SQL tools translate words into SQL. The difference is measurable.
01 The Problem
The Gap Between a Question and a Correct Query
Ask a generic AI assistant “What was our total revenue last quarter?” and it will write you SQL. It may even look correct. But it will almost certainly miss details that only your schema knows: that revenue is calculated as L_EXTENDEDPRICE × (1 − L_DISCOUNT), not just order totals; that “last quarter” must be anchored to today’s date dynamically; and that the query must join across LINEITEM and ORDERS — not read from a single denormalized view.
These are not edge cases. They are the rule in any production data warehouse. The gap between a syntactically valid query and a semantically correct one is where most NL-to-SQL tools fail.
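To make the three details above concrete, here is a minimal sketch of what a semantically correct "revenue last quarter" query involves. It assumes calendar quarters (a fiscal calendar would need different boundaries), the standard TPC-H key and date columns (O_ORDERKEY, L_ORDERKEY, O_ORDERDATE), and DB-API style `?` placeholders. The function names are illustrative and are not part of nl2sql.pro.

```python
from datetime import date


def previous_quarter(today: date) -> tuple[date, date]:
    """Return [start, end) of the calendar quarter before the one containing `today`."""
    first_month = 3 * ((today.month - 1) // 3) + 1  # first month of the current quarter
    end = date(today.year, first_month, 1)          # exclusive upper bound
    if first_month == 1:
        start = date(today.year - 1, 10, 1)         # previous quarter is Q4 of last year
    else:
        start = date(today.year, first_month - 3, 1)
    return start, end


def revenue_last_quarter_sql(today: date) -> tuple[str, tuple[str, str]]:
    """Build the query text and its date parameters, anchored to `today`."""
    start, end = previous_quarter(today)
    sql = (
        # Revenue is computed per line item, not read from an order total.
        "SELECT SUM(l.L_EXTENDEDPRICE * (1 - l.L_DISCOUNT)) AS revenue\n"
        "FROM LINEITEM l\n"
        # The order date lives on ORDERS, so the join is required.
        "JOIN ORDERS o ON o.O_ORDERKEY = l.L_ORDERKEY\n"
        "WHERE o.O_ORDERDATE >= ? AND o.O_ORDERDATE < ?"
    )
    return sql, (start.isoformat(), end.isoformat())


sql, params = revenue_last_quarter_sql(date.today())
```

Because the date range is computed from today's date at call time, the same question asked in a later quarter yields different bounds without any change to the query text.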
“The problem isn’t that AI can’t write SQL. It’s that AI doesn’t know what your SQL means.”
nl2sql.pro closes that gap with two layers of structured knowledge: an ontology layer that defines what your business concepts mean, and a semantic layer of curated example queries that demonstrate how those concepts translate into correct SQL. Together, they give the language model the context it needs to reason accurately — not just generate plausibly.
02 Ontology Layer
What Business Concepts Mean
The Schema Metadata and Business Terms collections form the ontology layer of the system. Their purpose is to encode meaning — the relationship between how analysts ask questions and how the database actually stores data.
Schema Metadata
Beyond Column Names
Every column in DBT.DWH is stored with a rich description and business synonyms. When a user asks about “account balance,” the retrieval finds C_ACCTBAL. When they ask about “open orders,” the system knows O_ORDERSTATUS = 'O' is the correct filter.
Critically, the schema metadata encodes slowly changing dimension (SCD) Type 2 awareness: every query involving current customers automatically includes the filter that selects the current row version, so the LLM does not need to infer this from column names alone.
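A rough sketch of how such metadata could be represented follows. The record layout, the substring matching (a stand-in for the vector retrieval the system actually uses), and especially the `VALID_TO IS NULL` history filter are assumptions made for illustration; the source does not name the SCD columns in DBT.DWH.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnMeta:
    table: str
    column: str
    description: str
    # Business phrase -> SQL filter implied by that phrase (None if no filter).
    synonyms: dict[str, Optional[str]]


CATALOG = [
    ColumnMeta("CUSTOMER", "C_ACCTBAL", "Customer account balance",
               {"account balance": None}),
    ColumnMeta("ORDERS", "O_ORDERSTATUS", "Order status code",
               {"open orders": "O_ORDERSTATUS = 'O'"}),
]

# Hypothetical SCD Type 2 filter per table; the column name is an assumption.
SCD2_CURRENT_FILTER = {"CUSTOMER": "CUSTOMER.VALID_TO IS NULL"}


def resolve(question: str) -> list[tuple[str, Optional[str]]]:
    """Map phrases in the question to columns and any filter they imply."""
    q = question.lower()
    hits = []
    for col in CATALOG:
        for phrase, implied_filter in col.synonyms.items():
            if phrase in q:
                hits.append((f"{col.table}.{col.column}", implied_filter))
    return hits


def history_filters(tables: set[str]) -> list[str]:
    """Return the current-row filters for every history-tracked table used."""
    return [SCD2_CURRENT_FILTER[t] for t in sorted(tables) if t in SCD2_CURRENT_FILTER]
```

Keeping the history filter in metadata rather than in the prompt means it is attached whenever the table is used, regardless of how the question is worded.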
Business Terms
Domain Logic as Structured Knowledge
Business terms define concepts that span multiple columns or require calculation. “Revenue” in this schema is not a stored column — it is computed as L_EXTENDEDPRICE × (1 − L_DISCOUNT) at the line item level.
That definition lives in the Business Terms collection and is retrieved whenever a revenue-related question arrives. The model is given the definition, the formula, and the table it applies to — before it writes a single character of SQL.
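As an illustration of handing the model a definition before it writes SQL, the sketch below renders a business term into prompt context. The `BusinessTerm` fields mirror the three items named above (definition, formula, table); the class, the prompt wording, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BusinessTerm:
    name: str
    definition: str
    formula: str
    table: str


REVENUE = BusinessTerm(
    name="Revenue",
    definition="Discounted line-item value. Computed, not stored.",
    formula="L_EXTENDEDPRICE * (1 - L_DISCOUNT)",
    table="LINEITEM",
)


def build_context(question: str, terms: list[BusinessTerm]) -> str:
    """Place retrieved definitions ahead of the question in the prompt."""
    lines = ["Business definitions:"]
    for term in terms:
        lines.append(
            f"- {term.name}: {term.definition} "
            f"Formula: {term.formula} (table {term.table})"
        )
    lines.append(f"Question: {question}")
    return "\n".join(lines)


prompt = build_context("What was our total revenue last quarter?", [REVENUE])
```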
03 Semantic Layer
Example Queries as Institutional Memory
The Curated Queries collection is a library of validated NL → SQL pairs that represent proven analytical patterns for this specific schema. These are not AI-generated examples. They are curated, tested queries, embedded and stored in a vector database so they can be retrieved by similarity.
When a new question arrives, the system finds the most semantically similar example and uses it as a few-shot demonstration for the language model. A question requiring a five-table join with SCD filters is shown a working example of exactly that pattern — the join is demonstrated, not guessed.
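The retrieval step can be sketched in a few lines. This version uses plain cosine similarity over toy three-dimensional vectors in place of a real embedding model and the Qdrant collection. The example records, vectors, and prompt layout are all assumptions made for illustration.

```python
import math

# Toy embeddings; real vectors would come from an embedding model.
CURATED = [
    {
        "question": "What is our total revenue?",
        "sql": "SELECT SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS revenue FROM LINEITEM",
        "vector": [0.9, 0.1, 0.0],
    },
    {
        "question": "How many open orders are there?",
        "sql": "SELECT COUNT(*) FROM ORDERS WHERE O_ORDERSTATUS = 'O'",
        "vector": [0.1, 0.9, 0.2],
    },
]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_vector: list[float]) -> tuple[dict, float]:
    """Return the curated example closest to the query, with its score."""
    scored = [(ex, cosine(query_vector, ex["vector"])) for ex in CURATED]
    return max(scored, key=lambda pair: pair[1])


def few_shot_prompt(question: str, example: dict) -> str:
    """Show the model one working pattern before asking the new question."""
    return (
        f"Example question: {example['question']}\n"
        f"Example SQL: {example['sql']}\n\n"
        f"Question: {question}\nSQL:"
    )


example, score = most_similar([0.2, 0.8, 0.1])
prompt = few_shot_prompt("How many orders are still open?", example)
```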
When a new question matches a stored example with a similarity score of 88% or higher, the system returns the curated SQL directly, validates it, and executes it, bypassing generation entirely. This is the highest-confidence path: the SQL was written and tested by a human.
Confidence Signals
Qdrant match ≥ 88% and validation passed. Curated query, human-verified.
Qdrant match of at least 70% but below 88%, and validation passed. Review recommended before acting.
AI-generated or validation failed. Shown with a clear advisory.
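The three signals above amount to a small decision rule, sketched here. The tier labels (`trusted`, `review`, `advisory`) are hypothetical names, and treating matches below 70% as belonging to the advisory tier is an assumption, since the source describes that tier only as "AI-generated or validation failed".

```python
TRUSTED_THRESHOLD = 0.88  # at or above: curated SQL is returned directly
REVIEW_THRESHOLD = 0.70   # at or above, but below trusted: review recommended


def use_curated_sql(match_score: float) -> bool:
    """Fast path: skip generation when the match is strong enough."""
    return match_score >= TRUSTED_THRESHOLD


def confidence_signal(match_score: float, validation_passed: bool) -> str:
    """Map a similarity score and validation result to a display tier."""
    if not validation_passed or match_score < REVIEW_THRESHOLD:
        return "advisory"  # AI-generated or validation failed
    if match_score >= TRUSTED_THRESHOLD:
        return "trusted"   # curated query, human-verified
    return "review"        # validated, but review before acting
```

Comparing against the thresholds with `>=` and `<` rather than listing percentage ranges leaves no score unclassified, including values such as 87.5%.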
04 Architecture
Why This Approach Produces Better Results
Standard text-to-SQL approaches give the language model a schema dump and a question, then expect it to produce correct SQL through reasoning alone. This works for simple schemas with obvious column names. It degrades quickly as schemas grow: more tables, more joins, derived metrics, history tracking, business-specific terminology.
The ontology + semantic layer approach inverts the problem. Instead of asking the model to discover schema semantics on its own, we encode those semantics explicitly in a vector database and retrieve only the relevant subset for each question. The model reasons over a small, curated, accurate context rather than a large, ambiguous one.
“The model’s job is synthesis, not discovery. We give it the facts; it writes the query.”
The result is accuracy that does not degrade as question complexity increases. And the system improves over time — every validated query can be added back to the example collection, making retrieval more precise and the trusted fast path more frequently available.
05 See It in Action
Watch the Demo
Ask Your Data a Question
Try the live demo on the TPC-H commerce schema. No SQL required, just a business question.
Try NL2SQL now