Natural Language to SQL  ·  Built for Production

We translate business intent into SQL.

Most NL-to-SQL tools translate words into SQL. The difference is measurable.

[Figure: NL2SQL architecture, from natural language question through intent analysis, knowledge retrieval, SQL generation, and validation to business answer]

01   The Problem

The Gap Between a Question and a Correct Query

Ask a generic AI assistant “What was our total revenue last quarter?” and it will write you SQL. It may even look correct. But it will almost certainly miss details that only your schema knows: that revenue is calculated as L_EXTENDEDPRICE × (1 − L_DISCOUNT), not just order totals; that “last quarter” must be anchored to today’s date dynamically; and that the query must join across LINEITEM and ORDERS — not read from a single denormalized view.
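To make those three details concrete, here is a minimal sketch of what a correct "total revenue last quarter" query has to do, run against a tiny in-memory SQLite stand-in. The join keys (O_ORDERKEY, L_ORDERKEY) and the date column (O_ORDERDATE) are assumed from the standard TPC-H schema, the choice of order date as the anchor for "last quarter" is an assumption, and the helper names and sample rows are purely illustrative.

```python
import sqlite3
from datetime import date


def previous_quarter_bounds(today: date) -> tuple[date, date]:
    """Return (start, end) of the quarter before the one containing `today`; end is exclusive."""
    current_start_month = 3 * ((today.month - 1) // 3) + 1
    end = date(today.year, current_start_month, 1)
    if current_start_month == 1:
        start = date(today.year - 1, 10, 1)  # previous quarter is Q4 of last year
    else:
        start = date(today.year, current_start_month - 3, 1)
    return start, end


# Revenue is computed per line item and joined to ORDERS for the order date.
REVENUE_SQL = """
SELECT SUM(l.L_EXTENDEDPRICE * (1 - l.L_DISCOUNT)) AS revenue
FROM LINEITEM AS l
JOIN ORDERS AS o ON o.O_ORDERKEY = l.L_ORDERKEY
WHERE o.O_ORDERDATE >= ? AND o.O_ORDERDATE < ?
"""


def revenue_last_quarter(conn: sqlite3.Connection, today: date) -> float:
    """Anchor 'last quarter' to the supplied date instead of hard-coding it."""
    start, end = previous_quarter_bounds(today)
    row = conn.execute(REVENUE_SQL, (start.isoformat(), end.isoformat())).fetchone()
    return row[0] or 0.0


def build_demo_db() -> sqlite3.Connection:
    """Tiny in-memory stand-in for the warehouse (illustrative rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ORDERS (O_ORDERKEY INTEGER PRIMARY KEY, O_ORDERDATE TEXT);
        CREATE TABLE LINEITEM (L_ORDERKEY INTEGER, L_EXTENDEDPRICE REAL, L_DISCOUNT REAL);
        INSERT INTO ORDERS VALUES (1, '2024-02-10'), (2, '2024-04-05');
        INSERT INTO LINEITEM VALUES (1, 100.0, 0.25), (1, 200.0, 0.0), (2, 50.0, 0.0);
    """)
    return conn


if __name__ == "__main__":
    print(revenue_last_quarter(build_demo_db(), date(2024, 5, 15)))
```

Only the order dated inside the previous quarter contributes, and its discount is applied per line item rather than to an order total.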

These are not edge cases. They are the rule in any production data warehouse. The gap between a syntactically valid query and a semantically correct one is where most NL-to-SQL tools fail.

“The problem isn’t that AI can’t write SQL. It’s that AI doesn’t know what your SQL means.”

nl2sql.pro closes that gap with two layers of structured knowledge: an ontology layer that defines what your business concepts mean, and a semantic layer of curated example queries that demonstrate how those concepts translate into correct SQL. Together, they give the language model the context it needs to reason accurately — not just generate plausibly.

02   Ontology Layer

What Business Concepts Mean

The Schema Metadata and Business Terms collections form the ontology layer of the system. Their purpose is to encode meaning — the relationship between how analysts ask questions and how the database actually stores data.

Schema Metadata

Beyond Column Names

Every column in DBT.DWH is stored with a rich description and business synonyms. When a user asks about “account balance,” the retrieval step finds C_ACCTBAL. When they ask about “open orders,” the system knows O_ORDERSTATUS = 'O' is the correct filter.

Critically, the schema metadata encodes SCD Type 2 awareness: every query involving current customers automatically includes the correct history filter — the LLM does not need to infer this from column names alone.
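One way to picture a schema metadata entry is as a small record carrying the description, the synonyms, and any filters a phrase implies. The sketch below is an assumption-laden illustration: the `ColumnMeta` structure, the substring matching (a stand-in for embedding-based retrieval), and the `VALID_TO IS NULL` history predicate are all hypothetical, since the actual storage format and SCD columns are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMeta:
    table: str
    column: str
    description: str
    synonyms: tuple[str, ...] = ()
    # Maps a business phrase to the predicate it implies, e.g. "open orders".
    phrase_filters: dict[str, str] = field(default_factory=dict)


CATALOG = [
    ColumnMeta("CUSTOMER", "C_ACCTBAL", "Customer account balance",
               synonyms=("account balance",)),
    ColumnMeta("ORDERS", "O_ORDERSTATUS", "Order status code",
               synonyms=("order status", "open orders"),
               phrase_filters={"open orders": "O_ORDERSTATUS = 'O'"}),
]

# Hypothetical SCD Type 2 predicate; the real history columns are not given in the text.
HISTORY_FILTERS = {"CUSTOMER": "VALID_TO IS NULL"}


def schema_context(question: str) -> list[str]:
    """Collect the metadata lines that would be handed to the model for this question."""
    q = question.lower()
    lines: list[str] = []
    tables: list[str] = []
    for col in CATALOG:
        if not any(s in q for s in col.synonyms):
            continue
        lines.append(f"{col.table}.{col.column}: {col.description}")
        lines += [f"filter: {pred}" for phrase, pred in col.phrase_filters.items() if phrase in q]
        if col.table not in tables:
            tables.append(col.table)
    # History filters are attached per table, not left for the model to infer.
    lines += [f"required filter on {t}: {HISTORY_FILTERS[t]}" for t in tables if t in HISTORY_FILTERS]
    return lines


if __name__ == "__main__":
    for line in schema_context("What is the account balance of our current customers?"):
        print(line)
```

The point of the design is that the history filter travels with the table's metadata, so it appears in the context whenever that table is retrieved.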

Business Terms

Domain Logic as Structured Knowledge

Business terms define concepts that span multiple columns or require calculation. “Revenue” in this schema is not a stored column — it is computed as L_EXTENDEDPRICE × (1 − L_DISCOUNT) at the line item level.

That definition lives in the Business Terms collection and is retrieved whenever a revenue-related question arrives. The model is given the definition, the formula, and the table it applies to — before it writes a single character of SQL.
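As a rough sketch of that flow, a business term can be modeled as a record holding the definition, the formula, and the table, which is placed in the prompt ahead of the question. The `BusinessTerm` structure, the keyword lookup, and the prompt wording are assumptions made for illustration; the real collection is retrieved from a vector store, not by name matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessTerm:
    name: str
    definition: str
    formula: str
    table: str


TERMS = [
    BusinessTerm(
        name="revenue",
        definition="Computed at the line item level; not a stored column.",
        formula="L_EXTENDEDPRICE * (1 - L_DISCOUNT)",
        table="LINEITEM",
    ),
]


def relevant_terms(question: str) -> list[BusinessTerm]:
    """Keyword match as a simple stand-in for similarity retrieval."""
    q = question.lower()
    return [t for t in TERMS if t.name in q]


def build_prompt(question: str) -> str:
    """Place retrieved definitions before the question so the model sees them first."""
    parts: list[str] = []
    terms = relevant_terms(question)
    if terms:
        parts.append("Business definitions:")
        for t in terms:
            parts.append(f"- {t.name}: {t.definition} Formula: {t.formula} (table: {t.table})")
    parts.append(f"Question: {question}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("What was our total revenue last quarter?"))
```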

03   Semantic Layer

Example Queries as Institutional Memory

The Curated Queries collection is a library of validated NL → SQL pairs that represent proven analytical patterns for this specific schema. These are not AI-generated examples. They are curated, tested queries embedded and stored so they can be retrieved by similarity.

When a new question arrives, the system finds the most semantically similar example and uses it as a few-shot demonstration for the language model. A question requiring a five-table join with SCD filters is shown a working example of exactly that pattern — the join is demonstrated, not guessed.
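The retrieval step can be sketched in a few lines. This is a toy version under stated assumptions: word-count vectors and cosine similarity stand in for a real embedding model and vector database, and the two stored pairs are illustrative examples written against standard TPC-H column names, not entries from the actual library.

```python
import math
import re
from collections import Counter

# Illustrative NL -> SQL pairs (hypothetical, not the product's curated set).
EXAMPLES = [
    ("Total revenue per order priority",
     "SELECT o.O_ORDERPRIORITY, SUM(l.L_EXTENDEDPRICE * (1 - l.L_DISCOUNT)) AS revenue "
     "FROM LINEITEM AS l JOIN ORDERS AS o ON o.O_ORDERKEY = l.L_ORDERKEY "
     "GROUP BY o.O_ORDERPRIORITY"),
    ("Count of open orders",
     "SELECT COUNT(*) FROM ORDERS WHERE O_ORDERSTATUS = 'O'"),
]


def embed(text: str) -> Counter:
    """Toy embedding: bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(question: str) -> tuple[str, str]:
    """Return the stored (question, sql) pair closest to the new question."""
    q = embed(question)
    return max(EXAMPLES, key=lambda ex: cosine(q, embed(ex[0])))


def few_shot_prompt(question: str) -> str:
    """Show the model one working example of the pattern before asking for new SQL."""
    ex_question, ex_sql = most_similar(question)
    return (f"Example question: {ex_question}\n"
            f"Example SQL: {ex_sql}\n\n"
            f"Question: {question}\nSQL:")


if __name__ == "__main__":
    print(few_shot_prompt("What is the revenue per order priority?"))
```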

When a new question matches a stored example with a similarity score of at least 88%, the system skips generation entirely: it returns the curated SQL directly, validates it, and executes it. This is the highest-confidence path, because the SQL was written and tested by a human.

Confidence Signals

Trusted

Qdrant match ≥ 88% and validation passed. Curated query, human-verified.

Semi-Trusted

Qdrant match 70–87% and validation passed. Review recommended before acting.

Untrusted

AI-generated or validation failed. Shown with a clear advisory.
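The three signals above amount to a small decision rule, sketched here. Two points are assumptions rather than documented behavior: scores are treated as fractions between 0 and 1, and anything below 70% (or between the stated 87% and 88% boundaries) is resolved by the simple threshold comparisons shown. The function and enum names are hypothetical.

```python
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"
    SEMI_TRUSTED = "semi-trusted"
    UNTRUSTED = "untrusted"


TRUSTED_MIN = 0.88       # match at or above this: curated, human-verified query
SEMI_TRUSTED_MIN = 0.70  # match at or above this: review recommended


def classify(match_score: float, validation_passed: bool) -> Trust:
    """Map a similarity score and a validation result to a confidence signal."""
    if not validation_passed:
        return Trust.UNTRUSTED  # failed validation is untrusted regardless of score
    if match_score >= TRUSTED_MIN:
        return Trust.TRUSTED
    if match_score >= SEMI_TRUSTED_MIN:
        return Trust.SEMI_TRUSTED
    # Assumption: weaker matches fall back to AI generation and are untrusted.
    return Trust.UNTRUSTED


if __name__ == "__main__":
    for score, ok in [(0.92, True), (0.75, True), (0.92, False), (0.50, True)]:
        print(score, ok, classify(score, ok).value)
```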

04   Architecture

Why This Approach Produces Better Results

Standard text-to-SQL approaches give the language model a schema dump and a question, then expect it to produce correct SQL through reasoning alone. This works for simple schemas with obvious column names. It degrades quickly as schemas grow: more tables, more joins, derived metrics, history tracking, business-specific terminology.

The ontology + semantic layer approach inverts the problem. Instead of asking the model to discover schema semantics on its own, we encode those semantics explicitly in a vector database and retrieve only the relevant subset for each question. The model reasons over a small, curated, accurate context rather than a large, ambiguous one.

“The model’s job is synthesis, not discovery. We give it the facts; it writes the query.”

The result is accuracy that does not degrade as question complexity increases. And the system improves over time — every validated query can be added back to the example collection, making retrieval more precise and the trusted fast path more frequently available.
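The feedback loop can be pictured as a guarded write into the example collection. In this minimal sketch the library is a plain dictionary, `validated` stands for whatever review the query has passed, and duplicate detection is a simple normalized-text check; all three are simplifying assumptions, not a description of the production store.

```python
def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivial variants map to one key."""
    return " ".join(question.lower().split())


def promote(library: dict[str, str], question: str, sql: str, validated: bool) -> bool:
    """Add a question/SQL pair to the example library only if it was validated and is new."""
    key = normalize(question)
    if not validated or key in library:
        return False
    library[key] = sql
    return True


if __name__ == "__main__":
    library: dict[str, str] = {}
    added = promote(library, "Count of open orders",
                    "SELECT COUNT(*) FROM ORDERS WHERE O_ORDERSTATUS = 'O'", validated=True)
    print(added, len(library))
```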

05   See It in Action

Watch the Demo

Ask Your Data a Question

Try the live demo on the TPC-H commerce schema. No SQL required, just a business question.
