NL2SQL is a platform that translates natural language business questions into semantically correct SQL queries. Unlike generic AI tools, it uses a RAG pipeline with an ontology layer and curated query matching to understand what your data actually means.

How is NL2SQL different from ChatGPT or other AI SQL generators?

Generic AI tools translate words into SQL syntax. NL2SQL translates business intent — it knows that "revenue" means L_EXTENDEDPRICE × (1 − L_DISCOUNT), that "last quarter" must be dynamically anchored to today, and which tables to join. This domain knowledge comes from a vector knowledge base of schema metadata, business terms, and curated query examples.

What databases does NL2SQL support?

NL2SQL currently targets Snowflake data warehouses. The demo uses a TPC-H commerce schema (DBT.DWH) with tables like LINEITEM, ORDERS, CUSTOMER, PART, SUPPLIER, NATION, and REGION.

What are the confidence levels in NL2SQL results?

Every query result includes a confidence signal: Trusted (≥88% match to a curated query), Semi-Trusted (70–87% match, review recommended), or Untrusted (AI-generated or validation failed). This helps users know when to rely on results directly and when to review the SQL.

Pixels vs. Numbers: PDF Charts Your RAG Can Query

Every PDF pipeline eventually meets the same wall: the chart. A quarterly revenue bar graph is trivial for a human - four bars, four labels, four numbers - but for most document parsers it is a dead end. The parser sees a rectangle of pixels, crops it out, drops it into a folder, and moves on. Your RAG system is left holding a .jpg. It cannot answer "what was Q1 2026 revenue?" because that number was never turned into text - it stayed locked inside the image.

I ran the same investor presentation through two of the best open-source document parsers available today - MinerU (OpenDataLab) and Docling (IBM, with the Granite Vision chart-extraction model) - to see exactly where they diverge. The headline result is not subtle. Faced with an identical "Revenue ($M)" bar chart, one tool gave me a picture, and the other gave me a table I can query.

This article walks through why, step by step.

The test: one chart, two parsers

The source is a single slide from SMR-1Q26-Presentation.pdf - a navy-blue bar chart titled Revenue ($M) with four bars:

Revenue ($M)

Q2 2025 → $8.1
Q3 2025 → $8.2
Q4 2025 → $1.8
Q1 2026 → $0.6

What MinerU produced

MinerU correctly located the chart and even classified it. Here is the exact record from its _content_list.json:

{
  "type": "chart",
  "img_path": "images/58bd777e...db196463.jpg",
  "content": "",
  "chart_caption": [],
  "chart_footnote": [],
  "bbox": [572, 148, 944, 885],
  "page_idx": 9
}

Look closely at two fields. "type": "chart" - MinerU knows it is a chart. And "content": "" - but it extracted nothing from it. The output Markdown reflects this:

Liquidity & Capital Resources ($M)
![](images/58bd777e...db196463.jpg)

The chart is now a 42 KB JPEG. The numbers $8.1, $8.2, $1.8, $0.6 exist only as colored pixels inside that image. To a downstream LLM, this region is opaque.

See it in the published repo: the cropped chart JPEG, MinerU's full Markdown output, and the _content_list.json record.

What Docling + Granite Vision produced

From the identical slide, Docling emitted this directly into the Markdown - no image, no OCR guesswork:

Bar chart

| Quarter   |   Value |
|-----------|---------|
| Q2 2025   |     8.1 |
| Q3 2025   |     8.2 |
| Q4 2025   |     1.8 |
| Q1 2026   |     0.6 |

Straight from the published Docling output — this Revenue table lives at lines 248–255, and the same values are also written out as a machine-readable charts CSV.

Every value matches the slide exactly. The chart is now structured data. A RAG system can read line 255 - | Q1 2026 | 0.6 | - and answer the revenue question without ever touching an image model.

Same PDF. Same chart. One produced a photograph; the other produced a spreadsheet.

Step by step: why the outputs differ

The difference is not accuracy or model size - it is pipeline design. Each tool makes a deliberate decision about what to do when it meets a figure.

Step 1 - Layout detection (both tools do this)

Both parsers start by segmenting the page into regions: text blocks, titles, tables, and figures. On this run MinerU used its PP-DocLayoutV2 layout detector, with paddleocr_torch for text, unimernet_hf_small_2503 for formulas, and SlanetPlus/UnetStructure for tables; Docling ran its docling-layout-heron layout model with TableFormer for tables. At this stage, both tools agree: "there is a figure here, at this bounding box." So far, identical.

Step 2 - Figure classification (the fork in the road)

Here the paths split.

MinerU tags the region with a type - and in newer versions it even distinguishes chart from a plain image. But classification is where its visual understanding ends. There is no downstream model whose job is to read the chart's data. So the region is cropped, saved as a JPEG, and content is left empty.
Docling, when you enable do_chart_extraction, automatically enables picture classification as a gate. Each detected picture is classified - bar chart, pie chart, line chart, or "not a chart." A positive classification doesn't end the process; it triggers the next stage.

This is the crucial architectural insight: for MinerU, "chart" is a label. For Docling, "chart" is a routing decision that sends the image to a specialist model.

Step 3 - Chart-to-data reconstruction (only Docling has this)

When Docling classifies a picture as a chart, it hands the cropped image to a dedicated IBM Granite Vision model. On the machine used for this test, Docling's default chart-extraction kind is GRANITE_VISION_V4, which loads ibm-granite/granite-vision-4.1-4b - a 4-billion-parameter vision-language model - and drives it with a <chart2csv> task token. (Docling also ships the lighter granite-vision-3.3-2b-chart2csv-preview, which is steered with the natural-language prompt "Convert the information in this chart into a data table in CSV format.")

Either way, the model was trained on code-generated synthetic charts, so it has learned to map bar heights, axis ticks, data labels, and category names back into rows and columns. It outputs structured CSV - not a caption, not a description, but the actual numbers. The supported chart types are bar_chart, pie_chart, and line_chart.

MinerU has no equivalent stage. Its VLM and OCR models are tuned to read text and formulas, not to reverse-engineer a plot's data. There is simply no chart2csv step in its graph, so a chart can never become a table.

Step 4 - Serialization into the document

Docling takes the model's tabular output and threads it back into the document object as a real table. In code, the extracted data lives on the picture item itself:

for item, _level in doc.iterate_items():
    if isinstance(item, PictureItem) and item.meta and item.meta.tabular_chart:
        chart_type = item.meta.classification.get_main_prediction().class_name
        table = item.meta.tabular_chart.chart_data
        grid = [[""] * table.num_cols for _ in range(table.num_rows)]
        for cell in table.table_cells:
            grid[cell.start_row_offset_idx][cell.start_col_offset_idx] = cell.text
        # grid is now [["Q2 2025","8.1"], ["Q3 2025","8.2"], ...]

That grid is what becomes the Markdown table - and, if you want, a CSV or an Excel sheet. MinerU's serializer, by contrast, has only an empty content string and an image path to write, so it emits an ![](...) link.

Putting it in a table

Stage	MinerU	Docling + Granite Vision
Locate the figure	✅ Yes	✅ Yes
Classify as "chart"	✅ Yes (`type: chart`)	✅ Yes (as a routing gate)
Read chart data	❌ None (`content: ""`)	✅ `granite-vision-4.1-4b` (`<chart2csv>`)
Output for this chart	A `.jpg` image crop	A Markdown/CSV data table
RAG can query the values	❌ No (pixels)	✅ Yes (text)

Why this matters for RAG

A retrieval pipeline can only retrieve what was turned into text. When MinerU emits a chart as an image, the four most important numbers on that slide vanish from the searchable corpus. Your vector index never embeds "$0.6M Q1 2026 revenue," because that string was never produced. The user asks the question, the retriever finds the surrounding prose, and the model either says "I don't know" or - worse - hallucinates a figure.

Docling closes that gap. By reconstructing the chart into | Q1 2026 | 0.6 |, it makes the visual data a first-class, embeddable, retrievable citizen of the document. The chart stops being "dark matter" and becomes part of the knowledge base.

The honest caveats

This is not a claim that Docling is universally better - it's a claim that it does something MinerU structurally does not. A few things to keep in mind:

Chart-to-data is probabilistic. Granite Vision reconstructs values from pixels; on unconventional or cluttered charts it can misread a bar or invent a row. Always treat extracted chart tables as high-quality estimates, not audited figures.
It costs compute. The chart2csv stage is an extra 4-billion-parameter VLM pass per chart - meaningfully slower, and it really wants a GPU.
MinerU is excellent at what it targets. For dense text, formulas (→ LaTeX), and tables (→ HTML) across 109 languages, MinerU is fast and strong. If your documents are text- and equation-heavy rather than chart-heavy, the missing chart2csv step may not matter to you at all.

Conclusion

The same PDF, the same bar chart, two philosophies. MinerU treats a chart as an object to preserve - it finds it, labels it, and faithfully saves the picture. Docling, armed with Granite Vision, treats a chart as data to recover - it finds it, routes it to a specialist model, and reconstructs the numbers behind the bars.

For document archival, MinerU's image-faithful output is perfectly reasonable. But for any pipeline whose whole purpose is to make a document answerable - RAG, financial analysis, research aggregation - the difference between 58bd777e....jpg and | Q1 2026 | 0.6 | is the difference between a parser that sees your charts and one that actually reads them.

Appendix: exactly what ran on this machine

For reproducibility, here is the precise stack behind the two outputs above. Both ran locally on an NVIDIA GeForce RTX 5070 (Blackwell, sm_120) with PyTorch 2.12.1+cu130 (CUDA 13.0), on the same source file SMR-1Q26-Presentation.pdf.

Docling (chart → table):

docling 2.107.0, docling-core 2.85.0, docling-ibm-models 3.13.3, transformers 5.8.1
Layout: docling-project/docling-layout-heron
Tables: docling-project/docling-models (TableFormer)
Figure/chart gate: docling-project/DocumentFigureClassifier-v2.5
Chart-to-CSV: ibm-granite/granite-vision-4.1-4b (default kind GRANITE_VISION_V4, <chart2csv> task token)
Pipeline flags: do_chart_extraction=True (which auto-enables picture classification)

MinerU (chart → image):

MinerU 3.4.0, hybrid backend, effort medium, OCR disabled (_ocr_enable: false) - from the run's _middle.json
VLM: opendatalab/MinerU2.5-Pro-2605-1.2B
Pipeline models (opendatalab/PDF-Extract-Kit-1.0, ONNX): layout PP-DocLayoutV2, formula unimernet_hf_small_2503, OCR paddleocr_torch, table classify paddle_table_cls, table structure SlanetPlus / UnetStructure
The chart region was recorded in _content_list.json as "type": "chart" with "content": "" and an img_path to the cropped .jpg.