From Text to Tables

Turn messy text (invoices, emails, reports) into clean tables or JSON using a local LLM via Ollama.

Overview

This project uses schema-constrained prompts to convert unstructured text into structured output. It validates the extracted fields, handles edge cases such as missing or ambiguous values, and exports the result to CSV or Parquet for analytics.
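
As an illustration of the validation step, here is a minimal sketch using only the Python standard library. The invoice schema, the field names, and the helpers validate_row and rows_to_csv are hypothetical and not taken from the repository; the project's actual schema and validation rules may differ. Missing or unparseable values become None rather than raising, so one bad field does not discard the whole row.

```python
import csv
import io
from datetime import datetime


def parse_date(value):
    """Accept only ISO dates (YYYY-MM-DD); anything else raises ValueError."""
    return datetime.strptime(value, "%Y-%m-%d").date().isoformat()


# Hypothetical schema: field name -> (converter, required?)
SCHEMA = {
    "invoice_number": (str, True),
    "date": (parse_date, True),
    "total": (float, False),
}


def validate_row(raw):
    """Return (clean_row, errors) for one extracted record."""
    clean, errors = {}, []
    for field, (convert, required) in SCHEMA.items():
        value = raw.get(field)
        if value is None or value == "":
            # Missing value: keep the column, record an error only if required.
            clean[field] = None
            if required:
                errors.append(f"missing required field: {field}")
            continue
        try:
            clean[field] = convert(value)
        except (TypeError, ValueError):
            # Ambiguous or malformed value: blank it out and report it.
            clean[field] = None
            errors.append(f"invalid value for {field}: {value!r}")
    return clean, errors


def rows_to_csv(rows):
    """Serialize validated rows to CSV text; None is written as an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(SCHEMA))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Parquet export is left out of the sketch because it needs a third-party library such as pyarrow; the same validated rows could be handed to it unchanged.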

Pipeline

  1. Source ingestion (PDF/TXT/HTML) → text normalization.
  2. Schema-guided LLM extraction (Ollama) → JSON rows.
  3. Validation & deduplication → tabular output.
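
The three steps above could be sketched roughly as follows. This is an assumption-laden outline, not the repository's code: the function names, the prompt wording, and the model name "llama3" are hypothetical. The only external piece is Ollama's local REST endpoint (/api/generate on port 11434 by default), called with stream disabled and format set to "json". The LLM call is passed in as a parameter so the rest of the pipeline can be exercised without a running model.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def normalize(text):
    """Step 1: collapse whitespace runs left over from PDF/HTML conversion."""
    return re.sub(r"\s+", " ", text).strip()


def ask_ollama(prompt, model="llama3"):
    """Send one non-streaming request to Ollama; the model name is an assumption."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # The generated text is returned in the "response" field.
        return json.loads(response.read())["response"]


def extract_rows(text, fields, llm=ask_ollama):
    """Step 2: ask the model for JSON rows restricted to the given fields."""
    field_list = ", ".join(fields)
    prompt = (
        "Extract every record from the text below. "
        'Reply with JSON of the form {"rows": [...]} where each row has '
        f"exactly these keys: {field_list}. "
        "Use null for values that are missing or ambiguous.\n\n"
        + normalize(text)
    )
    reply = json.loads(llm(prompt))
    # Keep only the schema's keys; absent keys become None.
    return [{f: row.get(f) for f in fields} for row in reply.get("rows", [])]


def deduplicate(rows):
    """Step 3: drop exact duplicate rows while preserving first-seen order."""
    seen, unique = set(), []
    for row in rows:
        key = json.dumps(row, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

With an Ollama server running and the model pulled, deduplicate(extract_rows(raw_text, ["vendor", "total"])) would return a list of dictionaries ready for validation and export. Passing a stub as llm makes the same code testable offline.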

Use Cases

Repository

GitHub – Data Extraction with Ollama LLM