title: Lineage Graph Accelerator
emoji: π₯
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: true
license: mit
short_description: AI data lineage extraction & export to data catalogs
tags:
- data-lineage
- mcp
- gradio
- data-governance
- dbt
- airflow
- etl
- mcp-in-action-track-productivity
- hackathon
Lineage Graph Accelerator
AI-powered data lineage extraction and visualization for modern data platforms
Built for the Gradio Agents & MCP Hackathon - Winter 2025
Celebrating MCP's 1st Birthday! This project demonstrates the power of MCP integration for enterprise data governance.
What is Lineage Graph Accelerator?
Lineage Graph Accelerator is an AI-powered tool that helps data teams:
- Extract data lineage from dbt, Airflow, BigQuery, Snowflake, and more
- Visualize complex data dependencies with interactive Mermaid diagrams
- Export lineage to enterprise data catalogs (Collibra, Microsoft Purview, Alation)
- Integrate with MCP servers for enhanced AI-powered processing
Why Data Lineage Matters
Understanding where your data comes from and where it goes is critical for:
- Data Quality: Track data transformations and identify issues
- Compliance: Document data flows for GDPR, CCPA, and other regulations
- Impact Analysis: Understand downstream effects of schema changes
- Data Discovery: Help analysts find and trust data assets
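Impact analysis amounts to a forward traversal of the lineage graph. The following is a minimal sketch, assuming edges are dictionaries with `from` and `to` keys as in the Custom JSON format; the `downstream` helper is hypothetical and not part of this project's code.

```python
from collections import defaultdict, deque


def downstream(edges, start):
    """Return every node reachable from `start` by following edges forward."""
    children = defaultdict(list)
    for edge in edges:
        children[edge["from"]].append(edge["to"])

    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


edges = [
    {"from": "source_db", "to": "staging"},
    {"from": "staging", "to": "analytics"},
]
# Everything that could be affected by a schema change in source_db.
affected = downstream(edges, "source_db")
```

A breadth-first search visits each node once, so the cost stays linear in the number of edges even for large graphs.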
Key Features
Multi-Source Support
| Source | Status | Description |
|---|---|---|
| dbt Manifest | Supported | Parse dbt's manifest.json for model dependencies |
| Airflow DAG | Supported | Extract task dependencies from DAG definitions |
| SQL DDL | Supported | Parse CREATE statements for table lineage |
| BigQuery | Supported | Query INFORMATION_SCHEMA for metadata |
| Custom JSON | Supported | Flexible node/edge format for any source |
| Snowflake | Planned | Coming via MCP integration |
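As an illustration of the dbt row above, a manifest.json stores each node's parents under `depends_on.nodes`, keyed by unique ID. The sketch below reads that structure into edges; the `dbt_edges` helper and the sample fragment are hypothetical, and the project's actual parser may work differently.

```python
def dbt_edges(manifest):
    """Collect upstream -> downstream edges from a dbt manifest dictionary."""
    edges = []
    for unique_id, node in manifest.get("nodes", {}).items():
        for parent in node.get("depends_on", {}).get("nodes", []):
            edges.append({"from": parent, "to": unique_id})
    return edges


# A hand-written fragment shaped like manifest.json, not real project output.
manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "depends_on": {"nodes": ["source.shop.raw.orders"]}
        },
        "model.shop.orders": {
            "depends_on": {"nodes": ["model.shop.stg_orders"]}
        },
    }
}
edges = dbt_edges(manifest)
```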
Export to Data Catalogs
| Catalog | Status | Format |
|---|---|---|
| OpenLineage | Supported | Universal open standard |
| Collibra | Supported | Data Intelligence Platform |
| Microsoft Purview | Supported | Azure Data Governance |
| Alation | Supported | Data Catalog |
| Apache Atlas | Planned | Coming soon |
Visualization Options
- Mermaid Diagrams: Interactive, client-side rendering
- Subgraph Grouping: Organize by data layer (raw, staging, marts)
- Color-Coded Nodes: Distinguish sources, tables, models, reports
- Edge Labels: Show transformation types
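One way to produce such a diagram is to emit Mermaid flowchart text from the nodes and edges. This is a rough sketch only: the `to_mermaid` function and the `layer` and `label` fields are assumptions made for illustration, not the project's actual generator or schema.

```python
from collections import defaultdict


def to_mermaid(nodes, edges):
    """Render nodes and edges as Mermaid flowchart text, grouped by layer."""
    lines = ["flowchart TD"]

    # Group nodes by their (hypothetical) "layer" field into subgraphs.
    layers = defaultdict(list)
    for node in nodes:
        layers[node.get("layer", "other")].append(node)
    for layer, members in layers.items():
        lines.append(f"    subgraph {layer}")
        for node in members:
            lines.append(f'        {node["id"]}["{node["name"]}"]')
        lines.append("    end")

    # Add an edge label only when the (hypothetical) "label" field is set.
    for edge in edges:
        label = f'|{edge["label"]}|' if edge.get("label") else ""
        lines.append(f'    {edge["from"]} -->{label} {edge["to"]}')
    return "\n".join(lines)


nodes = [
    {"id": "raw_orders", "name": "Raw Orders", "layer": "raw"},
    {"id": "stg_orders", "name": "Staged Orders", "layer": "staging"},
]
edges = [{"from": "raw_orders", "to": "stg_orders", "label": "clean"}]
diagram = to_mermaid(nodes, edges)
```

The returned string can be pasted into any Mermaid renderer to check the grouping.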
Quick Start
Try Online (HuggingFace Space)
- Visit Lineage Graph Accelerator on HuggingFace
- Click "Load Sample" to load example data
- Click "Extract Lineage" to see the visualization
- Explore the Demo Gallery for more examples
Run Locally
# Clone the repository
git clone https://github.com/YOUR_REPO/lineage-graph-accelerator.git
cd lineage-graph-accelerator
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the app
python app.py
Open http://127.0.0.1:7860 in your browser.
Usage Guide
1. Text/File Metadata Tab
Paste your metadata directly:
{
  "nodes": [
    {"id": "source_db", "type": "source", "name": "Source Database"},
    {"id": "staging", "type": "table", "name": "Staging Table"},
    {"id": "analytics", "type": "table", "name": "Analytics Table"}
  ],
  "edges": [
    {"from": "source_db", "to": "staging"},
    {"from": "staging", "to": "analytics"}
  ]
}
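Before pasting metadata like this, it can help to check that every edge points at a declared node. The following is a minimal sketch under that assumption; `load_lineage` is a hypothetical helper, and the app may validate input differently.

```python
import json


def load_lineage(text):
    """Parse node/edge JSON and reject edges that point at unknown nodes."""
    data = json.loads(text)
    node_ids = {node["id"] for node in data.get("nodes", [])}
    for edge in data.get("edges", []):
        for end in ("from", "to"):
            if edge[end] not in node_ids:
                raise ValueError(f"edge references unknown node: {edge[end]}")
    return data


text = """
{
  "nodes": [
    {"id": "source_db", "type": "source", "name": "Source Database"},
    {"id": "staging", "type": "table", "name": "Staging Table"}
  ],
  "edges": [
    {"from": "source_db", "to": "staging"}
  ]
}
"""
lineage = load_lineage(text)
```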
2. Sample Data
Load pre-built samples to explore different scenarios:
- Simple JSON: Basic node/edge lineage
- dbt Manifest: Full dbt project with 15+ models
- Airflow DAG: ETL pipeline with 15 tasks
- Data Warehouse: Snowflake-style multi-layer architecture
- ETL Pipeline: Complex multi-source pipeline
- Complex Demo: 50+ node e-commerce platform
3. Export to Data Catalogs
- Extract lineage from your metadata
- Expand "Export to Data Catalog"
- Select format (OpenLineage, Collibra, Purview, Alation)
- Click "Generate Export"
- Copy the JSON for import into your catalog
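To give a sense of what an exported document might contain, the sketch below groups edges by target and builds one OpenLineage-style run event per target dataset. It is a simplified illustration, not the project's exporter: the `to_openlineage_events` function, the `example` namespace, the `load_` job naming, and the producer URL are all assumptions, and fields such as `schemaURL` are omitted. Consult the OpenLineage specification for the full schema.

```python
import uuid
from datetime import datetime, timezone


def to_openlineage_events(edges, namespace="example",
                          producer="https://example.com/lineage-sketch"):
    """Build one simplified OpenLineage-style event per target dataset."""
    # Collect the upstream datasets feeding each target.
    by_target = {}
    for edge in edges:
        by_target.setdefault(edge["to"], []).append(edge["from"])

    now = datetime.now(timezone.utc).isoformat()
    events = []
    for target, sources in by_target.items():
        events.append({
            "eventType": "COMPLETE",
            "eventTime": now,
            "producer": producer,  # hypothetical producer URI
            "run": {"runId": str(uuid.uuid4())},
            "job": {"namespace": namespace, "name": f"load_{target}"},
            "inputs": [{"namespace": namespace, "name": s} for s in sources],
            "outputs": [{"namespace": namespace, "name": target}],
        })
    return events


edges = [
    {"from": "source_db", "to": "staging"},
    {"from": "staging", "to": "analytics"},
]
events = to_openlineage_events(edges)
```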
MCP Integration
Connect to MCP (Model Context Protocol) servers for enhanced processing:
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Lineage Graph  │────▶│   MCP Server    │────▶│    AI Model     │
│   Accelerator   │     │  (HuggingFace)  │     │    (Claude)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
Configuration
- Expand "MCP Server Configuration" in the UI
- Enter your MCP server URL
- Add API key (if required)
- Click "Test Connection"
Run Local MCP Server
uvicorn mcp_example.server:app --reload --port 9000
Then use http://localhost:9000/mcp as your server URL.
Architecture
flowchart TD
A[User Interface - Gradio] --> B[Input Parser]
B --> C{Source Type}
C -->|dbt| D[dbt Parser]
C -->|Airflow| E[Airflow Parser]
C -->|SQL| F[SQL Parser]
C -->|JSON| G[JSON Parser]
D & E & F & G --> H[LineageGraph]
H --> I[Mermaid Generator]
H --> J[Export Engine]
I --> K[Visualization]
J --> L[OpenLineage]
J --> M[Collibra]
J --> N[Purview]
J --> O[Alation]
subgraph Optional
P[MCP Server] --> H
end
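The flow in the diagram, where a source type selects a parser and every parser produces the same graph object, can be sketched as a small registry. The class and function bodies below are assumptions made for illustration; only the `LineageGraph` name comes from the diagram, and the real implementation in app.py may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LineageGraph:
    """Hypothetical common graph shape shared by all parsers."""
    nodes: dict = field(default_factory=dict)  # node id -> display name
    edges: list = field(default_factory=list)  # (from_id, to_id) pairs


def parse_json(text):
    """Parse the custom node/edge JSON format into a LineageGraph."""
    data = json.loads(text)
    graph = LineageGraph()
    for node in data.get("nodes", []):
        graph.nodes[node["id"]] = node.get("name", node["id"])
    for edge in data.get("edges", []):
        graph.edges.append((edge["from"], edge["to"]))
    return graph


# dbt, Airflow, and SQL parsers would register here in the same way.
PARSERS = {"json": parse_json}


def parse(source_type, text):
    """Dispatch to the parser registered for the given source type."""
    try:
        parser = PARSERS[source_type]
    except KeyError:
        raise ValueError(f"unsupported source type: {source_type}") from None
    return parser(text)


graph = parse("json", '{"nodes": [{"id": "a", "name": "A"}, {"id": "b"}], '
                      '"edges": [{"from": "a", "to": "b"}]}')
```

Keeping the parsers behind one dictionary means the visualization and export stages only ever depend on the shared graph type.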
Project Structure
lineage-graph-accelerator/
├── app.py                      # Main Gradio application
├── exporters/                  # Data catalog exporters
│   ├── __init__.py
│   ├── base.py                 # Base classes
│   ├── openlineage.py          # OpenLineage format
│   ├── collibra.py             # Collibra format
│   ├── purview.py              # Microsoft Purview format
│   └── alation.py              # Alation format
├── samples/                    # Sample data files
│   ├── sample_metadata.json
│   ├── dbt_manifest_sample.json
│   ├── airflow_dag_sample.json
│   ├── sql_ddl_sample.sql
│   ├── warehouse_lineage_sample.json
│   ├── etl_pipeline_sample.json
│   └── complex_lineage_demo.json
├── mcp_example/                # Example MCP server
│   └── server.py
├── tests/                      # Unit tests
│   └── test_app.py
├── memories/                   # Agent configuration
├── USER_GUIDE.md               # Comprehensive user guide
├── BUILD_PLAN.md               # Development roadmap
└── requirements.txt
Testing
# Activate virtual environment
source .venv/bin/activate
# Run unit tests
python -m unittest tests.test_app -v
# Run setup validation
python test_setup.py
Requirements
- Python 3.9+
- Gradio 5.49.1+
- See requirements.txt for full dependencies
Competition Submission
Track: Track 2 - MCP in Action (Productivity)
Team Members:
Judging Criteria Alignment
| Criteria | Implementation |
|---|---|
| UI/UX Design | Clean, professional interface with tabs, accordions, and color-coded visualizations |
| Functionality | Full MCP integration, multiple input formats, 5 export formats |
| Creativity | Novel approach to data lineage visualization with AI-powered parsing |
| Documentation | Comprehensive README, USER_GUIDE.md, inline comments |
| Real-world Impact | Solves critical enterprise need for data governance and compliance |
Demo Video
- YouTube: Watch the Demo
- Loom: Alternative Link
Highlights:
- AI Assistant with Google Gemini generating lineage from natural language
- MCP Integration with Local Demo server
- Demo Gallery with 50+ node complex pipelines
- Export to Collibra, Purview, and Apache Atlas
- Interactive Mermaid visualizations with zoom and download
Social Media Post
LinkedIn: View the announcement post
Roadmap
- Gradio 6 upgrade for enhanced UI components
- Agentic chatbot for natural language queries (Google Gemini)
- Apache Atlas export support
- File upload functionality
- Graph export as PNG/SVG
- Batch processing API
- Column-level lineage
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
See CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Acknowledgments
- Anthropic - MCP Protocol and Claude
- Gradio Team - Amazing UI framework
- HuggingFace - Hosting and community
- dbt Labs - Inspiration for metadata standards
- OpenLineage - Open lineage specification
Support
- Documentation: USER_GUIDE.md
- Author Website: aamanlamba.com
- Issues: GitHub Issues
- Discussion: HuggingFace Community
Built with ❤️ by Aaman Lamba for the Gradio Agents & MCP Hackathon - Winter 2025
Celebrating MCP's 1st Birthday!