---
title: Lineage Graph Accelerator
emoji: π₯
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: true
license: mit
short_description: AI data lineage extraction & export to data catalogs
tags:
- data-lineage
- mcp
- gradio
- data-governance
- dbt
- airflow
- etl
- mcp-in-action-track-productivity
- hackathon
---
# Lineage Graph Accelerator
**AI-powered data lineage extraction and visualization for modern data platforms**
[HuggingFace Space](https://huggingface.co/spaces/aamanlamba/Lineage-graph-accelerator) | [License: MIT](https://opensource.org/licenses/MIT) | [Built with Gradio](https://gradio.app)
> **Built for the Gradio Agents & MCP Hackathon - Winter 2025**
>
> Celebrating MCP's 1st Birthday! This project demonstrates the power of MCP integration for enterprise data governance.
---
## What is Lineage Graph Accelerator?
Lineage Graph Accelerator is an AI-powered tool that helps data teams:
- **Extract** data lineage from dbt, Airflow, BigQuery, Snowflake, and more
- **Visualize** complex data dependencies with interactive Mermaid diagrams
- **Export** lineage to enterprise data catalogs (Collibra, Microsoft Purview, Alation)
- **Integrate** with MCP servers for enhanced AI-powered processing
### Why Data Lineage Matters
Understanding where your data comes from and where it goes is critical for:
- **Data Quality**: Track data transformations and identify issues
- **Compliance**: Document data flows for GDPR, CCPA, and other regulations
- **Impact Analysis**: Understand downstream effects of schema changes
- **Data Discovery**: Help analysts find and trust data assets
---
## Key Features
### Multi-Source Support
| Source | Status | Description |
|--------|--------|-------------|
| dbt Manifest | ✅ | Parse dbt's `manifest.json` for model dependencies |
| Airflow DAG | ✅ | Extract task dependencies from DAG definitions |
| SQL DDL | ✅ | Parse CREATE statements for table lineage |
| BigQuery | ✅ | Query INFORMATION_SCHEMA for metadata |
| Custom JSON | ✅ | Flexible node/edge format for any source |
| Snowflake | Planned | Coming via MCP integration |
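The dbt Manifest row above can be illustrated with a short sketch. This is not the app's actual parser. It turns a dbt `manifest.json` into upstream/downstream pairs, relying on the fact that each entry under the manifest's `nodes` key has a `resource_type` and a `depends_on.nodes` list of parent `unique_id`s. The function name `dbt_manifest_edges`, the choice to drop tests, and the trimmed sample manifest are all hypothetical.

```python
from typing import Dict, List, Tuple


def dbt_manifest_edges(manifest: Dict) -> List[Tuple[str, str]]:
    """Return sorted (upstream, downstream) unique_id pairs from a dbt manifest."""
    edges = []
    for unique_id, node in manifest.get("nodes", {}).items():
        # Tests depend on models but produce no data, so this sketch skips them.
        if node.get("resource_type") == "test":
            continue
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edges.append((parent_id, unique_id))
    return sorted(edges)


# Hypothetical, trimmed-down manifest containing only the keys read above.
manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.fct_revenue": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.not_null_orders_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
    },
    "sources": {"source.shop.raw.orders": {"resource_type": "source"}},
}

for upstream, downstream in dbt_manifest_edges(manifest):
    print(f"{upstream} -> {downstream}")
# model.shop.stg_orders -> model.shop.fct_revenue
# source.shop.raw.orders -> model.shop.stg_orders
```

Sources never appear as keys under `nodes`, so they show up only as parents, which is what you want for the leftmost layer of the graph.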
### Export to Data Catalogs
| Catalog | Status | Description |
|---------|--------|-------------|
| OpenLineage | ✅ | Universal open standard |
| Collibra | ✅ | Data Intelligence Platform |
| Microsoft Purview | ✅ | Azure Data Governance |
| Alation | ✅ | Data Catalog |
| Apache Atlas | ✅ | Open-source metadata and governance framework |
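An OpenLineage export ultimately consists of run events. The sketch below builds one `COMPLETE` RunEvent per transformation from the core fields in the OpenLineage spec: `eventType`, `eventTime`, `run.runId`, `job`, `inputs`, `outputs`, `producer` and `schemaURL`. It is an illustration, not this project's exporter. The namespace, producer URI and job name are hypothetical placeholders. Check the spec version in `SCHEMA_URL` against the version your catalog accepts.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical placeholders: substitute your own namespace and producer URI.
NAMESPACE = "example-warehouse"
PRODUCER = "https://example.com/lineage-graph-accelerator"
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"


def run_event(job_name: str, inputs: List[str], outputs: List[str]) -> Dict:
    """Build a COMPLETE RunEvent linking input datasets to output datasets."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},  # runId must be a UUID
        "job": {"namespace": NAMESPACE, "name": job_name},
        "inputs": [{"namespace": NAMESPACE, "name": name} for name in inputs],
        "outputs": [{"namespace": NAMESPACE, "name": name} for name in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


# The staging -> analytics edge from the sample JSON, expressed as one job run.
event = run_event("build_analytics", inputs=["staging"], outputs=["analytics"])
print(json.dumps(event, indent=2))
```

Emitting one event per job keeps each edge traceable to the transformation that produced it. Catalogs that consume OpenLineage can then stitch the events into a graph.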
### Visualization Options
- **Mermaid Diagrams**: Interactive, client-side rendering
- **Subgraph Grouping**: Organize by data layer (raw, staging, marts)
- **Color-Coded Nodes**: Distinguish sources, tables, models, reports
- **Edge Labels**: Show transformation types
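Subgraph grouping, color coding and edge labels all map onto standard Mermaid flowchart syntax: `subgraph ... end`, `classDef`, the `:::` class shorthand and `-->|label|`. Below is a minimal sketch of such a generator, not the app's own Mermaid Generator. The per-node `layer` key, the use of node types as class names and the color palette are all hypothetical; the sample JSON in the Usage Guide has no `layer` field.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical palette keyed by node "type"; the app's real colors may differ.
STYLES = {
    "source": "fill:#e0f2fe,stroke:#0284c7",
    "table": "fill:#ede9fe,stroke:#7c3aed",
    "model": "fill:#dcfce7,stroke:#16a34a",
    "report": "fill:#fef3c7,stroke:#d97706",
}


def to_mermaid(nodes: List[Dict], edges: List[Dict]) -> str:
    """Render a Mermaid flowchart: one subgraph per layer, one class per node type."""
    lines = ["flowchart LR"]
    by_layer: Dict[str, List[Dict]] = defaultdict(list)
    for node in nodes:
        by_layer[node.get("layer", "other")].append(node)
    for layer, members in by_layer.items():
        lines.append(f"    subgraph {layer}")
        for n in members:
            # id["label"]:::class gives each node a display name and a style class.
            lines.append(f'        {n["id"]}["{n["name"]}"]:::{n["type"]}')
        lines.append("    end")
    for e in edges:
        # Optional transformation label, rendered as -->|label|
        label = f'|{e["label"]}|' if e.get("label") else ""
        lines.append(f'    {e["from"]} -->{label} {e["to"]}')
    for node_type, style in STYLES.items():
        lines.append(f"    classDef {node_type} {style}")
    return "\n".join(lines)


nodes = [
    {"id": "orders_db", "type": "source", "name": "Orders DB", "layer": "raw"},
    {"id": "stg_orders", "type": "model", "name": "stg_orders", "layer": "staging"},
    {"id": "revenue", "type": "report", "name": "Revenue Dashboard", "layer": "marts"},
]
edges = [
    {"from": "orders_db", "to": "stg_orders", "label": "extract"},
    {"from": "stg_orders", "to": "revenue"},
]
print(to_mermaid(nodes, edges))
```

The printed text can be pasted into any Mermaid renderer. Node names that contain double quotes would need escaping first.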
---
## Quick Start
### Try Online (HuggingFace Space)
1. Visit [Lineage Graph Accelerator on HuggingFace](https://huggingface.co/spaces/aamanlamba/Lineage-graph-accelerator)
2. Click "Load Sample" to load example data
3. Click "Extract Lineage" to see the visualization
4. Explore the Demo Gallery for more examples
### Run Locally
```bash
# Clone the repository
git clone https://github.com/aamanlamba/lineage-graph-accelerator.git
cd lineage-graph-accelerator
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the app
python app.py
```
Open http://127.0.0.1:7860 in your browser.
---
## Usage Guide
### 1. Text/File Metadata Tab
Paste your metadata directly:
```json
{
  "nodes": [
    {"id": "source_db", "type": "source", "name": "Source Database"},
    {"id": "staging", "type": "table", "name": "Staging Table"},
    {"id": "analytics", "type": "table", "name": "Analytics Table"}
  ],
  "edges": [
    {"from": "source_db", "to": "staging"},
    {"from": "staging", "to": "analytics"}
  ]
}
```
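Because this format is a plain node/edge list, the "downstream effects" style of impact analysis listed earlier reduces to a graph traversal. The sketch below is illustrative rather than the app's code. It loads the sample above, checks that every edge points at a declared node, and walks the edges breadth-first to list everything downstream of a given node. The function name `downstream_of` is hypothetical.

```python
import json
from collections import deque
from typing import Dict, List

# The sample metadata from the Text/File Metadata tab.
SAMPLE = """
{
  "nodes": [
    {"id": "source_db", "type": "source", "name": "Source Database"},
    {"id": "staging", "type": "table", "name": "Staging Table"},
    {"id": "analytics", "type": "table", "name": "Analytics Table"}
  ],
  "edges": [
    {"from": "source_db", "to": "staging"},
    {"from": "staging", "to": "analytics"}
  ]
}
"""


def downstream_of(graph: Dict, start: str) -> List[str]:
    """Return every node reachable from `start`, in breadth-first order.

    Raises ValueError if `start` or any edge endpoint is not a declared node.
    """
    ids = {node["id"] for node in graph["nodes"]}
    if start not in ids:
        raise ValueError(f"unknown start node {start!r}")
    children: Dict[str, List[str]] = {node_id: [] for node_id in ids}
    for edge in graph["edges"]:
        for endpoint in (edge["from"], edge["to"]):
            if endpoint not in ids:
                raise ValueError(f"edge references unknown node {endpoint!r}")
        children[edge["from"]].append(edge["to"])

    # Breadth-first search; `seen` stops cycles from looping forever.
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


graph = json.loads(SAMPLE)
print(downstream_of(graph, "source_db"))  # ['staging', 'analytics']
```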
### 2. Sample Data
Load pre-built samples to explore different scenarios:
- **Simple JSON**: Basic node/edge lineage
- **dbt Manifest**: Full dbt project with 15+ models
- **Airflow DAG**: ETL pipeline with 15 tasks
- **Data Warehouse**: Snowflake-style multi-layer architecture
- **ETL Pipeline**: Complex multi-source pipeline
- **Complex Demo**: 50+ node e-commerce platform
### 3. Export to Data Catalogs
1. Extract lineage from your metadata
2. Expand "Export to Data Catalog"
3. Select format (OpenLineage, Collibra, Purview, Alation)
4. Click "Generate Export"
5. Copy the JSON for import into your catalog
---
## MCP Integration
Connect to MCP (Model Context Protocol) servers for enhanced processing:
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Lineage Graph  │────▶│   MCP Server    │────▶│    AI Model     │
│   Accelerator   │     │  (HuggingFace)  │     │    (Claude)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
### Configuration
1. Expand "MCP Server Configuration" in the UI
2. Enter your MCP server URL
3. Add API key (if required)
4. Click "Test Connection"
### Run Local MCP Server
```bash
uvicorn mcp_example.server:app --reload --port 9000
```
Then use `http://localhost:9000/mcp` as your server URL.
---
## Architecture
```mermaid
flowchart TD
    A[User Interface - Gradio] --> B[Input Parser]
    B --> C{Source Type}
    C -->|dbt| D[dbt Parser]
    C -->|Airflow| E[Airflow Parser]
    C -->|SQL| F[SQL Parser]
    C -->|JSON| G[JSON Parser]
    D & E & F & G --> H[LineageGraph]
    H --> I[Mermaid Generator]
    H --> J[Export Engine]
    I --> K[Visualization]
    J --> L[OpenLineage]
    J --> M[Collibra]
    J --> N[Purview]
    J --> O[Alation]
    subgraph Optional
        P[MCP Server] --> H
    end
```
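The SQL Parser box in the diagram is where CREATE statements become table-level edges. The sketch below uses regular expressions; that approach is an assumption, not the project's `app.py` code. It pairs each table or view created by a `CREATE ... AS SELECT` statement with every name that follows `FROM` or `JOIN` in its query. The function and pattern names are hypothetical. Plain `CREATE TABLE` statements without a query produce no edges, and CTE names or subqueries would need a proper SQL parser.

```python
import re
from typing import List, Tuple

# CREATE [OR REPLACE] TABLE|VIEW [IF NOT EXISTS] <name> AS <query> up to ';' or end.
CREATE_AS = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?(?:TABLE|VIEW)\s+(?:IF\s+NOT\s+EXISTS\s+)?"
    r"([\w.]+)\s+AS\s+(.*?)(?:;|$)",
    re.IGNORECASE | re.DOTALL,
)
# Table names that follow FROM or JOIN inside the query body.
SOURCE_TABLE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def ddl_edges(sql: str) -> List[Tuple[str, str]]:
    """Return de-duplicated (source_table, target_table) pairs in order of appearance."""
    edges: List[Tuple[str, str]] = []
    for target, query in CREATE_AS.findall(sql):
        for source in SOURCE_TABLE.findall(query):
            if (source, target) not in edges:
                edges.append((source, target))
    return edges


sql = """
CREATE TABLE staging.orders AS
SELECT * FROM raw.orders;

CREATE OR REPLACE VIEW marts.revenue AS
SELECT o.id, c.region
FROM staging.orders o
JOIN staging.customers c ON o.customer_id = c.id;
"""
for source, target in ddl_edges(sql):
    print(f"{source} -> {target}")
```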
### Project Structure
```
lineage-graph-accelerator/
├── app.py                # Main Gradio application
├── exporters/            # Data catalog exporters
│   ├── __init__.py
│   ├── base.py           # Base classes
│   ├── openlineage.py    # OpenLineage format
│   ├── collibra.py       # Collibra format
│   ├── purview.py        # Microsoft Purview format
│   └── alation.py        # Alation format
├── samples/              # Sample data files
│   ├── sample_metadata.json
│   ├── dbt_manifest_sample.json
│   ├── airflow_dag_sample.json
│   ├── sql_ddl_sample.sql
│   ├── warehouse_lineage_sample.json
│   ├── etl_pipeline_sample.json
│   └── complex_lineage_demo.json
├── mcp_example/          # Example MCP server
│   └── server.py
├── tests/                # Unit tests
│   └── test_app.py
├── memories/             # Agent configuration
├── USER_GUIDE.md         # Comprehensive user guide
├── BUILD_PLAN.md         # Development roadmap
└── requirements.txt
```
---
## Testing
```bash
# Activate virtual environment
source .venv/bin/activate
# Run unit tests
python -m unittest tests.test_app -v
# Run setup validation
python test_setup.py
```
---
## Requirements
- Python 3.9+
- Gradio 6.0.0+
- See `requirements.txt` for full dependencies
---
## Competition Submission
**Track**: Track 2 - MCP in Action (Productivity)
**Team Members**:
- [Aaman Lamba](https://aamanlamba.com) | [HuggingFace](https://huggingface.co/aamanlamba) | [GitHub](https://github.com/aamanlamba)
### Judging Criteria Alignment
| Criteria | Implementation |
|----------|----------------|
| **UI/UX Design** | Clean, professional interface with tabs, accordions, and color-coded visualizations |
| **Functionality** | Full MCP integration, multiple input formats, 5 export formats |
| **Creativity** | Novel approach to data lineage visualization with AI-powered parsing |
| **Documentation** | Comprehensive README, USER_GUIDE.md, inline comments |
| **Real-world Impact** | Solves critical enterprise need for data governance and compliance |
### Demo Video
**YouTube**: [Watch the Demo](https://youtu.be/U4Dfc7txa_0)
**Loom**: [Alternative Link](https://www.loom.com/share/3de27e88e01f4e97bfd13e4f0031f416)
**Highlights**:
- AI Assistant with Google Gemini generating lineage from natural language
- MCP Integration with Local Demo server
- Demo Gallery with 50+ node complex pipelines
- Export to Collibra, Purview, and Apache Atlas
- Interactive Mermaid visualizations with zoom and download
### Social Media Post
**LinkedIn**: [View the announcement post](https://www.linkedin.com/posts/aamanlamba_lineage-graph-accelerator-a-hugging-face-activity-7400658296166297600-n9a6)
---
## Roadmap
- [x] Gradio 6 upgrade for enhanced UI components
- [x] Agentic chatbot for natural language queries (Google Gemini)
- [x] Apache Atlas export support
- [ ] File upload functionality
- [x] Graph export as PNG/SVG
- [ ] Batch processing API
- [ ] Column-level lineage
---
## Contributing
Contributions welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
---
## License
MIT License - see [LICENSE](LICENSE) for details.
---
## Acknowledgments
- **Anthropic** - MCP Protocol and Claude
- **Gradio Team** - Amazing UI framework
- **HuggingFace** - Hosting and community
- **dbt Labs** - Inspiration for metadata standards
- **OpenLineage** - Open lineage specification
---
## Support
- **Documentation**: [USER_GUIDE.md](USER_GUIDE.md)
- **Author Website**: [aamanlamba.com](https://aamanlamba.com)
- **Issues**: [GitHub Issues](https://github.com/aamanlamba/lineage-graph-accelerator/issues)
- **Discussion**: [HuggingFace Community](https://huggingface.co/spaces/aamanlamba/Lineage-graph-accelerator/discussions)
---
<p align="center">
Built with ❤️ by <a href="https://aamanlamba.com"><strong>Aaman Lamba</strong></a> for the <strong>Gradio Agents & MCP Hackathon - Winter 2025</strong>
<br>
Celebrating MCP's 1st Birthday!
</p>