aamanlamba committed on
Commit
d71b95a
1 Parent(s): cd3cd19

updated tests

Files changed (5)
  1. LINKEDIN_POST.md +116 -0
  2. README.md +19 -12
  3. app.py +110 -6
  4. mcp_example/server.py +31 -0
  5. samples/sample_bigquery.sql +5 -0
LINKEDIN_POST.md ADDED
@@ -0,0 +1,116 @@
+ # LinkedIn Post - Lineage Graph Accelerator #MCP-1st-Birthday
+
+ ## Post Text (Ready to Share)
+
+ Excited to share **Lineage Graph Accelerator** 🔥 — an AI-powered agent I've built as a submission for the **#MCP-1st-Birthday celebration**!
+
+ 🎯 **What it does:**
+ Extract and visualize complex data lineage from multiple sources (BigQuery, dbt, Airflow, APIs) in seconds. Transform metadata into clear, interactive Mermaid diagrams that help teams understand data relationships and dependencies.
+
+ 🏗️ **Architecture Highlights:**
+ - Modular agent-based design with sub-agents for parsing, visualization, and integrations
+ - Built with **LangSmith's Agent Builder** for flexible orchestration
+ - **Gradio** UI for interactive exploration
+ - Client-side **Mermaid** rendering for instant visual feedback
+
+ ✨ **Key Features:**
+ - Multi-source metadata ingestion (Text, BigQuery, URLs/APIs)
+ - AI-assisted relationship extraction
+ - Dynamic graph visualization (Mermaid + DOT support)
+ - Lightweight, extensible, and fully tested
+
+ 📦 **Try it now:** Live demo available on Hugging Face Spaces (link in comments)
+
+ This project showcases how modular AI agents can tackle real-world data challenges. Special thanks to **@LangSmith**, **@Gradio**, and the **@HuggingFace** teams for the amazing tools!
+
+ 🔗 **Learn more:** Check out the repo for the quickstart guide, unit tests, and integration examples.
+
+ ---
+
+ ## Hashtags & Tags
+
+ ### Primary Tags (LinkedIn):
+ #MCP-1st-Birthday
+ #MCP
+ #ModelContextProtocol
+
+ ### Technology Tags:
+ #Gradio
+ #HuggingFace
+ #LangSmith
+ #DataLineage
+ #DataEngineering
+ #AI
+ #Agents
+ #Python
+
+ ### Topic Tags:
+ #DataViz
+ #MetadataManagement
+ #BigQuery
+ #dbt
+ #Airflow
+ #DataGovernance
+
+ ### Community Tags:
+ #OpenSource
+ #Developers
+ #TechInnovation
+
+ ---
+
+ ## Social Media Context
+
+ **For LinkedIn Comment (link to live demo):**
+ "Live demo: [Hugging Face Spaces URL]"
+
+ **For GitHub/Repo Share:**
+ "Open source project ready for contributions! Check the README for setup, testing, and integration guides. 🚀"
+
+ **For Twitter/X (condensed version):**
+ "Just released Lineage Graph Accelerator 🔥—an AI-powered agent for visualizing data lineage across BigQuery, dbt, Airflow & more. Built with LangSmith + Gradio. Submitted for #MCP-1st-Birthday 🎉 #DataEngineering #OpenSource"
+
+ ---
+
+ ## Suggested Mentions (Tag in comments or separate posts):
+
+ 1. @LangSmith / @LangChain (for Agent Builder inspiration)
+ 2. @Gradio (for the UI framework)
+ 3. @HuggingFace (for Spaces hosting)
+ 4. @MCP-1st-Birthday (Hugging Face organization)
+ 5. @dbt-labs (if integrating dbt features)
+ 6. @BigQuery team (for metadata integration)
+
+ ---
+
+ ## Optional Follow-up Post (48 hours later):
+
+ "Week 1 update on Lineage Graph Accelerator 🔥:
+ - [X] Deployed to Hugging Face Spaces
+ - [X] Full test coverage + CI/CD ready
+ - [ ] Community feedback & PRs
+ - [ ] dbt integration alpha
+ - [ ] Snowflake connector
+
+ Open source projects thrive on collaboration—if you're interested in data lineage, metadata management, or AI agents, I'd love your input! 👇"
+
+ ---
+
+ ## Tips for Sharing:
+
+ ✅ **Do:**
+ - Add a screenshot of the Mermaid diagram in action
+ - Link to the GitHub repo and live demo
+ - Tag the relevant teams/organizations
+ - Use 2-3 relevant hashtags per line
+ - Post during peak hours (8-10 AM, 12-1 PM, 5-6 PM on weekdays)
+
+ ⚠️ **Avoid:**
+ - Overly technical jargon without explanation
+ - Too many hashtags (stick to 15-20 max)
+ - Posting at midnight (low engagement)
+ - Neglecting to engage with comments in the first 2 hours
+
+ ---
+
+ **Questions?** Feel free to customize this post with your personal voice and experiences!
README.md CHANGED
@@ -107,17 +107,24 @@ Contributions welcome — open a PR or issue with ideas, bug reports, or integrations
  ## License

  MIT
- ---
- title: Lineage Graph Accelerator
- emoji: 🔥
- colorFrom: gray
- colorTo: gray
- sdk: gradio
- sdk_version: 5.49.1
- app_file: app.py
- pinned: false
- license: mit
- short_description: An agent that extracts data lineage, pipeline dependencies
- ---
+
+ ## Example MCP server (local testing)
+
+ If you want to test the MCP flow locally, start the example MCP server included in `mcp_example/`.
+
+ Run the example server (from the project root):
+
+ ```bash
+ # Activate your venv first if you use one
+ uvicorn mcp_example.server:app --reload --port 9000
+ ```
+
+ Then set the `MCP Server URL` in the UI to:
+
+ ```
+ http://127.0.0.1:9000/mcp
+ ```
+
+ When `MCP Server URL` is configured, the extraction buttons prefer the MCP server and send metadata to it; if the server returns a visualization, the app renders it. If `MCP Server URL` is empty, the app falls back to local extractor stubs.

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
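
For reference, the JSON body the app POSTs to the configured MCP URL looks like this (a sketch inferred from `send_to_mcp` in `app.py`; the field values are placeholders):

```json
{
  "metadata": "<pasted metadata, SQL, or a URL>",
  "source_type": "SQL DDL",
  "viz_format": "Mermaid"
}
```

Any HTTP client works for a quick check, e.g. POSTing this body to `http://127.0.0.1:9000/mcp`, with an `Authorization: Bearer <key>` header when an API key is configured.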
app.py CHANGED
@@ -6,6 +6,7 @@ A Gradio-based web interface for extracting and visualizing data lineage from va
  import gradio as gr
  import json
  import os
+ import requests
  from typing import Optional, Tuple


@@ -29,6 +30,79 @@ def render_mermaid(viz_code: str) -> str:
      )
      return f"<div class=\"mermaid\">{safe_viz}</div>{init_script}"

+
+ def send_to_mcp(server_url: str, api_key: str, metadata_text: str, source_type: str, viz_format: str) -> Tuple[str, str]:
+     """Send the metadata to an external MCP server (e.g., hosted on Hugging Face) and return visualization + summary.
+
+     This is optional — if no MCP server is configured the local stub extractors will be used.
+     """
+     if not server_url:
+         return "", "No MCP server URL configured."
+     try:
+         payload = {
+             "metadata": metadata_text,
+             "source_type": source_type,
+             "viz_format": viz_format,
+         }
+         headers = {}
+         if api_key:
+             headers["Authorization"] = f"Bearer {api_key}"
+         resp = requests.post(server_url, json=payload, headers=headers, timeout=15)
+         if 200 <= resp.status_code < 300:
+             data = resp.json()
+             viz = data.get("visualization") or data.get("viz") or data.get("mermaid", "")
+             summary = data.get("summary", "Processed by MCP server.")
+             if viz:
+                 return render_mermaid(viz), summary
+             return "", summary
+         return "", f"MCP server returned status {resp.status_code}: {resp.text[:200]}"
+     except Exception as e:
+         return "", f"Error contacting MCP server: {e}"
+
+
+ def test_mcp_connection(server_url: str, api_key: str) -> str:
+     """Simple health check: send a small GET request to the MCP server."""
+     if not server_url:
+         return "No MCP server URL configured."
+     try:
+         headers = {}
+         if api_key:
+             headers["Authorization"] = f"Bearer {api_key}"
+         resp = requests.get(server_url, headers=headers, timeout=10)
+         return f"MCP server responded: {resp.status_code} {resp.reason}"
+     except Exception as e:
+         return f"Error contacting MCP server: {e}"
+
+
+ # Wrapper handlers: prefer the MCP server if configured, otherwise fall back to local extractors.
+ def handle_extract_text(metadata_text: str, source_type: str, visualization_format: str, mcp_server: str, mcp_api_key: str) -> Tuple[str, str]:
+     if mcp_server:
+         viz, summary = send_to_mcp(mcp_server, mcp_api_key, metadata_text, source_type, visualization_format)
+         # If MCP returned something, use it. Otherwise fall back to local.
+         if viz or (summary and not summary.startswith("Error")):
+             return viz, summary
+     return extract_lineage_from_text(metadata_text, source_type, visualization_format)
+
+
+ def handle_extract_bigquery(project_id: str, query: str, api_key: str, visualization_format: str, mcp_server: str, mcp_api_key: str) -> Tuple[str, str]:
+     if mcp_server:
+         # Send the query as metadata to MCP; source_type indicates BigQuery
+         viz, summary = send_to_mcp(mcp_server, mcp_api_key, query, "BigQuery", visualization_format)
+         if viz or (summary and not summary.startswith("Error")):
+             return viz, summary
+     return extract_lineage_from_bigquery(project_id, query, api_key, visualization_format)
+
+
+ def handle_extract_url(url: str, visualization_format: str, mcp_server: str, mcp_api_key: str) -> Tuple[str, str]:
+     if mcp_server:
+         # Send the URL (MCP can fetch it or interpret it) as metadata
+         viz, summary = send_to_mcp(mcp_server, mcp_api_key, url, "URL", visualization_format)
+         if viz or (summary and not summary.startswith("Error")):
+             return viz, summary
+     return extract_lineage_from_url(url, visualization_format)
+
  # Note: This is a template. You'll need to integrate with your actual agent backend.
  # This could be through an API, Claude SDK, or other agent framework.

@@ -156,6 +230,7 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
              placeholder="Paste your metadata here (JSON, YAML, SQL, etc.)",
              lines=15
          )
+         load_sample_text_btn = gr.Button("Load sample metadata")
          source_type_text = gr.Dropdown(
              choices=["dbt Manifest", "Airflow DAG", "SQL DDL", "Custom JSON", "Other"],
              label="Source Type",
@@ -179,10 +254,19 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
          )

          extract_btn_text.click(
-             fn=extract_lineage_from_text,
-             inputs=[metadata_input, source_type_text, viz_format_text],
+             fn=handle_extract_text,
+             inputs=[metadata_input, source_type_text, viz_format_text, mcp_server, mcp_api_key],
              outputs=[output_viz_text, output_summary_text]
          )
+         def load_sample_text():
+             p = os.path.join(os.path.dirname(__file__), "samples", "sample_metadata.json")
+             try:
+                 with open(p, "r") as f:
+                     return f.read()
+             except Exception:
+                 return "{\"error\": \"Could not load sample metadata\"}"
+
+         load_sample_text_btn.click(fn=load_sample_text, inputs=[], outputs=[metadata_input])

      # Tab 2: BigQuery
      with gr.Tab("BigQuery"):
@@ -197,6 +281,7 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
              placeholder="SELECT * FROM `project.dataset.INFORMATION_SCHEMA.TABLES`",
              lines=8
          )
+         load_sample_bq_btn = gr.Button("Load sample BigQuery query")
          bq_api_key = gr.Textbox(
              label="API Key / Credentials",
              placeholder="Enter your credentials",
@@ -220,10 +305,19 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
          )

          extract_btn_bq.click(
-             fn=extract_lineage_from_bigquery,
-             inputs=[bq_project, bq_query, bq_api_key, viz_format_bq],
+             fn=handle_extract_bigquery,
+             inputs=[bq_project, bq_query, bq_api_key, viz_format_bq, mcp_server, mcp_api_key],
              outputs=[output_viz_bq, output_summary_bq]
          )
+         def load_sample_bq():
+             p = os.path.join(os.path.dirname(__file__), "samples", "sample_bigquery.sql")
+             try:
+                 with open(p, "r") as f:
+                     return f.read()
+             except Exception:
+                 return "-- Could not load sample BigQuery SQL"
+
+         load_sample_bq_btn.click(fn=load_sample_bq, inputs=[], outputs=[bq_query])

      # Tab 3: URL/API
      with gr.Tab("URL/API"):
@@ -233,6 +327,7 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
              label="URL",
              placeholder="https://api.example.com/metadata"
          )
+         load_sample_url_btn = gr.Button("Load sample API metadata")
          viz_format_url = gr.Dropdown(
              choices=["Mermaid", "DOT/Graphviz", "Text", "All"],
              label="Visualization Format",
@@ -251,10 +346,19 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
          )

          extract_btn_url.click(
-             fn=extract_lineage_from_url,
-             inputs=[url_input, viz_format_url],
+             fn=handle_extract_url,
+             inputs=[url_input, viz_format_url, mcp_server, mcp_api_key],
              outputs=[output_viz_url, output_summary_url]
          )
+         def load_sample_url():
+             p = os.path.join(os.path.dirname(__file__), "samples", "sample_api_metadata.json")
+             try:
+                 with open(p, "r") as f:
+                     return f.read()
+             except Exception:
+                 return "{\"error\": \"Could not load sample API metadata\"}"
+
+         load_sample_url_btn.click(fn=load_sample_url, inputs=[], outputs=[url_input])

      gr.Markdown("""
      ---
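
The three wrapper handlers in this diff share one fallback rule: use the MCP result when it contains a visualization or a non-error summary, otherwise call the local extractor. A minimal sketch of that rule (the stub function here is hypothetical, standing in for the local extractors):

```python
def prefer_mcp(mcp_viz: str, mcp_summary: str, local_fn, *args):
    # Mirrors handle_extract_text: keep the MCP result unless it is empty or an error
    if mcp_viz or (mcp_summary and not mcp_summary.startswith("Error")):
        return mcp_viz, mcp_summary
    return local_fn(*args)


def local_stub(metadata, source_type, viz_format):
    # Hypothetical stand-in for extract_lineage_from_text
    return "<div class=\"mermaid\">graph TD\n A --> B</div>", "Local stub result"


# MCP succeeded: its output wins
print(prefer_mcp("viz-html", "ok", local_stub, "m", "dbt Manifest", "Mermaid"))

# MCP errored: fall back to the local stub
viz, summary = prefer_mcp("", "Error contacting MCP server: timeout",
                          local_stub, "m", "dbt Manifest", "Mermaid")
print(summary)  # Local stub result
```

Note that an empty visualization with a successful summary (e.g. "Processed by MCP server.") still counts as an MCP result, so the local extractor only runs when the MCP call errored or returned nothing usable.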
mcp_example/server.py ADDED
@@ -0,0 +1,31 @@
+ from fastapi import FastAPI
+ from pydantic import BaseModel
+
+ app = FastAPI(title="Example MCP Server")
+
+
+ class MCPRequest(BaseModel):
+     metadata: str
+     source_type: str
+     viz_format: str
+
+
+ @app.get("/")
+ def root():
+     return {"status": "ok", "message": "Example MCP server running"}
+
+
+ @app.post("/mcp")
+ def mcp_endpoint(req: MCPRequest):
+     """Simple example endpoint that returns a sample mermaid diagram and summary.
+
+     This is intentionally minimal — a real MCP server would run the agent pipeline and
+     return an appropriate visualization and structured metadata.
+     """
+     # Create a small mermaid graph that incorporates the source_type for demonstration
+     mermaid = f"graph TD\n A[{req.source_type}] --> B[Processed by Example MCP]"
+     summary = f"Example MCP processed source_type={req.source_type}, viz_format={req.viz_format}"
+     return {"mermaid": mermaid, "visualization": mermaid, "summary": summary}
+
+
+ # Run with: uvicorn mcp_example.server:app --reload --port 9000
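
To sanity-check the response shape without starting uvicorn, the endpoint's logic can be reproduced as a plain function (a dependency-free sketch; `fake_mcp_endpoint` is hypothetical and mirrors the `mcp_endpoint` handler above):

```python
def fake_mcp_endpoint(metadata: str, source_type: str, viz_format: str) -> dict:
    # Same response construction as the example server's /mcp handler
    mermaid = f"graph TD\n A[{source_type}] --> B[Processed by Example MCP]"
    summary = f"Example MCP processed source_type={source_type}, viz_format={viz_format}"
    return {"mermaid": mermaid, "visualization": mermaid, "summary": summary}


resp = fake_mcp_endpoint("{}", "dbt Manifest", "Mermaid")
print(resp["visualization"].splitlines()[0])  # graph TD
```

The app accepts any of the `visualization`, `viz`, or `mermaid` keys, so returning both `mermaid` and `visualization` as the server above does is redundant but harmless.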
samples/sample_bigquery.sql ADDED
@@ -0,0 +1,5 @@
+ -- Sample BigQuery metadata query (for demonstration)
+ -- This is not run automatically; it's a sample string users can paste into the UI.
+ SELECT table_name, column_name, data_type
+ FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
+ WHERE table_name IN ('raw_customers', 'clean_customers', 'orders');