aamanlamba committed on
Commit
d71b95a
1 Parent(s): cd3cd19

updated tests

Files changed (5)
  1. LINKEDIN_POST.md +116 -0
  2. README.md +19 -12
  3. app.py +110 -6
  4. mcp_example/server.py +31 -0
  5. samples/sample_bigquery.sql +5 -0
LINKEDIN_POST.md ADDED
@@ -0,0 +1,116 @@
+ # LinkedIn Post - Lineage Graph Accelerator #MCP-1st-Birthday
+
+ ## Post Text (Ready to Share)
+
+ Excited to share **Lineage Graph Accelerator** 🔥 — an AI-powered agent I've built as a submission for the **#MCP-1st-Birthday celebration**!
+
+ 🎯 **What it does:**
+ Extract and visualize complex data lineage from multiple sources (BigQuery, dbt, Airflow, APIs) in seconds. Transform metadata into clear, interactive Mermaid diagrams that help teams understand data relationships and dependencies.
+
+ 🏗️ **Architecture Highlights:**
+ - Modular agent-based design with sub-agents for parsing, visualization, and integrations
+ - Built with **LangSmith's Agent Builder** for flexible orchestration
+ - **Gradio** UI for interactive exploration
+ - Client-side **Mermaid** rendering for instant visual feedback
+
+ ✨ **Key Features:**
+ - Multi-source metadata ingestion (Text, BigQuery, URLs/APIs)
+ - AI-assisted relationship extraction
+ - Dynamic graph visualization (Mermaid + DOT support)
+ - Lightweight, extensible, and fully tested
+
+ 📦 **Try it now:** Live demo available on Hugging Face Spaces (link in comments)
+
+ This project showcases how modular AI agents can tackle real-world data challenges. Special thanks to **@LangSmith**, **@Gradio**, and the **@HuggingFace** teams for the amazing tools!
+
+ 🔗 **Learn more:** Check out the repo for the quickstart guide, unit tests, and integration examples.
+
+ ---
+
+ ## Hashtags & Tags
+
+ ### Primary Tags (LinkedIn):
+ #MCP-1st-Birthday
+ #MCP
+ #ModelContextProtocol
+
+ ### Technology Tags:
+ #Gradio
+ #HuggingFace
+ #LangSmith
+ #DataLineage
+ #DataEngineering
+ #AI
+ #Agents
+ #Python
+
+ ### Topic Tags:
+ #DataViz
+ #MetadataManagement
+ #BigQuery
+ #dbt
+ #Airflow
+ #DataGovernance
+
+ ### Community Tags:
+ #OpenSource
+ #Developers
+ #TechInnovation
+
+ ---
+
+ ## Social Media Context
+
+ **For LinkedIn Comment (link to live demo):**
+ "Live demo: [Hugging Face Spaces URL]"
+
+ **For GitHub/Repo Share:**
+ "Open source project ready for contributions! Check the README for setup, testing, and integration guides. 🚀"
+
+ **For Twitter/X (condensed version):**
+ "Just released Lineage Graph Accelerator 🔥—an AI-powered agent for visualizing data lineage across BigQuery, dbt, Airflow & more. Built with LangSmith + Gradio. Submitted for #MCP-1st-Birthday 🎉 #DataEngineering #OpenSource"
+
+ ---
+
+ ## Suggested Mentions (Tag in comments or separate posts):
+
+ 1. @LangSmith / @LangChain (for Agent Builder inspiration)
+ 2. @Gradio (for the UI framework)
+ 3. @HuggingFace (for Spaces hosting)
+ 4. @MCP-1st-Birthday (Hugging Face organization)
+ 5. @dbt-labs (if integrating dbt features)
+ 6. @BigQuery team (for metadata integration)
+
+ ---
+
+ ## Optional Follow-up Post (48 hours later):
+
+ "Week 1 update on Lineage Graph Accelerator 🔥:
+ - [X] Deployed to Hugging Face Spaces
+ - [X] Full test coverage + CI/CD ready
+ - [ ] Community feedback & PRs
+ - [ ] dbt integration alpha
+ - [ ] Snowflake connector
+
+ Open source projects thrive on collaboration—if you're interested in data lineage, metadata management, or AI agents, I'd love your input! 👇"
+
+ ---
+
+ ## Tips for Sharing:
+
+ ✅ **Do:**
+ - Add a screenshot of the Mermaid diagram in action
+ - Link to the GitHub repo and live demo
+ - Tag the relevant teams/organizations
+ - Use 2-3 relevant hashtags per line
+ - Post during peak hours (8-10 AM, 12-1 PM, 5-6 PM on weekdays)
+
+ ⚠️ **Avoid:**
+ - Overly technical jargon without explanation
+ - Too many hashtags (stick to 15-20 max)
+ - Posting at midnight (low engagement)
+ - Neglecting to engage with comments in the first 2 hours
+
+ ---
+
+ **Questions?** Feel free to customize this post with your personal voice and experiences!
README.md CHANGED
@@ -107,17 +107,24 @@ Contributions welcome — open a PR or issue with ideas, bug reports, or integrations
  ## License

  MIT
- ---
- title: Lineage Graph Accelerator
- emoji: 🔥
- colorFrom: gray
- colorTo: gray
- sdk: gradio
- sdk_version: 5.49.1
- app_file: app.py
- pinned: false
- license: mit
- short_description: An agent that extracts data lineage, pipeline dependencies
- ---
+
+ ## Example MCP server (local testing)
+
+ If you want to test the MCP flow locally, start the example MCP server included in `mcp_example/`.
+
+ Run the example server (from the project root):
+
+ ```bash
+ # Activate your venv first if you use one
+ uvicorn mcp_example.server:app --reload --port 9000
+ ```
+
+ Then set the `MCP Server URL` in the UI to:
+
+ ```
+ http://127.0.0.1:9000/mcp
+ ```
+
+ When `MCP Server URL` is configured, the extraction buttons prefer the MCP server and send metadata to it; if the server returns a visualization, the app renders it. If `MCP Server URL` is empty, the app falls back to local extractor stubs.

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
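
For reference, the JSON body the app POSTs to the configured MCP URL looks like this (a sketch inferred from `send_to_mcp` in `app.py`; the field values are placeholders):

```json
{
  "metadata": "<pasted metadata, SQL, or a URL>",
  "source_type": "SQL DDL",
  "viz_format": "Mermaid"
}
```

Any HTTP client works for a quick check, e.g. POSTing this body to `http://127.0.0.1:9000/mcp`, with an `Authorization: Bearer <key>` header when an API key is configured.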
app.py CHANGED
@@ -6,6 +6,7 @@ A Gradio-based web interface for extracting and visualizing data lineage from va
  import gradio as gr
  import json
  import os
+ import requests
  from typing import Optional, Tuple


@@ -29,6 +30,79 @@ def render_mermaid(viz_code: str) -> str:
      )
      return f"<div class=\"mermaid\">{safe_viz}</div>{init_script}"

+
+ def send_to_mcp(server_url: str, api_key: str, metadata_text: str, source_type: str, viz_format: str) -> Tuple[str, str]:
+     """Send the metadata to an external MCP server (e.g., hosted on Hugging Face) and return visualization + summary.
+
+     This is optional — if no MCP server is configured the local stub extractors will be used.
+     """
+     if not server_url:
+         return "", "No MCP server URL configured."
+     try:
+         payload = {
+             "metadata": metadata_text,
+             "source_type": source_type,
+             "viz_format": viz_format,
+         }
+         headers = {}
+         if api_key:
+             headers["Authorization"] = f"Bearer {api_key}"
+         resp = requests.post(server_url, json=payload, headers=headers, timeout=15)
+         if 200 <= resp.status_code < 300:
+             data = resp.json()
+             viz = data.get("visualization") or data.get("viz") or data.get("mermaid", "")
+             summary = data.get("summary", "Processed by MCP server.")
+             if viz:
+                 return render_mermaid(viz), summary
+             return "", summary
+         return "", f"MCP server returned status {resp.status_code}: {resp.text[:200]}"
+     except Exception as e:
+         return "", f"Error contacting MCP server: {e}"
+
+
+ def test_mcp_connection(server_url: str, api_key: str) -> str:
+     """Simple health check: send a small GET request to the MCP server."""
+     if not server_url:
+         return "No MCP server URL configured."
+     try:
+         headers = {}
+         if api_key:
+             headers["Authorization"] = f"Bearer {api_key}"
+         resp = requests.get(server_url, headers=headers, timeout=10)
+         return f"MCP server responded: {resp.status_code} {resp.reason}"
+     except Exception as e:
+         return f"Error contacting MCP server: {e}"
+
+
+ # Wrapper handlers: prefer the MCP server if configured, otherwise fall back to local extractors.
+ def handle_extract_text(metadata_text: str, source_type: str, visualization_format: str, mcp_server: str, mcp_api_key: str) -> Tuple[str, str]:
+     if mcp_server:
+         viz, summary = send_to_mcp(mcp_server, mcp_api_key, metadata_text, source_type, visualization_format)
+         # If MCP returned something, use it. Otherwise fall back to local.
+         if viz or (summary and not summary.startswith("Error")):
+             return viz, summary
+     return extract_lineage_from_text(metadata_text, source_type, visualization_format)
+
+
+ def handle_extract_bigquery(project_id: str, query: str, api_key: str, visualization_format: str, mcp_server: str, mcp_api_key: str) -> Tuple[str, str]:
+     if mcp_server:
+         # Send the query as metadata to MCP; source_type indicates BigQuery
+         viz, summary = send_to_mcp(mcp_server, mcp_api_key, query, "BigQuery", visualization_format)
+         if viz or (summary and not summary.startswith("Error")):
+             return viz, summary
+     return extract_lineage_from_bigquery(project_id, query, api_key, visualization_format)
+
+
+ def handle_extract_url(url: str, visualization_format: str, mcp_server: str, mcp_api_key: str) -> Tuple[str, str]:
+     if mcp_server:
+         # Send the URL (MCP can fetch it or interpret it) as metadata
+         viz, summary = send_to_mcp(mcp_server, mcp_api_key, url, "URL", visualization_format)
+         if viz or (summary and not summary.startswith("Error")):
+             return viz, summary
+     return extract_lineage_from_url(url, visualization_format)
+
  # Note: This is a template. You'll need to integrate with your actual agent backend.
  # This could be through an API, Claude SDK, or other agent framework.

@@ -156,6 +230,7 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
              placeholder="Paste your metadata here (JSON, YAML, SQL, etc.)",
              lines=15
          )
+         load_sample_text_btn = gr.Button("Load sample metadata")
          source_type_text = gr.Dropdown(
              choices=["dbt Manifest", "Airflow DAG", "SQL DDL", "Custom JSON", "Other"],
              label="Source Type",
@@ -179,10 +254,19 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
          )

          extract_btn_text.click(
-             fn=extract_lineage_from_text,
-             inputs=[metadata_input, source_type_text, viz_format_text],
+             fn=handle_extract_text,
+             inputs=[metadata_input, source_type_text, viz_format_text, mcp_server, mcp_api_key],
              outputs=[output_viz_text, output_summary_text]
          )
+         def load_sample_text():
+             p = os.path.join(os.path.dirname(__file__), "samples", "sample_metadata.json")
+             try:
+                 with open(p, "r") as f:
+                     return f.read()
+             except Exception:
+                 return "{\"error\": \"Could not load sample metadata\"}"
+
+         load_sample_text_btn.click(fn=load_sample_text, inputs=[], outputs=[metadata_input])

      # Tab 2: BigQuery
      with gr.Tab("BigQuery"):
@@ -197,6 +281,7 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
              placeholder="SELECT * FROM `project.dataset.INFORMATION_SCHEMA.TABLES`",
              lines=8
          )
+         load_sample_bq_btn = gr.Button("Load sample BigQuery query")
          bq_api_key = gr.Textbox(
              label="API Key / Credentials",
              placeholder="Enter your credentials",
@@ -220,10 +305,19 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
          )

          extract_btn_bq.click(
-             fn=extract_lineage_from_bigquery,
-             inputs=[bq_project, bq_query, bq_api_key, viz_format_bq],
+             fn=handle_extract_bigquery,
+             inputs=[bq_project, bq_query, bq_api_key, viz_format_bq, mcp_server, mcp_api_key],
              outputs=[output_viz_bq, output_summary_bq]
          )
+         def load_sample_bq():
+             p = os.path.join(os.path.dirname(__file__), "samples", "sample_bigquery.sql")
+             try:
+                 with open(p, "r") as f:
+                     return f.read()
+             except Exception:
+                 return "-- Could not load sample BigQuery SQL"
+
+         load_sample_bq_btn.click(fn=load_sample_bq, inputs=[], outputs=[bq_query])

      # Tab 3: URL/API
      with gr.Tab("URL/API"):
@@ -233,6 +327,7 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
              label="URL",
              placeholder="https://api.example.com/metadata"
          )
+         load_sample_url_btn = gr.Button("Load sample API metadata")
          viz_format_url = gr.Dropdown(
              choices=["Mermaid", "DOT/Graphviz", "Text", "All"],
              label="Visualization Format",
@@ -251,10 +346,19 @@ with gr.Blocks(title="Lineage Graph Extractor", theme=gr.themes.Soft()) as demo:
          )

          extract_btn_url.click(
-             fn=extract_lineage_from_url,
-             inputs=[url_input, viz_format_url],
+             fn=handle_extract_url,
+             inputs=[url_input, viz_format_url, mcp_server, mcp_api_key],
              outputs=[output_viz_url, output_summary_url]
          )
+         def load_sample_url():
+             p = os.path.join(os.path.dirname(__file__), "samples", "sample_api_metadata.json")
+             try:
+                 with open(p, "r") as f:
+                     return f.read()
+             except Exception:
+                 return "{\"error\": \"Could not load sample API metadata\"}"
+
+         load_sample_url_btn.click(fn=load_sample_url, inputs=[], outputs=[url_input])

      gr.Markdown("""
      ---
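
The three wrapper handlers in this diff share one fallback rule: use the MCP result when it contains a visualization or a non-error summary, otherwise call the local extractor. A minimal sketch of that rule (the stub function here is hypothetical, standing in for the local extractors):

```python
def prefer_mcp(mcp_viz: str, mcp_summary: str, local_fn, *args):
    # Mirrors handle_extract_text: keep the MCP result unless it is empty or an error
    if mcp_viz or (mcp_summary and not mcp_summary.startswith("Error")):
        return mcp_viz, mcp_summary
    return local_fn(*args)


def local_stub(metadata, source_type, viz_format):
    # Hypothetical stand-in for extract_lineage_from_text
    return "<div class=\"mermaid\">graph TD\n A --> B</div>", "Local stub result"


# MCP succeeded: its output wins
print(prefer_mcp("viz-html", "ok", local_stub, "m", "dbt Manifest", "Mermaid"))

# MCP errored: fall back to the local stub
viz, summary = prefer_mcp("", "Error contacting MCP server: timeout",
                          local_stub, "m", "dbt Manifest", "Mermaid")
print(summary)  # Local stub result
```

Note that an empty visualization with a successful summary (e.g. "Processed by MCP server.") still counts as an MCP result, so the local extractor only runs when the MCP call errored or returned nothing usable.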
mcp_example/server.py ADDED
@@ -0,0 +1,31 @@
+ from fastapi import FastAPI
+ from pydantic import BaseModel
+
+ app = FastAPI(title="Example MCP Server")
+
+
+ class MCPRequest(BaseModel):
+     metadata: str
+     source_type: str
+     viz_format: str
+
+
+ @app.get("/")
+ def root():
+     return {"status": "ok", "message": "Example MCP server running"}
+
+
+ @app.post("/mcp")
+ def mcp_endpoint(req: MCPRequest):
+     """Simple example endpoint that returns a sample mermaid diagram and summary.
+
+     This is intentionally minimal — a real MCP server would run the agent pipeline and
+     return an appropriate visualization and structured metadata.
+     """
+     # Create a small mermaid graph that incorporates the source_type for demonstration
+     mermaid = f"graph TD\n A[{req.source_type}] --> B[Processed by Example MCP]"
+     summary = f"Example MCP processed source_type={req.source_type}, viz_format={req.viz_format}"
+     return {"mermaid": mermaid, "visualization": mermaid, "summary": summary}
+
+
+ # Run with: uvicorn mcp_example.server:app --reload --port 9000
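
To sanity-check the response shape without starting uvicorn, the endpoint's logic can be reproduced as a plain function (a dependency-free sketch; `fake_mcp_endpoint` is hypothetical and mirrors the `mcp_endpoint` handler above):

```python
def fake_mcp_endpoint(metadata: str, source_type: str, viz_format: str) -> dict:
    # Same response construction as the example server's /mcp handler
    mermaid = f"graph TD\n A[{source_type}] --> B[Processed by Example MCP]"
    summary = f"Example MCP processed source_type={source_type}, viz_format={viz_format}"
    return {"mermaid": mermaid, "visualization": mermaid, "summary": summary}


resp = fake_mcp_endpoint("{}", "dbt Manifest", "Mermaid")
print(resp["visualization"].splitlines()[0])  # graph TD
```

The app accepts any of the `visualization`, `viz`, or `mermaid` keys, so returning both `mermaid` and `visualization` as the server above does is redundant but harmless.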
samples/sample_bigquery.sql ADDED
@@ -0,0 +1,5 @@
+ -- Sample BigQuery metadata query (for demonstration)
+ -- This is not run automatically; it's a sample string users can paste into the UI.
+ SELECT table_name, column_name, data_type
+ FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
+ WHERE table_name IN ('raw_customers', 'clean_customers', 'orders');