# Local Setup Guide - Lineage Graph Extractor

This guide provides detailed instructions for setting up and running the Lineage Graph Extractor agent locally.
## Table of Contents
- System Requirements
- Installation Methods
- Configuration
- Usage Scenarios
- Advanced Configuration
- Troubleshooting
## System Requirements

### Minimum Requirements
- OS: Windows 10+, macOS 10.15+, or Linux
- Python: 3.9 or higher
- Memory: 2GB RAM minimum
- Disk Space: 100MB for agent files
### Recommended Requirements
- Python: 3.10+
- Memory: 4GB RAM
- Internet: Stable connection for API calls
## Installation Methods

### Method 1: Standalone Use (Recommended)

This method uses the agent configuration files with any platform that supports the Anthropic API.
1. **Download the agent**

   ```bash
   # If you have a git repository
   git clone <repository-url>
   cd local_clone

   # Or extract from a downloaded archive
   unzip lineage-graph-extractor.zip
   cd lineage-graph-extractor
   ```

2. **Set up the environment**

   ```bash
   # Copy the environment template
   cp .env.example .env
   ```

3. **Edit the `.env` file**

   ```bash
   # Edit with your preferred editor
   nano .env   # or vim .env, or code .env (VS Code)
   ```

   Add your credentials:

   ```bash
   ANTHROPIC_API_KEY=sk-ant-your-key-here
   GOOGLE_CLOUD_PROJECT=your-gcp-project
   GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
   ```

4. **Install Python dependencies** (optional, for the examples)

   ```bash
   pip install anthropic google-cloud-bigquery requests pyyaml
   ```
### Method 2: Claude Desktop Integration

If you're using Claude Desktop or a similar platform:

1. **Locate your agent configuration directory**
   - Claude Desktop: `~/.config/claude/agents/` (Linux/Mac) or `%APPDATA%\claude\agents\` (Windows)
   - Other platforms: check the platform documentation

2. **Copy the memories folder**

   ```bash
   # Linux/Mac
   cp -r memories ~/.config/claude/agents/lineage-extractor/

   # Windows
   xcopy /E /I memories %APPDATA%\claude\agents\lineage-extractor\
   ```

3. **Configure API credentials** in your platform's settings

4. **Restart the application**
### Method 3: Python Integration

To integrate into your own Python application:

1. **Install dependencies**

   ```bash
   pip install anthropic python-dotenv
   ```

2. **Use the integration example**

   ```python
   from anthropic import Anthropic
   from dotenv import load_dotenv
   import os

   # Load environment variables
   load_dotenv()

   # Initialize the client
   client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

   # Load the agent configuration
   with open("memories/agent.md", "r") as f:
       system_prompt = f.read()

   # Use the agent
   response = client.messages.create(
       model="claude-3-5-sonnet-20241022",
       max_tokens=4000,
       system=system_prompt,
       messages=[{
           "role": "user",
           "content": "Extract lineage from this metadata: ..."
       }]
   )
   print(response.content[0].text)
   ```
## Configuration

### API Keys Setup

#### Anthropic API Key
- Go to https://console.anthropic.com/
- Create an account or sign in
- Navigate to API Keys
- Create a new key
- Copy the key to your `.env` file
#### Google Cloud (for BigQuery)
- Go to https://console.cloud.google.com/
- Create a project or select existing
- Enable BigQuery API
- Create a service account:
- Go to IAM & Admin → Service Accounts
- Create service account
- Grant "BigQuery Data Viewer" role
- Create JSON key
- Download the JSON key and reference its path in `.env`
#### Tavily (for web search)
- Go to https://tavily.com/
- Sign up for an account
- Get your API key
- Add the key to your `.env` file
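Once the keys are in place, a quick sanity check can confirm that they actually load. The sketch below is illustrative, not part of the agent: `missing_env_vars` is a hypothetical helper, and `TAVILY_API_KEY` is an assumed variable name for the Tavily key (the other names follow the `.env` template earlier in this guide).

```python
import os

# TAVILY_API_KEY is an assumed name; the other names follow the
# .env template shown in the installation steps above.
REQUIRED_VARS = [
    "ANTHROPIC_API_KEY",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "TAVILY_API_KEY",
]

def missing_env_vars(required):
    """Return the names of variables that are unset or empty."""
    return [name for name in required if not os.getenv(name)]

missing = missing_env_vars(REQUIRED_VARS)
print("Missing:", missing or "none")
```

Run `load_dotenv()` first (as in Method 3) so that values from `.env` are visible to `os.getenv`.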
### Tool Configuration

Edit `memories/tools.json` to customize the available tools (note that standard JSON does not allow comments, so keep the file comment-free):

```json
{
  "tools": [
    "bigquery_execute_query",
    "read_url_content",
    "google_sheets_read_range",
    "tavily_web_search"
  ],
  "interrupt_config": {
    "bigquery_execute_query": false,
    "read_url_content": false,
    "google_sheets_read_range": false,
    "tavily_web_search": false
  }
}
```
**Available Tools:**

- `bigquery_execute_query`: Execute SQL queries on BigQuery
- `read_url_content`: Fetch content from URLs/APIs
- `google_sheets_read_range`: Read data from Google Sheets
- `tavily_web_search`: Perform web searches
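Since a malformed `tools.json` is an easy way to break the agent, a small structural check can catch common mistakes before startup. This is a hedged sketch, not part of the agent itself; `check_tools_config` is a hypothetical helper name.

```python
import json

def check_tools_config(config):
    """Report basic structural problems in a tools.json-style dict."""
    problems = []
    tools = config.get("tools")
    if not isinstance(tools, list) or not tools:
        problems.append("'tools' must be a non-empty list")
        tools = []
    # Every interrupt_config entry should name a tool that is enabled
    for name in config.get("interrupt_config", {}):
        if name not in tools:
            problems.append(f"interrupt_config references unknown tool: {name}")
    return problems

# Validate an inline example (parse the real file with json.load instead)
example = json.loads("""
{
  "tools": ["bigquery_execute_query", "read_url_content"],
  "interrupt_config": {"bigquery_execute_query": false}
}
""")
print(check_tools_config(example) or "tools.json structure looks OK")
```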
### Subagent Configuration

Customize subagents by editing their configuration files:

**Metadata Parser** (`memories/subagents/metadata_parser/`)

- `agent.md`: Instructions for parsing metadata
- `tools.json`: Tools available to the parser

**Graph Visualizer** (`memories/subagents/graph_visualizer/`)

- `agent.md`: Instructions for creating visualizations
- `tools.json`: Tools available to the visualizer
## Usage Scenarios

### Scenario 1: BigQuery Lineage Extraction

```python
from anthropic import Anthropic
import os

client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

with open("memories/agent.md", "r") as f:
    system_prompt = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": "Extract lineage from BigQuery project: my-project, dataset: analytics"
    }]
)
print(response.content[0].text)
```
### Scenario 2: File-Based Metadata

```python
# Read metadata from a file
with open("dbt_manifest.json", "r") as f:
    metadata = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": f"Extract lineage from this dbt manifest:\n\n{metadata}"
    }]
)
```
### Scenario 3: API Metadata

```python
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": "Extract lineage from API: https://api.example.com/metadata"
    }]
)
```
## Advanced Configuration

### Custom Visualization Formats

To add custom visualization formats, edit `memories/subagents/graph_visualizer/agent.md`:

```markdown
### 4. Custom Format
Generate a custom format with:
- Your specific requirements
- Custom styling rules
- Special formatting needs
```
### Adding New Metadata Sources

To support new metadata sources:

1. Add the tool to `memories/tools.json`
2. Update `memories/agent.md` with source-specific instructions
3. Update `memories/subagents/metadata_parser/agent.md` if needed
### MCP Integration

To integrate with Model Context Protocol servers:

1. Check whether MCP tools are available in the `/tools` directory
2. Add the MCP tools to `memories/tools.json`
3. Configure the MCP server connection
4. See `memories/mcp_integration.md` (if available)
## Troubleshooting

### Common Issues

#### 1. Authentication Errors

**Problem:** API authentication fails

**Solutions:**

- Verify the API key is correct in `.env`
- Check that the key hasn't expired
- Ensure environment variables are loaded
- Try regenerating the API key

```bash
# Test the Anthropic API key
python -c "from anthropic import Anthropic; import os; from dotenv import load_dotenv; load_dotenv(); client = Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY')); print('✓ API key works')"
```
#### 2. BigQuery Access Issues

**Problem:** Cannot access BigQuery

**Solutions:**

- Verify the service account has BigQuery permissions
- Check that the project ID is correct
- Ensure the JSON key file path is correct
- Test the credentials:

```bash
# Test BigQuery access
gcloud auth activate-service-account --key-file=/path/to/key.json
bq ls --project_id=your-project-id
```
#### 3. Import Errors

**Problem:** `ModuleNotFoundError`

**Solutions:**

```bash
# Install missing packages
pip install anthropic google-cloud-bigquery requests pyyaml python-dotenv

# Or install all at once
pip install -r requirements.txt  # if you create one
```
#### 4. Environment Variables Not Loading

**Problem:** The `.env` file is not being read

**Solutions:**

```python
# Explicitly load .env
from dotenv import load_dotenv
load_dotenv()

# Or specify the path
load_dotenv(".env")

# Verify loading
import os
print(os.getenv("ANTHROPIC_API_KEY"))  # Should not be None
```
#### 5. File Path Issues

**Problem:** Cannot find `memories/agent.md`

**Solutions:**

```python
# Use an absolute path
import os
base_dir = os.path.dirname(os.path.abspath(__file__))
agent_path = os.path.join(base_dir, "memories", "agent.md")

# Or change the working directory
os.chdir("/path/to/local_clone")
```
### Performance Issues

#### Slow Response Times

**Causes:**

- Large metadata files
- Complex lineage graphs
- Network latency

**Solutions:**

- Break large metadata into chunks
- Use filtering to focus on specific entities
- Increase API timeout settings
- Cache frequently used results
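One way to apply the chunking advice above is to split the metadata entries into fixed-size batches and send each batch as its own request. This is a minimal sketch; `chunk_items` and the batch size of 50 are illustrative choices, not agent requirements.

```python
def chunk_items(items, chunk_size):
    """Split a list of metadata entries into fixed-size chunks."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

# Hypothetical example: 120 dbt models, sent 50 per request
nodes = [f"model_{i}" for i in range(120)]
chunks = chunk_items(nodes, 50)
print(len(chunks))  # 3 batches: 50, 50, and 20
```

Each chunk can then go into the `content` of a separate `client.messages.create` call, with the partial lineage graphs merged afterwards.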
### Debugging Tips

**Enable verbose logging**

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

**Test each component separately**

- Test the API connection first
- Test metadata retrieval
- Test parsing separately
- Test visualization separately
**Validate the metadata format**

- Ensure the JSON is valid
- Check for required fields
- Verify the structure matches the expected format
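Those validation steps can be sketched as a small helper that parses the JSON and reports missing top-level fields. `validate_metadata` is a hypothetical helper, and `nodes`/`sources` are illustrative dbt-manifest-style fields, not requirements of the agent.

```python
import json

def validate_metadata(raw, required_fields):
    """Parse JSON metadata and list any problems found."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    return [f"missing field: {f}" for f in required_fields if f not in data]

# 'nodes' and 'sources' are illustrative field names
sample = '{"nodes": {}, "sources": {}}'
print(validate_metadata(sample, ["nodes", "sources"]) or "metadata OK")
```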
**Check the agent configuration**

- Verify `memories/agent.md` is readable
- Check the `tools.json` syntax
- Ensure the subagent files exist
## Getting Help

### Documentation

- Agent instructions: `memories/agent.md`
- Subagent docs: `memories/subagents/*/agent.md`
- Anthropic API: https://docs.anthropic.com/
### Testing Your Setup

Run this complete test:

```python
from anthropic import Anthropic
from dotenv import load_dotenv
import os

# Load environment
load_dotenv()

# Test 1: API Connection
try:
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    print("✓ Anthropic API connection successful")
except Exception as e:
    print(f"✗ API connection failed: {e}")
    exit(1)

# Test 2: Load Agent Config
try:
    with open("memories/agent.md", "r") as f:
        system_prompt = f.read()
    print("✓ Agent configuration loaded")
except Exception as e:
    print(f"✗ Failed to load agent config: {e}")
    exit(1)

# Test 3: Simple Query
try:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        system=system_prompt,
        messages=[{
            "role": "user",
            "content": "Hello, what can you help me with?"
        }]
    )
    print("✓ Agent response successful")
    print(f"\nAgent says: {response.content[0].text}")
except Exception as e:
    print(f"✗ Agent query failed: {e}")
    exit(1)

print("\n✓ All tests passed! Your setup is ready.")
```

Save as `test_setup.py` and run:

```bash
python test_setup.py
```
## Next Steps

- Complete the setup
- Test with sample metadata
- Extract your first lineage
- Customize visualization preferences
- Integrate with your workflow

**Setup complete?** Try the usage examples in README.md or run your own lineage extraction!