Lineage-graph-accelerator

Method 1: Standalone Use (Recommended)

```bash
# If you have access to the git repository (clone into a folder named local_clone)
git clone <repository-url> local_clone
cd local_clone

# Or extract from the downloaded archive
unzip lineage-graph-extractor.zip
cd lineage-graph-extractor
```

Set up environment

```bash
# Copy the environment template
cp .env.example .env
```

Edit .env file

```bash
# Edit with your preferred editor
nano .env
# or
vim .env
# or
code .env  # VS Code
```

Add your credentials:

```
ANTHROPIC_API_KEY=sk-ant-your-key-here
GOOGLE_CLOUD_PROJECT=your-gcp-project
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```
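To catch a missing variable before running anything, a small stdlib-only check can parse `.env` and report which of the keys above are absent or empty. This is a sketch, not part of the project: `parse_env_file` and `missing_env_keys` are hypothetical helpers, and the parser only understands simple `KEY=value` lines (`python-dotenv` handles more syntax, such as `export` prefixes).

```python
from pathlib import Path

# Keys this guide asks you to set; drop the Google ones if you skip BigQuery.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "GOOGLE_CLOUD_PROJECT", "GOOGLE_APPLICATION_CREDENTIALS")


def parse_env_file(path):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env_keys(path=".env", required=REQUIRED_KEYS):
    """Return the required keys that are absent or set to an empty value."""
    values = parse_env_file(path)
    return [key for key in required if not values.get(key)]


# Example: print(missing_env_keys(".env") or "All required keys are set")
```

Note that this does not detect values still set to the template placeholders, such as `sk-ant-your-key-here`.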

Install Python dependencies (optional, for examples)

```bash
pip install anthropic google-cloud-bigquery requests pyyaml python-dotenv
```

Method 2: Claude Desktop Integration

If you're using Claude Desktop or similar platforms:

Locate your agent configuration directory
- Claude Desktop: ~/.config/claude/agents/ (Linux/Mac) or %APPDATA%\claude\agents\ (Windows)
- Other platforms: Check platform documentation

Copy the memories folder

```bash
# Linux/Mac
cp -r memories ~/.config/claude/agents/lineage-extractor/
```

```bat
REM Windows
xcopy /E /I memories %APPDATA%\claude\agents\lineage-extractor\
```
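If you would rather not maintain two shell commands, the copy can also be done from Python with `shutil.copytree`. This sketch assumes the agent directory paths listed above (verify them for your platform) and, like the `xcopy /I` command, places the contents of `memories` directly in `lineage-extractor`. `default_agents_dir` and `install_memories` are hypothetical names, and `dirs_exist_ok` requires Python 3.8 or later.

```python
import os
import shutil
import sys
from pathlib import Path


def default_agents_dir():
    """Agent directory as given in this guide; check it against your platform's docs."""
    if sys.platform.startswith("win"):
        return Path(os.environ["APPDATA"]) / "claude" / "agents"
    return Path.home() / ".config" / "claude" / "agents"


def install_memories(src="memories", agents_dir=None, name="lineage-extractor"):
    """Copy the contents of `src` into <agents_dir>/<name>, overwriting existing files."""
    target = Path(agents_dir or default_agents_dir()) / name
    # dirs_exist_ok=True lets you re-run the copy after editing the memories folder.
    shutil.copytree(src, target, dirs_exist_ok=True)
    return target


# Example: print(install_memories())
```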

Configure API credentials in your platform's settings
Restart the application

Method 3: Python Integration

To integrate into your own Python application:

Install dependencies
```
pip install anthropic python-dotenv
```

Use the integration example

```python
from anthropic import Anthropic
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Initialize client
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

# Load agent configuration
with open("memories/agent.md", "r", encoding="utf-8") as f:
    system_prompt = f.read()

# Use the agent
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": "Extract lineage from this metadata: ..."
    }]
)

print(response.content[0].text)
```
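`response.content[0].text` assumes the first content block is text. When tools are enabled, a response can also contain `tool_use` blocks, so a slightly more defensive sketch joins every text block instead. `response_text` is a hypothetical helper; it relies only on the `type` and `text` attributes of the SDK's content blocks.

```python
def response_text(response):
    """Join the text of all text blocks in a Messages API response."""
    return "\n".join(
        block.text for block in response.content if getattr(block, "type", None) == "text"
    )


# Example: print(response_text(response))
```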

Configuration

API Keys Setup

Anthropic API Key

1. Go to https://console.anthropic.com/
2. Create an account or sign in
3. Navigate to API Keys
4. Create a new key
5. Copy it into your .env file as ANTHROPIC_API_KEY

Google Cloud (for BigQuery)

1. Go to https://console.cloud.google.com/
2. Create a project or select an existing one
3. Enable the BigQuery API
4. Create a service account:
   - Go to IAM & Admin → Service Accounts
   - Create a service account
   - Grant the "BigQuery Data Viewer" role, plus "BigQuery Job User" so the account can run queries
   - Create a JSON key
5. Download the JSON key and set GOOGLE_APPLICATION_CREDENTIALS in .env to its path

Tavily (for web search)

1. Go to https://tavily.com/
2. Sign up for an account
3. Get your API key
4. Add it to your .env file

Tool Configuration

Edit memories/tools.json to customize the available tools. The file must be valid JSON, so it cannot contain comments:

```json
{
  "tools": [
    "bigquery_execute_query",
    "read_url_content",
    "google_sheets_read_range",
    "tavily_web_search"
  ],
  "interrupt_config": {
    "bigquery_execute_query": false,
    "read_url_content": false,
    "google_sheets_read_range": false,
    "tavily_web_search": false
  }
}
```
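Because a single stray comment or trailing comma makes `json.load` reject the whole file, it is worth checking `tools.json` after each edit. This sketch (hypothetical `check_tools_config`) reports parse errors, tools with no `interrupt_config` entry, `interrupt_config` entries for unlisted tools, and non-boolean flags; it assumes the two keys shown above are the parts of the schema that matter.

```python
import json
from pathlib import Path


def check_tools_config(path="memories/tools.json"):
    """Return a list of human-readable problems found in a tools.json file."""
    try:
        config = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # // comments and trailing commas are the usual culprits.
        return [f"invalid JSON: {exc}"]

    tools = config.get("tools", [])
    interrupts = config.get("interrupt_config", {})
    problems = []
    for name in tools:
        if name not in interrupts:
            problems.append(f"{name}: no interrupt_config entry")
    for name in interrupts:
        if name not in tools:
            problems.append(f"{name}: interrupt_config entry for an unlisted tool")
    for name, value in interrupts.items():
        if not isinstance(value, bool):
            problems.append(f"{name}: interrupt_config value should be true or false")
    return problems


# Example: print(check_tools_config() or "tools.json looks consistent")
```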

Available Tools:

- bigquery_execute_query: Execute SQL queries on BigQuery
- read_url_content: Fetch content from URLs/APIs
- google_sheets_read_range: Read data from Google Sheets
- tavily_web_search: Perform web searches

Subagent Configuration

Customize subagents by editing their configuration files:

Metadata Parser (memories/subagents/metadata_parser/)

- agent.md: Instructions for parsing metadata
- tools.json: Tools available to the parser

Graph Visualizer (memories/subagents/graph_visualizer/)

- agent.md: Instructions for creating visualizations
- tools.json: Tools available to the visualizer
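If you drive a subagent from your own code rather than through a platform, its two files can be loaded the same way as the main agent's. A sketch, assuming each subagent's `tools.json` uses the same `{"tools": [...]}` shape as the top-level file; `load_subagent` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_subagent(name, root="memories/subagents"):
    """Return (system_prompt, tool_names) for a subagent folder such as metadata_parser."""
    folder = Path(root) / name
    system_prompt = (folder / "agent.md").read_text(encoding="utf-8")
    tools_file = folder / "tools.json"
    tools = []
    if tools_file.exists():
        tools = json.loads(tools_file.read_text(encoding="utf-8")).get("tools", [])
    return system_prompt, tools


# Example:
# prompt, tools = load_subagent("graph_visualizer")
# Pass `prompt` as the `system` argument of client.messages.create(...).
```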

Usage Scenarios

Scenario 1: BigQuery Lineage Extraction

```python
from anthropic import Anthropic
from dotenv import load_dotenv
import os

load_dotenv()  # read ANTHROPIC_API_KEY from .env

client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

with open("memories/agent.md", "r", encoding="utf-8") as f:
    system_prompt = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": "Extract lineage from BigQuery project: my-project, dataset: analytics"
    }]
)

print(response.content[0].text)
```

Scenario 2: File-Based Metadata

```python
# Reuses client and system_prompt from Scenario 1

# Read metadata from file
with open("dbt_manifest.json", "r", encoding="utf-8") as f:
    metadata = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": f"Extract lineage from this dbt manifest:\n\n{metadata}"
    }]
)

print(response.content[0].text)
```
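A full dbt `manifest.json` can be large enough to slow responses or exceed the model's context window. dbt already records dependencies in the manifest's `parent_map` (each node ID mapped to the IDs it depends on), so one option is to send only that part. A sketch, assuming your manifest includes `parent_map`; `compact_lineage` is a hypothetical helper, and trimming this way drops SQL and column-level detail.

```python
def compact_lineage(manifest, skip_prefixes=("test.",)):
    """Reduce a parsed dbt manifest to {node_id: [parent_ids]} from its parent_map."""
    parent_map = manifest.get("parent_map", {})
    return {
        node: sorted(parents)
        for node, parents in parent_map.items()
        # dbt test nodes add noise to a lineage graph, so skip them by default.
        if not node.startswith(skip_prefixes)
    }


# Example:
# import json
# with open("dbt_manifest.json", "r", encoding="utf-8") as f:
#     metadata = json.dumps(compact_lineage(json.load(f)), indent=2)
```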

Scenario 3: API Metadata

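```python
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": "Extract lineage from API: https://api.example.com/metadata"
    }]
)
```

Note: these requests send only a system prompt and no `tools` parameter, so Claude cannot run BigQuery queries or fetch the URL on its own. To use the tools listed in memories/tools.json, your application must declare them in `tools` and execute the calls Claude requests.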
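A minimal sketch of a tool-use loop for the Messages API with a single URL-fetching tool: your code declares the tool in `tools`, runs each `tool_use` request Claude returns, and sends back a `tool_result`. The tool name mirrors `read_url_content` from memories/tools.json, but its schema and the `fetch_url` and `run_agent` helpers are assumptions, not the project's own implementation.

```python
import urllib.request

# Hypothetical schema for the read_url_content tool listed in memories/tools.json.
READ_URL_TOOL = {
    "name": "read_url_content",
    "description": "Fetch the raw text content of a URL, such as a metadata API endpoint.",
    "input_schema": {
        "type": "object",
        "properties": {"url": {"type": "string", "description": "The URL to fetch"}},
        "required": ["url"],
    },
}


def fetch_url(url, timeout=30):
    """Download a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def run_agent(client, system_prompt, user_prompt, handlers, tools,
              model="claude-3-5-sonnet-20241022", max_turns=5):
    """Call the Messages API, executing tool calls until Claude gives a final answer."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        response = client.messages.create(
            model=model, max_tokens=4000, system=system_prompt,
            tools=tools, messages=messages,
        )
        if response.stop_reason != "tool_use":
            return "\n".join(b.text for b in response.content if b.type == "text")
        # Echo Claude's turn back, then answer every tool_use block it contains.
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            try:
                output = handlers[block.name](**block.input)
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": output})
            except Exception as exc:  # report the failure to Claude instead of crashing
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": f"Error: {exc}", "is_error": True})
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"No final answer after {max_turns} turns")


# Example:
# text = run_agent(client, system_prompt,
#                  "Extract lineage from API: https://api.example.com/metadata",
#                  handlers={"read_url_content": fetch_url}, tools=[READ_URL_TOOL])
```

Capping the loop with `max_turns` keeps a misbehaving conversation from issuing requests indefinitely.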

Advanced Configuration

Custom Visualization Formats

To add custom visualization formats, edit memories/subagents/graph_visualizer/agent.md:

```markdown
### 4. Custom Format
Generate a custom format with:
- Your specific requirements
- Custom styling rules
- Special formatting needs
```

Adding New Metadata Sources

To support new metadata sources:

1. Add the tool to memories/tools.json
2. Update memories/agent.md with source-specific instructions
3. Update memories/subagents/metadata_parser/agent.md if needed
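The first step can be scripted so that the `tools` list and `interrupt_config` stay in sync. A minimal sketch: `register_tool` is hypothetical, and it assumes a new tool should default to `false` in `interrupt_config` like the existing entries. `json.dumps` rewrites the whole file, so any custom formatting is not preserved.

```python
import json
from pathlib import Path


def register_tool(name, path="memories/tools.json", interrupt=False):
    """Add a tool name to tools.json (if absent) together with its interrupt_config flag."""
    tools_file = Path(path)
    config = json.loads(tools_file.read_text(encoding="utf-8"))
    tools = config.setdefault("tools", [])
    if name not in tools:
        tools.append(name)
    # Keep an existing flag if the tool was already configured.
    config.setdefault("interrupt_config", {}).setdefault(name, interrupt)
    tools_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example: register_tool("my_new_source_reader")  # hypothetical tool name
```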

MCP Integration

To integrate with Model Context Protocol servers:

1. Check whether MCP tools are available in the /tools directory
2. Add the MCP tools to memories/tools.json
3. Configure the MCP server connection
4. See memories/mcp_integration.md (if available)

Troubleshooting

Common Issues

1. Authentication Errors

Problem: API authentication fails

Solutions:

- Verify the API key in .env is correct
- Check that the key hasn't expired
- Ensure environment variables are loaded
- Try regenerating the API key

```bash
# Test the Anthropic API key with a minimal request (an invalid key raises an authentication error)
python -c "from anthropic import Anthropic; from dotenv import load_dotenv; load_dotenv(); Anthropic().messages.create(model='claude-3-5-sonnet-20241022', max_tokens=1, messages=[{'role': 'user', 'content': 'ping'}]); print('✓ API key works')"
```

2. BigQuery Access Issues

Problem: Cannot access BigQuery

Solutions:

- Verify the service account has BigQuery permissions
- Check that the project ID is correct
- Ensure the JSON key file path is correct
- Test the credentials:

```bash
# Test BigQuery access
gcloud auth activate-service-account --key-file=/path/to/key.json
bq ls --project_id=your-project-id
```
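Before digging into IAM, it can help to confirm that `GOOGLE_APPLICATION_CREDENTIALS` really points at a service-account key. This stdlib sketch (hypothetical `check_service_account_key`) checks that the file exists, parses as JSON, and has the `type`, `project_id`, and `client_email` fields found in downloaded service-account keys. It cannot tell you whether the account has the right roles.

```python
import json
import os
from pathlib import Path


def check_service_account_key(path=None):
    """Return (ok, message) for the key file named by GOOGLE_APPLICATION_CREDENTIALS."""
    path = path or os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    key_file = Path(path).expanduser()
    if not key_file.is_file():
        return False, f"No file at {key_file}"
    try:
        key = json.loads(key_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False, f"{key_file} is not valid JSON"
    if key.get("type") != "service_account":
        return False, f"{key_file} is not a service-account key (type={key.get('type')!r})"
    missing = [field for field in ("project_id", "client_email") if not key.get(field)]
    if missing:
        return False, f"{key_file} is missing {', '.join(missing)}"
    return True, f"Key for {key['client_email']} in project {key['project_id']}"


# Example: print(check_service_account_key())
```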

3. Import Errors

Problem: ModuleNotFoundError

Solutions:

```bash
# Install missing packages
pip install anthropic google-cloud-bigquery requests pyyaml python-dotenv

# Or install everything from a requirements file (if you create one)
pip install -r requirements.txt
```

4. Environment Variables Not Loading

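Problem: .env file not being read

Solutions:

```python
# Explicitly load .env
from dotenv import load_dotenv
load_dotenv()

# Or pass an explicit path (a relative path resolves against the current working directory)
load_dotenv(".env")

# Verify loading without printing the secret itself
import os
print("ANTHROPIC_API_KEY set:", os.getenv("ANTHROPIC_API_KEY") is not None)  # Should be True
```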

5. File Path Issues

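Problem: Cannot find memories/agent.md

Solutions:

```python
import os

# Build an absolute path from the script's own location
# (__file__ is defined when running a script, not in an interactive session)
base_dir = os.path.dirname(os.path.abspath(__file__))
agent_path = os.path.join(base_dir, "memories", "agent.md")

# Or change the working directory to the project root
os.chdir("/path/to/local_clone")
```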

Performance Issues

Slow Response Times

Causes:

- Large metadata files
- Complex lineage graphs
- Network latency

Solutions:

- Break large metadata into chunks
- Use filtering to focus on specific entities
- Increase the client timeout (the Python SDK accepts a `timeout` argument in seconds, e.g. `Anthropic(timeout=120.0)`)
- Cache frequently used results
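For the first suggestion, one straightforward approach is to split a list of metadata records into batches whose serialized size stays under a character budget, then send each batch as its own request. A sketch: `chunk_records` is a hypothetical helper, and the 50,000-character default is an arbitrary assumption (characters are only a rough proxy for tokens).

```python
import json


def chunk_records(records, max_chars=50_000):
    """Yield lists of records whose JSON encoding stays under max_chars when possible."""
    batch, size = [], 2  # 2 accounts for the surrounding "[]"
    for record in records:
        encoded = len(json.dumps(record)) + 2  # +2 for the ", " separator (slight overestimate)
        if batch and size + encoded > max_chars:
            yield batch
            batch, size = [], 2
        batch.append(record)
        size += encoded
    if batch:
        # A single record larger than max_chars still ends up in its own batch.
        yield batch


# Example: for batch in chunk_records(tables): send json.dumps(batch) in one request
```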

Debugging Tips

1. Enable verbose logging

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

2. Test each component separately
   - Test the API connection first
   - Test metadata retrieval
   - Test parsing separately
   - Test visualization separately
3. Validate the metadata format
   - Ensure the JSON is valid
   - Check for required fields
   - Verify the structure matches the expected format
4. Check the agent configuration
   - Verify memories/agent.md is readable
   - Check tools.json syntax
   - Ensure subagent files exist
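The metadata checks can be automated with the standard library. This sketch (hypothetical `validate_metadata`) reports whether a file parses as JSON, whether the top level is an object, and which top-level keys are missing. The required keys are yours to choose; for a dbt manifest, `nodes` and `parent_map` are reasonable candidates.

```python
import json
from pathlib import Path


def validate_metadata(path, required_keys=()):
    """Return a list of problems: invalid JSON, a non-object top level, or missing keys."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(data, dict):
        return [f"expected a JSON object at the top level, got {type(data).__name__}"]
    return [f"missing required key: {key}" for key in required_keys if key not in data]


# Example: print(validate_metadata("dbt_manifest.json", ("nodes", "parent_map")) or "OK")
```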

Getting Help

Documentation

- Agent instructions: memories/agent.md
- Subagent docs: memories/subagents/*/agent.md
- Anthropic API: https://docs.anthropic.com/

Testing Your Setup

Run this complete test:

```python
import os
import sys

from anthropic import Anthropic
from dotenv import load_dotenv

# Load environment
load_dotenv()

# Test 1: API key and client (creating the client does not contact the API yet)
api_key = os.getenv("ANTHROPIC_API_KEY")
if not api_key:
    print("✗ ANTHROPIC_API_KEY is not set")
    sys.exit(1)
client = Anthropic(api_key=api_key)
print("✓ Anthropic client created")

# Test 2: Load Agent Config
try:
    with open("memories/agent.md", "r", encoding="utf-8") as f:
        system_prompt = f.read()
    print("✓ Agent configuration loaded")
except OSError as e:
    print(f"✗ Failed to load agent config: {e}")
    sys.exit(1)

# Test 3: Simple Query (the first real API call, so it also verifies the key)
try:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        system=system_prompt,
        messages=[{
            "role": "user",
            "content": "Hello, what can you help me with?"
        }]
    )
    print("✓ Agent response successful")
    print(f"\nAgent says: {response.content[0].text}")
except Exception as e:
    print(f"✗ Agent query failed: {e}")
    sys.exit(1)

print("\n✓ All tests passed! Your setup is ready.")
```

Save as test_setup.py and run:

```bash
python test_setup.py
```

Next Steps

✅ Complete setup
✅ Test with sample metadata
📊 Extract your first lineage
🎨 Customize visualization preferences
🔧 Integrate with your workflow

Setup complete? Try the usage examples in README.md or run your own lineage extraction!

Local Setup Guide - Lineage Graph Extractor

Table of Contents

System Requirements

Minimum Requirements

Recommended Requirements

Installation Methods

Method 1: Standalone Use (Recommended)