| # Setup Guide for Lineage Graph Extractor Space | |
| This guide will help you deploy the Lineage Graph Extractor as a Hugging Face Space. | |
| ## Prerequisites | |
| 1. A Hugging Face account (create one at https://huggingface.co/join) | |
| 2. API credentials for the services you want to integrate: | |
| - Anthropic API key (for Claude AI) | |
| - Google Cloud credentials (for BigQuery, optional) | |
| - Other service credentials as needed | |
| ## Step 1: Create a New Space | |
| 1. Go to https://huggingface.co/spaces | |
| 2. Click "Create new Space" | |
| 3. Fill in the details: | |
| - **Name**: `lineage-graph-extractor` (or your preferred name) | |
| - **License**: MIT (or your choice) | |
| - **SDK**: Gradio | |
| - **Hardware**: CPU Basic (free tier) or upgrade for better performance | |
| - **Visibility**: Public or Private (your choice) | |
| ## Step 2: Upload Files | |
| You need to upload these files to your Space repository: | |
| ### Required Files | |
| - `app.py` - Main application file | |
| - `requirements.txt` - Python dependencies | |
| - `README.md` - Space description and documentation | |
| ### Optional Files | |
| - `.env.example` - Example environment variables | |
| - `SETUP_GUIDE.md` - This setup guide | |
| ### Upload Methods | |
| **Option A: Web Interface** | |
| 1. Click "Files and versions" in your Space | |
| 2. Click "Add file" → "Upload files" | |
| 3. Upload all the files from the `/hf_space/` directory | |
| **Option B: Git** | |
| ```bash | |
| # Clone your Space repository | |
| git clone https://huggingface.co/spaces/YOUR_USERNAME/lineage-graph-extractor | |
| cd lineage-graph-extractor | |
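| # Copy files (note: the * glob does not match dotfiles such as .env.example; copy those by name if you want them) | |
| cp /path/to/hf_space/* . | |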
| # Commit and push | |
| git add . | |
| git commit -m "Initial commit: Lineage Graph Extractor" | |
| git push | |
| ``` | |
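The upload could also be scripted from Python instead of using the web interface or Git. The following is a minimal sketch, assuming the `huggingface_hub` library is installed and you are already logged in (for example via `huggingface-cli login`). The function name `upload_space_files` and the ignore patterns are illustrative choices, not part of the original project.

```python
def upload_space_files(folder_path: str, repo_id: str) -> None:
    """Upload a local folder to an existing Space (sketch; names are placeholders)."""
    # Imported lazily so this module still loads where huggingface_hub is absent.
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the token saved at login or the HF_TOKEN variable
    api.upload_folder(
        folder_path=folder_path,
        repo_id=repo_id,
        repo_type="space",
        commit_message="Initial commit: Lineage Graph Extractor",
        # Assumed patterns: keep local secrets and bytecode out of the repo
        ignore_patterns=[".env", "*.pyc", "__pycache__/*"],
    )


# Example call (placeholders, not run here):
# upload_space_files("/path/to/hf_space", "YOUR_USERNAME/lineage-graph-extractor")
```

Unlike `cp /path/to/hf_space/* .`, this uploads dotfiles such as `.env.example` too, which is why a real `.env` is excluded explicitly.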
| ## Step 3: Configure Secrets | |
| For security, store sensitive credentials as Space secrets: | |
| 1. Go to your Space settings | |
| 2. Open the "Variables and secrets" section (labeled "Repository secrets" in older versions of the UI) | |
| 3. Add the following secrets: | |
| ### Required Secrets | |
| - `ANTHROPIC_API_KEY`: Your Claude API key from https://console.anthropic.com/ | |
| ### Optional Secrets (based on features you need) | |
| - `GOOGLE_CLOUD_PROJECT`: Your GCP project ID | |
| - `GOOGLE_APPLICATION_CREDENTIALS_JSON`: Service account JSON (as a string) | |
| - `MCP_SERVER_URL`: MCP server endpoint (if using MCP) | |
| - `MCP_API_KEY`: MCP authentication key | |
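Because `GOOGLE_APPLICATION_CREDENTIALS_JSON` holds the whole service account file as one string, the app has to parse it before use. Below is a hedged sketch of one way to do that with the standard library; the helper name `load_service_account_info` is hypothetical.

```python
import json
import os


def load_service_account_info(env=os.environ):
    """Return the service account secret as a dict, or None if it is unset."""
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS_JSON")
    if not raw:
        return None  # BigQuery is optional, so a missing secret is not an error
    try:
        info = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS_JSON is not valid JSON") from exc
    # Service account key files carry "type": "service_account"
    if not isinstance(info, dict) or info.get("type") != "service_account":
        raise ValueError("Secret does not look like a service account key")
    return info
```

If you use the `google-auth` package, the resulting dict can be passed to `google.oauth2.service_account.Credentials.from_service_account_info(info)`, so the key never has to be written to disk.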
| ### Accessing Secrets in Code | |
| Secrets are exposed to the running Space as environment variables, so update `app.py` to read them with `os.environ`: | |
| ```python | |
| import os | |
| ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY") | |
| GOOGLE_CLOUD_PROJECT = os.environ.get("GOOGLE_CLOUD_PROJECT") | |
| ``` | |
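Since `os.environ.get` silently returns `None` for a missing secret, it can help to check the required ones once at startup. This is a minimal sketch under that assumption; `load_settings` and the two tuples are hypothetical names, while the secret names come from the lists above.

```python
import os

REQUIRED_SECRETS = ("ANTHROPIC_API_KEY",)
OPTIONAL_SECRETS = (
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_APPLICATION_CREDENTIALS_JSON",
    "MCP_SERVER_URL",
    "MCP_API_KEY",
)


def load_settings(env=os.environ):
    """Collect secrets into a dict and fail early if a required one is missing."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required Space secrets: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_SECRETS}
    # Optional secrets default to None so callers can test for them
    settings.update({name: env.get(name) for name in OPTIONAL_SECRETS})
    return settings
```

Calling `load_settings()` near the top of `app.py` makes a misconfigured Space fail with a clear message in the logs instead of an opaque API error later.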
| ## Step 4: Integrate the Agent Backend | |
| The current `app.py` is a template. You need to connect it to your actual agent: | |
| ### Option A: Use Anthropic SDK | |
| ```python | |
| import os | |
| import anthropic | |
| # Reads the key from the Space secret configured in Step 3 | |
| client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY")) | |
| def extract_lineage_from_text(metadata_text, source_type, viz_format): | |
|     # Call your agent with metadata_parser and graph_visualizer workers | |
|     response = client.messages.create( | |
|         model="claude-3-5-sonnet-20241022", | |
|         max_tokens=4000, | |
|         messages=[{ | |
|             "role": "user", | |
|             "content": f"Extract lineage from this {source_type} metadata and visualize as {viz_format}: {metadata_text}" | |
|         }] | |
|     ) | |
|     # Returns (visualization text, status message) | |
|     return response.content[0].text, "Processed successfully" | |
| ``` | |
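The snippet above reads `response.content[0].text`, which assumes the first content block is a text block. A slightly more defensive sketch joins every text block instead. The helper name `collect_text` is hypothetical, and `SimpleNamespace` objects stand in for real response blocks so the example runs without an API key.

```python
from types import SimpleNamespace


def collect_text(content_blocks):
    """Join the text of all blocks whose type is "text", skipping other block types."""
    parts = [
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    ]
    return "\n".join(parts)


# Stand-in blocks for illustration only; a real call would pass response.content
fake_content = [
    SimpleNamespace(type="text", text="graph TD"),
    SimpleNamespace(type="tool_use", name="example_tool"),
    SimpleNamespace(type="text", text="A --> B"),
]
print(collect_text(fake_content))
```

In `extract_lineage_from_text`, this would replace the direct index with `collect_text(response.content)`.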
| ### Option B: Use Agent API Endpoint | |
| If you have your agent deployed as an API: | |
| ```python | |
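| import requests | |
| def extract_lineage_from_text(metadata_text, source_type, viz_format): | |
|     response = requests.post( | |
|         "https://your-agent-api.com/extract",  # placeholder URL | |
|         json={ | |
|             "metadata": metadata_text, | |
|             "source_type": source_type, | |
|             "format": viz_format | |
|         }, | |
|         timeout=60,  # avoid hanging the UI if the API stalls | |
|     ) | |
|     response.raise_for_status()  # surface HTTP errors before reading the body | |
|     data = response.json()  # parse the body once | |
|     return data["visualization"], data["summary"] | |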
| ``` | |
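If the agent API returns an unexpected body, indexing `["visualization"]` raises a bare `KeyError`. The sketch below validates the payload first. It assumes only the two keys used above (`visualization` and `summary`); the function name `parse_agent_response` and the fallback status text are illustrative.

```python
def parse_agent_response(payload):
    """Return (visualization, summary) from a decoded JSON payload, or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("Expected a JSON object from the agent API")
    visualization = payload.get("visualization")
    if not isinstance(visualization, str) or not visualization.strip():
        raise ValueError("Agent response has no 'visualization' field")
    # The summary is treated as optional here; that is an assumption of this sketch
    summary = payload.get("summary") or "Processed successfully"
    return visualization, str(summary)
```

With this helper, Option B could end with `return parse_agent_response(response.json())`.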
| ### Option C: Bundle Agent Files | |
| Include your agent configuration directly in the Space: | |
| 1. Copy the `/memories/` directory to the Space | |
| 2. Copy the `/subagents/` directory if needed | |
| 3. Import and use the agent logic in `app.py` | |
| ## Step 5: Test Your Space | |
| 1. After you push your files, Hugging Face automatically builds and runs your Space | |
| 2. Check the "Logs" tab for any errors | |
| 3. Test each feature: | |
| - Text/File metadata extraction | |
| - BigQuery integration (if configured) | |
| - URL/API fetching | |
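Before clicking through the UI, a quick smoke test of the extraction function can catch wiring mistakes. This is a sketch only: the sample metadata is invented for illustration, and the `"dbt"` and `"mermaid"` values are assumed options that may not match the choices in your `app.py`.

```python
# Hypothetical sample input, not real project data
SAMPLE_METADATA = '{"tables": [{"name": "orders_clean", "depends_on": ["orders_raw"]}]}'


def smoke_test(extract_fn, source_type="dbt", viz_format="mermaid"):
    """Call extract_fn once and check it returns a (visualization, status) pair of strings."""
    result = extract_fn(SAMPLE_METADATA, source_type, viz_format)
    assert isinstance(result, tuple) and len(result) == 2, "expected (visualization, status)"
    visualization, status = result
    assert isinstance(visualization, str) and isinstance(status, str)
    return visualization, status
```

Running `smoke_test(extract_lineage_from_text)` locally or in a Python shell exercises the text extraction path end to end.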
| ## Step 6: Customize and Enhance | |
| ### Add Authentication | |
| For production use, add authentication: | |
| ```python | |
| demo.launch(auth=("username", "password")) | |
| ``` | |
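| Or use "Sign in with Hugging Face" (OAuth): set `hf_oauth: true` in the `README.md` metadata, then add a login button inside your `gr.Blocks()` layout: | |
| ```python | |
| # Requires hf_oauth: true in the README.md metadata block | |
| gr.LoginButton() | |
| ``` | |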
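For the username/password option, Gradio's `auth` argument also accepts a function that takes a username and password and returns a boolean, which avoids hardcoding credentials in `app.py`. Here is a minimal sketch, assuming two additional Space secrets named `APP_USERNAME` and `APP_PASSWORD` (hypothetical names not used elsewhere in this guide).

```python
import hmac
import os


def check_login(username, password, env=os.environ):
    """Return True only if both values match the configured secrets."""
    expected_user = env.get("APP_USERNAME", "")
    expected_pass = env.get("APP_PASSWORD", "")
    if not expected_user or not expected_pass:
        return False  # refuse all logins until both secrets are configured
    # compare_digest avoids leaking match position through timing
    user_ok = hmac.compare_digest(username.encode(), expected_user.encode())
    pass_ok = hmac.compare_digest(password.encode(), expected_pass.encode())
    return user_ok and pass_ok
```

It would be wired in with `demo.launch(auth=check_login)`.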
| ### Improve Error Handling | |
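| Wrap calls in `try`/`except` blocks and return user-friendly error messages: | |
| ```python | |
| def safe_extract(metadata_text, source_type, viz_format):  # hypothetical wrapper name | |
|     try: | |
|         return extract_lineage_from_text(metadata_text, source_type, viz_format) | |
|     except Exception as e: | |
|         # Empty visualization plus a readable message for the UI | |
|         return "", f"Error: {e}" | |
| ``` | |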
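Returning the raw exception text to the user can expose internal details such as URLs or key names. One possible refinement, sketched here under that assumption, is a decorator that logs the full traceback to the Space logs and shows a generic message instead. The names `with_friendly_errors` and `lineage_space` are hypothetical.

```python
import functools
import logging

logger = logging.getLogger("lineage_space")


def with_friendly_errors(handler):
    """Wrap a handler so failures return ("", message) instead of raising."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except ValueError as exc:
            # Validation problems are safe to show to the user
            return "", f"Invalid input: {exc}"
        except Exception:
            # Full traceback goes to the logs, not to the UI
            logger.exception("Unhandled error in %s", handler.__name__)
            return "", "Something went wrong while extracting lineage. Check the Space logs."
    return wrapper
```

Applying `@with_friendly_errors` above `extract_lineage_from_text` keeps the two-value return shape used throughout this guide.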
| ### Add More Features | |
| - File upload support | |
| - Export visualizations as images | |
| - History/session management | |
| - Batch processing | |
| ## Troubleshooting | |
| ### Space won't start | |
| - Check logs for error messages | |
| - Verify that every dependency `app.py` imports is listed in `requirements.txt` | |
| - Ensure Python version compatibility | |
| ### API errors | |
| - Verify secrets are correctly set | |
| - Check API key validity and permissions | |
| - Review rate limits | |
| ### Slow performance | |
| - Upgrade to better hardware (CPU or GPU) | |
| - Optimize metadata parsing logic | |
| - Add caching for repeated queries | |
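For the caching suggestion, a small in-memory cache keyed on the three string arguments is often enough. The sketch below uses `functools.lru_cache` from the standard library; `make_cached` is a hypothetical helper name, and the cache is lost whenever the Space restarts.

```python
import functools


def make_cached(extract_fn, maxsize=128):
    """Return a version of extract_fn that reuses results for identical arguments."""
    @functools.lru_cache(maxsize=maxsize)
    def cached(metadata_text, source_type, viz_format):
        # All three arguments must be hashable (plain strings are)
        return extract_fn(metadata_text, source_type, viz_format)
    return cached
```

Usage would look like `cached_extract = make_cached(extract_lineage_from_text)`. Calls that raise an exception are not cached, so a transient API failure is retried on the next request.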
| ## Security Best Practices | |
| 1. **Never commit API keys** to the repository | |
| 2. **Use Space secrets** for all credentials | |
| 3. **Validate user input** to prevent injection attacks | |
| 4. **Use read-only credentials** when possible | |
| 5. **Add rate limiting** to prevent abuse | |
| 6. **Enable authentication** for production use | |
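Items 3 and 5 above can be sketched with the standard library alone. The limits, the allowed format names, and the names `validate_request` and `RateLimiter` are all assumptions for illustration; adjust them to what your `app.py` actually supports. The limiter is per process and in memory, so it resets when the Space restarts.

```python
import time
from collections import deque

MAX_METADATA_CHARS = 200_000          # assumed size limit
ALLOWED_FORMATS = {"mermaid", "dot"}  # assumed visualization formats


def validate_request(metadata_text, viz_format):
    """Raise ValueError for empty, oversized, or unsupported input."""
    if not isinstance(metadata_text, str) or not metadata_text.strip():
        raise ValueError("Metadata is empty")
    if len(metadata_text) > MAX_METADATA_CHARS:
        raise ValueError("Metadata is too large")
    if viz_format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported format: {viz_format!r}")


class RateLimiter:
    """Allow at most max_calls within any sliding window of window_seconds."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True
```

A handler could call `validate_request(...)` first and return an error message when `limiter.allow()` is `False`, before any API request is made.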
| ## Getting Help | |
| - Hugging Face Spaces docs: https://huggingface.co/docs/hub/spaces | |
| - Gradio documentation: https://gradio.app/docs | |
| - Anthropic API docs: https://docs.anthropic.com/ | |
| ## Next Steps | |
| 1. Test the Space thoroughly | |
| 2. Share with your team or community | |
| 3. Collect feedback and iterate | |
| 4. Consider upgrading hardware for production workloads | |
| 5. Add analytics to track usage | |
| --- | |
| **Need help?** Check the Hugging Face community forums or reach out to support. | |