VibecoderMcSwaggins committed 28 days ago

Commit

dacd086

1 Parent(s): 4878d51

docs: Archive brainstorming documents and related files

- Remove outdated brainstorming documentation from the `docs/brainstorming` directory, including README, PubMed, ClinicalTrials, Europe PMC, and embeddings meta files.
- Consolidate all archived documents into the `archive/` folder for better organization and future reference.
- Ensure that the README reflects the current state of the project and directs users to the appropriate resources.

Files changed (7)

docs/brainstorming/README.md +0 -22
docs/brainstorming/archive/00_ROADMAP_SUMMARY.md +0 -194
docs/brainstorming/archive/01_PUBMED_IMPROVEMENTS.md +0 -125
docs/brainstorming/archive/02_CLINICALTRIALS_IMPROVEMENTS.md +0 -193
docs/brainstorming/archive/03_EUROPEPMC_IMPROVEMENTS.md +0 -211
docs/brainstorming/archive/BRAINSTORM_EMBEDDINGS_META.md +0 -74
docs/brainstorming/archive/UI_MODE_SELECTION_UX.md +0 -133

docs/brainstorming/README.md DELETED

@@ -1,22 +0,0 @@
-# Brainstorming
-> **Status**: All brainstorming docs archived (December 2025)
-This folder contained early hackathon ideation. All documents have been moved to `archive/`.
-## Archived Documents
-| Document | Status | Notes |
-|----------|--------|-------|
-| `00_ROADMAP_SUMMARY.md` | Superseded | See `docs/future-roadmap/` |
-| `01_PUBMED_IMPROVEMENTS.md` | Future work | Rate limiting, full-text retrieval |
-| `02_CLINICALTRIALS_IMPROVEMENTS.md` | Partially done | Outcomes in SPEC-14, rest is future |
-| `03_EUROPEPMC_IMPROVEMENTS.md` | Future work | Full-text, citations |
-| `BRAINSTORM_EMBEDDINGS_META.md` | Closed | Decision: don't implement internal embeddings |
-| `UI_MODE_SELECTION_UX.md` | Obsolete | Simple mode removed, Anthropic removed |
-## Where to Look Now
-- **Future improvements**: `docs/future-roadmap/`
-- **Current architecture**: `docs/ARCHITECTURE.md`
-- **Active bugs**: `docs/bugs/ACTIVE_BUGS.md`

docs/brainstorming/archive/00_ROADMAP_SUMMARY.md DELETED

@@ -1,194 +0,0 @@
-# DeepBoner Data Sources: Roadmap Summary
-**Created**: 2024-11-27
-**Purpose**: Future maintainability and hackathon continuation
----
-## Current State
-### Working Tools
-| Tool | Status | Data Quality |
-|------|--------|--------------|
-| PubMed | ✅ Works | Good (abstracts only) |
-| ClinicalTrials.gov | ✅ Works | Good (filtered for interventional) |
-| Europe PMC | ✅ Works | Good (includes preprints) |
-### Removed Tools
-| Tool | Status | Reason |
-|------|--------|--------|
-| bioRxiv | ❌ Removed | No search API - only date/DOI lookup |
----
-## Priority Improvements
-### P0: Critical (Do First)
-1. **Add Rate Limiting to PubMed**
-   - NCBI will block us without it
-   - Use `limits` library (see reference repo)
-   - 3/sec without key, 10/sec with key
-### P1: High Value, Medium Effort
-2. **Add OpenAlex as 4th Source**
-   - Citation network (huge for drug repurposing)
-   - Concept tagging (semantic discovery)
-   - Already implemented in reference repo
-   - Free, no API key
-3. **PubMed Full-Text via BioC**
-   - Get full paper text for PMC papers
-   - Already in reference repo
-### P2: Nice to Have
-4. **ClinicalTrials.gov Results**
-   - Get efficacy data from completed trials
-   - Requires more complex API calls
-5. **Europe PMC Annotations**
-   - Text-mined entities (genes, drugs, diseases)
-   - Automatic entity extraction
----
-## Effort Estimates
-| Improvement | Effort | Impact | Priority |
-|-------------|--------|--------|----------|
-| PubMed rate limiting | 1 hour | Stability | P0 |
-| OpenAlex basic search | 2 hours | High | P1 |
-| OpenAlex citations | 2 hours | Very High | P1 |
-| PubMed full-text | 3 hours | Medium | P1 |
-| CT.gov results | 4 hours | Medium | P2 |
-| Europe PMC annotations | 3 hours | Medium | P2 |
----
-## Architecture Decision
-### Option A: Keep Current + Add OpenAlex
-```
-                    User Query
-                        ↓
-    ┌───────────────────┼───────────────────┐
-    ↓                   ↓                   ↓
- PubMed          ClinicalTrials        Europe PMC
- (abstracts)     (trials only)         (preprints)
-    ↓                   ↓                   ↓
-    └───────────────────┼───────────────────┘
-                        ↓
-                   OpenAlex              ← NEW
-               (citations, concepts)
-                        ↓
-                  Orchestrator
-                        ↓
-                     Report
-```
-**Pros**: Low risk, additive
-**Cons**: More complexity, some overlap
-### Option B: OpenAlex as Primary
-```
-                    User Query
-                        ↓
-    ┌───────────────────┼───────────────────┐
-    ↓                   ↓                   ↓
- OpenAlex          ClinicalTrials      Europe PMC
- (primary          (trials only)       (full-text
-  search)                               fallback)
-    ↓                   ↓                   ↓
-    └───────────────────┼───────────────────┘
-                        ↓
-                  Orchestrator
-                        ↓
-                     Report
-```
-**Pros**: Simpler, citation network built-in
-**Cons**: Lose some PubMed-specific features
-### Recommendation: Option A
-Keep current architecture working, add OpenAlex incrementally.
----
-## Quick Wins (Can Do Today)
-1. **Add `limits` to `pyproject.toml`**
-   ```toml
-   dependencies = [
-       "limits>=3.0",
-   ]
-   ```
-2. **Copy OpenAlex tool from reference repo**
-   - File: `reference_repos/DeepBoner/DeepResearch/src/tools/openalex_tools.py`
-   - Adapt to our `SearchTool` base class
-3. **Enable NCBI API Key**
-   - Add to `.env`: `NCBI_API_KEY=your_key`
-   - 10x rate limit improvement
----
-## External Resources Worth Exploring
-### Python Libraries
-| Library | For | Notes |
-|---------|-----|-------|
-| `limits` | Rate limiting | Used by reference repo |
-| `pyalex` | OpenAlex wrapper | [GitHub](https://github.com/J535D165/pyalex) |
-| `metapub` | PubMed | Full-featured |
-| `sentence-transformers` | Semantic search | For embeddings |
-### APIs Not Yet Used
-| API | Provides | Effort |
-|-----|----------|--------|
-| RxNorm | Drug name normalization | Low |
-| DrugBank | Drug targets/mechanisms | Medium (license) |
-| UniProt | Protein data | Medium |
-| ChEMBL | Bioactivity data | Medium |
-### RAG Tools (Future)
-| Tool | Purpose |
-|------|---------|
-| [PaperQA](https://github.com/Future-House/paper-qa) | RAG for scientific papers |
-| [txtai](https://github.com/neuml/txtai) | Embeddings + search |
-| [PubMedBERT](https://huggingface.co/NeuML/pubmedbert-base-embeddings) | Biomedical embeddings |
----
-## Files in This Directory
-| File | Contents |
-|------|----------|
-| `00_ROADMAP_SUMMARY.md` | This file |
-| `01_PUBMED_IMPROVEMENTS.md` | PubMed enhancement details |
-| `02_CLINICALTRIALS_IMPROVEMENTS.md` | ClinicalTrials.gov details |
-| `03_EUROPEPMC_IMPROVEMENTS.md` | Europe PMC details |
-| `04_OPENALEX_INTEGRATION.md` | OpenAlex integration plan |
----
-## For Future Maintainers
-If you're picking this up after the hackathon:
-1. **Start with OpenAlex** - biggest bang for buck
-2. **Add rate limiting** - prevents API blocks
-3. **Don't bother with bioRxiv** - use Europe PMC instead
-4. **Reference repo is gold** - `reference_repos/DeepBoner/` has working implementations
-Good luck! 🚀

docs/brainstorming/archive/01_PUBMED_IMPROVEMENTS.md DELETED

@@ -1,125 +0,0 @@
-# PubMed Tool: Current State & Future Improvements
-**Status**: Currently Implemented
-**Priority**: High (Core Data Source)
----
-## Current Implementation
-### What We Have (`src/tools/pubmed.py`)
-- Basic E-utilities search via `esearch.fcgi` and `efetch.fcgi`
-- Query preprocessing (strips question words, expands synonyms)
-- Returns: title, abstract, authors, journal, PMID
-- Rate limiting: None implemented (relying on NCBI defaults)
-### Current Limitations
-1. **No Full-Text Access**: Only retrieves abstracts, not full paper text
-2. **No Rate Limiting**: Risk of being blocked by NCBI
-3. **No BioC Format**: Missing structured full-text extraction
-4. **No Figure Retrieval**: No supplementary materials access
-5. **No PMC Integration**: Missing open-access full-text via PMC
----
-## Reference Implementation (DeepBoner Reference Repo)
-The reference repo at `reference_repos/DeepBoner/DeepResearch/src/tools/bioinformatics_tools.py` has a more sophisticated implementation:
-### Features We're Missing
-```python
-# Rate limiting (lines 47-50)
-from limits import parse
-from limits.storage import MemoryStorage
-from limits.strategies import MovingWindowRateLimiter
-storage = MemoryStorage()
-limiter = MovingWindowRateLimiter(storage)
-rate_limit = parse("3/second")  # NCBI allows 3/sec without API key, 10/sec with
-# Full-text via BioC format (lines 108-120)
-def _get_fulltext(pmid: int) -> dict[str, Any] | None:
-    pmid_url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
-    # Returns structured JSON with full text for open-access papers
-# Figure retrieval via Europe PMC (lines 123-149)
-def _get_figures(pmcid: str) -> dict[str, str]:
-    suppl_url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/supplementaryFiles"
-    # Returns base64-encoded images from supplementary materials
-```
----
-## Recommended Improvements
-### Phase 1: Rate Limiting (Critical)
-```python
-# Add to src/tools/pubmed.py
-from limits import parse
-from limits.storage import MemoryStorage
-from limits.strategies import MovingWindowRateLimiter
-storage = MemoryStorage()
-limiter = MovingWindowRateLimiter(storage)
-# With NCBI_API_KEY: 10/sec, without: 3/sec
-def get_rate_limit():
-    if settings.ncbi_api_key:
-        return parse("10/second")
-    return parse("3/second")
-```
-**Dependencies**: `pip install limits`
-### Phase 2: Full-Text Retrieval
-```python
-async def get_fulltext(pmid: str) -> str | None:
-    """Get full text for open-access papers via BioC API."""
-    url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
-    # Only works for PMC papers (open access)
-```
-### Phase 3: PMC ID Resolution
-```python
-async def get_pmc_id(pmid: str) -> str | None:
-    """Convert PMID to PMCID for full-text access."""
-    url = f"https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?ids={pmid}&format=json"
-```
----
-## Python Libraries to Consider
-| Library | Purpose | Notes |
-|---------|---------|-------|
-| [Biopython](https://biopython.org/) | `Bio.Entrez` module | Official, well-maintained |
-| [PyMed](https://pypi.org/project/pymed/) | PubMed wrapper | Simpler API, less control |
-| [metapub](https://pypi.org/project/metapub/) | Full-featured | Tested on 1/3 of PubMed |
-| [limits](https://pypi.org/project/limits/) | Rate limiting | Used by reference repo |
----
-## API Endpoints Reference
-| Endpoint | Purpose | Rate Limit |
-|----------|---------|------------|
-| `esearch.fcgi` | Search for PMIDs | 3/sec (10 with key) |
-| `efetch.fcgi` | Fetch metadata | 3/sec (10 with key) |
-| `esummary.fcgi` | Quick metadata | 3/sec (10 with key) |
-| `pmcoa.cgi/BioC_json` | Full text (PMC only) | Unknown |
-| `idconv/v1.0` | PMID ↔ PMCID | Unknown |
----
-## Sources
-- [PubMed E-utilities Documentation](https://www.ncbi.nlm.nih.gov/books/NBK25501/)
-- [NCBI BioC API](https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/)
-- [Searching PubMed with Python](https://marcobonzanini.com/2015/01/12/searching-pubmed-with-python/)
-- [PyMed on PyPI](https://pypi.org/project/pymed/)
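
The Phase 1 rate-limiting plan in the deleted PubMed notes stops before the limiter is actually consulted. A minimal sketch of the idea follows, assuming a stdlib-only sliding window as a stand-in for the `limits` strategy named in the notes. `SlidingWindowLimiter`, `try_acquire`, and `pubmed_limiter` are hypothetical names, not part of the project or of `limits`; only the 3/sec and 10/sec figures come from the notes.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `period`-second window (hypothetical helper)."""

    def __init__(
        self,
        max_calls: int,
        period: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps: deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else return False."""
        now = self._clock()
        # Discard timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False


def pubmed_limiter(
    api_key: str | None,
    clock: Callable[[], float] = time.monotonic,
) -> SlidingWindowLimiter:
    """Pick the budget from the notes: 3 requests/second without a key, 10 with one."""
    return SlidingWindowLimiter(10 if api_key else 3, 1.0, clock)
```

A caller would check `try_acquire()` before each E-utilities request and wait briefly when it returns `False`. Swapping this for the `limits` library, as the notes propose, would keep the same call site.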

docs/brainstorming/archive/02_CLINICALTRIALS_IMPROVEMENTS.md DELETED

@@ -1,193 +0,0 @@
-# ClinicalTrials.gov Tool: Current State & Future Improvements
-**Status**: Currently Implemented
-**Priority**: High (Core Data Source for Drug Repurposing)
----
-## Current Implementation
-### What We Have (`src/tools/clinicaltrials.py`)
-- V2 API search via `clinicaltrials.gov/api/v2/studies`
-- Filters: `INTERVENTIONAL` study type, `RECRUITING` status
-- Returns: NCT ID, title, conditions, interventions, phase, status
-- Query preprocessing via shared `query_utils.py`
-### Current Strengths
-1. **Good Filtering**: Already filtering for interventional + recruiting
-2. **V2 API**: Using the modern API (v1 deprecated)
-3. **Phase Info**: Extracting trial phases for drug development context
-### Current Limitations
-1. **No Outcome Data**: Missing primary/secondary outcomes
-2. **No Eligibility Criteria**: Missing inclusion/exclusion details
-3. **No Sponsor Info**: Missing who's running the trial
-4. **No Result Data**: For completed trials, no efficacy data
-5. **Limited Drug Mapping**: No integration with drug databases
----
-## API Capabilities We're Not Using
-### Fields We Could Request
-```python
-# Current fields
-fields = ["NCTId", "BriefTitle", "Condition", "InterventionName", "Phase", "OverallStatus"]
-# Additional valuable fields
-additional_fields = [
-    "PrimaryOutcomeMeasure",      # What are they measuring?
-    "SecondaryOutcomeMeasure",    # Secondary endpoints
-    "EligibilityCriteria",        # Who can participate?
-    "LeadSponsorName",            # Who's funding?
-    "ResultsFirstPostDate",       # Has results?
-    "StudyFirstPostDate",         # When started?
-    "CompletionDate",             # When finished?
-    "EnrollmentCount",            # Sample size
-    "InterventionDescription",    # Drug details
-    "ArmGroupLabel",              # Treatment arms
-    "InterventionOtherName",      # Drug aliases
-]
-```
-### Filter Enhancements
-```python
-# Current
-aggFilters = "studyType:INTERVENTIONAL,status:RECRUITING"
-# Could add
-"status:RECRUITING,ACTIVE_NOT_RECRUITING,COMPLETED"  # Include completed for results
-"phase:PHASE2,PHASE3"  # Only later-stage trials
-"resultsFirstPostDateRange:2020-01-01_"  # Trials with posted results
-```
----
-## Recommended Improvements
-### Phase 1: Richer Metadata
-```python
-EXTENDED_FIELDS = [
-    "NCTId",
-    "BriefTitle",
-    "OfficialTitle",
-    "Condition",
-    "InterventionName",
-    "InterventionDescription",
-    "InterventionOtherName",  # Drug synonyms!
-    "Phase",
-    "OverallStatus",
-    "PrimaryOutcomeMeasure",
-    "EnrollmentCount",
-    "LeadSponsorName",
-    "StudyFirstPostDate",
-]
-```
-### Phase 2: Results Retrieval
-For completed trials, we can get actual efficacy data:
-```python
-async def get_trial_results(nct_id: str) -> dict | None:
-    """Fetch results for completed trials."""
-    url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
-    params = {
-        "fields": "ResultsSection",
-    }
-    # Returns outcome measures and statistics
-```
-### Phase 3: Drug Name Normalization
-Map intervention names to standard identifiers:
-```python
-# Problem: "Metformin", "Metformin HCl", "Glucophage" are the same drug
-# Solution: Use RxNorm or DrugBank for normalization
-async def normalize_drug_name(intervention: str) -> str:
-    """Normalize drug name via RxNorm API."""
-    url = f"https://rxnav.nlm.nih.gov/REST/rxcui.json?name={intervention}"
-    # Returns standardized RxCUI
-```
----
-## Integration Opportunities
-### With PubMed
-Cross-reference trials with publications:
-```python
-# ClinicalTrials.gov provides PMID links
-# Can correlate trial results with published papers
-```
-### With DrugBank/ChEMBL
-Map interventions to:
-- Mechanism of action
-- Known targets
-- Adverse effects
-- Drug-drug interactions
----
-## Python Libraries to Consider
-| Library | Purpose | Notes |
-|---------|---------|-------|
-| [pytrials](https://pypi.org/project/pytrials/) | CT.gov wrapper | V2 API support unclear |
-| [clinicaltrials](https://github.com/ebmdatalab/clinicaltrials-act-tracker) | Data tracking | More for analysis |
-| [drugbank-downloader](https://pypi.org/project/drugbank-downloader/) | Drug mapping | Requires license |
----
-## API Quirks & Gotchas
-1. **Rate Limiting**: Undocumented, be conservative
-2. **Pagination**: Max 1000 results per request
-3. **Field Names**: Case-sensitive, camelCase
-4. **Empty Results**: Some fields may be null even if requested
-5. **Status Changes**: Trials change status frequently
----
-## Example Enhanced Query
-```python
-async def search_drug_repurposing_trials(
-    drug_name: str,
-    condition: str,
-    include_completed: bool = True,
-) -> list[Evidence]:
-    """Search for trials repurposing a drug for a new condition."""
-    statuses = ["RECRUITING", "ACTIVE_NOT_RECRUITING"]
-    if include_completed:
-        statuses.append("COMPLETED")
-    params = {
-        "query.intr": drug_name,
-        "query.cond": condition,
-        "filter.overallStatus": ",".join(statuses),
-        "filter.studyType": "INTERVENTIONAL",
-        "fields": ",".join(EXTENDED_FIELDS),
-        "pageSize": 50,
-    }
-```
----
-## Sources
-- [ClinicalTrials.gov API Documentation](https://clinicaltrials.gov/data-api/api)
-- [CT.gov Field Definitions](https://clinicaltrials.gov/data-api/about-api/study-data-structure)
-- [RxNorm API](https://lhncbc.nlm.nih.gov/RxNav/APIs/api-RxNorm.findRxcuiByString.html)
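
The "Example Enhanced Query" in the deleted ClinicalTrials.gov notes builds a parameter dictionary but never returns it. The sketch below isolates that step as a pure function so it can be checked without network access. The parameter names (`query.intr`, `query.cond`, `filter.overallStatus`, `fields`, `pageSize`) are copied from the notes and have not been verified against the live API here. The study-type filter is left out because the notes spell it two different ways (`aggFilters` and `filter.studyType`). `build_trial_search_params` and `MAX_PAGE_SIZE` are hypothetical names.

```python
from __future__ import annotations

# Subset of the field list proposed under "Phase 1: Richer Metadata" in the notes.
EXTENDED_FIELDS = [
    "NCTId",
    "BriefTitle",
    "Condition",
    "InterventionName",
    "InterventionOtherName",
    "Phase",
    "OverallStatus",
    "PrimaryOutcomeMeasure",
]

# The notes list "Max 1000 results per request" as an API quirk.
MAX_PAGE_SIZE = 1000


def build_trial_search_params(
    drug_name: str,
    condition: str,
    include_completed: bool = True,
    page_size: int = 50,
) -> dict[str, str | int]:
    """Build query parameters for a drug-repurposing trial search (sketch only)."""
    statuses = ["RECRUITING", "ACTIVE_NOT_RECRUITING"]
    if include_completed:
        # Completed trials are the ones that may carry posted results.
        statuses.append("COMPLETED")
    return {
        "query.intr": drug_name,
        "query.cond": condition,
        "filter.overallStatus": ",".join(statuses),
        "fields": ",".join(EXTENDED_FIELDS),
        # Clamp to the documented page-size ceiling.
        "pageSize": max(1, min(page_size, MAX_PAGE_SIZE)),
    }
```

Keeping parameter construction separate from the HTTP call means the status and field logic can be unit-tested, while the request itself stays a thin wrapper.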

docs/brainstorming/archive/03_EUROPEPMC_IMPROVEMENTS.md DELETED

@@ -1,211 +0,0 @@
-# Europe PMC Tool: Current State & Future Improvements
-**Status**: Currently Implemented (Replaced bioRxiv)
-**Priority**: High (Preprint + Open Access Source)
----
-## Why Europe PMC Over bioRxiv?
-### bioRxiv API Limitations (Why We Abandoned It)
-1. **No Search API**: Only returns papers by date range or DOI
-2. **No Query Capability**: Cannot search for "metformin cancer"
-3. **Workaround Required**: Would need to download ALL preprints and build local search
-4. **Known Issue**: [Gradio Issue #8861](https://github.com/gradio-app/gradio/issues/8861) documents the limitation
-### Europe PMC Advantages
-1. **Full Search API**: Boolean queries, filters, facets
-2. **Aggregates bioRxiv**: Includes bioRxiv, medRxiv content anyway
-3. **Includes PubMed**: Also has MEDLINE content
-4. **34 Preprint Servers**: Not just bioRxiv
-5. **Open Access Focus**: Full-text when available
----
-## Current Implementation
-### What We Have (`src/tools/europepmc.py`)
-- REST API search via `europepmc.org/webservices/rest/search`
-- Preprint flagging via `firstPublicationDate` heuristics
-- Returns: title, abstract, authors, DOI, source
-- Marks preprints for transparency
-### Current Limitations
-1. **No Full-Text Retrieval**: Only metadata/abstracts
-2. **No Citation Network**: Missing references/citations
-3. **No Supplementary Files**: Not fetching figures/data
-4. **Basic Preprint Detection**: Heuristic, not explicit flag
----
-## Europe PMC API Capabilities
-### Endpoints We Could Use
-| Endpoint | Purpose | Currently Using |
-|----------|---------|-----------------|
-| `/search` | Query papers | Yes |
-| `/fulltext/{ID}` | Full text (XML/JSON) | No |
-| `/{PMCID}/supplementaryFiles` | Figures, data | No |
-| `/citations/{ID}` | Who cited this | No |
-| `/references/{ID}` | What this cites | No |
-| `/annotations` | Text-mined entities | No |
-### Rich Query Syntax
-```python
-# Current simple query
-query = "metformin cancer"
-# Could use advanced syntax
-query = "(TITLE:metformin OR ABSTRACT:metformin) AND (cancer OR oncology)"
-query += " AND (SRC:PPR)"  # Only preprints
-query += " AND (FIRST_PDATE:[2023-01-01 TO 2024-12-31])"  # Date range
-query += " AND (OPEN_ACCESS:y)"  # Only open access
-```
-### Source Filters
-```python
-# Filter by source
-"SRC:MED"     # MEDLINE
-"SRC:PMC"     # PubMed Central
-"SRC:PPR"     # Preprints (bioRxiv, medRxiv, etc.)
-"SRC:AGR"     # Agricola
-"SRC:CBA"     # Chinese Biological Abstracts
-```
----
-## Recommended Improvements
-### Phase 1: Rich Metadata
-```python
-# Add to search results
-additional_fields = [
-    "citedByCount",           # Impact indicator
-    "source",                 # Explicit source (MED, PMC, PPR)
-    "isOpenAccess",           # Boolean flag
-    "fullTextUrlList",        # URLs for full text
-    "authorAffiliations",     # Institution info
-    "grantsList",             # Funding info
-]
-```
-### Phase 2: Full-Text Retrieval
-```python
-async def get_fulltext(pmcid: str) -> str | None:
-    """Get full text for open access papers."""
-    # XML format
-    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextXML"
-    # Or JSON
-    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextJSON"
-```
-### Phase 3: Citation Network
-```python
-async def get_citations(pmcid: str) -> list[str]:
-    """Get papers that cite this one."""
-    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/citations"
-async def get_references(pmcid: str) -> list[str]:
-    """Get papers this one cites."""
-    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/references"
-```
-### Phase 4: Text-Mined Annotations
-Europe PMC extracts entities automatically:
-```python
-async def get_annotations(pmcid: str) -> dict:
-    """Get text-mined entities (genes, diseases, drugs)."""
-    url = "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"
-    params = {
-        "articleIds": f"PMC:{pmcid}",
-        "type": "Gene_Proteins,Diseases,Chemicals",
-        "format": "JSON",
-    }
-    # Returns structured entity mentions with positions
-```
----
-## Supplementary File Retrieval
-From reference repo (`bioinformatics_tools.py` lines 123-149):
-```python
-def get_figures(pmcid: str) -> dict[str, str]:
-    """Download figures and supplementary files."""
-    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/supplementaryFiles?includeInlineImage=true"
-    # Returns ZIP with images, returns base64-encoded
-```
----
-## Preprint-Specific Features
-### Identify Preprint Servers
-```python
-PREPRINT_SOURCES = {
-    "PPR": "General preprints",
-    "bioRxiv": "Biology preprints",
-    "medRxiv": "Medical preprints",
-    "chemRxiv": "Chemistry preprints",
-    "Research Square": "Multi-disciplinary",
-    "Preprints.org": "MDPI preprints",
-}
-# Check if published version exists
-async def check_published_version(preprint_doi: str) -> str | None:
-    """Check if preprint has been peer-reviewed and published."""
-    # Europe PMC links preprints to final versions
-```
----
-## Rate Limiting
-Europe PMC is more generous than NCBI:
-```python
-# No documented hard limit, but be respectful
-# Recommend: 10-20 requests/second max
-# Use email in User-Agent for polite pool
-headers = {
-    "User-Agent": "DeepBoner/1.0 (mailto:your@email.com)"
-}
-```
----
-## vs. The Lens & OpenAlex
-| Feature | Europe PMC | The Lens | OpenAlex |
-|---------|------------|----------|----------|
-| Biomedical Focus | Yes | Partial | Partial |
-| Preprints | Yes (34 servers) | Yes | Yes |
-| Full Text | PMC papers | Links | No |
-| Citations | Yes | Yes | Yes |
-| Annotations | Yes (text-mined) | No | No |
-| Rate Limits | Generous | Moderate | Very generous |
-| API Key | Optional | Required | Optional |
----
-## Sources
-- [Europe PMC REST API](https://europepmc.org/RestfulWebService)
-- [Europe PMC Annotations API](https://europepmc.org/AnnotationsApi)
-- [Europe PMC Articles API](https://europepmc.org/ArticlesApi)
-- [rOpenSci medrxivr](https://docs.ropensci.org/medrxivr/)
-- [bioRxiv TDM Resources](https://www.biorxiv.org/tdm)
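
The "Rich Query Syntax" section of the deleted Europe PMC notes assembles a query by string concatenation. A small builder, sketched below, makes the same clauses optional and testable. The clause syntax (`SRC:PPR`, `FIRST_PDATE:[... TO ...]`, `OPEN_ACCESS:y`) is taken from the notes as written. `build_europepmc_query` and its parameters are hypothetical names.

```python
from __future__ import annotations


def build_europepmc_query(
    base: str,
    *,
    preprints_only: bool = False,
    open_access_only: bool = False,
    first_pdate: tuple[str, str] | None = None,
) -> str:
    """Combine a base query with optional filter clauses, joined by AND (sketch only)."""
    clauses = [f"({base})"]
    if preprints_only:
        clauses.append("(SRC:PPR)")  # preprint sources only
    if first_pdate is not None:
        start, end = first_pdate  # ISO dates, e.g. ("2023-01-01", "2024-12-31")
        clauses.append(f"(FIRST_PDATE:[{start} TO {end}])")
    if open_access_only:
        clauses.append("(OPEN_ACCESS:y)")
    return " AND ".join(clauses)
```

For example, `build_europepmc_query("metformin AND cancer", preprints_only=True)` yields `(metformin AND cancer) AND (SRC:PPR)`.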

docs/brainstorming/archive/BRAINSTORM_EMBEDDINGS_META.md DELETED

@@ -1,74 +0,0 @@
-# Embeddings Brainstorm - Conclusions
-**Date**: November 2025
-**Status**: CLOSED - Conclusions reached, no action needed
----
-## The Question
-Should DeepBoner implement:
-1. Internal codebase embeddings/ingestion pipeline?
-2. mGREP for internal tool selection?
-3. Self-knowledge components for agents?
-## The Answer: NO
-After research and first-principles analysis, the conclusion is clear:
-### Why Not Internal Embeddings/Ingestion
-```text
-DeepBoner's Core Task:
-┌─────────────────────────────────────────────────────────┐
-│  User Query: "Evidence for testosterone in HSDD?"       │
-│                         ↓                               │
-│  1. Search PubMed, ClinicalTrials, Europe PMC          │
-│  2. Judge: Is evidence sufficient?                      │
-│  3. Synthesize: Generate report                         │
-│                         ↓                               │
-│  Output: Research report with citations                 │
-└─────────────────────────────────────────────────────────┘
-Does ANY step require self-knowledge of codebase? NO.
-```
-### Why Not mGREP for Tool Selection
-| Approach | Complexity | Accuracy |
-|----------|------------|----------|
-| Embeddings + mGREP for tool selection | High | Medium (semantic similarity ≠ correct tool) |
-| Direct prompting with tool descriptions | Low | High (LLM reasons about applicability) |
-**No real agent system uses embeddings for tool selection.** All major frameworks (LangChain, OpenAI, Anthropic, Magentic) use prompt-based tool selection because:
-1. LLMs are already doing semantic matching internally
-2. Tool count is small (5-20) - fits easily in context
-3. Prompts allow reasoning, not just similarity
-### What We Already Have
-DeepBoner already uses embeddings for the **right thing**: research evidence retrieval.
-- `src/services/embeddings.py` - ChromaDB + sentence-transformers
-- `src/services/llamaindex_rag.py` - OpenAI embeddings for premium tier
-### The Real Priority
-Instead of internal embeddings/mGREP, focus on:
-1. **Deduplication** across PubMed/Europe PMC/OpenAlex
-2. **Outcome measures** from ClinicalTrials.gov
-3. **Citation graph traversal** via OpenAlex
-See: `TOOL_ANALYSIS_CRITICAL.md` for detailed improvement roadmap.
----
-## Research Sources
-- [SICA Paper (ICLR 2025)](https://arxiv.org/abs/2504.15228) - Self-improving agents
-- [Gödel Agent (ACL 2025)](https://arxiv.org/abs/2410.04444) - Recursive self-modification
-- [Introspection Paradox (EMNLP 2025)](https://aclanthology.org/2025.emnlp-main.352/) - Self-knowledge can hurt performance
-- [Anthropic Introspection Research](https://www.anthropic.com/research/introspection) - ~20% accuracy on genuine introspection
----
-*This document is closed. The conclusion is: don't implement internal embeddings/mGREP for this use case.*
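
The deleted embeddings notes name deduplication across PubMed, Europe PMC, and OpenAlex as the first real priority but do not say how it would work. One possible approach, sketched here under assumptions, keys each record by normalized DOI, then PMID, then normalized title. `Record`, `dedup_key`, and `deduplicate` are hypothetical and do not reflect the project's actual `Evidence` model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal search result; the project's real model may differ."""

    title: str
    source: str
    doi: str | None = None
    pmid: str | None = None


def dedup_key(rec: Record) -> str:
    """Prefer DOI, then PMID, then a whitespace- and case-normalized title."""
    if rec.doi:
        doi = rec.doi.strip().lower().removeprefix("https://doi.org/")
        return "doi:" + doi
    if rec.pmid:
        return "pmid:" + rec.pmid.strip()
    return "title:" + " ".join(rec.title.lower().split())


def deduplicate(records: list[Record]) -> list[Record]:
    """Keep the first record seen for each key, preserving input order."""
    seen: set[str] = set()
    unique: list[Record] = []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

A known limitation of this sketch: a record carrying only a PMID will not match the same paper carrying only a DOI, so a production version would likely need an identifier-mapping step first.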

docs/brainstorming/archive/UI_MODE_SELECTION_UX.md DELETED

@@ -1,133 +0,0 @@
-# UI/UX Brainstorm: Mode Selection & API Key Experience
-**Date**: 2025-11-28
-**Status**: IMPLEMENTED (2025-11-28)
-**Related**: Issues #52, #53, PR #58
----
-## CRITICAL FINDING: Anthropic Key is Nearly Useless
-**Code verification** (2025-11-28):
-```
-grep -r "AnthropicChatClient" src/  → NO RESULTS
-grep -r "OpenAIChatClient" src/     → 22 RESULTS (all Magentic agents)
-```
-The `agent-framework` package (Microsoft's Magentic) **ONLY** has `OpenAIChatClient`.
-There is no `AnthropicChatClient`. This means:
-| Feature | OpenAI Key | Anthropic Key |
-|---------|------------|---------------|
-| Simple mode (Judge LLM) | ✅ GPT-5.1 | ✅ Claude Sonnet 4.5 |
-| Advanced mode (Multi-agent) | ✅ Full orchestration | ❌ **DOES NOT WORK** |
-| Value proposition | Full access | Simple mode only |
-**Decision**: Keep Anthropic support for Simple mode, but ensure UX clearly differentiates capabilities.
----
-## Current State (After PR #58)
-### What Users See (Screenshot 2025-11-28)
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ ≡ Examples                                                                   │
-├──────────────────────────────────────────────────────┬──────────────────────┤
-│                                                      │ Orchestrator Mode    │
-├──────────────────────────────────────────────────────┼──────────────────────┤
-│ What drugs improve female libido post-menopause?     │ simple               │
-│ Clinical trials for erectile dysfunction altern...   │ advanced             │
-│ Evidence for testosterone therapy in women with...   │ simple               │
-└──────────────────────────────────────────────────────┴──────────────────────┘
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ ⚙️ Mode & API Key (Free tier works!)                                  [▼]   │
-├─────────────────────────────────────────────────────────────────────────────┤
-│                                                                             │
-│ Orchestrator Mode                                                           │
-│ ⚡ Simple: Fast (Free/Any Key) | 🔬 Advanced: Deep Multi-Agent (OpenAI Key Only)    │
-│ [● simple] [○ advanced]                                                     │
-│                                                                             │
-│ 🔑 API Key (Optional)                                                       │
-│ Leave empty for free tier. Auto-detects provider from key prefix.           │
-│ ┌─────────────────────────────────────────────────────────────────────────┐ │
-│ │ sk-... (OpenAI) or sk-ant-... (Anthropic)                               │ │
-│ └─────────────────────────────────────────────────────────────────────────┘ │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
-### Observations from Screenshot
-1. **Examples table**: 2 columns (Query + Mode) - clean, one example now shows "advanced" ✅
-2. **One example shows "advanced"**: Improves discoverability of Advanced mode ✅
-3. **Accordion collapsed by default**: Still collapsed, but with more inviting label ✅
-4. **Placeholder mentions Anthropic**: Correct, but now clearly tied to Simple mode only via info text ✅
-5. **"Advanced: Requires OpenAI key"**: Now more prominent with emojis and clearer phrasing in info text ✅
-### The Two Modes
-| Mode | Backend | Capabilities | Requirements |
-|------|---------|--------------|--------------|
-| **Simple** | Linear orchestrator | Search → Judge → Report (single pass) | None (free tier) or any API key |
-| **Advanced** | Magentic multi-agent | SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent working together with iterative refinement | **OpenAI API key only** |
----
-## Problems Identified (Addressed)
-### P1: Advanced Mode is Hidden → ADDRESSED
-- **Fix**: One example now shows "advanced" mode.
-- **Fix**: Accordion label is more descriptive.
-### P2: Mode/Key Relationship is Unclear → ADDRESSED
-- **Fix**: `gr.Radio` info text clearly states "OpenAI Key Only" for Advanced mode, using emojis for emphasis.
-### P3: No Incentive to Try Advanced → PARTIALLY ADDRESSED
-- **Fix**: Emojis and "Deep Multi-Agent" hint at the value. Further marketing/documentation still needed for full "wow" moment.
-### P4: Anthropic Users Left Out → ADDRESSED (Clarified)
-- **Fix**: Anthropic keys still work for Simple mode, and the info text clarifies the limitation for Advanced mode.
----
-## Options to Consider (Decision Made)
-The recommendation of **Modified Option A (Better Education + Examples)** with slight modification to accordion label was implemented.
----
-## Implementation Notes (Completed)
-```python
-# From src/app.py
-examples=[
-    ["What drugs improve female libido post-menopause?", "simple"],
-    ["Clinical trials for erectile dysfunction alternatives to PDE5 inhibitors?", "advanced"],  # Changed
-    ["Evidence for testosterone therapy in women with HSDD?", "simple"],
-],
-additional_inputs_accordion=gr.Accordion(
-    label="⚙️ Mode & API Key (Free tier works!)", # Changed
-    open=False
-),
-gr.Radio(
-    choices=["simple", "advanced"],
-    value="simple",
-    label="Orchestrator Mode",
-    info=( # Changed
-        "⚡ Simple: Fast (Free/Any Key) | "
-        "🔬 Advanced: Deep Multi-Agent (OpenAI Key Only)"
-    ),
-),
-```
----
-## Decision Log
-| Date | Decision | Rationale |
-|------|----------|-----------|
-| 2025-11-28 | Implemented Modified Option A | Minimal changes, high impact on discoverability, graceful fallback, user-approved accordion label. |
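
The deleted UI notes say the app "auto-detects provider from key prefix" and that Advanced mode needs an OpenAI key, but they do not show the detection logic. A minimal sketch follows, assuming only the two prefixes shown in the placeholder text (`sk-ant-...` for Anthropic, `sk-...` for OpenAI). `detect_provider` and `allowed_modes` are hypothetical names, and the `"free"` and `"unknown"` labels are illustrative.

```python
from __future__ import annotations


def detect_provider(api_key: str | None) -> str:
    """Guess the provider from the key prefix described in the notes (sketch only)."""
    key = (api_key or "").strip()
    if not key:
        return "free"  # empty key means the free tier
    # Check the longer prefix first: Anthropic keys also start with "sk-".
    if key.startswith("sk-ant-"):
        return "anthropic"
    if key.startswith("sk-"):
        return "openai"
    return "unknown"


def allowed_modes(provider: str) -> list[str]:
    """Per the notes' table, only an OpenAI key enables Advanced mode."""
    if provider == "openai":
        return ["simple", "advanced"]
    return ["simple"]
```

The prefix order matters: testing `sk-` before `sk-ant-` would misclassify every Anthropic key as OpenAI and wrongly offer Advanced mode.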