Gengram-10B

This repository hosts the model weights for Gengram-10B. For usage instructions and further details, see the Gengram GitHub repository.

Gengram is a novel conditional memory module designed for genomic foundation models (GFMs) that introduces explicit motif memory retrieval to enhance Transformer-based DNA sequence modeling. Unlike traditional GFMs that rely on dense computation to implicitly infer multi-nucleotide motifs, Gengram provides an efficient lookup mechanism for biological patterns through a genomic-specific hashing scheme.
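To make the lookup idea concrete, here is a minimal sketch of turning k-mers into table indices. The base-4 encoding and the names `BASE_TO_INT`, `kmer_index`, and `kmer_indices` are illustrative assumptions. This card does not specify Gengram's actual hashing scheme, so treat the sketch as one simple way to index motifs, not as the model's implementation.

```python
from typing import List

# Assumption: a plain base-4 encoding stands in for the genomic-specific hash.
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_index(kmer: str) -> int:
    """Map a k-mer over A/C/G/T to an integer in [0, 4**k)."""
    index = 0
    for base in kmer:
        index = index * 4 + BASE_TO_INT[base]
    return index


def kmer_indices(sequence: str, k: int) -> List[int]:
    """Return the table index of every k-mer in the sequence, left to right."""
    return [kmer_index(sequence[i:i + k]) for i in range(len(sequence) - k + 1)]
```

With k at most 6, a table needs no more than 4**6 = 4096 rows under this encoding, so every k-mer can have its own row without collisions. For example, `kmer_indices("ACGT", 2)` indexes the 2-mers AC, CG, and GT.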

✨ Key Features

🎯 Explicit Motif Memory: Stores and retrieves k-mers (k = 1 to 6) via hash-based lookup tables
🧬 Local Window Aggregation: 21 bp window mechanism aligned with DNA helical structure (about two turns of the B-DNA helix at roughly 10.5 bp per turn)
⚡ Computational Efficiency: Linear time complexity with minimal overhead
🔧 Architecture Agnostic: Compatible with various attention mechanisms (MHA, GQA, MLA)
⚖️ Stable Training: Improves load balancing in Mixture-of-Experts models
🔍 Biological Interpretability: Learns meaningful motif representations
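The local window aggregation and linear-time claims can be illustrated with a small sketch. It assumes a centered window and mean pooling over one scalar per position, computed with prefix sums so the cost is linear in sequence length. The pooling choice, the truncation at sequence ends, and the name `window_mean` are assumptions for illustration; the aggregation Gengram actually uses may differ.

```python
from typing import List

WINDOW = 21  # window size in base pairs, taken from the feature list


def window_mean(values: List[float], window: int = WINDOW) -> List[float]:
    """Centered moving average; windows are truncated at the sequence ends."""
    half = window // 2
    # prefix[i] holds the sum of values[:i], so any window sum is one subtraction.
    prefix = [0.0]
    for value in values:
        prefix.append(prefix[-1] + value)
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out
```

For vector-valued embeddings the same computation would be applied to each dimension independently. The output has one value per input position, so it can be added back to a per-position representation.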

✨ Biological Interpretability

Reverse-complement symmetry in memory embeddings
Context-dependent gating aligned with functional regions
Hierarchical representation from shallow to deep layers
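Reverse-complement symmetry refers to the relationship between a motif and the same motif read from the opposite DNA strand. The sketch below shows how the paired k-mers can be computed, for instance to look up which two memory rows one would compare. The helper names `reverse_complement` and `canonical_kmer` are illustrative and are not part of any Gengram API.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer: str) -> str:
    """Return the k-mer as read on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def canonical_kmer(kmer: str) -> str:
    """Pick one representative per reverse-complement pair (the smaller string)."""
    return min(kmer, reverse_complement(kmer))
```

For example, `reverse_complement("GATTAC")` gives `"GTAATC"`, and both k-mers share the canonical form `"GATTAC"`. A palindromic site such as `"GAATTC"` is its own reverse complement.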

For full documentation, training details, and usage instructions, please visit the GitHub repository.
