Gengram-10B

This repository hosts the model weights for Gengram-10B. For usage instructions and details, see the Gengram GitHub repository.

Gengram is a novel conditional memory module designed for genomic foundation models (GFMs) that introduces explicit motif memory retrieval to enhance Transformer-based DNA sequence modeling. Unlike traditional GFMs that rely on dense computation to implicitly infer multi-nucleotide motifs, Gengram provides an efficient lookup mechanism for biological patterns through a genomic-specific hashing scheme.

✨ Key Features

  • 🎯 Explicit Motif Memory: Stores and retrieves k-mers (k = 1 to 6) via hash-based lookup tables
  • 🧬 Local Window Aggregation: 21bp window mechanism aligned with DNA helical structure
  • ⚡ Computational Efficiency: Linear time complexity with minimal overhead
  • 🔧 Architecture Agnostic: Compatible with various attention mechanisms (MHA, GQA, MLA)
  • ⚖️ Stable Training: Improves load balancing in Mixture-of-Experts models
  • 🔍 Biological Interpretability: Learns meaningful motif representations
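
The k-mer lookup and 21bp window aggregation listed above can be pictured with a small sketch. This is not the Gengram implementation: the base-4 k-mer encoding, the modulo table indexing, the mean over the window, the random table contents, and all names (`kmer_ids`, `build_table`, `window_memory`) are assumptions made for illustration only. See the GitHub repository for the actual hashing scheme.

```python
import random

# Illustrative sketch only; every name and design choice here is hypothetical.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_ids(seq, k):
    """Return one base-4 integer ID per k-mer in seq.

    A k-mer containing a non-ACGT symbol (for example N) gets None.
    """
    ids = []
    for i in range(len(seq) - k + 1):
        code = 0
        for base in seq[i:i + k]:
            if base not in BASE_INDEX:
                code = None
                break
            code = code * 4 + BASE_INDEX[base]
        ids.append(code)
    return ids


def build_table(table_size, dim, seed=0):
    """Create a toy lookup table of random vectors (a stand-in for learned memory)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(table_size)]


def window_memory(seq, table, k=3, window=21):
    """Average the retrieved vectors of all k-mers starting inside a centered window.

    Returns one vector per k-mer start position. Positions whose window holds
    no valid k-mer get a zero vector.
    """
    ids = kmer_ids(seq, k)
    dim = len(table[0])
    half = window // 2
    out = []
    for i in range(len(ids)):
        lo, hi = max(0, i - half), min(len(ids), i + half + 1)
        # Assumed indexing: k-mer ID modulo table size.
        rows = [table[ids[j] % len(table)] for j in range(lo, hi) if ids[j] is not None]
        if rows:
            out.append([sum(col) / len(rows) for col in zip(*rows)])
        else:
            out.append([0.0] * dim)
    return out


if __name__ == "__main__":
    table = build_table(table_size=64, dim=4)
    vectors = window_memory("ACGTACGTNNACGT", table, k=3)
    print(len(vectors), len(vectors[0]))
```

Each position costs a fixed number of table reads bounded by the window size, so the total work in this sketch grows linearly with sequence length.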

✨ Biological Interpretability

  • Reverse-complement symmetry in memory embeddings
  • Context-dependent gating aligned with functional regions
  • Hierarchical representation from shallow to deep layers
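
As a hedged illustration of what reverse-complement symmetry could look like in practice, the sketch below compares the embedding of a k-mer with that of its reverse complement using cosine similarity. The `embeddings` dictionary (k-mer string to vector) is a hypothetical export format, not something this repository is known to provide, and `rc_similarity` is an illustrative helper rather than part of Gengram.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rc_similarity(embeddings, kmer):
    """Similarity between a k-mer's vector and its reverse complement's vector.

    `embeddings` is an assumed dict mapping k-mer strings to vectors.
    """
    return cosine(embeddings[kmer], embeddings[reverse_complement(kmer)])


if __name__ == "__main__":
    # Toy vectors chosen by hand; they are not taken from the model.
    toy = {"AAC": [1.0, 0.0], "GTT": [0.9, 0.1]}
    print(reverse_complement("AAC"), rc_similarity(toy, "AAC"))
```

A similarity close to 1 for most k-mer pairs would be consistent with the symmetry described above. Values near 0 would suggest the two strands are represented independently.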

For full documentation, training details, and usage instructions, please visit the GitHub repository.
