Activity Feed

AI & ML interests

None defined yet.

Recent Activity

alvarobartt 
posted an update 6 days ago
view post
Post
2790
💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
  • 1 reply
·
codelion 
posted an update 11 days ago
view post
Post
3031
Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models

I wrote a deep dive into how Magic AI's 100M token context window might work, starting from their HashHop benchmark and building up to MALM - a Memory-Augmented Language Model.

Key insight: treating each key as a single token enables perfect retrieval at unlimited context lengths.

The article covers:

- How HashHop works and why its perfect accuracy is suspicious
- Building a tokenized solver that achieves 100% accuracy
- Scaling to MALM for real code search tasks
- Why this approach could handle 100M+ tokens

Read the full article: https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop

Try the model: codelion/malm-165m

Code: https://github.com/codelion/hash-hop
  • 1 reply
·
mlabonne 
posted an update 28 days ago
codelion 
posted an update about 1 month ago
view post
Post
6074
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:

→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: codelion/dhara-70m
  • 1 reply
·
codelion 
posted an update about 2 months ago
view post
Post
2403
Introducing PTS Visualizer - an interactive tool for exploring how language models reason!

Visualize pivotal tokens, thought anchors, and reasoning circuits. See which tokens and sentences significantly impact success probability, explore embedding clusters, and trace reasoning step-by-step.

Try it: codelion/pts-visualizer

Explore PTS datasets:
- Qwen3-0.6B: codelion/Qwen3-0.6B-pts
- DeepSeek-R1: codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts

Or upload your own JSONL files!

GitHub: https://github.com/codelion/pts
anakin87 
posted an update about 2 months ago
view post
Post
311
💭 Do thinking traces make Language Models learn better? Curious what others think

𝗦𝗰𝗲𝗻𝗮𝗿𝗶𝗼
You take an instruction-following LM.
You want to train it with a GRPO-style RL algorithm on a task like Tic Tac Toe.
Rewards are outcome-based, applied only at the end of each episode: win/loss/draw, format adherence...

During training, the model could just output answers, but a common choice is to make it also output thinking traces.

𝗧𝗵𝗲 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻
Does forcing the model to produce thinking traces during training actually improve learning❓

💬 I'd like to hear your thoughts. Share ideas and links to relevant papers and resources.

From what I've understood so far, the answer seems to be 𝘆𝗲𝘀.

1️⃣ If you force the model to think during training, it becomes a model that thinks at inference time. It naturally allocates more budget (tokens) to a problem, which tends to improve performance.

2️⃣ While the model's "reasoning" already exists in its activation space, using explicit thinking traces as a scratchpad allows training to steer and shape that reasoning.

3️⃣ As the model produces more traces during training, the RL algorithm can progressively give higher rewards to the reasoning patterns that lead to better outcomes.
codelion 
posted an update about 2 months ago
view post
Post
2596
Recently, Essential AI released a new 8B base model EssentialAI/rnj-1 they highlighted the importance of data mix for pretraning -

"In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this. "

This resonates with the recent work we did around optimal dataset mixing for pretraining where we saw have the right mix can increase the efficiency of training -
https://huggingface.co/blog/codelion/optimal-dataset-mixing
codelion 
posted an update about 2 months ago
codelion 
posted an update 2 months ago
view post
Post
2319
Perplexity released a dataset (BrowseSafe) and benchmark to catch and prevent malicious prompt-injection instructions in real-time.

We trained a prompt injection classifier on BrowseSafe using adaptive-classifier with ModernBERT-base embeddings.

74.9% F1 on detecting prompt injection in web content.

Model -> adaptive-classifier/browsesafe
Dataset -> perplexity-ai/browsesafe-bench
Repo -> https://github.com/codelion/adaptive-classifier
  • 1 reply
·
codelion 
posted an update 2 months ago
view post
Post
1612
I just published Ellora - 6 production-ready LoRA recipes for enhancing LLMs with specific capabilities. Each recipe costs under $100 to run and includes complete training code, data generation, and evaluation.

The 6 Recipes:
Recipe 1: Accuracy Recovery - Recover 75% of quantization losses with self-distillation
Recipe 2: Reasoning LoRA - Add structured thinking with GRPO (0% to 60% adoption, 75% quality boost)
Recipe 3: Tool Calling - Real execution on actual codebases
Recipe 4: Context Extension - Scale from 32K to 2M tokens (61x increase)
Recipe 5: Secure Code Generation - 97% vulnerability reduction using automated Semgrep analysis
Recipe 6: Execution-Aware World Models - Teaching models runtime behavior

Why Recipes?
Ellora provides methodologies, not frameworks. Use them with your existing tools (PEFT, LoRAX, vLLM, Unsloth, HuggingFace). Each recipe uses self-supervised data generation (Magpie approach) - no expensive human labeling required.

All recipes include Jupyter notebooks you can run immediately with clear success metrics.

GitHub: https://github.com/codelion/ellora
Full Article: https://huggingface.co/blog/codelion/ellora-lora-recipes

Built something with these recipes? I'd love to see what you create!
anakin87 
posted an update 2 months ago
nroggendorff 
posted an update 2 months ago
view post
Post
2927
I am now being charged for paused and unstarted spaces out of the blue.
I think this is it, folks. o7


The unstarted spaces I can get behind. I would've appreciated a warning email first, but whatever. However, every time I restart the active usage goes up, despite all of my spaces being moved to CPU (free), and being paused.
·
codelion 
posted an update 2 months ago
view post
Post
1996
Introducing OpenEvolve Prompt Optimizer - a Space that automatically evolves and optimizes your prompts using OpenEvolve!

This tool uses OpenEvolve to iteratively improve prompts by testing them on real datasets and evolving better versions. No more manual prompt engineering guesswork - let OpenEvolve find the optimal prompts for you.

How it works:
- Enter your initial prompt using {input} as a placeholder for dataset inputs
- Input any HuggingFace dataset name you want to use for optimization
- Specify the dataset split and field names for your use case
- Click Optimize Prompt and the system will validate everything first
- Compare your initial prompt vs the evolved best prompt side-by-side

Try it here: algorithmicsuperintelligence/prompt-optimizer

OpenEvolve GitHub: https://github.com/algorithmicsuperintelligence/openevolve
codelion 
posted an update 3 months ago
view post
Post
2480
🎯 Introducing Chayan: A Calibrated 4-Model LLM Router Achieving 69% Accuracy on RouterArena

We're excited to share Chayan, a cost-efficient LLM router that intelligently routes queries between 4 models to maximize accuracy while minimizing cost. Chayan just submitted to the RouterArena leaderboard and achieved 69.05% accuracy on the benchmark!

🔗 Model: adaptive-classifier/chayan
🔗 Dataset: RouteWorks/RouterArena

📊 Performance Highlights

Chayan achieves impressive results on the RouterArena benchmark:
• 69.05% accuracy (would rank #1 on current leaderboard)
• $0.333 per 1K queries
• +12.07pp improvement over all-mini baseline (56.98%)
• 99% of perfect 2-model oracle performance at 57% lower cost

Compared to our previous 2-model router (61.43% accuracy), Chayan delivers +7.62pp improvement through smarter 4-model routing.

🧠 How It Works

Chayan uses an Adaptive K-NN classifier with prototype memory to route between 4 models:
• openai/gpt-4o-mini (fast & cheap)
• google/gemini-2.5-flash-lite (balanced)
• google/gemini-2.5-flash (capable)
• openai/gpt-4o (most powerful)

🚀 Getting Started

You can use Chayan directly from HuggingFace:

from adaptive_classifier import AdaptiveClassifier

Load Chayan
router = AdaptiveClassifier.load("adaptive-classifier/chayan")

Route a query
query = "What is the capital of France?"
predictions = router.predict(query, k=4)

Get top model recommendation
best_model = predictions[0][0]
print(f"Recommended model: {best_model}")

Built with the adaptive-classifier library: https://github.com/codelion/adaptive-classifier
nroggendorff 
posted an update 3 months ago
view post
Post
2382
Developing with ZeroGPU without a PRO account is painful. They give you so many requests at once, but then have like a 24 hour cooldown. I vote less requests in a batch, but then a shorter cooldown.


or just less of a cooldown, but i understand if that is not allowed
  • 3 replies
·