Nacrith: a 135M model that out-compresses everything on natural language
What if a tiny LM could compress English text better than _every_ compressor out there, classical or neural, small or large?
Nacrith pairs SmolLM2-135M with an ensemble of online predictors and high-precision arithmetic coding.
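The post does not include code, so here is a minimal sketch of the core loop that sentence describes: the LM predicts a distribution for the next token, and an arithmetic coder spends bits according to that distribution. The `encoder.encode_symbol(cdf, token)` call is a hypothetical placeholder for an arithmetic coder, and the ensemble predictors, N-gram gate, and NC06 container described below are omitted.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M").eval()

def compress(text: str, encoder) -> None:
    # `encoder` is a hypothetical arithmetic coder exposing encode_symbol(cdf, symbol).
    ids = tok(text, return_tensors="pt").input_ids[0]
    past = None  # KV cache so each step costs a single-token forward pass
    for i in range(len(ids) - 1):
        with torch.no_grad():
            out = model(ids[i].view(1, 1), past_key_values=past, use_cache=True)
        past = out.past_key_values
        probs = torch.softmax(out.logits[0, -1].float(), dim=-1)
        cdf = torch.cumsum(probs, dim=-1)             # model CDF over the vocabulary
        encoder.encode_symbol(cdf, ids[i + 1].item())
    # The first token is not covered by the loop and must be stored separately
    # (e.g., under a uniform prior).
```

Decompression is the mirror image: the decoder replays the same model in lockstep, gets the same CDF at every step, and asks the coder for the next token, which is why both sides need bit-identical probability computation.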
What's inside
The standard LLM + arithmetic coding approach wastes ~75% of CDF precision on large vocabularies; our CDF-24 fix alone recovers 0.5 bpb. On top of that: a token N-gram that skips the GPU on predictable tokens, an adaptive bias head, a llama.cpp backend (7× faster than PyTorch), multi-GPU parallel compression, and a binary file format (NC06), the first LLM-based binary compressor we know of.
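One plausible reading of the precision claim, offered as my interpretation rather than the author's: a 16-bit coder has 65,536 probability slots, and reserving a nonzero count for each of SmolLM2's ~49K vocabulary entries already consumes roughly 75% of them, leaving little resolution for the model's actual predictions. A minimal sketch of one way to build a 24-bit integer CDF (illustrative only, not Nacrith's CDF-24 code):

```python
import numpy as np

def quantize_cdf(probs: np.ndarray, precision_bits: int = 24) -> np.ndarray:
    """Turn float token probabilities into an integer CDF for an arithmetic coder."""
    total = 1 << precision_bits
    vocab = probs.shape[0]
    # Give every token at least one count so any symbol stays encodable,
    # then distribute the remaining range proportionally to the model.
    freqs = np.floor(probs * (total - vocab)).astype(np.int64) + 1
    # Absorb rounding drift into the most likely token so counts sum to `total`.
    freqs[np.argmax(probs)] += total - freqs.sum()
    cdf = np.concatenate(([0], np.cumsum(freqs)))
    return cdf  # symbol s occupies the interval [cdf[s], cdf[s + 1]) out of `total`
```

At 24 bits the per-token floor costs only about 0.3% of the range (49,152 / 2^24) instead of ~75% at 16 bits, which is consistent with a sizeable bpb recovery, though the exact 0.5 bpb figure depends on details not shown here.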
Runs on a GTX 1050 Ti. ~500 MB weights, ~1.2 GB VRAM per worker.
Try it, break it, share your results; all feedback welcome. A star on the repo is appreciated!
Results across all systems we tested (a quick bpb conversion example follows the list):
- alice29.txt: 0.918 bpb (−44% vs CMIX, −20% vs ts_zip), below the 2nd-order Shannon entropy bound
- enwik8 (100 MB): 0.9389 bpb (−8% vs FineZip/LLMZip's 8B model, −15% vs ts_zip)
- Unseen text: 0.723 bpb on a document published after the training cutoff (no memorization), 26% better than FineZip/LLMZip on the same model
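For readers new to the metric: bpb is compressed output bits divided by original input bytes, so lower is better and 8.0 means no compression. A quick conversion, assuming the standard Canterbury-corpus copy of alice29.txt at 152,089 bytes:

```python
original_bytes = 152_089                 # alice29.txt (Canterbury corpus; assumed size)
bpb = 0.918                              # figure reported above
compressed_bytes = original_bytes * bpb / 8
print(f"{compressed_bytes:.0f} bytes")   # roughly 17,452 bytes
```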
In 2017, my RNNs were babbling. Today, they are hallucinating beautifully.
Ten years ago, getting an LSTM to output coherent English was a struggle. Today, after a "cure" based on FineWeb-EDU and a custom synthetic mix for causal conversation, the results are fascinating.
We trained this on ~10B tokens on a single AMD GPU (ROCm). It is not a Transformer: Echo-DSRN (400M) is a novel recurrent architecture inspired by Hymba, RWKV, and xLSTM, designed to challenge the "Attention is All You Need" monopoly on the Edge.
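Echo-DSRN itself is unpublished, so the block below is explicitly not its architecture. It is only a generic gated linear recurrence in the spirit of the RWKV/xLSTM family named above, included to show the structural difference from attention: a fixed-size state is updated token by token, so memory and compute per token do not grow with context length.

```python
import torch
import torch.nn as nn

class GatedLinearRecurrence(nn.Module):
    """Illustrative recurrent token mixer (NOT Echo-DSRN, whose internals are unreleased)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.decay = nn.Linear(d_model, d_model)   # per-channel forget gate
        self.write = nn.Linear(d_model, d_model)   # candidate written into the state
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        state = x.new_zeros(batch, d_model)        # O(1) memory in sequence length
        ys = []
        for t in range(seq_len):
            xt = x[:, t]
            keep = torch.sigmoid(self.decay(xt))   # how much of the old state survives
            state = keep * state + (1.0 - keep) * self.write(xt)
            ys.append(self.out(state))
        return torch.stack(ys, dim=1)
```

That constant per-token footprint, rather than attention's cost growing with context length, is the usual argument for this model family on edge hardware.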
The ambitious goal is to build a small instruct model with RAG and tool-usage capabilities (ethicalabs/Kurtis-EON1).
The Benchmarks (Size: 400M)
For a model this size (trained on <10B tokens), the specialized performance is surprising:
- *SciQ*: 73.8% (this rivals billion-parameter models in pure fact retrieval)
- *PIQA*: 62.3% (solid physical intuition for a sub-1B model)
The Reality Check:
HellaSwag (29.3%) and Winogrande (50.2%) show the limits of 400M parameters and 10B training tokens.
We are hitting the "Reasoning Wall", which confirms we need to scale up to (hopefully) unlock deeper common sense. As you can see in the visualization (to be released soon on HF), the FineWeb-EDU bias is strong: the model is convinced it is in a classroom ("In this course, we explore...").
The Instruct Model is not ready yet, and we are currently using curriculum learning to test model plasticity.
Source code and weights will not be released yet. This is not a fork or a fine-tune: the base model is built in-house at https://www.ethicalabs.ai/, with novel components that do not exist in current open libraries.
Call for Collaboration: I am looking for Peer Reviewers interested in recurrent/hybrid architectures. If you want to explore what lies beyond Transformers, let's connect!