codelion/dhara-70m • Text Generation • 71.3M • Updated • 3.18k • 22
Diffusion Language Models combining deep narrow networks, Canon layers (depthwise causal convolutions), and WSD (Warmup-Stable-Decay) training.
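To make the two named techniques concrete, here is a minimal, dependency-free sketch of a Warmup-Stable-Decay learning-rate schedule and a depthwise causal convolution. It is only an illustration of the general ideas: the function names (`wsd_lr`, `depthwise_causal_conv`), the linear warmup and decay shapes, the step counts, and the kernel sizes are all assumptions, and none of them are taken from the dhara-70m implementation.

```python
from typing import List


def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_steps: int, decay_steps: int, min_lr: float = 0.0) -> float:
    """Hypothetical WSD schedule: linear warmup, constant plateau, linear decay.

    The linear shapes are an assumption; other decay curves are possible.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp linearly up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable: hold the learning rate at its peak.
        return peak_lr
    # Decay: move linearly from peak_lr to min_lr over the final phase.
    progress = min(1.0, (step - decay_start) / max(1, decay_steps))
    return peak_lr + (min_lr - peak_lr) * progress


def depthwise_causal_conv(x: List[List[float]],
                          kernels: List[List[float]]) -> List[List[float]]:
    """Depthwise causal 1D convolution over a sequence.

    x is indexed [position][channel]; kernels is indexed [channel][tap].
    Depthwise: each channel is filtered only by its own kernel.
    Causal: the output at position t depends on inputs at positions <= t,
    with implicit zero padding on the left.
    """
    seq_len = len(x)
    channels = len(kernels)
    out = [[0.0] * channels for _ in range(seq_len)]
    for c in range(channels):
        k = len(kernels[c])
        for t in range(seq_len):
            acc = 0.0
            for j in range(k):
                src = t - (k - 1) + j  # the last tap aligns with position t
                if src >= 0:
                    acc += kernels[c][j] * x[src][c]
            out[t][c] = acc
    return out


if __name__ == "__main__":
    # Illustrative values only: 100 steps, 10 warmup, 20 decay.
    for s in (0, 9, 50, 90, 100):
        print(s, wsd_lr(s, 100, 1.0, 10, 20))

    seq = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    taps = [[1.0, 1.0], [0.0, 1.0]]  # channel 0 sums two steps, channel 1 is identity
    print(depthwise_causal_conv(seq, taps))
```

In a deep learning framework the same convolution is usually written as a grouped 1D convolution with the number of groups equal to the number of channels and padding applied only on the left. Whether the model adds a residual connection around it, and where the layers are placed, is not stated here.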