Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models Paper • 2602.08658 • Published 2 days ago • 12
Context Compression via Explicit Information Transmission Paper • 2602.03784 • Published 8 days ago • 14
Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation Paper • 2602.02007 • Published 9 days ago • 12
No Shortcuts to Culture: Indonesian Multi-hop Question Answering for Complex Cultural Understanding Paper • 2602.03709 • Published 8 days ago • 8
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models Paper • 2512.24618 • Published Dec 31, 2025 • 150
An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift Paper • 2601.05882 • Published Jan 9 • 21
Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks Paper • 2601.03448 • Published Jan 6 • 13
Olmo 3 Pre-training Collection All artifacts related to Olmo 3 pre-training • 10 items • Updated Dec 23, 2025 • 33
Olmo 3 Post-training Collection All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated Dec 23, 2025 • 50
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space Paper • 2511.20102 • Published Nov 25, 2025 • 27
Deconstructing Attention: Investigating Design Principles for Effective Language Modeling Paper • 2510.11602 • Published Oct 13, 2025 • 15 • 2