SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models Paper • 2602.07616 • Published 18 days ago • 2
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models Paper • 2412.07210 • Published Dec 10, 2024 • 1
A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons Article • Published Feb 4, 2025 • 29
🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 183
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10, 2025 • 53
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks Paper • 2410.20650 • Published Oct 28, 2024 • 17
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Paper • 2410.19313 • Published Oct 25, 2024 • 19
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation Paper • 2410.14745 • Published Oct 17, 2024 • 47
Why Does the Effective Context Length of LLMs Fall Short? Paper • 2410.18745 • Published Oct 24, 2024 • 17
MiniPLM: Knowledge Distillation for Pre-Training Language Models Paper • 2410.17215 • Published Oct 22, 2024 • 16
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities Paper • 2408.07666 • Published Aug 14, 2024 • 3
Memory-Efficient LLM Training with Online Subspace Descent Paper • 2408.12857 • Published Aug 23, 2024 • 15
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18, 2024 • 78
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published Sep 19, 2024 • 22