VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation Paper • 2511.17199 • Published 17 days ago • 7
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments Paper • 2508.08791 • Published Aug 12 • 16
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published Jul 2 • 69
RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models Paper • 2506.18369 • Published Jun 23 • 2
Seedance 1.0: Exploring the Boundaries of Video Generation Models Paper • 2506.09113 • Published Jun 10 • 104
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 187
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator Paper • 2411.15466 • Published Nov 23, 2024 • 39
OminiControl: Minimal and Universal Control for Diffusion Transformer Paper • 2411.15098 • Published Nov 22, 2024 • 61
Style-Friendly SNR Sampler for Style-Driven Generation Paper • 2411.14793 • Published Nov 22, 2024 • 39
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 130
Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models Paper • 2410.11081 • Published Oct 14, 2024 • 18
Guiding a Diffusion Model with a Bad Version of Itself Paper • 2406.02507 • Published Jun 4, 2024 • 17