Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribeFaster logconcave sampling from a cold start in high dimension
We present a faster algorithm to generate a warm start for sampling an arbitrary logconcave density specified by an evaluation oracle, leading to the first sub-cubic sampling algorithms for inputs in (near-)isotropic position. A long line of prior work incurred a warm-start penalty of at least linear in the dimension, hitting a cubic barrier, even for the special case of uniform sampling from convex bodies. Our improvement relies on two key ingredients of independent interest. (1) We show how to sample given a warm start in weaker notions of distance, in particular q-R\'enyi divergence for q=mathcal{O}(1), whereas previous analyses required stringent infty-R\'enyi divergence (with the exception of Hit-and-Run, whose known mixing time is higher). This marks the first improvement in the required warmness since Lov\'asz and Simonovits (1991). (2) We refine and generalize the log-Sobolev inequality of Lee and Vempala (2018), originally established for isotropic logconcave distributions in terms of the diameter of the support, to logconcave distributions in terms of a geometric average of the support diameter and the largest eigenvalue of the covariance matrix.
Cold-Start Active Preference Learning in Socio-Economic Domains
Active preference learning offers an efficient approach to modeling preferences, but it is hindered by the cold-start problem, which leads to a marked decline in performance when no initial labeled data are available. While cold-start solutions have been proposed for domains such as vision and text, the cold-start problem in active preference learning remains largely unexplored, underscoring the need for practical, effective methods. Drawing inspiration from established practices in social and economic research, the proposed method initiates learning with a self-supervised phase that employs Principal Component Analysis (PCA) to generate initial pseudo-labels. This process produces a warmed-up model based solely on the data's intrinsic structure, without requiring expert input. The model is then refined through an active learning loop that strategically queries a simulated noisy oracle for labels. Experiments conducted on various socio-economic datasets, including those related to financial credibility, career success rate, and socio-economic status, consistently show that the PCA-driven approach outperforms standard active learning strategies that start without prior information. This work thus provides a computationally efficient and straightforward solution that effectively addresses the cold-start problem.
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model. The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. In addition to standard benchmarks, we introduce ProverBench, a collection of 325 formalized problems, to enrich our evaluation, including 15 selected problems from the recent AIME competitions (years 24-25). Further evaluation on these 15 AIME problems shows that the model successfully solves 6 of them. In comparison, DeepSeek-V3 solves 8 of these problems using majority voting, highlighting that the gap between formal and informal mathematical reasoning in large language models is substantially narrowing.
KAT-V1: Kwai-AutoThink Technical Report
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks, where an automatic thinking training paradigm is proposed to dynamically switch between reasoning and non-reasoning modes based on task complexity. Specifically, first, we construct the dual-regime dataset based on a novel tagging pipeline and a multi-agent synthesis strategy, and then we apply Multi-Token Prediction (MTP)-enhanced knowledge distillation, enabling efficient and fine-grained reasoning transfer with minimal pretraining cost. Besides, we implement a cold-start initialization strategy that introduces mode-selection priors using majority-vote signals and intent-aware prompting. Finally, we propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework, offering structured guidance over both reasoning-mode selection and response accuracy. Extensive experiments across multiple benchmarks demonstrate that KAT consistently matches or even outperforms current state-of-the-art models, including DeepSeek-R1-0528 and Qwen3-235B-A22B, across a wide range of reasoning-intensive tasks while reducing token usage by up to approximately 30\%. Beyond academic evaluation, KAT has been successfully deployed in Kwaipilot (i.e., Kuaishou's internal coding assistant), and improves real-world development workflows with high accuracy, efficiency, and controllable reasoning behaviors. Moreover, we are actively training a 200B Mixture-of-Experts (MoE) with 40B activation parameters, where the early-stage results already demonstrate promising improvements in performance and efficiency, further showing the scalability of the AutoThink paradigm.
Serverless Cold Starts and Where to Find Them
This paper releases and analyzes a month-long trace of 85 billion user requests and 11.9 million cold starts from Huawei's serverless cloud platform. Our analysis spans workloads from five data centers. We focus on cold starts and provide a comprehensive examination of the underlying factors influencing the number and duration of cold starts. These factors include trigger types, request synchronicity, runtime languages, and function resource allocations. We investigate components of cold starts, including pod allocation time, code and dependency deployment time, and scheduling delays, and examine their relationships with runtime languages, trigger types, and resource allocation. We introduce pod utility ratio to measure the pod's useful lifetime relative to its cold start time, giving a more complete picture of cold starts, and see that some pods with long cold start times have longer useful lifetimes. Our findings reveal the complexity and multifaceted origins of the number, duration, and characteristics of cold starts, driven by differences in trigger types, runtime languages, and function resource allocations. For example, cold starts in Region 1 take up to 7 seconds, dominated by dependency deployment time and scheduling. In Region 2, cold starts take up to 3 seconds and are dominated by pod allocation time. Based on this, we identify opportunities to reduce the number and duration of cold starts using strategies for multi-region scheduling. Finally, we suggest directions for future research to address these challenges and enhance the performance of serverless cloud platforms. Our datasets and code are available here https://github.com/sir-lab/data-release
Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning
Recently, Multimodal Large Language Models (MLLMs) excel at visual grounding in single-image scenarios with textual references. However, their performance degrades when handling real-world applications that involve complex multi-image compositions and multi-modal instructions, revealing limitations in cross-image reasoning and generalization. To address these challenges, we adopt a Reinforcement Learning (RL) based post-training strategy to improve the reasoning of MLLMs in multi-image grounding tasks. Our approach begins with synthesizing high-quality chain-of-thought (CoT) data for cold-start initialization, followed by supervised fine-tuning (SFT) using low-rank adaptation (LoRA). The cold-start training stage enables the model to identify correct solutions. Subsequently, we perform rejection sampling using the merged SFT model to curate high-quality RL data and leverage rule-based RL to guide the model toward optimal reasoning paths. Extensive experimental results demonstrate the effectiveness of our approach, yielding improvements of +9.04% on MIG-Bench, +6.37% on MC-Bench, and +4.98% on several out-of-domain reasoning grounding benchmarks compared to the SFT baseline. Furthermore, our method exhibits strong generalization in multi-image perception, with gains of +3.1% and +2.4% over the base model on BLINK and MMIU benchmarks, respectively.
