instruction-pretrain

https://huggingface.co/papers/2406.14491

DaixuanC45443

AI & ML interests

Synthetic Instructions for Pre-Training

Recent Activity

updated a dataset 3 days ago

instruction-pretrain/general-instruction-augmented-corpora

updated a dataset 3 days ago

instruction-pretrain/medicine-instruction-augmented-corpora

updated a model 3 days ago

instruction-pretrain/instruction-synthesizer

Organizations

None yet

updated 2 datasets 3 days ago

instruction-pretrain/general-instruction-augmented-corpora

Preview • Updated 3 days ago • 33.4k • 20

instruction-pretrain/medicine-instruction-augmented-corpora

Preview • Updated 3 days ago • 105 • 13

updated 5 models 3 days ago

updated a dataset 3 days ago

instruction-pretrain/ft-instruction-synthesizer-collection

Viewer • Updated 3 days ago • 249k • 255 • 63

New activity in instruction-pretrain/general-instruction-augmented-corpora 3 days ago

Cannot download ALL data files

#11 opened over 1 year ago by amezasor

upvoted a paper about 1 month ago

LLM-in-Sandbox Elicits General Agentic Intelligence

Paper • 2601.16206 • Published Jan 22 • 84

upvoted a paper 5 months ago

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18, 2025 • 116

upvoted a paper 8 months ago

Reasoning with Exploration: An Entropy Perspective

Paper • 2506.14758 • Published Jun 17, 2025 • 31

New activity in instruction-pretrain/finance-Llama3-8B 9 months ago

How large is the corpus size used for pretraining the finance LLaMA?

#2 opened over 1 year ago by dhkong

updated a dataset 12 months ago

AdaptLLM/food-visual-instructions

Viewer • Updated Aug 21, 2025 • 301k • 63 • 3

liked 2 datasets about 1 year ago

tttx/r1-arcagi-successful-trajectories

Viewer • Updated Feb 2, 2025 • 1.46k • 4 • 3

INK-USC/riddle_sense

Updated Jan 18, 2024 • 201 • 26

upvoted a paper about 1 year ago

Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20, 2024 • 96

New activity in instruction-pretrain/ft-instruction-synthesizer-collection about 1 year ago

Query about the size of dataset

#4 opened over 1 year ago by Applauz

New activity in instruction-pretrain/instruction-synthesizer about 1 year ago

Continued pre-training with replay?

#4 opened about 1 year ago by ostapeno

upvoted a paper about 1 year ago

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 53
