- Compression for AGI 2023.02
- Language Modeling Is Compression 2023.09
- Training Compute-Optimal Large Language Models 2022.03
- The Platonic Representation Hypothesis 2024.05
- Learning to Reason with LLMs 2024.09.12
- Parables on the Power of Planning in AI: From Poker to Diplomacy 2024.09.18
- Don't teach. Incentivize. 2024.09.20
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 2025.01
- Kimi k1.5: Scaling Reinforcement Learning with LLMs 2025.01
- Attention Is All You Need 2017.06
- Improving Language Understanding by Generative Pre-Training 2018.06
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 2018.10
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models 2024.01
- Scaling Instruction-Finetuned Language Models 2022.10
- Reflexion: Language Agents with Verbal Reinforcement Learning 2023.03
- WizardLM: Empowering Large Language Models to Follow Complex Instructions 2023.04
- LIMA: Less Is More for Alignment 2023.05
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model 2023.05
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models 2023.05
- Preference Ranking Optimization for Human Alignment 2023.06
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4 2023.06
- Self-Alignment with Instruction Backtranslation 2023.08
- Self-Rewarding Language Models 2024.01
- From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification 2024.03
- From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models 2024.04
- Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models 2024.04
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions 2024.04
- Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models 2024.06
- Inverse Constitutional AI: Compressing Preferences into Principles 2024.06
- Following Length Constraints in Instructions 2024.06
- LIMO: Less is More for Reasoning 2025.02
- Self-critiquing models for assisting human evaluators 2022.06
- Weak-to-strong generalization 2023.12
- Prover-Verifier Games improve legibility of LLM outputs 2024.07
- Larger language models do in-context learning differently 2023.03
- Many-Shot In-Context Learning 2024.04
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models 2022.01
- Let’s Verify Step by Step 2023.05
- Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks 2023.05
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models 2023.05
- Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations 2023.12
- Solving olympiad geometry without human demonstrations 2024.01
- Large Language Models Can Learn Temporal Reasoning 2024.01
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models 2024.02
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking 2024.03
- Llama 2: Open Foundation and Fine-Tuned Chat Models 2023.07
- Gemini 1.0 2023.12
- Gemini 1.5 2024.02
- The Llama 3 Herd of Models 2024.07
- DeepSeek-V3 Technical Report 2024.12
- Distinguishing three alignment taxes 2022.12
- State of GPT 2023.05
- An Observation on Generalization 2023.08
- An Initial Exploration of Theoretical Support for Language Model Data Engineering. Part 1: Pretraining 2023.09
- Some intuitions about large language models 2023.11
- MiniCPM: Unveiling the Unlimited Potential of On-Device Large Language Models 2024.04
- Llama 3 Opens the Second Chapter of the Game of Scale 2024.04
- Successful language model evals 2024.05
- OpenAI Model Spec 2024.05
- Claude’s Character 2024.06
- AI achieves silver-medal standard solving International Mathematical Olympiad problems 2024.07
- Three hypotheses on LLM reasoning 2024.12
- Scaling Paradigms for Large Language Models 2025.01
- Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process 2024.07
- Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems 2024.08
- Physics of Language Models: Part 3.1, Knowledge Storage and Extraction 2023.09
- Physics of Language Models: Part 3.2, Knowledge Manipulation 2023.09
- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws 2024.04
- Killed by LLM
- Challenging BIG-Bench tasks and whether chain-of-thought can solve them 2022.10
- COLLIE: Systematic Construction of Constrained Text Generation Tasks 2023.07
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation 2023.10
- FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models 2023.10
- Instruction-Following Evaluation for Large Language Models 2023.11
- GAIA: a benchmark for General AI Assistants 2023.11
- Beyond Instruction Following: Evaluating Rule Following of Large Language Models 2024.07
- Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning 2024.06
- Introducing SimpleQA 2024.10
- Measuring short-form factuality in large language models 2024.11
- Humanity's Last Exam 2025.01