Selected Publications

The full listing is also available on my Google Scholar profile. This page adds handy links to related material such as blog posts and presentation slides.

2023

Let’s verify step by step
Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
Paper (arXiv)

Scaling laws for single-agent reinforcement learning
Jacob Hilton, Jie Tang, John Schulman
Paper (arXiv)

2022

Scaling laws for reward model overoptimization
Leo Gao, John Schulman, Jacob Hilton
Paper (arXiv) / ICML talk with discussion of this paper

2021

WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman
Paper (arXiv) / Blog

Batch size-invariance for policy optimization
Jacob Hilton, Karl Cobbe, John Schulman
Paper (arXiv)

Training verifiers to solve math word problems (Grade school math dataset)
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman
Paper (arXiv) / Code+Data / Blog post

Unsolved problems in ML Safety
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt
Paper (arXiv)

2020

Phasic Policy Gradient
Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman
Paper (arXiv)

2019

Leveraging Procedural Generation to Benchmark Reinforcement Learning
Karl Cobbe, Chris Hesse, Jacob Hilton, John Schulman
Paper (arXiv) / Blog post

Semi-supervised Learning by Label Gradient Alignment
Jacob Jackson, John Schulman
Paper (arXiv)

Quantifying Generalization in Reinforcement Learning
Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, John Schulman
Paper (arXiv) / Blog post

2018

Gotta Learn Fast: A New Benchmark for Generalization in RL
Alex Nichol, Vicki Pfau, Christopher Hesse, Oleg Klimov, John Schulman
Paper (arXiv) / Blog post

On First-Order Meta-Learning Algorithms
Alex Nichol, Joshua Achiam, John Schulman
Paper (arXiv) / Blog post

2017

Meta Learning Shared Hierarchies
Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman
Paper (arXiv) / Blog post

Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
Paper (arXiv) / Blog post

Teacher-Student Curriculum Learning
Tambet Matiisen, Avital Oliver, Taco Cohen, John Schulman
Paper (arXiv)

UCB Exploration via Q-Ensembles
Richard Chen, Szymon Sidor, Pieter Abbeel, John Schulman
Paper (arXiv)

Equivalence Between Policy Gradients and Soft Q-Learning
John Schulman, Xi Chen, Pieter Abbeel
Paper (arXiv)

2016 and earlier

Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs
PhD Dissertation, 2016
Paper (PDF)

RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning
Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel
Paper (arXiv)

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
Paper (arXiv)

Variational Lossy Autoencoder
Xi Chen, Diederik Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel
Paper (arXiv)

Concrete Problems in AI Safety
Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané
Paper (arXiv) / Blog post

OpenAI Gym
Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba
Paper (arXiv)

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
Neural Information Processing Systems (NIPS), 2016
Paper (arXiv)

Variational Information Maximizing Exploration
Rein Houthooft, Xi Chen, Yan Duan, Filip De Turck, John Schulman, Pieter Abbeel
Neural Information Processing Systems (NIPS), 2016
Paper (arXiv)

Benchmarking Deep Reinforcement Learning for Continuous Control
Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel
International Conference on Machine Learning (ICML), 2016
Paper (arXiv)

High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel
International Conference on Learning Representations (ICLR), 2016
Paper (arXiv) / Video

Spike Sorting for Large, Dense Electrode Arrays
Cyrille Rossant, Shabnam Kadir, Dan F. M. Goodman, John Schulman, Mariano Belluscio, Gyorgy Buzsaki, Kenneth D. Harris
Nature Neuroscience, 2016
Paper

Gradient Estimation Using Stochastic Computation Graphs
John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel
Neural Information Processing Systems (NIPS), 2015
Paper (arXiv)

Trust Region Policy Optimization
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
International Conference on Machine Learning (ICML), 2015
Paper (arXiv) / Videos

Scaling up Gaussian Belief Space Planning Through Covariance-Free Trajectory Optimization and Automatic Differentiation
Sachin Patil, Greg Kahn, Michael Laskey, John Schulman, Ken Goldberg, Pieter Abbeel
Workshop on the Algorithmic Foundations of Robotics (WAFR), 2014
Paper

Motion Planning with Sequential Convex Optimization and Convex Collision Checking
John Schulman, Yan Duan, Jonathan Ho, Alex Lee, Ibrahim Awwal, Henry Bradlow, Jia Pan, Sachin Patil, Ken Goldberg, Pieter Abbeel
International Journal of Robotics Research (IJRR), 2014
Paper

Planning Locally Optimal, Curvature-Constrained Trajectories in 3D Using Sequential Convex Optimization
Yan Duan, Sachin Patil, John Schulman, Ken Goldberg, Pieter Abbeel
International Conference on Robotics and Automation (ICRA), 2014
Paper

Generalization in Robotic Manipulation Through the Use of Non-Rigid Registration
John Schulman, Jonathan Ho, Cameron Lee, and Pieter Abbeel
International Symposium on Robotics Research (ISRR), 2013
Paper / Videos

A Case Study of Trajectory Transfer Through Non-Rigid Registration for a Simplified Suturing Scenario
John Schulman, Ankush Gupta, Sibi Venkatesan, Mallory Tayson-Frederick, Pieter Abbeel
International Conference on Intelligent Robots and Systems (IROS), 2013
Paper / Videos

Finding Locally Optimal, Collision-Free Trajectories with Sequential Convex Optimization
John Schulman, Jonathan Ho, Alex Lee, Ibrahim Awwal, Henry Bradlow, Pieter Abbeel
Robotics: Science and Systems (RSS), 2013
Paper / Documentation / Github / Videos / Slides (With & Without Notes)

Tracking Deformable Objects with Point Clouds
John Schulman, Alex Lee, Jonathan Ho, Pieter Abbeel
International Conference on Robotics and Automation (ICRA), 2013,
Winner of Best Vision Paper
Paper / Website / Video (Youtube, MP4) / Slides (With & Without Notes)

Grasping and Fixturing as Submodular Coverage Problems
John Schulman, Ken Goldberg, Pieter Abbeel
International Symposium on Robotics Research (ISRR), 2011
Paper