Welcome to my stodgy academic homepage!
I'm a research scientist at OpenAI. I received my PhD in Computer Science from UC Berkeley, where I had the good fortune of being advised by Pieter Abbeel. My primary interest is reinforcement learning, and I believe that motor learning is key to many aspects of intelligence. My work on policy optimization made it possible for a robot to learn to run and get up off the ground (in simulation).
I am co-teaching a course on deep reinforcement learning this spring (2017) at UC Berkeley.
My current research is inspired by my earlier work in robotics, where I mainly investigated the following two problems: (1) teaching robots to perform manipulation tasks using human demonstrations, work that enabled autonomous knot tying and surgical suturing; (2) using trajectory optimization for motion planning. The software library developed for this project has been used on a variety of real robots, including one scary humanoid.
- Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs
PhD Dissertation, 2016
- RL²: Fast Reinforcement Learning via Slow Reinforcement Learning
Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
- Variational Lossy Autoencoder
Xi Chen, Diederik Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel
- Concrete Problems in AI Safety
Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané
- OpenAI Gym
Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba
- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
Neural Information Processing Systems (NIPS), 2016
- VIME: Variational Information Maximizing Exploration
Rein Houthooft, Xi Chen, Yan Duan, Filip De Turck, John Schulman, Pieter Abbeel
Neural Information Processing Systems (NIPS), 2016
- Benchmarking Deep Reinforcement Learning for Continuous Control
Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel
International Conference on Machine Learning (ICML), 2016
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel
International Conference on Learning Representations (ICLR), 2016
Paper (arXiv) / Video
- Spike Sorting for Large, Dense Electrode Arrays
Cyrille Rossant, Shabnam Kadir, Dan F. M. Goodman, John Schulman, Mariano Belluscio, Gyorgy Buzsaki, Kenneth D. Harris
Nature Neuroscience, 2016
- Gradient Estimation Using Stochastic Computation Graphs
John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel
Neural Information Processing Systems (NIPS), 2015
- Trust Region Policy Optimization
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
International Conference on Machine Learning (ICML), 2015
Paper (arXiv) / Videos
- Scaling up Gaussian Belief Space Planning Through Covariance-Free Trajectory Optimization and Automatic Differentiation
Sachin Patil, Greg Kahn, Michael Laskey, John Schulman, Ken Goldberg, Pieter Abbeel
Workshop on the Algorithmic Foundations of Robotics (WAFR), 2014
- Motion Planning with Sequential Convex Optimization and Convex Collision Checking
John Schulman, Yan Duan, Jonathan Ho, Alex Lee, Ibrahim Awwal, Henry Bradlow, Jia Pan, Sachin Patil, Ken Goldberg, Pieter Abbeel
International Journal of Robotics Research (IJRR), 2014
- Planning Locally Optimal, Curvature-Constrained Trajectories in 3D Using Sequential Convex Optimization
Yan Duan, Sachin Patil, John Schulman, Ken Goldberg, Pieter Abbeel
International Conference on Robotics and Automation (ICRA), 2014
- Generalization in Robotic Manipulation Through the Use of Non-Rigid Registration
John Schulman, Jonathan Ho, Cameron Lee, and Pieter Abbeel
International Symposium on Robotics Research (ISRR), 2013
Paper / Videos
- A Case Study of Trajectory Transfer Through Non-Rigid Registration for a Simplified Suturing Scenario
John Schulman, Ankush Gupta, Sibi Venkatesan, Mallory Tayson-Frederick, Pieter Abbeel
International Conference on Intelligent Robots and Systems (IROS), 2013
Paper / Videos
- Finding Locally Optimal, Collision-Free Trajectories with Sequential Convex Optimization
John Schulman, Jonathan Ho, Alex Lee, Ibrahim Awwal, Henry Bradlow, Pieter Abbeel
Robotics: Science and Systems (RSS), 2013
Paper / Documentation / Github / Videos / Slides (With & Without Notes)
- Tracking Deformable Objects with Point Clouds
John Schulman, Alex Lee, Jonathan Ho, Pieter Abbeel
International Conference on Robotics and Automation (ICRA), 2013. Winner of the Best Vision Paper award
Paper / Website / Video (Youtube, MP4) / Slides (With & Without Notes)
- Grasping and Fixturing as Submodular Coverage Problems
John Schulman, Ken Goldberg, Pieter Abbeel
International Symposium on Robotics Research (ISRR), 2011
OpenAI Gym (2016–present):
Blog post / Article on NVIDIA blog
Computation Graph Toolkit (2015):
Documentation. Computation Graph Toolkit (CGT) is an automatic differentiation library, intended to be "Theano reloaded": fast compilation, multithreading, improved compile-time inference, and a simpler codebase. I stopped developing it after TensorFlow came out and turned out to be excellent.
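The core idea behind a library like CGT—record a computation as a graph, then apply the chain rule backward through it—fits in a few lines of pure Python. The sketch below is only an illustration of reverse-mode automatic differentiation in general; the class and function names are invented for the example and are not CGT's actual API.

```python
class Node:
    """A scalar node in a computation graph. Each node records its
    parents along with the local partial derivative with respect to
    each parent, which is all reverse-mode autodiff needs."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # list of (parent_node, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Node(self.value * other.value,
                    [(self, other.value), (other, self.value)])

def backward(output):
    """Accumulate d(output)/d(node) for every node, visiting the graph
    in reverse topological order (the chain rule)."""
    order, seen = [], set()
    def topo(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p, _ in n.parents:
                topo(p)
            order.append(n)
    topo(output)
    output.grad = 1.0
    for n in reversed(order):
        for p, local in n.parents:
            p.grad += local * n.grad

# f(x, y) = x*y + x, so df/dx = y + 1 and df/dy = x.
x, y = Node(3.0), Node(4.0)
f = x * y + x
backward(f)
# f.value == 15.0, x.grad == 5.0, y.grad == 3.0
```

A real library adds vector/tensor nodes, operator fusion, and ahead-of-time compilation of the graph, but the gradient propagation step is exactly this loop.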
- TrajOpt (developed 2012-2013) is a software framework for generating robot trajectories by local optimization. Core capabilities include a solver for generic nonlinear optimization problems via sequential quadratic programming, cost and constraint functions for kinematics and collision avoidance, and a JSON-based specification format for trajectory optimization problems. The core libraries are implemented in C++, and a Python API is provided via Boost.Python.
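To make the "local optimization" idea concrete: below is a toy pure-Python sketch of penalty-based trajectory optimization around a circular obstacle. This is a loose illustration only, not TrajOpt's API or algorithm—plain finite-difference gradient descent stands in for the SQP subproblems, a hinge penalty stands in for collision constraints, and every name and parameter is invented for the example.

```python
import math

def total_cost(traj, center, radius, mu):
    """Smoothness (sum of squared segment lengths) plus a hinge-style
    obstacle penalty weighted by mu."""
    smooth = sum((x2 - x1) ** 2 + (y2 - y1) ** 2
                 for (x1, y1), (x2, y2) in zip(traj, traj[1:]))
    cx, cy = center
    penalty = sum(max(0.0, radius - math.hypot(x - cx, y - cy)) ** 2
                  for x, y in traj)
    return smooth + mu * penalty

def plan(start, goal, center, radius, n=10):
    """Toy penalty-method planner: start from a straight line, then
    alternate between descending the penalized cost and raising the
    penalty weight, as in sequential penalty optimization."""
    traj = [(start[0] + i / (n - 1) * (goal[0] - start[0]),
             start[1] + i / (n - 1) * (goal[1] - start[1]))
            for i in range(n)]
    lr, eps = 2e-3, 1e-5
    for mu in (1.0, 10.0, 100.0):          # outer loop: tighten the penalty
        for _ in range(300):               # inner loop: gradient descent
            grads = []
            for i in range(1, n - 1):      # endpoints stay fixed
                g = []
                for d in range(2):         # central finite differences
                    p = list(traj[i]); p[d] += eps
                    hi = total_cost(traj[:i] + [tuple(p)] + traj[i + 1:],
                                    center, radius, mu)
                    p[d] -= 2 * eps
                    lo = total_cost(traj[:i] + [tuple(p)] + traj[i + 1:],
                                    center, radius, mu)
                    g.append((hi - lo) / (2 * eps))
                grads.append(g)
            traj = ([traj[0]] +
                    [(x - lr * gx, y - lr * gy)
                     for (x, y), (gx, gy) in zip(traj[1:-1], grads)] +
                    [traj[-1]])
    return traj

# Plan from (0, 0) to (1, 0) around an obstacle straddling the straight line.
traj = plan((0.0, 0.0), (1.0, 0.0), center=(0.5, 0.1), radius=0.3)
```

The real framework replaces the inner loop with convex QP subproblems built from linearized kinematics and signed-distance collision terms, which converges in far fewer iterations and handles hard constraints properly.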
- Caton (developed 2009-2010) is a software package that automates spike sorting, a common task in the analysis of neural data. spikedetekt, developed in the Cortical Processing Lab at University College London, is partly based on this code.
Sometimes people ask for slides or videos of my presentations, so I'll keep some recent links here.
- Nuts and Bolts of Deep RL Research, December 2016.
- Tutorial on Deep Reinforcement Learning at NIPS 2016 with Pieter Abbeel, December 2016.
- Deep Reinforcement Learning: Policy Gradients and Q-Learning at Bay Area Deep Learning School, September 2016.
Video / Slides (PDF)