Robust Bipedal Locomotion on Uneven Terrain via Deep Reinforcement Learning

For our 16-745 project at CMU, we explored robust bipedal locomotion on uneven terrain with deep reinforcement learning. We trained a PPO policy in MuJoCo to traverse terrain without explicit trajectory planning, modeling the JVRC-1 humanoid as a 12-DOF walker and feeding the policy proprioceptive observations from joint encoders, the IMU, and motor current sensors.
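
To make the observation interface concrete, here is a minimal sketch of assembling that proprioceptive vector with the MuJoCo Python bindings. The model file, sensor names, and torque-to-current scaling are illustrative assumptions, not the actual JVRC-1 MJCF layout:

```python
import numpy as np
import mujoco

# Hypothetical model file -- the real JVRC-1 MJCF may be organized differently.
model = mujoco.MjModel.from_xml_path("jvrc1.xml")
data = mujoco.MjData(model)

def get_observation(data):
    """Stack joint encoders, IMU readings, and motor currents into one vector."""
    qpos = data.qpos[7:]   # 12 actuated joint angles (skip the floating base)
    qvel = data.qvel[6:]   # 12 actuated joint velocities
    # IMU readings from named sensors (the names are placeholders).
    quat = data.sensor("imu_orientation").data
    gyro = data.sensor("imu_gyro").data
    accel = data.sensor("imu_accel").data
    # Approximate motor current as actuator torque over a constant gain
    # (placeholder model; a real current sensor would be read directly).
    current = data.actuator_force / 10.0
    return np.concatenate([qpos, qvel, quat, gyro, accel, current])
```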

In testing, the policy learned stable walking on flat ground after roughly 2,500 episodes, but training on uneven terrain remained unstable even after 20,000 episodes. To probe stability and generalization, we evaluated on both staircase and undulating terrains. Our receding-horizon iLQR baseline struggled to track the reference trajectory, drifted in pelvis motion, and often fell after the first step, suggesting our cost tuning had hit its limits and motivating hybrid learning-plus-optimization approaches as future work. Sketches of the terrain setup and the baseline loop follow below.
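
The undulating test terrain can be reproduced in simulation with a procedural MuJoCo heightfield. A minimal sketch, assuming a hypothetical MJCF snippet whose asset name, grid resolution, and dimensions are placeholders rather than our actual scene files:

```python
import numpy as np
import mujoco

# Hypothetical MJCF with a procedural heightfield asset.
XML = """
<mujoco>
  <asset>
    <hfield name="terrain" nrow="64" ncol="64" size="5 5 0.3 0.1"/>
  </asset>
  <worldbody>
    <geom type="hfield" hfield="terrain"/>
    <body pos="0 0 1">
      <freejoint/>
      <geom type="sphere" size="0.1"/>  <!-- stand-in body for the walker -->
    </body>
  </worldbody>
</mujoco>
"""
model = mujoco.MjModel.from_xml_string(XML)

# Undulating terrain: a sum of low-frequency sinusoids, normalized to [0, 1]
# as MuJoCo heightfield data requires.
xs, ys = np.meshgrid(np.linspace(0, 2 * np.pi, 64), np.linspace(0, 2 * np.pi, 64))
heights = 0.5 + 0.25 * (np.sin(2 * xs) + np.cos(3 * ys))
# For a staircase variant, quantize instead: heights = np.floor(heights * 5) / 5
model.hfield_data[:] = heights.ravel()

data = mujoco.MjData(model)
mujoco.mj_step(model, data)  # simulate one step on the generated terrain
```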
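
The baseline's receding-horizon loop re-solves an iLQR problem at every control step and applies only the first control of each plan. Below is a self-contained sketch on a 1-D double integrator, a deliberately tiny stand-in for the full JVRC-1 dynamics; the horizon, cost weights, and goal state are illustrative assumptions:

```python
import numpy as np

# Toy stand-in: a 1-D double integrator. The real baseline tracked a walking
# reference on the full robot model; all weights here are illustrative.
DT = 0.05
Q = np.diag([10.0, 1.0])        # running state cost
R = np.diag([0.1])              # running control cost
Qf = np.diag([100.0, 10.0])     # terminal state cost
x_goal = np.array([1.0, 0.0])   # reach position 1 at rest

def dynamics(x, u):
    return np.array([x[0] + DT * x[1], x[1] + DT * u[0]])

def linearize(x, u, eps=1e-5):
    """Finite-difference Jacobians A = df/dx, B = df/du."""
    f0 = dynamics(x, u)
    A = np.column_stack([(dynamics(x + eps * e, u) - f0) / eps for e in np.eye(2)])
    B = np.column_stack([(dynamics(x, u + eps * e) - f0) / eps for e in np.eye(1)])
    return A, B

def ilqr_solve(x0, U, iters=5, reg=1e-6):
    """A few iLQR sweeps: rollout, backward pass, forward pass with feedback."""
    N = len(U)
    for _ in range(iters):
        X = [x0]
        for k in range(N):                        # roll out current controls
            X.append(dynamics(X[k], U[k]))
        Vx, Vxx = Qf @ (X[N] - x_goal), Qf.copy()
        kff, K = np.zeros_like(U), np.zeros((N, 1, 2))
        for k in reversed(range(N)):              # backward pass
            A, B = linearize(X[k], U[k])
            Qx, Qu = Q @ (X[k] - x_goal) + A.T @ Vx, R @ U[k] + B.T @ Vx
            Qxx, Quu = Q + A.T @ Vxx @ A, R + B.T @ Vxx @ B + reg * np.eye(1)
            Qux = B.T @ Vxx @ A
            kff[k], K[k] = -np.linalg.solve(Quu, Qu), -np.linalg.solve(Quu, Qux)
            Vx = Qx + K[k].T @ Quu @ kff[k] + K[k].T @ Qu + Qux.T @ kff[k]
            Vxx = Qxx + K[k].T @ Quu @ K[k] + K[k].T @ Qux + Qux.T @ K[k]
        x, U_new = x0, np.zeros_like(U)
        for k in range(N):                        # forward pass
            U_new[k] = U[k] + kff[k] + K[k] @ (x - X[k])
            x = dynamics(x, U_new[k])
        U = U_new
    return U

def receding_horizon(x0, horizon=20, steps=100):
    """Replan with iLQR every step; apply only the first control of each plan."""
    x, U = x0, np.zeros((horizon, 1))
    for _ in range(steps):
        U = ilqr_solve(x, U)
        x = dynamics(x, U[0])                     # execute first control only
        U = np.vstack([U[1:], U[-1:]])            # shift the plan as warm start
    return x

print(receding_horizon(np.array([0.0, 0.0])))     # ends near x_goal = [1, 0]
```

On the toy problem this loop tracks the goal cleanly; the gap between that and our JVRC-1 results is exactly where the reference tracking and cost tuning broke down.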