Teaching Machines to Run Like Humans — Without Explicit Rewards

Published on June 3, 2025

How do you teach a robot to move like a human — without explicitly telling it what's good or bad? That's the challenge we took on in our recent project at Carnegie Mellon University, where we explored inverse reinforcement learning (IRL) and generative adversarial methods to infer reward functions directly from motion data.

Traditional reinforcement learning (RL) depends heavily on hand-designed reward functions, which are often brittle and fail to generalize across variations in movement. We asked: what if a machine could learn what desirable movement looks like — by watching and imitating expert demonstrations?

We developed a learning pipeline combining a Bidirectional GAN with a PPO expert policy trained in Brax, which served as our source of synthetic human-like running demonstrations. The GAN's discriminator learned to distinguish expert states from random ones, and this discriminator was then used as the reward function for training a new policy from scratch.
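
To make the adversarial reward concrete, here is a minimal sketch of that discriminator update in PyTorch. The names (Discriminator, discriminator_step), the two-hidden-layer MLP, and the batch interface are illustrative choices rather than the exact architecture from the project; the core idea is a binary classifier trained to assign high scores to expert states and low scores to non-expert ones.

import torch
import torch.nn as nn

# Illustrative discriminator: scores how "expert-like" a single state looks.
class Discriminator(nn.Module):
    def __init__(self, state_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, state):
        return self.net(state)

def discriminator_step(disc, optimizer, expert_states, other_states):
    # Label expert states 1 and random/policy states 0, then take one
    # binary cross-entropy step -- the standard GAN discriminator update.
    loss_fn = nn.BCELoss()
    expert_scores = disc(expert_states)
    other_scores = disc(other_states)
    loss = loss_fn(expert_scores, torch.ones_like(expert_scores)) + \
           loss_fn(other_scores, torch.zeros_like(other_scores))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()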

The snippet below shows the simplest version of this idea: the discriminator's score for a state is used directly as the reward signal for the new policy.

import torch

# Simple reward shaping using a discriminator output
def get_reward(discriminator, state):
    # state is a single observation; convert to a tensor for the network
    state = torch.as_tensor(state, dtype=torch.float32)
    with torch.no_grad():
        score = discriminator(state)
    return float(score)  # Higher = more expert-like

While the GAN approach was promising, it ran into stability issues. The discriminator learned quickly, but the policy struggled due to weak or noisy reward gradients. We then implemented Generative Adversarial Imitation Learning (GAIL), which evaluates full motion trajectories rather than individual states — and observed much more realistic, stable behavior.
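
As a rough illustration of the trajectory-level reward, the sketch below scores every (state, action) pair in a rollout and aggregates the result. The discriminator interface (a network over concatenated state and action vectors, with sigmoid output) and the -log(1 - D(s, a)) surrogate reward are standard GAIL conventions, not a verbatim copy of our implementation.

import torch

def trajectory_rewards(discriminator, states, actions, eps=1e-8):
    # states:  (T, state_dim) tensor for one rollout
    # actions: (T, action_dim) tensor for the same rollout
    with torch.no_grad():
        pairs = torch.cat([states, actions], dim=-1)
        scores = discriminator(pairs).squeeze(-1)  # in (0, 1), 1 = expert-like
    # Common GAIL surrogate reward: -log(1 - D(s, a)); higher when the
    # discriminator believes the pair came from the expert.
    return -torch.log(1.0 - scores + eps)

def trajectory_return(discriminator, states, actions):
    # Aggregate per-step rewards into a single trajectory-level signal.
    return trajectory_rewards(discriminator, states, actions).sum().item()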

Key takeaways:

- Hand-designed reward functions are brittle; learning the reward adversarially from demonstrations sidesteps manual reward engineering.
- A per-state discriminator reward is easy to bolt onto PPO, but in our runs it produced weak, noisy gradients once the discriminator outpaced the policy.
- Scoring full motion trajectories with GAIL gave markedly more stable training and more realistic running behavior.

We're excited about future work that combines these ideas with Wasserstein GANs, Soft Actor-Critic, and real human motion capture data — to build agents that not only move, but move naturally.