Oussama Kharouiche

Proximal Policy Optimization Algorithms implementation

This project implements the PPO paper using Gymnasium environments, PyTorch, and Weights & Biases (wandb) for metrics visualization.


Overview

In this project, I implement PPO, a widely used policy gradient algorithm, from scratch in PyTorch. The goal is to train agents on control tasks provided by Gymnasium (e.g., CartPole-v1, LunarLander-v3).
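
For orientation, the core of PPO is its clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. The snippet below is a minimal, framework-free sketch of that loss and is not taken from this repository: the function name `clipped_surrogate_loss`, its argument names, and the default `clip_eps=0.2` (a common choice) are assumptions made for illustration.

```python
import math


def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss over a batch (a value to minimize).

    All three inputs are equal-length sequences of floats, one entry per
    sampled (state, action) pair. The names are illustrative only.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    # Negate because optimizers minimize, while the objective is maximized.
    return -total / len(advantages)
```

In a PyTorch implementation the same computation would typically be written with tensor operations such as `torch.exp`, `torch.clamp`, and `torch.min`, so that gradients flow through `new_log_probs`.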


Features


Installation

  1. Clone the repository:

    git clone https://github.com/oussamakharouiche/PPO-Implementation.git
    cd PPO-Implementation
    
  2. Create a virtual environment and install dependencies:

    python3 -m venv ppo
    source ppo/bin/activate
    pip install -r requirements.txt
    

Usage

  1. Create a config file if one does not already exist.

  2. Train the PPO agent:

    python3 ppo.py --config-path ./configs/cartpole_config.yaml

  3. Evaluate the agent:

    python3 evaluate.py --config-path ./configs/cartpole_config.yaml
    

Results

Results averaged over 100 evaluation runs:

Environment       Avg. Reward    Std. Dev.
CartPole-v1       499.87         1.29
LunarLander-v3    275.54         36.18
Acrobot-v1        -81.00         20.74

Note: Results may vary based on random seed and hyperparameters.
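
As a rough illustration of how an average and standard deviation over a number of evaluation episodes can be computed, here is a minimal sketch. It is not the code in `evaluate.py`: the function `evaluate`, its parameters, and the choice of the population standard deviation are assumptions. It only relies on the Gymnasium convention that `reset` returns `(obs, info)` and `step` returns `(obs, reward, terminated, truncated, info)`.

```python
import statistics


def evaluate(env, policy, n_episodes=100, seed=0):
    """Run `policy` for `n_episodes` and return (mean, std) of episode returns.

    `env` is any object following the Gymnasium Env API; `policy` is a
    callable mapping an observation to an action. Both are placeholders.
    """
    returns = []
    for episode in range(n_episodes):
        # Use a different seed per episode so the runs are not identical.
        obs, _info = env.reset(seed=seed + episode)
        done = False
        episode_return = 0.0
        while not done:
            action = policy(obs)
            obs, reward, terminated, truncated, _info = env.step(action)
            episode_return += float(reward)
            # An episode ends on a terminal state or a time-limit truncation.
            done = terminated or truncated
        returns.append(episode_return)
    # Population standard deviation; the sample version is statistics.stdev.
    return statistics.mean(returns), statistics.pstdev(returns)
```

With Gymnasium installed, `env` could be created with `gymnasium.make("CartPole-v1")` and `policy` could wrap the trained agent's action selection.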


Bibliography

  1. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv:1707.06347.
