Project · 2021

Deep Q-learning for pairs trading

A reinforcement-learning agent that learns when to enter and exit mean-reverting equity pairs, improving the Sharpe ratio by ~15% over a rule-based baseline.

Stack Python · TensorFlow · Keras · OpenAI Gym · Reinforcement Learning

Role Solo research project


Most pairs-trading strategies use fixed entry and exit thresholds on the z-score of the spread. This project instead treats the problem as a Markov decision process (MDP) and lets a Deep Q-Network (DQN) learn the trading policy end-to-end.
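For reference, a fixed-threshold rule of the kind described above can be sketched in a few lines of plain Python. This is a generic illustration, not the baseline used in this project: the trailing window, the entry threshold of 2.0, the exit threshold of 0.5 and the function names are all assumptions.

```python
from statistics import mean, stdev
from typing import List, Optional

def rolling_zscore(spread: List[float], window: int) -> List[Optional[float]]:
    """Z-score of each point against the `window` points before it (None until enough history)."""
    z: List[Optional[float]] = [None] * len(spread)
    for t in range(window, len(spread)):
        hist = spread[t - window:t]  # trailing window excludes the current point (no look-ahead)
        sd = stdev(hist)
        z[t] = (spread[t] - mean(hist)) / sd if sd > 0 else 0.0
    return z

def threshold_positions(z: List[Optional[float]], entry_z: float = 2.0,
                        exit_z: float = 0.5) -> List[int]:
    """Hypothetical fixed-threshold rule: +1 = long the spread, -1 = short it, 0 = flat."""
    position, positions = 0, []
    for zt in z:
        if zt is not None:
            if position == 0 and zt > entry_z:
                position = -1  # spread far above its recent mean: short it, betting on reversion
            elif position == 0 and zt < -entry_z:
                position = 1   # spread far below its recent mean: long it
            elif position != 0 and abs(zt) < exit_z:
                position = 0   # spread has reverted toward its mean: close
        positions.append(position)
    return positions
```

A typical call is `threshold_positions(rolling_zscore(spread, window=20))`. The DQN takes the place of `threshold_positions`, so entry and exit timing is learned rather than fixed by hand-tuned thresholds.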

The agent observes a rolling window of the spread and chooses among four actions: enter-long, enter-short, hold, or close. Training used experience replay, a separate target network, and epsilon-greedy exploration. Reward shaping was the most important design decision, because naive P&L rewards produced degenerate policies. On held-out pairs, the trained agent improved the Sharpe ratio by ~15% relative to a rule-based baseline.
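The state and action spaces described above can be made concrete with a small environment that follows the classic OpenAI Gym convention (`reset()` returns an observation, `step()` returns observation, reward, done, info) without importing Gym. Everything here is an illustrative assumption rather than the project's actual environment: the class and constant names are hypothetical, the observation appends the current position to the spread window, and the reward is deliberately the naive per-step P&L that the text says led to degenerate policies, so it only marks where a shaped reward would plug in.

```python
from typing import Dict, List, Tuple

# Hypothetical integer encoding of the four actions.
ENTER_LONG, ENTER_SHORT, HOLD, CLOSE = range(4)

class PairsTradingEnv:
    """Minimal Gym-style environment over a precomputed spread series (illustrative only)."""

    def __init__(self, spread: List[float], window: int = 10):
        if len(spread) <= window:
            raise ValueError("spread must be longer than the observation window")
        self.spread = spread
        self.window = window

    def reset(self) -> List[float]:
        self.t = self.window  # index of the next spread value to be revealed
        self.position = 0     # +1 long spread, -1 short spread, 0 flat
        return self._observe()

    def _observe(self) -> List[float]:
        # Last `window` spread values plus the current position (assumed state design).
        return list(self.spread[self.t - self.window:self.t]) + [float(self.position)]

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict]:
        if action == ENTER_LONG:
            self.position = 1
        elif action == ENTER_SHORT:
            self.position = -1
        elif action == CLOSE:
            self.position = 0
        elif action != HOLD:
            raise ValueError(f"unknown action: {action}")
        # Naive reward: P&L of the chosen position over the next spread move.
        # A shaped reward would replace this line.
        reward = self.position * (self.spread[self.t] - self.spread[self.t - 1])
        self.t += 1
        done = self.t >= len(self.spread)
        return self._observe(), float(reward), done, {"position": self.position}
```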
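The three DQN ingredients named above (experience replay, a target network, epsilon-greedy exploration) can be sketched independently of any deep-learning framework. In this hedged sketch the Q-network is abstracted as any callable that maps a state to one Q-value per action, so it runs without TensorFlow; the names, the buffer capacity and the discount factor `gamma` are assumptions. The target network appears only through the bootstrapped target `y = r + gamma * max_a' Q_target(s', a')`, which stays fixed between periodic syncs; with Keras models such a sync is typically `target_model.set_weights(model.get_weights())`.

```python
import random
from collections import deque
from typing import Callable, List, Optional, Sequence, Tuple

# A Q-function maps a state to one estimated value per action.
QFunction = Callable[[Sequence[float]], List[float]]
# (state, action, reward, next_state, done)
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]

class ReplayBuffer:
    """Fixed-size experience replay that stores transitions and samples them uniformly."""

    def __init__(self, capacity: int):
        self.buffer: deque = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks the correlation
        # between consecutive time steps in a training batch.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)

def epsilon_greedy(q_values: List[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    source = rng if rng is not None else random
    if source.random() < epsilon:
        return source.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit

def td_targets(batch: List[Transition], target_q: QFunction,
               gamma: float = 0.99) -> List[float]:
    """Bootstrapped targets from the frozen target network; no bootstrap past terminal steps."""
    return [r if done else r + gamma * max(target_q(s_next))
            for (_, _, r, s_next, done) in batch]
```

During training, the online network is regressed toward these targets for the actions actually taken, and epsilon is usually annealed from 1.0 toward a small floor as the policy improves.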
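The headline result is a relative change in Sharpe ratio. One minimal way to compute such a comparison from per-period strategy returns is sketched below; the zero risk-free rate, the 252-trading-day annualization and the function names are assumptions, and no project results are reproduced here.

```python
import math
from statistics import mean, stdev
from typing import Sequence

def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = stdev(returns)  # sample standard deviation; needs at least two returns
    if sd == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return mean(returns) / sd * math.sqrt(periods_per_year)

def relative_improvement(agent_sharpe: float, baseline_sharpe: float) -> float:
    """Fractional change versus the baseline (0.15 means +15%)."""
    # abs() keeps the sign meaningful if the baseline Sharpe is negative.
    return (agent_sharpe - baseline_sharpe) / abs(baseline_sharpe)
```

For the comparison to be meaningful, both Sharpe ratios should be computed on the same held-out pairs over the same period.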
