ML Quest

The warehouse robot goes live tomorrow. The CEO wants a demo that shows everything: an agent that learns from nothing, explores intelligently, converges on the optimal path, and has metrics to prove it. You must combine Q-learning with epsilon-greedy action selection on a grid world, and track every reward to show the learning curve. No scaffolding. No hints until you really need them. This is the final test of your RL skills.

~35 min · project
Goals: 4 tests
Q-table should be trained (non-trivial values)
trained agent should reach the goal in over 80% of test episodes
epsilon should decay during training
should track reward history across episodes
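
One way the four goals could fit together is sketched below. The exercise does not specify the environment or the test harness, so everything here is an assumption made for illustration: the 5x5 grid, the start and goal cells, the rewards (-1 per step, +10 at the goal), the hyperparameters, and the names `step`, `greedy`, `train`, and `evaluate`. Treat it as a reference point, not as the expected solution.

```python
import random

# All constants are illustrative assumptions, not values from the exercise.
SIZE = 5                                      # 5x5 grid world
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA = 0.1, 0.95                      # learning rate, discount factor
EPS_START, EPS_MIN, EPS_DECAY = 1.0, 0.05, 0.995
MAX_STEPS = 100                               # step cap per episode


def step(state, action):
    """Apply an action; moves into a wall leave the agent in place."""
    row = min(max(state[0] + ACTIONS[action][0], 0), SIZE - 1)
    col = min(max(state[1] + ACTIONS[action][1], 0), SIZE - 1)
    nxt = (row, col)
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, -1.0, False


def greedy(q, state):
    """Index of the highest-valued action (ties go to the lowest index)."""
    values = q[state]
    return values.index(max(values))


def train(episodes=1000, seed=0):
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    epsilon = EPS_START
    reward_history, epsilon_history = [], []
    for _ in range(episodes):
        state, total = START, 0.0
        for _ in range(MAX_STEPS):
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy(q, state)
            nxt, reward, done = step(state, action)
            # Q-learning update; terminal states contribute no future value.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state, total = nxt, total + reward
            if done:
                break
        reward_history.append(total)
        epsilon_history.append(epsilon)
        epsilon = max(EPS_MIN, epsilon * EPS_DECAY)  # multiplicative decay
    return q, reward_history, epsilon_history


def evaluate(q, episodes=100):
    """Fraction of purely greedy rollouts from START that reach GOAL."""
    successes = 0
    for _ in range(episodes):
        state = START
        for _ in range(MAX_STEPS):
            state, _, done = step(state, greedy(q, state))
            if done:
                successes += 1
                break
    return successes / episodes


q, reward_history, epsilon_history = train()
success_rate = evaluate(q)
print("success rate:", success_rate)
print("mean reward, first 50 episodes:", sum(reward_history[:50]) / 50)
print("mean reward, last 50 episodes:", sum(reward_history[-50:]) / 50)
```

Because this assumed environment and the greedy policy are both deterministic, every evaluation rollout from the fixed start cell is identical, so the success rate is either 0 or 1. A stochastic environment or randomized start cells would make the 80% threshold more informative. Comparing the mean reward of early and late episodes gives a rough view of the learning curve.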