ML Quest — Learn Machine Learning by Playing

The warehouse robot works, but it always takes the same path, even when better routes might exist. The team realizes the agent needs a smarter action selection strategy. Enter the multi-armed bandit: imagine a row of 5 slot machines, each with a different hidden payout probability. You must figure out which arm is best by pulling them, but every pull on a bad arm is wasted money. Pull randomly and you explore but earn less. Always pull your current best and you might miss the true winner. This is the exploration-exploitation dilemma at the heart of RL. The epsilon-greedy strategy balances it with one parameter: with probability epsilon, pull a random arm (explore); otherwise, pull the arm with the highest estimated payout so far (exploit).
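As a rough illustration of the strategy described above, here is a minimal sketch of epsilon-greedy on a 5-armed bandit. The function name `run_bandit`, the payout probabilities in `PAYOUTS`, and the default `epsilon=0.1` are assumptions made for this example, not values taken from the scenario.

```python
import random

def run_bandit(true_probs, epsilon=0.1, steps=1000, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit.

    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Each pull pays 1 with the arm's hidden probability, else 0.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Incremental update of the running mean for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward

# Hypothetical hidden payout probabilities for the 5 machines.
PAYOUTS = [0.1, 0.3, 0.5, 0.7, 0.9]

estimates, counts, total = run_bandit(PAYOUTS, epsilon=0.1, steps=1000)
print("pulls per arm:", counts)
print("estimated payouts:", [round(e, 2) for e in estimates])
print("total reward:", total)
```

Setting `epsilon=0.0` in this sketch shows the failure mode of pure exploitation: all estimates start at zero, ties go to the first arm, and the agent never tries any other machine.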

~20 min scenario

Goals: 4 tests

should collect at least 1000 rewards

epsilon should decay from start to a lower end value

average reward should improve over time
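The goals listed above can be sketched in a few lines. This is one possible reading, not the scenario's reference solution: the exponential schedule, the names `decayed_epsilon` and `run_with_decay`, the start/end/decay values, and the payout probabilities are all assumptions. "At least 1000 rewards" is interpreted here as recording at least 1000 reward observations.

```python
import random

def decayed_epsilon(step, start=1.0, end=0.05, decay=0.995):
    """Exponentially decay epsilon from `start`, never going below `end`."""
    return max(end, start * decay ** step)

def run_with_decay(true_probs, steps=2000, seed=0):
    """Epsilon-greedy with a decaying epsilon; returns the list of rewards."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    rewards = []

    for t in range(steps):
        eps = decayed_epsilon(t)  # mostly exploring early, mostly exploiting later
        if rng.random() < eps:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        rewards.append(reward)

    return rewards

# Hypothetical payout probabilities, as in a 5-machine setup.
rewards = run_with_decay([0.1, 0.3, 0.5, 0.7, 0.9])

# Compare the average reward in the first and last 200 pulls.
early = sum(rewards[:200]) / 200
late = sum(rewards[-200:]) / 200
print(f"collected {len(rewards)} rewards")
print(f"average reward, first 200 pulls: {early:.2f}")
print(f"average reward, last 200 pulls:  {late:.2f}")
```

Starting with a high epsilon spends early pulls on learning the payouts; lowering it over time shifts pulls toward the arm that looks best, which is what would typically make the later average higher than the early one.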

Epsilon-Greedy Strategy