[Figure 9 plot: "Scale in Mario". x-axis: Number of gradient updates; y-axis: Extrinsic Reward per Episode; curves: batch of 128 environments, batch of 1024 environments.]
Figure 9: Best extrinsic returns on the Mario scaling experiments. We observe that larger batches allow the agent to explore more effectively, reaching the same performance in fewer parameter updates and also achieving better ultimate scores.
B.2 Mario

We show the analogue of the plot in Figure 3(a), here reporting max extrinsic returns. See Figure 9.