(Strong AI) Technological Singularity / (World Acceleration) 23 at FUTURE

527: YAMAGUTIseisei
18/08/26 17:33:32.21 CL1hr8qnX BE:58745546-2BP(3)
Page 14

[Figure: "Scale in Mario". Axes: Extrinsic Reward per Episode vs. Number of gradient updates. Curves: Batch of 128 environments, Batch of 1024 environments.]
Figure 9: Best extrinsic returns on the Mario scaling experiments.
We observe that larger batches allow the agent to explore more effectively, reaching the same performance in fewer parameter updates and also achieving better final scores.

B.2 Mario
We show the analogue of the plot in Figure 3(a), here reporting max extrinsic returns.
See Figure 9.
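
The text does not say how "max extrinsic returns" is computed. One possible reading is that, at each gradient update, the highest episode return across the batch of parallel environments is reported. The following is a minimal sketch under that assumption; the function name and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def max_return_per_update(returns: Sequence[Sequence[float]]) -> List[float]:
    """For each gradient update, keep the highest episode return in the batch.

    `returns[i][j]` is assumed to be the extrinsic return of environment j
    at gradient update i (an assumption made for this illustration).
    """
    return [max(batch) for batch in returns]


# Hypothetical data: 3 gradient updates, 4 parallel environments each.
example = [
    [0.0, 1.0, 0.5, 0.2],
    [1.5, 0.3, 2.0, 0.1],
    [0.4, 2.5, 1.0, 3.0],
]
curve = max_return_per_update(example)
print(curve)
```

Under this reading, a larger batch gives more episodes per update from which the maximum is taken, so the two curves should be compared at the same number of gradient updates.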
