Commit c830dc1

Develop environment bc fix and doc update (#1317)
* split the config into two files
* fixed the Training-ML-Agents.md doc
* added the configs for all of the IL scenes
1 parent 0c06ebf commit c830dc1


5 files changed (+156, -77 lines)


config/bc_config.yaml

Lines changed: 0 additions & 56 deletions
This file was deleted.

config/offline_bc_config.yaml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+default:
+    trainer: offline_bc
+    batch_size: 64
+    summary_freq: 1000
+    max_steps: 5.0e4
+    batches_per_epoch: 10
+    use_recurrent: false
+    hidden_units: 128
+    learning_rate: 3.0e-4
+    num_layers: 2
+    sequence_length: 32
+    memory_size: 256
+    demo_path: ./UnitySDK/Assets/Demonstrations/<Your_Demo_File>.demo
+
+HallwayBrain:
+    trainer: offline_bc
+    max_steps: 5.0e5
+    num_epoch: 5
+    batch_size: 64
+    batches_per_epoch: 5
+    num_layers: 2
+    hidden_units: 128
+    sequence_length: 16    # duplicate key: superseded by sequence_length: 32 below (YAML keeps the last value)
+    use_recurrent: true
+    memory_size: 256
+    sequence_length: 32
+    demo_path: ./UnitySDK/Assets/Demonstrations/Hallway.demo
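To adapt this file to a new project, you add an override section named after your own Brain. A minimal sketch, assuming a hypothetical Brain asset named `MyAgentBrain` and a recording saved as `MyAgentRecording.demo` (any setting left out falls back to the `default` section):

    MyAgentBrain:                # hypothetical Brain name; match your Learning Brain asset
        trainer: offline_bc
        max_steps: 5.0e4         # stop after 50k training steps
        batch_size: 64
        batches_per_epoch: 10
        demo_path: ./UnitySDK/Assets/Demonstrations/MyAgentRecording.demo  # hypothetical recording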

config/online_bc_config.yaml

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+default:
+    trainer: online_bc
+    brain_to_imitate: <Your_Brain_Asset_Name>
+    batch_size: 64
+    time_horizon: 64
+    summary_freq: 1000
+    max_steps: 5.0e4
+    batches_per_epoch: 10
+    use_recurrent: false
+    hidden_units: 128
+    learning_rate: 3.0e-4
+    num_layers: 2
+    sequence_length: 32
+    memory_size: 256
+
+BananaLearning:
+    trainer: online_bc
+    max_steps: 10000
+    summary_freq: 1000
+    brain_to_imitate: BananaPlayer
+    batch_size: 16
+    batches_per_epoch: 5
+    num_layers: 4
+    hidden_units: 64
+    use_recurrent: false
+    sequence_length: 16
+
+BouncerLearning:
+    trainer: online_bc
+    max_steps: 10000
+    summary_freq: 10
+    brain_to_imitate: BouncerPlayer
+    batch_size: 16
+    batches_per_epoch: 1
+    num_layers: 1
+    hidden_units: 64
+    use_recurrent: false
+    sequence_length: 16
+
+HallwayLearning:
+    trainer: online_bc
+    max_steps: 10000
+    summary_freq: 1000
+    brain_to_imitate: HallwayPlayer
+    batch_size: 16
+    batches_per_epoch: 5
+    num_layers: 4
+    hidden_units: 64
+    use_recurrent: false
+    sequence_length: 16
+
+PushBlockLearning:
+    trainer: online_bc
+    max_steps: 10000
+    summary_freq: 1000
+    brain_to_imitate: PushBlockPlayer
+    batch_size: 16
+    batches_per_epoch: 5
+    num_layers: 4
+    hidden_units: 64
+    use_recurrent: false
+    sequence_length: 16
+
+PyramidsLearning:
+    trainer: online_bc
+    max_steps: 10000
+    summary_freq: 1000
+    brain_to_imitate: PyramidsPlayer
+    batch_size: 16
+    batches_per_epoch: 5
+    num_layers: 4
+    hidden_units: 64
+    use_recurrent: false
+    sequence_length: 16
+
+TennisLearning:
+    trainer: online_bc
+    max_steps: 10000
+    summary_freq: 1000
+    brain_to_imitate: TennisPlayer
+    batch_size: 16
+    batches_per_epoch: 5
+    num_layers: 4
+    hidden_units: 64
+    use_recurrent: false
+    sequence_length: 16
+
+StudentBrain:
+    trainer: online_bc
+    max_steps: 10000
+    summary_freq: 1000
+    brain_to_imitate: TeacherBrain
+    batch_size: 16
+    batches_per_epoch: 5
+    num_layers: 4
+    hidden_units: 64
+    use_recurrent: false
+    sequence_length: 16
+
+StudentRecurrentBrain:
+    trainer: online_bc
+    max_steps: 10000
+    summary_freq: 1000
+    brain_to_imitate: TeacherBrain
+    batch_size: 16
+    batches_per_epoch: 5
+    num_layers: 4
+    hidden_units: 64
+    use_recurrent: true
+    sequence_length: 32
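As a usage sketch, training against one of these sections is launched the same way the imitation-learning doc below describes (the run identifier `student_run_1` is a hypothetical name):

    mlagents-learn config/online_bc_config.yaml --train --slow --run-id=student_run_1

The `--slow` flag keeps the environment running at a human-playable speed, so a teacher can drive the Player Brain while the student Brain trains.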

docs/Training-Imitation-Learning.md

Lines changed: 4 additions & 4 deletions
@@ -46,9 +46,9 @@ With offline behavioral cloning, we can use demonstrations (`.demo` files) generated
 1. Choose an agent you would like to train to imitate a set of demonstrations.
 2. Record a set of demonstrations using the `Demonstration Recorder` (see above). For illustrative purposes we will refer to this file as `AgentRecording.demo`.
 3. Build the scene, assigning the agent a Learning Brain, and set the Brain to Control in the Broadcast Hub. For more information on Brains, see [here](Learning-Environment-Design-Brains.md).
-4. Open the `config/bc_config.yaml` file.
+4. Open the `config/offline_bc_config.yaml` file.
 5. Modify the `demo_path` parameter in the file to reference the path to the demonstration file recorded in step 2. In our case this is: `./UnitySDK/Assets/Demonstrations/AgentRecording.demo`
-6. Launch `mlagent-learn`, and providing `./config/bc_config.yaml` as the config parameter, and your environment as the `--env` parameter.
+6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml` as the config parameter and your environment as the `--env` parameter.
 7. (Optional) Observe training performance using Tensorboard.

 This will use the demonstration file to train a neural network driven agent to directly imitate the actions provided in the demonstration. The environment will launch and be used for evaluating the agent's performance during training.
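Steps 5 and 6 combined, as a single command sketch (the environment build name `MyEnvironment` is a hypothetical placeholder for your own build):

    mlagents-learn ./config/offline_bc_config.yaml --env=MyEnvironment --train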
@@ -69,13 +69,13 @@ It is also possible to provide demonstrations in realtime during training, without
    and check the `Control` checkbox on the "Student" brain.
 4. Link the Brains to the desired Agents (one Agent as the teacher and at least
    one Agent as a student).
-5. In `config/trainer_config.yaml`, add an entry for the "Student" Brain. Set
+5. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set
    the `trainer` parameter of this entry to `online_bc`, and the
    `brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
    Additionally, set `batches_per_epoch`, which controls how much training to do
    per update. Increase the `max_steps` option if you'd like to keep training
    the Agents for a longer period of time.
-6. Launch the training process with `mlagents-learn config/trainer_config.yaml
+6. Launch the training process with `mlagents-learn config/online_bc_config.yaml
    --train --slow`, and press the :arrow_forward: button in Unity when the
    message _"Start training by pressing the Play button in the Unity Editor"_ is
    displayed on the screen.
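Putting step 5 together, the "Student" entry would look roughly like this sketch; it mirrors the `StudentBrain` section shipped in `config/online_bc_config.yaml` above, and the values are illustrative:

    StudentBrain:
        trainer: online_bc
        brain_to_imitate: TeacherBrain   # name of the teacher Brain to imitate
        batches_per_epoch: 5             # batches of examples collected per training pass
        max_steps: 10000                 # raise this to keep training longer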

docs/Training-ML-Agents.md

Lines changed: 15 additions & 17 deletions
@@ -20,7 +20,7 @@ project to decide the best course of action for an agent.
 Use the command `mlagents-learn` to train your agents. This command is installed
 with the `mlagents` package and its implementation can be found at
 `ml-agents/mlagents/trainers/learn.py`. The [configuration file](#training-config-file),
-`config/trainer_config.yaml` specifies the hyperparameters used during training.
+such as `config/trainer_config.yaml`, specifies the hyperparameters used during training.
 You can edit this file with a text editor to add a specific configuration for
 each Brain.

@@ -87,12 +87,7 @@ When training is finished, you can find the saved model in the `models` folder
 under the assigned run-id — in the cats example, the path to the model would be
 `models/cob_1/CatsOnBicycles_cob_1.bytes`.

-On Mac and Linux platform, you can press Ctrl+c to terminate your training
-early, the model will be saved as if you set your max_steps to the current step.
-(**Note:** There is a known bug on Windows that causes the saving of the model
-to fail when you early terminate the training, it's recommended to wait until
-Step has reached the max_steps parameter you set in trainer_config.yaml.) While
-this example used the default training hyperparameters, you can edit the
+While this example used the default training hyperparameters, you can edit the
 [training_config.yaml file](#training-config-file) with a text editor to set
 different values.
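For orientation, the cats example referenced in this hunk would have been started with a command along these lines (a sketch inferred from the `cob_1` run-id in the model path above; the `--env` value is a hypothetical build name):

    mlagents-learn config/trainer_config.yaml --env=CatsOnBicycles --run-id=cob_1 --train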

@@ -154,21 +149,24 @@ environment, you can set the following command line options when invoking

 ### Training config file

-The training config file, `config/trainer_config.yaml` specifies the training
-method, the hyperparameters, and a few additional values to use during training.
-The file is divided into sections. The **default** section defines the default
-values for all the available settings. You can also add new sections to override
-these defaults to train specific Brains. Name each of these override sections
-after the GameObject containing the Brain component that should use these
-settings. (This GameObject will be a child of the Academy in your scene.)
-Sections for the example environments are included in the provided config file.
+The training config files `config/trainer_config.yaml`,
+`config/online_bc_config.yaml` and `config/offline_bc_config.yaml` specify the
+training method, the hyperparameters, and a few additional values to use during
+training with PPO, online BC and offline BC respectively. These files are divided
+into sections. The **default** section defines the default values for all the
+available settings. You can also add new sections to override these defaults to
+train specific Brains. Name each of these override sections after the GameObject
+containing the Brain component that should use these settings. (This GameObject
+will be a child of the Academy in your scene.) Sections for the example
+environments are included in the provided config files.

 | **Setting** | **Description** | **Applies To Trainer\*** |
 | :---------- | :-------------- | :----------------------- |
 | batch_size | The number of experiences in each iteration of gradient descent. | PPO, BC |
 | batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model. | BC |
 | beta | The strength of entropy regularization. | PPO |
-| brain\_to\_imitate | For imitation learning, the name of the GameObject containing the Brain component to imitate. | BC |
+| brain\_to\_imitate | For online imitation learning, the name of the GameObject containing the Brain component to imitate. | (online)BC |
+| demo_path | For offline imitation learning, the file path of the recorded demonstration file. | (offline)BC |
 | buffer_size | The number of experiences to collect before updating the policy model. | PPO |
 | curiosity\_enc\_size | The size of the encoding to use in the forward and inverse models in the Curiosity module. | PPO |
 | curiosity_strength | Magnitude of intrinsic reward generated by Intrinsic Curiosity Module. | PPO |
@@ -184,7 +182,7 @@ Sections for the example environments are included in the provided config file.
 | num_layers | The number of hidden layers in the neural network. | PPO, BC |
 | sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
 | summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, BC |
-| time_horizon | How many steps of experience to collect per-agent before adding it to the experience buffer. | PPO, BC |
+| time_horizon | How many steps of experience to collect per-agent before adding it to the experience buffer. | PPO, (online)BC |
 | trainer | The type of training to perform: "ppo", "online_bc", or "offline_bc". | PPO, BC |
 | use_curiosity | Train using an additional intrinsic reward signal generated from Intrinsic Curiosity Module. | PPO |
 | use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
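As a reading aid for the table, here is a sketch of how several of these settings compose into a PPO override section in `config/trainer_config.yaml`. The section name and every value are illustrative, not tuned recommendations:

    CatsOnBicyclesLearning:      # hypothetical section, named after the GameObject holding the Brain
        trainer: ppo
        batch_size: 1024         # experiences per gradient-descent iteration
        buffer_size: 10240       # experiences gathered before each policy update
        beta: 5.0e-3             # entropy regularization strength
        time_horizon: 64         # per-agent steps collected before adding to the buffer
        num_layers: 2            # hidden layers in the network
        hidden_units: 128
        summary_freq: 1000       # steps between TensorBoard data points
        use_recurrent: false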
