docs/Training-Imitation-Learning.md (4 additions & 4 deletions)
@@ -46,9 +46,9 @@ With offline behavioral cloning, we can use demonstrations (`.demo` files) gener
1. Choose an agent you would like to train to imitate a set of demonstrations.
2. Record a set of demonstrations using the `Demonstration Recorder` (see above). For illustrative purposes we will refer to this file as `AgentRecording.demo`.
3. Build the scene, assigning the agent a Learning Brain and setting the Brain to Control in the Broadcast Hub. For more information on Brains, see [here](Learning-Environment-Design-Brains.md).
-4. Open the `config/bc_config.yaml` file.
+4. Open the `config/offline_bc_config.yaml` file.
5. Modify the `demo_path` parameter in the file to reference the path to the demonstration file recorded in step 2. In our case this is: `./UnitySDK/Assets/Demonstrations/AgentRecording.demo`
-6. Launch `mlagents-learn`, providing `./config/bc_config.yaml` as the config parameter, and your environment as the `--env` parameter.
+6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml` as the config parameter, and your environment as the `--env` parameter.
7. (Optional) Observe training performance using TensorBoard.

This will use the demonstration file to train a neural-network-driven agent to directly imitate the actions provided in the demonstration. The environment will launch and be used for evaluating the agent's performance during training.
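For orientation, here is a minimal sketch of the kind of entry `config/offline_bc_config.yaml` holds once `demo_path` is set in step 5. The entry name, its nesting, and the numeric values are illustrative assumptions rather than the shipped defaults; `demo_path`, `batch_size`, `batches_per_epoch`, `max_steps`, and the `trainer` value come from the steps and parameter table on this page:

```yaml
# Hypothetical sketch of a brain entry in config/offline_bc_config.yaml (steps 4-5 above).
# The entry name and numeric values are placeholders; demo_path points at the
# demonstration recorded in step 2.
StudentBrain:
    trainer: imitation        # per the parameter table below: "ppo" or "imitation"
    batch_size: 64            # experiences per gradient-descent iteration
    batches_per_epoch: 10     # batches to collect before training the model
    max_steps: 5.0e4          # increase to train for longer
    demo_path: ./UnitySDK/Assets/Demonstrations/AgentRecording.demo
```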
@@ -69,13 +69,13 @@ It is also possible to provide demonstrations in realtime during training, witho
   and check the `Control` checkbox on the "Student" brain.
4. Link the Brains to the desired Agents (one Agent as the teacher and at least
   one Agent as a student).
-5. In `config/trainer_config.yaml`, add an entry for the "Student" Brain. Set
+5. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set
   the `trainer` parameter of this entry to `imitation`, and the
   `brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
   Additionally, set `batches_per_epoch`, which controls how much training to do
   in each update. Increase the `max_steps` option if you'd like to keep training
   the Agents for a longer period of time.
-6. Launch the training process with `mlagents-learn config/trainer_config.yaml
+6. Launch the training process with `mlagents-learn config/online_bc_config.yaml
   --train --slow`, and press the :arrow_forward: button in Unity when the
   message _"Start training by pressing the Play button in the Unity Editor"_ is
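As a rough illustration of step 5 above, the "Student" entry added to `config/online_bc_config.yaml` might look something like the sketch below. The numeric values are placeholders; `trainer`, `brain_to_imitate`, `batches_per_epoch`, and `max_steps` are the parameters the step asks you to set:

```yaml
# Hypothetical "Student" brain entry for config/online_bc_config.yaml.
# Numeric values are placeholders; parameter meanings are in the table below.
Student:
    trainer: imitation        # behavioral cloning trainer ("ppo" or "imitation")
    brain_to_imitate: Teacher # name of the teacher Brain whose actions are imitated
    batches_per_epoch: 5      # how much training to do in each update
    max_steps: 1.0e5          # increase to keep training the Agents longer
```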
| batch_size | The number of experiences in each iteration of gradient descent. | PPO, BC |
| batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model. | BC |
| beta | The strength of entropy regularization. | PPO |
-| brain\_to\_imitate | For imitation learning, the name of the GameObject containing the Brain component to imitate. | BC |
+| brain\_to\_imitate | For online imitation learning, the name of the GameObject containing the Brain component to imitate. | (online)BC |
+| demo_path | For offline imitation learning, the file path of the recorded demonstration file. | (offline)BC |
| buffer_size | The number of experiences to collect before updating the policy model. | PPO |
| curiosity\_enc\_size | The size of the encoding to use in the forward and inverse models in the Curiosity module. | PPO |
| curiosity_strength | Magnitude of intrinsic reward generated by Intrinsic Curiosity Module. | PPO |
@@ -184,7 +182,7 @@ Sections for the example environments are included in the provided config file.
| num_layers | The number of hidden layers in the neural network. | PPO, BC |
| sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
| summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, BC |
-| time_horizon | How many steps of experience to collect per-agent before adding it to the experience buffer. | PPO, BC |
+| time_horizon | How many steps of experience to collect per-agent before adding it to the experience buffer. | PPO, (online)BC |
| trainer | The type of training to perform: "ppo" or "imitation". | PPO, BC |
| use_curiosity | Train using an additional intrinsic reward signal generated from Intrinsic Curiosity Module. | PPO |
| use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, BC |
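To tie the "(online)BC" and "(offline)BC" annotations above together, the sketch below shows which keys each behavioral-cloning flavor reads in a single hypothetical config entry; the entry name and values are placeholders, not shipped defaults:

```yaml
# Hypothetical config entry annotated with the table rows above.
# Keys and values are placeholders meant only to show which parameters apply.
MyBrain:
    trainer: imitation        # "ppo" or "imitation"
    batch_size: 64            # PPO, BC
    batches_per_epoch: 10     # BC only
    num_layers: 2             # PPO, BC
    summary_freq: 1000        # PPO, BC
    use_recurrent: false      # PPO, BC
    # Offline BC reads demo_path; online BC reads brain_to_imitate instead.
    demo_path: ./UnitySDK/Assets/Demonstrations/AgentRecording.demo   # (offline)BC
    # brain_to_imitate: Teacher                                       # (online)BC
    # time_horizon: 64        # PPO and (online)BC only
```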