@@ -44,9 +44,14 @@ env = UnityToGymWrapper(unity_env, uint8_visual, flatten_branched, allow_multipl
  Discrete. Otherwise, it will be converted into a MultiDiscrete. Defaults to
  `False`.

- - `allow_multiple_obs` will return a list of observations. The first elements contain the visual observations and the
-   last element contains the array of vector observations. If False the environment returns a single array (containing
-   a single visual observations, if present, otherwise the vector observation). Defaults to `False`.
+ - `allow_multiple_obs` will return a list of observations. The first elements
+   contain the visual observations and the last element contains the array of
+   vector observations. If False, the environment returns a single array (containing
+   a single visual observation, if present, otherwise the vector observation).
+   Defaults to `False`.
+
+ - `action_space_seed` is the optional seed for action sampling. If non-None, it will
+   be used to set the random seed on created gym.Space instances.

The returned environment `env` will function as a gym.
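For illustration, a wrapper call combining the options described above might look like the sketch below; the executable path is a placeholder for your own build and the option values are arbitrary:

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

# Sketch only: "<path-to-environment>" stands in for a real built executable.
unity_env = UnityEnvironment("<path-to-environment>")
env = UnityToGymWrapper(
    unity_env,
    uint8_visual=True,        # visual observations returned as uint8 arrays
    flatten_branched=False,   # keep branched discrete actions as a MultiDiscrete space
    allow_multiple_obs=True,  # reset/step return a list: visual obs first, vector obs last
    action_space_seed=42,     # seed the created gym.Space for reproducible sampling
)
obs = env.reset()  # a list of observations when allow_multiple_obs=True
```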
@@ -86,10 +91,12 @@ pip install git+git://github.com/openai/baselines
```

Next, create a file called `train_unity.py`. Then create an `/envs/` directory
- and build the GridWorld environment to that directory. For more information on
+ and build the environment to that directory. For more information on
building Unity environments, see
- [here](../docs/Learning-Environment-Executable.md). Add the following code to
- the `train_unity.py` file:
+ [here](../docs/Learning-Environment-Executable.md). Note that because of
+ limitations of the DQN baseline, the environment must have a single visual
+ observation, a single discrete action and a single Agent in the scene.
+ Add the following code to the `train_unity.py` file:

```python
import gym
@@ -101,12 +108,12 @@ from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

def main():
-    unity_env = UnityEnvironment("./envs/GridWorld")
-    env = UnityToGymWrapper(unity_env, 0, uint8_visual=True)
+    unity_env = UnityEnvironment(<path-to-environment>)
+    env = UnityToGymWrapper(unity_env, uint8_visual=True)
    logger.configure('./logs')  # Change to log in a different directory
    act = deepq.learn(
        env,
-        "cnn",  # conv_only is also a good choice for GridWorld
+        "cnn",  # For visual inputs
        lr=2.5e-4,
        total_timesteps=1000000,
        buffer_size=50000,
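As a side note, `deepq.learn` in OpenAI Baselines returns a callable policy object; once training finishes, a rollout along the lines of the sketch below should work (the `env` and `act` names come from the snippet above; the loop itself is illustrative, not part of the documented example):

```python
# Hedged sketch: roll out the trained policy returned by deepq.learn.
# The act wrapper expects a batch dimension, hence obs[None].
obs, done = env.reset(), False
while not done:
    action = act(obs[None])[0]
    obs, reward, done, info = env.step(action)
env.close()
```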
@@ -174,8 +181,8 @@ def make_unity_env(env_directory, num_env, visual, start_index=0):
    """
    def make_env(rank, use_visual=True):  # pylint: disable=C0111
        def _thunk():
-            unity_env = UnityEnvironment(env_directory)
-            env = UnityToGymWrapper(unity_env, rank, uint8_visual=True)
+            unity_env = UnityEnvironment(env_directory, base_port=5000 + rank)
+            env = UnityToGymWrapper(unity_env, uint8_visual=True)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
@@ -186,7 +193,7 @@ def make_unity_env(env_directory, num_env, visual, start_index=0):
        return DummyVecEnv([make_env(rank, use_visual=False)])

def main():
-    env = make_unity_env('./envs/GridWorld', 4, True)
+    env = make_unity_env(<path-to-environment>, 4, True)
    ppo2.learn(
        network="mlp",
        env=env,
@@ -237,12 +244,12 @@ method with the following code.
```python
game_version = 'v0' if sticky_actions else 'v4'
full_game_name = '{}NoFrameskip-{}'.format(game_name, game_version)
- unity_env = UnityEnvironment('./envs/GridWorld')
+ unity_env = UnityEnvironment(<path-to-environment>)
env = UnityToGymWrapper(unity_env, uint8_visual=True)
return env
```

- `./envs/GridWorld` is the path to your built Unity executable. For more
+ `<path-to-environment>` is the path to your built Unity executable. For more
information on building Unity environments, see
[here](../docs/Learning-Environment-Executable.md), and note the Limitations
section below.
@@ -255,8 +262,7 @@ rather than on the Python side.

Since Dopamine is designed around variants of DQN, it is only compatible with
discrete action spaces, and specifically the Discrete Gym space. For
- environments that use branched discrete action spaces (e.g.
- [VisualBanana](../docs/Learning-Environment-Examples.md)), you can enable the
+ environments that use branched discrete action spaces, you can enable the
`flatten_branched` parameter in `UnityToGymWrapper`, which treats each
combination of branched actions as separate actions.

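To make the flattening concrete: an environment with two discrete branches of sizes 3 and 2 would be exposed as a single `Discrete(6)` space, one index per combination of branch actions. A minimal sketch, assuming `unity_env` is such an environment:

```python
from gym_unity.envs import UnityToGymWrapper

# Sketch only: unity_env is assumed to expose branched discrete actions, e.g. branch sizes (3, 2).
env = UnityToGymWrapper(unity_env, flatten_branched=True)
print(env.action_space)           # Discrete(6): one index per (branch_0, branch_1) combination
print(env.action_space.sample())  # a single integer; the wrapper maps it back to branch actions
```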
@@ -271,7 +277,7 @@ to the observation dimensions or number of channels.
The hyperparameters provided by Dopamine are tailored to the Atari games, and
you will likely need to adjust them for ML-Agents environments. Here is a sample
`dopamine/agents/rainbow/configs/rainbow.gin` file that is known to work with
- GridWorld.
+ a simple GridWorld.

```python
import dopamine.agents.rainbow.rainbow_agent
@@ -347,11 +353,3 @@ certain features (e.g. dueling-Q) that are not enabled in Dopamine DQN.

![Dopamine on GridWorld](images/dopamine_gridworld_plot.png)

- ### Example: VisualBanana
-
- As an example of using the `flatten_branched` option, we also used the Rainbow
- algorithm to train on the VisualBanana environment, and provide the results
- below. The same hyperparameters were used as in the GridWorld case, except that
- `replay_history` and `epsilon_decay` were increased to 100000.
-
- ![Dopamine on VisualBanana](images/dopamine_visualbanana_plot.png)