
Commit 326a277

[Fix] Edit the gym-unity Readme to fix some issue in the sample code (#5331)
* Edit the gym-unity Readme to fix some issue in the sample code
* Update gym-unity/README.md
* addressing comments
* Adding the action_seed parameter to the documentation

Co-authored-by: andrewcoh <[email protected]>
1 parent 8f73ee7 commit 326a277

File tree

1 file changed (+23 -25 lines)


gym-unity/README.md

Lines changed: 23 additions & 25 deletions
@@ -44,9 +44,14 @@ env = UnityToGymWrapper(unity_env, uint8_visual, flatten_branched, allow_multipl
   Discrete. Otherwise, it will be converted into a MultiDiscrete. Defaults to
   `False`.

-- `allow_multiple_obs` will return a list of observations. The first elements contain the visual observations and the
-  last element contains the array of vector observations. If False the environment returns a single array (containing
-  a single visual observations, if present, otherwise the vector observation). Defaults to `False`.
+- `allow_multiple_obs` will return a list of observations. The first elements
+  contain the visual observations and the last element contains the array of
+  vector observations. If False the environment returns a single array (containing
+  a single visual observations, if present, otherwise the vector observation).
+  Defaults to `False`.
+
+- `action_space_seed` is the optional seed for action sampling. If non-None, will
+  be used to set the random seed on created gym.Space instances.

 The returned environment `env` will function as a gym.
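
For reference, a minimal sketch of how the two parameters documented above might be passed to the wrapper; the build path and the chosen values are illustrative, not part of this commit:

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

# "<path-to-environment>" stands in for the path to your built Unity executable.
unity_env = UnityEnvironment("<path-to-environment>")
env = UnityToGymWrapper(
    unity_env,
    uint8_visual=True,        # visual observations returned as uint8 arrays
    allow_multiple_obs=True,  # observations come back as a list: visual obs first, vector obs last
    action_space_seed=42,     # seeds the created gym.Space instances (illustrative value)
)
obs = env.reset()             # a list of arrays when allow_multiple_obs=True
```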

@@ -86,10 +91,12 @@ pip install git+git://github.com/openai/baselines
 ```

 Next, create a file called `train_unity.py`. Then create an `/envs/` directory
-and build the GridWorld environment to that directory. For more information on
+and build the environment to that directory. For more information on
 building Unity environments, see
-[here](../docs/Learning-Environment-Executable.md). Add the following code to
-the `train_unity.py` file:
+[here](../docs/Learning-Environment-Executable.md). Note that because of
+limitations of the DQN baseline, the environment must have a single visual
+observation, a single discrete action and a single Agent in the scene.
+Add the following code to the `train_unity.py` file:

 ```python
 import gym
@@ -101,12 +108,12 @@ from mlagents_envs.environment import UnityEnvironment
 from gym_unity.envs import UnityToGymWrapper

 def main():
-    unity_env = UnityEnvironment("./envs/GridWorld")
-    env = UnityToGymWrapper(unity_env, 0, uint8_visual=True)
+    unity_env = UnityEnvironment(<path-to-environment>)
+    env = UnityToGymWrapper(unity_env, uint8_visual=True)
     logger.configure('./logs') # Change to log in a different directory
     act = deepq.learn(
         env,
-        "cnn", # conv_only is also a good choice for GridWorld
+        "cnn", # For visual inputs
         lr=2.5e-4,
         total_timesteps=1000000,
         buffer_size=50000,
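
The hunk above ends partway through the `deepq.learn` call. A possible complete `train_unity.py`, assembled from the lines shown here plus a minimal closing; the extra hyperparameters, the imports for `deepq`/`logger`, and the save step are illustrative assumptions rather than part of the diff:

```python
import gym

from baselines import deepq
from baselines import logger

from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper


def main():
    unity_env = UnityEnvironment("<path-to-environment>")  # your built Unity executable
    env = UnityToGymWrapper(unity_env, uint8_visual=True)
    logger.configure('./logs')  # Change to log in a different directory
    act = deepq.learn(
        env,
        "cnn",  # For visual inputs
        lr=2.5e-4,
        total_timesteps=1000000,
        buffer_size=50000,
        exploration_fraction=0.1,    # illustrative; tune for your environment
        exploration_final_eps=0.02,  # illustrative
    )
    print("Saving model to unity_model.pkl")
    act.save("unity_model.pkl")


if __name__ == "__main__":
    main()
```
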
@@ -174,8 +181,8 @@ def make_unity_env(env_directory, num_env, visual, start_index=0):
     """
     def make_env(rank, use_visual=True): # pylint: disable=C0111
         def _thunk():
-            unity_env = UnityEnvironment(env_directory)
-            env = UnityToGymWrapper(unity_env, rank, uint8_visual=True)
+            unity_env = UnityEnvironment(env_directory, base_port=5000 + rank)
+            env = UnityToGymWrapper(unity_env, uint8_visual=True)
             env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
             return env
         return _thunk
@@ -186,7 +193,7 @@ def make_unity_env(env_directory, num_env, visual, start_index=0):
         return DummyVecEnv([make_env(rank, use_visual=False)])

 def main():
-    env = make_unity_env('./envs/GridWorld', 4, True)
+    env = make_unity_env(<path-to-environment>, 4, True)
     ppo2.learn(
         network="mlp",
         env=env,
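
Reassembled from the two hunks above, the multi-environment helper might look like the sketch below; the vectorized-env imports, the docstring, and the `SubprocVecEnv`/`DummyVecEnv` branching are assumptions about the surrounding code, and `base_port=5000 + rank` is what keeps parallel Unity instances from colliding on the same port:

```python
import os

from baselines import logger
from baselines.bench import Monitor
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv

from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper


def make_unity_env(env_directory, num_env, visual, start_index=0):
    """Create wrapped, monitored Unity environments (docstring assumed)."""
    def make_env(rank, use_visual=True):  # pylint: disable=C0111
        def _thunk():
            # Offsetting base_port by rank lets several Unity executables run
            # at once without competing for the same communication port.
            unity_env = UnityEnvironment(env_directory, base_port=5000 + rank)
            env = UnityToGymWrapper(unity_env, uint8_visual=True)
            env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(rank)))
            return env
        return _thunk
    if visual:
        return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
    else:
        rank = 0
        return DummyVecEnv([make_env(rank, use_visual=False)])
```
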
@@ -237,12 +244,12 @@ method with the following code.
 ```python
   game_version = 'v0' if sticky_actions else 'v4'
   full_game_name = '{}NoFrameskip-{}'.format(game_name, game_version)
-  unity_env = UnityEnvironment('./envs/GridWorld')
+  unity_env = UnityEnvironment(<path-to-environment>)
   env = UnityToGymWrapper(unity_env, uint8_visual=True)
   return env
 ```

-`./envs/GridWorld` is the path to your built Unity executable. For more
+`<path-to-environment>` is the path to your built Unity executable. For more
 information on building Unity environments, see
 [here](../docs/Learning-Environment-Executable.md), and note the Limitations
 section below.
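
Put together, the patched method might look like the following sketch; only the body lines come from the diff above, while the `create_atari_environment` signature and the placeholder path are assumptions:

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper


def create_atari_environment(game_name=None, sticky_actions=True):
  # Body taken from the hunk above; the signature mirrors Dopamine's original method.
  game_version = 'v0' if sticky_actions else 'v4'
  full_game_name = '{}NoFrameskip-{}'.format(game_name, game_version)
  unity_env = UnityEnvironment("<path-to-environment>")  # your built Unity executable
  env = UnityToGymWrapper(unity_env, uint8_visual=True)
  return env
```
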
@@ -255,8 +262,7 @@ rather than on the Python side.

 Since Dopamine is designed around variants of DQN, it is only compatible with
 discrete action spaces, and specifically the Discrete Gym space. For
-environments that use branched discrete action spaces (e.g.
-[VisualBanana](../docs/Learning-Environment-Examples.md)), you can enable the
+environments that use branched discrete action spaces, you can enable the
 `flatten_branched` parameter in `UnityToGymWrapper`, which treats each
 combination of branched actions as separate actions.
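
For illustration, a small sketch of what enabling `flatten_branched` does; the environment path is a placeholder and the branch sizes in the comment are hypothetical:

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

unity_env = UnityEnvironment("<path-to-environment>")      # your built Unity executable
env = UnityToGymWrapper(unity_env, flatten_branched=True)

# With branch sizes of, say, (3, 2), the wrapper exposes a single
# gym.spaces.Discrete(6) instead of a MultiDiscrete([3, 2]); each of the six
# actions maps back to one combination of the two branches.
print(env.action_space)
```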

@@ -271,7 +277,7 @@ to the observation dimensions or number of channels.
 The hyperparameters provided by Dopamine are tailored to the Atari games, and
 you will likely need to adjust them for ML-Agents environments. Here is a sample
 `dopamine/agents/rainbow/configs/rainbow.gin` file that is known to work with
-GridWorld.
+a simple GridWorld.

 ```python
 import dopamine.agents.rainbow.rainbow_agent
@@ -347,11 +353,3 @@ certain features (e.g. dueling-Q) that are not enabled in Dopamine DQN.

 ![Dopamine on GridWorld](images/dopamine_gridworld_plot.png)

-### Example: VisualBanana
-
-As an example of using the `flatten_branched` option, we also used the Rainbow
-algorithm to train on the VisualBanana environment, and provide the results
-below. The same hyperparameters were used as in the GridWorld case, except that
-`replay_history` and `epsilon_decay` were increased to 100000.
-
-![Dopamine on VisualBanana](images/dopamine_visualbanana_plot.png)
