178 changes: 101 additions & 77 deletions docs/Training-on-Amazon-Web-Service.md
@@ -5,41 +5,43 @@ Service for training ML-Agents environments.

## Preconfigured AMI

We've prepared an preconfigured AMI for you with the ID: `ami-18642967` in the
We've prepared a preconfigured AMI for you with the ID: `ami-18642967` in the
`us-east-1` region. It was created as a modification of [Deep Learning AMI
(Ubuntu)](https://aws.amazon.com/marketplace/pp/B077GCH38C). If you want to do
training without the headless mode, you need to enable X Server on it. After
launching your EC2 instance using the ami and ssh into it, run the following
commands to enable it:
(Ubuntu)](https://aws.amazon.com/marketplace/pp/B077GCH38C). The AMI has been
tested with a p2.xlarge instance. Furthermore, if you want to train without
headless mode, you need to enable X Server.

After launching your EC2 instance using the AMI and connecting to it over SSH,
run the following commands to enable X Server:

```console
//Start the X Server, press Enter to come to the command line
# Start the X Server, press Enter to come back to the command line
$ sudo /usr/bin/X :0 &

//Check if Xorg process is running
//You will have a list of processes running on the GPU, Xorg should be in the list, as shown below
# Check if Xorg process is running
# You will have a list of processes running on the GPU, Xorg should be in the
# list, as shown below
$ nvidia-smi
/*
* Thu Jun 14 20:27:26 2018
* +-----------------------------------------------------------------------------+
* | NVIDIA-SMI 390.67 Driver Version: 390.67 |
* |-------------------------------+----------------------+----------------------+
* | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
* | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
* |===============================+======================+======================|
* | 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
* | N/A 35C P8 31W / 149W | 9MiB / 11441MiB | 0% Default |
* +-------------------------------+----------------------+----------------------+
*
* +-----------------------------------------------------------------------------+
* | Processes: GPU Memory |
* | GPU PID Type Process name Usage |
* |=============================================================================|
* | 0 2331 G /usr/lib/xorg/Xorg 8MiB |
* +-----------------------------------------------------------------------------+
*/

//Make the ubuntu use X Server for display

# Thu Jun 14 20:27:26 2018
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 390.67 Driver Version: 390.67 |
# |-------------------------------+----------------------+----------------------+
# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
# | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
# |===============================+======================+======================|
# | 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
# | N/A 35C P8 31W / 149W | 9MiB / 11441MiB | 0% Default |
# +-------------------------------+----------------------+----------------------+
#
# +-----------------------------------------------------------------------------+
# | Processes: GPU Memory |
# | GPU PID Type Process name Usage |
# |=============================================================================|
# | 0 2331 G /usr/lib/xorg/Xorg 8MiB |
# +-----------------------------------------------------------------------------+

# Make Ubuntu use X Server for display
$ export DISPLAY=:0
```
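
The check described in the comments above (Xorg should appear in the process table printed by `nvidia-smi`) can also be scripted. The following is a minimal Python sketch and not part of this change: the helper names `xorg_is_listed` and `check_xorg` are hypothetical, and the parsing assumes the process table has the layout shown in the sample output above.

```python
import subprocess


def xorg_is_listed(nvidia_smi_output: str) -> bool:
    """Return True if an Xorg process appears in the nvidia-smi process table."""
    in_process_table = False
    for line in nvidia_smi_output.splitlines():
        # Assumption: the process table starts at the line containing "Processes:".
        if "Processes:" in line:
            in_process_table = True
            continue
        if in_process_table and "Xorg" in line:
            return True
    return False


def check_xorg() -> bool:
    """Run nvidia-smi (must be on PATH) and report whether Xorg is listed."""
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    )
    return xorg_is_listed(result.stdout)
```

On the instance, calling `check_xorg()` after starting the X Server should return `True` if Xorg is using the GPU. Before the server is started (or after `sudo killall Xorg`), it should return `False`.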

@@ -87,29 +89,30 @@ linux executables which use visual observations.
1. Install and setup Xorg:

```console
//Install Xorg
# Install Xorg
$ sudo apt-get update
$ sudo apt-get install -y xserver-xorg mesa-utils
$ sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024

//Get the BusID information
# Get the BusID information
$ nvidia-xconfig --query-gpu-info

//Add the BusID information to your /etc/X11/xorg.conf file
# Add the BusID information to your /etc/X11/xorg.conf file
$ sudo sed -i 's/ BoardName "Tesla K80"/ BoardName "Tesla K80"\n BusID "0:30:0"/g' /etc/X11/xorg.conf

//Remove the Section "Files" from the /etc/X11/xorg.conf file
$ sudo vim /etc/X11/xorg.conf //And remove two lines that contain Section "Files" and EndSection
# Remove the Section "Files" from the /etc/X11/xorg.conf file
# And remove two lines that contain Section "Files" and EndSection
$ sudo vim /etc/X11/xorg.conf
```

2. Update and setup Nvidia driver:

```console
//Download and install the latest Nvidia driver for ubuntu
# Download and install the latest Nvidia driver for ubuntu
$ wget http://download.nvidia.com/XFree86/Linux-x86_64/390.67/NVIDIA-Linux-x86_64-390.67.run
$ sudo /bin/bash ./NVIDIA-Linux-x86_64-390.67.run --accept-license --no-questions --ui=none

//Disable Nouveau as it will clash with the Nvidia driver
# Disable Nouveau as it will clash with the Nvidia driver
$ sudo echo 'blacklist nouveau' | sudo tee -a /etc/modprobe.d/blacklist.conf
$ sudo echo 'options nouveau modeset=0' | sudo tee -a /etc/modprobe.d/blacklist.conf
$ sudo echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
@@ -125,73 +128,95 @@ linux executables which use visual observations.
4. Make sure there are no Xorg processes running:

```console
//Kill any possible running Xorg processes
//Note that you might have to run this command multiple times depending on how Xorg is configured.
# Kill any possible running Xorg processes
# Note that you might have to run this command multiple times depending on
# how Xorg is configured.
$ sudo killall Xorg

//Check if there is any Xorg process left
//You will have a list of processes running on the GPU, Xorg should not be in the list, as shown below.
# Check if there is any Xorg process left
# You will have a list of processes running on the GPU, Xorg should not be in
# the list, as shown below.
$ nvidia-smi
/*
* Thu Jun 14 20:21:11 2018
* +-----------------------------------------------------------------------------+
* | NVIDIA-SMI 390.67 Driver Version: 390.67 |
* |-------------------------------+----------------------+----------------------+
* | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
* | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
* |===============================+======================+======================|
* | 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
* | N/A 37C P8 31W / 149W | 0MiB / 11441MiB | 0% Default |
* +-------------------------------+----------------------+----------------------+
*
* +-----------------------------------------------------------------------------+
* | Processes: GPU Memory |
* | GPU PID Type Process name Usage |
* |=============================================================================|
* | No running processes found |
* +-----------------------------------------------------------------------------+
*/

# Thu Jun 14 20:21:11 2018
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 390.67 Driver Version: 390.67 |
# |-------------------------------+----------------------+----------------------+
# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
# | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
# |===============================+======================+======================|
# | 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
# | N/A 37C P8 31W / 149W | 0MiB / 11441MiB | 0% Default |
# +-------------------------------+----------------------+----------------------+
#
# +-----------------------------------------------------------------------------+
# | Processes: GPU Memory |
# | GPU PID Type Process name Usage |
# |=============================================================================|
# | No running processes found |
# +-----------------------------------------------------------------------------+

```

5. Start X Server and make Ubuntu use X Server for display:

```console
//Start the X Server, press Enter to come to the command line
# Start the X Server, press Enter to come back to the command line
$ sudo /usr/bin/X :0 &

//Check if Xorg process is running
//You will have a list of processes running on the GPU, Xorg should be in the list.
# Check if Xorg process is running
# You will have a list of processes running on the GPU, Xorg should be in the list.
$ nvidia-smi

//Make the ubuntu use X Server for display
# Make Ubuntu use X Server for display
$ export DISPLAY=:0
```

6. Ensure that Xorg is correctly configured:

```console
//For more information on glxgears, see ftp://www.x.org/pub/X11R6.8.1/doc/glxgears.1.html.
# For more information on glxgears, see ftp://www.x.org/pub/X11R6.8.1/doc/glxgears.1.html.
$ glxgears
//If Xorg is configured correctly, you should see the following message
/*
* Running synchronized to the vertical refresh. The framerate should be
* approximately the same as the monitor refresh rate.
* 137296 frames in 5.0 seconds = 27459.053 FPS
* 141674 frames in 5.0 seconds = 28334.779 FPS
* 141490 frames in 5.0 seconds = 28297.875 FPS
*/
# If Xorg is configured correctly, you should see the following message

# Running synchronized to the vertical refresh. The framerate should be
# approximately the same as the monitor refresh rate.
# 137296 frames in 5.0 seconds = 27459.053 FPS
# 141674 frames in 5.0 seconds = 28334.779 FPS
# 141490 frames in 5.0 seconds = 28297.875 FPS

```
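
Step 1 writes `BusID "0:30:0"` into `/etc/X11/xorg.conf`, while the `nvidia-smi` output shows the same device as `00000000:00:1E.0`. The two agree because `nvidia-smi` prints the bus, device, and function fields in hexadecimal, whereas the value used in `xorg.conf` here is decimal (0x1E = 30). A minimal Python sketch of that conversion follows. The helper name `to_xorg_bus_id` is hypothetical, and it assumes the `domain:bus:device.function` layout shown in the sample output.

```python
def to_xorg_bus_id(nvidia_smi_bus_id: str) -> str:
    """Convert a hex 'domain:bus:device.function' ID to decimal 'bus:device:function'."""
    # Assumption: exactly three colon-separated fields, e.g. "00000000:00:1E.0".
    _domain, bus, device_function = nvidia_smi_bus_id.split(":")
    device, function = device_function.split(".")
    # Each field is hexadecimal in nvidia-smi output; emit decimal values.
    return f"{int(bus, 16)}:{int(device, 16)}:{int(function, 16)}"
```

For the Tesla K80 in the sample output, `to_xorg_bus_id("00000000:00:1E.0")` returns `"0:30:0"`, which matches the value inserted by the `sed` command in step 1. On a different instance type the Bus-Id may differ, so the hard-coded value should be checked against `nvidia-xconfig --query-gpu-info`.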

## Training on EC2 instance

1. In the Unity Editor, load a project containing an ML-Agents environment (you
can use one of the example environments if you have not created your own).
2. Open the Build Settings window (menu: File > Build Settings).
3. Select Linux as the Target Platform, and x86_64 as the target architecture.
4. Check Headless Mode (If you haven't setup the X Server).
3. Select Linux as the Target Platform, and x86_64 as the target architecture
(the default x86 currently does not work).
4. Check Headless Mode if you have not set up the X Server. (If you do not use
Headless Mode, you have to set up the X Server to enable training.)
5. Click Build to build the Unity environment executable.
6. Upload the executable to your EC2 instance within the `ml-agents` folder.
7. Test the instance setup from Python using:
7. Change the permissions of the executable.

```console
chmod +x <your_env>.x86_64
```
8. (Without Headless Mode) Start X Server and use it for display:

```console
# Start the X Server, press Enter to come back to the command line
$ sudo /usr/bin/X :0 &

# Check if Xorg process is running
# You will have a list of processes running on the GPU, Xorg should be in the list.
$ nvidia-smi

# Make Ubuntu use X Server for display
$ export DISPLAY=:0
```
9. Test the instance setup from Python using:

```python
from mlagents.envs import UnityEnvironment
@@ -202,9 +227,8 @@ linux executables which use visual observations.
Where `<your_env>` corresponds to the path to your environment executable.

You should receive a message confirming that the environment was loaded successfully.
8. Train the executable
10. Train your models

```console
chmod +x <your_env>.x86_64
mlagents-learn <trainer-config-file> --env=<your_env> --train
```