This document details the code, parameters, and configurations used to obtain the results described in the article Differentiation of Behaviours in Learning Pheromone-based Communication.
- Make sure you have Python 3.10 installed.
- Create a virtual environment:

      python -m venv path_to_new_virtual_env
      source path_to_new_virtual_env/bin/activate

- Clone this repository:

      git clone https://github.com/dav3-b/DBLPC.git
      cd DBLPC

- Install the required dependencies:

      pip install -r requirements.txt
Operating system: The code was tested on Ubuntu 22.04, CPU only.
DBLPC/
├── agents # Algorithms folder
│ ├── IQLearning # Independent Q-Learning implementation
│ │ └── config # Algorithm configuration files
│ ├── NoLearning # Deterministic policy implementation
│ └── utils # Utility functions
└── environments # Multi-agent environments
└── slime # Slime environment
└── config # Env configuration files
The main script for Independent Q-Learning is slime_iql.py, which accepts the following command-line arguments:
Argument | Type | Default value | Description |
---|---|---|---|
--train | bool | False | If True, training of the agents is performed; otherwise, evaluation. |
--random_seed | int | 42 | Change the default random seed for reproducibility. |
--qtable_path | str | None | Path to a .npy file for loading the Q-table to perform evaluation. |
--render | bool | False | If True, renders the environment visually. |
Example: Training run
python slime_iql.py --train True --random_seed 99
The Q-table will automatically be saved in the ./runs/weights folder.
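As a quick sanity check, the saved Q-table can be loaded and inspected with NumPy. A minimal sketch (the file name is just a placeholder for whatever your training run produced):

```python
import numpy as np

# Placeholder file name: replace with the actual file produced by your training run.
qtable = np.load("./runs/weights/file_name.npy")

print("Q-table shape:", qtable.shape)  # a (states, actions) layout is assumed here
print("Max Q-value:", qtable.max())
```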
Example: Evaluation run
python slime_iql.py --random_seed 99 --qtable_path ./runs/weights/file_name.npy --render True
The main script for the deterministic (no-learning) policy is slime_deterministic.py, which accepts the following command-line arguments:
Argument | Type | Default value | Description |
---|---|---|---|
--random_seed | int | 42 | Change the default random seed for reproducibility. |
--episodes | int | 2000 | Number of episodes. |
--render | bool | False | If True, renders the environment visually. |
Example: Deterministic run
python slime_deterministic.py --random_seed 99 --episodes 100 --render True
Parameter | Tested Values | Used Value | Description |
---|---|---|---|
World-size | [(20x20), (22x22), (25x25), (31x31)] | (22x22) | Size of the grid world where agents move (torus). |
Clustering-population | [20, 14, 10, 6, 0] | [20, 14, 10, 6, 0] | Number of clustering agents. |
Scattering-population | [20, 14, 10, 6, 0] | [20, 14, 10, 6, 0] | Number of scattering agents. |
Sniff-threshold | [20, 14, 10, 6, 0] | 0.9 | Minimum amount of pheromone that can be smelled by an agent. |
Sniff-patches | [3, 5, 8] | 5 | Number of 1-hop neighboring patches in which the agent can smell the pheromone. |
Wiggle-patches | [3, 5, 8] | 5 | Number of 1-hop neighboring patches the agent can move randomly through. |
Diffuse-area | [0.5, 1.0, 1.5] | 0.5 | Standard deviation of the Gaussian function used to spread the pheromone in the environment. |
Diffuse-radius | 1.0 | 1.0 | Radius of the Gaussian function used to spread the pheromone in the environment. |
Evaporation-rate | [0.8, 0.85, 0.9, 0.95] | 0.95 | Fraction of pheromone not evaporating in the environment. |
Lay-area | [0, 1] | 1 | Number of patches in which the pheromone is released. |
Lay-amount | [1, 2, 3, 5] | 3 | Amount of pheromone deposited evenly in Lay-area. |
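To make the pheromone-related parameters concrete, the sketch below illustrates, in a simplified and hypothetical form, what Diffuse-area (the Gaussian standard deviation), Evaporation-rate, and Lay-amount control; it is only an illustration, not the repository's actual diffusion code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Illustrative only: a toy pheromone field on a toroidal grid.
world_size = (22, 22)      # World-size
diffuse_area = 0.5         # std of the Gaussian kernel (Diffuse-area)
evaporation_rate = 0.95    # fraction of pheromone kept each tick (Evaporation-rate)
lay_amount = 3             # pheromone deposited by an agent (Lay-amount)

pheromone = np.zeros(world_size)
pheromone[11, 11] += lay_amount  # an agent lays pheromone on its patch

# Spread with a Gaussian (mode="wrap" approximates the torus), then evaporate.
pheromone = gaussian_filter(pheromone, sigma=diffuse_area, mode="wrap")
pheromone *= evaporation_rate
```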
Parameter | Tested Values | Used Value | Description |
---|---|---|---|
Cluster-threshold | [1, 5, 10] | 1 | Minimum number of agents within Cluster-radius required to count as clustering. |
Cluster-radius | [1, 2, 3] | 1 | Distance (in number of patches), centered on the agent, used to check clustering; it is used for computing rewards and metrics. |
Clustering-reward | [0, 1, 10, 100] | 10 | Base reward given upon clustering. |
Clustering-penalty | [0, -1, -10, -100] | -1 | Base penalty given for not clustering. |
Scattering-reward | [0, 1, 10, 100] | 0 | Base reward given upon scattering. |
Scattering-penalty | [0, -1, -10, -100] | -1 | Base penalty given for not scattering. |
Ticks-per-episode | [250, 500, 1000] | 500 | Learning episode duration in simulation ticks. |
episodes | [3000, 5000, 10000] | 3000 | Number of learning episodes. |
learning-rate | [0.01, 0.025, 0.05, 0.1] | [0.01, 0.025, 0.05] | Magnitude of Q-value updates. |
discount-factor | [0.9, 0.95, 0.99, 0.999] | [0.9, 0.99] | How much future rewards are valued. |
epsilon-init | 1.0 | 1.0 | Initial exploration rate. |
epsilon-min | [5e-3, 1e-3, 5e-4, 1e-4, 0.0] | 0.0 | Minimum value of epsilon. |
epsilon-decay | [0.995, 0.997, 0.999] | 0.995 | Multiplicative factor by which epsilon decreases after each action, decaying from epsilon-init towards epsilon-min. |
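The learning-rate, discount-factor, and epsilon parameters follow the standard tabular Q-learning recipe. A minimal, generic sketch with illustrative sizes (not the repository's state/action spaces or code):

```python
import numpy as np

rng = np.random.default_rng(42)

n_states, n_actions = 100, 4         # illustrative sizes only
alpha, gamma = 0.05, 0.9             # learning-rate and discount-factor
epsilon, epsilon_min, epsilon_decay = 1.0, 0.0, 0.995

Q = np.zeros((n_states, n_actions))

def select_action(state: int) -> int:
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def update(state: int, action: int, reward: float, next_state: int) -> None:
    # Standard tabular Q-learning target.
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])

# After every action, epsilon decays multiplicatively towards epsilon-min.
epsilon = max(epsilon_min, epsilon * epsilon_decay)
```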
The following .json configuration files are used to manage the experiment's parameters:
File Name | Purpose |
---|---|
/environments/slime/config/env-params.json | Defines the environment settings. |
/environments/slime/config/env_visualizer.json | Controls the rendering configuration for visualizing the environment. |
/agents/IQLearning/config/learning-params.json | Contains learning-related parameters such as learning rate, epsilon decay, etc. |
/agents/IQLearning/config/logger-params.json | Configures the logging behavior and export mode. |
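These files are plain JSON, so they can be inspected or edited programmatically. A minimal sketch that only reads them and prints their keys, without assuming any particular schema:

```python
import json
from pathlib import Path

# Paths taken from the table above (relative to the repository root).
env_cfg = json.loads(Path("environments/slime/config/env-params.json").read_text())
learn_cfg = json.loads(Path("agents/IQLearning/config/learning-params.json").read_text())

# The exact key names are defined by the repository; print them to discover the schema.
print(sorted(env_cfg.keys()))
print(sorted(learn_cfg.keys()))
```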
All evaluation metrics described in the paper are automatically logged in /runs/train (for training) and in /runs/eval (for evaluation).
The results presented in our paper are the average of 10 identical experiments conducted on a population of 20 agents in total.
The random seeds we used: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100].
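To reproduce this averaging, one simple option is a small driver that launches one training run per seed through the documented CLI (a sketch, not a script shipped with the repository):

```python
import subprocess

seeds = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

for seed in seeds:
    # One training run per seed; metrics end up under ./runs/train.
    subprocess.run(
        ["python", "slime_iql.py", "--train", "True", "--random_seed", str(seed)],
        check=True,
    )
```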
If you use this codebase in your research, please cite the following article:
Differentiation of Behaviours in Learning Pheromone-based Communication
Authors: Davide Borghi, Stefano Mariani, and Franco Zambonelli
Under review for the 4th DISCOLI workshop on DIStributed COLlective Intelligence, 2025.