Machine Learning Engineering with Multiple Agents (MLE-STAR)

Overview

The Machine Learning Engineering Agent (MLE-STAR) is an approach to building Machine Learning Engineering (MLE) agents that can train state-of-the-art machine learning models on a variety of tasks, including classification and regression, by combining web search with targeted code block refinement. Using the example of predicting California housing prices, we show how MLE-STAR can create a regression model based on factors such as population and income that outperforms traditional approaches to training ML models. The experimental results show that MLE-STAR achieves medals in 63.6% of the Kaggle competitions in MLE-bench-Lite, significantly outperforming the best alternative. The implementation is based on the Google Cloud AI Research paper "MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement" (https://www.arxiv.org/abs/2506.15692).

Performance of MLE agents on the MLE-bench-Lite datasets:

MLE Agents   Base LLM           Any Medals   Gold Medals   Silver Medals   Bronze Medals
MLE-STAR     Gemini-2.5-Pro     63.6%        36.4%         21.2%           6.1%
MLE-STAR     Gemini-2.5-Flash   43.9%        30.3%         4.5%            9.1%


Agent Details

The key features of the Machine Learning Agent include:

Feature            Description
Interaction Type   Conversational
Complexity         Advanced
Agent Type         Multi Agent
Components         Tools: Code execution, Retrieval
Vertical           All

Agent architecture

This diagram shows the detailed architecture of the agents and tools used to implement this workflow.

[Architecture diagram: Machine-Learning-Engineering]

Key Features

  1. Initial Solution Generation: Uses a search engine to retrieve state-of-the-art models and their example code, then merges the best-performing candidates into a consolidated initial solution.

  2. Code Block Refinement: Iteratively improves the solution by identifying and targeting the specific code blocks (ML pipeline components) that have the most significant impact on performance, as determined through ablation studies. An inner loop then refines the targeted block using various strategies (a conceptual sketch of this loop follows this list).

  3. Ensemble Strategies: Introduces a novel ensembling method in which the agent proposes and refines ensemble strategies to combine multiple solutions, aiming for better performance than any individual solution.

  4. Robustness Modules: Includes a debugging agent for error correction, a data leakage checker to prevent improper data access during preprocessing, and a data usage checker to ensure all provided data sources are utilized.
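
As an illustration of item 2, the outer ablation / inner refinement loop can be sketched roughly as follows. This is a conceptual sketch, not the actual implementation; the helper callables (run_ablation_study, propose_block_rewrite, evaluate_solution) are hypothetical stand-ins for the corresponding sub-agents.

def refine_solution(solution, task, run_ablation_study, propose_block_rewrite,
                    evaluate_solution, outer_steps=4, inner_steps=4):
    """Outer loop picks the most impactful code block; inner loop rewrites it."""
    best_score = evaluate_solution(solution, task)  # assume higher is better here
    for _ in range(outer_steps):
        # Ablation study: find the pipeline block whose change moves the score the most.
        target_block = run_ablation_study(solution, task)
        for _ in range(inner_steps):
            # Rewrite only the targeted block; keep the candidate if it improves the score.
            candidate = propose_block_rewrite(solution, target_block)
            score = evaluate_solution(candidate, task)
            if score > best_score:
                solution, best_score = candidate, score
    return solution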

Setup and Installation

  1. Prerequisites

    • Python 3.12+

    • Poetry

      • For dependency management and packaging. Follow the installation instructions on the official Poetry website, or install it with pip:
      pip install poetry
    • Git

    • Google Cloud Account

      • You need a Google Cloud account
    • A project on Google Cloud Platform

    • Google Cloud CLI

  2. Installation and Setup

    • Clone repository

      # Clone this repository.
      git clone https://github.com/google/adk-samples.git
      cd adk-samples/python/agents/machine-learning-engineering
    • Install Poetry

      # Install the Poetry package and dependencies.
      poetry install

      This command reads the pyproject.toml file and installs all the necessary dependencies into a virtual environment managed by Poetry.

      If the above command returns a "command not found" error, use:

      python -m poetry install
    • Activate the shell

      poetry env activate

      This activates the virtual environment, allowing you to run commands within the project's environment. To verify that the environment is active, run, for example:

      $> poetry env list
      machine-learning-engineering-Gb54hHID-py3.12 (Activated)

      If the above command does not activate the environment for you, you can also activate it directly with:

      source $(poetry env info --path)/bin/activate

  3. Configuration

    • Set up Google Cloud credentials.

      • You may set the following environment variables in your shell, or in a .env file instead.
      export GOOGLE_GENAI_USE_VERTEXAI=true
      export GOOGLE_CLOUD_PROJECT=<your-project-id>
      export GOOGLE_CLOUD_LOCATION=<your-project-location>
      export ROOT_AGENT_MODEL=<Google LLM to use>
      export GOOGLE_CLOUD_STORAGE_BUCKET=<your-storage-bucket>  # Only required for deployment on Agent Engine
    • Authenticate your GCloud account.

      gcloud auth application-default login
      gcloud auth application-default set-quota-project $GOOGLE_CLOUD_PROJECT
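
If you prefer the .env option mentioned above, the same variables can be placed in a .env file instead of being exported. A minimal example with placeholder values, assuming the file lives in the agent directory where adk picks it up:

# Example values; replace with your own project settings.
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_CLOUD_PROJECT=<your-project-id>
GOOGLE_CLOUD_LOCATION=<your-project-location>
ROOT_AGENT_MODEL=gemini-2.0-flash-001
GOOGLE_CLOUD_STORAGE_BUCKET=<your-storage-bucket>  # only needed for Agent Engine deployment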

Running the Agent

Prepare your task

You should prepare the inputs for your task in the following way:

  1. Create a folder under tasks with the name of your task.
  2. In that folder, create a file containing the description of the task.
  3. Place the data files in this folder (an example layout is shown below).
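
For example, the task folder might look like the layout below. The folder and file names here are illustrative; use names that match your task (the bundled default task is california-housing-prices).

machine_learning_engineering/tasks/my-task/
    task_description.txt    # plain-text description of the task
    train.csv               # task data files
    test.csv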

Using adk

ADK provides convenient ways to bring up agents locally and interact with them. You may talk to the agent using the CLI:

adk run machine_learning_engineering

Or via the Poetry shell:

poetry run adk run machine_learning_engineering

Or on a web interface:

 adk web

The command adk web will start a web server on your machine and print the URL.

Example Interaction

You may open the URL, select "machine_learning_engineering" in the top-left drop-down menu, and a chatbot interface will appear on the right. The conversation is initially blank. Here are some example requests, starting by asking the Machine Learning Agent to identify itself:

[user]: who are you?

[mle_frontdoor_agent]: I am a machine learning engineer agent.

[user]: what can you do?

[mle_frontdoor_agent]: I am a machine learning engineer. My primary role is to engineer solutions for machine learning tasks, such as the California Housing Task. I can also describe the task if you'd like. I work by executing a sequence of sub-agents to solve the machine learning engineering task.

[user]: describe the task that you have

[mle_frontdoor_agent]: The task I have is the California Housing Task. This task involves predicting the median house value in California districts, given various features about those districts. It's a regression problem where the goal is to build a model that can accurately estimate house prices based on factors like population, median income, and housing age within a district.

[user]: execute the task

[mle_frontdoor_agent]: <intermediate output snipped>.


# Save the submission file to CSV without the index
print(f"Submission file saved successfully to {submission_file_path}")

Running Tests

For running tests and evaluation, install the extra dependencies:

poetry install --with dev

Then the tests and evaluation can be run from the machine-learning-engineering directory using the pytest module:

python3 -m pytest tests
python3 -m pytest eval

tests runs the agent on a sample request and makes sure that every component is functional. eval demonstrates how to evaluate the agent using the AgentEvaluator in ADK: it sends a couple of requests to the agent and expects the agent's responses to match a pre-defined response reasonably well.
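
For reference, an ADK evaluation test is typically structured roughly like the sketch below; the arguments and data path are illustrative and may differ from the eval module shipped with this agent.

import pathlib

import pytest
from google.adk.evaluation.agent_evaluator import AgentEvaluator


@pytest.mark.asyncio
async def test_eval():
    # Replay the recorded evaluation dataset against the agent and score its responses.
    await AgentEvaluator.evaluate(
        agent_module="machine_learning_engineering",
        eval_dataset_file_path_or_dir=str(pathlib.Path(__file__).parent / "data"),
        num_runs=1,
    )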

Deployment

You will need to have specified a GCS bucket in the environment variable GOOGLE_CLOUD_STORAGE_BUCKET, as detailed in the Configuration section.

If the bucket does not exist, ADK will create one for you. This is the easiest option. If the bucket does exist, then you must provide permissions to the service account as described in this Troubleshooting article.

The Machine Learning Engineering Agent can be deployed to Vertex AI Agent Engine using the following commands:

poetry install --with deployment
python3 deployment/deploy.py --create

When the deployment finishes, it will print a line like this:

Created remote agent: projects/<PROJECT_NUMBER>/locations/<PROJECT_LOCATION>/reasoningEngines/<AGENT_ENGINE_ID>

If you forget the AGENT_ENGINE_ID, you can list the existing agents using:

python3 deployment/deploy.py --list

The output will be like:

All remote agents:

123456789 ("machine_learning_engineering")
- Create time: 2025-07-11 09:46:07+00:00
- Update time: 2025-05-10 09:46:09+00:00

You may interact with the deployed agent using the test_deployment.py script:

$ export USER_ID=<any string>
$ python3 deployment/test_deployment.py --resource_id=${AGENT_ENGINE_ID} --user_id=${USER_ID}
Found agent with resource ID: ...
Created session for user ID: ...
Type 'quit' to exit.
Input: Hello. What can you do for me?
Response: Hello! I'm a Machine Learning Engineer Assistant. I can help you achieve competition-level quality in solving machine learning tasks.

To get started, please provide the task description of the competition.

To delete the deployed agent, you may run the following command:

python3 deployment/deploy.py --delete --resource_id=${AGENT_ENGINE_ID}

Appendix

Required Config Parameters

This section describes the required configuration parameters in the DefaultConfig dataclass.


data_dir

  • Description: Specifies the directory path where the machine learning tasks and their data are stored.
  • Type: str
  • Default: "./machine_learning_engineering/tasks/"

task_name

  • Description: The name of the specific task to be loaded and processed.
  • Type: str
  • Default: "california-housing-prices"

task_type

  • Description: Defines the type of machine learning problem.
  • Type: str
  • Default: "Tabular Regression"

lower

  • Description: A boolean flag indicating whether a lower value of the metric is better.
  • Type: bool
  • Default: True

workspace_dir

  • Description: The directory path used for saving intermediate outputs, results, logs, or any other artifacts generated during the task execution.
  • Type: str
  • Default: "./machine_learning_engineering/workspace/"

agent_model

  • Description: Specifies the identifier of the LLM used by the agent. It defaults to the value of the environment variable ROOT_AGENT_MODEL, or "gemini-2.0-flash-001" if the variable is not set.
  • Type: str
  • Default: os.environ.get("ROOT_AGENT_MODEL", "gemini-2.0-flash-001")
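
Taken together, the DefaultConfig dataclass corresponds roughly to the sketch below. This is a reconstruction from the parameters documented above, not the exact source; the dataclass in the repository may define additional fields.

import dataclasses
import os


@dataclasses.dataclass
class DefaultConfig:
    """Configuration parameters documented above (illustrative reconstruction)."""

    data_dir: str = "./machine_learning_engineering/tasks/"
    task_name: str = "california-housing-prices"
    task_type: str = "Tabular Regression"
    lower: bool = True  # a lower metric value is better
    workspace_dir: str = "./machine_learning_engineering/workspace/"
    agent_model: str = dataclasses.field(
        default_factory=lambda: os.environ.get("ROOT_AGENT_MODEL", "gemini-2.0-flash-001")
    )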
