32 changes: 32 additions & 0 deletions nemo/NeMo-Safe-Synthesizer/README.md
@@ -0,0 +1,32 @@
# NeMo Safe Synthesizer Example Notebooks


This directory contains the tutorial notebooks for getting started with NeMo Safe Synthesizer.

## 📦 Set Up the Environment

We will use `uv`, a Python package and project manager, to set up our environment and install the necessary dependencies. If you don't have `uv` installed, follow the installation instructions in the [uv documentation](https://docs.astral.sh/uv/getting-started/installation/).

Install the SDK as follows:

```bash
uv venv
source .venv/bin/activate
uv pip install "nemo-microservices[safe-synthesizer]"
```


Be sure to select this virtual environment as your kernel when running the notebooks.
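
To confirm that a notebook is actually running on this environment, a quick check in the first cell can help. The snippet below is a minimal sketch using only the standard library; the helper name `in_virtual_environment` is hypothetical and not part of the SDK.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # In a venv, sys.prefix points at the environment, while sys.base_prefix
    # points at the base Python installation it was created from.
    return sys.prefix != sys.base_prefix


# The executable path should sit under the .venv directory created above.
print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtual_environment()}")
```

If the interpreter path does not point into `.venv`, switch the notebook kernel before continuing.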

## 🚀 Deploying the NeMo Safe Synthesizer Microservice

To run these notebooks, you'll need access to a deployment of the NeMo Safe Synthesizer microservice. You have two deployment options:


### 🐳 Deploy the NeMo Safe Synthesizer Microservice Locally

Follow our quickstart guide to deploy the NeMo Safe Synthesizer microservice locally via Docker Compose.

### 🚀 Deploy NeMo Microservices Platform with Helm

Follow the Helm installation guide to deploy the microservices platform.
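
Whichever option you choose, it can be useful to check that the deployment is reachable before opening a notebook. The sketch below only tests whether a TCP connection succeeds; it does not verify that the service is healthy. The helper `is_reachable` is hypothetical, and the URL is an assumption taken from the quickstart default used in the notebooks, so adjust it for a remote deployment.

```python
import socket
from urllib.parse import urlparse


def is_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(base_url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Assumed value: the quickstart default used by the notebooks in this directory.
base_url = "http://localhost:8080"
print(f"{base_url} reachable: {is_reachable(base_url)}")
```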
304 changes: 304 additions & 0 deletions nemo/NeMo-Safe-Synthesizer/advanced/advanced_privacy.ipynb
@@ -0,0 +1,304 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "630e3e17",
"metadata": {},
"source": [
"# 🔐 NeMo Safe Synthesizer: Advanced Privacy (Differential Privacy)\n",
"\n",
"> ⚠️ **Warning**: NeMo Safe Synthesizer is in Early Access and not recommended for production use.\n",
"\n",
"<br>\n",
"\n",
"In this notebook, we create synthetic tabular data using the NeMo Microservices Python SDK with differential privacy enabled. The notebook should take about 1.5 hours to run.\n",
"\n",
"After completing this notebook, you'll be able to:\n",
"- **Use the NeMo Microservices SDK** to interact with Safe Synthesizer\n",
"- **Enable differential privacy** to provide additional privacy protection\n",
"- **Access an evaluation report** on the quality and privacy of the synthetic data"
]
},
{
"cell_type": "markdown",
"id": "8be84f5d",
"metadata": {},
"source": [
"### 💾 Import dependencies\n",
"\n",
"Ensure you have a NeMo Microservices Platform deployment available. If you're using a managed or remote deployment, have the correct base URLs and tokens ready."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f5d6f5a",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from nemo_microservices import NeMoMicroservices\n",
"from nemo_microservices.beta.safe_synthesizer.builder import SafeSynthesizerBuilder\n",
"\n",
"import logging\n",
"\n",
"logging.basicConfig(level=logging.WARNING)\n",
"logging.getLogger(\"httpx\").setLevel(logging.WARNING)"
]
},
{
"cell_type": "markdown",
"id": "7395f0c8",
"metadata": {},
"source": [
"### ⚙️ Initialize the NeMo Safe Synthesizer Client\n",
"\n",
"- The Python SDK provides a wrapper around the NeMo Microservices Platform APIs.\n",
"- `http://localhost:8080` is the default `base_url` for the quickstart deployment.\n",
"- If using a managed or remote deployment, ensure you use the correct base URLs and tokens."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c15ab93",
"metadata": {},
"outputs": [],
"source": [
"client = NeMoMicroservices(\n",
"    base_url=\"http://localhost:8080\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "8f1cfb12",
"metadata": {},
"source": [
"NeMo DataStore is launched as one of the services. We'll use it to manage storage, so set the following:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "426186a3",
"metadata": {},
"outputs": [],
"source": [
"datastore_config = {\n",
"    \"endpoint\": \"http://localhost:3000/v1/hf\",\n",
"    \"token\": \"\",\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "2d66c819",
"metadata": {},
"source": [
"## 📥 Load input data\n",
"\n",
"Safe Synthesizer learns the patterns and correlations in an input dataset in order to produce synthetic data with similar properties. Use the sample dataset provided, or change the following cell to try your own data.\n",
"\n",
"The sample dataset records customer credit card default payments. It includes columns of Personally Identifiable Information (PII) such as sex, education level, marital status, and age. It also contains several billing and payment amount columns and a binary indicator of whether the customer defaulted on the next month's payment."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c989a42",
"metadata": {},
"outputs": [],
"source": [
"%pip install ucimlrepo || uv pip install ucimlrepo"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7204f213",
"metadata": {},
"outputs": [],
"source": [
"from ucimlrepo import fetch_ucirepo\n",
"\n",
"# Fetch the Default of Credit Card Clients dataset (UCI ML Repository id 350)\n",
"default_of_credit_card_clients = fetch_ucirepo(id=350)\n",
"# `data.original` is the full table, including IDs, features, and the target column\n",
"df = default_of_credit_card_clients.data.original\n",
"\n",
"# Display the first few rows of the DataFrame\n",
"print(df.head())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8ca3a11",
"metadata": {},
"outputs": [],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"id": "87d72c68",
"metadata": {},
"source": [
"## 🏗️ Create a Safe Synthesizer job\n",
"\n",
"The `SafeSynthesizerBuilder` provides a fluent interface to configure and submit jobs.\n",
"\n",
"This job will:\n",
"- Initialize the builder with the NeMo Microservices client.\n",
"- Use the loaded DataFrame as the input data source.\n",
"- Configure the job to use the specified datastore for model storage.\n",
"- Enable automatic replacement of personally identifiable information (PII).\n",
"- Enable differential privacy (DP) with a configurable epsilon.\n",
"- Use structured generation to enforce the schema during data generation.\n",
"- Submit the job to the microservices platform."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85d9de56",
"metadata": {},
"outputs": [],
"source": [
"job = (\n",
"    SafeSynthesizerBuilder(client)\n",
"    .from_data_source(df)\n",
"    .with_datastore(datastore_config)\n",
"    .with_replace_pii()\n",
"    .with_differential_privacy(dp_enabled=True, epsilon=8.0)\n",
"    .with_generate(use_structured_generation=True)\n",
"    .create_job()\n",
")\n",
"\n",
"print(f\"job_id = {job.job_id}\")\n",
"job.wait_for_completion()\n",
"\n",
"print(f\"Job finished with status {job.fetch_status()}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fa2eacb2",
"metadata": {},
"outputs": [],
"source": [
"# If your notebook shuts down, that's okay: your job keeps running on the microservices platform.\n",
"# To get the same job object and interact with it again, uncomment the following\n",
"# snippet and replace <job id> with the job ID from the previous cell's output.\n",
"\n",
"# from nemo_microservices.beta.safe_synthesizer.sdk.job import SafeSynthesizerJob\n",
"# job = SafeSynthesizerJob(job_id=\"<job id>\", client=client)"
]
},
{
"cell_type": "markdown",
"id": "285d4a9d",
"metadata": {},
"source": [
"## 👀 View synthetic data\n",
"\n",
"After the job completes, fetch the generated synthetic dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f25574a",
"metadata": {},
"outputs": [],
"source": [
"# Fetch the synthetic data created by the job\n",
"synthetic_df = job.fetch_data()\n",
"synthetic_df\n"
]
},
{
"cell_type": "markdown",
"id": "472b4f38",
"metadata": {},
"source": [
"## 📊 View evaluation report\n",
"\n",
"An evaluation comparing the synthetic data to the input data is performed automatically. The following cells show how to:\n",
"\n",
"- Programmatically access key scores (quality and privacy).\n",
"- Download the full HTML report with charts and detailed metrics.\n",
"- Display the report inline below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7b691127",
"metadata": {},
"outputs": [],
"source": [
"# Print selected information from the job summary\n",
"summary = job.fetch_summary()\n",
"print(\n",
"    f\"Synthetic data quality score (0-10, higher is better): {summary.synthetic_data_quality_score}\"\n",
")\n",
"print(f\"Data privacy score (0-10, higher is better): {summary.data_privacy_score}\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5b1030a",
"metadata": {},
"outputs": [],
"source": [
"# Download the full evaluation report to your local machine\n",
"job.save_report(\"evaluation_report.html\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "45f7e22b",
"metadata": {},
"outputs": [],
"source": [
"# Fetch and display the full evaluation report inline\n",
"job.display_report_in_notebook()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "kendrickb-notebooks",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}