From e9d15fcaea35e8bd26af7debbb1614a47b583a2a Mon Sep 17 00:00:00 2001
From: Paula
Date: Thu, 25 Sep 2025 14:52:32 +0200
Subject: [PATCH] initial draft

---
 .../services/natural-language-to-aql.md | 409 ++++++++++++------
 1 file changed, 274 insertions(+), 135 deletions(-)

diff --git a/site/content/3.12/data-science/graphrag/services/natural-language-to-aql.md b/site/content/3.12/data-science/graphrag/services/natural-language-to-aql.md
index 6d2610dfe0..10825f2fa7 100644
--- a/site/content/3.12/data-science/graphrag/services/natural-language-to-aql.md
+++ b/site/content/3.12/data-science/graphrag/services/natural-language-to-aql.md
@@ -5,149 +5,264 @@ description: >-
   The Natural Language to AQL Translation Service is a powerful tool that allows
   you to interact with your ArangoDB database using natural language queries
 weight: 20
-draft: true # Not available in pre-release
 ---

 ## Overview

-This service translates your questions and commands into AQL (ArangoDB Query Language),
-executes the queries, and provides responses in natural language.
+The Natural Language to AQL Translation Service provides two distinct capabilities:

-## Features
+**1. [Process Text](#process-text)**: Ask general questions and get natural language responses without querying your database.
+Ideal for:
+- General knowledge questions
+- Text analysis and processing

-- Natural language to AQL query translation
-- Support for multiple LLM providers (via OpenAI API or a self-hosted Triton Inference Server)
+**2. [Translate Query](#translate-query)**: Convert natural language questions into AQL queries and execute them against your ArangoDB database.
+Ideal for:
+- Querying your database using natural language
+- Converting business questions into database operations
+- Exploring data through intuitive interfaces
+- Learning AQL by seeing translations
+
+The Natural Language to AQL Translation Service also includes the following features:
+- Support for multiple LLM providers (via OpenAI API or a self-hosted Triton LLM host)
 - RESTful and gRPC interfaces
 - Health monitoring endpoints
-- Flexible output formats (Natural Language, AQL, JSON)
-
-## Getting Started
+- Flexible output formats (Natural Language, AQL, JSON) for database queries

-### Prerequisites
+## Prerequisites

 - ArangoDB instance
 - OpenAI API key (if using OpenAI as provider)
 - Triton URL and model name (if using Triton as provider)

+## Installation and configuration
+
+When creating the service, you provide parameters in the API request that become environment variables used at runtime.
+
+{{< tabs >}}
+
+{{< tab "Required Parameters" >}}
+These parameters must be provided in all service creation requests:
+
+- `username`: Database username for authentication
+- `db_name`: Name of the ArangoDB database
+- `api_provider`: LLM provider selection (`openai` or `triton`)
+- `genai_project_name`: Name of the project created in Step 1
+{{< /tab >}}

-### Configuration
+{{< tab "OpenAI Provider" >}}
+Additional parameters required when using `api_provider: "openai"`:

-The following environment variables are set at installation time and used at runtime:
+- `openai_api_key`: API key for OpenAI authentication
+- `openai_model`: Model name (defaults to `gpt-3.5-turbo` if not specified)
+
+Optional OpenAI parameters:
+- `openai_temperature`: Controls randomness (0.0 to 2.0)
+- `openai_max_retries`: Maximum number of retry attempts
+{{< /tab >}}
+
+{{< tab "Triton Provider" >}}
+Additional parameters required when using `api_provider: "triton"`:
+
+- `triton_url`: URL of the Triton inference server
+- `triton_model`: Model name to use with Triton
+
+Optional Triton parameters:
+- `triton_timeout`: Timeout in seconds for Triton requests
+{{< /tab >}}
+
+{{< /tabs >}}
+
+### Step 1: Create a GenAI GraphRAG project
+
+The first step is to create a new project:

 ```bash
-# Required Database Configuration
-ARANGODB_NAME=
-ARANGODB_USER=
+curl --request POST \
+  --url https://arangodb-platform-dev.pilot.arangodb.com/gen-ai/v1/project \
+  --header 'Authorization: Bearer ' \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "project_name": "your-txt2aql-project",
+    "project_type": "graphrag",
+    "project_description": "Natural language to AQL translation project"
+  }'
+```
+
+**Expected Response:**
+```json
+{
+  "projectName": "your-txt2aql-project",
+  "projectType": "graphrag",
+  "projectDescription": "Natural language to AQL translation project"
+}
+```
+
+### Step 2: Create the GraphRAG txt2aql service

-# LLM Provider Configuration
-API_PROVIDER= # "openai" or "triton"
+Create the service instance with your configuration:

-# If using OpenAI
-OPENAI_API_KEY=
-OPENAI_MODEL= # Optional, defaults to GPT-4
-OPENAI_TEMPERATURE= # Optional
-OPENAI_MAX_RETRIES= # Optional
+```bash
+curl --request POST \
+  --url https://arangodb-platform-dev.pilot.arangodb.com/gen-ai/v1/graphrag \
+  --header 'Authorization: Bearer ' \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "env": {
+      "username": "",
+      "db_name": "",
+      "api_provider": "",
+      "openai_api_key": "",
+      "openai_model": "",
+      "genai_project_name": ""
+    }
+  }'
+```

-# If using Triton
-TRITON_URL=
-TRITON_MODEL=
-TRITON_TIMEOUT= # Optional
+**Expected Response:**
+```json
+{
+  "serviceInfo": {
+    "serviceId": "arangodb-graph-rag-xxxxx",
+    "description": "Install complete",
+    "status": "DEPLOYED",
+    "namespace": "",
+    "values": "..."
+  }
+}
 ```

-### Starting the Service
+{{< info >}}
+Save the `serviceId` from the above response as you'll need it for subsequent API calls.
+{{< /info >}}

-To start the service, use GenAI service endpoint `CreateGraphRag`. Please refer to the documentation of GenAI service for more information on how to use it.
+### Step 3: Verify the service status

-### Required Parameters
+Check that the service is properly deployed:

-These parameters must be provided in the install request sent to GenAI service.
+```bash
+curl --request GET \
+  --url https://:8529/gen-ai/v1/service/arangodb-graph-rag- \
+  --header 'Authorization: Bearer '
+```

-- `username`: Database username for authentication
-- `db_name`: Name of the ArangoDB database
-- `api_provider`: LLM provider selection (`openai`, `triton`)
+**Expected Response:**
+```json
+{
+  "serviceInfo": {
+    "serviceId": "arangodb-graph-rag-",
+    "description": "Install complete",
+    "status": "DEPLOYED",
+    "namespace": "",
+    "values": "..."
+  }
+}
+```

-### Provider-Specific Required Parameters
+### Step 4: Health check

-#### OpenAI Provider
+Verify that the service is running and healthy:

-- `openai_api_key`: API key for OpenAI authentication
-- `openai_model`: Model name (defaults to "gpt-3.5-turbo" if not specified)
+```bash
+curl --request GET \
+  --url :8529/graph-rag//v1/health \
+  --header 'Authorization: Bearer '
+```

-#### Triton Provider
+**Expected Response:**
+```json
+{
+  "status": "SERVING"
+}
+```

-- `triton_url`: URL of the Triton inference server
-- `triton_model`: Model name to use with Triton
+{{< info >}}
+The `serviceID` in the URL is typically the last part of the full service ID (e.g., `xxxxx` from `arangodb-graph-rag-xxxxx`).
+{{< /info >}}

-## API Reference
+## Process Text
+
+The **Process Text** endpoint allows you to ask general questions to the LLM and receive natural language responses.
+
+```bash
+POST /v1/process_text
+```

-### REST Endpoints
-
-1. **Process Text** - Ask general questions to the LLM and get a natural language response. This endpoint does not query the database.
-   ```bash
-   POST /v1/process_text
-   Content-Type: application/json
-
-   {
-     "input_text": "What are the advantages of graph databases?"
-   }
-   ```
-
-2. **Translate Query** - Convert natural language to AQL and query the database
-   ```bash
-   POST /v1/translate_query
-   Content-Type: application/json
-
-   {
-     "input_text": "Find all users who are friends with John",
-     "options": {
-       "output_formats": ["NL", "AQL", "JSON"]
-     }
-   }
-   ```
-
-3. **Health Check** - Monitor service health
-   ```bash
-   GET /v1/health
-   ```
-
-### gRPC Endpoints
-
-The service also provides gRPC endpoints for more efficient communication:
-
-1. **Process Text**
-   ```bash
-   grpcurl -plaintext -d '{"input_text": "Hello world"}' \
-     localhost:9090 txt2aql.Txt2AqlService/ProcessText
-   ```
-
-2. **Translate Query**
-   ```bash
-   grpcurl -plaintext -d '{
-     "input_text": "Find all characters from House Stark",
-     "options": {
-       "output_formats": ["NL","AQL","JSON"]
-     }
-   }' localhost:9090 txt2aql.Txt2AqlService/TranslateQuery
-   ```
-
-3. **Health Check**
-   ```bash
-   grpcurl -plaintext localhost:9090 txt2aql.Txt2AqlService/HealthCheck
-   ```
-
-## Output Formats
-
-The `translate_query` endpoint of the txt2aql service supports multiple output formats that can be specified in the `output_formats` field of your request. Each format serves a different purpose and can be used individually or in combination:
-
-### Natural Language (NL)
+{{< info >}}
+**This endpoint does not query your database**; it is designed for general knowledge questions and text processing.
+{{< /info >}}
+
+**Example**:
+
+```json
+{
+  "input_text": "What are the advantages of graph databases?"
+}
+```
+
+```bash
+curl --request POST \
+  --url https://arangodb-platform-dev.pilot.arangodb.com/graph-rag//v1/process_text \
+  --header 'Authorization: Bearer YOUR_ACCESS_TOKEN' \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "input_text": "What are the advantages of graph databases?"
+  }'
+```
+
+**Expected output:**
+
+```json
+{
+  "responseText": "Graph databases offer several key advantages: 1) Efficient relationship handling - they store relationships as first-class citizens, making traversals much faster than traditional SQL JOINs. 2) Flexible data modeling - schema-less design accommodates evolving datasets naturally. 3) High performance for connected data - query performance remains consistent even with large datasets. 4) Intuitive visualization - relationships can be easily visualized and understood. 5) Real-time capabilities - excellent for recommendation systems, fraud detection, and network analysis."
+}
+```
+
+## Translate Query
+
+The **Translate Query** endpoint converts natural language questions into AQL queries and executes them against your ArangoDB database. **This endpoint queries your actual data** and returns results in multiple formats.
+
+```bash
+POST /v1/translate_query
+```
+
+**Example**:
+
+```json
+{
+  "input_text": "Find all users who are friends with John",
+  "options": {
+    "output_formats": ["NL", "AQL", "JSON"]
+  }
+}
+```
+
+```bash
+curl --request POST \
+  --url https://arangodb-platform-dev.pilot.arangodb.com/graph-rag//v1/translate_query \
+  --header 'Authorization: Bearer YOUR_ACCESS_TOKEN' \
+  --header 'Content-Type: application/json' \
+  --data '{
+    "input_text": "Find all users who are friends with John",
+    "options": {
+      "output_formats": ["NL", "AQL", "JSON"]
+    }
+  }'
+```
+
+### Output formats
+
+The `translate_query` endpoint supports multiple output formats that can be specified in the `output_formats` field of your request. Each format serves a different purpose and can be used individually or in combination.
+
+#### Natural Language (NL)

 - **Format identifier**: `"NL"`
 - **Returns**: A human-readable explanation of the query results
-- **Helpful for**: Understanding what the query found in plain English
+- **Helpful for**: Understanding what the query found in plain English.
 - **Example**:
   - **Input**: `Find all users who are friends with John`
   - **Output**: `I found 3 users who are friends with John, including Alice, Bob, and Carol`

-### AQL Query (AQL)
+#### AQL Query (AQL)

 - **Format identifier**: `"AQL"`
 - **Returns**: The generated ArangoDB Query Language (AQL) query
@@ -155,16 +270,16 @@ The `translate_query` endpoint of the txt2aql service supports multiple output f
   - Debugging query translation
   - Learning AQL syntax
   - Modifying queries for reuse
-- **Shows**: Exactly how your natural language was translated into database operations
+- **Shows**: Exactly how your natural language was translated into database operations.
 - **Example**:
   - **Input**: `Find all users who are friends with John`
   - **Output**: `FOR u IN users FILTER u.friends ANY == 'John' RETURN u`

-### JSON Results (JSON)
+#### JSON Results (JSON)

 - **Format identifier**: `"JSON"`
 - **Returns**: The raw query results in JSON format
-- **Provides**: Direct access to the complete dataset
+- **Provides**: Direct access to the complete dataset.
 - **Ideal for**:
   - Programmatic processing
   - Data integration
@@ -173,7 +288,7 @@ The `translate_query` endpoint of the txt2aql service supports multiple output f
   - **Input**: `Find all users who are friends with John`
   - **Output**: `[{"name":"Alice","age":30},{"name":"Bob","age":25},{"name":"Carol","age":35}]`

-### Example Response
+#### Examples

 ```json
 {
@@ -184,18 +299,52 @@ The `translate_query` endpoint of the txt2aql service supports multiple output f
 }
 ```

-### Usage Tips
+#### Usage and default behavior

-1. Request only the formats you need to minimize response size and processing time
-2. Use `NL` for user interfaces, human consumption or when wrapped as an LLM-callable function (e.g. in LLM agent frameworks)
-3. Use `AQL` for debugging and learning purposes
-4. Use `JSON` for programmatic data processing such as API calls.
+- Request only the formats you need to minimize response size and processing time.
+- Use `NL` for user interfaces, human consumption, or when wrapped as an LLM-callable function (e.g., in LLM agent frameworks).
+- Use `AQL` for debugging and learning purposes.
+- Use `JSON` for programmatic data processing such as API calls.
+- If no output formats are specified, the service defaults to `NL` format only.
+- Multiple formats can be requested simultaneously.
+- Formats are processed efficiently, with results cached where possible.

-### Default Behavior
+## gRPC endpoints

-- If no output formats are specified, the service defaults to `NL` format only
-- Multiple formats can be requested simultaneously
-- Formats are processed efficiently, with results cached where possible
+The service also provides gRPC endpoints for more efficient communication.
+
+### Process Text (gRPC)
+
+```bash
+grpcurl -plaintext -d '{"input_text": "Hello world"}' \
+  localhost:9090 txt2aql.Txt2AqlService/ProcessText
+```
+
+### Translate Query (gRPC)
+
+```bash
+grpcurl -plaintext -d '{
+  "input_text": "Find all characters from House Stark",
+  "options": {
+    "output_formats": ["NL","AQL","JSON"]
+  }
+}' localhost:9090 txt2aql.Txt2AqlService/TranslateQuery
+```
+
+### Health check (gRPC)
+
+```bash
+grpcurl -plaintext localhost:9090 txt2aql.Txt2AqlService/HealthCheck
+```
+
+## Best Practices
+
+1. Be specific in your queries to get more accurate translations.
+2. Use appropriate output formats based on your needs.
+3. Monitor the health endpoint for service status.
+4. Implement proper error handling in your client applications.
+5. Use connection pooling for better performance.
+6. Consider rate limiting for production deployments.

 ## Error Handling
@@ -209,31 +358,21 @@ The service provides clear error messages for common issues:

 Error responses include appropriate HTTP status codes and descriptive messages.

-## Best Practices
-
-1. Be specific in your queries to get more accurate translations
-2. Use appropriate output formats based on your needs
-3. Monitor the health endpoint for service status
-4. Implement proper error handling in your client applications
-5. Use connection pooling for better performance
-6. Consider rate limiting for production deployments
-
 ## Troubleshooting

 Common issues and solutions:

-1. **Connection Issues**
-   - Verify ARANGODB_ENDPOINT is accessible
-   - Check network/firewall settings
-   - Ensure proper authentication credentials
+1. **Connection issues**:
+   - Verify that ARANGODB_ENDPOINT is accessible.
+   - Check network/firewall settings.
+   - Ensure proper authentication credentials.

-2. **Query Translation Issues**
-   - Make queries more specific
-   - Check LLM provider configuration
-   - Verify database schema matches query context
+2. **Query translation issues**:
+   - Make queries more specific.
+   - Check LLM provider configuration.
+   - Verify that the database schema matches the query context.
   The quality of the generated AQL may vary depending on the LLM model used.
-  Therefore we recommend using an AQL-capable coding model (e.g. a frontier AQL-capable
-  LLM or a fine-tuned AQL-capable coding model) for better results.
+  Therefore, it is recommended to use an AQL-capable coding model (e.g., a frontier AQL-capable LLM or a fine-tuned AQL-capable coding model) for better results.

 ## API Reference
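
For quick experimentation with the endpoints documented above, the `translate_query` request body can be assembled with a small shell helper instead of hand-writing JSON in each `curl` call. This is a sketch: the `build_translate_payload` function name and its naive quoting scheme are illustrative assumptions, not part of the service, and it does not escape special characters inside the question.

```bash
# Sketch: build the JSON body for a translate_query request.
# Assumes the question contains no characters that need JSON escaping.
build_translate_payload() {
  question=$1
  shift
  formats=""
  for f in "$@"; do
    formats="${formats}\"${f}\","
  done
  formats=${formats%,}   # drop the trailing comma
  printf '{"input_text": "%s", "options": {"output_formats": [%s]}}' \
    "$question" "$formats"
}

payload=$(build_translate_payload "Find all users who are friends with John" NL AQL JSON)
echo "$payload"
```

The resulting string can then be passed to `curl` as `--data "$payload"` when calling the `translate_query` endpoint shown earlier.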