Summary:
Enable the creation of a MongoDB Query Language (MQL) Agent using mongodb-mcp-server, allowing users to dynamically generate and execute MongoDB queries from natural language input.
Motivation:
Writing MQL by hand requires familiarity with each collection's schema and MongoDB's operator syntax. An MQL Agent would simplify this by abstracting query construction and execution behind a consistent interface, which is especially valuable when paired with LLMs or other language-based systems.
By leveraging mongodb-mcp-server, the agent can securely and efficiently interface with MongoDB instances in managed environments, following best practices for access control and scalability.
Key Features:
- Interpret structured or natural language input and produce valid MongoDB Query Language (MQL).
- Execute queries via mongodb-mcp-server with secure connection handling.
- Support standard CRUD operations and aggregation pipelines.
- Integrate with LLMs for natural-language-to-MQL translation.
- Validate generated queries and handle errors so that only safe queries are executed.
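To make the last point concrete, here is a minimal sketch of what validation could look like: an allow-list style check that rejects aggregation stages which write to the database before anything is executed. This is purely illustrative; the function and the blocked-stage set are my own assumptions, not part of mongodb-mcp-server.

```python
# Hypothetical validator for LLM-generated aggregation pipelines.
# $out and $merge are the standard MQL stages that persist results
# (i.e., mutate the database), so a read-only agent should block them.
WRITE_STAGES = {"$out", "$merge"}

def validate_pipeline(pipeline):
    """Return a list of problems found in a pipeline; an empty list means OK."""
    problems = []
    if not isinstance(pipeline, list):
        return ["pipeline must be a list of stage documents"]
    for i, stage in enumerate(pipeline):
        if not isinstance(stage, dict) or len(stage) != 1:
            problems.append(f"stage {i}: each stage must be a single-key document")
            continue
        (name,) = stage.keys()
        if not name.startswith("$"):
            problems.append(f"stage {i}: '{name}' is not a valid stage name")
        elif name in WRITE_STAGES:
            problems.append(f"stage {i}: '{name}' writes to the database and is blocked")
    return problems
```

Running the validator before `execute_query` lets the agent surface a readable error to the LLM (or the user) instead of executing an unsafe or malformed pipeline.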
Tools used for MQL Agent Development:
Below are the custom tools created for the agent:
- `get_collection` - Fetches all collections in the database.
- `get_schema` - Retrieves the schema of each collection to guide query formation.
- `get_aggregate_query` - Generates an aggregation pipeline based on the collection's schema.
- `execute_query` - Executes the final MQL generated by the LLM (e.g., coder models like Qwen or DeepSeek).
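To illustrate how `get_schema` could guide query formation, the sketch below infers a field-to-type map from a handful of sampled documents. The function name and the sampling approach are assumptions for illustration only; in practice the sample documents might be fetched through the server (for example via a `$sample` aggregation stage) rather than held in memory as plain dicts.

```python
# Sketch: infer a schema summary from already-fetched sample documents.
# MongoDB collections are schemaless, so the summary records every type
# observed per dotted field path; this summary is what gets put in the
# LLM prompt to guide query generation.
def infer_schema(sample_docs, prefix=""):
    """Map dotted field paths to the set of Python type names observed."""
    schema = {}
    for doc in sample_docs:
        for key, value in doc.items():
            path = f"{prefix}{key}"
            if isinstance(value, dict):
                # Recurse into embedded documents, merging nested paths.
                for sub_path, types in infer_schema([value], f"{path}.").items():
                    schema.setdefault(sub_path, set()).update(types)
            else:
                schema.setdefault(path, set()).add(type(value).__name__)
    return schema
```

For example, sampling two documents such as `{"name": "a", "qty": 2, "meta": {"tag": "x"}}` and `{"name": "b", "qty": 2.5}` would record that `qty` appears as both an int and a float, which is exactly the kind of inconsistency the LLM needs to know about when forming a query.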
Feedback & Observations:
I experimented with the existing mql-agent from MongoDB. While it worked well for smaller datasets (5–10 collections), I found that accuracy degraded significantly when scaling to 150–200 collections, which is my typical use case. The generated queries became inconsistent or incomplete, likely due to the increased schema complexity.
I've also explored integrating open-source LLMs (such as Qwen-14B-Coder and DeepSeek-Coder) to maintain data privacy and confidentiality. However, the results were not consistently accurate or production-ready.
Request for Input:
I'd appreciate guidance or best practices on the following:
- How can MQL generation be scaled efficiently across larger schemas (150–200 collections) using mongodb-mcp-server?
- Are there recommendations for integrating open-source LLMs (e.g., Qwen-14B-Coder, DeepSeek-Coder) with mongodb-mcp-server to improve MQL generation accuracy while keeping data private?
- How can mongodb-mcp-server best be utilized for my use case, including architecture suggestions, API usage patterns, or example workflows that handle large-scale collection structures effectively?