
Prompt API: consider local use cases for RAG and Agentic RAG #8

@zolkis

Description

[Refers to / transferred from the Prompt API]

As a possible complex local application of prompting, Retrieval-Augmented Generation (RAG) enhances AI systems by dynamically retrieving information external to the model in order to improve response accuracy and relevance. Agentic RAG extends these capabilities with autonomous decision-making and iterative refinement, which is particularly valuable in local deployments for enhanced privacy and customization.
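
As a rough sketch of the non-agentic baseline (all names below are illustrative; a real system would use an embedding model and a vector store instead of the toy word-overlap retriever), a single RAG step retrieves relevant documents and prepends them to the prompt:

from dataclasses import dataclass

@dataclass
class Doc:
    text: str

# Toy in-memory corpus; in practice this would be a vector DB.
CORPUS = [Doc("RAG retrieves external documents before generation."),
          Doc("Agentic RAG lets the model plan its own retrieval steps.")]

def retrieve(query: str, top_k: int = 2) -> list[Doc]:
    # Word-overlap scoring stands in for embedding similarity.
    words = set(query.lower().split())
    return sorted(CORPUS,
                  key=lambda d: -len(words & set(d.text.lower().split())))[:top_k]

def rag_prompt(query: str) -> str:
    # Retrieved context is prepended so the model grounds its answer in it.
    context = "\n".join(d.text for d in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(rag_prompt("What does RAG retrieve?"))  # send the result to a local LLM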

Some of the main RAG use cases include:

  • Enhanced search & question answering: RAG improves search engines with up-to-date featured snippets and powers domain-specific QA systems that combine proprietary data (e.g., medical literature or legal documents) with foundational LLM knowledge.
  • Healthcare: Provides clinicians with real-time access to medical guidelines and research.
  • Legal: Accelerates case law research and compliance checks.
  • E-commerce: Delivers personalized product recommendations using customer behavior data.
  • Content generation: Generates context-aware summaries, reports, and marketing copy by retrieving relevant source materials.
  • Enterprise knowledge management: Enables chatbots to answer internal queries about company policies, manufacturing protocols, or regulatory updates.

Agentic RAG introduces autonomous decision-making layers on top of traditional RAG workflows, using LLMs not only to generate answers but also to plan retrieval and prepare prompts (sketched in code after the list below), enabling:

  • Adaptive problem solving: customer support agents offering discounts proactively.
  • Continuous learning: medical diagnosis systems updating with new research.
  • Multi-step reasoning: legal tools cross-referencing precedents and statutes.
  • Local deployment advantages: privacy-focused healthcare data analysis.
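
To make the prompt-preparation role concrete, a minimal agentic control loop could look like the following sketch, where llm() is a stub standing in for a local model call and retrieve() is the toy helper from the earlier sketch:

def llm(prompt: str) -> str:
    # Stub for a local model call (e.g. via Ollama); returns canned
    # decisions so the demo loop terminates deterministically.
    return ("ANSWER: example response" if "context" in prompt.lower()
            else "SEARCH: what does RAG retrieve")

def agentic_rag(query: str, max_steps: int = 3) -> str:
    # The LLM decides each step: refine the query and retrieve again,
    # or produce the final answer (multi-step reasoning, not one-shot).
    for _ in range(max_steps):
        decision = llm(f"Decide the next step for: {query}")
        if decision.startswith("SEARCH:"):
            refined = decision.removeprefix("SEARCH:").strip()
            context = " ".join(d.text for d in retrieve(refined))
            query = f"Given context: {context}\nOriginal question: {refined}"
        else:
            return decision.removeprefix("ANSWER:").strip()
    return "No answer within the step budget."

print(agentic_rag("Tell me about RAG"))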

Traditional RAG typically relies on cloud-based models and vector databases, where API constraints limit customization.

Agentic RAG in a local deployment has advantages such as:

  • Data privacy: runs fully offline via tools like LangChain/Qdrant.
  • Customization: supports domain-specific models (e.g., Gemma-3 for multilingual tasks).
  • Cost efficiency: eliminates API fees by using local LLMs.
  • Transparency: provides verifiable source citations from internal databases.

A local Agentic RAG pipeline (e.g. RAGapp deployed via Docker) typically includes:

# Simplified architecture (illustrative sketch; LangchainAgent and
# KnowledgeBase are placeholder names, the rest follows LangChain APIs)
local_llm = Ollama(model="llama3")              # local model served by Ollama
vector_db = Qdrant(                             # on-premise vector store
    client=QdrantClient(path="./qdrant_data"),  # embedded, file-backed Qdrant
    collection_name="docs",
    embeddings=FastEmbedEmbeddings(),           # local embedding model
)
agent = LangchainAgent(                         # placeholder agent wiring
    tools=[KnowledgeBase(retriever=vector_db.as_retriever())],
    system_prompt="Generate markdown responses with citations",
)
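
With the pieces above wired together, querying the private corpus would look something like this (hypothetical call, matching the placeholder agent class):

agent.run("Summarize our data-retention policy, with citations.")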

This approach is particularly impactful in regulated industries like healthcare and finance, where data sovereignty and low-latency responses are critical.

In the CG, please consider/discuss/decide whether any of these use cases would make it worth developing support as a new type of standardized assistance API, or whether RAG is better treated as an example application of prompting (i.e., if applications are a better place to manage RAG).
