From 67d07f86f0367e5285638024d5786404e71692a0 Mon Sep 17 00:00:00 2001 From: Paula Date: Thu, 9 Oct 2025 16:55:42 +0200 Subject: [PATCH] added graphrag uses cases, reorganize content --- .../3.12/data-science/graphrag/_index.md | 169 ++++-------------- .../graphrag/technical-overview.md | 161 +++++++++++++++++ .../graphrag/tutorial-notebook.md | 2 +- .../3.12/data-science/graphrag/use-cases.md | 87 +++++++++ .../data-science/graphrag/web-interface.md | 2 +- 5 files changed, 284 insertions(+), 137 deletions(-) create mode 100644 site/content/3.12/data-science/graphrag/technical-overview.md create mode 100644 site/content/3.12/data-science/graphrag/use-cases.md diff --git a/site/content/3.12/data-science/graphrag/_index.md b/site/content/3.12/data-science/graphrag/_index.md index a58b76c7e3..10038bb9d4 100644 --- a/site/content/3.12/data-science/graphrag/_index.md +++ b/site/content/3.12/data-science/graphrag/_index.md @@ -1,7 +1,7 @@ --- title: GraphRAG menuTitle: GraphRAG -weight: 10 +weight: 5 description: >- ArangoDB's GraphRAG solution combines graph-based retrieval-augmented generation with Large Language Models (LLMs) for turbocharged GenAI solutions @@ -16,148 +16,47 @@ exclusive early access, [get in touch](https://arangodb.com/contact/) with the ArangoDB team. {{< /tip >}} -## Introduction +## Transform unstructured documents into intelligent knowledge graphs -Large language models (LLMs) and knowledge graphs are two prominent and -contrasting concepts, each possessing unique characteristics and functionalities -that significantly impact the methods we employ to extract valuable insights from -constantly expanding and complex datasets. +ArangoDB's GraphRAG solution enables organizations to extract meaningful insights +from their document collections by creating knowledge graphs that capture not just +individual facts, but the intricate relationships between concepts across documents. 
+This approach goes beyond traditional RAG systems by understanding document +interconnections and providing both granular detail-level responses and high-level +conceptual understanding. -LLMs, such as those powering OpenAI's ChatGPT, represent a class of powerful language -transformers. These models leverage advanced neural networks to exhibit a -remarkable proficiency in understanding, generating, and participating in -contextually-aware conversations. +- **Intelligent document understanding**: Automatically extracts and connects knowledge across multiple document sources +- **Contextual intelligence**: Maintains relationships between concepts, enabling more accurate and comprehensive responses +- **Multi-level insights**: Provides both detailed technical answers and strategic high-level understanding +- **Seamless knowledge access**: Natural language interface for querying complex document relationships -On the other hand, knowledge graphs contain carefully structured data and are -designed to capture intricate relationships among discrete and seemingly -unrelated information. +## Key benefits for enterprise applications -ArangoDB's unique capabilities and flexible integration of knowledge graphs and -LLMs provide a powerful and efficient solution for anyone seeking to extract -valuable insights from diverse datasets. +- **Cross-document relationship intelligence**: +Unlike traditional RAG systems that treat documents in isolation, ArangoDB's GraphRAG +pipeline detects and leverages references between documents and chunks. This enables +more accurate responses by understanding how concepts relate across your entire knowledge base. -The GraphRAG component of the GenAI Suite brings all the capabilities -together with an easy-to-use interface, so you can make the knowledge accessible -to your organization. 
+- **Multi-level understanding architecture**: +The system provides both detailed technical responses and high-level strategic insights +from the same knowledge base, adapting response depth based on query complexity and user intent. -GraphRAG is particularly valuable for use cases like the following: -- Applications requiring in-depth knowledge retrieval -- Contextual question answering -- Reasoning over interconnected information +- **Reference-aware knowledge graph**: +GraphRAG automatically detects and maps relationships between document chunks while +maintaining context of how information connects across different sources. -## How GraphRAG works +- **Dynamic knowledge evolution**: +The system learns and improves understanding as more documents are added, with +relationships and connections becoming more sophisticated over time. -ArangoDB's GraphRAG solution democratizes the creation and usage of knowledge -graphs with a unique combination of vector search, graphs, and LLMs (privately or publicly hosted) -in a single product. -The overall workflow involves the following steps: -1. **Chunking**: - - Breaking down raw documents into text chunks -2. **Entity and relation extraction for Knowledge Graph construction**: - - LLM-assisted description of entities and relations - - Entities get inserted as nodes with embeddings - - Relations get inserted as edges, these include: entity-entity, entity-chunk, chunk-document -3. **Topology-based clustering into mini-topics (called communities)**: - - Each entity points to its community - - Each community points to its higher-level community, if available - (mini-topics point to major topics) -4. **LLM-assisted community summarization**: - - Community summarization is based on all information available about each topic +## What's next -### Turn text files into a Knowledge Graph +- **[GraphRAG Enterprise Use Cases](use-cases.md)**: Understand the business value through real-world scenarios. 
+- **[GraphRAG Technical Overview](technical-overview.md)**: Dive into the architecture, services, and implementation details. +- **[GraphRAG Web Interface](web-interface.md)**: Try GraphRAG using the interactive web interface. +- **[GraphRAG Tutorial using integrated Notebook servers](tutorial-notebook.md)**: Follow hands-on examples and implementation guidance via Jupyter Notebooks. -The Importer service is the entry point of the GraphRAG pipeline. It takes a -raw text file as input, processes it using an LLM to extract entities and -relationships, and generates a Knowledge Graph. The Knowledge Graph is then -stored in an ArangoDB database for further use. The Knowledge Graph represents -information in a structured graph format, allowing efficient querying and retrieval. - -1. Pre-process the raw text file to identify entities and their relationships. -2. Use LLMs to infer connections and context, enriching the Knowledge Graph. -3. Store the generated Knowledge Graph in the database for retrieval and reasoning. - -For detailed information about the service, see the -[Importer](services/importer.md) service documentation. - -### Extract information from the Knowledge Graph - -The Retriever service enables intelligent search and retrieval of information -from your previously created Knowledge Graph. -You can extract information from Knowledge Graphs using two distinct methods: -- Global retrieval -- Local retrieval - -For detailed information about the service, see the -[Retriever](services/retriever.md) service documentation. - -#### Global retrieval - -Global retrieval focuses on: -- Extracting information from the entire Knowledge Graph, regardless of specific - contexts or constraints. -- Provides a comprehensive overview and answers queries that span across multiple - entities and relationships in the graph. - -**Use cases:** -- Answering broad questions that require a holistic understanding of the Knowledge Graph. 
-- Aggregating information from diverse parts of the Knowledge Graph for high-level insights. - -**Example query:** - -Global retrieval can answer questions like _**What are the main themes or topics covered in the document**_? - -During import, the entire Knowledge Graph is analyzed to identify and summarize -the dominant entities, their relationships, and associated themes. Global -retrieval uses these community summaries to answer questions from different -perspectives, then the information gets aggregated into the final response. - -#### Local retrieval - -Local retrieval is a more focused approach for: -- Queries that are constrained to specific subgraphs or contextual clusters - within the Knowledge Graph. -- Targeted and precise information extraction, often using localized sections - of the Knowledge Graph. - -**Use cases:** -- Answering detailed questions about a specific entity or a related group of entities. -- Retrieving information relevant to a particular topic or section in the Knowledge Graph. - -**Example query:** - -Local retrieval can answer questions like _**What is the relationship between entity X and entity Y**_? - -Local queries use hybrid search (semantic and lexical) over the Entities -collection, and then it expands that subgraph over related entities, relations -(and its LLM-generated verbal descriptions), text chunks, and communities. - -### Private LLMs - -If you're working in an air-gapped environment or need to keep your data -private, you can use the private LLM mode with -[Triton Inference Server](services/triton-inference-server.md). - -This option allows you to run the service completely within your own -infrastructure. The Triton Inference Server is a crucial component when -running in private LLM mode. It serves as the backbone for running your -language (LLM) and embedding models on your own machines, ensuring your -data never leaves your infrastructure. 
The server handles all the complex -model operations, from processing text to generating embeddings, and provides -both HTTP and gRPC interfaces for communication. - -### Public LLMs - -Alternatively, if you prefer a simpler setup and don't have specific privacy -requirements, you can use the public LLM mode. This option connects to cloud-based -services like OpenAI's models via the OpenAI API or a large array of models -(Gemini, Anthropic, publicly hosted open-source models, etc.) via the OpenRouter option. - -## Limitations - -The pre-release version of ArangoDB GraphRAG has the following limitations: - -- You can only import a single file. -- The knowledge graph generated from the file is imported into a named graph - with a fixed name of `KnowledgeGraph` and set of collections which also have - fixed names. +For deeper implementation details, explore the individual services: +- **[Importer Service](services/importer.md)**: Transform documents into knowledge graphs. +- **[Retriever Service](services/retriever.md)**: Query and extract insights from your knowledge graphs. \ No newline at end of file diff --git a/site/content/3.12/data-science/graphrag/technical-overview.md b/site/content/3.12/data-science/graphrag/technical-overview.md new file mode 100644 index 0000000000..4b987a3003 --- /dev/null +++ b/site/content/3.12/data-science/graphrag/technical-overview.md @@ -0,0 +1,161 @@ +--- +title: GraphRAG Technical Overview +menuTitle: Technical Overview +weight: 15 +description: >- + Technical overview of ArangoDB's GraphRAG solution, including + architecture, services, and deployment options +--- +{{< tag "ArangoDB Platform" >}} + +{{< tip >}} +The ArangoDB Platform & GenAI Suite is available as a pre-release. To get +exclusive early access, [get in touch](https://arangodb.com/contact/) with +the ArangoDB team. 
+{{< /tip >}}
+
+## Introduction
+
+Large language models (LLMs) and knowledge graphs are two prominent and
+contrasting concepts, each possessing unique characteristics and functionalities
+that significantly impact the methods we employ to extract valuable insights from
+constantly expanding and complex datasets.
+
+LLMs, such as those powering OpenAI's ChatGPT, represent a class of powerful
+transformer-based language models. These models leverage advanced neural networks
+to exhibit a remarkable proficiency in understanding, generating, and participating
+in contextually aware conversations.
+
+On the other hand, knowledge graphs contain carefully structured data and are
+designed to capture intricate relationships among discrete and seemingly
+unrelated pieces of information.
+
+ArangoDB's unique capabilities and flexible integration of knowledge graphs and
+LLMs provide a powerful and efficient solution for anyone seeking to extract
+valuable insights from diverse datasets.
+
+The GraphRAG component of the GenAI Suite brings all of these capabilities
+together with an easy-to-use interface, so you can make the knowledge accessible
+to your organization.
+
+GraphRAG is particularly valuable for use cases like the following:
+- Applications requiring in-depth knowledge retrieval
+- Contextual question answering
+- Reasoning over interconnected information
+
+## How GraphRAG works
+
+ArangoDB's GraphRAG solution democratizes the creation and usage of knowledge
+graphs with a unique combination of vector search, graphs, and LLMs (privately
+or publicly hosted) in a single product.
+
+The overall workflow involves the following steps:
+1. **Chunking**:
+   - Breaking down raw documents into text chunks
+2. **Entity and relation extraction for Knowledge Graph construction**:
+   - LLM-assisted description of entities and relations
+   - Entities get inserted as nodes with embeddings
+   - Relations get inserted as edges; these include entity-entity, entity-chunk, and chunk-document edges
+3. 
**Topology-based clustering into mini-topics (called communities)**:
+   - Each entity points to its community
+   - Each community points to its higher-level community, if available
+     (mini-topics point to major topics)
+4. **LLM-assisted community summarization**:
+   - Community summarization is based on all information available about each topic
+
+### Turn text files into a Knowledge Graph
+
+The Importer service is the entry point of the GraphRAG pipeline. It takes a
+raw text file as input, processes it using an LLM to extract entities and
+relationships, and generates a Knowledge Graph. The Knowledge Graph is then
+stored in an ArangoDB database for further use. The Knowledge Graph represents
+information in a structured graph format, allowing efficient querying and retrieval.
+
+The import process involves the following steps:
+1. Pre-process the raw text file to identify entities and their relationships.
+2. Use LLMs to infer connections and context, enriching the Knowledge Graph.
+3. Store the generated Knowledge Graph in the database for retrieval and reasoning.
+
+For detailed information about the service, see the
+[Importer](services/importer.md) service documentation.
+
+### Extract information from the Knowledge Graph
+
+The Retriever service enables intelligent search and retrieval of information
+from your previously created Knowledge Graph.
+You can extract information from Knowledge Graphs using two distinct methods:
+- Global retrieval
+- Local retrieval
+
+For detailed information about the service, see the
+[Retriever](services/retriever.md) service documentation.
+
+#### Global retrieval
+
+Global retrieval focuses on:
+- Extracting information from the entire Knowledge Graph, regardless of specific
+  contexts or constraints.
+- Providing a comprehensive overview and answering queries that span multiple
+  entities and relationships in the graph.
+
+**Use cases:**
+- Answering broad questions that require a holistic understanding of the Knowledge Graph.
+- Aggregating information from diverse parts of the Knowledge Graph for high-level insights.
+
+**Example query:**
+
+Global retrieval can answer questions like _**What are the main themes or topics covered in the document?**_
+
+During import, the entire Knowledge Graph is analyzed to identify and summarize
+the dominant entities, their relationships, and associated themes. Global
+retrieval uses these community summaries to answer questions from different
+perspectives; the information is then aggregated into the final response.
+
+#### Local retrieval
+
+Local retrieval is a more focused approach for:
+- Queries that are constrained to specific subgraphs or contextual clusters
+  within the Knowledge Graph.
+- Targeted and precise information extraction, often using localized sections
+  of the Knowledge Graph.
+
+**Use cases:**
+- Answering detailed questions about a specific entity or a related group of entities.
+- Retrieving information relevant to a particular topic or section in the Knowledge Graph.
+
+**Example query:**
+
+Local retrieval can answer questions like _**What is the relationship between entity X and entity Y?**_
+
+Local queries run a hybrid search (semantic and lexical) over the Entities
+collection and then expand the resulting subgraph over related entities, relations
+(and their LLM-generated descriptions), text chunks, and communities.
+
+### Private LLMs
+
+If you're working in an air-gapped environment or need to keep your data
+private, you can use the private LLM mode with
+[Triton Inference Server](services/triton-inference-server.md).
+
+This option allows you to run the service completely within your own
+infrastructure. The Triton Inference Server is a crucial component when
+running in private LLM mode. It serves as the backbone for running your
+language (LLM) and embedding models on your own machines, ensuring your
+data never leaves your infrastructure. 
The server handles all the complex
+model operations, from processing text to generating embeddings, and provides
+both HTTP and gRPC interfaces for communication.
+
+### Public LLMs
+
+Alternatively, if you prefer a simpler setup and don't have specific privacy
+requirements, you can use the public LLM mode. This option connects to cloud-based
+services like OpenAI's models via the OpenAI API, or to a wide range of models
+(Gemini, Anthropic, publicly hosted open-source models, etc.) via the OpenRouter option.
+
+## Limitations
+
+The pre-release version of ArangoDB GraphRAG has the following limitations:
+
+- You can only import a single file.
+- The knowledge graph generated from the file is imported into a named graph
+  with a fixed name of `KnowledgeGraph` and a set of collections that also have
+  fixed names.
diff --git a/site/content/3.12/data-science/graphrag/tutorial-notebook.md b/site/content/3.12/data-science/graphrag/tutorial-notebook.md
index ee3b6183e0..ac18975b73 100644
--- a/site/content/3.12/data-science/graphrag/tutorial-notebook.md
+++ b/site/content/3.12/data-science/graphrag/tutorial-notebook.md
@@ -3,7 +3,7 @@ title: GraphRAG Notebook Tutorial
 menuTitle: Notebook Tutorial
 description: >-
   Building a GraphRAG pipeline using ArangoDB's integrated notebook servers
-weight: 10
+weight: 25
 ---
 {{< tag "ArangoDB Platform" >}}
diff --git a/site/content/3.12/data-science/graphrag/use-cases.md b/site/content/3.12/data-science/graphrag/use-cases.md
new file mode 100644
index 0000000000..9b08fe169d
--- /dev/null
+++ b/site/content/3.12/data-science/graphrag/use-cases.md
@@ -0,0 +1,87 @@
+---
+title: GraphRAG Use Cases
+menuTitle: Use Cases
+weight: 10
+description: >-
+  Real-world enterprise use cases for ArangoDB's GraphRAG solution and
+  comparison with traditional RAG approaches, including business benefits
+  and practical applications
+---
+
+## GraphRAG Enterprise Use Cases
+
+Whether you are evaluating GraphRAG for your organization or looking to
understand +its business applications, these real-world scenarios demonstrate how GraphRAG can transform the way you extract insights from your data. + +### Enterprise knowledge management + +**Scenario**: A consulting firm has accumulated thousands of PDF reports, research papers, +and client documents over years. Team members struggle to find relevant information +quickly and often miss connections between different projects. + +**GraphRAG solution**: The pipeline processes all documents, creating a knowledge graph +that understands how concepts relate across different projects and time periods. Team +members can now ask questions like "What approaches have we used for supply chain +optimization across different industries?" and get comprehensive answers that reference +multiple documents and projects. + +**Business value**: +- Reduces research time by 70% +- Improves proposal quality by leveraging past insights +- Enables knowledge sharing across teams + +### Research and development + +**Scenario**: A pharmaceutical company's R&D team needs to analyze research papers, +clinical trial data, and regulatory documents to identify potential drug interactions +and development pathways. + +**GraphRAG solution**: The system processes all research documents, clinical data, and +regulatory information, creating connections between different studies and findings. +Researchers can query complex relationships like "What are the common side effects +mentioned across all Phase II trials for similar compounds?" + +**Business value**: +- Accelerates research insights +- Reduces risk of missing critical connections +- Improves decision-making speed + +### Legal document analysis + +**Scenario**: A law firm needs to analyze case law, legal precedents, and client +documents to build comprehensive legal strategies. 
+**GraphRAG solution**: The pipeline processes legal documents, creating a knowledge
+graph that understands legal precedents, case relationships, and argument patterns.
+Lawyers can ask complex questions like "How have similar contract disputes been
+resolved in different jurisdictions?"
+
+**Business value**:
+- Improves case preparation quality
+- Reduces research time
+- Enables more comprehensive legal strategies
+
+## GraphRAG versus Traditional RAG (VectorRAG)
+
+Traditional RAG systems find text chunks that are semantically similar to your query.
+However, they don't understand the inherent relationships between these chunks.
+
+For example, when asked, "What is the fix for Issue A?", a VectorRAG system might
+retrieve two separate, unstructured chunks: one describing "Issue A" and another
+mentioning a "Fix 1" for a related system. Because the connection isn't explicit,
+the LLM cannot confidently link them and will often default to a safe, unhelpful answer:
+
+**VectorRAG Response**: _"The context does not provide a specific fix for Issue A."_
+
+GraphRAG overcomes this limitation by retrieving a subgraph of interconnected data.
+Instead of just text, it provides the LLM with a clear map of how information is related.
+
+For the same question, GraphRAG fetches structured triplets (node-relationship-node),
+such as `(Issue A) -> [HAS_FIX] -> (Fix 1)`. This context is unambiguous: it explicitly
+states the relationship between the problem and the solution, allowing the LLM to
+provide a direct and correct answer:
+
+**GraphRAG Response**: _"The fix for Issue A is Fix 1."_
+
+The key difference is that VectorRAG gives the LLM a pile of ingredients, while GraphRAG
+provides the actual recipe.
\ No newline at end of file diff --git a/site/content/3.12/data-science/graphrag/web-interface.md b/site/content/3.12/data-science/graphrag/web-interface.md index 195659839a..fd8897bd50 100644 --- a/site/content/3.12/data-science/graphrag/web-interface.md +++ b/site/content/3.12/data-science/graphrag/web-interface.md @@ -1,7 +1,7 @@ --- title: How to use GraphRAG in the ArangoDB Platform web interface menuTitle: Web Interface -weight: 5 +weight: 20 description: >- Learn how to create, configure, and run a full GraphRAG workflow in four steps using the Platform web interface