基于 LlamaIndex 的多租户 RAG 系统¶
本笔记本将指导您使用 LlamaIndex 构建多租户 RAG(检索增强生成)系统。
- 环境配置
- 下载数据
- 加载数据
- 创建索引
- 构建数据摄取管道
- 更新元数据并插入文档
- 为每个用户定义查询引擎
- 执行查询
安装¶
请确保已安装 llama-index 和 pypdf 这两个依赖包。
!pip install llama-index pypdf
设置OpenAI密钥¶
import os
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"
from llama_index.core import VectorStoreIndex
from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from IPython.display import HTML
下载数据¶
我们将使用《面向并行函数调用的LLM编译器》和《密集X检索:应采用何种检索粒度?》两篇论文进行演示。
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.04511.pdf" -O "llm_compiler.pdf"
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.06648.pdf" -O "dense_x_retrieval.pdf"
--2024-01-15 14:29:26-- https://arxiv.org/pdf/2312.04511.pdf Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.67.42, 151.101.3.42, ... Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 755837 (738K) [application/pdf] Saving to: ‘llm_compiler.pdf’ llm_compiler.pdf 0%[ ] 0 --.-KB/s llm_compiler.pdf 100%[===================>] 738.12K --.-KB/s in 0.004s 2024-01-15 14:29:26 (163 MB/s) - ‘llm_compiler.pdf’ saved [755837/755837] --2024-01-15 14:29:26-- https://arxiv.org/pdf/2312.06648.pdf Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.67.42, 151.101.3.42, ... Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 1103758 (1.1M) [application/pdf] Saving to: ‘dense_x_retrieval.pdf’ dense_x_retrieval.p 100%[===================>] 1.05M --.-KB/s in 0.005s 2024-01-15 14:29:26 (208 MB/s) - ‘dense_x_retrieval.pdf’ saved [1103758/1103758]
加载数据¶
reader = SimpleDirectoryReader(input_files=["dense_x_retrieval.pdf"])
documents_jerry = reader.load_data()
reader = SimpleDirectoryReader(input_files=["llm_compiler.pdf"])
documents_ravi = reader.load_data()
创建空索引¶
index = VectorStoreIndex.from_documents(documents=[])
创建数据摄取管道¶
pipeline = IngestionPipeline(
transformations=[
SentenceSplitter(chunk_size=512, chunk_overlap=20),
]
)
更新元数据与插入文档¶
for document in documents_jerry:
document.metadata["user"] = "Jerry"
nodes = pipeline.run(documents=documents_jerry)
# Insert nodes into the index
index.insert_nodes(nodes)
for document in documents_ravi:
document.metadata["user"] = "Ravi"
nodes = pipeline.run(documents=documents_ravi)
# Insert nodes into the index
index.insert_nodes(nodes)
定义查询引擎¶
为具备必要筛选条件的用户定义查询引擎。
# For Jerry
jerry_query_engine = index.as_query_engine(
filters=MetadataFilters(
filters=[
ExactMatchFilter(
key="user",
value="Jerry",
)
]
),
similarity_top_k=3,
)
# For Ravi
ravi_query_engine = index.as_query_engine(
filters=MetadataFilters(
filters=[
ExactMatchFilter(
key="user",
value="Ravi",
)
]
),
similarity_top_k=3,
)
查询¶
# Jerry has Dense X Rerieval paper and should be able to answer following question.
response = jerry_query_engine.query(
"what are propositions mentioned in the paper?"
)
# Print response
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
{response.response}
'))The paper mentions propositions as an alternative retrieval unit choice. Propositions are defined as atomic expressions of meanings in text that correspond to distinct pieces of meaning in the text. They are minimal and cannot be further split into separate propositions. Each proposition is contextualized and self-contained, including all the necessary context from the text to interpret its meaning. The paper demonstrates the concept of propositions using an example about the Leaning Tower of Pisa, where the passage is split into three propositions, each corresponding to a distinct factoid about the tower.
# Ravi has LLMCompiler paper
response = ravi_query_engine.query("what are steps involved in LLMCompiler?")
# Print response
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
{response.response}
'))LLMCompiler consists of three key components: an LLM Planner, a Task Fetching Unit, and an Executor. The LLM Planner identifies the execution flow by defining different function calls and their dependencies based on user inputs. The Task Fetching Unit dispatches the function calls that can be executed in parallel after substituting variables with the actual outputs of preceding tasks. Finally, the Executor executes the dispatched function calling tasks using the associated tools. These components work together to optimize the parallel function calling performance of LLMs.
# This should not be answered as Jerry does not have information about LLMCompiler
response = jerry_query_engine.query("what are steps involved in LLMCompiler?")
# Print response
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
{response.response}
'))The steps involved in LLMCompiler are not mentioned in the given context information.