UpTrain Callback Handler¶
UpTrain (GitHub || Website || Docs) is an open-source platform for evaluating and improving generative AI applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root-cause analysis on failure cases, and gives guidance on how to resolve them.
This notebook demonstrates how to use the UpTrain callback handler to evaluate the different components of a RAG pipeline.
1. RAG Query Engine Evaluation:¶
The RAG query engine plays a crucial role in retrieving context and generating responses. To ensure its performance and response quality, we conduct the following evaluations:
- Context Relevance: Determines if the retrieved context is relevant to the query.
- Factual Accuracy: Assesses if the LLM is hallucinating or providing incorrect information.
- Response Completeness: Checks if the response contains all the information requested by the query.
2. Sub-Question Query Generation Evaluation:¶
The SubQuestionQueryGeneration operator decomposes a question into sub-questions and generates a response for each using a RAG query engine. We measure its accuracy with:
- Sub Query Completeness: Assures that the sub-questions accurately and comprehensively cover the original query.
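As a rough intuition for what "covering the original query" means, here is a toy, hypothetical coverage heuristic. It is not UpTrain's actual check (which is LLM-based); it only illustrates the idea by measuring the fraction of the original query's content words that appear in at least one sub-question.

```python
def sub_query_coverage(query: str, sub_queries: list) -> float:
    """Toy heuristic: fraction of the original query's content words
    that appear in at least one sub-question. UpTrain's real check is
    LLM-based; this only illustrates the notion of 'coverage'."""
    stop = {"what", "did", "do", "how", "was", "the", "a", "and", "in", "of"}
    words = {w for w in query.lower().replace("?", "").split() if w not in stop}
    covered = {w for w in words if any(w in sq.lower() for sq in sub_queries)}
    return len(covered) / len(words) if words else 1.0
```

A value of 1.0 means every content word of the original query shows up in some sub-question; the real metric instead asks an LLM whether the sub-questions jointly cover the query's intent.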
3. Re-Ranking Evaluation:¶
Re-ranking involves reordering nodes based on their relevance to the query and selecting the top nodes. Different evaluations are performed depending on the number of nodes returned after re-ranking.
a. Same number of nodes
- Context Reranking: Checks if the order of the re-ranked nodes is more relevant to the query than the original order.
b. Different number of nodes:
- Context Conciseness: Examines whether the reduced set of nodes still provides all the required information.
These evaluations collectively ensure the robustness and effectiveness of the RAG query engine, the SubQuestionQueryGeneration operator, and the re-ranking process in the LlamaIndex pipeline.
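The branching in item 3 above can be sketched as a tiny dispatch. This is a hypothetical helper (not part of UpTrain's API), assuming we know the node counts before and after re-ranking:

```python
def reranking_checks(num_before: int, num_after: int) -> list:
    """Pick which re-ranking evaluation applies (illustrative only)."""
    if num_after == num_before:
        # Same count: only the order changed, so compare orderings.
        return ["context_reranking"]
    # Reduced count: also verify the smaller set keeps required info.
    return ["context_conciseness"]
```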
Note:¶
- We have performed the evaluations using a basic RAG query engine; the same evaluations can be performed using an advanced RAG query engine as well.
- The same is true for the re-ranking evaluations; here we use SentenceTransformerRerank, but the same evaluations can be performed with other re-rankers as well.
Install Dependencies and Import Libraries¶
Install the dependencies needed for this notebook.
%pip install llama-index-readers-web
%pip install llama-index-callbacks-uptrain
%pip install -q html2text llama-index pandas tqdm uptrain torch sentence-transformers
Import the libraries.
from getpass import getpass
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core.callbacks import CallbackManager
from llama_index.callbacks.uptrain.base import UpTrainCallbackHandler
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.postprocessor import SentenceTransformerRerank
import os
Setup¶
UpTrain provides you with:
- Dashboards with advanced drill-down and filtering options
- Insights and common topics among failing cases
- Observability and real-time monitoring of production data
- Regression testing via seamless integration with your CI/CD pipelines
You can choose between the following options for evaluating with UpTrain:
1. UpTrain's Open-Source Software (OSS):¶
You can use the open-source evaluation service to evaluate your model. In this case, you will need to provide an OpenAI API key, which you can get here.
To view your evaluations in the UpTrain dashboard, set it up by running the following commands in your terminal:
git clone https://github.com/uptrain-ai/uptrain
cd uptrain
bash run_uptrain.sh
This will start the UpTrain dashboard locally; you can access it at http://localhost:3000/dashboard.
Parameters:
- key_type="openai"
- api_key="OPENAI_API_KEY"
- project_name="PROJECT_NAME"
2. UpTrain Managed Service and Dashboards:¶
Alternatively, you can use UpTrain's managed service to evaluate your model. You can create a free UpTrain account here and get free trial credits. If you want more trial credits, book a call with the maintainers of UpTrain here.
The benefits of using the managed service are:
- No need to set up the UpTrain dashboard on your local machine
- Access to many LLMs without needing their API keys
Once the evaluations have run, you can view them in the UpTrain dashboard at https://dashboard.uptrain.ai/dashboard.
Parameters:
- key_type="uptrain"
- api_key="UPTRAIN_API_KEY"
- project_name="PROJECT_NAME"
Note: The project_name parameter is the project name under which the evaluation results appear in the UpTrain dashboard.
Create the UpTrain Callback Handler¶
os.environ["OPENAI_API_KEY"] = getpass()
callback_handler = UpTrainCallbackHandler(
key_type="openai",
api_key=os.environ["OPENAI_API_KEY"],
project_name="uptrain_llamaindex",
)
Settings.callback_manager = CallbackManager([callback_handler])
Load and Parse Documents¶
Load documents from Paul Graham's essay "What I Worked On".
documents = SimpleWebPageReader().load_data(
[
"https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
]
)
Parse the documents into nodes.
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)
1. RAG Query Engine Evaluation¶
index = VectorStoreIndex.from_documents(
documents,
)
query_engine = index.as_query_engine()
max_characters_per_line = 80
queries = [
"What did Paul Graham do growing up?",
"When and how did Paul Graham's mother die?",
"What, in Paul Graham's opinion, is the most distinctive thing about YC?",
"When and how did Paul Graham meet Jessica Livingston?",
"What is Bel, and when and where was it written?",
]
for query in queries:
response = query_engine.query(query)
100%|██████████| 1/1 [00:01<00:00, 1.33s/it] 100%|██████████| 1/1 [00:01<00:00, 1.36s/it] 100%|██████████| 1/1 [00:03<00:00, 3.50s/it] 100%|██████████| 1/1 [00:01<00:00, 1.32s/it]
Question: What did Paul Graham do growing up? Response: Growing up, Paul Graham worked on writing short stories and programming. He started programming on an IBM 1401 in 9th grade using an early version of Fortran. Later, he got a TRS-80 computer and wrote simple games, a rocket prediction program, and a word processor. Despite his interest in programming, he initially planned to study philosophy in college before eventually switching to AI. Context Relevance Score: 0.0 Factual Accuracy Score: 1.0 Response Completeness Score: 1.0
100%|██████████| 1/1 [00:01<00:00, 1.59s/it] 100%|██████████| 1/1 [00:00<00:00, 1.01it/s] 100%|██████████| 1/1 [00:01<00:00, 1.76s/it] 100%|██████████| 1/1 [00:01<00:00, 1.28s/it]
Question: When and how did Paul Graham's mother die? Response: Paul Graham's mother died when he was 18 years old, from a brain tumor. Context Relevance Score: 0.0 Factual Accuracy Score: 0.0 Response Completeness Score: 0.5
100%|██████████| 1/1 [00:01<00:00, 1.75s/it] 100%|██████████| 1/1 [00:01<00:00, 1.55s/it] 100%|██████████| 1/1 [00:03<00:00, 3.39s/it] 100%|██████████| 1/1 [00:01<00:00, 1.48s/it]
Question: What, in Paul Graham's opinion, is the most distinctive thing about YC? Response: The most distinctive thing about Y Combinator, according to Paul Graham, is that instead of deciding for himself what to work on, the problems come to him. Every 6 months, a new batch of startups brings their problems, which then become the focus of YC. This engagement with a variety of startup problems and the direct involvement in solving them is what Graham finds most unique about Y Combinator. Context Relevance Score: 1.0 Factual Accuracy Score: 0.3333333333333333 Response Completeness Score: 1.0
100%|██████████| 1/1 [00:01<00:00, 1.92s/it] 100%|██████████| 1/1 [00:00<00:00, 1.20it/s] 100%|██████████| 1/1 [00:02<00:00, 2.15s/it] 100%|██████████| 1/1 [00:01<00:00, 1.08s/it]
Question: When and how did Paul Graham meet Jessica Livingston? Response: Paul Graham met Jessica Livingston at a big party at his house in October 2003. Context Relevance Score: 1.0 Factual Accuracy Score: 0.5 Response Completeness Score: 1.0
100%|██████████| 1/1 [00:01<00:00, 1.82s/it] 100%|██████████| 1/1 [00:01<00:00, 1.14s/it] 100%|██████████| 1/1 [00:03<00:00, 3.19s/it] 100%|██████████| 1/1 [00:01<00:00, 1.50s/it]
Question: What is Bel, and when and where was it written? Response: Bel is a new Lisp that was written in Arc. It was developed over a period of 4 years, from March 26, 2015 to October 12, 2019. The majority of Bel was written in England. Context Relevance Score: 1.0 Factual Accuracy Score: 1.0 Response Completeness Score: 1.0
2. Sub-Question Query Generation Evaluation¶
# build index and query engine
vector_query_engine = VectorStoreIndex.from_documents(
documents=documents,
use_async=True,
).as_query_engine()
query_engine_tools = [
QueryEngineTool(
query_engine=vector_query_engine,
metadata=ToolMetadata(
name="documents",
description="Paul Graham essay on What I Worked On",
),
),
]
query_engine = SubQuestionQueryEngine.from_defaults(
query_engine_tools=query_engine_tools,
use_async=True,
)
response = query_engine.query(
"How was Paul Grahams life different before, during, and after YC?"
)
Generated 3 sub questions. [documents] Q: What did Paul Graham work on before YC? [documents] Q: What did Paul Graham work on during YC? [documents] Q: What did Paul Graham work on after YC? [documents] A: After Y Combinator, Paul Graham decided to focus on painting as his next endeavor. [documents] A: Paul Graham worked on writing essays and working on Y Combinator during YC. [documents] A: Before Y Combinator, Paul Graham worked on projects with his colleagues Robert and Trevor.
100%|██████████| 3/3 [00:02<00:00, 1.47it/s] 100%|██████████| 3/3 [00:00<00:00, 3.28it/s] 100%|██████████| 3/3 [00:01<00:00, 1.68it/s] 100%|██████████| 3/3 [00:01<00:00, 2.28it/s]
Question: What did Paul Graham work on after YC? Response: After Y Combinator, Paul Graham decided to focus on painting as his next endeavor. Context Relevance Score: 0.0 Factual Accuracy Score: 0.0 Response Completeness Score: 0.5 Question: What did Paul Graham work on during YC? Response: Paul Graham worked on writing essays and working on Y Combinator during YC. Context Relevance Score: 0.0 Factual Accuracy Score: 1.0 Response Completeness Score: 0.5 Question: What did Paul Graham work on before YC? Response: Before Y Combinator, Paul Graham worked on projects with his colleagues Robert and Trevor. Context Relevance Score: 0.0 Factual Accuracy Score: 0.0 Response Completeness Score: 0.5
100%|██████████| 1/1 [00:01<00:00, 1.24s/it]
Question: How was Paul Grahams life different before, during, and after YC? Sub Query Completeness Score: 1.0
3. Re-Ranking¶
Re-ranking is the process of reordering nodes based on their relevance to the query. LlamaIndex offers several re-ranking algorithms; in this example we use SentenceTransformerRerank.
The re-ranker lets you set top_n, the number of nodes to return after re-ranking. If this value equals the original number of nodes, the re-ranker only reorders the nodes without changing the count; otherwise, it reorders the nodes and returns the top n.
We will perform different evaluations depending on the number of nodes returned after re-ranking.
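To make the top_n behaviour concrete, here is a minimal sketch of what a score-based re-ranker does. This is a toy stand-in, not SentenceTransformerRerank itself; the real re-ranker computes the scores with a cross-encoder model rather than taking them as input.

```python
def rerank(nodes, scores, top_n):
    """Toy re-ranker: sort nodes by descending relevance score and keep
    the top n. With top_n == len(nodes) this only reorders; with a
    smaller top_n it also truncates, which is why a different
    evaluation (Context Conciseness) applies in that case."""
    ranked = sorted(zip(scores, nodes), key=lambda p: p[0], reverse=True)
    return [node for _, node in ranked[:top_n]]
```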
a. Re-ranking with the same number of nodes¶
callback_handler = UpTrainCallbackHandler(
key_type="openai",
api_key=os.environ["OPENAI_API_KEY"],
project_name="uptrain_llamaindex",
)
Settings.callback_manager = CallbackManager([callback_handler])
rerank_postprocessor = SentenceTransformerRerank(
top_n=3, # number of nodes after reranking
keep_retrieval_score=True,
)
index = VectorStoreIndex.from_documents(
documents=documents,
)
query_engine = index.as_query_engine(
similarity_top_k=3, # number of nodes before reranking
node_postprocessors=[rerank_postprocessor],
)
response = query_engine.query(
"What did Sam Altman do in this essay?",
)
100%|██████████| 1/1 [00:01<00:00, 1.89s/it]
Question: What did Sam Altman do in this essay? Context Reranking Score: 1.0
100%|██████████| 1/1 [00:01<00:00, 1.88s/it] 100%|██████████| 1/1 [00:01<00:00, 1.44s/it] 100%|██████████| 1/1 [00:02<00:00, 2.77s/it] 100%|██████████| 1/1 [00:01<00:00, 1.45s/it]
Question: What did Sam Altman do in this essay? Response: Sam Altman was asked to become the president of Y Combinator after the original founders decided to step down and reorganize the company for long-term sustainability. Context Relevance Score: 1.0 Factual Accuracy Score: 1.0 Response Completeness Score: 0.5
b. Re-ranking with a different number of nodes¶
callback_handler = UpTrainCallbackHandler(
key_type="openai",
api_key=os.environ["OPENAI_API_KEY"],
project_name="uptrain_llamaindex",
)
Settings.callback_manager = CallbackManager([callback_handler])
rerank_postprocessor = SentenceTransformerRerank(
top_n=2, # Number of nodes after re-ranking
keep_retrieval_score=True,
)
index = VectorStoreIndex.from_documents(
documents=documents,
)
query_engine = index.as_query_engine(
similarity_top_k=5, # Number of nodes before re-ranking
node_postprocessors=[rerank_postprocessor],
)
# Use your advanced RAG
response = query_engine.query(
"What did Sam Altman do in this essay?",
)
100%|██████████| 1/1 [00:02<00:00, 2.22s/it]
Question: What did Sam Altman do in this essay? Context Conciseness Score: 0.0
100%|██████████| 1/1 [00:01<00:00, 1.58s/it] 100%|██████████| 1/1 [00:00<00:00, 1.19it/s] 100%|██████████| 1/1 [00:01<00:00, 1.62s/it] 100%|██████████| 1/1 [00:01<00:00, 1.42s/it]
Question: What did Sam Altman do in this essay? Response: Sam Altman offered unsolicited advice to the author during a visit to California for interviews. Context Relevance Score: 0.0 Factual Accuracy Score: 1.0 Response Completeness Score: 0.5