🚀 RAG/LLM Evaluators - DeepEval

This code tutorial shows how to integrate DeepEval with LlamaIndex. DeepEval makes it easy to unit test your RAG/LLM applications.

You can read more about the DeepEval framework here: https://docs.confident-ai.com/docs/getting-started

Feel free to check out our repository on GitHub: https://github.com/confident-ai/deepeval

Installation and Setup

We recommend installing and setting up with pip:

!pip install -q -q llama-index
!pip install -U -q deepeval

This step is optional and only needed if you want a server-hosted dashboard. (Psst, I think you should try it!)

!deepeval login

Types of Metrics

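DeepEval provides an opinionated framework for unit testing RAG applications. It breaks evaluation down into test cases and offers a range of metrics that you can freely apply to each test case, including: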

G-Eval
Summarization
Answer Relevancy
Faithfulness
Contextual Recall
Contextual Precision
Contextual Relevancy
RAGAS
Hallucination
Bias
Toxicity

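DeepEval incorporates the latest research into its evaluation metrics, which are then used to power LlamaIndex's evaluators. See the DeepEval documentation for the full list of metrics and how each one is calculated.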
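
To make the idea of a test case scored by a metric more concrete, here is a minimal, self-contained sketch. It does not use DeepEval at all: `ToyTestCase`, `toy_faithfulness`, `passes`, and the 0.5 threshold are hypothetical names and values invented for illustration, and the word-overlap score is only a crude stand-in for DeepEval's LLM-based metrics.

```python
from dataclasses import dataclass, field
import re


@dataclass
class ToyTestCase:
    """Hypothetical stand-in for a test case: an input, an output, and retrieved context."""
    input: str
    actual_output: str
    retrieval_context: list[str] = field(default_factory=list)


def _words(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; deliberately naive.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_faithfulness(case: ToyTestCase) -> float:
    """Fraction of distinct output words that also appear in the retrieved context."""
    output_words = _words(case.actual_output)
    if not output_words:
        return 0.0
    context_words = _words(" ".join(case.retrieval_context))
    return len(output_words & context_words) / len(output_words)


def passes(score: float, threshold: float = 0.5) -> bool:
    # A metric passes when its score meets the (assumed) threshold.
    return score >= threshold


case = ToyTestCase(
    input="What is LlamaIndex?",
    actual_output="LlamaIndex is a data framework",
    retrieval_context=["LlamaIndex is a data framework for LLM applications."],
)
score = toy_faithfulness(case)
print(score, passes(score))
```

The point of the sketch is the shape, not the scoring rule: each test case bundles the input, the output, and the retrieved context, and each metric turns that bundle into a score that is compared against a threshold.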

Step 1 - Setting Up Your LlamaIndex Application

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# See LlamaIndex's quickstart for more details. Store your data in "YOUR_DATA_DIRECTORY" beforehand.
documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
index = VectorStoreIndex.from_documents(documents)
rag_application = index.as_query_engine()
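
Because the reader needs a directory that already exists and contains files, a quick pre-flight check can save a confusing error later. The sketch below is one possible way to do that with the standard library only; `has_loadable_files` is a hypothetical helper, not part of LlamaIndex, and it is demonstrated on a temporary directory so it runs anywhere.

```python
from pathlib import Path
import tempfile


def has_loadable_files(data_dir: str) -> bool:
    """Return True if data_dir exists and contains at least one regular file."""
    path = Path(data_dir)
    return path.is_dir() and any(p.is_file() for p in path.iterdir())


# Demonstrate with a temporary directory instead of real data.
with tempfile.TemporaryDirectory() as tmp:
    empty_ok = has_loadable_files(tmp)  # directory exists but is empty
    (Path(tmp) / "notes.txt").write_text("LlamaIndex is a data framework.")
    filled_ok = has_loadable_files(tmp)  # directory now holds one file

print(empty_ok, filled_ok)
```

In the tutorial you would call `has_loadable_files("YOUR_DATA_DIRECTORY")` before constructing the reader.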

Step 2 - Using DeepEval's RAG/LLM Evaluators

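DeepEval offers 6 evaluators out of the box. Some target RAG (retrieval-augmented generation) specifically, while others evaluate LLM outputs directly (and work for RAG as well). Let's try the faithfulness evaluator, which checks for hallucination in RAG: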

from deepeval.integrations.llama_index import DeepEvalFaithfulnessEvaluator

# An example input to your RAG application
user_input = "What is LlamaIndex?"

# LlamaIndex returns a response object that contains
# both the output string and retrieved nodes
response_object = rag_application.query(user_input)

evaluator = DeepEvalFaithfulnessEvaluator()
evaluation_result = evaluator.evaluate_response(
    query=user_input, response=response_object
)
print(evaluation_result)
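
Printing the whole result object can be noisy. The sketch below formats just a few fields; it assumes the result exposes `passing`, `score`, and `feedback` attributes, as LlamaIndex's `EvaluationResult` does. `summarize_result` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in with made-up values so the example runs without an API key.

```python
from types import SimpleNamespace


def summarize_result(result) -> str:
    """Build a one-line summary from an evaluation result's common fields."""
    status = "PASS" if result.passing else "FAIL"
    score = "n/a" if result.score is None else f"{result.score:.2f}"
    return f"[{status}] score={score} feedback={result.feedback}"


# Stand-in object with made-up values; in the tutorial you would pass
# `evaluation_result` from the code above instead.
fake_result = SimpleNamespace(
    passing=True, score=0.9, feedback="No contradictions found."
)
print(summarize_result(fake_result))
```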

Full List of Evaluators

Here is how you can import all 6 evaluators from deepeval:

from deepeval.integrations.llama_index import (
    DeepEvalAnswerRelevancyEvaluator,
    DeepEvalFaithfulnessEvaluator,
    DeepEvalContextualRelevancyEvaluator,
    DeepEvalSummarizationEvaluator,
    DeepEvalBiasEvaluator,
    DeepEvalToxicityEvaluator,
)
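
If you want to apply several of these evaluators to the same response, a small loop is usually enough. The sketch below assumes each evaluator exposes `evaluate_response(query=..., response=...)`, the call shape used in Step 2; `run_evaluators` is a hypothetical helper and `FakeEvaluator` is a stand-in class so the example runs without DeepEval installed.

```python
def run_evaluators(evaluators, query, response):
    """Call evaluate_response on each evaluator and key the results by class name."""
    results = {}
    for evaluator in evaluators:
        results[type(evaluator).__name__] = evaluator.evaluate_response(
            query=query, response=response
        )
    return results


class FakeEvaluator:
    """Hypothetical stand-in that mimics the evaluate_response call shape."""

    def evaluate_response(self, query, response):
        return {"query": query, "passing": bool(response)}


results = run_evaluators(
    [FakeEvaluator()], "What is LlamaIndex?", "A data framework."
)
print(results)
```

With the real evaluators, you would pass a list of evaluator instances in place of `[FakeEvaluator()]` and the `response_object` from Step 2 in place of the string. Note that results are keyed by class name, so two instances of the same evaluator class would overwrite each other.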

For the definitions of all evaluators and how they integrate with DeepEval's testing suite, refer to the DeepEval documentation.
