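🚀 RAG/LLM Evaluators - DeepEval
This code tutorial shows how to integrate DeepEval with LlamaIndex. DeepEval makes it easy to unit-test your RAG/LLM application.
You can read more about the DeepEval framework here: https://docs.confident-ai.com/docs/getting-started
Feel free to check out our repository on GitHub: https://github.com/confident-ai/deepeval
Installation and Setup
We recommend installing and setting up via pip!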
!pip install -q -q llama-index
!pip install -U -q deepeval
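To confirm that both packages landed in the active environment, a small check like the one below may help. It is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of DeepEval or LlamaIndex.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used in the pip commands above.
for name in ("llama-index", "deepeval"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If either line reports "not installed", the notebook kernel is probably using a different environment from the one pip installed into.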
This step is optional and only needed if you want a server-hosted dashboard! (Psst, I think you should try it!)
!deepeval login
Step 1 - Set Up Your LlamaIndex Application
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# See LlamaIndex's quickstart for more details; you will need to store your data in "YOUR_DATA_DIRECTORY" beforehand
documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
index = VectorStoreIndex.from_documents(documents)
rag_application = index.as_query_engine()
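Step 2 - Use DeepEval's RAG/LLM Evaluators
DeepEval offers 6 evaluators out of the box. Some target RAG (retrieval-augmented generation), while others target LLM outputs directly (although they also work for RAG). Let's try the faithfulness evaluator, which evaluates hallucination in RAG: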
from deepeval.integrations.llama_index import DeepEvalFaithfulnessEvaluator
# An example input to your RAG application
user_input = "What is LlamaIndex?"
# LlamaIndex returns a response object that contains
# both the output string and retrieved nodes
response_object = rag_application.query(user_input)
evaluator = DeepEvalFaithfulnessEvaluator()
evaluation_result = evaluator.evaluate_response(
    query=user_input, response=response_object
)
print(evaluation_result)
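Since the goal is unit-test-style checks, the single evaluation above can be extended to a batch of queries. Below is a sketch under stated assumptions: `StubResult` and `StubEvaluator` are hypothetical stand-ins so the snippet runs without API keys, and `pass_rate` is an illustrative helper, not a DeepEval function. It assumes only that each result exposes a `passing` attribute, as LlamaIndex's `EvaluationResult` does.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class StubResult:
    """Hypothetical stand-in mirroring the `passing` and `score` fields."""

    passing: Optional[bool]
    score: Optional[float]


class StubEvaluator:
    """Hypothetical evaluator: passes when the response mentions the query's last word."""

    def evaluate_response(self, query: str, response: Any) -> StubResult:
        keyword = query.rstrip("?").split()[-1].lower()
        hit = keyword in str(response).lower()
        return StubResult(passing=hit, score=1.0 if hit else 0.0)


def pass_rate(evaluator: Any, ask: Callable[[str], Any], queries: Iterable[str]) -> float:
    """Fraction of queries whose evaluation result is marked as passing."""
    results = [
        evaluator.evaluate_response(query=q, response=ask(q)) for q in queries
    ]
    if not results:
        return 0.0
    return sum(1 for r in results if r.passing) / len(results)


# With the real objects, `ask` would be `rag_application.query` and the
# evaluator would be `DeepEvalFaithfulnessEvaluator()`.
fake_answers = {
    "What is LlamaIndex?": "LlamaIndex is a data framework for LLM applications.",
    "What is DeepEval?": "It is an evaluation framework.",
}
rate = pass_rate(StubEvaluator(), fake_answers.get, list(fake_answers))
print(f"pass rate: {rate:.2f}")
```

Comparing the returned rate against a threshold of your choosing turns this into a pass/fail check that fits in an ordinary test function.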
Full List of Evaluators
Here is how you can import all 6 evaluators from deepeval:
from deepeval.integrations.llama_index import (
    DeepEvalAnswerRelevancyEvaluator,
    DeepEvalFaithfulnessEvaluator,
    DeepEvalContextualRelevancyEvaluator,
    DeepEvalSummarizationEvaluator,
    DeepEvalBiasEvaluator,
    DeepEvalToxicityEvaluator,
)
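Assuming these evaluators share the `evaluate_response(query=..., response=...)` call shown earlier, several of them can be run over one response in a loop. A hedged sketch: `run_all` is a hypothetical helper, and `KeywordEvaluator` with its `SimpleNamespace` result is a stand-in that lets the snippet run offline. With DeepEval installed, the dictionary values would instead be instances of the classes imported above.

```python
from types import SimpleNamespace
from typing import Any, Dict


class KeywordEvaluator:
    """Hypothetical stand-in: passes when `keyword` appears in the response."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword.lower()

    def evaluate_response(self, query: str, response: Any) -> SimpleNamespace:
        hit = self.keyword in str(response).lower()
        return SimpleNamespace(passing=hit, score=1.0 if hit else 0.0)


def run_all(evaluators: Dict[str, Any], query: str, response: Any) -> Dict[str, bool]:
    """Run each evaluator on the same query/response pair and map name -> passed."""
    return {
        name: bool(ev.evaluate_response(query=query, response=response).passing)
        for name, ev in evaluators.items()
    }


report = run_all(
    {
        "mentions_framework": KeywordEvaluator("framework"),
        "mentions_pricing": KeywordEvaluator("pricing"),
    },
    query="What is LlamaIndex?",
    response="LlamaIndex is a data framework.",
)
print(report)
```

Keying the report by evaluator name makes it easy to see which check failed for a given response.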
For the definitions of all evaluators and how they integrate with DeepEval's testing suite, click here.