自校正查询引擎 - 评估与重试机制¶

在本笔记本中，我们将展示几种具备自我修正能力的高级查询引擎。
它们利用最新大语言模型（LLM）的自我评估能力，通过自我修正机制来提供更优质的响应。

如果您在 Colab 上打开此 Notebook，可能需要安装 LlamaIndex 🦙。

In [ ]:

Copied!

!pip install llama-index
!pip install llama-index

In [ ]:

Copied!

# Uncomment to add your OpenAI API key
# import os
# os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"
# Uncomment to add your OpenAI API key
# import os
# os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"

In [ ]:

Copied!





# Uncomment for debug level logging
# import logging
# import sys

# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
# Uncomment for debug level logging
# import logging
# import sys

# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

安装¶

首先我们导入文档。

In [ ]:

Copied!

from llama_index.core import VectorStoreIndex
from llama_index.core import SimpleDirectoryReader

# Needed for running async functions in Jupyter Notebook
import nest_asyncio

nest_asyncio.apply()
from llama_index.core import VectorStoreIndex
from llama_index.core import SimpleDirectoryReader

# Needed for running async functions in Jupyter Notebook
import nest_asyncio

nest_asyncio.apply()

下载数据

In [ ]:

Copied!

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

加载数据

In [ ]:

Copied!

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents)
query = "What did the author do growing up?"
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents)
query = "What did the author do growing up?"

让我们看看默认查询引擎的响应是什么样的

In [ ]:

Copied!

base_query_engine = index.as_query_engine()
response = base_query_engine.query(query)
print(response)
base_query_engine = index.as_query_engine()
response = base_query_engine.query(query)
print(response)

The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer and started programming on it, writing simple games and a word processor. They also mentioned their interest in philosophy and AI.

查询重试引擎¶

重试查询引擎通过评估器来优化基础查询引擎的响应结果。

其工作流程如下：

首先查询基础查询引擎，随后
使用评估器判断响应是否合格
若响应合格，则直接返回结果
否则根据评估结果（包含原始查询、响应及反馈）生成新查询
循环上述过程直至达到最大重试次数（max_retries）

In [ ]:

Copied!





from llama_index.core.query_engine import RetryQueryEngine
from llama_index.core.evaluation import RelevancyEvaluator

query_response_evaluator = RelevancyEvaluator()
retry_query_engine = RetryQueryEngine(
    base_query_engine, query_response_evaluator
)
retry_response = retry_query_engine.query(query)
print(retry_response)
from llama_index.core.query_engine import RetryQueryEngine
from llama_index.core.evaluation import RelevancyEvaluator

query_response_evaluator = RelevancyEvaluator()
retry_query_engine = RetryQueryEngine(
    base_query_engine, query_response_evaluator
)
retry_response = retry_query_engine.query(query)
print(retry_response)

The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer, a TRS-80, and started programming more extensively, including writing simple games and a word processor.

重试源查询引擎¶

源重试机制通过基于LLM节点评估筛选查询的现有源节点，从而修改查询源节点。

In [ ]:

Copied!





from llama_index.core.query_engine import RetrySourceQueryEngine

retry_source_query_engine = RetrySourceQueryEngine(
    base_query_engine, query_response_evaluator
)
retry_source_response = retry_source_query_engine.query(query)
print(retry_source_response)
from llama_index.core.query_engine import RetrySourceQueryEngine

retry_source_query_engine = RetrySourceQueryEngine(
    base_query_engine, query_response_evaluator
)
retry_source_response = retry_source_query_engine.query(query)
print(retry_source_response)

The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer and started programming on it, writing simple games and a word processor. They also mentioned their interest in philosophy and AI.

重试指南查询引擎¶

该模块尝试通过指导原则来引导评估者的行为。您可以自定义专属的指导准则。

In [ ]:

Copied!





from llama_index.core.evaluation import GuidelineEvaluator
from llama_index.core.evaluation.guideline import DEFAULT_GUIDELINES
from llama_index.core import Response
from llama_index.core.indices.query.query_transform.feedback_transform import (
    FeedbackQueryTransformation,
)
from llama_index.core.query_engine import RetryGuidelineQueryEngine

# Guideline eval
guideline_eval = GuidelineEvaluator(
    guidelines=DEFAULT_GUIDELINES
    + "\nThe response should not be overly long.\n"
    "The response should try to summarize where possible.\n"
)  # just for example
from llama_index.core.evaluation import GuidelineEvaluator
from llama_index.core.evaluation.guideline import DEFAULT_GUIDELINES
from llama_index.core import Response
from llama_index.core.indices.query.query_transform.feedback_transform import (
    FeedbackQueryTransformation,
)
from llama_index.core.query_engine import RetryGuidelineQueryEngine

# Guideline eval
guideline_eval = GuidelineEvaluator(
    guidelines=DEFAULT_GUIDELINES
    + "\nThe response should not be overly long.\n"
    "The response should try to summarize where possible.\n"
)  # just for example

让我们看看底层发生了什么。

In [ ]:

Copied!





typed_response = (
    response if isinstance(response, Response) else response.get_response()
)
eval = guideline_eval.evaluate_response(query, typed_response)
print(f"Guideline eval evaluation result: {eval.feedback}")

feedback_query_transform = FeedbackQueryTransformation(resynthesize_query=True)
transformed_query = feedback_query_transform.run(query, {"evaluation": eval})
print(f"Transformed query: {transformed_query.query_str}")
typed_response = (
    response if isinstance(response, Response) else response.get_response()
)
eval = guideline_eval.evaluate_response(query, typed_response)
print(f"Guideline eval evaluation result: {eval.feedback}")

feedback_query_transform = FeedbackQueryTransformation(resynthesize_query=True)
transformed_query = feedback_query_transform.run(query, {"evaluation": eval})
print(f"Transformed query: {transformed_query.query_str}")

Guideline eval evaluation result: The response partially answers the query but lacks specific statistics or numbers. It provides some details about the author's activities growing up, such as writing short stories and programming on different computers, but it could be more concise and focused. Additionally, the response does not mention any statistics or numbers to support the author's experiences.
Transformed query: Here is a previous bad answer.
The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They later got a microcomputer and started programming on it, writing simple games and a word processor. They also mentioned their interest in philosophy and AI.
Here is some feedback from the evaluator about the response given.
The response partially answers the query but lacks specific statistics or numbers. It provides some details about the author's activities growing up, such as writing short stories and programming on different computers, but it could be more concise and focused. Additionally, the response does not mention any statistics or numbers to support the author's experiences.
Now answer the question.
What were the author's activities and interests during their childhood and adolescence?

现在让我们运行完整的查询引擎

In [ ]:

Copied!





retry_guideline_query_engine = RetryGuidelineQueryEngine(
    base_query_engine, guideline_eval, resynthesize_query=True
)
retry_guideline_response = retry_guideline_query_engine.query(query)
print(retry_guideline_response)
retry_guideline_query_engine = RetryGuidelineQueryEngine(
    base_query_engine, guideline_eval, resynthesize_query=True
)
retry_guideline_response = retry_guideline_query_engine.query(query)
print(retry_guideline_response)

During their childhood and adolescence, the author worked on writing short stories and programming. They mentioned that their short stories were not very good, lacking plot but focusing on characters with strong feelings. In terms of programming, they tried writing programs on the IBM 1401 computer in 9th grade using an early version of Fortran. However, they mentioned being puzzled by the 1401 and not being able to do much with it due to the limited input options. They also mentioned getting a microcomputer, a TRS-80, and starting to write simple games, a program to predict rocket heights, and a word processor.