🏔️ 使用 Argilla 实现 RAG 的 Step-back 提示工作流¶

本教程将展示如何在 LlamaIndex 工作流中结合 Argilla 实现 RAG（检索增强生成）时使用"回退式提示"方法。

该提示技术基于论文《退一步思考：通过抽象化激发大语言模型的推理能力》。研究表明，通过要求模型退一步以更抽象的方式思考上下文，可以显著提升响应质量。该方法首先对原始查询进行抽象化处理以检索相关信息，随后结合原始上下文、查询语句及检索结果生成最终响应。

Argilla 是 AI 工程师与领域专家协作构建高质量数据集的工具平台。通过该工具，您可以分析并提升数据质量，将人工反馈融入训练循环，从而优化模型性能。本集成方案会自动记录查询语句、生成响应、带评分的检索上下文、完整追踪日志（包括跨度和事件）以及相关元数据至 Argilla 平台。默认支持对响应结果进行评分、提供反馈及评估检索上下文的准确性，有效防止信息偏差。

实现步骤包括：

为 LlamaIndex 配置 Argilla 处理器
设计回退式工作流
运行 LlamaIndex 回退式工作流并自动记录响应至 Argilla

快速开始¶

部署 Argilla 服务器¶¶

若已部署 Argilla 服务，可跳过此步骤。否则，请按照本指南快速完成部署。

环境配置¶¶

要完成本教程，您需要安装此集成组件。

查看GitHub代码库请访问此处。

In [ ]:

Copied!

%pip install "argilla-llama-index>=2.1.0"
%pip install "argilla-llama-index>=2.1.0"

让我们进行必要的导入：

In [ ]:

Copied!





from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    VectorStoreIndex,
)
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core.schema import NodeWithScore
from llama_index.core.workflow import (
    Context,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

from llama_index.core import get_response_synthesizer
from llama_index.core.workflow import Event
from llama_index.utils.workflow import draw_all_possible_flows
from llama_index.llms.openai import OpenAI

from argilla_llama_index import ArgillaHandler
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    VectorStoreIndex,
)
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core.schema import NodeWithScore
from llama_index.core.workflow import (
    Context,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

from llama_index.core import get_response_synthesizer
from llama_index.core.workflow import Event
from llama_index.utils.workflow import draw_all_possible_flows
from llama_index.llms.openai import OpenAI

from argilla_llama_index import ArgillaHandler

我们需要设置OpenAI API密钥。使用GPT模型运行查询需要OpenAI API密钥。

In [ ]:

Copied!

import os

os.environ["OPENAI_API_KEY"] = "sk-..."
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

设置 Argilla 的 LlamaIndex 处理器¶

为了在 LlamaIndex 工作流中轻松将数据记录到 Argilla，您只需初始化 Argilla 处理器并将其附加到 LlamaIndex 的跨度和事件调度器。这确保使用 LlamaIndex 获得的预测结果会自动记录到 Argilla 实例，同时包含有用的元数据。

dataset_name：数据集名称。如果数据集不存在，将使用指定名称创建；否则将更新现有数据集。
api_url：连接 Argilla 实例的 URL。
api_key：用于 Argilla 实例身份验证的 API 密钥。
number_of_retrievals：需要记录的检索文档数量，默认为 0。
workspace_name：记录数据的工作区名称，默认使用第一个可用工作区。

有关凭证的更多信息，请查阅用户指南和工作区指南文档。

In [ ]:

Copied!





argilla_handler = ArgillaHandler(
    dataset_name="workflow_llama_index",
    api_url="http://localhost:6900",
    api_key="argilla.apikey",
    number_of_retrievals=2,
)
root_dispatcher = get_dispatcher()
root_dispatcher.add_span_handler(argilla_handler)
root_dispatcher.add_event_handler(argilla_handler)
argilla_handler = ArgillaHandler(
    dataset_name="workflow_llama_index",
    api_url="http://localhost:6900",
    api_key="argilla.apikey",
    number_of_retrievals=2,
)
root_dispatcher = get_dispatcher()
root_dispatcher.add_span_handler(argilla_handler)
root_dispatcher.add_event_handler(argilla_handler)

定义回退工作流程¶

首先，我们需要定义在回退工作流中使用的两个事件。接收回退查询的 StepBackEvent，以及在检索后接收原始查询和回退查询相关节点的 RetriverEvent。

In [ ]:

Copied!

class StepBackEvent(Event):
    """Get the step-back query"""

    step_back_query: str

class RetrieverEvent(Event):
    """Result of running the retrievals"""

    nodes_original: list[NodeWithScore]
    nodes_step_back: list[NodeWithScore]
class StepBackEvent(Event):
    """Get the step-back query"""

    step_back_query: str

class RetrieverEvent(Event):
    """Result of running the retrievals"""

    nodes_original: list[NodeWithScore]
    nodes_step_back: list[NodeWithScore]

接下来，我们将根据原始论文定义提示词，以获取退阶查询并最终得到响应结果。

In [ ]:

Copied!





STEP_BACK_TEMPLATE = """
You are an expert at world knowledge. Your task is to step back and
paraphrase a question to a more generic step-back question, which is
easier to answer. Here are a few examples:

Original Question: Which position did Knox Cunningham hold from May 1955 to Apr 1956?
Stepback Question: Which positions have Knox Cunningham held in his career?

Original Question: Who was the spouse of Anna Karina from 1968 to 1974?
Stepback Question: Who were the spouses of Anna Karina?

Original Question: what is the biggest hotel in las vegas nv as of November 28, 1993
Stepback Question: what is the size of the hotels in las vegas nv as of November 28, 1993?

Original Question: {original_query}
Stepback Question:
"""

GENERATE_ANSWER_TEMPLATE = """
You are an expert of world knowledge. I am going to ask you a question.
Your response should be comprehensive and not contradicted with the
following context if they are relevant. Otherwise, ignore them if they are
not relevant.

{context_original}
{context_step_back}

Original Question: {query}
Answer:
"""
STEP_BACK_TEMPLATE = """
You are an expert at world knowledge. Your task is to step back and
paraphrase a question to a more generic step-back question, which is
easier to answer. Here are a few examples:

Original Question: Which position did Knox Cunningham hold from May 1955 to Apr 1956?
Stepback Question: Which positions have Knox Cunningham held in his career?

Original Question: Who was the spouse of Anna Karina from 1968 to 1974?
Stepback Question: Who were the spouses of Anna Karina?

Original Question: what is the biggest hotel in las vegas nv as of November 28, 1993
Stepback Question: what is the size of the hotels in las vegas nv as of November 28, 1993?

Original Question: {original_query}
Stepback Question:
"""

GENERATE_ANSWER_TEMPLATE = """
You are an expert of world knowledge. I am going to ask you a question.
Your response should be comprehensive and not contradicted with the
following context if they are relevant. Otherwise, ignore them if they are
not relevant.

{context_original}
{context_step_back}

Original Question: {query}
Answer:
"""

现在，我们将定义"回退步骤"工作流。在本案例中，工作流将采用线性结构。首先，我们会提示大语言模型对原始查询进行抽象化处理（回退步骤提示）。接着，我们将为原始查询和回退步骤查询检索相关节点。最后，我们将提示大语言模型生成最终响应。

In [ ]:

Copied!





class RAGWorkflow(Workflow):
    @step
    async def step_back(
        self, ctx: Context, ev: StartEvent
    ) -> StepBackEvent | None:
        """Generate the step-back query."""
        query = ev.get("query")
        index = ev.get("index")

        if not query:
            return None

        if not index:
            return None

        llm = Settings.llm
        step_back_query = llm.complete(
            prompt=STEP_BACK_TEMPLATE.format(original_query=query),
            formatted=True,
        )

        await ctx.store.set("query", query)
        await ctx.store.set("index", index)

        return StepBackEvent(step_back_query=str(step_back_query))

    @step
    async def retrieve(
        self, ctx: Context, ev: StepBackEvent
    ) -> RetrieverEvent | None:
        "Retrieve the relevant nodes for the original and step-back queries."
        query = await ctx.store.get("query", default=None)
        index = await ctx.store.get("index", default=None)

        await ctx.store.set("step_back_query", ev.step_back_query)

        retriever = index.as_retriever(similarity_top_k=2)
        nodes_step_back = await retriever.aretrieve(ev.step_back_query)
        nodes_original = await retriever.aretrieve(query)

        return RetrieverEvent(
            nodes_original=nodes_original, nodes_step_back=nodes_step_back
        )

    @step
    async def synthesize(self, ctx: Context, ev: RetrieverEvent) -> StopEvent:
        """Return a response using the contextualized prompt and retrieved nodes."""
        nodes_original = ev.nodes_original
        nodes_step_back = ev.nodes_step_back

        context_original = max(
            nodes_original, key=lambda node: node.get_score()
        ).get_text()
        context_step_back = max(
            nodes_step_back, key=lambda node: node.get_score()
        ).get_text()

        query = await ctx.store.get("query", default=None)
        formatted_query = GENERATE_ANSWER_TEMPLATE.format(
            context_original=context_original,
            context_step_back=context_step_back,
            query=query,
        )

        response_synthesizer = get_response_synthesizer(
            response_mode=ResponseMode.COMPACT
        )

        response = response_synthesizer.synthesize(
            formatted_query, nodes=ev.nodes_original
        )
        return StopEvent(result=response)
class RAGWorkflow(Workflow):
    @step
    async def step_back(
        self, ctx: Context, ev: StartEvent
    ) -> StepBackEvent | None:
        """Generate the step-back query."""
        query = ev.get("query")
        index = ev.get("index")

        if not query:
            return None

        if not index:
            return None

        llm = Settings.llm
        step_back_query = llm.complete(
            prompt=STEP_BACK_TEMPLATE.format(original_query=query),
            formatted=True,
        )

        await ctx.store.set("query", query)
        await ctx.store.set("index", index)

        return StepBackEvent(step_back_query=str(step_back_query))

    @step
    async def retrieve(
        self, ctx: Context, ev: StepBackEvent
    ) -> RetrieverEvent | None:
        "Retrieve the relevant nodes for the original and step-back queries."
        query = await ctx.store.get("query", default=None)
        index = await ctx.store.get("index", default=None)

        await ctx.store.set("step_back_query", ev.step_back_query)

        retriever = index.as_retriever(similarity_top_k=2)
        nodes_step_back = await retriever.aretrieve(ev.step_back_query)
        nodes_original = await retriever.aretrieve(query)

        return RetrieverEvent(
            nodes_original=nodes_original, nodes_step_back=nodes_step_back
        )

    @step
    async def synthesize(self, ctx: Context, ev: RetrieverEvent) -> StopEvent:
        """Return a response using the contextualized prompt and retrieved nodes."""
        nodes_original = ev.nodes_original
        nodes_step_back = ev.nodes_step_back

        context_original = max(
            nodes_original, key=lambda node: node.get_score()
        ).get_text()
        context_step_back = max(
            nodes_step_back, key=lambda node: node.get_score()
        ).get_text()

        query = await ctx.store.get("query", default=None)
        formatted_query = GENERATE_ANSWER_TEMPLATE.format(
            context_original=context_original,
            context_step_back=context_step_back,
            query=query,
        )

        response_synthesizer = get_response_synthesizer(
            response_mode=ResponseMode.COMPACT
        )

        response = response_synthesizer.synthesize(
            formatted_query, nodes=ev.nodes_original
        )
        return StopEvent(result=response)

In [ ]:

Copied!

draw_all_possible_flows(RAGWorkflow, filename="step_back_workflow.html")
draw_all_possible_flows(RAGWorkflow, filename="step_back_workflow.html")

执行回退工作流程¶

我们将使用一个从 LlamaIndex 文档获取的示例 .txt 文件。

In [ ]:

Copied!

# Retrieve the data if needed
!mkdir -p ../../data
!curl https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt -o ../../data/paul_graham_essay.txt
# Retrieve the data if needed
!mkdir -p ../../data
!curl https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt -o ../../data/paul_graham_essay.txt

现在，让我们基于这个文档创建一个 LlamaIndex 索引。由于原始查询和分步回溯查询中评分最高的上下文将被包含在最终提示中，我们将减小分块大小并使用 SentenceSplitter

In [ ]:

Copied!





# LLM settings
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.8)

# Load the data and create the index
transformations = [
    SentenceSplitter(chunk_size=256, chunk_overlap=75),
]

documents = SimpleDirectoryReader("../../data").load_data()
index = VectorStoreIndex.from_documents(
    documents=documents,
    transformations=transformations,
)
# LLM settings
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.8)

# Load the data and create the index
transformations = [
    SentenceSplitter(chunk_size=256, chunk_overlap=75),
]

documents = SimpleDirectoryReader("../../data").load_data()
index = VectorStoreIndex.from_documents(
    documents=documents,
    transformations=transformations,
)

现在，让我们运行回退工作流程并执行查询。

In [ ]:

Copied!

w = RAGWorkflow()

result = await w.run(query="What's Paul's work", index=index)
result
w = RAGWorkflow()

result = await w.run(query="What's Paul's work", index=index)
result

The generated response will be automatically logged in our Argilla instance. Check it out! From Argilla, you can quickly look at your predictions and annotate them so you can combine both synthetic data and human feedback.

You can check this guide to know how to annotate your data.

后续步骤¶

完成数据标注后，您可以从 Argilla 中提取数据。通过将人工反馈融入流程，我们确保了数据质量，使其能够直接用于模型微调。此外，为维持模型性能并防止数据漂移，您可以预留部分数据用于持续性的长期评估。