基于工作流的子问题查询引擎¶
LlamaIndex 内置了一个子问题查询引擎。在此,我们将其替换为基于工作流的等效实现。
首先我们安装依赖项:
- LlamaIndex 核心库用于主要功能
- OpenAI 的 LLM 和嵌入模型用于大语言模型操作
llama-index-readers-file
为SimpleDirectoryReader
提供 PDF 阅读功能
!pip install llama-index-core llama-index-llms-openai llama-index-embeddings-openai llama-index-readers-file llama-index-utils-workflow
引入依赖项作为导入:
import os, json
from llama_index.core import (
SimpleDirectoryReader,
VectorStoreIndex,
StorageContext,
load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.workflow import (
step,
Context,
Workflow,
Event,
StartEvent,
StopEvent,
)
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from llama_index.utils.workflow import draw_all_possible_flows
将子问题查询引擎定义为工作流¶
我们的 StartEvent 会进入
query()
方法,该方法负责处理以下事项:- 接收并存储原始查询
- 存储用于处理查询的 LLM
- 存储支持子问题查询的工具列表
- 将原始问题传递给 LLM,要求其将问题拆分为若干子问题
- 为每个生成的子问题触发
QueryEvent
事件
QueryEvents 会进入
sub_question()
方法,该方法会实例化一个新的 ReAct 代理(配备所有可用工具列表),并让其选择要使用的工具。- 这比 LlamaIndex 内置的 SQQE 稍好,因为后者无法使用多个工具
- 每个 QueryEvent 会生成一个
AnswerEvent
AnswerEvents 会进入
combine_answers()
方法。- 该方法使用
self.collect_events()
等待每个 QueryEvent 返回答案 - 将所有答案组合成最终提示,供 LLM 将其整合为单一响应
- 生成 StopEvent 以返回最终结果
- 该方法使用
class QueryEvent(Event):
question: str
class AnswerEvent(Event):
question: str
answer: str
class SubQuestionQueryEngine(Workflow):
@step
async def query(self, ctx: Context, ev: StartEvent) -> QueryEvent:
if hasattr(ev, "query"):
await ctx.store.set("original_query", ev.query)
print(f"Query is {await ctx.store.get('original_query')}")
if hasattr(ev, "llm"):
await ctx.store.set("llm", ev.llm)
if hasattr(ev, "tools"):
await ctx.store.set("tools", ev.tools)
response = (await ctx.store.get("llm")).complete(
f"""
Given a user question, and a list of tools, output a list of
relevant sub-questions, such that the answers to all the
sub-questions put together will answer the question. Respond
in pure JSON without any markdown, like this:
{{
"sub_questions": [
"What is the population of San Francisco?",
"What is the budget of San Francisco?",
"What is the GDP of San Francisco?"
]
}}
Here is the user question: {await ctx.store.get('original_query')}
And here is the list of tools: {await ctx.store.get('tools')}
"""
)
print(f"Sub-questions are {response}")
response_obj = json.loads(str(response))
sub_questions = response_obj["sub_questions"]
await ctx.store.set("sub_question_count", len(sub_questions))
for question in sub_questions:
self.send_event(QueryEvent(question=question))
return None
@step
async def sub_question(self, ctx: Context, ev: QueryEvent) -> AnswerEvent:
print(f"Sub-question is {ev.question}")
agent = ReActAgent.from_tools(
await ctx.store.get("tools"),
llm=await ctx.store.get("llm"),
verbose=True,
)
response = agent.chat(ev.question)
return AnswerEvent(question=ev.question, answer=str(response))
@step
async def combine_answers(
self, ctx: Context, ev: AnswerEvent
) -> StopEvent | None:
ready = ctx.collect_events(
ev, [AnswerEvent] * await ctx.store.get("sub_question_count")
)
if ready is None:
return None
answers = "\n\n".join(
[
f"Question: {event.question}: \n Answer: {event.answer}"
for event in ready
]
)
prompt = f"""
You are given an overall question that has been split into sub-questions,
each of which has been answered. Combine the answers to all the sub-questions
into a single answer to the original question.
Original question: {await ctx.store.get('original_query')}
Sub-questions and answers:
{answers}
"""
print(f"Final prompt is {prompt}")
response = (await ctx.store.get("llm")).complete(prompt)
print("Final response is", response)
return StopEvent(result=str(response))
draw_all_possible_flows(
SubQuestionQueryEngine, filename="sub_question_query_engine.html"
)
sub_question_query_engine.html
可视化这一流程看起来相当线性,因为它没有体现 query()
可以生成多个并行的 QueryEvents
,这些事件最终会被收集到 combine_answers
中。
... 162.125.1.18, 2620:100:6057:18::a27d:d12 Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://ucca241228aca55dbf9fcd60ae81.dl.dropboxusercontent.com/cd/0/inline/CYOq1NGkhLkELMmygLIg_gyLPXsOO7xOLjc3jW-mb09kevMykGPogSQx_icUTBEfHshxiSainXTynZYnh5O6uZ4ITeGiMkpvjl1QqXkKI34Ea8WzLr4FEyzkwohAC2WCQAU/file# [following] --2024-08-07 18:21:12-- https://ucca241228aca55dbf9fcd60ae81.dl.dropboxusercontent.com/cd/0/inline/CYOq1NGkhLkELMmygLIg_gyLPXsOO7xOLjc3jW-mb09kevMykGPogSQx_icUTBEfHshxiSainXTynZYnh5O6uZ4ITeGiMkpvjl1QqXkKI34Ea8WzLr4FEyzkwohAC2WCQAU/file Resolving ucca241228aca55dbf9fcd60ae81.dl.dropboxusercontent.com (ucca241228aca55dbf9fcd60ae81.dl.dropboxusercontent.com)... 162.125.1.15, 2620:100:6016:15::a27d:10f Connecting to ucca241228aca55dbf9fcd60ae81.dl.dropboxusercontent.com (ucca241228aca55dbf9fcd60ae81.dl.dropboxusercontent.com)|162.125.1.15|:443... connected. HTTP request sent, awaiting response... 302 Found Location: /cd/0/inline2/CYMeJ4nM2JL44i1kCE4kttRGFOk-_34sr37ALElZu9szHfn-VhihA7l4cjIIFKHNN1ajRfeYYspGW3zPK1BZShxO3O7SEaXnHpUwUaziUcoz6b5IkdtXww3M6tRf8K2MZB4pHMSwxiuKe_vw9jitwHNeHn-jVzVRMw9feenAHN21LDudw5PxmsvqXSLeHMAGgs_tjeo1o92vltmhL6FpHs2czHsQFlYuaFMzwecv2xAMzHUGCGOhfNkmg2af16lP2QKLKgWAPK4ttCePTv-Ivy2KQ_GYVKKXRFlYHkIwhCQ_JFOyrtl_n14xls76NyPZRSZWmygSHJ-HH6Hntqvi86XpgCF-N_dZJh_HhSxuAaZd2g/file [following] --2024-08-07 18:21:13-- https://ucca241228aca55dbf9fcd60ae81.dl.dropboxusercontent.com/cd/0/inline2/CYMeJ4nM2JL44i1kCE4kttRGFOk-_34sr37ALElZu9szHfn-VhihA7l4cjIIFKHNN1ajRfeYYspGW3zPK1BZShxO3O7SEaXnHpUwUaziUcoz6b5IkdtXww3M6tRf8K2MZB4pHMSwxiuKe_vw9jitwHNeHn-jVzVRMw9feenAHN21LDudw5PxmsvqXSLeHMAGgs_tjeo1o92vltmhL6FpHs2czHsQFlYuaFMzwecv2xAMzHUGCGOhfNkmg2af16lP2QKLKgWAPK4ttCePTv-Ivy2KQ_GYVKKXRFlYHkIwhCQ_JFOyrtl_n14xls76NyPZRSZWmygSHJ-HH6Hntqvi86XpgCF-N_dZJh_HhSxuAaZd2g/file Reusing existing connection to ucca241228aca55dbf9fcd60ae81.dl.dropboxusercontent.com:443. HTTP request sent, awaiting response... 200 OK Length: 29467998 (28M) [application/pdf] Saving to: ‘./data/sf_budgets/2016 - CSF_Budget_Book_2016_FINAL_WEB_with-cover-page.pdf’ ./data/sf_budgets/2 100%[===================>] 28.10M 173MB/s in 0.2s 2024-08-07 18:21:14 (173 MB/s) - ‘./data/sf_budgets/2016 - CSF_Budget_Book_2016_FINAL_WEB_with-cover-page.pdf’ saved [29467998/29467998] --2024-08-07 18:21:14-- https://www.dropbox.com/scl/fi/jvw59g5nscu1m7f96tjre/2017-Proposed-Budget-FY2017-18-FY2018-19_1.pdf?rlkey=v988oigs2whtcy87ti9wti6od&dl=0 Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18, 2620:100:6057:18::a27d:d12 Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://uca6ee7d218771b609b8ecdf23d7.dl.dropboxusercontent.com/cd/0/inline/CYNMSJ2zt2I5765XfzleiddbUXb-TkZP91r9LuVw_6wBH0USNyLT6lclDE7x6I0-_WEaGM7zqqCipxx7Uyp5owmnwMx8JyfbHG3fZ4LSDYM6QzubFok7NSc0R2KRd3DX0qg/file# [following] --2024-08-07 18:21:15-- https://uca6ee7d218771b609b8ecdf23d7.dl.dropboxusercontent.com/cd/0/inline/CYNMSJ2zt2I5765XfzleiddbUXb-TkZP91r9LuVw_6wBH0USNyLT6lclDE7x6I0-_WEaGM7zqqCipxx7Uyp5owmnwMx8JyfbHG3fZ4LSDYM6QzubFok7NSc0R2KRd3DX0qg/file Resolving uca6ee7d218771b609b8ecdf23d7.dl.dropboxusercontent.com (uca6ee7d218771b609b8ecdf23d7.dl.dropboxusercontent.com)... 162.125.1.15, 2620:100:6016:15::a27d:10f Connecting to uca6ee7d218771b609b8ecdf23d7.dl.dropboxusercontent.com (uca6ee7d218771b609b8ecdf23d7.dl.dropboxusercontent.com)|162.125.1.15|:443... connected. HTTP request sent, awaiting response... 302 Found Location: /cd/0/inline2/CYNJm2hH6OlYZdhW6cv8AuAYvgEiuyOY1KUwzlH1Nq4RrvmmOHg2ipVgEq88bfVDEC_xV0SegX6DL-4CUB_6_2AjHC7iS5VnZVxsjkbpQHTqEKr7OK6mAlsGNPQi--ocxwOsUbQNpLVNSjEc2zA98VZLpntTl3AoJEvl4wmpvBhNCs_ChiY2TDNcQGFDPH5AjvEEHImiNQqCzrOzoSpFh9Ut9NQty6vjADUHg1yXFcPa5R-ODch6hb4FgTCQZv7WYQJ7H_MRHVJyLoIyCX8bqwZAblnXC9SbUuIxdgmkiAB_wwjJKuFLV7YNNjJX5kg9spGoYnRv7gNDqUhjvXBwKW_IQxsYc1HjsaabrrRFjXntAw/file [following] --2024-08-07 18:21:16-- https://uca6ee7d218771b609b8ecdf23d7.dl.dropboxusercontent.com/cd/0/inline2/CYNJm2hH6OlYZdhW6cv8AuAYvgEiuyOY1KUwzlH1Nq4RrvmmOHg2ipVgEq88bfVDEC_xV0SegX6DL-4CUB_6_2AjHC7iS5VnZVxsjkbpQHTqEKr7OK6mAlsGNPQi--ocxwOsUbQNpLVNSjEc2zA98VZLpntTl3AoJEvl4wmpvBhNCs_ChiY2TDNcQGFDPH5AjvEEHImiNQqCzrOzoSpFh9Ut9NQty6vjADUHg1yXFcPa5R-ODch6hb4FgTCQZv7WYQJ7H_MRHVJyLoIyCX8bqwZAblnXC9SbUuIxdgmkiAB_wwjJKuFLV7YNNjJX5kg9spGoYnRv7gNDqUhjvXBwKW_IQxsYc1HjsaabrrRFjXntAw/file Reusing existing connection to uca6ee7d218771b609b8ecdf23d7.dl.dropboxusercontent.com:443. HTTP request sent, awaiting response... 200 OK Length: 13463517 (13M) [application/pdf] Saving to: ‘./data/sf_budgets/2017 - 2017-Proposed-Budget-FY2017-18-FY2018-19_1.pdf’ ./data/sf_budgets/2 100%[===================>] 12.84M --.-KB/s in 0.09s 2024-08-07 18:21:17 (136 MB/s) - ‘./data/sf_budgets/2017 - 2017-Proposed-Budget-FY2017-18-FY2018-19_1.pdf’ saved [13463517/13463517] --2024-08-07 18:21:17-- https://www.dropbox.com/scl/fi/izknlwmbs7ia0lbn7zzyx/2018-o0181-18.pdf?rlkey=p5nv2ehtp7272ege3m9diqhei&dl=0 Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18, 2620:100:6057:18::a27d:d12 Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://uc922ffad4a58390c4df80dcafbf.dl.dropboxusercontent.com/cd/0/inline/CYOEOqz8prU7eZPDzgM8fwVVcHoP1lWOLF--9VoNPtzVDSvDCXUDxR1CeN_VMzOp4JGTG6V-CeYm7oLwrrEIjuWThf5rHt8eLh52TF1nJ4-jVPrn7nAjFrealf436uezAs0/file# [following] --2024-08-07 18:21:17-- https://uc922ffad4a58390c4df80dcafbf.dl.dropboxusercontent.com/cd/0/inline/CYOEOqz8prU7eZPDzgM8fwVVcHoP1lWOLF--9VoNPtzVDSvDCXUDxR1CeN_VMzOp4JGTG6V-CeYm7oLwrrEIjuWThf5rHt8eLh52TF1nJ4-jVPrn7nAjFrealf436uezAs0/file Resolving uc922ffad4a58390c4df80dcafbf.dl.dropboxusercontent.com (uc922ffad4a58390c4df80dcafbf.dl.dropboxusercontent.com)... 162.125.1.15, 2620:100:6016:15::a27d:10f Connecting to uc922ffad4a58390c4df80dcafbf.dl.dropboxusercontent.com (uc922ffad4a58390c4df80dcafbf.dl.dropboxusercontent.com)|162.125.1.15|:443... connected. HTTP request sent, awaiting response... 302 Found Location: /cd/0/inline2/CYNZxULkXqH5RXSO_Tu0-X2BLjKqLUg3ZAH3vZeEHw-ic156C2iVH3wjJtcm6mkh-RpMfru6d3ZBBNTpf_EWLTWBywklJbD4ZhRInyrnF6s5oK4NWS6UQ_7GBHy11itN5OKGF9U0090wCFaQeaPwFyLxwIjhg_gZdTc8smr1YFyESsFTIJTLPq8QjI5uPvYyug6Oidh8RxOP2N2f2mBKDRS2R8cazDZRDrAxhVeAuSXPGpYzQc0lBcsTJQ8ZAXuYKww0e_qlpyHmDv6tRVHpdFNh1dyKyikOHqtGd4p3pYjBr2Kwn-jzJ1zkZf_Fpc_H9vX0Xkk6P9U25oOGvSnmIUC3LFkfHB_CJTGNSZUh36w5cA/file [following] --2024-08-07 18:21:18-- https://uc922ffad4a58390c4df80dcafbf.dl.dropboxusercontent.com/cd/0/inline2/CYNZxULkXqH5RXSO_Tu0-X2BLjKqLUg3ZAH3vZeEHw-ic156C2iVH3wjJtcm6mkh-RpMfru6d3ZBBNTpf_EWLTWBywklJbD4ZhRInyrnF6s5oK4NWS6UQ_7GBHy11itN5OKGF9U0090wCFaQeaPwFyLxwIjhg_gZdTc8smr1YFyESsFTIJTLPq8QjI5uPvYyug6Oidh8RxOP2N2f2mBKDRS2R8cazDZRDrAxhVeAuSXPGpYzQc0lBcsTJQ8ZAXuYKww0e_qlpyHmDv6tRVHpdFNh1dyKyikOHqtGd4p3pYjBr2Kwn-jzJ1zkZf_Fpc_H9vX0Xkk6P9U25oOGvSnmIUC3LFkfHB_CJTGNSZUh36w5cA/file Reusing existing connection to uc922ffad4a58390c4df80dcafbf.dl.dropboxusercontent.com:443. HTTP request sent, awaiting response... 200 OK Length: 18487865 (18M) [application/pdf] Saving to: ‘./data/sf_budgets/2018 - 2018-o0181-18.pdf’ ./data/sf_budgets/2 100%[===================>] 17.63M --.-KB/s in 0.1s 2024-08-07 18:21:19 (149 MB/s) - ‘./data/sf_budgets/2018 - 2018-o0181-18.pdf’ saved [18487865/18487865] --2024-08-07 18:21:19-- https://www.dropbox.com/scl/fi/1rstqm9rh5u5fr0tcjnxj/2019-Proposed-Budget-FY2019-20-FY2020-21.pdf?rlkey=3s2ivfx7z9bev1r840dlpbcgg&dl=0 Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18, 2620:100:6057:18::a27d:d12 Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://uce28a421063a08c4ce431616623.dl.dropboxusercontent.com/cd/0/inline/CYNSfAOo0ymwbrL62gVbRB_NTvZpU2t5SZqnLuZDW-OaDOssaoY8SkQxPM9csoAq0-Y3Y8rYA1E6cDD44K1pSJcsuRSyoRRVLHRmXvWdayHKMK_PWAo08V3murDu9ZZAu4s/file# [following] --2024-08-07 18:21:20-- https://uce28a421063a08c4ce431616623.dl.dropboxusercontent.com/cd/0/inline/CYNSfAOo0ymwbrL62gVbRB_NTvZpU2t5SZqnLuZDW-OaDOssaoY8SkQxPM9csoAq0-Y3Y8rYA1E6cDD44K1pSJcsuRSyoRRVLHRmXvWdayHKMK_PWAo08V3murDu9ZZAu4s/file Resolving uce28a421063a08c4ce431616623.dl.dropboxusercontent.com (uce28a421063a08c4ce431616623.dl.dropboxusercontent.com)... 162.125.1.15, 2620:100:6016:15::a27d:10f Connecting to uce28a421063a08c4ce431616623.dl.dropboxusercontent.com (uce28a421063a08c4ce431616623.dl.dropboxusercontent.com)|162.125.1.15|:443... connected. HTTP request sent, awaiting response... 302 Found Location: /cd/0/inline2/CYOFZQRiPKCvnUe8S4h3AQ8gmhPC0MW_0vNg2GTCzxiUPSVRSgUXDsH8XYOgKuU905goGB1ZmWgs00sNArASToS2iE6pJgGfqsk3DYELK3xYZJOwJ_AscWEAjoISiZQEPhi9-QyQpyeXAr5gxavu9eMq3XFNzo9SCUA-SWFIuSCU5Tf5_ZfW_uAU41NZE4dDVsdvaD7rG4Ouci6dp6c902A2dHsNs0O-wRZEEKKFZs5KeHNLvZkdTaUGxYcgQn8vwWgTbuvAz36XycX6Sdhdp32mFF73U30G5ZTUmqAvgYDMlUilhdcJLPhhbrUyhFUWcXrfluUHkK8LkjKCPl4ywKmr8oJGji5ZOwehdXWgrL7ALg/file [following] --2024-08-07 18:21:20-- https://uce28a421063a08c4ce431616623.dl.dropboxusercontent.com/cd/0/inline2/CYOFZQRiPKCvnUe8S4h3AQ8gmhPC0MW_0vNg2GTCzxiUPSVRSgUXDsH8XYOgKuU905goGB1ZmWgs00sNArASToS2iE6pJgGfqsk3DYELK3xYZJOwJ_AscWEAjoISiZQEPhi9-QyQpyeXAr5gxavu9eMq3XFNzo9SCUA-SWFIuSCU5Tf5_ZfW_uAU41NZE4dDVsdvaD7rG4Ouci6dp6c902A2dHsNs0O-wRZEEKKFZs5KeHNLvZkdTaUGxYcgQn8vwWgTbuvAz36XycX6Sdhdp32mFF73U30G5ZTUmqAvgYDMlUilhdcJLPhhbrUyhFUWcXrfluUHkK8LkjKCPl4ywKmr8oJGji5ZOwehdXWgrL7ALg/file Reusing existing connection to uce28a421063a08c4ce431616623.dl.dropboxusercontent.com:443. HTTP request sent, awaiting response... 200 OK Length: 13123938 (13M) [application/pdf] Saving to: ‘./data/sf_budgets/2019 - 2019-Proposed-Budget-FY2019-20-FY2020-21.pdf’ ./data/sf_budgets/2 100%[===================>] 12.52M --.-KB/s in 0.08s 2024-08-07 18:21:22 (161 MB/s) - ‘./data/sf_budgets/2019 - 2019-Proposed-Budget-FY2019-20-FY2020-21.pdf’ saved [13123938/13123938] --2024-08-07 18:21:22-- https://www.dropbox.com/scl/fi/7teuwxrjdyvgw0n8jjvk0/2021-AAO-FY20-21-FY21-22-09-11-2020-FINAL.pdf?rlkey=6br3wzxwj5fv1f1l8e69nbmhk&dl=0 Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18, 2620:100:6057:18::a27d:d12 Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://uc8421b1eeadb07bda3c7e093660.dl.dropboxusercontent.com/cd/0/inline/CYMRjMFyYInwu1LATw9fLxGctgY-zI7_0nI1zgKVeJJf55J9CxQivdYpDYLjkYlXCKv2t6rQ9NCns9A5jDEU3xiQ0Ycrd6VrPv7tiYSYvNY7pXMBiV2LvXu7ZDtQgBH1334/file# [following] --2024-08-07 18:21:22-- https://uc8421b1eeadb07bda3c7e093660.dl.dropboxusercontent.com/cd/0/inline/CYMRjMFyYInwu1LATw9fLxGctgY-zI7_0nI1zgKVeJJf55J9CxQivdYpDYLjkYlXCKv2t6rQ9NCns9A5jDEU3xiQ0Ycrd6VrPv7tiYSYvNY7pXMBiV2LvXu7ZDtQgBH1334/file Resolving uc8421b1eeadb07bda3c7e093660.dl.dropboxusercontent.com (uc8421b1eeadb07bda3c7e093660.dl.dropboxusercontent.com)... 162.125.1.15, 2620:100:6016:15::a27d:10f Connecting to uc8421b1eeadb07bda3c7e093660.dl.dropboxusercontent.com (uc8421b1eeadb07bda3c7e093660.dl.dropboxusercontent.com)|162.125.1.15|:443... connected. HTTP request sent, awaiting response... 302 Found Location: /cd/0/inline2/CYOOkJRGOrBeY0GY5xS_84ayGgfFapr4kvbiFcnAUkvwENgCw8Z3qTT_G2oQpBq6h-RVzjOh4SPrgusfRfbEWg9ZxXwxyWPo5I4yJ7eVhhqTi2jZN42r_k1FWF4IjxgRhMA237BSrCcKkweLmMNm3oN4cFap5dw2fyesDaZg0xa-fRAEjF5MubgvXVAwNVmEvrL8M7Sm4s4VsguOPsytt9GqfPkuARDvYXGLfvZeCx4hRfqOaNXdeGyBSy3GUBKyf8bH3YTHw6wEBk8Yp2dG64Q8FJyUgAXkpn1wZpBQe0dnk5WdoWrKrtkL4RDbBPo1k0fDfKeuajw_h5BhtEAl5XVE-11C0IEzcse1D-19TNlSuQ/file [following] --2024-08-07 18:21:24-- https://uc8421b1eeadb07bda3c7e093660.dl.dropboxusercontent.com/cd/0/inline2/CYOOkJRGOrBeY0GY5xS_84ayGgfFapr4kvbiFcnAUkvwENgCw8Z3qTT_G2oQpBq6h-RVzjOh4SPrgusfRfbEWg9ZxXwxyWPo5I4yJ7eVhhqTi2jZN42r_k1FWF4IjxgRhMA237BSrCcKkweLmMNm3oN4cFap5dw2fyesDaZg0xa-fRAEjF5MubgvXVAwNVmEvrL8M7Sm4s4VsguOPsytt9GqfPkuARDvYXGLfvZeCx4hRfqOaNXdeGyBSy3GUBKyf8bH3YTHw6wEBk8Yp2dG64Q8FJyUgAXkpn1wZpBQe0dnk5WdoWrKrtkL4RDbBPo1k0fDfKeuajw_h5BhtEAl5XVE-11C0IEzcse1D-19TNlSuQ/file Reusing existing connection to uc8421b1eeadb07bda3c7e093660.dl.dropboxusercontent.com:443. HTTP request sent, awaiting response... 200 OK Length: 3129122 (3.0M) [application/pdf] Saving to: ‘./data/sf_budgets/2021 - 2021-AAO-FY20-21-FY21-22-09-11-2020-FINAL.pdf’ ./data/sf_budgets/2 100%[===================>] 2.98M --.-KB/s in 0.05s 2024-08-07 18:21:24 (66.3 MB/s) - ‘./data/sf_budgets/2021 - 2021-AAO-FY20-21-FY21-22-09-11-2020-FINAL.pdf’ saved [3129122/3129122] --2024-08-07 18:21:24-- https://www.dropbox.com/scl/fi/zhgqch4n6xbv9skgcknij/2022-AAO-FY2021-22-FY2022-23-FINAL-20210730.pdf?rlkey=h78t65dfaz3mqbpbhl1u9e309&dl=0 Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18, 2620:100:6057:18::a27d:d12 Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://uc769a7232da4f728018e664ed74.dl.dropboxusercontent.com/cd/0/inline/CYPqlj1-wREOG6CVYV9KgsQ4Pyu3rqHgdY_UD2MqZIAndb3fAaRZeCB8kTXrOnILu6iGZZcjERz2tqT2mMiIcM86nxXDH6_J7tva-D9ZOwLROXr64weKF_NFuWTHcenrINM/file# [following] --2024-08-07 18:21:26-- https://uc769a7232da4f728018e664ed74.dl.dropboxusercontent.com/cd/0/inline/CYPqlj1-wREOG6CVYV9KgsQ4Pyu3rqHgdY_UD2MqZIAndb3fAaRZeCB8kTXrOnILu6iGZZcjERz2tqT2mMiIcM86nxXDH6_J7tva-D9ZOwLROXr64weKF_NFuWTHcenrINM/file Resolving uc769a7232da4f728018e664ed74.dl.dropboxusercontent.com (uc769a7232da4f728018e664ed74.dl.dropboxusercontent.com)... 162.125.1.15, 2620:100:6016:15::a27d:10f Connecting to uc769a7232da4f728018e664ed74.dl.dropboxusercontent.com (uc769a7232da4f728018e664ed74.dl.dropboxusercontent.com)|162.125.1.15|:443... connected. HTTP request sent, awaiting response... 302 Found Location: /cd/0/inline2/CYNbMKpeTmC_in9_57ZDTlkiMBzRJiPbEXNEcIxLjRQJHTQEYhPcMmdqHcWdoP9Fxi1LYMKQDt1DUW1ZJYX1TxpLjIDxFyezLCprT2JfhkCROToyraIBrDpXPFgMEbBxNJIsBT1x70oL7BXSbW-pKomX6OKsy_nAP1B5jDVxhXOZtJwW8xFJwkvhNo71Aam2bT1wENAWKLdZOcVz4WRIdDI7e4Ri5FZ27Sjy2RCojgcFYusbpMWZFrxui-ssQzHsXvD1ZrZpKjyUXMIq_pdkbonY0V-8Iuq7PudclrjCIsDU2fD0bqo2MLdXw69PDLy2m5uVohTgcM0qCykha7dfGiP3BWfBpEM0PbmcfHx_IDqWDw/file [following] --2024-08-07 18:21:27-- https://uc769a7232da4f728018e664ed74.dl.dropboxusercontent.com/cd/0/inline2/CYNbMKpeTmC_in9_57ZDTlkiMBzRJiPbEXNEcIxLjRQJHTQEYhPcMmdqHcWdoP9Fxi1LYMKQDt1DUW1ZJYX1TxpLjIDxFyezLCprT2JfhkCROToyraIBrDpXPFgMEbBxNJIsBT1x70oL7BXSbW-pKomX6OKsy_nAP1B5jDVxhXOZtJwW8xFJwkvhNo71Aam2bT1wENAWKLdZOcVz4WRIdDI7e4Ri5FZ27Sjy2RCojgcFYusbpMWZFrxui-ssQzHsXvD1ZrZpKjyUXMIq_pdkbonY0V-8Iuq7PudclrjCIsDU2fD0bqo2MLdXw69PDLy2m5uVohTgcM0qCykha7dfGiP3BWfBpEM0PbmcfHx_IDqWDw/file Reusing existing connection to uc769a7232da4f728018e664ed74.dl.dropboxusercontent.com:443. HTTP request sent, awaiting response... 200 OK Length: 3233272 (3.1M) [application/pdf] Saving to: ‘./data/sf_budgets/2022 - 2022-AAO-FY2021-22-FY2022-23-FINAL-20210730.pdf’ ./data/sf_budgets/2 100%[===================>] 3.08M --.-KB/s in 0.05s 2024-08-07 18:21:28 (61.4 MB/s) - ‘./data/sf_budgets/2022 - 2022-AAO-FY2021-22-FY2022-23-FINAL-20210730.pdf’ saved [3233272/3233272] --2024-08-07 18:21:28-- https://www.dropbox.com/scl/fi/vip161t63s56vd94neqlt/2023-CSF_Proposed_Budget_Book_June_2023_Master_Web.pdf?rlkey=hemoce3w1jsuf6s2bz87g549i&dl=0 Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18, 2620:100:6057:18::a27d:d12 Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://uc3fa3b8bc2f92ed126eb3788d35.dl.dropboxusercontent.com/cd/0/inline/CYOKIz5n4gWk1Ywf1Ovmc-Dua40rRvPhK4YtffCdTlHM3tOiFbzgN6pyDNBx0vNo5fnHFEr5ilQwYHekMrlKykqII8thu9wiDbfAifKojwVXbgxJ1-Bqz6GXkPlLPp4rXkw/file# [following] --2024-08-07 18:21:29-- https://uc3fa3b8bc2f92ed126eb3788d35.dl.dropboxusercontent.com/cd/0/inline/CYOKIz5n4gWk1Ywf1Ovmc-Dua40rRvPhK4YtffCdTlHM3tOiFbzgN6pyDNBx0vNo5fnHFEr5ilQwYHekMrlKykqII8thu9wiDbfAifKojwVXbgxJ1-Bqz6GXkPlLPp4rXkw/file Resolving uc3fa3b8bc2f92ed126eb3788d35.dl.dropboxusercontent.com (uc3fa3b8bc2f92ed126eb3788d35.dl.dropboxusercontent.com)... 162.125.1.15, 2620:100:6016:15::a27d:10f Connecting to uc3fa3b8bc2f92ed126eb3788d35.dl.dropboxusercontent.com (uc3fa3b8bc2f92ed126eb3788d35.dl.dropboxusercontent.com)|162.125.1.15|:443... connected. HTTP request sent, awaiting response... 302 Found Location: /cd/0/inline2/CYOKgVW-_SqOvVicBez1JsKaYs81mU1xzB4gynkTKGfcI9xEPnjv2pLp8NTtEuaREbjOoLQBNeBO9bLhjMMPubNVHYnWl8KSMk_nJ4WNWlIlK0UjNllsYqOzvtAD6gSDFlYt21i_WaYBOFR6wjOI4ZM69i6uREONYUBODDZ_tfdcbv5rfX87wGP8eZ47KeO9nBUwvpMNhj9Tby7bBuI0qVaIrjREqzYMap1VNN68SXOoDJbF2bdCS6O55U2vL9CvSXjuehi-fWcaEKisFhQCIGT-PyzNY1F2Vd3zl5DH-aqeEInObuL26LGOgAIEbU6c0PHHq10-GKWo40fv2ECnrTxXLD89T5dhJQJ9mCamCA_COg/file [following] --2024-08-07 18:21:30-- https://uc3fa3b8bc2f92ed126eb3788d35.dl.dropboxusercontent.com/cd/0/inline2/CYOKgVW-_SqOvVicBez1JsKaYs81mU1xzB4gynkTKGfcI9xEPnjv2pLp8NTtEuaREbjOoLQBNeBO9bLhjMMPubNVHYnWl8KSMk_nJ4WNWlIlK0UjNllsYqOzvtAD6gSDFlYt21i_WaYBOFR6wjOI4ZM69i6uREONYUBODDZ_tfdcbv5rfX87wGP8eZ47KeO9nBUwvpMNhj9Tby7bBuI0qVaIrjREqzYMap1VNN68SXOoDJbF2bdCS6O55U2vL9CvSXjuehi-fWcaEKisFhQCIGT-PyzNY1F2Vd3zl5DH-aqeEInObuL26LGOgAIEbU6c0PHHq10-GKWo40fv2ECnrTxXLD89T5dhJQJ9mCamCA_COg/file Reusing existing connection to uc3fa3b8bc2f92ed126eb3788d35.dl.dropboxusercontent.com:443. HTTP request sent, awaiting response... 200 OK Length: 10550407 (10M) [application/pdf] Saving to: ‘./data/sf_budgets/2023 - 2023-CSF_Proposed_Budget_Book_June_2023_Master_Web.pdf’ ./data/sf_budgets/2 100%[===================>] 10.06M --.-KB/s in 0.09s 2024-08-07 18:21:31 (110 MB/s) - ‘./data/sf_budgets/2023 - 2023-CSF_Proposed_Budget_Book_June_2023_Master_Web.pdf’ saved [10550407/10550407]
加载数据并运行工作流¶
与使用内置的子问题查询引擎类似,我们创建查询工具并实例化一个LLM(大语言模型),然后将它们传入。
每个工具都是基于单个(非常冗长的)旧金山预算文档构建的独立查询引擎,每份文档都超过300页。为了避免重复运行时耗费时间,我们将生成的索引持久化存储到磁盘。
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get("openai-key")
folder = "./data/sf_budgets/"
files = os.listdir(folder)
query_engine_tools = []
for file in files:
year = file.split(" - ")[0]
index_persist_path = f"./storage/budget-{year}/"
if os.path.exists(index_persist_path):
storage_context = StorageContext.from_defaults(
persist_dir=index_persist_path
)
index = load_index_from_storage(storage_context)
else:
documents = SimpleDirectoryReader(
input_files=[folder + file]
).load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(index_persist_path)
engine = index.as_query_engine()
query_engine_tools.append(
QueryEngineTool(
query_engine=engine,
metadata=ToolMetadata(
name=f"budget_{year}",
description=f"Information about San Francisco's budget in {year}",
),
)
)
engine = SubQuestionQueryEngine(timeout=120, verbose=True)
llm = OpenAI(model="gpt-4o")
result = await engine.run(
llm=llm,
tools=query_engine_tools,
query="How has the total amount of San Francisco's budget changed from 2016 to 2023?",
)
print(result)
Running step query Query is How has the total amount of San Francisco's budget changed from 2016 to 2023? Sub-questions are { "sub_questions": [ "What was the total amount of San Francisco's budget in 2016?", "What was the total amount of San Francisco's budget in 2017?", "What was the total amount of San Francisco's budget in 2018?", "What was the total amount of San Francisco's budget in 2019?", "What was the total amount of San Francisco's budget in 2020?", "What was the total amount of San Francisco's budget in 2021?", "What was the total amount of San Francisco's budget in 2022?", "What was the total amount of San Francisco's budget in 2023?" ] } Step query produced no event Running step sub_question Sub-question is What was the total amount of San Francisco's budget in 2016? > Running step 61365946-614c-4895-8fc3-0968f2d63387. Step input: What was the total amount of San Francisco's budget in 2016? Thought: The current language of the user is English. I need to use a tool to help me answer the question. Action: budget_2016 Action Input: {'input': "total amount of San Francisco's budget in 2016"} Observation: The total amount of San Francisco's budget in 2016 was $9.6 billion. > Running step a85aa30e-a980-4897-a52e-e82b8fb25c72. Step input: None Thought: I can answer without using any more tools. I'll use the user's language to answer. Answer: The total amount of San Francisco's budget in 2016 was $9.6 billion. Step sub_question produced event AnswerEvent Running step sub_question Sub-question is What was the total amount of San Francisco's budget in 2017? > Running step 5d14466c-1400-4a26-ac42-021e7143d3b1. Step input: What was the total amount of San Francisco's budget in 2017? Thought: The current language of the user is English. I need to use a tool to help me answer the question. Action: budget_2017 Action Input: {'input': "total amount of San Francisco's budget in 2017"} Observation: $10,106.9 million > Running step 586a5fab-95ee-44e9-9a35-fcf19993b13e. Step input: None Thought: I have the information needed to answer the question. Answer: The total amount of San Francisco's budget in 2017 was $10,106.9 million. Step sub_question produced event AnswerEvent Running step sub_question Sub-question is What was the total amount of San Francisco's budget in 2018? > Running step d39f64d0-65f6-4571-95ad-d28a16198ea5. Step input: What was the total amount of San Francisco's budget in 2018? Thought: The current language of the user is English. I need to use a tool to help me answer the question. Action: budget_2018 Action Input: {'input': "total amount of San Francisco's budget in 2018"} Observation: The total amount of San Francisco's budget in 2018 was $12,659,306,000. > Running step 3f67feee-489c-4b9e-8f27-37f0d48e3b0d. Step input: None Thought: I can answer without using any more tools. I'll use the user's language to answer. Answer: The total amount of San Francisco's budget in 2018 was $12,659,306,000. Step sub_question produced event AnswerEvent Running step sub_question Sub-question is What was the total amount of San Francisco's budget in 2019? > Running step d5ac0866-b02c-4c4c-94c6-f0e047ebb0fe. Step input: What was the total amount of San Francisco's budget in 2019? Thought: The current language of the user is English. I need to use a tool to help me answer the question. Action: budget_2019 Action Input: {'input': "total amount of San Francisco's budget in 2019"} Observation: $12.3 billion > Running step 3b62859b-bbd3-4dce-b284-f9b398e370c2. Step input: None Thought: I can answer without using any more tools. I'll use the user's language to answer. Answer: The total amount of San Francisco's budget in 2019 was $12.3 billion. Step sub_question produced event AnswerEvent Running step sub_question Sub-question is What was the total amount of San Francisco's budget in 2020? > Running step 41f6ed9f-d695-43df-8743-39dfcc3d919d. Step input: What was the total amount of San Francisco's budget in 2020? Thought: The user is asking for the total amount of San Francisco's budget in 2020. I do not have a tool specifically for the 2020 budget. I will check the available tools to see if they provide any relevant information or if I can infer the 2020 budget from adjacent years. Action: budget_2021 Action Input: {'input': "What was the total amount of San Francisco's budget in 2020?"} Observation: The total amount of San Francisco's budget in 2020 was $15,373,192 (in thousands of dollars). > Running step ea39f7c6-e942-4a41-8963-f37d4a27d559. Step input: None Thought: I now have the information needed to answer the user's question about the total amount of San Francisco's budget in 2020. Answer: The total amount of San Francisco's budget in 2020 was $15,373,192 (in thousands of dollars). Step sub_question produced event AnswerEvent Running step sub_question Sub-question is What was the total amount of San Francisco's budget in 2021? > Running step 6662fd06-e86a-407c-bb89-4828f63caa72. Step input: What was the total amount of San Francisco's budget in 2021? Thought: The current language of the user is English. I need to use a tool to help me answer the question. Action: budget_2021 Action Input: {'input': "total amount of San Francisco's budget in 2021"} Observation: The total amount of San Francisco's budget in 2021 is $14,166,496,000. > Running step 5d0cf9da-2c14-407c-8ae5-5cd638c1fb5c. Step input: None Thought: I can answer without using any more tools. I'll use the user's language to answer. Answer: The total amount of San Francisco's budget in 2021 was $14,166,496,000. Step sub_question produced event AnswerEvent Running step sub_question Sub-question is What was the total amount of San Francisco's budget in 2022? > Running step 62fa9a5f-f40b-489d-9773-5ee4e4eaba9e. Step input: What was the total amount of San Francisco's budget in 2022? Thought: The current language of the user is English. I need to use a tool to help me answer the question. Action: budget_2022 Action Input: {'input': "total amount of San Francisco's budget in 2022"} Observation: $14,550,060 > Running step 7a2d5623-cc75-4c9d-8c58-3dbc2faf163d. Step input: None Thought: I can answer without using any more tools. I'll use the user's language to answer. Answer: The total amount of San Francisco's budget in 2022 was $14,550,060. Step sub_question produced event AnswerEvent Running step sub_question Sub-question is What was the total amount of San Francisco's budget in 2023? > Running step 839eb994-b4e2-4019-a170-9471c1e0d764. Step input: What was the total amount of San Francisco's budget in 2023? Thought: The current language of the user is: English. I need to use a tool to help me answer the question. Action: budget_2023 Action Input: {'input': "total amount of San Francisco's budget in 2023"} Observation: $14.6 billion > Running step 38779f6c-d0c7-4c95-b5c4-0b170c5ed0d5. Step input: None Thought: I can answer without using any more tools. I'll use the user's language to answer. Answer: The total amount of San Francisco's budget in 2023 was $14.6 billion. Step sub_question produced event AnswerEvent Running step combine_answers Step combine_answers produced no event Running step combine_answers Step combine_answers produced no event Running step combine_answers Step combine_answers produced no event Running step combine_answers Step combine_answers produced no event Running step combine_answers Step combine_answers produced no event Running step combine_answers Step combine_answers produced no event Running step combine_answers Step combine_answers produced no event Running step combine_answers Final prompt is You are given an overall question that has been split into sub-questions, each of which has been answered. Combine the answers to all the sub-questions into a single answer to the original question. Original question: How has the total amount of San Francisco's budget changed from 2016 to 2023? Sub-questions and answers: Question: What was the total amount of San Francisco's budget in 2016?: Answer: The total amount of San Francisco's budget in 2016 was $9.6 billion. Question: What was the total amount of San Francisco's budget in 2017?: Answer: The total amount of San Francisco's budget in 2017 was $10,106.9 million. Question: What was the total amount of San Francisco's budget in 2018?: Answer: The total amount of San Francisco's budget in 2018 was $12,659,306,000. Question: What was the total amount of San Francisco's budget in 2019?: Answer: The total amount of San Francisco's budget in 2019 was $12.3 billion. Question: What was the total amount of San Francisco's budget in 2020?: Answer: The total amount of San Francisco's budget in 2020 was $15,373,192 (in thousands of dollars). Question: What was the total amount of San Francisco's budget in 2021?: Answer: The total amount of San Francisco's budget in 2021 was $14,166,496,000. Question: What was the total amount of San Francisco's budget in 2022?: Answer: The total amount of San Francisco's budget in 2022 was $14,550,060. Question: What was the total amount of San Francisco's budget in 2023?: Answer: The total amount of San Francisco's budget in 2023 was $14.6 billion. Final response is From 2016 to 2023, the total amount of San Francisco's budget has seen significant changes. In 2016, the budget was $9.6 billion. It increased to $10,106.9 million in 2017 and further to $12,659,306,000 in 2018. In 2019, the budget was $12.3 billion. The budget saw a substantial rise in 2020, reaching $15,373,192 (in thousands of dollars), which translates to approximately $15.4 billion. In 2021, the budget was $14,166,496,000, and in 2022, it was $14,550,060. By 2023, the budget had increased to $14.6 billion. Overall, from 2016 to 2023, San Francisco's budget grew from $9.6 billion to $14.6 billion. Step combine_answers produced event StopEvent From 2016 to 2023, the total amount of San Francisco's budget has seen significant changes. In 2016, the budget was $9.6 billion. It increased to $10,106.9 million in 2017 and further to $12,659,306,000 in 2018. In 2019, the budget was $12.3 billion. The budget saw a substantial rise in 2020, reaching $15,373,192 (in thousands of dollars), which translates to approximately $15.4 billion. In 2021, the budget was $14,166,496,000, and in 2022, it was $14,550,060. By 2023, the budget had increased to $14.6 billion. Overall, from 2016 to 2023, San Francisco's budget grew from $9.6 billion to $14.6 billion.
我们的调试输出内容非常详细!你可以看到子问题的生成过程,随后反复调用 sub_question()
函数,每次调用都会生成一份简短的 ReAct 代理思考与行动日志来回答每个较小的子问题。
你能观察到 combine_answers
会多次执行——这些是由每个 AnswerEvent
触发但在收集完所有 8 个 AnswerEvent
之前发生的。在最后一次运行时,它会生成完整提示,整合所有答案并返回最终结果。