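Guidance for the Sub-Question Query Engine
In this notebook, we show how to use guidance to make the sub-question query engine more robust.
The sub-question query engine is designed to work with swappable question generators that implement the BaseQuestionGenerator interface.
To take advantage of guidance, we implemented a new GuidanceQuestionGenerator (powered by GuidancePydanticProgram).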
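To illustrate what a swappable question generator means in practice, here is a minimal standard-library sketch of the pattern. The `QuestionGenerator` protocol, the `Tool` and `SubQuestion` dataclasses, and `PerToolQuestionGenerator` are simplified, hypothetical stand-ins, not LlamaIndex's actual classes; the real BaseQuestionGenerator may define additional methods.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass
class Tool:
    """Hypothetical stand-in for a tool's metadata."""

    name: str
    description: str


@dataclass
class SubQuestion:
    """Hypothetical stand-in for a generated sub-question."""

    sub_question: str
    tool_name: str


class QuestionGenerator(Protocol):
    """Anything with a matching generate() method can be plugged in."""

    def generate(self, tools: Sequence[Tool], query: str) -> List[SubQuestion]:
        ...


class PerToolQuestionGenerator:
    """Toy generator: asks the original query once per tool."""

    def generate(self, tools: Sequence[Tool], query: str) -> List[SubQuestion]:
        return [SubQuestion(sub_question=query, tool_name=t.name) for t in tools]


def decompose(
    generator: QuestionGenerator, tools: Sequence[Tool], query: str
) -> List[SubQuestion]:
    # The caller depends only on the interface, so generators are swappable.
    return generator.generate(tools, query)


demo_tools = [
    Tool("lyft_10k", "Lyft financials for 2021"),
    Tool("uber_10k", "Uber financials for 2021"),
]
demo_subs = decompose(
    PerToolQuestionGenerator(), demo_tools, "What was revenue in 2021?"
)
```

Because `decompose` only relies on the `generate` method, a different generator could be passed in without changing the calling code.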
Guidance Question Generator
Unlike the default LLMQuestionGenerator, guidance guarantees that we get the desired structured output and eliminates output parsing errors.
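The kind of failure that constrained generation is meant to avoid can be sketched with the standard library alone. The `parse_sub_questions` helper and the sample strings below are hypothetical and do not reflect how LLMQuestionGenerator or guidance handle output internally; they only show why free-form model text can break a downstream parser.

```python
import json
from typing import Dict, List


def parse_sub_questions(raw: str, tool_names: List[str]) -> List[Dict[str, str]]:
    """Parse a JSON list of sub-questions, rejecting malformed items."""
    items = json.loads(raw)  # raises json.JSONDecodeError on malformed text
    if not isinstance(items, list):
        raise ValueError("expected a JSON list")
    for item in items:
        if not isinstance(item, dict) or set(item) != {"sub_question", "tool_name"}:
            raise ValueError(f"unexpected item: {item!r}")
        if item["tool_name"] not in tool_names:
            raise ValueError(f"unknown tool: {item['tool_name']!r}")
    return items


tool_names = ["lyft_10k", "uber_10k"]

# Output that happens to follow the expected schema parses cleanly.
well_formed = '[{"sub_question": "What is the revenue of Uber", "tool_name": "uber_10k"}]'
parsed = parse_sub_questions(well_formed, tool_names)

# Free-form output: a chatty preamble and a trailing comma break the parser.
malformed = 'Sure! Here you go: [{"sub_question": "What is the revenue of Uber",}]'
try:
    parse_sub_questions(malformed, tool_names)
    parse_failed = False
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    parse_failed = True
```

When the output is constrained to the schema during generation, the second case is not expected to arise in the first place.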
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
%pip install llama-index-question-gen-guidance
In [ ]:
!pip install llama-index
In [ ]:
from llama_index.question_gen.guidance import GuidanceQuestionGenerator
from guidance.llms import OpenAI as GuidanceOpenAI
In [ ]:
# Note: OpenAI has since retired text-davinci-003; substitute a model
# that is available to your account.
question_gen = GuidanceQuestionGenerator.from_defaults(
    guidance_llm=GuidanceOpenAI("text-davinci-003"), verbose=False
)
Let's test it out!
In [ ]:
from llama_index.core.tools import ToolMetadata
from llama_index.core import QueryBundle
In [ ]:
tools = [
    ToolMetadata(
        name="lyft_10k",
        description="Provides information about Lyft financials for year 2021",
    ),
    ToolMetadata(
        name="uber_10k",
        description="Provides information about Uber financials for year 2021",
    ),
]
In [ ]:
sub_questions = question_gen.generate(
    tools=tools,
    query=QueryBundle("Compare and contrast Uber and Lyft financial in 2021"),
)
In [ ]:
sub_questions
Out[ ]:
[SubQuestion(sub_question='What is the revenue of Uber', tool_name='uber_10k'), SubQuestion(sub_question='What is the EBITDA of Uber', tool_name='uber_10k'), SubQuestion(sub_question='What is the net income of Uber', tool_name='uber_10k'), SubQuestion(sub_question='What is the revenue of Lyft', tool_name='lyft_10k'), SubQuestion(sub_question='What is the EBITDA of Lyft', tool_name='lyft_10k'), SubQuestion(sub_question='What is the net income of Lyft', tool_name='lyft_10k')]
Using the Guidance Question Generator with the Sub-Question Query Engine
Prepare data and base query engines
In [ ]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_response
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
Download the data
In [ ]:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'
In [ ]:
lyft_docs = SimpleDirectoryReader(
    input_files=["./data/10k/lyft_2021.pdf"]
).load_data()
uber_docs = SimpleDirectoryReader(
    input_files=["./data/10k/uber_2021.pdf"]
).load_data()
In [ ]:
lyft_index = VectorStoreIndex.from_documents(lyft_docs)
uber_index = VectorStoreIndex.from_documents(uber_docs)
In [ ]:
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)
Build the sub-question query engine and run a query!
In [ ]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021"
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021"
            ),
        ),
    ),
]
s_engine = SubQuestionQueryEngine.from_defaults(
    question_gen=question_gen,  # use guidance based question_gen defined above
    query_engine_tools=query_engine_tools,
)
In [ ]:
response = s_engine.query(
    "Compare and contrast the customer segments and geographies that grew the"
    " fastest"
)
Generated 4 sub questions.
[uber_10k] Q: What customer segments grew the fastest for Uber
[uber_10k] A: in 2021? The customer segments that grew the fastest for Uber in 2021 were its Mobility Drivers, Couriers, Riders, and Eaters. These segments experienced growth due to the continued stay-at-home order demand related to COVID-19, as well as Uber's membership programs, such as Uber One, Uber Pass, Eats Pass, and Rides Pass. Additionally, Uber's marketplace-centric advertising helped to connect merchants and brands with its platform network, further driving growth.
[uber_10k] Q: What geographies grew the fastest for Uber
[uber_10k] A: Based on the context information, it appears that Uber experienced the most growth in large metropolitan areas, such as Chicago, Miami, New York City, Sao Paulo, and London. Additionally, Uber experienced growth in suburban and rural areas, as well as in countries such as Argentina, Germany, Italy, Japan, South Korea, and Spain.
[lyft_10k] Q: What customer segments grew the fastest for Lyft
[lyft_10k] A: The customer segments that grew the fastest for Lyft were ridesharing, light vehicles, and public transit. Ridesharing grew as Lyft was able to predict demand and proactively incentivize drivers to be available for rides in the right place at the right time. Light vehicles grew as users were looking for options that were more active, usually lower-priced, and often more efficient for short trips during heavy traffic. Public transit grew as Lyft integrated third-party public transit data into the Lyft App to offer users a robust view of transportation options around them.
[lyft_10k] Q: What geographies grew the fastest for Lyft
[lyft_10k] A: It is not possible to answer this question with the given context information.
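The log above suggests the overall flow: each sub-question is sent to the tool it names, and the answers are collected for a final synthesis step. A rough standard-library sketch of that routing follows. The `route` function and the lambda "engines" are hypothetical placeholders, not SubQuestionQueryEngine's actual implementation, and the final synthesis step is left out.

```python
from typing import Callable, Dict, List, Tuple

# Each "engine" here is just a callable from question text to answer text.
Engine = Callable[[str], str]


def route(
    sub_questions: List[Tuple[str, str]], engines: Dict[str, Engine]
) -> List[Tuple[str, str, str]]:
    """Send each (question, tool_name) pair to its engine and collect answers."""
    results = []
    for question, tool_name in sub_questions:
        answer = engines[tool_name](question)  # KeyError if the tool is unknown
        results.append((tool_name, question, answer))
    return results


# Placeholder engines standing in for the per-document query engines.
engines = {
    "uber_10k": lambda q: f"(answer from Uber filing to: {q})",
    "lyft_10k": lambda q: f"(answer from Lyft filing to: {q})",
}
sub_qs = [
    ("What customer segments grew the fastest for Uber", "uber_10k"),
    ("What customer segments grew the fastest for Lyft", "lyft_10k"),
]
for tool_name, question, answer in route(sub_qs, engines):
    print(f"[{tool_name}] Q: {question}")
    print(f"[{tool_name}] A: {answer}")
```

This also shows why a valid `tool_name` matters: a sub-question that names a tool not present in the mapping cannot be routed at all.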
In [ ]:
print(response)
The customer segments that grew the fastest for Uber in 2021 were its Mobility Drivers, Couriers, Riders, and Eaters. These segments experienced growth due to the continued stay-at-home order demand related to COVID-19, as well as Uber's membership programs, such as Uber One, Uber Pass, Eats Pass, and Rides Pass. Additionally, Uber's marketplace-centric advertising helped to connect merchants and brands with its platform network, further driving growth. Uber experienced the most growth in large metropolitan areas, such as Chicago, Miami, New York City, Sao Paulo, and London. Additionally, Uber experienced growth in suburban and rural areas, as well as in countries such as Argentina, Germany, Italy, Japan, South Korea, and Spain. The customer segments that grew the fastest for Lyft were ridesharing, light vehicles, and public transit. Ridesharing grew as Lyft was able to predict demand and proactively incentivize drivers to be available for rides in the right place at the right time. Light vehicles grew as users were looking for options that were more active, usually lower-priced, and often more efficient for short trips during heavy traffic. Public transit grew as Lyft integrated third-party public transit data into the Lyft App to offer users a robust view of transportation options around them. It is not possible to answer the question of which geographies grew the fastest for Lyft with the given context information. In summary, Uber and Lyft both experienced growth in customer segments related to their respective services, such as Mobility Drivers, Couriers, Riders, and Eaters for Uber, and ridesharing, light vehicles, and public transit for Lyft. Uber experienced the most growth in large metropolitan areas, as well as in suburban and rural areas, and in countries such as Argentina, Germany, Italy, Japan, South Korea, and Spain. It is not possible to answer the question of which geographies grew the fastest for Lyft with the given context information.