结构化LLM重排序器演示(2021年Lyft 10-k年报)¶
本教程展示如何进行两阶段检索。首先使用基于嵌入的检索并设置较高的top-k值,以最大化召回率并获取大量候选项目。随后利用基于LLM的检索,通过结构化输出动态选择真正与查询相关的节点。
当您使用的模型支持函数调用功能时,建议优先选用StructuredLLMReranker
而非LLMReranker
。该类别将充分利用模型的结构化输出能力,而非依赖提示模型按特定格式对节点进行排序。
In [ ]:
Copied!
%pip install llama-index-llms-openai
%pip install llama-index-llms-openai
In [ ]:
Copied!
import nest_asyncio
nest_asyncio.apply()
import nest_asyncio
nest_asyncio.apply()
In [ ]:
Copied!
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.postprocessor import StructuredLLMRerank
from llama_index.llms.openai import OpenAI
from IPython.display import Markdown, display
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.postprocessor import StructuredLLMRerank
from llama_index.llms.openai import OpenAI
from IPython.display import Markdown, display
下载数据¶
In [ ]:
Copied!
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'
--2025-03-20 15:13:23-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 1440303 (1.4M) [application/octet-stream] Saving to: ‘data/10k/lyft_2021.pdf’ data/10k/lyft_2021. 100%[===================>] 1.37M --.-KB/s in 0.06s 2025-03-20 15:13:24 (23.9 MB/s) - ‘data/10k/lyft_2021.pdf’ saved [1440303/1440303]
加载数据并构建索引¶
In [ ]:
Copied!
from llama_index.core import Settings
# LLM (gpt-4o-mini)
Settings.llm = OpenAI(temperature=0, model="gpt-4o-mini")
Settings.chunk_overlap = 0
Settings.chunk_size = 128
from llama_index.core import Settings
# LLM (gpt-4o-mini)
Settings.llm = OpenAI(temperature=0, model="gpt-4o-mini")
Settings.chunk_overlap = 0
Settings.chunk_size = 128
In [ ]:
Copied!
# load documents
documents = SimpleDirectoryReader(
input_files=["./data/10k/lyft_2021.pdf"]
).load_data()
# load documents
documents = SimpleDirectoryReader(
input_files=["./data/10k/lyft_2021.pdf"]
).load_data()
In [ ]:
Copied!
index = VectorStoreIndex.from_documents(
documents,
)
index = VectorStoreIndex.from_documents(
documents,
)
检索方式对比¶
In [ ]:
Copied!
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core import QueryBundle
import pandas as pd
from IPython.display import display, HTML
from copy import deepcopy
def get_retrieved_nodes(
query_str, vector_top_k=10, reranker_top_n=3, with_reranker=False
):
query_bundle = QueryBundle(query_str)
# configure retriever
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=vector_top_k,
)
retrieved_nodes = retriever.retrieve(query_bundle)
if with_reranker:
# configure reranker
reranker = StructuredLLMRerank(
choice_batch_size=5,
top_n=reranker_top_n,
)
retrieved_nodes = reranker.postprocess_nodes(
retrieved_nodes, query_bundle
)
return retrieved_nodes
def pretty_print(df):
return display(HTML(df.to_html().replace("\\n", "<br>")))
def visualize_retrieved_nodes(nodes) -> None:
result_dicts = []
for node in nodes:
node = deepcopy(node)
node.node.metadata = {}
node_text = node.node.get_text()
node_text = node_text.replace("\n", " ")
result_dict = {"Score": node.score, "Text": node_text}
result_dicts.append(result_dict)
pretty_print(pd.DataFrame(result_dicts))
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core import QueryBundle
import pandas as pd
from IPython.display import display, HTML
from copy import deepcopy
def get_retrieved_nodes(
query_str, vector_top_k=10, reranker_top_n=3, with_reranker=False
):
query_bundle = QueryBundle(query_str)
# configure retriever
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=vector_top_k,
)
retrieved_nodes = retriever.retrieve(query_bundle)
if with_reranker:
# configure reranker
reranker = StructuredLLMRerank(
choice_batch_size=5,
top_n=reranker_top_n,
)
retrieved_nodes = reranker.postprocess_nodes(
retrieved_nodes, query_bundle
)
return retrieved_nodes
def pretty_print(df):
return display(HTML(df.to_html().replace("\\n", "
"))) def visualize_retrieved_nodes(nodes) -> None: result_dicts = [] for node in nodes: node = deepcopy(node) node.node.metadata = {} node_text = node.node.get_text() node_text = node_text.replace("\n", " ") result_dict = {"Score": node.score, "Text": node_text} result_dicts.append(result_dict) pretty_print(pd.DataFrame(result_dicts))
"))) def visualize_retrieved_nodes(nodes) -> None: result_dicts = [] for node in nodes: node = deepcopy(node) node.node.metadata = {} node_text = node.node.get_text() node_text = node_text.replace("\n", " ") result_dict = {"Score": node.score, "Text": node_text} result_dicts.append(result_dict) pretty_print(pd.DataFrame(result_dicts))
In [ ]:
Copied!
new_nodes = get_retrieved_nodes(
"What is Lyft's response to COVID-19?", vector_top_k=5, with_reranker=False
)
new_nodes = get_retrieved_nodes(
"What is Lyft's response to COVID-19?", vector_top_k=5, with_reranker=False
)
In [ ]:
Copied!
visualize_retrieved_nodes(new_nodes)
visualize_retrieved_nodes(new_nodes)
Score | Text | |
---|---|---|
0 | 0.870327 | Further, COVID-19 has and may continue to negatively impact Lyft’s ability to conduct rental operationsthrough the Express Drive program and Lyft Rentals as a result of restrictions on travel, mandated closures, limited staffing availability, and other factors relatedto COVID-19. For example, in 2020, Lyft Rentals temporarily ceased operations, closing its rental locations, as a result of COVID-19. |
1 | 0.858815 | The Company has adopted a number of measures in response to the COVID-19 pandemic including, but not limited to, establishing new health and safetyrequirements for ridesharing and updating workplace policies. The Company also made adjustments to its expenses and cash flow to correlate with declines in revenuesincluding headcount reductions in 2020. Refer to Note 17 “Restructuring” to the consolidated financial statements for information regarding the 2020 restructuring events. |
2 | 0.857701 | •The responsive measures to the COVID-19 pandemic have caused us to modify our business practices by permitting corporate employees in nearly all of ourlocations to work remotely, limiting employee travel, and canceling, postponing or holding virtual events and meetings. |
3 | 0.855108 | The strength and duration ofthese challenges cannot be presently estimated.In response to the COVID-19 pandemic, we have adopted multiple measures, including, but not limited, to establishing new health and safety requirements forridesharing and updating workplace policies. We also made adjustments to our expenses and cash flow to correlate with declines in revenues including headcountreductions in 2020. |
4 | 0.854779 | In 2020, Flexdrive also began to waive rental fees for drivers who are confirmed to have testedpositive for COVID-19 or requested to quarantine by a medical professional, which it continues to do at this time. Further, Lyft Rentals and Flexdrive have facedsignificantly higher costs in transporting, repossessing, cleaning, and17 |
In [ ]:
Copied!
new_nodes = get_retrieved_nodes(
"What is Lyft's response to COVID-19?",
vector_top_k=20,
reranker_top_n=5,
with_reranker=True,
)
new_nodes = get_retrieved_nodes(
"What is Lyft's response to COVID-19?",
vector_top_k=20,
reranker_top_n=5,
with_reranker=True,
)
In [ ]:
Copied!
visualize_retrieved_nodes(new_nodes)
visualize_retrieved_nodes(new_nodes)
Score | Text | |
---|---|---|
0 | 10.0 | The Company has adopted a number of measures in response to the COVID-19 pandemic including, but not limited to, establishing new health and safetyrequirements for ridesharing and updating workplace policies. The Company also made adjustments to its expenses and cash flow to correlate with declines in revenuesincluding headcount reductions in 2020. Refer to Note 17 “Restructuring” to the consolidated financial statements for information regarding the 2020 restructuring events. |
1 | 10.0 | We have adopted several measures in response to the COVID-19 pandemic including, but not limited to, establishing new health and safety requirements forridesharing, and updating workplace policies. We also made adjustments to our expenses and cash flow to correlate with declines in revenues including the transaction withWoven Planet completed on July 13, 2021 and headcount reductions in 2020. |
2 | 10.0 | •manage our platform and our business assets and expenses in light of the COVID-19 pandemic and related public health measures issued by various jurisdictions,including travel bans, travel restrictions and shelter-in-place orders, as well as maintain demand for and confidence in the safety of our platform during andfollowing the COVID-19 pandemic;•plan for and manage capital expenditures for our current and future offerings, |
3 | 9.0 | The strength and duration ofthese challenges cannot be presently estimated.In response to the COVID-19 pandemic, we have adopted multiple measures, including, but not limited, to establishing new health and safety requirements forridesharing and updating workplace policies. We also made adjustments to our expenses and cash flow to correlate with declines in revenues including headcountreductions in 2020. |
4 | 9.0 | The strength and duration ofthese challenges cannot be presently estimated.In response to the COVID-19 pandemic, we have adopted multiple measures, including, but not limited, to establishing new health and safety requirements forridesharing and updating workplace policies. We also made adjustments to our expenses and cash flow to correlate with declines in revenues including headcountreductions in 2020.56 |
In [ ]:
Copied!
new_nodes = get_retrieved_nodes(
"What initiatives are the company focusing on independently of COVID-19?",
vector_top_k=5,
with_reranker=False,
)
new_nodes = get_retrieved_nodes(
"What initiatives are the company focusing on independently of COVID-19?",
vector_top_k=5,
with_reranker=False,
)
In [ ]:
Copied!
visualize_retrieved_nodes(new_nodes)
visualize_retrieved_nodes(new_nodes)
Score | Text | |
---|---|---|
0 | 0.813871 | •The responsive measures to the COVID-19 pandemic have caused us to modify our business practices by permitting corporate employees in nearly all of ourlocations to work remotely, limiting employee travel, and canceling, postponing or holding virtual events and meetings. |
1 | 0.810687 | •manage our platform and our business assets and expenses in light of the COVID-19 pandemic and related public health measures issued by various jurisdictions,including travel bans, travel restrictions and shelter-in-place orders, as well as maintain demand for and confidence in the safety of our platform during andfollowing the COVID-19 pandemic;•plan for and manage capital expenditures for our current and future offerings, |
2 | 0.809540 | The strength and duration ofthese challenges cannot be presently estimated.In response to the COVID-19 pandemic, we have adopted multiple measures, including, but not limited, to establishing new health and safety requirements forridesharing and updating workplace policies. We also made adjustments to our expenses and cash flow to correlate with declines in revenues including headcountreductions in 2020. |
3 | 0.806794 | the timing and extent of spending to support ourefforts to develop our platform, actual insurance payments for which we have made reserves, measures we take in response to the COVID-19 pandemic, our ability tomaintain demand for and confidence in the safety of our platform during and following the COVID-19 pandemic, and the expansion of sales and marketing activities. |
4 | 0.805533 | •anticipate and respond to macroeconomic changes and changes in the markets in which we operate;•maintain and enhance the value of our reputation and brand;•effectively manage our growth and business operations, including the impacts of the COVID-19 pandemic on our business;•successfully expand our geographic reach;•hire, integrate and retain talented people at all levels of our organization;•successfully develop new platform features, offerings and services to enhance the experience of users; and•right-size our real estate portfolio. |
In [ ]:
Copied!
new_nodes = get_retrieved_nodes(
"What initiatives are the company focusing on independently of COVID-19?",
vector_top_k=40,
reranker_top_n=5,
with_reranker=True,
)
new_nodes = get_retrieved_nodes(
"What initiatives are the company focusing on independently of COVID-19?",
vector_top_k=40,
reranker_top_n=5,
with_reranker=True,
)
In [ ]:
Copied!
visualize_retrieved_nodes(new_nodes)
visualize_retrieved_nodes(new_nodes)
Score | Text | |
---|---|---|
0 | 9.0 | Even as we invest in the business, we also remain focused on finding ways to operate more efficiently.To advance our mission, we aim to build the defining brand of our generation and to advocate through our commitment to social and environmental responsibility.We believe that our brand represents freedom at your fingertips: freedom from the stresses of car ownership and freedom to do and see more. |
1 | 8.0 | We have also invested in sales and marketing to grow our community,cultivate a differentiated brand that resonates with drivers and riders and promote further brand awareness. Together, these investments have enabled us to create a powerfulmultimodal platform and scaled user network.Notwithstanding the impact of COVID-19, we are continuing to invest in the future, both organically and through acquisitions of complementary businesses. |
2 | 8.0 | As a result, we may introduce significantchanges to our existing offerings or develop and introduce new and unproven offerings. For example, in April 2020, we began piloting a delivery service platform inresponse to the COVID-19 pandemic. |
3 | 6.0 | •anticipate and respond to macroeconomic changes and changes in the markets in which we operate;•maintain and enhance the value of our reputation and brand;•effectively manage our growth and business operations, including the impacts of the COVID-19 pandemic on our business;•successfully expand our geographic reach;•hire, integrate and retain talented people at all levels of our organization;•successfully develop new platform features, offerings and services to enhance the experience of users; and•right-size our real estate portfolio. |
4 | 6.0 | has been critical to our success. We face a number ofchallenges that may affect our ability to sustain our corporate culture, including:•failure to identify, attract, reward and retain people in leadership positions in our organization who share and further our culture, values and mission;•the increasing size and geographic diversity of our workforce;•shelter-in-place orders in certain jurisdictions where we operate that have required many of our employees to work remotely, |