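Zyte Serp Reader
Zyte Serp Reader lets you retrieve Google's organic search results. Given a query string, it returns the URLs of the top search results along with their associated text.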
In [ ]:
# %pip install llama-index llama-index-readers-zyte-serp
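In this notebook, we show how the Zyte Serp Reader, combined with the web reader, can be used to collect information on a particular topic. We can then run queries about that topic over the resulting documents.
The Irish government recently announced its Budget 2025, and here we show how to query information about that budget. First we use the Zyte Serp Reader to find relevant pages. Then the web reader extracts the content from those URLs. Finally, an OpenAI model answers the query.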
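The workflow described above has three stages: search, fetch, and answer. The minimal sketch below shows that structure in plain Python. Each stage is an ordinary function, so the real readers or simple stubs can be plugged in. The function name run_pipeline and the stub stages are hypothetical and are not part of llama-index. For simplicity, the fetch stage returns plain strings rather than llama-index Document objects.

```python
from typing import Callable, List

# Hypothetical stage signatures (assumption: plain strings instead of Documents).
Search = Callable[[str], List[str]]        # topic -> list of URLs
Fetch = Callable[[List[str]], List[str]]   # URLs -> page texts
Answer = Callable[[List[str], str], str]   # (texts, question) -> answer


def run_pipeline(topic: str, question: str,
                 search: Search, fetch: Fetch, answer: Answer) -> str:
    """Search for a topic, fetch the result pages, then answer a question."""
    urls = search(topic)            # step 1: search results -> URLs
    texts = fetch(urls)             # step 2: URLs -> page content
    return answer(texts, question)  # step 3: content + question -> answer


# Usage with stub stages standing in for the readers and the LLM.
fake_search = lambda topic: ["https://example.com/a", "https://example.com/b"]
fake_fetch = lambda urls: [f"text of {u}" for u in urls]
fake_answer = lambda texts, q: f"{len(texts)} docs consulted for: {q}"
print(run_pipeline("Ireland Budget 2025", "What energy credits?",
                   fake_search, fake_fetch, fake_answer))
```

Keeping the stages separate lets you test the orchestration without network access or API keys.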
In [ ]:
import os
from llama_index.readers.zyte_serp import ZyteSerpReader
from llama_index.readers.web.zyte_web.base import ZyteWebReader
In [ ]:
# Needed when running this in a Jupyter notebook
# import nest_asyncio
# nest_asyncio.apply()
In [ ]:
zyte_api_key = os.environ.get("ZYTE_API_KEY")
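If ZYTE_API_KEY is unset, os.environ.get returns None, and the problem may only show up at the first API call. The small sketch below fails early with a clear message instead. The helper require_env is hypothetical and not part of llama-index.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, raising if it is missing or empty."""
    value = os.environ.get(name)  # None when the variable is not set
    if not value:
        raise RuntimeError(f"Please set the {name} environment variable.")
    return value


# Usage (hypothetical helper):
# zyte_api_key = require_env("ZYTE_API_KEY")
```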
Get relevant resources (using ZyteSerp)
Given a topic, we use Google search results to get links to relevant pages.
In [ ]:
topic = "Ireland Budget 2025"
In [ ]:
serp_reader = ZyteSerpReader(api_key=zyte_api_key)
In [ ]:
search_results = serp_reader.load_data(topic)
In [ ]:
len(search_results)
Out[ ]:
7
In [ ]:
for r in search_results[:4]:
    print(r.text)
    print(r.metadata)
https://www.gov.ie/en/publication/e8315-budget-2025/
{'name': 'Budget 2025', 'rank': 1}
https://www.citizensinformation.ie/en/money-and-tax/budgets/budget-2025/
{'name': 'Budget 2025', 'rank': 2}
https://www.gov.ie/en/publication/cb193-your-guide-to-budget-2025/
{'name': 'Your guide to Budget 2025', 'rank': 3}
https://www.irishtimes.com/your-money/2024/10/01/budget-2025-ireland-main-points/
{'name': 'Budget 2025 main points: Energy credits, bonus welfare ...', 'rank': 4}
In [ ]:
urls = [r.text for r in search_results]
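We now have a list of URLs related to the topic ("Ireland Budget 2025"). The metadata gives the title ('name') and the rank of each search result. Next, we use the web reader to fetch the content of these pages.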
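The notebook fetches every returned URL. If you only want the best-ranked few, the 'rank' metadata shown above can be used to choose them. The sketch below uses a hypothetical stand-in class, FakeResult, which mimics the .text (URL) and .metadata ('name', 'rank') fields seen in the output. The helper top_urls is also hypothetical. The sample results copy the URLs and metadata printed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeResult:
    """Stand-in for a search-result Document: text is the URL,
    metadata holds the result title ('name') and position ('rank')."""
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


def top_urls(results: List[FakeResult], k: int) -> List[str]:
    """Return the URLs of the k best-ranked results, skipping duplicates."""
    # Results without a rank sort last.
    ordered = sorted(results, key=lambda r: r.metadata.get("rank", float("inf")))
    urls: List[str] = []
    for r in ordered:
        if len(urls) >= k:
            break
        if r.text not in urls:
            urls.append(r.text)
    return urls


results = [
    FakeResult("https://www.citizensinformation.ie/en/money-and-tax/budgets/budget-2025/",
               {"name": "Budget 2025", "rank": 2}),
    FakeResult("https://www.gov.ie/en/publication/e8315-budget-2025/",
               {"name": "Budget 2025", "rank": 1}),
    FakeResult("https://www.gov.ie/en/publication/cb193-your-guide-to-budget-2025/",
               {"name": "Your guide to Budget 2025", "rank": 3}),
]
print(top_urls(results, 2))
```

Assuming the real search results expose the same .text and .metadata fields, a call such as top_urls(search_results, 3) could replace the list comprehension that builds urls.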
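Get the topic content
Given links to web pages with information on the topic, we can fetch their content. Web pages usually contain a lot of irrelevant material, so we use ZyteWebReader's "article" mode, which returns only the main article text of each page.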
In [ ]:
web_reader = ZyteWebReader(api_key=zyte_api_key, mode="article")
documents = web_reader.load_data(urls)
In [ ]:
print(documents[0].text[:200])
Budget 2025 - Tax Highlights Ireland Budget 2025 announced on 1 October 2024 included a substantial "cost-of-living" package including many one-off payments, as well as outlining a framework to direc
In [ ]:
len(documents)
Out[ ]:
7
Query Engine
Below is a basic query example using VectorStoreIndex. Make sure the OPENAI_API_KEY environment variable is set before running the following code.
In [ ]:
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
In [ ]:
query_engine = index.as_query_engine()
response = query_engine.query(
"What kind of energy credits are provided in the budget?"
)
print(response)
Two €125 electricity credits will be provided - one this year and one in 2025.
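This toy sketch illustrates the retrieval step behind the query above in plain Python. It is not LlamaIndex's implementation. Word-count vectors stand in for the OpenAI embeddings, and the step where the LLM writes the final answer is left out. The idea is that the index embeds document chunks, and at query time the chunks most similar to the question are retrieved and passed to the LLM. The chunk texts are made-up examples, and embed, cosine, and retrieve are hypothetical helper names.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts (a real index uses a model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(chunks: List[str], query: str, top_k: int = 2) -> List[Tuple[float, str]]:
    """Score every chunk against the query and return the top_k best."""
    q = embed(query)
    scored = [(cosine(embed(c), q), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


chunks = [
    "Two electricity credits of 125 euro will be provided.",
    "The weather in Dublin was mild today.",
    "Recipes for a simple vegetable soup.",
]
for score, chunk in retrieve(chunks, "What electricity credits are provided?", top_k=1):
    print(f"{score:.2f}  {chunk}")
```

In the real pipeline, the retrieved chunks would then be inserted into a prompt so that the LLM can compose an answer like the one shown above.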