Oxylabs Reader
Use Oxylabs Reader to fetch information from Google Search, Amazon, and YouTube. For more details, see the Oxylabs documentation.
In [ ]:
%pip install llama-index llama-index-readers-oxylabs
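In this notebook, we show how to use the Oxylabs readers to collect information from different sources.
First, import an Oxylabs reader.
The currently available readers are: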
- OxylabsAmazonSearchReader
- OxylabsAmazonPricingReader
- OxylabsAmazonProductReader
- OxylabsAmazonSellersReader
- OxylabsAmazonBestsellersReader
- OxylabsAmazonReviewsReader
- OxylabsGoogleSearchReader
- OxylabsGoogleAdsReader
- OxylabsYoutubeTranscriptReader
In [ ]:
import os

from llama_index.readers.oxylabs import OxylabsGoogleSearchReader
Instantiate the reader with your username and password.
In [ ]:
# Read the Oxylabs credentials from environment variables.
oxylabs_username = os.environ.get("OXYLABS_USERNAME")
oxylabs_password = os.environ.get("OXYLABS_PASSWORD")

google_search_reader = OxylabsGoogleSearchReader(
    oxylabs_username, oxylabs_password
)
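Because `os.environ.get` returns `None` when a variable is unset, a missing credential may only surface later, when the reader sends its first request. The following is a minimal sketch of failing early instead. `require_env` is a hypothetical helper introduced here for illustration; it is not part of llama-index or Oxylabs, and the optional `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or raise if it is missing.

    Hypothetical helper for illustration; not part of llama-index or Oxylabs.
    """
    # Fall back to the real process environment when no mapping is supplied.
    source = os.environ if env is None else env
    value = source.get(name)
    # Treat both an unset variable and an empty string as missing.
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this in place, `oxylabs_username = require_env("OXYLABS_USERNAME")` could replace the plain `os.environ.get` call, so a missing variable is reported by name before any reader is created.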
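Prepare the parameters. This example loads Google search results for the query "iPhone 16", localized to "Berlin, Germany".
See the documentation for more examples.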
In [ ]:
results = google_search_reader.load_data(
    {"query": "Iphone 16", "parse": True, "geo_location": "Berlin, Germany"}
)
print(results[0].text)
ORGANIC RESULTS ITEMS:
ORGANIC-ITEM-1:
POS: 1
URL: https://www.apple.com/de/iphone-16/
DESC: Dieses Design verdient ein langes Leben. Das iPhone 16 hat ein Gehäuse aus Aluminium in Raumfahrt-Qualität und durchgefärbtes Glas auf der Rückseite, das extrem ...
TITLE: iPhone 16 und iPhone 16 Plus - Apple (DE)
SITELINKS:
SITELINKS:
EXPANDED ITEMS:
EXPANDED-ITEM-1:
URL: https://www.apple.com/de/shop/buy-iphone/iphone-16-pro
TITLE: iPhone 16 Pro kaufen
EXPANDED-ITEM-2:
URL: https://www.apple.com/de/iphone-16-pro/
TITLE: iPhone 16 Pro
...
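The parameter dictionary in the Google search example above can also be assembled by a small function, which keeps the optional location out of the payload when it is not needed. This is only a sketch: `build_search_payload` is a hypothetical helper, not part of the Oxylabs reader package, and it uses only the three keys shown in the example (`query`, `parse`, `geo_location`).

```python
from typing import Any, Dict, Optional


def build_search_payload(
    query: str,
    geo_location: Optional[str] = None,
    parse: bool = True,
) -> Dict[str, Any]:
    """Build a parameter dict using only the keys shown in the example above.

    Hypothetical helper for illustration; not part of llama-index or Oxylabs.
    """
    if not query.strip():
        raise ValueError("query must not be empty")
    payload: Dict[str, Any] = {"query": query, "parse": parse}
    # Only include the location when the caller asked for one.
    if geo_location is not None:
        payload["geo_location"] = geo_location
    return payload


payload = build_search_payload("Iphone 16", geo_location="Berlin, Germany")
print(payload)
```

The resulting dictionary has the same shape as the literal used above, so it could be passed to `google_search_reader.load_data(payload)`.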
More examples
Amazon product
In [ ]:
from llama_index.readers.oxylabs import OxylabsAmazonProductReader

amazon_product_reader = OxylabsAmazonProductReader(
    oxylabs_username, oxylabs_password
)
results = amazon_product_reader.load_data(
    {
        "domain": "com",
        "query": "B08D9N7RJ4",
        "parse": True,
        "context": [{"key": "autoselect_variant", "value": True}],
    }
)
print(results[0].text)
# Products
- Item 1:
## url
https://www.amazon.com/dp/B08D9N7RJ4?th=1&psc=1
## asin
B08D9N7RJ4
## page
1
## brand
Philips Hue
...
YouTube transcript
In [ ]:
from llama_index.readers.oxylabs import OxylabsYoutubeTranscriptReader

youtube_transcript_reader = OxylabsYoutubeTranscriptReader(
    oxylabs_username, oxylabs_password
)
results = youtube_transcript_reader.load_data(
    {
        "query": "SLoqvcnwwN4",
        "context": [
            {"key": "language_code", "value": "en"},
            {"key": "transcript_origin", "value": "uploader_provided"},
        ],
    }
)
print(results[0].text)
# YouTube video transcripts
- Item 1:
- Item 1:
### transcriptSectionHeaderRenderer
#### startMs
0
#### endMs
25000
#### accessibility
##### accessibilityData
###### label
Introduction
#### trackingParams
CAIQ8bsCIhMIntXqp4f6jAMVlSqzAB2-DSWc
#### enableTappableTranscriptHeader
True
#### sectionHeader
##### sectionHeaderViewModel
###### headline
###### content
Introduction
...
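Both the Amazon product and YouTube transcript examples pass extra options through a `context` list of `{"key": ..., "value": ...}` entries. Writing that list by hand gets repetitive, so here is a minimal sketch that builds it from keyword arguments. `make_context` is a hypothetical helper, not part of the Oxylabs reader package; it only reproduces the list shape seen in the examples and does not check whether a given key is accepted by a particular source.

```python
from typing import Any, Dict, List


def make_context(**options: Any) -> List[Dict[str, Any]]:
    """Turn keyword arguments into the list-of-dicts form used for "context".

    Hypothetical helper for illustration; not part of llama-index or Oxylabs.
    """
    # Keyword arguments keep their insertion order, so the list order is stable.
    return [{"key": key, "value": value} for key, value in options.items()]


context = make_context(language_code="en", transcript_origin="uploader_provided")
print(context)
```

The result matches the `context` value in the YouTube example, and `make_context(autoselect_variant=True)` yields the one used in the Amazon product example.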