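Multi-Modal Retrieval using Cohere Multi-Modal Embeddings¶
Cohere has released multi-modal embedding models. This notebook demonstrates multi-modal retrieval using Cohere multi-modal embeddings.
Why are multi-modal embeddings important?
Multi-modal embeddings let AI systems understand and search image and text content in a unified way. Unlike traditional systems that search text and images separately, multi-modal embeddings map both content types into the same embedding space, so a single query can surface relevant information across media types.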
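To make the idea of a shared embedding space concrete, here is a minimal sketch in plain Python. The file names and the 3-dimensional vectors are hypothetical values chosen for illustration; a real embedding model returns much larger vectors. The sketch only shows that once text and images live in one space, a single similarity measure can rank both.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: text and image entries share one vector space.
toy_index = {
    "taycan_article.txt": [0.9, 0.1, 0.0],
    "taycan_photo.jpg": [0.8, 0.2, 0.1],
    "mustang_photo.jpg": [0.1, 0.9, 0.2],
}
query_embedding = [1.0, 0.0, 0.0]  # stands in for an embedded text query

# Rank every entry, regardless of media type, by similarity to the query.
ranked = sorted(
    toy_index,
    key=lambda name: cosine_similarity(query_embedding, toy_index[name]),
    reverse=True,
)
print(ranked)
```

Both the text file and the image about the same subject rank ahead of the unrelated image, which is the behavior a multi-modal retriever relies on.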
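The demonstration covers the following steps:
- Download text and images from the relevant Wikipedia articles
- Build a multi-modal index over the text and images using Cohere multi-modal embeddings
- Retrieve relevant text and images together for a query using a multi-modal retriever
- Generate a response to the query using a multi-modal query engine
Note: Because Cohere does not yet offer a multi-modal LLM, we use Anthropic's multi-modal LLM to generate responses.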
Installation¶
We use the Cohere multi-modal embedding model for retrieval, the Qdrant vector store for storage, and an Anthropic multi-modal LLM for response generation.
%pip install llama-index-embeddings-cohere
%pip install llama-index-vector-stores-qdrant
%pip install llama-index-multi-modal-llms-anthropic
import os
os.environ["COHERE_API_KEY"] = "<YOUR COHERE API KEY>"
os.environ["ANTHROPIC_API_KEY"] = "<YOUR ANTHROPIC API KEY>"
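Before calling either API, it can help to confirm that the placeholder values above were actually replaced. The following is a small sketch; the helper name missing_api_keys is hypothetical and not part of any library.

```python
import os

REQUIRED_KEYS = ("COHERE_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env=os.environ):
    """Return required keys that are unset, empty, or still a '<YOUR ...>' placeholder."""
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "")
        if not value or value.startswith("<YOUR"):
            missing.append(key)
    return missing


print(missing_api_keys())
```

An empty list means both keys look usable; any name printed still needs a real value.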
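Utility Functions¶
- get_wikipedia_images: Gets the image URLs from the Wikipedia page with the specified title.
- plot_images: Plots the images in the specified list of image paths.
- delete_large_images: Deletes images larger than 5 MB in the specified directory.
Note: The Cohere API only accepts image files smaller than 5 MB.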
import requests
import matplotlib.pyplot as plt
from PIL import Image
from pathlib import Path
import urllib.request
import os
def get_wikipedia_images(title):
    """
    Get the image URLs from the Wikipedia page with the specified title.
    """
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "imageinfo",
            "iiprop": "url|dimensions|mime",
            "generator": "images",
            "gimlimit": "50",
        },
    ).json()
    image_urls = []
    for page in response["query"]["pages"].values():
        # Keep only JPEG and PNG files
        url = page["imageinfo"][0]["url"]
        if url.endswith((".jpg", ".png")):
            image_urls.append(url)
    return image_urls
def plot_images(image_paths):
    """
    Plot the images in the specified list of image paths.
    """
    images_shown = 0
    plt.figure(figsize=(16, 9))
    for img_path in image_paths:
        if os.path.isfile(img_path):
            image = Image.open(img_path)
            plt.subplot(2, 3, images_shown + 1)
            plt.imshow(image)
            plt.xticks([])
            plt.yticks([])
            images_shown += 1
            # Stop once the 2 x 3 grid is full; a 7th subplot would raise an error
            if images_shown >= 6:
                break
def delete_large_images(folder_path):
    """
    Delete images larger than 5 MB in the specified directory.
    """
    # List to hold the names of deleted image files
    deleted_images = []
    # Iterate through each file in the directory
    for file_name in os.listdir(folder_path):
        if file_name.lower().endswith(
            (".png", ".jpg", ".jpeg", ".gif", ".bmp")
        ):
            # Construct the full file path
            file_path = os.path.join(folder_path, file_name)
            # Get the size of the file in bytes
            file_size = os.path.getsize(file_path)
            # Check if the file size is greater than 5 MB (5242880 bytes) and remove it
            if file_size > 5242880:
                os.remove(file_path)
                deleted_images.append(file_name)
                print(
                    f"Image: {file_name} was larger than 5 MB and has been deleted."
                )
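get_wikipedia_images indexes page["imageinfo"][0] directly, so a page entry without an "imageinfo" list raises a KeyError, which the download loop then reports as "No images found". The sketch below shows a more defensive way to parse the same response shape. The function name extract_image_urls and the sample dictionary are hypothetical; the sample is hand-written and is not real Wikipedia data. Note one deliberate difference from the original: the extension check here is case-insensitive.

```python
def extract_image_urls(api_response, extensions=(".jpg", ".png")):
    """Collect image URLs from a MediaWiki `generator=images` JSON response."""
    pages = api_response.get("query", {}).get("pages", {})
    urls = []
    for page in pages.values():
        # Entries may carry no "imageinfo" list; skip them instead of raising.
        for info in page.get("imageinfo", [])[:1]:
            url = info.get("url", "")
            if url.lower().endswith(extensions):
                urls.append(url)
    return urls


# Hand-written sample shaped like the API response (assumption, not real data).
sample = {
    "query": {
        "pages": {
            "-1": {
                "title": "File:A.jpg",
                "imageinfo": [{"url": "https://example.org/a.jpg"}],
            },
            "-2": {"title": "File:Missing.png", "missing": ""},
            "-3": {
                "title": "File:Icon.svg",
                "imageinfo": [{"url": "https://example.org/icon.svg"}],
            },
        }
    }
}
print(extract_image_urls(sample))
```

With this approach, one entry without image information no longer discards every other image on the page.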
image_uuid = 0
# image_metadata_dict stores images metadata including image uuid, filename and path
image_metadata_dict = {}
MAX_IMAGES_PER_WIKI = 10
wiki_titles = {
    "Audi e-tron",
    "Ford Mustang",
    "Porsche Taycan",
}
data_path = Path("mixed_wiki")
data_path.mkdir(exist_ok=True)
for title in wiki_titles:
    # Fetch the plain-text extract of the article
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]
    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)
    images_per_wiki = 0
    try:
        list_img_urls = get_wikipedia_images(title)
        for url in list_img_urls:
            if url.endswith((".jpg", ".png", ".svg")):
                image_uuid += 1
                # Every image is saved under a numeric name with a .jpg suffix
                urllib.request.urlretrieve(
                    url, data_path / f"{image_uuid}.jpg"
                )
                images_per_wiki += 1
                # Checked after the increment, so up to
                # MAX_IMAGES_PER_WIKI + 1 images are saved per page
                if images_per_wiki > MAX_IMAGES_PER_WIKI:
                    break
    except Exception as exc:
        print(f"No images found for Wikipedia page: {title} ({exc})")
        continue
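The download loop above saves every image with a .jpg suffix, even when the URL points to a PNG file. If keeping the original extension matters, one option is sketched below. The helper local_image_path is hypothetical and is not used elsewhere in this notebook; it falls back to .jpg when the URL has no extension.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_image_path(data_path, image_uuid, url):
    """Build a download target that keeps the URL's own file extension."""
    # urlparse drops any query string, so only the path part is inspected.
    suffix = Path(urlparse(url).path).suffix.lower() or ".jpg"
    return Path(data_path) / f"{image_uuid}{suffix}"


print(local_image_path("mixed_wiki", 3, "https://example.org/wiki/Logo.PNG"))
```

The returned path could be passed to urllib.request.urlretrieve in place of the hard-coded f"{image_uuid}.jpg" name.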
Delete Larger Image Files¶
The Cohere multi-modal embedding model accepts image files smaller than 5 MB, so here we delete the larger image files.
delete_large_images(data_path)
Image: 8.jpg was larger than 5 MB and has been deleted.
Image: 13.jpg was larger than 5 MB and has been deleted.
Image: 11.jpg was larger than 5 MB and has been deleted.
Image: 21.jpg was larger than 5 MB and has been deleted.
Image: 23.jpg was larger than 5 MB and has been deleted.
Image: 32.jpg was larger than 5 MB and has been deleted.
Image: 19.jpg was larger than 5 MB and has been deleted.
Image: 4.jpg was larger than 5 MB and has been deleted.
Image: 5.jpg was larger than 5 MB and has been deleted.
Image: 7.jpg was larger than 5 MB and has been deleted.
Image: 6.jpg was larger than 5 MB and has been deleted.
Image: 1.jpg was larger than 5 MB and has been deleted.
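Because delete_large_images removes files permanently, it can be useful to preview what would be removed first. The following sketch lists oversized images without deleting anything. The name find_large_images is hypothetical; the 5 MB threshold and the extension list mirror delete_large_images.

```python
from pathlib import Path

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5 MB, the same threshold as delete_large_images
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def find_large_images(folder_path, limit=MAX_IMAGE_BYTES):
    """Return sorted names of image files larger than `limit` bytes, deleting nothing."""
    return sorted(
        p.name
        for p in Path(folder_path).iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and p.stat().st_size > limit
    )
```

Calling find_large_images("mixed_wiki") before delete_large_images(data_path) shows which files the cleanup step would remove.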
Set the Embedding Model and LLM¶
We use the Cohere multi-modal embedding model for retrieval and an Anthropic multi-modal LLM for response generation.
from llama_index.embeddings.cohere import CohereEmbedding
from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal
from llama_index.core import Settings
Settings.embed_model = CohereEmbedding(
    api_key=os.environ["COHERE_API_KEY"],
    model_name="embed-english-v3.0",  # current v3 models support multimodal embeddings
)
anthropic_multimodal_llm = AnthropicMultiModal(max_tokens=300)
Load the Data¶
We load the downloaded text and image data.
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./mixed_wiki/").load_data()
Set Up the Qdrant Vector Store¶
We use the Qdrant vector store to store image and text embeddings along with their associated metadata.
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import StorageContext
import qdrant_client
# Create a local Qdrant vector store
client = qdrant_client.QdrantClient(path="qdrant_mm_db")
text_store = QdrantVectorStore(
    client=client, collection_name="text_collection"
)
image_store = QdrantVectorStore(
    client=client, collection_name="image_collection"
)
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)
Create the MultiModalVectorStoreIndex¶
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    image_embed_model=Settings.embed_model,
)
WARNING:root:Payload indexes have no effect in the local Qdrant. Please use server Qdrant if you need payload indexes.
WARNING:root:Payload indexes have no effect in the local Qdrant. Please use server Qdrant if you need payload indexes.
Test the Retrieval¶
Here we create a retriever and test it.
retriever_engine = index.as_retriever(
    similarity_top_k=4, image_similarity_top_k=4
)
query = "Which models of Porsche are discussed here?"
retrieval_results = retriever_engine.retrieve(query)
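The two parameters similarity_top_k and image_similarity_top_k cap how many text nodes and image nodes come back. The toy sketch below illustrates that idea under the assumption that text and image candidates are ranked separately and then combined. The node names and scores are hypothetical, and this is not the library's internal implementation.

```python
import heapq


def top_k(scored_nodes, k):
    """Return the k (name, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scored_nodes, key=lambda pair: pair[1])


# Hypothetical similarity scores for a single query.
text_scores = [
    ("chunk_a", 0.49),
    ("chunk_b", 0.31),
    ("chunk_c", 0.47),
    ("chunk_d", 0.12),
]
image_scores = [("1.jpg", 0.22), ("2.jpg", 0.35), ("3.jpg", 0.18)]

# Each modality keeps its own top-k before the results are combined.
results = top_k(text_scores, k=2) + top_k(image_scores, k=2)
print(results)
```

Keeping separate limits means a large number of strong text matches cannot crowd out every image, and vice versa.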
Inspect the Retrieval Results¶
from llama_index.core.response.notebook_utils import display_source_node
from llama_index.core.schema import ImageNode
retrieved_image = []
for res_node in retrieval_results:
    if isinstance(res_node.node, ImageNode):
        retrieved_image.append(res_node.node.metadata["file_path"])
    else:
        display_source_node(res_node, source_length=200)
plot_images(retrieved_image)
Node ID: ac3e92f1-e192-4aa5-bbc6-45674654d96f
Similarity: 0.49435770203542906
Text: === Aerodynamics ===
The Taycan Turbo has a drag coefficient of Cd=0.22, which the manufacturer claims is the lowest of any current Porsche model. The Turbo S model has a slightly higher drag coeff...
Node ID: 045cde7c-963f-46cd-b820-9cabe07f1ab5
Similarity: 0.4804621315897337
Text: The Porsche Taycan is a battery electric luxury sports sedan and shooting brake car produced by German automobile manufacturer Porsche. The concept version of the Taycan named the Porsche Mission E...
Node ID: e14475d1-7bd4-48f3-a085-f712d5bc7e5a
Similarity: 0.46787589674504015
Text: === Porsche Mission E Cross Turismo ===
The Porsche Mission E Cross Turismo previewed the Taycan Cross Turismo, and was presented at the 2018 Geneva Motor Show. The design language of the Mission E...
Node ID: a25b3aea-2fdd-4ae2-b5bc-55eef453fe82
Similarity: 0.4370399571869162
Text: == Specifications ==
=== Chassis ===
The Taycan's body is mainly steel and aluminium joined by different bonding techniques. The body's B pillars, side roof frame and seat cross member are made f...
Test the Multi-Modal QueryEngine¶
We create a QueryEngine from the MultiModalVectorStoreIndex built above.
from llama_index.core import PromptTemplate
qa_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_tmpl = PromptTemplate(qa_tmpl_str)
query_engine = index.as_query_engine(
    llm=anthropic_multimodal_llm, text_qa_template=qa_tmpl
)
query = "Which models of Porsche are discussed here?"
response = query_engine.query(query)
print(str(response))
Based on the context provided, the Porsche models discussed are:
- Porsche Taycan - a battery electric luxury sports sedan. It is offered in several variants at different performance levels, including the Taycan Turbo and Turbo S high-performance AWD models, the mid-range Taycan 4S, and a base RWD model.
- Porsche Taycan Cross Turismo - a lifted shooting brake/wagon version of the Taycan with crossover-like features and styling.
- Porsche Taycan Sport Turismo - shares the shooting brake profile with the Cross Turismo but without the crossover styling elements. A RWD version is available as the base Taycan Sport Turismo.
- Porsche Mission E - the concept car unveiled in 2015 that previewed the design and technology of the production Taycan models.
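To see what the LLM receives, it helps to fill the two placeholders of the QA template by hand. The sketch below uses plain str.format as a stand-in for PromptTemplate. The fill_template helper, the sample context sentence, and the choice to join chunks with blank lines are all assumptions made for illustration.

```python
qa_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def fill_template(context_chunks, query):
    """Substitute retrieved text and the user query into the QA template."""
    # Joining chunks with a blank line is an assumption for this sketch.
    context_str = "\n\n".join(context_chunks)
    return qa_tmpl_str.format(context_str=context_str, query_str=query)


prompt = fill_template(
    ["The Taycan is a battery electric sedan."],  # hypothetical retrieved chunk
    "Which models of Porsche are discussed here?",
)
print(prompt)
```

The retrieved text lands between the two separator lines and the query follows it, so the model is steered to answer from the supplied context.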
Inspect the Sources¶
from llama_index.core.response.notebook_utils import display_source_node
for text_node in response.metadata["text_nodes"]:
    display_source_node(text_node, source_length=200)
plot_images(
    [n.metadata["file_path"] for n in response.metadata["image_nodes"]]
)
Node ID: ac3e92f1-e192-4aa5-bbc6-45674654d96f
Similarity: 0.49435770203542906
Text: === Aerodynamics ===
The Taycan Turbo has a drag coefficient of Cd=0.22, which the manufacturer claims is the lowest of any current Porsche model. The Turbo S model has a slightly higher drag coeff...
Node ID: 045cde7c-963f-46cd-b820-9cabe07f1ab5
Similarity: 0.4804621315897337
Text: The Porsche Taycan is a battery electric luxury sports sedan and shooting brake car produced by German automobile manufacturer Porsche. The concept version of the Taycan named the Porsche Mission E...