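Retrieval-Augmented Image Captioning
This example shows how to use LLaVa + Replicate for image understanding/captioning, and then retrieve relevant unstructured text and embedded tables from the Tesla 10-K filing based on that image understanding.
- LLaVa provides image understanding based on a user prompt.
- Unstructured parses out the tables, and LlamaIndex recursive retrieval indexes and retrieves both tables and text.
- The image understanding from the first step is used to retrieve relevant information from the knowledge base built in the second step (indexed by LlamaIndex).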
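The three steps above can be sketched as a single function that chains a captioning call into a knowledge-base query. This is a minimal illustration, not the notebook's actual code: `caption_then_retrieve`, `fake_describe`, and `fake_query` are hypothetical names, and the stand-in callables replace the real LLaVa and LlamaIndex calls so the sketch runs without API keys.

```python
from typing import Callable, Tuple


def caption_then_retrieve(
    describe_image: Callable[[str, str], str],
    query_knowledge_base: Callable[[str], str],
    image_path: str,
    question: str,
    query_prefix: str = "please provide relevant information about: ",
) -> Tuple[str, str]:
    """Caption the image first, then use the caption as a knowledge-base query."""
    caption = describe_image(question, image_path)
    answer = query_knowledge_base(query_prefix + caption)
    return caption, answer


# Stand-in callables (assumptions) so the sketch runs offline.
def fake_describe(question: str, image_path: str) -> str:
    return "a Tesla charging station"


def fake_query(query: str) -> str:
    return f"[retrieved context for] {query}"


caption, answer = caption_then_retrieve(
    fake_describe, fake_query, "tesla_supercharger.jpg", "what is in the image?"
)
print(caption)
print(answer)
```

In the real notebook, the first callable would wrap the multi-modal LLM and the second would wrap the query engine built over the 10-K nodes.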
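Context for LLaVA: Large Language and Vision Assistant
For LlamaIndex: LLaVa + Replicate lets us run image understanding through Replicate's hosted API and combine the resulting multi-modal knowledge with our RAG knowledge base system.
TODO:
Waiting for llama-cpp-python to support the LLaVa model in its Python wrapper.
LlamaIndex could then use the LlamaCPP class to serve the LLaVa model directly/locally.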
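Using Replicate to serve the LLaVa model through LlamaIndex
Build and run LLaVa models locally through llama.cpp (deprecated)
- Clone the repository: git clone https://github.com/ggerganov/llama.cpp.git
- Change into the directory: cd llama.cpp. See the llama.cpp repository for more details.
- Build: make
- Download the LLaVa model files (including ggml-model-* and mmproj-model-*) from the Hugging Face repo. Choose a model that suits your local configuration.
- Run ./llava to check that LLaVa runs locally.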
%pip install llama-index-readers-file
%pip install llama-index-multi-modal-llms-replicate
%load_ext autoreload
%autoreload 2
!pip install unstructured
from unstructured.partition.html import partition_html
import pandas as pd
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
pd.set_option("display.max_colwidth", None)
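Perform Data Extraction from the Tesla 10-K Filing
In these sections, we use Unstructured to parse out table and non-table elements.
Extract Elements
We use Unstructured to extract table and non-table elements from the 10-K filing.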
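To make the idea of "table and non-table elements" concrete, here is a simplified stand-in that uses only the Python standard library. It is not how Unstructured works internally; `TableTextSplitter` is a hypothetical class, the sample HTML is made up, and nested tables are deliberately not handled.

```python
from html.parser import HTMLParser


class TableTextSplitter(HTMLParser):
    """Collect table rows separately from text that appears outside tables."""

    def __init__(self):
        super().__init__()
        self.tables = []       # each table is a list of rows; each row is a list of cell strings
        self.text_blocks = []  # text found outside any table
        self._in_table = False
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._in_table = True
            self.tables.append([])
        elif tag == "tr" and self._in_table:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append(" ".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table":
            self._in_table = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._cell is not None:
            self._cell.append(text)
        elif not self._in_table:
            self.text_blocks.append(text)


# Made-up sample document: one paragraph, one table, one more paragraph.
sample_html = (
    "<p>Intro text.</p>"
    "<table><tr><th>Item</th><th>Value</th></tr>"
    "<tr><td>A</td><td>1</td></tr></table>"
    "<p>Closing text.</p>"
)
splitter = TableTextSplitter()
splitter.feed(sample_html)
splitter.close()
print(splitter.tables)
print(splitter.text_blocks)
```

Keeping tables apart from running text is what later allows each table to be summarized and indexed as its own object.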
!wget "https://www.dropbox.com/scl/fi/mlaymdy1ni1ovyeykhhuk/tesla_2021_10k.htm?rlkey=qf9k4zn0ejrbm716j0gg7r802&dl=1" -O tesla_2021_10k.htm
!wget "https://docs.google.com/uc?export=download&id=1THe1qqM61lretr9N3BmINc_NWDvuthYf" -O shanghai.jpg
!wget "https://docs.google.com/uc?export=download&id=1PDVCf_CzLWXNnNoRV8CFgoJxv6U0sHAO" -O tesla_supercharger.jpg
from llama_index.readers.file import FlatReader
from pathlib import Path
reader = FlatReader()
docs_2021 = reader.load_data(Path("tesla_2021_10k.htm"))
from llama_index.core.node_parser import UnstructuredElementNodeParser
node_parser = UnstructuredElementNodeParser()
import os
REPLICATE_API_TOKEN = "..."  # Your Replicate API token here
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
import openai
OPENAI_API_KEY = "sk-..."
openai.api_key = OPENAI_API_KEY # add your openai api key here
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
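Hard-coding keys in a notebook makes them easy to leak. One possible alternative, sketched here under the assumption that the keys are already exported in the shell, is to read them from the environment and fail early when one is missing. `require_env` and `DEMO_API_TOKEN` are hypothetical names used only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable, or raise with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value


# Demo with a placeholder variable so the sketch runs anywhere.
os.environ.setdefault("DEMO_API_TOKEN", "placeholder-value")
token = require_env("DEMO_API_TOKEN")
print(len(token) > 0)
```

In the notebook, the same helper could be called with "REPLICATE_API_TOKEN" and "OPENAI_API_KEY" in place of the assignments above.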
import os
import pickle

# Parse the filing once and cache the nodes; later runs reuse the pickle.
if not os.path.exists("2021_nodes.pkl"):
    raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)
    with open("2021_nodes.pkl", "wb") as f:
        pickle.dump(raw_nodes_2021, f)
else:
    with open("2021_nodes.pkl", "rb") as f:
        raw_nodes_2021 = pickle.load(f)
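The cache-or-compute pattern in the cell above can be factored into a small helper. This is a sketch, not part of LlamaIndex: `load_or_compute` and `expensive_parse` are hypothetical names, and the demo uses a temporary directory so nothing is written next to the notebook. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute):
    """Load a pickled result if it exists; otherwise compute it and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def expensive_parse():
    """Stand-in for a slow parsing step; records each invocation."""
    calls.append(1)
    return ["node-a", "node-b"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nodes.pkl")
    first = load_or_compute(path, expensive_parse)   # computes and writes the cache
    second = load_or_compute(path, expensive_parse)  # reads the cache

print(first == second, len(calls))
```

The second call returns the cached value without invoking the parser again, which is the behavior the notebook relies on to avoid re-parsing the filing.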
nodes_2021, objects_2021 = node_parser.get_nodes_and_objects(raw_nodes_2021)
Set Up a Composable Retriever
Now that we have extracted the tables and their summaries, we can set up a composable retriever in LlamaIndex to query these tables.
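The idea behind retrieving through table summaries can be illustrated with a toy example. This is not LlamaIndex's implementation: it ranks by plain word overlap instead of embeddings, and `overlap`, `recursive_retrieve`, and the sample data are all hypothetical. The point is only that a summary node is matched first and then swapped for the table it points to.

```python
def overlap(query, text):
    """Count distinct lowercase words shared by the query and the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def recursive_retrieve(query, nodes, tables, top_k=2):
    """Rank nodes by word overlap; replace table summaries with the linked table."""
    ranked = sorted(nodes, key=lambda n: overlap(query, n["text"]), reverse=True)
    results = []
    for node in ranked[:top_k]:
        table_id = node.get("table_id")
        results.append(tables[table_id] if table_id else node["text"])
    return results


# Made-up nodes: two plain text nodes and one summary node that links to a table.
nodes = [
    {"text": "gigafactory shanghai production ramp"},
    {"text": "summary: table of vehicle deliveries by model", "table_id": "t1"},
    {"text": "raw materials pricing and supply"},
]
tables = {"t1": "model | deliveries\nA | 10\nB | 20"}

print(recursive_retrieve("vehicle deliveries by model", nodes, tables, top_k=1))
```

A query that matches the summary returns the full table, while a query that matches a plain text node returns that text directly.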
Construct Retrievers
from llama_index.core import VectorStoreIndex
# construct top-level vector index + query engine
vector_index = VectorStoreIndex(nodes=nodes_2021, objects=objects_2021)
query_engine = vector_index.as_query_engine(similarity_top_k=2, verbose=True)
from PIL import Image
import matplotlib.pyplot as plt
imageUrl = "./tesla_supercharger.jpg"
image = Image.open(imageUrl).convert("RGB")
plt.figure(figsize=(16, 5))
plt.imshow(image)
Run the LLaVa Model on Replicate through LlamaIndex for Image Understanding
from llama_index.multi_modal_llms.replicate import ReplicateMultiModal
from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.replicate.base import (
    REPLICATE_MULTI_MODAL_LLM_MODELS,
)

multi_modal_llm = ReplicateMultiModal(
    model=REPLICATE_MULTI_MODAL_LLM_MODELS["llava-13b"],
    max_new_tokens=200,
    temperature=0.1,
)

prompt = "what is the main object for tesla in the image?"
llava_response = multi_modal_llm.complete(
    prompt=prompt,
    image_documents=[ImageDocument(image_path=imageUrl)],
)
Retrieve Relevant Information from the LlamaIndex Knowledge Base Based on LLaVa Image Understanding
prompt_template = "please provide relevant information about: "
rag_response = query_engine.query(prompt_template + llava_response.text)
Retrieval entering id_1836_table: TextNode
Retrieving from object TextNode with query please provide relevant information about: The main object for Tesla in the image is a red and white electric car charging station.
Retrieval entering id_431_table: TextNode
Retrieving from object TextNode with query please provide relevant information about: The main object for Tesla in the image is a red and white electric car charging station.
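Because the caption is concatenated straight into the query, a long or oddly spaced caption becomes a long query. One optional refinement, sketched here as an assumption rather than something the notebook does, is to normalize whitespace and cap the caption length before building the query. `build_rag_query` and the `max_chars` default are hypothetical.

```python
def build_rag_query(
    caption: str,
    prefix: str = "please provide relevant information about: ",
    max_chars: int = 300,
) -> str:
    """Collapse whitespace in the caption, cap its length, and prepend the prefix."""
    normalized = " ".join(caption.split())
    if len(normalized) > max_chars:
        # Cut at the limit, then drop the last (possibly partial) word.
        normalized = normalized[:max_chars].rsplit(" ", 1)[0]
    return prefix + normalized


print(build_rag_query("  a red and white\n electric car charging station  "))
```

The result can be passed to the query engine in place of the plain string concatenation used above.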
Show the Final RAG Image Captioning Result from LlamaIndex
print(str(rag_response))
The main object for Tesla in the image is a red and white electric car charging station.
from PIL import Image
import matplotlib.pyplot as plt
imageUrl = "./shanghai.jpg"
image = Image.open(imageUrl).convert("RGB")
plt.figure(figsize=(16, 5))
plt.imshow(image)
Retrieve Relevant Information from LlamaIndex for a New Image
prompt = "which Tesla factory is shown in the image?"
llava_response = multi_modal_llm.complete(
    prompt=prompt,
    image_documents=[ImageDocument(image_path=imageUrl)],
)
prompt_template = "please provide relevant information about: "
rag_response = query_engine.query(prompt_template + llava_response.text)
Retrieving with query id None: please provide relevant information about: a large Tesla factory with a white roof, located in Shanghai, China. The factory is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. The scene gives an impression of a busy and well-organized facility, likely producing electric vehicles for the global market
Retrieved node with id, entering: id_431_table
Retrieving with query id id_431_table: please provide relevant information about: a large Tesla factory with a white roof, located in Shanghai, China. The factory is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. The scene gives an impression of a busy and well-organized facility, likely producing electric vehicles for the global market
Retrieving text node: We continue to increase the degree of localized procurement and manufacturing there. Gigafactory Shanghai is representative of our plan to iteratively improve our manufacturing operations as we establish new factories, as we implemented the learnings from our Model 3 and Model Y ramp at the Fremont Factory to commence and ramp our production at Gigafactory Shanghai quickly and cost-effectively. Other Manufacturing Generally, we continue to expand production capacity at our existing facilities. We also intend to further increase cost-competitiveness in our significant markets by strategically adding local manufacturing, including at Gigafactory Berlin in Germany and Gigafactory Texas in Austin, Texas, which will begin production in 2022. Supply Chain Our products use thousands of purchased parts that are sourced from hundreds of suppliers across the world.
We have developed close relationships with vendors of key parts such as battery cells, electronics and complex vehicle assemblies. Certain components purchased from these suppliers are shared or are similar across many product lines, allowing us to take advantage of pricing efficiencies from economies of scale. As is the case for most automotive companies, most of our procured components and systems are sourced from single suppliers. Where multiple sources are available for certain key components, we work to qualify multiple suppliers for them where it is sensible to do so in order to minimize production risks owing to disruptions in their supply. We also mitigate risk by maintaining safety stock for key parts and assemblies and die banks for components with lengthy procurement lead times. Our products use various raw materials including aluminum, steel, cobalt, lithium, nickel and copper. Pricing for these materials is governed by market conditions and may fluctuate due to various factors outside of our control, such as supply and demand and market speculation. We strive to execute long-term supply contracts for such materials at competitive pricing when feasible, and we currently believe that we have adequate access to raw materials supplies in order to meet the needs of our operations. Governmental Programs, Incentives and Regulations Globally, both the operation of our business by us and the ownership of our products by our customers are impacted by various government programs, incentives and other arrangements. Our business and products are also subject to numerous governmental regulations that vary among jurisdictions. 
Programs and Incentives California Alternative Energy and Advanced Transportation Financing Authority Tax Incentives We have agreements with the California Alternative Energy and Advanced Transportation Financing Authority that provide multi-year sales tax exclusions on purchases of manufacturing equipment that will be used for specific purposes, including the expansion and ongoing development of electric vehicles and powertrain production in California, thus reducing our cost basis in the related assets in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K. Gigafactory Nevada—Nevada Tax Incentives In connection with the construction of Gigafactory Nevada, we entered into agreements with the State of Nevada and Storey County in Nevada that provide abatements for specified taxes, discounts to the base tariff energy rates and transferable tax credits in consideration of capital investment and hiring targets that were met at Gigafactory Nevada. These incentives are available until June 2024 or June 2034, depending on the incentive and primarily offset related costs in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K. Gigafactory New York—New York State Investment and Lease We have a lease through the Research Foundation for the State University of New York (the “SUNY Foundation”) with respect to Gigafactory New York. Under the lease and a related research and development agreement, we are continuing to designate further buildouts at the facility. We are required to comply with certain covenants, including hiring and cumulative investment targets. This incentive offsets the related lease costs of the facility in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K. 
As we temporarily suspended most of our manufacturing operations at Gigafactory New York pursuant to a New York State executive order issued in March 2020 as a result of the COVID-19 pandemic, we were granted a deferral of our obligation to be compliant with our applicable targets through December 31, 2021 in an amendment memorialized in August 2021. As of December 31, 2021, we are in excess of such targets relating to investments and personnel in the State of New York and Buffalo. Gigafactory Shanghai—Land Use Rights and Economic Benefits We have an agreement with the local government of Shanghai for land use rights at Gigafactory Shanghai. Under the terms of the arrangement, we are required to meet a cumulative capital expenditure target and an annual tax revenue target starting at the end of 2023. In addition, the Shanghai government has granted to our Gigafactory Shanghai subsidiary certain incentives to be used in connection with eligible capital investments at Gigafactory Shanghai.
Show the Final RAG Image Captioning Result from LlamaIndex
print(rag_response)
The Gigafactory Shanghai in Shanghai, China is a large Tesla factory that produces electric vehicles for the global market. The factory has a white roof and is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. This scene gives an impression of a busy and well-organized facility.