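Using the OpenAI GPT-4V model for image reasoning
In this notebook, we show how to use the OpenAI LLM abstraction with GPT4V for image understanding and reasoning.
We also demonstrate several functions that the OpenAI LLM class currently supports for GPT4V:
- complete (sync and async): for a single prompt and a list of images
- chat (sync and async): for multiple chat messages
- stream complete (sync and async): for streaming the output of complete
- stream chat (sync and async): for streaming the output of chat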
In [ ]:
%pip install llama-index-llms-openai matplotlib
Use GPT4V to understand images from URLs
In [ ]:
import os
OPENAI_API_KEY = "sk-..." # Your OpenAI API token here
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
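Hardcoding the key in a notebook cell makes it easy to leak when the notebook is shared. The following is a minimal sketch, assuming the key may already be exported in the shell, that reuses an existing `OPENAI_API_KEY` and only prompts when it is missing. The helper name `resolve_api_key` and its parameters are hypothetical and not part of llama-index.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def resolve_api_key(
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the OpenAI key from `env`, prompting once if it is missing.

    Hypothetical helper for illustration; not part of llama-index.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Ask interactively without echoing the key to the screen.
        key = prompt("OpenAI API key: ").strip()
        if not key:
            raise ValueError("No OpenAI API key provided")
        # Store it so libraries that read the environment can find it.
        env["OPENAI_API_KEY"] = key
    return key
```

Calling `resolve_api_key()` once at the top of the notebook would leave `os.environ["OPENAI_API_KEY"]` populated without the key ever appearing in a saved cell.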
Initialize the OpenAI LLM and load images from URLs
In [ ]:
from llama_index.llms.openai import OpenAI

image_urls = [
    "https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg",
    "https://www.visualcapitalist.com/wp-content/uploads/2023/10/US_Mortgage_Rate_Surge-Sept-11-1.jpg",
    "https://i2-prod.mirror.co.uk/incoming/article7160664.ece/ALTERNATES/s1200d/FIFA-Ballon-dOr-Gala-2015.jpg",
]

# The OpenAI LLM class limits the response length with `max_tokens`.
openai_llm = OpenAI(model="gpt-4o", max_tokens=300)
In [ ]:
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
img_response = requests.get(image_urls[0])
print(image_urls[0])
img = Image.open(BytesIO(img_response.content))
plt.imshow(img)
https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg
Out[ ]:
<matplotlib.image.AxesImage at 0x11a2dc920>
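A failed download can return an HTML error page instead of image bytes, and `Image.open` will then raise a fairly opaque error. As a rough sketch, the leading bytes of the payload can be checked first. The helper `sniff_image_mimetype` is hypothetical, uses only the standard library, and covers just JPEG and PNG signatures.

```python
from typing import Optional

# Leading "magic" bytes for two common image formats.
_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def sniff_image_mimetype(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unknown.

    Hypothetical helper for illustration; only JPEG and PNG are recognized.
    """
    for signature, mimetype in _SIGNATURES.items():
        if data.startswith(signature):
            return mimetype
    return None
```

Calling `sniff_image_mimetype(img_response.content)` before opening the image would return `None` for anything that is not a JPEG or PNG, which is a cheap signal that the URL did not serve the expected file.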
Ask the model to describe what it sees
In [ ]:
from llama_index.core.llms import (
    ChatMessage,
    ImageBlock,
    TextBlock,
    MessageRole,
)

# One user message that carries the text prompt and two images.
msg = ChatMessage(
    role=MessageRole.USER,
    blocks=[
        TextBlock(text="Describe the images as an alternative text"),
        ImageBlock(url=image_urls[0]),
        ImageBlock(url=image_urls[1]),
    ],
)

response = openai_llm.chat(messages=[msg])
In [ ]:
print(response)
assistant: **Image 1:** The Colosseum in Rome is illuminated at night with the colors of the Italian flag: green, white, and red. The ancient amphitheater stands prominently against a deep blue sky, with some clouds visible. The foreground shows a construction area with barriers and a few people walking nearby. **Image 2:** A line graph titled "The U.S. Mortgage Rate Surge" compares the U.S. 30-year fixed-rate mortgage (in red) with existing home sales (in blue) from 2014 to 2023. The mortgage rate line shows a significant increase, reaching its highest level in over 20 years. Existing home sales fluctuate, with a notable decline in recent years. A text box highlights that in 2023, high mortgage rates and rising home prices have led to the lowest housing affordability since 1989.
We can also stream the model's response asynchronously.
In [ ]:
async_resp = await openai_llm.astream_chat(messages=[msg])
async for delta in async_resp:
    # Print each incremental piece of text as it arrives.
    print(delta.delta, end="")
**Image 1:** The Colosseum in Rome is illuminated at night with the colors of the Italian flag: green, white, and red. The ancient structure stands prominently against a deep blue sky, with some clouds visible. The lower part of the image shows a construction area with barriers and a few people walking nearby. **Image 2:** A line graph titled "The U.S. Mortgage Rate Surge" compares the U.S. 30-year fixed-rate mortgage (in red) with existing home sales (in blue) from 2014 to 2023. The graph shows mortgage rates rising sharply in recent years, while home sales fluctuate. A note highlights that in 2023, high mortgage rates and rising home prices have led to the lowest housing affordability since 1989.
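The streaming loop above prints each piece as it arrives. If the full text is also needed afterwards, the deltas can be accumulated. The sketch below shows that pattern without calling the API: `FakeChunk` and `fake_stream` are hypothetical stand-ins for the streamed chunks and the async generator, assuming only that each chunk exposes a `delta` string as in the cell above.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class FakeChunk:
    """Hypothetical stand-in for one streamed response chunk."""

    delta: str


async def fake_stream(pieces: List[str]) -> AsyncIterator[FakeChunk]:
    """Hypothetical stand-in for the async generator a stream call returns."""
    for piece in pieces:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield FakeChunk(delta=piece)


async def collect_stream(stream: AsyncIterator[FakeChunk]) -> str:
    """Concatenate every delta into the full response text."""
    parts = []
    async for chunk in stream:
        parts.append(chunk.delta)
    return "".join(parts)
```

In a notebook, `await collect_stream(fake_stream([...]))` runs directly because a loop is already active; in a plain script the same coroutine would be wrapped in `asyncio.run(...)`.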
Use GPT4V to understand images from local files
In [ ]:
%pip install llama-index-readers-file
In [ ]:
from pathlib import Path
import requests

# Download the last image and save it next to the notebook.
img_path = Path().resolve() / "image.jpg"
img_response = requests.get(image_urls[-1])
img_response.raise_for_status()  # stop early if the download failed
with open(img_path, "wb") as file:
    file.write(img_response.content)

# Reference the image by local path instead of by URL.
msg = ChatMessage(
    role=MessageRole.USER,
    blocks=[
        TextBlock(text="Describe the image as an alternative text"),
        ImageBlock(path=img_path, image_mimetype="image/jpeg"),
    ],
)

response = openai_llm.chat(messages=[msg])
In [ ]:
print(response)
assistant: A person in a black tuxedo and bow tie is holding a golden soccer ball trophy on a stage. The background is a warm yellow color with spotlights shining upwards.
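A local file has no public URL, so its bytes have to travel inside the request. Vision APIs commonly accept inline images as base64 data URLs, and the sketch below shows that encoding using only the standard library. The helper `to_data_url` is hypothetical, and whether `ImageBlock` performs exactly this step internally is an assumption that the notebook does not confirm.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: Path, default_mimetype: str = "image/jpeg") -> str:
    """Encode a local image file as a base64 data URL.

    Hypothetical helper for illustration; the MIME type is guessed from the
    file extension and falls back to `default_mimetype` when unknown.
    """
    mimetype, _ = mimetypes.guess_type(path.name)
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mimetype or default_mimetype};base64,{encoded}"
```

For the `image.jpg` saved earlier, `to_data_url(img_path)` would produce a string starting with `data:image/jpeg;base64,`. Base64 inflates the payload by roughly a third, so large images are usually better passed by URL when one exists.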