NVIDIA NIMs¶
The llama-index-llms-nvidia package contains LlamaIndex integrations for building applications with models on NVIDIA NIM inference microservices. NIM supports models across domains like chat, embedding, and re-ranking, from the community as well as from NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA-accelerated infrastructure and are deployed as NIMs: easy-to-use, prebuilt containers that deploy anywhere with a single command on NVIDIA accelerated infrastructure.
NVIDIA-hosted deployments of NIMs are available to test on the NVIDIA API catalog. After testing, NIMs can be exported from the catalog using an NVIDIA AI Enterprise license and run on-premises or in the cloud, giving enterprises ownership and full control of their IP and AI application.
NIMs are packaged as container images on a per-model basis and distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.
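Because the API is OpenAI-compatible, a deployed NIM can also be queried directly with the standard openai Python client. The snippet below is a minimal sketch and not part of this walkthrough: it assumes the openai package is installed, and the base URL and model name are placeholders for your own deployment.
# Minimal sketch: query a deployed NIM through its OpenAI-compatible endpoint.
# The base_url and model name are placeholders for your own deployment.
from openai import OpenAI
client = OpenAI(
    base_url="http://your-nim-host-address:8000/v1",  # your NIM endpoint
    api_key="not-used-by-local-nims",  # local NIMs typically ignore this value
)
completion = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "What is a NIM?"}],
)
print(completion.choices[0].message.content)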
!pip install llama-index-core
!pip install llama-index-readers-file
!pip install llama-index-llms-nvidia
!pip install llama-index-embeddings-nvidia
!pip install llama-index-postprocessor-nvidia-rerank
Bring in a test dataset: a PDF about housing construction in San Francisco in 2021.
!mkdir data
!wget "https://www.dropbox.com/scl/fi/p33j9112y0ysgwg77fdjz/2021_Housing_Inventory.pdf?rlkey=yyok6bb18s5o31snjd2dxkxz3&dl=0" -O "data/housing_data.pdf"
2024-05-28 17:42:47 (8.26 MB/s) - ‘data/housing_data.pdf’ saved [4808625/4808625]
Setup¶
Import our dependencies and get an NVIDIA API key from the NVIDIA API catalog for the two catalog-hosted models we'll use (embedding and re-ranking).
To get started:
- Create a free account on the NVIDIA platform, which hosts NVIDIA AI Foundation models
- Select the model of your choice
- Under Input, select the Python tab, click Get API Key, and then click Generate Key
- Copy the generated key and save it as NVIDIA_API_KEY. From there, you should have access to the endpoints.
from llama_index.core import SimpleDirectoryReader, Settings, VectorStoreIndex
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.llms.nvidia import NVIDIA
from llama_index.core.node_parser import SentenceSplitter
from google.colab import userdata
import os
os.environ["NVIDIA_API_KEY"] = userdata.get("nvidia-api-key")
Let's use an NVIDIA-hosted NIM for the embedding model.
NVIDIA's default embeddings only embed the first 512 tokens, so we set our chunk size to 500 to maximize the accuracy of the embeddings.
Settings.text_splitter = SentenceSplitter(chunk_size=500)
documents = SimpleDirectoryReader("./data").load_data()
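As an optional sanity check (VectorStoreIndex does this splitting for you), you can run the configured splitter yourself and see how many chunks the PDF produces:
# Optional sanity check: split the documents into nodes with the configured
# splitter and count the resulting ~500-token chunks.
nodes = Settings.text_splitter.get_nodes_from_documents(documents)
print(f"{len(nodes)} chunks of up to 500 tokens")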
We set the embedding model to NVIDIA's default. If a chunk exceeds the number of tokens the model can encode, the default behavior is to throw an error, so we set truncate="END" to instead discard tokens that go over the limit (hopefully not many of them, thanks to the chunk size we set above).
Settings.embed_model = NVIDIAEmbedding(model="NV-Embed-QA", truncate="END")
index = VectorStoreIndex.from_documents(documents)
Now that we have our data embedded and indexed in memory, we'll set up the LLM, which can be locally self-hosted. Following the NIM quick start guide, a NIM can be deployed locally with Docker in about 5 minutes.
Below we show how to:
- use Meta's open-source meta/llama3-8b-instruct model as a locally deployed NIM, and
- use the NVIDIA-hosted meta/llama3-70b-instruct from the API catalog as a NIM.
If you are using a local NIM, make sure you change the base_url to your deployed NIM's URL!
We'll retrieve the 20 most relevant chunks to answer the question.
# self-hosted NIM: if you want to use a self-hosted NIM uncomment the line below
# and comment the line using the API catalog
# Settings.llm = NVIDIA(model="meta/llama3-8b-instruct", base_url="http://your-nim-host-address:8000/v1")
# api catalog NIM: if you're using a self-hosted NIM comment the line below
# and un-comment the line using local NIM above
Settings.llm = NVIDIA(model="meta/llama3-70b-instruct")
query_engine = index.as_query_engine(similarity_top_k=20)
Let's ask it a simple question we know the answer to, one that is answered explicitly in a single spot in the document (page 18).
response = query_engine.query(
"How many new housing units were built in San Francisco in 2021?"
)
print(response)
There was a net addition of 4,649 units to the City’s housing stock in 2021.
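If you want to see where the answer came from, the response object also carries the retrieved chunks. A small sketch (the page_label metadata key is what SimpleDirectoryReader's PDF reader typically attaches; adjust if your loader differs):
# Inspect the retrieved chunks behind the answer, e.g. to confirm the page 18 claim.
for node_with_score in response.source_nodes:
    page = node_with_score.node.metadata.get("page_label")
    print(f"page {page}, score {node_with_score.score}")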
Now let's ask a more complicated question, one that requires reading a table (found on page 41 of the document):
response = query_engine.query(
"What was the net gain in housing units in the Mission in 2021?"
)
print(response)
There is no specific information about the net gain in housing units in the Mission in 2021. The provided data is about the city's overall housing stock and production, but it does not provide a breakdown by neighborhood, including the Mission.
That's no good! The model only found the citywide net addition, not the figure we wanted for the Mission. Let's try a more sophisticated PDF parser, LlamaParse:
!pip install llama-parse
from llama_parse import LlamaParse
# in a notebook, LlamaParse requires this to work
import nest_asyncio
nest_asyncio.apply()
# you can get a key at cloud.llamaindex.ai
os.environ["LLAMA_CLOUD_API_KEY"] = userdata.get("llama-cloud-key")
# set up parser
parser = LlamaParse(
result_type="markdown" # "markdown" and "text" are available
)
# use SimpleDirectoryReader to parse our file
file_extractor = {".pdf": parser}
documents2 = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
Started parsing the file under job_id 84cb91f7-45ec-4b99-8281-0f4beef6a892
index2 = VectorStoreIndex.from_documents(documents2)
query_engine2 = index2.as_query_engine(similarity_top_k=20)
response = query_engine2.query(
"What was the net gain in housing units in the Mission in 2021?"
)
print(response)
The net gain in housing units in the Mission in 2021 was 1,305 units.
Perfect! With a better parser, the LLM can now answer the question.
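If you're curious what the parser produced, you can peek at the markdown output directly (a quick sketch; the first parsed document won't necessarily contain the table in question):
# Peek at the markdown text LlamaParse produced; tables come back as markdown,
# which is what lets the LLM read per-neighborhood numbers.
print(documents2[0].text[:1000])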
Now let's try an even trickier question:
response = query_engine2.query(
"How many affordable housing units were completed in 2021?"
)
print(response)
Repeat: 110
The LLM is getting confused; this appears to be the percentage increase in housing units.
Let's try giving the LLM more context (40 chunks rather than 20) and then re-ranking those chunks with a re-ranker model. We'll use NVIDIA's re-ranker for this:
from llama_index.postprocessor.nvidia_rerank import NVIDIARerank
query_engine3 = index2.as_query_engine(
similarity_top_k=40, node_postprocessors=[NVIDIARerank(top_n=10)]
)
response = query_engine3.query(
"How many affordable housing units were completed in 2021?"
)
print(response)
1,495
Excellent! The figure is now correct (it's on page 35 of the document, in case you're curious).