Optimized Embedding Models using Optimum-Intel

LlamaIndex supports loading quantized embedding models for Intel platforms using the Optimum-Intel library.

Optimized models are smaller and faster, with minimal accuracy loss. See the documentation and an optimization guide that uses the IntelLabs/fastRAG library.
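To make "smaller" concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It illustrates the general idea only and is not Optimum-Intel's or Neural Compressor's actual scheme: real static quantization also calibrates activation ranges on sample data and often uses per-channel scales. The helpers `quantize_int8` and `dequantize_int8` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization (illustrative sketch).

    scale = max|x| / 127, q = clip(round(x / scale), -127, 127)
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]


weights = [0.42, -1.37, 0.05, 0.91, -0.66, 1.20]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding to the nearest code keeps each error within half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print("int8 codes:", q)
print("max abs error:", max_err, "bound:", scale / 2)

# Storage: 4 bytes per float32 value vs 1 byte per int8 code plus one float32 scale.
print("fp32 bytes:", 4 * len(weights), "int8 bytes:", len(q) + 4)
```

The roughly 4x size reduction comes from storing one byte per value instead of four; the accuracy loss is bounded by the step size `scale / 2` per value.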

The optimization relies on math instructions available in 4th-generation Intel® Xeon® processors or newer.
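The text does not name the instructions. As an assumption, the relevant ones are Intel AMX (introduced with 4th Gen Xeon Scalable processors) and AVX-512 VNNI, which Linux reports as the `amx_int8` and `avx512_vnni` flags in `/proc/cpuinfo`. The hypothetical helper below only checks for those flags; it does not confirm that Optimum-Intel will actually use them.

```python
from pathlib import Path

# Assumed-relevant Linux cpuinfo flags for int8 acceleration on Intel CPUs.
INT8_FLAGS = {
    "amx_int8": "Intel AMX (INT8)",
    "avx512_vnni": "AVX-512 VNNI",
}


def detect_int8_features(cpuinfo_text: str) -> dict:
    """Return {flag: present} using the first 'flags' line of /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break
    return {name: name in flags for name in INT8_FLAGS}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for flag, present in detect_int8_features(cpuinfo.read_text()).items():
            print(f"{INT8_FLAGS[flag]} ({flag}): {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found (probably not Linux)")
```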

To load and use the quantized models, install the required dependencies: pip install optimum[exporters] optimum-intel neural-compressor intel_extension_for_pytorch.
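Before loading a model, a quick check like the following can report which of these packages are importable. It is a sketch that assumes the import names `optimum`, `optimum.intel`, `neural_compressor` and `intel_extension_for_pytorch` correspond to the pip packages above; `missing_modules` is a hypothetical helper, and the `[exporters]` extra is not checked.

```python
import importlib.util

# Import names assumed to correspond to the pip packages listed above.
REQUIRED_MODULES = [
    "optimum",
    "optimum.intel",
    "neural_compressor",
    "intel_extension_for_pytorch",
]


def missing_modules(names):
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for a dotted name whose parent package is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(gaps) if gaps else "none")
```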

Models are loaded with the IntelEmbedding class, whose usage is similar to that of any local HuggingFace embedding model. For example:

%pip install llama-index-embeddings-huggingface-optimum-intel

from llama_index.embeddings.huggingface_optimum_intel import IntelEmbedding

embed_model = IntelEmbedding("Intel/bge-small-en-v1.5-rag-int8-static")

embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])

384
[-0.0032782123889774084, -0.013396517373621464, 0.037944991141557693, -0.04642259329557419, 0.027709005400538445]
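Embedding vectors like this one are typically compared with cosine similarity, for example to rank documents against a query during retrieval. A minimal pure-Python sketch follows; `cosine_similarity` is a hypothetical helper, and in practice you would pass two outputs of `embed_model.get_text_embedding` (384 values each for this model) instead of the toy vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for two model embeddings.
    print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 1.0, 0.0]))
```

Scores near 1 indicate texts the model considers semantically close; scores near 0 indicate unrelated texts.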