HuggingFaceEmbedding¶
The base HuggingFaceEmbedding class is a generic wrapper around any HuggingFace embedding model. All embedding models on Hugging Face should work. See the embedding model leaderboard for more recommendations.
This class depends on the sentence-transformers package, which you can install with pip install sentence-transformers.
NOTE: if you were previously using HuggingFaceEmbeddings from LangChain, this should give equivalent results.
If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
%pip install llama-index-embeddings-huggingface
In [ ]:
!pip install llama-index
In [ ]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# loads https://huggingface.co/BAAI/bge-small-en-v1.5
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
In [ ]:
embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])
384 [-0.003275700844824314, -0.011690810322761536, 0.041559211909770966, -0.03814814239740372, 0.024183044210076332]
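Embeddings are typically compared with cosine similarity. The base embedding class in llama_index.core also exposes get_query_embedding and a similarity helper, so a quick sanity check looks like this (a minimal sketch; the slight difference between text and query embeddings is expected, since bge models prepend an instruction to queries):

query_embedding = embed_model.get_query_embedding("Hello World!")
# cosine similarity by default
print(embed_model.similarity(embeddings, query_embedding))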
Benchmarking¶
Let's compare on a classic large document: chapter 3 of the IPCC climate report.
In [ ]:
!curl https://www.ipcc.ch/report/ar6/wg2/downloads/report/IPCC_AR6_WGII_Chapter03.pdf --output IPCC_AR6_WGII_Chapter03.pdf
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 20.7M 100 20.7M 0 0 69.6M 0 --:--:-- --:--:-- --:--:-- 70.0M
In [ ]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings

documents = SimpleDirectoryReader(
    input_files=["IPCC_AR6_WGII_Chapter03.pdf"]
).load_data()
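Before indexing, it's worth a quick look at what the reader produced; SimpleDirectoryReader typically yields one Document per PDF page:

# inspect the loaded documents
print(len(documents))
print(documents[0].text[:200])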
Base HuggingFace Embeddings¶
In [ ]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# loads BAAI/bge-small-en-v1.5 with the default torch backend
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cpu",
    embed_batch_size=8,
)
test_embeds = embed_model.get_text_embedding("Hello World!")

Settings.embed_model = embed_model
In [ ]:
%%timeit -r 1 -n 1
index = VectorStoreIndex.from_documents(documents, show_progress=True)
Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 428.44it/s] Generating embeddings: 100%|██████████| 459/459 [00:19<00:00, 23.32it/s]
20.2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
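The embed_batch_size=8 above controls how many chunks are sent through the model per forward pass; raising it (e.g. to 32 or 64) usually speeds up indexing at the cost of memory. You can also exercise batching directly through get_text_embedding_batch on the embedding model; a small sketch with example strings:

texts = ["ocean warming", "coral bleaching", "sea level rise"]
# embeds all texts in one call, batching internally per embed_batch_size
batch_embeds = embed_model.get_text_embedding_batch(texts, show_progress=True)
print(len(batch_embeds), len(batch_embeds[0]))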
ONNX Embeddings¶
In [ ]:
# pip install sentence-transformers[onnx]

# loads BAAI/bge-small-en-v1.5 with the onnx backend
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cpu",
    backend="onnx",
    model_kwargs={
        "provider": "CPUExecutionProvider"
    },  # For ONNX, you can specify the provider, see https://sbert.net/docs/sentence_transformer/usage/efficiency.html
)
test_embeds = embed_model.get_text_embedding("Hello World!")

Settings.embed_model = embed_model
In [ ]:
%%timeit -r 1 -n 1
index = VectorStoreIndex.from_documents(documents, show_progress=True)
Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 421.63it/s] Generating embeddings: 100%|██████████| 459/459 [00:31<00:00, 14.53it/s]
32.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
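The provider you pass must be one your onnxruntime build actually ships; for example, CUDAExecutionProvider is only available with onnxruntime-gpu. To check what's available locally:

import onnxruntime

print(onnxruntime.get_available_providers())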
OpenVINO Embeddings¶
In [ ]:
# pip install sentence-transformers[openvino]

# loads BAAI/bge-small-en-v1.5 with the openvino backend
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cpu",
    backend="openvino",  # OpenVINO is very strong on CPUs
    revision="refs/pr/16",  # BAAI/bge-small-en-v1.5 itself doesn't have an OpenVINO model currently, but there's a PR with it that we can load: https://huggingface.co/BAAI/bge-small-en-v1.5/discussions/16
    model_kwargs={
        "file_name": "openvino_model_qint8_quantized.xml"
    },  # If we're using an optimized/quantized model, we need to specify the file name like this
)
test_embeds = embed_model.get_text_embedding("Hello World!")

Settings.embed_model = embed_model
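The openvino_model_qint8_quantized.xml file referenced above is an int8-quantized export. If you want to produce a similar file for another model, the sentence-transformers efficiency docs (linked in the comment above) describe an export helper; a rough sketch, assuming a recent sentence-transformers with optimum-intel installed, and a hypothetical output directory:

from sentence_transformers import (
    SentenceTransformer,
    export_static_quantized_openvino_model,
)
from optimum.intel import OVQuantizationConfig

model = SentenceTransformer("BAAI/bge-small-en-v1.5", backend="openvino")
# writes an openvino_model_qint8_quantized.xml into the given directory
export_static_quantized_openvino_model(
    model,
    quantization_config=OVQuantizationConfig(),
    model_name_or_path="bge-small-en-v1.5-ov-int8",  # hypothetical local path
)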
In [ ]:
%%timeit -r 1 -n 1
index = VectorStoreIndex.from_documents(documents, show_progress=True)
Parsing nodes: 100%|██████████| 172/172 [00:00<00:00, 403.15it/s] Generating embeddings: 100%|██████████| 459/459 [00:08<00:00, 53.83it/s]
9.03 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
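Note that %%timeit discards variables created inside the cell, so the index built above isn't available afterwards. To actually use an index, build it normally and retrieve; a minimal sketch (the query string is just an example):

index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("What are the key risks to marine ecosystems?")
for node_with_score in nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:100])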
References¶
- Local Embedding Models goes into more detail on how to use local models like these.
- Sentence Transformers > Speeding up Inference contains extensive documentation on using the backend options effectively, including optimization and quantization for ONNX and OpenVINO.