OpenVINO GenAI LLMs¶
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. The OpenVINO™ Runtime can run the same optimized model on a variety of hardware devices. It accelerates deep learning performance across use cases such as language and large language models, computer vision, automatic speech recognition, and more.
OpenVINOGenAILLM
is a wrapper around the OpenVINO-GenAI API. OpenVINO models can be run locally through this entity wrapped by LlamaIndex:
In the lines below, we install the packages necessary for this demo:
In [ ]:
%pip install llama-index-llms-openvino-genai
In [ ]:
%pip install optimum[openvino]
Now that we're set up, we can get started:
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
!pip install llama-index
In [ ]:
from llama_index.llms.openvino_genai import OpenVINOGenAILLM
/home2/ethan/intel/llama_index/llama_test/lib/python3.10/site-packages/pydantic/_internal/_fields.py:132: UserWarning: Field "model_path" in OpenVINOGenAILLM has conflict with protected namespace "model_". You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`. warnings.warn(
In [ ]:
!optimum-cli export openvino --model microsoft/Phi-3-mini-4k-instruct --task text-generation-with-past --weight-format int4 model_path
You can also download an optimized IR model from the OpenVINO model hub on Hugging Face.
In [ ]:
import huggingface_hub as hf_hub

model_id = "OpenVINO/Phi-3-mini-4k-instruct-int4-ov"
model_path = "Phi-3-mini-4k-instruct-int4-ov"

hf_hub.snapshot_download(model_id, local_dir=model_path)
Fetching 17 files: 0%| | 0/17 [00:00<?, ?it/s]
Out[ ]:
'/home2/ethan/intel/llama_index/docs/docs/examples/llm/Phi-3-mini-4k-instruct-int4-ov'
In [ ]:
ov_llm = OpenVINOGenAILLM(
    model_path=model_path,
    device="CPU",
)
You can pass generation config parameters through ov_llm.config.
The supported parameters are listed in openvino_genai.GenerationConfig.
In [ ]:
ov_llm.config.max_new_tokens = 100
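Other generation parameters can be set the same way before calling the model. A minimal sketch, assuming fields such as do_sample, temperature, and top_p are present on your installed version's openvino_genai.GenerationConfig (verify against its documentation):

```python
# Sketch: tune sampling behavior via the wrapped GenerationConfig.
# Field names are assumptions based on openvino_genai.GenerationConfig;
# check your installed openvino-genai version.
ov_llm.config.max_new_tokens = 100
ov_llm.config.do_sample = True   # sample instead of greedy decoding
ov_llm.config.temperature = 0.7  # soften the token distribution
ov_llm.config.top_p = 0.9        # nucleus-sampling cutoff
```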
In [ ]:
response = ov_llm.complete("What is the meaning of life?")
print(str(response))
# Answer The meaning of life is a profound and complex question that has been debated by philosophers, theologians, scientists, and thinkers throughout history. Different cultures, religions, and individuals have their own interpretations and beliefs about what gives life purpose and significance. From a philosophical standpoint, existentialists like Jean-Paul Sartre and Albert Camus have argued that life inherently has no meaning, and it is
Streaming¶
Using the stream_complete
endpoint
In [ ]:
response = ov_llm.stream_complete("Who is Paul Graham?")
for r in response:
    print(r.delta, end="")
Paul Graham is a computer scientist and entrepreneur who is best known for founding the startup accelerator program Y Combinator. He is also the founder of the web development company Viaweb, which was acquired by PayPal for $497 million in 1raneworks. What is Y Combinator? Y Combinator is a startup accelerator program that provides funding, mentorship, and resources to early-stage start
Using the stream_chat
endpoint
In [ ]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]

resp = ov_llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
I'm Phi, Microsoft's AI assistant. How can I assist you today?
For more information, please refer to: