Interacting with LLM deployed in Amazon SageMaker Endpoint with LlamaIndex¶
An Amazon SageMaker endpoint is a fully managed resource that enables the deployment of machine learning models, specifically LLMs (Large Language Models), for making predictions on new data.
This notebook demonstrates how to interact with LLM endpoints using SageMakerLLM, unlocking additional LlamaIndex features.
So, it is assumed that an LLM is already deployed on a SageMaker endpoint.
Setup¶
If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
%pip install llama-index-llms-sagemaker-endpoint
In [ ]:
!pip install llama-index
You have to specify the endpoint name to interact with it.
In [ ]:
ENDPOINT_NAME = "<-YOUR-ENDPOINT-NAME->"
Credentials should be provided to connect to the endpoint. You can either:

- use an AWS profile by specifying the profile_name parameter; if not specified, the default credential profile will be used
- pass the credentials as parameters (aws_access_key_id, aws_secret_access_key, aws_session_token, region_name)

For more details, check this link.

With credentials:
In [ ]:
from llama_index.llms.sagemaker_endpoint import SageMakerLLM

AWS_ACCESS_KEY_ID = "<-YOUR-AWS-ACCESS-KEY-ID->"
AWS_SECRET_ACCESS_KEY = "<-YOUR-AWS-SECRET-ACCESS-KEY->"
AWS_SESSION_TOKEN = "<-YOUR-AWS-SESSION-TOKEN->"
REGION_NAME = "<-YOUR-ENDPOINT-REGION-NAME->"
In [ ]:
llm = SageMakerLLM(
    endpoint_name=ENDPOINT_NAME,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    aws_session_token=AWS_SESSION_TOKEN,
    region_name=REGION_NAME,
)
With an AWS profile:
In [ ]:
from llama_index.llms.sagemaker_endpoint import SageMakerLLM

ENDPOINT_NAME = "<-YOUR-ENDPOINT-NAME->"
PROFILE_NAME = "<-YOUR-PROFILE-NAME->"
llm = SageMakerLLM(
    endpoint_name=ENDPOINT_NAME, profile_name=PROFILE_NAME
)  # Omit the profile name to use the default profile
Basic Usage¶
Call complete with a prompt¶
In [ ]:
resp = llm.complete(
    "Paul Graham is ", formatted=True
)  # formatted=True to avoid adding system prompt
print(resp)
print(resp)
66 years old (birthdate: September 4, 1951). He is a British-American computer scientist, programmer, and entrepreneur who is known for his work in the fields of artificial intelligence, machine learning, and computer vision. He is a professor emeritus at Stanford University and a researcher at the Stanford Artificial Intelligence Lab (SAIL). Graham has made significant contributions to the field of computer science, including the development of the concept of "n-grams," which are sequences of n items that occur together in a dataset. He has also worked on the development of machine learning algorithms and has written extensively on the topic of machine learning. Graham has received numerous awards for his work, including the Association for Computing Machinery (ACM) A.M. Turing Award, the IEEE Neural Networks Pioneer Award, and the IJCAI Award
Call chat with a list of messages¶
In [ ]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
In [ ]:
print(resp)
assistant: Arrrr, shiver me timbers! *adjusts eye patch* Me name be Cap'n Blackbeak, the most feared and infamous pirate on the seven seas! *winks* *ahem* But enough about me, matey. What be bringin' ye to these fair waters? Are ye here to plunder some booty, or just to share a pint o' grog with a salty old sea dog like meself? *chuckles*
Streaming¶
Using stream_complete endpoint¶
In [ ]:
resp = llm.stream_complete("Paul Graham is ", formatted=True)
In [ ]:
for r in resp:
    print(r.delta)
64 today. He’s a computer sci ist, entrepreneur, and writer, best known for his work in the fields of artificial intelligence, machine learning, and computer graphics. Graham was born in 1956 in Boston, Massachusetts. He earned his Bachelor’s degree in Computer Science from Harvard University in 1978 and his PhD in Computer Science from the University of California, Berkeley in 1982. Graham’s early work focused on the development of the first computer graphics systems that could generate photorealistic images. In the 1980s, he became interested in the field of artificial intelligence and machine learning, and he co-founded a number of companies to explore these areas, including Viaweb, which was one of the first commercial web hosting services. Graham is also a prolific writer and has published a number of influential essays on topics such as the nature
Using stream_chat endpoint¶
In [ ]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
In [ ]:
for r in resp:
    print(r.delta, end="")
ARRGH! *adjusts eye patch* Me hearty? *winks* Me name be Captain Blackbeak, the most feared and infamous pirate to ever sail the seven seas! *chuckles* Or, at least, that's what me matey mates tell me. *winks* So, what be bringin' ye to these waters, matey? Are ye here to plunder some booty or just to hear me tales of the high seas? *grins* Either way, I be ready to share me treasure with ye! *winks* Just don't be tellin' any landlubbers about me hidden caches o' gold, or ye might be walkin' the plank, savvy? *winks*
Configure Model¶
SageMakerLLM is an abstraction for interacting with different LLMs deployed in Amazon SageMaker. All the default parameters are compatible with the Llama 2 model. Therefore, if you are using a different model, you will probably need to set the following parameters:

- messages_to_prompt: a callable that accepts a list of ChatMessage objects (and, if not specified in the messages, a system prompt) and returns a string formatted the way the endpoint LLM expects.
- completion_to_prompt: a callable that accepts a completion string with a system prompt and returns a string formatted the way the endpoint LLM expects.
- content_handler: a class inheriting from llama_index.llms.sagemaker_llm_endpoint_utils.BaseIOHandler and implementing the methods serialize_input, deserialize_output, deserialize_streaming_output, and remove_prefix.
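As a minimal sketch, a pair of such formatting callables could look like the following. The `<|role|>` tag template used here is purely hypothetical (it is not any specific model's real chat template), so substitute the prompt format your deployed model actually expects. The functions only rely on the `role.value` and `content` attributes of LlamaIndex's ChatMessage objects.

```python
def messages_to_prompt(messages):
    # Render ChatMessage-like objects (duck-typed on .role.value and
    # .content) into a hypothetical tag-based chat template.
    prompt = ""
    for message in messages:
        prompt += f"<|{message.role.value}|>\n{message.content}\n"
    # Leave the assistant turn open so the model continues from here.
    return prompt + "<|assistant|>\n"


def completion_to_prompt(completion):
    # Wrap a bare completion string in the same hypothetical template.
    return f"<|user|>\n{completion}\n<|assistant|>\n"
```

These can then be passed to the constructor, e.g. `SageMakerLLM(endpoint_name=ENDPOINT_NAME, messages_to_prompt=messages_to_prompt, completion_to_prompt=completion_to_prompt)`, so that every call formats the prompt before it is sent to the endpoint.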