Portkey

Portkey is a full-stack LLMOps platform that takes your generative AI apps to production reliably and securely.

Key features of Portkey's integration with Llamaindex:


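🚪 AI Gateway:
- Automated fallbacks and retries: keep your application running when a primary service fails.
- Load balancing: distribute incoming requests efficiently across multiple models.
- Semantic caching: reduce cost and latency by caching results intelligently.
🔬 Observability:
- Logging: record every request for monitoring and debugging.
- Request tracing: follow the path of each request to optimize performance.
- Custom tags: categorize requests with tags for deeper insights.
📝 Continuous improvement with user feedback:
- Feedback collection: gather feedback on any served request, at the generation or conversation level.
- Weighted feedback: attach weights to feedback values for finer-grained evaluation data.
- Feedback metadata: add custom context to feedback for richer analysis and insights.
🔑 Secure key management:
- Virtual keys: Portkey turns original provider keys into virtual keys, so your primary credentials are never used directly.
- Multiple identifiers: add several keys for the same provider, or store the same key under different names, for easy identification without compromising security.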

To get started, complete the following setup:

If you are opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.

%pip install llama-index-llms-portkey

!pip install llama-index

# Installing Llamaindex & Portkey SDK
!pip install -U llama_index
!pip install -U portkey-ai

# Importing necessary libraries and modules
from llama_index.llms.portkey import Portkey
from llama_index.core.llms import ChatMessage
import portkey as pk

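You do not need to install any other SDKs or import them in your Llamaindex app.

Step 1️⃣: Get your Portkey API key and your virtual keys for OpenAI, Anthropic, and more

Portkey API key: log into Portkey, click the profile icon at the top left, and select "Copy API Key".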

import os

os.environ["PORTKEY_API_KEY"] = "PORTKEY_API_KEY"
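A placeholder value like the one above is easy to leave in place by accident. As a minimal sketch, assuming you want to fail early rather than at the first request, a small check can run before the client is created. The `require_env` helper and the `DEMO_PORTKEY_API_KEY` variable are hypothetical and are not part of Portkey or LlamaIndex.

```python
import os


def require_env(name: str, placeholder: str = "") -> str:
    """Return the value of an environment variable, or raise if it is unusable.

    Hypothetical helper for illustration; not part of Portkey or LlamaIndex.
    """
    value = os.environ.get(name, "")
    # Reject both a missing value and an unchanged placeholder string.
    if not value or value == placeholder:
        raise RuntimeError(f"Set {name} to a real key before creating the client.")
    return value


# A dummy variable is used here so the snippet runs anywhere.
os.environ["DEMO_PORTKEY_API_KEY"] = "pk-demo-123"
print(require_env("DEMO_PORTKEY_API_KEY"))
```

In the notebook itself, calling `require_env("PORTKEY_API_KEY", placeholder="PORTKEY_API_KEY")` would catch the untouched placeholder from the cell above.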

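Virtual keys

1. Go to the "Virtual Keys" page on the Portkey dashboard and click the "Add Key" button at the top right.
2. Choose your AI provider (OpenAI, Anthropic, Cohere, HuggingFace, etc.), give the key a unique name, and add usage notes if needed. Your virtual key is ready!
3. Now copy the keys and paste them below. You can use them anywhere in the Portkey ecosystem while your original keys stay secure.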

openai_virtual_key_a = ""
openai_virtual_key_b = ""

anthropic_virtual_key_a = ""
anthropic_virtual_key_b = ""

cohere_virtual_key_a = ""
cohere_virtual_key_b = ""

If you prefer not to use Portkey's virtual keys, you can use your AI provider keys directly.

os.environ["OPENAI_API_KEY"] = ""
os.environ["ANTHROPIC_API_KEY"] = ""

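Step 2️⃣: Configure Portkey features

To get the most out of Portkey's integration with Llamaindex, you can configure its features as shown in the illustration above. Here is a guide to all Portkey features and their expected values: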

Feature	Config key	Value (type)	Required
API key	`api_key`	`string`	✅ Required (can be set externally)
Mode	`mode`	`fallback`, `loadbalance`, `single`	✅ Required
Cache type	`cache_status`	`simple`, `semantic`	❔ Optional
Force cache refresh	`cache_force_refresh`	`True`, `False`	❔ Optional
Cache age	`cache_age`	`integer` (seconds)	❔ Optional
Trace ID	`trace_id`	`string`	❔ Optional
Retries	`retry`	`integer` [0,5]	❔ Optional
Metadata	`metadata`	`JSON object`	❔ Optional
Base URL	`base_url`	`url`	❔ Optional

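api_key and mode are required.
You can set the Portkey API key through the Portkey constructor, or set it as an environment variable.
There are 3 modes: Single, Fallback, and Loadbalance.
- Single: the standard mode. Use it when you do not need fallback or load balancing.
- Fallback: set this mode to enable the fallback feature.
- Loadbalance: set this mode to enable the load balancing feature.

Here is an example of configuring some of these features: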

portkey_client = Portkey(
    mode="single",
)

# Since we have defined the Portkey API Key with os.environ, we do not need to set api_key again here
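The value ranges listed in the configuration table in this step can be checked before any client is built. The following is a minimal sketch under that assumption: `validate_portkey_config` is a hypothetical helper that only mirrors the table (allowed modes, cache types, the [0,5] retry range, and cache age in seconds). It is not part of the Portkey SDK and makes no network calls.

```python
ALLOWED_MODES = {"fallback", "loadbalance", "single"}
ALLOWED_CACHE = {"simple", "semantic"}


def validate_portkey_config(config: dict) -> list:
    """Return a list of problems found in a config dict (empty if none).

    Hypothetical helper: it only mirrors the table in this guide and is not
    part of the Portkey SDK.
    """
    problems = []
    if config.get("mode") not in ALLOWED_MODES:
        problems.append("mode must be one of: " + ", ".join(sorted(ALLOWED_MODES)))
    cache = config.get("cache_status")
    if cache is not None and cache not in ALLOWED_CACHE:
        problems.append("cache_status must be 'simple' or 'semantic'")
    retry = config.get("retry")
    if retry is not None and not (isinstance(retry, int) and 0 <= retry <= 5):
        problems.append("retry must be an integer between 0 and 5")
    cache_age = config.get("cache_age")
    if cache_age is not None and not (isinstance(cache_age, int) and cache_age >= 0):
        problems.append("cache_age must be a non-negative integer (seconds)")
    return problems


print(validate_portkey_config({"mode": "single", "retry": 3}))
print(validate_portkey_config({"mode": "parallel", "retry": 9}))
```

The API key is deliberately not validated here, because the table notes that it can be set externally through the environment.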

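Step 3️⃣: Construct the LLM

With the Portkey integration, constructing an LLM is simplified. Use the LLMOptions function for all providers, with the same parameter keys you are used to in the OpenAI or Anthropic constructors. The only new key is weight, which is essential for the load balancing feature.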

openai_llm = pk.LLMOptions(
    provider="openai",
    model="gpt-4",
    virtual_key=openai_virtual_key_a,
)

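The code above shows how to use the LLMOptions function to set up an LLM with the OpenAI provider and the GPT-4 model. The same function works for other providers, which keeps the integration process uniform across providers.

Step 4️⃣: Activate the Portkey client

Once you have constructed the LLM with the LLMOptions function, the next step is to activate it with Portkey. This step is what lets the LLM use all of the Portkey features.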

portkey_client.add_llms(openai_llm)

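That's it! In just 4 steps, you have given your Llamaindex app production-grade capabilities.

🔧 Testing the integration

Let's confirm that everything is set up correctly. Below, we create a simple chat scenario and pass it through the Portkey client to see the response.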

messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What can you do?"),
]
print("Testing Portkey Llamaindex integration:")
response = portkey_client.chat(messages)
print(response)

Here is how the logs appear on the Portkey dashboard:

Logs

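⏩ Streaming responses

Portkey makes streaming responses straightforward. It provides 4 response functions:

.complete(prompt)
.stream_complete(prompt)
.chat(messages)
.stream_chat(messages)

The complete functions take a string input (str), while the chat functions take an array of ChatMessage objects.

Example usage: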

# Let's set up a prompt and then use the stream_complete function to obtain a streamed response.

prompt = "Why is the sky blue?"

print("\nTesting Stream Complete:\n")
response = portkey_client.stream_complete(prompt)
for i in response:
    print(i.delta, end="", flush=True)

# Let's prepare a set of chat messages and then utilize the stream_chat function to achieve a streamed chat response.

messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What can you do?"),
]

print("\nTesting Stream Chat:\n")
response = portkey_client.stream_chat(messages)
for i in response:
    print(i.delta, end="", flush=True)
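The loops above print each `delta` as it arrives but do not keep the full text. The sketch below shows one way to do both. To stay runnable without an API key it uses a stand-in stream: `FakeChunk`, `fake_stream`, and `collect_stream` are hypothetical names, and the only thing assumed about real chunks is the `delta` attribute used in the cell above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeChunk:
    """Stand-in for a streamed response chunk; only `delta` is modelled."""

    delta: str


def fake_stream(text: str, size: int = 5) -> Iterator[FakeChunk]:
    """Yield `text` in small pieces, imitating a streaming response."""
    for start in range(0, len(text), size):
        yield FakeChunk(delta=text[start:start + size])


def collect_stream(chunks: Iterable[FakeChunk]) -> str:
    """Print each delta as it arrives and return the assembled text."""
    parts = []
    for chunk in chunks:
        print(chunk.delta, end="", flush=True)
        parts.append(chunk.delta)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = collect_stream(fake_stream("Sunlight scatters off air molecules."))
```

Because `collect_stream` only reads `delta`, the same loop should work on the iterator returned by `stream_complete` or `stream_chat`, assuming their chunks expose that attribute as shown earlier.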

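🔍 Recap and references

Congratulations! 🎉 You have successfully set up and tested the Portkey integration with Llamaindex. To recap the steps:

Install portkey-ai with pip: pip install portkey-ai
Import Portkey from llama_index.llms.portkey: from llama_index.llms.portkey import Portkey
Get your Portkey API key and create your virtual provider keys
Construct the Portkey client and set the mode: portkey_client=Portkey(mode="fallback")
Configure your provider LLM with LLMOptions: openai_llm = pk.LLMOptions(provider="openai", model="gpt-4", virtual_key=openai_virtual_key_a)
Add the LLM to Portkey: portkey_client.add_llms(openai_llm)
Call the Portkey methods as you would any other LLM: portkey_client.chat(messages)

Guides to the functions and their parameters:

Portkey LLM constructor
LLMOptions constructor
List of Portkey + Llamaindex features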

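🔁 Fallbacks and retries with Portkey

Fallbacks and retries are essential for building resilient AI applications. With Portkey, they are straightforward to set up:

Fallbacks: if a primary service or model fails, Portkey automatically switches to a backup model.
Retries: if a request fails, Portkey can be configured to retry it a number of times.

The following example shows how to configure fallbacks and retries with Portkey: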

portkey_client = Portkey(mode="fallback")
messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What can you do?"),
]

llm1 = pk.LLMOptions(
    provider="openai",
    model="gpt-4",
    retry_settings={"on_status_codes": [429, 500], "attempts": 2},
    virtual_key=openai_virtual_key_a,
)

llm2 = pk.LLMOptions(
    provider="openai",
    model="gpt-3.5-turbo",
    virtual_key=openai_virtual_key_b,
)

portkey_client.add_llms(llm_params=[llm1, llm2])

print("Testing Fallback & Retry functionality:")
response = portkey_client.chat(messages)
print(response)
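To make the control flow behind the example above concrete, here is a local simulation of "retry on certain status codes, then fall back to the next target". It is a sketch of the general pattern, not Portkey's implementation: `call_with_fallback` and the fake targets are hypothetical, and `attempts` is interpreted here as the total number of tries per target, which may differ from how the gateway counts.

```python
from typing import Callable, List, Tuple

# Each target is (name, callable returning (status_code, text)).
Target = Tuple[str, Callable[[], Tuple[int, str]]]


def call_with_fallback(targets: List[Target], retry_on=(429, 500), attempts: int = 2) -> str:
    """Try targets in order; retry a target only on the listed status codes."""
    for name, call in targets:
        for _ in range(attempts):
            status, text = call()
            if status == 200:
                return f"{name}: {text}"
            if status not in retry_on:
                break  # non-retryable error: move on to the next target
    raise RuntimeError("all targets failed")


calls = {"primary": 0, "backup": 0}


def primary():
    calls["primary"] += 1
    return 429, ""  # always rate limited in this simulation


def backup():
    calls["backup"] += 1
    return 200, "hello"


result = call_with_fallback([("primary", primary), ("backup", backup)])
print(result, calls)
```

In this run the primary target is tried twice, both attempts return 429, and the request is then served by the backup target.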

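⚖️ Load balancing with Portkey

Load balancing distributes incoming requests efficiently across multiple models. This improves performance and also provides redundancy if one model fails.

Setting up load balancing with Portkey takes two steps:

Define a weight parameter for each LLM. The weight determines what share of requests goes to each model.
Make sure the weights of all models sum to 1.

Here is an example of setting up load balancing with Portkey: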

# Note: the configuration table in Step 2 lists this mode as "loadbalance",
# while this example passes "ab_test". Use the value your installed version accepts.
portkey_client = Portkey(mode="ab_test")

messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What can you do?"),
]

llm1 = pk.LLMOptions(
    provider="openai",
    model="gpt-4",
    virtual_key=openai_virtual_key_a,
    weight=0.2,
)

llm2 = pk.LLMOptions(
    provider="openai",
    model="gpt-3.5-turbo",
    virtual_key=openai_virtual_key_a,
    weight=0.8,
)

portkey_client.add_llms(llm_params=[llm1, llm2])

print("Testing Loadbalance functionality:")
response = portkey_client.chat(messages)
print(response)
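The effect of the weights can be illustrated locally. The sketch below picks a target at random in proportion to its weight and checks that the weights sum to 1, as the steps above require. It illustrates weighted selection in general and is not a description of how Portkey routes requests internally; `pick_target` is a hypothetical helper.

```python
import math
import random
from collections import Counter


def pick_target(weights: dict, rng: random.Random) -> str:
    """Pick one target name with probability proportional to its weight."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(0)  # seeded so repeated runs give the same counts
weights = {"gpt-4": 0.2, "gpt-3.5-turbo": 0.8}
counts = Counter(pick_target(weights, rng) for _ in range(10_000))
print(counts)
```

Over many requests the share handled by each model approaches its weight, so with the values above roughly one request in five would go to gpt-4.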

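🧠 Semantic caching with Portkey

Semantic caching is a caching mechanism that takes the meaning of a request into account. Unlike a traditional cache, which only matches identical inputs, a semantic cache recognizes similar requests and returns the cached result. This reduces redundant requests, speeds up responses, and saves cost.

Here is how to use semantic caching with Portkey: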

import time

portkey_client = Portkey(mode="single")

openai_llm = pk.LLMOptions(
    provider="openai",
    model="gpt-3.5-turbo",
    virtual_key=openai_virtual_key_a,
    cache_status="semantic",
)

portkey_client.add_llms(openai_llm)

current_messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What are the ingredients of a pizza?"),
]

print("Testing Portkey Semantic Cache:")

start = time.time()
response = portkey_client.chat(current_messages)
end = time.time() - start

print(response)
print(f"{'-'*50}\nServed in {end} seconds.\n{'-'*50}")

new_messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="Ingredients of pizza"),
]

print("Testing Portkey Semantic Cache:")

start = time.time()
response = portkey_client.chat(new_messages)
end = time.time() - start

print(response)
print(f"{'-'*50}\nServed in {end} seconds.\n{'-'*50}")
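The idea of "similar prompts share a cache entry" can be shown with a toy cache. This sketch uses Jaccard overlap of word sets (after dropping a few stopwords) as a crude stand-in for the embedding similarity a real semantic cache would likely use. `ToySemanticCache`, the stopword list, and the 0.8 threshold are all assumptions made for illustration and say nothing about Portkey's actual matching logic.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "is", "are", "what"}


def tokens(text: str) -> set:
    """Lowercase, split on non-letters, and drop a few common stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of token sets; a crude stand-in for embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


class ToySemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (prompt, response) pairs

    def get(self, prompt: str):
        """Return a cached response for a similar enough prompt, else None."""
        for cached_prompt, response in self.entries:
            if similarity(prompt, cached_prompt) >= self.threshold:
                return response
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((prompt, response))


semantic_cache = ToySemanticCache()
semantic_cache.put("What are the ingredients of a pizza?", "Dough, sauce, cheese.")
print(semantic_cache.get("Ingredients of pizza"))  # similar prompt
print(semantic_cache.get("Why is the sky blue?"))  # unrelated prompt
```

The two pizza prompts from the example above reduce to the same word set and therefore share an entry, while the unrelated prompt misses the cache.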

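Portkey's cache supports two additional features: Force Refresh and Age.

cache_force_refresh: forces the request to be sent to the provider instead of being served from the cache.
cache_age: sets how often, in seconds, the cached entry for a given string is automatically refreshed.

Here is how to use them: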

# Setting the cache status as `semantic` and cache_age as 60s.
openai_llm = pk.LLMOptions(
    provider="openai",
    model="gpt-3.5-turbo",
    virtual_key=openai_virtual_key_a,
    cache_status="semantic",
    cache_force_refresh=True,
    cache_age=60,
)
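How a maximum age and a forced refresh interact is easiest to see in a small local model. The sketch below is a generic time-to-live cache, written under the assumption that `cache_age` behaves like a maximum entry age and `cache_force_refresh` like a bypass flag. `TTLCache` and its injectable clock are hypothetical and do not reflect Portkey internals.

```python
import time


class TTLCache:
    """Tiny cache with a maximum age (seconds) and an optional forced refresh."""

    def __init__(self, max_age: float, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock
        self.store = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch, force_refresh: bool = False):
        """Return (value, served_from_cache)."""
        now = self.clock()
        hit = self.store.get(key)
        if hit is not None and not force_refresh and now - hit[0] < self.max_age:
            return hit[1], True
        value = fetch()  # stands in for a request to the provider
        self.store[key] = (now, value)
        return value, False


# A fake clock makes expiry visible without sleeping.
fake_now = [0.0]
ttl_cache = TTLCache(max_age=60, clock=lambda: fake_now[0])

print(ttl_cache.get_or_fetch("q", lambda: "fresh-1"))                      # first call
print(ttl_cache.get_or_fetch("q", lambda: "fresh-2"))                      # within max age
print(ttl_cache.get_or_fetch("q", lambda: "fresh-3", force_refresh=True))  # forced
fake_now[0] = 61.0
print(ttl_cache.get_or_fetch("q", lambda: "fresh-4"))                      # past max age
```

Only the second call is served from the cache: the first has nothing stored yet, the third bypasses the cache, and the fourth finds an entry older than 60 seconds.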

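🔬 Observability with Portkey

Insight into how your application behaves is essential. Portkey's observability features let you monitor, debug, and optimize your AI applications. You can trace each request, follow its path, and segment requests by custom tags. This level of detail helps you find bottlenecks, optimize cost, and improve the overall user experience.

Here is how to set up observability with Portkey: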

metadata = {
    "_environment": "production",
    "_prompt": "test",
    "_user": "user",
    "_organisation": "acme",
}

trace_id = "llamaindex_portkey"

portkey_client = Portkey(mode="single")

openai_llm = pk.LLMOptions(
    provider="openai",
    model="gpt-3.5-turbo",
    virtual_key=openai_virtual_key_a,
    metadata=metadata,
    trace_id=trace_id,
)

portkey_client.add_llms(openai_llm)

print("Testing Observability functionality:")
response = portkey_client.chat(messages)
print(response)
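The example above reuses one fixed trace ID, which makes individual requests hard to tell apart in the logs. One possible approach, sketched here, is to generate a unique ID per request and assemble the metadata with a helper. `new_trace_id` and `build_metadata` are hypothetical; the underscore-prefixed keys simply follow the example above, and whether arbitrary keys are accepted is an assumption not confirmed in this guide.

```python
import uuid


def new_trace_id(prefix: str = "llamaindex_portkey") -> str:
    """Build a unique trace ID by appending a random UUID to a readable prefix."""
    return f"{prefix}-{uuid.uuid4().hex}"


def build_metadata(environment: str, user: str, **extra: str) -> dict:
    """Assemble a metadata dict with the underscore-prefixed keys used above."""
    metadata = {"_environment": environment, "_user": user}
    metadata.update({f"_{key}": value for key, value in extra.items()})
    return metadata


trace_id = new_trace_id()
metadata = build_metadata("production", "user", organisation="acme", prompt="test")
print(trace_id)
print(metadata)
```

The resulting `trace_id` and `metadata` values could then be passed to `pk.LLMOptions` in the same way as in the cell above.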

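🌉 Open source AI gateway

Portkey's AI gateway is built internally on the open source project Rubeus. Rubeus provides core features such as LLM interoperability, load balancing, and fallbacks, and acts as an intermediary layer that ensures requests are handled optimally.

One advantage of using Portkey is its flexibility. You can customize its behavior, redirect requests to different providers, or bypass Portkey's logging entirely.

Here is an example of customizing this behavior with Portkey:

portkey_client.base_url = None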

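📝 Feedback with Portkey

Continuous improvement is a cornerstone of AI. Feedback is what lets your models and applications evolve and serve users better. Portkey's feedback API offers a simple way to collect weighted feedback from users so that you can keep refining your application.

Here is how to use the Portkey feedback API: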

import json
import os

import requests

# Endpoint URL
url = "https://api.portkey.ai/v1/feedback"

# Headers
headers = {
    "x-portkey-api-key": os.environ.get("PORTKEY_API_KEY"),
    "Content-Type": "application/json",
}

# Data
data = {"trace_id": "llamaindex_portkey", "value": 1}

# Making the request
response = requests.post(url, headers=headers, data=json.dumps(data))

# Print the response
print(response.text)

All feedback for a trace ID, together with its weight and value, can be viewed on the Portkey dashboard:

Feedback
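The request shown earlier sends only `trace_id` and `value`. Since the text also mentions a weight, a payload that includes one might be built as in the sketch below. The `weight` field name and its [0, 1] range are assumptions made here for illustration, and `build_feedback` is a hypothetical helper, so check the feedback API reference before relying on them.

```python
import json


def build_feedback(trace_id: str, value: int, weight: float = 1.0) -> str:
    """Serialize a feedback payload; `weight` is an assumed field name."""
    # The [0, 1] range is an assumption, not taken from this guide.
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight is assumed to lie in [0, 1]")
    return json.dumps({"trace_id": trace_id, "value": value, "weight": weight})


payload = build_feedback("llamaindex_portkey", value=1, weight=0.5)
print(payload)
```

The resulting string can be passed as the `data` argument of the `requests.post` call shown earlier.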

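✅ Conclusion

Integrating Portkey with Llamaindex simplifies the process of building robust and resilient AI applications. With semantic caching, observability, load balancing, feedback, and fallbacks, you can keep your application performing well and improving over time.

By following this guide, you have set up and tested the Portkey integration with Llamaindex. As you continue to build and deploy AI applications, make full use of what this integration offers.

For further help or questions, reach out to the developers.

Join our community of practitioners putting LLMs into production.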