NVIDIA's LLM Text Completion API¶
Extends the NVIDIA class to support the /completion API for the following models:
- bigcode/starcoder2-7b
- bigcode/starcoder2-15b
Installation¶
In [ ]:
!pip install --force-reinstall llama_index-llms-nvidia
In [ ]:
!which python
In [ ]:
import getpass
import os

# del os.environ['NVIDIA_API_KEY'] ## delete key and reset
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith(
        "nvapi-"
    ), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key
In [ ]:
os.environ["NVIDIA_API_KEY"]
In [ ]:
# running the async examples below in a notebook requires nest_asyncio
import nest_asyncio

nest_asyncio.apply()
In [ ]:
from llama_index.llms.nvidia import NVIDIA

llm = NVIDIA(model="bigcode/starcoder2-15b", use_chat_completions=False)
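Here use_chat_completions=False routes requests to the /completion endpoint rather than /chat/completions, which is what these code-completion models expect.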
Available Models¶
The is_chat_model flag can be used to identify the available text completion models (those for which it is False).
In [ ]:
print([model for model in llm.available_models if not model.is_chat_model])
Working with NVIDIA NIM Microservices¶
In addition to connecting to hosted NVIDIA NIMs, this connector can be used to connect to local microservice instances. This helps you take your application local when necessary.
For instructions on how to set up local microservice instances, see https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/
In [ ]:
from llama_index.llms.nvidia import NVIDIA

# connect to a NIM running at localhost:8080; pass model="..." to target a specific model
llm = NVIDIA(base_url="http://localhost:8080/v1")
In [ ]:
print(llm.complete("# Function that does quicksort:"))
As is standard with LlamaIndex, we get a CompletionResponse object in return.
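If you want to work with the result programmatically rather than just print it, the generated text lives on the response object. A minimal sketch:

In [ ]:
response = llm.complete("# Function that does quicksort:")
# the generated completion is available as plain text on the response
print(response.text)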
Async Complete: .acomplete()¶
There is also an async implementation, which can be invoked in the same way!
In [ ]:
await llm.acomplete("# Function that does quicksort:")
Streaming¶
In [ ]:
x = llm.stream_complete(prompt="# Reverse string in python:", max_tokens=512)
In [ ]:
for t in x:
    print(t.delta, end="")
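Each streamed chunk exposes the incremental text as delta. If you want the full completion as a single string, you can join the deltas; a minimal sketch (delta can be None on some chunks, hence the or ""):

In [ ]:
# join the streamed deltas into one string (sketch)
chunks = llm.stream_complete(prompt="# Reverse string in python:", max_tokens=512)
full_text = "".join(chunk.delta or "" for chunk in chunks)
print(full_text)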
Async Streaming¶
In [ ]:
x = await llm.astream_complete(
    prompt="# Reverse program in python:", max_tokens=512
)
In [ ]:
async for t in x:
    print(t.delta, end="")