Customizing LLMs within LlamaIndex Abstractions

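You can plug these LLM abstractions into other LlamaIndex modules (indexes, retrievers, query engines, agents), which lets you build advanced workflows over your data.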

By default, we use OpenAI's gpt-3.5-turbo model. However, you can customize the underlying LLM being used.

Example: Changing the Underlying LLM

The following snippet shows how to customize the LLM being used.

In this example, we use gpt-4o-mini instead of the default gpt-3.5-turbo. Available models include gpt-4o-mini, gpt-4o, o3-mini, and others.

from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# define LLM
llm = OpenAI(temperature=0.1, model="gpt-4o-mini")

# change the global default LLM
Settings.llm = llm

documents = SimpleDirectoryReader("data").load_data()

# build index
index = VectorStoreIndex.from_documents(documents)

# locally override the LLM
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query(
    "What did the author do after his time at Y Combinator?"
)
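
Conceptually, Settings.llm acts as a global fallback, while the llm argument passed to as_query_engine takes precedence for that engine only. The plain-Python sketch below models that precedence rule; GlobalSettings, resolve_llm, and the string model names are hypothetical stand-ins for illustration, not LlamaIndex's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlobalSettings:
    """Hypothetical stand-in for a global settings object such as Settings."""
    llm: str = "gpt-3.5-turbo"  # global default model name


settings = GlobalSettings()


def resolve_llm(local_llm: Optional[str] = None) -> str:
    """Return the locally supplied LLM if given, else fall back to the global default."""
    return local_llm if local_llm is not None else settings.llm


# Before any change, the global default is used.
print(resolve_llm())  # gpt-3.5-turbo

# Changing the global default affects every caller that does not override it.
settings.llm = "gpt-4o-mini"
print(resolve_llm())  # gpt-4o-mini

# A local override wins for that call only; the global default is untouched.
print(resolve_llm("gpt-4o"))  # gpt-4o
print(resolve_llm())  # gpt-4o-mini
```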

Example: Using a Custom LLM (Advanced)

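To use a custom LLM, you only need to implement the LLM class (or CustomLLM, for a simpler interface). You are responsible for passing the text to the model and returning the newly generated tokens.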

This implementation could be a local model, or even a wrapper around your own API.
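
If the model sits behind your own HTTP service, the core of complete is turning a prompt into a request and reading the generated text back. The sketch below assumes a hypothetical JSON endpoint that accepts {"prompt": ..., "max_tokens": ...} and returns {"text": ...}; the URL, the field names, and the MyApiClient class are illustrative assumptions, not a LlamaIndex API. The transport is injectable so the logic can be exercised without a server. In a real CustomLLM subclass, complete would wrap the returned string in a CompletionResponse.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional

# Hypothetical endpoint; replace with your own service's URL.
DEFAULT_URL = "http://localhost:8000/generate"


def http_post_json(url: str, payload: Dict) -> Dict:
    """Send a JSON POST request and decode the JSON reply (standard library only)."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))


class MyApiClient:
    """Hypothetical wrapper around your own text-generation API."""

    def __init__(
        self,
        url: str = DEFAULT_URL,
        max_tokens: int = 256,
        post: Optional[Callable[[str, Dict], Dict]] = None,
    ) -> None:
        self.url = url
        self.max_tokens = max_tokens
        # Injectable transport: defaults to a real HTTP call.
        self._post = post or http_post_json

    def complete(self, prompt: str) -> str:
        # Build the request body expected by the (assumed) API.
        payload = {"prompt": prompt, "max_tokens": self.max_tokens}
        reply = self._post(self.url, payload)
        # The (assumed) API returns the generated text under "text".
        return reply["text"]


# Usage with a fake transport, so no server is needed.
def fake_post(url: str, payload: Dict) -> Dict:
    return {"text": f"echo: {payload['prompt']}"}


client = MyApiClient(post=fake_post)
print(client.complete("Hello"))  # echo: Hello
```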

Note: for a completely private experience, also set up a local embedding model.

Here is a small boilerplate example:

from typing import Any

from llama_index.core import Settings, SimpleDirectoryReader, SummaryIndex
from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class OurLLM(CustomLLM):
    context_window: int = 3900
    num_output: int = 256
    model_name: str = "custom"
    dummy_response: str = "My response"

    @property
    def metadata(self) -> LLMMetadata:
        """Get LLM metadata."""
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text=self.dummy_response)

    @llm_completion_callback()
    def stream_complete(
        self, prompt: str, **kwargs: Any
    ) -> CompletionResponseGen:
        response = ""
        for token in self.dummy_response:
            response += token
            yield CompletionResponse(text=response, delta=token)


# define our LLM
Settings.llm = OurLLM()

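# define embed model (the "local:" prefix loads a HuggingFace model;
# this requires the llama-index-embeddings-huggingface package)
Settings.embed_model = "local:BAAI/bge-base-en-v1.5"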


# Load your data
documents = SimpleDirectoryReader("./data").load_data()
index = SummaryIndex.from_documents(documents)

# Query and print response
query_engine = index.as_query_engine()
response = query_engine.query("<query_text>")
print(response)
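
In the template above, stream_complete yields one response per step, where text holds everything generated so far and delta holds only the newest piece. The plain-Python sketch below reproduces that contract with a hypothetical Chunk dataclass standing in for CompletionResponse, and shows that a consumer can rebuild the output either from the final text or by concatenating the deltas. The template streams one character at a time; a real model would usually stream multi-character tokens, as assumed here.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Chunk:
    """Hypothetical stand-in for CompletionResponse's text/delta fields."""
    text: str   # cumulative output so far
    delta: str  # newly generated piece in this step


def stream_pieces(pieces: List[str]) -> Iterator[Chunk]:
    """Yield cumulative text plus the latest delta, like the template's stream_complete."""
    response = ""
    for piece in pieces:
        response += piece
        yield Chunk(text=response, delta=piece)


chunks = list(stream_pieces(["My", " re", "sponse"]))

# Printing only the deltas reproduces the output incrementally: My response
for chunk in chunks:
    print(chunk.delta, end="")
print()

# The last chunk's text and the concatenated deltas agree.
assert chunks[-1].text == "".join(c.delta for c in chunks) == "My response"
```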

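With this method, you can use any LLM, whether it runs locally or on your own server. As long as the class is implemented and the generated tokens are returned, it should work. Note that you need to use the prompt helper to customize the prompt sizes, since every model has a slightly different context length.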
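
Prompt sizes matter because the context window has to hold the prompt template, the retrieved text, and the tokens reserved for the answer. The simplified arithmetic below uses the template's context_window=3900 and num_output=256; the 100-token template overhead and the helper name available_context_tokens are assumptions for illustration. LlamaIndex's actual prompt helper also accounts for details such as chunk overlap, so its numbers will differ.

```python
def available_context_tokens(
    context_window: int, num_output: int, template_tokens: int
) -> int:
    """Tokens left for retrieved text after reserving the template and the answer."""
    budget = context_window - num_output - template_tokens
    if budget <= 0:
        raise ValueError("template and output reservation exceed the context window")
    return budget


# Values from the OurLLM template; the template overhead is an assumed figure.
print(available_context_tokens(3900, 256, 100))  # 3544

# A model with a smaller window leaves much less room for retrieved context.
print(available_context_tokens(2048, 256, 100))  # 1692
```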

The decorator is optional, but it provides observability via callbacks on the LLM calls.
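
To illustrate what such a decorator provides, here is a minimal sketch of the general idea: a wrapper that records a start event and an end event (with the prompt, output, and duration) around each call. trace_llm_call and the in-memory EVENTS list are hypothetical; this is not how llm_completion_callback is implemented, which routes events through LlamaIndex's own callback machinery instead.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory event log; a real system would forward these to a tracing backend.
EVENTS: List[Dict[str, Any]] = []


def trace_llm_call(fn: Callable[..., str]) -> Callable[..., str]:
    """Hypothetical decorator that logs start/end events around an LLM call."""

    @functools.wraps(fn)
    def wrapper(prompt: str, **kwargs: Any) -> str:
        EVENTS.append({"event": "start", "prompt": prompt})
        started = time.perf_counter()
        output = fn(prompt, **kwargs)
        EVENTS.append(
            {
                "event": "end",
                "output": output,
                "seconds": time.perf_counter() - started,
            }
        )
        return output

    return wrapper


@trace_llm_call
def complete(prompt: str, **kwargs: Any) -> str:
    # Dummy model, mirroring the template's fixed response.
    return "My response"


complete("Hello")
print([e["event"] for e in EVENTS])  # ['start', 'end']
```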

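Note that you may have to adjust the internal prompts to get good performance. Even then, you should use a sufficiently large LLM to ensure it can handle the complex queries that LlamaIndex uses internally, so your mileage may vary.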

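A list of all default internal prompts is available here, and chat-specific prompts are listed here. You can also implement your own custom prompts.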
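
A custom prompt is ultimately a template whose placeholders are filled at query time. The sketch below uses plain str.format with the placeholder names {context_str} and {query_str}, which match the variables in LlamaIndex's default question-answering prompt; the template wording itself is a made-up example. In LlamaIndex you would typically wrap such a string in a PromptTemplate and pass it to the query engine (for example as text_qa_template) rather than formatting it yourself.

```python
# A made-up QA template; the placeholder names mirror LlamaIndex's default QA prompt.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using only the context above, answer the question concisely.\n"
    "Question: {query_str}\n"
    "Answer: "
)


def build_prompt(context_str: str, query_str: str) -> str:
    """Fill the template's placeholders with retrieved text and the user's query."""
    return QA_TEMPLATE.format(context_str=context_str, query_str=query_str)


prompt = build_prompt(
    context_str="Paris is the capital of France.",
    query_str="What is the capital of France?",
)
print(prompt)
```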