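MLflow Tracing and E2E Integration with LlamaIndex

Welcome to this interactive tutorial on the LlamaIndex integration with MLflow. It offers a hands-on way to learn the core features of LlamaIndex and MLflow.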

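Why use LlamaIndex with MLflow?

The integration of LlamaIndex with MLflow provides a seamless experience for developing and managing LlamaIndex applications:

MLflow Tracing is an observability tool for monitoring and debugging what happens inside your LlamaIndex models, helping you quickly identify potential bottlenecks or issues.
MLflow Experiment lets you track your indices, engines, and workflows within MLflow and manage the many moving parts of a LlamaIndex project, such as prompts, LLMs, tools, and global configurations.
MLflow Model packages your LlamaIndex application with all its dependency versions, input and output interfaces, and other essential metadata into a single unit.
MLflow Evaluate supports efficient performance assessment of your LlamaIndex application, giving you robust performance analytics and quick iteration.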

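What you will learn

By the end of this tutorial you will have:

Created an MVP VectorStoreIndex in LlamaIndex.
Run inference using that index as a query engine and inspected it with MLflow Tracing.
Logged the index to an MLflow Experiment.
Explored the MLflow UI to learn how MLflow Model packages your LlamaIndex application.

These basics will familiarize you with the LlamaIndex workflow in MLflow. To learn how the integration handles more advanced use cases, such as tool-calling agents, see the advanced tutorial.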

Setup

Install MLflow and LlamaIndex:

In [ ]:

# Quote the requirements so the shell does not treat ">=" as an output redirect
%pip install "mlflow>=2.18" "llama-index>=0.10.44" -q
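
After installing, it can be useful to confirm that the environment actually satisfies the minimum versions named in the install command. The following is a minimal sketch using only the standard library; the helper names parse_release and meets_minimum are hypothetical, and the comparison deliberately ignores pre-release suffixes such as rc1, so treat it as a rough check rather than a full version parser.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text: str) -> tuple:
    """Turn '2.18.0' into (2, 18, 0), ignoring any non-numeric suffix such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when the installed release is at least the minimum release."""
    have, want = parse_release(installed), parse_release(minimum)
    width = max(len(have), len(want))
    # Pad with zeros so that '2.18' and '2.18.0' compare as equal.
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want


# Minimum versions taken from the install command above.
MINIMUMS = {"mlflow": "2.18", "llama-index": "0.10.44"}

for package, minimum in MINIMUMS.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "ok" if meets_minimum(installed, minimum) else f"older than {minimum}"
    print(f"{package} {installed}: {status}")
```

If a package is reported as older than its minimum, rerun the install cell and restart the kernel before continuing.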

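If it is not already running, open a separate terminal and run mlflow ui --port 5000 to start the MLflow UI. If you are running this notebook in a cloud environment, see the How to Run Tutorials guide for the different ways to set up MLflow.

Create an MLflow experiment and connect the notebook to it: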

In [ ]:

import mlflow

# Point the client at the tracking server first so the experiment is created there
mlflow.set_tracking_uri("http://localhost:5000")  # Or your remote tracking server URI
mlflow.set_experiment("llama-index-tutorial")
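
If nothing is listening at the tracking URI, later logging calls fail with connection errors that can be hard to read. The sketch below is one way to check beforehand, using only the standard library. The function name is_server_reachable is hypothetical, and the check only confirms that some HTTP server answers at the address, not that it is an MLflow server.

```python
import http.client
import urllib.error
import urllib.request


def is_server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, malformed URL, or a non-HTTP reply.
        return False


tracking_uri = "http://localhost:5000"  # same URI as in the cell above
if is_server_reachable(tracking_uri):
    print(f"A server is answering at {tracking_uri}")
else:
    print(f"Nothing is answering at {tracking_uri}; is 'mlflow ui --port 5000' running?")
```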

Set your OpenAI API key as an environment variable. If you use a different LLM provider, set the corresponding environment variable instead.

In [ ]:

import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

Enable MLflow Tracing

A single line of code turns on MLflow Tracing for LlamaIndex.

In [ ]:

mlflow.llama_index.autolog()

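Create an index

Vector store indexes are one of the core components of LlamaIndex. They contain the embedding vectors of ingested document chunks (and sometimes the document chunks themselves). In LlamaIndex, these vectors can be used for inference through different types of engines.

Query engine: performs direct queries to retrieve information relevant to a user's question. Suited to fetching concise answers or documents that match a specific query, much like a search engine.
Chat engine: handles conversational AI tasks that need to keep context and history across multiple interactions. Suited to interactive applications that depend on conversational context, such as customer support bots or virtual assistants.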

In [ ]:

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.llms import ChatMessage

# Create an index with a single dummy document
llama_index_example_document = Document.example()
index = VectorStoreIndex.from_documents([llama_index_example_document])
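
To make the idea of a vector store index concrete, here is a toy sketch in pure Python. It is not how LlamaIndex is implemented: the word-count "embedding", the class ToyVectorIndex, and the sample chunks are all assumptions chosen for illustration, whereas a real index uses a learned embedding model. The sketch only shows the basic mechanism of storing one vector per chunk and returning the chunks closest to the query.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. Real indexes use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Stores (chunk, vector) pairs and returns the chunks closest to a query."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query: str, top_k: int = 1):
        query_vector = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vector, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:top_k]]


toy_index = ToyVectorIndex([
    "llamaindex connects language models to external data",
    "mlflow tracks experiments and packages models",
    "spark runs distributed jobs on large datasets",
])
print(toy_index.retrieve("how do I track experiments"))
```

A query engine would pass the retrieved chunks to an LLM to write the answer; the toy stops at retrieval.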

Query the index

Let's use this index to run inference through a query engine.

In [ ]:

query_response = index.as_query_engine().query("What is llama_index?")
print(query_response)

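In addition to the printed response, you should also see the MLflow Trace UI in the output cell. It gives a detailed yet intuitive visualization of the query engine's execution flow, helping you understand the internals and debug any issues that arise.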
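
As a rough mental model of what a trace contains, the sketch below records nested steps ("spans") with their parent and duration. It is a toy illustration, not MLflow's implementation: the span helper and the step names query, retrieve, synthesize, and llm are assumptions, and the real span names come from LlamaIndex.

```python
import time
from contextlib import contextmanager

spans = []   # every span, in the order it started
_stack = []  # names of the spans that are currently open


@contextmanager
def span(name: str):
    """Record one step of work, remembering which step it ran inside."""
    record = {"name": name, "parent": _stack[-1] if _stack else None, "seconds": None}
    spans.append(record)
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["seconds"] = time.perf_counter() - start
        _stack.pop()


# Step names are illustrative stand-ins for a query engine's stages.
with span("query"):
    with span("retrieve"):
        time.sleep(0.01)  # stand-in for the vector search
    with span("synthesize"):
        with span("llm"):
            time.sleep(0.01)  # stand-in for the LLM call

for record in spans:
    print(f"{record['name']:<11} parent={record['parent']} {record['seconds']:.3f}s")
```

Reading the parent column top to bottom reconstructs the tree that a trace viewer draws, and the durations show where the time went.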

Now let's run another query, this time with a chat engine, to see how the execution flow differs.

In [ ]:

chat_response = index.as_chat_engine().chat(
    "What is llama_index?",
    chat_history=[
        ChatMessage(role="system", content="You are an expert on RAG!")
    ],
)
print(chat_response)

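As the traces show, the main difference is that the query engine executes a static workflow (RAG), while the chat engine uses an agentic workflow that dynamically pulls the context it needs from the index.

You can also view the logged traces in the MLflow UI by opening the experiment you created earlier and selecting the Trace tab. If you prefer not to display traces in the output cell and only record them in MLflow, run mlflow.tracing.disable_notebook_display() in the notebook.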

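Save the index with MLflow

The code below logs a LlamaIndex model with MLflow, tracking its parameters and an example input, and registers it under a unique model_uri. This keeps model management consistent and reproducible across development, testing, and production, and simplifies deployment and sharing.

Key parameters:

engine_type: defines the pyfunc and spark_udf inference type
input_example: defines the input signature and infers the output signature via a prediction
registered_model_name: defines the name of the model in the MLflow model registry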

In [ ]:

with mlflow.start_run() as run:
    model_info = mlflow.llama_index.log_model(
        index,
        artifact_path="llama_index",
        engine_type="query",
        input_example="hi",
        registered_model_name="my_llama_index_vector_store",
    )
    model_uri = model_info.model_uri
    print(f"Model identifier for loading: {model_uri}")
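
Because registered_model_name also registers the logged model, it can be addressed through the model registry as well as through model_info.model_uri. MLflow's registry URIs follow the pattern models:/<name>/<version>. The helper below is a small sketch of building such a URI; the function name registry_model_uri is hypothetical, and version 1 is an assumption that holds only if this is the first time the name was registered.

```python
from typing import Union


def registry_model_uri(name: str, version: Union[int, str]) -> str:
    """Build a 'models:/<name>/<version>' URI for a registered model version."""
    if not name:
        raise ValueError("The registered model name must not be empty")
    return f"models:/{name}/{version}"


# The name matches registered_model_name in the cell above.
# Version 1 is assumed; check the Models page of the MLflow UI for the actual number.
uri = registry_model_uri("my_llama_index_vector_store", 1)
print(uri)
```

The resulting string can be passed to the same load functions that accept model_uri.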

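Load the index and run inference

The code below demonstrates three core types of inference that can be performed with the loaded model.

Load and run inference via LlamaIndex: this method loads the model with mlflow.llama_index.load_model and supports direct querying, chat, or retrieval. It is the right choice when you want the full capabilities of the underlying LlamaIndex object.
Load and run inference via MLflow PyFunc: this method loads the model with mlflow.pyfunc.load_model and makes predictions in the generic PyFunc format, using the engine type specified when the model was logged. It is useful for evaluating the model with mlflow.evaluate or deploying it for serving.
Load and run inference via MLflow Spark UDF: this method uses mlflow.pyfunc.spark_udf to load the model as a Spark UDF, enabling distributed inference across large datasets in a Spark DataFrame. It suits very large volumes of data and, like PyFunc inference, supports only the engine type defined when the model was logged.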

In [ ]:

print("\n------------- Inference via Llama Index   -------------")
index = mlflow.llama_index.load_model(model_uri)
query_response = index.as_query_engine().query("hi")
print(query_response)

print("\n------------- Inference via MLflow PyFunc -------------")
index = mlflow.pyfunc.load_model(model_uri)
query_response = index.predict("hi")
print(query_response)

In [ ]:

# Optional: Spark UDF inference
show_spark_udf_inference = False
if show_spark_udf_inference:
    print("\n------------- Inference via MLflow Spark UDF -------------")
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    udf = mlflow.pyfunc.spark_udf(spark, model_uri, result_type="string")
    df = spark.createDataFrame([("hi",), ("hello",)], ["text"])
    df.withColumn("response", udf("text")).toPandas()

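Explore the MLflow Experiment UI

Finally, let's look at what has been logged so far in the MLflow UI. You can open it in your browser at http://localhost:5000, or run the following cell to display it inside the notebook.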

In [ ]:

from IPython.display import IFrame

# Render the MLflow UI directly inside the notebook for easy browsing
IFrame(src="http://localhost:5000", width=1000, height=600)

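Navigate to the Experiments tab at the top left of the screen and click your most recent run, as shown in the image below.

The run page shows the overall metadata for your experiment. You can go further into the Artifacts tab to see the logged artifacts (models).

MLflow logs artifacts associated with your model and its environment during the run. Most of the logged files, such as conda.yaml, python_env.yaml, and requirements.txt, are standard MLflow output and ensure reproducibility across environments. However, two sets of artifacts are specific to LlamaIndex:

index: a directory that stores the serialized vector store. For details, see the LlamaIndex serialization documentation.
settings.json: the serialized llama_index.core.Settings service context. For details, see the LlamaIndex Settings documentation.

By storing these objects, MLflow can recreate the environment in which you logged your model.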

Important: MLflow does not serialize API keys. They must be present as environment variables in the environment where the model is loaded.
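
One way to catch a missing key early is a small pre-flight check before loading the model. The sketch below uses only the standard library; the helper name missing_env_vars is hypothetical, and the list of required variables is an assumption based on the OpenAI key used in this tutorial.

```python
import os


def missing_env_vars(required):
    """Return the names in `required` that are unset or empty in this process."""
    return [name for name in required if not os.environ.get(name)]


# OPENAI_API_KEY is the key used in this tutorial; extend the list for other providers.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Set these before loading the model:", ", ".join(missing))
else:
    print("All required API keys are present.")
```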

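Finally, you can see the full list of traces logged during the tutorial by navigating to the Tracing tab. Clicking a row opens the detailed trace view, similar to the one shown earlier in the output cells.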

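Customization and next steps

When working with production systems, users typically rely on a customized service context, which can be configured through LlamaIndex's Settings object.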
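
As a configuration sketch, the global Settings object can be assigned an LLM, an embedding model, and chunking parameters before an index is built. The model names and sizes below are illustrative assumptions rather than recommendations, and the OpenAI classes assume the OpenAI integrations that ship with the llama-index package are installed.

```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Model names and sizes below are illustrative assumptions; substitute your own.
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.chunk_size = 512
Settings.chunk_overlap = 50
```

Indexes created after these assignments pick up the new defaults, and the values in effect when the model is logged are what MLflow writes to settings.json.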