Google AlloyDB for PostgreSQL - `AlloyDBReader`¶

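AlloyDB is a fully managed cloud database service that offers high performance, seamless integration, and strong scalability while remaining 100% compatible with PostgreSQL. With AlloyDB's LlamaIndex integration, you can extend your database application to build AI-powered experiences.

This notebook shows how to use the AlloyDBReader class of AlloyDB for PostgreSQL to retrieve data as documents.

Learn more about the package on GitHub.

Before you begin¶

Before running this notebook, complete the following steps:

🦙 Library Installation¶

Install the integration library, llama-index-alloydb-pg.

Colab only: to restart the kernel, uncomment the following cell or use the restart-kernel button. On Vertex AI Workbench, you can restart the terminal using the button at the top.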

# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

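🔐 Authentication¶

Authenticate to Google Cloud as the IAM user logged into this notebook so that you can access your Google Cloud project.

If you are running this notebook in Colab, run the cell below and continue.
If you are using Vertex AI Workbench, see the setup instructions here.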

from google.colab import auth

auth.authenticate_user()

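☁ Set Your Google Cloud Project¶

Set your Google Cloud project so that you can use Google Cloud resources within this notebook.

If you don't know your project ID, try the following:

Run gcloud config list.
Run gcloud projects list.
See the support page: Locate the project ID.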

# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.

PROJECT_ID = "my-project-id"  # @param {type:"string"}

# Set the project id
!gcloud config set project {PROJECT_ID}

Basic Usage¶

Set AlloyDB database values¶

Find your database values on the AlloyDB Instances page.

# @title Set Your Values Here { display-mode: "form" }
REGION = "us-central1"  # @param {type: "string"}
CLUSTER = "my-cluster"  # @param {type: "string"}
INSTANCE = "my-primary"  # @param {type: "string"}
DATABASE = "my-database"  # @param {type: "string"}
TABLE_NAME = "document_store"  # @param {type: "string"}
USER = "postgres"  # @param {type: "string"}
PASSWORD = "my-password"  # @param {type: "string"}

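AlloyDBEngine Connection Pool¶

One of the required arguments for creating an AlloyDB reader is an AlloyDBEngine object. The AlloyDBEngine configures a connection pool to your AlloyDB database, which lets your application connect successfully and follow industry best practices.

To create an AlloyDBEngine with AlloyDBEngine.from_instance(), you only need to provide 5 things:

project_id: Project ID of the Google Cloud project where the AlloyDB instance is located.
region: Region where the AlloyDB instance is located.
cluster: The name of the AlloyDB cluster.
instance: The name of the AlloyDB instance.
database: The name of the database to connect to on the AlloyDB instance.

By default, IAM database authentication is used as the method of database authentication. This library uses the IAM principal belonging to the Application Default Credentials (ADC) sourced from the environment.

Optionally, built-in database authentication, which uses a username and password to access the AlloyDB database, can also be used. Provide the following optional arguments to AlloyDBEngine.from_instance():

user: Database user to use for built-in database authentication and login.
password: Database password to use for built-in database authentication and login.

Note: This tutorial demonstrates the async interface, which is why the cell below calls afrom_instance() rather than from_instance(). All async methods have corresponding sync methods.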

from llama_index_alloydb_pg import AlloyDBEngine

engine = await AlloyDBEngine.afrom_instance(
    project_id=PROJECT_ID,
    region=REGION,
    cluster=CLUSTER,
    instance=INSTANCE,
    database=DATABASE,
    user=USER,
    password=PASSWORD,
)
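
The cell above uses built-in authentication. As a minimal sketch of the default IAM path, the helper below assembles the keyword arguments for AlloyDBEngine.afrom_instance() and leaves out user and password unless both are supplied. The helper names engine_kwargs and create_engine are hypothetical and not part of the library. The import is deferred so the block can be loaded without the package installed.

```python
from typing import Optional


def engine_kwargs(
    project_id: str,
    region: str,
    cluster: str,
    instance: str,
    database: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> dict:
    """Collect arguments for AlloyDBEngine.afrom_instance() (hypothetical helper)."""
    kwargs = {
        "project_id": project_id,
        "region": region,
        "cluster": cluster,
        "instance": instance,
        "database": database,
    }
    # Built-in authentication needs both values; without them the
    # library falls back to IAM authentication with the ADC principal.
    if user is not None and password is not None:
        kwargs["user"] = user
        kwargs["password"] = password
    return kwargs


async def create_engine(**settings):
    """Create the engine; imported lazily so this sketch loads without the package."""
    from llama_index_alloydb_pg import AlloyDBEngine

    return await AlloyDBEngine.afrom_instance(**engine_kwargs(**settings))
```

In a notebook cell this could be used as `engine = await create_engine(project_id=PROJECT_ID, region=REGION, cluster=CLUSTER, instance=INSTANCE, database=DATABASE)`, assuming ADC is configured for an IAM principal that has been added as a database user.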

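Create AlloyDBReader¶

When creating an AlloyDBReader for fetching data from AlloyDB, you have two main options to specify the data you want to load:

Using the table_name argument: when you specify table_name, you tell the reader to fetch all the data from the given table.
Using the query argument: when you specify query, you provide a custom SQL query to fetch the data. This gives you full control over the SQL, including selecting specific columns, applying filters, sorting, and joining tables.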
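
The two options can be sketched as keyword arguments for AlloyDBReader.create(). The helper below is hypothetical, and it assumes that exactly one of table_name or query should be passed, so it performs that check itself instead of relying on the library. The products table and its columns are placeholder names.

```python
from typing import Optional


def reader_source(
    table_name: Optional[str] = None, query: Optional[str] = None
) -> dict:
    """Return keyword arguments naming exactly one data source (hypothetical helper)."""
    if (table_name is None) == (query is None):
        raise ValueError("Specify exactly one of table_name or query.")
    if table_name is not None:
        return {"table_name": table_name}
    return {"query": query}


# Whole table versus a filtered, ordered subset of it.
whole_table = reader_source(table_name="products")
filtered = reader_source(
    query="SELECT product_name, id FROM products WHERE id > 100 ORDER BY id;"
)
```

Either dictionary could then be unpacked into the constructor, for example `reader = await AlloyDBReader.create(engine, **filtered)`.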

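Load Documents using the `table_name` argument¶

Load Documents via default table¶

The reader returns a list of Documents from the table, using the first column as text and all other columns as metadata. The default table has the first column as text and the second column as metadata (JSON). Each row becomes a document.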

from llama_index_alloydb_pg import AlloyDBReader

# Creating a basic AlloyDBReader object
reader = await AlloyDBReader.create(
    engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
)
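
To make the default mapping concrete, here is a pure-Python illustration of "first column as text, remaining columns as metadata". It does not call the library: SimpleDoc is a stand-in for a LlamaIndex Document, rows_to_docs is a hypothetical helper, and the sample rows are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class SimpleDoc:
    """Stand-in for a LlamaIndex Document, used only in this illustration."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def rows_to_docs(columns: List[str], rows: List[Tuple[Any, ...]]) -> List[SimpleDoc]:
    """Turn each row into one document: first column is text, the rest is metadata."""
    text_column, *metadata_columns = columns
    docs = []
    for row in rows:
        record = dict(zip(columns, row))
        docs.append(
            SimpleDoc(
                text=str(record[text_column]),
                metadata={name: record[name] for name in metadata_columns},
            )
        )
    return docs


docs = rows_to_docs(
    ["product_name", "id", "price"],
    [("Laptop", 1, 999.0), ("Phone", 2, 599.0)],
)
print(docs[0])
```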

Load documents via custom table/metadata or custom page content columns¶

reader = await AlloyDBReader.create(
    engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
    content_columns=["product_name"],  # Optional
    metadata_columns=["id"],  # Optional
)

Load Documents using a SQL query¶

The query argument lets you specify a custom SQL query, which can include filters to load specific documents from the database.

table_name = "products"
content_columns = ["product_name", "description"]
metadata_columns = ["id", "content"]

reader = await AlloyDBReader.create(
    engine=engine,
    query=f"SELECT * FROM {table_name};",
    content_columns=content_columns,
    metadata_columns=metadata_columns,
)

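Note: If content_columns and metadata_columns are not specified, the reader automatically treats the first returned column as the document's text and all subsequent columns as metadata.

Set page content format¶

The reader returns a list of Documents, one document per row, with the page content in the specified string format: text (space-separated concatenation), JSON, YAML, or CSV. The JSON and YAML formats include field headers, while the text and CSV formats do not.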

reader = await AlloyDBReader.create(
    engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME,
    content_columns=["product_name", "description"],
    format="YAML",
)
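
The difference between the four formats can be approximated in plain Python. The function below is an illustrative sketch, not the library's formatter: format_content is a hypothetical name, and the exact separators and quoting used by AlloyDBReader may differ. It only shows which formats carry field headers and which do not.

```python
import json
from typing import Any, Dict, Iterable


def format_content(
    row: Dict[str, Any], content_columns: Iterable[str], fmt: str = "text"
) -> str:
    """Render the content columns of one row in the requested format (sketch)."""
    columns = [name for name in content_columns if name in row]
    fmt = fmt.lower()
    if fmt == "text":
        # Values only, joined by spaces.
        return " ".join(str(row[name]) for name in columns)
    if fmt == "csv":
        # Values only, joined by commas.
        return ", ".join(str(row[name]) for name in columns)
    if fmt == "yaml":
        # One "header: value" pair per line.
        return "\n".join(f"{name}: {row[name]}" for name in columns)
    if fmt == "json":
        # Headers become the keys of a JSON object.
        return json.dumps({name: row[name] for name in columns})
    raise ValueError(f"Unsupported format: {fmt}")


row = {"product_name": "Laptop", "description": "13-inch ultrabook", "id": 1}
for fmt in ("text", "csv", "YAML", "JSON"):
    print(format_content(row, ["product_name", "description"], fmt))
```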

Load the documents¶

You can load the documents in one of two ways:

Load all the data at once
Lazy load the data

Load all data at once¶

docs = await reader.aload_data()

print(docs)

Lazy load the data¶

docs_iterable = reader.alazy_load_data()

docs = []
async for doc in docs_iterable:
    docs.append(doc)

print(docs)
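
Collecting every document into a list, as above, gives up most of the benefit of lazy loading. One possible pattern is to consume the async iterator in fixed-size batches so that only one batch is held in memory at a time. The sketch below is not part of the library: in_batches is a hypothetical helper, and fake_lazy_load stands in for reader.alazy_load_data() so the block runs without a database.

```python
import asyncio
from typing import AsyncIterator, List, TypeVar

T = TypeVar("T")


async def in_batches(items: AsyncIterator[T], size: int) -> AsyncIterator[List[T]]:
    """Group an async stream into lists of at most `size` items."""
    batch: List[T] = []
    async for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


async def fake_lazy_load(n: int) -> AsyncIterator[str]:
    """Stand-in for reader.alazy_load_data(); yields placeholder documents."""
    for i in range(n):
        yield f"doc-{i}"


async def collect_batch_sizes() -> List[int]:
    return [len(batch) async for batch in in_batches(fake_lazy_load(5), size=2)]


# asyncio.run() suits a plain script; in a notebook, await the coroutine instead.
batch_sizes = asyncio.run(collect_batch_sizes())
print(batch_sizes)
```

With a real reader, the loop would be `async for batch in in_batches(reader.alazy_load_data(), size=100)`, where each batch is processed and then discarded.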
