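Google AlloyDB for PostgreSQL - `AlloyDBDocumentStore` and `AlloyDBIndexStore`

AlloyDB is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. AlloyDB is 100% compatible with PostgreSQL. Extend your database application to build AI-powered experiences using AlloyDB's LlamaIndex integrations.

This guide shows how to use AlloyDB for PostgreSQL to store documents and indexes with the AlloyDBDocumentStore and AlloyDBIndexStore classes.

Learn more about the package on GitHub.

Before you begin

To run this notebook, you will need to complete the following steps:

🦙 Library Installation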

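Install the integration library, llama-index-alloydb-pg, together with llama-index-llms-vertex (the Vertex AI LLM integration used later in this notebook) and llama-index.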

In [ ]:

%pip install --upgrade --quiet llama-index-alloydb-pg llama-index-llms-vertex llama-index
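
After installing, it can help to confirm that the packages are visible to the running kernel. The following is a minimal sketch that uses only the standard library; `installed_versions` is a hypothetical helper name and is not part of any of the installed packages.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the install command above.
REQUIRED = ["llama-index-alloydb-pg", "llama-index-llms-vertex", "llama-index"]

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not found (re-run the install or restart the kernel)'}")
```

A package reported as not found right after installation is often a sign that the kernel has not been restarted yet.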

Colab only: Uncomment the following cell to restart the kernel, or use the restart button. In Vertex AI Workbench, you can restart the terminal using the button at the top.

In [ ]:

# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

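🔐 Authentication

Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud project.

If you are using Colab to run this notebook, run the cell below and continue.
If you are using Vertex AI Workbench, check out the setup instructions here.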

In [ ]:

from google.colab import auth

auth.authenticate_user()
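
The `google.colab` module is only importable inside a Colab runtime, so the cell above raises an ImportError anywhere else. A minimal sketch of a guard, assuming you want the same notebook to run in both environments (the `IN_COLAB` flag is an illustrative name, not something the library defines):

```python
try:
    # Only importable inside a Colab runtime.
    from google.colab import auth

    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Interactive login as the user running the notebook.
    auth.authenticate_user()
else:
    print("Not running in Colab; skipping google.colab authentication.")
```

Outside Colab, credentials are expected to come from the environment, as described in the setup instructions mentioned above.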

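☁ Set Your Google Cloud Project

Set your Google Cloud project so that you can use Google Cloud resources within this notebook.

If you don't know your project ID, try the following:

Run gcloud config list.
Run gcloud projects list.
See the support page: Locate the project ID.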

In [ ]:

# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.

PROJECT_ID = "my-project-id"  # @param {type:"string"}

# Set the project id
!gcloud config set project {PROJECT_ID}
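
A common mistake at this step is pasting the project name or project number instead of the project ID. The sketch below is a rough local sanity check, assuming the usual ID format of 6 to 30 characters made of lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen. `looks_like_project_id` is a hypothetical helper; it would reject legacy domain-scoped IDs and cannot tell whether the project actually exists.

```python
import re

# Assumed format: 6-30 characters, lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def looks_like_project_id(value: str) -> bool:
    """Return True if the value matches the assumed project ID format."""
    return _PROJECT_ID_RE.fullmatch(value) is not None


for candidate in ["my-project-id", "My Project", "short"]:
    print(candidate, "->", looks_like_project_id(candidate))
```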

Basic Usage

Set AlloyDB database values

Find your database values on the AlloyDB Instances page.

In [ ]:

# @title Set Your Values Here { display-mode: "form" }
REGION = "us-central1"  # @param {type: "string"}
CLUSTER = "my-cluster"  # @param {type: "string"}
INSTANCE = "my-primary"  # @param {type: "string"}
DATABASE = "my-database"  # @param {type: "string"}
TABLE_NAME = "document_store"  # @param {type: "string"}
USER = "postgres"  # @param {type: "string"}
PASSWORD = "my-password"  # @param {type: "string"}

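AlloyDBEngine Connection Pool

One of the requirements and arguments for establishing AlloyDB as a document store is an AlloyDBEngine object. The AlloyDBEngine configures a connection pool to your AlloyDB database, enabling successful connections from your application and following industry best practices.

To create an AlloyDBEngine using AlloyDBEngine.from_instance(), you need to provide only 5 things:

project_id : Project ID of the Google Cloud project where the AlloyDB instance is located.
region : Region where the AlloyDB instance is located.
cluster : The name of the AlloyDB cluster.
instance : The name of the AlloyDB instance.
database : The name of the database to connect to on the AlloyDB instance.

By default, IAM database authentication is used as the method of database authentication. This library uses the IAM principal belonging to the Application Default Credentials (ADC) sourced from the environment.

Optionally, built-in database authentication using a username and password can also be used to access the AlloyDB database. Just provide the optional user and password arguments to AlloyDBEngine.from_instance():

user : Database user to use for built-in database authentication and login.
password : Database password to use for built-in database authentication and login.

Note: This tutorial demonstrates the async interface, which is why the cells call afrom_instance() and other a-prefixed methods. All async methods have corresponding sync methods.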
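
The choice between the two authentication modes comes down to whether user and password are passed. The sketch below collects the arguments in one place; `engine_kwargs` is a hypothetical helper, and its rule that user and password must be given together or not at all is an assumption of this sketch rather than documented library behavior.

```python
from typing import Optional


def engine_kwargs(
    project_id: str,
    region: str,
    cluster: str,
    instance: str,
    database: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> dict:
    """Build keyword arguments for AlloyDBEngine.from_instance() / afrom_instance()."""
    # Assumption of this sketch: a lone user or a lone password is a mistake.
    if (user is None) != (password is None):
        raise ValueError("Provide both user and password, or neither.")
    kwargs = {
        "project_id": project_id,
        "region": region,
        "cluster": cluster,
        "instance": instance,
        "database": database,
    }
    if user is not None:
        # Built-in database authentication.
        kwargs.update(user=user, password=password)
    return kwargs


base = ("my-project-id", "us-central1", "my-cluster", "my-primary", "my-database")
iam_kwargs = engine_kwargs(*base)  # IAM database authentication (default)
builtin_kwargs = engine_kwargs(*base, user="postgres", password="my-password")
print(sorted(iam_kwargs))
print(sorted(builtin_kwargs))
```

The resulting dictionary could then be unpacked into the engine factory, for example `AlloyDBEngine.from_instance(**iam_kwargs)`.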

In [ ]:

from llama_index_alloydb_pg import AlloyDBEngine

engine = await AlloyDBEngine.afrom_instance(
    project_id=PROJECT_ID,
    region=REGION,
    cluster=CLUSTER,
    instance=INSTANCE,
    database=DATABASE,
    user=USER,
    password=PASSWORD,
)
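
Top-level `await`, as used in the cell above, works in a notebook; in a regular Python script the coroutine has to be driven by an event loop. A minimal sketch of that pattern follows. To keep it runnable without a database, `fake_afrom_instance` is a hypothetical stand-in for `AlloyDBEngine.afrom_instance`.

```python
import asyncio


async def fake_afrom_instance(**kwargs):
    """Hypothetical stand-in for AlloyDBEngine.afrom_instance(); opens no connection."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"database": kwargs["database"]}


async def main():
    # In a notebook this line could be written at top level.
    engine = await fake_afrom_instance(database="my-database")
    return engine


if __name__ == "__main__":
    # asyncio.run() creates the event loop, runs main() to completion, and closes the loop.
    print(asyncio.run(main()))
```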

AlloyDBEngine for AlloyDB Omni

To create an AlloyDBEngine for AlloyDB Omni, you need a connection URL. AlloyDBEngine.from_connection_string first creates an async engine and then turns it into an AlloyDBEngine. Here is an example connection using the asyncpg driver:

In [ ]:

# Replace with your own AlloyDB Omni info
OMNI_USER = "my-omni-user"
OMNI_PASSWORD = ""
OMNI_HOST = "127.0.0.1"
OMNI_PORT = "5432"
OMNI_DATABASE = "my-omni-db"

connstring = f"postgresql+asyncpg://{OMNI_USER}:{OMNI_PASSWORD}@{OMNI_HOST}:{OMNI_PORT}/{OMNI_DATABASE}"
engine = AlloyDBEngine.from_connection_string(connstring)
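
The f-string above inserts the user name and password into the URL verbatim, which breaks if either contains characters such as `@`, `/`, or `:`. A minimal sketch that percent-encodes both parts first, using only the standard library (`build_connstring` is a hypothetical helper name):

```python
from urllib.parse import quote


def build_connstring(user, password, host, port, database):
    """Assemble a postgresql+asyncpg URL, percent-encoding user and password."""
    return (
        f"postgresql+asyncpg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


print(build_connstring("my-omni-user", "p@ss/w:rd", "127.0.0.1", "5432", "my-omni-db"))
# postgresql+asyncpg://my-omni-user:p%40ss%2Fw%3Ard@127.0.0.1:5432/my-omni-db
```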

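Initialize a table

The AlloyDBDocumentStore class requires a database table. The AlloyDBEngine has a helper method, init_doc_store_table(), that creates a table with the proper schema for you. The cell below calls its async variant, ainit_doc_store_table().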

In [ ]:

await engine.ainit_doc_store_table(
    table_name=TABLE_NAME,
)

Optional Tip: 💡

You can also specify a schema name by passing schema_name wherever you pass table_name.

In [ ]:

SCHEMA_NAME = "my_schema"

await engine.ainit_doc_store_table(
    table_name=TABLE_NAME,
    schema_name=SCHEMA_NAME,
)

Initialize a default AlloyDBDocumentStore

In [ ]:

from llama_index_alloydb_pg import AlloyDBDocumentStore

doc_store = await AlloyDBDocumentStore.create(
    engine=engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME
)

Download Data

In [ ]:

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load Documents

In [ ]:

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print("Document ID:", documents[0].doc_id)

Parse into Nodes

In [ ]:

from llama_index.core.node_parser import SentenceSplitter

nodes = SentenceSplitter().get_nodes_from_documents(documents)

Set up an IndexStore

In [ ]:

from llama_index_alloydb_pg import AlloyDBIndexStore


INDEX_TABLE_NAME = "index_store"
await engine.ainit_index_store_table(
    table_name=INDEX_TABLE_NAME,
)

index_store = await AlloyDBIndexStore.create(
    engine=engine,
    table_name=INDEX_TABLE_NAME,
    # schema_name=SCHEMA_NAME
)

Add to Docstore

In [ ]:

from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(
    docstore=doc_store, index_store=index_store
)

storage_context.docstore.add_documents(nodes)

Use with Indexes

The document store can be used with multiple indexes. Each index uses the same underlying nodes.
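
To make the idea of shared nodes concrete, here is a toy model in plain Python. It is only an illustration under simplifying assumptions: dictionaries and lists stand in for the document store and the indexes, and none of the names below belong to LlamaIndex.

```python
# Toy model only: a dict stands in for the document store.
docstore = {"n1": "first chunk", "n2": "second chunk", "n3": "third chunk"}

# Each toy "index" keeps node IDs arranged in its own structure, not copies of the nodes.
summary_ids = ["n1", "n2", "n3"]
keyword_table = {"first": ["n1"], "chunk": ["n1", "n2", "n3"]}


def fetch(node_ids):
    """Resolve node IDs to node text through the shared store."""
    return [docstore[node_id] for node_id in node_ids]


# A change made once in the store is visible through both indexes.
docstore["n1"] = "first chunk (edited)"
print(fetch(summary_ids)[0])
print(fetch(keyword_table["first"])[0])
```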

In [ ]:

from llama_index.core import Settings, SimpleKeywordTableIndex, SummaryIndex
from llama_index.llms.vertex import Vertex

Settings.llm = Vertex(model="gemini-1.5-flash", project=PROJECT_ID)
summary_index = SummaryIndex(nodes, storage_context=storage_context)
keyword_table_index = SimpleKeywordTableIndex(
    nodes, storage_context=storage_context
)

Query the index

In [ ]:

query_engine = summary_index.as_query_engine()
response = query_engine.query("What did the author do?")
print(response)

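Load existing indexes

Because both indexes were built on the same storage context, they can be loaded again from the document store and index store by their index IDs.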

In [ ]:

# note down index IDs
list_id = summary_index.index_id
keyword_id = keyword_table_index.index_id
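
The IDs noted above live only in notebook variables, so a later session would need them stored somewhere. One simple option, sketched here with the standard library, is a small JSON file. The helper names, the file name, and the placeholder ID values are all hypothetical; in the notebook the values would be `summary_index.index_id` and `keyword_table_index.index_id`.

```python
import json
import tempfile
from pathlib import Path


def save_index_ids(path, **index_ids):
    """Write the given index IDs to a JSON file."""
    Path(path).write_text(json.dumps(index_ids, indent=2), encoding="utf-8")


def load_index_ids(path):
    """Read index IDs back from the JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Placeholder values standing in for the real index IDs.
ids_file = Path(tempfile.gettempdir()) / "index_ids.json"
save_index_ids(ids_file, list_id="summary-index-id", keyword_id="keyword-index-id")

restored = load_index_ids(ids_file)
print(restored["list_id"], restored["keyword_id"])
```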

In [ ]:

from llama_index.core import load_index_from_storage

# re-create storage context
storage_context = StorageContext.from_defaults(
    docstore=doc_store, index_store=index_store
)

# load indices
summary_index = load_index_from_storage(
    storage_context=storage_context, index_id=list_id
)
keyword_table_index = load_index_from_storage(
    storage_context=storage_context, index_id=keyword_id
)
