Google Cloud SQL for PostgreSQL - `PostgresDocumentStore` and `PostgresIndexStore`¶

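Cloud SQL is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It supports the MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences using Cloud SQL's LlamaIndex integration.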

This notebook shows how to use Cloud SQL for PostgreSQL to store documents and indexes with the PostgresDocumentStore and PostgresIndexStore classes.

Learn more about the package on GitHub.

Before You Begin¶

To run this notebook, you will need to do the following:

🦙 Library Installation¶

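Install the integration library, llama-index-cloud-sql-pg, together with llama-index and llama-index-llms-vertex, the Vertex AI LLM integration used later in this notebook.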

%pip install --upgrade --quiet llama-index-cloud-sql-pg llama-index-llms-vertex llama-index

Colab only: Uncomment the following cell to restart the kernel, or use the restart-kernel button. On Vertex AI Workbench, you can restart the kernel using the button at the top.

# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

🔐 Authentication¶

Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud project.

If you are using Colab to run this notebook, run the cell below and continue.
If you are using Vertex AI Workbench, check out the setup instructions here.

from google.colab import auth

auth.authenticate_user()
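
The cell above only works inside Colab; elsewhere, from google.colab import auth raises ImportError. If you want one notebook that runs in both environments, a minimal guard could look like the sketch below. The helper name authenticate_if_colab is hypothetical; outside Colab (for example on Vertex AI Workbench) it skips the call and relies on the credentials already configured in that environment.

```python
def authenticate_if_colab() -> bool:
    """Run Colab's interactive authentication only when google.colab is importable.

    Returns True if authentication was attempted, False when not running in Colab.
    """
    try:
        from google.colab import auth  # only available inside Colab
    except ImportError:
        # Not in Colab (e.g. Vertex AI Workbench): rely on existing credentials.
        return False
    auth.authenticate_user()
    return True


print("Colab auth attempted:", authenticate_if_colab())
```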

☁ Set Your Google Cloud Project¶

Set your Google Cloud project so that you can use Google Cloud resources within this notebook.

If you don't know your project ID, try the following:

Run gcloud config list.
Run gcloud projects list.
See the support page: Locate the project ID.
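
A related command, gcloud config get-value project, prints only the configured project ID, so the lookup can also be scripted from Python, for example to pre-fill PROJECT_ID. This is a minimal sketch assuming the gcloud CLI is installed and on your PATH; the helper get_gcloud_project is hypothetical and returns None whenever gcloud is missing, fails, or has no project configured.

```python
import shutil
import subprocess
from typing import Optional


def get_gcloud_project(timeout: float = 30.0) -> Optional[str]:
    """Return the project ID from the active gcloud configuration, or None."""
    # Skip the call entirely if the gcloud CLI is not installed.
    if shutil.which("gcloud") is None:
        return None
    try:
        # `gcloud config get-value project` prints the configured project ID.
        result = subprocess.run(
            ["gcloud", "config", "get-value", "project"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    project = result.stdout.strip()
    # Treat a failed call, empty output, or (defensively) "(unset)" as no project.
    if result.returncode != 0 or project in ("", "(unset)"):
        return None
    return project


print("Configured project:", get_gcloud_project())
```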

# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.

PROJECT_ID = "my-project-id"  # @param {type:"string"}

# Set the project id
!gcloud config set project {PROJECT_ID}

Basic Usage¶

Set Cloud SQL Database Values¶

Find your database values on the Cloud SQL Instances page.
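
Hard-coding PASSWORD in a notebook makes it easy to leak. As an alternative, assuming you can set environment variables before the kernel starts, the values could be read from the environment instead. The variable names (CLOUD_SQL_REGION, CLOUD_SQL_PASSWORD, and so on) and the helper load_db_settings are hypothetical; non-secret values fall back to the placeholders used in this notebook.

```python
import os
from typing import Dict, Optional

# Placeholder defaults mirroring the form cell in this notebook.
DEFAULTS: Dict[str, str] = {
    "REGION": "us-central1",
    "INSTANCE": "my-primary",
    "DATABASE": "my-database",
    "TABLE_NAME": "document_store",
    "USER": "postgres",
}


def load_db_settings(prefix: str = "CLOUD_SQL_") -> Dict[str, Optional[str]]:
    """Read connection values from environment variables named prefix + key.

    Non-secret values fall back to DEFAULTS; PASSWORD has no default and is
    None when the variable is unset, so a missing secret is easy to detect.
    """
    settings: Dict[str, Optional[str]] = {
        key: os.environ.get(prefix + key, default) for key, default in DEFAULTS.items()
    }
    settings["PASSWORD"] = os.environ.get(prefix + "PASSWORD")
    return settings


settings = load_db_settings()
if settings["PASSWORD"] is None:
    print("CLOUD_SQL_PASSWORD is not set; built-in authentication will need it.")
```

The resulting settings["PASSWORD"] could then stand in for the literal PASSWORD value in the form cell.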

# @title Set Your Values Here { display-mode: "form" }
REGION = "us-central1"  # @param {type: "string"}
INSTANCE = "my-primary"  # @param {type: "string"}
DATABASE = "my-database"  # @param {type: "string"}
TABLE_NAME = "document_store"  # @param {type: "string"}
USER = "postgres"  # @param {type: "string"}
PASSWORD = "my-password"  # @param {type: "string"}

PostgresEngine Connection Pool¶

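One of the requirements and arguments for using Cloud SQL as a document store or index store is a PostgresEngine object. The PostgresEngine configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.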

To create a PostgresEngine using PostgresEngine.from_instance() (or its async counterpart, afrom_instance(), used below), you need to provide only 4 things:

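project_id: Project ID of the Google Cloud project where the Cloud SQL instance is located.
region: Region where the Cloud SQL instance is located.
instance: Name of the Cloud SQL instance.
database: Name of the database to connect to on the Cloud SQL instance.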
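
Cloud SQL identifies an instance by its instance connection name, which has the form PROJECT_ID:REGION:INSTANCE and is shown on the instance's Overview page in the console. The sketch below (a hypothetical helper, not part of llama-index-cloud-sql-pg) shows how three of the four values above combine into that name; the database is chosen separately at connection time. Whether PostgresEngine assembles the string exactly this way internally is an assumption.

```python
def instance_connection_name(project_id: str, region: str, instance: str) -> str:
    """Build a Cloud SQL instance connection name of the form PROJECT_ID:REGION:INSTANCE."""
    for label, value in (("project_id", project_id), ("region", region), ("instance", instance)):
        if not value:
            raise ValueError(f"{label} must be a non-empty string")
    return f"{project_id}:{region}:{instance}"


print(instance_connection_name("my-project-id", "us-central1", "my-primary"))
# my-project-id:us-central1:my-primary
```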

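By default, IAM database authentication is used as the method of database authentication. This library uses the IAM principal belonging to the Application Default Credentials (ADC) sourced from the environment.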
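
With IAM database authentication on Cloud SQL for PostgreSQL, the database username is derived from the IAM principal: for a user account it is the full email address, and for a service account it is the email with the .gserviceaccount.com suffix removed. The library works this out from ADC for you; the hypothetical helper below only makes the mapping explicit, which can help when you need to grant table privileges to that database user. The example emails are made up.

```python
SERVICE_ACCOUNT_SUFFIX = ".gserviceaccount.com"


def iam_db_username(principal_email: str) -> str:
    """Map an IAM principal's email to its Cloud SQL for PostgreSQL database username."""
    email = principal_email.strip()
    if email.endswith(SERVICE_ACCOUNT_SUFFIX):
        # Service accounts drop the domain suffix: sa@proj.iam.gserviceaccount.com -> sa@proj.iam
        return email[: -len(SERVICE_ACCOUNT_SUFFIX)]
    # Regular IAM users keep their full email address.
    return email


print(iam_db_username("alice@example.com"))
print(iam_db_username("llama-sa@my-project-id.iam.gserviceaccount.com"))
```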

For more information on IAM database authentication, see the Cloud SQL documentation.

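Optionally, you can use built-in database authentication and access the Cloud SQL database with a username and password. Just provide the optional user and password arguments to PostgresEngine.from_instance():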

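user: Database user to use for built-in database authentication and login.
password: Database password to use for built-in database authentication and login.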
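
The two authentication modes differ only in whether user and password are passed. A minimal sketch of a hypothetical helper that builds those optional keyword arguments; treating a lone user or a lone password as an error is an assumption of this sketch, chosen so that a half-filled form fails loudly instead of silently falling back to IAM authentication.

```python
from typing import Dict, Optional


def engine_auth_kwargs(user: Optional[str] = None, password: Optional[str] = None) -> Dict[str, str]:
    """Return the optional auth arguments for PostgresEngine.from_instance().

    An empty dict keeps the default (IAM database authentication); a dict with
    both keys selects built-in username/password authentication.
    """
    if user and password:
        return {"user": user, "password": password}
    if user or password:
        # Assumption of this sketch: one without the other is a configuration mistake.
        raise ValueError("provide both user and password, or neither")
    return {}


print(engine_auth_kwargs())                       # {}
print(engine_auth_kwargs("postgres", "my-password"))  # {'user': 'postgres', 'password': 'my-password'}
```

For example, passing **engine_auth_kwargs(USER, PASSWORD) to PostgresEngine.afrom_instance() alongside project_id, region, instance, and database would select built-in authentication, while **engine_auth_kwargs() leaves the default IAM path.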

Note: This tutorial demonstrates the async interface. All async methods have corresponding sync methods.
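
Top-level await, as used in the cells below, works in Jupyter and Colab because the notebook already runs an event loop. A plain Python script has no running loop, so you would either call the sync counterparts (for example PostgresEngine.from_instance()) or wrap the awaits in a coroutine and start it with asyncio.run(). The sketch below shows the second pattern with a hypothetical stand-in coroutine, fake_afrom_instance, so it runs without a Cloud SQL instance. Note that asyncio.run() raises RuntimeError if called from inside an already-running loop, such as a notebook cell.

```python
import asyncio


async def fake_afrom_instance(project_id: str, region: str, instance: str, database: str) -> str:
    """Hypothetical stand-in for an async factory such as PostgresEngine.afrom_instance()."""
    await asyncio.sleep(0)  # yield control to the event loop, as real network I/O would
    return f"engine for {project_id}:{region}:{instance}/{database}"


async def main() -> str:
    # In a script, every `await` has to live inside a coroutine like this one.
    return await fake_afrom_instance("my-project-id", "us-central1", "my-primary", "my-database")


if __name__ == "__main__":
    # asyncio.run() creates an event loop, runs main() to completion, then closes the loop.
    print(asyncio.run(main()))
```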

from llama_index_cloud_sql_pg import PostgresEngine

engine = await PostgresEngine.afrom_instance(
    project_id=PROJECT_ID,
    region=REGION,
    instance=INSTANCE,
    database=DATABASE,
    user=USER,
    password=PASSWORD,
)

Initialize a Table¶

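The PostgresDocumentStore class requires a database table. The PostgresEngine has a helper method, init_doc_store_table() (async: ainit_doc_store_table(), used below), that creates a table with the proper schema for you.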

await engine.ainit_doc_store_table(
    table_name=TABLE_NAME,
)

Optional Tip: 💡¶

You can also specify a schema name by passing schema_name wherever you pass table_name.
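
In PostgreSQL, a table outside the default search path is addressed as schema.table, and identifiers can be double-quoted (a literal double quote inside an identifier is written twice). The hypothetical helpers below only illustrate which fully qualified name a schema_name/table_name pair refers to; how llama-index-cloud-sql-pg quotes names internally is an assumption, and "public" is used as the default here simply because it is PostgreSQL's default schema. If my_schema does not exist yet, you may need to create it first; PostgreSQL supports CREATE SCHEMA IF NOT EXISTS my_schema;.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified_table_name(table_name: str, schema_name: str = "public") -> str:
    """Return a quoted, schema-qualified table name (hypothetical helper)."""
    return f"{quote_ident(schema_name)}.{quote_ident(table_name)}"


print(qualified_table_name("document_store", "my_schema"))  # "my_schema"."document_store"
print(qualified_table_name("document_store"))  # "public"."document_store"
```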

SCHEMA_NAME = "my_schema"

await engine.ainit_doc_store_table(
    table_name=TABLE_NAME,
    schema_name=SCHEMA_NAME,
)

Initialize a Default PostgresDocumentStore¶

from llama_index_cloud_sql_pg import PostgresDocumentStore

doc_store = await PostgresDocumentStore.create(
    engine=engine,
    table_name=TABLE_NAME,
    # schema_name=SCHEMA_NAME
)

Download Data¶

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load Documents¶

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print("Document ID:", documents[0].doc_id)

Parse into Nodes¶

from llama_index.core.node_parser import SentenceSplitter

nodes = SentenceSplitter().get_nodes_from_documents(documents)

Set Up an Index Store¶

from llama_index_cloud_sql_pg import PostgresIndexStore


INDEX_TABLE_NAME = "index_store"
await engine.ainit_index_store_table(
    table_name=INDEX_TABLE_NAME,
)

index_store = await PostgresIndexStore.create(
    engine=engine,
    table_name=INDEX_TABLE_NAME,
    # schema_name=SCHEMA_NAME
)

Add to the Document Store¶

from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(
    docstore=doc_store, index_store=index_store
)

storage_context.docstore.add_documents(nodes)

Use with Indexes¶

The document store can be used with multiple indexes. All indexes share the same underlying nodes.
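
A toy, in-memory model of that sharing, deliberately simplified (these are not LlamaIndex's classes): each index keeps only node IDs and resolves them through one docstore, so each node's text is stored once no matter how many indexes reference it. In this notebook, PostgresDocumentStore plays the docstore role, and PostgresIndexStore persists each index's own metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    node_id: str
    text: str


@dataclass
class ToyIndex:
    """A toy index that stores only node IDs; node contents live in the docstore."""

    index_id: str
    node_ids: List[str] = field(default_factory=list)

    def resolve(self, docstore: Dict[str, Node]) -> List[Node]:
        # Look each ID up in the shared docstore instead of keeping a private copy.
        return [docstore[node_id] for node_id in self.node_ids]


# One shared "docstore" mapping node_id -> Node.
docstore: Dict[str, Node] = {
    node.node_id: node
    for node in (Node("n1", "first chunk"), Node("n2", "second chunk"))
}

# Two indexes over the same nodes (the keyword one references only "n2").
summary = ToyIndex("summary", ["n1", "n2"])
keyword = ToyIndex("keyword", ["n2"])

# Both indexes resolve "n2" to the very same stored object.
print(summary.resolve(docstore)[1] is keyword.resolve(docstore)[0])  # True
```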

from llama_index.core import Settings, SimpleKeywordTableIndex, SummaryIndex
from llama_index.llms.vertex import Vertex

Settings.llm = Vertex(model="gemini-1.5-flash", project=PROJECT_ID)
summary_index = SummaryIndex(nodes, storage_context=storage_context)
keyword_table_index = SimpleKeywordTableIndex(
    nodes, storage_context=storage_context
)

Query the Index¶

query_engine = summary_index.as_query_engine()
response = query_engine.query("What did the author do?")
print(response)

Load Existing Indexes¶

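Because PostgresIndexStore persists each index's metadata in Cloud SQL, previously built indexes can be reloaded by their index IDs, and they continue to share the same underlying nodes in the document store.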

# note down index IDs
list_id = summary_index.index_id
keyword_id = keyword_table_index.index_id
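
The IDs recorded above exist only as Python variables, so they are lost if the kernel restarts, even though the indexes themselves remain in Cloud SQL. A minimal sketch, assuming a local JSON file is an acceptable place to keep them; the helpers save_index_ids and load_index_ids and the file name index_ids.json are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def save_index_ids(path: Path, **index_ids: str) -> None:
    """Write index IDs, passed as name=id keyword arguments, to a JSON file."""
    path.write_text(json.dumps(index_ids, indent=2), encoding="utf-8")


def load_index_ids(path: Path) -> Dict[str, str]:
    """Read back the name -> index ID mapping written by save_index_ids()."""
    return json.loads(path.read_text(encoding="utf-8"))
```

For example, save_index_ids(Path("index_ids.json"), summary=list_id, keyword=keyword_id) after building the indexes, and later load_index_ids(Path("index_ids.json"))["summary"] as the index_id argument to load_index_from_storage().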

from llama_index.core import load_index_from_storage

# re-create storage context
storage_context = StorageContext.from_defaults(
    docstore=doc_store, index_store=index_store
)

# load indices
summary_index = load_index_from_storage(
    storage_context=storage_context, index_id=list_id
)
keyword_table_index = load_index_from_storage(
    storage_context=storage_context, index_id=keyword_id
)
