IBM Db2 Vector Store and Vector Search¶
The LlamaIndex integration for Db2 (llama-index-vector-stores-db2) provides vector store and vector search capabilities for the IBM relational database Db2, version v12.1.2 and above, and is distributed under the MIT license. You can use the provided implementation as-is or customize it for your specific needs. Main features:
- Vector storage with metadata
- Vector similarity search with filtering options
- Support for the distance metrics EUCLIDEAN_DISTANCE, DOT_PRODUCT, COSINE, MANHATTAN_DISTANCE, HAMMING_DISTANCE, and EUCLIDEAN_SQUARED (see the sketch after this list)
- Performance optimization through index creation and approximate-nearest-neighbor search (coming soon)
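As a quick preview of how a distance metric is selected (the end-to-end flow, including the connection and text_nodes objects used here, is built step by step in the cells below), the strategy is passed as a DistanceStrategy member when the store is created; the table name below is just an example:

from llama_index.vector_stores.db2 import DB2LlamaVS, DistanceStrategy

# `connection` and `text_nodes` are created later in this notebook;
# "Documents_COSINE" is only an example table name.
vector_store = DB2LlamaVS.from_documents(
    text_nodes,
    table_name="Documents_COSINE",
    client=connection,
    distance_strategy=DistanceStrategy.COSINE,
    embed_dim=2,
)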
Prerequisites for using LlamaIndex with Db2 Vector Store and Search¶
Install the llama-index-vector-stores-db2 package, which is the integration package for the Db2 LlamaIndex vector store and search.
In [ ]:
# pip install llama-index-vector-stores-db2
Connect to the Db2 Vector Store¶
The following sample code shows how to connect to a Db2 database. In addition to the dependencies above, you will need a running Db2 database instance, version v12.1.2 or higher, which supports the vector data type.
In [ ]:
import ibm_db
import ibm_db_dbi

# Fill in the credentials for your Db2 v12.1.2+ instance
database = ""
username = ""
password = ""

try:
    connection = ibm_db_dbi.connect(database, username, password)
    print("Connection successful!")
except Exception as e:
    print("Connection failed!", e)
Import the required dependencies¶
In [ ]:
from llama_index.core.schema import NodeRelationship, RelatedNodeInfo, TextNode
from llama_index.core.vector_stores.types import (
    ExactMatchFilter,
    MetadataFilters,
    VectorStoreQuery,
)
from llama_index.vector_stores.db2 import base as db2llamavs
from llama_index.vector_stores.db2 import DB2LlamaVS, DistanceStrategy
Load documents¶
In [ ]:
# Define a list of documents (these dummy examples are 4 random documents)
text_json_list = [
    {
        "text": "Db2 handles LOB data differently than other kinds of data. As a result, you sometimes need to take additional actions when you define LOB columns and insert the LOB data.",
        "id_": "doc_1_2_P4",
        "embedding": [1.0, 0.0],
        "relationships": "test-0",
        "metadata": {
            "weight": 1.0,
            "rank": "a",
            "url": "https://www.ibm.com/docs/en/db2-for-zos/12?topic=programs-storing-lob-data-in-tables",
        },
    },
    {
        "text": "Introduced in Db2 13, SQL Data Insights brought artificial intelligence (AI) functionality to the Db2 for z/OS engine. It provided the capability to run SQL AI query to find valuable insights hidden in your Db2 data and help you make better business decisions.",
        "id_": "doc_15.5.1_P1",
        "embedding": [0.0, 1.0],
        "relationships": "test-1",
        "metadata": {
            "weight": 2.0,
            "rank": "c",
            "url": "https://community.ibm.com/community/user/datamanagement/blogs/neena-cherian/2023/03/07/accelerating-db2-ai-queries-with-the-new-vector-pr",
        },
    },
    {
        "text": "Data structures are elements that are required to use DB2®. You can access and use these elements to organize your data. Examples of data structures include tables, table spaces, indexes, index spaces, keys, views, and databases.",
        "id_": "id_22.3.4.3.1_P2",
        "embedding": [1.0, 1.0],
        "relationships": "test-2",
        "metadata": {
            "weight": 3.0,
            "rank": "d",
            "url": "https://www.ibm.com/docs/en/zos-basic-skills?topic=concepts-db2-data-structures",
        },
    },
    {
        "text": "DB2® maintains a set of tables that contain information about the data that DB2 controls. These tables are collectively known as the catalog. The catalog tables contain information about DB2 objects such as tables, views, and indexes. When you create, alter, or drop an object, DB2 inserts, updates, or deletes rows of the catalog that describe the object.",
        "id_": "id_3.4.3.1_P3",
        "embedding": [2.0, 1.0],
        "relationships": "test-3",
        "metadata": {
            "weight": 4.0,
            "rank": "e",
            "url": "https://www.ibm.com/docs/en/zos-basic-skills?topic=objects-db2-catalog",
        },
    },
]
In [ ]:
# Create LlamaIndex TextNodes
text_nodes = []
for text_json in text_json_list:
    # Construct the relationships using RelatedNodeInfo
    relationships = {
        NodeRelationship.SOURCE: RelatedNodeInfo(node_id=text_json["relationships"])
    }

    # Prepare the metadata dictionary; you might want to exclude certain metadata fields if necessary
    metadata = {
        "weight": text_json["metadata"]["weight"],
        "rank": text_json["metadata"]["rank"],
    }

    # Create a TextNode instance
    text_node = TextNode(
        text=text_json["text"],
        id_=text_json["id_"],
        embedding=text_json["embedding"],
        relationships=relationships,
        metadata=metadata,
    )
    text_nodes.append(text_node)

print(text_nodes)
Create a bunch of vector stores with different distance strategies using AI Vector Search¶
First we will create three vector stores, each with a different distance function.
If you connect to the Db2 database manually, you will see three tables: Documents_DOT (dot product), Documents_COSINE (cosine), and Documents_EUCLIDEAN (Euclidean distance).
In [ ]:
# Ingest documents into the Db2 Vector Store using different distance strategies
vector_store_dot = DB2LlamaVS.from_documents(
    text_nodes,
    table_name="Documents_DOT",
    client=connection,
    distance_strategy=DistanceStrategy.DOT_PRODUCT,
    embed_dim=2,
)

vector_store_max = DB2LlamaVS.from_documents(
    text_nodes,
    table_name="Documents_COSINE",
    client=connection,
    distance_strategy=DistanceStrategy.COSINE,
    embed_dim=2,
)

vector_store_euclidean = DB2LlamaVS.from_documents(
    text_nodes,
    table_name="Documents_EUCLIDEAN",
    client=connection,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
    embed_dim=2,
)
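To confirm that the three tables were created, you can query the Db2 catalog over the same connection. A minimal sketch, assuming the tables live in the connected user's default schema and that Db2 folded their names to upper case:

# List the demo tables via the Db2 catalog view SYSCAT.TABLES.
cursor = connection.cursor()
cursor.execute(
    "SELECT TABNAME FROM SYSCAT.TABLES WHERE TABNAME LIKE 'DOCUMENTS%'"
)
for (table_name,) in cursor.fetchall():
    print(table_name)
cursor.close()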
Demonstrate adding and deleting texts, and basic similarity search¶
In [ ]:
def manage_texts(vector_stores):
    for i, vs in enumerate(vector_stores, start=1):
        # Adding texts
        try:
            vs.add_texts(text_nodes, metadata)
            print(f"\n\n\nAdd texts complete for vector store {i}\n\n\n")
        except Exception as ex:
            print(
                f"\n\n\nExpected error on duplicate add for vector store {i}\n\n\n"
            )

        # Deleting texts using the value of 'doc_id'
        vs.delete("test-1")
        print(f"\n\n\nDelete texts complete for vector store {i}\n\n\n")

        # Similarity search
        query = VectorStoreQuery(
            query_embedding=[1.0, 1.0], similarity_top_k=3
        )
        results = vs.query(query=query)
        print(
            f"\n\n\nSimilarity search results for vector store {i}: {results}\n\n\n"
        )


vector_store_list = [
    vector_store_dot,
    vector_store_max,
    vector_store_euclidean,
]
manage_texts(vector_store_list)
Now we will conduct a bunch of advanced searches on all three vector stores.¶
In [ ]:
def conduct_advanced_searches(vector_stores):
    for i, vs in enumerate(vector_stores, start=1):
        def query_without_filters_returns_all_rows_sorted_by_similarity():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search without a filter
            print("\nSimilarity search results without filter:")
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0], similarity_top_k=3
            )
            print(vs.query(query=query))

        query_without_filters_returns_all_rows_sorted_by_similarity()

        def query_with_filters_returns_multiple_matches():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search with a filter
            print("\nSimilarity search results with filter:")
            filters = MetadataFilters(
                filters=[ExactMatchFilter(key="rank", value="c")]
            )
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0], filters=filters, similarity_top_k=3
            )
            result = vs.query(query)
            print(result.ids)

        query_with_filters_returns_multiple_matches()

        def query_with_filter_applies_top_k():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search with a filter
            print("\nSimilarity search results with top k filter:")
            filters = MetadataFilters(
                filters=[ExactMatchFilter(key="rank", value="c")]
            )
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0], filters=filters, similarity_top_k=1
            )
            result = vs.query(query)
            print(result.ids)

        query_with_filter_applies_top_k()

        def query_with_filter_applies_node_id_filter():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search with a filter
            print("\nSimilarity search results with node_id filter:")
            filters = MetadataFilters(
                filters=[ExactMatchFilter(key="rank", value="c")]
            )
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0],
                filters=filters,
                similarity_top_k=3,
                node_ids=["452D24AB-F185-414C-A352-590B4B9EE51B"],
            )
            result = vs.query(query)
            print(result.ids)

        query_with_filter_applies_node_id_filter()

        def query_with_exact_filters_returns_single_match():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search with a filter
            print("\nSimilarity search results with filter:")
            filters = MetadataFilters(
                filters=[
                    ExactMatchFilter(key="rank", value="c"),
                    ExactMatchFilter(key="weight", value=2),
                ]
            )
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0], filters=filters
            )
            result = vs.query(query)
            print(result.ids)

        query_with_exact_filters_returns_single_match()

        def query_with_contradictive_filter_returns_no_matches():
            filters = MetadataFilters(
                filters=[
                    ExactMatchFilter(key="weight", value=2),
                    ExactMatchFilter(key="weight", value=3),
                ]
            )
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0], filters=filters
            )
            result = vs.query(query)
            print(result.ids)

        query_with_contradictive_filter_returns_no_matches()

        def query_with_filter_on_unknown_field_returns_no_matches():
            print(f"\n--- Vector Store {i} Advanced Searches ---")
            # Similarity search with a filter
            print("\nSimilarity search results with filter:")
            filters = MetadataFilters(
                filters=[ExactMatchFilter(key="unknown_field", value="c")]
            )
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0], filters=filters
            )
            result = vs.query(query)
            print(result.ids)

        query_with_filter_on_unknown_field_returns_no_matches()

        def delete_removes_document_from_query_results():
            vs.delete("test-1")
            query = VectorStoreQuery(
                query_embedding=[1.0, 1.0], similarity_top_k=2
            )
            result = vs.query(query)
            print(result.ids)

        delete_removes_document_from_query_results()


conduct_advanced_searches(vector_store_list)
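When you are done experimenting, you may want to drop the demo tables. A minimal cleanup sketch, assuming the table names were folded to upper case and live in the connected user's default schema (check them against the catalog query earlier in this notebook before running it):

# Drop the demo tables created above; adjust the names or schema if yours differ.
cursor = connection.cursor()
for table in ("DOCUMENTS_DOT", "DOCUMENTS_COSINE", "DOCUMENTS_EUCLIDEAN"):
    try:
        cursor.execute(f'DROP TABLE "{table}"')
    except Exception as e:
        print(f"Could not drop {table}: {e}")
cursor.close()
connection.commit()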