OceanBase Vector Store¶
OceanBase Database is a distributed relational database developed entirely in-house by Ant Group. Built on clusters of commodity servers, it provides high availability and linear scalability based on the Paxos protocol and its distributed architecture. OceanBase Database does not depend on any specific hardware architecture.
This guide walks through how to use the OceanBase vector store with LlamaIndex.
Installation¶
In [ ]:
%pip install llama-index-vector-stores-oceanbase
%pip install llama-index
# Use DashScope for the embedding and LLM models; you can also test with the default OpenAI or other models
%pip install llama-index-embeddings-dashscope
%pip install llama-index-llms-dashscope
Deploy a Standalone OceanBase Server with Docker¶
In [ ]:
# Start a standalone OceanBase server in slim mode (suitable for testing) on port 2881
!docker run --name=ob433 -e MODE=slim -p 2881:2881 -d oceanbase/oceanbase-ce:4.3.3.0-100000142024101215
Create ObVecClient¶
In [ ]:
from pyobvector import ObVecClient

# Connect with the default settings (a local server on port 2881)
client = ObVecClient()

# Allow vector indexes to use up to 30% of tenant memory
client.perform_raw_text_sql(
    "ALTER SYSTEM ob_vector_memory_limit_percentage = 30"
)
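By default, ObVecClient connects to a local OceanBase instance. If your server runs elsewhere, you can pass the connection parameters explicitly. A minimal sketch, assuming the Docker deployment above; the uri, user, password, and db_name values shown are the typical defaults for the test tenant and may need adjusting:
In [ ]:
from pyobvector import ObVecClient

# Illustrative explicit connection settings; adjust to your deployment
client = ObVecClient(
    uri="127.0.0.1:2881",
    user="root@test",
    password="",
    db_name="test",
)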
Configure the DashScope embedding model and LLM.
In [ ]:
# Set the embedding model
import os
from llama_index.core import Settings
from llama_index.embeddings.dashscope import DashScopeEmbedding

# Global settings
Settings.embed_model = DashScopeEmbedding()

# Configure the LLM
from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels

dashscope_llm = DashScope(
    model_name=DashScopeGenerationModels.QWEN_MAX,
    api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
)
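As a quick sanity check, you can embed a sample string and confirm that its dimension matches the dim value passed to OceanBaseVectorStore later (1536 for the default DashScope text embedding). A minimal sketch using the standard LlamaIndex embedding API:
In [ ]:
# Verify the embedding dimension before creating the vector store
sample_vector = Settings.embed_model.get_text_embedding("hello world")
print(len(sample_vector))  # expected to be 1536 for the default DashScope model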
Load Documents¶
In [ ]:
from llama_index.core import (
SimpleDirectoryReader,
load_index_from_storage,
VectorStoreIndex,
StorageContext,
)
from llama_index.vector_stores.oceanbase import OceanBaseVectorStore
Download and load the data.
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
In [ ]:
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
In [ ]:
oceanbase = OceanBaseVectorStore(
    client=client,
    dim=1536,  # must match the embedding model's output dimension
    drop_old=True,
    normalize=True,
)
storage_context = StorageContext.from_defaults(vector_store=oceanbase)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
Query the Index¶
In [ ]:
# Set logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine(llm=dashscope_llm)
res = query_engine.query("What did the author do growing up?")
res.response
Out[ ]:
'Growing up, the author worked on two main activities outside of school: writing and programming. They wrote short stories, which they admit were not particularly good, lacking plot but containing characters with strong emotions. They also started programming at a young age, initially on an IBM 1401 computer using an early version of Fortran, though they found it challenging due to the limitations of punch card input and their lack of data to process. Their programming journey truly took off when microcomputers became available, allowing them to write more interactive programs such as games, a rocket flight predictor, and a simple word processor.'
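Besides the query engine, you can also retrieve the raw nodes with similarity scores, without LLM synthesis. A minimal sketch using the standard LlamaIndex retriever API; the query string is illustrative:
In [ ]:
# Retrieve the top-scoring nodes directly
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("What did the author do growing up?")
for node_with_score in nodes:
    print(node_with_score.score, node_with_score.node.node_id)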
Metadata Filtering¶
The OceanBase vector store supports metadata filtering at query time with operators such as =, >, <, !=, >=, <=, in, not in, like, and IS NULL. The example below filters with !=; a sketch using in follows it.
In [ ]:
from llama_index.core.vector_stores import (
MetadataFilters,
MetadataFilter,
)
query_engine = index.as_query_engine(
llm=dashscope_llm,
filters=MetadataFilters(
filters=[
MetadataFilter(key="book", value="paul_graham", operator="!="),
]
),
similarity_top_k=10,
)
res = query_engine.query("What did the author learn?")
res.response
Out[ ]:
'Empty Response'
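The filter above excludes every node, so the query returns an empty response. Other operators work the same way; here is a minimal sketch with the in operator (the file_name key is populated by SimpleDirectoryReader, and the value list is illustrative):
In [ ]:
from llama_index.core.vector_stores import FilterOperator

query_engine = index.as_query_engine(
    llm=dashscope_llm,
    filters=MetadataFilters(
        filters=[
            MetadataFilter(
                key="file_name",
                value=["paul_graham_essay.txt"],
                operator=FilterOperator.IN,
            ),
        ]
    ),
)
res = query_engine.query("What did the author learn?")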
Delete Documents¶
In [ ]:
# Delete all nodes ingested from the first source document
oceanbase.delete(documents[0].doc_id)
query_engine = index.as_query_engine(llm=dashscope_llm)
res = query_engine.query("What did the author do growing up?")
res.response
Out[ ]:
'Empty Response'
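delete removes every node ingested from the given source document, which is why the query now returns an empty response. If you need the content back, you can re-insert it through the index. A minimal sketch reusing the documents list loaded above:
In [ ]:
# Re-insert the deleted document's nodes into the index
index.insert(documents[0])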