Tair Vector Store

In this notebook we are going to show a quick demo of using the TairVectorStore.

If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-vector-stores-tair
!pip install llama-index
import os
import sys
import logging
import textwrap
import warnings

warnings.filterwarnings("ignore")

# stop huggingface warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Uncomment to see debug logs
# logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import (
    GPTVectorStoreIndex,
    SimpleDirectoryReader,
    Document,
)
from llama_index.vector_stores.tair import TairVectorStore
from IPython.display import Markdown, display
Setting up OpenAI

Let's first add the OpenAI API key. It will give us access to OpenAI for embeddings and for querying with ChatGPT.
import os

os.environ["OPENAI_API_KEY"] = "sk-<your key here>"
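Hardcoding the key is fine for a quick demo, but reading it from the environment avoids leaving secrets in the notebook. A minimal sketch; the get_openai_key helper is our own, not part of LlamaIndex:

```python
import os


def get_openai_key(environ=os.environ):
    """Return the OpenAI API key from the environment, failing fast if unset."""
    key = environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running this notebook")
    return key
```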
Download Data
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Read in a dataset
# load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print(
    "Document ID:",
    documents[0].doc_id,
    "Document Hash:",
    documents[0].doc_hash,
)
Build index from documents

Let's build a vector index with GPTVectorStoreIndex, using TairVectorStore as its backend. Replace tair_url with the actual url of your Tair instance.
from llama_index.core import StorageContext

tair_url = "redis://{username}:{password}@r-bp****************.redis.rds.aliyuncs.com:{port}"
vector_store = TairVectorStore(
    tair_url=tair_url, index_name="pg_essays", overwrite=True
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
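The tair_url above follows the standard redis:// URI scheme. If your username or password contains reserved characters such as @ or :, percent-encode them before interpolating; a sketch with purely hypothetical placeholder credentials:

```python
from urllib.parse import quote

# Placeholder values for illustration only; substitute your instance details.
username = "my_user"
password = "p@ss:word"  # contains characters that must be percent-encoded
host = "r-bp1234567890abcd.redis.rds.aliyuncs.com"
port = 6379

# Encode the credentials so the URL parses unambiguously.
tair_url = f"redis://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"
print(tair_url)
# -> redis://my_user:p%40ss%3Aword@r-bp1234567890abcd.redis.rds.aliyuncs.com:6379
```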
Query the data

Now that the documents are indexed, we can use the index as a knowledge base and ask it questions.
query_engine = index.as_query_engine()
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))
response = query_engine.query("What was a hard moment for the author?")
print(textwrap.fill(str(response), 100))
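Both query cells use textwrap.fill to wrap the single-line response string to 100 columns for readability. A self-contained illustration of that pattern:

```python
import textwrap

# A long one-line string standing in for a query response.
sample = " ".join(["word"] * 40)  # 199 characters
wrapped = textwrap.fill(sample, 100)

# fill() inserts line breaks so no output line exceeds the given width.
assert all(len(line) <= 100 for line in wrapped.splitlines())
print(wrapped)
```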
Deleting documents

To delete a document from the index, use the delete method.
document_id = documents[0].doc_id
document_id
info = vector_store.client.tvs_get_index("pg_essays")
print("Number of documents", int(info["data_count"]))
vector_store.delete(document_id)
info = vector_store.client.tvs_get_index("pg_essays")
print("Number of documents", int(info["data_count"]))
Deleting the index

To delete the entire index, use the delete_index method.
vector_store.delete_index()
print("Check index existence:", vector_store.client._index_exists())