Guide: Using a Vector Store Index with an Existing Weaviate Vector Store
If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
%pip install llama-index-vector-stores-weaviate
%pip install llama-index-embeddings-openai
In [ ]:
!pip install llama-index
In [ ]:
import weaviate
In [ ]:
client = weaviate.Client("https://test-cluster-bbn8vqsn.weaviate.network")
Prepare a sample "existing" Weaviate vector store
Define schema
We create a schema for the "Book" class with four properties: title (text), author (text), content (text), and year (int).
In [ ]:
try:
    client.schema.delete_class("Book")
except Exception:
    # The class may not exist yet; ignore the error and continue.
    pass
In [ ]:
schema = {
    "classes": [
        {
            "class": "Book",
            "properties": [
                {"name": "title", "dataType": ["text"]},
                {"name": "author", "dataType": ["text"]},
                {"name": "content", "dataType": ["text"]},
                {"name": "year", "dataType": ["int"]},
            ],
        },
    ]
}

if not client.schema.contains(schema):
    client.schema.create(schema)
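Objects added to the class are expected to line up with the properties declared in the schema, so a quick local check can catch a misspelled key or a wrong type before anything is sent to Weaviate. The sketch below is only an illustration: `check_object` and `PYTHON_TYPES` are hypothetical helpers that are not part of the Weaviate client, the type mapping covers only the two data types used in this notebook, and the check is deliberately strict for the sake of the example.

```python
# Hypothetical helpers for illustration; not part of the Weaviate client.
PYTHON_TYPES = {"text": str, "int": int}

book_class = {
    "class": "Book",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "author", "dataType": ["text"]},
        {"name": "content", "dataType": ["text"]},
        {"name": "year", "dataType": ["int"]},
    ],
}


def check_object(obj, class_def):
    """Return a list of mismatches between obj and the class definition."""
    problems = []
    # Map each declared property name to its first declared data type.
    expected = {p["name"]: p["dataType"][0] for p in class_def["properties"]}
    for name, data_type in expected.items():
        if name not in obj:
            problems.append(f"missing property: {name}")
        elif not isinstance(obj[name], PYTHON_TYPES[data_type]):
            problems.append(f"{name} should be {data_type}")
    for name in obj:
        if name not in expected:
            problems.append(f"unknown property: {name}")
    return problems


good = {
    "title": "1984",
    "author": "George Orwell",
    "content": "1984 is a dystopian novel by George Orwell published in 1949...",
    "year": 1949,
}
# The content field is missing and year is a string instead of an integer.
bad = {"title": "1984", "author": "George Orwell", "year": "1949"}

print(check_object(good, book_class))
print(check_object(bad, book_class))
```

An empty list means the object matches the declared properties; each string in a non-empty list names one mismatch.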
Define sample data
We create four sample books.
In [ ]:
books = [
    {
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "content": (
            "To Kill a Mockingbird is a novel by Harper Lee published in"
            " 1960..."
        ),
        "year": 1960,
    },
    {
        "title": "1984",
        "author": "George Orwell",
        "content": (
            "1984 is a dystopian novel by George Orwell published in 1949..."
        ),
        "year": 1949,
    },
    {
        "title": "The Great Gatsby",
        "author": "F. Scott Fitzgerald",
        "content": (
            "The Great Gatsby is a novel by F. Scott Fitzgerald published in"
            " 1925..."
        ),
        "year": 1925,
    },
    {
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "content": (
            "Pride and Prejudice is a novel by Jane Austen published in"
            " 1813..."
        ),
        "year": 1813,
    },
]
Add data
We add the sample books to the Weaviate "Book" class, storing an embedding of each book's content field alongside the object.
In [ ]:
from llama_index.embeddings.openai import OpenAIEmbedding
embed_model = OpenAIEmbedding()
In [ ]:
with client.batch as batch:
    for book in books:
        vector = embed_model.get_text_embedding(book["content"])
        batch.add_data_object(
            data_object=book, class_name="Book", vector=vector
        )
Query the "existing" Weaviate vector store
In [ ]:
from llama_index.vector_stores.weaviate import WeaviateVectorStore
from llama_index.core import VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_source_node
You must specify an "index_name" that matches the target Weaviate class, and select one of the class properties as the "text" field.
In [ ]:
vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name="Book", text_key="content"
)
In [ ]:
retriever = VectorStoreIndex.from_vector_store(vector_store).as_retriever(
    similarity_top_k=1
)
In [ ]:
nodes = retriever.retrieve("What is that book about a bird again?")
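To make `similarity_top_k=1` concrete: the query is embedded, compared with the stored vectors, and only the single closest object is returned. The sketch below imitates that ranking in plain Python. The three-dimensional vectors are made up, `top_k` is a hypothetical helper, and cosine similarity is an assumption for illustration (the metric actually used depends on how the Weaviate class is configured); real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=1):
    """Return the titles of the k stored vectors most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:k]]


# Made-up toy vectors; they are not real embeddings of these books.
toy_vectors = [
    ("To Kill a Mockingbird", [0.9, 0.1, 0.0]),
    ("1984", [0.1, 0.9, 0.1]),
    ("The Great Gatsby", [0.0, 0.2, 0.9]),
]
toy_query = [0.8, 0.2, 0.1]

print(top_k(toy_query, toy_vectors, k=1))
```

Raising `k` in this sketch returns more titles in descending order of similarity, which mirrors what a larger `similarity_top_k` does for the retriever.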
Let's inspect the retrieved node. The book data is loaded as a LlamaIndex Node object, with the "content" field as its main text.
In [ ]:
pprint_source_node(nodes[0])
Document ID: cf927ce7-0672-4696-8aae-7e77b33b9659 Similarity: None Text: author: Harper Lee title: To Kill a Mockingbird year: 1960 To Kill a Mockingbird is a novel by Harper Lee published in 1960......
The remaining fields should be loaded as metadata (in the node's metadata attribute).
In [ ]:
nodes[0].node.metadata
Out[ ]:
{'author': 'Harper Lee', 'title': 'To Kill a Mockingbird', 'year': 1960}
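The result above can be pictured as a simple split of the stored object: the property named by `text_key` becomes the node text, and every other property becomes metadata. The function below is a hypothetical stand-in written for this explanation, not the code that `WeaviateVectorStore` actually runs.

```python
def split_object(properties, text_key):
    """Separate the text property from the properties kept as metadata."""
    text = properties[text_key]
    metadata = {k: v for k, v in properties.items() if k != text_key}
    return text, metadata


# Same shape as one of the sample books stored in the "Book" class.
stored = {
    "title": "To Kill a Mockingbird",
    "author": "Harper Lee",
    "content": (
        "To Kill a Mockingbird is a novel by Harper Lee published in"
        " 1960..."
    ),
    "year": 1960,
}

text, metadata = split_object(stored, text_key="content")
print(text)
print(metadata)
```

Choosing a different property as `text_key` in this sketch, for example "title", would move "content" into the metadata instead.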