Neo4j 向量存储¶
如果您在 Colab 上打开此 Notebook,可能需要安装 LlamaIndex 🦙。
%pip install llama-index-vector-stores-neo4jvector
!pip install llama-index
import os
import openai
os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"
openai.api_key = os.environ["OPENAI_API_KEY"]
初始化 Neo4j 向量包装器¶
from llama_index.vector_stores.neo4jvector import Neo4jVectorStore
username = "neo4j"
password = "pleaseletmein"
url = "bolt://localhost:7687"
embed_dim = 1536
neo4j_vector = Neo4jVectorStore(username, password, url, embed_dim)
加载文档并构建 VectorStoreIndex¶
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from IPython.display import Markdown, display
下载数据
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
--2023-12-14 18:44:00-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.110.133, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 75042 (73K) [text/plain] Saving to: ‘data/paul_graham/paul_graham_essay.txt’ data/paul_graham/pa 100%[===================>] 73,28K --.-KB/s in 0,03s 2023-12-14 18:44:00 (2,16 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
# load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
from llama_index.core import StorageContext
storage_context = StorageContext.from_defaults(vector_store=neo4j_vector)
index = VectorStoreIndex.from_documents(
documents, storage_context=storage_context
)
query_engine = index.as_query_engine()
response = query_engine.query("What happened at interleaf?")
display(Markdown(f"<b>{response}</b>"))
At Interleaf, they added a scripting language inspired by Emacs and made it a dialect of Lisp. They were looking for a Lisp hacker to write things in this scripting language. The author of the text worked at Interleaf and mentioned that their Lisp was the thinnest icing on a giant C cake. The author also mentioned that they didn't know C and didn't want to learn it, so they never understood most of the software at Interleaf. Additionally, the author admitted to being a bad employee and spending much of their time working on a separate project called On Lisp.
混合搜索¶
混合搜索结合了关键词搜索与向量搜索两种方式
如需启用混合搜索功能,需将 hybrid_search 参数设置为 True
neo4j_vector_hybrid = Neo4jVectorStore(
username, password, url, embed_dim, hybrid_search=True
)
storage_context = StorageContext.from_defaults(
vector_store=neo4j_vector_hybrid
)
index = VectorStoreIndex.from_documents(
documents, storage_context=storage_context
)
query_engine = index.as_query_engine()
response = query_engine.query("What happened at interleaf?")
display(Markdown(f"<b>{response}</b>"))
At Interleaf, they added a scripting language inspired by Emacs and made it a dialect of Lisp. They were looking for a Lisp hacker to write things in this scripting language. The author of the essay worked at Interleaf but didn't understand most of the software because he didn't know C and didn't want to learn it. He also mentioned that their Lisp was the thinnest icing on a giant C cake. The author admits to being a bad employee and spending much of his time working on a contract to publish On Lisp.
加载现有向量索引¶
要连接到现有的向量索引,您需要定义以下参数:
- index_name:现有向量索引的名称(默认为
vector) - text_node_property:包含文本值的属性名称(默认为
text)
index_name = "existing_index"
text_node_property = "text"
existing_vector = Neo4jVectorStore(
username,
password,
url,
embed_dim,
index_name=index_name,
text_node_property=text_node_property,
)
loaded_index = VectorStoreIndex.from_vector_store(existing_vector)
自定义响应¶
您可以通过 retrieval_query 参数来自定义从知识图谱中检索的信息。
检索查询必须返回以下四列数据:
- text:str - 返回文档的文本内容
- score:str - 相似度评分
- id:str - 节点ID
- metadata: Dict - 包含额外元数据的字典(必须包含
_node_type和_node_content键)
retrieval_query = (
"RETURN 'Interleaf hired Tomaz' AS text, score, node.id AS id, "
"{author: 'Tomaz', _node_type:node._node_type, _node_content:node._node_content} AS metadata"
)
neo4j_vector_retrieval = Neo4jVectorStore(
username, password, url, embed_dim, retrieval_query=retrieval_query
)
loaded_index = VectorStoreIndex.from_vector_store(
neo4j_vector_retrieval
).as_query_engine()
response = loaded_index.query("What happened at interleaf?")
display(Markdown(f"<b>{response}</b>"))
Interleaf hired Tomaz.