Neo4j Property Graph Index
Neo4j is a production-grade graph database that can store property graphs, perform vector search, filter, and more.
The easiest way to get started is with a cloud-hosted instance from Neo4j Aura.
In this notebook, we instead cover how to run the database locally with Docker.
If you already have an existing graph database, you can skip to the end of this notebook.
In [ ]:
%pip install llama-index llama-index-graph-stores-neo4j
Docker Setup
To run Neo4j locally, first make sure Docker is installed. Then, you can launch the database with the following docker command:
docker run \
-p 7474:7474 -p 7687:7687 \
-v $PWD/data:/data -v $PWD/plugins:/plugins \
--name neo4j-apoc \
-e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
-e NEO4J_apoc_import_file_use__neo4j__config=true \
-e NEO4JLABS_PLUGINS=\[\"apoc\"\] \
neo4j:latest
Once running, you can visit the database at http://localhost:7474/. On first login, use the default username/password of neo4j and neo4j.
After the first login, you will be asked to change the password. Once that is done, you are ready to create your first property graph!
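If you want to sanity-check the connection before continuing, here is a minimal sketch using the official neo4j Python driver (pip install neo4j). The password "llamaindex" is only an example; use whatever you set when changing the default.

from neo4j import GraphDatabase

# Example credentials -- replace with your own.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "llamaindex"))
with driver.session() as session:
    # Prints 1 if the database is reachable and the credentials are valid.
    print(session.run("RETURN 1 AS ok").single()["ok"])
driver.close()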
Environment Setup
We need just a few environment setups to get started.
In [ ]:
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
In [ ]:
import nest_asyncio
nest_asyncio.apply()
In [ ]:
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
/Users/loganmarkewich/Library/Caches/pypoetry/virtualenvs/llama-index-caVs7DDe-py3.11/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Index Construction
In [ ]:
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
# Note: used to be `Neo4jPGStore`
graph_store = Neo4jPropertyGraphStore(
username="neo4j",
password="llamaindex",
url="bolt://localhost:7687",
)
In [ ]:
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
index = PropertyGraphIndex.from_documents(
documents,
embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
kg_extractors=[
SchemaLLMPathExtractor(
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0)
)
],
property_graph_store=graph_store,
show_progress=True,
)
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 21.63it/s]
Extracting paths from text with schema: 100%|██████████| 22/22 [01:06<00:00,  3.02s/it]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  1.06it/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  1.89it/s]
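If you want more control over what gets extracted, SchemaLLMPathExtractor can also be given an explicit schema. The sketch below is illustrative only; the entity and relation names are made-up examples, not part of this notebook's run.

from typing import Literal
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# Illustrative schema -- adjust the labels to your own domain.
entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["WORKED_AT", "LOCATED_IN", "FOUNDED"]

kg_extractor = SchemaLLMPathExtractor(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0),
    possible_entities=entities,
    possible_relations=relations,
    strict=True,  # drop extracted triples that do not match the schema
)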
Now that the graph is created, we can explore it in the UI by visiting http://localhost:7474/.
The easiest way to see the entire graph is with a Cypher command like "match n=() return n" at the top.
To delete the entire graph, a useful command is "match n=() detach delete n".
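These Cypher commands can also be run from Python through the graph store. As a sketch (assuming the graph_store created above), the structured_query method executes raw Cypher and returns the result rows:

# Count the nodes in the graph directly from Python.
rows = graph_store.structured_query("MATCH (n) RETURN count(n) AS node_count")
print(rows)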
Querying and Retrieval
In [ ]:
retriever = index.as_retriever(
include_text=False, # include source text in returned nodes, default True
)
nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")
for node in nodes:
print(node.text)
Interleaf -> Got crushed by -> Moore's law
Interleaf -> Made -> Scripting language
Interleaf -> Had -> Smart people
Interleaf -> Inspired by -> Emacs
Interleaf -> Had -> Few years to live
Interleaf -> Made -> Software
Interleaf -> Had done -> Something bold
Interleaf -> Added -> Scripting language
Interleaf -> Built -> Impressive technology
Interleaf -> Was -> Company
Viaweb -> Was -> Profitable
Viaweb -> Was -> Growing rapidly
Viaweb -> Suggested -> Hospital
Idea -> Was clear from -> Experience
Idea -> Would have to be embodied as -> Company
Painting department -> Seemed to be -> Rigorous
In [ ]:
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("What happened at Interleaf and Viaweb?")
print(str(response))
Interleaf had smart people and built impressive technology but got crushed by Moore's Law. Viaweb was profitable and growing rapidly.
Loading from an existing Graph
If you have an existing graph (whether created with LlamaIndex or otherwise), we can connect to and use it!
NOTE: If your graph was created outside of LlamaIndex, the most useful retrievers will be text-to-Cypher or Cypher templates. Other retrievers rely on properties that LlamaIndex inserts; a sketch of the text-to-Cypher retriever follows the connection example below.
In [ ]:
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
graph_store = Neo4jPropertyGraphStore(
username="neo4j",
password="794613852",
url="bolt://localhost:7687",
)
index = PropertyGraphIndex.from_existing(
property_graph_store=graph_store,
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
)
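As noted above, for graphs created outside of LlamaIndex, the text-to-Cypher retriever is usually the most useful. A minimal sketch, assuming the graph_store connected above:

from llama_index.core.indices.property_graph import TextToCypherRetriever
from llama_index.llms.openai import OpenAI

# The LLM writes a Cypher query from the natural-language question,
# runs it against the graph store, and returns the results as nodes.
cypher_retriever = TextToCypherRetriever(
    graph_store,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0),
)
nodes = cypher_retriever.retrieve("Which entities are connected to Interleaf?")
for node in nodes:
    print(node.text)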
From here, we can still insert more documents!
In [ ]:
from llama_index.core import Document
document = Document(text="LlamaIndex is great!")
index.insert(document)
In [ ]:
nodes = index.as_retriever(include_text=False).retrieve("LlamaIndex")
print(nodes[0].text)
Llamaindex -> Is -> Great
For full details on construction, retrieval, and querying of a property graph, see the full docs page.