Memgraph Property Graph Index¶
Memgraph is an open-source graph database built for real-time streaming and fast analysis of stored data.
Before running Memgraph, make sure Docker is running in the background. The quickest way to try out Memgraph Platform (Memgraph database + MAGE library + Memgraph Lab) for the first time is to run the following command:
For Linux/macOS:
curl https://install.memgraph.com | sh
For Windows:
iwr https://windows.memgraph.com | iex
After that, you can access Memgraph Lab, Memgraph's visualization tool, at http://localhost:3000/, or download its desktop application.
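Before moving on, you can optionally check that the Memgraph instance is reachable over Bolt. Below is a minimal sketch, assuming the neo4j Python driver is installed (pip install neo4j), Memgraph is listening on the default port 7687, and authentication is disabled; adjust the credentials if your setup differs.

from neo4j import GraphDatabase

# Memgraph speaks the Bolt protocol, so the standard neo4j driver can connect to it
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

# Raises an exception if the database cannot be reached
driver.verify_connectivity()
print("Connected to Memgraph")
driver.close()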
In [ ]:
%pip install llama-index llama-index-graph-stores-memgraph
Environment Setup¶
In [ ]:
import os

os.environ["OPENAI_API_KEY"] = "sk-proj-..."  # Replace with your OpenAI API key
Create the data directory and download the Paul Graham essay that we will use as the input data for this example.
In [ ]:
import urllib.request
os.makedirs("data/paul_graham/", exist_ok=True)
url = "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
output_path = "data/paul_graham/paul_graham_essay.txt"
urllib.request.urlretrieve(url, output_path)
In [ ]:
import nest_asyncio
nest_asyncio.apply()
Read the file content, escape the single quotes, save the modified content, and load the document data using SimpleDirectoryReader.
In [ ]:
from llama_index.core import SimpleDirectoryReader

with open(output_path, "r", encoding="utf-8") as file:
    content = file.read()

# Escape single quotes so the text can be safely inserted into the graph
modified_content = content.replace("'", "\\'")

with open(output_path, "w", encoding="utf-8") as file:
    file.write(modified_content)

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Configure the Memgraph Connection¶
Set up your graph store class by providing the database credentials.
In [ ]:
from llama_index.graph_stores.memgraph import MemgraphPropertyGraphStore

username = ""  # Enter your Memgraph username (default "")
password = ""  # Enter your Memgraph password (default "")
url = ""  # Specify the connection URL, e.g., 'bolt://localhost:7687'

graph_store = MemgraphPropertyGraphStore(
    username=username,
    password=password,
    url=url,
)
Index Construction¶
In [ ]:
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0)
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)
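By default, SchemaLLMPathExtractor lets the LLM decide which entities and relations to extract. If you want a tighter schema, you can constrain it. The sketch below is a minimal example; the PERSON/PLACE/ORGANIZATION entity types, the relation names, and the validation mapping are illustrative placeholders rather than values taken from this notebook.

from typing import Literal

# Hypothetical schema: restrict which entity and relation types the LLM may emit
entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

# For each entity type, list the relations it is allowed to participate in
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}

schema_extractor = SchemaLLMPathExtractor(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=True,  # drop any triple that does not fit the schema
)

Passing this extractor in kg_extractors instead of the default one keeps the resulting graph limited to the declared types.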
Now that the graph is created, we can explore it in the UI by visiting http://localhost:3000/.
The easiest way to visualize the entire graph is to run a Cypher command similar to this one:
MATCH p=()-[]-() RETURN p;
This command matches all possible paths in the graph and returns the entire graph.
To visualize the schema of the graph, visit the Graph schema tab and generate a schema based on the newly created graph.
To delete the entire graph, use:
MATCH (n) DETACH DELETE n;
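These Cypher statements can also be issued from Python instead of Memgraph Lab. The sketch below assumes that MemgraphPropertyGraphStore exposes a structured_query method for raw Cypher, as the LlamaIndex property graph stores generally do; verify this against your installed version before relying on it.

# Count the nodes currently stored in the graph
result = graph_store.structured_query("MATCH (n) RETURN count(n) AS node_count")
print(result)

# Destructive: wipes the whole graph, equivalent to the DETACH DELETE statement above
# graph_store.structured_query("MATCH (n) DETACH DELETE n")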
Querying and Retrieval¶
In [ ]:
retriever = index.as_retriever(include_text=False)

# Example query: "What happened at Interleaf and Viaweb?"
nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

# Output results
print("Query Results:")
for node in nodes:
    print(node.text)

# Alternatively, using a query engine
query_engine = index.as_query_engine(include_text=True)

# Perform a query and print the detailed response
response = query_engine.query("What happened at Interleaf and Viaweb?")
print("\nDetailed Query Response:")
print(str(response))
Loading from an Existing Graph¶
If you have an existing graph (either created with LlamaIndex or otherwise), we can connect to and use it!
NOTE: If your graph was created outside of LlamaIndex, the most useful retrievers will be text-to-Cypher or Cypher templates; other retrievers rely on properties that LlamaIndex inserts. A text-to-Cypher sketch follows at the end of this section.
In [ ]:
llm = OpenAI(model="gpt-4", temperature=0.0)
kg_extractors = [SchemaLLMPathExtractor(llm=llm)]

index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    kg_extractors=kg_extractors,
    embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
    show_progress=True,
)
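As noted above, a text-to-Cypher retriever is usually the most useful option for graphs that were not built by LlamaIndex. Below is a minimal sketch, assuming TextToCypherRetriever from llama_index.core.indices.property_graph and the sub_retrievers argument of as_retriever behave as described in the LlamaIndex property graph documentation.

from llama_index.core.indices.property_graph import TextToCypherRetriever

# Let the LLM translate natural-language questions into Cypher against the graph
cypher_retriever = TextToCypherRetriever(
    index.property_graph_store,
    llm=llm,
)

retriever = index.as_retriever(sub_retrievers=[cypher_retriever])
nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")
for node in nodes:
    print(node.text)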