Memgraph Property Graph Index¶
Memgraph is an open-source graph database built for real-time streaming and fast analysis of stored data.
Before running Memgraph, make sure Docker is running in the background. The quickest way to try out Memgraph Platform (Memgraph database + MAGE library + Memgraph Lab) for the first time is to run the following command:
For Linux/macOS:
curl https://install.memgraph.com | sh
For Windows:
iwr https://windows.memgraph.com | iex
After that, you can access Memgraph Lab, Memgraph's visualization tool, at http://localhost:3000/, or download its desktop application.
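Before moving on, you can optionally check that the Memgraph instance is reachable over Bolt. Below is a minimal sketch, assuming the neo4j Python driver is installed (pip install neo4j), Memgraph is listening on the default port 7687, and authentication is disabled; adjust the credentials if your setup differs.

from neo4j import GraphDatabase

# Memgraph speaks the Bolt protocol, so the standard neo4j driver can connect to it
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

# Raises an exception if the database cannot be reached
driver.verify_connectivity()
print("Connected to Memgraph")
driver.close()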
In [ ]:
%pip install llama-index llama-index-graph-stores-memgraph
Environment Setup¶
In [ ]:
import os

os.environ["OPENAI_API_KEY"] = "sk-proj-..."  # Replace with your OpenAI API key
Create the data directory and download the Paul Graham essay that we will use as the input data for this example.
In [ ]:
import urllib.request
os.makedirs("data/paul_graham/", exist_ok=True)
url = "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
output_path = "data/paul_graham/paul_graham_essay.txt"
urllib.request.urlretrieve(url, output_path)
In [ ]:
import nest_asyncio
nest_asyncio.apply()
Read the file content, escape the single quotes, save the modified content, and load the document data using SimpleDirectoryReader.
In [ ]:
from llama_index.core import SimpleDirectoryReader

with open(output_path, "r", encoding="utf-8") as file:
    content = file.read()

# Escape single quotes so the text can be safely inserted into the graph
modified_content = content.replace("'", "\\'")

with open(output_path, "w", encoding="utf-8") as file:
    file.write(modified_content)

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Configure the Memgraph Connection¶
Set up your graph store class by providing the database credentials.
In [ ]:
from llama_index.graph_stores.memgraph import MemgraphPropertyGraphStore

username = ""  # Enter your Memgraph username (default "")
password = ""  # Enter your Memgraph password (default "")
url = ""  # Specify the connection URL, e.g., 'bolt://localhost:7687'

graph_store = MemgraphPropertyGraphStore(
    username=username,
    password=password,
    url=url,
)
Index Construction¶
In [ ]:
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0)
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)
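By default, SchemaLLMPathExtractor lets the LLM decide which entities and relations to extract. If you want a tighter schema, you can constrain it. The sketch below is a minimal example; the PERSON/PLACE/ORGANIZATION entity types, the relation names, and the validation mapping are illustrative placeholders rather than values taken from this notebook.

from typing import Literal

# Hypothetical schema: restrict which entity and relation types the LLM may emit
entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

# For each entity type, list the relations it is allowed to participate in
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}

schema_extractor = SchemaLLMPathExtractor(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=True,  # drop any triple that does not fit the schema
)

Passing this extractor in kg_extractors instead of the default one keeps the resulting graph limited to the declared types.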
Now that the graph is created, we can explore it in the UI by visiting http://localhost:3000/.
The easiest way to visualize the entire graph is to run a Cypher command similar to this one:
MATCH p=()-[]-() RETURN p;
This command matches all possible paths in the graph and returns the entire graph.
To visualize the schema of the graph, visit the Graph schema tab and generate a schema based on the newly created graph.
To delete the entire graph, use:
MATCH (n) DETACH DELETE n;
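These Cypher statements can also be issued from Python instead of Memgraph Lab. The sketch below assumes that MemgraphPropertyGraphStore exposes a structured_query method for raw Cypher, as the LlamaIndex property graph stores generally do; verify this against your installed version before relying on it.

# Count the nodes currently stored in the graph
result = graph_store.structured_query("MATCH (n) RETURN count(n) AS node_count")
print(result)

# Destructive: wipes the whole graph, equivalent to the DETACH DELETE statement above
# graph_store.structured_query("MATCH (n) DETACH DELETE n")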
Querying and Retrieval¶
In [ ]:
retriever = index.as_retriever(include_text=False)

# Example query: "What happened at Interleaf and Viaweb?"
nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

# Output results
print("Query Results:")
for node in nodes:
    print(node.text)

# Alternatively, using a query engine
query_engine = index.as_query_engine(include_text=True)

# Perform a query and print the detailed response
response = query_engine.query("What happened at Interleaf and Viaweb?")
print("\nDetailed Query Response:")
print(str(response))
Loading from an Existing Graph¶
If you have an existing graph (either created with LlamaIndex or otherwise), we can connect to and use it!
NOTE: If your graph was created outside of LlamaIndex, the most useful retrievers will be text-to-Cypher or Cypher templates; other retrievers rely on properties that LlamaIndex inserts. A text-to-Cypher sketch follows at the end of this section.
In [ ]:
llm = OpenAI(model="gpt-4", temperature=0.0)
kg_extractors = [SchemaLLMPathExtractor(llm=llm)]

index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    kg_extractors=kg_extractors,
    embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
    show_progress=True,
)
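As noted above, a text-to-Cypher retriever is usually the most useful option for graphs that were not built by LlamaIndex. Below is a minimal sketch, assuming TextToCypherRetriever from llama_index.core.indices.property_graph and the sub_retrievers argument of as_retriever behave as described in the LlamaIndex property graph documentation.

from llama_index.core.indices.property_graph import TextToCypherRetriever

# Let the LLM translate natural-language questions into Cypher against the graph
cypher_retriever = TextToCypherRetriever(
    index.property_graph_store,
    llm=llm,
)

retriever = index.as_retriever(sub_retrievers=[cypher_retriever])
nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")
for node in nodes:
    print(node.text)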