Cassandra 数据库工具集¶
Apache Cassandra® 是广泛用于存储事务型应用数据的数据库。大语言模型中函数与工具的引入,为生成式AI应用中现有数据的利用开辟了令人兴奋的新场景。Cassandra数据库工具包使AI工程师能高效地将智能体与Cassandra数据集成,提供以下特性:
- 通过优化查询实现快速数据访问,多数查询应在个位数毫秒或更短时间内完成
- 支持模式自省以增强LLM推理能力
- 兼容多种Cassandra部署方案,包括Apache Cassandra®、DataStax Enterprise™和DataStax Astra™
- 当前工具包仅支持SELECT查询和模式自省操作(安全第一)
快速入门¶
- 安装cassio库
- 设置目标Cassandra数据库的环境变量
- 初始化CassandraDatabase
- 通过spec.to_tool_list()将工具传递给智能体
- 即可坐观其自动完成所有工作
运行原理¶
Cassandra查询语言(CQL)是与Cassandra数据库交互的主要以人为本的方式。虽然生成查询时具有一定灵活性,但需要掌握Cassandra数据建模的最佳实践。LLM函数调用赋予智能体先推理后选择工具来满足请求的能力。使用LLM的智能体在选择适当工具或工具链时,应基于Cassandra特有的逻辑进行推理。这能减少当LLM被迫提供自上而下解决方案时引入的随机性。您希望LLM对数据库拥有完全不受限制的访问权限吗?想必答案是否定的。为此,我们提供了构建智能体提问时的提示模板:
您是一个具有以下特性和规则的Apache Cassandra专家查询分析机器人:
- 将接收终端用户关于在数据库中查找特定数据的问题
- 会检查数据库模式并创建查询路径
- 将向用户提供查找所需数据的正确查询,展示查询路径的步骤
- 将运用分区键和聚类列的最佳实践进行查询
- 避免在查询中使用ALLOW FILTERING
- 目标是找到查询路径,因此可能需要查询其他表来获得最终答案
以下是JSON格式的查询路径示例:
{
"query_paths": [
{
"description": "通过email直接查询users表",
"steps": [
{
"table": "user_credentials",
"query":
"SELECT userid FROM user_credentials WHERE email = 'example@example.com';"
},
{
"table": "users",
"query": "SELECT * FROM users WHERE userid = ?;"
}
]
}
]
}
提供的工具¶
cassandra_db_schema
¶
收集连接数据库或特定模式的所有结构信息。对智能体决策行动至关重要。
cassandra_db_select_table_data
¶
从指定键空间和表中选择数据。智能体可传递谓词参数和返回记录数的限制条件。
cassandra_db_query
¶
作为cassandra_db_select_table_data
的实验性替代方案,接受由智能体完全构建的查询字符串而非参数。警告:这可能导致非常规查询,性能可能不佳(甚至无法执行)。该功能可能在后续版本移除。如果发现有趣用例,我们也非常期待了解——毕竟一切皆有可能!
环境配置¶
安装以下 Python 模块:
pip install ipykernel python-dotenv cassio llama-index llama-index-agent-openai llama-index-llms-openai llama-index-tools-cassandra
.env 文件配置¶
连接通过 cassio
使用 auto=True
参数实现,本笔记本使用 OpenAI 服务。您需要相应创建 .env
配置文件。
对于 Cassandra 数据库,需设置:
CASSANDRA_CONTACT_POINTS
CASSANDRA_USERNAME
CASSANDRA_PASSWORD
CASSANDRA_KEYSPACE
对于 Astra 数据库,需设置:
ASTRA_DB_APPLICATION_TOKEN
ASTRA_DB_DATABASE_ID
ASTRA_DB_KEYSPACE
配置示例:
# Astra 数据库连接配置:
ASTRA_DB_DATABASE_ID=a1b2c3d4-...
ASTRA_DB_APPLICATION_TOKEN=AstraCS:...
ASTRA_DB_KEYSPACE=notebooks
# 同时设置 OpenAI API 密钥
OPENAI_API_KEY=sk-....
(您也可以修改下方代码直接通过 cassio
建立连接。)
from dotenv import load_dotenv
load_dotenv(override=True)
# Import necessary libraries
import os
import cassio
from llama_index.tools.cassandra.base import CassandraDatabaseToolSpec
from llama_index.tools.cassandra.cassandra_database_wrapper import (
CassandraDatabase,
)
from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI
连接至 Cassandra 数据库¶
cassio.init(auto=True)
session = cassio.config.resolve_session()
if not session:
raise Exception(
"Check environment configuration or manually configure cassio connection parameters"
)
# Test data prep
session = cassio.config.resolve_session()
session.execute("""DROP KEYSPACE IF EXISTS llamaindex_agent_test; """)
session.execute(
"""
CREATE KEYSPACE if not exists llamaindex_agent_test
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
"""
)
session.execute(
"""
CREATE TABLE IF NOT EXISTS llamaindex_agent_test.user_credentials (
user_email text PRIMARY KEY,
user_id UUID,
password TEXT
);
"""
)
session.execute(
"""
CREATE TABLE IF NOT EXISTS llamaindex_agent_test.users (
id UUID PRIMARY KEY,
name TEXT,
email TEXT
);"""
)
session.execute(
"""
CREATE TABLE IF NOT EXISTS llamaindex_agent_test.user_videos (
user_id UUID,
video_id UUID,
title TEXT,
description TEXT,
PRIMARY KEY (user_id, video_id)
);
"""
)
user_id = "522b1fe2-2e36-4cef-a667-cd4237d08b89"
video_id = "27066014-bad7-9f58-5a30-f63fe03718f6"
session.execute(
f"""
INSERT INTO llamaindex_agent_test.user_credentials (user_id, user_email)
VALUES ({user_id}, 'patrick@datastax.com');
"""
)
session.execute(
f"""
INSERT INTO llamaindex_agent_test.users (id, name, email)
VALUES ({user_id}, 'Patrick McFadin', 'patrick@datastax.com');
"""
)
session.execute(
f"""
INSERT INTO llamaindex_agent_test.user_videos (user_id, video_id, title)
VALUES ({user_id}, {video_id}, 'Use Langflow to Build an LLM Application in 5 Minutes');
"""
)
session.set_keyspace("llamaindex_agent_test")
# Create a CassandraDatabaseToolSpec object
db = CassandraDatabase()
spec = CassandraDatabaseToolSpec(db=db)
tools = spec.to_tool_list()
for tool in tools:
print(tool.metadata.name)
print(tool.metadata.description)
print(tool.metadata.fn_schema)
cassandra_db_schema cassandra_db_schema(keyspace: str) -> List[llama_index.core.schema.Document] Input to this tool is a keyspace name, output is a table description of Apache Cassandra tables. If the query is not correct, an error message will be returned. If an error is returned, report back to the user that the keyspace doesn't exist and stop. Args: keyspace (str): The name of the keyspace for which to return the schema. Returns: List[Document]: A list of Document objects, each containing a table description. <class 'pydantic.main.cassandra_db_schema'> cassandra_db_select_table_data cassandra_db_select_table_data(keyspace: str, table: str, predicate: str, limit: int) -> List[llama_index.core.schema.Document] Tool for getting data from a table in an Apache Cassandra database. Use the WHERE clause to specify the predicate for the query that uses the primary key. A blank predicate will return all rows. Avoid this if possible. Use the limit to specify the number of rows to return. A blank limit will return all rows. Args: keyspace (str): The name of the keyspace containing the table. table (str): The name of the table for which to return data. predicate (str): The predicate for the query that uses the primary key. limit (int): The maximum number of rows to return. Returns: List[Document]: A list of Document objects, each containing a row of data. <class 'pydantic.main.cassandra_db_select_table_data'>
# Choose the LLM that will drive the agent
# Only certain models support this
llm = OpenAI(model="gpt-4-1106-preview")
# Create the Agent with our tools. Verbose will echo the agent's actions
agent = OpenAIAgent.from_tools(tools, llm=llm, verbose=True)
通过工具调用代理¶
我们已经创建了一个代理,该代理使用大型语言模型(LLM)进行推理,并通过工具列表执行操作。现在,我们可以直接向代理提问,并观察它如何利用我们提供的工具。
# Ask our new agent a series of questions. What how the agent uses tools to get the answers.
agent.chat("What tables are in the keyspace llamaindex_agent_test?")
agent.chat("What is the userid for patrick@datastax.com ?")
agent.chat("What videos did user patrick@datastax.com upload?")
Added user message to memory: What tables are in the keyspace llamaindex_agent_test? === Calling Function === Calling function: cassandra_db_schema with args: {"keyspace":"llamaindex_agent_test"} Got output: [Document(id_='4b6011e6-62e6-4db2-9198-046534b7c8dd', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Table Name: user_credentials\n- Keyspace: llamaindex_agent_test\n- Columns\n - password (text)\n - user_email (text)\n - user_id (uuid)\n- Partition Keys: (user_email)\n- Clustering Keys: \n\nTable Name: user_videos\n- Keyspace: llamaindex_agent_test\n- Columns\n - description (text)\n - title (text)\n - user_id (uuid)\n - video_id (uuid)\n- Partition Keys: (user_id)\n- Clustering Keys: (video_id asc)\n\n\nTable Name: users\n- Keyspace: llamaindex_agent_test\n- Columns\n - email (text)\n - id (uuid)\n - name (text)\n- Partition Keys: (id)\n- Clustering Keys: \n\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')] ======================== Added user message to memory: What is the userid for patrick@datastax.com ? === Calling Function === Calling function: cassandra_db_select_table_data with args: {"keyspace":"llamaindex_agent_test","table":"user_credentials","predicate":"user_email = 'patrick@datastax.com'","limit":1} Got output: [Document(id_='e5620177-c735-46f8-a09a-a0e062efcdec', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="Row(user_email='patrick@datastax.com', password=None, user_id=UUID('522b1fe2-2e36-4cef-a667-cd4237d08b89'))", start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')] ======================== Added user message to memory: What videos did user patrick@datastax.com upload? === Calling Function === Calling function: cassandra_db_select_table_data with args: {"keyspace":"llamaindex_agent_test","table":"user_videos","predicate":"user_id = 522b1fe2-2e36-4cef-a667-cd4237d08b89","limit":10} Got output: [Document(id_='e3ecfba1-e8e1-4ce3-b321-3f51e12077a1', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="Row(user_id=UUID('522b1fe2-2e36-4cef-a667-cd4237d08b89'), video_id=UUID('27066014-bad7-9f58-5a30-f63fe03718f6'), description=None, title='Use Langflow to Build an LLM Application in 5 Minutes')", start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')] ========================
AgentChatResponse(response='The user `patrick@datastax.com` uploaded the following video in the `llamaindex_agent_test` keyspace:\n\n- Title: "Use Langflow to Build an LLM Application in 5 Minutes"\n- Video ID: `27066014-bad7-9f58-5a30-f63fe03718f6`\n- Description: Not provided', sources=[ToolOutput(content='[Document(id_=\'e3ecfba1-e8e1-4ce3-b321-3f51e12077a1\', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="Row(user_id=UUID(\'522b1fe2-2e36-4cef-a667-cd4237d08b89\'), video_id=UUID(\'27066014-bad7-9f58-5a30-f63fe03718f6\'), description=None, title=\'Use Langflow to Build an LLM Application in 5 Minutes\')", start_char_idx=None, end_char_idx=None, text_template=\'{metadata_str}\\n\\n{content}\', metadata_template=\'{key}: {value}\', metadata_seperator=\'\\n\')]', tool_name='cassandra_db_select_table_data', raw_input={'args': (), 'kwargs': {'keyspace': 'llamaindex_agent_test', 'table': 'user_videos', 'predicate': 'user_id = 522b1fe2-2e36-4cef-a667-cd4237d08b89', 'limit': 10}}, raw_output=[Document(id_='e3ecfba1-e8e1-4ce3-b321-3f51e12077a1', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="Row(user_id=UUID('522b1fe2-2e36-4cef-a667-cd4237d08b89'), video_id=UUID('27066014-bad7-9f58-5a30-f63fe03718f6'), description=None, title='Use Langflow to Build an LLM Application in 5 Minutes')", start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')], is_error=False)], source_nodes=[], is_dummy_stream=False)