Multimodal rag guardrail gemini llmguard llmguard

基于LLM-GUARD防护机制的多模态RAG流程¶

本指南介绍了一个强化版多模态检索增强生成(RAG)流程，通过集成LLM GUARD防护机制实现安全、可靠且上下文精准的响应。该流程可处理文本、表格、图片和图表等多模态输入，同时运用防护机制对输入输出进行监控验证。详细说明请查阅我的README.md文件：https://github.com/vntuananhbui/MultimodalRAG-LlamaIndex-Guardrail/blob/main/README.md

注意：¶

本流程通过免费API调用Gemini 1.5 Flash模型进行推理，具有开发友好性和成本效益优势。

扩展：¶

您也可以使用其他防护框架如Guardrail AI等。

流程概览¶

该多模态RAG流程通过原生支持多样化文档布局和模态类型，突破了传统文本RAG系统的局限。它同时利用文本和图像嵌入向量来实现上下文感知的答案检索与合成。

核心特性：¶

多模态输入处理：
- 直接处理文本、图像及复杂布局
- 将文档内容转换为强健的嵌入向量用于检索
防护机制集成：
- 添加输入/输出扫描器确保安全性与质量
- 动态验证查询与响应中的毒性内容或令牌溢出等风险
定制化查询引擎：
- 专为防护机制设计的查询处理器
- 根据扫描结果动态拦截、净化或验证输入/输出
高成本效益实现：
- 通过免费API使用Gemini 1.5 Flash，在保持高性能的同时控制成本

为何需要多模态RAG防护机制？¶

虽然多模态RAG流程功能强大，但仍面临不当输入、幻觉输出或令牌超限等风险。防护机制作为安全屏障可确保：

安全性：阻止有害或冒犯性查询与输出
可靠性：验证响应完整性
可扩展性：动态处理复杂场景

架构概述¶

1. 输入扫描器¶

在查询处理前进行验证，例如：

毒性扫描器：检测并拦截有害语言
令牌限制扫描器：确保查询不超处理上限

2. 定制查询引擎¶

该引擎在检索与合成过程中多阶段应用防护机制：

预处理：使用扫描器验证输入查询
处理中：通过多模态嵌入向量检索相关节点
后处理：净化并验证输出内容

3. 多模态大语言模型¶

流程采用多模态大模型(如Gemini 1.5 Flash)，能够理解并生成基于文本和图像的上下文感知输出。其免费API特性使其成为无需高昂成本的理想开发选择。

防护机制工作流¶

输入验证¶

使用预定义扫描器检测输入查询
根据扫描结果拦截或净化查询

检索阶段¶

获取相关文本与图像节点
将内容转换为嵌入向量用于合成

输出验证¶

通过输出扫描器分析生成结果
根据阈值(如毒性指数)拦截或净化输出

带防护机制的多模态RAG优势¶

提升安全性：验证查询与响应以降低风险
增强鲁棒性：无损处理多模态输入上下文
动态控制：灵活应对多样化输入输出
成本优化：选择性应用输入/输出验证节省资源，免费Gemini 1.5 FlashAPI降低运营开支

本流程展示了如何通过原生多模态RAG系统与防护机制的结合，在复杂文档环境中提供安全可靠的高质量结果，同时利用免费API保持成本效益。

安装¶

In [ ]:

Copied!

import nest_asyncio

nest_asyncio.apply()
import nest_asyncio

nest_asyncio.apply()

配置可观测性¶

加载数据¶

此处我们加载康菲石油公司2023年投资者会议演示文稿。

In [ ]:

Copied!

!mkdir data
!mkdir data_images
!wget "https://static.conocophillips.com/files/2023-conocophillips-aim-presentation.pdf" -O data/conocophillips.pdf
!mkdir data
!mkdir data_images
!wget "https://static.conocophillips.com/files/2023-conocophillips-aim-presentation.pdf" -O data/conocophillips.pdf

mkdir: data: File exists
mkdir: data_images: File exists
zsh:1: command not found: wget

安装依赖¶

In [ ]:

Copied!





!pip install llama-index
!pip install llama-parse
!pip install llama-index-llms-langchain
!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-gemini
!pip install llama-index-multi-modal-llms-gemini
!pip install litellm
!pip install llm-guard
!pip install llama-index
!pip install llama-parse
!pip install llama-index-llms-langchain
!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-gemini
!pip install llama-index-multi-modal-llms-gemini
!pip install litellm
!pip install llm-guard

模型设置¶

配置将用于下游编排流程的模型。

In [ ]:

Copied!





from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.gemini import Gemini
from llama_index.multi_modal_llms.gemini import GeminiMultiModal
import os

LlamaCloud_API_KEY = ""
MultiGeminiKey = ""
GOOGLE_API_KEY = ""
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
os.environ["GEMINI_API_KEY"] = GOOGLE_API_KEY
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
gemini_multimodal = GeminiMultiModal(
    model_name="models/gemini-1.5-flash", api_key=MultiGeminiKey
)
api_key = GOOGLE_API_KEY
llamaAPI_KEY = LlamaCloud_API_KEY
llm = Gemini(model="models/gemini-1.5-flash", api_key=api_key)
Settings.llm = llm
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.gemini import Gemini
from llama_index.multi_modal_llms.gemini import GeminiMultiModal
import os

LlamaCloud_API_KEY = ""
MultiGeminiKey = ""
GOOGLE_API_KEY = ""
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
os.environ["GEMINI_API_KEY"] = GOOGLE_API_KEY
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
gemini_multimodal = GeminiMultiModal(
    model_name="models/gemini-1.5-flash", api_key=MultiGeminiKey
)
api_key = GOOGLE_API_KEY
llamaAPI_KEY = LlamaCloud_API_KEY
llm = Gemini(model="models/gemini-1.5-flash", api_key=api_key)
Settings.llm = llm

使用 LlamaParse 解析文本与图像¶

本示例展示如何使用 LlamaParse 同时解析文档中的文本和图像内容。

我们通过两种方式提取文本：

常规 text 模式：采用默认的文本布局算法
markdown 模式：启用 GPT-4o 解析 (gpt4o_mode=True)，该模式还可捕获页面截图

In [ ]:

Copied!

from llama_parse import LlamaParse

parser_text = LlamaParse(result_type="text", api_key=llamaAPI_KEY)
parser_gpt4o = LlamaParse(
    result_type="markdown", gpt4o_mode=True, api_key=llamaAPI_KEY
)
from llama_parse import LlamaParse

parser_text = LlamaParse(result_type="text", api_key=llamaAPI_KEY)
parser_gpt4o = LlamaParse(
    result_type="markdown", gpt4o_mode=True, api_key=llamaAPI_KEY
)

In [ ]:

Copied!





print(f"Parsing text...")
docs_text = parser_text.load_data("data/conocophillips.pdf")
print(f"Parsing PDF file...")
md_json_objs = parser_gpt4o.get_json_result("data/conocophillips.pdf")
md_json_list = md_json_objs[0]["pages"]
print(f"Parsing text...")
docs_text = parser_text.load_data("data/conocophillips.pdf")
print(f"Parsing PDF file...")
md_json_objs = parser_gpt4o.get_json_result("data/conocophillips.pdf")
md_json_list = md_json_objs[0]["pages"]

Parsing text...
Started parsing the file under job_id e79a470b-e8d3-4f55-a048-d1b3d81b6d1e
Parsing PDF file...
Started parsing the file under job_id 84943607-b630-45bd-bf89-8470840e73b5

In [ ]:

Copied!

print(md_json_list[10]["md"])
print(md_json_list[10]["md"])

# Commitment to Disciplined Reinvestment Rate

| Period       | Description                        | Reinvestment Rate | WTI Average |
|--------------|------------------------------------|-------------------|-------------|
| 2012-2016    | Industry Growth Focus              | >100%             | ~$75/BBL    |
| 2017-2022    | ConocoPhillips Strategy Reset      | <60%              | ~$63/BBL    |
| 2023E        |                                    |                   | at $80/BBL  |
| 2024-2028    | Disciplined Reinvestment Rate      | ~50%              | at $60/BBL  |
| 2029-2032    |                                    | ~6% CFO CAGR      | at $60/BBL  |

- **Historic Reinvestment Rate**: Shown in gray.
- **Reinvestment Rate at $60/BBL WTI**: Shown in blue.
- **Reinvestment Rate at $80/BBL WTI**: Shown with dashed lines.

**Note**: Reinvestment rate and cash from operations (CFO) are non-GAAP measures. Definitions and reconciliations are included in the Appendix.

In [ ]:

Copied!

image_dicts = parser_gpt4o.get_images(
    md_json_objs, download_path="data_images"
)
image_dicts = parser_gpt4o.get_images(
    md_json_objs, download_path="data_images"
)

构建多模态索引¶

本节将在解析后的牌组上构建多模态索引。

我们通过从文档中创建包含原始图像路径引用的文本节点来实现这一目标。

在本示例中，我们为检索功能建立文本节点索引。该文本节点同时包含对解析文本和图像截图的引用。

获取文本节点¶

In [ ]:

Copied!

from llama_index.core.schema import TextNode
from typing import Optional
from llama_index.core.schema import TextNode
from typing import Optional

In [ ]:

Copied!





# get pages loaded through llamaparse
import re


def get_page_number(file_name):
    match = re.search(r"-page-(\d+)\.jpg$", str(file_name))
    if match:
        return int(match.group(1))
    return 0


def _get_sorted_image_files(image_dir):
    """Get image files sorted by page."""
    raw_files = [f for f in list(Path(image_dir).iterdir()) if f.is_file()]
    sorted_files = sorted(raw_files, key=get_page_number)
    return sorted_files
# get pages loaded through llamaparse
import re


def get_page_number(file_name):
    match = re.search(r"-page-(\d+)\.jpg$", str(file_name))
    if match:
        return int(match.group(1))
    return 0


def _get_sorted_image_files(image_dir):
    """Get image files sorted by page."""
    raw_files = [f for f in list(Path(image_dir).iterdir()) if f.is_file()]
    sorted_files = sorted(raw_files, key=get_page_number)
    return sorted_files

In [ ]:

Copied!





# Assuming TextNode class is defined somewhere else in your code
# Attach image metadata to the text nodes
def get_text_nodes(docs, image_dir=None, json_dicts=None):
    """Split docs into nodes, by separator."""
    nodes = []

    # Get image files (if provided)
    image_files = (
        _get_sorted_image_files(image_dir) if image_dir is not None else None
    )

    # Get markdown texts (if provided)
    md_texts = (
        [d["md"] for d in json_dicts] if json_dicts is not None else None
    )

    # Split docs into chunks by separator
    doc_chunks = [c for d in docs for c in d.text.split("---")]

    # Handle both single-page and multi-page cases
    for idx, doc_chunk in enumerate(doc_chunks):
        chunk_metadata = {"page_num": idx + 1}

        # Check if there are image files and handle the single-page case
        if image_files is not None:
            # Use the first image file if there's only one
            image_file = (
                image_files[idx] if idx < len(image_files) else image_files[0]
            )
            chunk_metadata["image_path"] = str(image_file)

        # Check if there are markdown texts and handle the single-page case
        if md_texts is not None:
            # Use the first markdown text if there's only one
            parsed_text_md = (
                md_texts[idx] if idx < len(md_texts) else md_texts[0]
            )
            chunk_metadata["parsed_text_markdown"] = parsed_text_md

        # Add the chunk text as metadata
        chunk_metadata["parsed_text"] = doc_chunk

        # Create the TextNode with the parsed text and metadata
        node = TextNode(
            text="",
            metadata=chunk_metadata,
        )
        nodes.append(node)

    return nodes
# Assuming TextNode class is defined somewhere else in your code
# Attach image metadata to the text nodes
def get_text_nodes(docs, image_dir=None, json_dicts=None):
    """Split docs into nodes, by separator."""
    nodes = []

    # Get image files (if provided)
    image_files = (
        _get_sorted_image_files(image_dir) if image_dir is not None else None
    )

    # Get markdown texts (if provided)
    md_texts = (
        [d["md"] for d in json_dicts] if json_dicts is not None else None
    )

    # Split docs into chunks by separator
    doc_chunks = [c for d in docs for c in d.text.split("---")]

    # Handle both single-page and multi-page cases
    for idx, doc_chunk in enumerate(doc_chunks):
        chunk_metadata = {"page_num": idx + 1}

        # Check if there are image files and handle the single-page case
        if image_files is not None:
            # Use the first image file if there's only one
            image_file = (
                image_files[idx] if idx < len(image_files) else image_files[0]
            )
            chunk_metadata["image_path"] = str(image_file)

        # Check if there are markdown texts and handle the single-page case
        if md_texts is not None:
            # Use the first markdown text if there's only one
            parsed_text_md = (
                md_texts[idx] if idx < len(md_texts) else md_texts[0]
            )
            chunk_metadata["parsed_text_markdown"] = parsed_text_md

        # Add the chunk text as metadata
        chunk_metadata["parsed_text"] = doc_chunk

        # Create the TextNode with the parsed text and metadata
        node = TextNode(
            text="",
            metadata=chunk_metadata,
        )
        nodes.append(node)

    return nodes

In [ ]:

Copied!





from pathlib import Path

# this will split into pages
text_nodes = get_text_nodes(
    docs_text,
    image_dir="/Users/macintosh/TA-DOCUMENT/StudyZone/ComputerScience/Artificial Intelligence/Llama_index/llama_index/docs/docs/examples/rag_guardrail/data_images",
    json_dicts=md_json_list,
)
from pathlib import Path

# this will split into pages
text_nodes = get_text_nodes(
    docs_text,
    image_dir="/Users/macintosh/TA-DOCUMENT/StudyZone/ComputerScience/Artificial Intelligence/Llama_index/llama_index/docs/docs/examples/rag_guardrail/data_images",
    json_dicts=md_json_list,
)

In [ ]:

Copied!

print(text_nodes[0].get_content(metadata_mode="all"))
print(text_nodes[0].get_content(metadata_mode="all"))

page_num: 1
image_path: /Users/macintosh/TA-DOCUMENT/StudyZone/ComputerScience/Artificial Intelligence/Llama_index/llama_index/docs/docs/examples/rag_guardrail/data_images/84943607-b630-45bd-bf89-8470840e73b5-page_51.jpg
parsed_text_markdown: NO_CONTENT_HERE
parsed_text: ConocoPhillips
                2023 Analyst & Investor Meeting

构建索引¶

当文本节点准备就绪后，我们将其输入向量存储索引抽象层，该层会将这些节点索引至简单的内存向量存储中（当然，强烈推荐您查看我们支持的40多种向量存储集成方案！）

In [ ]:

Copied!





import os
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

if not os.path.exists("storage_nodes"):
    index = VectorStoreIndex(text_nodes, embed_model=embed_model)
    # save index to disk
    index.set_index_id("vector_index")
    index.storage_context.persist("./storage_nodes")
else:
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="storage_nodes")
    # load index
    index = load_index_from_storage(storage_context, index_id="vector_index")

retriever = index.as_retriever()
import os
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

if not os.path.exists("storage_nodes"):
    index = VectorStoreIndex(text_nodes, embed_model=embed_model)
    # save index to disk
    index.set_index_id("vector_index")
    index.storage_context.persist("./storage_nodes")
else:
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="storage_nodes")
    # load index
    index = load_index_from_storage(storage_context, index_id="vector_index")

retriever = index.as_retriever()

构建安全护栏¶

定义护栏的全局轨道输出格式

In [ ]:

Copied!





def result_response(
    guardrail_type,
    activated,
    guard_output,
    is_valid,
    risk_score,
    threshold,
    response_text,
):
    """
    Standardizes the result format for all guardrail checks.
    """
    return {
        "guardrail_type": guardrail_type,
        "activated": activated,
        "guardrail_detail": {
            "guard_output": guard_output,
            "is_valid": is_valid,
            "risk_score/threshold": f"{risk_score}/{threshold}",
            "response_text": response_text,
        },
    }
def result_response(
    guardrail_type,
    activated,
    guard_output,
    is_valid,
    risk_score,
    threshold,
    response_text,
):
    """
    Standardizes the result format for all guardrail checks.
    """
    return {
        "guardrail_type": guardrail_type,
        "activated": activated,
        "guardrail_detail": {
            "guard_output": guard_output,
            "is_valid": is_valid,
            "risk_score/threshold": f"{risk_score}/{threshold}",
            "response_text": response_text,
        },
    }

添加 Guardrail 扫描器¶

此处可访问 "https://llm-guard.com" 获取必要扫描器。以下是两个示例扫描器，请遵循此格式函数。
我们使用 Gemini 1.5 Flash 作为扫描器的响应模型。当然，您也可以选择其他大语言模型，例如 ChatGPT...

In [ ]:

Copied!





from llm_guard.input_scanners import Toxicity
from llm_guard.input_scanners.toxicity import MatchType
from litellm import completion


def guardrail_toxicLanguage(prompt):
    # Interact with the LLM to generate a response
    print(f"Prompt: {prompt}")

    # Generate the response using the LLM (Gemini-1.5-flash)
    response = completion(
        model="gemini/gemini-1.5-flash",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    response_text = response.choices[0].message.content

    # Define the threshold and scan for toxicity
    threshold = 0.5
    toxic_scanner = Toxicity(threshold=threshold, match_type=MatchType.FULL)
    sanitized_output, is_valid, risk_score = toxic_scanner.scan(prompt)

    return result_response(
        guardrail_type="Toxicity",
        activated=not is_valid,
        guard_output=sanitized_output,
        is_valid=is_valid,
        risk_score=risk_score,
        threshold=threshold,
        response_text=response_text,
    )
from llm_guard.input_scanners import Toxicity
from llm_guard.input_scanners.toxicity import MatchType
from litellm import completion


def guardrail_toxicLanguage(prompt):
    # Interact with the LLM to generate a response
    print(f"Prompt: {prompt}")

    # Generate the response using the LLM (Gemini-1.5-flash)
    response = completion(
        model="gemini/gemini-1.5-flash",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    response_text = response.choices[0].message.content

    # Define the threshold and scan for toxicity
    threshold = 0.5
    toxic_scanner = Toxicity(threshold=threshold, match_type=MatchType.FULL)
    sanitized_output, is_valid, risk_score = toxic_scanner.scan(prompt)

    return result_response(
        guardrail_type="Toxicity",
        activated=not is_valid,
        guard_output=sanitized_output,
        is_valid=is_valid,
        risk_score=risk_score,
        threshold=threshold,
        response_text=response_text,
    )

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

In [ ]:

Copied!





from llm_guard.input_scanners import TokenLimit
from llm_guard import scan_output
from litellm import completion


def guardrail_tokenlimit(prompt):
    threshold = 400
    response = completion(
        model="gemini/gemini-1.5-flash",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    response_text = response.choices[0].message.content

    scanner = TokenLimit(limit=threshold, encoding_name="cl100k_base")
    sanitized_output, is_valid, risk_score = scanner.scan(prompt)

    # Use the global rail to format the result
    result = result_response(
        guardrail_type="Token limit",
        activated=not is_valid,
        guard_output=sanitized_output,
        is_valid=is_valid,
        risk_score=risk_score,
        threshold=threshold,
        response_text=response_text,
    )

    return result
from llm_guard.input_scanners import TokenLimit
from llm_guard import scan_output
from litellm import completion


def guardrail_tokenlimit(prompt):
    threshold = 400
    response = completion(
        model="gemini/gemini-1.5-flash",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    response_text = response.choices[0].message.content

    scanner = TokenLimit(limit=threshold, encoding_name="cl100k_base")
    sanitized_output, is_valid, risk_score = scanner.scan(prompt)

    # Use the global rail to format the result
    result = result_response(
        guardrail_type="Token limit",
        activated=not is_valid,
        guard_output=sanitized_output,
        is_valid=is_valid,
        risk_score=risk_score,
        threshold=threshold,
        response_text=response_text,
    )

    return result

`InputScanner` - `OutputScanner` 函数¶

InputScanner 函数对给定的输入查询运行一系列扫描器，并评估是否有扫描器检测到威胁。该函数返回一个布尔值（指示是否检测到威胁）以及触发检测的扫描器结果列表。

参数：¶

query (str): 需要扫描潜在威胁的输入内容。
listOfScanners (list): 扫描器函数列表。每个扫描器函数应接收查询作为输入，并返回包含键 "activated"（布尔值）的字典，用于指示是否检测到威胁。

返回值：¶

detected (bool): 若任意扫描器检测到威胁则返回 True，否则返回 False。
triggered_scanners (list): 由检测到威胁的扫描器返回的字典结果列表。

关键步骤：¶

初始化 detected 为 False，用于追踪是否有扫描器发现威胁。
创建空列表 triggered_scanners，用于存储触发检测的扫描器结果。
遍历 listOfScanners 中的每个扫描器：
- 对 query 运行当前扫描器。
- 检查扫描器结果是否包含 "activated": True。
- 若检测到威胁：
  - 将 detected 设为 True。
  - 将该扫描器的结果追加至 triggered_scanners。
返回 detected 状态及 triggered_scanners 列表。

In [ ]:

Copied!





def InputScanner(query, listOfScanners):
    """
    Runs all scanners on the query and returns:
    - True if any scanner detects a threat.
    - A list of results from scanners that returned True.
    """
    detected = False  # Track if any scanner detects a threat
    triggered_scanners = []  # Store results from triggered scanners

    # Run each scanner on the query
    for scanner in listOfScanners:
        result = scanner(query)

        if result[
            "activated"
        ]:  # Check if the scanner found a threat (activated=True)
            detected = True  # Set detected to True if any scanner triggers
            triggered_scanners.append(result)  # Track which scanner triggered

    return detected, triggered_scanners
def InputScanner(query, listOfScanners):
    """
    Runs all scanners on the query and returns:
    - True if any scanner detects a threat.
    - A list of results from scanners that returned True.
    """
    detected = False  # Track if any scanner detects a threat
    triggered_scanners = []  # Store results from triggered scanners

    # Run each scanner on the query
    for scanner in listOfScanners:
        result = scanner(query)

        if result[
            "activated"
        ]:  # Check if the scanner found a threat (activated=True)
            detected = True  # Set detected to True if any scanner triggers
            triggered_scanners.append(result)  # Track which scanner triggered

    return detected, triggered_scanners

In [ ]:

Copied!





def OutputScanner(response, query, context, listOfScanners):
    """
    Runs all scanners on the response and returns:
    - True if any scanner detects a threat.
    - A list of results from scanners that returned True.
    """
    detected = False  # Track if any scanner detects a threat
    triggered_scanners = []  # Store results from triggered scanners

    # Run each scanner on the response
    for scanner in listOfScanners:
        # Check if scanner is `evaluate_rag_response` (which needs query & context)
        if scanner.__name__ == "evaluate_rag_response":
            result = scanner(
                response, query, context
            )  # Execute with query & context
        else:
            result = scanner(response)  # Default scanner execution

        # print(f"Debug Output Scanner Result: {result}")

        if result["activated"]:  # Check if the scanner was triggered
            detected = True
            triggered_scanners.append(result)  # Track which scanner triggered

    return detected, triggered_scanners


# Example usage with a query engine response
# scanners = [detect_and_anonymize_pii]
# response = query_engine.query("Give me account name of Peter Kelly and Role and Credit Card Number")
# detected, triggered_scanners = OutputScanner(str(response), scanners)
# print(triggered_scanners)
def OutputScanner(response, query, context, listOfScanners):
    """
    Runs all scanners on the response and returns:
    - True if any scanner detects a threat.
    - A list of results from scanners that returned True.
    """
    detected = False  # Track if any scanner detects a threat
    triggered_scanners = []  # Store results from triggered scanners

    # Run each scanner on the response
    for scanner in listOfScanners:
        # Check if scanner is `evaluate_rag_response` (which needs query & context)
        if scanner.__name__ == "evaluate_rag_response":
            result = scanner(
                response, query, context
            )  # Execute with query & context
        else:
            result = scanner(response)  # Default scanner execution

        # print(f"Debug Output Scanner Result: {result}")

        if result["activated"]:  # Check if the scanner was triggered
            detected = True
            triggered_scanners.append(result)  # Track which scanner triggered

    return detected, triggered_scanners


# Example usage with a query engine response
# scanners = [detect_and_anonymize_pii]
# response = query_engine.query("Give me account name of Peter Kelly and Role and Credit Card Number")
# detected, triggered_scanners = OutputScanner(str(response), scanners)
# print(triggered_scanners)

自定义多模态查询引擎¶

该自定义查询引擎扩展了基于检索的标准架构，能够同时处理文本和图像数据，从而生成更全面且具备上下文感知能力的响应。它整合了多模态推理机制，并采用先进的输入输出验证技术以确保查询处理的稳健性。

核心特性：¶

多模态支持：
- 融合文本与图像数据，生成信息更完备、准确性更高的响应。
输入输出验证：
- 扫描输入查询中的敏感或无效内容，必要时进行拦截。
- 对生成响应进行验证与净化，确保符合预设规则。
上下文感知提示：
- 检索相关数据并构建查询的上下文字符串。
- 利用该上下文指导响应合成过程。
元数据与日志记录：
- 追踪查询处理全过程（包括所有验证与调整操作），保障透明度并支持调试。

工作原理：¶

扫描输入查询以检测违规内容。
为查询检索相关的文本和图像数据。
结合文本与视觉上下文合成响应。
返回响应前验证其内容合规性。

In [ ]:

Copied!





from llama_index.core.query_engine import (
    CustomQueryEngine,
    SimpleMultiModalQueryEngine,
)
from llama_index.core.retrievers import BaseRetriever
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core.schema import ImageNode, NodeWithScore, MetadataMode
from llama_index.core.prompts import PromptTemplate
from llama_index.core.base.response.schema import Response

from typing import List, Callable, Optional
from pydantic import Field


QA_PROMPT_TMPL = """\
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query if it is related to the context. 
If the query is not related to the context, respond with:
"I'm sorry, but I can't help with that."

Query: {query_str}
Answer: """

QA_PROMPT = PromptTemplate(QA_PROMPT_TMPL)


class MultimodalQueryEngine(CustomQueryEngine):
    """Custom multimodal Query Engine.

    Takes in a retriever to retrieve a set of document nodes.
    Also takes in a prompt template and multimodal model.

    """

    qa_prompt: PromptTemplate
    retriever: BaseRetriever
    multi_modal_llm: GeminiMultiModal
    input_scanners: List[Callable[[str], dict]] = Field(default_factory=list)
    output_scanners: List[Callable[[str], dict]] = Field(default_factory=list)

    def __init__(
        self, qa_prompt: Optional[PromptTemplate] = None, **kwargs
    ) -> None:
        """Initialize."""
        super().__init__(qa_prompt=qa_prompt or QA_PROMPT, **kwargs)

    def custom_query(self, query_str: str):
        query_metadata = {
            "input_scanners": [],
            "output_scanners": [],
            "retrieved_nodes": [],
            "response_status": "success",
        }

        input_detected, input_triggered = InputScanner(
            query_str, self.input_scanners
        )
        if input_triggered:
            # print("Triggered Input Scanners:", input_triggered)
            # Log triggered input scanners in metadata
            query_metadata["input_scanners"] = input_triggered
            # If input contains sensitive information, block the query
            if input_detected:
                return Response(
                    response="I'm sorry, but I can't help with that.",
                    source_nodes=[],
                    metadata={
                        "guardrail": "Input Scanner",
                        "triggered_scanners": input_triggered,
                        "response_status": "blocked",
                    },
                )

        # retrieve text nodes
        nodes = self.retriever.retrieve(query_str)
        # create ImageNode items from text nodes
        image_nodes = [
            NodeWithScore(node=ImageNode(image_path=n.metadata["image_path"]))
            for n in nodes
        ]

        # create context string from text nodes, dump into the prompt
        context_str = "\n\n".join(
            [r.get_content(metadata_mode=MetadataMode.LLM) for r in nodes]
        )
        fmt_prompt = self.qa_prompt.format(
            context_str=context_str, query_str=query_str
        )

        # synthesize an answer from formatted text and images
        llm_response = self.multi_modal_llm.complete(
            prompt=fmt_prompt,
            image_documents=[image_node.node for image_node in image_nodes],
        )

        # Step 5: Run Output Scanners
        output_detected, output_triggered = OutputScanner(
            str(llm_response),
            str(query_str),
            str(context_str),
            self.output_scanners,
        )
        if output_triggered:
            # print("Triggered Output Scanners:", output_triggered)
            query_metadata[
                "output_scanners"
            ] = output_triggered  # Store output scanner info

        final_response = str(llm_response)
        if output_detected:
            final_response = "I'm sorry, but I can't help with that."
            query_metadata["response_status"] = "sanitized"
        # Return the response with detailed metadata
        return Response(
            response=final_response,
            source_nodes=nodes,
            metadata=query_metadata,
        )
from llama_index.core.query_engine import (
    CustomQueryEngine,
    SimpleMultiModalQueryEngine,
)
from llama_index.core.retrievers import BaseRetriever
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core.schema import ImageNode, NodeWithScore, MetadataMode
from llama_index.core.prompts import PromptTemplate
from llama_index.core.base.response.schema import Response

from typing import List, Callable, Optional
from pydantic import Field


QA_PROMPT_TMPL = """\
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query if it is related to the context. 
If the query is not related to the context, respond with:
"I'm sorry, but I can't help with that."

Query: {query_str}
Answer: """

QA_PROMPT = PromptTemplate(QA_PROMPT_TMPL)


class MultimodalQueryEngine(CustomQueryEngine):
    """Custom multimodal Query Engine.

    Takes in a retriever to retrieve a set of document nodes.
    Also takes in a prompt template and multimodal model.

    """

    qa_prompt: PromptTemplate
    retriever: BaseRetriever
    multi_modal_llm: GeminiMultiModal
    input_scanners: List[Callable[[str], dict]] = Field(default_factory=list)
    output_scanners: List[Callable[[str], dict]] = Field(default_factory=list)

    def __init__(
        self, qa_prompt: Optional[PromptTemplate] = None, **kwargs
    ) -> None:
        """Initialize."""
        super().__init__(qa_prompt=qa_prompt or QA_PROMPT, **kwargs)

    def custom_query(self, query_str: str):
        query_metadata = {
            "input_scanners": [],
            "output_scanners": [],
            "retrieved_nodes": [],
            "response_status": "success",
        }

        input_detected, input_triggered = InputScanner(
            query_str, self.input_scanners
        )
        if input_triggered:
            # print("Triggered Input Scanners:", input_triggered)
            # Log triggered input scanners in metadata
            query_metadata["input_scanners"] = input_triggered
            # If input contains sensitive information, block the query
            if input_detected:
                return Response(
                    response="I'm sorry, but I can't help with that.",
                    source_nodes=[],
                    metadata={
                        "guardrail": "Input Scanner",
                        "triggered_scanners": input_triggered,
                        "response_status": "blocked",
                    },
                )

        # retrieve text nodes
        nodes = self.retriever.retrieve(query_str)
        # create ImageNode items from text nodes
        image_nodes = [
            NodeWithScore(node=ImageNode(image_path=n.metadata["image_path"]))
            for n in nodes
        ]

        # create context string from text nodes, dump into the prompt
        context_str = "\n\n".join(
            [r.get_content(metadata_mode=MetadataMode.LLM) for r in nodes]
        )
        fmt_prompt = self.qa_prompt.format(
            context_str=context_str, query_str=query_str
        )

        # synthesize an answer from formatted text and images
        llm_response = self.multi_modal_llm.complete(
            prompt=fmt_prompt,
            image_documents=[image_node.node for image_node in image_nodes],
        )

        # Step 5: Run Output Scanners
        output_detected, output_triggered = OutputScanner(
            str(llm_response),
            str(query_str),
            str(context_str),
            self.output_scanners,
        )
        if output_triggered:
            # print("Triggered Output Scanners:", output_triggered)
            query_metadata[
                "output_scanners"
            ] = output_triggered  # Store output scanner info

        final_response = str(llm_response)
        if output_detected:
            final_response = "I'm sorry, but I can't help with that."
            query_metadata["response_status"] = "sanitized"
        # Return the response with detailed metadata
        return Response(
            response=final_response,
            source_nodes=nodes,
            metadata=query_metadata,
        )

输入输出扫描器配置¶

您可以将需要用于保护 RAG 的扫描器放置于此

In [ ]:

Copied!

input_scanners = [guardrail_toxicLanguage, guardrail_tokenlimit]
output_scanners = [guardrail_toxicLanguage]
input_scanners = [guardrail_toxicLanguage, guardrail_tokenlimit]
output_scanners = [guardrail_toxicLanguage]

In [ ]:

Copied!





query_engine = MultimodalQueryEngine(
    retriever=index.as_retriever(similarity_top_k=9),
    multi_modal_llm=gemini_multimodal,
    input_scanners=input_scanners,
    output_scanners=output_scanners,
)
query_engine = MultimodalQueryEngine(
    retriever=index.as_retriever(similarity_top_k=9),
    multi_modal_llm=gemini_multimodal,
    input_scanners=input_scanners,
    output_scanners=output_scanners,
)

尝试查询¶

让我们来试试查询功能。

In [ ]:

Copied!

query = "Tell me about the diverse geographies where Conoco Phillips has a production base"
response = query_engine.query(query)
query = "Tell me about the diverse geographies where Conoco Phillips has a production base"
response = query_engine.query(query)

Prompt: Tell me about the diverse geographies where Conoco Phillips has a production base
2024-12-03 17:43:08 [debug    ] Initialized classification model device=device(type='mps') model=Model(path='unitary/unbiased-toxic-roberta', subfolder='', revision='36295dd80b422dc49f40052021430dae76241adc', onnx_path='ProtectAI/unbiased-toxic-roberta-onnx', onnx_revision='34480fa958f6657ad835c345808475755b6974a7', onnx_subfolder='', onnx_filename='model.onnx', kwargs={}, pipeline_kwargs={'batch_size': 1, 'device': device(type='mps'), 'padding': 'max_length', 'top_k': None, 'function_to_apply': 'sigmoid', 'return_token_type_ids': False, 'max_length': 512, 'truncation': True}, tokenizer_kwargs={})
2024-12-03 17:43:09 [debug    ] Not toxicity found in the text results=[[{'label': 'toxicity', 'score': 0.00041448281263001263}, {'label': 'male', 'score': 0.00018738119979389012}, {'label': 'insult', 'score': 0.00011956175148952752}, {'label': 'female', 'score': 0.00011725842341547832}, {'label': 'psychiatric_or_mental_illness', 'score': 8.512590284226462e-05}, {'label': 'white', 'score': 7.451862620655447e-05}, {'label': 'christian', 'score': 5.6545581173850223e-05}, {'label': 'muslim', 'score': 5.644273551297374e-05}, {'label': 'black', 'score': 3.8606172893196344e-05}, {'label': 'obscene', 'score': 3.222753730369732e-05}, {'label': 'identity_attack', 'score': 3.1757666874909773e-05}, {'label': 'threat', 'score': 2.8462023692554794e-05}, {'label': 'jewish', 'score': 2.7872381906490773e-05}, {'label': 'homosexual_gay_or_lesbian', 'score': 2.5694836949696764e-05}, {'label': 'sexual_explicit', 'score': 1.859129588410724e-05}, {'label': 'severe_toxicity', 'score': 1.0931341876130318e-06}]]
2024-12-03 17:43:15 [debug    ] Prompt fits the maximum tokens num_tokens=15 threshold=400
Prompt: ConocoPhillips has a diverse production base across several geographic locations.  These include:

* **Alaska:**  The company has a significant presence in Alaska's conventional basins, including the Prudhoe Bay area, with a long history of production and existing infrastructure.  The Willow project is also located in Alaska.
* **Lower 48 (United States):**  ConocoPhillips operates extensively in the Lower 48 states, focusing on unconventional plays in the Permian Basin (Delaware and Midland Basins), Eagle Ford, and Bakken.
* **International:** The company has operations in other international locations, including Qatar (LNG), and previously had operations in the UK, Australia, Indonesia, and Canada (though some of these have been divested).  They also have a global marketing presence with offices in London, Singapore, Houston, Calgary, Beijing, and Tokyo.
2024-12-03 17:43:36 [debug    ] Initialized classification model device=device(type='mps') model=Model(path='unitary/unbiased-toxic-roberta', subfolder='', revision='36295dd80b422dc49f40052021430dae76241adc', onnx_path='ProtectAI/unbiased-toxic-roberta-onnx', onnx_revision='34480fa958f6657ad835c345808475755b6974a7', onnx_subfolder='', onnx_filename='model.onnx', kwargs={}, pipeline_kwargs={'batch_size': 1, 'device': device(type='mps'), 'padding': 'max_length', 'top_k': None, 'function_to_apply': 'sigmoid', 'return_token_type_ids': False, 'max_length': 512, 'truncation': True}, tokenizer_kwargs={})
2024-12-03 17:43:37 [debug    ] Not toxicity found in the text results=[[{'label': 'toxicity', 'score': 0.0003606641257647425}, {'label': 'male', 'score': 0.000291528704110533}, {'label': 'insult', 'score': 0.00011418585199862719}, {'label': 'psychiatric_or_mental_illness', 'score': 0.00011314846051391214}, {'label': 'female', 'score': 0.00010114537144545466}, {'label': 'white', 'score': 9.688278078101575e-05}, {'label': 'muslim', 'score': 6.954199488973245e-05}, {'label': 'christian', 'score': 5.551999493036419e-05}, {'label': 'black', 'score': 4.1746119677554816e-05}, {'label': 'identity_attack', 'score': 3.3705578971421346e-05}, {'label': 'homosexual_gay_or_lesbian', 'score': 3.157216633553617e-05}, {'label': 'obscene', 'score': 2.798157584038563e-05}, {'label': 'jewish', 'score': 2.618367398099508e-05}, {'label': 'threat', 'score': 2.1199964976403862e-05}, {'label': 'sexual_explicit', 'score': 1.9145050828228705e-05}, {'label': 'severe_toxicity', 'score': 1.1292050885458593e-06}]]

In [ ]:

Copied!

print(str(response))
print(str(response))

ConocoPhillips has a diverse production base across several geographic locations.  These include:

* **Alaska:**  The company has a significant presence in Alaska's conventional basins, including the Prudhoe Bay area, with a long history of production and existing infrastructure.  The Willow project is also located in Alaska.
* **Lower 48 (United States):**  ConocoPhillips operates extensively in the Lower 48 states, focusing on unconventional plays in the Permian Basin (Delaware and Midland Basins), Eagle Ford, and Bakken.
* **International:** The company has operations in other international locations, including Qatar (LNG), and previously had operations in the UK, Australia, Indonesia, and Canada (though some of these have been divested).  They also have a global marketing presence with offices in London, Singapore, Houston, Calgary, Beijing, and Tokyo.

In [ ]:

Copied!

print(str(response.metadata))
print(str(response.metadata))

{'input_scanners': [], 'output_scanners': [], 'retrieved_nodes': [], 'response_status': 'success'}

In [ ]:

Copied!

query = """
    If you're looking for random paragraphs, you've come to the right place. When a random word or a random sentence isn't quite enough, the next logical step is to find a random paragraph. We created the Random Paragraph Generator with you in mind. The process is quite simple. Choose the number of random paragraphs you'd like to see and click the button. Your chosen number of paragraphs will instantly appear.

While it may not be obvious to everyone, there are a number of reasons creating random paragraphs can be useful. A few examples of how some people use this generator are listed in the following paragraphs.

Creative Writing
Generating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.

Tackle Writers' Block
A random paragraph can also be an excellent way for a writer to tackle writers' block. Writing block can often happen due to being stuck with a current project that the writer is trying to complete. By inserting a completely random paragraph from which to begin, it can take down some of the issues that may have been causing the writers' block in the first place.

Beginning Writing Routine
Another productive way to use this tool to begin a daily writing routine. One way is to generate a random paragraph with the intention to try to rewrite it while still keeping the original meaning. The purpose here is to just get the writing started so that when the writer goes onto their day's writing projects, words are already flowing from their fingers.

Writing Challenge
Another writing challenge can be to take the individual sentences in the random paragraph and incorporate a single sentence from that into a new paragraph to create a short story. Unlike the random sentence generator, the sentences from the random paragraph will have some connection to one another so it will be a bit different. You also won't know exactly how many sentences will appear in the random paragraph.

Programmers
It's not only writers who can benefit from this free online tool. If you're a programmer who's working on a project where blocks of text are needed, this tool can be a great way to get that. It's a good way to test your programming and that the tool being created is working well.

Above are a few examples of how the random paragraph generator can be beneficial. The best way to see if this random paragraph picker will be useful for your intended purposes is to give it a try. Generate a number of paragraphs to see if they are beneficial to your current project.

If you do find this paragraph tool useful, please do us a favor and let us know how you're using it. It's greatly beneficial for us to know the different ways this tool is being used so we can improve it with updates. This is especially true since there are times when the generators we create get used in completely unanticipated ways from when we initially created them. If you have the time, please send us a quick note on what you'd like to see changed or added to make it better in the future.

Frequently Asked Questions

Can I use these random paragraphs for my project?

Yes! All of the random paragraphs in our generator are free to use for your projects.

Does a computer generate these paragraphs?

No! All of the paragraphs in the generator are written by humans, not computers. When first building this generator we thought about using computers to generate the paragraphs, but they weren't very good and many times didn't make any sense at all. We therefore took the time to create paragraphs specifically for this generator to make it the best that we could.

Can I contribute random paragraphs?

Yes. We're always interested in improving this generator and one of the best ways to do that is to add new and interesting paragraphs to the generator. If you'd like to contribute some random paragraphs, please contact us.

How many words are there in a paragraph?

There are usually about 200 words in a paragraph, but this can vary widely. Most paragraphs focus on a single idea that's expressed with an introductory sentence, then followed by two or more supporting sentences about the idea. A short paragraph may not reach even 50 words while long paragraphs can be over 400 words long, but generally speaking they tend to be approximately 200 words in length.
    """
response = query_engine.query(query)
query = """
    If you're looking for random paragraphs, you've come to the right place. When a random word or a random sentence isn't quite enough, the next logical step is to find a random paragraph. We created the Random Paragraph Generator with you in mind. The process is quite simple. Choose the number of random paragraphs you'd like to see and click the button. Your chosen number of paragraphs will instantly appear.

While it may not be obvious to everyone, there are a number of reasons creating random paragraphs can be useful. A few examples of how some people use this generator are listed in the following paragraphs.

Creative Writing
Generating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.

Tackle Writers' Block
A random paragraph can also be an excellent way for a writer to tackle writers' block. Writing block can often happen due to being stuck with a current project that the writer is trying to complete. By inserting a completely random paragraph from which to begin, it can take down some of the issues that may have been causing the writers' block in the first place.

Beginning Writing Routine
Another productive way to use this tool to begin a daily writing routine. One way is to generate a random paragraph with the intention to try to rewrite it while still keeping the original meaning. The purpose here is to just get the writing started so that when the writer goes onto their day's writing projects, words are already flowing from their fingers.

Writing Challenge
Another writing challenge can be to take the individual sentences in the random paragraph and incorporate a single sentence from that into a new paragraph to create a short story. Unlike the random sentence generator, the sentences from the random paragraph will have some connection to one another so it will be a bit different. You also won't know exactly how many sentences will appear in the random paragraph.

Programmers
It's not only writers who can benefit from this free online tool. If you're a programmer who's working on a project where blocks of text are needed, this tool can be a great way to get that. It's a good way to test your programming and that the tool being created is working well.

Above are a few examples of how the random paragraph generator can be beneficial. The best way to see if this random paragraph picker will be useful for your intended purposes is to give it a try. Generate a number of paragraphs to see if they are beneficial to your current project.

If you do find this paragraph tool useful, please do us a favor and let us know how you're using it. It's greatly beneficial for us to know the different ways this tool is being used so we can improve it with updates. This is especially true since there are times when the generators we create get used in completely unanticipated ways from when we initially created them. If you have the time, please send us a quick note on what you'd like to see changed or added to make it better in the future.

Frequently Asked Questions

Can I use these random paragraphs for my project?

Yes! All of the random paragraphs in our generator are free to use for your projects.

Does a computer generate these paragraphs?

No! All of the paragraphs in the generator are written by humans, not computers. When first building this generator we thought about using computers to generate the paragraphs, but they weren't very good and many times didn't make any sense at all. We therefore took the time to create paragraphs specifically for this generator to make it the best that we could.

Can I contribute random paragraphs?

Yes. We're always interested in improving this generator and one of the best ways to do that is to add new and interesting paragraphs to the generator. If you'd like to contribute some random paragraphs, please contact us.

How many words are there in a paragraph?

There are usually about 200 words in a paragraph, but this can vary widely. Most paragraphs focus on a single idea that's expressed with an introductory sentence, then followed by two or more supporting sentences about the idea. A short paragraph may not reach even 50 words while long paragraphs can be over 400 words long, but generally speaking they tend to be approximately 200 words in length.
    """
response = query_engine.query(query)

Prompt: 
    If you're looking for random paragraphs, you've come to the right place. When a random word or a random sentence isn't quite enough, the next logical step is to find a random paragraph. We created the Random Paragraph Generator with you in mind. The process is quite simple. Choose the number of random paragraphs you'd like to see and click the button. Your chosen number of paragraphs will instantly appear.

While it may not be obvious to everyone, there are a number of reasons creating random paragraphs can be useful. A few examples of how some people use this generator are listed in the following paragraphs.

Creative Writing
Generating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.

Tackle Writers' Block
A random paragraph can also be an excellent way for a writer to tackle writers' block. Writing block can often happen due to being stuck with a current project that the writer is trying to complete. By inserting a completely random paragraph from which to begin, it can take down some of the issues that may have been causing the writers' block in the first place.

Beginning Writing Routine
Another productive way to use this tool to begin a daily writing routine. One way is to generate a random paragraph with the intention to try to rewrite it while still keeping the original meaning. The purpose here is to just get the writing started so that when the writer goes onto their day's writing projects, words are already flowing from their fingers.

Writing Challenge
Another writing challenge can be to take the individual sentences in the random paragraph and incorporate a single sentence from that into a new paragraph to create a short story. Unlike the random sentence generator, the sentences from the random paragraph will have some connection to one another so it will be a bit different. You also won't know exactly how many sentences will appear in the random paragraph.

Programmers
It's not only writers who can benefit from this free online tool. If you're a programmer who's working on a project where blocks of text are needed, this tool can be a great way to get that. It's a good way to test your programming and that the tool being created is working well.

Above are a few examples of how the random paragraph generator can be beneficial. The best way to see if this random paragraph picker will be useful for your intended purposes is to give it a try. Generate a number of paragraphs to see if they are beneficial to your current project.

If you do find this paragraph tool useful, please do us a favor and let us know how you're using it. It's greatly beneficial for us to know the different ways this tool is being used so we can improve it with updates. This is especially true since there are times when the generators we create get used in completely unanticipated ways from when we initially created them. If you have the time, please send us a quick note on what you'd like to see changed or added to make it better in the future.

Frequently Asked Questions

Can I use these random paragraphs for my project?

Yes! All of the random paragraphs in our generator are free to use for your projects.

Does a computer generate these paragraphs?

No! All of the paragraphs in the generator are written by humans, not computers. When first building this generator we thought about using computers to generate the paragraphs, but they weren't very good and many times didn't make any sense at all. We therefore took the time to create paragraphs specifically for this generator to make it the best that we could.

Can I contribute random paragraphs?

Yes. We're always interested in improving this generator and one of the best ways to do that is to add new and interesting paragraphs to the generator. If you'd like to contribute some random paragraphs, please contact us.

How many words are there in a paragraph?

There are usually about 200 words in a paragraph, but this can vary widely. Most paragraphs focus on a single idea that's expressed with an introductory sentence, then followed by two or more supporting sentences about the idea. A short paragraph may not reach even 50 words while long paragraphs can be over 400 words long, but generally speaking they tend to be approximately 200 words in length.
    
2024-12-03 17:43:42 [debug    ] Initialized classification model device=device(type='mps') model=Model(path='unitary/unbiased-toxic-roberta', subfolder='', revision='36295dd80b422dc49f40052021430dae76241adc', onnx_path='ProtectAI/unbiased-toxic-roberta-onnx', onnx_revision='34480fa958f6657ad835c345808475755b6974a7', onnx_subfolder='', onnx_filename='model.onnx', kwargs={}, pipeline_kwargs={'batch_size': 1, 'device': device(type='mps'), 'padding': 'max_length', 'top_k': None, 'function_to_apply': 'sigmoid', 'return_token_type_ids': False, 'max_length': 512, 'truncation': True}, tokenizer_kwargs={})
2024-12-03 17:43:42 [debug    ] Not toxicity found in the text results=[[{'label': 'toxicity', 'score': 0.0011976422974839807}, {'label': 'insult', 'score': 0.00045695927110500634}, {'label': 'male', 'score': 0.00018701529188547283}, {'label': 'psychiatric_or_mental_illness', 'score': 0.00014795312017668039}, {'label': 'white', 'score': 9.39662495511584e-05}, {'label': 'female', 'score': 7.459904009010643e-05}, {'label': 'obscene', 'score': 6.114380084909499e-05}, {'label': 'threat', 'score': 5.259696990833618e-05}, {'label': 'muslim', 'score': 4.745226033264771e-05}, {'label': 'identity_attack', 'score': 3.541662226780318e-05}, {'label': 'black', 'score': 3.5083121474599466e-05}, {'label': 'christian', 'score': 3.272023604949936e-05}, {'label': 'sexual_explicit', 'score': 3.164245936204679e-05}, {'label': 'jewish', 'score': 1.5377421732409857e-05}, {'label': 'homosexual_gay_or_lesbian', 'score': 1.5361225450760685e-05}, {'label': 'severe_toxicity', 'score': 1.3027844261159771e-06}]]
2024-12-03 17:43:46 [warning  ] Prompt is too big. Splitting into chunks chunks=["\n    If you're looking for random paragraphs, you've come to the right place. When a random word or a random sentence isn't quite enough, the next logical step is to find a random paragraph. We created the Random Paragraph Generator with you in mind. The process is quite simple. Choose the number of random paragraphs you'd like to see and click the button. Your chosen number of paragraphs will instantly appear.\n\nWhile it may not be obvious to everyone, there are a number of reasons creating random paragraphs can be useful. A few examples of how some people use this generator are listed in the following paragraphs.\n\nCreative Writing\nGenerating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.\n\nTackle Writers' Block\nA random paragraph can also be an excellent way for a writer to tackle writers' block. Writing block can often happen due to being stuck with a current project that the writer is trying to complete. By inserting a completely random paragraph from which to begin, it can take down some of the issues that may have been causing the writers' block in the first place.\n\nBeginning Writing Routine\nAnother productive way to use this tool to begin a daily writing routine. One way is to generate a random paragraph with the intention to try to rewrite it while still keeping the original meaning. The purpose here is to just get the writing started so that when the writer goes onto their day's writing", " projects, words are already flowing from their fingers.\n\nWriting Challenge\nAnother writing challenge can be to take the individual sentences in the random paragraph and incorporate a single sentence from that into a new paragraph to create a short story. Unlike the random sentence generator, the sentences from the random paragraph will have some connection to one another so it will be a bit different. You also won't know exactly how many sentences will appear in the random paragraph.\n\nProgrammers\nIt's not only writers who can benefit from this free online tool. If you're a programmer who's working on a project where blocks of text are needed, this tool can be a great way to get that. It's a good way to test your programming and that the tool being created is working well.\n\nAbove are a few examples of how the random paragraph generator can be beneficial. The best way to see if this random paragraph picker will be useful for your intended purposes is to give it a try. Generate a number of paragraphs to see if they are beneficial to your current project.\n\nIf you do find this paragraph tool useful, please do us a favor and let us know how you're using it. It's greatly beneficial for us to know the different ways this tool is being used so we can improve it with updates. This is especially true since there are times when the generators we create get used in completely unanticipated ways from when we initially created them. If you have the time, please send us a quick note on what you'd like to see changed or added to make it better in the future.\n\nFrequently Asked Questions\n\nCan I use these random paragraphs for my project?\n\nYes! All of the random paragraphs in our generator are free to use for your projects.\n\nDoes a computer generate these paragraphs?\n\nNo! All of the paragraphs in the generator are written by humans, not computers. When first building this generator we thought about using computers to generate the paragraphs, but they weren't very good and many times didn't make any sense at all", ". We therefore took the time to create paragraphs specifically for this generator to make it the best that we could.\n\nCan I contribute random paragraphs?\n\nYes. We're always interested in improving this generator and one of the best ways to do that is to add new and interesting paragraphs to the generator. If you'd like to contribute some random paragraphs, please contact us.\n\nHow many words are there in a paragraph?\n\nThere are usually about 200 words in a paragraph, but this can vary widely. Most paragraphs focus on a single idea that's expressed with an introductory sentence, then followed by two or more supporting sentences about the idea. A short paragraph may not reach even 50 words while long paragraphs can be over 400 words long, but generally speaking they tend to be approximately 200 words in length.\n    "] num_tokens=961

In [ ]:

Copied!

print(str(response))
print(str(response))

I'm sorry, but I can't help with that.

In [ ]:

Copied!

print(str(response.metadata))
print(str(response.metadata))

{'guardrail': 'Input Scanner', 'triggered_scanners': [{'guardrail_type': 'Token limit', 'activated': True, 'guardrail_detail': {'guard_output': "\n    If you're looking for random paragraphs, you've come to the right place. When a random word or a random sentence isn't quite enough, the next logical step is to find a random paragraph. We created the Random Paragraph Generator with you in mind. The process is quite simple. Choose the number of random paragraphs you'd like to see and click the button. Your chosen number of paragraphs will instantly appear.\n\nWhile it may not be obvious to everyone, there are a number of reasons creating random paragraphs can be useful. A few examples of how some people use this generator are listed in the following paragraphs.\n\nCreative Writing\nGenerating random paragraphs can be an excellent way for writers to get their creative flow going at the beginning of the day. The writer has no idea what topic the random paragraph will be about when it appears. This forces the writer to use creativity to complete one of three common writing challenges. The writer can use the paragraph as the first one of a short story and build upon it. A second option is to use the random paragraph somewhere in a short story they create. The third option is to have the random paragraph be the ending paragraph in a short story. No matter which of these challenges is undertaken, the writer is forced to use creativity to incorporate the paragraph into their writing.\n\nTackle Writers' Block\nA random paragraph can also be an excellent way for a writer to tackle writers' block. Writing block can often happen due to being stuck with a current project that the writer is trying to complete. By inserting a completely random paragraph from which to begin, it can take down some of the issues that may have been causing the writers' block in the first place.\n\nBeginning Writing Routine\nAnother productive way to use this tool to begin a daily writing routine. One way is to generate a random paragraph with the intention to try to rewrite it while still keeping the original meaning. The purpose here is to just get the writing started so that when the writer goes onto their day's writing", 'is_valid': False, 'risk_score/threshold': '1.0/400', 'response_text': "This text describes a random paragraph generator and its various uses.  Here's a summary broken down by section:\n\n**Introduction:** The text introduces a random paragraph generator, highlighting its simplicity and ease of use.\n\n**Uses of the Generator:**  The core of the text details how the generator can be beneficial in several contexts:\n\n* **Creative Writing:**  It aids writers in overcoming writer's block, sparking creativity, and providing starting points or endings for short stories.  Three specific challenges are suggested: using the paragraph as the beginning, middle, or end of a story.\n\n* **Tackling Writer's Block:** The random paragraph acts as a disruption to overcome creative stagnation.\n\n* **Beginning a Writing Routine:** It helps initiate the writing process by providing a text to rewrite or use as inspiration.\n\n* **Writing Challenges:** The paragraph's sentences can be individually incorporated into new writing projects.\n\n* **Programmers:** The generator provides useful blocks of text for testing purposes in software development.\n\n\n**Call to Action & Feedback:** The authors encourage users to try the generator, provide feedback, and contribute to its improvement by suggesting additions or changes.\n\n**Frequently Asked Questions (FAQ):**  The FAQ section addresses common questions about the generator, clarifying:\n\n* **Usage rights:**  The paragraphs are free to use.\n* **Paragraph generation:**  Human-written paragraphs are used, not computer-generated ones.\n* **Contribution:** Users can contribute their own paragraphs.\n* **Paragraph length:** Paragraphs are roughly 200 words but can vary significantly.\n\n\nIn essence, the text is a well-structured promotional piece for a random paragraph generator, emphasizing its versatility and usefulness for both writers and programmers, while encouraging user engagement and participation.\n"}}], 'response_status': 'blocked'}

Multimodal rag guardrail gemini llmguard llmguard

基于LLM-GUARD防护机制的多模态RAG流程¶

注意：¶

扩展：¶

流程概览¶

核心特性：¶

为何需要多模态RAG防护机制？¶

架构概述¶

1. 输入扫描器¶

2. 定制查询引擎¶

3. 多模态大语言模型¶

防护机制工作流¶

输入验证¶

检索阶段¶

输出验证¶

带防护机制的多模态RAG优势¶

安装¶

配置可观测性¶

加载数据¶

安装依赖¶

模型设置¶

使用 LlamaParse 解析文本与图像¶

构建多模态索引¶

获取文本节点¶

构建索引¶

构建安全护栏¶

添加 Guardrail 扫描器¶

InputScanner - OutputScanner 函数¶

参数：¶

返回值：¶

关键步骤：¶

自定义多模态查询引擎¶

核心特性：¶

工作原理：¶

输入输出扫描器配置¶

尝试查询¶

`InputScanner` - `OutputScanner` 函数¶