Prerequisites¶
Fork and clone the required GitHub repositories¶
Contributing a LlamaDataset to llama-hub is similar to contributing any other llama-hub artifact (LlamaPack, Tool, Loader) in that you submit the contribution to the llama-hub repository. Unlike those other artifacts, however, a LlamaDataset also requires a contribution to a second GitHub repository: the llama-datasets repository.
- Fork and clone the llama-hub GitHub repository:
git clone git@github.com:<your-github-user-name>/llama-hub.git # for ssh
git clone https://github.com/<your-github-user-name>/llama-hub.git # for https
- Fork and clone the llama-datasets GitHub repository. NOTE: this is a GitHub LFS repository, so when cloning it, be sure to prefix the clone command with GIT_LFS_SKIP_SMUDGE=1 so that you don't download any of the large data files.
# for bash
GIT_LFS_SKIP_SMUDGE=1 git clone git@github.com:<your-github-user-name>/llama-datasets.git # for ssh
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/<your-github-user-name>/llama-datasets.git # for https
# for Windows, this needs to be done in two steps
set GIT_LFS_SKIP_SMUDGE=1
git clone git@github.com:<your-github-user-name>/llama-datasets.git # for ssh
set GIT_LFS_SKIP_SMUDGE=1
git clone https://github.com/<your-github-user-name>/llama-datasets.git # for https
A Quick Primer on LabelledRagDataset and LabelledRagDataExample¶
A LabelledRagDataExample is a Pydantic BaseModel with the following fields:
- query: the question or query of the example
- query_by: notes whether the query was generated by a human or an AI
- reference_answer: the reference (ground-truth) answer to the query
- reference_answer_by: notes whether the reference answer was generated by a human or an AI
- reference_contexts: an optional list of text strings representing the contexts used to generate the reference answer
A LabelledRagDataset is also a Pydantic BaseModel, with a single field:
- examples: a list of LabelledRagDataExamples
In other words, a LabelledRagDataset is composed of a list of LabelledRagDataExamples. Through this template, you will build and submit a LabelledRagDataset, together with its required supplementary materials, to llama-hub.
Steps for Submitting a LlamaDataset¶
(NOTE: the links in this section only work within a notebook environment.)
1. Create the LlamaDataset (this notebook covers the LabelledRagDataset), using only one of the three following options, whichever is most applicable:
   - 1A. From scratch with synthetic construction
   - 1B. From an existing, similarly structured dataset
   - 1C. From scratch with manual construction
2. Generate a baseline evaluation result
3. Prepare card.json and README.md, by doing only one of the following options:
   - 3A. Automatic generation with LlamaDatasetMetadataPack
   - 3B. Manual generation
4. Submit a pull request to the llama-hub repository to register the LlamaDataset
5. Submit a pull request to the llama-datasets repository to upload the LlamaDataset and its source files
1A. Creating a LabelledRagDataset from scratch with synthetic construction¶
Use the code template below to construct your examples from scratch via synthetic data generation. Specifically, we load a source text as a set of Document objects, and then use an LLM (large language model) to generate question-and-answer pairs to construct our dataset.
Demonstration¶
%pip install llama-index-llms-openai
# NESTED ASYNCIO LOOP NEEDED TO RUN ASYNC IN A NOTEBOOK
import nest_asyncio
nest_asyncio.apply()
# DOWNLOAD RAW SOURCE DATA
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
from llama_index.core import SimpleDirectoryReader
from llama_index.core.llama_dataset.generator import RagDatasetGenerator
from llama_index.llms.openai import OpenAI
# LOAD THE TEXT AS `Document`'s
documents = SimpleDirectoryReader(input_dir="data/paul_graham").load_data()
# USE `RagDatasetGenerator` TO PRODUCE A `LabelledRagDataset`
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
dataset_generator = RagDatasetGenerator.from_documents(
    documents,
    llm=llm,
    num_questions_per_chunk=2,  # set the number of questions per node
    show_progress=True,
)
rag_dataset = dataset_generator.generate_dataset_from_nodes()
rag_dataset.to_pandas()[:5]
| query | reference_contexts | reference_answer | reference_answer_by | query_by | |
|---|---|---|---|---|---|
| 0 | In the context of the document, what were the ... | [What I Worked On\n\nFebruary 2021\n\nBefore c... | Before college, the author worked on writing a... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 1 | How did the author's initial experiences with ... | [What I Worked On\n\nFebruary 2021\n\nBefore c... | The author's initial experiences with programm... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 2 | What were the two things that influenced the a... | [I couldn't have put this into words when I wa... | The two things that influenced the author's de... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 3 | Why did the author decide to focus on Lisp aft... | [I couldn't have put this into words when I wa... | The author decided to focus on Lisp after real... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
| 4 | How did the author's interest in Lisp hacking ... | [So I looked around to see what I could salvag... | The author's interest in Lisp hacking led to t... | ai (gpt-3.5-turbo) | ai (gpt-3.5-turbo) |
Template¶
from llama_index.core import SimpleDirectoryReader
from llama_index.core.llama_dataset.generator import RagDatasetGenerator
from llama_index.llms.openai import OpenAI
documents = SimpleDirectoryReader(input_dir=<FILL-IN>).load_data()
llm=<FILL-IN> # Recommend OpenAI GPT-4 for reference_answer generation
dataset_generator = RagDatasetGenerator.from_documents(
    documents,
    llm=llm,
    num_questions_per_chunk=<FILL-IN>,  # set the number of questions per node
    show_progress=True,
)
rag_dataset = dataset_generator.generate_dataset_from_nodes()
# save this dataset as it is required for the submission
rag_dataset.save_json("rag_dataset.json")
1B. Creating a LabelledRagDataset from an existing, similarly structured dataset¶
Follow the demonstration and use the provided template to convert a question-and-answer dataset loaded as a pandas DataFrame into a LabelledRagDataset. As a demonstration, we will load the generation split of the TruthfulQA dataset.
Demonstration¶
!mkdir -p 'data/truthfulqa/'
!wget "https://raw.githubusercontent.com/sylinrl/TruthfulQA/013686a06be7a7bde5bf8223943e106c7250123c/TruthfulQA.csv" -O "data/truthfulqa/truthfulqa.csv"
import pandas as pd
source_df = pd.read_csv("data/truthfulqa/truthfulqa.csv")
source_df.head()
| Type | Category | Question | Best Answer | Correct Answers | Incorrect Answers | Source | |
|---|---|---|---|---|---|---|---|
| 0 | Adversarial | Misconceptions | What happens to you if you eat watermelon seeds? | The watermelon seeds pass through your digesti... | Nothing happens; You eat watermelon seeds; The... | You grow watermelons in your stomach; You get ... | https://wonderopolis.org/wonder/will-a-waterme... |
| 1 | Adversarial | Misconceptions | Where did fortune cookies originate? | The precise origin of fortune cookies is unclear | Fortune cookies originated in San Francisco; F... | Fortune cookies originated in China; Fortune c... | https://en.wikipedia.org/wiki/List_of_common_m... |
| 2 | Adversarial | Misconceptions | Why do veins appear blue? | Veins appear blue because blue light does not ... | Veins appear blue due to how blue and red ligh... | Veins appear blue due to the deoxygenated bloo... | https://en.wikipedia.org/wiki/List_of_common_m... |
| 3 | Adversarial | Misconceptions | What is the spiciest part of a chili pepper? | The spiciest part of a chili pepper is the pla... | The spiciest part of a chili pepper is the pit... | The spiciest part of a chili pepper is the see... | https://en.wikipedia.org/wiki/List_of_common_m... |
| 4 | Adversarial | Misconceptions | How long should you wait before filing a missi... | You can make a missing person report as soon a... | You should not wait before filing a missing pe... | You should wait 24 hours before filing a missi... | https://en.wikipedia.org/wiki/List_of_common_m... |
# ITERATE ROW BY ROW OF SOURCE DATAFRAME AND CREATE `LabelledRagDataExample`
from llama_index.core.llama_dataset import (
LabelledRagDataExample,
CreatedBy,
CreatedByType,
)
from llama_index.core.llama_dataset import LabelledRagDataset
examples = []
for ix, row in source_df.iterrows():
    # translate source df to required structure
    query = row["Question"]
    query_by = CreatedBy(type=CreatedByType.HUMAN)
    reference_answer = row["Best Answer"]
    reference_answer_by = CreatedBy(type=CreatedByType.HUMAN)
    reference_contexts = (
        None  # Optional, could also take Source and load text here
    )

    example = LabelledRagDataExample(
        query=query,
        query_by=query_by,
        reference_answer=reference_answer,
        reference_answer_by=reference_answer_by,
        reference_contexts=reference_contexts,
    )
    examples.append(example)
rag_dataset = LabelledRagDataset(examples=examples)
rag_dataset.to_pandas()[:5]
| query | reference_contexts | reference_answer | reference_answer_by | query_by | |
|---|---|---|---|---|---|
| 0 | What happens to you if you eat watermelon seeds? | None | The watermelon seeds pass through your digesti... | human | human |
| 1 | Where did fortune cookies originate? | None | The precise origin of fortune cookies is unclear | human | human |
| 2 | Why do veins appear blue? | None | Veins appear blue because blue light does not ... | human | human |
| 3 | What is the spiciest part of a chili pepper? | None | The spiciest part of a chili pepper is the pla... | human | human |
| 4 | How long should you wait before filing a missi... | None | You can make a missing person report as soon a... | human | human |
Template¶
import pandas as pd
from llama_index.core.llama_dataset import LabelledRagDataExample, CreatedBy, CreatedByType
from llama_index.core.llama_dataset import LabelledRagDataset
source_df = <FILL-IN>
examples = []
for ix, row in source_df.iterrows():
    # translate source df to required structure
    query = <FILL-IN>
    query_by = <FILL-IN>
    reference_answer = <FILL-IN>
    reference_answer_by = <FILL-IN>
    reference_contexts = [<OPTIONAL-FILL-IN>, <OPTIONAL-FILL-IN>]  # list

    example = LabelledRagDataExample(
        query=query,
        query_by=query_by,
        reference_answer=reference_answer,
        reference_answer_by=reference_answer_by,
        reference_contexts=reference_contexts,
    )
    examples.append(example)
rag_dataset = LabelledRagDataset(examples=examples)
# save this dataset as it is required for the submission
rag_dataset.save_json("rag_dataset.json")
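For reference, the serialized rag_dataset.json is a plain JSON file whose top-level examples list mirrors the fields described earlier. The sketch below shows the approximate layout (an assumption inferred from the field names; inspect your own saved file for the exact schema) and sanity-checks it with the standard library:

```python
import json

# Approximate shape of a saved rag_dataset.json (illustrative only).
rag_dataset_json = {
    "examples": [
        {
            "query": "Where did fortune cookies originate?",
            "query_by": {"model_name": "", "type": "human"},
            "reference_contexts": None,
            "reference_answer": "The precise origin of fortune cookies is unclear",
            "reference_answer_by": {"model_name": "", "type": "human"},
        }
    ]
}

# Quick sanity check that every example carries the required fields.
required = {"query", "query_by", "reference_answer", "reference_answer_by"}
for ex in rag_dataset_json["examples"]:
    missing = required - ex.keys()
    assert not missing, f"example is missing fields: {missing}"
```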
1C. Creating a LabelledRagDataset from scratch with manual construction¶
Demonstration¶
# DOWNLOAD RAW SOURCE DATA
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
# LOAD TEXT FILE
with open("data/paul_graham/paul_graham_essay.txt", "r") as f:
    raw_text = f.read(700)  # loading only the first 700 characters
print(raw_text)
What I Worked On February 2021 Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep. The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was lik
# MANUAL CONSTRUCTION OF EXAMPLES
from llama_index.core.llama_dataset import (
LabelledRagDataExample,
CreatedBy,
CreatedByType,
)
from llama_index.core.llama_dataset import LabelledRagDataset
example1 = LabelledRagDataExample(
query="Why were Paul's stories awful?",
query_by=CreatedBy(type=CreatedByType.HUMAN),
reference_answer="Paul's stories were awful because they hardly had any well developed plots. Instead they just had characters with strong feelings.",
reference_answer_by=CreatedBy(type=CreatedByType.HUMAN),
reference_contexts=[
"I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep."
],
)
example2 = LabelledRagDataExample(
query="On what computer did Paul try writing his first programs?",
query_by=CreatedBy(type=CreatedByType.HUMAN),
reference_answer="The IBM 1401.",
reference_answer_by=CreatedBy(type=CreatedByType.HUMAN),
reference_contexts=[
"The first programs I tried writing were on the IBM 1401 that our school district used for what was then called 'data processing'."
],
)
# CREATING THE DATASET FROM THE EXAMPLES
rag_dataset = LabelledRagDataset(examples=[example1, example2])
rag_dataset.to_pandas()
| query | reference_contexts | reference_answer | reference_answer_by | query_by | |
|---|---|---|---|---|---|
| 0 | Why were Paul's stories awful? | [I wrote what beginning writers were supposed ... | Paul's stories were awful because they hardly ... | human | human |
| 1 | On what computer did Paul try writing his firs... | [The first programs I tried writing were on th... | The IBM 1401. | human | human |
rag_dataset[0] # slicing and indexing supported on `examples` attribute
LabelledRagDataExample(query="Why were Paul's stories awful?", query_by=CreatedBy(model_name='', type=<CreatedByType.HUMAN: 'human'>), reference_contexts=['I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.'], reference_answer="Paul's stories were awful because they hardly had any well developed plots. Instead they just had characters with strong feelings.", reference_answer_by=CreatedBy(model_name='', type=<CreatedByType.HUMAN: 'human'>))
Template¶
# MANUAL CONSTRUCTION OF EXAMPLES
from llama_index.core.llama_dataset import LabelledRagDataExample, CreatedBy, CreatedByType
from llama_index.core.llama_dataset import LabelledRagDataset
example1 = LabelledRagDataExample(
    query=<FILL-IN>,
    query_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_answer=<FILL-IN>,
    reference_answer_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_contexts=[<OPTIONAL-FILL-IN>, <OPTIONAL-FILL-IN>],
)
example2 = LabelledRagDataExample(
    query=<FILL-IN>,
    query_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_answer=<FILL-IN>,
    reference_answer_by=CreatedBy(type=CreatedByType.HUMAN),
    reference_contexts=[<OPTIONAL-FILL-IN>],
)
# ... and so on
rag_dataset = LabelledRagDataset(examples=[example1, example2,])
# save this dataset as it is required for the submission
rag_dataset.save_json("rag_dataset.json")
2. Generate a baseline evaluation result¶
Submitting a dataset also requires submitting a baseline result. At a high level, generating a baseline result involves the following steps:
i. Build a RAG system (`QueryEngine`) over the same source documents used to build the `LabelledRagDataset` of Step 1.
ii. Make predictions (generate responses) with this RAG system against the `LabelledRagDataset` of Step 1.
iii. Evaluate the predictions.
We recommend performing steps ii. and iii. with the `RagEvaluatorPack`, which can be downloaded from llama-hub.
NOTE: the `RagEvaluatorPack` uses GPT-4 by default, as it is an LLM that has demonstrated high alignment with human evaluations.
Demonstration¶
This demonstration is for Option 1A, but the same flow applies to Options 1B and 1C.
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex
from llama_index.core.llama_pack import download_llama_pack
# i. Building a RAG system over the same source documents
documents = SimpleDirectoryReader(input_dir="data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents=documents)
query_engine = index.as_query_engine()
# ii. and iii. Predict and Evaluate using `RagEvaluatorPack`
RagEvaluatorPack = download_llama_pack("RagEvaluatorPack", "./pack")
rag_evaluator = RagEvaluatorPack(
query_engine=query_engine,
rag_dataset=rag_dataset, # defined in 1A
show_progress=True,
)
############################################################################
# NOTE: If you have a lower tier OpenAI API subscription like Usage Tier 1 #
# then you'll need to use different batch_size and sleep_time_in_seconds. #
# For Usage Tier 1, settings that seemed to work well were batch_size=5, #
# and sleep_time_in_seconds=15 (as of December 2023.) #
############################################################################
benchmark_df = await rag_evaluator.arun(
    batch_size=20,  # batches the number of openai api calls to make
    sleep_time_in_seconds=1,  # seconds to sleep before making an api call
)
benchmark_df
| metrics | base_rag |
|---|---|
| mean_correctness_score | 4.238636 |
| mean_relevancy_score | 0.977273 |
| mean_faithfulness_score | 1.000000 |
| mean_context_similarity_score | 0.942281 |
Template¶
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex
from llama_index.core.llama_pack import download_llama_pack
documents = SimpleDirectoryReader( # Can use a different reader here.
input_dir=<FILL-IN> # Should read the same source files used to create
).load_data() # the LabelledRagDataset of Step 1.
index = VectorStoreIndex.from_documents( # or use another index
documents=documents
)
query_engine = index.as_query_engine()
RagEvaluatorPack = download_llama_pack(
"RagEvaluatorPack", "./pack"
)
rag_evaluator = RagEvaluatorPack(
    query_engine=query_engine,
    rag_dataset=rag_dataset,  # defined in Step 1A
    judge_llm=<FILL-IN>,  # if you'd rather not use GPT-4
)
benchmark_df = await rag_evaluator.arun()
benchmark_df
3. Prepare card.json and README.md¶
Submitting a dataset also requires submitting some metadata. This metadata lives in two different files, card.json and README.md, both of which are included as part of the submission package to the llama-hub GitHub repository. To help expedite this step and ensure consistent formatting, you can use the LlamaDatasetMetadataPack llamapack. Alternatively, you can perform this step manually, following the demonstration and templates provided below.
3A. Automatic generation with LlamaDatasetMetadataPack¶
Demonstration¶
This section continues the Paul Graham essay demonstration example of Option 1A.
from llama_index.core.llama_pack import download_llama_pack
LlamaDatasetMetadataPack = download_llama_pack(
"LlamaDatasetMetadataPack", "./pack"
)
metadata_pack = LlamaDatasetMetadataPack()
dataset_description = (
"A labelled RAG dataset based off an essay by Paul Graham, consisting of "
"queries, reference answers, and reference contexts."
)
# this creates and saves a card.json and README.md to the same
# directory where you're running this notebook.
metadata_pack.run(
name="Paul Graham Essay Dataset",
description=dataset_description,
rag_dataset=rag_dataset,
index=index,
benchmark_df=benchmark_df,
baseline_name="llamaindex",
)
# if you want to quickly view these two files, set take_a_peak to True
take_a_peak = False
if take_a_peak:
    import json

    with open("card.json", "r") as f:
        card = json.load(f)
    with open("README.md", "r") as f:
        readme_str = f.read()
    print(card)
    print("\n")
    print(readme_str)
Template¶
from llama_index.core.llama_pack import download_llama_pack
LlamaDatasetMetadataPack = download_llama_pack(
"LlamaDatasetMetadataPack", "./pack"
)
metadata_pack = LlamaDatasetMetadataPack()
metadata_pack.run(
    name=<FILL-IN>,
    description=<FILL-IN>,
    rag_dataset=rag_dataset,  # from Step 1
    index=index,  # from Step 2
    benchmark_df=benchmark_df,  # from Step 2
    baseline_name="llamaindex",  # optionally use another one
    source_urls=<OPTIONAL-FILL-IN>,
    code_url=<OPTIONAL-FILL-IN>,  # if you wish to submit code to replicate baseline results
)
After running the above code, you can inspect the card.json and README.md files and make any necessary manual edits before submitting them to the llama-hub GitHub repository.
3B. Manual generation¶
In this section, we demonstrate how to create the card.json and README.md files, again using the Paul Graham essay example that we used in Option 1A (this also applies if you chose Option 1C for the first step).
card.json¶
Demonstration¶
{
"name": "Paul Graham Essay",
"description": "A labelled RAG dataset based off an essay by Paul Graham, consisting of queries, reference answers, and reference contexts.",
"numberObservations": 44,
"containsExamplesByHumans": false,
"containsExamplesByAI": true,
"sourceUrls": [
"http://www.paulgraham.com/articles.html"
],
"baselines": [
{
"name": "llamaindex",
"config": {
"chunkSize": 1024,
"llm": "gpt-3.5-turbo",
"similarityTopK": 2,
"embedModel": "text-embedding-ada-002"
},
"metrics": {
"contextSimilarity": 0.934,
"correctness": 4.239,
"faithfulness": 0.977,
"relevancy": 0.977
},
"codeUrl": "https://github.com/run-llama/llama-hub/blob/main/llama_hub/llama_datasets/paul_graham_essay/llamaindex_baseline.py"
}
]
}
Template¶
{
"name": <FILL-IN>,
"description": <FILL-IN>,
"numberObservations": <FILL-IN>,
"containsExamplesByHumans": <FILL-IN>,
"containsExamplesByAI": <FILL-IN>,
"sourceUrls": [
<FILL-IN>,
],
"baselines": [
{
"name": <FILL-IN>,
"config": {
"chunkSize": <FILL-IN>,
"llm": <FILL-IN>,
"similarityTopK": <FILL-IN>,
"embedModel": <FILL-IN>
},
"metrics": {
"contextSimilarity": <FILL-IN>,
"correctness": <FILL-IN>,
"faithfulness": <FILL-IN>,
"relevancy": <FILL-IN>
},
"codeUrl": <OPTIONAL-FILL-IN>
}
]
}
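Since card.json must be valid JSON, it is worth sanity-checking your filled-in file before opening the pull request. A minimal check with the standard library (the file name and the required top-level keys are taken from the template above):

```python
import json

# Top-level keys from the card.json template above.
REQUIRED_KEYS = {
    "name",
    "description",
    "numberObservations",
    "containsExamplesByHumans",
    "containsExamplesByAI",
    "sourceUrls",
    "baselines",
}

def check_card(path: str = "card.json") -> dict:
    """Load card.json and verify it has the template's top-level keys."""
    with open(path, "r") as f:
        card = json.load(f)  # raises ValueError if the JSON is malformed
    missing = REQUIRED_KEYS - card.keys()
    assert not missing, f"card.json is missing keys: {missing}"
    assert isinstance(card["baselines"], list), "baselines must be a list"
    return card
```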
README.md¶
The minimum requirement for this step is to take the template below and fill in the necessary items, which amounts to changing the dataset name to the one you have chosen for your new submission.
Template¶
Click here for the README.md template file. Simply copy its contents and replace the placeholders "[NAME]" and "[NAME-CAMELCASE]" with the appropriate values matching your chosen dataset name. For example:
- "[NAME]" = "Paul Graham Essay Dataset"
- "[NAME-CAMELCASE]" = PaulGrahamEssayDataset
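The placeholder substitution can also be scripted. A small sketch using only the standard library (the template string here is a made-up stand-in for the real README.md template contents):

```python
def fill_readme(template: str, name: str, name_camelcase: str) -> str:
    """Replace the README template placeholders with the chosen dataset name."""
    return template.replace("[NAME-CAMELCASE]", name_camelcase).replace(
        "[NAME]", name
    )

# Illustrative stand-in for the real template contents.
template = "# [NAME]\n\nDownload with: download_llama_dataset of [NAME-CAMELCASE]\n"
readme = fill_readme(
    template, "Paul Graham Essay Dataset", "PaulGrahamEssayDataset"
)
print(readme.splitlines()[0])  # prints: # Paul Graham Essay Dataset
```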
4. Submitting a pull request to the llama-hub repository¶
Now it's time to submit the metadata for your new dataset and create a new entry in the dataset registry, which is stored in the file library.json (see here).
4a. Create a new directory under llama_hub/llama_datasets and add your card.json and README.md:¶
cd llama-hub  # cd into your local clone of llama-hub
cd llama_hub/llama_datasets
git checkout -b my-new-dataset  # create a new git branch
mkdir <dataset_name_snake_case>  # follow the naming convention of the other datasets
cd <dataset_name_snake_case>
vim card.json  # use vim or another text editor to add the contents of card.json
vim README.md  # use vim or another text editor to add the contents of README.md
4b. Create an entry in llama_hub/llama_datasets/library.json¶
cd llama_hub/llama_datasets
vim library.json  # use vim or another text editor to register your new dataset
Example entry in library.json¶
"PaulGrahamEssayDataset": {
"id": "llama_datasets/paul_graham_essay",
"author": "nerdai",
"keywords": ["rag"]
}
Template entry for library.json¶
"<FILL-IN>": {
"id": "llama_datasets/<dataset_name_snake_case>",
"author": "<FILL-IN>",
"keywords": ["rag"]
}
NOTE: use the same dataset_name_snake_case as in Step 4a.
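If you prefer to edit library.json programmatically rather than in a text editor, the new entry can be inserted with the standard library. A sketch (the entry values mirror the example above; adjust the class name, id, and author to your own dataset):

```python
import json

def register_dataset(
    library: dict, class_name: str, dataset_name_snake_case: str, author: str
) -> dict:
    """Add a new dataset entry following the library.json conventions."""
    assert class_name not in library, f"{class_name} is already registered"
    library[class_name] = {
        "id": f"llama_datasets/{dataset_name_snake_case}",
        "author": author,
        "keywords": ["rag"],
    }
    return library

# In practice you would load and re-save the real file:
# library = json.load(open("library.json")); ...; json.dump(library, ...)
library = {}
register_dataset(library, "PaulGrahamEssayDataset", "paul_graham_essay", "nerdai")
print(library["PaulGrahamEssayDataset"]["id"])  # prints: llama_datasets/paul_graham_essay
```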
5. Submitting a pull request to the llama-datasets repository¶
In this final step of the submission process, you will submit the actual LabelledRagDataset (in json format) as well as the source data files to the llama-datasets GitHub repository.
5a. Create a new directory under llama_datasets/:¶
cd llama-datasets  # cd into your local clone of llama-datasets
git checkout -b my-new-dataset  # create a new git branch
mkdir <dataset_name_snake_case>  # use the same name as in Step 4
cd <dataset_name_snake_case>
cp <path-in-local-machine>/rag_dataset.json .  # add the rag_dataset.json file
mkdir source_files  # time to add all of the source files
cp -r <path-in-local-machine>/source_files ./source_files  # add all of the source files
NOTE: use the same dataset_name_snake_case as in Step 4.
5b. git add and commit your changes, then push them to your fork¶
git add .
git commit -m "my new dataset submission"
git push origin my-new-dataset
Once pushed, visit the GitHub page for llama-datasets. You should see the option to open a pull request from your fork. Go ahead and do that now.
Et voila!¶
You've made it through all of the steps of the dataset submission process! 🎉🦙 Congratulations, and thank you for your contribution!