LLM Pydantic 程序¶

本指南将向您展示如何使用我们的 LLMTextCompletionProgram 生成结构化数据。给定一个大型语言模型（LLM）和输出用的 Pydantic 类，即可生成结构化的 Pydantic 对象。

对于目标对象，您可以选择直接指定 output_cls，或指定一个能生成 Pydantic 对象的 PydanticOutputParser 或其他任何 BaseOutputParser。

在以下示例中，我们将展示多种提取到 Album 对象（该对象可包含 Song 对象列表）的方法。

提取到 `Album` 类中¶

这是一个将输出解析为 Album 模式的简单示例，该模式可以包含多首歌曲。

只需在初始化 LLMTextCompletionProgram 时，将 Album 传递给 output_cls 属性即可。

如果您在 Colab 上打开此 Notebook，可能需要安装 LlamaIndex 🦙。

In [ ]:

Copied!

!pip install llama-index
!pip install llama-index

In [ ]:

Copied!

from pydantic import BaseModel
from typing import List

from llama_index.core.program import LLMTextCompletionProgram
from pydantic import BaseModel
from typing import List

from llama_index.core.program import LLMTextCompletionProgram

定义输出架构

In [ ]:

Copied!





class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]
class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]

定义 LLM pydantic 程序

In [ ]:

Copied!

from llama_index.core.program import LLMTextCompletionProgram
from llama_index.core.program import LLMTextCompletionProgram

In [ ]:

Copied!





prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
)
prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
)

运行程序以获取结构化输出。

In [ ]:

Copied!

output = program(movie_name="The Shining")
output = program(movie_name="The Shining")

输出是一个有效的 Pydantic 对象，我们可以用它来调用函数/API。

In [ ]:

Copied!

output
output

Out[ ]:

Album(name='The Overlook', artist='Jack Torrance', songs=[Song(title='Redrum', length_seconds=240), Song(title="Here's Johnny", length_seconds=180), Song(title='Room 237', length_seconds=300), Song(title='All Work and No Play', length_seconds=210), Song(title='The Maze', length_seconds=270)])

使用 Pydantic 输出解析器初始化¶

上述操作等同于定义一个 Pydantic 输出解析器，然后直接传入该解析器而非 output_cls。

In [ ]:

Copied!





from llama_index.core.output_parsers import PydanticOutputParser

program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(output_cls=Album),
    prompt_template_str=prompt_template_str,
    verbose=True,
)
from llama_index.core.output_parsers import PydanticOutputParser

program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(output_cls=Album),
    prompt_template_str=prompt_template_str,
    verbose=True,
)

In [ ]:

Copied!

output = program(movie_name="Lord of the Rings")
output
output = program(movie_name="Lord of the Rings")
output

Out[ ]:

Album(name='The Fellowship of the Ring', artist='Middle-earth Ensemble', songs=[Song(title='The Shire', length_seconds=240), Song(title='Concerning Hobbits', length_seconds=180), Song(title='The Ring Goes South', length_seconds=300), Song(title='A Knife in the Dark', length_seconds=270), Song(title='Flight to the Ford', length_seconds=210), Song(title='Many Meetings', length_seconds=240), Song(title='The Council of Elrond', length_seconds=330), Song(title='The Great Eye', length_seconds=180), Song(title='The Breaking of the Fellowship', length_seconds=360)])

定义自定义输出解析器¶

有时您可能需要以自定义方式将输出解析为 JSON 对象。

In [ ]:

Copied!





from llama_index.core.output_parsers import ChainableOutputParser


class CustomAlbumOutputParser(ChainableOutputParser):
    """Custom Album output parser.

    Assume first line is name and artist.

    Assume each subsequent line is the song.

    """

    def __init__(self, verbose: bool = False):
        self.verbose = verbose

    def parse(self, output: str) -> Album:
        """Parse output."""
        if self.verbose:
            print(f"> Raw output: {output}")
        lines = output.split("\n")
        name, artist = lines[0].split(",")
        songs = []
        for i in range(1, len(lines)):
            title, length_seconds = lines[i].split(",")
            songs.append(Song(title=title, length_seconds=length_seconds))

        return Album(name=name, artist=artist, songs=songs)
from llama_index.core.output_parsers import ChainableOutputParser


class CustomAlbumOutputParser(ChainableOutputParser):
    """Custom Album output parser.

    Assume first line is name and artist.

    Assume each subsequent line is the song.

    """

    def __init__(self, verbose: bool = False):
        self.verbose = verbose

    def parse(self, output: str) -> Album:
        """Parse output."""
        if self.verbose:
            print(f"> Raw output: {output}")
        lines = output.split("\n")
        name, artist = lines[0].split(",")
        songs = []
        for i in range(1, len(lines)):
            title, length_seconds = lines[i].split(",")
            songs.append(Song(title=title, length_seconds=length_seconds))

        return Album(name=name, artist=artist, songs=songs)

In [ ]:

Copied!





prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\

Return answer in following format.
The first line is:
<album_name>, <album_artist>
Every subsequent line is a song with format:
<song_title>, <song_length_seconds>

"""
program = LLMTextCompletionProgram.from_defaults(
    output_parser=CustomAlbumOutputParser(verbose=True),
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
)
prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\

Return answer in following format.
The first line is:
, 
Every subsequent line is a song with format:
, 

"""
program = LLMTextCompletionProgram.from_defaults(
    output_parser=CustomAlbumOutputParser(verbose=True),
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=True,
)

In [ ]:

Copied!

output = program(movie_name="The Dark Knight")
output = program(movie_name="The Dark Knight")

> Raw output: Gotham's Reckoning, The Dark Knight
A Dark Knight Rises, 240
The Joker's Symphony, 180
Harvey Dent's Lament, 210
Gotham's Guardian, 195
The Batmobile Chase, 225
The Dark Knight's Theme, 150
The Joker's Mind Games, 180
Rachel's Tragedy, 210
Gotham's Last Stand, 240
The Dark Knight's Triumph, 180

In [ ]:

Copied!

output
output

Out[ ]:

Album(name="Gotham's Reckoning", artist=' The Dark Knight', songs=[Song(title='A Dark Knight Rises', length_seconds=240), Song(title="The Joker's Symphony", length_seconds=180), Song(title="Harvey Dent's Lament", length_seconds=210), Song(title="Gotham's Guardian", length_seconds=195), Song(title='The Batmobile Chase', length_seconds=225), Song(title="The Dark Knight's Theme", length_seconds=150), Song(title="The Joker's Mind Games", length_seconds=180), Song(title="Rachel's Tragedy", length_seconds=210), Song(title="Gotham's Last Stand", length_seconds=240), Song(title="The Dark Knight's Triumph", length_seconds=180)])

LLM Pydantic 程序¶

提取到 Album 类中¶

使用 Pydantic 输出解析器初始化¶

定义自定义输出解析器¶

提取到 `Album` 类中¶