句子嵌入优化器¶

该后处理器通过移除与查询无关的句子来优化令牌使用（这是通过嵌入技术实现的）。百分位截断值用于保留相关性最高的前百分之几的句子。也可以指定阈值截断，该方法使用原始相似度阈值来决定保留哪些句子。

In [ ]:

Copied!

%pip install llama-index-readers-wikipedia
%pip install llama-index-readers-wikipedia

In [ ]:

Copied!

# My OpenAI Key
import os

os.environ["OPENAI_API_KEY"] = "INSERT OPENAI KEY"
# My OpenAI Key
import os

os.environ["OPENAI_API_KEY"] = "INSERT OPENAI KEY"

安装配置¶

如果您在 Colab 上打开此 Notebook，可能需要安装 LlamaIndex 🦙。

In [ ]:

Copied!

!pip install llama-index
!pip install llama-index

In [ ]:

Copied!

from llama_index.core import download_loader

from llama_index.readers.wikipedia import WikipediaReader

loader = WikipediaReader()
documents = loader.load_data(pages=["Berlin"])
from llama_index.core import download_loader

from llama_index.readers.wikipedia import WikipediaReader

loader = WikipediaReader()
documents = loader.load_data(pages=["Berlin"])

In [ ]:

Copied!

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)

<class 'llama_index.readers.schema.base.Document'>

INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 18390 tokens

比较查询在以下方面的优化前后差异：LLM令牌使用量、查询时的嵌入模型使用量、优化器的嵌入模型使用量以及总耗时。

In [ ]:

Copied!





import time
from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import SentenceEmbeddingOptimizer

print("Without optimization")
start_time = time.time()
query_engine = index.as_query_engine()
res = query_engine.query("What is the population of Berlin?")
end_time = time.time()
print("Total time elapsed: {}".format(end_time - start_time))
print("Answer: {}".format(res))

print("With optimization")
start_time = time.time()
query_engine = index.as_query_engine(
    node_postprocessors=[SentenceEmbeddingOptimizer(percentile_cutoff=0.5)]
)
res = query_engine.query("What is the population of Berlin?")
end_time = time.time()
print("Total time elapsed: {}".format(end_time - start_time))
print("Answer: {}".format(res))

print("Alternate optimization cutoff")
start_time = time.time()
query_engine = index.as_query_engine(
    node_postprocessors=[SentenceEmbeddingOptimizer(threshold_cutoff=0.7)]
)
res = query_engine.query("What is the population of Berlin?")
end_time = time.time()
print("Total time elapsed: {}".format(end_time - start_time))
print("Answer: {}".format(res))
import time
from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import SentenceEmbeddingOptimizer

print("Without optimization")
start_time = time.time()
query_engine = index.as_query_engine()
res = query_engine.query("What is the population of Berlin?")
end_time = time.time()
print("Total time elapsed: {}".format(end_time - start_time))
print("Answer: {}".format(res))

print("With optimization")
start_time = time.time()
query_engine = index.as_query_engine(
    node_postprocessors=[SentenceEmbeddingOptimizer(percentile_cutoff=0.5)]
)
res = query_engine.query("What is the population of Berlin?")
end_time = time.time()
print("Total time elapsed: {}".format(end_time - start_time))
print("Answer: {}".format(res))

print("Alternate optimization cutoff")
start_time = time.time()
query_engine = index.as_query_engine(
    node_postprocessors=[SentenceEmbeddingOptimizer(threshold_cutoff=0.7)]
)
res = query_engine.query("What is the population of Berlin?")
end_time = time.time()
print("Total time elapsed: {}".format(end_time - start_time))
print("Answer: {}".format(res))

Without optimization

INFO:root:> [query] Total LLM token usage: 3545 tokens
INFO:root:> [query] Total embedding token usage: 7 tokens

Total time elapsed: 2.8928110599517822
Answer: 
The population of Berlin in 1949 was approximately 2.2 million inhabitants. After the fall of the Berlin Wall in 1989, the population of Berlin increased to approximately 3.7 million inhabitants.

With optimization

INFO:root:> [optimize] Total embedding token usage: 7 tokens
INFO:root:> [query] Total LLM token usage: 1779 tokens
INFO:root:> [query] Total embedding token usage: 7 tokens

Total time elapsed: 2.346346139907837
Answer: 
The population of Berlin is around 4.5 million.
Alternate optimization cutoff

INFO:root:> [optimize] Total embedding token usage: 7 tokens
INFO:root:> [query] Total LLM token usage: 3215 tokens
INFO:root:> [query] Total embedding token usage: 7 tokens

Total time elapsed: 2.101111888885498
Answer: 
The population of Berlin is around 4.5 million.