嵌入相似度评估器¶

本笔记本展示 SemanticSimilarityEvaluator 的用法，该评估器通过语义相似度来衡量问答系统的质量。

具体而言，它会计算生成答案与参考答案的嵌入向量之间的相似度得分。

如果您在 Colab 上打开此 Notebook，可能需要安装 LlamaIndex 🦙。

In [ ]:

Copied!

!pip install llama-index
!pip install llama-index

In [ ]:

Copied!

from llama_index.core.evaluation import SemanticSimilarityEvaluator

evaluator = SemanticSimilarityEvaluator()
from llama_index.core.evaluation import SemanticSimilarityEvaluator

evaluator = SemanticSimilarityEvaluator()

In [ ]:

Copied!





# This evaluator only uses `response` and `reference`, passing in query does not influence the evaluation
# query = 'What is the color of the sky'

response = "The sky is typically blue"
reference = """The color of the sky can vary depending on several factors, including time of day, weather conditions, and location.

During the day, when the sun is in the sky, the sky often appears blue. 
This is because of a phenomenon called Rayleigh scattering, where molecules and particles in the Earth's atmosphere scatter sunlight in all directions, and blue light is scattered more than other colors because it travels as shorter, smaller waves. 
This is why we perceive the sky as blue on a clear day.
"""

result = await evaluator.aevaluate(
    response=response,
    reference=reference,
)
# This evaluator only uses `response` and `reference`, passing in query does not influence the evaluation
# query = 'What is the color of the sky'

response = "The sky is typically blue"
reference = """The color of the sky can vary depending on several factors, including time of day, weather conditions, and location.

During the day, when the sun is in the sky, the sky often appears blue. 
This is because of a phenomenon called Rayleigh scattering, where molecules and particles in the Earth's atmosphere scatter sunlight in all directions, and blue light is scattered more than other colors because it travels as shorter, smaller waves. 
This is why we perceive the sky as blue on a clear day.
"""

result = await evaluator.aevaluate(
    response=response,
    reference=reference,
)

In [ ]:

Copied!

print("Score: ", result.score)
print("Passing: ", result.passing)  # default similarity threshold is 0.8
print("Score: ", result.score)
print("Passing: ", result.passing)  # default similarity threshold is 0.8

Score:  0.874911773340899
Passing:  True

In [ ]:

Copied!





response = "Sorry, I do not have sufficient context to answer this question."
reference = """The color of the sky can vary depending on several factors, including time of day, weather conditions, and location.

During the day, when the sun is in the sky, the sky often appears blue. 
This is because of a phenomenon called Rayleigh scattering, where molecules and particles in the Earth's atmosphere scatter sunlight in all directions, and blue light is scattered more than other colors because it travels as shorter, smaller waves. 
This is why we perceive the sky as blue on a clear day.
"""

result = await evaluator.aevaluate(
    response=response,
    reference=reference,
)
response = "Sorry, I do not have sufficient context to answer this question."
reference = """The color of the sky can vary depending on several factors, including time of day, weather conditions, and location.

During the day, when the sun is in the sky, the sky often appears blue. 
This is because of a phenomenon called Rayleigh scattering, where molecules and particles in the Earth's atmosphere scatter sunlight in all directions, and blue light is scattered more than other colors because it travels as shorter, smaller waves. 
This is why we perceive the sky as blue on a clear day.
"""

result = await evaluator.aevaluate(
    response=response,
    reference=reference,
)

In [ ]:

Copied!

print("Score: ", result.score)
print("Passing: ", result.passing)  # default similarity threshold is 0.8
print("Score: ", result.score)
print("Passing: ", result.passing)  # default similarity threshold is 0.8

Score:  0.7221738929165528
Passing:  False

自定义¶

In [ ]:

Copied!





from llama_index.core.evaluation import SemanticSimilarityEvaluator
from llama_index.core.embeddings import SimilarityMode, resolve_embed_model

embed_model = resolve_embed_model("local")
evaluator = SemanticSimilarityEvaluator(
    embed_model=embed_model,
    similarity_mode=SimilarityMode.DEFAULT,
    similarity_threshold=0.6,
)
from llama_index.core.evaluation import SemanticSimilarityEvaluator
from llama_index.core.embeddings import SimilarityMode, resolve_embed_model

embed_model = resolve_embed_model("local")
evaluator = SemanticSimilarityEvaluator(
    embed_model=embed_model,
    similarity_mode=SimilarityMode.DEFAULT,
    similarity_threshold=0.6,
)

In [ ]:

Copied!





response = "The sky is yellow."
reference = "The sky is blue."

result = await evaluator.aevaluate(
    response=response,
    reference=reference,
)
response = "The sky is yellow."
reference = "The sky is blue."

result = await evaluator.aevaluate(
    response=response,
    reference=reference,
)

In [ ]:

Copied!

print("Score: ", result.score)
print("Passing: ", result.passing)
print("Score: ", result.score)
print("Passing: ", result.passing)

Score:  0.9178505509625874
Passing:  True

我们在此说明，高分并不代表答案总是正确的。

嵌入相似度主要捕捉的是"相关性"的概念。由于回答和参考文本都讨论了"天空"和颜色，它们在语义上是相似的。