Vectara

Models by this creator

hallucination_evaluation_model

vectara

The hallucination_evaluation_model is an open-source model created by Vectara to detect hallucinations in large language models (LLMs). It is particularly useful when building retrieval-augmented generation (RAG) applications, where an LLM summarizes a set of facts, but it can also be used in other contexts. The model is based on the SentenceTransformers Cross-Encoder class and is trained on datasets such as FEVER, Vitamin C, and PAWS to determine textual entailment and factual consistency.

Model inputs and outputs

Inputs

Text data, such as summaries or other outputs from large language models

Outputs

A probability score from 0 to 1, where 0 indicates a hallucination and 1 indicates a factually consistent output

Capabilities

The hallucination_evaluation_model assesses the factual consistency of text generated by large language models. This is particularly useful for applications like retrieval-augmented generation, where generated text must stay faithful to its source information. The model has been evaluated on several benchmarks, including the TRUE Dataset, SummaC, and the AnyScale Ranking Test for Hallucinations, achieving strong performance.

What can I use it for?

You can use the model to score the outputs of an LLM summarization or RAG pipeline and flag responses that drift from their source documents. If you are interested in learning more about RAG or experimenting with Vectara, you can sign up for a free Vectara account.

Things to try

One interesting experiment is to use the hallucination_evaluation_model to compare the factual consistency of outputs from different large language models. This could help identify models that are better at maintaining fidelity to source information, which is useful for a variety of applications.
Additionally, you could experiment with using the model in the context of a retrieval-augmented generation system, to see how it performs in that specific use case.
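Since the model is exposed through the SentenceTransformers Cross-Encoder interface, scoring (source, summary) pairs might look like the sketch below. The Hugging Face model id and the 0.5 decision threshold are assumptions, not confirmed by this page; check Vectara's model card for the exact checkpoint name.

```python
def classify(score: float, threshold: float = 0.5) -> str:
    """Map a factual-consistency score in [0, 1] to a label.

    Scores near 1 mean the text is supported by its source; scores
    near 0 suggest a hallucination. The 0.5 cutoff is an assumption;
    tune it on your own data.
    """
    return "consistent" if score >= threshold else "hallucination"


def score_pairs(pairs):
    """Score (source, summary) pairs with the Cross-Encoder model.

    Requires the sentence-transformers package and downloads the
    (assumed) Hugging Face checkpoint on first use.
    """
    from sentence_transformers import CrossEncoder

    model = CrossEncoder("vectara/hallucination_evaluation_model")
    return model.predict(pairs)  # one score in [0, 1] per pair
```

Usage might look like `labels = [classify(s) for s in score_pairs([(source_doc, llm_summary)])]`, letting you flag individual RAG responses that fall below the chosen threshold.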

Updated 5/19/2024