Prometheus-eval

Models by this creator


prometheus-13b-v1.0


prometheus-13b-v1.0 is an alternative to GPT-4 for fine-grained evaluation of language models. Developed by prometheus-eval, it uses Llama-2-Chat as its base model, fine-tuned on 100K feedback samples from the Feedback Collection dataset. This specialized fine-tuning allows prometheus-13b-v1.0 to outperform GPT-3.5-Turbo and Llama-2-Chat 70B, and to perform on par with GPT-4, across a range of evaluation benchmarks. Unlike GPT-4, prometheus-13b-v1.0 is an affordable, customizable evaluator that can be tuned to assess language models against specific criteria such as child readability, cultural sensitivity, or creativity.

Model inputs and outputs

Inputs

- Instruction: The task or prompt to be evaluated
- Response: The text response to be evaluated
- Reference answer: A reference answer that would receive a score of 5
- Score rubric: A set of criteria and descriptions for scoring the response on a scale of 1 to 5

Outputs

- Feedback: A detailed assessment of the response quality based on the provided score rubric
- Score: An integer between 1 and 5 indicating the quality of the response, as per the score rubric

A hedged inference sketch and an output-parsing helper are shown after "Things to try" below.

Capabilities

prometheus-13b-v1.0 excels at fine-grained evaluation of language model outputs. It can provide detailed feedback and scoring for responses across a wide range of criteria, making it a powerful tool for model developers and researchers assessing the performance of their language models. Its fine-tuning on human-written feedback also helps it pick up on the emotional context of user inputs, which supports empathetic and nuanced evaluations.

What can I use it for?

prometheus-13b-v1.0 can be used as a cost-effective alternative to GPT-4 for evaluating the performance of language models. It is particularly well suited to assessing models against customized criteria, such as child readability, cultural sensitivity, or creativity. The model can also serve as a reward model for Reinforcement Learning from Human Feedback (RLHF), helping to fine-tune language models to align with human preferences and values.

Things to try

One interesting use case for prometheus-13b-v1.0 is providing detailed feedback on the outputs of large language models, identifying areas for improvement and guiding further development. Researchers and developers could evaluate their models on a wide range of benchmarks and tasks, then use the detailed feedback to inform their fine-tuning and training processes. The model could also be used to assess the safety and appropriateness of language model outputs, helping ensure they align with ethical guidelines and promote positive behavior.
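To make the input/output contract above concrete, here is a minimal inference sketch using Hugging Face transformers. The hub ID prometheus-eval/prometheus-13b-v1.0 and the prompt template are assumptions for illustration only; check the official model card for the canonical hub ID and the exact evaluation prompt format.

```python
# Minimal sketch: running prometheus-13b-v1.0 as an evaluator with
# Hugging Face transformers. The hub ID and prompt template below are
# assumptions -- consult the official model card for the real ones.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "prometheus-eval/prometheus-13b-v1.0"  # hypothetical hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# The four inputs described above: instruction, response, reference answer,
# and a 1-5 score rubric. The surrounding template is a simplified stand-in.
prompt = """###Task Description:
Given an instruction, a response, a reference answer that would score 5,
and a score rubric, write detailed feedback and end with a line of the
form "[RESULT] <an integer between 1 and 5>".

###The instruction to evaluate:
Explain photosynthesis to a 7-year-old.

###Response to evaluate:
Plants eat sunlight and air to make their own food.

###Reference Answer (Score 5):
Plants use sunlight, water, and air to make their own food. Their green
leaves work like tiny kitchens powered by the sun.

###Score Rubrics:
[Is the explanation accurate and readable for a young child?]
Score 1: Inaccurate and confusing. Score 5: Accurate, simple, engaging.

###Feedback:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens (the feedback and score).
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:],
                       skip_special_tokens=True))
```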
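The generation interleaves the two outputs described above: free-text feedback followed by a final integer score. A small helper (hypothetical, not part of any official package) can split them, assuming the score is emitted as a trailing "[RESULT] <n>" marker as requested in the prompt above. The parsed score could then feed a reward signal for the RLHF use case mentioned under "What can I use it for?".

```python
# Hypothetical helper: split an evaluator generation into (feedback, score).
# Assumes the generation ends with a "[RESULT] <1-5>" marker.
import re

def parse_evaluation(generation: str) -> tuple[str, int | None]:
    """Return (feedback, score); score is None if no marker is found."""
    match = re.search(r"\[RESULT\]\s*([1-5])", generation)
    if match is None:
        return generation.strip(), None
    feedback = generation[:match.start()].strip()
    return feedback, int(match.group(1))

feedback, score = parse_evaluation(
    "The response is simple but omits water. [RESULT] 4"
)
print(score)  # 4
```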


Updated 5/21/2024