We present H2O-Danube, a series of small 1.8B language models consisting of H2O-Danube-1.8B, trained on 1T tokens, and the incrementally improved H2O-Danube2-1.8B, trained on an additional 2T tokens. Our models exhibit highly competitive metrics across a multitude of benchmarks and, as of the time of this writing, H2O-Danube2-1.8B achieves the top ranking on the Open LLM Leaderboard among all models below the 2B parameter range. The models follow core principles of Llama 2 and Mistral, and we leverage and refine various techniques for pre-training large language models. We additionally release chat models trained with supervised fine-tuning followed by direct preference optimization. We make all models openly available under the Apache 2.0 license, further democratizing LLMs to a wider audience economically.

## Overview

- Presents H2O-Danube, a series of small 1.8B language models
- H2O-Danube-1.8B is trained on 1T tokens, and H2O-Danube2-1.8B is trained on an additional 2T tokens
- Models exhibit highly competitive metrics across multiple benchmarks
- H2O-Danube2-1.8B achieves top ranking on Open LLM Leaderboard for models below 2B parameters
- Models follow core principles of [Llama 2](https://aimodels.fyi/papers/arxiv/jetmoe-reaching-llama2-performance-01m-dollars) and [Mistral](https://aimodels.fyi/papers/arxiv/solar-107b-scaling-large-language-models-simple), leveraging and refining techniques for pre-training large language models
- Chat models trained with supervised fine-tuning followed by direct preference optimization are also released
- All models are openly available under the Apache 2.0 license to democratize LLMs

## Plain English Explanation

The researchers have developed a series of small language models, each with 1.8 billion parameters, called H2O-Danube. The first model, H2O-Danube-1.8B, was trained on 1 trillion tokens of text data, while the second model, H2O-Danube2-1.8B, was trained on an additional 2 trillion tokens. The authors report highly competitive results on a range of benchmarks, and at the time of writing H2O-Danube2-1.8B ranked first on the Open LLM Leaderboard among all models with fewer than 2 billion parameters.

The models are built upon the foundations of [Llama 2](https://aimodels.fyi/papers/arxiv/jetmoe-reaching-llama2-performance-01m-dollars) and [Mistral](https://aimodels.fyi/papers/arxiv/solar-107b-scaling-large-language-models-simple), two influential open large language models. The researchers also refine existing techniques for pre-training such models.

In addition to the base language models, the researchers have also released chat models. These were first trained with supervised fine-tuning and then with direct preference optimization (DPO), a method that trains the model directly on pairs of preferred and rejected responses. All of these models are freely available to the public under the Apache 2.0 license, which makes them easier to access, use, and build upon.

## Technical Explanation

The H2O-Danube series of language models consists of two main versions: H2O-Danube-1.8B, which was trained on 1 trillion tokens of text data, and H2O-Danube2-1.8B, which was trained on an additional 2 trillion tokens. Both models have 1.8 billion parameters, placing them in the "small" category of large language models.
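To put these token counts in perspective, the short sketch below works out the training tokens seen per parameter. It assumes that "an additional 2T tokens" means H2O-Danube2-1.8B continues from the 1T-token model, for a cumulative 3T tokens; the constant and function names are illustrative and not taken from the paper.

```python
# Back-of-the-envelope token budget for the two models.
# Assumption: the 2T tokens for H2O-Danube2-1.8B come on top of the
# original 1T, so its cumulative budget is 3T tokens.

PARAMS = 1.8e9              # parameters in each model
DANUBE1_TOKENS = 1e12       # tokens used for H2O-Danube-1.8B
ADDITIONAL_TOKENS = 2e12    # extra tokens used for H2O-Danube2-1.8B


def tokens_per_param(tokens: float, params: float = PARAMS) -> float:
    """Return how many training tokens were seen per model parameter."""
    return tokens / params


danube2_total = DANUBE1_TOKENS + ADDITIONAL_TOKENS

if __name__ == "__main__":
    print(f"H2O-Danube-1.8B:  {tokens_per_param(DANUBE1_TOKENS):.0f} tokens/param")
    print(f"H2O-Danube2-1.8B: {tokens_per_param(danube2_total):.0f} tokens/param")
```

Under that assumption the first model sees roughly 556 tokens per parameter and the second roughly 1,667.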

These models were developed by leveraging and refining the core principles and techniques used in the [Llama 2](https://aimodels.fyi/papers/arxiv/jetmoe-reaching-llama2-performance-01m-dollars) and [Mistral](https://aimodels.fyi/papers/arxiv/solar-107b-scaling-large-language-models-simple) language models. The researchers combine various techniques for pre-training large language models and report highly competitive performance across a wide range of benchmarks.

In addition to the main language models, the researchers also trained chat models using supervised fine-tuning followed by direct preference optimization. These chat models are designed to engage in more natural, conversational interactions with users.
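For readers unfamiliar with direct preference optimization, the following is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not the authors' training code: the function name, the example log-probabilities, and the `beta` value of 0.1 are assumptions chosen for illustration, and a real setup would compute the log-probabilities from the policy and a frozen reference model over batches.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta (an illustrative default here) scales the preference margin.
    """
    # How much more likely each response became relative to the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)

    # loss = -log(sigmoid(margin)), evaluated in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy has shifted probability toward the chosen response: lower loss.
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls as the policy raises the likelihood of the preferred response relative to the rejected one, measured against the reference model, which is why no separate reward model is needed.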

All of the H2O-Danube models, including the chat variants, are made openly available under the Apache 2.0 license. This open-source approach helps democratize access to large language models, allowing a wider audience to utilize and build upon these powerful AI systems.

## Critical Analysis

The H2O-Danube models are a notable addition to the small end of the large language model spectrum, chiefly because of their reported performance across a wide range of benchmarks. The researchers' approach of building on the foundations of Llama 2 and Mistral, while refining the pre-training techniques, has produced capable models for their size.

However, it's important to note that the paper does not provide detailed information about the specific techniques and methodologies used in the pre-training process. While the researchers mention leveraging and refining various approaches, a more in-depth explanation of the innovations and modifications would be helpful for a deeper understanding of the models' capabilities and potential limitations.

Additionally, the paper does not discuss the potential biases or ethical considerations associated with the H2O-Danube models. As large language models can sometimes exhibit undesirable biases or generate harmful content, it would be valuable for the researchers to address these concerns and outline their strategies for mitigating such issues.

Furthermore, the paper lacks a comprehensive analysis of the chat models' performance and their ability to engage in natural, contextual conversations. While the release of these chat models is a positive step, a more detailed evaluation of their conversational skills and user experience would provide valuable insights.

## Conclusion

The H2O-Danube series shows that carefully trained 1.8B-parameter models can be highly competitive. By building on the foundations of Llama 2 and Mistral and refining the pre-training techniques, the researchers have developed models that perform strongly across a variety of benchmarks.

The open-source release of these models, including the chat variants, is a commendable effort to democratize access to powerful AI systems and foster a wider ecosystem of language model development and application. However, the paper could benefit from more detailed explanations of the technical innovations, a discussion of potential biases and ethical considerations, and a more in-depth evaluation of the chat models' conversational abilities.

Overall, the H2O-Danube models are a promising step toward capable, openly licensed language models that are small enough to be widely accessible.