Large language models (LLMs) are now widely used in various fields, including finance. However, Japanese financial-specific LLMs have not been proposed yet. Hence, this study aims to construct a Japanese financial-specific LLM through continual pre-training. Before tuning, we constructed Japanese financial-focused datasets for continual pre-training. As a base model, we employed a Japanese LLM that achieved state-of-the-art performance on Japanese financial benchmarks among the 10-billion-class parameter models. After continual pre-training using the datasets and the base model, the tuned model performed better than the original model on the Japanese financial benchmarks. Moreover, a comparison of outputs reveals that the tuned model's answers tend to be better than the original model's in terms of quality and length. These findings indicate that domain-specific continual pre-training is also effective for LLMs. The tuned model is publicly available on Hugging Face.

## Overview

- Researchers have developed a Japanese financial-specific large language model (LLM) through continual pre-training.
- They first constructed Japanese financial-focused datasets for the continual pre-training process.
- They then took a Japanese LLM with state-of-the-art results on Japanese financial benchmarks among 10-billion-class parameter models and continued pre-training it on the financial datasets.
- The resulting model outperformed the original model on Japanese financial benchmarks, and in a side-by-side comparison its answers tended to be better in quality and length.
- The tuned model is publicly available on Hugging Face.

## Plain English Explanation

Large language models (LLMs) are powerful AI systems that can understand and generate human-like text. These models have been widely used in various fields, including finance. However, until now, there haven't been any LLMs specifically trained on Japanese financial data.

To address this gap, researchers created a new Japanese financial-focused LLM. They first gathered a collection of Japanese financial documents and data, which they used to further train an existing high-performing Japanese LLM. This process of "continual pre-training" helps the model develop a deeper understanding of financial concepts and language.

After the additional training, the researchers found that the updated model performed better than the original on tests of Japanese financial tasks. The tuned model was able to provide higher-quality and more detailed answers to financial questions compared to the original.

The researchers have made this improved Japanese financial LLM publicly available on the Hugging Face platform, where others can access and use it for their own financial applications and research.

## Technical Explanation

The researchers first constructed [Japanese financial-focused datasets](https://aimodels.fyi/papers/arxiv/jafin-japanese-financial-instruction-dataset) for the continual pre-training process. As a base model, they employed a Japanese LLM that had achieved state-of-the-art performance on Japanese financial benchmarks among the 10-billion-class parameter models.

They then continued pre-training this base model on the financial datasets. Continual pre-training keeps the model's original self-supervised language-modeling objective but applies it to new, domain-specific text, which adapts the model to that domain without task-specific labels. After this additional training, the resulting model outperformed the original base model on the Japanese financial benchmarks.
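To make the idea of continual pre-training a little more concrete, here is a minimal sketch of one common data-preparation step for causal language modeling: concatenating tokenized documents and cutting them into fixed-length blocks, with each block then used for next-token prediction. This is an illustration under assumptions, not the authors' pipeline: the function names, the `block_size` value, and the end-of-sequence token ID are all hypothetical, and the paper's actual preprocessing is not described in this summary.

```python
from typing import Iterable, Iterator, List, Tuple


def pack_into_blocks(
    tokenized_docs: Iterable[List[int]],
    block_size: int,
    eos_id: int,
) -> Iterator[List[int]]:
    """Concatenate tokenized documents and yield fixed-length token blocks.

    An end-of-sequence token (eos_id, a hypothetical value here) is appended
    after each document so the model can see document boundaries.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    buffer: List[int] = []
    for doc in tokenized_docs:
        buffer.extend(doc)
        buffer.append(eos_id)
        # Emit as many full blocks as the buffer currently holds.
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # Any leftover tokens shorter than block_size are dropped in this sketch.


def next_token_pairs(block: List[int]) -> Tuple[List[int], List[int]]:
    """Split a block into (inputs, targets) for next-token prediction."""
    return block[:-1], block[1:]


if __name__ == "__main__":
    # Toy token IDs standing in for tokenized financial documents.
    docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    for block in pack_into_blocks(docs, block_size=4, eos_id=0):
        inputs, targets = next_token_pairs(block)
        print(inputs, "->", targets)
```

The training objective on these blocks is the same next-token prediction used in the original pre-training; only the text changes, which is what distinguishes continual pre-training from supervised fine-tuning on labeled examples.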

In addition, when the researchers compared the two models' outputs, the tuned model's answers tended to be better than the original model's in terms of quality and length. This indicates that **domain-specific continual pre-training** can be an effective approach for improving the performance of LLMs in specialized areas like finance.
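A comparison of this kind can be summarized with a few simple statistics. The sketch below assumes each prompt has one answer from each model plus a preference label supplied by some rater; the `Comparison` record, the label values, and the toy data are hypothetical and are not taken from the paper. Length is measured in characters, which is a reasonable unit for Japanese text because it is not whitespace-delimited.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

VALID_LABELS = ("base", "tuned", "tie")


@dataclass
class Comparison:
    """One prompt with both models' answers and a rater's preference."""
    prompt: str
    base_answer: str
    tuned_answer: str
    preferred: str  # one of VALID_LABELS


def summarize(comparisons: List[Comparison]) -> Dict[str, float]:
    """Return win rates and mean answer lengths (in characters)."""
    if not comparisons:
        raise ValueError("no comparisons given")
    wins = {label: 0 for label in VALID_LABELS}
    for c in comparisons:
        if c.preferred not in wins:
            raise ValueError(f"unknown preference label: {c.preferred!r}")
        wins[c.preferred] += 1
    n = len(comparisons)
    return {
        "tuned_win_rate": wins["tuned"] / n,
        "base_win_rate": wins["base"] / n,
        "tie_rate": wins["tie"] / n,
        "mean_base_length": mean([len(c.base_answer) for c in comparisons]),
        "mean_tuned_length": mean([len(c.tuned_answer) for c in comparisons]),
    }


if __name__ == "__main__":
    # Toy placeholder data, not results from the paper.
    data = [
        Comparison("q1", "abc", "abcdef", "tuned"),
        Comparison("q2", "abcd", "ab", "base"),
        Comparison("q3", "a", "abcde", "tuned"),
        Comparison("q4", "ab", "abc", "tie"),
    ]
    print(summarize(data))
```

Reporting win rate and length separately keeps the two claims distinct: a longer answer is not necessarily a better one, so the quality judgment has to come from the rater's label rather than from the character count.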

The researchers have made the tuned [Japanese financial-specific LLM](https://aimodels.fyi/papers/arxiv/pretraining-updating-language-domain-specific-large-language) publicly available on Hugging Face, allowing others to access and utilize this resource for their own financial applications and research.
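For readers who want to try a model published this way, the following is a minimal sketch using the Hugging Face `transformers` library (with PyTorch installed). The repository ID is deliberately left as a placeholder because this summary does not name it, and the question-and-answer prompt template is a hypothetical example. The sketch also assumes the model is used as a plain text-completion model, which is typical after continual pre-training without instruction tuning; check the model card for the actual ID and any extra loading options it requires.

```python
def build_completion_prompt(question: str) -> str:
    """Format a question as a completion-style prompt (hypothetical template)."""
    return f"質問: {question}\n回答:"


def generate_answer(model_id: str, question: str, max_new_tokens: int = 128) -> str:
    """Load a causal LM from the Hugging Face Hub and complete the prompt."""
    # Imported lazily so build_completion_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_completion_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The decoded text includes the prompt followed by the model's continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Placeholder: replace with the repository ID given on the model card.
    MODEL_ID = "<organization>/<model-name>"
    print(generate_answer(MODEL_ID, "日本銀行の役割は何ですか？"))
```

A model in the 10-billion-parameter class needs substantial memory to load in full precision, so in practice a GPU and a reduced-precision or quantized loading option are usually necessary.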

## Critical Analysis

The researchers have provided a valuable contribution by developing a Japanese financial-specific LLM, as this fills an important gap in the field. By using continual pre-training, they were able to adapt an existing high-performing Japanese LLM to the financial domain, resulting in improved performance on relevant benchmarks.

However, the paper does not delve into the specific details of the datasets used for the continual pre-training or the exact architectural changes made to the base model. Additionally, the researchers do not provide a thorough analysis of the model's limitations or potential biases that may arise from the financial domain-specific training.

Further research could explore the generalizability of this approach to other specialized domains, such as [legal](https://aimodels.fyi/papers/arxiv/novel-paradigm-boosting-translation-capabilities-large-language) or [medical](https://aimodels.fyi/papers/arxiv/chinese-tiny-llm-pretraining-chinese-centric-large) fields. Investigating the model's robustness to distribution shifts or its ability to handle complex financial reasoning tasks would also be valuable areas for future work.

## Conclusion

This research demonstrates the effectiveness of **domain-specific continual pre-training** for improving the performance of large language models in specialized areas like finance. By continuing to pre-train a high-performing Japanese LLM on Japanese financial datasets, the researchers were able to develop a model that outperformed the original on relevant benchmarks.

The availability of this [Japanese financial-specific LLM](https://aimodels.fyi/papers/arxiv/pretraining-updating-language-domain-specific-large-language) on Hugging Face represents a valuable resource for researchers and practitioners in the field of finance, who can leverage this model for a wide range of applications, from [automated financial analysis](https://aimodels.fyi/papers/arxiv/scalable-language-model-generalized-continual-learning) to [natural language-based financial decision support](https://aimodels.fyi/papers/arxiv/pretraining-updating-language-domain-specific-large-language). This research paves the way for further advancements in domain-specific language models and their real-world applications.