Baichuan 2
==========

[GitHub](https://github.com/baichuan-inc/Baichuan2) | [WeChat](https://github.com/baichuan-inc/Baichuan-7B/blob/main/media/wechat.jpeg?raw=true)


Table of Contents
=================

*   [Introduction](#Introduction)
*   [Quick Start](#Start)
*   [Benchmark Evaluation](#Benchmark)
*   [Terms and Conditions](#Terms)

Introduction
============


Baichuan 2 is the new generation of large-scale open-source language models launched by [Baichuan Intelligence inc.](https://www.baichuan-ai.com/). It is trained on a high-quality corpus of 2.6 trillion tokens and achieves the best performance among models of the same size on authoritative Chinese and English benchmarks. This release includes 7B and 13B versions of both the Base and Chat models, along with a 4-bit quantized version of each Chat model. All versions are fully open for academic research, and developers can also use them commercially free of charge after obtaining an official commercial license through an [email request](mailto:opensource@baichuan-inc.com). The specific release versions and download links are listed in the table below:

|     | Base Model | Chat Model | 4bits Quantized Chat Model |
| --- | --- | --- | --- |
| 7B  | [Baichuan2-7B-Base](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base) | [Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat) | [Baichuan2-7B-Chat-4bits](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat-4bits) |
| 13B | [Baichuan2-13B-Base](https://huggingface.co/baichuan-inc/Baichuan2-13B-Base) | [Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) | [Baichuan2-13B-Chat-4bits](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat-4bits) |

Quick Start
===========


The Baichuan 2 models use `F.scaled_dot_product_attention`, a feature introduced in PyTorch 2.0, to accelerate inference. The models therefore require PyTorch 2.0 or later.
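Before loading the model, it can help to confirm that the environment meets this requirement. The sketch below uses hypothetical helpers (`parse_major_minor` and `meets_min_torch_version` are not part of Baichuan 2 or `transformers`) that parse a version string such as `torch.__version__` and check that it is at least 2.0; when PyTorch is installed, it also reports whether `torch.nn.functional.scaled_dot_product_attention` is present.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.1.0+cu118' or '1.13.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_min_torch_version(version: str, minimum: tuple[int, int] = (2, 0)) -> bool:
    """Return True if `version` is at least `minimum` (PyTorch 2.0 by default)."""
    # Tuple comparison is numeric per component, unlike string comparison.
    return parse_major_minor(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        import torch.nn.functional as F

        ok = meets_min_torch_version(torch.__version__)
        # The fused attention kernel Baichuan 2 relies on ships with PyTorch 2.0+.
        has_sdpa = hasattr(F, "scaled_dot_product_attention")
        print(f"torch {torch.__version__}: >= 2.0: {ok}; SDPA available: {has_sdpa}")
```

Comparing `(major, minor)` integer tuples avoids the pitfall of comparing version strings lexically, where `'10.0' < '2.0'` would evaluate to `True`.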

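    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True is needed because Baichuan 2 ships its own modeling code.
    tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan2-13B-Base", use_fast=False, trust_remote_code=True)
    # bfloat16 halves memory relative to float32; device_map="auto" places layers on the available GPUs.
    model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-13B-Base", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)

    # Few-shot prompt in "poem title->poet" format; the model completes the last line.
    inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
    inputs = inputs.to(model.device)
    pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
    print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))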
    
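Because Baichuan2-13B-Base is a base model rather than a chat model, the example above prompts it by pattern completion: each line pairs an input with an output using `->`, and the last line ends with an open `->` for the model to fill in. The helpers below are hypothetical convenience functions (not part of the Baichuan 2 code) that build prompts in this shape and pull the first answer out of the decoded text; the country/capital pairs are illustrative only.

```python
def build_few_shot_prompt(
    examples: list[tuple[str, str]], query: str, sep: str = "->"
) -> str:
    """Format (input, output) pairs as 'input->output' lines, then append an
    open 'query->' line for the base model to complete."""
    lines = [f"{source}{sep}{target}" for source, target in examples]
    lines.append(f"{query}{sep}")
    return "\n".join(lines)


def first_completion_line(decoded: str, prompt: str) -> str:
    """Return the text generated after the prompt, up to the first newline.

    Assumes the decoded output begins with the exact prompt text; tokenizer
    normalization can break that assumption, in which case this raises.
    """
    if not decoded.startswith(prompt):
        raise ValueError("decoded text does not begin with the prompt")
    continuation = decoded[len(prompt):]
    return continuation.split("\n", 1)[0].strip()


if __name__ == "__main__":
    # Illustrative pairs only; any consistent input->output mapping works the same way.
    prompt = build_few_shot_prompt([("France", "Paris"), ("Japan", "Tokyo")], "Italy")
    print(prompt)
```

The prompt string can be passed to `tokenizer(prompt, return_tensors='pt')` in place of the literal prompt. Since `model.generate` may keep producing new `input->output` lines until `max_new_tokens` is reached, keeping only the first line of the continuation is a simple way to isolate the answer.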

Benchmark Evaluation
====================


We have extensively tested the model on authoritative Chinese-English datasets across six domains: [General](https://github.com/baichuan-inc/Baichuan2/blob/main/README_EN.md#general-domain), [Legal](https://github.com/baichuan-inc/Baichuan2/blob/main/README_EN.md#law-and-medicine), [Medical](https://github.com/baichuan-inc/Baichuan2/blob/main/README_EN.md#law-and-medicine), [Mathematics](https://github.com/baichuan-inc/Baichuan2/blob/main/README_EN.md#mathematics-and-code), [Code](https://github.com/baichuan-inc/Baichuan2/blob/main/README_EN.md#mathematics-and-code), and [Multilingual Translation](https://github.com/baichuan-inc/Baichuan2/blob/main/README_EN.md#multilingual-translation). For more detailed evaluation results, please refer to [GitHub](https://github.com/baichuan-inc/Baichuan2/blob/main/README_EN.md).

### 7B Model Results

| Model | C-Eval (5-shot) | MMLU (5-shot) | CMMLU (5-shot) | Gaokao (5-shot) | AGIEval (5-shot) | BBH (3-shot) |
| --- | --- | --- | --- | --- | --- | --- |
| **GPT-4** | 68.40 | 83.93 | 70.33 | 66.15 | 63.27 | 75.12 |
| **GPT-3.5 Turbo** | 51.10 | 68.54 | 54.06 | 47.07 | 46.13 | 61.59 |
| **LLaMA-7B** | 27.10 | 35.10 | 26.75 | 27.81 | 28.17 | 32.38 |
| **LLaMA2-7B** | 28.90 | 45.73 | 31.38 | 25.97 | 26.53 | 39.16 |
| **MPT-7B** | 27.15 | 27.93 | 26.00 | 26.54 | 24.83 | 35.20 |
| **Falcon-7B** | 24.23 | 26.03 | 25.66 | 24.24 | 24.10 | 28.77 |
| **ChatGLM2-6B** | 50.20 | 45.90 | 49.00 | 49.44 | 45.28 | 31.65 |
| **[Baichuan-7B](https://huggingface.co/baichuan-inc/Baichuan-7B)** | 42.80 | 42.30 | 44.02 | 36.34 | 34.44 | 32.48 |
| **[Baichuan2-7B-Base](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base)** | 54.00 | 54.16 | 57.07 | 47.47 | 42.73 | 41.56 |

### 13B Model Results

| Model | C-Eval (5-shot) | MMLU (5-shot) | CMMLU (5-shot) | Gaokao (5-shot) | AGIEval (5-shot) | BBH (3-shot) |
| --- | --- | --- | --- | --- | --- | --- |
| **GPT-4** | 68.40 | 83.93 | 70.33 | 66.15 | 63.27 | 75.12 |
| **GPT-3.5 Turbo** | 51.10 | 68.54 | 54.06 | 47.07 | 46.13 | 61.59 |
| **LLaMA-13B** | 28.50 | 46.30 | 31.15 | 28.23 | 28.22 | 37.89 |
| **LLaMA2-13B** | 35.80 | 55.09 | 37.99 | 30.83 | 32.29 | 46.98 |
| **Vicuna-13B** | 32.80 | 52.00 | 36.28 | 30.11 | 31.55 | 43.04 |
| **Chinese-Alpaca-Plus-13B** | 38.80 | 43.90 | 33.43 | 34.78 | 35.46 | 28.94 |
| **XVERSE-13B** | 53.70 | 55.21 | 58.44 | 44.69 | 42.54 | 38.06 |
| **[Baichuan-13B-Base](https://huggingface.co/baichuan-inc/Baichuan-13B-Base)** | 52.40 | 51.60 | 55.30 | 49.69 | 43.20 | 43.01 |
| **[Baichuan2-13B-Base](https://huggingface.co/baichuan-inc/Baichuan2-13B-Base)** | 58.10 | 59.17 | 61.97 | 54.33 | 48.17 | 48.78 |
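The 7B and 13B tables above can also be summarized numerically. The sketch below copies the Baichuan and Baichuan 2 rows from those tables and computes the per-benchmark gain of each Baichuan 2 base model over its predecessor. The helper names are hypothetical, and the unweighted mean across the six benchmarks is only an illustrative summary, not a metric reported in the evaluation.

```python
BENCHMARKS = ["C-Eval", "MMLU", "CMMLU", "Gaokao", "AGIEval", "BBH"]

# Scores copied from the 7B and 13B results tables, in BENCHMARKS order.
SCORES = {
    "Baichuan-7B": [42.80, 42.30, 44.02, 36.34, 34.44, 32.48],
    "Baichuan2-7B-Base": [54.00, 54.16, 57.07, 47.47, 42.73, 41.56],
    "Baichuan-13B-Base": [52.40, 51.60, 55.30, 49.69, 43.20, 43.01],
    "Baichuan2-13B-Base": [58.10, 59.17, 61.97, 54.33, 48.17, 48.78],
}


def per_benchmark_gain(new: str, old: str) -> dict[str, float]:
    """Absolute score difference (new - old) on each benchmark, rounded to 2 decimals."""
    return {
        name: round(n - o, 2)
        for name, n, o in zip(BENCHMARKS, SCORES[new], SCORES[old])
    }


def mean_gain(new: str, old: str) -> float:
    """Unweighted mean of the per-benchmark gains (illustrative, not an official metric)."""
    gains = per_benchmark_gain(new, old)
    return sum(gains.values()) / len(gains)


if __name__ == "__main__":
    for new, old in [("Baichuan2-7B-Base", "Baichuan-7B"),
                     ("Baichuan2-13B-Base", "Baichuan-13B-Base")]:
        print(f"{new} vs {old}: {per_benchmark_gain(new, old)}")
        print(f"  mean gain: {mean_gain(new, old):.2f} points")
```

With these rows every per-benchmark gain is positive: about 8.3 to 13.1 points at 7B and about 4.6 to 7.6 points at 13B.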

Training Dynamics
-----------------


In addition to the [Baichuan2-7B-Base](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base) model trained on 2.6 trillion tokens, we also provide 11 intermediate checkpoints for community research, covering roughly 0.2 to 2.4 trillion training tokens ([Intermediate Checkpoints Download](https://huggingface.co/baichuan-inc/Baichuan2-7B-Intermediate-Checkpoints)). The graph below shows how these checkpoints perform on three benchmarks: C-Eval, MMLU, and CMMLU.

[![checkpoint](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/checkpoints.jpeg)](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/checkpoints.jpeg)

Terms and Conditions
====================

Disclaimer
----------


We hereby declare that our team has not developed any applications based on Baichuan 2 models, whether on iOS, Android, the web, or any other platform. We strongly urge all users not to use Baichuan 2 models for any activities that harm national or social security or violate the law. We also ask users not to use Baichuan 2 models for Internet services that have not undergone appropriate security reviews and filings. We hope that all users will abide by these principles so that the technology develops in a regulated and legal environment.

We have done our best to ensure the compliance of the data used in the model training process. However, despite our considerable efforts, there may still be some unforeseeable issues due to the complexity of the model and data. Therefore, if any problems arise due to the use of Baichuan 2 open-source models, including but not limited to data security issues, public opinion risks, or any risks and problems brought about by the model being misled, abused, spread or improperly exploited, we will not assume any responsibility.

License
-------


Community use of the Baichuan 2 models requires adherence to the [Apache 2.0](https://github.com/baichuan-inc/Baichuan2/blob/main/LICENSE) license and the [Community License for Baichuan2 Model](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/Baichuan%202%E6%A8%A1%E5%9E%8B%E7%A4%BE%E5%8C%BA%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf). The Baichuan 2 models support commercial use. If you plan to use a Baichuan 2 model or its derivatives for commercial purposes, please ensure that your entity meets the following conditions:

1.  The Daily Active Users (DAU) of your or your affiliates' services or products are fewer than 1 million.
2.  Neither you nor your affiliates are software service providers or cloud service providers.
3.  Neither you nor your affiliates will re-license the commercial license granted to you to any other third party without Baichuan's permission.

If you meet the above conditions, submit the application materials required by the Baichuan 2 Model Community License Agreement to the following contact email: [opensource@baichuan-inc.com](mailto:opensource@baichuan-inc.com). Once approved, Baichuan will grant you a non-exclusive, global, non-transferable, non-sublicensable, revocable commercial copyright license.

## Model overview

`Baichuan2-13B-Base` is a large language model developed by [Baichuan Intelligence inc.](https://www.baichuan-ai.com/), a leading AI research company in China. It is part of the Baichuan 2 series, which includes 7B and 13B versions of both Base and Chat models, along with 4-bit quantized versions of the Chat models. Baichuan2-13B-Base was trained on a high-quality corpus of 2.6 trillion tokens and achieves the best results among models of the same size on authoritative Chinese and English benchmarks.

Among related models, Baichuan2-13B-Base scores higher than both [Baichuan2-7B-Base](https://aimodels.fyi/models/huggingFace/baichuan2-7b-base-baichuan-inc) and the earlier [Baichuan-7B](https://aimodels.fyi/models/huggingFace/baichuan-7b-baichuan-inc) on all six general benchmarks in the results tables, and it also outperforms the previous-generation Baichuan-13B-Base on each of them. [Baichuan2-13B-Chat](https://aimodels.fyi/models/huggingFace/baichuan2-13b-chat-baichuan-inc) is the Chat version of the same 13B model. Evaluation results for legal and medical applications, mathematics, code generation, and multilingual translation are reported in the Baichuan 2 GitHub repository.

## Model inputs and outputs

### Inputs
- **Text**: The Baichuan2-13B-Base model can accept text inputs for tasks such as language generation, text completion, and question answering.

### Outputs
- **Text**: The model generates text continuations of its input, which can be used for applications such as summarization and content creation. As a base model it is not tuned for dialogue; the Chat versions are intended for conversational use.

## Capabilities

The Baichuan2-13B-Base model performs strongly across a wide range of general benchmarks. In the 13B results table, it has the highest score of every 13B model listed on all six benchmarks: C-Eval, MMLU, CMMLU, Gaokao, AGIEval, and BBH.

For example, on C-Eval, Baichuan2-13B-Base scored 58.10, higher than GPT-3.5 Turbo (51.10) and [Baichuan-13B-Base](https://huggingface.co/baichuan-inc/Baichuan-13B-Base) (52.40), though still below GPT-4 (68.40). On MMLU it scored 59.17, ahead of Baichuan-13B-Base (51.60) and LLaMA2-13B (55.09) but behind GPT-3.5 Turbo (68.54) and GPT-4 (83.93).
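These comparisons can be checked directly against the 13B results table. The sketch below copies that table and, for each benchmark, lists the models that score above Baichuan2-13B-Base. `models_scoring_higher` is a hypothetical helper written for this illustration.

```python
BENCHMARKS = ["C-Eval", "MMLU", "CMMLU", "Gaokao", "AGIEval", "BBH"]

# Rows of the 13B results table, scores in BENCHMARKS order.
TABLE_13B = {
    "GPT-4": [68.40, 83.93, 70.33, 66.15, 63.27, 75.12],
    "GPT-3.5 Turbo": [51.10, 68.54, 54.06, 47.07, 46.13, 61.59],
    "LLaMA-13B": [28.50, 46.30, 31.15, 28.23, 28.22, 37.89],
    "LLaMA2-13B": [35.80, 55.09, 37.99, 30.83, 32.29, 46.98],
    "Vicuna-13B": [32.80, 52.00, 36.28, 30.11, 31.55, 43.04],
    "Chinese-Alpaca-Plus-13B": [38.80, 43.90, 33.43, 34.78, 35.46, 28.94],
    "XVERSE-13B": [53.70, 55.21, 58.44, 44.69, 42.54, 38.06],
    "Baichuan-13B-Base": [52.40, 51.60, 55.30, 49.69, 43.20, 43.01],
    "Baichuan2-13B-Base": [58.10, 59.17, 61.97, 54.33, 48.17, 48.78],
}


def models_scoring_higher(target: str) -> dict[str, list[str]]:
    """For each benchmark, list the models (in table order) whose score exceeds `target`'s."""
    result = {}
    for i, bench in enumerate(BENCHMARKS):
        target_score = TABLE_13B[target][i]
        result[bench] = [
            model for model, scores in TABLE_13B.items()
            if model != target and scores[i] > target_score
        ]
    return result


if __name__ == "__main__":
    for bench, higher in models_scoring_higher("Baichuan2-13B-Base").items():
        print(f"{bench}: {', '.join(higher) or 'none'}")
```

On these rows, only GPT-4 (on every benchmark) and GPT-3.5 Turbo (on MMLU and BBH) score higher; none of the other 13B models in the table does on any benchmark.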

## What can I use it for?

The Baichuan2-13B-Base model can be used for a wide range of applications, from content creation and dialogue generation to task-specific fine-tuning and domain-specific knowledge extraction. Given its strong performance on benchmarks, it could be particularly useful for applications that require in-depth language understanding, such as legal and medical research, scientific writing, and educational content generation.

Developers and researchers can also use the model for free in commercial applications after obtaining an official commercial license through [email request](mailto:opensource@baichuan-inc.com), provided that their entity meets the specified conditions outlined in the [Baichuan 2 Model Community License Agreement](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/Baichuan%202%E6%A8%A1%E5%9E%8B%E7%A4%BE%E5%8C%BA%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf).

## Things to try

One interesting aspect of the Baichuan2-13B-Base model is its ability to handle both Chinese and English content, as evidenced by its strong performance on benchmarks spanning these two languages. This makes it a potentially useful tool for applications that require cross-lingual understanding or translation, such as multilingual customer support, international business communications, or educational resources targeting diverse language learners.

Additionally, the Baichuan 2 evaluation covers specialized domains such as legal, medical, and mathematical tasks (detailed results are on GitHub), which suggests the model could be valuable for applications that require subject-matter expertise, such as legal research, medical diagnosis support, or advanced mathematical problem-solving.