[](#baichuan-7b)Baichuan-7B
===========================

Baichuan-7BTransformer1.2tokens704096benchmarkC-EVAL/MMLU

Baichuan-7BFinetune[Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B)

Baichuan-7B is an open-source large-scale pre-trained model developed by Baichuan Intelligent Technology. Based on the Transformer architecture, it is a model with 7 billion parameters trained on approximately 1.2 trillion tokens. It supports both Chinese and English, with a context window length of 4096. It achieves the best performance of its size on standard Chinese and English authoritative benchmarks (C-EVAL/MMLU).

If you wish to use Baichuan-7B (for inference, finetuning, etc.), we recommend using the accompanying code library [Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B).

[](#why-use-baichuan-7b)Why use Baichuan-7B
-------------------------------------------

*   Baichuan-7BSOTAMMLU
    
*   Baichuan-7BC-EvalSOTA
    
*   LLaMABaichuan-7B
    
*   Among models of the same size, Baichuan-7B has achieved the current state-of-the-art (SOTA) level, as evidenced by the following MMLU metrics.
    
*   Baichuan-7B is trained on proprietary bilingual Chinese-English corpora, optimized for Chinese, and achieves SOTA performance on C-Eval.
    
*   Unlike LLaMA, which completely prohibits commercial use, Baichuan-7B employs a more lenient open-source license, allowing for commercial purposes.
    

[](#how-to-get-started-with-the-model)How to Get Started with the Model
-----------------------------------------------------------------------

Baichuan-7B1-shot"->"

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", device_map="auto", trust_remote_code=True)
    inputs = tokenizer('->\n->', return_tensors='pt')
    inputs = inputs.to('cuda:0')
    pred = model.generate(**inputs, max_new_tokens=64,repetition_penalty=1.1)
    print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
    

The following is a task of performing 1-shot inference using Baichuan-7B, where the author's name is given based on the work, with the correct output being "One Hundred Years of Solitude->Gabriel Garcia Marquez"

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", device_map="auto", trust_remote_code=True)
    inputs = tokenizer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->', return_tensors='pt')
    inputs = inputs.to('cuda:0')
    pred = model.generate(**inputs, max_new_tokens=64,repetition_penalty=1.1)
    print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
    

[](#model-details)Model Details
-------------------------------

### [](#model-description)Model Description

*   **Developed by:** (Baichuan Intelligent Technology)
*   **Email**: [opensource@baichuan-inc.com](mailto:opensource@baichuan-inc.com)
*   **Language(s) (NLP):** Chinese/English
*   **License:** [Baichuan-7B License](https://huggingface.co/baichuan-inc/Baichuan-7B/blob/main/baichuan-7B%20%E6%A8%A1%E5%9E%8B%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf)

### [](#model-sources)Model Sources

TransformerLLaMA

*   **Position Embedding**rotary-embedding
*   **Feedforward Layer**SwiGLUFeedforward(8/3)11008
*   **Layer Normalization**: [RMSNorm](https://arxiv.org/abs/1910.07467)Pre-Normalization



Hyperparameter

Value

n\_parameters

7000559616

n\_layers

32

n\_heads

32

d\_model

4096

vocab size

64000

sequence length

4096

The overall model is based on the standard Transformer structure, and we have adopted the same model design as LLaMA:

*   Position Embedding: We use rotary-embedding, which is the position encoding scheme adopted by most models at this stage, and it has excellent extrapolation capabilities.
*   Feedforward Layer: We use SwiGLU. The feedforward changes to (8/3) times the size of the hidden layer, that is, 11008.
*   Layer Normalization: Pre-Normalization based on [RMSNorm](https://arxiv.org/abs/1910.07467).

The specific parameters are as follows:

Hyperparameter

Value

n\_parameters

7000559616

n\_layers

32

n\_heads

32

d\_model

4096

vocab size

64000

sequence length

4096

[](#uses)Uses
-------------

### [](#downstream-use)Downstream Use

Finetune[Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B)

We have also open-sourced the training code that accompanies this model, allowing for efficient finetuning for downstream tasks. For more details, please refer to [Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B).

### [](#out-of-scope-use)Out-of-Scope Use



Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

[](#bias-risks-and-limitations)Bias, Risks, and Limitations
-----------------------------------------------------------

Baichuan-7BBaichuan-7B

Baichuan-7B can produce factually incorrect output, and should not be relied on to produce factually accurate information. Baichuan-7B was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

[](#training-details)Training Details
-------------------------------------

[Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B)

For specific training settings, please refer to [Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B).

[](#evaluation)Evaluation
-------------------------

### [](#)

#### [](#c-eval)C-Eval

[CEval](https://cevalbenchmark.com/index.html)52devfew-shottest5-shot

Model 5-shot

Average

Avg(Hard)

STEM

Social Sciences

Humanities

Others

GPT-4

68.7

54.9

67.1

77.6

64.5

67.8

ChatGPT

54.4

41.4

52.9

61.8

50.9

53.6

Claude-v1.3

54.2

39.0

51.9

61.7

52.1

53.7

Claude-instant-v1.0

45.9

35.5

43.1

53.8

44.2

45.4

moss-moon-003-base (16B)

27.4

24.5

27.0

29.1

27.2

26.9

Ziya-LLaMA-13B-pretrain

30.2

22.7

27.7

34.4

32.0

28.9

LLaMA-7B-hf

27.1

25.9

27.1

26.8

27.9

26.3

ChatGLM-6B

34.5

23.1

30.4

39.6

37.4

34.5

Falcon-7B

25.8

24.3

25.8

26.0

25.8

25.6

Open-LLaMA-v2-pretrain (7B)

24.0

22.5

23.1

25.3

25.2

23.2

TigerBot-7B-base

25.7

27.0

27.3

24.7

23.4

26.1

Aquila-7B\*

25.5

25.2

25.6

24.6

25.2

26.6

BLOOM-7B

22.8

20.2

21.8

23.3

23.9

23.3

BLOOMZ-7B

35.7

25.8

31.3

43.5

36.6

35.6

**Baichuan-7B**

42.8

31.5

38.2

52.0

46.2

39.3

#### [](#gaokao)Gaokao

[Gaokao](https://github.com/ExpressAI/AI-Gaokao)  5-shot



Model

Average

Open-LLaMA-v2-pretrain

21.41

Ziya-LLaMA-13B-pretrain

23.17

Falcon-7B

23.98

TigerBot-7B-base

25.94

LLaMA-7B

27.81

ChatGLM-6B

21.41

BLOOM-7B

26.96

BLOOMZ-7B

28.72

Aquila-7B\*

24.39

**Baichuan-7B**

**36.24**

#### [](#agieval)AGIEval

[AGIEval](https://github.com/microsoft/AGIEval)  5-shot

Model

Average

Open-LLaMA-v2-pretrain

23.49

Ziya-LLaMA-13B-pretrain

27.64

Falcon-7B

27.18

TigerBot-7B-base

25.19

LLaMA-7B

28.17

ChatGLM-6B

23.49

BLOOM-7B

26.55

BLOOMZ-7B

30.27

Aquila-7B\*

25.58

**Baichuan-7B**

**34.44**

\*Aquila[](https://model.baai.ac.cn/model-detail/100098)

### [](#english-leaderboard)English Leaderboard

In addition to Chinese, we also tested the model's performance in English.

#### [](#mmlu)MMLU

[MMLU](https://arxiv.org/abs/2009.03300) is an English evaluation dataset that includes 57 multiple-choice tasks, covering elementary mathematics, American history, computer science, law, etc. The difficulty ranges from high school level to expert level, making it a mainstream LLM evaluation dataset.

We adopted the [open-source](/baichuan-inc/Baichuan-7B/blob/main/(https://github.com/hendrycks/test)) evaluation scheme, and the final 5-shot results are as follows:

Model

Humanities

Social Sciences

STEM

Other

Average

LLaMA-7B2

34.0

38.3

30.5

38.1

35.1

Falcon-7B1

\-

\-

\-

\-

35.0

mpt-7B1

\-

\-

\-

\-

35.6

ChatGLM-6B0

35.4

41.0

31.3

40.5

36.9

BLOOM 7B0

25.0

24.4

26.5

26.4

25.5

BLOOMZ 7B0

31.3

42.1

34.4

39.0

36.1

moss-moon-003-base (16B)0

24.2

22.8

22.4

24.4

23.6

moss-moon-003-sft (16B)0

30.5

33.8

29.3

34.4

31.9

**Baichuan-7B0**

38.4

48.9

35.6

48.1

42.3

The superscript in the Model column indicates the source of the results.

    0:reimplemented
    1:https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
    2:https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu
    

[](#our-group)Our Group
-----------------------

[![WeChat](https://github.com/baichuan-inc/Baichuan-13B/blob/main/media/wechat.jpeg?raw=true)](https://github.com/baichuan-inc/Baichuan-13B/blob/main/media/wechat.jpeg?raw=true)

## Model Overview

`Baichuan-7B` is an open-source large-scale pre-trained model developed by [Baichuan Intelligent Technology](https://aimodels.fyi/creators/huggingFace/baichuan-inc). Based on the Transformer architecture, it is a model with 7 billion parameters trained on approximately 1.2 trillion tokens. It supports both Chinese and English, with a context window length of 4096. Baichuan-7B achieves the best performance of its size on standard Chinese and English authoritative benchmarks (C-EVAL/MMLU), outperforming similar models like [BELLE-7B-2M](https://aimodels.fyi/models/huggingFace/belle-7b-2m-bellegroup) and LLaMA.

## Model Inputs and Outputs

Baichuan-7B is a text-to-text model, taking in prompts as input and generating relevant text as output. The model can handle both Chinese and English input, and the outputs are also in the corresponding language.

### Inputs
- Prompts or text in Chinese or English

### Outputs
- Generated text in Chinese or English, based on the input prompt

## Capabilities

Baichuan-7B has demonstrated strong performance on standard Chinese and English benchmarks, achieving state-of-the-art results for models of its size. It is particularly adept at tasks like language understanding, question answering, and text generation.

## What Can I Use it For?

The Baichuan-7B model can be used as a foundation for a wide range of natural language processing applications, such as chatbots, language translation, content generation, and more. Its strong performance on benchmarks and flexibility with both Chinese and English make it a valuable tool for developers and researchers working on multilingual AI projects.

## Things to Try

One interesting thing to try with Baichuan-7B is its ability to perform few-shot learning. By providing just a handful of relevant examples in the input prompt, the model can generate high-quality, contextual responses. This makes it a powerful tool for applications that require adaptability and rapid learning.