A 6B-parameter open-source bilingual chat LLM

## Model overview

`ChatGLM3-6B` is an open-source bilingual conversational language model co-developed by Zhipu AI and Tsinghua University's KEG Lab. Building upon the strengths of previous ChatGLM models in conversational fluency and low deployment barriers, ChatGLM3-6B introduces several key enhancements:

- **Stronger base model**: ChatGLM3-6B-Base, the base model behind ChatGLM3-6B, was trained on more diverse data, for more training steps, and with a more robust training strategy. Evaluations on various datasets show that **ChatGLM3-6B-Base has the strongest performance among base models under 10B parameters**.
- **Expanded functionality**: ChatGLM3-6B adopts a new [prompt format](PROMPT.md) that supports not only normal multi-turn conversations, but also native [tool invocation](tools_using_demo/README.md) (Function Call), code execution (Code Interpreter), and agent tasks.
- **Comprehensive open-source series**: In addition to the conversational model [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b), the team has also open-sourced the base model [ChatGLM3-6B-Base](https://huggingface.co/THUDM/chatglm3-6b-base), the long-text conversation model [ChatGLM3-6B-32K](https://huggingface.co/THUDM/chatglm3-6b-32k), and the [ChatGLM3-6B-128K](https://huggingface.co/THUDM/chatglm3-6b-128k) model with further enhanced long-text understanding capabilities. All of these weights are **fully open for academic research**, and **free commercial use is permitted after registration**.
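
To make the tool invocation idea above more concrete, here is a minimal sketch of the application side: the model emits a structured call, and your code routes it to a real function. The JSON shape (`name` plus `parameters`), the `get_weather` tool, and the `dispatch_tool_call` helper are all assumptions made for illustration; the [tool invocation](tools_using_demo/README.md) guide is the authoritative description of the actual format.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: returns a canned forecast instead of calling a real service."""
    return f"Sunny in {city}"


# Registry mapping tool names to Python callables (illustrative only).
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(raw_call: str) -> str:
    """Parse a JSON tool call of the assumed form
    {"name": ..., "parameters": {...}} and run the matching function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("parameters", {}))


# A call as the model might emit it (assumed shape, not the official format).
result = dispatch_tool_call('{"name": "get_weather", "parameters": {"city": "Beijing"}}')
print(result)
```

In a real application, the tool's return value would be passed back to the model as a new turn so that it can compose the final answer.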

## Model inputs and outputs

### Inputs
- **Prompt**: The input prompt for the model to generate a response. The prompt should be organized using the specific format described in the [prompt guide](PROMPT.md).
- **Max Tokens**: The maximum number of new tokens to generate in the response.
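- **Temperature**: A value controlling the randomness of the model's output. Higher values produce more varied but potentially less coherent output; lower values make the output more deterministic.
- **Top P**: The cumulative-probability cutoff for nucleus sampling. At each step, only the most likely tokens whose probabilities together reach this value are considered, so lower values give more focused and less diverse output.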
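
The effect of the two sampling parameters above can be illustrated on a toy distribution. This is a pure-Python sketch of temperature scaling and top-p (nucleus) filtering in general, not the model's own sampling code; the function names and the example logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature < 1 sharpens, > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.1]  # toy scores for a three-token vocabulary

# At temperature 1.0, two tokens are needed to reach 80% of the mass.
default_kept = top_p_filter(softmax_with_temperature(logits, 1.0), top_p=0.8)

# At temperature 0.5, the top token alone exceeds 80%, so only it survives.
sharp_kept = top_p_filter(softmax_with_temperature(logits, 0.5), top_p=0.8)

print(sorted(default_kept), sorted(sharp_kept))
```

Lowering either parameter narrows the candidate set, which is why low temperature combined with low Top P gives the most repeatable output.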

### Outputs
The model generates its response as a sequence of text tokens. The response can then be appended to the conversation as the next turn of the exchange.
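
As a rough sketch of how a response feeds back into the conversation, the history can be kept as a list of role-tagged messages that grows by one entry per turn. The role names, the `append_turn` helper, and the `fake_model_reply` stand-in are assumptions for illustration; the [prompt guide](PROMPT.md) defines the real format, and a real application would call the model where the stand-in is used.

```python
def append_turn(history, role, content):
    """Return a new history with one more message; the input list is not mutated."""
    if role not in ("system", "user", "assistant", "observation"):
        raise ValueError(f"unknown role: {role}")
    return history + [{"role": role, "content": content}]


def fake_model_reply(history):
    """Stand-in for a real model call (hypothetical); echoes the last user message."""
    last_user = next(m for m in reversed(history) if m["role"] == "user")
    return f"You said: {last_user['content']}"


history = []
history = append_turn(history, "user", "Hello")
# The generated response becomes part of the context for the next turn.
history = append_turn(history, "assistant", fake_model_reply(history))
history = append_turn(history, "user", "What did I just say?")

for message in history:
    print(f"{message['role']}: {message['content']}")
```

Keeping the full history in each request is what lets the model resolve references to earlier turns.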

## Capabilities

`ChatGLM3-6B` has demonstrated strong performance across a wide range of tasks, including language understanding, reasoning, math, coding, and knowledge-intensive applications. Compared to previous ChatGLM models, it exhibits significant improvements in areas like long-form text generation, task-oriented dialogue, and multi-turn reasoning.

## What can I use it for?

`ChatGLM3-6B` is a versatile model that can be applied to a variety of natural language processing tasks, such as:

- **Conversational AI**: The model can be used to build conversational assistants that hold fluent, context-aware dialogues.
- **Content Generation**: The model can generate high-quality text for applications like creative writing, summarization, and question-answering.
- **Code Generation and Interpretation**: The model can be used to generate, explain, and debug code across multiple programming languages.
- **Knowledge-Intensive Tasks**: The model can be fine-tuned for tasks that require deep understanding of a specific domain, such as financial analysis, scientific research, or legal reasoning.

## Things to try

Some key things to try with `ChatGLM3-6B` include:

- **Exploring the model's multi-turn dialogue capabilities**: Engage the model in a back-and-forth conversation and see how it maintains context and coherence.
- **Testing the model's reasoning and problem-solving skills**: Prompt the model with math problems, logical puzzles, or open-ended questions that require thoughtful analysis.
- **Evaluating the model's code generation and interpretation abilities**: Ask the model to write, explain, or debug code in various programming languages.
- **Experimenting with different prompting strategies**: Try different prompt formats, styles, and tones to see how the model's outputs vary.

Probing the model in these ways helps reveal its strengths and limitations before you build an application on top of it.