deepseek-ai

Models by this creator

🔮

deepseek-coder-33b-instruct

deepseek-ai

Total Score

399

deepseek-coder-33b-instruct is a 33B parameter AI model developed by DeepSeek AI that is specialized for coding tasks. It belongs to the DeepSeek Coder series of code language models, each trained from scratch on 2T tokens with a composition of 87% code and 13% natural language in both English and Chinese. DeepSeek Coder offers model sizes ranging from 1B to 33B parameters, enabling users to choose the setup best suited for their needs. The 33B version has been fine-tuned on 2B tokens of instruction data to enhance its coding capabilities. Similar models include StarCoder2-15B, a 15B parameter model trained on 600+ programming languages, and StarCoder, a 15.5B parameter model trained on 80+ programming languages.

Model inputs and outputs

Inputs
Free-form natural language instructions for coding tasks

Outputs
Relevant code snippets or completions in response to the input instructions

Capabilities

deepseek-coder-33b-instruct has demonstrated state-of-the-art performance on a range of coding benchmarks, including HumanEval, MultiPL-E, MBPP, DS-1000, and APPS. The model's advanced code completion capabilities are enabled by a large 16K context window and a fill-in-the-blank training task, allowing it to handle project-level coding tasks.

What can I use it for?

deepseek-coder-33b-instruct can be used for a variety of coding-related tasks, such as:
Generating code snippets or completing partially written code based on natural language instructions
Assisting with refactoring, debugging, or improving existing code
Aiding the development of new software applications by providing helpful code suggestions and insights

The range of available model sizes allows users to choose the setup most suitable for their specific needs and resources.

Things to try

One interesting aspect of deepseek-coder-33b-instruct is its ability to handle both English and Chinese inputs, making it a versatile tool for developers working in multilingual environments. Try providing the model with instructions or prompts in both languages and observe how it responds. Another avenue to explore is the model's performance on more complex, multi-step coding tasks. By carefully crafting prompts that require the model to write, test, and refine code, you can push the boundaries of its capabilities and gain deeper insight into its strengths and limitations.
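To try the model programmatically, a minimal sketch using the Hugging Face transformers library might look like the following. The repo id "deepseek-ai/deepseek-coder-33b-instruct" and chat-template support are assumptions based on the model name; check the model card before running, and note that a 33B model needs substantial GPU memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-33b-instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Wrap the instruction in the model's chat template.
messages = [{"role": "user", "content": "Write a binary search function in Python."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```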


Updated 5/17/2024

🎲

DeepSeek-V2-Chat

deepseek-ai

Total Score

335

The DeepSeek-V2-Chat model is a text-to-text AI assistant developed by deepseek-ai. It is similar to other large language models like DeepSeek-V2, jais-13b-chat, and deepseek-vl-7b-chat, which are also designed for conversational tasks.

Model inputs and outputs

The DeepSeek-V2-Chat model takes in text-based inputs and generates text-based outputs, making it well-suited for a variety of language tasks.

Inputs
Text prompts or questions from users

Outputs
Coherent and contextually relevant responses to the user's input

Capabilities

The DeepSeek-V2-Chat model can engage in open-ended conversations, answer questions, and assist with a wide range of language-based tasks. It demonstrates strong capabilities in natural language understanding and generation.

What can I use it for?

The DeepSeek-V2-Chat model could be useful for building conversational AI assistants, chatbots, and other applications that require natural language interaction. It could also be fine-tuned for domain-specific tasks like customer service, education, or research assistance.

Things to try

Experiment with the model by providing it with a variety of prompts and questions. Observe how it responds and note any interesting insights or capabilities. You can also try combining the DeepSeek-V2-Chat model with other AI systems or data sources to expand its functionality.
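A hedged sketch of such a multi-turn exchange through transformers is shown below. The repo id "deepseek-ai/DeepSeek-V2-Chat" and the trust_remote_code requirement are assumptions based on the model name and DeepSeek's other releases, and the full model is far too large for a single consumer GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Chat"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

def chat(history):
    """Generate the assistant's next reply given the conversation so far."""
    input_ids = tokenizer.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(input_ids, max_new_tokens=200)
    return tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True)

# First turn.
history = [{"role": "user", "content": "Explain what a vector database is."}]
reply = chat(history)
print(reply)

# Second turn: keep the earlier exchange so the model answers in context.
history += [
    {"role": "assistant", "content": reply},
    {"role": "user", "content": "Give one concrete use case for it."},
]
print(chat(history))
```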


Updated 5/17/2024

🤯

deepseek-coder-6.7b-instruct

deepseek-ai

Total Score

299

deepseek-coder-6.7b-instruct is a 6.7B parameter language model developed by DeepSeek AI that has been fine-tuned on 2B tokens of instruction data. It is part of the DeepSeek Coder family of code models, which ranges from 1B to 33B parameters, all trained from scratch on a massive 2T token corpus of 87% code and 13% natural language data in English and Chinese.

The DeepSeek Coder models, including deepseek-coder-6.7b-instruct, are designed to excel at coding tasks. They achieve state-of-the-art performance on benchmarks like HumanEval, MultiPL-E, MBPP, DS-1000, and APPS, thanks to their large training data and advanced architecture. The models leverage a 16K window size and a fill-in-the-blank task to support project-level code completion and infilling.

Other similar models in the DeepSeek Coder family include the deepseek-coder-33b-instruct model, a larger 33B parameter version, and the Magicoder-S-DS-6.7B model, which was fine-tuned from the deepseek-coder-6.7b-base model using a novel approach called OSS-Instruct to generate more diverse and realistic instruction data.

Model Inputs and Outputs

Inputs
**Natural language instructions**: The model can take in natural language instructions or prompts related to coding tasks, such as "write a quick sort algorithm in python."

Outputs
**Generated code**: The model outputs generated code that attempts to fulfill the provided instruction or prompt.

Capabilities

The deepseek-coder-6.7b-instruct model is highly capable at a wide range of coding tasks, from writing algorithms and functions to generating entire programs. Due to its large training dataset and advanced architecture, the model is able to produce high-quality, contextual code that often performs well on benchmarks. For example, when prompted to "write a quick sort algorithm in python", the model can generate the following code:

```python
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)
```

This demonstrates the model's ability to understand coding concepts and generate complete, working solutions to algorithmic problems.

What Can I Use It For?

The deepseek-coder-6.7b-instruct model can be leveraged for a variety of coding-related applications and tasks, such as:
**Code generation**: Automatically generate code snippets, functions, or even entire programs based on natural language instructions or prompts.
**Code completion**: Use the model to intelligently complete partially written code, suggesting the most relevant and appropriate next steps.
**Code refactoring**: Leverage the model to help refactor existing code, improving its structure, readability, and performance.
**Prototyping and ideation**: Quickly generate code to explore and experiment with new ideas, without having to start from scratch.

Companies or developers working on tools and applications related to software development, coding, or programming could use this model to enhance their offerings and improve developer productivity.

Things to Try

Some interesting things to try with the deepseek-coder-6.7b-instruct model include:
**Exploring different programming languages**: Test the model's capabilities across a variety of programming languages, not just Python, to see how it performs.
**Prompting for complex algorithms and architectures**: Challenge the model with more advanced coding tasks, like generating entire software systems or complex data structures, to push the limits of its abilities.
**Combining with other tools**: Integrate the model into your existing development workflows and tools, such as IDEs or code editors, to streamline and enhance the coding process (see the sketch after this list).
**Experimenting with fine-tuning**: Try fine-tuning the model on your own datasets or tasks to further customize its performance for your specific needs.

By exploring the full range of the deepseek-coder-6.7b-instruct model's capabilities, you can unlock new possibilities for improving and automating your coding workflows.
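As one sketch of that kind of tool integration, the snippet below hides the model behind a single helper built on the transformers text-generation pipeline. The repo id is an assumption based on the model name, and suggest_code is a hypothetical helper, not part of any official API.

```python
from transformers import pipeline

# Assumed Hugging Face repo id; verify against the model card.
generator = pipeline(
    "text-generation",
    model="deepseek-ai/deepseek-coder-6.7b-instruct",
    torch_dtype="auto",
    device_map="auto",
)

def suggest_code(instruction: str, max_new_tokens: int = 256) -> str:
    """Hypothetical helper: return the model's reply to one coding instruction."""
    messages = [{"role": "user", "content": instruction}]
    result = generator(messages, max_new_tokens=max_new_tokens, do_sample=False)
    # Chat-style pipelines return the full message list; take the last reply.
    return result[0]["generated_text"][-1]["content"]

print(suggest_code("Add type hints and a docstring to: def add(a, b): return a + b"))
```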


Updated 5/17/2024

🛸

DeepSeek-V2

deepseek-ai

Total Score

191

DeepSeek-V2 is a large Mixture-of-Experts (MoE) text-to-text language model developed by deepseek-ai, with 236 billion total parameters of which roughly 21 billion are activated per token. It is the base model behind conversational variants such as DeepSeek-V2-Chat, and it is related to the DeepSeek-VL series, which extends DeepSeek's language models to vision-language tasks.

Model inputs and outputs

Inputs
Text prompts, such as questions, instructions, or passages to continue

Outputs
Generated text that continues or responds to the input prompt

Capabilities

DeepSeek-V2 can handle a wide variety of text tasks, from open-ended generation and summarization to reasoning and code. Because its MoE architecture activates only a fraction of its parameters per token, it offers strong performance at a lower inference cost than a comparably capable dense model, and it supports a long context window for working over large documents or codebases.

What can I use it for?

The DeepSeek-V2 model can be used for applications that require general-purpose language understanding and generation, such as content creation, question answering, summarization, and serving as a foundation for fine-tuned, domain-specific assistants. Developers and businesses can leverage it to automate text-heavy workflows and build more capable language features into their products.

Things to try

One interesting thing to try with DeepSeek-V2 is comparing its output quality and inference cost against dense models of similar capability, to see the practical benefit of the MoE design on your own workload. Another idea is to use it as a starting point for instruction tuning on a domain-specific dataset.
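One low-cost way to start is to inspect the checkpoint's configuration to see the MoE structure before committing to a full download. A hedged sketch, assuming the Hugging Face repo id "deepseek-ai/DeepSeek-V2"; the expert-related field names are assumptions and may differ between releases:

```python
from transformers import AutoConfig

# Assumed repo id; the config ships with the checkpoint, so this fetches
# only a small JSON file, not the full weights.
config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V2", trust_remote_code=True)

# Field names are assumptions based on DeepSeek's MoE releases; getattr with a
# default keeps the loop safe if a field is named differently.
for key in ("n_routed_experts", "num_experts_per_tok", "num_hidden_layers", "vocab_size"):
    print(key, "=", getattr(config, key, "not present in this config"))
```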


Updated 5/17/2024

🌐

deepseek-vl-7b-chat

deepseek-ai

Total Score

184

deepseek-vl-7b-chat is an instructed version of the deepseek-vl-7b-base model, an open-source Vision-Language (VL) model designed for real-world vision and language understanding applications. The deepseek-vl-7b-base model uses SigLIP-L and SAM-B as its hybrid vision encoder and is built on the deepseek-llm-7b-base model, which was trained on a corpus of approximately 2T text tokens. The full deepseek-vl-7b-base model was then trained on around 400B vision-language tokens. The instruction tuning makes deepseek-vl-7b-chat capable of real-world vision and language understanding applications, including processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.

Model inputs and outputs

Inputs
**Image**: The model can take images as input, supporting a resolution of up to 1024 x 1024.
**Text**: The model can also take text as input, allowing for multimodal understanding and interaction.

Outputs
**Text**: The model can generate relevant and coherent text responses based on the provided image and/or text inputs.
**Bounding Boxes**: The model can also output bounding boxes, enabling it to localize and identify objects or regions of interest within the input image.

Capabilities

deepseek-vl-7b-chat has impressive capabilities in tasks such as visual question answering, image captioning, and multimodal understanding. For example, the model can accurately describe the content of an image, answer questions about it, and even draw bounding boxes around relevant objects or regions.

What can I use it for?

The deepseek-vl-7b-chat model can be utilized in a variety of real-world applications that require vision and language understanding, such as:
**Content Moderation**: The model can be used to analyze images and text for inappropriate or harmful content.
**Visual Assistance**: The model can help visually impaired users by describing images and answering questions about their contents.
**Multimodal Search**: The model can be used to develop search engines that can understand and retrieve relevant information from both text and visual sources.
**Education and Training**: The model can be used to create interactive educational materials that combine text and visuals to enhance learning.

Things to try

One interesting thing to try with deepseek-vl-7b-chat is its ability to engage in multi-round conversations about images. By providing the model with an image and a series of follow-up questions or prompts, you can explore its understanding of the visual content and its ability to reason about it over time. This can be particularly useful for tasks like visual task planning, where the model needs to comprehend a scene and take multiple steps to achieve a goal. Another aspect to explore is the model's performance on specialized tasks like formula recognition or scientific literature understanding. By providing it with relevant inputs, you can assess its capabilities in these domains and see how it compares to more specialized models.
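Programmatic use goes through DeepSeek's own deepseek_vl package rather than plain transformers. The sketch below follows the usage pattern from the DeepSeek-VL GitHub repository as a rough guide; names such as VLChatProcessor, load_pil_images, and the <image_placeholder> tag are assumed to match that repo and should be verified against it, and ./diagram.png is a hypothetical local image.

```python
import torch
from transformers import AutoModelForCausalLM

# Both imports below come from DeepSeek's own deepseek_vl package (see the
# project's GitHub repo); they are not part of transformers itself.
from deepseek_vl.models import VLChatProcessor
from deepseek_vl.utils.io import load_pil_images

model_path = "deepseek-ai/deepseek-vl-7b-chat"  # assumed repo id
processor = VLChatProcessor.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
).cuda().eval()

conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>Describe what this diagram shows.",
        "images": ["./diagram.png"],  # hypothetical local image path
    },
    {"role": "Assistant", "content": ""},
]

# Load the referenced images and pack everything into model-ready tensors.
pil_images = load_pil_images(conversation)
inputs = processor(
    conversations=conversation, images=pil_images, force_batchify=True
).to(model.device)

# Fuse image features into the token embeddings, then generate the answer.
inputs_embeds = model.prepare_inputs_embeds(**inputs)
outputs = model.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=inputs.attention_mask,
    pad_token_id=processor.tokenizer.eos_token_id,
    max_new_tokens=256,
    do_sample=False,
)
print(processor.tokenizer.decode(outputs[0], skip_special_tokens=True))
```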


Updated 5/17/2024

🛸

deepseek-llm-67b-chat

deepseek-ai

Total Score

164

deepseek-llm-67b-chat is a 67 billion parameter language model created by DeepSeek AI. It is an advanced model trained on a vast dataset of 2 trillion tokens in both English and Chinese. The model is fine-tuned on extra instruction data compared to the deepseek-llm-67b-base version, making it well-suited for conversational tasks.

Similar models include the deepseek-coder-6.7b-instruct and deepseek-coder-33b-instruct models, which are specialized for code generation and programming tasks. These models were also developed by DeepSeek AI and have shown state-of-the-art performance on various coding benchmarks.

Model inputs and outputs

Inputs
**Text Prompts**: The model accepts natural language text prompts as input, which can include instructions, questions, or statements.
**Chat History**: The model can maintain a conversation history, allowing it to provide coherent and contextual responses.

Outputs
**Text Generations**: The primary output of the model is generated text, which can range from short responses to longer form paragraphs or essays.

Capabilities

The deepseek-llm-67b-chat model is capable of engaging in open-ended conversations, answering questions, and generating coherent text on a wide variety of topics. It has demonstrated strong performance on benchmarks evaluating language understanding, reasoning, and generation.

What can I use it for?

The deepseek-llm-67b-chat model can be used for a variety of applications, such as:
**Conversational AI Assistants**: The model can be used to power intelligent chatbots and virtual assistants that can engage in natural dialogue.
**Content Generation**: The model can be used to generate text for articles, stories, or other creative writing tasks.
**Question Answering**: The model can be used to answer questions on a wide range of topics, making it useful for educational or research applications.

Things to try

One interesting aspect of the deepseek-llm-67b-chat model is its ability to maintain context and engage in multi-turn conversations. You can try providing the model with a series of related prompts and see how it responds, building upon the prior context. This can help showcase the model's coherence and understanding of the overall dialogue. Another thing to explore is the model's performance on specialized tasks, such as code generation or mathematical problem-solving. By fine-tuning or prompting the model appropriately, you may be able to unlock additional capabilities beyond open-ended conversation.
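To make those multi-turn conversations feel responsive, replies can be streamed token by token with transformers' TextStreamer utility. A hedged sketch, with the repo id assumed from the model name (a 67B model needs multiple GPUs or offloading in practice):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "deepseek-ai/deepseek-llm-67b-chat"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the plot of Hamlet in three sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# skip_prompt=True prints only the newly generated tokens as they arrive.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(input_ids, streamer=streamer, max_new_tokens=200)
```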


Updated 5/17/2024

🤷

deepseek-moe-16b-chat

deepseek-ai

Total Score

105

deepseek-moe-16b-chat is a large language model developed by deepseek-ai. It is a 16 billion parameter model that has been trained on a vast corpus of text data. This model is an extension of the deepseek-moe-16b-base model, further fine-tuned on additional instruction-following data to enhance its conversational and task-completion capabilities.

Some other similar models developed by deepseek-ai include the deepseek-math-7b-instruct model, which is focused on math-related tasks, as well as the deepseek-llm-7b-chat and deepseek-llm-67b-chat models, which are smaller and larger versions of the conversational language model.

Model Inputs and Outputs

The deepseek-moe-16b-chat model is designed for open-ended text generation and can be used for a variety of natural language processing tasks, such as text completion, dialogue generation, and question answering.

Inputs
**Text sequences**: The model accepts text sequences as input, which can be used to initiate a conversation or provide context for the model to continue generating text.

Outputs
**Generated text**: The model outputs generated text, which can be used to continue a conversation, provide responses to questions, or generate novel content.

Capabilities

The deepseek-moe-16b-chat model is capable of engaging in open-ended conversations on a wide range of topics. It can understand and respond to natural language queries, generate coherent and contextually appropriate text, and even demonstrate some reasoning and analytical capabilities. For example, the model can be used to summarize articles, generate creative writing, or provide explanations for complex topics.

What Can I Use It For?

The deepseek-moe-16b-chat model can be used in a variety of applications, such as:
**Chatbots and virtual assistants**: The model can be integrated into chatbots and virtual assistants to provide natural language interactions with users.
**Content generation**: The model can be used to generate text for various applications, such as blog posts, marketing materials, or creative writing.
**Question answering**: The model can be used to answer questions on a wide range of topics, making it useful for educational or research applications.
**Language learning**: The model can be used to engage in conversations and provide feedback to language learners.

Things to Try

Some interesting things to try with the deepseek-moe-16b-chat model include:
Engaging the model in open-ended conversations on a variety of topics to explore its capabilities and limitations.
Providing the model with prompts or starting points for creative writing or storytelling to see what it can generate.
Asking the model to perform more analytical or reasoning-based tasks, such as summarizing articles or explaining complex concepts, to assess its problem-solving abilities.
Comparing the performance of the deepseek-moe-16b-chat model to other conversational AI models to understand its unique strengths and weaknesses.

By experimenting with the model and exploring its various use cases, you can gain a deeper understanding of its capabilities and discover new ways to leverage its power in your own projects or applications.
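For a quick hands-on test of the content-generation use described above, a minimal sketch could look like the following. The repo id "deepseek-ai/deepseek-moe-16b-chat" is assumed from the model name, and trust_remote_code=True is assumed because DeepSeek's MoE checkpoints ship custom modeling code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-moe-16b-chat"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Write a short product blurb for a reusable water bottle."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings favor varied, creative output over deterministic decoding.
out = model.generate(input_ids, max_new_tokens=150, do_sample=True, temperature=0.8, top_p=0.95)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```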


Updated 5/17/2024

🚀

deepseek-llm-67b-base

deepseek-ai

Total Score

102

The deepseek-llm-67b-base is a 67 billion parameter language model developed by DeepSeek AI. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. DeepSeek AI has also created smaller 7 billion parameter versions of their language model, including the deepseek-llm-7b-chat model, which has been fine-tuned on additional instructional data. Additionally, the company has developed a series of code-focused models called DeepSeek Coder, which range in size from 1.3 billion to 33 billion parameters and are tailored for programming tasks.

Model inputs and outputs

The deepseek-llm-67b-base model is a text-to-text transformer that can be used for a variety of natural language processing tasks. It takes plain text as input and generates new text as output.

Inputs
**Text**: The model accepts any natural language text as input, such as sentences, paragraphs, or short passages.

Outputs
**Generated Text**: The model outputs new text that continues or is relevant to the input. This can include completions, continuations, or responses to the input text.

Capabilities

The deepseek-llm-67b-base model has been trained on a massive corpus of text data, enabling it to engage in open-ended text generation on a wide range of topics. It can be used for tasks like question answering, summarization, translation, and creative writing. The model's large size and broad training data also allow it to demonstrate strong few-shot learning capabilities, where it can adapt to new tasks with only a small number of examples.

What can I use it for?

The deepseek-llm-67b-base model and its smaller variants can be used for a variety of natural language processing applications. Some potential use cases include:
**Content Generation**: Generating coherent and contextually relevant text for things like articles, stories, product descriptions, and marketing copy.
**Conversational AI**: Building chatbots and virtual assistants that can engage in natural language dialog.
**Summarization**: Producing concise summaries of long-form text, such as research papers or news articles.
**Question Answering**: Answering open-ended questions by extracting relevant information from a knowledge base.
**Code Generation**: The DeepSeek Coder models can be used to automatically generate, complete, and refine code snippets.

Things to try

One interesting aspect of the deepseek-llm-67b-base model is its ability to generate coherent and contextually relevant text even when provided with relatively little input. This few-shot learning capability allows the model to adapt to new tasks and domains with ease. Developers could experiment with prompting the model with just a sentence or two and see how it continues the narrative or responds to the input. Additionally, the code-focused DeepSeek Coder models present an opportunity to explore more advanced programming tasks, such as generating entire functions or refactoring existing code.
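A hedged sketch of that few-shot pattern follows: since this is a base model, the examples are packed into a raw text prompt rather than chat turns, and the repo id is assumed from the model name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-67b-base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Two worked examples, then the case we want completed.
prompt = (
    "Translate English to French.\n"
    "English: The weather is nice today.\nFrench: Il fait beau aujourd'hui.\n"
    "English: Where is the train station?\nFrench: Où est la gare ?\n"
    "English: I would like a cup of coffee.\nFrench:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```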


Updated 5/17/2024

🎲

deepseek-coder-7b-instruct-v1.5

deepseek-ai

Total Score

84

The deepseek-coder-7b-instruct-v1.5 is a large language model developed by DeepSeek AI, a creator focused on building advanced AI systems. This model was trained on a massive 2 trillion token dataset, with 87% code and 13% natural language in both English and Chinese. The model was first pre-trained on this large corpus using a next token prediction objective, and then fine-tuned on 2 billion tokens of instruction data to give it strong coding capabilities.

Compared to similar DeepSeek Coder models like the deepseek-coder-6.7b-instruct, deepseek-coder-33b-instruct, and deepseek-coder-1.3b-base, the deepseek-coder-7b-instruct-v1.5 lands in the middle of the size spectrum at 7 billion parameters. It aims to balance powerful coding capabilities with reasonable computational requirements.

Model inputs and outputs

The deepseek-coder-7b-instruct-v1.5 model is a text-to-text transformer that can generate natural language responses to prompts. Its key capabilities center around coding tasks like code completion, code generation, and code understanding.

Inputs
Natural language prompts describing a coding task or problem
Partially completed code snippets with gaps for the model to fill in

Outputs
Generated code to complete a given task or fill in missing code
Natural language responses explaining code or providing insights

Capabilities

The deepseek-coder-7b-instruct-v1.5 model excels at a variety of coding-related tasks. It can generate working code for algorithms and functions, complete partially written code, and even explain coding concepts in plain language. For example, you can prompt the model to "write a quicksort algorithm in Python" and it will generate a full implementation, or give it a partially written function and ask it to "fill in the missing code". Beyond just generating code, the model also demonstrates strong understanding of programming languages and concepts. You can ask it to "explain how a hash table works" or "compare the time complexity of bubble sort and quicksort", and it will provide clear and insightful explanations.

What can I use it for?

The deepseek-coder-7b-instruct-v1.5 model opens up a wide range of potential use cases for developers and data scientists. Some key applications include:
Automating routine coding tasks like boilerplate generation, refactoring, and bug fixing
Enabling more natural and conversational programming interfaces for users
Powering intelligent programming assistants that can explain concepts and provide coding help
Accelerating prototyping and ideation by generating starting points for new projects

The model's broad capabilities also make it useful beyond just coding, such as for technical writing, documentation generation, and even creative ideation for software products.

Things to try

One interesting aspect of the deepseek-coder-7b-instruct-v1.5 model is its ability to work at both the granular code level and the broader project or repository level. You can prompt it with just a few lines of code and have it complete or explain that specific snippet, or give it a larger codebase as context and have it generate relevant new code or provide overall insights. This multi-scale capability allows for some unique experiments, like prompting the model with a partially written function and asking it not just to fill in the missing pieces, but also to suggest improvements or alternative implementations. Or you could have it analyze an entire project and propose higher-level refactorings or design changes. The model's strong performance on benchmarks like HumanEval, MultiPL-E, and APPS also makes it an intriguing subject for further testing and exploration by the developer community.
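To experiment with that multi-scale behavior, one approach is to pack several project files into a single prompt with explicit path markers, as in the sketch below. This is purely illustrative: the "# File:" convention, the file names, the helper function, and the repo id are all assumptions rather than an official prompt format.

```python
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-7b-instruct-v1.5"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def build_repo_prompt(files: list[str], request: str) -> str:
    """Hypothetical helper: concatenate files with path markers, then the ask."""
    parts = [f"# File: {path}\n{Path(path).read_text()}" for path in files]
    parts.append(request)
    return "\n\n".join(parts)

prompt = build_repo_prompt(
    ["utils.py", "models.py"],  # hypothetical project files
    "Add a save_checkpoint() function to models.py that reuses utils.ensure_dir().",
)
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(input_ids, max_new_tokens=400, do_sample=False)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```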


Updated 5/17/2024

⚙️

deepseek-coder-1.3b-instruct

deepseek-ai

Total Score

82

The deepseek-coder-1.3b-instruct model is a 1.3 billion parameter language model trained by DeepSeek AI that is specifically designed for coding tasks. It is part of the DeepSeek Coder series, which includes models ranging from 1B to 33B parameters. The DeepSeek Coder models are trained on a massive dataset of 2 trillion tokens, with 87% of the data being code and 13% being natural language text in both English and Chinese. This allows the models to excel at a wide range of coding-related tasks.

Similar models in the DeepSeek Coder series include the deepseek-coder-33b-instruct, deepseek-coder-6.7b-instruct, deepseek-coder-1.3b-base, deepseek-coder-33b-base, and deepseek-coder-6.7b-base. These models offer a range of sizes and capabilities to suit different needs.

Model inputs and outputs

The deepseek-coder-1.3b-instruct model takes in natural language prompts and generates code outputs. The model can be used for a variety of coding-related tasks, such as code generation, code completion, and code insertion.

Inputs
Natural language prompts and instructions related to coding tasks

Outputs
Generated code in various programming languages
Completed or inserted code snippets based on the input prompt

Capabilities

The deepseek-coder-1.3b-instruct model excels at a wide range of coding-related tasks, including writing algorithms, implementing data structures, and solving coding challenges. For example, the model can generate a quick sort algorithm in Python when given the prompt "write a quick sort algorithm". It can also complete or insert code snippets into existing code, helping to streamline the programming workflow.

What can I use it for?

The deepseek-coder-1.3b-instruct model can be used for a variety of applications that require coding or programming capabilities. Some potential use cases include:
Developing prototypes or proofs of concept: The model can generate code to quickly test ideas and explore new concepts.
Automating repetitive coding tasks: The model can assist with tasks like code formatting, refactoring, or boilerplate generation.
Enhancing developer productivity: The model's code completion and insertion capabilities can help developers write code more efficiently.
Educational and training purposes: The model can be used to teach programming concepts or provide feedback on coding assignments.

Things to try

One interesting aspect of the deepseek-coder-1.3b-instruct model is its ability to work at the project level, thanks to its large training dataset and specialized pre-training tasks. This means the model can generate or complete code that is contextually relevant to a larger codebase, rather than just producing standalone snippets. Try providing the model with a partial code file and see how it can suggest relevant completions or insertions to extend the functionality. Another interesting experiment would be to combine the deepseek-coder-1.3b-instruct model with other AI-powered tools, such as code editors or IDE plugins. This could create a powerful coding assistant that can provide intelligent, context-aware code suggestions and help streamline the development workflow.
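At 1.3 billion parameters, the model is small enough to try on a laptop CPU. A minimal sketch, assuming the Hugging Face repo id "deepseek-ai/deepseek-coder-1.3b-instruct":

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-1.3b-instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # runs on CPU by default

messages = [{"role": "user", "content": "Write a Python function that checks if a string is a palindrome."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
out = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```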


Updated 5/17/2024