![DeepSeek Coder](https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/pictures/logo.png?raw=true)

[\[Homepage\]](https://www.deepseek.com/) | [\[Chat with DeepSeek Coder\]](https://coder.deepseek.com/) | [\[Discord\]](https://discord.gg/Tc7c45Zzu5) | [\[WeChat\]](https://github.com/guoday/assert/blob/main/QR.png?raw=true)

* * *

### 1\. Introduction of DeepSeek Coder

DeepSeek Coder is a series of code language models, each trained from scratch on 2T tokens comprising 87% code and 13% natural language in English and Chinese. The models come in sizes ranging from 1B to 33B parameters. Each model is pre-trained on a project-level code corpus with a 16K window size and an extra fill-in-the-blank task, which supports project-level code completion and infilling. DeepSeek Coder achieves state-of-the-art performance among open-source code models across multiple programming languages and various benchmarks.

*   **Massive Training Data**: Trained from scratch on 2T tokens, including 87% code and 13% natural language data in both English and Chinese.
    
*   **Highly Flexible & Scalable**: Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
    
*   **Superior Model Performance**: State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
    
*   **Advanced Code Completion Capabilities**: A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks.
    

### 2\. Model Summary

deepseek-coder-33b-base is a 33B parameter model with Grouped-Query Attention trained on 2 trillion tokens.

*   **Home Page:** [DeepSeek](https://deepseek.com/)
*   **Repository:** [deepseek-ai/deepseek-coder](https://github.com/deepseek-ai/deepseek-coder)
*   **Chat With DeepSeek Coder:** [DeepSeek-Coder](https://coder.deepseek.com/)

### 3\. How to Use

Here are some examples of how to use our model.

#### 1) Code Completion

    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch
    # Load the tokenizer and the model, then move the model to the GPU
    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-33b-base", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-33b-base", trust_remote_code=True).cuda()
    input_text = "#write a quick sort algorithm"
    # The tokenizer output has no .cuda() method; .to() moves its tensors to the model's device
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    # max_length counts prompt tokens plus generated tokens
    outputs = model.generate(**inputs, max_length=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
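
Before running the snippet above, it can help to estimate how much accelerator memory the weights alone require. The sketch below is plain arithmetic (parameter count times bytes per parameter). It treats the nominal 33B figure from the model name as the parameter count, which is an approximation, and it ignores activations, the KV cache, and framework overhead. `weight_memory_gb` and `BYTES_PER_PARAM` are illustrative names, not part of any library.

```python
# Rough weight-memory estimate: parameter count times bytes per parameter.
# Assumption: "33B" is used as the parameter count; the exact figure may differ.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 33e9  # nominal size taken from the model name
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.0f} GB of weights")
```

If the weights are loaded in 32-bit precision, the estimate is well beyond a single common GPU. Passing `torch_dtype=torch.bfloat16` to `from_pretrained` is one widely used way to halve the footprint, assuming the hardware supports bfloat16.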
    

#### 2) Code Insertion

    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch
    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-33b-base", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-33b-base", trust_remote_code=True).cuda()
    input_text = """<｜fim▁begin｜>def quick_sort(arr):
        if len(arr) <= 1:
            return arr
        pivot = arr[0]
        left = []
        right = []
    <｜fim▁hole｜>
            if arr[i] < pivot:
                left.append(arr[i])
            else:
                right.append(arr[i])
        return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_length=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
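
The insertion prompt above follows a fixed layout: a begin sentinel, the code before the gap, a hole sentinel, the code after the gap, and an end sentinel. The sketch below wraps that layout in a small helper. `build_fim_prompt` is a hypothetical name, and the default sentinel strings are an assumption that should be checked against the tokenizer's vocabulary before use.

```python
# Assumed sentinel strings; verify them against the tokenizer you load.
FIM_BEGIN = "<｜fim▁begin｜>"
FIM_HOLE = "<｜fim▁hole｜>"
FIM_END = "<｜fim▁end｜>"


def build_fim_prompt(prefix: str, suffix: str,
                     begin: str = FIM_BEGIN,
                     hole: str = FIM_HOLE,
                     end: str = FIM_END) -> str:
    """Join prefix and suffix around a single hole marker."""
    for part in (prefix, suffix):
        for sentinel in (begin, hole, end):
            if sentinel in part:
                # A sentinel inside the code itself would make the prompt ambiguous
                raise ValueError(f"input already contains sentinel {sentinel!r}")
    return f"{begin}{prefix}{hole}{suffix}{end}"


prefix = "def add(a, b):\n"
suffix = "\n    return result\n"
prompt = build_fim_prompt(prefix, suffix)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of a hand-written `input_text`.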
    

#### 3) Repository Level Code Completion

    from transformers import AutoTokenizer, AutoModelForCausalLM
    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-33b-base", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-33b-base", trust_remote_code=True).cuda()
    
    input_text = """#utils.py
    import torch
    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import accuracy_score
    
    def load_data():
        iris = datasets.load_iris()
        X = iris.data
        y = iris.target
    
        # Standardize the data
        scaler = StandardScaler()
        X = scaler.fit_transform(X)
    
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    
        # Convert numpy data to PyTorch tensors
        X_train = torch.tensor(X_train, dtype=torch.float32)
        X_test = torch.tensor(X_test, dtype=torch.float32)
        y_train = torch.tensor(y_train, dtype=torch.int64)
        y_test = torch.tensor(y_test, dtype=torch.int64)
        
        return X_train, X_test, y_train, y_test
    
    def evaluate_predictions(y_test, y_pred):
        return accuracy_score(y_test, y_pred)
    #model.py
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader, TensorDataset
    
    class IrisClassifier(nn.Module):
        def __init__(self):
            super(IrisClassifier, self).__init__()
            self.fc = nn.Sequential(
                nn.Linear(4, 16),
                nn.ReLU(),
                nn.Linear(16, 3)
            )
    
        def forward(self, x):
            return self.fc(x)
    
        def train_model(self, X_train, y_train, epochs, lr, batch_size):
            criterion = nn.CrossEntropyLoss()
            optimizer = optim.Adam(self.parameters(), lr=lr)
            
            # Create DataLoader for batches
            dataset = TensorDataset(X_train, y_train)
            dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    
            for epoch in range(epochs):
                for batch_X, batch_y in dataloader:
                    optimizer.zero_grad()
                    outputs = self(batch_X)
                    loss = criterion(outputs, batch_y)
                    loss.backward()
                    optimizer.step()
    
        def predict(self, X_test):
            with torch.no_grad():
                outputs = self(X_test)
                _, predicted = outputs.max(1)
            return predicted.numpy()
    #main.py
    from utils import load_data, evaluate_predictions
    from model import IrisClassifier as Classifier
    
    def main():
        # Model training and evaluation
    """
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=140)
    print(tokenizer.decode(outputs[0]))
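
The repository-level prompt above is a concatenation of files, each introduced by a `#filename` comment line, ending with the file to be completed. A minimal sketch of assembling such a prompt programmatically follows. `build_repo_prompt` is a hypothetical helper, and listing dependencies before the files that import them mirrors the example but is an assumption, not a documented requirement.

```python
def build_repo_prompt(files, target_path, target_prefix):
    """Concatenate (path, source) pairs, then open the file to be completed.

    files: list of (path, source) tuples, in the order they should appear.
    target_path: name of the file the model should continue.
    target_prefix: the code already written in that file.
    """
    parts = []
    for path, source in files:
        # Each file starts with a "#<path>" header line, as in the example above
        parts.append(f"#{path}\n{source.rstrip()}\n")
    parts.append(f"#{target_path}\n{target_prefix}")
    return "".join(parts)


files = [
    ("utils.py", "def load_data():\n    pass\n"),
    ("model.py", "class M:\n    pass\n\n"),
]
prompt = build_repo_prompt(files, "main.py", "from utils import load_data\n")
print(prompt)
```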
    

### 4\. License

This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.

See the [LICENSE-MODEL](https://github.com/deepseek-ai/deepseek-coder/blob/main/LICENSE-MODEL) for more details.

### 5\. Contact

If you have any questions, please raise an issue or contact us at [agi\_code@deepseek.com](mailto:agi_code@deepseek.com).

## Model Overview

`deepseek-coder-33b-base` is a 33B parameter model with Grouped-Query Attention trained on 2 trillion tokens, including 87% code and 13% natural language in both English and Chinese. It is part of the DeepSeek Coder series, which offers various model sizes from 1B to 33B parameters to suit different user requirements. DeepSeek Coder models have shown state-of-the-art performance on multiple programming language benchmarks like HumanEval, MultiPL-E, MBPP, DS-1000, and APPS.

Similar models in the DeepSeek Coder series include the [6.7B parameter `deepseek-coder-6.7b-base`](https://aimodels.fyi/models/huggingFace/deepseek-coder-67b-base-deepseek-ai), the [33B parameter `deepseek-coder-33b-instruct`](https://aimodels.fyi/models/huggingFace/deepseek-coder-33b-instruct-deepseek-ai), and the [6.7B parameter `deepseek-coder-6.7b-instruct`](https://aimodels.fyi/models/huggingFace/deepseek-coder-67b-instruct-deepseek-ai). These models differ in size and whether they have been fine-tuned on instruction data in addition to the base pretraining.

## Model Inputs and Outputs

`deepseek-coder-33b-base` is a language model that can generate and complete code. It takes in text prompts as input and generates relevant code completions or continuations as output.

### Inputs
- Text prompts, such as:
  - Code stubs or partial code snippets
  - Natural language descriptions of desired code functionality
  - Queries about coding concepts or algorithms

### Outputs
- Completed or generated code, such as:
  - Filled-in code to complete a partial snippet
  - Novel code to implement a requested functionality
  - Explanations of coding concepts or algorithms

## Capabilities

`deepseek-coder-33b-base` demonstrates advanced code generation and completion capabilities, supported by its large-scale pretraining on a vast corpus of code and text data. It can assist with a variety of coding tasks, from implementing algorithms to explaining programming constructs.

For example, the model can take a prompt like "#write a quick sort algorithm" and generate a complete Python implementation of the quicksort algorithm. It can also fill in missing parts of code snippets to complete the functionality.
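
Because a base model simply continues the text, a completion may run past the function that was requested until the length limit is reached. One common post-processing step, sketched here, is to cut the generated text at the first occurrence of a stop sequence. The helper name and the stop sequences are illustrative assumptions and should be adapted to the prompts in use.

```python
def truncate_at_stop(completion: str, stop_sequences) -> str:
    """Return the completion cut at the earliest stop sequence, if any occurs."""
    cut = len(completion)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty stop string would match at position 0
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]


generated = "def f():\n    return 1\n\nprint(f())\n"
# Hypothetical stop sequences: new top-level statements after the function body
trimmed = truncate_at_stop(generated, ["\nprint(", "\nclass ", "\ndef "])
print(trimmed)
```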

## What Can I Use It For?

`deepseek-coder-33b-base` can be leveraged for a wide range of applications that involve programming and code generation. Some potential use cases include:

- Developing intelligent code editors or IDEs that offer advanced code completion and generation features
- Building chatbots or virtual assistants that can engage in dialog about coding and provide programming help
- Automating repetitive coding tasks by generating boilerplate code or implementing common algorithms
- Enhancing software development productivity by assisting programmers with coding tasks

The model's scalability and strong performance make it well-suited for commercial use cases that require robust code generation capabilities.

## Things to Try

One interesting aspect of `deepseek-coder-33b-base` is its ability to work at the repository level, generating code that is coherent and consistent with the overall context of a codebase. You can try providing the model with a larger code context, such as imports, function definitions, and other supporting code, and see how it generates new functionality that seamlessly integrates with the existing structure.
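
When supplying a larger context, the combined prompt still has to fit in the model's window, with room left for the generated tokens. The sketch below keeps the files nearest the end of the list until a token budget is exhausted, using the `#filename` header convention from the repository-level example earlier in this document. `fit_to_budget` is a hypothetical helper, the policy of dropping the earliest files first is an assumption, and the whitespace-based counter is only a crude stand-in for a real tokenizer.

```python
def fit_to_budget(files, budget, count_tokens):
    """Keep the trailing (path, source) pairs whose total token count fits in budget.

    count_tokens: callable mapping a string to its token count.
    """
    kept = []
    used = 0
    for path, source in reversed(files):
        cost = count_tokens(f"#{path}\n{source}")
        if used + cost > budget:
            break  # stop at the first file that does not fit, so the kept run is contiguous
        kept.append((path, source))
        used += cost
    kept.reverse()  # restore the original ordering
    return kept


def rough_count(text):
    # Crude stand-in for a tokenizer: counts whitespace-separated chunks
    return len(text.split())


files = [("a.py", "x = 1"), ("b.py", "y = 2 + x"), ("c.py", "print(y)")]
selected = fit_to_budget(files, budget=8, count_tokens=rough_count)
print([path for path, _ in selected])
```

With a loaded Hugging Face tokenizer, `count_tokens` could be `lambda s: len(tokenizer(s)["input_ids"])`, and the budget could be the 16K window minus the number of tokens reserved for generation.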

Another area to explore is the model's handling of more complex coding challenges, such as implementing data structures and algorithms. You can provide it with prompts that require reasoning about edge cases, optimizations, and other advanced programming concepts to see the depth of its capabilities.