![DeepSeek Chat](https://github.com/deepseek-ai/DeepSeek-LLM/blob/main/images/logo.png?raw=true)

[\[Homepage\]](https://www.deepseek.com/) | [\[Chat with DeepSeek LLM\]](https://chat.deepseek.com/) | [\[Discord\]](https://discord.gg/Tc7c45Zzu5) | [\[WeChat\]](https://github.com/deepseek-ai/DeepSeek-LLM/blob/main/images/qr.jpeg)

* * *

### 1. Introduction of DeepSeek LLM

Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

### 2. Model Summary

`deepseek-llm-67b-base` is a 67B parameter model with Grouped-Query Attention trained on 2 trillion tokens from scratch.
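
For readers unfamiliar with Grouped-Query Attention, the idea is that several query heads share one key/value head, which shrinks the key/value cache kept during generation. The sketch below only illustrates that head-sharing arithmetic. The head counts are made-up illustrative values, not this model's configuration, and the contiguous grouping is an assumption that matches common implementations.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the index of the key/value head shared by a given query head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per shared KV head
    return q_head // group_size


def kv_cache_ratio(n_q_heads: int, n_kv_heads: int) -> float:
    """KV-cache size relative to standard multi-head attention (1.0 = same size)."""
    return n_kv_heads / n_q_heads


# Illustrative head counts only (assumption); not read from the model's config.
N_Q_HEADS, N_KV_HEADS = 8, 2
mapping = [kv_head_for_query_head(q, N_Q_HEADS, N_KV_HEADS) for q in range(N_Q_HEADS)]
print(mapping)                                # [0, 0, 0, 0, 1, 1, 1, 1]
print(kv_cache_ratio(N_Q_HEADS, N_KV_HEADS))  # 0.25
```

With these toy numbers, four query heads share each key/value head, so the cache holds a quarter of the keys and values that full multi-head attention would store.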

*   **Home Page:** [DeepSeek](https://deepseek.com/)
*   **Repository:** [deepseek-ai/deepseek-LLM](https://github.com/deepseek-ai/deepseek-LLM)
*   **Chat With DeepSeek LLM:** [DeepSeek-LLM](https://chat.deepseek.com/)

### 3. How to Use

Here are some examples of how to use the model.

#### Text Completion

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

    model_name = "deepseek-ai/deepseek-llm-67b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Load the weights in bfloat16 and let device_map="auto" place them across available devices.
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
    # Start from the generation settings published with the model.
    model.generation_config = GenerationConfig.from_pretrained(model_name)
    # Reuse the end-of-sequence token as the padding token for generation.
    model.generation_config.pad_token_id = model.generation_config.eos_token_id

    # The base model continues the prompt; it is not given a chat-style instruction.
    text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

    # Decode the full sequence (prompt plus generated tokens) back to text.
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(result)
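
As a rough check on hardware needs for the loading call above, the memory taken by the weights alone can be estimated from the parameter count and the bytes per parameter. This is a back-of-the-envelope sketch: it treats "67B" as exactly 67e9 parameters (an assumption, since the figure is rounded) and ignores activations, the key/value cache, and framework overhead.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 67e9   # "67B" from the model summary, taken as exact (assumption)
BF16_BYTES = 2    # bfloat16 stores each value in 16 bits
FP32_BYTES = 4    # float32, shown for comparison

print(weight_memory_gb(N_PARAMS, BF16_BYTES))  # 134.0
print(weight_memory_gb(N_PARAMS, FP32_BYTES))  # 268.0
```

At roughly 134 GB in bfloat16, the weights will generally not fit on a single accelerator, which is one reason the example passes `device_map="auto"` to spread them over the available devices.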
    

### 4. License

This code repository is licensed under the MIT License. The use of DeepSeek LLM models is subject to the Model License. DeepSeek LLM supports commercial use.

See the [LICENSE-MODEL](https://github.com/deepseek-ai/deepseek-LLM/blob/main/LICENSE-MODEL) for more details.

### 5. Contact

If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).

## Model overview

`deepseek-llm-67b-base` is a 67-billion-parameter language model developed by DeepSeek AI. It was trained from scratch on 2 trillion tokens of English and Chinese text. DeepSeek AI has also released 7-billion-parameter versions of the model, including [deepseek-llm-7b-chat](https://aimodels.fyi/models/huggingFace/deepseek-llm-7b-chat-deepseek-ai), which has been fine-tuned on additional instruction data. The company has also developed a series of code-focused models called [DeepSeek Coder](https://aimodels.fyi/creators/huggingFace/deepseek-ai), which range from 1.3 billion to 33 billion parameters and are tailored for programming tasks.

## Model inputs and outputs

The `deepseek-llm-67b-base` model is a causal (autoregressive) language model that can be applied to a variety of natural language processing tasks. It takes plain text as input and generates a continuation of that text as output.

### Inputs
- **Text**: The model accepts any natural language text as input, such as sentences, paragraphs, or short passages.

### Outputs
- **Generated Text**: The model outputs new text that continues or is relevant to the input. This can include completions, continuations, or responses to the input text.
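
Because a causal language model's `generate` call in `transformers` returns the prompt tokens followed by the newly generated ones, it is often useful to separate the two. The helper below is a minimal sketch on plain lists of token ids; the function name and the toy ids are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def new_token_ids(prompt_ids: Sequence[int], output_ids: Sequence[int]) -> List[int]:
    """Drop the echoed prompt from a generated sequence, keeping only the new tokens."""
    n = len(prompt_ids)
    if list(output_ids[:n]) != list(prompt_ids):
        raise ValueError("output does not start with the prompt")
    return list(output_ids[n:])


# Toy token ids (illustrative only, not real tokenizer output).
prompt = [101, 7, 42]
generated = [101, 7, 42, 9, 13]
print(new_token_ids(prompt, generated))  # [9, 13]
```

With real tensors from the Text Completion example, the same idea would be to slice `outputs[0]` from `inputs["input_ids"].shape[1]` onward before decoding, assuming a single, unpadded prompt.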

## Capabilities

The `deepseek-llm-67b-base` model has been trained on a large corpus of English and Chinese text, enabling open-ended text generation on a wide range of topics. It can be applied to tasks like question answering, summarization, translation, and creative writing. Because it is a base model rather than a chat model, these tasks are usually framed as text to be continued. The model's size and broad training data also support few-shot learning, where a small number of worked examples in the prompt steer it toward a new task.
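
Few-shot use of a base model comes down to laying out a handful of solved examples and leaving the last answer open for the model to complete. The following is a minimal sketch of such a prompt builder; the function name, the labels, and the translation pairs are hypothetical, and the layout is one common convention rather than a format prescribed by DeepSeek.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    query: str,
    input_label: str = "Input",
    output_label: str = "Output",
) -> str:
    """Format solved examples followed by an unanswered query for the model to complete."""
    blocks = [f"{input_label}: {x}\n{output_label}: {y}" for x, y in examples]
    # Leave the final answer empty so the model's continuation fills it in.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


# Hypothetical demonstration pairs for an English-to-French task.
examples = [("cheese", "fromage"), ("bread", "pain")]
prompt = build_few_shot_prompt(examples, "water", "English", "French")
print(prompt)
```

The resulting string can be passed as `text` in the Text Completion example. Limiting `max_new_tokens` keeps the model from going on to invent further examples after it has answered.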

## What can I use it for?

The `deepseek-llm-67b-base` model and its smaller variants can be used for a variety of natural language processing applications. Some potential use cases include:

- **Content Generation**: Generating coherent and contextually relevant text for things like articles, stories, product descriptions, and marketing copy.
- **Conversational AI**: Building chatbots and virtual assistants that can engage in natural language dialog. The Chat variants are the more natural starting point here, since the base model has not been tuned for dialog.
- **Summarization**: Producing concise summaries of long-form text, such as research papers or news articles.
- **Question Answering**: Answering open-ended questions from knowledge acquired during training or from context supplied in the prompt.
- **Code Generation**: The DeepSeek Coder models can be used to generate, complete, and refine code snippets.

## Things to try

One interesting aspect of the `deepseek-llm-67b-base` model is its ability to generate coherent and contextually relevant text from relatively little input. Developers could experiment with prompting the model with just a sentence or two and see how it continues the narrative. A related experiment is few-shot prompting: placing a handful of worked examples in the prompt and checking how well the model picks up the pattern on a new task or domain. Additionally, the code-focused DeepSeek Coder models present an opportunity to explore more advanced programming tasks, such as generating entire functions or refactoring existing code.