DeciLM-6b

Maintainer: Deci

Total Score: 234

Last updated: 5/27/2024


Property       Value
Model Link     View on HuggingFace
API Spec       View on HuggingFace
Github Link    No Github link provided
Paper Link     No paper link provided


Model overview

DeciLM-6b is a 5.7 billion parameter decoder-only text generation model developed by Deci. With a context window of 4096 tokens, the highly efficient model uses variable Grouped-Query Attention (GQA) to achieve an optimal balance between performance and computational efficiency. The model's architecture was generated using Deci's proprietary Neural Architecture Search-based technology, AutoNAC.

DeciLM-6b outpaces pretrained models in its class, with a throughput that's up to 15 times that of LLaMA 2 7B. It was further fine-tuned using LoRA for instruction following on a subset of the OpenOrca dataset, creating DeciLM 6B-Instruct.

Model inputs and outputs

DeciLM-6b is a text generation model that takes text prompts as input and generates coherent, human-like text as output. The model can be used for a variety of text-based tasks, such as:

Inputs

  • Text prompts
  • Context windows up to 4096 tokens

Outputs

  • Relevant, human-like text continuations
  • Responses to instructions and queries
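As a concrete starting point, the model can be loaded and prompted through the Hugging Face transformers library. This is a minimal sketch, assuming the Deci/DeciLM-6b checkpoint listed above is available locally or downloadable; the custom architecture requires trust_remote_code, and bfloat16 assumes a GPU that supports it:

```python
# Minimal text-generation sketch for DeciLM-6b using Hugging Face transformers.
# Assumes the "Deci/DeciLM-6b" checkpoint; trust_remote_code is required
# because the architecture is custom.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Deci/DeciLM-6b"
MAX_CONTEXT = 4096  # the model's context window, per the model card


def generate(prompt: str, max_new_tokens: int = 100) -> str:
    """Generate a continuation of `prompt` with DeciLM-6b."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
    )
    # Truncate the input so it fits inside the 4096-token context window.
    inputs = tokenizer(
        prompt, return_tensors="pt", truncation=True, max_length=MAX_CONTEXT
    )
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `generate("In a distant galaxy,")` returns the prompt followed by the model's continuation. Note that this downloads several gigabytes of weights on first use.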

Capabilities

DeciLM-6b is capable of generating high-quality, informative text across a range of topics. It can effectively handle tasks like:

  • Summarizing information
  • Answering questions
  • Generating creative stories and narratives
  • Translating text between languages
  • Providing informative and engaging responses to prompts

The model's exceptional efficiency and throughput make it well-suited for applications that require fast, high-volume text generation.

What can I use it for?

DeciLM-6b is a versatile model that can be applied to a variety of commercial and research use cases, such as:

  • Content generation for websites, marketing materials, and social media
  • Chatbots and virtual assistants
  • Summarization and information extraction
  • Educational and training applications
  • Research into large language models and their capabilities

The model's open-source license and pre-trained weights make it easy to integrate into your own projects and applications.

Things to try

One interesting aspect of DeciLM-6b is its use of variable Grouped-Query Attention (GQA), which allows the model to balance performance and efficiency. You could experiment with how adjusting the number of key-value heads in the GQA layers affects the model's capabilities and performance.
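To build intuition for why grouping key-value heads matters before touching the model itself, the KV-cache footprint at inference time can be estimated with simple arithmetic. The layer and head counts below are illustrative assumptions, not DeciLM-6b's actual per-layer configuration (which varies by layer):

```python
# Back-of-the-envelope KV-cache size under grouped-query attention (GQA).
# The dimensions below are illustrative assumptions, not DeciLM-6b's
# actual per-layer configuration.

def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence (fp16/bf16)."""
    # 2 tensors (K and V) per layer, each of shape [seq_len, n_kv_heads, head_dim].
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

# Multi-head attention baseline: every query head has its own KV head.
mha = kv_cache_bytes(seq_len=4096, n_layers=32, n_kv_heads=32, head_dim=128)
# GQA: 4 query heads share each KV head, leaving 8 KV heads.
gqa = kv_cache_bytes(seq_len=4096, n_layers=32, n_kv_heads=8, head_dim=128)

print(mha // 2**20, "MiB vs", gqa // 2**20, "MiB")  # GQA cache is 4x smaller
```

Fewer key-value heads shrink the cache (and memory bandwidth) proportionally, which is where much of the throughput advantage of GQA-based models comes from.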

Additionally, the model's fine-tuning on the OpenOrca dataset for instruction following suggests that it may excel at tasks that require understanding and carrying out complex instructions. You could try providing the model with a variety of instruction-based prompts to see how it responds.



This summary was produced with help from an AI and may contain inaccuracies; check the links above to read the original source documents!

Related Models


DeciLM-7B

Deci

Total Score: 219

DeciLM-7B is a 7.04 billion parameter decoder-only text generation model developed by Deci. At the time of release, it was the top-performing 7B base language model on the Open LLM Leaderboard. DeciLM-7B uses an optimized transformer decoder architecture that includes variable Grouped-Query Attention (GQA) to achieve a superior balance between accuracy and computational efficiency. Deci's proprietary Neural Architecture Search technology, AutoNAC, was used to generate the model's architecture. Similar models include DeciLM-6B and DeciCoder-1B, which are also developed by Deci and leverage architectural optimizations like GQA and ALiBi.

Model inputs and outputs

Inputs

  • Text prompt: DeciLM-7B takes a text prompt as input and generates additional text based on it.

Outputs

  • Generated text: output that continues or expands upon the provided prompt.

Capabilities

DeciLM-7B demonstrates strong performance on a variety of benchmarks, including the Open LLM Leaderboard, C-Eval, and Gaokao, and outperforms many other 7B-scale models in both accuracy and computational efficiency. Its long sequence length (up to 8192 tokens) and variable Grouped-Query Attention make it well-suited for applications that require generating coherent, long-form text.

What can I use it for?

DeciLM-7B is intended for commercial and research use in English and can be fine-tuned for various tasks and languages. Some potential use cases include:

  • Content generation: articles, stories, or other long-form text content
  • Language modeling: a base for further fine-tuning on specialized tasks or datasets
  • Code generation: the model's ability to generate coherent text could be leveraged for code completion or generation tasks

Things to try

One interesting aspect of DeciLM-7B is its use of variable Grouped-Query Attention, which allows the model to balance accuracy and computational efficiency. Experimenting with different configurations of the GQA hyperparameters, such as the number of key-value heads, could yield insights into how this architectural choice impacts model performance.

Additionally, the model's support for long sequence lengths (up to 8192 tokens) opens up generation tasks that require maintaining coherence over extended text. Prompting the model with a paragraph-length input and observing the quality of the generated continuation is a valuable exercise.



DeciCoder-1b

Deci

Total Score: 246

DeciCoder-1b is a 1 billion parameter decoder-only code completion model developed by Deci. It was trained on the Python, Java, and JavaScript subsets of the Starcoder Training Dataset. The model uses Grouped-Query Attention, has a context window of 2048 tokens, and was trained with a Fill-in-the-Middle (FIM) objective. DeciCoder-1b can be compared to similar code generation models like starcoder2-15b, starcoder, starcoderbase, and stable-code-3b. These models share capabilities around code generation, completion, and understanding, though they differ in their architectures, training data, and performance characteristics.

Model inputs and outputs

DeciCoder-1b is a text-to-text model, taking textual prompts as input and generating continuations or completions as output.

Inputs

  • Textual prompts related to code, such as function signatures, comments, or partial code snippets

Outputs

  • Continuations or completions of the input code, generated auto-regressively; single- or multi-line completions based on the provided context

Capabilities

DeciCoder-1b can generate coherent, context-appropriate code completions for Python, Java, and JavaScript. It leverages the provided context to continue or complete a code snippet in a sensible way, though the generated code may not always be fully correct or optimal.

What can I use it for?

DeciCoder-1b can be a useful tool for developers working on code-related tasks. Some potential use cases include:

  • Code completion and suggestion during programming to boost productivity
  • Generating boilerplate code or code templates from a high-level description
  • Prototyping new features or algorithms from a starting prompt
  • Exploring novel code ideas by iterating on generated outputs

However, the generated code may not always be reliable or production-ready, and should be thoroughly tested and validated before deployment.

Things to try

One interesting aspect of DeciCoder-1b is its ability to perform fill-in-the-middle generation: you provide a partial code snippet with a gap, and the model generates the missing middle portion. This is useful for exploring different ways to implement a specific piece of logic.

Another experiment would be to compare DeciCoder-1b against similar models like starcoder2-15b or stable-code-3b on specific coding tasks or benchmarks, to understand the relative strengths and weaknesses of the different models.
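The fill-in-the-middle pattern can be sketched as a simple prompt-assembly helper. The sentinel token strings below are an assumption borrowed from the StarCoder convention; check DeciCoder-1b's tokenizer for the exact names before relying on them:

```python
# Assemble a fill-in-the-middle (FIM) prompt. The sentinel strings follow the
# StarCoder convention and are an assumption here -- confirm the exact tokens
# against DeciCoder-1b's tokenizer before use.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Prompt asking the model to generate the code between prefix and suffix."""
    # The model is expected to emit the missing middle after the FIM_MIDDLE token.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


prompt = build_fim_prompt(
    prefix="def mean(xs):\n    total = ",
    suffix="\n    return total / len(xs)\n",
)
print(prompt)
```

Feeding this prompt to the model should yield the missing expression (here, something like `sum(xs)`); everything the model generates after the final sentinel is the infilled middle.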



DeciLM-6b-instruct

Deci

Total Score: 133

DeciLM-6b-instruct is a 6 billion parameter language model developed by Deci that is optimized for short-form instruction following. It was built by fine-tuning the DeciLM 6B model on a subset of the OpenOrca dataset. The model uses an optimized transformer decoder architecture with variable Grouped-Query Attention, which allows for efficient processing while maintaining performance.

Model inputs and outputs

Inputs

  • Natural language instructions or queries

Outputs

  • Coherent and relevant text responses to the provided inputs

Capabilities

DeciLM-6b-instruct can follow a wide range of instructions and generate appropriate responses. It can assist with tasks like answering questions, providing step-by-step instructions, and generating creative content. The model has demonstrated strong performance on benchmarks like ARC Challenge, BoolQ, and PIQA.

What can I use it for?

DeciLM-6b-instruct can be used for commercial and research applications that require short-form instruction following in English, including virtual assistants, content generation, and task automation. The model can also be fine-tuned on additional data to adapt it to specific use cases or languages. For example, DeciLM-7B-instruct is a larger model in the same family that has likewise been fine-tuned for instruction following.

Things to try

One interesting aspect of DeciLM-6b-instruct is its use of variable Grouped-Query Attention, which allows it to maintain high performance while being computationally efficient. You could experiment with its ability to generate concise and accurate responses to a variety of instructions, and compare its performance to other instruction-following language models like Falcon-7B-Instruct or MPT-7B-Instruct. This could provide insight into the tradeoffs between model size, architecture, and instruction-following capabilities.



DeciLM-7B-instruct

Deci

Total Score: 96

DeciLM-7B-instruct is a 7 billion parameter language model developed by Deci that has been fine-tuned for short-form instruction following. It was built by LoRA fine-tuning DeciLM-7B on the SlimOrca dataset. The model leverages an optimized transformer decoder architecture with variable Grouped-Query Attention to achieve strong performance and efficiency. Compared to similar models like DeciLM-6B-instruct and DeciLM-7B, DeciLM-7B-instruct offers enhanced instruction-following capabilities while retaining the speed and accuracy of its base model.

Model inputs and outputs

DeciLM-7B-instruct is a text generation model that takes prompts as input and generates relevant text outputs. It can be used for a variety of natural language tasks, including question answering, summarization, and open-ended conversation.

Inputs

  • Prompts: free-form text that the model uses as a starting point to generate relevant output.

Outputs

  • Generated text: the model's response to the input prompt, ranging from a single sentence to multiple paragraphs depending on the task.

Capabilities

DeciLM-7B-instruct is highly capable at understanding and following instructions provided in natural language. It can break down complex tasks into step-by-step instructions, provide detailed explanations, and generate relevant text outputs. Its strong performance and efficiency make it a compelling choice for a wide range of applications, from customer service chatbots to task-oriented virtual assistants.

What can I use it for?

DeciLM-7B-instruct is well-suited for commercial and research use cases that require strong instruction-following capabilities. Some potential applications include:

  • Customer service: chatbots that provide detailed, step-by-step instructions to assist customers with product usage, troubleshooting, and other queries.
  • Virtual assistants: assistants that help users with tasks ranging from scheduling appointments to providing cooking instructions.
  • Content generation: high-quality, relevant content for websites, blogs, and other digital platforms, following specific instructions or guidelines.

Things to try

One interesting aspect of DeciLM-7B-instruct is its ability to break down complex tasks into clear, step-by-step instructions. Try prompts that involve multi-step processes, such as "How do I bake a cake?" or "Walk me through the process of changing a tire," and note the level of detail and clarity of the instructions provided.

Another experiment is to explore the model's ability to follow instructions for creative or open-ended tasks, such as "Write a short story about a talking giraffe" or "Design a poster for a new music festival." This can demonstrate the model's flexibility and its capacity for generating diverse and engaging content.
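Instruction-tuned models generally respond best when the raw instruction is wrapped in the prompt template used during fine-tuning. The Alpaca-style template below is an illustrative assumption, not DeciLM-7B-instruct's documented format; consult the model card on HuggingFace for the exact template before use:

```python
# Wrap a raw instruction in a simple Alpaca-style template. This template is
# an illustrative assumption -- check the DeciLM-7B-instruct model card for
# the exact prompt format the model was fine-tuned with.
def format_instruction(instruction: str) -> str:
    """Return a full prompt string for an instruction-following model."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )


prompt = format_instruction("How do I bake a cake?")
print(prompt)
```

The model then generates its answer after the "### Response:" marker, which also gives you a clean place to cut the completion off when post-processing.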
