octocoder

Maintainer: bigcode

Total Score

63

Last updated 5/28/2024

  • Model Link: View on HuggingFace
  • API Spec: View on HuggingFace
  • Github Link: No Github link provided
  • Paper Link: No paper link provided

Model overview

octocoder is an instruction-tuned model with 15.5B parameters created by fine-tuning StarCoder on CommitPackFT and OASST as described in the OctoPack paper. It supports over 80 programming languages.

Model inputs and outputs

Inputs

  • Text prompts that describe the desired programming task or instruction

Outputs

  • Generated code that attempts to fulfill the provided instruction or programming task

Capabilities

octocoder can generate code in a variety of programming languages from natural language prompts. It demonstrates strong task completion, producing working solutions for concrete requests such as writing a bubble sort function or printing "Hello, world!" in Python.

What can I use it for?

octocoder can be used for a variety of software development and engineering tasks. Developers can leverage it to speed up prototyping, generate boilerplate code, or explore novel solutions to programming problems. Businesses may find it helpful for automating routine coding tasks or empowering non-technical users to create basic programs. However, the generated code is not guaranteed to be bug-free or optimized, so users should carefully review and test any outputs before deploying them.

Things to try

One interesting aspect of octocoder is its ability to handle prompts that include specific technical requirements or constraints. For example, you could try providing a prompt like "Write a function in Python that sorts a list using the bubble sort algorithm" and see how the model responds. Exploring the model's handling of such detailed prompts can give you a sense of its capabilities and limitations.
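For a prompt like that, a correct answer would look something like the following hand-written sketch (illustrative of the kind of output to expect, not actual octocoder output):

```python
def bubble_sort(items):
    """Sort a list in place using bubble sort and return it."""
    n = len(items)
    for i in range(n):
        swapped = False
        # Each pass bubbles the largest remaining element to the end.
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # no swaps means the list is already sorted
            break
    return items

print(bubble_sort([5, 2, 9, 1, 3]))  # [1, 2, 3, 5, 9]
```

Comparing the model's answer against a reference solution like this is a quick way to gauge how well it handled the stated constraint (here, using bubble sort specifically rather than any sorting method).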



This summary was produced with help from an AI and may contain inaccuracies; check out the links to read the original source documents!

Related Models

starcoder

bigcode

Total Score

2.7K

The starcoder model is a 15.5B parameter language model developed by BigCode, trained on 80+ programming languages from The Stack (v1.2) dataset. It uses Multi-Query Attention, a context window of 8,192 tokens, and the Fill-in-the-Middle objective, and was trained on 1 trillion tokens. The model is available on the Hugging Face platform and can be used for various programming-related tasks. It can be compared to similar models like Magicoder-S-DS-6.7B, another large language model trained on code, and WizardLM-7B-uncensored-GPTQ, a large language model focused on general text generation. These models share similar target domains and capabilities but differ in architecture, training data, and intended use cases.

Model inputs and outputs

The starcoder model is a causal language model, meaning it generates text auto-regressively: it takes a sequence of tokens as input and predicts each output token from the tokens that precede it.

Inputs

  • Prompt: A sequence of tokens that the model uses as the starting point for text generation.

Outputs

  • Generated text: A sequence of tokens generated by the model, continuing the input prompt.

Capabilities

The starcoder model is designed to excel at programming-related tasks such as code generation, code completion, and programming language understanding. It can generate code snippets, complete partially written code, and even translate between programming languages. Its broad training on 80+ programming languages lets it handle a wide variety of coding tasks and contexts.

What can I use it for?

The starcoder model can be used for a variety of programming-related applications, such as:

  • Code generation: Automatically generating code from a natural language description or prompt.
  • Code completion: Suggesting completions for partially written code.
  • Programming language translation: Translating code between different programming languages.
  • Documentation generation: Automatically generating documentation for code.
  • Programming education: Assisting students in learning programming concepts and syntax.

These capabilities can be leveraged in industries such as software development, programming education, and technical writing.

Things to try

One interesting aspect of the starcoder model is its use of the Fill-in-the-Middle objective during training. This approach allows the model to generate text in a more holistic, contextual manner rather than only predicting the next token in a sequence. You can experiment with this by using the `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` sentinel tokens to guide the model's text generation. Another interesting area to explore is the model's handling of different programming languages: try providing prompts in various languages and observe how the model responds, or even attempt to translate code between languages.
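The fill-in-the-middle setup can be sketched as plain prompt construction. The sentinel token names below (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) are taken from the StarCoder model card; verify them against the model's tokenizer before relying on them:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt: the model sees the code
    before and after the gap, then generates the missing middle.
    Token names assume the StarCoder tokenizer's FIM sentinels."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

prompt = build_fim_prompt(
    prefix="def average(numbers):\n    ",
    suffix="\n    return total / len(numbers)\n",
)
```

Whatever the model generates after `<fim_middle>` is its proposal for the code that belongs between the prefix and the suffix.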


starcoderbase

bigcode

Total Score

380

The starcoderbase model is a 15.5B parameter AI model developed by the BigCode project. It was trained on over 80 programming languages from The Stack (v1.2) dataset, using techniques like Multi-Query Attention, a context window of 8,192 tokens, and the Fill-in-the-Middle objective. The model is available on the Hugging Face platform through the bigcode/starcoderbase checkpoint. It is related to other BigCode models like starcoder and starcoder2-15b, which also focus on code generation and were trained on a large corpus of programming languages.

Model inputs and outputs

Inputs

  • Text prompts: Code snippets, natural language instructions, or a combination of the two.

Outputs

  • Generated text: Continuations of the input prompt, such as the completion of a code snippet or a new section of code.

Capabilities

The starcoderbase model can generate code in over 80 programming languages. It can be used for tasks like code completion, code generation, and even code translation between languages, and has demonstrated strong performance on the MultiPL-E benchmark across a variety of programming languages.

What can I use it for?

The starcoderbase model can serve as a foundation for AI-powered coding tools and applications. For example, it could be integrated into an IDE to provide intelligent code completion and generation features, or used to build a virtual programming assistant that helps developers with a wide range of coding tasks.

Things to try

One interesting aspect of the starcoderbase model is its Fill-in-the-Middle (FIM) capability, which lets it generate code by filling in the middle of a snippet while preserving the prefix and suffix. This can be useful for tasks like implementing a specific algorithm or function within a larger codebase.

Additionally, the model's coverage of over 80 programming languages opens up the possibility of building multilingual coding tools or applications that can seamlessly switch between languages as needed.
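Because starcoderbase is a base model rather than an instruction-tuned one, a common way to target a particular language is to lead with an idiomatic comment and signature in that language and let the model continue. A minimal, hypothetical helper (the helper name and the language subset are illustrative, not part of any official API):

```python
# Comment syntax for a few supported languages (illustrative subset).
COMMENT_PREFIX = {"python": "#", "rust": "//", "haskell": "--"}

def language_prompt(language: str, task: str, signature: str) -> str:
    """Build a completion prompt that steers a base code model toward
    a target language via its comment style and a starting signature."""
    return f"{COMMENT_PREFIX[language]} {task}\n{signature}\n"

print(language_prompt("rust", "Compute the factorial of n",
                      "fn factorial(n: u64) -> u64 {"))
```

Feeding prompts like this for several languages is a quick way to probe how well the model switches between them.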


starcoder2-15b

bigcode

Total Score

505

The starcoder2-15b model is a 15B parameter model trained on 600+ programming languages from The Stack v2 dataset, with opt-out requests excluded. The model uses Grouped Query Attention, a context window of 16,384 tokens with a sliding window attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 4+ trillion tokens. Training was done with the NVIDIA NeMo Framework on the NVIDIA Eos Supercomputer, built with NVIDIA DGX H100 systems. The starcoder2-15b model is an evolution of the earlier StarCoder model, a 15.5B parameter model trained on 80+ programming languages; both were developed by the BigCode team.

Model inputs and outputs

Inputs

  • Text prompts in any of the 600+ programming languages the model was trained on

Outputs

  • Generated code in response to the input prompt

Capabilities

The starcoder2-15b model can generate code in a wide variety of programming languages. It can be used for tasks like code completion, code generation, and even open-ended programming challenges. The model's large size and extensive training data allow it to handle complex programming concepts and idioms across many languages.

What can I use it for?

The starcoder2-15b model could be useful for a variety of applications, such as:

  • Building programming assistants that help developers write code more efficiently
  • Generating example code snippets for educational or documentation purposes
  • Prototyping new ideas and quickly iterating on code-based projects
  • Integrating code generation capabilities into no-code or low-code platforms

Things to try

One interesting aspect of the starcoder2-15b model is its ability to handle long-form context. Trained with a 16,384-token context window, the model can generate code that stays coherent and consistent over a large number of lines. Try providing the model with a partially completed function or class definition and see if it can generate the remaining implementation.

Another interesting experiment is fine-tuning starcoder2-15b on a specific programming language or domain-specific dataset, which could give the model specialized knowledge and skills tailored to your particular use case.
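The long-context numbers above translate into a simple token budget when prompting: the prompt and the generated output must fit in the context window together. A sketch (the window size comes from the text; the budget check itself is a generic assumption about how context limits work, not a documented API):

```python
CONTEXT_WINDOW = 16_384  # starcoder2-15b context window in tokens, per the text

def generation_budget(prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Return how many new tokens can still be generated before the
    prompt plus output would exceed the model's context window."""
    return max(0, window - prompt_tokens)

# e.g. a 12,000-token source file leaves room for 4,384 generated tokens
print(generation_budget(12_000))  # 4384
```

A check like this is useful when feeding whole files to the model: if the budget comes back near zero, the prompt needs to be trimmed before asking for a long completion.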


starcoderbase-1b

bigcode

Total Score

53

The starcoderbase-1b model is a 1 billion parameter language model trained by bigcode on over 80 programming languages from The Stack (v1.2). It uses Multi-Query Attention, a context window of 8,192 tokens, and was trained using the Fill-in-the-Middle objective on 1 trillion tokens. This model is smaller than the 15.5B parameter StarCoderBase, but still provides powerful code generation capabilities.

Model inputs and outputs

The starcoderbase-1b model takes text as input, such as partial code snippets or prompts, and generates additional text to continue or complete the input. The inputs can be in any of the 80+ supported programming languages.

Inputs

  • Text prompts or partial code snippets in any of the 80+ supported programming languages

Outputs

  • Continued or completed code snippets in the same language as the input
  • Text responses that continue or elaborate on the provided input

Capabilities

The starcoderbase-1b model is skilled at generating realistic and coherent code across a wide range of programming languages. It can autocomplete code, generate new functions or classes, fix bugs, and more. While it is not an instruction-following model, the Tech Assistant prompt can turn it into a capable technical assistant.

What can I use it for?

The starcoderbase-1b model can be used for a variety of software development and engineering tasks, such as:

  • Code completion: Autocompleting partially written code snippets or functions.
  • Code generation: Generating working code from a description or high-level outline.
  • Bug fixing: Attempting to fix the issue in a buggy code snippet.
  • Refactoring: Refactoring or optimizing a given implementation.

When using generated code, be sure to carefully review it and ensure it meets your requirements, as the model may produce inefficient or incorrect outputs.

Things to try

Try providing the model with different types of prompts, such as function signatures, pseudo-code, or high-level descriptions of what you want the code to do. Experiment with the fill-in-the-middle technique, which uses special tokens to mark the prefix, middle, and suffix of the input and output; this can help the model better understand the context and generate more coherent code.
