Phind-CodeLlama-34B-v2

Maintainer: Phind

Total Score

784

Last updated 5/17/2024


  • Model Link: View on HuggingFace
  • API Spec: View on HuggingFace
  • Github Link: No Github link provided
  • Paper Link: No paper link provided


Model overview

Phind-CodeLlama-34B-v2 is a 34-billion-parameter language model fine-tuned by Phind on an additional 1.5B tokens of high-quality programming data. It achieves 73.8% pass@1 on the HumanEval benchmark, which made it the state-of-the-art open-source model for code generation at the time of its release. The model has also been instruction-tuned on the Alpaca/Vicuna format to make it steerable and easy to use. It belongs to the same CodeLlama family as Meta's CodeLlama-13b-Instruct-hf and CodeLlama-7b-Instruct-hf, but delivers substantially stronger performance on programming tasks.

Model inputs and outputs

Inputs

  • Text prompts: The model accepts text prompts in the Alpaca/Vicuna instruction format, where the user provides a task description or query for the model to respond to (a template sketch follows after the Outputs list).

Outputs

  • Generated text: The model generates fluent text completions in response to the input prompts. It can produce code snippets, explanations, and solutions to programming problems.
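To make the expected input concrete, here is a minimal Python sketch of the Alpaca/Vicuna-style template the model is trained on. The exact "### System Prompt" / "### User Message" / "### Assistant" header strings are an assumption recalled from the model's HuggingFace card, so verify them against the card before building on them:

    # Prompt template for Phind-CodeLlama-34B-v2. The exact header
    # strings are an assumption recalled from the model card; verify
    # against the HuggingFace page before relying on them.
    PROMPT_TEMPLATE = (
        "### System Prompt\n{system}\n\n"
        "### User Message\n{user}\n\n"
        "### Assistant\n"
    )

    prompt = PROMPT_TEMPLATE.format(
        system="You are an expert programmer.",
        user="Write a function that reverses a string in Python.",
    )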

Capabilities

Phind-CodeLlama-34B-v2 is a powerful code generation model that can handle a variety of programming tasks, from implementing data structures in C++ to solving algorithmic problems in Python. It demonstrates strong capabilities in code completion, infilling, and following natural language instructions, and it is proficient across multiple programming languages, including Python, C/C++, TypeScript, and Java.

What can I use it for?

This model can be used for a wide range of programming-related applications, such as building intelligent code assistants, automating code generation, and enhancing developer productivity. Potential use cases include:

  • Code completion: Suggesting relevant code snippets or completions as a developer is writing code.
  • Code generation: Generating full program solutions from high-level descriptions or requirements.
  • Prototyping and ideation: Quickly exploring different coding approaches or solutions to problems.
  • Educational tools: Assisting students in learning to code or understand programming concepts.
  • Technical content generation: Automatically producing technical documentation, tutorials, or educational materials.

Things to try

One interesting aspect of Phind-CodeLlama-34B-v2 is its ability to follow natural language instructions and generate code that meets specific requirements. For example, you could prompt the model to "Implement a linked list in C++ that supports insertion, deletion, and search operations" and it can generate a working solution. This makes the model well-suited for building AI-powered programming assistants that understand and execute coding tasks.
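As a minimal sketch of that workflow, the snippet below loads the model with the Hugging Face transformers library and sends it the linked-list instruction. It assumes the Phind/Phind-CodeLlama-34B-v2 checkpoint and enough GPU memory for a 34B model; adjust device_map or use a quantized variant otherwise:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Phind/Phind-CodeLlama-34B-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Alpaca/Vicuna-style prompt, per the template shown earlier.
    prompt = (
        "### User Message\n"
        "Implement a linked list in C++ that supports insertion, "
        "deletion, and search operations.\n\n"
        "### Assistant\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))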

Another intriguing capability is the model's proficiency across programming languages. You could try prompting it with problems in different languages and observe how it handles each. This could be useful for building applications that need to work across a variety of programming languages.




Related Models


Phind-CodeLlama-34B-v1

Phind

Total Score

321

Phind-CodeLlama-34B-v1 is a fine-tuned version of the CodeLlama-34B and CodeLlama-34B-Python models. It achieves 67.6% pass@1 on the HumanEval benchmark, outperforming GPT-4, which scores 67%. The model was fine-tuned on a proprietary dataset of 80k high-quality programming problems and solutions; this dataset consists of instruction-answer pairs, which makes it structurally different from HumanEval. Phind-CodeLlama-34B-v2 is a newer version of this model that achieves 73.8% pass@1 on HumanEval, having been fine-tuned on an additional 1.5B tokens of high-quality programming data and instruction-tuned to be more steerable and easier to use.

Model inputs and outputs

Inputs

  • Prompts: Requests for code generation or instruction following. The model is somewhat instruction-tuned, but not fully chat-tuned like Phind-CodeLlama-34B-v2.

Outputs

  • Generated code or text: Responses to prompts, covering a wide variety of code in multiple programming languages, including Python, C/C++, TypeScript, and Java.

Capabilities

The Phind-CodeLlama-34B-v1 model excels at tasks like code completion, code infilling, and following code-related instructions, making it a powerful tool for automating programming tasks and assisting developers.

What can I use it for?

This model could be used to build intelligent code assistants, code generation tools, or other applications that require high-quality code synthesis. It may also be useful for research into large language models for programming.

Things to try

For best results, avoid the Llama chat markup and instead simply tell the model what you want it to do, appending "\n: " to the end of your prompt (see the sketch below). This allows the model to better understand and respond to your request.
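A minimal sketch of that prompting convention, assuming the Phind/Phind-CodeLlama-34B-v1 checkpoint and the transformers library:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Phind/Phind-CodeLlama-34B-v1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # v1 is not chat-tuned: state the task plainly and end with "\n: ".
    prompt = "Write me a linked list implementation in Python\n: "
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))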


Phind-CodeLlama-34B-Python-v1

Phind

Total Score

248

Phind-CodeLlama-34B-Python-v1 is a fine-tuned version of the CodeLlama-34B-Python model. It achieves 69.5% pass@1 on the HumanEval benchmark, while its sibling Phind-CodeLlama-34B-v1 (fine-tuned from CodeLlama-34B) achieves 67.6%; both exceed the 67% achieved by GPT-4 on the same benchmark. The model was fine-tuned by Phind on a proprietary dataset of 80k high-quality programming problems and solutions, using decontamination techniques to ensure the validity of the results.

Model inputs and outputs

This model is a text-to-text AI assistant, taking in user prompts and generating relevant text responses. It is somewhat instruction-tuned, but not fully chat-tuned, so users should avoid the Llama chat markup and instead simply provide their task or request followed by "\n: ".

Inputs

  • User prompts: Instructions or requests for the model, such as "Write me a linked list implementation:\n"

Outputs

  • Textual responses: For example, a linked list implementation in code form.

Capabilities

This is a capable code generation and understanding model, excelling at tasks like code completion, infilling, and following programming instructions. It has been trained to be especially proficient in Python, as well as other programming languages like C/C++, TypeScript, and Java.

What can I use it for?

This model could be useful for a variety of software development and programming tasks, such as:

  • Generating boilerplate code or code snippets
  • Assisting with programming problem-solving and debugging
  • Translating between different programming languages
  • Automating repetitive coding tasks

However, as the Phind team notes, the model has undergone limited testing, and additional safety measures should be taken before deploying it in real-world applications.

Things to try

One interesting aspect of this model is its use of instruction-tuning rather than traditional chat-based prompting. This makes it better suited for task-oriented interactions, where the user provides a clear request or instruction, rather than open-ended conversations. Experiment with providing the model with concise, well-defined programming tasks and see how it responds.


Phind-CodeLlama-34B-v2-GPTQ

TheBloke

Total Score

86

Phind-CodeLlama-34B-v2-GPTQ is a quantized version of Phind's CodeLlama 34B v2 model, prepared by the maintainer TheBloke, who also publishes the model in other quantization formats such as AWQ and GGUF. The GPTQ files come with multiple quantization parameter options to suit different hardware requirements and performance needs, letting users choose the best trade-off between model size, inference speed, and output quality for their use case. Similar models include the Phind-CodeLlama-34B-v2-GGUF, which provides 2-8 bit GGUF formats for CPU and GPU inference, and the Llama-2-13B-GPTQ, a quantized version of Meta's Llama 2 13B model.

Model inputs and outputs

Inputs

  • Text prompts: The model accepts text prompts as input, which can be used to generate continuations, complete tasks, or engage in conversations.

Outputs

  • Generated text: The model outputs generated text, ranging from short completions to long-form responses depending on the prompt and use case.

Capabilities

The model handles a wide range of natural language processing tasks, including code generation, question answering, summarization, and open-ended conversation. The underlying unquantized model achieves 73.8% pass@1 on the HumanEval benchmark, making this one of the most capable open-source options for programming-related tasks.

What can I use it for?

The Phind-CodeLlama-34B-v2-GPTQ model can be used for a variety of applications, such as:

  • Code generation and assistance: Generating, explaining, and debugging code snippets, and providing intelligent assistance for software developers.
  • Language modeling and generation: General-purpose language modeling, text generation, and conversational applications.
  • Transfer learning and fine-tuning: Further fine-tuning on domain-specific datasets to create specialized models for various NLP tasks.

Things to try

One interesting aspect of this model is its ability to generate high-quality code across multiple programming languages, including Python, C/C++, TypeScript, and Java. Developers can experiment with programming prompts and the generated code, using it to assist with tasks like prototyping, refactoring, or implementing new features; a loading sketch follows below. The model's strong performance on the HumanEval benchmark suggests it could be a valuable tool for automating certain programming workflows.
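Here is a minimal loading sketch using transformers, which can read GPTQ checkpoints when the optimum and auto-gptq packages are installed (treat the exact dependency set as an assumption to check against TheBloke's model card):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumes optimum + auto-gptq are installed so transformers can
    # load the GPTQ-quantized weights directly.
    model_id = "TheBloke/Phind-CodeLlama-34B-v2-GPTQ"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "### User Message\nWrite a binary search in Python.\n\n### Assistant\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))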


Phind-CodeLlama-34B-v2-GGUF

TheBloke

Total Score

158

Phind-CodeLlama-34B-v2-GGUF is Phind's CodeLlama 34B v2 model converted by TheBloke to the GGUF format. GGUF is a format introduced by the llama.cpp team that offers numerous advantages over the earlier GGML format, such as better tokenization and support for special tokens. The model has been quantized and optimized for efficient inference across the variety of hardware and software platforms that support GGUF.

Model inputs and outputs

Inputs

  • Text: The model takes text as input and can be used for a variety of natural language processing tasks.

Outputs

  • Text: The model generates text as output, making it useful for tasks like language generation, summarization, and question answering.

Capabilities

This is a powerful text-to-text model suited to a wide range of natural language processing tasks. It performs well on code generation, Q&A, and summarization, and the GGUF format allows efficient inference on a variety of hardware and software platforms.

What can I use it for?

The Phind-CodeLlama-34B-v2-GGUF model could be useful for a variety of applications, such as:

  • Content generation: Producing high-quality text content, such as articles, stories, or product descriptions.
  • Language assistance: Building language assistance tools, such as chatbots or virtual assistants, that help users with a variety of tasks.
  • Code generation: Building tools that generate or assist with code development, where the model's strong performance on code-related tasks is an asset.

Things to try

One interesting aspect of this model is its ability to handle a wide range of input formats and tasks: try it on text summarization, question answering, or even creative writing. Because the GGUF format allows efficient inference, you can also experiment with running the model on different hardware configurations to compare performance; a minimal sketch follows below.
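A minimal sketch using the llama-cpp-python bindings; the .gguf filename below is a guess based on TheBloke's usual naming scheme, so pick an actual file from the repository's file list:

    from llama_cpp import Llama

    # Hypothetical filename: substitute a real .gguf file from the repo.
    llm = Llama(
        model_path="phind-codellama-34b-v2.Q4_K_M.gguf",
        n_ctx=4096,       # context window size
        n_gpu_layers=-1,  # offload all layers to GPU if one is available
    )
    out = llm(
        "### User Message\nWrite FizzBuzz in TypeScript.\n\n### Assistant\n",
        max_tokens=256,
    )
    print(out["choices"][0]["text"])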
