llama-3-8b-bnb-4bit

Maintainer: unsloth

Total Score: 112

Last updated 5/28/2024


  • Model Link: View on HuggingFace
  • API Spec: View on HuggingFace
  • Github Link: No Github link provided
  • Paper Link: No paper link provided


Model overview

The llama-3-8b-bnb-4bit model is a version of the Meta Llama 3 language model that has been quantized to 4-bit precision using the bitsandbytes library. This model was created by the maintainer unsloth and is designed to provide faster finetuning and lower memory usage compared to the original Llama 3 model.

The maintainer has also created quantized 4-bit versions of other large language models like Gemma 7b, Mistral 7b, Llama-2 7b, and TinyLlama, all of which can be finetuned 2-5x faster with 43-74% less memory usage.
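Because the checkpoint already stores its bitsandbytes 4-bit quantization config, loading it should not require any extra quantization arguments. Below is a minimal sketch using the Hugging Face transformers API; it assumes a CUDA GPU with the bitsandbytes and accelerate packages installed, and the prompt is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/llama-3-8b-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The checkpoint ships with its bitsandbytes 4-bit config, so the weights
# load in 4-bit automatically; device_map="auto" places them on the GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "4-bit quantization reduces memory usage because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```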

Model inputs and outputs

Inputs

  • Natural language text prompts

Outputs

  • Natural language text continuations and completions

Capabilities

The llama-3-8b-bnb-4bit model can be used for a variety of text generation tasks, such as language modeling, text summarization, and question answering. The maintainer has provided examples of using this model to finetune on custom datasets and export the resulting models for use in other applications.
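As a rough illustration of that finetuning workflow, here is a minimal LoRA sketch combining unsloth's FastLanguageModel with trl's SFTTrainer. The hyperparameters and the my_data.jsonl file (assumed to contain one JSON record per line with a "text" field) are placeholders, and argument names may differ across unsloth/trl versions:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the pre-quantized 4-bit base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical dataset: one JSON record per line with a "text" field.
dataset = load_dataset("json", data_files="my_data.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```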

What can I use it for?

The llama-3-8b-bnb-4bit model can be a useful starting point for a wide range of natural language processing projects that require a large language model with reduced memory and faster finetuning times. For example, you could use this model to build chatbots, content generation tools, or other applications that rely on text-based AI. The maintainer has also provided a Colab notebook to help get you started with finetuning the model.
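After finetuning, whether in the Colab notebook or with a script like the sketch above, the resulting model can be exported for use in other applications. The save_pretrained_merged and save_pretrained_gguf helper names below follow unsloth's documented workflow but should be treated as assumptions that may change between versions:

```python
# Save just the LoRA adapters (small and fast to share).
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

# Merge the adapters into the base weights for standalone use, or export
# to GGUF for llama.cpp-based runtimes (names per unsloth's docs).
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")
```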

Things to try

One interesting aspect of the llama-3-8b-bnb-4bit model is its ability to be finetuned quickly and efficiently. This could make it a good choice for quickly iterating on new ideas or testing different approaches to a problem. Additionally, the reduced memory usage of the 4-bit quantized model could allow you to run it on less powerful hardware, opening up more opportunities to experiment and deploy your models.
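One quick way to verify the memory claim on your own hardware is transformers' built-in footprint report; this is only a sketch, and the exact number will vary by GPU and library version:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit", device_map="auto"
)
# get_memory_footprint() reports the memory used by parameters and
# buffers, in bytes; compare against a full-precision 8B checkpoint.
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```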



This summary was produced with help from an AI and may contain inaccuracies; check out the links to read the original source documents!

Related Models


llama-3-8b-Instruct-bnb-4bit

Maintainer: unsloth

Total Score: 79

The llama-3-8b-Instruct-bnb-4bit model is a 4-bit quantized version of the Llama-3 8B Instruct model, created by the maintainer unsloth. The model is quantized with the bitsandbytes library, allowing for faster finetuning and inference with 70% less memory usage compared to the original Llama-3 8B model. The maintainer has also provided 4-bit versions of other large language models like Gemma 7B, Mistral 7B, and Llama-2 7B, all of which see similar speed and memory improvements. Similar models include the Llama2-7b-chat-hf_1bitgs8_hqq model, a 1-bit quantized version of the Llama-2 7B chat model that uses a low-rank adapter, and the 2-bit-LLMs collection, which contains 2-bit quantized versions of various large language models.

Model inputs and outputs

Inputs

  • Text prompts: natural language text that the model uses to generate relevant outputs

Outputs

  • Text completions: coherent and contextually appropriate continuations of the input prompts

Capabilities

The llama-3-8b-Instruct-bnb-4bit model has been finetuned for instruction following and can perform a wide variety of language tasks, such as question answering, summarization, and task completion. Due to its reduced memory footprint, the model can be deployed on lower-resource hardware while still maintaining good performance.

What can I use it for?

The model can be used for a variety of natural language processing applications, such as building chatbots, virtual assistants, and content generation tools. The maintainer has provided Colab notebooks to help users get started with finetuning the model on their own datasets, allowing for the creation of customized language models for specific use cases.

Things to try

One interesting aspect of this model is how quickly and efficiently it can be finetuned, thanks to the 4-bit quantization and the bitsandbytes library. Users can experiment with finetuning on their own datasets to create specialized language models tailored to their needs, while still benefiting from the speed and memory improvements over the original Llama-3 8B model.
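Since this is the instruction-tuned variant, prompts should be wrapped in Llama-3's chat format rather than passed as raw text. A minimal sketch using transformers' chat-template helper (the question is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/llama-3-8b-Instruct-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Explain 4-bit quantization in two sentences."},
]
# apply_chat_template wraps the conversation in Llama-3's chat markup and
# appends the assistant header so the model starts its reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```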



Llama-2-7b-chat-hf_1bitgs8_hqq

Maintainer: mobiuslabsgmbh

Total Score: 73

The Llama-2-7b-chat-hf_1bitgs8_hqq model is an experimental 1-bit quantized version of the Llama-2 7B chat model that uses a low-rank adapter to improve performance. Quantizing small models at such extreme low bit-widths is a challenging task, and the purpose of this model is to show the community what to expect when finetuning such models. The HQQ+ approach, which pairs a 1-bit matmul with a low-rank adapter, helps the 1-bit base model outperform the 2-bit QuIP# model after finetuning on a small dataset.

Model inputs and outputs

Inputs

  • Text prompts

Outputs

  • Generative text responses

Capabilities

The Llama-2-7b-chat-hf_1bitgs8_hqq model is capable of producing human-like text responses to prompts, with performance that approaches more resource-intensive models like ChatGPT and PaLM. Despite being heavily quantized to just 1-bit weights, the model can still achieve strong results on benchmarks like MMLU, ARC, HellaSwag, and TruthfulQA when finetuned on relevant datasets.

What can I use it for?

The model can be used for a variety of natural language generation tasks, such as chatbots, question-answering systems, and content creation. Its small size and efficient quantization make it well suited for deployment on edge devices or in resource-constrained environments. Developers could integrate this model into applications that require a helpful, honest, and safe AI assistant.

Things to try

Experiment with finetuning the model on datasets relevant to your use case. The maintainers list the example datasets used for the chat model, including timdettmers/openassistant-guanaco, microsoft/orca-math-word-problems-200k, and meta-math/MetaMathQA. Try evaluating the model's performance on different benchmarks to see how the 1-bit quantization affects its capabilities.
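Loading this checkpoint goes through the hqq library rather than plain transformers. The sketch below follows the hqq project's documented loading pattern at the time this model was released; the API has evolved since, so treat the exact imports and call signature as assumptions:

```python
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer

model_id = "mobiuslabsgmbh/Llama-2-7b-chat-hf_1bitgs8_hqq"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# from_quantized restores the 1-bit weights; the low-rank (HQQ+) adapter
# that recovers quality ships alongside the checkpoint.
model = HQQModelForCausalLM.from_quantized(model_id)
```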



Meta-Llama-3-8B

Maintainer: NousResearch

Total Score: 76

The Meta-Llama-3-8B model is part of the Meta Llama 3 family of large language models (LLMs) developed and released by Meta. This collection of pretrained and instruction-tuned generative text models comes in 8B and 70B parameter sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many available open-source chat models on common industry benchmarks. Meta took great care to optimize helpfulness and safety when developing these models. The Meta-Llama-3-70B and Meta-Llama-3-8B-Instruct are other models in the Llama 3 family: the 70B parameter model provides higher performance than the 8B, while the 8B Instruct model is optimized for assistant-like chat.

Model inputs and outputs

Inputs

  • The Meta-Llama-3-8B model takes text input only.

Outputs

  • The model generates text and code output.

Capabilities

The Meta-Llama-3-8B demonstrates strong performance on a variety of natural language processing benchmarks, including general knowledge, reading comprehension, and task-oriented dialogue. It excels at following instructions and engaging in open-ended conversations.

What can I use it for?

The Meta-Llama-3-8B is intended for commercial and research use in English. The instruction-tuned version is well suited for building assistant-like chat applications, while the pretrained model can be adapted for a range of natural language generation tasks. Developers can use Llama Guard and the other Purple Llama tools to enhance the safety and reliability of applications built on this model.

Things to try

A clear strength of the Meta-Llama-3-8B model is its ability to engage in open-ended, task-oriented dialogue. Developers can build conversational interfaces that use the model's instruction-following capabilities to complete a wide variety of tasks. Additionally, the model's strong grounding in general knowledge makes it well suited for building information lookup tools and knowledge bases.
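For the pretrained (non-instruct) 8B model, a plain text-completion call is the natural starting point. A minimal sketch with the transformers pipeline API; the prompt is illustrative, and the full-precision 8B weights need roughly 16 GB of memory:

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="NousResearch/Meta-Llama-3-8B",
    device_map="auto",
)

# Base models continue text rather than follow instructions, so phrase
# the prompt as something to be completed.
result = generator("The Llama 3 family of models was designed to", max_new_tokens=64)
print(result[0]["generated_text"])
```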



2-bit-LLMs

Maintainer: KnutJaegersberg

Total Score: 93

The 2-bit-LLMs collection contains large language models (LLMs) quantized to 2 bits using llama.cpp's 2-bit quantization scheme, an approach inspired by the QuIP# method. The collection includes a variety of LLMs ranging from 70B to 120B parameters, such as Senku-70b, Nous-Hermes2-70b, and Miquliz-120b-v2.0. The maintainer, KnutJaegersberg, has provided these models for public use. Some of the models, like Qwen-72b, have very large context lengths that may exceed the memory capacity of most GPUs, so users will need to reduce the context length when running them.

Model inputs and outputs

Inputs

  • Text prompts

Outputs

  • Generated text, usable for a variety of natural language processing tasks such as language generation, question answering, and chatbots

Capabilities

The 2-bit-LLMs models are capable of producing human-like text across a wide range of topics. They have shown strong performance on benchmarks like GPT4All and BigBench, achieving state-of-the-art results on tasks like reading comprehension, commonsense reasoning, and math problem solving. The models also exhibit lower hallucination rates and do not have the same censorship mechanisms as some other LLMs.

What can I use it for?

The 2-bit-LLMs models can be used for a variety of natural language processing applications, such as:

  • Chatbots and conversational AI: the models can be finetuned for open-ended dialogue and used to build chatbots and virtual assistants.
  • Content generation: the models can generate creative text, such as stories, poems, or articles, across a wide range of topics.
  • Question answering: the models can answer questions on a variety of subjects, drawing on their broad knowledge base.
  • Task completion: the models can understand and follow complex instructions, making them useful for automating various workflows and processes.

Things to try

One thing to experiment with is the context length. The Qwen-72b model's full context window may exceed the memory capacity of most GPUs, so you will need to reduce it to find the right balance between capability and resource usage (see the sketch below). Another thing to explore is the impact of the 2-bit quantization itself: compressing large language models to 2 bits is a significant technical challenge, and it is worth comparing these models' performance and capabilities against other quantized or compressed LLMs.
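Since these 2-bit checkpoints target the llama.cpp runtime, the context length is set at load time. A sketch using the llama-cpp-python bindings; the GGUF filename is illustrative, and the right n_ctx value depends on your hardware:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="qwen-72b-2bit.gguf",  # illustrative filename
    n_ctx=4096,        # reduced context window to fit in memory
    n_gpu_layers=-1,   # offload all layers to the GPU if possible
)

out = llm("Q: Why reduce the context length? A:", max_tokens=64)
print(out["choices"][0]["text"])
```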
