llama-2-70b-Guanaco-QLoRA-fp16

Maintainer: TheBloke

Total Score

56

Last updated 5/28/2024

  • Model Link: View on HuggingFace
  • API Spec: View on HuggingFace
  • Github Link: No Github link provided
  • Paper Link: No paper link provided


Model overview

The llama-2-70b-Guanaco-QLoRA-fp16 model is a 70 billion parameter large language model based on Meta's Llama 2 70B. It was fine-tuned by Mikael110 on the Guanaco (OASST1) dataset using QLoRA (Quantized Low-Rank Adaptation), a technique that trains low-rank adapter weights on top of a 4-bit quantized base model to cut the memory cost of fine-tuning. The trained adapters have been merged back into the base model, and the result is provided in PyTorch format with 16-bit floating point precision for GPU inference.

This model is part of a family of Llama 2 models created by various contributors, with different quantization levels and model sizes. TheBloke has also provided GPTQ and GGML versions of this model for GPU and CPU/GPU inference respectively.
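
As a concrete starting point, the fp16 weights can be loaded with the Hugging Face transformers library. The sketch below is a minimal example, assuming the repository id TheBloke/llama-2-70b-Guanaco-QLoRA-fp16 and a machine with enough GPU memory for the 16-bit weights (roughly 140 GB, possibly spread across several GPUs via accelerate):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repository id for this model
model_id = "TheBloke/llama-2-70b-Guanaco-QLoRA-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # load the 16-bit weights as-is
    device_map="auto",          # requires `accelerate`; shards layers across available GPUs
)
```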

Model inputs and outputs

Inputs

  • Text: The model accepts text input, which can be in the form of a single prompt or a conversation-style exchange.

Outputs

  • Text: The model generates text as output, continuing the input prompt or providing a response in a conversational exchange.
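
To give a feel for the input and output format, the sketch below sends a prompt to the model loaded above and decodes the generated continuation. The "### Human: / ### Assistant:" template is the prompt format commonly used with Guanaco fine-tunes; the exact wording of the prompt is an illustrative assumption:

```python
# Guanaco-style prompt template (assumed): "### Human: ... ### Assistant:"
prompt = "### Human: Give a two-sentence summary of what QLoRA is.\n### Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Strip the prompt tokens and print only the newly generated text
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```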

Capabilities

The llama-2-70b-Guanaco-QLoRA-fp16 model is a large, powerful language model capable of a wide range of natural language tasks. It has been shown to perform well on benchmarks for commonsense reasoning, world knowledge, reading comprehension, and mathematics. The model can be used for tasks such as question answering, summarization, translation, and open-ended text generation.

What can I use it for?

This model can be used for a variety of natural language processing applications, such as building chatbots, virtual assistants, or content generation tools. Note that the unquantized fp16 weights of a 70 billion parameter model occupy roughly 140 GB, so GPU inference requires substantial hardware; for resource-constrained environments, the GPTQ and GGML quantized variants of the same model are the more practical choice.

Things to try

One interesting aspect of this model is the use of QLoRA, which allows for efficient finetuning and adaptation of the model to specific domains or use cases. Developers could explore finetuning the model on their own datasets or for specialized tasks, leveraging the powerful base capabilities of the Llama 2 architecture.
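
As a rough illustration of what a QLoRA fine-tuning setup looks like, the sketch below configures 4-bit quantization with bitsandbytes and attaches LoRA adapters with the peft library. The base model id, LoRA hyperparameters, and target modules are illustrative assumptions, not the exact settings used for this Guanaco fine-tune:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Assumed base model; access to the Llama 2 weights is gated on Hugging Face
base_id = "meta-llama/Llama-2-70b-hf"

# 4-bit NF4 quantization of the frozen base model, as used by QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters are the only trainable weights (hyperparameters are illustrative)
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# From here, the model can be passed to a standard Trainer / SFTTrainer loop
# on a domain-specific instruction dataset.
```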



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models


Llama-2-13B-fp16

TheBloke

Total Score

57

The Llama-2-13B-fp16 model is a 13 billion parameter language model created by Meta and maintained by TheBloke. It is a transformer-based autoregressive model that was pretrained on a mix of publicly available online data. TheBloke has converted the original PyTorch model to Hugging Face format and provides multiple quantized versions for efficient inference. Similar models maintained by TheBloke include the Llama-2-13B-GPTQ, which offers GPTQ quantized versions, and the CodeLlama-13B-fp16, a version optimized for code generation tasks.

Model inputs and outputs

Inputs

  • Text: The model accepts single-line text prompts as input.

Outputs

  • Text: The model generates text continuations in an autoregressive fashion.

Capabilities

The Llama-2-13B-fp16 model can be used for a variety of natural language generation tasks, such as open-ended story writing, summarization, and question answering. Its 13 billion parameters provide strong language understanding and text generation capabilities. The model has been evaluated on standard benchmarks and shows competitive performance compared to other large language models.

What can I use it for?

The Llama-2-13B-fp16 model can be used for commercial and research applications that require natural language generation. For example, you could integrate it into a chatbot or virtual assistant to provide engaging and informative responses. The model could also be fine-tuned on domain-specific data to create specialized language models for tasks like customer service, technical writing, or creative writing.

Things to try

One interesting aspect of the Llama-2-13B-fp16 model is its ability to handle different input formats and task specifications. You could experiment with providing the model with structured prompts, such as those used in the Llama-2-Chat variant, to see if it can adapt its behavior to be more helpful and safe. Additionally, the quantized versions of the model provided by TheBloke offer different performance and resource tradeoffs, so you could benchmark them on your specific hardware and use case to find the best balance.
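
The card above suggests experimenting with structured prompts borrowed from the Llama-2-Chat variant. A minimal sketch of that prompt wrapper is shown below; the helper name and the example system/user strings are illustrative, and the base Llama-2-13B-fp16 model is not instruction-tuned, so results may vary:

```python
def llama2_chat_prompt(system_msg: str, user_msg: str) -> str:
    """Build a prompt in the Llama-2-Chat instruction format (hypothetical helper)."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_msg}\n"
        "<</SYS>>\n\n"
        f"{user_msg} [/INST]"
    )

prompt = llama2_chat_prompt(
    "You are a helpful, concise assistant.",
    "Explain the difference between fp16 and GPTQ model files.",
)
print(prompt)
```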



Llama-2-70B-GGML

TheBloke

Total Score

73

The Llama-2-70B-GGML is a large language model (LLM) created by Meta and maintained by TheBloke. It is part of the Llama 2 family of models, which range in size from 7 billion to 70 billion parameters. The GGML version of this 70B model is optimized for CPU and GPU inference using the llama.cpp library and related tools and UIs. Similar models maintained by TheBloke include the Llama-2-7B-GGML, Llama-2-13B-GGML, and the Llama-2-70B-Chat-GGML model, which is optimized for chat use cases.

Model inputs and outputs

Inputs

  • Text: The Llama-2-70B-GGML model takes text as its input.

Outputs

  • Text: The model generates text as its output.

Capabilities

The Llama-2-70B-GGML model can be used for a variety of natural language processing tasks, including text generation, summarization, and question answering. It has shown strong performance on academic benchmarks, particularly in areas like commonsense reasoning and world knowledge.

What can I use it for?

With its large scale and broad capabilities, the Llama-2-70B-GGML model could be useful for a wide range of applications, such as:

  • Chatbots and virtual assistants
  • Content generation for marketing, journalism, or creative writing
  • Summarization of long-form text
  • Question answering and knowledge retrieval
  • Fine-tuning on specific tasks or domains

Things to try

One interesting aspect of the Llama-2-70B-GGML model is its support for different quantization methods, which allow for tradeoffs between model size, inference speed, and accuracy. Users can experiment with the various GGML files provided by TheBloke to find the right balance for their specific use case.

Another thing to try is integrating the model with the llama.cpp library, which enables efficient CPU and GPU inference. This can be particularly useful for deploying the model in production environments or on resource-constrained devices.
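
For the llama.cpp route mentioned above, one option is the llama-cpp-python bindings. The sketch below assumes a locally downloaded GGML file (the filename is illustrative) and an older release of llama-cpp-python that still reads GGML .bin files, since current releases expect the newer GGUF format:

```python
from llama_cpp import Llama  # pip install llama-cpp-python (a GGML-era release)

llm = Llama(
    model_path="llama-2-70b.ggmlv3.q4_K_M.bin",  # assumed local path to one of the GGML files
    n_ctx=2048,       # context window
    n_threads=8,      # CPU threads for inference
    n_gpu_layers=40,  # offload some layers to GPU if built with cuBLAS/Metal support
    n_gqa=8,          # grouped-query attention setting the 70B GGML models required in this era
)

out = llm(
    "Explain in one paragraph what GGML quantization does.",
    max_tokens=200,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```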



guanaco-33B-GGML

TheBloke

Total Score

61

The guanaco-33B-GGML model is a 33B parameter AI language model created by Tim Dettmers and maintained by TheBloke. It is based on the LLaMA transformer architecture and has been fine-tuned on the OASST1 dataset to improve its conversational abilities. The model is available in a variety of quantized GGML formats for efficient CPU and GPU inference using libraries like llama.cpp and text-generation-webui.

Model inputs and outputs

Inputs

  • Prompt: The model takes a text prompt as input, which can be a question, statement, or instructions for the model to respond to.

Outputs

  • Textual response: The model generates a textual response based on the provided prompt. The response can be a continuation of the prompt, an answer to a question, or a completion of the given instructions.

Capabilities

The guanaco-33B-GGML model has strong conversational and language generation capabilities. It can engage in open-ended dialogue, answer questions, and complete a variety of text-based tasks. The model has been shown to perform well on benchmarks like Vicuna and OpenAssistant, rivaling the performance of commercial chatbots like ChatGPT.

What can I use it for?

The guanaco-33B-GGML model can be used for a wide range of natural language processing tasks, such as chatbots, virtual assistants, content generation, and language-based applications. Its large size and strong performance make it a versatile tool for developers and researchers working on text-based AI projects. The model's open-source nature also allows for further fine-tuning and customization to meet specific needs.

Things to try

One interesting thing to try with the guanaco-33B-GGML model is to experiment with the various quantization options provided, such as the q2_K, q3_K_S, q4_K_M, and q5_K_S formats. These different quantization levels offer trade-offs between model size, inference speed, and accuracy, allowing users to find the best balance for their specific use case and hardware constraints.
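
One way to explore that quantization trade-off is to time the same prompt against two different GGML files. The sketch below is a rough benchmark, assuming two locally downloaded quantizations of guanaco-33B (the filenames are illustrative) and a GGML-compatible release of llama-cpp-python:

```python
import time
from llama_cpp import Llama

# Assumed local filenames for two quantization levels of guanaco-33B
quant_files = {
    "q4_K_M": "guanaco-33B.ggmlv3.q4_K_M.bin",
    "q5_K_S": "guanaco-33B.ggmlv3.q5_K_S.bin",
}

prompt = "### Human: Explain the trade-off between model size and accuracy.\n### Assistant:"

for name, path in quant_files.items():
    llm = Llama(model_path=path, n_ctx=2048, n_threads=8)
    start = time.time()
    out = llm(prompt, max_tokens=128)
    elapsed = time.time() - start
    generated = out["usage"]["completion_tokens"]
    print(f"{name}: {generated / elapsed:.1f} tokens/sec")
```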



Llama-2-7B-GGML

TheBloke

Total Score

214

The Llama-2-7B-GGML is a variant of Meta's Llama 2 language model, created by the maintainer TheBloke. This 7 billion parameter model has been optimized for CPU and GPU inference using the GGML format. It is part of a collection of Llama 2 models ranging from 7 billion to 70 billion parameters, with both pretrained and fine-tuned versions available; the fine-tuned chat variants are optimized for dialogue use cases. Similar models include the Llama-2-13B-GGML and Llama-2-7B-Chat-GGML, which offer different parameter sizes and optimizations.

Model inputs and outputs

Inputs

  • Text: The Llama-2-7B-GGML model takes text as input.

Outputs

  • Text: The model generates text as output.

Capabilities

The Llama-2-7B-GGML model is capable of a wide range of natural language generation tasks, including dialogue, summarization, and content creation. The Llama 2 family has been shown to outperform many open-source chat models on benchmarks, and its chat variants can provide helpful and safe responses on par with some popular closed-source models.

What can I use it for?

You can use the Llama-2-7B-GGML model for a variety of commercial and research applications, such as building AI assistants, content generation tools, and language understanding systems. The fine-tuned chat version is particularly well-suited for conversational AI use cases.

Things to try

Try prompting the Llama-2-7B-GGML model with open-ended questions or instructions to see its versatility in generating coherent and contextual responses. You can also experiment with different temperature and sampling settings to influence the creativity and diversity of the output.
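
To follow the suggestion above about temperature and sampling settings, the sketch below runs the same prompt under a conservative and a more exploratory configuration. It assumes a locally downloaded GGML file (the filename is illustrative) and a GGML-compatible llama-cpp-python release:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b.ggmlv3.q4_K_M.bin",  # assumed local path
    n_ctx=2048,
)

prompt = "Write a short paragraph about open-source language models."

# Compare a conservative sampling setting with a more creative one
for temperature, top_p in [(0.2, 0.9), (1.0, 0.95)]:
    out = llm(
        prompt,
        max_tokens=120,
        temperature=temperature,
        top_p=top_p,
        repeat_penalty=1.1,
    )
    print(f"--- temperature={temperature}, top_p={top_p} ---")
    print(out["choices"][0]["text"].strip())
```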
