The various-2bit-sota-gguf model is a collection of models quantized using a new 2-bit approach developed by the maintainer ikawrakow. These models are intended for use with the llama.cpp library, which requires a specific PR to be merged. The models come in a range of bit-width configurations, from 2-bit to 8-bit, allowing tradeoffs between model size, speed, and quality. Compared to similar 2-bit models such as Llama-2-7B-GGUF, the various-2bit-sota-gguf models offer lower quantization error at the expense of being slightly larger.

Model inputs and outputs

Inputs
- Text input only

Outputs
- Text output only

Capabilities

The various-2bit-sota-gguf models can handle a variety of text-to-text tasks, such as natural language generation, language translation, and text summarization. Their performance depends on the specific bit-width configuration chosen, with higher bit widths generally offering better quality at the cost of a larger model.

What can I use it for?

The various-2bit-sota-gguf models can be used for a range of commercial and research applications that involve text generation, such as chatbots, content creation, and language modeling. The maintainer provides GGUF versions of these models that are compatible with the llama.cpp library, as well as other popular frameworks and UIs such as text-generation-webui and LangChain.

Things to try

Experiment with the different bit-width configurations to find the right balance of model size, speed, and quality for your use case. You can also try fine-tuning the models on your own data to further improve performance on your task of interest.
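To get a feel for the size side of the size/speed/quality tradeoff, a rough back-of-envelope estimate can help. The sketch below is a minimal illustration, not a measurement of the actual various-2bit-sota-gguf files: the 7B parameter count and the exact bits-per-weight values are assumptions, and real GGUF files are somewhat larger because metadata and some tensors are stored at higher precision.

```python
def estimate_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB for n_params weights stored at
    bits_per_weight bits each (ignores metadata and mixed-precision tensors)."""
    return n_params * bits_per_weight / 8 / 1e9

# A hypothetical 7B-parameter model at a few bit widths:
for bpw in (2.0, 4.0, 8.0):
    print(f"{bpw:.1f} bpw -> ~{estimate_gguf_size_gb(7e9, bpw):.1f} GB")
```

As the estimate suggests, dropping from 8-bit to 2-bit shrinks the weight storage roughly fourfold, which is why 2-bit quantization is attractive when memory is the bottleneck and some quality loss is acceptable.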


Updated 5/28/2024