MAMBA (2.8B) fine-tuned on OpenHermes
=====================================

![mamba-hermes logo](https://huggingface.co/clibrain/mamba-2.8b-instruct-openhermes/resolve/main/mamba_hermes_logo_1.png?download=true)

Model Card is still WIP!

Base model info
---------------

Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It is based on the line of progress on [structured state space models](https://github.com/state-spaces/s4), with an efficient hardware-aware design and implementation in the spirit of [FlashAttention](https://github.com/Dao-AILab/flash-attention).
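To make the "state space model" idea concrete, here is a minimal sketch of a linear state space recurrence in plain Python. It is a toy under stated assumptions: a single scalar state with fixed parameters `a`, `b`, `c` (names chosen here for illustration). Mamba itself makes its parameters depend on the input and runs a hardware-aware scan on the GPU, so this is not the library's implementation, only the underlying recurrence.

```python
def ssm_scan(a, b, c, xs):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x   # constant work per token, independent of history length
        ys.append(c * h)
    return ys


# An impulse at the first step decays geometrically through the state.
outputs = ssm_scan(a=0.5, b=1.0, c=2.0, xs=[1.0, 0.0, 0.0])
print(outputs)  # [2.0, 1.0, 0.5]
```

Each step touches only the fixed-size state `h`, which is why models of this family scale subquadratically with sequence length.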

Dataset info
------------

OpenHermes 13B is the first Hermes fine-tune with a fully open-source dataset.

The OpenHermes dataset is composed of 242,000 entries of primarily GPT-4 generated data, from open datasets across the AI landscape, including:

*   GPTeacher - General Instruct, Roleplay v1, Roleplay v2, and Code Instruct Datasets, by Teknium
*   WizardLM (v1, evol\_instruct 70k), by WizardLM Team/nlpxucan
*   Airoboros GPT-4 (v1.0), by JonDurbin
*   Camel-AI's domain expert datasets, by the Camel-AI Team
*   CodeAlpaca, by Sahil2801
*   GPT4-LLM and Unnatural Instructions, by Microsoft

Filtering included removal of OpenAI refusals, disclaimers, "As an AI" type examples, and more. The base dataset mix is identical to the original Nous-Hermes', minus the Nous-Instruct and PDACTL datasets, which were private.
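
The filtering step described above can be sketched as a simple phrase-based pass. This is an illustration only: the marker phrases and the `instruction`/`output` field names are assumptions made here, since the actual filter list and schema are not given in this card.

```python
# Hypothetical marker phrases; the real filter list is not published in this card.
REFUSAL_MARKERS = ("as an ai", "i'm sorry, but", "i cannot fulfill")


def is_clean(entry):
    """Return True if the response contains none of the marker phrases."""
    response = entry["output"].lower()
    return not any(marker in response for marker in REFUSAL_MARKERS)


def filter_entries(entries):
    """Keep only entries whose response passes the phrase check."""
    return [entry for entry in entries if is_clean(entry)]


sample = [
    {"instruction": "Name a Spanish city.", "output": "Seville."},
    {"instruction": "Tell a joke.",
     "output": "As an AI language model, I cannot tell jokes."},
]
kept = filter_entries(sample)
print(len(kept))  # 1
```

A substring check like this is coarse; it is shown only to make the idea of removing refusal-style examples concrete.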

Usage
-----

    pip install torch==2.1.0 transformers==4.35.0 causal-conv1d==1.0.0 mamba-ssm==1.0.1
    

    import torch
    from transformers import AutoTokenizer
    from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

    # The chat template is borrowed from this tokenizer.
    CHAT_TEMPLATE_ID = "HuggingFaceH4/zephyr-7b-beta"

    # mamba-ssm relies on CUDA kernels, so the CPU fallback is unlikely to work.
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model_name = "clibrain/mamba-2.8b-instruct-openhermes"

    # Load the tokenizer and set the EOS/pad tokens and the chat template.
    eos_token = "<|endoftext|>"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.eos_token = eos_token
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.chat_template = AutoTokenizer.from_pretrained(CHAT_TEMPLATE_ID).chat_template

    # Load the fine-tuned Mamba weights in half precision.
    model = MambaLMHeadModel.from_pretrained(
        model_name, device=device, dtype=torch.float16
    )

    # Build the conversation as a list of role-tagged messages.
    prompt = "Tell me 5 sites to visit in Spain"
    messages = [dict(role="user", content=prompt)]

    # Render the conversation with the chat template and tokenize it.
    input_ids = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(device)

    out = model.generate(
        input_ids=input_ids,
        max_length=2000,
        temperature=0.9,
        top_p=0.7,
        eos_token_id=tokenizer.eos_token_id,
    )

    # Keep only the text after the last assistant tag and drop the EOS token.
    decoded = tokenizer.batch_decode(out)
    assistant_message = (
        decoded[0].split("<|assistant|>\n")[-1].replace(eos_token, "")
    )

    print(assistant_message)
    

Gradio Demo
-----------

    git clone https://github.com/mrm8488/mamba-chat.git
    cd mamba-chat
    
    pip install -r requirements.txt
    pip install -q gradio==4.8.0
    
    python app.py \
    --model clibrain/mamba-2.8b-instruct-openhermes \
    --share
    

Evaluations
-----------

Coming soon!

Acknowledgments
---------------

Thanks to [mamba-chat](https://github.com/havenhq/mamba-chat/tree/main) for heavily inspiring our work.

## Model overview

`mamba-2.8b-instruct-openhermes` is a 2.8B-parameter Mamba language model developed by clibrain and fine-tuned on the OpenHermes dataset: 242,000 entries of primarily GPT-4 generated data from sources such as GPTeacher, WizardLM, Airoboros GPT-4, and Camel-AI's domain expert datasets. Unlike the Transformer-based [OpenHermes-2.5-Mistral-7B](https://aimodels.fyi/models/huggingFace/openhermes-25-mistral-7b-teknium), it uses the Mamba state space architecture, which shows promising performance on language modeling tasks.

Related models include [OpenHermes-2.5-Mistral-7B](https://aimodels.fyi/models/huggingFace/openhermes-25-mistral-7b-teknium), [Nous-Hermes-Llama2-7b](https://aimodels.fyi/models/huggingFace/nous-hermes-llama2-7b-nousresearch), [Nous-Hermes-Llama2-13b](https://aimodels.fyi/models/huggingFace/nous-hermes-llama2-13b-nousresearch), and [NeuralHermes-2.5-Mistral-7B](https://aimodels.fyi/models/huggingFace/neuralhermes-25-mistral-7b-mlabonne). These are Transformer-based models from the Hermes family, built on Mistral or Llama 2 rather than on Mamba.

## Model inputs and outputs

The `mamba-2.8b-instruct-openhermes` model is a text-to-text language model, taking in natural language prompts and generating relevant responses.

### Inputs
- **Prompt**: Natural language prompts or instructions for the model to generate a relevant response.

### Outputs
- **Text response**: The model's generated response to the input prompt, which can range from short answers to longer, more elaborate text.

## Capabilities

The `mamba-2.8b-instruct-openhermes` model targets text generation, question answering, and instruction following. Benchmark results have not been published yet (see Evaluations), so comparisons with other Hermes models are not available.

## What can I use it for?

The `mamba-2.8b-instruct-openhermes` model can be used for chatbots and virtual assistants, content generation, and general instruction-following tasks. Its fine-tuning on a broad mix of instruction datasets makes it a general-purpose model rather than a domain specialist.

## Things to try

One thing to explore with the `mamba-2.8b-instruct-openhermes` model is multi-turn conversation. The usage example formats prompts with the chat template from `HuggingFaceH4/zephyr-7b-beta`, so conversations are passed as a list of role-tagged messages. Developers can experiment with a system message to set the model's persona and instructions, then extend the message list turn by turn to see the range of its capabilities.
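
The Usage code splits replies on `<|assistant|>\n`, which points to a tag-per-role prompt layout. Below is a minimal sketch of such a layout with a system message, written in plain Python so the prompt structure is visible. The `<|system|>` and `<|user|>` tags and the exact whitespace are assumptions modeled on that layout, and `render_chat` and `extract_reply` are hypothetical helpers; for real inference, `tokenizer.apply_chat_template` remains the source of truth.

```python
EOS_TOKEN = "<|endoftext|>"  # EOS token set in the Usage section


def render_chat(messages, eos_token=EOS_TOKEN, add_generation_prompt=True):
    """Approximate a tag-per-role chat layout (assumed, not the real template)."""
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos_token}\n")
    if add_generation_prompt:
        parts.append("<|assistant|>\n")  # cue the model to answer next
    return "".join(parts)


def extract_reply(decoded, eos_token=EOS_TOKEN):
    """Return the text after the last assistant tag, without the EOS token."""
    return decoded.split("<|assistant|>\n")[-1].replace(eos_token, "")


messages = [
    {"role": "system", "content": "You are a concise travel guide."},
    {"role": "user", "content": "Tell me 5 sites to visit in Spain"},
]
prompt = render_chat(messages)
print(prompt)

# After generation, append the reply as an assistant turn and add the next
# user message to continue the conversation.
reply = extract_reply(prompt + "The Alhambra, ..." + EOS_TOKEN)
messages.append({"role": "assistant", "content": reply})
```

Keeping the conversation as a message list and re-rendering it on every turn is the simplest way to carry context between turns.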