Training
---------------------

*   8x A6000s
*   [Forked version of unsloth](https://github.com/serp-ai/unsloth) for efficient training
*   Sequence Length: 4096
*   Effective batch size: 128
*   Learning Rate: 2e-5 with linear decay
*   Epochs: 1
*   [Base model](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) trained with QLoRA (rank 64, alpha 16); MoE adapters and routers trained in bf16
*   Num Experts: 16
*   Top K: 4
*   Adapter Dim: 512
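
The hyperparameters above imply a few derived quantities, worked out below in plain Python. This is a sketch under stated assumptions: it uses the conventional LoRA scaling of alpha / rank and a linear schedule that decays to zero with no warmup. The per-device batch size of 4 and the step count of 1000 are hypothetical, chosen only so the arithmetic reaches the stated effective batch size of 128.

```python
# Values taken from the training list above.
NUM_GPUS = 8
EFFECTIVE_BATCH_SIZE = 128
PEAK_LR = 2e-5
LORA_RANK = 64
LORA_ALPHA = 16

# Assumption: the per-device micro-batch size is not stated in the model card.
PER_DEVICE_BATCH_SIZE = 4


def grad_accum_steps(effective, num_gpus, per_device):
    """Gradient accumulation steps needed to reach the effective batch size."""
    samples_per_step = num_gpus * per_device
    if effective % samples_per_step != 0:
        raise ValueError("effective batch size must be a multiple of num_gpus * per_device")
    return effective // samples_per_step


def linear_decay_lr(step, total_steps, peak_lr=PEAK_LR):
    """Learning rate decaying linearly from peak_lr at step 0 to 0 at total_steps."""
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / total_steps


# Conventional LoRA scaling applied to the low-rank update: alpha / rank.
lora_scale = LORA_ALPHA / LORA_RANK

accum = grad_accum_steps(EFFECTIVE_BATCH_SIZE, NUM_GPUS, PER_DEVICE_BATCH_SIZE)
print(f"gradient accumulation steps: {accum}")
print(f"LoRA scale (alpha / rank): {lora_scale}")
print(f"LR halfway through 1000 steps: {linear_decay_lr(500, 1000):.1e}")
```

With alpha below the rank, the scale is less than 1, so the low-rank update is damped relative to the frozen base weights.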

Prompt Format
-------------------------------

    <|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n
    

Usage
---------------

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2", device_map="auto", trust_remote_code=True).eval()
    
    system_str = "<|im_start|>system\n{message}<|im_end|>\n"
    user_str = "<|im_start|>user\n{message}<|im_end|>\n"
    assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"
    
    def construct_prompt(messages):
        prompt = ""
        for message in messages:
            if message["from"] in ["human", "user"]:
                prompt += user_str.format(
                    message=message["value"]
                )
            elif message["from"] in ["gpt", "assistant"]:
                prompt += assistant_str.format(
                    message=message["value"]
                )
            elif message["from"] in ["system", "instruction"]:
                prompt += system_str.format(
                    message=message["value"]
                )
            else:
                raise ValueError(
                    f"Unknown message type: {message['from']}"
                )
        return prompt + "<|im_start|>assistant\n"
    
    system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
    user = "Are you sentient?"
    
    messages = [
        {"from": "system", "value": system},
        {"from": "user", "value": user},
    ]
    
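    prompt = construct_prompt(messages)
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = inputs.to(model.device)
    # Sample one completion; max_length counts prompt tokens plus generated tokens.
    pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
    # The decoded text contains the prompt followed by the model's reply.
    print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))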
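
Many chat clients store messages as `role`/`content` dictionaries rather than the `from`/`value` keys that `construct_prompt` reads. The sketch below is a small adapter between the two, and it renders the same prompt template without loading the model. The names `to_sharegpt` and `render_prompt` are hypothetical helpers for illustration, not part of the model repository.

```python
# Templates copied from the usage example above.
TEMPLATES = {
    "system": "<|im_start|>system\n{message}<|im_end|>\n",
    "user": "<|im_start|>user\n{message}<|im_end|>\n",
    "assistant": "<|im_start|>assistant\n{message}<|im_end|>\n",
}


def to_sharegpt(messages):
    """Convert [{'role': ..., 'content': ...}] into [{'from': ..., 'value': ...}]."""
    return [{"from": m["role"], "value": m["content"]} for m in messages]


def render_prompt(messages):
    """Render from/value messages and append the open assistant turn."""
    parts = []
    for m in messages:
        if m["from"] not in TEMPLATES:
            raise ValueError(f"Unknown message type: {m['from']}")
        parts.append(TEMPLATES[m["from"]].format(message=m["value"]))
    return "".join(parts) + "<|im_start|>assistant\n"


chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Are you sentient?"},
]
print(render_prompt(to_sharegpt(chat)))
```

The returned string can be passed to the tokenizer in place of the output of `construct_prompt`.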
    

Other Information
---------------------------------------

Paper reference: [Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731)

[Original Paper repo](https://github.com/wuhy68/Parameter-Efficient-MoE)

[Forked repo with mistral support (sparsetral)](https://github.com/serp-ai/Parameter-Efficient-MoE)

If you are interested in faster inference, check out our [fork of vLLM](https://github.com/serp-ai/vllm), which adds sparsetral support.

## Model overview

`sparsetral-16x7B-v2` is a large language model (LLM) developed by [serpdotai](https://aimodels.fyi/creators/huggingFace/serpdotai) that uses a sparse mixture-of-experts (MoE) architecture. It starts from the [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) base model, which was trained further with QLoRA while MoE adapters and routers were added on top. The model has 16 adapter experts, of which the router selects the top 4, so only a fraction of the added parameters is used for any given input.
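
To make the sparse selection concrete, here is a minimal sketch of top-k routing using the Num Experts (16) and Top K (4) values from the training section. It is an illustration, not the model's implementation: the router scores are made up, each expert is a hypothetical scalar function standing in for an adapter, and renormalizing the softmax over only the selected experts is one common convention that this model may or may not follow.

```python
import math

NUM_EXPERTS = 16  # from the training section
TOP_K = 4         # from the training section


def top_k_route(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_output(x, router_logits, experts, k=TOP_K):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Hypothetical stand-ins for adapters: expert i scales its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
# Made-up router scores for a single token.
logits = [0.1 * ((7 * i) % NUM_EXPERTS) for i in range(NUM_EXPERTS)]

routes = top_k_route(logits)
print("selected experts:", [i for i, _ in routes])
print("combined output:", moe_output(1.0, logits, experts))
```

Because only the selected experts are evaluated, the cost of the expert computation grows with Top K rather than with the total number of experts.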

Related sparse MoE models include [Mixtral-8x7B-Instruct-v0.1](https://aimodels.fyi/models/huggingFace/mixtral-8x7b-instruct-v01-mistralai) and [Mixtral-8x7B-v0.1](https://aimodels.fyi/models/huggingFace/mixtral-8x7b-v01-mistralai) from MistralAI, as well as the mistral-community uploads [Mixtral-8x22B-v0.1-4bit](https://aimodels.fyi/models/huggingFace/mixtral-8x22b-v01-4bit-mistral-community) and [Mixtral-8x22B-v0.1](https://aimodels.fyi/models/huggingFace/mixtral-8x22b-v01-mistral-community). These models also route inputs to a subset of experts rather than running every parameter.

## Model inputs and outputs

### Inputs
- The model expects prompts in a ChatML-style format: each system, user, and assistant message is wrapped in `<|im_start|>` and `<|im_end|>` tags, and the prompt ends with an open `<|im_start|>assistant` turn.

### Outputs
- The model generates a text continuation of the prompt, which serves as the assistant's reply.

## Capabilities

`sparsetral-16x7B-v2` is intended for general-purpose text generation, such as answering questions, holding conversations, and summarizing information. It builds on an instruction-tuned base model, so it follows system and user instructions given in the prompt format described above.

## What can I use it for?

`sparsetral-16x7B-v2` can be useful to developers and researchers working on language-based applications. Some potential use cases include:

- Virtual assistants and chatbots: The model generates responses conditioned on the conversation so far, which suits multi-turn conversational agents.
- Content generation: The model can assist in drafting articles, stories, or other written content by suggesting text.
- Summarization: The model can be prompted or fine-tuned to summarize long-form text, making it easier for users to grasp the key points.
- Question-answering: The model can serve as the basis for systems that answer questions on a wide range of topics.

## Things to try

One interesting aspect of `sparsetral-16x7B-v2` is its sparse MoE architecture, in which only the top 4 of 16 experts are selected at a time. Developers and researchers can experiment with this design, for example by exploring different expert selection strategies or by testing how the model handles diverse inputs and tasks.
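
As a starting point for such experiments, the back-of-envelope sketch below estimates how many weights the MoE adapters add and how many are active for different Top K values. Num Experts and Adapter Dim come from the training section, and the hidden size (4096) and layer count (32) are Mistral-7B configuration values. The adapter shape (a bias-free down/up projection pair), the single linear router, and the placement of one MoE block per layer are assumptions made for illustration, not details confirmed by the model card.

```python
# From the training section.
NUM_EXPERTS = 16
ADAPTER_DIM = 512

# Mistral-7B configuration values.
HIDDEN_SIZE = 4096
NUM_LAYERS = 32


def adapter_params(hidden=HIDDEN_SIZE, dim=ADAPTER_DIM):
    """Weights in one adapter, assumed to be a bias-free down/up projection pair."""
    return 2 * hidden * dim


def added_params(num_experts=NUM_EXPERTS, layers=NUM_LAYERS):
    """Adapter plus router weights, assuming one MoE block per layer."""
    router = HIDDEN_SIZE * num_experts  # one linear layer scoring each expert
    return layers * (num_experts * adapter_params() + router)


def active_adapter_params(top_k, layers=NUM_LAYERS):
    """Adapter weights used for one token when top_k experts fire per layer."""
    return layers * top_k * adapter_params()


print(f"added parameters: {added_params():,}")
for k in (1, 2, 4, 8):
    print(f"top_k={k}: {active_adapter_params(k):,} adapter weights per token")
```

Under these assumptions, changing Top K scales the active adapter weights linearly while the total number of stored weights stays fixed.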