[![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/9XVgxKyuXTQVO5mO-EOd4.jpeg)](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/9XVgxKyuXTQVO5mO-EOd4.jpeg)

Beyonder-4x7B-v3
===========================================

Beyonder-4x7B-v3 is an improvement over the popular [Beyonder-4x7B-v2](https://huggingface.co/mlabonne/Beyonder-4x7B-v2). It's a Mixture of Experts (MoE) made with the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):

*   [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B)
*   [beowolx/CodeNinja-1.0-OpenChat-7B](https://huggingface.co/beowolx/CodeNinja-1.0-OpenChat-7B)
*   [SanjiWatsuki/Kunoichi-DPO-v2-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-v2-7B)
*   [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B)

Special thanks to [beowolx](https://huggingface.co/beowolx) for making the best Mistral-based code model and to [SanjiWatsuki](https://huggingface.co/SanjiWatsuki) for creating one of the very best RP models.

**Try the demo**: [https://huggingface.co/spaces/mlabonne/Beyonder-4x7B-v3](https://huggingface.co/spaces/mlabonne/Beyonder-4x7B-v3)

Applications
-----------------------------------

This model uses a context window of 8k tokens. I recommend using it with the Mistral Instruct chat template (works perfectly with LM Studio).
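To make the recommended template concrete, here is a minimal sketch of how a Mistral Instruct style prompt is usually laid out. The function name `format_mistral_instruct` is hypothetical, and the exact whitespace and special tokens are an approximation: the tokenizer's own `apply_chat_template` remains the reference, and system messages are not handled here.

```python
def format_mistral_instruct(messages, bos="<s>", eos="</s>"):
    """Approximate the Mistral Instruct layout for alternating user/assistant turns."""
    parts = [bos]
    for i, message in enumerate(messages):
        # The template expects strict alternation, starting with a user turn.
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            # User turns are wrapped in instruction markers.
            parts.append(f"[INST] {message['content']} [/INST]")
        else:
            # Assistant turns are closed with the end-of-sequence token.
            parts.append(f"{message['content']}{eos}")
    return "".join(parts)


prompt = format_mistral_instruct([
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Write a haiku."},
])
print(prompt)
```

Frontends such as LM Studio apply this formatting for you once the template is selected, so a hand-rolled version like this is only useful for inspecting what the model actually receives.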

If you use SillyTavern, you might want to tweak the inference parameters. Here's what LM Studio uses as a reference: `temp` 0.8, `top_k` 40, `top_p` 0.95, `min_p` 0.05, `repeat_penalty` 1.1.
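For readers unsure what these knobs do, the sketch below applies temperature, `top_k`, `top_p`, and `min_p` to a toy list of logits using the reference values above. It is an illustration only: the function name `filter_probs` is hypothetical, real backends may apply the filters in a different order, and `repeat_penalty` is left out.

```python
import math


def filter_probs(logits, temp=0.8, top_k=40, top_p=0.95, min_p=0.05):
    """Return {token_index: probability} after temperature, top-k, top-p and min-p filtering."""
    # Temperature: divide logits before the softmax (lower values sharpen the distribution).
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # top-k: keep only the k most likely tokens.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:top_k]

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # min-p: drop tokens whose probability is below min_p times the best token's probability.
    threshold = min_p * kept[0][1]
    kept = [(index, p) for index, p in kept if p >= threshold]

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {index: p / norm for index, p in kept}


candidates = filter_probs([2.0, 1.0, 0.5, -1.0, -3.0])
print(candidates)
```

With these toy logits, the two least likely tokens are cut by the `top_p` step and sampling then happens among the remaining three.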

Thanks to its four experts, it's a well-rounded model, capable of handling most tasks. As two experts are always used to generate an answer, every task benefits from other capabilities, like chat with RP, or math with code.
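The "two experts are always used" behaviour can be sketched as top-2 routing. This is a simplified illustration that assumes Mixtral-style gating (a softmax over the two highest router scores); the router logits and expert outputs below are made-up toy values, and `top2_route` is a hypothetical helper, not part of any library.

```python
import math


def top2_route(router_logits, expert_outputs):
    """Mix the outputs of the two highest-scoring experts, weighted by a softmax over their logits."""
    # Pick the indices of the two largest router scores.
    top2 = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:2]
    # Softmax restricted to the two selected experts.
    peak = max(router_logits[i] for i in top2)
    exps = {i: math.exp(router_logits[i] - peak) for i in top2}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}
    # Weighted sum of the selected experts' output vectors.
    dim = len(expert_outputs[0])
    mixed = [sum(weights[i] * expert_outputs[i][d] for i in top2) for d in range(dim)]
    return weights, mixed


# Toy scores in the order chat, code, role-play, math (values are assumptions).
router_logits = [0.1, 2.0, -1.0, 1.0]
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
weights, mixed = top2_route(router_logits, expert_outputs)
print(weights, mixed)
```

Here the code and math experts are selected together, which mirrors the "math with code" pairing mentioned above.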

Quantized models
-----------------------------------------

Thanks to [bartowski](https://huggingface.co/bartowski) for quantizing this model.

*   **GGUF**: [https://huggingface.co/mlabonne/Beyonder-4x7B-v3-GGUF](https://huggingface.co/mlabonne/Beyonder-4x7B-v3-GGUF)
*   **More GGUF**: [https://huggingface.co/bartowski/Beyonder-4x7B-v3-GGUF](https://huggingface.co/bartowski/Beyonder-4x7B-v3-GGUF)
*   **ExLlamaV2**: [https://huggingface.co/bartowski/Beyonder-4x7B-v3-exl2](https://huggingface.co/bartowski/Beyonder-4x7B-v3-exl2)

Evaluation
-------------------------------

This model is not designed to excel in traditional benchmarks, as these generally do not measure what the code and role-playing experts are good at. Nonetheless, it performs remarkably well thanks to strong general-purpose experts.

### Nous

Beyonder-4x7B-v3 is one of the best models on Nous' benchmark suite (evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval)) and significantly outperforms the v2. See the entire leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B) [results](https://gist.github.com/mlabonne/1d33c86824b3a11d2308e36db1ba41c1) | 62.74 | 45.37 | 77.01 | 78.39 | 50.2 |
| [**mlabonne/Beyonder-4x7B-v3**](https://huggingface.co/mlabonne/Beyonder-4x7B-v3) [results](https://gist.github.com/mlabonne/3740020807e559f7057c32e85ce42d92) | **61.91** | **45.85** | **76.67** | **74.98** | **50.12** |
| [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B) [results](https://gist.github.com/mlabonne/cbeb077d1df71cb81c78f742f19f4155) | 59.39 | 45.23 | 76.2 | 67.61 | 48.52 |
| [SanjiWatsuki/Kunoichi-DPO-v2-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-v2-7B) [results](https://gist.github.com/mlabonne/895ff5171e998abfdf2a41a4f9c84450) | 58.29 | 44.79 | 75.05 | 65.68 | 47.65 |
| [mlabonne/Beyonder-4x7B-v2](https://huggingface.co/mlabonne/Beyonder-4x7B-v2) [results](https://gist.github.com/mlabonne/f73baa140a510a676242f8a4496d05ca) | 57.13 | 45.29 | 75.95 | 60.86 | 46.4 |
| [beowolx/CodeNinja-1.0-OpenChat-7B](https://huggingface.co/beowolx/CodeNinja-1.0-OpenChat-7B) [results](https://gist.github.com/mlabonne/08b5280c221fbd7f98eb27561ae902a3) | 50.35 | 39.98 | 71.77 | 48.73 | 40.92 |
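As a quick sanity check on the figures above, the snippet below recomputes each Average as the plain mean of the four benchmark scores and breaks down the gain of v3 over v2. That the Average column is an unweighted mean is an assumption, which the check confirms to within rounding.

```python
# (reported average, [AGIEval, GPT4All, TruthfulQA, Bigbench]) copied from the table above.
scores = {
    "mlabonne/AlphaMonarch-7B": (62.74, [45.37, 77.01, 78.39, 50.2]),
    "mlabonne/Beyonder-4x7B-v3": (61.91, [45.85, 76.67, 74.98, 50.12]),
    "mlabonne/NeuralDaredevil-7B": (59.39, [45.23, 76.2, 67.61, 48.52]),
    "SanjiWatsuki/Kunoichi-DPO-v2-7B": (58.29, [44.79, 75.05, 65.68, 47.65]),
    "mlabonne/Beyonder-4x7B-v2": (57.13, [45.29, 75.95, 60.86, 46.4]),
    "beowolx/CodeNinja-1.0-OpenChat-7B": (50.35, [39.98, 71.77, 48.73, 40.92]),
}

# Each reported average should match the mean of its four scores up to rounding.
means = {name: sum(parts) / len(parts) for name, (_, parts) in scores.items()}
for name, (reported, _) in scores.items():
    assert abs(means[name] - reported) < 0.01, name

# Per-benchmark difference between v3 and v2.
benchmarks = ["AGIEval", "GPT4All", "TruthfulQA", "Bigbench"]
v3 = scores["mlabonne/Beyonder-4x7B-v3"][1]
v2 = scores["mlabonne/Beyonder-4x7B-v2"][1]
deltas = {b: new - old for b, new, old in zip(benchmarks, v3, v2)}
for benchmark, delta in deltas.items():
    print(f"{benchmark}: {delta:+.2f}")
```

The breakdown shows that most of the improvement over v2 comes from TruthfulQA, with smaller gains on the other three suites.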

### EQ-Bench

Beyonder-4x7B-v3 is the best 4x7B model on the EQ-Bench leaderboard, outperforming older versions of ChatGPT and Llama-2-70b-chat. It is very close to Mixtral-8x7B-Instruct-v0.1 and Gemini Pro. Thanks to [Sam Paech](https://huggingface.co/sam-paech) for running the eval.

[![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/-OSHe2ImrxN8wAREnSZAZ.png)](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/-OSHe2ImrxN8wAREnSZAZ.png)

### Open LLM Leaderboard

It's also a strong performer on the Open LLM Leaderboard, significantly outperforming the v2 model.

[![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/NFRYqzwuy9TB-s-Hy3gRy.png)](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/NFRYqzwuy9TB-s-Hy3gRy.png)

Configuration
-------------------------------------

    base_model: mlabonne/AlphaMonarch-7B
    experts:
      - source_model: mlabonne/AlphaMonarch-7B
        positive_prompts:
        - "chat"
        - "assistant"
        - "tell me"
        - "explain"
        - "I want"
      - source_model: beowolx/CodeNinja-1.0-OpenChat-7B
        positive_prompts:
        - "code"
        - "python"
        - "javascript"
        - "programming"
        - "algorithm"
      - source_model: SanjiWatsuki/Kunoichi-DPO-v2-7B
        positive_prompts:
        - "storywriting"
        - "write"
        - "scene"
        - "story"
        - "character"
      - source_model: mlabonne/NeuralDaredevil-7B
        positive_prompts:
        - "reason"
        - "math"
        - "mathematics"
        - "solve"
        - "count"
    

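The `positive_prompts` in this configuration describe the kind of input each expert is meant to handle. The toy below makes that intent tangible by scoring a request against those keywords and picking two experts. It is explicitly not how the merged model routes: the real router operates on hidden states inside the network, not on words, and `keyword_scores` and `top2_experts` are hypothetical helpers written for this illustration.

```python
# Keywords copied from the configuration above.
EXPERT_PROMPTS = {
    "mlabonne/AlphaMonarch-7B": ["chat", "assistant", "tell me", "explain", "I want"],
    "beowolx/CodeNinja-1.0-OpenChat-7B": ["code", "python", "javascript", "programming", "algorithm"],
    "SanjiWatsuki/Kunoichi-DPO-v2-7B": ["storywriting", "write", "scene", "story", "character"],
    "mlabonne/NeuralDaredevil-7B": ["reason", "math", "mathematics", "solve", "count"],
}


def keyword_scores(text):
    """Count how many times each expert's positive prompts appear in the text (toy heuristic)."""
    lowered = text.lower()
    return {
        model: sum(lowered.count(prompt.lower()) for prompt in prompts)
        for model, prompts in EXPERT_PROMPTS.items()
    }


def top2_experts(text):
    """Return the two experts with the highest keyword score; ties keep configuration order."""
    scores = keyword_scores(text)
    return sorted(scores, key=scores.get, reverse=True)[:2]


picked = top2_experts("Solve this math puzzle with a short python program")
print(picked)
```

For this request the math and code experts come out on top, which is the kind of pairing the prompt lists are meant to encourage.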
Model Family Tree
---------------------------------------------

[![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/zQi5VgmdqJv6pFaGoQ2AL.png)](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/zQi5VgmdqJv6pFaGoQ2AL.png)

Usage
---------------------

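    # Notebook syntax; drop the leading "!" when installing from a shell.
    !pip install -qU transformers bitsandbytes accelerate
    
    import torch
    import transformers
    from transformers import AutoTokenizer
    
    model = "mlabonne/Beyonder-4x7B-v3"
    
    # Load the tokenizer once and share it with the pipeline.
    tokenizer = AutoTokenizer.from_pretrained(model)
    pipeline = transformers.pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        # 4-bit loading relies on bitsandbytes to reduce memory use.
        model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
    )
    
    # Format the conversation with the model's chat template, then sample a reply.
    messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
    # By default, generated_text contains the prompt followed by the completion.
    print(outputs[0]["generated_text"])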
    

Output:

> A Mixture of Experts (MoE) is a neural network architecture that tackles complex tasks by dividing them into simpler subtasks, delegating each to specialized expert modules. These experts learn to independently handle specific problem aspects. The MoE structure combines their outputs, leveraging their expertise for improved overall performance. This approach promotes modularity, adaptability, and scalability, allowing for better generalization in various applications.

## Model Inputs and Outputs

The `Beyonder-4x7B-v3` model uses a context window of 8k tokens. It is designed to work well with the Mistral Instruct chat template, which is compatible with LM Studio.
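Long conversations eventually exceed the context window, so a client has to drop or summarize old turns. The sketch below keeps the most recent messages that fit. It assumes "8k" means 8192 tokens, uses a crude whitespace word count as a stand-in for a real tokenizer, and `trim_to_budget` is a hypothetical helper.

```python
def trim_to_budget(messages, max_tokens=8192, reserve=256, count_tokens=None):
    """Keep the newest messages whose total size fits in max_tokens minus the reply reserve."""
    # Stand-in counter (assumption): whitespace-separated words. Pass a real
    # tokenizer-based counter for accurate budgeting.
    count_tokens = count_tokens or (lambda text: len(text.split()))
    budget = max_tokens - reserve
    kept, used = [], 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    # Restore chronological order.
    return list(reversed(kept))


history = [
    {"role": "user", "content": "a b c d e"},
    {"role": "assistant", "content": "f g h"},
    {"role": "user", "content": "i j k l"},
]
# Tiny budget for demonstration: 10 tokens total, 2 reserved for the reply.
trimmed = trim_to_budget(history, max_tokens=10, reserve=2)
print([m["content"] for m in trimmed])
```

Because trimming can leave the history starting with an assistant turn, a real client would also need to re-align it to begin with a user message before applying the chat template.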

### Inputs
- Text prompts for a variety of tasks, including chat, code generation, role-playing, and math problems.

### Outputs
- Responses generated by the model, which can include:
  - Coherent and contextual conversations
  - Code snippets for various programming languages
  - Detailed role-playing narratives
  - Solutions to mathematical problems

## Capabilities

The `Beyonder-4x7B-v3` model is a well-rounded assistant that handles a diverse range of tasks. It combines four experts specialized in chat, code, role-playing, and math and reasoning, so a response can draw on more than one of these strengths.

For example, the model can engage in natural conversations while also demonstrating strong coding and problem-solving abilities. The role-playing expert allows the model to create immersive narrative experiences.

## What Can I Use It For?

The `Beyonder-4x7B-v3` model can be used for a variety of applications, including:

- Conversational AI assistants: The model's strong conversational abilities make it suitable for building chatbots and virtual assistants.
- Content creation: The model's versatility allows it to assist with tasks like creative writing, scriptwriting, and story generation.
- Educational tools: The model's problem-solving and explanatory skills can be leveraged to create interactive learning experiences.
- Programming assistance: The model's coding capabilities can help developers with tasks like code generation, debugging, and algorithm design.

## Things to Try

One interesting aspect of the `Beyonder-4x7B-v3` model is its Mixture of Experts (MoE) architecture. Two of its four experts are used to generate each answer, so a single prompt can benefit from more than one specialty.

To get the most out of the model, experiment with inference parameters such as temperature, top-k, and top-p to find the settings that suit your use case. You can also combine its capabilities, for example using its coding skills on a math problem or its storytelling abilities in a conversation.
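One simple way to run such an experiment is to build a small grid of sampling settings and generate with each. The values below are illustrative assumptions centred on the settings from the Usage section (`temperature=0.7`, `top_k=50`, `top_p=0.95`), and `sampling_grid` is a hypothetical helper.

```python
from itertools import product


def sampling_grid(temperatures=(0.5, 0.7, 0.9), top_ks=(40, 50), top_ps=(0.9, 0.95)):
    """Return one keyword-argument dict per combination of sampling settings."""
    return [
        {"do_sample": True, "temperature": t, "top_k": k, "top_p": p}
        for t, k, p in product(temperatures, top_ks, top_ps)
    ]


grid = sampling_grid()
for kwargs in grid:
    # With the pipeline from the Usage section, each entry could be passed as:
    #   pipeline(prompt, max_new_tokens=256, **kwargs)
    print(kwargs)
```

Comparing outputs for the same prompt across the grid makes it easier to see which setting changes the tone or variety of the answers.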