[](#laser-dolphin-mixtral-2x7b-dpo)Laser-Dolphin-Mixtral-2x7b-dpo
=================================================================

[![laser_dolphin_image](/macadeliccc/laser-dolphin-mixtral-2x7b-dpo/resolve/main/dolphin_moe.png)](/macadeliccc/laser-dolphin-mixtral-2x7b-dpo/blob/main/dolphin_moe.png)

**New Version out now!**

Credit to Fernando Fernandes and Eric Hartford for their project [laserRMT](https://github.com/cognitivecomputations/laserRMT)

[](#overview)Overview
---------------------

This model is a medium-sized MoE implementation based on [cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser](https://huggingface.co/cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser)

*   The new version shows ~1 point increase in evaluation performance on average.

[](#process)Process
-------------------

*   The process is outlined in this [notebook](https://github.com/cognitivecomputations/laserRMT/blob/main/examples/laser-dolphin-mixtral-2x7b.ipynb)
    
*   The mergekit\_config is in the files.
    
*   The models used in the configuration are not lasered, but the final product is. This is an update from the last version.
    
*   This process is experimental. Your mileage may vary.
    

[](#future-goals)Future Goals
-----------------------------

*    Function Calling
*    v2 with new base model to improve performance

[](#quantizations)Quantizations
-------------------------------

### [](#exllamav2)ExLlamav2

_These are the recommended quantizations for users that are running the model on GPU_

Thanks to user [bartowski](https://huggingface.co/bartowski) we now have exllamav2 quantizations in 3.5 through 8 bpw. They are available here:

*   [bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2)

Branch

Bits

lm\_head bits

VRAM (4k)

VRAM (16k)

VRAM (32k)

Description

[8\_0](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/8_0)

8.0

8.0

13.7 GB

15.1 GB

17.2 GB

Maximum quality that ExLlamaV2 can produce, near unquantized performance.

[6\_5](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/6_5)

6.5

8.0

11.5 GB

12.9 GB

15.0 GB

Near unquantized performance at vastly reduced size, **recommended**.

[5\_0](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/5_0)

5.0

6.0

9.3 GB

10.7 GB

12.8 GB

Slightly lower quality vs 6.5, great for 12gb cards with 16k context.

[4\_25](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/4_25)

4.25

6.0

8.2 GB

9.6 GB

11.7 GB

GPTQ equivalent bits per weight.

[3\_5](https://huggingface.co/bartowski/laser-dolphin-mixtral-2x7b-dpo-exl2/tree/3_5)

3.5

6.0

7.0 GB

8.4 GB

10.5 GB

Lower quality, not recommended.

His quantizations represent the first ~13B model with GQA support. Check out his repo for more information!

### [](#gguf)GGUF

_Current GGUF [Quantizations](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo-GGUF)_

### [](#awq)AWQ

\*Current AWQ [Quantizations](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo-AWQ)

### [](#thebloke)TheBloke

**These Quants will result in unpredicted behavior. New quants are available as I have updated the model**

Quatizations provided by [TheBloke](https://huggingface.co/TheBloke/laser-dolphin-mixtral-2x7b-dpo-GGUF)

[](#hf-spaces)HF Spaces
-----------------------

*   GGUF chat available [here](https://huggingface.co/spaces/macadeliccc/laser-dolphin-mixtral-chat-GGUF)
*   4-bit bnb chat available [here](https://huggingface.co/spaces/macadeliccc/laser-dolphin-mixtral-chat)

[](#ollama)Ollama
=================

    ollama run macadeliccc/laser-dolphin-mixtral-2x7b-dpo
    

[![image/png](https://cdn-uploads.huggingface.co/production/uploads/6455cc8d679315e4ef16fbec/oVwa7Dwkt00tk8_MtlJdR.png)](https://cdn-uploads.huggingface.co/production/uploads/6455cc8d679315e4ef16fbec/oVwa7Dwkt00tk8_MtlJdR.png)

[](#code-example)Code Example
-----------------------------

Switch the commented model definition to use in 4-bit. Should work with 9GB and still exceed the single 7B model by 5-6 points roughly

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    def generate_response(prompt):
        """
        Generate a response from the model based on the input prompt.
    
        Args:
        prompt (str): Prompt for the model.
    
        Returns:
        str: The generated response from the model.
        """
        # Tokenize the input prompt
        inputs = tokenizer(prompt, return_tensors="pt")
    
        # Generate output tokens
        outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)
    
        # Decode the generated tokens to a string
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
        return response
    
    # Load the model and tokenizer
    model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
    
    prompt = "Write a quicksort algorithm in python"
    
    # Generate and print responses for each language
    print("Response:")
    print(generate_response(prompt), "\n")
    

[colab](https://colab.research.google.com/drive/1cmRhAkDWItV7utHNqNANVZnqDqQNsTUr?usp=sharing) with usage example

[](#eval)Eval
-------------

[](#eq-bench)EQ Bench
---------------------

\----Benchmark Complete----
2024-01-31 16:55:37
Time taken: 31.1 mins
Prompt Format: ChatML
Model: macadeliccc/laser-dolphin-mixtral-2x7b-dpo-GGUF
Score (v2): 72.76
Parseable: 171.0
---------------
Batch completed
Time taken: 31.2 mins
---------------

evaluation [colab](https://colab.research.google.com/drive/1FpwgsGzCR4tORTxAwUxpN3PcP22En2xk?usp=sharing)

[](#summary-of-previous-evaluation)Summary of previous evaluation
-----------------------------------------------------------------

Model

AGIEval

GPT4All

TruthfulQA

Bigbench

Average

[laser-dolphin-mixtral-2x7b-dpo](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo)

41.31

73.67

61.69

42.79

54.87

[](#detailed-current-evaluation)Detailed current evaluation
-----------------------------------------------------------

Model

AGIEval

GPT4All

TruthfulQA

Bigbench

Average

[laser-dolphin-mixtral-2x7b-dpo](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo)

42.25

73.45

63.44

43.96

55.77

### [](#agieval)AGIEval

Task

Version

Metric

Value

Stderr

agieval\_aqua\_rat

0

acc

21.26



2.57

acc\_norm

21.65



2.59

agieval\_logiqa\_en

0

acc

34.72



1.87

acc\_norm

35.64



1.88

agieval\_lsat\_ar

0

acc

26.96



2.93

acc\_norm

26.96



2.93

agieval\_lsat\_lr

0

acc

45.88



2.21

acc\_norm

46.08



2.21

agieval\_lsat\_rc

0

acc

59.48



3.00

acc\_norm

59.48



3.00

agieval\_sat\_en

0

acc

73.79



3.07

acc\_norm

73.79



3.07

agieval\_sat\_en\_without\_passage

0

acc

42.23



3.45

acc\_norm

41.26



3.44

agieval\_sat\_math

0

acc

37.27



3.27

acc\_norm

33.18



3.18

Average: 42.25%

### [](#gpt4all)GPT4All

Task

Version

Metric

Value

Stderr

arc\_challenge

0

acc

58.36



1.44

acc\_norm

58.02



1.44

arc\_easy

0

acc

82.20



0.78

acc\_norm

77.40



0.86

boolq

1

acc

87.52



0.58

hellaswag

0

acc

67.50



0.47

acc\_norm

84.43



0.36

openbookqa

0

acc

34.40



2.13

acc\_norm

47.00



2.23

piqa

0

acc

81.61



0.90

acc\_norm

82.59



0.88

winogrande

0

acc

77.19



1.18

Average: 73.45%

### [](#gsm8k)GSM8K

Task

Version

Metric

Value

Stderr

gsm8k

2

exact\_match,get-answer

0.75

exact\_match\_stderr,get-answer

0.01

alias

gsm8k

### [](#truthfulqa)TruthfulQA

Task

Version

Metric

Value

Stderr

truthfulqa\_mc

1

mc1

45.90



1.74

mc2

63.44



1.56

Average: 63.44%

### [](#bigbench)Bigbench

Task

Version

Metric

Value

Stderr

bigbench\_causal\_judgement

0

multiple\_choice\_grade

58.42



3.59

bigbench\_date\_understanding

0

multiple\_choice\_grade

60.70



2.55

bigbench\_disambiguation\_qa

0

multiple\_choice\_grade

38.37



3.03

bigbench\_geometric\_shapes

0

multiple\_choice\_grade

21.73



2.18

exact\_str\_match

0.00



0.00

bigbench\_logical\_deduction\_five\_objects

0

multiple\_choice\_grade

35.00



2.14

bigbench\_logical\_deduction\_seven\_objects

0

multiple\_choice\_grade

23.57



1.61

bigbench\_logical\_deduction\_three\_objects

0

multiple\_choice\_grade

50.33



2.89

bigbench\_movie\_recommendation

0

multiple\_choice\_grade

45.00



2.23

bigbench\_navigate

0

multiple\_choice\_grade

50.00



1.58

bigbench\_reasoning\_about\_colored\_objects

0

multiple\_choice\_grade

60.35



1.09

bigbench\_ruin\_names

0

multiple\_choice\_grade

51.12



2.36

bigbench\_salient\_translation\_error\_detection

0

multiple\_choice\_grade

32.26



1.48

bigbench\_snarks

0

multiple\_choice\_grade

67.96



3.48

bigbench\_sports\_understanding

0

multiple\_choice\_grade

70.59



1.45

bigbench\_temporal\_sequences

0

multiple\_choice\_grade

35.80



1.52

bigbench\_tracking\_shuffled\_objects\_five\_objects

0

multiple\_choice\_grade

22.56



1.18

bigbench\_tracking\_shuffled\_objects\_seven\_objects

0

multiple\_choice\_grade

17.20



0.90

bigbench\_tracking\_shuffled\_objects\_three\_objects

0

multiple\_choice\_grade

50.33



2.89

Average: 43.96%

Average score: 55.77%

Elapsed time: 02:43:45

[](#citations)Citations
-----------------------

Fernando Fernandes Neto and Eric Hartford. "Optimizing Large Language Models Using Layer-Selective Rank Reduction and Random Matrix Theory." 2024.

    @article{sharma2023truth,
    title={The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction},
    author={Sharma, Pratyusha and Ash, Jordan T and Misra, Dipendra},
    journal={arXiv preprint arXiv:2312.13558},
    year={2023} }
    

    @article{gao2021framework,
      title={A framework for few-shot language model evaluation},
      author={Gao, Leo and Tow, Jonathan and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and McDonell, Kyle and Muennighoff, Niklas and others},
      journal={Version v0. 0.1. Sept},
      year={2021}
    }
    

[](#open-llm-leaderboard-evaluation-results)Open LLM Leaderboard Evaluation Results
===================================================================================

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_macadeliccc__laser-dolphin-mixtral-2x7b-dpo)

Metric

Value

Avg.

67.16

AI2 Reasoning Challenge (25-Shot)

65.96

HellaSwag (10-Shot)

85.80

MMLU (5-Shot)

63.17

TruthfulQA (0-shot)

60.76

Winogrande (5-shot)

79.01

GSM8k (5-shot)

48.29

## Model overview

The `laser-dolphin-mixtral-2x7b-dpo` model is a medium-sized Mixture-of-Experts (MoE) implementation based on the [cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser](https://aimodels.fyi/models/huggingFace/dolphin-26-mistral-7b-dpo-laser-cognitivecomputations) model. According to the maintainer, the new version shows a ~1 point increase in evaluation performance on average compared to the previous version.

The model was trained using a noise reduction technique based on Singular Value Decomposition (SVD) decomposition, with the optimal ranks calculated using Random Matrix Theory (Marchenko-Pastur theorem) instead of a brute-force search. This approach is outlined in the [laserRMT notebook](https://github.com/cognitivecomputations/laserRMT/blob/main/examples/laser-dolphin-mixtral-2x7b.ipynb).

## Model inputs and outputs

### Inputs
- **Prompt**: The input prompt for the model, which uses the ChatML format.

### Outputs
- **Text generation**: The model generates text in response to the input prompt.

## Capabilities

The `laser-dolphin-mixtral-2x7b-dpo` model is capable of generating diverse and coherent text, with potential improvements in robustness and performance compared to the previous version. According to the maintainer, the model has been "lasered" for better quality.

## What can I use it for?

The `laser-dolphin-mixtral-2x7b-dpo` model can be used for a variety of text generation tasks, such as creative writing, dialogue generation, and content creation. The maintainer also mentions potential future goals for the model, including function calling and a v2 version with a new base model to improve performance.

## Things to try

One interesting aspect of the `laser-dolphin-mixtral-2x7b-dpo` model is the availability of quantizations provided by user [bartowski](https://huggingface.co/bartowski). These quantizations, ranging from 3.5 to 8 bits per weight, allow users to trade off between model size, memory usage, and performance to fit their specific needs. Experimenting with these quantizations could be a valuable way to explore the capabilities of the model.