>  THIS IS A BASE MODEL 
> 
> This model is pruned from the base Llama 3 70B, which has no instruction tuning and randomly initialized special tokens.
> 
> Using this with the Llama 3 instruction format is injecting random noise into latent space and will give you deranged results. (It's pretty funny actually.) Treat this as the untrained foundation model this is and use appropriate prompts.

Meta's Llama 3 70B pruned to 42B parameters using the methodology described in [The Unreasonable Ineffectiveness of the Deeper Layers](https://arxiv.org/abs/2403.17887). Post-pruning trained using QLoRA for ~100M tokens from [JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile).

Layers to prune selected using [PruneMe](https://github.com/arcee-ai/PruneMe).

Still evaluating, don't get too excited! Might be incredibly dumb. Check out these numbers though:

Groups

Version

Filter

n-shot

Metric

Value

Stderr

mmlu

N/A

none

0

acc

0.7669



0.0034

\- humanities

N/A

none

5

acc

0.7296



0.0062

\- other

N/A

none

5

acc

0.8101



0.0067

\- social\_sciences

N/A

none

5

acc

0.8668



0.0060

\- stem

N/A

none

5

acc

0.6825



0.0079

winogrande

1

none

5

acc

0.8027



0.0112

hellaswag

1

none

10

acc\_norm

0.8025



0.0040

[![Built with Axolotl](https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png)](https://github.com/OpenAccess-AI-Collective/axolotl)

## Model overview

The `llama3-42b-v0` model is a pruned version of Meta's Llama 3 70B foundation model. It was created by [chargoddard](https://aimodels.fyi/creators/huggingFace/chargoddard) using the methodology described in [The Unreasonable Ineffectiveness of the Deeper Layers](https://arxiv.org/abs/2403.17887) to prune the base Llama 3 model down to 42B parameters. The model was then further trained on around 100M tokens from the [JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile) dataset using QLoRA. This pruned model is intended to be used as an untrained foundation, with appropriate prompts, as injecting random noise into the latent space will produce "deranged results".

## Model inputs and outputs

### Inputs
- The `llama3-42b-v0` model accepts text input only.

### Outputs
- The model generates text and code output.

## Capabilities

The `llama3-42b-v0` model has been evaluated on a variety of benchmarks, including MMLU, Winogrande, and HellaSwag, where it achieves respectable performance. However, the maintainer notes that the model is still being evaluated and may exhibit "incredibly dumb" behavior, so it should be treated as an untrained foundation model.

## What can I use it for?

Given the model's status as an untrained foundation, it is likely most useful for researchers and developers looking to experiment with pruning techniques or continue pre-training on additional data. The maintainer cautions against using the model with Llama 3's instruction format, as this will lead to "deranged results". Instead, users should focus on developing appropriate prompts to leverage the model's capabilities.

## Things to try

Developers interested in exploring the `llama3-42b-v0` model could try fine-tuning it on specific downstream tasks or datasets to evaluate its performance. Additionally, experimenting with different pruning techniques and training regimes could yield interesting insights about the model's behavior and potential.