Failspy

Models by this creator


llama-3-70B-Instruct-abliterated

failspy

Total Score

61

The llama-3-70B-Instruct-abliterated model is a large language model released by the researcher failspy. It is based on the original Llama-3-70B-Instruct model, but has had certain weights manipulated in an attempt to "inhibit" the model's ability to express refusal and to reduce its tendency to lecture about ethics and safety. The maintainer notes that this manipulation is not guaranteed to prevent refusals or lecturing entirely, and the model may still exhibit such behaviors. It is intended for developers who want to experiment with this type of weight manipulation, and it should be used with caution, as the long-term effects are not fully known. A rough sketch of what this kind of weight edit can look like appears at the end of this page.

Model inputs and outputs

Inputs

Text prompts

Outputs

Generated text responses

Capabilities

The llama-3-70B-Instruct-abliterated model generates human-like text responses to a variety of prompts. It can be used for tasks like conversational AI, text generation, and potentially other natural language processing applications. However, because the weight manipulation is experimental, the model's capabilities and behaviors may be unpredictable.

What can I use it for?

Developers interested in reducing language-model refusal behavior could use the llama-3-70B-Instruct-abliterated model as a starting point for experimentation. The model could be fine-tuned further or paired with other safety mechanisms to build conversational AI applications that are less likely to refuse requests or lecture users. Great care should be taken before deploying such models in real-world applications, since the long-term effects of the weight manipulation are not well understood.

Things to try

Prompt the model with a variety of requests, both benign and potentially sensitive, and observe how it responds; this can surface any remaining biases or tendencies to refuse or lecture (see the usage sketch below). Developers could also experiment with techniques to further fine-tune or constrain the model's behavior, while monitoring for unintended consequences or safety concerns.
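A minimal generation sketch for the experiments above is shown below. It assumes the model is published on the Hugging Face Hub as failspy/llama-3-70B-Instruct-abliterated and that the transformers and accelerate libraries are installed; running a 70B model also requires substantial GPU memory or offloading.

```python
# Minimal usage sketch, assuming the Hub id
# failspy/llama-3-70B-Instruct-abliterated and sufficient GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "failspy/llama-3-70B-Instruct-abliterated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # requires accelerate; spreads layers across devices
)

# Llama-3-Instruct models expect a chat template, not raw text.
messages = [{"role": "user", "content": "Summarize the plot of Hamlet."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Swapping in borderline or sensitive prompts and comparing the responses against the unmodified Llama-3-70B-Instruct is one straightforward way to gauge how much refusal behavior remains.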
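For readers curious what this kind of weight manipulation can look like mechanically, here is an illustrative sketch only, not the maintainer's actual procedure: one published approach (often called "abliteration") estimates a single "refusal direction" in the model's hidden space from activation differences on harmful versus harmless prompts, then projects that direction out of the weight matrices that write into the residual stream. The refusal_dir vector below is a hypothetical placeholder for such a pre-computed direction.

```python
# Illustrative sketch only -- not failspy's actual code. Assumes refusal_dir
# is a hypothetical pre-computed vector of shape (hidden_size,), matching the
# weights' dtype and device.
import torch

def orthogonalize(weight: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove the component of `weight`'s output along refusal_dir.

    `weight` has shape (d_model, d_in) and writes into the residual stream;
    the edit W' = W - r r^T W zeroes out whatever W writes along r.
    """
    r = refusal_dir / refusal_dir.norm()
    return weight - torch.outer(r, r @ weight)

def abliterate(model, refusal_dir: torch.Tensor):
    # Apply the projection to every matrix that writes into the residual
    # stream of a Llama-style transformers model.
    for layer in model.model.layers:
        layer.self_attn.o_proj.weight.data = orthogonalize(
            layer.self_attn.o_proj.weight.data, refusal_dir
        )
        layer.mlp.down_proj.weight.data = orthogonalize(
            layer.mlp.down_proj.weight.data, refusal_dir
        )
```

Because the edit is a simple projection applied uniformly, it cannot guarantee the behavior is fully removed, which is consistent with the maintainer's caveat above.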


Updated 6/13/2024