mpt-7b-storywriter

Maintainer: replicate

Total Score: 8

Last updated 5/30/2024
  • Model Link: View on Replicate
  • API Spec: View on Replicate
  • Github Link: View on Github
  • Paper Link: View on Arxiv


Model overview

mpt-7b-storywriter is a 7 billion parameter language model fine-tuned by MosaicML to excel at generating long-form fictional stories. It was built by fine-tuning the MPT-7B model on a filtered subset of the books3 dataset, with a focus on stories. Unlike a standard language model, mpt-7b-storywriter can handle very long context lengths of up to 65,536 tokens thanks to the use of Attention with Linear Biases (ALiBi). MosaicML has demonstrated the model's ability to generate coherent stories with up to 84,000 tokens on a single node of 8 A100 GPUs.

This model shares similarities with other large language models like LLAMA-7B and LLAMA-2-7B in terms of model size and architecture. However, mpt-7b-storywriter is specifically tailored for long-form story generation through its fine-tuning on fiction datasets and use of ALiBi.
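ALiBi replaces positional embeddings with a fixed, per-head linear penalty on attention scores proportional to the distance between query and key, which is what lets the model run at context lengths beyond those seen in training. A minimal NumPy sketch of the bias computation, following the ALiBi paper (the head-slope formula here assumes the number of heads is a power of two):

```python
import numpy as np

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    """Per-head linear attention biases as in ALiBi (Press et al., 2021).

    Head h (1-indexed) gets slope m_h = 2 ** (-8 * h / n_heads); a query at
    position i attending to a past key at position j receives bias
    -m_h * (i - j), added to the raw attention score before softmax.
    """
    slopes = np.array([2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)])
    positions = np.arange(seq_len)
    distance = positions[None, :] - positions[:, None]  # [i, j] = j - i
    # Past positions (j <= i) get a penalty growing with distance; future
    # positions get 0 here and are excluded by the usual causal mask.
    bias = slopes[:, None, None] * np.minimum(distance, 0)[None, :, :]
    return bias  # shape (n_heads, seq_len, seq_len)
```

Because the penalty is the same linear function at any distance, nothing in it is tied to a maximum trained length, which is the intuition behind the extrapolation described above.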

Model inputs and outputs

Inputs

  • Prompt: The starting text to use as a prompt for the model to continue generating.
  • Max Length: The maximum number of tokens to generate.
  • Temperature: Controls the randomness of the generated text, with higher values producing more diverse and unpredictable output.
  • Top P: Limits sampling to the smallest set of tokens whose cumulative probability reaches P (nucleus sampling); lower values reduce randomness.
  • Repetition Penalty: Discourages the model from repeating the same words or phrases.
  • Length Penalty: Adjusts the model's preference for generating longer or shorter sequences.
  • Seed: Sets a random seed for reproducible outputs.
  • Debug: Provides additional logging for debugging purposes.

Outputs

  • Generated Text: The text generated by the model, continuing the provided prompt.
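The inputs above map onto a payload for Replicate's Python client. A hedged sketch: the parameter names mirror the list above, but the default values here are illustrative assumptions, not the model's documented defaults.

```python
def build_input(prompt: str, **overrides) -> dict:
    """Assemble an input payload from the parameters described above.

    Defaults are illustrative placeholders; check the model's API spec for
    the real defaults and accepted ranges.
    """
    payload = {
        "prompt": prompt,
        "max_length": 500,          # maximum number of tokens to generate
        "temperature": 0.75,        # higher = more diverse, unpredictable text
        "top_p": 1.0,               # nucleus-sampling cutoff
        "repetition_penalty": 1.0,  # values > 1 discourage repeated phrases
        "length_penalty": 1.0,      # preference for longer/shorter sequences
        "seed": -1,                 # fixed seed gives reproducible outputs
        "debug": False,             # extra logging
    }
    payload.update(overrides)
    return payload

# Actual call (requires a REPLICATE_API_TOKEN in the environment):
# import replicate
# for chunk in replicate.run("replicate/mpt-7b-storywriter",
#                            input=build_input("Once upon a time,")):
#     print(chunk, end="")
```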

Capabilities

mpt-7b-storywriter excels at generating long-form, coherent fictional stories. It can maintain narrative consistency and flow over thousands of tokens, making it a powerful tool for creative writing tasks. The model's ability to handle extremely long context lengths sets it apart from standard language models, allowing for more immersive and engaging story generation.

What can I use it for?

mpt-7b-storywriter is well-suited for a variety of creative writing and storytelling applications. Writers and authors could use it to generate story ideas, plot outlines, or even full-length novels with the model's guidance. Content creators could leverage the model to produce engaging fiction for interactive experiences, games, or multimedia projects.

Additionally, the model's capabilities could be harnessed for educational purposes, such as helping students with creative writing exercises or inspiring them to explore their own storytelling abilities.

Things to try

One interesting aspect of mpt-7b-storywriter is its ability to extrapolate beyond its training context length of 65,536 tokens. By adjusting the max_seq_len parameter in the model's configuration, you can experiment with generating even longer stories, potentially unlocking new narrative possibilities.
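If you run the weights yourself via Hugging Face, the context window is controlled by the `max_seq_len` field of the MPT config. A config-loading sketch, assuming the `mosaicml/mpt-7b-storywriter` checkpoint and the Transformers library (MPT requires `trust_remote_code=True`); memory, not the architecture, is the practical limit:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Raise the sequence length past the 65,536-token training length; ALiBi
# lets the model extrapolate, within the bounds of available GPU memory.
config = AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-storywriter", trust_remote_code=True
)
config.max_seq_len = 83968  # ~84k tokens, as in MosaicML's demonstration

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-storywriter", config=config, trust_remote_code=True
)
```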

Another avenue to explore is the model's behavior with different prompt styles or genres. Try providing it with various types of story starters, from fantasy epics to slice-of-life dramas, and observe how the generated content adapts to the specific narrative context.



This summary was produced with help from an AI and may contain inaccuracies; check the links above to read the original source documents.

Related Models


gpt-j-6b

replicate

Total Score: 8

gpt-j-6b is a large language model developed by EleutherAI, a non-profit AI research group. It is a fine-tunable model that can be adapted for a variety of natural language processing tasks. Compared to similar models like stable-diffusion, flan-t5-xl, and llava-13b, gpt-j-6b is specifically designed for text generation and language understanding.

Model inputs and outputs

The gpt-j-6b model takes a text prompt as input and generates a completion in the form of more text. The model can be fine-tuned on a specific dataset, allowing it to adapt to various tasks like question answering, summarization, and creative writing.

Inputs

  • Prompt: The initial text that the model will use to generate a completion.

Outputs

  • Completion: The text generated by the model based on the input prompt.

Capabilities

gpt-j-6b is capable of generating human-like text across a wide range of domains, from creative writing to task-oriented dialog. It can be used for tasks like summarization, translation, and open-ended question answering. The model's performance can be further improved through fine-tuning on specific datasets.

What can I use it for?

The gpt-j-6b model can be used for a variety of applications, such as:

  • Content Generation: Generating high-quality text for articles, stories, scripts, and more.
  • Chatbots and Virtual Assistants: Building conversational AI systems that can engage in natural dialogue.
  • Question Answering: Answering open-ended questions by retrieving and synthesizing relevant information.
  • Summarization: Condensing long-form text into concise summaries.

These capabilities make gpt-j-6b a versatile tool for businesses, researchers, and developers looking to leverage advanced natural language processing in their projects.

Things to try

One interesting aspect of gpt-j-6b is its ability to perform few-shot learning, where the model can quickly adapt to a new task or domain with only a small amount of fine-tuning data. This makes it a powerful tool for rapid prototyping and experimentation. You could try fine-tuning the model on your own dataset to see how it performs on a specific task or application.
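Few-shot adaptation can also be done purely in the prompt, with no weight updates: prepend a handful of solved examples so the model infers the task from context. A minimal sketch (the example texts and `Input:`/`Output:` framing are illustrative, not a required format):

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt: solved (input, output) pairs, then the query.

    The model is expected to continue the pattern after the final 'Output:'.
    """
    shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\nInput: {query}\nOutput:"
```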



meta-llama-3-70b-instruct

meta

Total Score: 42.2K

meta-llama-3-70b-instruct is a 70 billion parameter language model from Meta that has been fine-tuned for chat completions. It is part of Meta's Llama series of language models, which also includes the meta-llama-3-8b-instruct, codellama-70b-instruct, meta-llama-3-70b, codellama-13b-instruct, and codellama-7b-instruct models.

Model inputs and outputs

meta-llama-3-70b-instruct is a text-based model, taking a prompt as input and generating text as output. The model has been fine-tuned specifically for chat completions, so it is well suited to engaging in open-ended dialogue and responding to prompts in a conversational manner.

Inputs

  • Prompt: The text that is provided as input to the model, which it will use to generate a response.

Outputs

  • Generated Text: The text that the model outputs in response to the input prompt.

Capabilities

meta-llama-3-70b-instruct can engage in a wide range of conversational tasks, from open-ended discussion to task-oriented dialog. It has been trained on a vast amount of text data, allowing it to draw upon a deep knowledge base to provide informative and coherent responses. The model can also generate creative and imaginative text, making it well-suited for applications such as story writing and idea generation.

What can I use it for?

With its strong conversational abilities, meta-llama-3-70b-instruct can be used for a variety of applications, such as building chatbots, virtual assistants, and interactive educational tools. Businesses could leverage the model to provide customer service, while writers and content creators could use it to generate new ideas and narrative content. Researchers may also find the model useful for exploring topics in natural language processing and the capabilities of large language models.

Things to try

One interesting aspect of meta-llama-3-70b-instruct is its ability to engage in multi-turn dialogues and maintain context over the course of a conversation. You could try prompting the model with an initial query and then continuing the dialog, observing how it builds upon the previous context. Another experiment would be to provide the model with prompts that require reasoning or problem-solving, and see how it responds.
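Maintaining context across turns means re-sending earlier turns in the prompt. A sketch of assembling a multi-turn prompt using the Llama 3 instruct chat format; the special tokens follow Meta's published template, but treat the exact layout as an assumption to verify against the model card:

```python
def llama3_prompt(turns: list[tuple[str, str]]) -> str:
    """Assemble a Llama 3 instruct prompt from (role, content) turns.

    Roles are typically 'system', 'user', and 'assistant'; the trailing
    assistant header cues the model to generate the next reply.
    """
    parts = ["<|begin_of_text|>"]
    for role, content in turns:
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
        )
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

To continue a dialog, append the model's reply as an `assistant` turn plus the user's next message, and call the model again with the rebuilt prompt.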



llama-7b

replicate

Total Score: 98

The llama-7b is a transformers implementation of the LLaMA language model, a 7 billion parameter model developed by Meta Research. Similar to other models in the LLaMA family, like the llama-2-7b, llama-2-13b, and llama-2-70b, the llama-7b model is designed for natural language processing tasks. The codellama-7b and codellama-7b-instruct models are versions of LLaMA tuned for coding and conversation.

Model inputs and outputs

The llama-7b model takes a text prompt as input and generates a continuation of that prompt as output. The model can be fine-tuned on specific tasks, but by default it is trained for general language modeling.

Inputs

  • Prompt: The text prompt to generate a continuation for.

Outputs

  • Text: The generated continuation of the input prompt.

Capabilities

The llama-7b model can generate coherent and fluent text on a wide range of topics. It can be used for tasks like language translation, text summarization, and content generation. The model's performance is competitive with other large language models, making it a useful tool for natural language processing applications.

What can I use it for?

The llama-7b model can be used for a variety of natural language processing tasks, such as text generation, language translation, and content creation. Developers can use the model to build applications that generate written content, assist with text-based tasks, or enhance language understanding capabilities. The model's open-source nature also allows for further research and experimentation.

Things to try

One interesting aspect of the llama-7b model is its ability to generate coherent and contextual text. Try prompting the model with the beginning of a story or essay, and see how it continues the narrative. You can also experiment with fine-tuning the model on specific domains or tasks to see how it performs on more specialized language processing challenges.



all-mpnet-base-v2

replicate

Total Score: 1.6K

The all-mpnet-base-v2 is a sentence-embedding model, maintained on Replicate by replicate, that can be used to obtain document embeddings for downstream tasks like semantic search and clustering. The model is based on the MPNet architecture and has been fine-tuned on 1 billion sentence pairs. Similar models include stable-diffusion for text-to-image generation and multilingual-e5-large for multi-language text embeddings.

Model inputs and outputs

The all-mpnet-base-v2 model takes either a single string or a batch of strings as input and outputs an array of embeddings. These embeddings can be used for downstream tasks like semantic search, clustering, and classification.

Inputs

  • text: A single string to encode.
  • text_batch: A JSON-formatted list of strings to encode.

Outputs

  • An array of embeddings, where each embedding corresponds to one of the input strings.

Capabilities

The all-mpnet-base-v2 model can be used to generate semantic embeddings for text. These embeddings capture the meaning and context of the input text, enabling tasks like semantic search, text similarity, and clustering. The model has been fine-tuned on a large corpus of text, giving it the ability to understand a wide range of language and topics.

What can I use it for?

The all-mpnet-base-v2 model can be used for a variety of natural language processing tasks, such as:

  • Semantic search: Find similar documents or passages based on their semantic content, rather than just keywords.
  • Text clustering: Group related documents or passages based on the similarity of their embeddings.
  • Recommendation systems: Recommend relevant content to users based on the similarity of the embeddings to their interests or previous interactions.

Things to try

One interesting thing to try with the all-mpnet-base-v2 model is to compare the embeddings of different texts and see how they relate to each other semantically. You could, for example, encode a set of news articles or research papers and then visualize the relationships between them using techniques like t-SNE or UMAP. This could help you gain insights into the underlying themes and connections within your data.
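The standard way to compare such embeddings is cosine similarity. A minimal sketch; the three-dimensional vectors below are toy stand-ins for the real 768-dimensional embeddings the model returns:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" for three documents (illustrative values only).
docs = {
    "cats": np.array([0.9, 0.1, 0.0]),
    "kittens": np.array([0.8, 0.2, 0.1]),
    "tax law": np.array([0.0, 0.1, 0.95]),
}

# Rank documents by similarity to a query embedding: semantically close
# texts score near 1, unrelated ones near 0.
query = np.array([0.85, 0.15, 0.05])
ranked = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]),
                reverse=True)
```

The same ranking logic scales to real embeddings: encode the query and documents with the model, then sort by cosine similarity for semantic search.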
