LLaMa-30b-instruct-2048 model card
==================================

Model Details
-------------

*   **Developed by**: [Upstage](https://en.upstage.ai)
*   **Backbone Model**: [LLaMA](https://github.com/facebookresearch/llama/tree/llama_v1)
*   **Variations**: Available in several parameter sizes and sequence lengths: [30B/1024](https://huggingface.co/upstage/llama-30b-instruct), [30B/2048](https://huggingface.co/upstage/llama-30b-instruct-2048), [65B/1024](https://huggingface.co/upstage/llama-65b-instruct)
*   **Language(s)**: English
*   **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
*   **License**: This model is released under a **Non-commercial** Bespoke License and is governed by the Meta license. Use this repository only if you have been granted access to the model by filling out [this form](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform) but have either lost your copy of the weights or had trouble converting them to the Transformers format
*   **Where to send comments**: To provide feedback or comments on the model, open an issue in the [Hugging Face community's model repository](https://huggingface.co/upstage/llama-30b-instruct-2048/discussions)
*   **Contact**: For questions and comments about the model, please email [contact@upstage.ai](mailto:contact@upstage.ai)

Dataset Details
---------------

### Used Datasets

*   [openbookqa](https://huggingface.co/datasets/openbookqa)
*   [sciq](https://huggingface.co/datasets/sciq)
*   [Open-Orca/OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca)
*   [metaeval/ScienceQA\_text\_only](https://huggingface.co/datasets/metaeval/ScienceQA_text_only)
*   [GAIR/lima](https://huggingface.co/datasets/GAIR/lima)
*   No data other than the datasets listed above was used

### Prompt Template

    ### System:
    {System}
    
    ### User:
    {User}
    
    ### Assistant:
    {Assistant}
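
The template above can be filled in programmatically. The following is a minimal sketch, not part of the original card: the helper name `build_prompt` is hypothetical, and it assumes the `### System:` section is optional, as in the single-turn prompt used in the Usage section.

```python
from typing import Optional


def build_prompt(user: str, system: Optional[str] = None) -> str:
    """Fill the prompt template for a single turn (hypothetical helper)."""
    parts = []
    if system:
        # The system section is emitted only when a system message is given.
        parts.append(f"### System:\n{system}\n")
    parts.append(f"### User:\n{user}\n")
    # Leave the assistant section open so the model writes the reply.
    parts.append("### Assistant:\n")
    return "\n".join(parts)
```

Calling `build_prompt("Hello")` yields a string in the same layout as the prompt passed to the tokenizer in the Usage example.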
    

Usage
-----

*   Tested on A100 80GB
*   Our model can handle up to 10k+ input tokens, thanks to the `rope_scaling` option

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
    
    tokenizer = AutoTokenizer.from_pretrained("upstage/llama-30b-instruct-2048")
    model = AutoModelForCausalLM.from_pretrained(
        "upstage/llama-30b-instruct-2048",
        device_map="auto",
        torch_dtype=torch.float16,
        load_in_8bit=True,
        rope_scaling={"type": "dynamic", "factor": 2} # allows handling of longer inputs
    )
    
    # Single-turn prompt following the template above (no system message)
    prompt = "### User:\nThomas is healthy, but he has to go to the hospital. What could be the reasons?\n\n### Assistant:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # LLaMA does not use token_type_ids; drop them if the tokenizer returned any
    inputs.pop("token_type_ids", None)
    # Print tokens to stdout as they are generated, omitting the prompt
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    
    # The original card passed max_new_tokens=float('inf') to generate until EOS;
    # a finite cap (1024 here, an arbitrary choice) is safer across library versions
    output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=1024)
    output_text = tokenizer.decode(output[0], skip_special_tokens=True)
    

Hardware and Software
---------------------

*   **Hardware**: We trained the model on a single node with eight A100 GPUs (A100x8 \* 1)
*   **Training Factors**: We fine-tuned this model using a combination of the [DeepSpeed library](https://github.com/microsoft/DeepSpeed) and the [HuggingFace Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) / [HuggingFace Accelerate](https://huggingface.co/docs/accelerate/index)

Evaluation Results
------------------

### Overview

*   We evaluated the model on the tasks used by the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), which cover four benchmark datasets: `ARC-Challenge`, `HellaSwag`, `MMLU`, and `TruthfulQA`. We used the [lm-evaluation-harness repository](https://github.com/EleutherAI/lm-evaluation-harness), specifically commit [b281b0921b636bc36ad05c0b0b0763bd6dd43463](https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463)
*   We used [MT-bench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge), a set of challenging multi-turn open-ended questions, to evaluate the models

### Main Results

| Model | H4(Avg) | ARC | HellaSwag | MMLU | TruthfulQA | MT\_Bench |
|---|---|---|---|---|---|---|
| **[Llama-2-70b-instruct-v2](https://huggingface.co/upstage/Llama-2-70b-instruct-v2)** (Ours, Open LLM Leaderboard) | **73** | **71.1** | **87.9** | **70.6** | **62.2** | **7.44063** |
| [Llama-2-70b-instruct](https://huggingface.co/upstage/Llama-2-70b-instruct) (Ours, Open LLM Leaderboard) | 72.3 | 70.9 | 87.5 | 69.8 | 61 | 7.24375 |
| [llama-65b-instruct](https://huggingface.co/upstage/llama-65b-instruct) (Ours, Open LLM Leaderboard) | 69.4 | 67.6 | 86.5 | 64.9 | 58.8 | |
| Llama-2-70b-hf | 67.3 | 67.3 | 87.3 | 69.8 | 44.9 | |
| [llama-30b-instruct-2048](https://huggingface.co/upstage/llama-30b-instruct-2048) (_**Ours**_, _**Open LLM Leaderboard**_) | 67.0 | 64.9 | 84.9 | 61.9 | 56.3 | |
| [llama-30b-instruct](https://huggingface.co/upstage/llama-30b-instruct) (Ours, Open LLM Leaderboard) | 65.2 | 62.5 | 86.2 | 59.4 | 52.8 | |
| llama-65b | 64.2 | 63.5 | 86.1 | 63.9 | 43.4 | |
| falcon-40b-instruct | 63.4 | 61.6 | 84.3 | 55.4 | 52.5 | |
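
The H4(Avg) figures reported here appear to be the unweighted mean of the four benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA); the card does not state this explicitly, so treat it as an assumption. The sketch below checks that reading against the `llama-30b-instruct-2048` scores. The function name `h4_average` is hypothetical.

```python
def h4_average(arc: float, hellaswag: float, mmlu: float, truthfulqa: float) -> float:
    """Unweighted mean of the four Open LLM Leaderboard benchmark scores."""
    return (arc + hellaswag + mmlu + truthfulqa) / 4


# Scores reported for llama-30b-instruct-2048 in the results above
scores = {"arc": 64.9, "hellaswag": 84.9, "mmlu": 61.9, "truthfulqa": 56.3}
print(round(h4_average(**scores), 1))  # 67.0, matching the reported H4(Avg)
```

MT\_Bench is scored on a different scale and is not part of this average.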

### Scripts for H4 Score Reproduction

*   Prepare evaluation environments:

    # clone the repository
    git clone https://github.com/EleutherAI/lm-evaluation-harness.git
    # change to the repository directory
    cd lm-evaluation-harness
    # check out the specific commit (must be run inside the repository)
    git checkout b281b0921b636bc36ad05c0b0b0763bd6dd43463
    

Ethical Issues
--------------

### Ethical Considerations

*   There were no ethical issues involved, as we did not include the benchmark test set or the training set in the model's training process

Contact Us
----------

### Why Upstage LLM?

*   [Upstage](https://en.upstage.ai)'s LLM research has yielded remarkable results. As of August 1st, our 70B model has reached the top spot in the openLLM rankings, making it the current leading performer globally. Recognizing the potential of private LLMs for real businesses, we invite you to deploy a private LLM and fine-tune it with your own data. For a seamless and tailored solution, please [contact us](https://www.upstage.ai/private-llm?utm_source=huggingface&utm_medium=link&utm_campaign=privatellm).

## Model overview

`llama-30b-instruct-2048` is a large language model developed by Upstage. It is based on the LLaMA model released by Facebook Research, has 30 billion parameters, and uses a 2048-token sequence length (a 30B variant with a 1024-token sequence length is also available). The model is designed for text generation and instruction following, including open-ended dialogue, content creation, and knowledge-intensive applications.

Similar models include [Meta-Llama-3-8B-Instruct](https://aimodels.fyi/models/huggingFace/meta-llama-3-8b-instruct-meta-llama) and [Meta-Llama-3-70B](https://aimodels.fyi/models/huggingFace/meta-llama-3-70b-meta-llama), which are large language models developed by Meta in different parameter sizes. The [Llama-2-7b-hf](https://aimodels.fyi/models/huggingFace/llama-2-7b-hf-nousresearch) model from NousResearch is a 7 billion parameter model in the same LLaMA family.

## Model inputs and outputs

### Inputs
- The model takes in text prompts as input, which can be in the form of natural language instructions, conversations, or other types of textual data.

### Outputs
- The model generates text outputs in response to the input prompts, producing coherent and contextually relevant responses. The outputs can be used for a variety of language generation tasks, such as open-ended dialogue, content creation, and knowledge-intensive applications.

## Capabilities

The `llama-30b-instruct-2048` model is capable of generating human-like text across a wide range of topics and tasks. It has been trained on a diverse set of datasets, allowing it to demonstrate strong performance on benchmarks measuring commonsense reasoning, world knowledge, and reading comprehension. Additionally, the model has been optimized for instruction-following tasks, making it well-suited for conversational AI and virtual assistant applications.

## What can I use it for?

The `llama-30b-instruct-2048` model can be used for a variety of language generation and understanding tasks. Some potential use cases include:

- **Conversational AI**: The model can be used to power engaging and informative chatbots and virtual assistants, capable of natural dialogue and task completion.
- **Content creation**: The model can be used to generate creative and informative text, such as articles, stories, or product descriptions.
- **Knowledge-intensive applications**: The model's strong performance on benchmarks measuring world knowledge and reasoning makes it well-suited for applications that require in-depth understanding of a domain, such as question-answering systems or intelligent search.

## Things to try

One interesting aspect of the `llama-30b-instruct-2048` model is its ability to handle long input sequences, thanks to the `rope_scaling` option. This allows the model to process and generate text for more complex and open-ended tasks, beyond simple question-answering or dialogue. Developers could experiment with using the model for tasks like multi-step reasoning, long-form content generation, or even code generation and explanation.
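
To give a rough idea of what the `dynamic` rope scaling type does, the sketch below reproduces the dynamic NTK rule as it was implemented for LLaMA in Transformers at the time this card was written: once the sequence grows past the trained length, the rotary base is increased so that positions are spread over a wider range. The formula, the function name `dynamic_ntk_base`, and the defaults (a trained length of 2048, a head dimension of 128, a base of 10000) are assumptions for illustration, not values stated in the card.

```python
def dynamic_ntk_base(
    seq_len: int,
    max_position_embeddings: int = 2048,  # assumed trained context length
    factor: float = 2.0,                  # the "factor" passed in rope_scaling
    head_dim: int = 128,                  # assumed per-head dimension
    base: float = 10000.0,                # assumed default rotary base
) -> float:
    """Return the rotary base used for a given sequence length (sketch)."""
    if seq_len <= max_position_embeddings:
        # Within the trained length the base is left unchanged.
        return base
    # Beyond it, the base grows with the sequence length.
    scale = (factor * seq_len / max_position_embeddings) - (factor - 1)
    return base * scale ** (head_dim / (head_dim - 2))


for n in (1024, 2048, 4096, 10240):
    print(n, round(dynamic_ntk_base(n)))
```

Because the base is recomputed from the current sequence length, short prompts behave as in the unscaled model, while longer prompts use progressively lower rotary frequencies.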

Another interesting aspect to explore is the model's safety and alignment features. As mentioned in the [maintainer's profile](https://aimodels.fyi/creators/huggingFace/upstage), the model has been carefully designed with a focus on responsible AI development, including extensive testing and the implementation of safety mitigations. Developers could investigate how these features affect the model's behavior and outputs, and how they can be further customized to meet the specific needs of their applications.