Einstein-v6.1-Llama3-8B

Maintainer: Weyaxi

Total Score

50

Last updated 5/17/2024

Model Link: View on HuggingFace
API Spec: View on HuggingFace
GitHub Link: No GitHub link provided
Paper Link: No paper link provided

Model overview

The Einstein-v6.1-Llama3-8B is a fine-tuned version of the Meta-Llama-3-8B model, developed by Weyaxi. It was trained on diverse datasets using 8x RTX 3090 and 1x RTX A6000 GPUs with the axolotl framework; the training was sponsored by sablo.ai.
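
To try it quickly, the model can be loaded with the Hugging Face transformers library. The sketch below assumes the HuggingFace repo id Weyaxi/Einstein-v6.1-Llama3-8B, that the tokenizer ships a chat template, and enough GPU memory for an 8B model; the prompt content is illustrative:

```python
# Minimal sketch: load the model and generate a short response.
# Repo id and chat template are assumptions; verify both against
# the model card before relying on them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Weyaxi/Einstein-v6.1-Llama3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# apply_chat_template uses whatever prompt template ships with the tokenizer.
messages = [{"role": "user", "content": "Explain entropy in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```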

Model inputs and outputs

Inputs

  • Textual prompts

Outputs

  • Textual responses

Capabilities

The Einstein-v6.1-Llama3-8B model is a powerful language model capable of generating human-like text across a variety of tasks. It can be used for text generation, question answering, summarization, and more.

What can I use it for?

The Einstein-v6.1-Llama3-8B model can be used for a wide range of natural language processing tasks, such as chatbots, content generation, and language translation. It can be particularly useful for companies looking to automate customer service or create engaging content.

Things to try

Experiment with the Einstein-v6.1-Llama3-8B model to see how it performs on your specific natural language processing tasks. Try fine-tuning the model on your own data to further improve its performance for your use case.
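
As a starting point for that kind of fine-tuning, here is a minimal parameter-efficient LoRA sketch using the peft and transformers libraries. The toy dataset, target modules, and hyperparameters are illustrative assumptions, not values from the model card:

```python
# Illustrative LoRA fine-tuning sketch with peft + transformers.
# The one-example dataset and all hyperparameters are placeholders;
# point train_dataset at your own corpus in practice.
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM, AutoTokenizer,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

model_id = "Weyaxi/Einstein-v6.1-Llama3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Wrap the base model with low-rank adapters on the attention projections.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Toy dataset: replace with your own text.
raw = Dataset.from_dict({"text": ["Q: What is entropy?\nA: A measure of disorder."]})
tokenized = raw.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                    remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="einstein-lora", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("einstein-lora")  # saves only the adapter weights
```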



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

llama2-70b-oasst-sft-v10

OpenAssistant

Total Score

73

The llama2-70b-oasst-sft-v10 model is a fine-tuned version of Meta's Llama2 70B LLM developed by the Open-Assistant team. It was first fine-tuned on a mix of synthetic instructions and coding tasks, and then further refined on the best human demonstrations collected through the open-assistant.io platform up to July 23, 2023. This model aims to provide an engaging and helpful AI assistant. Similar models include the codellama-13b-oasst-sft-v10, a fine-tuning of Meta's CodeLlama 13B LLM; the llama2-13b-orca-8k-3319, a fine-tuning of the Llama2 13B model for long-form dialogue; and the stablelm-7b-sft-v7-epoch-3, a supervised fine-tuning of the StableLM 7B model.

Model inputs and outputs

Inputs

  • Text prompts: The model accepts text prompts that can include multiple turns of conversation between a user and an assistant, formatted using the OpenAI ChatML standard.

Outputs

  • Continued conversation: The model generates continued responses to the provided prompts, in the style of an engaging and helpful AI assistant.

Capabilities

The llama2-70b-oasst-sft-v10 model has been fine-tuned to engage in open-ended dialogue, answer questions, and assist with a variety of tasks. It demonstrates strong performance on benchmarks for commonsense reasoning, world knowledge, and reading comprehension compared to other large language models. The model also exhibits improved safety and truthfulness compared to earlier versions, making it suitable for use cases requiring reliable and trustworthy responses.

What can I use it for?

The llama2-70b-oasst-sft-v10 model can be used to build engaging AI assistants for a variety of applications, such as customer support, task planning, research assistance, and creative ideation. Its broad knowledge and language understanding make it well suited to open-ended conversation and complex question answering. Developers can fine-tune or adapt the model further for specific use cases, leveraging the Hugging Face Transformers library and the Open-Assistant resources to integrate it into their applications.

Things to try

One interesting aspect of the llama2-70b-oasst-sft-v10 model is its ability to engage in multi-turn conversations, maintaining context and continuity throughout the dialogue. Developers can experiment with prompting the model with longer conversation threads, observing how it maintains the flow of the discussion and provides relevant, coherent responses. Another aspect to explore is the model's safety and truthfulness features, which were improved during fine-tuning: assess the outputs for potential biases, hallucinations, or unsafe content, and further fine-tune or prompt the model to ensure it behaves reliably for your specific use case.
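
Since the model expects ChatML-formatted prompts, each turn is wrapped in <|im_start|>/<|im_end|> markers. A minimal sketch of that serialization (the system message is an illustrative placeholder; see the model card for the recommended prompt):

```python
# Sketch of the OpenAI ChatML format the model is prompted with.
# The system message here is an illustrative placeholder.
def to_chatml(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    # A trailing open assistant turn tells the model to continue the dialogue.
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
])
print(prompt)
```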

Llama-3-8b-64k-PoSE

winglian

Total Score

67

Llama-3-8b-64k-PoSE is a large language model (LLM) developed by winglian that extends the context length of the Llama 3 8B model from 8k to 64k tokens using PoSE (Positional Skip-wise training). The model was trained on a subset of the RedPajama v1 dataset with text between 6k and 8k tokens, and further fine-tuned with a rank-stabilized LoRA. Compared to the base Llama 3 8B model, this extended-context version can handle much longer input sequences. Similar models include the Meta-Llama-3-8B and Meta-Llama-3-70B models, which are also part of the Llama 3 family developed by Meta; they come in 8B and 70B parameter sizes, each with pre-trained and instruction-tuned versions.

Model inputs and outputs

Inputs

  • Text only

Outputs

  • Generated text and code

Capabilities

Llama-3-8b-64k-PoSE can handle longer input sequences than the base Llama 3 8B model thanks to its extended 64k-token context length. This makes it well suited to tasks that require processing long-form text, such as summarization, question answering over lengthy passages, or text generation with large context windows.

What can I use it for?

The extended context makes Llama-3-8b-64k-PoSE a good choice for applications that work with long-form text, such as academic writing assistance, long-form journalism, or analysis of lengthy documents. Developers could fine-tune the model further for specific use cases to leverage its ability to maintain coherence and context over longer spans of text.

Things to try

One interesting aspect of this model is the use of PoSE to extend the context length. Developers could experiment with different PoSE hyperparameters or explore other techniques for increasing the context window of large language models. The model's performance on tasks that require long-range understanding, such as multi-document summarization or long-form question answering, would also be worth investigating.
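
When feeding the model long documents, the main practical step is checking that the prompt stays inside the 64k-token window. A minimal sketch, assuming the HuggingFace repo id winglian/Llama-3-8b-64k-PoSE and a placeholder file path:

```python
# Sketch: verify a long document fits in the extended 64k window
# before sending it to the model. Repo id and file path are assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("winglian/Llama-3-8b-64k-PoSE")

with open("long_report.txt") as f:  # placeholder path
    document = f.read()

prompt = f"Summarize the following report:\n\n{document}\n\nSummary:"
n_tokens = len(tokenizer(prompt).input_ids)
print(f"{n_tokens} tokens ({'fits' if n_tokens <= 65536 else 'exceeds'} the 64k window)")
```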

Llama-2-7b-hf

meta-llama

Total Score

1.4K

Llama-2-7b-hf is a 7 billion parameter generative language model developed and released by Meta. It is part of the Llama 2 family of models, which range in size from 7 billion to 70 billion parameters. The Llama 2 models are trained on a new mix of publicly available online data and use an optimized transformer architecture. The tuned versions, called Llama-2-Chat, are further refined with supervised fine-tuning and reinforcement learning from human feedback to optimize for helpfulness and safety, and are intended to outperform open-source chat models on many benchmarks. The Llama-2-70b-chat-hf model is a 70 billion parameter version of the Llama 2 family fine-tuned specifically for dialogue use cases, also developed and released by Meta. The 70B version uses Grouped-Query Attention (GQA) for improved inference scalability.

Model inputs and outputs

Inputs

  • Text prompts

Outputs

  • Generated text continuations

Capabilities

Llama-2-7b-hf is a capable generative language model that produces high-quality text on a wide range of topics. It can be used for tasks like summarization, language translation, question answering, and creative writing. The fine-tuned Llama-2-Chat models are particularly adept at open-ended dialogue and task completion.

What can I use it for?

Llama-2-7b-hf and the other Llama 2 models can be used for a variety of commercial and research applications, including chatbots, content generation, and language understanding. The Llama-2-Chat models are well suited to building assistant-like applications that require helpful and safe responses. To get started, you can fine-tune the models on your own data or use them directly for inference. Meta provides a custom commercial license for the Llama 2 models, which you can obtain by visiting the website and agreeing to the terms.

Things to try

One interesting aspect of the Llama 2 models is their ability to scale in size while maintaining strong performance. The 70 billion parameter version significantly outperforms the 7 billion version on many benchmarks, so developers could experiment with different Llama 2 sizes to find the right balance of performance and resource requirements for their use case. Another avenue to explore is the safety and helpfulness of the Llama-2-Chat models, which have been aligned to human preferences: it would be interesting to see how they perform in real-world applications that require reliable and trustworthy responses.
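
The weights are gated behind Meta's license, so you first accept the terms on the HuggingFace model page and authenticate with the hub (for example via `huggingface-cli login`). After that, a minimal generation sketch with transformers looks like this; the prompt is illustrative:

```python
# Minimal generation sketch for the gated meta-llama/Llama-2-7b-hf weights.
# Requires accepting Meta's license on HuggingFace and hub authentication.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
out = generator("The three laws of thermodynamics are", max_new_tokens=64)
print(out[0]["generated_text"])
```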
