weblab-10b-instruction-sft
==========================

Overview
========

This repository provides a Japanese-centric multilingual GPT-NeoX model of 10 billion parameters.

*   **Library**
    
    The model was trained using code based on [EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).
    
*   **Model architecture**
    
    A 36-layer, 4864-hidden-size transformer-based language model.
    
*   **Pre-training**
    
    The model was trained on around **600B** tokens from a mixture of the following corpora.
    
    *   [Japanese C4](https://huggingface.co/datasets/mc4)
    *   [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
*   **Instruction-supervised-finetuning**
    
    The model was finetuned for one epoch on a subset of records drawn from a mixture of the following datasets.
    
    *   [Alpaca (English)](https://github.com/gururise/AlpacaDataCleaned/blob/main/alpaca_data_cleaned.json)
    *   [Alpaca (Japanese translation)](https://github.com/shi3z/alpaca_ja/blob/main/alpaca_cleaned_ja.json)
    *   [Flan 2021 (English)](https://huggingface.co/datasets/conceptofmind/flan2021_submix_original)
    *   [Flan CoT (English)](https://huggingface.co/datasets/conceptofmind/cot_submix_original)
    *   [Flan Dialog (English)](https://huggingface.co/datasets/conceptofmind/dialog_submix_original)
*   **Model Series**
    
    | Variant | Link |
    | --- | --- |
    | weblab-10b-instruction-sft | [https://huggingface.co/matsuo-lab/weblab-10b-instruction-sft](https://huggingface.co/matsuo-lab/weblab-10b-instruction-sft) |
    | weblab-10b | [https://huggingface.co/matsuo-lab/weblab-10b](https://huggingface.co/matsuo-lab/weblab-10b) |
    
*   **Authors**
    
    Takeshi Kojima
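
The layer count and hidden size listed above can be checked against the 10-billion-parameter figure with a back-of-the-envelope estimate. The sketch below assumes a standard transformer block (four `d x d` attention matrices plus a two-matrix MLP with 4x expansion, roughly `12 * d^2` weights per layer) and ignores embeddings, biases, and layer norms. The function name is illustrative, and the result is an approximation rather than the model's exact parameter count.

```python
def approx_transformer_params(num_layers: int, hidden_size: int) -> int:
    """Rough weight count for a stack of standard transformer blocks.

    Assumption: 4*d^2 attention weights (Q, K, V, output projection) and
    8*d^2 MLP weights (two d x 4d matrices) per layer. Embeddings, biases,
    and layer norms are left out.
    """
    per_layer = 12 * hidden_size * hidden_size
    return num_layers * per_layer


# Figures taken from the model card: 36 layers, hidden size 4864.
total = approx_transformer_params(num_layers=36, hidden_size=4864)
print(f"{total:,} weights (~{total / 1e9:.1f}B)")
# prints: 10,220,470,272 weights (~10.2B)
```

Under these assumptions the transformer blocks alone account for about 10.2B weights, which is consistent with the "10 billion parameters" description.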
    

* * *

Benchmarking
============

*   **Japanese benchmark : JGLUE 8-task (2023-08-27)**
    
    *   _We used the [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/2f1583c0735eacdfdfa5b7d656074b69577b6774) library for evaluation._
    *   _The 8-task average accuracy is based on the results of JCommonsenseQA-1.1, JNLI-1.1, MARC-ja-1.1, JSQuAD-1.1, jaqket\_v2-0.2, xlsum\_ja-1.0, xwinograd\_ja, and mgsm-1.0._
    *   _Model loading is performed in float16, and evaluation uses template version 0.3 with few-shot in-context learning._
    *   _The numbers of few-shot examples are 3, 3, 3, 2, 1, 1, 0, and 5._
    *   _special\_tokens\_map.json was modified to avoid errors when evaluating the second half of the benchmarks. As a result, the scores for the first half of the benchmarks differ slightly from those in the 4-task table._
    
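    | model | average | jcommonsenseqa | jnli | marc\_ja | jsquad | jaqket\_v2 | xlsum\_ja | xwinograd\_ja | mgsm |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | weblab-10b-instruction-sft | 59.11 | 74.62 | 66.56 | 95.49 | 78.34 | 63.32 | 20.57 | 71.95 | 2 |
    | weblab-10b | 50.74 | 66.58 | 53.74 | 82.07 | 62.94 | 56.19 | 10.03 | 71.95 | 2.4 |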
    
*   **Japanese benchmark : JGLUE 4-task (2023-08-18)**
    
    *   _We used the [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/2f1583c0735eacdfdfa5b7d656074b69577b6774) library for evaluation._
    *   _The 4-task average accuracy is based on the results of JCommonsenseQA-1.1, JNLI-1.1, MARC-ja-1.1, and JSQuAD-1.1._
    *   _Model loading is performed in float16, and evaluation uses template version 0.3 with few-shot in-context learning._
    *   _The numbers of few-shot examples are 3, 3, 3, and 2._
    
    | Model | Average | JCommonsenseQA | JNLI | MARC-ja | JSQuAD |
    | --- | --- | --- | --- | --- | --- |
    | weblab-10b-instruction-sft | 78.78 | 74.35 | 65.65 | 96.06 | 79.04 |
    | weblab-10b | 66.38 | 65.86 | 54.19 | 84.49 | 60.98 |
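
The reported averages can be reproduced from the per-task scores. The sketch below assumes that "average" means the unweighted mean of the task scores, which the card does not state explicitly, and allows a tolerance of 0.01 for rounding. The helper name is illustrative.

```python
import math

# Per-task scores and reported averages copied from the benchmark tables above.
EIGHT_TASK = {
    "weblab-10b-instruction-sft": ([74.62, 66.56, 95.49, 78.34, 63.32, 20.57, 71.95, 2.0], 59.11),
    "weblab-10b": ([66.58, 53.74, 82.07, 62.94, 56.19, 10.03, 71.95, 2.4], 50.74),
}
FOUR_TASK = {
    "weblab-10b-instruction-sft": ([74.35, 65.65, 96.06, 79.04], 78.78),
    "weblab-10b": ([65.86, 54.19, 84.49, 60.98], 66.38),
}


def average_matches(scores, reported, tolerance=0.01):
    """Return True when the plain mean of `scores` agrees with `reported`."""
    return math.isclose(sum(scores) / len(scores), reported, abs_tol=tolerance)


for table_name, table in (("8-task", EIGHT_TASK), ("4-task", FOUR_TASK)):
    for model_name, (scores, reported) in table.items():
        mean = sum(scores) / len(scores)
        ok = average_matches(scores, reported)
        print(f"{table_name} {model_name}: mean={mean:.3f} reported={reported} ok={ok}")
```

All four reported averages agree with the unweighted mean within the rounding tolerance.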
    

* * *

How to use the model
====================

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    # Load the tokenizer and the model weights in half precision (float16).
    tokenizer = AutoTokenizer.from_pretrained("matsuo-lab/weblab-10b-instruction-sft")
    model = AutoModelForCausalLM.from_pretrained("matsuo-lab/weblab-10b-instruction-sft", torch_dtype=torch.float16)
    
    # Move the model to the GPU when one is available.
    if torch.cuda.is_available():
        model = model.to("cuda")
    
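    # Instruction: "Please explain large language models."
    text = "大規模言語モデルについて説明してください。"
    # Alpaca-style template: "Below is an instruction that describes a task. Write a response
    # that appropriately completes the request." with "### Instruction:" and "### Response:" labels.
    text = f'以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。\n\n### 指示:\n{text}\n\n### 応答:'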
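    # Tokenize the prompt as-is, without adding special tokens.
    token_ids = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt")
    
    # Sample up to 100 new tokens with temperature and nucleus (top-p) sampling.
    with torch.no_grad():
        output_ids = model.generate(
            token_ids.to(model.device),
            max_new_tokens=100,
            do_sample=True,
            temperature=0.7,
            top_p=0.95
        )
    
    # The decoded string contains the prompt followed by the generated continuation.
    output = tokenizer.decode(output_ids.tolist()[0])
    print(output)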
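
For causal language models, `generate` returns the prompt ids followed by the newly generated ids, so the decoded string above repeats the prompt. A minimal sketch of separating the two is shown below. `new_token_ids` is a hypothetical helper, not part of `transformers`, and the ids are toy values standing in for real tokenizer output.

```python
from typing import List, Sequence


def new_token_ids(output_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Drop the first `prompt_length` ids, which repeat the prompt."""
    if not 0 <= prompt_length <= len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Toy example: three prompt ids followed by two generated ids.
example_output = [11, 12, 13, 501, 502]
print(new_token_ids(example_output, prompt_length=3))  # [501, 502]
```

With the usage example above, the call would be `new_token_ids(output_ids.tolist()[0], token_ids.shape[1])`, and the result can be passed to `tokenizer.decode` to obtain only the response text.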
    

* * *

License
=======

[cc-by-nc-4.0](https://creativecommons.org/licenses/by-nc/4.0/)

## Model overview

`weblab-10b-instruction-sft` is a Japanese-centric multilingual GPT-NeoX model with 10 billion parameters. It was trained using code based on [EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox) and has a 36-layer transformer architecture with a hidden size of 4864. The model was pre-trained on around 600B tokens from a mixture of the [Japanese C4](https://huggingface.co/datasets/mc4) and [The Pile](https://huggingface.co/datasets/EleutherAI/pile) datasets. It was then finetuned for one epoch on a subset of records from [Alpaca (English)](https://github.com/gururise/AlpacaDataCleaned/blob/main/alpaca_data_cleaned.json), [Alpaca (Japanese translation)](https://github.com/shi3z/alpaca_ja/blob/main/alpaca_cleaned_ja.json), and the English Flan 2021, Flan CoT, and Flan Dialog datasets so that it follows instructions.

This model can be contrasted with [japanese-gpt-neox-3.6b-instruction-sft](https://aimodels.fyi/models/huggingFace/japanese-gpt-neox-36b-instruction-sft-rinna), a 3.6-billion-parameter Japanese GPT-NeoX model that has also been finetuned for instruction following. The main differences are the larger parameter count of `weblab-10b-instruction-sft` and its pre-training mix, which pairs Japanese C4 with the largely English Pile.

## Model inputs and outputs

### Inputs
- **Text prompts**: The model takes a text prompt, which can be an instruction for the model to follow or a conversational prompt.

### Outputs 
- **Generated text**: The model outputs text that continues or responds to the prompt, for example a response to an instruction or the next turn of a conversation.

## Capabilities

`weblab-10b-instruction-sft` can be used for a variety of language generation and understanding tasks, particularly in Japanese. On the JGLUE 8-task evaluation it averages 59.11, compared with 50.74 for the base `weblab-10b`, and it scores 74.62 on JCommonsenseQA, 66.56 on JNLI, and 95.49 on MARC-ja. Scores are much lower on xlsum\_ja (20.57) and mgsm (2), so summarization and math word problems remain weak spots. Its instruction finetuning makes it a candidate for applications such as chatbots and language assistants.

## What can I use it for?

`weblab-10b-instruction-sft` could be a good starting point for building Japanese-language chatbots, virtual assistants, or other applications that require fluent text generation and language understanding. Because its training data mixes Japanese and English, it may also be usable for cross-lingual applications. As with any large language model, outputs should be reviewed and filtered to ensure safety and to mitigate biases or inaccuracies. The model is released under cc-by-nc-4.0, which restricts it to non-commercial use.

## Things to try

One interesting aspect of `weblab-10b-instruction-sft` is its instruction finetuning. Prompts that give the model a specific task or objective, in Japanese or English, are a productive area to explore, since the finetuning data covers both languages. Multi-turn conversations are also worth testing, because the finetuning mixture includes the Flan Dialog data. Experimenting with different prompting techniques and with further finetuning may also help for downstream applications.

weblab-10b
==========

Overview
========

This repository provides a Japanese-centric multilingual GPT-NeoX model of 10 billion parameters.

*   **Library**
    
    The model was trained using code based on [EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).
    
*   **Model architecture**
    
    A 36-layer, 4864-hidden-size transformer-based language model.
    
*   **Pre-training**
    
    The model was trained on around **600B** tokens from a mixture of the following corpora.
    
    *   [Japanese C4](https://huggingface.co/datasets/mc4)
    *   [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
*   **Model Series**
    
    | Variant | Link |
    | --- | --- |
    | weblab-10b-instruction-sft | [https://huggingface.co/matsuo-lab/weblab-10b-instruction-sft](https://huggingface.co/matsuo-lab/weblab-10b-instruction-sft) |
    | weblab-10b | [https://huggingface.co/matsuo-lab/weblab-10b](https://huggingface.co/matsuo-lab/weblab-10b) |
    
*   **Authors**
    
    Takeshi Kojima
    

* * *

Benchmarking
============

*   **Japanese benchmark : JGLUE 8-task (2023-08-27)**
    
    *   _We used the [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/2f1583c0735eacdfdfa5b7d656074b69577b6774) library for evaluation._
    *   _The 8-task average accuracy is based on the results of JCommonsenseQA-1.1, JNLI-1.1, MARC-ja-1.1, JSQuAD-1.1, jaqket\_v2-0.2, xlsum\_ja-1.0, xwinograd\_ja, and mgsm-1.0._
    *   _Model loading is performed in float16, and evaluation uses template version 0.3 with few-shot in-context learning._
    *   _The numbers of few-shot examples are 3, 3, 3, 2, 1, 1, 0, and 5._
    *   _special\_tokens\_map.json was modified to avoid errors when evaluating the second half of the benchmarks. As a result, the scores for the first half of the benchmarks differ slightly from those in the 4-task table._
    
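    | model | average | jcommonsenseqa | jnli | marc\_ja | jsquad | jaqket\_v2 | xlsum\_ja | xwinograd\_ja | mgsm |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | weblab-10b-instruction-sft | 59.11 | 74.62 | 66.56 | 95.49 | 78.34 | 63.32 | 20.57 | 71.95 | 2 |
    | weblab-10b | 50.74 | 66.58 | 53.74 | 82.07 | 62.94 | 56.19 | 10.03 | 71.95 | 2.4 |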
    
*   **Japanese benchmark : JGLUE 4-task (2023-08-18)**
    
    *   _We used the [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/2f1583c0735eacdfdfa5b7d656074b69577b6774) library for evaluation._
    *   _The 4-task average accuracy is based on the results of JCommonsenseQA-1.1, JNLI-1.1, MARC-ja-1.1, and JSQuAD-1.1._
    *   _Model loading is performed in float16, and evaluation uses template version 0.3 with few-shot in-context learning._
    *   _The numbers of few-shot examples are 3, 3, 3, and 2._
    
    | Model | Average | JCommonsenseQA | JNLI | MARC-ja | JSQuAD |
    | --- | --- | --- | --- | --- | --- |
    | weblab-10b-instruction-sft | 78.78 | 74.35 | 65.65 | 96.06 | 79.04 |
    | weblab-10b | 66.38 | 65.86 | 54.19 | 84.49 | 60.98 |
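
To see where instruction finetuning changes the scores relative to this base model, the per-task differences in the 8-task table can be computed directly. The snippet below is a small sketch that uses only the numbers reported above; the variable names are illustrative.

```python
# Task order and scores copied from the 8-task table above.
TASKS = ["jcommonsenseqa", "jnli", "marc_ja", "jsquad",
         "jaqket_v2", "xlsum_ja", "xwinograd_ja", "mgsm"]
SFT_SCORES = [74.62, 66.56, 95.49, 78.34, 63.32, 20.57, 71.95, 2.0]
BASE_SCORES = [66.58, 53.74, 82.07, 62.94, 56.19, 10.03, 71.95, 2.4]

# Gain of weblab-10b-instruction-sft over weblab-10b, rounded to two decimals.
gains = {
    task: round(sft - base, 2)
    for task, sft, base in zip(TASKS, SFT_SCORES, BASE_SCORES)
}

# Print tasks from largest to smallest gain.
for task, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{task:15s} {gain:+.2f}")
```

By these numbers the largest gain is on jsquad, xwinograd\_ja is unchanged, and mgsm is marginally lower after finetuning.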
    

* * *

How to use the model
====================

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM
    
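    # Load the tokenizer and the model weights in half precision (float16).
    tokenizer = AutoTokenizer.from_pretrained("matsuo-lab/weblab-10b")
    model = AutoModelForCausalLM.from_pretrained("matsuo-lab/weblab-10b", torch_dtype=torch.float16)
    
    # Move the model to the GPU when one is available.
    if torch.cuda.is_available():
        model = model.to("cuda")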
    
    # Prompt to continue: the opening line of Natsume Soseki's novel "I Am a Cat".
    text = "吾輩は猫である。"
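    # Tokenize the prompt as-is, without adding special tokens.
    token_ids = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt")
    
    # Sample up to 100 new tokens with temperature and nucleus (top-p) sampling.
    with torch.no_grad():
        output_ids = model.generate(
            token_ids.to(model.device),
            max_new_tokens=100,
            do_sample=True,
            temperature=0.7,
            top_p=0.95
        )
    
    # The decoded string contains the prompt followed by the generated continuation.
    output = tokenizer.decode(output_ids.tolist()[0])
    print(output)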
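
Loading with `torch_dtype=torch.float16` roughly halves the memory needed for the weights compared with float32. The estimate below is a rough sketch: it assumes the nominal 10 billion parameters, counts weights only, and leaves out activations, the key/value cache, and framework overhead. The names are illustrative.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2}

# Assumption: the rounded "10 billion parameters" figure from the overview.
NOMINAL_PARAMETERS = 10_000_000_000


def weight_memory_gib(num_parameters: int, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_parameters * BYTES_PER_PARAMETER[dtype] / 2**30


for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gib(NOMINAL_PARAMETERS, dtype):.1f} GiB")
```

Under these assumptions the float16 weights need a little under 19 GiB, so actual usage during generation will be higher than that.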
    

* * *

License
=======

[cc-by-nc-4.0](https://creativecommons.org/licenses/by-nc/4.0/)

## Model overview

`weblab-10b` is a Japanese-centric multilingual GPT-NeoX model with 10 billion parameters, developed by [matsuo-lab](https://aimodels.fyi/creators/huggingFace/matsuo-lab). It was pre-trained on around 600 billion tokens from a mixture of the [Japanese C4](https://huggingface.co/datasets/mc4) and [The Pile](https://huggingface.co/datasets/EleutherAI/pile) datasets. The architecture has 36 layers and a hidden size of 4864. The series also includes [`weblab-10b-instruction-sft`](https://aimodels.fyi/models/huggingFace/weblab-10b-instruction-sft-matsuo-lab), a variant finetuned for instruction following.

## Model inputs and outputs

`weblab-10b` takes text as input and generates text as output. As a pre-trained model without instruction finetuning, it continues a prompt rather than following instructions, and it can serve as a base for tasks such as text generation, language understanding, and translation.

### Inputs
- Text prompt: The model accepts arbitrary text as input and generates a continuation of it.

### Outputs
- Generated text: The model outputs text that continues the input prompt. The length and content of the output can be controlled through generation parameters such as `max_new_tokens`, `temperature`, and `top_p`.
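
To illustrate how the sampling parameters shape the output, the sketch below applies temperature scaling and nucleus (top-p) filtering to a toy next-token distribution. It is a simplified pure-Python illustration of the idea, not the `transformers` implementation; the function name, the token strings, and the logits are made up for the example.

```python
import math
from typing import Dict


def top_p_filter(logits: Dict[str, float], temperature: float, top_p: float) -> Dict[str, float]:
    """Scale logits by temperature, then keep the most probable tokens until
    their cumulative probability reaches top_p, and renormalize."""
    scaled = {token: value / temperature for token, value in logits.items()}

    # Softmax with the maximum subtracted for numerical stability.
    highest = max(scaled.values())
    exps = {token: math.exp(value - highest) for token, value in scaled.items()}
    normalizer = sum(exps.values())
    probs = {token: value / normalizer for token, value in exps.items()}

    # Walk tokens from most to least probable and stop once top_p is covered.
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they form a distribution again.
    kept_total = sum(kept.values())
    return {token: prob / kept_total for token, prob in kept.items()}


# Toy logits for four candidate tokens (hypothetical values).
toy_logits = {"a": 2.0, "b": 1.5, "c": 0.0, "d": -3.0}
filtered = top_p_filter(toy_logits, temperature=0.7, top_p=0.95)
print(filtered)
```

With these toy values only the two most likely tokens survive the filter. Lowering `temperature` sharpens the distribution further, while raising `top_p` lets less likely tokens back in.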

## Capabilities

On the JGLUE 8-task evaluation, `weblab-10b` averages 50.74 with few-shot prompting. It is strongest on MARC-ja (82.07) and xwinograd\_ja (71.95), reaches 66.58 on JCommonsenseQA and 53.74 on JNLI, and is weak on summarization (xlsum\_ja, 10.03) and math word problems (mgsm, 2.4). Its scale and its mixed Japanese and English pre-training data make it a useful base for working with Japanese language data.

## What can I use it for?

The `weblab-10b` model can be used for a variety of applications, such as:

- **Text generation**: The model can be used to generate coherent and context-appropriate Japanese text, which can be useful for tasks like creative writing, dialogue generation, or report summarization.
- **Language understanding**: Fine-tuning the model on specific tasks can improve performance on Japanese NLP tasks such as question answering or text classification.
- **Multilingual applications**: The model's multilingual capabilities can be leveraged for applications that require translation or cross-lingual understanding.

## Things to try

One interesting aspect of `weblab-10b` is its few-shot performance on Japanese tasks without any instruction finetuning. Researchers and developers could fine-tune the model on domain-specific Japanese datasets to tackle specialized problems, or investigate how coherent and contextually appropriate its Japanese text generation is.

Another area to explore is the model's multilingual capabilities and how they can be leveraged for cross-lingual applications. Experiments could involve testing the model's ability to understand and generate text in multiple languages, or exploring zero-shot or few-shot learning approaches for tasks like machine translation.

Overall, `weblab-10b` is a flexible base model for Japanese and multilingual NLP applications, subject to the non-commercial terms of its cc-by-nc-4.0 license.