japanese-gpt-neox-3.6b
================================================

[![rinna-icon](/rinna/japanese-gpt-neox-3.6b/resolve/main/rinna.png)](/rinna/japanese-gpt-neox-3.6b/blob/main/rinna.png)

Overview
=====================

This repository provides a Japanese GPT-NeoX model of 3.6 billion parameters.

*   **Library**
    
    The model was trained using code based on [EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).
    
*   **Model architecture**
    
    A 36-layer, 2816-hidden-size transformer-based language model.
    
*   **Pre-training**
    
    The model was trained on around **312.5B** tokens from [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz), [Japanese C4](https://huggingface.co/datasets/mc4), and [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch) to optimize a traditional language modelling objective.
    
    A final validation perplexity of **8.68** has been reached.
    
*   **Model Series**
    
    | Variant | Link |
    | :-- | :-- |
    | 3.6B PPO | [https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-ppo](https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-ppo) |
    | 3.6B SFT-v2 | [https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft-v2](https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft-v2) |
    | 3.6B SFT | [https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft](https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft) |
    | 3.6B pretrained | [https://huggingface.co/rinna/japanese-gpt-neox-3.6b](https://huggingface.co/rinna/japanese-gpt-neox-3.6b) |
    
*   **Contributors**
    
    [Tianyu Zhao](https://huggingface.co/tianyuz) and [Kei Sawada](https://huggingface.co/keisawada)
    
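As a rough sanity check on the "3.6 billion parameters" figure, the parameter count of a GPT-NeoX-style decoder can be estimated from the 36 layers and hidden size of 2816 stated above. This sketch assumes a 4x MLP expansion, untied input and output embedding matrices, and an embedding table sized to the 32,000-token vocabulary described under Tokenization; it ignores biases and layer-norm weights, so it is an approximation, not the exact count from the model's config.

```python
def estimate_params(n_layers: int, hidden: int, vocab: int, mlp_ratio: int = 4) -> int:
    """Approximate weight count of a GPT-NeoX-style decoder (assumptions in lead-in)."""
    # Attention: Q, K, V projections (3 * h * h) plus the output projection (h * h)
    attn = 4 * hidden * hidden
    # MLP (assumed 4x expansion): up-projection h -> r*h and down-projection r*h -> h
    mlp = 2 * mlp_ratio * hidden * hidden
    per_layer = attn + mlp
    # Assumed untied input embedding and output (LM head) matrices, each vocab x h
    embeddings = 2 * vocab * hidden
    # Biases and layer-norm parameters are deliberately ignored
    return n_layers * per_layer + embeddings


if __name__ == "__main__":
    total = estimate_params(n_layers=36, hidden=2816, vocab=32000)
    print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

The transformer blocks account for roughly 3.43B of the estimate and the two embedding matrices for about 0.18B, which is consistent with the advertised 3.6B.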

How to use the model
=============================================

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-neox-3.6b", use_fast=False)
    model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt-neox-3.6b")
    
    if torch.cuda.is_available():
        model = model.to("cuda")
    
    text = "人工知能の研究は、"  # example Japanese prompt; replace with your own text
    token_ids = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt")
    
    with torch.no_grad():
        output_ids = model.generate(
            token_ids.to(model.device),
            max_new_tokens=100,
            min_new_tokens=100,
            do_sample=True,
            temperature=0.8,
            pad_token_id=tokenizer.pad_token_id,
            bos_token_id=tokenizer.bos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    output = tokenizer.decode(output_ids.tolist()[0])
    print(output)
    
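The call above samples with `do_sample=True` and `temperature=0.8`. To illustrate what temperature does in general (this is the standard technique, not a copy of the `transformers` implementation), the logits are divided by the temperature before the softmax, so a value below 1 sharpens the distribution toward the most likely tokens and a value above 1 flattens it. The toy logits are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing them by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-token vocabulary
    for t in (1.0, 0.8, 0.5):
        probs = softmax_with_temperature(toy_logits, t)
        print(t, [round(p, 3) for p in probs])
    print("sampled index:", sample_token(toy_logits, 0.8, random.Random(0)))
```

Lowering the temperature raises the probability of the top-scoring token, which is why values such as 0.8 tend to give more conservative continuations than the default of 1.0.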

Tokenization
=============================

The model uses a [sentencepiece](https://github.com/google/sentencepiece)-based tokenizer.

*   The tokenizer has a vocabulary size of 32,000.
*   It uses sentencepiece's byte fallback feature to decompose unknown text pieces into UTF-8 byte pieces and to avoid producing `<UNK>` tokens.
*   sentencepiece's `--add_dummy_prefix` option was turned off so that a leading whitespace will not be prepended automatically.
    
          print(tokenizer.tokenize(""))
          # ['', '', '', '', '']
          # instead of ['', '', '', '', '', ''] as in rinna/japanese-gpt-1b
        
    
*   sentencepiece's `--remove_extra_whitespaces` option was turned off so that leading, trailing, and duplicate whitespaces are preserved.
    
          print(tokenizer.tokenize("       "))
          # ['', '', '', '', '', '', '', '', '', '', '', '']
          # instead of ['', '', '', '', '', ''] as in rinna/japanese-gpt-1b
        
    
*   Don't forget to set `use_fast=False` to make the above features function correctly.
    
          good_tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-neox-3.6b", use_fast=False)
          bad_tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-neox-3.6b")
        
          print(good_tokenizer.decode(good_tokenizer.encode("       ")))
          # '       </s>'
          print(bad_tokenizer.decode(bad_tokenizer.encode("       ")))
          # '[UNK]   </s>'
        
    
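The byte-fallback and whitespace behaviour described above can be illustrated with a toy tokenizer. This is a minimal sketch under stated assumptions: it uses greedy longest-match over a small hypothetical vocabulary rather than SentencePiece's actual segmentation algorithm, and it names byte pieces in the `<0xNN>` style that SentencePiece uses for byte fallback. It shows only the effect of the two options: no automatic leading `▁`, and every space kept as its own piece.

```python
SPACE = "\u2581"  # SentencePiece's visible whitespace marker "▁"


def toy_tokenize(text, vocab, add_dummy_prefix=False):
    """Greedy longest-match tokenizer with UTF-8 byte fallback (toy, not SentencePiece)."""
    if add_dummy_prefix:
        # Emulates --add_dummy_prefix: prepend a space before segmentation
        text = " " + text
    # Every space is kept (no extra-whitespace removal) and mapped to "▁"
    text = text.replace(" ", SPACE)
    max_len = max(len(piece) for piece in vocab)
    pieces = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocab:
                pieces.append(candidate)
                i += length
                break
        else:
            # Unknown character: emit one piece per UTF-8 byte instead of <unk>
            pieces.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return pieces


if __name__ == "__main__":
    vocab = {SPACE, "吾", "輩", "は", "猫", "である"}  # hypothetical vocabulary
    print(toy_tokenize("吾輩は猫である", vocab))
    print(toy_tokenize("吾輩は猫である", vocab, add_dummy_prefix=True))
    print(toy_tokenize("  猫🐶", vocab))  # 🐶 is not in the vocabulary
```

Because unknown characters become byte pieces, decoding them back yields the original text rather than an `[UNK]` placeholder.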

How to cite
===========================

    @misc{rinna-japanese-gpt-neox-3.6b,
        title = {rinna/japanese-gpt-neox-3.6b},
        author = {Zhao, Tianyu and Sawada, Kei},
        url = {https://huggingface.co/rinna/japanese-gpt-neox-3.6b},
    }
    
    @inproceedings{sawada2024release,
        title = {Release of Pre-Trained Models for the {J}apanese Language},
        author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
        booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
        month = {5},
        year = {2024},
        url = {https://arxiv.org/abs/2404.01657},
    }
    

License
=====================

[The MIT license](https://opensource.org/licenses/MIT)

## Model Overview

`japanese-gpt-neox-3.6b` is a 3.6-billion-parameter Japanese language model developed by [rinna](https://aimodels.fyi/creators/huggingFace/rinna). It was trained using code based on [EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox) on around 312.5 billion tokens from Japanese CC-100, Japanese C4, and Japanese Wikipedia, reaching a final validation perplexity of 8.68.
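
Perplexity is the exponential of the average per-token negative log-likelihood (natural log), so a validation perplexity of 8.68 corresponds to a cross-entropy of about 2.16 nats, or about 3.12 bits, per token. The sketch below converts in both directions; the log-probabilities in the example are made up for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities: exp(mean NLL)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)


def loss_from_perplexity(ppl):
    """Average cross-entropy in nats per token implied by a perplexity."""
    return math.log(ppl)


if __name__ == "__main__":
    nats = loss_from_perplexity(8.68)
    print(f"{nats:.3f} nats/token, {nats / math.log(2):.3f} bits/token")
    # Hypothetical log-probabilities assigned to a 4-token sequence
    print(perplexity([math.log(0.5), math.log(0.25), math.log(0.125), math.log(0.5)]))
```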

The model comes in several variants: instruction-following versions produced by supervised fine-tuning (`rinna/japanese-gpt-neox-3.6b-instruction-sft` and `rinna/japanese-gpt-neox-3.6b-instruction-sft-v2`) and a version trained with reinforcement learning via PPO (`rinna/japanese-gpt-neox-3.6b-instruction-ppo`). Unlike the pretrained base model, these variants are tuned to understand and follow human instructions.

In comparison, the [gpt-neox-20b](https://aimodels.fyi/models/huggingFace/gpt-neox-20b-eleutherai) model is a 20 billion parameter English language model trained by EleutherAI, while the [mGPT](https://aimodels.fyi/models/huggingFace/mgpt-ai-forever) model is a 1.3 billion parameter multilingual model developed by AI-Forever covering 61 languages. The [gpt-j-6b](https://aimodels.fyi/models/huggingFace/gpt-j-6b-eleutherai) model is a 6 billion parameter English language model developed by EleutherAI.

## Model Inputs and Outputs

### Inputs
- Text prompts in Japanese for the model to continue and generate additional text.

### Outputs
- Continued Japanese text generated by the model based on the input prompt.
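
For a decoder-only model, `model.generate` returns the prompt tokens followed by the newly generated tokens, so decoding `output_ids[0]` as in the usage example prints the prompt together with its continuation. A minimal sketch for keeping only the continuation is to drop the first `len(prompt_ids)` positions; it is shown on plain Python lists with hypothetical token IDs so it runs without downloading the model.

```python
def split_generation(prompt_ids, output_ids):
    """Split a decoder-only generate() result into (prompt, continuation) token IDs."""
    n = len(prompt_ids)
    # The output of a decoder-only model should begin with the prompt itself
    if list(output_ids[:n]) != list(prompt_ids):
        raise ValueError("output does not start with the prompt tokens")
    return list(output_ids[:n]), list(output_ids[n:])


if __name__ == "__main__":
    prompt = [101, 102, 103]               # hypothetical prompt token IDs
    generated = [101, 102, 103, 7, 8, 2]   # prompt followed by new tokens
    _, new_tokens = split_generation(prompt, generated)
    print(new_tokens)
```

With the real tensors from the usage example, the same idea can be written as `tokenizer.decode(output_ids[0, token_ids.shape[1]:], skip_special_tokens=True)`, which also drops special tokens such as `</s>`.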

## Capabilities

As a pretrained base model (not instruction-tuned), `japanese-gpt-neox-3.6b` is primarily a Japanese text generator: it continues a prompt with coherent, contextually relevant Japanese text. Tasks such as summarization, translation, and question answering can be approached by framing them as text continuation in the prompt or by fine-tuning the model on task-specific data.

The fine-tuned variants, such as `rinna/japanese-gpt-neox-3.6b-instruction-sft`, are trained to follow human instructions, which makes them a more natural starting point than the base model for interactive Japanese language assistants or chatbots.

## What Can I Use It For?

The `japanese-gpt-neox-3.6b` model can be a valuable tool for Japanese language researchers and developers. It can be used as a base model for fine-tuning on specific Japanese language tasks, or as a starting point for developing personalized Japanese language applications.

For example, a Japanese language tutoring app could use the model to generate natural Japanese responses to student prompts, providing an immersive language learning experience. Alternatively, a Japanese e-commerce platform could leverage the model's text generation capabilities to automatically produce product descriptions and summaries.

The instruction-following variants of the model, like `rinna/japanese-gpt-neox-3.6b-instruction-sft`, could be used to build sophisticated Japanese language assistants that can understand and execute complex user requests.

## Things to Try

One interesting aspect of the `japanese-gpt-neox-3.6b` model is its ability to generate coherent and contextually relevant Japanese text. Try providing the model with a Japanese sentence or paragraph as a prompt and see how it continues the text. Observe how the model maintains the style, tone, and overall coherence of the generated output.

You can also experiment with the different variants of the model, like `rinna/japanese-gpt-neox-3.6b-instruction-sft`, and compare their performance on tasks that require understanding and following human instructions. This can give you insights into the model's robustness and potential applications.