nlp-waseda

Rank:

Average Model Cost: $0.0000

Number of Runs: 5,052

Models by this creator

roberta-base-japanese

nlp-waseda

Platform did not provide a description for this model.

$-/run

1.1K

Huggingface

roberta-large-japanese-seq512-with-auto-jumanpp

nlp-waseda/roberta-large-japanese-seq512-with-auto-jumanpp

Model description: This is a Japanese RoBERTa large model pretrained on Japanese Wikipedia and the Japanese portion of CC-100 with a maximum sequence length of 512.

How to use: You can use this model for masked language modeling and fine-tune it on downstream tasks (a usage sketch follows this description).

Tokenization: BertJapaneseTokenizer now supports automatic tokenization for Juman++. However, if your dataset is large, tokenization may take a long time, since BertJapaneseTokenizer does not yet support fast tokenization. You can still run the Juman++ tokenization yourself and use the older model nlp-waseda/roberta-large-japanese-seq512. Juman++ 2.0.0-rc3 was used for pretraining. Each word is split into subword tokens by SentencePiece.

Vocabulary: The vocabulary consists of 32,000 tokens, including words (from JumanDIC) and subwords induced by the unigram language model of SentencePiece.

Training procedure: This model was trained on Japanese Wikipedia (as of 2021-09-20) and the Japanese portion of CC-100, starting from the checkpoint of nlp-waseda/roberta-large-japanese. Training took a week on eight NVIDIA A100 GPUs. The following hyperparameters were used during pretraining:
learning_rate: 6e-5
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 4120 (max_seq_length=128), 4032 (max_seq_length=512)
max_seq_length: 512
optimizer: Adam with betas=(0.9, 0.98) and epsilon=1e-6
lr_scheduler_type: linear
training_steps: 670000 (max_seq_length=128) + 70000 (max_seq_length=512)
warmup_steps: 10000
mixed_precision_training: Native AMP
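The usage snippet from the original card is not reproduced above. The following is a minimal masked-language-modeling sketch, assuming transformers and Juman++ (with its Python bindings) are installed; the example sentence and variable names are placeholders, not taken from the card:

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    # The bundled tokenizer performs Juman++ word segmentation automatically,
    # so raw Japanese text can be passed in directly.
    name = "nlp-waseda/roberta-large-japanese-seq512-with-auto-jumanpp"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name)

    sentence = "早稲田大学で自然言語処理を[MASK]する。"  # hypothetical example
    encoding = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits

    # Decode the highest-scoring prediction at the [MASK] position.
    mask_pos = (encoding.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
    print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))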

$-/run

902

Huggingface

roberta-large-japanese-seq512

nlp-waseda/roberta-large-japanese-seq512

Model description: This is a Japanese RoBERTa large model pretrained on Japanese Wikipedia and the Japanese portion of CC-100 with a maximum sequence length of 512.

How to use: You can use this model for masked language modeling (a usage sketch follows this description) and fine-tune it on downstream tasks.

Tokenization: The input text should be segmented into words by Juman++ in advance. Juman++ 2.0.0-rc3 was used for pretraining. Each word is split into subword tokens by SentencePiece. BertJapaneseTokenizer now supports automatic JumanppTokenizer and SentencepieceTokenizer, so you can also use this model without any manual preprocessing.

Vocabulary: The vocabulary consists of 32,000 tokens, including words (from JumanDIC) and subwords induced by the unigram language model of SentencePiece.

Training procedure: This model was trained on Japanese Wikipedia (as of 2021-09-20) and the Japanese portion of CC-100, starting from the checkpoint of nlp-waseda/roberta-large-japanese. Training took a week on eight NVIDIA A100 GPUs. The following hyperparameters were used during pretraining:
learning_rate: 6e-5
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 4120 (max_seq_length=128), 4032 (max_seq_length=512)
max_seq_length: 512
optimizer: Adam with betas=(0.9, 0.98) and epsilon=1e-6
lr_scheduler_type: linear
training_steps: 670000 (max_seq_length=128) + 70000 (max_seq_length=512)
warmup_steps: 10000
mixed_precision_training: Native AMP
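The card's code example is not included above. The following is a minimal fill-mask sketch in which the input is segmented into words with Juman++ beforehand, as the card instructs; the whitespace-separated sentence is a hypothetical example of Juman++ output:

    from transformers import pipeline

    # Pre-segmented, whitespace-joined input follows the card's
    # "segment with Juman++ in advance" instruction.
    fill_mask = pipeline("fill-mask", model="nlp-waseda/roberta-large-japanese-seq512")
    segmented = "早稲田 大学 で 自然 言語 処理 を [MASK] する 。"
    for candidate in fill_mask(segmented, top_k=5):
        print(candidate["token_str"], candidate["score"])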

$-/run

741

Huggingface

gpt2-xl-japanese

nlp-waseda/gpt2-xl-japanese: This is a Japanese GPT-2 model with approximately 1.5B parameters, pretrained on Japanese Wikipedia and CC-100. The model architecture is based on Radford et al. (2019).

Intended uses & limitations: You can use the raw model for text generation or fine-tune it on a downstream task. Note that the texts should be segmented into words using Juman++ in advance.

How to use: You can use this model directly with a pipeline for text generation. Since generation relies on some randomness, set a seed for reproducibility (a sketch follows this description).

Preprocessing: The texts are normalized using neologdn, segmented into words using Juman++, and tokenized by BPE. Juman++ 2.0.0-rc3 was used for pretraining. The model was trained on 8 NVIDIA A100 GPUs.

Acknowledgments: This work was supported by the Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN) through General Collaboration Project no. jh221004, "Developing a Platform for Constructing and Sharing of Large-Scale Japanese Language Models". For training the models, we used mdx: a platform for the data-driven future.
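The generation snippet from the card is omitted above. The following is a minimal sketch with the text-generation pipeline, assuming the prompt has been segmented into words with Juman++; the prompt itself is a hypothetical placeholder:

    from transformers import pipeline, set_seed

    generator = pipeline("text-generation", model="nlp-waseda/gpt2-xl-japanese")
    set_seed(42)  # sampling is stochastic; fix a seed for reproducibility

    prompt = "早稲田 大学 で 自然 言語 処理 を"  # hypothetical Juman++-segmented prompt
    for out in generator(prompt, max_length=30, do_sample=True, num_return_sequences=3):
        print(out["generated_text"])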

$-/run

499

Huggingface

roberta-base-japanese-with-auto-jumanpp

nlp-waseda/roberta-base-japanese-with-auto-jumanpp

Model description: This is a Japanese RoBERTa base model pretrained on Japanese Wikipedia and the Japanese portion of CC-100.

How to use: You can use this model for masked language modeling (a usage sketch follows this description) and fine-tune it on downstream tasks.

Tokenization: BertJapaneseTokenizer now supports automatic tokenization for Juman++. However, if your dataset is large, tokenization may take a long time, since BertJapaneseTokenizer does not yet support fast tokenization. You can still run the Juman++ tokenization yourself and use the older model nlp-waseda/roberta-base-japanese. Juman++ 2.0.0-rc3 was used for pretraining. Each word is split into subword tokens by SentencePiece.

Vocabulary: The vocabulary consists of 32,000 tokens, including words (from JumanDIC) and subwords induced by the unigram language model of SentencePiece.

Training procedure: This model was trained on Japanese Wikipedia (as of 2021-09-20) and the Japanese portion of CC-100. Training took a week on eight NVIDIA A100 GPUs. The following hyperparameters were used during pretraining:
learning_rate: 1e-4
per_device_train_batch_size: 256
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 4096
max_seq_length: 128
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
training_steps: 700000
warmup_steps: 10000
mixed_precision_training: Native AMP

Performance on JGLUE: See the baseline scores of JGLUE.
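The card's masked-language-modeling snippet is not shown above. The following is a minimal fill-mask sketch, assuming Juman++ and its Python bindings are installed so the tokenizer can segment raw text automatically; the sentence is a hypothetical example:

    from transformers import pipeline

    # The bundled tokenizer runs Juman++ itself, so raw text is accepted here.
    fill_mask = pipeline("fill-mask", model="nlp-waseda/roberta-base-japanese-with-auto-jumanpp")
    for candidate in fill_mask("早稲田大学で自然言語処理を[MASK]する。", top_k=5):
        print(candidate["token_str"], candidate["score"])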

$-/run

485

Huggingface

comet-gpt2-small-japanese

COMET-GPT2 ja: GPT-2 finetuned on ATOMIC ja using a causal language modeling (CLM) objective, introduced in the accompanying paper.

How to use: You can use this model directly with a pipeline for text generation. Since generation relies on some randomness, set a seed for reproducibility (a sketch follows this description).

Preprocessing: The texts are segmented into words using Juman++ and tokenized using SentencePiece.

Evaluation results and a BibTeX entry are given on the original model card.
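The card's generation snippet is not shown above. The following is a minimal text-generation sketch; the event/relation prompt is only a hypothetical placeholder, so consult the original card for the real ATOMIC ja input format:

    from transformers import pipeline, set_seed

    generator = pipeline("text-generation", model="nlp-waseda/comet-gpt2-small-japanese")
    set_seed(42)  # fix the sampling seed for reproducibility

    # Hypothetical Juman++-segmented event plus relation label.
    prompt = "X が 電車 に 乗り遅れる xEffect"
    print(generator(prompt, max_length=30, do_sample=True)[0]["generated_text"])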

$-/run

431

Huggingface

roberta-large-japanese-with-auto-jumanpp

nlp-waseda/roberta-large-japanese-with-auto-jumanpp

Model description: This is a Japanese RoBERTa large model pretrained on Japanese Wikipedia and the Japanese portion of CC-100.

How to use: You can use this model for masked language modeling and fine-tune it on downstream tasks (a fine-tuning sketch follows this description).

Tokenization: BertJapaneseTokenizer now supports automatic tokenization for Juman++. However, if your dataset is large, tokenization may take a long time, since BertJapaneseTokenizer does not yet support fast tokenization. You can still run the Juman++ tokenization yourself and use the older model nlp-waseda/roberta-large-japanese. Juman++ 2.0.0-rc3 was used for pretraining. Each word is split into subword tokens by SentencePiece.

Vocabulary: The vocabulary consists of 32,000 tokens, including words (from JumanDIC) and subwords induced by the unigram language model of SentencePiece.

Training procedure: This model was trained on Japanese Wikipedia (as of 2021-09-20) and the Japanese portion of CC-100. Training took two weeks on eight NVIDIA A100 GPUs. The following hyperparameters were used during pretraining:
learning_rate: 6e-5
per_device_train_batch_size: 103
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 5
total_train_batch_size: 4120
max_seq_length: 128
optimizer: Adam with betas=(0.9, 0.98) and epsilon=1e-6
lr_scheduler_type: linear
training_steps: 670000
warmup_steps: 10000
mixed_precision_training: Native AMP

Performance on JGLUE: See the baseline scores of JGLUE.
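Masked language modeling for this checkpoint works just like the sketches for the other RoBERTa variants above. Since the card also notes that the model can be fine-tuned on downstream tasks, the following is a minimal sketch that puts a sequence-classification head on top of the pretrained encoder; the number of labels and the example texts are hypothetical placeholders, not values from the card:

    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    name = "nlp-waseda/roberta-large-japanese-with-auto-jumanpp"
    tokenizer = AutoTokenizer.from_pretrained(name)
    # A freshly initialized classification head is added on top of the encoder;
    # num_labels=2 assumes a hypothetical binary task.
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

    # Tokenize two hypothetical examples and run a forward pass to check shapes.
    batch = tokenizer(["素晴らしい映画だった。", "退屈な映画だった。"],
                      padding=True, truncation=True, return_tensors="pt")
    print(model(**batch).logits.shape)  # torch.Size([2, 2])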

$-/run

254

Huggingface

gpt2-small-japanese

nlp-waseda/gpt2-small-japanese: This model is a Japanese GPT-2 pretrained on Japanese Wikipedia and CC-100.

Intended uses & limitations: You can use the raw model for text generation or fine-tune it on a downstream task. Note that the texts should be segmented into words using Juman++ in advance.

How to use: You can use this model directly with a pipeline for text generation; since generation relies on some randomness, set a seed for reproducibility. You can also use it to get the features of a given text in PyTorch (a feature-extraction sketch follows this description).

Training data: The GPT-2 model was pretrained on Japanese Wikipedia, dumped on 2022-03-20, and the Japanese portion of CC-100.

Training procedure and preprocessing: The texts are normalized using zenhan, segmented into words using Juman++, and tokenized using SentencePiece. Juman++ 2.0.0-rc3 was used for pretraining. The model was trained on 8 NVIDIA A100 GPUs.
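The card's snippets for generation and feature extraction are not included above; generation mirrors the gpt2-xl-japanese sketch earlier in this list. The following is a minimal PyTorch sketch for extracting features from a given text, assuming the text has been segmented with Juman++ beforehand (the sentence is a hypothetical example):

    import torch
    from transformers import AutoTokenizer, AutoModel

    name = "nlp-waseda/gpt2-small-japanese"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    text = "早稲田 大学 で 自然 言語 処理 を 研究 する 。"  # hypothetical Juman++-segmented input
    encoded_input = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        output = model(**encoded_input)
    print(output.last_hidden_state.shape)  # hidden state for every input token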

$-/run

239

Huggingface
