ku-nlp

Rank:

Average Model Cost: $0.0000

Number of Runs: 52,501

Models by this creator

deberta-v2-base-japanese-char-wwm

No description available.

$-/run

15.7K

Huggingface

deberta-v2-base-japanese

The deberta-v2-base-japanese model is a Japanese language model based on the DeBERTa V2 architecture, pre-trained on Japanese Wikipedia, the Japanese portion of CC-100, and the Japanese portion of OSCAR, for a combined 171GB of training data. It can be used for masked language modeling and fine-tuned on downstream tasks; input text should be segmented into words with Juman++ and then tokenized into subwords with SentencePiece. Pre-training used a learning rate of 2e-4, a per-device batch size of 44, and a total batch size of 2,112, and the model reaches an accuracy of 0.779 on the masked language modeling task. It has also been fine-tuned and evaluated on the dev set of JGLUE across several NLU tasks. This work was supported by JHPCN and the mdx platform.
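
As a rough illustration of the masked language modeling usage described above, the sketch below uses the Hugging Face fill-mask pipeline. It assumes transformers and sentencepiece are installed; the example sentence is hand-segmented with spaces to stand in for real Juman++ output, and the top_k readout is an illustrative choice rather than part of the model card.

```python
from transformers import pipeline

# Minimal masked-LM sketch for ku-nlp/deberta-v2-base-japanese.
fill_mask = pipeline("fill-mask", model="ku-nlp/deberta-v2-base-japanese")
mask = fill_mask.tokenizer.mask_token

# Hand-segmented stand-in for Juman++ output (words joined with spaces);
# in real use, segment the raw text with Juman++ first.
sentence = f"京都 大学 で 自然 言語 処理 を {mask} する 。"

for prediction in fill_mask(sentence, top_k=5):
    print(prediction["token_str"], round(prediction["score"], 3))
```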

$-/run

13.1K

Huggingface

deberta-v2-tiny-japanese-char-wwm

deberta-v2-tiny-japanese-char-wwm is a Japanese character-level DeBERTa V2 model pre-trained on Japanese Wikipedia, the Japanese portion of CC-100, and the Japanese portion of OSCAR, for a combined 171GB of training data. It uses character-level tokenization with whole word masking, with Juman++ providing the word boundaries for masking, and can be used for masked language modeling or fine-tuned on downstream tasks. Training with the transformers library took one day on 8 NVIDIA A100-SXM4-40GB GPUs; the training hyperparameters and the masked language modeling accuracy are reported on the model card. This model was developed with the support of the Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN).
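
To make the character-level tokenization concrete, here is a small sketch that only loads the tokenizer and inspects how a raw sentence is split. It assumes AutoTokenizer resolves the tokenizer configuration shipped with the repository (extra dependencies such as sentencepiece may be required); the sample sentence is made up, and the exact token strings may differ slightly from the comment.

```python
from transformers import AutoTokenizer

# Character-level tokenization sketch: raw text, no Juman++ pre-segmentation.
tokenizer = AutoTokenizer.from_pretrained("ku-nlp/deberta-v2-tiny-japanese-char-wwm")

text = "京都大学で自然言語処理を研究する。"
print(tokenizer.tokenize(text))  # roughly one token per character
```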

$-/run

7.8K

Huggingface

deberta-v2-large-japanese

Model Card for Japanese DeBERTa V2 large

Model description

This is a Japanese DeBERTa V2 large model pre-trained on Japanese Wikipedia, the Japanese portion of CC-100, and the Japanese portion of OSCAR.

How to use

You can use this model for masked language modeling and also fine-tune it on downstream tasks; a fine-tuning sketch follows this card.

Tokenization

The input text should be segmented into words by Juman++ in advance (Juman++ 2.0.0-rc3 was used for pre-training). Each word is then tokenized into subwords by SentencePiece.

Training data

The following corpora were used for pre-training:
- Japanese Wikipedia (as of 2022-10-20; 3.2GB, 27M sentences, 1.3M documents)
- Japanese portion of CC-100 (85GB, 619M sentences, 66M documents)
- Japanese portion of OSCAR (54GB, 326M sentences, 25M documents)

Documents annotated with "header", "footer", or "noisy" tags in OSCAR were filtered out, and Japanese Wikipedia was duplicated 10 times so that its size is comparable to CC-100 and OSCAR. The total size of the training data is 171GB.

Training procedure

Texts in the corpora were first segmented into words with Juman++. A SentencePiece model with 32,000 tokens, comprising words (from JumanDIC) and subwords induced by SentencePiece's unigram language model, was then built. The segmented corpora were tokenized into subwords with this SentencePiece model, and the Japanese DeBERTa model was trained with the transformers library. Training took 36 days on 8 NVIDIA A100-SXM4-40GB GPUs.

The following hyperparameters were used during pre-training:
- learning_rate: 1e-4
- per_device_train_batch_size: 18
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 2,304
- max_seq_length: 512
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-06
- lr_scheduler_type: linear schedule with warmup
- training_steps: 300,000
- warmup_steps: 10,000

The accuracy of the trained model on the masked language modeling task was 0.799, evaluated on 5,000 randomly sampled documents from each of the training corpora.

Fine-tuning on NLU tasks

The model was fine-tuned and evaluated on the dev set of JGLUE, with the learning rate and number of training epochs tuned for each model and task following the JGLUE paper. The scores of LUKE are taken from its official repository.

Acknowledgments

This work was supported by the Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN) through General Collaboration Project no. jh221004, "Developing a Platform for Constructing and Sharing of Large-Scale Japanese Language Models". For training models, we used mdx: a platform for the data-driven future.
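
The following is a minimal fine-tuning sketch for a JGLUE-style sentence classification task, not the official evaluation script. The two-sentence toy dataset, the binary labels, the output directory name, and the 2e-5 learning rate are all illustrative assumptions; a real run would use an actual JGLUE task with its text pre-segmented by Juman++ and would tune the learning rate and epochs per task as in the JGLUE paper.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "ku-nlp/deberta-v2-large-japanese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy stand-in for a JGLUE-style classification set. Sentences are shown
# pre-segmented with spaces to mimic Juman++ output, as the card requires.
train = Dataset.from_dict({
    "sentence": ["この 映画 は 面白い 。", "この 映画 は つまらない 。"],
    "label": [1, 0],
})

def preprocess(batch):
    # Tokenize the pre-segmented sentences into subwords.
    return tokenizer(batch["sentence"], truncation=True, max_length=512)

train = train.map(preprocess, batched=True)

args = TrainingArguments(
    output_dir="deberta-v2-large-japanese-finetuned",  # illustrative name
    learning_rate=2e-5,               # assumed value; tune per task
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=train, tokenizer=tokenizer)
trainer.train()
```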

$-/run

3.7K

Huggingface

roberta-large-japanese-char-wwm

ku-nlp/roberta-large-japanese-char-wwm

Model description

This is a Japanese RoBERTa large model pre-trained on Japanese Wikipedia and the Japanese portion of CC-100, trained with character-level tokenization and whole word masking.

How to use

You can use this model for masked language modeling (see the sketch after this card) and fine-tune it on downstream tasks.

Tokenization

There is no need to tokenize texts in advance; you can give raw texts to the tokenizer. Texts are tokenized into character-level tokens by SentencePiece.

Vocabulary

The vocabulary consists of 18,377 tokens, including all characters that appear in the training corpus.

Training procedure

The model was trained on Japanese Wikipedia (as of 2022-02-20) and the Japanese portion of CC-100. Training took a month on 8 to 16 NVIDIA A100 GPUs. The following hyperparameters were used during pre-training:
- learning_rate: 5e-5
- per_device_train_batch_size: 38
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 8
- total_train_batch_size: 4,864
- max_seq_length: 512
- optimizer: Adam with betas=(0.9, 0.98) and epsilon=1e-06
- lr_scheduler_type: linear schedule with warmup
- training_steps: 500,000
- warmup_steps: 10,000

Acknowledgments

This work was supported by the Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN) through General Collaboration Project no. jh221004, "Developing a Platform for Constructing and Sharing of Large-Scale Japanese Language Models". For training models, we used mdx: a platform for the data-driven future.
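
Below is a minimal masked language modeling sketch for this model, assuming the standard transformers AutoTokenizer/AutoModelForMaskedLM interfaces resolve this repository's configuration and that sentencepiece is installed. The example sentence and the top-5 readout are illustrative and not taken from the official card.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "ku-nlp/roberta-large-japanese-char-wwm"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Raw text goes straight in: the character-level tokenizer needs no
# Juman++ pre-segmentation. One character of a word is masked here.
text = f"京都大学で自然言語処理を{tokenizer.mask_token}究する。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and print the top-5 candidate characters.
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_positions[0]].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```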

$-/run

192

Huggingface
