Izumi-lab
Rank:
Average Model Cost: $0.0000
Number of Runs: 5,766
Models by this creator
bert-small-japanese
$-/run
3.4K
Huggingface
electra-base-japanese-discriminator
ELECTRA base Japanese discriminator

This is an ELECTRA model pretrained on Japanese texts. The code for the pretraining is available at retarfi/language-pretraining.

Model architecture: The architecture is the same as ELECTRA base in the original ELECTRA paper: 12 layers, 768-dimensional hidden states, and 12 attention heads.

Training data: The models are trained on the Japanese version of Wikipedia. The training corpus is generated from the Japanese Wikipedia dump file as of June 1, 2021. The corpus file is 2.9 GB and consists of approximately 20M sentences.

Tokenization: The texts are first tokenized by MeCab with the IPA dictionary and then split into subwords by the WordPiece algorithm. The vocabulary size is 32,768.

Training: The models are trained with the same configuration as ELECTRA base in the original ELECTRA paper: 512 tokens per instance, 256 instances per batch, and 766k training steps. The size of the generator is 1/3 of the size of the discriminator.

Licenses: The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 license.

Acknowledgments: This work was supported by JSPS KAKENHI Grant Number JP21K12010.
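As a usage illustration (not part of the original card), the sketch below scores each token with the discriminator head, which is what an ELECTRA discriminator is trained to do. The repo id, the example sentence, and the need for fugashi/ipadic are assumptions.

```python
# Minimal sketch, not from the model card: replaced-token detection with the
# discriminator. The repo id below is an assumption; the MeCab-based tokenizer
# typically requires fugashi and ipadic to be installed.
import torch
from transformers import AutoTokenizer, ElectraForPreTraining

name = "izumi-lab/electra-base-japanese-discriminator"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(name)
model = ElectraForPreTraining.from_pretrained(name)

text = "東京大学で自然言語処理を研究しています。"  # example sentence (assumption)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # one logit per input token

# sigmoid(logit) > 0.5 means the discriminator judges the token as "replaced".
probs = torch.sigmoid(logits)[0]
for token, p in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), probs):
    print(f"{token}\t{p.item():.3f}")
```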
$-/run
1.4K
Huggingface
electra-small-japanese-fin-discriminator
ELECTRA small Japanese finance discriminator

This is an ELECTRA model pretrained on Japanese texts. The code for the pretraining is available at retarfi/language-pretraining.

Model architecture: The architecture is the same as ELECTRA small in the original ELECTRA implementation: 12 layers, 256-dimensional hidden states, and 4 attention heads.

Training data: The models are trained on a Japanese Wikipedia corpus and a financial corpus. The Wikipedia corpus is generated from the Japanese Wikipedia dump file as of June 1, 2021; it is 2.9 GB and consists of approximately 20M sentences. The financial corpus consists of 2 corpora:
- Summaries of financial results from October 9, 2012, to December 31, 2020
- Securities reports from February 8, 2018, to December 31, 2020
The financial corpus file is 5.2 GB and consists of approximately 27M sentences.

Tokenization: The texts are first tokenized by MeCab with the IPA dictionary and then split into subwords by the WordPiece algorithm. The vocabulary size is 32,768.

Training: The models are trained with the same configuration as ELECTRA small in the original ELECTRA paper except for the size: 128 tokens per instance, 128 instances per batch, and 1M training steps. The size of the generator is the same as that of the discriminator.

Licenses: The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 license.

Acknowledgments: This work was supported by JSPS KAKENHI Grant Number JP21K12010.
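To make the MeCab-then-WordPiece tokenization described above concrete, here is a small sketch. The repo id and the example sentence are assumptions; fugashi and ipadic must be installed for the MeCab step.

```python
# Minimal sketch, assuming the repo id below: inspect the MeCab (IPA dictionary)
# + WordPiece tokenization and the 32,768-token vocabulary described in the card.
# Requires: pip install fugashi ipadic
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("izumi-lab/electra-small-japanese-fin-discriminator")
print(tok.vocab_size)  # expected to match the card's 32,768
print(tok.tokenize("有価証券報告書を提出した。"))  # morphemes split into WordPiece subwords
```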
$-/run
579
Huggingface
bert-base-japanese-fin-additional
Additional pretrained BERT base Japanese finance

This is a BERT model pretrained on Japanese texts. The code for the pretraining is available at retarfi/language-pretraining.

Model architecture: The architecture is the same as BERT base in the original BERT paper: 12 layers, 768-dimensional hidden states, and 12 attention heads.

Training data: The models are additionally pretrained on a financial corpus, starting from Tohoku University's BERT base Japanese model (cl-tohoku/bert-base-japanese). The financial corpus consists of 2 corpora:
- Summaries of financial results from October 9, 2012, to December 31, 2020
- Securities reports from February 8, 2018, to December 31, 2020
The financial corpus file consists of approximately 27M sentences.

Tokenization: You can use the tokenizer from Tohoku University's BERT base Japanese model (cl-tohoku/bert-base-japanese).

Training: The models are trained with the same configuration as BERT base in the original BERT paper: 512 tokens per instance, 256 instances per batch, and 1M training steps.

Licenses: The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 license.

Acknowledgments: This work was supported by JSPS KAKENHI Grant Number JP21K12010 and JST-Mirai Program Grant Number JPMJMI20B1.
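The code snippet referenced by the tokenizer note is not included in this listing. A minimal sketch of what such loading could look like follows; both repo ids come from the text above, but the choice of classes and the example sentence are assumptions.

```python
# Minimal sketch of the tokenizer note above (the original snippet is not
# included in this listing). Repo ids come from the card text; the class
# choices are assumptions. Requires fugashi and ipadic for MeCab.
from transformers import BertJapaneseTokenizer, BertModel

tokenizer = BertJapaneseTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
model = BertModel.from_pretrained("izumi-lab/bert-base-japanese-fin-additional")

inputs = tokenizer("決算短信を公表した。", return_tensors="pt")  # example sentence (assumption)
outputs = model(**inputs)                # contextual embeddings for downstream fine-tuning
print(outputs.last_hidden_state.shape)   # (1, seq_len, 768)
```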
$-/run
170
Huggingface
bert-small-japanese-fin
BERT small Japanese finance

This is a BERT model pretrained on Japanese texts. The code for the pretraining is available at retarfi/language-pretraining.

Model architecture: The architecture is the same as BERT small in the original ELECTRA paper: 12 layers, 256-dimensional hidden states, and 4 attention heads.

Training data: The models are trained on a Wikipedia corpus and a financial corpus. The Wikipedia corpus is generated from the Japanese Wikipedia dump file as of June 1, 2021; it is 2.9 GB and consists of approximately 20M sentences. The financial corpus consists of 2 corpora:
- Summaries of financial results from October 9, 2012, to December 31, 2020
- Securities reports from February 8, 2018, to December 31, 2020
The financial corpus file is 5.2 GB and consists of approximately 27M sentences.

Tokenization: The texts are first tokenized by MeCab with the IPA dictionary and then split into subwords by the WordPiece algorithm. The vocabulary size is 32,768.

Training: The models are trained with the same configuration as BERT small in the original ELECTRA paper: 128 tokens per instance, 128 instances per batch, and 1.45M training steps.

Licenses: The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 license.

Acknowledgments: This work was supported by JSPS KAKENHI Grant Number JP21K12010.
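As an illustration only (not from the card), a masked-token query against this financial BERT might look like the following; the repo id and the example sentence are assumptions.

```python
# Minimal sketch, assuming the repo id below: masked-token prediction with the
# fill-mask pipeline. Requires fugashi and ipadic for the MeCab tokenization.
from transformers import pipeline

fill = pipeline("fill-mask", model="izumi-lab/bert-small-japanese-fin")
for cand in fill("当期純利益は前年同期比で[MASK]した。"):  # example sentence (assumption)
    print(cand["token_str"], round(cand["score"], 3))
```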
$-/run
79
Huggingface
electra-base-japanese-generator
ELECTRA base Japanese generator

This is an ELECTRA model pretrained on Japanese texts. The code for the pretraining is available at retarfi/language-pretraining.

Model architecture: The architecture is the same as the generator of ELECTRA base in the original ELECTRA implementation: 12 layers, 256-dimensional hidden states, and 4 attention heads.

Training data: The models are trained on the Japanese version of Wikipedia. The training corpus is generated from the Japanese Wikipedia dump file as of June 1, 2021. The corpus file is 2.9 GB and consists of approximately 20M sentences.

Tokenization: The texts are first tokenized by MeCab with the IPA dictionary and then split into subwords by the WordPiece algorithm. The vocabulary size is 32,768.

Training: The models are trained with the same configuration as ELECTRA base in the original ELECTRA paper except for the size: 512 tokens per instance, 256 instances per batch, and 766k training steps. The size of the generator is 1/3 of the size of the discriminator.

Licenses: The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 license.

Acknowledgments: This work was supported by JSPS KAKENHI Grant Number JP21K12010.
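Because an ELECTRA generator is trained with a masked-language-modeling head, it can be queried like a fill-mask model. The sketch below is an illustration only; the repo id and the example sentence are assumptions.

```python
# Minimal sketch, assuming the repo id below. The generator carries an MLM
# head, so it can be loaded with ElectraForMaskedLM and used for fill-mask.
# Requires fugashi and ipadic for the MeCab tokenization.
import torch
from transformers import AutoTokenizer, ElectraForMaskedLM

name = "izumi-lab/electra-base-japanese-generator"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(name)
model = ElectraForMaskedLM.from_pretrained(name)

text = f"日本の首都は{tokenizer.mask_token}です。"  # example sentence (assumption)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Take the top-5 candidates at the masked position.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```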
$-/run
34
Huggingface
electra-small-japanese-discriminator
Platform did not provide a description for this model.
$-/run
28
Huggingface
electra-small-japanese-generator
Platform did not provide a description for this model.
$-/run
24
Huggingface
electra-small-paper-japanese-generator
ELECTRA small Japanese generator

This is an ELECTRA model pretrained on Japanese texts. The code for the pretraining is available at retarfi/language-pretraining.

Model architecture: The architecture is the same as the generator of ELECTRA small in the original ELECTRA paper: 12 layers, 64-dimensional hidden states, and 1 attention head.

Training data: The models are trained on the Japanese version of Wikipedia. The training corpus is generated from the Japanese Wikipedia dump file as of June 1, 2021. The corpus file is 2.9 GB and consists of approximately 20M sentences.

Tokenization: The texts are first tokenized by MeCab with the IPA dictionary and then split into subwords by the WordPiece algorithm. The vocabulary size is 32,768.

Training: The models are trained with the same configuration as ELECTRA small in the original ELECTRA paper: 128 tokens per instance, 128 instances per batch, and 1M training steps. The size of the generator is 1/4 of the size of the discriminator.

Licenses: The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 license.

Acknowledgments: This work was supported by JSPS KAKENHI Grant Number JP21K12010.
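To make the paper-size numbers above concrete, the sketch below loads the model configuration and prints the layer count, hidden size, and head count; the repo id is an assumption.

```python
# Minimal sketch, assuming the repo id below: check the configuration against
# the numbers in the card (12 layers, 64-dim hidden states, 1 attention head).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("izumi-lab/electra-small-paper-japanese-generator")
print(cfg.num_hidden_layers, cfg.hidden_size, cfg.num_attention_heads)
```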
$-/run
22
Huggingface
electra-small-paper-japanese-fin-generator
Platform did not provide a description for this model.
$-/run
20
Huggingface