Indolem

Average Model Cost: $0.0000

Number of Runs: 30,177

Models by this creator

indobert-base-uncased

IndoBERT is an Indonesian counterpart of BERT (Bidirectional Encoder Representations from Transformers). It was trained on over 220 million words drawn from Indonesian Wikipedia, news articles, and an Indonesian web corpus. After 2.4 million training steps, it reaches a perplexity of 3.97 on the development set, comparable to English BERT-base. IndoBERT was used to benchmark IndoLEM, an Indonesian evaluation suite of seven tasks covering morpho-syntax, semantics, and discourse; the paper describing IndoBERT and IndoLEM was published at COLING 2020. The model and tokenizer can be loaded with the transformers library, and users are asked to cite the relevant paper.
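Loading the checkpoint via the transformers library can be sketched as follows; the model ID is the indolem organization's checkpoint on the Hugging Face Hub, and the example sentence is illustrative only.

```python
# Minimal sketch: load IndoBERT with Hugging Face transformers.
MODEL_NAME = "indolem/indobert-base-uncased"


def load_indobert(model_name: str = MODEL_NAME):
    """Return the IndoBERT tokenizer and encoder (downloads on first use)."""
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_indobert()
    inputs = tokenizer("selamat pagi", return_tensors="pt")
    hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
```

`AutoTokenizer`/`AutoModel` resolve the correct BERT classes from the checkpoint's config, so the same two calls work for other indolem checkpoints as well.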

Runs: 20.7K · Huggingface

indobertweet-base-uncased

IndoBERTweet is a large-scale pretrained language model for Indonesian Twitter, built by extending a monolingually trained Indonesian BERT with domain-specific vocabulary. It was trained on Indonesian tweets collected over a one-year period, covering topics such as economy, health, education, and government, and has been evaluated on seven Indonesian Twitter datasets with strong results. Input text must be preprocessed to match training: lower-case the words, replace user mentions and URLs with placeholder tokens, and translate emoticons.
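The preprocessing steps above can be sketched with plain regular expressions. The `@USER` and `HTTPURL` placeholder tokens follow the model card's convention; treat them as assumptions if your copy of the card differs, and note that emoticon translation (e.g., via an emoji-to-text mapping) is omitted here for brevity.

```python
import re


def preprocess_tweet(text: str) -> str:
    """Normalize a tweet for IndoBERTweet: lower-case the text, then mask
    user mentions and URLs with placeholder tokens. Emoticon translation
    is intentionally left out of this sketch."""
    text = text.lower()
    text = re.sub(r"@\w+", "@USER", text)          # user mentions -> @USER
    text = re.sub(r"https?://\S+", "HTTPURL", text)  # URLs -> HTTPURL
    return text
```

Note the ordering: lower-casing happens first, so the placeholder tokens themselves stay upper-case as the model's vocabulary expects.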

Runs: 9.5K · Huggingface
