Beomi

Rank:

Average Model Cost: $0.0000

Number of Runs: 80,716

Models by this creator

korean-hatespeech-multilabel

Platform did not provide a description for this model.

Cost: $-/run

Runs: 22.5K

Platform: Huggingface

KcELECTRA-base

KcELECTRA-base is a Korean language model pre-trained on user-generated, noisy text from comments and replies on Naver News, using the ELECTRA architecture and its tokenizer. It shows improved performance over previous models such as KcBERT, with roughly a 1% gain on various downstream tasks, and can be used directly with the Transformers library from Huggingface. The pre-training data consists of over 180 million sentences collected from January 2019 to March 2021, pre-processed to keep Korean, English, special characters, and emojis while reducing duplicates and slang and removing short texts and inappropriate content. The tokenizer is a BertWordPieceTokenizer with a vocabulary size of 30,000. The model was trained for approximately 10 days on a TPU v3-8 until the loss converged.
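
The description notes that the model works with the Transformers library. As a minimal sketch, assuming the Hugging Face Hub id beomi/KcELECTRA-base (inferred from the creator and model names on this page, not stated here), the encoder can be loaded and applied to a comment-style Korean sentence:

```python
# Minimal sketch; the Hub id "beomi/KcELECTRA-base" is an assumption.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("beomi/KcELECTRA-base")
model = AutoModel.from_pretrained("beomi/KcELECTRA-base")

# Encode a noisy, comment-style Korean sentence and inspect the contextual embeddings.
inputs = tokenizer("이 영화 진짜 재밌다 ㅋㅋㅋ", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```

These embeddings would typically feed a task-specific head (e.g. a classifier) rather than being used on their own.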

Cost: $-/run

Runs: 10.4K

Platform: Huggingface

KoAlpaca-Polyglot-5.8B

KoAlpaca-Polyglot-5.8B is a text generation model in the KoAlpaca-Polyglot series with 5.8 billion parameters. It is designed to generate coherent, contextually accurate text from a given input, and it can understand and generate text in multiple languages. It can be used for a variety of natural language processing tasks such as text completion, summarization, translation, and conversation generation.
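
As a hedged sketch of how a causal language model of this kind is typically driven for generation with the Transformers library, assuming the Hub id beomi/KoAlpaca-Polyglot-5.8B and an Alpaca-style question/answer prompt format (both assumptions, not stated on this page):

```python
# Hedged sketch: the Hub id and the "### 질문 / ### 답변" prompt format are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "beomi/KoAlpaca-Polyglot-5.8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# float16 keeps the 5.8B-parameter weights small enough for a single modern GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# "Question: What is the capital of Korea?" followed by an empty "Answer:" slot.
prompt = "### 질문: 한국의 수도는 어디인가요?\n\n### 답변:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```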

Cost: $-/run

Runs: 7.4K

Platform: Huggingface

KcELECTRA-base-v2022

KcELECTRA-base-v2022 is a Korean language model trained on comments and replies from Naver News using the ELECTRA architecture and its tokenizer. It is designed for user-generated, noisy text such as social media comments, which often contain slang, new words, and typos, and it improves on previous models by around 1% on most downstream tasks. The model can be integrated into projects through the Transformers library; pretraining and finetuning code, along with data preprocessing details, are provided in the associated GitHub repositories. It was trained with a vocabulary size of 30,000 and shows incremental performance gains as the training steps increase. Potential use cases include sentiment analysis, text classification, and other Korean-language natural language processing tasks.
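
Since the description lists sentiment analysis and text classification as use cases, the following is a minimal fine-tuning sketch with the Transformers Trainer, assuming the Hub id beomi/KcELECTRA-base-v2022 and a toy two-example dataset standing in for a real labeled corpus:

```python
# Hedged sketch: the Hub id and the two-example dataset are illustrative assumptions.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_id = "beomi/KcELECTRA-base-v2022"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Toy sentiment data; a real project would substitute a labeled Korean corpus.
train = Dataset.from_dict({"text": ["정말 재밌어요", "완전 별로였음"], "label": [1, 0]})
train = train.map(lambda b: tokenizer(b["text"], truncation=True, padding="max_length",
                                      max_length=64), batched=True)

args = TrainingArguments(output_dir="kcelectra-v2022-sentiment",
                         num_train_epochs=1, per_device_train_batch_size=2)
Trainer(model=model, args=args, train_dataset=train).train()
```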

Cost: $-/run

Runs: 5.4K

Platform: Huggingface
