monsoon-nlp

Average Model Cost: $0.0000

Number of Runs: 3,342

Models by this creator

hindi-bert

Platform did not provide a description for this model.

$-/run

2.4K runs

Huggingface

es-seq2seq-gender-encoder

This is a seq2seq model (encoder half) to "flip" gender in Spanish sentences. The model can augment your existing Spanish data, or generate counterfactuals to test a model's decisions (would changing the gender of the subject or speaker change the output?).

People's names are unchanged in this version, but you can use packages such as gender-guesser (https://pypi.org/project/gender-guesser/) to handle them. Demo notebook: https://colab.research.google.com/drive/1Ta_YkXx93FyxqEu_zJ-W23PjPumMNHe5

I originally developed a gender-flip Python script with BETO, the Spanish-language BERT from Universidad de Chile, and spaCy to parse dependencies in sentences. More about this project: https://medium.com/ai-in-plain-english/gender-bias-in-spanish-bert-1f4d76780617

The seq2seq model is trained on gender-flipped text from that script, run on the muchocine dataset and the first 6,853 lines of the OSCAR corpus (Spanish, de-duplicated). The encoder and decoder started with weights and vocabulary from BETO (uncased).

This model is useful for generating male and female text samples, but it falls short of capturing gender diversity in the world and in the Spanish language. Some communities prefer the plural -@s to represent -os and -as, or -e and -es for gender-neutral or mixed-gender plurals, or use fewer gendered professional nouns (la juez rather than la jueza). These forms are not yet embraced by the Royal Spanish Academy and are not represented in the corpora and tokenizers used to build this project. This seq2seq project and script could, in the future, help generate more text samples and prepare NLP models to understand us all better.
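As a rough illustration, here is how the encoder half could be paired with a decoder half via Hugging Face transformers. This is a minimal sketch, not the creator's confirmed usage: the decoder repository name (monsoon-nlp/es-seq2seq-gender-decoder) and the generation settings are assumptions.

```python
# Minimal sketch: pairing the encoder half with an assumed decoder half
# using Hugging Face transformers.
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("monsoon-nlp/es-seq2seq-gender-encoder")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "monsoon-nlp/es-seq2seq-gender-encoder",
    "monsoon-nlp/es-seq2seq-gender-decoder",  # assumed counterpart repo
)

# Generate a gender-flipped counterfactual for one sentence. Starting
# decoding from [CLS] is the usual choice for BERT-based seq2seq models.
inputs = tokenizer("La doctora está cansada.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    decoder_start_token_id=tokenizer.cls_token_id,
    max_length=32,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```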

$-/run

28 runs

Huggingface

tamillion

This is the second version of a Tamil language model trained with Google Research's ELECTRA.

Tokenization and pre-training Colab: https://colab.research.google.com/drive/1Pwia5HJIb6Ad4Hvbx5f-IjND-vCaJzSE?usp=sharing

V1: small model trained on GPU; 190,000 steps. V2 (current): base model trained on TPU with a larger corpus; 224,000 steps.

Sudalai Rajkumar's Tamil-NLP page contains classification and regression tasks: https://www.kaggle.com/sudalairajkumar/tamil-nlp. Evaluation notebook: https://colab.research.google.com/drive/1_rW9HZb6G87-5DraxHvhPOzGmSMUc67_?usp=sharing

The model outperformed mBERT on news classification (accuracy - random: 16.7%, mBERT: 53.0%, TaMillion: 75.1%) and slightly outperformed mBERT on movie reviews (RMSE - mBERT: 0.657, TaMillion: 0.626), with equivalent accuracy on the Tirukkural topic task.

I didn't find a Tamil-language question answering dataset, but this model could be fine-tuned to train a QA model. See Hindi and Bengali examples here: https://colab.research.google.com/drive/1i6fidh2tItf_-IDkljMuaIGmEU6HT2Ar

Trained on IndicCorp Tamil (11GB, https://indicnlp.ai4bharat.org/corpora/) and a 1 October 2020 dump of https://ta.wikipedia.org (482MB). The vocabulary is included as vocab.txt in the upload.
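The benchmark numbers above come from fine-tuning on downstream tasks. As a minimal sketch under stated assumptions, the checkpoint could be loaded with a classification head as below; num_labels=6 is a placeholder, not the label count of any benchmark above.

```python
# Minimal sketch: loading TaMillion (an ELECTRA checkpoint) with a
# classification head for fine-tuning on a Tamil dataset.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("monsoon-nlp/tamillion")
model = AutoModelForSequenceClassification.from_pretrained(
    "monsoon-nlp/tamillion",
    num_labels=6,  # assumed placeholder; set to your dataset's label count
)

# The classification head is randomly initialized until fine-tuned,
# so these logits are meaningful only after training.
inputs = tokenizer("இது ஒரு சோதனை வாக்கியம்.", return_tensors="pt")
logits = model(**inputs).logits
```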

$-/run

28 runs

Huggingface
