Unitary


Average Model Cost: $0.0000

Number of Runs: 28,902

Models by this creator

toxic-bert

No description available.



Runs: 20.9K

Platform: Huggingface

multilingual-toxic-xlm-roberta

Description Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification. Built by Laura Hanu at Unitary, where we are working to stop harmful content online by interpreting visual content in context. Dependencies: For inference: 🤗 Transformers ⚡ Pytorch lightning For training will also need: Kaggle API (to download data) *Score not directly comparable since it is obtained on the validation set provided and not on the test set. To update when the test labels are made available. It is also noteworthy to mention that the top leadearboard scores have been achieved using model ensembles. The purpose of this library was to build something user-friendly and straightforward to use. Limitations and ethical considerations If words that are associated with swearing, insults or profanity are present in a comment, it is likely that it will be classified as toxic, regardless of the tone or the intent of the author e.g. humorous/self-deprecating. This could present some biases towards already vulnerable minority groups. The intended use of this library is for research purposes, fine-tuning on carefully constructed datasets that reflect real world demographics and/or to aid content moderators in flagging out harmful content quicker. Some useful resources about the risk of different biases in toxicity or hate speech detection are: The Risk of Racial Bias in Hate Speech Detection Automated Hate Speech Detection and the Problem of Offensive Language Racial Bias in Hate Speech and Abusive Language Detection Datasets Quick prediction The multilingual model has been trained on 7 different languages so it should only be tested on: english, french, spanish, italian, portuguese, turkish or russian. For more details check the Prediction section. Labels All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according the following schema: Very Toxic (a very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective) Toxic (a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective) Hard to Say Not Toxic More information about the labelling schema can be found here. Toxic Comment Classification Challenge This challenge includes the following labels: toxic severe_toxic obscene threat insult identity_hate Jigsaw Unintended Bias in Toxicity Classification This challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments. Only identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation. toxicity severe_toxicity obscene threat insult identity_attack sexual_explicit Identity labels used: male female homosexual_gay_or_lesbian christian jewish muslim black white psychiatric_or_mental_illness A complete list of all the identity labels available can be found here. 
Jigsaw Multilingual Toxic Comment Classification

Since this challenge combines the data from the previous 2 challenges, it includes all the labels from above; however, the final evaluation is only on:

- toxicity

How to run

First, install the dependencies.

Prediction

Trained models summary:

For a quick prediction, you can run the example script on a comment directly or on a txt file containing a list of comments. Checkpoints can be downloaded from the latest release or via the Pytorch hub API with the following names (a loading sketch appears at the end of this excerpt):

- toxic_bert
- unbiased_toxic_roberta
- multilingual_toxic_xlm_r

Importing detoxify in Python: see the prediction sketch above.

Training

If you do not already have a Kaggle account:

- you need to create one to be able to download the data
- go to My Account and click on Create New API Token - this will download a kaggle.json file
- make sure this file is located in ~/.kaggle

Start training:

- Toxic Comment Classification Challenge
- Unintended Bias in Toxicity Challenge
- Multilingual Toxic Comment Classification

The multilingual model is trained in 2 stages: first, train on all available data, and second, train only on the translated versions of the first challenge's data. The translated data can be downloaded from Kaggle in French, Spanish, Italian, Portuguese, Turkish, and Russian (the languages available in the test set).

Monitor progress with tensorboard.

Model Evaluation

- Toxic Comment Classification Challenge: evaluated on the mean AUC score of all the labels (a sketch of this computation appears at the end of this excerpt).
- Unintended Bias in Toxicity Challenge: evaluated on a novel bias metric that combines different AUC scores to balance overall performance. More information on this metric here.
- Multilingual Toxic Comment Classification: evaluated on the AUC score of the main toxic label.

Citation
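A minimal sketch of loading one of those checkpoints through the PyTorch hub API; the entry-point names come from the Prediction section above, while the repo path 'unitaryai/detoxify' and the exact object returned follow the detoxify repository's hubconf and may differ between releases.

```python
# Checkpoint-loading sketch via torch.hub, assuming the detoxify repo
# exposes hubconf entry points under the names listed above.
import torch

# One of: toxic_bert, unbiased_toxic_roberta, multilingual_toxic_xlm_r.
model = torch.hub.load('unitaryai/detoxify', 'toxic_bert')
```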

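To make the first evaluation metric concrete, here is a minimal sketch of the per-label mean AUC, assuming scikit-learn and y_true/y_pred arrays of shape (n_samples, n_labels); this is an illustration, not the repository's own evaluation script.

```python
# Mean AUC across the 6 labels of the Toxic Comment Classification
# Challenge: score each label column independently, then average.
import numpy as np
from sklearn.metrics import roc_auc_score

LABELS = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

def mean_label_auc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    aucs = [roc_auc_score(y_true[:, i], y_pred[:, i]) for i in range(len(LABELS))]
    return float(np.mean(aucs))
```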


Runs: 381

Platform: Huggingface
