Cambridgeltl
Rank:
Average Model Cost: $0.0000
Number of Runs: 1,471,221
Models by this creator
SapBERT-from-PubMedBERT-fulltext
SapBERT is a language model trained on biomedical texts using the UMLS dataset. It is based on the BiomedNLP-PubMedBERT model and is specifically designed for processing biomedical entity names. The model takes a string of biomedical entity names as input and generates embeddings, with the output being the representation of the [CLS] token from the last layer. The model can be used for various tasks, such as feature extraction for biomedical text analysis.
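A minimal sketch of feature extraction along those lines, assuming the checkpoint is published on Hugging Face as cambridgeltl/SapBERT-from-PubMedBERT-fulltext and taking the [CLS] vector of the last layer as the embedding, as the description states:

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "cambridgeltl/SapBERT-from-PubMedBERT-fulltext"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

# A few biomedical entity names to embed (illustrative).
names = ["covid-19", "myocardial infarction", "heart attack"]
inputs = tokenizer(names, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# [CLS] representation from the last hidden layer, one vector per entity name.
cls_embeddings = outputs.last_hidden_state[:, 0, :]
print(cls_embeddings.shape)  # (3, 768)
```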
$-/run
1.5M
Huggingface
sst_mobilebert-uncased
This model provides a MobileBERT (Sun et al., 2020) fine-tuned on the SST data with three sentiments (0 -- negative, 1 -- neutral, and 2 -- positive). Below, we provide an illustration of how to use this model to make sentiment predictions. If you find this model useful, please kindly cite our work.
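A minimal sketch of such a sentiment prediction, assuming the checkpoint id cambridgeltl/sst_mobilebert-uncased and the three-way label mapping given in the description:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "cambridgeltl/sst_mobilebert-uncased"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

labels = {0: "negative", 1: "neutral", 2: "positive"}
text = "The movie was surprisingly good."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
# Pick the highest-scoring class and map it back to a sentiment label.
print(labels[int(logits.argmax(dim=-1))])
```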
$-/run
2.5K
Huggingface
SapBERT-UMLS-2020AB-all-lang-from-XLMR
Language: multilingual. [news] A cross-lingual extension of SapBERT will appear in the main conference of ACL 2021! [news] SapBERT will appear in the conference proceedings of NAACL 2021! SapBERT (Liu et al., 2020) trained with UMLS 2020AB, using xlm-roberta-base as the base model. Please use [CLS] as the representation of the input. The following script converts a list of strings (entity names) into embeddings. For more details about training and evaluation, see the SapBERT GitHub repo.
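A sketch of what such a script might look like, assuming the checkpoint id cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR; the multilingual entity names are illustrative:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_name = "cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

# The same concept expressed in different languages (illustrative).
names = ["heart attack", "infarto de miocardio", "心肌梗死"]
inputs = tokenizer(names, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    # [CLS] token of the last layer as the entity representation.
    cls = model(**inputs).last_hidden_state[:, 0, :]
# Cross-lingual similarity between the English name and the others.
print(F.cosine_similarity(cls[0:1], cls[1:]))
```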
$-/run
1.3K
Huggingface
SapBERT-from-PubMedBERT-fulltext-mean-token
Language: en. SapBERT by Liu et al. (2020), trained with UMLS 2020AA (English only), using microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext as the base model. Please use the mean pooling of the output as the representation. The following script converts a list of strings (entity names) into embeddings. For more details about training and evaluation, see the SapBERT GitHub repo.
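A sketch of the mean-pooling variant, assuming the checkpoint id cambridgeltl/SapBERT-from-PubMedBERT-fulltext-mean-token; the attention-mask-weighted mean is one common way to pool over the real (non-padding) tokens only:

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "cambridgeltl/SapBERT-from-PubMedBERT-fulltext-mean-token"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

names = ["covid-19", "coronavirus infection"]
inputs = tokenizer(names, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq, dim)
# Mean-pool over real tokens only, using the attention mask as weights.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, 768)
```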
$-/run
1.3K
Huggingface
magic_mscoco
$-/run
456
Huggingface
tweet-roberta-base-embeddings-v1
Platform did not provide a description for this model.
$-/run
377
Huggingface
SapBERT-UMLS-2020AB-all-lang-from-XLMR-large
Language: multilingual. [news] A cross-lingual extension of SapBERT will appear in the main conference of ACL 2021! [news] SapBERT will appear in the conference proceedings of NAACL 2021! SapBERT (Liu et al., 2021) trained with UMLS 2020AB, using xlm-roberta-large as the base model. Please use [CLS] as the representation of the input. The following script converts a list of strings (entity names) into embeddings. For more details about training and evaluation, see the SapBERT GitHub repo.
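A sketch along the same lines for the large model, assuming the checkpoint id cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR-large; here the names are embedded in mini-batches, which is usually advisable for long entity lists with a large backbone:

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

# A longer, multilingual list of entity names (illustrative).
names = ["fever", "fiebre", "fièvre", "发烧"] * 256
batch_size = 64
all_embeddings = []
with torch.no_grad():
    for i in range(0, len(names), batch_size):
        batch = names[i:i + batch_size]
        inputs = tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
        # [CLS] token of the last layer as the entity representation.
        all_embeddings.append(model(**inputs).last_hidden_state[:, 0, :])
embeddings = torch.cat(all_embeddings, dim=0)
print(embeddings.shape)  # (len(names), 1024) for the xlm-roberta-large backbone
```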
$-/run
323
Huggingface
BioRedditBERT-uncased
BioRedditBERT is a BERT model initialised from BioBERT (BioBERT-Base v1.0 + PubMed 200K + PMC 270K) and further pre-trained on health-related Reddit posts. Please see our paper COMETA: A Corpus for Medical Entity Linking in the Social Media (EMNLP 2020) for more details.

We crawled all threads from 68 health-themed subreddits, such as r/AskDocs and r/health, from the beginning of 2015 to the end of 2018, obtaining a collection of more than 800K discussions. This collection was then pruned by removing deleted posts, comments from bots or moderators, and so on. In the end, we obtained a training corpus of ca. 300 million tokens and a vocabulary of ca. 780,000 words.

We use the same pre-training script as in the original google-research/bert repo. The model is initialised with BioBERT-Base v1.0 + PubMed 200K + PMC 270K. We train with a batch size of 64, a maximum sequence length of 64, and a learning rate of 2e-5 for 100k steps on two GeForce GTX 1080Ti (11 GB) GPUs. Other hyper-parameters are left at their defaults.

To show the benefit of further pre-training on the social-media domain, we report results on a medical entity linking dataset also drawn from social media: AskAPatient (Limsopatham and Collier, 2016). We follow the same 10-fold cross-validation procedure for all models and report the average result without fine-tuning. [CLS] is used as the representation for entity mentions (we also tried the average of all tokens but found [CLS] generally performs better).
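A sketch of how such a [CLS] representation might be used to rank candidate concept names for a mention, assuming the checkpoint id cambridgeltl/BioRedditBERT-uncased; the mention and candidates below are made up for illustration and are not taken from AskAPatient:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_name = "cambridgeltl/BioRedditBERT-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

def cls_embed(texts):
    # Encode a list of strings and return their [CLS] vectors.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).last_hidden_state[:, 0, :]

mention = cls_embed(["tummy ache"])               # lay mention from social media
candidates = ["abdominal pain", "headache", "nausea"]  # candidate concept names
scores = F.cosine_similarity(mention, cls_embed(candidates))
# Rank candidates by similarity to the mention.
print(sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]))
```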
$-/run
320
Huggingface
simctg_wikitext103
This model provides a GPT-2 language model trained with SimCTG on the Wikitext-103 benchmark (Merity et al., 2016), based on our paper A Contrastive Framework for Neural Text Generation. We provide a detailed tutorial on how to apply SimCTG and contrastive search in our project repo. Below, we give a brief illustration of how to use our approach to perform text generation. For more details of our work, please refer to our main project repo. If you find our paper and resources useful, please kindly leave a star and cite our paper. Thanks!
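A sketch of text generation with this checkpoint, assuming the id cambridgeltl/simctg_wikitext103 and using the contrastive search decoding exposed by recent versions of transformers' generate() (the SimCTG project repo also ships its own decoding utilities); the prompt is illustrative:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "cambridgeltl/simctg_wikitext103"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

input_ids = tokenizer("DeepMind Company is", return_tensors="pt").input_ids
# Contrastive search: penalty_alpha balances model confidence against a
# degeneration penalty, top_k restricts the candidate set at each step.
output = model.generate(input_ids, penalty_alpha=0.6, top_k=4, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```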
$-/run
51
Huggingface
mirror-roberta-base-sentence-drophead
Language: en. An unsupervised sentence encoder proposed by Liu et al. (2021), using drophead instead of dropout as feature-space augmentation. The model is trained on unlabelled raw sentences, using roberta-base as the base model. Please use [CLS] (before the pooler) as the representation of the input. Note that the model does not exactly replicate the numbers in the paper, since the reported numbers are the average of three runs.
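A sketch of sentence similarity with this encoder, assuming the checkpoint id cambridgeltl/mirror-roberta-base-sentence-drophead and taking [CLS] from the last hidden states (i.e. before the pooler), as the description instructs:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_name = "cambridgeltl/mirror-roberta-base-sentence-drophead"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

sentences = ["A man is playing a guitar.", "Someone is playing an instrument."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    # [CLS] taken from the last hidden states, i.e. before the pooler layer.
    cls = model(**inputs).last_hidden_state[:, 0, :]
print(float(F.cosine_similarity(cls[0:1], cls[1:2])))
```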
$-/run
47
Huggingface