Models
Search through the list of amazing models below!
wav2vec2-large-xlsr-53-english
The model wav2vec2-large-xlsr-53-english is an automatic speech recognition (ASR) model that converts spoken language into written text. It builds on the wav2vec 2.0 architecture and the cross-lingual speech representation (XLSR) pretraining approach, fine-tuned for English. It can accurately transcribe English speech for applications such as transcription services, voice assistants, and voice command recognition systems.
$-/run
68.5M
Huggingface
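A minimal sketch of transcription with the transformers library, assuming the checkpoint is hosted at jonatasgrosman/wav2vec2-large-xlsr-53-english and that sample.wav is a local 16 kHz English recording:

    # Transcribe an English audio file with the automatic-speech-recognition pipeline.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="jonatasgrosman/wav2vec2-large-xlsr-53-english",  # assumed Hub repo id
    )
    print(asr("sample.wav")["text"])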
bert-base-uncased
BERT-base-uncased is a pretrained language model trained on a large corpus of English text with a masked language modeling objective, through which it learns an internal representation of the English language. The pretrained model can be fine-tuned for tasks such as sequence classification, token classification, and question answering. It was trained on the BookCorpus dataset and English Wikipedia with a vocabulary size of 30,000, on 4 cloud TPUs with a batch size of 256, using the Adam optimizer. When fine-tuned on downstream tasks, BERT-base-uncased performs well on tasks like sentiment analysis and text classification. Note, however, that the model can produce biased predictions, and this bias carries over to all fine-tuned versions of the model.
$-/run
49.6M
Huggingface
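A minimal sketch of the masked-token prediction described above, using the transformers fill-mask pipeline (the example sentence is illustrative):

    # Predict the most likely tokens for the [MASK] position with bert-base-uncased.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    for prediction in unmasker("The goal of life is [MASK]."):
        print(prediction["token_str"], round(prediction["score"], 3))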

blip
Bootstrapping Language-Image Pre-training (BLIP) is a technique that integrates text and image data to improve language understanding and generation tasks. During pre-training, the model learns joint representations from paired image and text data; these representations are then fine-tuned on specific downstream tasks such as image captioning or text generation. BLIP has been shown to surpass state-of-the-art results on various tasks and can be applied to a wide range of applications that involve both text and image data.
$0.001/run
44.9M
Replicate
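A minimal sketch of captioning an image with a hosted BLIP model through the Replicate Python client; the model slug "salesforce/blip" and the "image" input name are assumptions, so check the model page for the exact reference and inputs:

    # Caption a local image with a hosted BLIP model on Replicate.
    import replicate

    output = replicate.run(
        "salesforce/blip",                         # assumed model slug
        input={"image": open("photo.jpg", "rb")},  # assumed input name
    )
    print(output)  # e.g. a generated caption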

gfpgan
GFPGAN is a practical face restoration algorithm designed to improve the quality of old photos or AI-generated faces. It uses a generative network to enhance facial detail, upscaling low-resolution faces and refining facial features, with the goal of producing high-quality, realistic restorations of the input images.
$0.003/run
44.1M
Replicate
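A minimal sketch of running face restoration through the Replicate Python client; the slug "tencentarc/gfpgan" and the "img" input name are assumptions, so verify them against the model page:

    # Restore a degraded face photo with GFPGAN on Replicate.
    import replicate

    output = replicate.run(
        "tencentarc/gfpgan",                         # assumed model slug
        input={"img": open("old_photo.jpg", "rb")},  # assumed input name
    )
    print(output)  # URL of the restored image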

clip-features
The clip-features model uses the clip-vit-large-patch14 architecture to extract features from text and images. It takes an image and text as input and returns the corresponding CLIP embeddings, which can then be used for tasks such as image classification, object detection, and image generation. The model provides a compact representation that captures both visual and textual information, enabling cross-modal understanding and analysis.
$0.001/run
41.3M
Replicate
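Since the hosted model wraps clip-vit-large-patch14, a minimal stand-in sketch with the openai/clip-vit-large-patch14 checkpoint in transformers shows the same idea of extracting paired image and text embeddings (the file name and caption are illustrative):

    # Extract CLIP image and text embeddings with the underlying ViT-L/14 checkpoint.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

    inputs = processor(
        text=["a photo of a cat"],
        images=Image.open("photo.jpg"),
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        outputs = model(**inputs)
    print(outputs.image_embeds.shape, outputs.text_embeds.shape)  # both project to 768 dims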

controlnet-scribble
The controlnet-scribble model is a text-to-image model that generates detailed images from scribbled drawings. It uses ControlNet, which attaches a conditioning network to a diffusion-based image generator so that the output follows both the input scribble and a text prompt. This makes it useful for image synthesis tasks where detailed images need to be generated from simple sketches and short descriptions.
$0.044/run
29.5M
Replicate
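A minimal sketch of generating an image from a scribble plus prompt via the Replicate Python client; the slug "jagilley/controlnet-scribble" and the "image"/"prompt" input names are assumptions, so confirm them on the model page:

    # Turn a rough scribble and a text prompt into a detailed image.
    import replicate

    output = replicate.run(
        "jagilley/controlnet-scribble",  # assumed model slug
        input={
            "image": open("scribble.png", "rb"),               # assumed input name
            "prompt": "a cozy wooden cabin in a snowy forest",
        },
    )
    print(output)  # URL(s) of the generated image(s)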
xlm-roberta-large
XLM-RoBERTa is a multilingual version of the RoBERTa model, which is pre-trained on a large corpus of text from 100 languages. It is trained using a masked language modeling (MLM) objective, where 15% of the words in a sentence are randomly masked and the model has to predict the masked words. This allows the model to learn a bidirectional representation of the sentence. The model can be used to extract features for downstream tasks such as classification or question answering. It can also be fine-tuned for specific tasks.
$-/run
23.7M
Huggingface
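A minimal sketch of the feature-extraction use mentioned above, pulling the final hidden states from xlm-roberta-large with transformers (the sentence is illustrative):

    # Encode a sentence and read out its contextual token representations.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
    model = AutoModel.from_pretrained("xlm-roberta-large")

    inputs = tokenizer("XLM-RoBERTa covers 100 languages.", return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    print(hidden.shape)  # (1, sequence_length, 1024)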
MedNER-CR-JA
MedNER-CR-JA is a model for named entity recognition (NER) of Japanese medical documents. It is designed to identify and classify specific entities such as diseases, symptoms, treatments, and anatomical terms in the text. The model takes in a Japanese medical document as input and outputs the recognized entity mentions along with their corresponding entity labels. It can be used by running the provided predict.py script with the necessary files in the same folder. The model has been evaluated in the NTCIR-16 Real-MedNLP Task and achieved competitive results.
$-/run
20.1M
Huggingface
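The documented route is the bundled predict.py script; as a rough, unverified alternative, a generic transformers token-classification sketch might look like the following, where the repo id "sociocom/MedNER-CR-JA" and direct pipeline compatibility are both assumptions:

    # Tag entities in a Japanese medical sentence with a generic NER pipeline.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="sociocom/MedNER-CR-JA",   # assumed Hub repo id
        aggregation_strategy="simple",
    )
    for entity in ner("胸痛と発熱があり、解熱剤を処方した。"):
        print(entity["entity_group"], entity["word"])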

codeformer
CodeFormer is a robust face restoration algorithm for old photos and AI-generated faces. It analyzes degraded face images and reconstructs them with improved detail and realism, making it effective for restoring old or low-quality photographs as well as cleaning up artifacts in AI-generated faces. The algorithm is useful for photo restoration, face enhancement, and other image-to-image applications.
$0.005/run
18.0M
Replicate
gpt2
GPT-2 is a transformer model pretrained on a large corpus of English data in a self-supervised way: it was trained to predict the next word in a sentence, using a causal attention mask so that predictions rely only on past tokens. GPT-2 can be used for text generation as-is or fine-tuned for downstream tasks. It was trained on a dataset called WebText, which consists of web pages linked from Reddit; because this data is largely unfiltered internet content, the model can reproduce its biases, and that bias can carry into fine-tuned versions. Texts are tokenized with a byte-level version of Byte Pair Encoding (BPE) using a vocabulary of 50,257 tokens. The model achieves strong results even without fine-tuning, although the exact training duration and details were not disclosed.
$-/run
17.8M
Huggingface
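A minimal sketch of text generation with the transformers pipeline (the prompt is illustrative, and sampling makes the output non-deterministic):

    # Generate a continuation of a prompt with GPT-2.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    result = generator("Hello, I'm a language model,", max_new_tokens=30)
    print(result[0]["generated_text"])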
xlm-roberta-base
XLM-RoBERTa is a multilingual version of the RoBERTa model, pre-trained on a large amount of CommonCrawl data covering 100 languages. It uses the masked language modeling (MLM) objective: words in a sentence are randomly masked and the model predicts them, learning a bidirectional representation of the sentence that can be used for downstream tasks such as classification and token labeling. It is primarily intended to be fine-tuned on specific tasks, and it can also be used directly with a masked language modeling pipeline.
$-/run
16.3M
Huggingface
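A minimal sketch of the masked-language-modeling pipeline mentioned above; note that XLM-RoBERTa uses <mask> rather than BERT's [MASK] token (the sentence is illustrative):

    # Predict the most likely fillers for the <mask> position.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="xlm-roberta-base")
    for prediction in unmasker("Hello, I'm a <mask> model."):
        print(prediction["token_str"], round(prediction["score"], 3))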