intfloat

Rank:

Average Model Cost: $0.0000

Number of Runs: 1,045,343

Models by this creator

e5-small-v2

The e5-small-v2 model is a text embedding model trained using weakly-supervised contrastive pre-training. It has 12 layers and produces 384-dimensional embeddings. It can be used to encode queries and passages, for example from the MS-MARCO passage ranking dataset. The model only works for English texts, and long inputs are truncated to a maximum of 512 tokens. It can be loaded with the sentence_transformers library for tasks such as retrieval and semantic similarity, as sketched below.
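
A minimal usage sketch with sentence_transformers, assuming the checkpoint is published on the Hugging Face Hub as intfloat/e5-small-v2 and that inputs carry the "query: " / "passage: " prefixes the E5 models expect; the example texts are placeholders, not part of this listing.

from sentence_transformers import SentenceTransformer, util

# Assumed checkpoint name; E5 inputs are prefixed with "query: " or "passage: ".
model = SentenceTransformer("intfloat/e5-small-v2")

queries = ["query: how much protein should a female eat"]
passages = [
    "passage: As a general guideline, adult women need roughly 46 grams of protein per day.",
    "passage: The summit of a mountain is its highest point.",
]

# With normalize_embeddings=True the dot product equals cosine similarity.
query_emb = model.encode(queries, normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)

scores = util.cos_sim(query_emb, passage_emb)
print(scores)  # the protein passage should score higher for this query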

$-/run

815.4K

Huggingface

e5-large-v2

The e5-large-v2 model is a text embedding model trained through weakly-supervised contrastive pre-training. It has 24 layers and an embedding size of 1024. The model can be used to encode queries and passages from the MS-MARCO passage ranking dataset. It is designed for English texts, and longer inputs are truncated to a maximum of 512 tokens. Training details and benchmark evaluations can be found in the associated research paper.

$-/run

64.2K

Huggingface

e5-base-v2

The e5-base-v2 model is a text embedding model trained using weakly-supervised contrastive pre-training. It consists of 12 layers and has an embedding size of 768. The model can be used to encode queries and passages from datasets like MS-MARCO for tasks such as passage ranking. It is designed for English texts, and long texts are truncated to a maximum of 512 tokens. Training details and benchmark evaluation results can be found in the associated research paper; a minimal encoding sketch follows.
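
A hedged sketch of the same encoding through the plain transformers API, using average pooling over the last hidden states as the E5 model cards describe; the checkpoint name intfloat/e5-base-v2 is assumed.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states, attention_mask):
    # Zero out padded positions, then average over the sequence dimension.
    hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-base-v2")
model = AutoModel.from_pretrained("intfloat/e5-base-v2")

texts = [
    "query: what is the capital of france",
    "passage: Paris is the capital and most populous city of France.",
]

# max_length=512 mirrors the model's 512-token truncation limit.
batch = tokenizer(texts, max_length=512, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)

# Cosine similarity between the query and the passage embeddings.
print((embeddings[0] @ embeddings[1]).item())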

$-/run

45.5K

Huggingface

e5-base

The e5-base model is a feature extraction model: it maps input text to dense vector representations that capture its semantic content. These representations can be used for a variety of natural language processing tasks such as text classification, named entity recognition, and sentiment analysis. The model is trained on a large amount of text data and serves as a versatile starting point for building more complex NLP systems.

$-/run

30.9K

Huggingface

multilingual-e5-base

The Multilingual-E5-base model is a text embedding model trained using weakly-supervised contrastive pre-training. It has 12 layers and an embedding size of 768. It is initialized from xlm-roberta-base and trained on a mixture of multilingual datasets; it supports the 100 languages covered by xlm-roberta, though performance may degrade for low-resource languages. Training proceeds in two stages: contrastive pre-training with weak supervision, then supervised fine-tuning. The model can be used to encode queries and passages, for example from the MS-MARCO passage ranking dataset, with long texts truncated to at most 512 tokens. It has been evaluated on the BEIR and MTEB benchmarks; a short cross-lingual example follows.
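
A short cross-lingual sketch, assuming the checkpoint name intfloat/multilingual-e5-base and that it loads through sentence_transformers like the English models; query and passages may be in different languages, and the example sentences are illustrative placeholders.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/multilingual-e5-base")

query = ["query: how many people live in berlin"]
passages = [
    "passage: Berlin hat rund 3,7 Millionen Einwohner.",  # German
    "passage: Tokio es la capital de Japon.",             # Spanish
]

# Normalized embeddings let cosine similarity rank cross-lingual matches.
q = model.encode(query, normalize_embeddings=True)
p = model.encode(passages, normalize_embeddings=True)
print(util.cos_sim(q, p))  # the Berlin sentence should rank first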

$-/run

20.4K

Huggingface

e5-large

The E5-large model is a text embedding model trained using weakly-supervised contrastive pre-training. It consists of 24 layers and has an embedding size of 1024. It can be used to encode queries and passages from the MS-MARCO passage ranking dataset. The model has been evaluated on the BEIR and MTEB benchmarks, and the evaluation results can be reproduced with the unilm/e5 code. It only works for English texts, and long texts are truncated to a maximum of 512 tokens.

$-/run

16.3K

Huggingface

e5-large-unsupervised

This model is similar to e5-large but without supervised fine-tuning. It is introduced in the paper "Text Embeddings by Weakly-Supervised Contrastive Pre-training" (Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022, https://arxiv.org/pdf/2212.03533.pdf). The model has 24 layers and an embedding size of 1024, and can be used to encode queries and passages such as those from the MS-MARCO passage ranking dataset. Evaluation results on the BEIR and MTEB benchmarks can be reproduced with the unilm/e5 code. The model only works for English texts, and long texts are truncated to at most 512 tokens.

$-/run

1.7K

Huggingface

e5-base-unsupervised

This model is similar to e5-base but without supervised fine-tuning. It is introduced in the paper "Text Embeddings by Weakly-Supervised Contrastive Pre-training" (Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022, https://arxiv.org/pdf/2212.03533.pdf). The model has 12 layers and an embedding size of 768, and can be used to encode queries and passages such as those from the MS-MARCO passage ranking dataset. Evaluation results on the BEIR and MTEB benchmarks can be reproduced with the unilm/e5 code. The model only works for English texts, and long texts are truncated to at most 512 tokens.

$-/run

1.3K

Huggingface
