![Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications.](https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face)

**The text embedding set trained by [**Jina AI**](https://jina.ai/).**

[](#quick-start)Quick Start
---------------------------

The easiest way to start using `jina-embeddings-v2-base-en` is to use Jina AI's [Embedding API](https://jina.ai/embeddings/).

[](#intended-usage--model-info)Intended Usage & Model Info
----------------------------------------------------------

`jina-embeddings-v2-base-en` is an English, monolingual **embedding model** supporting a **sequence length of 8192**. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths. The backbone `jina-bert-v2-base-en` is pretrained on the C4 dataset. The model is further trained on Jina AI's collection of more than 400 million sentence pairs and hard negatives. These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.

The embedding model was trained with a sequence length of 512, but extrapolates to a sequence length of 8k (or even longer) thanks to ALiBi. This makes the model useful for a range of use cases, especially those that require processing long documents, including long document retrieval, semantic textual similarity, text reranking, recommendation, and RAG and LLM-based generative search.
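
To illustrate why a distance-based bias allows this kind of extrapolation, here is a minimal sketch of a symmetric ALiBi-style attention bias. It is an illustration under stated assumptions, not JinaBERT's actual implementation: the helper names `alibi_slopes` and `symmetric_alibi_bias` are hypothetical, and the slopes follow the geometric sequence described in the ALiBi paper for a power-of-two number of heads.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... (assumes num_heads is a power of two)."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def symmetric_alibi_bias(seq_len, slope):
    """Bias added to attention logits: -slope * |i - j|, symmetric in i and j."""
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


slopes = alibi_slopes(8)                    # hypothetical head count
bias = symmetric_alibi_bias(4, slopes[0])   # tiny sequence, first head
for row in bias:
    print(row)
```

Because the bias depends only on the distance between positions, the same function can be evaluated for any `seq_len`, including lengths never seen during training.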

With a standard size of 137 million parameters, the model enables fast inference while delivering better performance than our small model. It is recommended to use a single GPU for inference. Additionally, we provide the following embedding models:

*   [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
*   [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters **(you are here)**.
*   [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): Chinese-English Bilingual embeddings.
*   [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): German-English Bilingual embeddings.
*   [`jina-embeddings-v2-base-es`](https://huggingface.co/jinaai/jina-embeddings-v2-base-es): Spanish-English Bilingual embeddings.

[](#data--parameters)Data & Parameters
--------------------------------------

Jina Embeddings V2 [technical report](https://arxiv.org/abs/2310.19923)

[](#usage)Usage
---------------

**Please apply mean pooling when integrating the model.**

### [](#why-mean-pooling)Why mean pooling?

Mean pooling takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has been shown to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function to deal with this.

However, if you would like to do it without using the default `encode` function:

    import torch
    import torch.nn.functional as F
    from transformers import AutoTokenizer, AutoModel
    
    def mean_pooling(model_output, attention_mask):
        token_embeddings = model_output[0]
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    
    sentences = ['How is the weather today?', 'What is the current weather like today?']
    
    tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v2-base-en')
    model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
    
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
    
    with torch.no_grad():
        model_output = model(**encoded_input)
    
    embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
    embeddings = F.normalize(embeddings, p=2, dim=1)

You can use Jina Embedding models directly from the `transformers` package.

    !pip install transformers
    from transformers import AutoModel
    from numpy.linalg import norm
    
    cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
    model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True) # trust_remote_code is needed to use the encode method
    embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
    print(cos_sim(embeddings[0], embeddings[1]))
    

If you only want to handle shorter sequences, such as 2k, pass the `max_length` parameter to the `encode` function:

    embeddings = model.encode(
        ['Very long ... document'],
        max_length=2048
    )
    

As of its latest release (v2.3.0), `sentence-transformers` also supports Jina embeddings (please make sure that you are logged into Hugging Face as well):

    !pip install -U sentence-transformers
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.util import cos_sim
    
    model = SentenceTransformer(
        "jinaai/jina-embeddings-v2-base-en", # switch to en/zh for English or Chinese
        trust_remote_code=True
    )
    
    # control your input sequence length up to 8192
    model.max_seq_length = 1024
    
    embeddings = model.encode([
        'How is the weather today?',
        'What is the current weather like today?'
    ])
    print(cos_sim(embeddings[0], embeddings[1]))
    

[](#alternatives-to-using-transformers-or-sentenctransformers-package)Alternatives to Using Transformers (or SentenceTransformers) Package
------------------------------------------------------------------------------------------------------------------------------------------

1.  _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
2.  _Private and high-performance deployment_: Get started by picking from our suite of models and deploy them on [AWS Sagemaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).

[](#use-jina-embeddings-for-rag)Use Jina Embeddings for RAG
-----------------------------------------------------------

According to the latest blog post from [LlamaIndex](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83),

> In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out.

![](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png)
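
For readers unfamiliar with the two metrics named in the quote, the following is a minimal sketch of how hit rate and MRR (mean reciprocal rank) are commonly computed. The document ids and rankings are made-up illustration data, and the function names are hypothetical; this is not the evaluation code used in the blog post.

```python
def hit_rate(ranked_ids, relevant_id, k):
    """1.0 if the relevant document appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def reciprocal_rank(ranked_ids, relevant_id):
    """1 / rank of the relevant document (ranks start at 1); 0.0 if it is missing."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


# Hypothetical retrieval results: (ranked document ids, id of the relevant document)
queries = [
    (["d3", "d1", "d7"], "d1"),
    (["d2", "d5", "d9"], "d2"),
    (["d4", "d8", "d6"], "d0"),
]

mean_hit_rate = sum(hit_rate(r, rel, k=3) for r, rel in queries) / len(queries)
mrr = sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)
print(f"hit rate@3 = {mean_hit_rate:.3f}, MRR = {mrr:.3f}")  # hit rate@3 = 0.667, MRR = 0.500
```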

[](#plans)Plans
---------------

1.  Bilingual embedding models supporting more European and Asian languages, including Spanish, French, Italian and Japanese.
2.  Multimodal embedding models to enable multimodal RAG applications.
3.  High-performance rerankers.

[](#trouble-shooting)Troubleshooting
------------------------------------

**Loading of Model Code failed**

If you forget to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or initializing the model via the `SentenceTransformer` class, you will receive an error that the model weights could not be initialized. This is caused by transformers falling back to creating a default BERT model instead of a jina-embedding model:

    Some weights of the model checkpoint at jinaai/jina-embeddings-v2-base-en were not used when initializing BertModel: ['encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.8.mlp.layernorm.bias', ...
    

**User is not logged into Hugging Face**

The model is only available under [gated access](https://huggingface.co/docs/hub/models-gated). This means you need to be logged into Hugging Face to load it. If you receive the following error, you need to provide an access token, either by logging in with `huggingface-cli login` or by providing the token via an environment variable:

    OSError: jinaai/jina-embeddings-v2-base-en is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
    If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
    

[](#contact)Contact
-------------------

Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.

[](#citation)Citation
---------------------

If you find Jina Embeddings useful in your research, please cite the following paper:

    @misc{günther2023jina,
          title={Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents}, 
          author={Michael Günther and Jackmin Ong and Isabelle Mohr and Alaeddine Abdessalem and Tanguy Abel and Mohammad Kalim Akram and Susana Guzman and Georgios Mastrapas and Saba Sturua and Bo Wang and Maximilian Werk and Nan Wang and Han Xiao},
          year={2023},
          eprint={2310.19923},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }

## Model overview

The `jina-embeddings-v2-base-en` model is a text embedding model created by [Jina AI](https://jina.ai/). It is based on a BERT architecture called JinaBERT that supports sequence lengths of up to 8192 tokens using the symmetric bidirectional variant of ALiBi. The model was further trained on over 400 million sentence pairs and hard negatives from various domains. This makes it useful for a range of use cases like long document retrieval, semantic textual similarity, text reranking, and more. With 137 million parameters, this base version still allows fast inference while delivering better performance than the smaller `jina-embeddings-v2-small-en` model.

## Model inputs and outputs

### Inputs
- Text sequences up to 8192 tokens long

### Outputs
- 768-dimensional text embeddings

## Capabilities

The `jina-embeddings-v2-base-en` model can generate high-quality embeddings for long text sequences, enabling applications like semantic search, text similarity, and document understanding. Its ability to handle 8192 token sequences makes it particularly useful for working with long-form content like research papers, legal contracts, or product descriptions.

## What can I use it for?

The embeddings produced by this model can be used in a variety of downstream natural language processing tasks. Some potential use cases include:

- [Long document retrieval](https://jina.ai/embeddings/): Finding relevant documents from a large corpus based on semantic similarity to a query.
- [Semantic textual similarity](https://jina.ai/embeddings/): Measuring the semantic similarity between text pairs, which can be useful for applications like plagiarism detection or textual entailment.
- [Text reranking](https://jina.ai/embeddings/): Reordering a list of documents or passages based on their relevance to a given query.
- [Recommendation systems](https://jina.ai/embeddings/): Suggesting relevant content to users based on the semantic similarity of items.
- [RAG and LLM-based generative search](https://jina.ai/embeddings/): Enabling more powerful and flexible search experiences powered by large language models.

## Things to try

One interesting aspect of the `jina-embeddings-v2-base-en` model is its ability to handle very long text sequences, up to 8192 tokens. This makes it well-suited for working with long-form content like research papers, legal contracts, or product descriptions. You could try using the model to perform semantic search or text similarity analysis on a corpus of long-form documents, and see how the performance compares to models with shorter sequence lengths.

Another interesting area to explore would be the model's use in recommendation systems or generative search applications. The high-quality embeddings produced by the model could be leveraged to suggest relevant content to users or to enable more flexible and powerful search experiences powered by large language models.

  
  


**The embedding set trained by [**Jina AI**](https://jina.ai/).**

**Jina CLIP: your CLIP model is also your text retriever!**

[](#intended-usage--model-info)Intended Usage & Model Info
----------------------------------------------------------

`jina-clip-v1` is a state-of-the-art English **multimodal (text-image) embedding model**.

Traditional text embedding models, such as [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en), excel in text-to-text retrieval but are incapable of cross-modal tasks. Models like [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) effectively align image and text embeddings but are not optimized for text-to-text retrieval due to their training methodologies and context limitations.

`jina-clip-v1` bridges this gap by offering robust performance in both domains. Its text component matches the retrieval efficiency of `jina-embeddings-v2-base-en`, while its overall architecture sets a new benchmark for cross-modal retrieval. This dual capability makes it an excellent tool for multimodal retrieval-augmented generation (MuRAG) applications, enabling seamless text-to-text and text-to-image searches within a single model.

[](#data--parameters)Data & Parameters
--------------------------------------

[Check out our paper](https://arxiv.org/abs/2405.20204)

[](#usage)Usage
---------------

1.  The easiest way to start using `jina-clip-v1` is to use Jina AI's [Embeddings API](https://jina.ai/embeddings/).
2.  Alternatively, you can use Jina CLIP directly via the `transformers` package.

    !pip install transformers einops timm pillow
    from transformers import AutoModel
    
    # Initialize the model
    model = AutoModel.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)
    
    # New meaningful sentences
    sentences = ['A blue cat', 'A red cat']
    
    # Public image URLs
    image_urls = [
        'https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg',
        'https://i.pinimg.com/736x/c9/f2/3e/c9f23e212529f13f19bad5602d84b78b.jpg'
    ]
    
    # Encode text and images
    text_embeddings = model.encode_text(sentences)
    image_embeddings = model.encode_image(image_urls)  # also accepts PIL.Image, local filenames, dataURI
    
    # Compute similarities
    print(text_embeddings[0] @ text_embeddings[1].T) # text embedding similarity
    print(text_embeddings[0] @ image_embeddings[0].T) # text-image cross-modal similarity
    print(text_embeddings[0] @ image_embeddings[1].T) # text-image cross-modal similarity
    print(text_embeddings[1] @ image_embeddings[0].T) # text-image cross-modal similarity
    print(text_embeddings[1] @ image_embeddings[1].T) # text-image cross-modal similarity
    

3.  JavaScript developers can use Jina CLIP via the [Transformers.js](https://huggingface.co/docs/transformers.js) library. Note that to use this model, you need to install Transformers.js [v3](https://github.com/xenova/transformers.js/tree/v3) from source using `npm install xenova/transformers.js#v3`.

    import { AutoTokenizer, CLIPTextModelWithProjection, AutoProcessor, CLIPVisionModelWithProjection, RawImage, cos_sim } from '@xenova/transformers';
    
    // Load tokenizer and text model
    const tokenizer = await AutoTokenizer.from_pretrained('jinaai/jina-clip-v1');
    const text_model = await CLIPTextModelWithProjection.from_pretrained('jinaai/jina-clip-v1');
    
    // Load processor and vision model
    const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch32');
    const vision_model = await CLIPVisionModelWithProjection.from_pretrained('jinaai/jina-clip-v1');
    
    // Run tokenization
    const texts = ['A blue cat', 'A red cat'];
    const text_inputs = tokenizer(texts, { padding: true, truncation: true });
    
    // Compute text embeddings
    const { text_embeds } = await text_model(text_inputs);
    
    // Read images and run processor
    const urls = [
        'https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg',
        'https://i.pinimg.com/736x/c9/f2/3e/c9f23e212529f13f19bad5602d84b78b.jpg'
    ];
    const image = await Promise.all(urls.map(url => RawImage.read(url)));
    const image_inputs = await processor(image);
    
    // Compute vision embeddings
    const { image_embeds } = await vision_model(image_inputs);
    
    //  Compute similarities
    console.log(cos_sim(text_embeds[0].data, text_embeds[1].data)) // text embedding similarity
    console.log(cos_sim(text_embeds[0].data, image_embeds[0].data)) // text-image cross-modal similarity
    console.log(cos_sim(text_embeds[0].data, image_embeds[1].data)) // text-image cross-modal similarity
    console.log(cos_sim(text_embeds[1].data, image_embeds[0].data)) // text-image cross-modal similarity
    console.log(cos_sim(text_embeds[1].data, image_embeds[1].data)) // text-image cross-modal similarity
    

[](#performance)Performance
---------------------------

### [](#text-image-retrieval)Text-Image Retrieval

| Name | Flickr Image Retr. R@1 | Flickr Image Retr. R@5 | Flickr Text Retr. R@1 | Flickr Text Retr. R@5 |
| --- | --- | --- | --- | --- |
| ViT-B-32 | 0.597 | 0.8398 | 0.781 | 0.938 |
| ViT-B-16 | 0.6216 | 0.8572 | 0.822 | 0.966 |
| jina-clip | 0.6748 | 0.8902 | 0.811 | 0.965 |

| Name | MSCOCO Image Retr. R@1 | MSCOCO Image Retr. R@5 | MSCOCO Text Retr. R@1 | MSCOCO Text Retr. R@5 |
| --- | --- | --- | --- | --- |
| ViT-B-32 | 0.342 | 0.6001 | 0.5234 | 0.7634 |
| ViT-B-16 | 0.3309 | 0.5842 | 0.5242 | 0.767 |
| jina-clip | 0.4111 | 0.6644 | 0.5544 | 0.7904 |

### [](#text-text-retrieval)Text-Text Retrieval

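| Name | STS12 | STS15 | STS17 | STS13 | STS14 | STS16 | STS22 | STSBenchmark | SummEval |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| jina-embeddings-v2 | 0.7427 | 0.8755 | 0.8888 | 0.833 | 0.7917 | 0.836 | 0.6346 | 0.8404 | 0.3056 |
| jina-clip | 0.7352 | 0.8746 | 0.8976 | 0.8323 | 0.7868 | 0.8377 | 0.6583 | 0.8493 | 0.3048 |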

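| Name | ArguAna | FiQA2018 | NFCorpus | Quora | SCIDOCS | SciFact | TRECCOVID |
| --- | --- | --- | --- | --- | --- | --- | --- |
| jina-embeddings-v2 | 0.4418 | 0.4158 | 0.3245 | 0.882 | 0.1986 | 0.6668 | 0.6591 |
| jina-clip | 0.4933 | 0.3827 | 0.3352 | 0.8789 | 0.2024 | 0.6734 | 0.7161 |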

[](#contact)Contact
-------------------

Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.

[](#citation)Citation
---------------------

If you find `jina-clip-v1` useful in your research, please cite the following paper:

    @misc{2405.20204,
        Author = {Andreas Koukounas and Georgios Mastrapas and Michael Günther and Bo Wang and Scott Martens and Isabelle Mohr and Saba Sturua and Mohammad Kalim Akram and Joan Fontanals Martínez and Saahil Ognawala and Susana Guzman and Maximilian Werk and Nan Wang and Han Xiao},
        Title = {Jina CLIP: Your CLIP Model Is Also Your Text Retriever},
        Year = {2024},
        Eprint = {arXiv:2405.20204},
    }
    

[](#faq)FAQ
-----------

### [](#i-encounter-this-problem-what-should-i-do)I encounter this problem, what should I do?

    ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers_modules.jinaai.jina-clip-implementation.7f069e2d54d609ef1ad2eb578c7bf07b5a51de41.configuration_clip.JinaCLIPConfig'> and you passed <class 'transformers_modules.jinaai.jina-clip-implementation.7f069e2d54d609ef1ad2eb578c7bf07b5a51de41.configuration_cli.JinaCLIPConfig'>. Fix one of those so they match!
    

There was a bug in the Transformers library between versions 4.40.x and 4.41.1. You can update transformers to >4.41.2 or downgrade to <=4.40.0.

### [](#given-one-query-how-can-i-merge-its-text-text-and-text-image-cosine-similarity)Given one query, how can I merge its text-text and text-image cosine similarity?

Our empirical study shows that text-text cosine similarity is normally larger than text-image cosine similarity. If you want to merge the two scores, we recommend two approaches:

1.  Weighted sum of the text-text and text-image similarities:

    # pseudo code; the optimal lam depends on your dataset, but in general lam=2 can be a good choice
    combined_scores = sim(text, text) + lam * sim(text, image)
    

2.  Apply z-score normalization before merging scores:

    # pseudo code: cos_sim_query_documents and cos_sim_text_images are arrays of cosine similarities
    import numpy as np
    
    query_document_mean = np.mean(cos_sim_query_documents)
    query_document_std = np.std(cos_sim_query_documents)
    text_image_mean = np.mean(cos_sim_text_images)
    text_image_std = np.std(cos_sim_text_images)
    
    query_document_sim_normalized = (cos_sim_query_documents - query_document_mean) / query_document_std
    text_image_sim_normalized = (cos_sim_text_images - text_image_mean) / text_image_std
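
The two merging strategies can be sketched end to end with the standard library alone. This is a minimal illustration under stated assumptions: the similarity values are made up, the helper names `weighted_merge` and `z_normalize` are hypothetical, and summing the two normalized scores is an assumed final step that the recommendation above does not spell out.

```python
from statistics import mean, pstdev


def weighted_merge(text_text_sims, text_image_sims, lam=2.0):
    """Option 1: add the text-image similarity, scaled by lam, to the text-text similarity."""
    return [tt + lam * ti for tt, ti in zip(text_text_sims, text_image_sims)]


def z_normalize(scores):
    """Option 2 helper: shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


# Hypothetical cosine similarities of one query against three candidates
cos_sim_query_documents = [0.82, 0.64, 0.71]
cos_sim_text_images = [0.31, 0.22, 0.27]

combined_weighted = weighted_merge(cos_sim_query_documents, cos_sim_text_images)

# Assumed final step: sum the two z-normalized scores per candidate
combined_z = [
    a + b
    for a, b in zip(z_normalize(cos_sim_query_documents), z_normalize(cos_sim_text_images))
]
print(combined_weighted)
print(combined_z)
```

`pstdev` is used because it matches the default behaviour of `np.std` (population standard deviation).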

## Model overview

`jina-clip-v1` is a state-of-the-art English multimodal (text-image) embedding model trained by [Jina AI](https://aimodels.fyi/creators/huggingFace/jinaai). It bridges the gap between traditional text embedding models, such as [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en), which excel in text-to-text retrieval but are incapable of cross-modal tasks, and models like [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) that effectively align image and text embeddings but are not optimized for text-to-text retrieval. `jina-clip-v1` offers robust performance in both domains, matching the retrieval efficiency of `jina-embeddings-v2-base-en` for text-to-text tasks while setting a new benchmark for cross-modal retrieval.

## Model inputs and outputs

### Inputs

- **Sentences**: The model can encode meaningful sentences in English.
- **Images**: The model can also encode images, either by providing the public image URLs or directly passing in the PIL.Image objects.

### Outputs

- **Text embeddings**: The model outputs dense vector representations for the input sentences.
- **Image embeddings**: The model outputs dense vector representations for the input images.
- **Similarity scores**: The model can compute the cosine similarity between text and image embeddings, enabling cross-modal retrieval.

## Capabilities

`jina-clip-v1` excels at both text-to-text and text-to-image retrieval tasks. Its dual capability makes it an excellent tool for multimodal retrieval-augmented generation (MuRAG) applications, allowing seamless text-to-text and text-to-image searches within a single model.

## What can I use it for?

`jina-clip-v1` can be used for a variety of multimodal applications, such as:

- **Image search**: Users can search for images by describing them in text.
- **Cross-modal retrieval**: The model can retrieve relevant text or images based on a query in the opposite modality.
- **Multimodal question answering**: The embeddings can be used to retrieve the text passages and images that a downstream model needs to answer questions spanning both modalities.
- **Multimodal content generation**: The embeddings can be used to retrieve relevant text or images as context for a generative model, as in the MuRAG applications described above.

[Jina AI](https://aimodels.fyi/creators/huggingFace/jinaai) has also provided the [Embeddings API](https://jina.ai/embeddings/) as an easy-to-use interface for working with `jina-clip-v1` and their other embedding models.

## Things to try

One key advantage of `jina-clip-v1` is its ability to handle longer sequences of text, up to 8,192 tokens, thanks to its use of the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409). This makes the model well-suited for tasks involving long-form content, such as document retrieval, long-form question answering, and summarization. Researchers and developers can explore how the model's performance scales with longer input sequences compared to traditional text embedding models.

  
  


**Trained by [**Jina AI**](https://jina.ai/).**

[](#jina-reranker-v2-base-multilingual)jina-reranker-v2-base-multilingual
=========================================================================

[](#intended-usage--model-info)Intended Usage & Model Info
----------------------------------------------------------

The **Jina Reranker v2** (`jina-reranker-v2-base-multilingual`) is a transformer-based model that has been fine-tuned for the text reranking task, a crucial component in many information retrieval systems. It is a cross-encoder model that takes a query and document pair as input and outputs a score indicating the relevance of the document to the query. The model is trained on a large dataset of query-document pairs and is capable of reranking documents in multiple languages with high accuracy.

Compared with state-of-the-art reranker models, including the previously released `jina-reranker-v1-base-en`, the **Jina Reranker v2** model has demonstrated competitiveness across a series of benchmarks targeting text retrieval, multilingual capability, function-calling-aware and text-to-SQL-aware reranking, and code retrieval tasks.

The `jina-reranker-v2-base-multilingual` model handles a context length of up to `1024` tokens. For texts that exceed 1024 tokens, the model uses a sliding window approach to chunk the input text into smaller pieces and rerank each chunk separately.

The model is also equipped with a flash attention mechanism, which significantly improves inference speed.

[](#usage)Usage
===============

_This model repository is licensed for research and evaluation purposes under CC-BY-NC-4.0. For commercial usage, please refer to Jina AI's APIs, AWS Sagemaker or Azure Marketplace offerings. Please [contact us](https://jina.ai/contact-sales) for any further clarifications._

1.  The easiest way to use `jina-reranker-v2-base-multilingual` is to call Jina AI's [Reranker API](https://jina.ai/reranker/).

    curl https://api.jina.ai/v1/rerank \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -d '{
      "model": "jina-reranker-v2-base-multilingual",
      "query": "Organic skincare products for sensitive skin",
      "documents": [
        "Organic skincare for sensitive skin with aloe vera and chamomile.",
        "New makeup trends focus on bold colors and innovative techniques",
        "Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille",
        "Neue Make-up-Trends setzen auf kräftige Farben und innovative Techniken",
        "Cuidado de la piel orgánico para piel sensible con aloe vera y manzanilla",
        "Las nuevas tendencias de maquillaje se centran en colores vivos y técnicas innovadoras",
        "",
        "",
        "",
        ""
      ],
      "top_n": 3
    }'
    

2.  You can also use the `transformers` library to interact with the model programmatically.

Before you start, install the `transformers` and `einops` libraries:

    pip install transformers einops
    

And then:

    from transformers import AutoModelForSequenceClassification
    
    model = AutoModelForSequenceClassification.from_pretrained(
        'jinaai/jina-reranker-v2-base-multilingual',
        torch_dtype="auto",
        trust_remote_code=True,
    )
    
    model.to('cuda') # or 'cpu' if no GPU is available
    model.eval()
    
    # Example query and documents
    query = "Organic skincare products for sensitive skin"
    documents = [
        "Organic skincare for sensitive skin with aloe vera and chamomile.",
        "New makeup trends focus on bold colors and innovative techniques",
        "Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille",
        "Neue Make-up-Trends setzen auf kräftige Farben und innovative Techniken",
        "Cuidado de la piel orgánico para piel sensible con aloe vera y manzanilla",
        "Las nuevas tendencias de maquillaje se centran en colores vivos y técnicas innovadoras",
        "",
        "",
        "",
        "",
    ]
    
    # construct sentence pairs
    sentence_pairs = [[query, doc] for doc in documents]
    
    scores = model.compute_score(sentence_pairs, max_length=1024)
    

The scores will be a list of floats, where each float represents the relevance score of the corresponding document to the query. Higher scores indicate higher relevance. For instance, the returned scores in this case will be:

    [0.8311430811882019, 0.09401018172502518,
     0.6334102749824524, 0.08269733935594559,
     0.7620701193809509, 0.09947021305561066,
     0.9263036847114563, 0.05834583938121796,
     0.8418256044387817, 0.11124119907617569]
    

The model gives high relevance scores to the documents that are most relevant to the query regardless of the language of the document.
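
To turn such a score list into a ranking, you can sort the document indices by score. The following is a minimal sketch using the example scores above; the helper name `top_n_indices` is hypothetical and not part of the model's API.

```python
# Scores copied from the example output above, one per input document
scores = [
    0.8311430811882019, 0.09401018172502518,
    0.6334102749824524, 0.08269733935594559,
    0.7620701193809509, 0.09947021305561066,
    0.9263036847114563, 0.05834583938121796,
    0.8418256044387817, 0.11124119907617569,
]


def top_n_indices(scores, top_n):
    """Return document indices sorted by descending score, truncated to top_n."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_n]


print(top_n_indices(scores, top_n=3))  # [6, 8, 0]
```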

Note that by default, the `jina-reranker-v2-base-multilingual` model uses [flash attention](https://github.com/Dao-AILab/flash-attention), which requires certain types of GPU hardware to run. If you encounter any issues, you can try calling `AutoModelForSequenceClassification.from_pretrained()` with `use_flash_attn=False`. This will use the standard attention mechanism instead of flash attention.

If you want to use flash attention for fast inference, you need to install the following packages:

    pip install ninja # required for flash attention
    pip install flash-attn --no-build-isolation
    

Enjoy the 3x-6x speedup with flash attention! 

3.  You can also use the `transformers.js` library to run the model directly in JavaScript (in-browser, Node.js, Deno, etc.)!

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library (v3) using:

    npm i xenova/transformers.js#v3
    

Then, you can use the following code to interact with the model:

    import { AutoTokenizer, XLMRobertaModel } from '@xenova/transformers';
    
    const model_id = 'jinaai/jina-reranker-v2-base-multilingual';
    const model = await XLMRobertaModel.from_pretrained(model_id, { dtype: 'fp32' });
    const tokenizer = await AutoTokenizer.from_pretrained(model_id);
    
    /**
     * Performs ranking with the CrossEncoder on the given query and documents. Returns a sorted list with the document indices and scores.
     * @param {string} query A single query
     * @param {string[]} documents A list of documents
     * @param {Object} options Options for ranking
     * @param {number} [options.top_k=undefined] Return the top-k documents. If undefined, all documents are returned.
     * @param {boolean} [options.return_documents=false] If true, also returns the documents. If false, only returns the indices and scores.
     */
    async function rank(query, documents, {
        top_k = undefined,
        return_documents = false,
    } = {}) {
        const inputs = tokenizer(
            new Array(documents.length).fill(query),
            { text_pair: documents, padding: true, truncation: true }
        )
        const { logits } = await model(inputs);
        return logits.sigmoid().tolist()
            .map(([score], i) => ({
                corpus_id: i,
                score,
                ...(return_documents ? { text: documents[i] } : {})
            })).sort((a, b) => b.score - a.score).slice(0, top_k);
    }
    
    // Example usage:
    const query = "Organic skincare products for sensitive skin"
    const documents = [
        "Organic skincare for sensitive skin with aloe vera and chamomile.",
        "New makeup trends focus on bold colors and innovative techniques",
        "Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille",
        "Neue Make-up-Trends setzen auf kräftige Farben und innovative Techniken",
        "Cuidado de la piel orgánico para piel sensible con aloe vera y manzanilla",
        "Las nuevas tendencias de maquillaje se centran en colores vivos y técnicas innovadoras",
        "",
        "",
        "",
        "",
    ]
    
    const results = await rank(query, documents, { return_documents: true, top_k: 3 });
    console.log(results);
    

That's it! You can now use the `jina-reranker-v2-base-multilingual` model in your projects.

In addition to the `compute_score()` function, the `jina-reranker-v2-base-multilingual` model also provides a `model.rerank()` function that can be used to rerank documents based on a query. You can use it as follows:

    result = model.rerank(
        query,
        documents,
        max_query_length=512,
        max_length=1024,
        top_n=3
    )
    

Inside the `result` object, you will find the reranked documents along with their scores. You can use this information to further process the documents as needed.

The `rerank()` function will automatically chunk the input documents into smaller pieces if they exceed the model's maximum input length. This allows you to rerank long documents without running into memory issues. Specifically, the `rerank()` function will split the documents into chunks of size `max_length` and rerank each chunk separately. The scores from all the chunks are then combined to produce the final reranking results. You can control the query length and document length in each chunk by setting the `max_query_length` and `max_length` parameters. The `rerank()` function also supports the `overlap` parameter (default is `80`) which determines how much overlap there is between adjacent chunks. This can be useful when reranking long documents to ensure that the model has enough context to make accurate predictions.
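
The chunking described above can be sketched in a few lines. This is an illustration under stated assumptions, not the model's actual `rerank()` implementation: `chunk_with_overlap` and `combine_chunk_scores` are hypothetical helpers, the query tokens that share each window are ignored, and taking the maximum chunk score is an assumed aggregation rule, since the text only says that the chunk scores are combined.

```python
from typing import List


def chunk_with_overlap(tokens: List[int], max_length: int, overlap: int) -> List[List[int]]:
    """Split a token sequence into windows of at most `max_length` tokens,
    where consecutive windows share `overlap` tokens."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must satisfy 0 <= overlap < max_length")
    step = max_length - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_length])
        if start + max_length >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return chunks


def combine_chunk_scores(chunk_scores: List[float]) -> float:
    """Assumed aggregation: a document is as relevant as its best chunk."""
    return max(chunk_scores)


doc_tokens = list(range(250))  # stand-in for 250 token ids
chunks = chunk_with_overlap(doc_tokens, max_length=100, overlap=20)
print([(c[0], c[-1]) for c in chunks])  # first and last token id of each chunk
```

A larger `overlap` lowers the chance that a relevant passage is cut in half at a chunk boundary, at the cost of scoring more chunks per document.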

4.  Alternatively, `jina-reranker-v2-base-multilingual` has been integrated with `CrossEncoder` from the `sentence-transformers` library.

Before you start, install the `sentence-transformers` library:

    pip install sentence-transformers
    

The [`CrossEncoder`](https://sbert.net/docs/package_reference/cross_encoder/cross_encoder.html) class supports a [`predict`](https://sbert.net/docs/package_reference/cross_encoder/cross_encoder.html#sentence_transformers.cross_encoder.CrossEncoder.predict) method to get query-document relevance scores, and a [`rank`](https://sbert.net/docs/package_reference/cross_encoder/cross_encoder.html#sentence_transformers.cross_encoder.CrossEncoder.rank) method to rank all documents given your query.

    from sentence_transformers import CrossEncoder
    
    model = CrossEncoder(
        "jinaai/jina-reranker-v2-base-multilingual",
        automodel_args={"torch_dtype": "auto"},
        trust_remote_code=True,
    )
    
    # Example query and documents
    query = "Organic skincare products for sensitive skin"
    documents = [
        "Organic skincare for sensitive skin with aloe vera and chamomile.",
        "New makeup trends focus on bold colors and innovative techniques",
        "Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille",
        "Neue Make-up-Trends setzen auf kräftige Farben und innovative Techniken",
        "Cuidado de la piel orgánico para piel sensible con aloe vera y manzanilla",
        "Las nuevas tendencias de maquillaje se centran en colores vivos y técnicas innovadoras",
        "",
        "",
        "",
        "",
    ]
    
    # construct sentence pairs
    sentence_pairs = [[query, doc] for doc in documents]
    
    scores = model.predict(sentence_pairs, convert_to_tensor=True).tolist()
    """
    [0.828125, 0.0927734375, 0.6328125, 0.08251953125, 0.76171875, 0.099609375, 0.92578125, 0.058349609375, 0.84375, 0.111328125]
    """
    
    rankings = model.rank(query, documents, return_documents=True, convert_to_tensor=True)
    print(f"Query: {query}")
    for ranking in rankings:
        print(f"ID: {ranking['corpus_id']}, Score: {ranking['score']:.4f}, Text: {ranking['text']}")
    """
    Query: Organic skincare products for sensitive skin
    ID: 6, Score: 0.9258, Text: 
    ID: 8, Score: 0.8438, Text: 
    ID: 0, Score: 0.8281, Text: Organic skincare for sensitive skin with aloe vera and chamomile.
    ID: 4, Score: 0.7617, Text: Cuidado de la piel orgánico para piel sensible con aloe vera y manzanilla
    ID: 2, Score: 0.6328, Text: Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille
    ID: 9, Score: 0.1113, Text: 
    ID: 5, Score: 0.0996, Text: Las nuevas tendencias de maquillaje se centran en colores vivos y técnicas innovadoras
    ID: 1, Score: 0.0928, Text: New makeup trends focus on bold colors and innovative techniques
    ID: 3, Score: 0.0825, Text: Neue Make-up-Trends setzen auf kräftige Farben und innovative Techniken
    ID: 7, Score: 0.0583, Text: 
    """
    

[](#evaluation)Evaluation
=========================

We evaluated Jina Reranker v2 on multiple benchmarks to ensure top-tier performance and search relevance.

| Model Name | Model Size | MKQA (nDCG@10, 26 langs) | BEIR (nDCG@10, 17 datasets) | MLDR (recall@10, 13 langs) | CodeSearchNet (MRR@10, 3 tasks) | AirBench (nDCG@10, zh/en) | ToolBench (recall@3, 3 tasks) | TableSearch (recall@3) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| jina-reranker-v2-multilingual | 278M | 54.83 | 53.17 | 68.95 | 71.36 | 61.33 | 77.75 | 93.31 |
| bge-reranker-v2-m3 | 568M | 54.17 | 53.65 | 59.73 | 62.86 | 61.28 | 78.46 | 74.86 |
| mmarco-mMiniLMv2-L12-H384-v1 | 118M | 53.37 | 45.40 | 28.91 | 51.78 | 56.46 | 58.39 | 53.60 |
| jina-reranker-v1-base-en | 137M | \- | 52.45 | \- | \- | \- | 74.13 | 72.89 |

Note:

*   nDCG@10 and MRR@10 measure ranking quality, with higher scores indicating better search results.
*   recall@3 measures the proportion of relevant documents retrieved in the top 3 results, with higher scores indicating better search results.
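
For readers unfamiliar with these metrics, the sketch below shows how each one could be computed for a single query. It is a minimal illustration, not the evaluation code behind the table: the function names and the toy ranking are hypothetical, the nDCG variant assumes the ranked list already contains every judged document, and benchmark numbers are averages over many queries.

```python
import math
from typing import Sequence, Set


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """nDCG@k, given graded relevances in the order the system ranked them.
    Assumes the list holds all judged documents for the query."""
    def dcg(rels: Sequence[float]) -> float:
        # rank is 0-based, so the discount for the top result is log2(2) = 1
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))

    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0


def mrr_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear within the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


# Toy ranking: two relevant documents, found at ranks 2 and 4.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
binary_relevances = [1.0 if doc_id in relevant else 0.0 for doc_id in ranked]

print(ndcg_at_k(binary_relevances, k=10))
print(mrr_at_k(ranked, relevant, k=10))    # 0.5: first relevant hit is at rank 2
print(recall_at_k(ranked, relevant, k=3))  # 0.5: one of two relevant docs in top 3
```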

## Model overview

The `jina-reranker-v2-base-multilingual` model is a transformer-based text reranking model trained by [Jina AI](https://jina.ai/). It is a cross-encoder model that takes a query and a document pair as input and outputs a score indicating the relevance of the document to the query. The model is trained on a large dataset of query-document pairs and is capable of reranking documents in multiple languages with high accuracy. Compared to the previous `jina-reranker-v1-base-en` model, the Jina Reranker v2 has demonstrated competitiveness across a series of benchmarks targeting text retrieval, multilingual capability, function-calling-aware and text-to-SQL-aware reranking, and code retrieval tasks.

## Model inputs and outputs

### Inputs
- **Query**: The input query for which relevant documents need to be ranked
- **Documents**: A list of documents to be ranked by relevance to the input query

### Outputs
- **Relevance scores**: A list of scores indicating the relevance of each document to the input query
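
The step from raw model outputs to a ranked list can be sketched as follows. This mirrors the post-processing in the JavaScript example (sigmoid, then sort by score), assuming the model returns one raw logit per query-document pair; `rank_from_logits` is a hypothetical helper and the logits are made-up values, not real model output.

```python
import math
from typing import Dict, List, Optional


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rank_from_logits(
    logits: List[float],
    documents: List[str],
    top_k: Optional[int] = None,
) -> List[Dict]:
    """Turn one logit per document into a list sorted by descending score."""
    scored = [
        {"corpus_id": i, "score": sigmoid(logit), "text": doc}
        for i, (logit, doc) in enumerate(zip(logits, documents))
    ]
    scored.sort(key=lambda row: row["score"], reverse=True)
    return scored[:top_k]  # top_k=None keeps every document


docs = [
    "Organic skincare for sensitive skin with aloe vera and chamomile.",
    "New makeup trends focus on bold colors and innovative techniques",
    "A gentle cleanser for dry skin",
]
made_up_logits = [2.0, -1.5, 0.3]  # illustrative values only

for row in rank_from_logits(made_up_logits, docs, top_k=2):
    print(row["corpus_id"], round(row["score"], 3), row["text"])
```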

## Capabilities

The `jina-reranker-v2-base-multilingual` model handles inputs with a context length of up to 1024 tokens, and its `rerank()` function automatically chunks documents that exceed this length. It also uses flash attention to speed up inference on supported GPUs.

## What can I use it for?

You can use the `jina-reranker-v2-base-multilingual` model for a variety of text retrieval and ranking tasks, such as improving the search experience in your applications, enhancing the performance of your information retrieval systems, or integrating it into your AI-powered decision support systems. The model's multilingual capability makes it a suitable choice for global or diverse user bases.

## Things to try

To get started with the `jina-reranker-v2-base-multilingual` model, you can try using the Jina AI [Reranker API](https://jina.ai/reranker/). This provides a convenient way to leverage the model's capabilities without having to worry about the underlying implementation details. You can also explore integrating the model into your own applications or experimenting with fine-tuning the model on your specific data and use case.

  
  

![Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications.](https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face)

**The text embedding set trained by [**Jina AI**](https://jina.ai/).**

[](#quick-start)Quick Start
---------------------------

The easiest way to start using `jina-embeddings-v2-base-zh` is to use Jina AI's [Embedding API](https://jina.ai/embeddings/).

[](#intended-usage--model-info)Intended Usage & Model Info
----------------------------------------------------------

`jina-embeddings-v2-base-zh` is a Chinese/English bilingual text **embedding model** supporting **8192 sequence length**. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence length. We have designed it for high performance in mono-lingual & cross-lingual applications and trained it specifically to support mixed Chinese-English input without bias. Additionally, we provide the following embedding models:

*   [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
*   [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
*   [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): 161 million parameters Chinese-English Bilingual embeddings **(you are here)**.
*   [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): 161 million parameters German-English Bilingual embeddings.
*   `jina-embeddings-v2-base-es`: Spanish-English Bilingual embeddings (soon).
*   [`jina-embeddings-v2-base-code`](https://huggingface.co/jinaai/jina-embeddings-v2-base-code): 161 million parameters code embeddings.

[](#data--parameters)Data & Parameters
--------------------------------------

The data and training details are described in this [technical report](https://arxiv.org/abs/2402.17016).

[](#usage)Usage
---------------

**Please apply mean pooling when integrating the model.**

### [](#why-mean-pooling)Why mean pooling?

`mean pooling` takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has proven to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function that handles this for you.

However, if you would like to do it without using the default `encode` function:

    import torch
    import torch.nn.functional as F
    from transformers import AutoTokenizer, AutoModel
    
    def mean_pooling(model_output, attention_mask):
        token_embeddings = model_output[0]
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    
    sentences = ['How is the weather today?', '今天天气怎么样?']
    
    tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v2-base-zh')
    model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-zh', trust_remote_code=True)
    
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
    
    with torch.no_grad():
        model_output = model(**encoded_input)
    
    embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
    embeddings = F.normalize(embeddings, p=2, dim=1)

You can use Jina Embedding models directly from the `transformers` package.

    !pip install transformers
    from transformers import AutoModel
    from numpy.linalg import norm
    
    cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
    model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-zh', trust_remote_code=True) # trust_remote_code is needed to use the encode method
    embeddings = model.encode(['How is the weather today?', '今天天气怎么样?'])
    print(cos_sim(embeddings[0], embeddings[1]))
    

If you only want to handle shorter sequences, such as 2k tokens, pass the `max_length` parameter to the `encode` function:

    embeddings = model.encode(
        ['Very long ... document'],
        max_length=2048
    )
    

If you want to use the model together with the [sentence-transformers package](https://github.com/UKPLab/sentence-transformers/), make sure that you have installed the latest release and set `trust_remote_code=True` as well:

    !pip install -U sentence-transformers
    from sentence_transformers import SentenceTransformer
    from numpy.linalg import norm
    
    cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
    model = SentenceTransformer('jinaai/jina-embeddings-v2-base-zh', trust_remote_code=True)
    embeddings = model.encode(['How is the weather today?', '今天天气怎么样?'])
    print(cos_sim(embeddings[0], embeddings[1]))
    

As of its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged into Hugging Face as well):

    !pip install -U sentence-transformers
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.util import cos_sim
    
    model = SentenceTransformer(
        "jinaai/jina-embeddings-v2-base-zh", # switch to en/zh for English or Chinese
        trust_remote_code=True
    )
    
    # control your input sequence length up to 8192
    model.max_seq_length = 1024
    
    embeddings = model.encode([
        'How is the weather today?',
        '今天天气怎么样?'
    ])
    print(cos_sim(embeddings[0], embeddings[1]))
    

[](#alternatives-to-using-transformers-package)Alternatives to Using Transformers Package
-----------------------------------------------------------------------------------------

1.  _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
2.  _Private and high-performance deployment_: Get started by picking from our suite of models and deploy them on [AWS Sagemaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).

[](#use-jina-embeddings-for-rag)Use Jina Embeddings for RAG
-----------------------------------------------------------

According to the latest blog post from [LlamaIndex](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83),

> In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out.

![](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png)

[](#trouble-shooting)Trouble Shooting
-------------------------------------

**Loading of Model Code failed**

If you forgot to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or initializing the model via the `SentenceTransformer` class, you will receive an error that the model weights could not be initialized. This is caused by transformers falling back to creating a default BERT model instead of a jina-embedding model:

    Some weights of the model checkpoint at jinaai/jina-embeddings-v2-base-zh were not used when initializing BertModel: ['encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.8.mlp.layernorm.bias', ...
    

**User is not logged into Huggingface**

The model is only available under [gated access](https://huggingface.co/docs/hub/models-gated). This means you need to be logged into Hugging Face to load it. If you receive the following error, you need to provide an access token, either by using the huggingface-cli or by providing the token via an environment variable:

    OSError: jinaai/jina-embeddings-v2-base-zh is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
    If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
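
One way to supply the token from an environment variable is sketched below. Treat the details as assumptions rather than documented behaviour of this model card: `resolve_hf_token` is a hypothetical helper, `HF_TOKEN` is the variable name read by recent `huggingface_hub` releases, and `token=` is the keyword accepted by recent `transformers` releases (older ones used `use_auth_token=`, as the error message above mentions).

```python
import os
from typing import Optional


def resolve_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the access token stored in `env_var`, or None if unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


token = resolve_hf_token()
if token is None:
    print("No token found; run `huggingface-cli login` or set HF_TOKEN.")

# With a token available, it could be passed explicitly when loading the model:
#   from transformers import AutoModel
#   model = AutoModel.from_pretrained(
#       "jinaai/jina-embeddings-v2-base-zh",
#       trust_remote_code=True,
#       token=token,  # assumption: recent transformers release
#   )
```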
    

[](#contact)Contact
-------------------

Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.

[](#citation)Citation
---------------------

If you find Jina Embeddings useful in your research, please cite the following paper:

    @article{mohr2024multi,
      title={Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings},
      author={Mohr, Isabelle and Krimmel, Markus and Sturua, Saba and Akram, Mohammad Kalim and Koukounas, Andreas and G{\"u}nther, Michael and Mastrapas, Georgios and Ravishankar, Vinit and Mart{\'\i}nez, Joan Fontanals and Wang, Feng and others},
      journal={arXiv preprint arXiv:2402.17016},
      year={2024}
    }

## Model Overview

The `jina-embeddings-v2-base-zh` model is a Chinese/English bilingual text embedding model developed by [Jina AI](https://aimodels.fyi/creators/huggingFace/jinaai). It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths of up to 8192 tokens. Compared to other Jina embedding models, `jina-embeddings-v2-base-zh` is a 161 million parameter model trained specifically on mixed Chinese-English input to provide high performance in both mono-lingual and cross-lingual applications.

Similar Jina AI embedding models include [`jina-embeddings-v2-small-en`](https://aimodels.fyi/models/huggingFace/jina-embeddings-v2-small-en-jinaai), [`jina-embeddings-v2-base-en`](https://aimodels.fyi/models/huggingFace/jina-embeddings-v2-base-en-jinaai), [`jina-embeddings-v2-base-de`](https://aimodels.fyi/models/huggingFace/jina-embeddings-v2-base-de-jinaai), and an upcoming `jina-embeddings-v2-base-es` model for Spanish-English bilingual embeddings.

## Model Inputs and Outputs

### Inputs
- **Text sequence**: The model takes in text sequences of up to 8192 tokens, supporting both Chinese and English, as well as a mix of the two.

### Outputs
- **Text embeddings**: The model outputs 768-dimensional embedding vectors that capture the semantic meaning of the input text. These can be used for a variety of downstream tasks like information retrieval, text similarity, and multilingual applications.

## Capabilities

The `jina-embeddings-v2-base-zh` model has been designed to excel at both mono-lingual and cross-lingual tasks involving Chinese and English text. Its long sequence length support of up to 8192 tokens makes it useful for applications that need to process long-form content, such as document retrieval, semantic textual similarity, and text reranking.

## What Can I Use It For?

The `jina-embeddings-v2-base-zh` model can be used for a wide range of natural language processing tasks that require high-quality text embeddings, especially those involving a mix of Chinese and English text. Some potential use cases include:

- **Information Retrieval**: Use the embeddings for semantic search and retrieval of Chinese or English documents, or documents containing a mix of both languages.
- **Text Similarity**: Compute the similarity between Chinese, English, or bilingual text passages to detect paraphrases, identify related content, or perform clustering.
- **Multilingual Applications**: Leverage the model's cross-lingual capabilities to build applications that seamlessly handle Chinese and English input, such as chatbots or question-answering systems.
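
As an illustration of the information-retrieval use case, the sketch below ranks documents by cosine similarity to a query. It uses random vectors as stand-ins for real embeddings so that it runs without downloading the model; in practice the vectors would come from `model.encode(...)`. The helper name `top_k_by_cosine` is hypothetical.

```python
import numpy as np


def top_k_by_cosine(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 3):
    """Return (index, similarity) pairs for the k documents closest to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity of each document to the query
    order = np.argsort(-sims)[:k]     # indices of the k highest similarities
    return [(int(i), float(sims[i])) for i in order]


rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(5, 768))  # stand-ins for 5 document embeddings
# A query that is a slightly perturbed copy of document 2:
query_embedding = doc_embeddings[2] + 0.1 * rng.normal(size=768)

print(top_k_by_cosine(query_embedding, doc_embeddings, k=2))
```

Because the query and documents share one embedding space, the same ranking code applies whether the query is in Chinese and the documents in English or the other way round.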

## Things to Try

An interesting aspect of the `jina-embeddings-v2-base-zh` model is its ability to handle long input sequences of up to 8192 tokens. This makes it well-suited for tasks involving lengthy documents or multi-paragraph inputs. You could experiment with using the model for tasks like:

- Long-form text summarization, where the model's ability to capture semantic meaning in long passages could improve the quality of generated summaries.
- Cross-lingual document retrieval, where the model's bilingual capabilities and long sequence support could help surface relevant content even when the query and target documents are in different languages.
- Multilingual dialog systems, where the model's embeddings could be used to maintain context and coherence across language switches within a conversation.

By exploring the model's unique features, you can uncover novel applications that leverage its strengths in handling long, multilingual text inputs.

  
  

![Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications.](https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face)

**The text embedding set trained by [**Jina AI**](https://jina.ai/).**

[](#quick-start)Quick Start
---------------------------

The easiest way to start using `jina-embeddings-v2-small-en` is to use Jina AI's [Embedding API](https://jina.ai/embeddings/).

[](#intended-usage--model-info)Intended Usage & Model Info
----------------------------------------------------------

`jina-embeddings-v2-small-en` is an English, monolingual **embedding model** supporting **8192 sequence length**. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence length. The backbone `jina-bert-v2-small-en` is pretrained on the C4 dataset. The model is further trained on Jina AI's collection of more than 400 million sentence pairs and hard negatives. These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.

The embedding model was trained using 512 sequence length, but extrapolates to 8k sequence length (or even longer) thanks to ALiBi. This makes our model useful for a range of use cases, especially when processing long documents is needed, including long document retrieval, semantic textual similarity, text reranking, recommendation, RAG and LLM-based generative search, etc.
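
The reason ALiBi can extrapolate is that it replaces learned position embeddings with an attention bias that depends only on the distance between two tokens. The sketch below illustrates the symmetric (bidirectional) form of that bias. It is a simplified illustration, not JinaBERT's implementation: the function names are hypothetical, and the slope schedule is the geometric one from the ALiBi paper for a power-of-two number of heads.

```python
import numpy as np


def alibi_slopes(num_heads: int) -> np.ndarray:
    """Per-head slopes 2^(-8h/n) for h = 1..n (ALiBi paper, power-of-two n)."""
    return np.array([2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)])


def symmetric_alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    """Bias added to attention logits: -slope * |i - j| for every head."""
    positions = np.arange(seq_len)
    distance = np.abs(positions[:, None] - positions[None, :])  # |i - j|
    slopes = alibi_slopes(num_heads)
    return -slopes[:, None, None] * distance[None, :, :]  # (heads, seq, seq)


bias = symmetric_alibi_bias(seq_len=6, num_heads=8)
print(bias.shape)  # (8, 6, 6)
```

Nothing in this bias is tied to a maximum length, so the same function yields a valid bias for 512 or 8192 positions; distant tokens are simply penalised more.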

This model has 33 million parameters, which enables lightning-fast and memory-efficient inference while still delivering impressive performance. Additionally, we provide the following embedding models:

*   [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters **(you are here)**.
*   [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
*   [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): 161 million parameters Chinese-English Bilingual embeddings.
*   [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): 161 million parameters German-English Bilingual embeddings.
*   `jina-embeddings-v2-base-es`: Spanish-English Bilingual embeddings (soon).

[](#data--parameters)Data & Parameters
--------------------------------------

Jina Embeddings V2 [technical report](https://arxiv.org/abs/2310.19923)

[](#usage)Usage
---------------

**Please apply mean pooling when integrating the model.**

### [](#why-mean-pooling)Why mean pooling?

`mean pooling` takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has proven to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function that handles this for you.

However, if you would like to do it without using the default `encode` function:

    import torch
    import torch.nn.functional as F
    from transformers import AutoTokenizer, AutoModel
    
    def mean_pooling(model_output, attention_mask):
        token_embeddings = model_output[0]
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    
    sentences = ['How is the weather today?', 'What is the current weather like today?']
    
    tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v2-small-en')
    model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-small-en', trust_remote_code=True)
    
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
    
    with torch.no_grad():
        model_output = model(**encoded_input)
    
    embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
    embeddings = F.normalize(embeddings, p=2, dim=1)

You can use Jina Embedding models directly from the `transformers` package.

    !pip install transformers
    from transformers import AutoModel
    from numpy.linalg import norm
    
    cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
    model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-small-en', trust_remote_code=True) # trust_remote_code is needed to use the encode method
    embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
    print(cos_sim(embeddings[0], embeddings[1]))
    

If you only want to handle shorter sequences, such as 2k tokens, pass the `max_length` parameter to the `encode` function:

    embeddings = model.encode(
        ['Very long ... document'],
        max_length=2048
    )
    

As of its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged into Hugging Face as well):

    !pip install -U sentence-transformers
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.util import cos_sim
    
    model = SentenceTransformer(
        "jinaai/jina-embeddings-v2-small-en", # switch to en/zh for English or Chinese
        trust_remote_code=True
    )
    
    # control your input sequence length up to 8192
    model.max_seq_length = 1024
    
    embeddings = model.encode([
        'How is the weather today?',
        'What is the current weather like today?'
    ])
    print(cos_sim(embeddings[0], embeddings[1]))
    

[](#alternatives-to-using-transformers-package)Alternatives to Using Transformers Package
-----------------------------------------------------------------------------------------

1.  _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
2.  _Private and high-performance deployment_: Get started by picking from our suite of models and deploy them on [AWS Sagemaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).

[](#rag-performance)RAG Performance
-----------------------------------

According to the latest blog post from [LlamaIndex](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83),

> In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out.

![](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png)

[](#plans)Plans
---------------

1.  Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese.
2.  Multimodal embedding models to enable multimodal RAG applications.
3.  High-performance rerankers.

[](#trouble-shooting)Trouble Shooting
-------------------------------------

**Loading of Model Code failed**

If you forgot to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or initializing the model via the `SentenceTransformer` class, you will receive an error that the model weights could not be initialized. This is caused by transformers falling back to creating a default BERT model instead of a jina-embedding model:

    Some weights of the model checkpoint at jinaai/jina-embeddings-v2-base-en were not used when initializing BertModel: ['encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.8.mlp.layernorm.bias', ...
    

**User is not logged into Huggingface**

The model is only available under [gated access](https://huggingface.co/docs/hub/models-gated). This means you need to be logged into Hugging Face to load it. If you receive the following error, you need to provide an access token, either by using the huggingface-cli or by providing the token via an environment variable:

    OSError: jinaai/jina-embeddings-v2-base-en is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
    If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
    

[](#contact)Contact
-------------------

Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.

[](#citation)Citation
---------------------

If you find Jina Embeddings useful in your research, please cite the following paper:

    @misc{günther2023jina,
          title={Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents}, 
          author={Michael Günther and Jackmin Ong and Isabelle Mohr and Alaeddine Abdessalem and Tanguy Abel and Mohammad Kalim Akram and Susana Guzman and Georgios Mastrapas and Saba Sturua and Bo Wang and Maximilian Werk and Nan Wang and Han Xiao},
          year={2023},
          eprint={2310.19923},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }

## Model overview

`jina-embeddings-v2-small-en` is an English text embedding model trained by [Jina AI](https://aimodels.fyi/creators/huggingFace/jinaai). It is based on a BERT architecture called JinaBERT that supports longer sequence lengths of up to 8192 tokens using the ALiBi technique. The model was further trained on over 400 million sentence pairs and hard negatives from various domains. Compared to the larger `jina-embeddings-v2-base-en` model, this smaller 33 million parameter version enables fast and efficient inference while still delivering impressive performance.

## Model inputs and outputs

### Inputs
- **Text sequences**: The model can handle text inputs up to 8192 tokens in length.

### Outputs
- **Sentence embeddings**: The model outputs 512-dimensional dense vector representations that capture the semantic meaning of the input text.

## Capabilities

`jina-embeddings-v2-small-en` is a highly capable text encoding model that can be used for a variety of natural language processing tasks. Its ability to handle long input sequences makes it particularly useful for applications like long document retrieval, semantic textual similarity, text reranking, recommendation, and generative search.

## What can I use it for?

The `jina-embeddings-v2-small-en` model can be used for a wide range of applications, including:

- **Information Retrieval**: Encoding long documents or queries into semantic vectors for efficient similarity-based search and ranking.
- **Recommendation Systems**: Generating embeddings of items (e.g. articles, products) or user queries to enable content-based recommendation.
- **Text Classification**: Using the sentence embeddings as input features for downstream classification tasks.
- **Semantic Similarity**: Computing the semantic similarity between text pairs, such as for paraphrase detection or question answering.
- **Natural Language Generation**: Incorporating the model into RAG (Retrieval-Augmented Generation) or other LLM-based systems to improve the coherence and relevance of generated text.
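
Most of the applications above reduce to comparing embedding vectors. A minimal sketch of similarity-based ranking follows; the three-dimensional vectors are toy values standing in for real model output, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: Sequence[float],
                       doc_vecs: List[Sequence[float]]) -> List[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy embeddings (assumed values, not produced by the model).
toy_query = [1.0, 0.0, 1.0]
toy_docs = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
ranking = rank_by_similarity(toy_query, toy_docs)
print(ranking)
```

In practice the vectors would come from the model's `encode` output, and a vector database would replace the linear scan.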

## Things to try

A key advantage of the `jina-embeddings-v2-small-en` model is its ability to handle long input sequences. This makes it well-suited for tasks involving lengthy documents, such as legal contracts, research papers, or product manuals. You could explore using this model to build intelligent search or recommendation systems that can effectively process and understand these types of complex, information-rich text inputs.

Additionally, the model's strong performance on semantic similarity tasks suggests it could be useful for building chatbots or dialogue systems that need to understand the meaning behind user queries and provide relevant, context-aware responses.

  
  

![Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications.](https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face)

**Trained by [**Jina AI**](https://jina.ai/).**

[](#jina-colbert)Jina-ColBERT
=============================

**Jina-ColBERT is a ColBERT-style model based on JinaBERT, so it supports both _8k context length_ and _fast and accurate retrieval_.**

[JinaBERT](https://arxiv.org/abs/2310.19923) is a BERT architecture that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths. The Jina-ColBERT model is trained on the MSMARCO passage ranking dataset, following a training procedure very similar to that of ColBERTv2. The only difference is that we use `jina-bert-v2-base-en` as the backbone instead of `bert-base-uncased`.

For more information about ColBERT, please refer to the [ColBERTv1](https://arxiv.org/abs/2004.12832) and [ColBERTv2](https://arxiv.org/abs/2112.01488v3) papers, and [the original code](https://github.com/stanford-futuredata/ColBERT).

[](#usage)Usage
---------------

### [](#installation)Installation

To use this model, you will need to install the **latest version** of the ColBERT repository:

    pip install git+https://github.com/stanford-futuredata/ColBERT.git torch
    conda install -c conda-forge faiss-gpu  # use conda to install the latest version of faiss
    

### [](#indexing)Indexing

    from colbert import Indexer
    from colbert.infra import Run, RunConfig, ColBERTConfig
    
    n_gpu: int = 1  # Set your number of available GPUs
    experiment: str = ""  # Name of the folder where the logs and created indices will be stored
    index_name: str = ""  # The name of your index, i.e. the name of your vector database
    
    if __name__ == "__main__":
        with Run().context(RunConfig(nranks=n_gpu, experiment=experiment)):
            config = ColBERTConfig(
              doc_maxlen=8192  # Our model supports 8k context length for indexing long documents
            )
            indexer = Indexer(
              checkpoint="jinaai/jina-colbert-v1-en",
              config=config,
            )
            documents = [
              "ColBERT is an efficient and effective passage retrieval model.",
              "Jina-ColBERT is a ColBERT-style model but based on JinaBERT so it can support both 8k context length.",
              "JinaBERT is a BERT architecture that supports the symmetric bidirectional variant of ALiBi to allow longer sequence length.",
              "Jina-ColBERT model is trained on MSMARCO passage ranking dataset, following a very similar training procedure with ColBERTv2.",
              "Jina-ColBERT achieves the competitive retrieval performance with ColBERTv2.",
              "Jina is an easier way to build neural search systems.",
              "You can use Jina-ColBERT to build neural search systems with ease.",
              # Add more documents here to ensure the clustering works correctly
            ]
            indexer.index(name=index_name, collection=documents)
    

### [](#searching)Searching

    from colbert import Searcher
    from colbert.infra import Run, RunConfig, ColBERTConfig
    
    n_gpu: int = 0
    experiment: str = ""  # Name of the folder where the logs and created indices will be stored
    index_name: str = ""  # Name of your previously created index where the documents you want to search are stored.
    k: int = 10  # how many results you want to retrieve
    
    if __name__ == "__main__":
        with Run().context(RunConfig(nranks=n_gpu, experiment=experiment)):
            config = ColBERTConfig(
              query_maxlen=128  # Although the model supports 8k context length, we suggest not to use a very long query, as it may cause significant computational complexity and CUDA memory usage.
            )
            searcher = Searcher(
              index=index_name, 
              config=config
            )  # You don't need to specify the checkpoint again, the model name is stored in the index.
            query = "How to use ColBERT for indexing long documents?"
            results = searcher.search(query, k=k)
            # results: a tuple of three lists of length k: (passage_ids, passage_ranks, passage_scores)
    

### [](#creating-vectors)Creating Vectors

    from colbert.infra import ColBERTConfig
    from colbert.modeling.checkpoint import Checkpoint
    
    ckpt = Checkpoint("jinaai/jina-colbert-v1-en", colbert_config=ColBERTConfig(root="experiments"))
    query_vectors = ckpt.queryFromText(["What does ColBERT do?", "This is a search query?"], bsize=16)
    print(query_vectors)
    

A complete working Colab notebook is available [here](https://colab.research.google.com/drive/1-5WGEYPSBNBg-Z0bGFysyvckFuM8imrg).

### [](#reranking-using-colbert)Reranking Using ColBERT

    import numpy
    import torch
    from colbert.infra import ColBERTConfig
    from colbert.modeling.checkpoint import Checkpoint
    from colbert.modeling.colbert import colbert_score
    
    query = ["How to use ColBERT for indexing long documents?"]
    documents = [
        "ColBERT is an efficient and effective passage retrieval model.",
        "Jina-ColBERT is a ColBERT-style model but based on JinaBERT so it can support both 8k context length.",
        "JinaBERT is a BERT architecture that supports the symmetric bidirectional variant of ALiBi to allow longer sequence length.",
        "Jina-ColBERT model is trained on MSMARCO passage ranking dataset, following a very similar training procedure with ColBERTv2.",
    ]
    
    config = ColBERTConfig(query_maxlen=32, doc_maxlen=512)
    ckpt = Checkpoint("jinaai/jina-colbert-v1-en", colbert_config=config)
    Q = ckpt.queryFromText(query)  # token-level embeddings for the single query
    D = ckpt.docFromText(documents, bsize=32)[0]  # padded token-level document embeddings
    D_mask = torch.ones(D.shape[:2], dtype=torch.long)  # treat every document token as valid
    scores = colbert_score(Q, D, D_mask).flatten().cpu().numpy().tolist()
    ranking = numpy.argsort(scores)[::-1]  # document indices, best match first
    print(ranking)
    

[](#evaluation-results)Evaluation Results
-----------------------------------------

**TL;DR:** Jina-ColBERT achieves retrieval performance competitive with [ColBERTv2](https://huggingface.co/colbert-ir/colbertv2.0) on all benchmarks, and outperforms ColBERTv2 on datasets where documents have longer context lengths.

### [](#in-domain-benchmarks)In-domain benchmarks

We evaluate the in-domain performance on the dev subset of the MSMARCO passage ranking dataset. We follow the same evaluation settings as the ColBERTv2 paper and rerun the results of ColBERTv2 using the released checkpoint.

| Model | MRR@10 | Recall@50 | Recall@1k |
| --- | --- | --- | --- |
| ColBERTv2 | 39.7 | 86.8 | 97.6 |
| Jina-ColBERT-v1 | 39.0 | 85.6 | 96.2 |
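
For readers unfamiliar with these metrics, here is a minimal sketch of how MRR@10 and Recall@k are commonly computed for a single query. The function names and toy passage ids are hypothetical; this is not the evaluation code used for the numbers above, which are reported as percentages averaged over all queries.

```python
from typing import Iterable, List, Set


def reciprocal_rank_at_k(ranked_ids: List[int], relevant_ids: Set[int], k: int = 10) -> float:
    """1 / rank of the first relevant passage within the top k, or 0 if none appears."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: List[int], relevant_ids: Set[int], k: int) -> float:
    """Fraction of the relevant passages that appear within the top k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def mean(values: Iterable[float]) -> float:
    """Average a per-query metric over all queries."""
    values = list(values)
    return sum(values) / len(values)


# Toy example: two queries with their rankings and relevant passage ids.
rankings = [[5, 3, 9], [7, 1, 2]]
relevant = [{3}, {2, 8}]
mrr = mean(reciprocal_rank_at_k(r, rel) for r, rel in zip(rankings, relevant))
print(mrr)
```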

### [](#out-of-domain-benchmarks)Out-of-domain benchmarks

Following ColBERTv2, we evaluate the out-of-domain performance on 13 public BEIR datasets and use NDCG@10 as the main metric. We follow the same evaluation settings as the ColBERTv2 paper and rerun the results of ColBERTv2 using the released checkpoint.

Note that both ColBERTv2 and Jina-ColBERT-v1 use only the MSMARCO passage ranking dataset for training, so the results below reflect fully zero-shot performance.

| Dataset | ColBERTv2 | Jina-ColBERT-v1 |
| --- | --- | --- |
| ArguAna | 46.5 | 49.4 |
| ClimateFEVER | 18.1 | 19.6 |
| DBPedia | 45.2 | 41.3 |
| FEVER | 78.8 | 79.5 |
| FiQA | 35.4 | 36.8 |
| HotPotQA | 67.5 | 65.6 |
| NFCorpus | 33.7 | 33.8 |
| NQ | 56.1 | 54.9 |
| Quora | 85.5 | 82.3 |
| SCIDOCS | 15.4 | 16.9 |
| SciFact | 68.9 | 70.1 |
| TREC-COVID | 72.6 | 75.0 |
| Webis-touche2020 | 26.0 | 27.0 |
| Average | 50.0 | 50.2 |
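
NDCG@10 rewards rankings that place highly relevant documents near the top. The sketch below uses the linear-gain formulation with a log2 discount; which exact variant the evaluation tooling applies is an assumption here, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int = 10) -> float:
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example: the only relevant document is returned at rank 2.
print(ndcg_at_k(["b", "a"], {"a": 1.0}))
```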

### [](#long-context-datasets)Long context datasets

We also evaluate the zero-shot performance on datasets where documents have longer context length and compare with some long-context embedding models. Here we use the [LoCo benchmark](https://www.together.ai/blog/long-context-retrieval-models-with-monarch-mixer), which contains 5 datasets with long context length.

| Model | Used context length | Model max context length | Avg. NDCG@10 |
| --- | --- | --- | --- |
| ColBERTv2 | 512 | 512 | 74.3 |
| Jina-ColBERT-v1 (truncated) | 512\* | 8192 | 75.5 |
| Jina-ColBERT-v1 | 8192 | 8192 | 83.7 |
| Jina-embeddings-v2-base-en | 8192 | 8192 | **85.4** |

\* denotes that we truncate the context length to 512 for documents. The context length for queries is 512 in all cases.

**To summarize, Jina-ColBERT achieves retrieval performance comparable to ColBERTv2 on all benchmarks, and outperforms ColBERTv2 on datasets where documents have longer context lengths.**

### [](#reranking-performance)Reranking Performance

We evaluate the reranking performance of ColBERTv2 and Jina-ColBERT on BEIR. We use BM25 as the first-stage retrieval model. The full evaluation code can be found in [this repo](https://github.com/liuqi6777/eval_reranker).

In summary, Jina-ColBERT outperforms ColBERTv2 and even achieves performance comparable to some cross-encoders.

The best model, jina-reranker, will be open-sourced soon!

| Dataset | BM25 | ColBERTv2 | Jina-ColBERT | MiniLM-L-6-v2 | BGE-reranker-base-v1 | BGE-reranker-large-v1 | Jina-reranker-base-v1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Arguana | 29.99 | 33.42 | 33.95 | 30.67 | 23.26 | 25.42 | |
| Climate-Fever | 16.51 | 20.66 | 21.87 | 24.70 | 31.60 | 31.98 | |
| DBPedia | 31.80 | 42.16 | 41.43 | 43.90 | 41.56 | 43.79 | |
| FEVER | 65.13 | 81.07 | 83.49 | 80.77 | 87.07 | 89.11 | |
| FiQA | 23.61 | 35.60 | 36.68 | 34.87 | 33.17 | 37.70 | |
| HotpotQA | 63.30 | 68.84 | 68.62 | 72.65 | 79.04 | 79.98 | |
| NFCorpus | 33.75 | 36.69 | 36.38 | 36.48 | 32.71 | 36.57 | |
| NQ | 30.55 | 51.27 | 51.01 | 52.01 | 53.55 | 56.81 | |
| Quora | 78.86 | 85.18 | 82.75 | 82.45 | 78.44 | 81.06 | |
| SCIDOCS | 14.90 | 15.39 | 16.67 | 16.28 | 15.06 | 16.84 | |
| SciFact | 67.89 | 70.23 | 70.95 | 69.53 | 70.62 | 74.14 | |
| TREC-COVID | 59.47 | 75.00 | 76.89 | 74.45 | 67.46 | 74.32 | |
| Webis-touche2020 | 44.22 | 32.12 | 32.56 | 28.40 | 34.37 | 35.66 | |
| Average | 43.08 | 49.82 | 50.25 | 49.78 | 49.84 | 52.57 | |
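
The two-stage setup described in this section (a cheap first-stage retriever followed by a stronger reranker) can be sketched as follows. Both scoring functions are toy stand-ins: `term_overlap` plays the role of BM25 and `toy_reranker` the role of a model such as Jina-ColBERT. The `top_n` cutoff is an assumed parameter, not the value used in the evaluation.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def rerank(query: str, candidates: List[str],
           first_stage_score: Scorer, rerank_score: Scorer,
           top_n: int = 100) -> List[str]:
    """Keep the top_n candidates by the first-stage score, then reorder them with the reranker."""
    shortlist = sorted(candidates, key=lambda d: first_stage_score(query, d), reverse=True)[:top_n]
    return sorted(shortlist, key=lambda d: rerank_score(query, d), reverse=True)


def term_overlap(query: str, doc: str) -> float:
    """Toy first-stage scorer: number of shared lowercase terms."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_reranker(query: str, doc: str) -> float:
    """Toy second-stage scorer: term overlap normalized by document length."""
    return term_overlap(query, doc) / (1 + len(doc.split()))


docs = [
    "colbert indexes long documents",
    "a very long text about colbert and many other unrelated things",
    "cooking pasta at home",
]
result = rerank("colbert long documents", docs, term_overlap, toy_reranker, top_n=2)
print(result)
```

Only the shortlist is passed to the more expensive model, which keeps reranking cost independent of the collection size.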

[](#plans)Plans
---------------

We are planning to improve the performance of Jina-ColBERT by fine-tuning on more datasets in the future.

[](#other-models)Other Models
-----------------------------

Additionally, we provide the following embedding models, which you can also use for retrieval.

*   [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
*   [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): 161 million parameters Chinese-English bilingual model.
*   [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): 161 million parameters German-English bilingual model.
*   [`jina-embeddings-v2-base-es`](https://huggingface.co/jinaai/jina-embeddings-v2-base-es): 161 million parameters Spanish-English bilingual model.

[](#contact)Contact
-------------------

Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.

## Jina-ColBERT
Jina-ColBERT is a variant of the ColBERT retrieval model that is based on the [JinaBERT](https://arxiv.org/abs/2310.19923) architecture. Like the original ColBERT, Jina-ColBERT uses a late interaction approach to achieve fast and accurate retrieval. The key difference is that Jina-ColBERT supports a longer context length of up to 8,192 tokens, enabled by the JinaBERT backbone which incorporates the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409). 
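
To make the late interaction idea concrete, the sketch below scores a document by summing, over all query tokens, the maximum similarity to any document token (the MaxSim operator described in the ColBERT papers). The two-dimensional token vectors are toy values and the function name is hypothetical; a real model produces one vector per token.

```python
from typing import List, Sequence


def maxsim_score(query_vecs: List[Sequence[float]], doc_vecs: List[Sequence[float]]) -> float:
    """Late interaction: for each query token take its best-matching document token, then sum."""
    total = 0.0
    for q in query_vecs:
        best = max(sum(qi * di for qi, di in zip(q, d)) for d in doc_vecs)
        total += best
    return total


# Toy token embeddings (assumed values, not model output).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b_tokens = [[1.0, 0.0]]

print(maxsim_score(query_tokens, doc_a_tokens))
print(maxsim_score(query_tokens, doc_b_tokens))
```

Because document token vectors can be computed and indexed ahead of time, only the query needs to be encoded at search time.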

## Model inputs and outputs

### Inputs
- Text passages to be indexed and searched

### Outputs
- Ranked lists of the most relevant passages for a given query

## Capabilities
Jina-ColBERT is designed for efficient and effective passage retrieval, outperforming standard BERT-based models. Its ability to handle long documents up to 8,192 tokens makes it well-suited for tasks involving large amounts of text, such as document search and question-answering over long-form content.

## What can I use it for?
Jina-ColBERT can be used to power a wide range of search and retrieval applications, including enterprise search, academic literature search, and question-answering systems. Its performance characteristics make it particularly useful in scenarios where users need to search large document collections quickly and accurately.

## Things to try
One interesting aspect of Jina-ColBERT is its ability to leverage the [JinaBERT](https://arxiv.org/abs/2310.19923) architecture to support longer input sequences. Practitioners could experiment with using Jina-ColBERT to search through long-form content like books, legal documents, or research papers, and compare its performance to other retrieval models.

  
  

![Jina AI logo: Jina AI is your Portal to Multimodal AI](https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face)

**The text embedding set trained by [**Jina AI**](https://jina.ai/).**

[](#quick-start)Quick Start
---------------------------

The easiest way to start using `jina-embeddings-v2-base-de` is to use Jina AI's [Embedding API](https://jina.ai/embeddings/).

[](#intended-usage--model-info)Intended Usage & Model Info
----------------------------------------------------------

`jina-embeddings-v2-base-de` is a German/English bilingual text **embedding model** supporting **8192 sequence length**. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths. We designed it for high performance in monolingual and cross-lingual applications and trained it specifically to support mixed German-English input without bias. Additionally, we provide the following embedding models:

*   [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
*   [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
*   [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): 161 million parameters Chinese-English Bilingual embeddings.
*   [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): 161 million parameters German-English Bilingual embeddings **(you are here)**.
*   `jina-embeddings-v2-base-es`: Spanish-English Bilingual embeddings (soon).
*   [`jina-embeddings-v2-base-code`](https://huggingface.co/jinaai/jina-embeddings-v2-base-code): 161 million parameters code embeddings.

[](#data--parameters)Data & Parameters
--------------------------------------

The data and training details are described in this [technical report](https://arxiv.org/abs/2402.17016).

[](#usage)Usage
---------------

**Please apply mean pooling when integrating the model.**

### [](#why-mean-pooling)Why mean pooling?

`mean pooling` takes all token embeddings from the model output and averages them at the sentence/paragraph level. It has proven to be the most effective way to produce high-quality sentence embeddings. We offer an `encode` function to deal with this.

However, if you would like to do it without using the default `encode` function:

    import torch
    import torch.nn.functional as F
    from transformers import AutoTokenizer, AutoModel
    
    def mean_pooling(model_output, attention_mask):
        token_embeddings = model_output[0]
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    
    sentences = ['How is the weather today?', 'What is the current weather like today?']
    
    tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v2-base-de')
    model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-de', trust_remote_code=True)
    
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
    
    with torch.no_grad():
        model_output = model(**encoded_input)
    
    embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
    embeddings = F.normalize(embeddings, p=2, dim=1)

You can use Jina Embedding models directly from the `transformers` package.

    !pip install transformers
    from transformers import AutoModel
    from numpy.linalg import norm
    
    cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
    model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-de', trust_remote_code=True) # trust_remote_code is needed to use the encode method
    embeddings = model.encode(['How is the weather today?', 'Wie ist das Wetter heute?'])
    print(cos_sim(embeddings[0], embeddings[1]))
    

If you only want to handle shorter sequences, such as 2k, pass the `max_length` parameter to the `encode` function:

    embeddings = model.encode(
        ['Very long ... document'],
        max_length=2048
    )
    

As of its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged in to Hugging Face as well):

    !pip install -U sentence-transformers
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.util import cos_sim
    
    model = SentenceTransformer(
        "jinaai/jina-embeddings-v2-base-de", # switch to en/zh for English or Chinese
        trust_remote_code=True
    )
    
    # control your input sequence length up to 8192
    model.max_seq_length = 1024
    
    embeddings = model.encode([
        'How is the weather today?',
        'Wie ist das Wetter heute?'
    ])
    print(cos_sim(embeddings[0], embeddings[1]))
    

[](#alternatives-to-using-transformers-package)Alternatives to Using Transformers Package
-----------------------------------------------------------------------------------------

1.  _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
2.  _Private and high-performance deployment_: Get started by picking from our suite of models and deploy them on [AWS Sagemaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).

[](#benchmark-results)Benchmark Results
---------------------------------------

We evaluated our bilingual model on all German and English evaluation tasks available on the [MTEB benchmark](https://huggingface.co/blog/mteb). In addition, we evaluated the model against several other German, English, and multilingual models on additional German evaluation tasks:

![](/jinaai/jina-embeddings-v2-base-de/resolve/main/de_evaluation_results.png)

[](#use-jina-embeddings-for-rag)Use Jina Embeddings for RAG
-----------------------------------------------------------

According to the latest blog post from [LlamaIndex](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83),

> In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out.

![](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png)

[](#contact)Contact
-------------------

Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.

[](#citation)Citation
---------------------

If you find Jina Embeddings useful in your research, please cite the following paper:

    @article{mohr2024multi,
      title={Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings},
      author={Mohr, Isabelle and Krimmel, Markus and Sturua, Saba and Akram, Mohammad Kalim and Koukounas, Andreas and G{\"u}nther, Michael and Mastrapas, Georgios and Ravishankar, Vinit and Mart{\'\i}nez, Joan Fontanals and Wang, Feng and others},
      journal={arXiv preprint arXiv:2402.17016},
      year={2024}
    }

## Model overview

The `jina-embeddings-v2-base-de` is a German/English bilingual text embedding model developed by [Jina AI](https://aimodels.fyi/creators/huggingFace/jinaai). It supports input sequences up to 8192 tokens and is based on a BERT architecture (JinaBERT) that uses the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to handle longer sequences. Jina AI has also released several other embedding models, including [`jina-embeddings-v2-small-en`](https://aimodels.fyi/models/huggingFace/jina-embeddings-v2-small-en-jinaai), [`jina-embeddings-v2-base-en`](https://aimodels.fyi/models/huggingFace/jina-embeddings-v2-base-en-jinaai), [`jina-embeddings-v2-base-zh`](https://aimodels.fyi/models/huggingFace/jina-embeddings-v2-base-zh-jinaai), and [`jina-embeddings-v2-base-code`](https://huggingface.co/jinaai/jina-embeddings-v2-base-code).

## Model inputs and outputs

### Inputs
- Text sequences up to 8192 tokens in length, supporting mixed German-English input.

### Outputs
- A 768-dimensional embedding vector representing the semantic meaning of the input text.

## Capabilities

The `jina-embeddings-v2-base-de` model is designed for high performance in both monolingual and cross-lingual applications. It has been trained to handle mixed German-English input without bias, making it useful for applications involving multiple languages.

## What can I use it for?

The `jina-embeddings-v2-base-de` model can be used for a variety of NLP tasks, such as:

- Long document retrieval
- Semantic textual similarity
- Text re-ranking
- Recommendation systems
- RAG (Retrieval-Augmented Generation) and LLM-based generative search

According to a recent blog post from [LlamaIndex](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83), the combination of Jina AI's base embeddings with the CohereRerank/bge-reranker-large model stands out for achieving peak performance in both hit rate and MRR for RAG applications.

## Things to try

When using the `jina-embeddings-v2-base-de` model, it's important to apply mean pooling to the token embeddings to produce high-quality sentence-level embeddings. Jina AI provides an `encode` function to handle this automatically, but you can also implement mean pooling manually if needed.
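
To show what manual mean pooling does, here is a dependency-free sketch on toy values. It averages only the token vectors whose attention-mask entry is 1, so padding tokens do not distort the sentence embedding. The function name and numbers are hypothetical; the model's own `encode` function should be preferred in practice.

```python
from typing import List, Sequence


def masked_mean_pool(token_embeddings: List[Sequence[float]],
                     attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding (mask 0) is ignored."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    count = max(len(kept), 1)  # avoid division by zero for an all-padding input
    return [sum(vec[i] for vec in kept) / count for i in range(dim)]


# Two real tokens followed by one padding token with arbitrary values.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = masked_mean_pool(tokens, mask)
print(sentence_embedding)
```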