For more details, please refer to our GitHub repo: [https://github.com/FlagOpen/FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)

BGE-M3 ([paper](https://arxiv.org/pdf/2402.03216.pdf), [code](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3))
================================================================================================================================================================

In this project, we introduce BGE-M3, which is distinguished by its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.

*   Multi-Functionality: It can simultaneously perform the three common retrieval functionalities of embedding models: dense retrieval, multi-vector retrieval, and sparse retrieval.
*   Multi-Linguality: It can support more than 100 working languages.
*   Multi-Granularity: It is able to process inputs of different granularities, spanning from short sentences to long documents of up to 8192 tokens.

**Some suggestions for retrieval pipeline in RAG**

We recommend the following pipeline: hybrid retrieval + re-ranking.

*   Hybrid retrieval leverages the strengths of various methods, offering higher accuracy and stronger generalization capabilities. A classic example is combining embedding retrieval with the BM25 algorithm. BGE-M3 supports both embedding and sparse retrieval, so you can obtain token weights (similar to BM25) at no additional cost while generating dense embeddings. To use hybrid retrieval, you can refer to [Vespa](https://github.com/vespa-engine/pyvespa/blob/master/docs/sphinx/source/examples/mother-of-all-embedding-models-cloud.ipynb) and [Milvus](https://github.com/milvus-io/pymilvus/blob/master/examples/hello_hybrid_sparse_dense.py).
    
*   As cross-encoders, re-rankers generally achieve higher accuracy than bi-encoder embedding models. Applying a re-ranking model (e.g., [bge-reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker), [bge-reranker-v2](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_reranker)) after retrieval can further filter the retrieved text.
    
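The hybrid-retrieval-plus-re-ranking pipeline can be sketched in plain Python. This is a toy illustration, not the API of Vespa, Milvus, or FlagEmbedding: it assumes each candidate document already has a dense score and a sparse (lexical) score, fuses them with a weighted sum (the 0.6/0.4 weights and all helper names are hypothetical), keeps the top-k, and re-scores only that shortlist with a reranker, represented here by a word-overlap placeholder instead of a real cross-encoder.

```python
from typing import Callable, Dict, List, Tuple


def hybrid_top_k(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    k: int,
    w_dense: float = 0.6,  # illustrative weights, not values recommended for BGE-M3
    w_sparse: float = 0.4,
) -> List[str]:
    """Fuse dense and sparse scores with a weighted sum and return the k best doc ids."""
    doc_ids = set(dense) | set(sparse)
    # A document found by only one retriever gets 0 for the other mode.
    fused = {d: w_dense * dense.get(d, 0.0) + w_sparse * sparse.get(d, 0.0) for d in doc_ids}
    return sorted(fused, key=fused.get, reverse=True)[:k]


def rerank(
    query: str,
    candidates: List[str],
    texts: Dict[str, str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Re-score only the shortlisted candidates with a (more expensive) pairwise scorer."""
    scored = [(d, score_fn(query, texts[d])) for d in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def word_overlap(query: str, text: str) -> float:
    """Placeholder for a cross-encoder: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


texts = {"d1": "BM25 ranks documents", "d2": "pandas are bears", "d3": "BGE M3 embeds text"}
dense = {"d1": 0.35, "d2": 0.10, "d3": 0.63}
sparse = {"d1": 0.20, "d3": 0.18}

shortlist = hybrid_top_k(dense, sparse, k=2)
print(rerank("what is BGE M3", shortlist, texts, word_overlap))
```

Because the reranker only sees the shortlist, its higher per-pair cost is paid k times rather than once per document in the collection. Weighted-sum fusion assumes the two score scales are comparable; reciprocal rank fusion, which combines ranks instead of raw scores, is a common alternative when they are not.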

News
--------------

*   2024/3/20: **Thanks to the Milvus team!** You can now use bge-m3 hybrid retrieval in Milvus: [pymilvus/examples/hello\_hybrid\_sparse\_dense.py](https://github.com/milvus-io/pymilvus/blob/master/examples/hello_hybrid_sparse_dense.py).
*   2024/3/8: **Thanks for the [experimental results](https://towardsdatascience.com/openai-vs-open-source-multilingual-embedding-models-e5ccb7c90f05) from @[Yannael](https://huggingface.co/Yannael). In this benchmark, BGE-M3 achieves top performance in both English and other languages, surpassing models such as those from OpenAI.**
*   2024/3/2: Released the unified fine-tuning [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/unified_finetune) and [data](https://huggingface.co/datasets/Shitao/bge-m3-data).
*   2024/2/6: We released [MLDR](https://huggingface.co/datasets/Shitao/MLDR) (a long document retrieval dataset covering 13 languages) and its [evaluation pipeline](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR).
*   2024/2/1: **Thanks for the excellent tool from Vespa.** You can easily use multiple modes of BGE-M3 by following this [notebook](https://github.com/vespa-engine/pyvespa/blob/master/docs/sphinx/source/examples/mother-of-all-embedding-models-cloud.ipynb).

Specs
---------------

*   Model

| Model Name | Dimension | Sequence Length | Introduction |
| :--- | :---: | :---: | :--- |
| [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | 1024 | 8192 | multilingual; unified fine-tuning (dense, sparse, and colbert) from bge-m3-unsupervised |
| [BAAI/bge-m3-unsupervised](https://huggingface.co/BAAI/bge-m3-unsupervised) | 1024 | 8192 | multilingual; contrastive learning from bge-m3-retromae |
| [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae) | - | 8192 | multilingual; extends the max\_length of [xlm-roberta](https://huggingface.co/FacebookAI/xlm-roberta-large) to 8192, further pretrained via [retromae](https://github.com/staoxiao/RetroMAE) |
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 | English model |
| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 | English model |
| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 | English model |

*   Data

| Dataset | Introduction |
| :--- | :--- |
| [MLDR](https://huggingface.co/datasets/Shitao/MLDR) | Document retrieval dataset covering 13 languages |
| [bge-m3-data](https://huggingface.co/datasets/Shitao/bge-m3-data) | Fine-tuning data used by bge-m3 |

FAQ
-----------

**1\. Introduction for different retrieval methods**

*   Dense retrieval: map the text into a single embedding, e.g., [DPR](https://arxiv.org/abs/2004.04906), [BGE-v1.5](https://github.com/FlagOpen/FlagEmbedding)
*   Sparse retrieval (lexical matching): represent the text as a vector whose size equals the vocabulary, with most positions set to zero; weights are computed only for tokens present in the text, e.g., BM25, [unicoil](https://arxiv.org/pdf/2106.14807.pdf), and [splade](https://arxiv.org/abs/2107.05720).
*   Multi-vector retrieval: use multiple vectors to represent a text, e.g., [ColBERT](https://arxiv.org/abs/2004.12832).
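
The three families differ mainly in how a query-passage pair is scored. The toy sketch below uses tiny hand-made vectors and weights instead of model outputs, and its function names are illustrative rather than FlagEmbedding's API. The multi-vector score follows ColBERT-style late interaction, where each query vector keeps its best match among the passage vectors; averaging those maxima over the query vectors is an assumption made here for normalization.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def dense_score(q: Vector, p: Vector) -> float:
    """Dense retrieval: one vector per text, compared by inner product."""
    return dot(q, p)


def sparse_score(q: Dict[str, float], p: Dict[str, float]) -> float:
    """Lexical matching: only tokens that occur in both texts contribute."""
    return sum(w * p[tok] for tok, w in q.items() if tok in p)


def multi_vector_score(q_vecs: List[Vector], p_vecs: List[Vector]) -> float:
    """Late interaction: best passage match for each query vector, averaged."""
    return sum(max(dot(q, p) for p in p_vecs) for q in q_vecs) / len(q_vecs)


print(dense_score([1.0, 0.0], [0.6, 0.8]))
print(sparse_score({"BM": 0.25, "25": 0.5}, {"BM": 0.2, "25": 0.4, "bag": 0.1}))
print(multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.6, 0.8]]))
```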

**2\. How to use BGE-M3 in other projects?**

For embedding retrieval, you can employ the BGE-M3 model using the same approach as BGE. The only difference is that the BGE-M3 model no longer requires adding instructions to the queries.

For hybrid retrieval, you can use [Vespa](https://github.com/vespa-engine/pyvespa/blob/master/docs/sphinx/source/examples/mother-of-all-embedding-models-cloud.ipynb) and [Milvus](https://github.com/milvus-io/pymilvus/blob/master/examples/hello_hybrid_sparse_dense.py).

**3\. How to fine-tune the BGE-M3 model?**

You can follow the common fine-tuning procedure in this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) to fine-tune the dense embedding.

If you want to fine-tune all embedding functions of M3 (dense, sparse, and colbert), you can refer to the [unified\_fine-tuning example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/unified_finetune).
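
Both examples read training data from a JSON-lines file. The sketch below assumes one `{"query": str, "pos": List[str], "neg": List[str]}` object per line, the layout described in the FlagEmbedding fine-tuning example; check the linked README for the exact fields your version expects. The output file name and the sample texts are hypothetical.

```python
import json
from typing import List


def to_jsonl_line(query: str, positives: List[str], negatives: List[str]) -> str:
    """Serialize one training example as a single JSON line."""
    if not positives:
        raise ValueError("each query needs at least one positive passage")
    record = {"query": query, "pos": positives, "neg": negatives}
    # ensure_ascii=False keeps non-English text readable in the file.
    return json.dumps(record, ensure_ascii=False)


examples = [
    (
        "What is BGE M3?",
        ["BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction."],
        ["BM25 is a bag-of-words retrieval function."],
    ),
]

with open("toy_finetune_data.jsonl", "w", encoding="utf-8") as f:  # hypothetical file name
    for query, pos, neg in examples:
        f.write(to_jsonl_line(query, pos, neg) + "\n")
```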

Usage
---------------

Install:

    git clone https://github.com/FlagOpen/FlagEmbedding.git
    cd FlagEmbedding
    pip install -e .
    

or:

    pip install -U FlagEmbedding
    

### Generate Embedding for text

*   Dense Embedding

    from FlagEmbedding import BGEM3FlagModel
    
    model = BGEM3FlagModel('BAAI/bge-m3',  
                           use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
    
    sentences_1 = ["What is BGE M3?", "Defination of BM25"]
    sentences_2 = ["BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.", 
                   "BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document"]
    
    embeddings_1 = model.encode(sentences_1, 
                                batch_size=12, 
                                max_length=8192, # If you don't need such a long length, you can set a smaller value to speed up the encoding process.
                                )['dense_vecs']
    embeddings_2 = model.encode(sentences_2)['dense_vecs']
    similarity = embeddings_1 @ embeddings_2.T
    print(similarity)
    # [[0.6265, 0.3477], [0.3499, 0.678 ]]
    

You can also use sentence-transformers and Hugging Face transformers to generate dense embeddings. Refer to [baai\_general\_embedding](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding#usage) for details.
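
In the dense example above, `embeddings_1 @ embeddings_2.T` is a matrix of inner products. When the dense vectors are L2-normalized, each entry equals the cosine similarity of a sentence pair. The plain-Python sketch below checks that equivalence on made-up 3-dimensional vectors; that the model's dense outputs are already normalized is an assumption here, so verify it for your version before relying on inner products.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def matmul_transpose(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python equivalent of `a @ b.T`: entry (i, j) is the dot product of a[i] and b[j]."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two raw (unnormalized) vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


# Made-up raw vectors standing in for model outputs.
raw_1 = [[3.0, 4.0, 0.0], [1.0, 1.0, 1.0]]
raw_2 = [[0.0, 4.0, 3.0], [2.0, 0.0, 0.0]]

similarity = matmul_transpose([l2_normalize(v) for v in raw_1], [l2_normalize(v) for v in raw_2])
print(similarity)
```

Without normalization, the same product would favor vectors with larger norms, which is why normalization matters before using a plain inner-product index.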

*   Sparse Embedding (Lexical Weight)

    from FlagEmbedding import BGEM3FlagModel
    
    model = BGEM3FlagModel('BAAI/bge-m3',  use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
    
    sentences_1 = ["What is BGE M3?", "Defination of BM25"]
    sentences_2 = ["BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.", 
                   "BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document"]
    
    output_1 = model.encode(sentences_1, return_dense=True, return_sparse=True, return_colbert_vecs=False)
    output_2 = model.encode(sentences_2, return_dense=True, return_sparse=True, return_colbert_vecs=False)
    
    # you can see the weight for each token:
    print(model.convert_id_to_token(output_1['lexical_weights']))
    # [{'What': 0.08356, 'is': 0.0814, 'B': 0.1296, 'GE': 0.252, 'M': 0.1702, '3': 0.2695, '?': 0.04092}, 
    #  {'De': 0.05005, 'fin': 0.1368, 'ation': 0.04498, 'of': 0.0633, 'BM': 0.2515, '25': 0.3335}]
    
    
    # compute the scores via lexical matching
    lexical_scores = model.compute_lexical_matching_score(output_1['lexical_weights'][0], output_2['lexical_weights'][0])
    print(lexical_scores)
    # 0.19554901123046875
    
    print(model.compute_lexical_matching_score(output_1['lexical_weights'][0], output_1['lexical_weights'][1]))
    # 0.0
    
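`compute_lexical_matching_score` compares one pair at a time. To search a whole collection, the same per-token products can be accumulated through an inverted index, so that only documents sharing at least one token with the query are touched. A minimal sketch, assuming the pairwise score is the sum of weight products over shared tokens and that each text's lexical weights are a `{token: weight}` dict like the ones printed above; the class name and toy weights are hypothetical, and systems such as Milvus or Vespa provide their own sparse indexes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class LexicalIndex:
    """Toy inverted index: token -> list of (doc_id, weight) postings."""

    def __init__(self) -> None:
        self.postings: Dict[str, List[Tuple[str, float]]] = defaultdict(list)

    def add(self, doc_id: str, weights: Dict[str, float]) -> None:
        """Register one document's lexical weights."""
        for token, weight in weights.items():
            self.postings[token].append((doc_id, weight))

    def search(self, query_weights: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
        """Sum query_weight * doc_weight over shared tokens; return the k best docs."""
        scores: Dict[str, float] = defaultdict(float)
        for token, q_weight in query_weights.items():
            # .get() avoids creating empty posting lists for unseen tokens.
            for doc_id, d_weight in self.postings.get(token, []):
                scores[doc_id] += q_weight * d_weight
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


index = LexicalIndex()
index.add("bm25_doc", {"BM": 0.3, "25": 0.35, "bag": 0.1})
index.add("m3_doc", {"GE": 0.25, "M": 0.2, "3": 0.3, "embedding": 0.2})
print(index.search({"De": 0.05, "fin": 0.14, "BM": 0.25, "25": 0.33}))
```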

*   Multi-Vector (ColBERT)

    from FlagEmbedding import BGEM3FlagModel
    
    model = BGEM3FlagModel('BAAI/bge-m3',  use_fp16=True) 
    
    sentences_1 = ["What is BGE M3?", "Defination of BM25"]
    sentences_2 = ["BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.", 
                   "BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document"]
    
    output_1 = model.encode(sentences_1, return_dense=True, return_sparse=True, return_colbert_vecs=True)
    output_2 = model.encode(sentences_2, return_dense=True, return_sparse=True, return_colbert_vecs=True)
    
    print(model.colbert_score(output_1['colbert_vecs'][0], output_2['colbert_vecs'][0]))
    print(model.colbert_score(output_1['colbert_vecs'][0], output_2['colbert_vecs'][1]))
    # 0.7797
    # 0.4620
    

### Compute score for text pairs

Given a list of text pairs, `compute_score` returns the score of each pair under every retrieval mode, plus their weighted combinations.

    from FlagEmbedding import BGEM3FlagModel
    
    model = BGEM3FlagModel('BAAI/bge-m3',  use_fp16=True) 
    
    sentences_1 = ["What is BGE M3?", "Defination of BM25"]
    sentences_2 = ["BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.", 
                   "BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document"]
    
    sentence_pairs = [[i,j] for i in sentences_1 for j in sentences_2]
    
    print(model.compute_score(sentence_pairs, 
                              max_passage_length=128, # a smaller max length leads to a lower latency
                              weights_for_different_modes=[0.4, 0.2, 0.4])) # weights_for_different_modes(w) is used to do weighted sum: w[0]*dense_score + w[1]*sparse_score + w[2]*colbert_score
    
    # {
    #   'colbert': [0.7796499729156494, 0.4621465802192688, 0.4523794651031494, 0.7898575067520142], 
    #   'sparse': [0.195556640625, 0.00879669189453125, 0.0, 0.1802978515625], 
    #   'dense': [0.6259765625, 0.347412109375, 0.349853515625, 0.67822265625], 
    #   'sparse+dense': [0.482503205537796, 0.23454029858112335, 0.2332356721162796, 0.5122477412223816], 
    #   'colbert+sparse+dense': [0.6013619303703308, 0.3255828022956848, 0.32089319825172424, 0.6232916116714478]
    # }
    
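The combined entries in this output can be reproduced from the per-mode scores. Working backwards from the printed numbers, `colbert+sparse+dense` matches the weighted sum in the comment above (the weights 0.4, 0.2, 0.4 sum to 1), while `sparse+dense` matches `w[0]*dense + w[1]*sparse` divided by `w[0] + w[1]`. The sketch below is a reconstruction from these numbers, not FlagEmbedding's source; dividing the three-mode sum by the total weight is an assumption that has no effect when the weights sum to 1.

```python
from typing import Dict, List


def combine(scores: Dict[str, List[float]], w: List[float]) -> Dict[str, List[float]]:
    """Recombine per-mode scores; w = [dense, sparse, colbert] weights."""
    dense, sparse, colbert = scores["dense"], scores["sparse"], scores["colbert"]
    sparse_dense = [(w[0] * d + w[1] * s) / (w[0] + w[1]) for d, s in zip(dense, sparse)]
    all_three = [
        (w[0] * d + w[1] * s + w[2] * c) / (w[0] + w[1] + w[2])
        for d, s, c in zip(dense, sparse, colbert)
    ]
    return {"sparse+dense": sparse_dense, "colbert+sparse+dense": all_three}


# Per-mode scores copied from the printed output above.
per_mode = {
    "colbert": [0.7796499729156494, 0.4621465802192688, 0.4523794651031494, 0.7898575067520142],
    "sparse": [0.195556640625, 0.00879669189453125, 0.0, 0.1802978515625],
    "dense": [0.6259765625, 0.347412109375, 0.349853515625, 0.67822265625],
}
print(combine(per_mode, [0.4, 0.2, 0.4]))
```

The recomputed values agree with the printed ones to within about 1e-7.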

Evaluation
-------------------------

We provide evaluation scripts for [MKQA](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MKQA) and [MLDR](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR).

### Benchmarks from the open-source community

[![avatar](/BAAI/bge-m3/resolve/main/imgs/others.webp)](/BAAI/bge-m3/blob/main/imgs/others.webp)

The BGE-M3 model emerged as the top performer on this benchmark (OAI is short for OpenAI). For more details, please refer to the [article](https://towardsdatascience.com/openai-vs-open-source-multilingual-embedding-models-e5ccb7c90f05) and the [GitHub repo](https://github.com/Yannael/multilingual-embeddings).

### Our results

*   Multilingual (Miracl dataset)

[![avatar](/BAAI/bge-m3/resolve/main/imgs/miracl.jpg)](/BAAI/bge-m3/blob/main/imgs/miracl.jpg)

*   Cross-lingual (MKQA dataset)

[![avatar](/BAAI/bge-m3/resolve/main/imgs/mkqa.jpg)](/BAAI/bge-m3/blob/main/imgs/mkqa.jpg)

*   Long Document Retrieval
    
    *   MLDR:  
        [![avatar](/BAAI/bge-m3/resolve/main/imgs/long.jpg)](/BAAI/bge-m3/blob/main/imgs/long.jpg) Please note that [MLDR](https://huggingface.co/datasets/Shitao/MLDR) is a document retrieval dataset we constructed with an LLM, covering 13 languages and including training, validation, and test sets. We used the MLDR training set to enhance the model's long document retrieval capabilities, so comparing the baselines with `Dense w.o.long` (fine-tuning without the long document dataset) is fairer. Additionally, we have open-sourced this long document retrieval dataset to address the current lack of open-source multilingual long text retrieval datasets, and we believe it will help the open-source community train document retrieval models.
        
    *   NarrativeQA:  
        [![avatar](/BAAI/bge-m3/resolve/main/imgs/nqa.jpg)](/BAAI/bge-m3/blob/main/imgs/nqa.jpg)
        
*   Comparison with BM25
    

We used Pyserini to implement BM25, and the test results can be reproduced with this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline). We tested BM25 with two different tokenizers: the Lucene analyzer and the same tokenizer as M3 (i.e., the tokenizer of xlm-roberta). The results indicate that BM25 remains a competitive baseline, especially in long document retrieval.
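
For reference, BM25 scores a document by summing, over the query terms it contains, an IDF weight multiplied by a term frequency that saturates (controlled by `k1`) and is normalized by document length (controlled by `b`). The plain-Python sketch below illustrates the formula only; it is not Pyserini's implementation. It uses the Lucene-style IDF, and the defaults `k1=0.9`, `b=0.4` are assumed to match Pyserini's defaults. Whitespace splitting stands in for the Lucene analyzer or the xlm-roberta tokenizer, which is exactly the choice the comparison above varies.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 0.9, b: float = 0.4) -> List[float]:
    """Score every tokenized document in `docs` against a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency with document-length normalization.
            norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avg_len))
            score += idf * norm_tf
        scores.append(score)
    return scores


docs = [
    "bm25 is a bag of words retrieval function".split(),
    "bge m3 is an embedding model".split(),
    "the giant panda is a bear species".split(),
]
print(bm25_scores("what is bm25".split(), docs))
```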

[![avatar](/BAAI/bge-m3/resolve/main/imgs/bm25.jpg)](/BAAI/bge-m3/blob/main/imgs/bm25.jpg)

Training
---------------------

*   Self-knowledge Distillation: combines the outputs of the different retrieval modes into a reward signal that enhances the performance of each single mode (especially sparse retrieval and multi-vector (colbert) retrieval).
*   Efficient Batching: improves efficiency when fine-tuning on long text. The small-batch strategy is simple but effective, and it can also be used to fine-tune large embedding models.
*   MCLS: a simple method to improve performance on long text without fine-tuning. It is useful if you do not have enough resources to fine-tune the model on long text.
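
The intuition behind efficient batching can be illustrated by grouping examples of similar length, so that each batch is padded only to its own longest example instead of mixing short and very long texts. The sketch below is a simplified illustration under that assumption, not the exact procedure from the report; the helper names and token counts are made up.

```python
import random
from typing import List


def length_grouped_batches(lengths: List[int], batch_size: int, seed: int = 0) -> List[List[int]]:
    """Return batches of example indices in which lengths are similar."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    random.Random(seed).shuffle(batches)  # shuffle the batch order, not batch contents
    return batches


def padded_tokens(lengths: List[int], batches: List[List[int]]) -> int:
    """Total tokens processed when each batch is padded to its own longest example."""
    return sum(max(lengths[i] for i in batch) * len(batch) for batch in batches)


lengths = [8000, 120, 7800, 90, 150, 8100, 60, 7900]
naive = [list(range(i, i + 2)) for i in range(0, len(lengths), 2)]  # batches in arrival order
grouped = length_grouped_batches(lengths, batch_size=2)
print(padded_tokens(lengths, naive), padded_tokens(lengths, grouped))
```

On these made-up lengths, grouping cuts the padded token count from 63,600 to 32,480.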

Refer to our [report](https://arxiv.org/pdf/2402.03216.pdf) for more details.

Acknowledgement
-----------------------------------

Thanks to the authors of open-source datasets, including Miracl, MKQA, NarrativeQA, etc. Thanks also to open-source libraries such as [Tevatron](https://github.com/texttron/tevatron) and [Pyserini](https://github.com/castorini/pyserini).

Citation
---------------------

If you find this repository useful, please consider giving it a star :star: and citing our work:

    @misc{bge-m3,
          title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation}, 
          author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
          year={2024},
          eprint={2402.03216},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }

## Model overview

`bge-m3` is a versatile AI model developed by BAAI (Beijing Academy of Artificial Intelligence) that is distinguished by its multi-functionality, multi-linguality, and multi-granularity capabilities. It can simultaneously perform the three common retrieval functionalities of embedding models: dense retrieval, multi-vector retrieval, and sparse retrieval. The model supports more than 100 working languages and can process inputs of different granularities, spanning from short sentences to long documents of up to 8192 tokens.

Compared to similar models like [m3e-large](https://aimodels.fyi/models/huggingFace/m3e-large-moka-ai), `bge-m3` offers a unique combination of retrieval functionalities in a single model. Other related models like [bge_1-5_query_embeddings](https://aimodels.fyi/models/huggingFace/bge1-5queryembeddings-center-for-curriculum-redesign), [bge-large-en-v1.5](https://aimodels.fyi/models/huggingFace/bge-large-en-v15-nateraw), [bge-reranker-base](https://aimodels.fyi/models/huggingFace/bge-reranker-base-ninehills), and [bge-reranker-v2-m3](https://aimodels.fyi/models/huggingFace/bge-reranker-v2-m3-yxzwayne) provide specific functionalities like query embedding generation, text embedding, and re-ranking.

## Model inputs and outputs

### Inputs
- Text sequences of varying length, up to 8192 tokens

### Outputs
- Dense embeddings for retrieval
- Sparse token-level representations for retrieval
- Multi-vector representations for retrieval

## Capabilities

`bge-m3` can effectively handle a range of retrieval tasks: dense retrieval, multi-vector retrieval, and sparse retrieval. The model's multi-functionality allows it to combine the strengths of different retrieval methods, resulting in higher accuracy and stronger generalization. For example, because the model produces BM25-like token weights together with its dense embeddings, it can serve a hybrid retrieval pipeline that combines embedding-based and lexical retrieval without any additional cost.

## What can I use it for?

`bge-m3` can be leveraged in various applications that require effective text retrieval, such as chatbots, search engines, question-answering systems, and content recommendation engines. By taking advantage of the model's multi-functionality, users can build robust and versatile retrieval pipelines that cater to their specific needs.

## Things to try

One interesting aspect of `bge-m3` is its ability to process inputs of different granularities, from short sentences to long documents. This feature can be particularly useful in applications that involve working with a diverse range of text sources, such as social media posts, news articles, or research papers. Experiment with inputting text of varying lengths and observe how the model performs across these different scenarios.

Additionally, the model's support for over 100 languages makes it a valuable tool for building multilingual systems. Consider exploring the model's performance on non-English text and how it compares to language-specific models or other multilingual alternatives.

## Model overview

The `bge-large-zh` model is a state-of-the-art text embedding model developed by the Beijing Academy of Artificial Intelligence (BAAI). It is part of the BAAI General Embedding (BGE) family of models, which have achieved top performance on both the MTEB and C-MTEB benchmarks. The `bge-large-zh` model is specifically designed for Chinese text processing, and it can map any Chinese text into a low-dimensional dense vector that can be used for tasks like retrieval, classification, clustering, or semantic search.

Compared to similar models like [BAAI/bge-large-en](https://aimodels.fyi/models/huggingFace/bge-large-en-baai) and [BAAI/bge-small-en](https://aimodels.fyi/models/huggingFace/bge-small-en-baai), the `bge-large-zh` model has been optimized for Chinese text and has demonstrated state-of-the-art performance on Chinese benchmarks. The [BAAI/llm-embedder](https://aimodels.fyi/models/huggingFace/llm-embedder-baai) model is a more recent addition to the BAAI family, serving as a unified embedding model to support diverse retrieval augmentation needs for large language models (LLMs).

## Model inputs and outputs

### Inputs
- **Text**: The `bge-large-zh` model can take any Chinese text as input, ranging from short queries to long passages.
- **Instruction (optional)**: For retrieval tasks that use short queries to find long related documents, it is recommended to add an instruction to the query to help the model better understand the intent. The instruction should be placed at the beginning of the query text. No instruction is needed for the passage/document text.
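
A minimal sketch of where the instruction goes: it is prepended to each query and never to passages. The instruction string below is assumed to be the Chinese retrieval instruction from the BGE model cards (English gloss in the comment); copy the exact string from the official model card rather than relying on this one. The helper name and sample texts are hypothetical.

```python
from typing import List, Tuple

# Assumed query instruction for bge-large-zh; verify against the official model card.
# English gloss: "Generate a representation for this sentence for retrieving related articles:"
QUERY_INSTRUCTION = "为这个句子生成表示以用于检索相关文章："


def prepare_inputs(queries: List[str], passages: List[str]) -> Tuple[List[str], List[str]]:
    """Prefix each query with the instruction; leave passages untouched."""
    return [QUERY_INSTRUCTION + q for q in queries], list(passages)


queries, passages = prepare_inputs(
    ["熊猫是什么？"],  # "What is a panda?"
    ["大熊猫是中国特有的熊科动物。"],  # "The giant panda is a bear species endemic to China."
)
print(queries[0])
```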

### Outputs
- **Embeddings**: The primary output of the `bge-large-zh` model is a dense vector embedding of the input text. These embeddings can be used for a variety of downstream tasks, such as:
  - Retrieval: The embeddings can be used to find related passages or documents by computing the similarity between the query embedding and the passage/document embeddings.
  - Classification: The embeddings can be used as features for training classification models.
  - Clustering: The embeddings can be used to group similar text together.
  - Semantic search: The embeddings can be used to find semantically related text.

## Capabilities

The `bge-large-zh` model demonstrates state-of-the-art performance on a range of Chinese text processing tasks. On the Chinese Massive Text Embedding Benchmark (C-MTEB), the `bge-large-zh-v1.5` model ranked first overall, showing strong results across tasks like retrieval, semantic similarity, and classification.

The `bge-large-zh` model accepts inputs of up to 512 tokens, which covers queries and typical passages. Longer documents, such as research papers or legal filings, need to be split into chunks before encoding, or handled by a long-context model such as `bge-m3`, which supports up to 8192 tokens.
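
Because the model accepts at most 512 tokens, longer text has to be split before encoding. A minimal sketch of overlapping (sliding-window) chunking over already-tokenized ids; the 510-token window (leaving room for the [CLS] and [SEP] tokens a BERT-style tokenizer adds), the 384-token stride, and the helper name are illustrative assumptions. One common way to score a long document is to take the maximum similarity over its chunks.

```python
from typing import List


def sliding_windows(token_ids: List[int], max_len: int = 510, stride: int = 384) -> List[List[int]]:
    """Split a token sequence into overlapping windows of at most max_len tokens."""
    if stride <= 0 or stride > max_len:
        raise ValueError("stride must be in (0, max_len]")
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # the current window already reaches the end
        start += stride
    return windows


chunks = sliding_windows(list(range(1000)))
print([(c[0], c[-1]) for c in chunks])
```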

## What can I use it for?

The `bge-large-zh` model can be used for a variety of Chinese text processing tasks, including:

- **Retrieval**: Use the model to find relevant passages or documents given a query. This can be helpful for building search engines, Q&A systems, or knowledge management tools.
- **Classification**: Use the model's embeddings as features to train classification models for tasks like sentiment analysis, topic classification, or intent detection.
- **Clustering**: Group similar Chinese text together using the model's embeddings, which can be useful for organizing large collections of documents or categorizing user-generated content.
- **Semantic search**: Find semantically related text by computing the similarity between the model's embeddings, enabling more advanced search experiences.

## Things to try

One interesting aspect of the `bge-large-zh` model is its ability to handle queries with or without instruction. While adding an instruction to the query can improve retrieval performance, the model's v1.5 version has been enhanced to perform well even without the instruction. This makes it more convenient to use in certain applications, as you don't need to worry about crafting the perfect query instruction.

Another thing to try is fine-tuning the `bge-large-zh` model on your own data. The provided [examples](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) show how you can prepare data and fine-tune the model to improve its performance on your specific use case. This can be particularly helpful if you have domain-specific text that the pre-trained model doesn't handle as well.

## Model overview

The `bge-large-en` model is a text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence). It is part of the BAAI General Embedding (BGE) family of models, which can map text to low-dimensional dense vectors for tasks like retrieval, classification, and semantic search. The maintainers recommend using the newer [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) model, which has a more reasonable similarity distribution and the same usage method.

## Model inputs and outputs

### Inputs
- Text sequences of up to 512 tokens

### Outputs
- 1024-dimensional dense vector embeddings

## Capabilities

The `bge-large-en` model can generate high-quality text embeddings that capture semantic meaning. These embeddings can be used for a variety of downstream tasks, such as:

- **Retrieval**: Finding relevant documents or passages given a query
- **Classification**: Classifying text into predefined categories
- **Clustering**: Grouping similar text documents together
- **Semantic search**: Searching for relevant content based on meaning, not just keywords

## What can I use it for?

The `bge-large-en` embeddings can be leveraged in various applications that require understanding the semantic meaning of text. For example, you could use them to build a powerful search engine that returns relevant results based on the query's intent, rather than just matching keywords.

Another potential use case is intelligent document retrieval and recommendation, where the model can surface the most relevant information to users based on their needs. This could be especially useful in enterprise settings or academic research, where users need to quickly find relevant information among large document collections.

## Things to try

One interesting experiment would be to fine-tune the `bge-large-en` model on a specific domain or task, such as legal document retrieval or scientific paper recommendation. This could help the model better capture the nuances and specialized vocabulary of your particular use case.

You could also explore using the `bge-large-en` embeddings in combination with other techniques, such as sparse lexical matching or multi-vector retrieval, to create a hybrid search system that leverages the strengths of different approaches.

Reranker
=====================

**For more details, please refer to our GitHub: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/tree/master).**

*   [Model List](#model-list)
*   [Usage](#usage)
*   [Fine-tuning](#fine-tune)
*   [Evaluation](#evaluation)
*   [Citation](#citation)

Different from an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. Feeding a query and a passage to the reranker yields a relevance score, which can be mapped to a float value in \[0,1\] by a sigmoid function.
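
The sigmoid mapping is `1 / (1 + exp(-score))`. It is monotonic, so it changes the scale of the scores but never their ranking. As a quick check, applying it to the raw scores printed in the FlagReranker usage example reproduces the normalized scores shown there (for example, -5.65234375 maps to about 0.0035). The helper below is a plain-Python sketch, not FlagEmbedding's code.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw reranker score (logit) into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Raw scores taken from the FlagReranker usage example.
for raw in (-5.65234375, -8.1875, 5.26171875):
    print(raw, sigmoid(raw))
```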

Model List
-------------------------

| Model | Base model | Language | Layerwise | Feature |
| :--- | :--- | :--- | :---: | :--- |
| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) | Chinese and English | - | Lightweight reranker model, easy to deploy, with fast inference. |
| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) | Chinese and English | - | Lightweight reranker model, easy to deploy, with fast inference. |
| [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | [bge-m3](https://huggingface.co/BAAI/bge-m3) | Multilingual | - | Lightweight reranker model with strong multilingual capabilities, easy to deploy, with fast inference. |
| [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma) | [gemma-2b](https://huggingface.co/google/gemma-2b) | Multilingual | - | Suitable for multilingual contexts; performs well in both English and multilingual settings. |
| [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise) | [MiniCPM-2B-dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16) | Multilingual | 8-40 | Suitable for multilingual contexts; performs well in both English and Chinese; lets you select which layers produce the output, which speeds up inference. |

You can select the model according to your scenario and resources.

*   For **multilingual** use, utilize [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) and [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma).
    
*   For **Chinese or English**, utilize [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) and [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise).
    
*   For **efficiency**, utilize [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) and the low layers of [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise).
    
*   For **better performance**, we recommend [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise) and [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma).
    

Usage
---------------

### Using FlagEmbedding

    pip install -U FlagEmbedding
    

#### For normal reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3)

Get relevance scores (higher scores indicate more relevance):

    from FlagEmbedding import FlagReranker
    reranker = FlagReranker('BAAI/bge-reranker-v2-m3', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
    
    score = reranker.compute_score(['query', 'passage'])
    print(score) # -5.65234375
    
    # You can map the scores into 0-1 by setting "normalize=True", which applies the sigmoid function to the score
    score = reranker.compute_score(['query', 'passage'], normalize=True)
    print(score) # 0.003497010252573502
    
    scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
    print(scores) # [-8.1875, 5.26171875]
    
    # You can map the scores into 0-1 by setting "normalize=True", which applies the sigmoid function to the score
    scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], normalize=True)
    print(scores) # [0.00027803096387751553, 0.9948403768236574]
    

#### For LLM-based reranker

    from FlagEmbedding import FlagLLMReranker
    reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
    # reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', use_bf16=True) # You can also set use_bf16=True to speed up computation with a slight performance degradation
    
    score = reranker.compute_score(['query', 'passage'])
    print(score)
    
    scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
    print(scores)
    

#### For LLM-based layerwise reranker

    from FlagEmbedding import LayerWiseFlagLLMReranker
    reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
    # reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_bf16=True) # You can also set use_bf16=True to speed up computation with a slight performance degradation
    
    score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28]) # Adjust 'cutoff_layers' to choose which layers are used to compute the score.
    print(score)
    
    scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], cutoff_layers=[28])
    print(scores)
    

### [](#using-huggingface-transformers)Using Huggingface transformers

#### For normal reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3)

Get relevance scores (higher scores indicate more relevance):

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-m3')
    model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-v2-m3')
    model.eval()
    
    pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
    with torch.no_grad():
        inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
        scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
        print(scores)
    

#### For LLM-based reranker

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    def get_inputs(pairs, tokenizer, prompt=None, max_length=1024):
        if prompt is None:
            prompt = "Given a query A and a passage B, determine whether the passage contains an answer to the query by providing a prediction of either 'Yes' or 'No'."
        sep = "\n"
        prompt_inputs = tokenizer(prompt,
                                  return_tensors=None,
                                  add_special_tokens=False)['input_ids']
        sep_inputs = tokenizer(sep,
                               return_tensors=None,
                               add_special_tokens=False)['input_ids']
        inputs = []
        for query, passage in pairs:
            query_inputs = tokenizer(f'A: {query}',
                                     return_tensors=None,
                                     add_special_tokens=False,
                                     max_length=max_length * 3 // 4,
                                     truncation=True)
            passage_inputs = tokenizer(f'B: {passage}',
                                       return_tensors=None,
                                       add_special_tokens=False,
                                       max_length=max_length,
                                       truncation=True)
            item = tokenizer.prepare_for_model(
                [tokenizer.bos_token_id] + query_inputs['input_ids'],
                sep_inputs + passage_inputs['input_ids'],
                truncation='only_second',
                max_length=max_length,
                padding=False,
                return_attention_mask=False,
                return_token_type_ids=False,
                add_special_tokens=False
            )
            item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs
            item['attention_mask'] = [1] * len(item['input_ids'])
            inputs.append(item)
        return tokenizer.pad(
                inputs,
                padding=True,
                max_length=max_length + len(sep_inputs) + len(prompt_inputs),
                pad_to_multiple_of=8,
                return_tensors='pt',
        )
    
    tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-gemma')
    model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2-gemma')
    yes_loc = tokenizer('Yes', add_special_tokens=False)['input_ids'][0]
    model.eval()
    
    pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
    with torch.no_grad():
        inputs = get_inputs(pairs, tokenizer)
        scores = model(**inputs, return_dict=True).logits[:, -1, yes_loc].view(-1, ).float()
        print(scores)
    
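To make the input layout of `get_inputs` easier to inspect, the sketch below rebuilds the same sequence at the string level: the BOS token, `A: {query}`, a newline, `B: {passage}`, a newline, then the instruction prompt. It is only an approximation for reading purposes: the real function works on token IDs and truncates the query and passage to fit `max_length`, and the model score is the logit of the `Yes` token at the last position. The `<bos>` placeholder string and the helper name `build_prompt_text` are hypothetical.

```python
# String-level approximation of the sequence that get_inputs assembles from token IDs.
DEFAULT_PROMPT = ("Given a query A and a passage B, determine whether the passage contains "
                  "an answer to the query by providing a prediction of either 'Yes' or 'No'.")


def build_prompt_text(query: str, passage: str, prompt: str = DEFAULT_PROMPT,
                      bos: str = "<bos>", sep: str = "\n") -> str:
    """Hypothetical helper: BOS + 'A: query' + sep + 'B: passage' + sep + prompt."""
    return f"{bos}A: {query}{sep}B: {passage}{sep}{prompt}"


text = build_prompt_text("what is panda?", "hi")
print(text)
```

Tokenizing the pieces separately (as `get_inputs` does) is not guaranteed to give exactly the same IDs as tokenizing this joined string, which is why the real code concatenates token IDs instead.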

#### For LLM-based layerwise reranker

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    def get_inputs(pairs, tokenizer, prompt=None, max_length=1024):
        if prompt is None:
            prompt = "Given a query A and a passage B, determine whether the passage contains an answer to the query by providing a prediction of either 'Yes' or 'No'."
        sep = "\n"
        prompt_inputs = tokenizer(prompt,
                                  return_tensors=None,
                                  add_special_tokens=False)['input_ids']
        sep_inputs = tokenizer(sep,
                               return_tensors=None,
                               add_special_tokens=False)['input_ids']
        inputs = []
        for query, passage in pairs:
            query_inputs = tokenizer(f'A: {query}',
                                     return_tensors=None,
                                     add_special_tokens=False,
                                     max_length=max_length * 3 // 4,
                                     truncation=True)
            passage_inputs = tokenizer(f'B: {passage}',
                                       return_tensors=None,
                                       add_special_tokens=False,
                                       max_length=max_length,
                                       truncation=True)
            item = tokenizer.prepare_for_model(
                [tokenizer.bos_token_id] + query_inputs['input_ids'],
                sep_inputs + passage_inputs['input_ids'],
                truncation='only_second',
                max_length=max_length,
                padding=False,
                return_attention_mask=False,
                return_token_type_ids=False,
                add_special_tokens=False
            )
            item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs
            item['attention_mask'] = [1] * len(item['input_ids'])
            inputs.append(item)
        return tokenizer.pad(
                inputs,
                padding=True,
                max_length=max_length + len(sep_inputs) + len(prompt_inputs),
                pad_to_multiple_of=8,
                return_tensors='pt',
        )
    
    tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-minicpm-layerwise', trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2-minicpm-layerwise', trust_remote_code=True, torch_dtype=torch.bfloat16)
    model = model.to('cuda')
    model.eval()
    
    pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
    with torch.no_grad():
        inputs = get_inputs(pairs, tokenizer).to(model.device)
        all_scores = model(**inputs, return_dict=True, cutoff_layers=[28])
        all_scores = [scores[:, -1].view(-1, ).float() for scores in all_scores[0]]
        print(all_scores)
    

Fine-tune
-----------------------

### Data Format

Training data should be a JSON Lines (`.jsonl`) file, where each line is a JSON object like this:

    {"query": str, "pos": List[str], "neg": List[str], "prompt": str}
    

`query` is the query, `pos` is a list of positive texts, `neg` is a list of negative texts, and `prompt` describes the relationship between the query and the texts. If you have no negative texts for a query, you can randomly sample some texts from the entire corpus as negatives.

See [toy\_finetune\_data.jsonl](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_reranker/toy_finetune_data.jsonl) for a toy data file.
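A minimal sketch of writing records in this format with the Python standard library, following the suggestion above to sample negatives at random from the corpus. The corpus, the number of negatives (`num_negs`), the output file name, and the helper names `build_record` and `write_jsonl` are hypothetical; the prompt string simply reuses the default prompt from the LLM-based reranker inference example, and whether it matches your training prompt is an assumption.

```python
import json
import os
import random
import tempfile


def build_record(query, positives, corpus, prompt, num_negs=3, seed=0):
    """Hypothetical helper: build one training record with randomly sampled negatives."""
    rng = random.Random(seed)
    # Exclude known positives so they are never sampled as negatives.
    candidates = [doc for doc in corpus if doc not in positives]
    negs = rng.sample(candidates, min(num_negs, len(candidates)))
    return {"query": query, "pos": list(positives), "neg": negs, "prompt": prompt}


def write_jsonl(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


corpus = [
    "The giant panda is a bear species endemic to China.",
    "Paris is the capital of France.",
    "Python is a programming language.",
    "The Pacific is the largest ocean on Earth.",
    "Mount Everest is the highest mountain above sea level.",
]
prompt = ("Given a query A and a passage B, determine whether the passage contains "
          "an answer to the query by providing a prediction of either 'Yes' or 'No'.")

record = build_record("what is panda?", [corpus[0]], corpus, prompt, num_negs=3)
path = os.path.join(tempfile.gettempdir(), "my_finetune_data.jsonl")
write_jsonl([record], path)
```

Randomly sampled negatives can occasionally be relevant to the query (false negatives), so spot-checking a few records is worthwhile.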

### Train

You can fine-tune the reranker with the following commands:

**For LLM-based reranker**

    torchrun --nproc_per_node {number of gpus} \
    -m FlagEmbedding.llm_reranker.finetune_for_instruction.run \
    --output_dir {path to save model} \
    --model_name_or_path google/gemma-2b \
    --train_data ./toy_finetune_data.jsonl \
    --learning_rate 2e-4 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --dataloader_drop_last True \
    --query_max_len 512 \
    --passage_max_len 512 \
    --train_group_size 16 \
    --logging_steps 1 \
    --save_steps 2000 \
    --save_total_limit 50 \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing \
    --deepspeed stage1.json \
    --warmup_ratio 0.1 \
    --bf16 \
    --use_lora True \
    --lora_rank 32 \
    --lora_alpha 64 \
    --use_flash_attn True \
    --target_modules q_proj k_proj v_proj o_proj
    

**For LLM-based layerwise reranker**

    torchrun --nproc_per_node {number of gpus} \
    -m FlagEmbedding.llm_reranker.finetune_for_layerwise.run \
    --output_dir {path to save model} \
    --model_name_or_path openbmb/MiniCPM-2B-dpo-bf16 \
    --train_data ./toy_finetune_data.jsonl \
    --learning_rate 2e-4 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --dataloader_drop_last True \
    --query_max_len 512 \
    --passage_max_len 512 \
    --train_group_size 16 \
    --logging_steps 1 \
    --save_steps 2000 \
    --save_total_limit 50 \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing \
    --deepspeed stage1.json \
    --warmup_ratio 0.1 \
    --bf16 \
    --use_lora True \
    --lora_rank 32 \
    --lora_alpha 64 \
    --use_flash_attn True \
    --target_modules q_proj k_proj v_proj o_proj \
    --start_layer 8 \
    --head_multi True \
    --head_type simple \
    --lora_extra_parameters linear_head
    

Our rerankers are initialized from [google/gemma-2b](https://huggingface.co/google/gemma-2b) (for the LLM-based reranker) and [openbmb/MiniCPM-2B-dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16) (for the LLM-based layerwise reranker), and we train them on a mixture of multilingual datasets:

*   [bge-m3-data](https://huggingface.co/datasets/Shitao/bge-m3-data)
*   [quora train data](https://huggingface.co/datasets/quora)
*   [fever train data](https://fever.ai/dataset/fever.html)
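Both training commands above use the same batch settings (`--per_device_train_batch_size 1`, `--gradient_accumulation_steps 16`, `--train_group_size 16`). The hypothetical helper `batch_budget` below multiplies these out to show how much data each optimizer step covers. It assumes the usual FlagEmbedding convention that each query's group holds one positive plus `train_group_size - 1` negatives, and the 8-GPU value in the usage line is only an example for `--nproc_per_node`.

```python
def batch_budget(num_gpus: int, per_device_train_batch_size: int = 1,
                 gradient_accumulation_steps: int = 16, train_group_size: int = 16) -> dict:
    """Hypothetical helper: queries and query-passage pairs per optimizer step."""
    queries_per_step = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
    return {
        "queries_per_step": queries_per_step,
        # Assumed: one positive plus (train_group_size - 1) negatives per query.
        "negatives_per_query": train_group_size - 1,
        "pairs_per_step": queries_per_step * train_group_size,
    }


print(batch_budget(num_gpus=8))
# {'queries_per_step': 128, 'negatives_per_query': 15, 'pairs_per_step': 2048}
```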

Evaluation
-------------------------

*   llama-index.

[![image-20240317193909373](/BAAI/bge-reranker-v2-m3/resolve/main/assets/llama-index.png)](/BAAI/bge-reranker-v2-m3/blob/main/assets/llama-index.png)

*   BEIR.

Reranking the top 100 results from bge-en-v1.5 large:

[![image-20240317174633333](/BAAI/bge-reranker-v2-m3/resolve/main/assets/BEIR-bge-en-v1.5.png)](/BAAI/bge-reranker-v2-m3/blob/main/assets/BEIR-bge-en-v1.5.png)

Reranking the top 100 results from e5-mistral-7b-instruct:

[![image-20240317172949713](/BAAI/bge-reranker-v2-m3/resolve/main/assets/BEIR-e5-mistral.png)](/BAAI/bge-reranker-v2-m3/blob/main/assets/BEIR-e5-mistral.png)

*   CMTEB-retrieval.  
    It reranks the top 100 results from bge-zh-v1.5 large.

[![image-20240317173026235](/BAAI/bge-reranker-v2-m3/resolve/main/assets/CMTEB-retrieval-bge-zh-v1.5.png)](/BAAI/bge-reranker-v2-m3/blob/main/assets/CMTEB-retrieval-bge-zh-v1.5.png)

*   MIRACL (multilingual).  
    It reranks the top 100 results from bge-m3.

[![image-20240317173117639](/BAAI/bge-reranker-v2-m3/resolve/main/assets/miracl-bge-m3.png)](/BAAI/bge-reranker-v2-m3/blob/main/assets/miracl-bge-m3.png)

Citation
---------------------

If you find this repository useful, please consider giving it a star and citing our work:

    @misc{li2023making,
          title={Making Large Language Models A Better Foundation For Dense Retrieval}, 
          author={Chaofan Li and Zheng Liu and Shitao Xiao and Yingxia Shao},
          year={2023},
          eprint={2312.15503},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }
    @misc{chen2024bge,
          title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation}, 
          author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
          year={2024},
          eprint={2402.03216},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }

## Model overview

The `bge-reranker-v2-m3` model is a lightweight reranker from BAAI with strong multilingual capabilities. It is based on `bge-m3`, a versatile embedding model that can simultaneously perform dense retrieval, multi-vector retrieval, and sparse retrieval. The `bge-reranker-v2-m3` model is easy to deploy and offers fast inference, making it suitable for a wide range of multilingual applications.

## Model inputs and outputs

The `bge-reranker-v2-m3` model takes a query and a passage as input and outputs a relevance score indicating how relevant the passage is to the query. Because the model is optimized with a cross-entropy loss, the raw score is an unbounded logit rather than a value in a fixed range. If you need a score between 0 and 1, you can map it through a sigmoid function; this does not change the ranking order, since the sigmoid is strictly increasing.

### Inputs
- **Query**: The text of the query to be evaluated
- **Passage**: The text of the passage to be evaluated for relevance to the query

### Outputs
- **Relevance score**: A float value representing the relevance of the passage to the query, with higher scores indicating more relevance.
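Since the raw scores are unbounded logits, a small illustration may help. Using made-up logit values (not real model output), the sketch below maps logits into (0, 1) with a numerically stable sigmoid and checks that the ranking order is unchanged. The helper names `sigmoid` and `rank` are hypothetical; the `normalize=True` option in the FlagEmbedding example earlier in this card returns scores in the same (0, 1) range.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for large negative x
    return z / (1.0 + z)


def rank(scores):
    """Passage indices sorted from most to least relevant."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical raw reranker logits for four passages (illustrative values only).
logits = [-8.2, 5.3, 0.0, 1.7]
probs = [sigmoid(s) for s in logits]

print(rank(logits))  # [1, 3, 2, 0]
print(rank(probs))   # [1, 3, 2, 0]: the sigmoid preserves the order
```

On the sigmoid scale, 0.5 corresponds to a raw logit of 0; any relevance threshold you choose on either scale should be validated on your own data.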

## Capabilities

As a cross-encoder, the `bge-reranker-v2-m3` model reads the query and passage together, which generally makes it more accurate than a bi-encoder embedding model. It can be used to rerank the top-k documents retrieved by an embedding model, such as the `bge-m3` model, to further improve the relevance of the final results.
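A minimal sketch of the rerank step, assuming only a scoring callable that takes a list of `[query, passage]` pairs and returns one score per pair. The helper `rerank` and the word-overlap scorer `toy_score_fn` are hypothetical stand-ins so the example runs offline; they are not part of FlagEmbedding.

```python
import re
from typing import Callable, List, Sequence, Tuple


def rerank(query: str, passages: Sequence[str],
           score_fn: Callable[[List[List[str]]], object],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Score each (query, passage) pair with score_fn and return the top_k, best first."""
    if not passages:
        return []
    pairs = [[query, p] for p in passages]
    scores = score_fn(pairs)
    # Some scorers may return a bare number for a single pair instead of a list,
    # so normalize the result to a list of scores.
    try:
        scores = list(scores)
    except TypeError:
        scores = [scores]
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(p, float(s)) for p, s in ranked[:top_k]]


def toy_score_fn(pairs: List[List[str]]) -> List[float]:
    """Hypothetical offline scorer: fraction of query words found in the passage."""
    scores = []
    for query, passage in pairs:
        q_words = set(re.findall(r"\w+", query.lower()))
        p_words = set(re.findall(r"\w+", passage.lower()))
        scores.append(len(q_words & p_words) / max(len(q_words), 1))
    return scores


passages = [
    "hi",
    "The giant panda is a bear species endemic to China.",
    "Pandas is a Python library for data analysis.",
]
print(rerank("what is panda?", passages, toy_score_fn, top_k=2))
```

With FlagEmbedding installed, `score_fn` could instead be the `compute_score` method of a `FlagReranker` instance, as in the examples above; the real scores are logits rather than overlap fractions, but the sorting logic is the same.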

## What can I use it for?

The `bge-reranker-v2-m3` model is well-suited for a variety of multilingual information retrieval and question-answering tasks. It can be used to rerank results from a search engine, to filter and sort documents for research or analysis, or to improve the relevance of responses in a multilingual chatbot or virtual assistant. Its fast inference and strong multilingual capabilities make it a versatile tool for building language-agnostic applications.

## Things to try

One practical aspect of the `bge-reranker-v2-m3` model is that its raw relevance scores are unbounded logits rather than values between 0 and 1. Because the sigmoid mapping is strictly increasing, normalizing the scores does not change the ranking; normalization matters mainly when you want a fixed relevance threshold or need to combine reranker scores with scores from other sources. Developers could experiment with such thresholds to filter out weakly relevant passages before showing results to users or passing them to an LLM.

Another thing to try is combining the `bge-reranker-v2-m3` model with the `bge-m3` model in a two-stage retrieval pipeline: use `bge-m3` for first-stage retrieval (dense retrieval alone, or dense plus sparse retrieval for a hybrid approach) and `bge-reranker-v2-m3` to rerank the top candidates. This setup could improve accuracy across a range of multilingual use cases.
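The first stage of such a pipeline can be sketched in plain Python. This is an illustration under stated assumptions: the dense vectors and token weights below are made-up stand-ins for what `bge-m3` would produce, the lexical score follows the definition in the BGE-M3 paper (the sum, over tokens shared by query and passage, of the product of their weights), and the fusion weights `w_dense` and `w_sparse` are hypothetical values you would tune. The helper names are also hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexical_score(q_weights: Dict[str, float], p_weights: Dict[str, float]) -> float:
    """Sum of query-weight * passage-weight over tokens that appear in both."""
    return sum(w * p_weights[t] for t, w in q_weights.items() if t in p_weights)


def hybrid_candidates(q_dense: Sequence[float], q_sparse: Dict[str, float],
                      docs: Dict[str, Tuple[Sequence[float], Dict[str, float]]],
                      w_dense: float = 1.0, w_sparse: float = 0.3,
                      top_n: int = 2) -> List[Tuple[str, float]]:
    """Weighted sum of dense and lexical scores; keep the top_n documents for reranking."""
    scored = [
        (doc_id, w_dense * cosine(q_dense, d_dense) + w_sparse * lexical_score(q_sparse, d_sparse))
        for doc_id, (d_dense, d_sparse) in docs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Made-up vectors and token weights standing in for bge-m3 outputs.
query_dense = [0.9, 0.1, 0.0]
query_sparse = {"panda": 0.8, "what": 0.05}
docs = {
    "giant_panda": ([0.8, 0.2, 0.1], {"panda": 0.7, "bear": 0.4}),
    "greeting": ([0.1, 0.9, 0.2], {"hi": 0.9}),
    "pandas_lib": ([0.5, 0.3, 0.6], {"pandas": 0.8, "python": 0.6}),
}
candidates = hybrid_candidates(query_dense, query_sparse, docs)
print(candidates)
```

The candidate IDs would then be mapped back to their passage text and scored by `bge-reranker-v2-m3`, as in the FlagEmbedding examples earlier in this card.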