AI4Bharat

Rank:

Average Model Cost: $0.0000

Number of Runs: 29,668

Models by this creator

IndicNER


IndicNER is a model that performs named entity recognition (NER) on texts written in Indic languages. NER is the task of identifying and classifying named entities such as person names, locations, and organizations. This model is trained to recognize named entities in Indic text and classify them into predefined categories, and it can be used to extract useful information from unstructured text data in Indic languages.
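A minimal usage sketch via the Hugging Face transformers pipeline, assuming the checkpoint is published as ai4bharat/IndicNER; the Hindi sentence is illustrative:

```python
# Minimal sketch: tagging named entities in Indic text with IndicNER.
# Assumes the "ai4bharat/IndicNER" checkpoint; the Hindi example is illustrative.
from transformers import pipeline

ner = pipeline(
    "ner",
    model="ai4bharat/IndicNER",
    aggregation_strategy="simple",  # merge word pieces into whole entities
)
print(ner("राहुल गांधी दिल्ली में रहते हैं"))  # expect person and location entities
```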


$-/run

15.8K runs

Huggingface

indic-bert


IndicBERT is a multilingual ALBERT model pretrained exclusively on 12 major Indian languages. It is pre-trained on our novel monolingual corpus of around 9 billion tokens and subsequently evaluated on a set of diverse tasks. IndicBERT has far fewer parameters than other multilingual models (mBERT, XLM-R, etc.) while achieving performance on par with or better than these models.

The 12 languages covered by IndicBERT are: Assamese, Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu. For more information, check out our project page or our paper.

We pre-trained indic-bert on AI4Bharat's monolingual corpus and evaluated it on IndicGLUE and some additional tasks; for the corpus's language distribution, the task descriptions, and the full results, refer to our official repo. Note: all models have been restricted to a max_seq_length of 128.

The model can be downloaded from the project page; both TF checkpoints and PyTorch binaries are included in the archive. Alternatively, you can also download it from Huggingface. If you are using any of the resources, please cite our paper.

The IndicBERT code and models are released under the MIT License. This work is the outcome of a volunteer effort as part of the AI4Bharat initiative.
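A minimal sketch of loading IndicBERT for feature extraction through transformers, assuming the checkpoint is published as ai4bharat/indic-bert and that the sentencepiece package is installed; the Hindi sentence is illustrative:

```python
# Minimal sketch: extracting sentence features with IndicBERT (ALBERT-based).
# Assumes the "ai4bharat/indic-bert" checkpoint; the example sentence is illustrative.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ai4bharat/indic-bert")
model = AutoModel.from_pretrained("ai4bharat/indic-bert")

inputs = tokenizer("यह एक उदाहरण वाक्य है", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```

For downstream IndicGLUE-style tasks, the same checkpoint would typically be loaded with a task head (e.g. AutoModelForSequenceClassification) and fine-tuned.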


$-/run

4.9K runs

Huggingface

IndicBART


IndicBART is a multilingual, sequence-to-sequence pre-trained model focusing on Indic languages and English. It currently supports 11 Indian languages and is based on the mBART architecture. You can use IndicBART to build natural language generation applications for Indian languages by fine-tuning the model with supervised training data for tasks like machine translation, summarization, and question generation. You can read more about IndicBART in its paper; for detailed documentation, see https://github.com/AI4Bharat/indic-bart/ and https://indicnlp.ai4bharat.org/indic-bart/

We used the IndicCorp data spanning 12 languages with 452 million sentences (9 billion tokens), and the model was trained using the text-infilling objective used in mBART. If you use IndicBART, please cite the IndicBART paper. The model is available under the MIT License.
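A minimal generation sketch, assuming the checkpoint is published as ai4bharat/IndicBART and following the language-tag convention described in its documentation (source text ends with "</s> <2xx>", and decoding starts from the target-language tag). Note that the pretrained model is meant to be fine-tuned; without fine-tuning this mainly demonstrates the input/output format:

```python
# Minimal sketch: conditional generation with IndicBART.
# Assumes the "ai4bharat/IndicBART" checkpoint and its documented
# language-tag convention ("</s> <2xx>" on the source, target tag to start decoding).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "ai4bharat/IndicBART", do_lower_case=False, use_fast=False, keep_accents=True
)
model = AutoModelForSeq2SeqLM.from_pretrained("ai4bharat/IndicBART")

# Hindi source sentence tagged with its language code.
inputs = tokenizer(
    "मैं एक लड़का हूँ </s> <2hi>", add_special_tokens=False, return_tensors="pt"
)
out = model.generate(
    inputs.input_ids,
    max_length=20,
    decoder_start_token_id=tokenizer.convert_tokens_to_ids("<2en>"),
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```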


$-/run

684 runs

Huggingface

indicwav2vec-hindi


This is a Wav2Vec2-style ASR model for Hindi, trained in fairseq and ported to Hugging Face. More details on the datasets, training setup, and conversion to the HuggingFace format can be found in the IndicWav2Vec repo. Note: this model does not support inference with a language model; a minimal inference sketch is given below.

About AI4Bharat
Website: https://ai4bharat.org/
Code: https://github.com/AI4Bharat
HuggingFace: https://huggingface.co/ai4bharat
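The original card points to a script to run inference; the snippet below is a minimal sketch, assuming the standard transformers Wav2Vec2 CTC interface, the ai4bharat/indicwav2vec-hindi checkpoint, and 16 kHz mono input audio (the file name is hypothetical):

```python
# Minimal inference sketch for indicwav2vec-hindi (Wav2Vec2 CTC, greedy decoding).
# Assumes 16 kHz mono audio in "sample.wav" (hypothetical file).
import torch
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("ai4bharat/indicwav2vec-hindi")
model = Wav2Vec2ForCTC.from_pretrained("ai4bharat/indicwav2vec-hindi")

speech, sample_rate = sf.read("sample.wav")  # expects 16 kHz mono
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])  # greedy CTC transcription
```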


$-/run

415 runs

Huggingface
