Nvidia

Models by this creator


Nemotron-4-340B-Instruct

nvidia

Total Score

476

Nemotron-4-340B-Instruct is a large language model (LLM) developed by NVIDIA. It is a fine-tuned version of the Nemotron-4-340B-Base model, optimized for English single-turn and multi-turn chat use cases. The model has 340 billion parameters and supports a context length of 4,096 tokens.

The model was trained on a diverse corpus of 9 trillion tokens, including English-based texts, 50+ natural languages, and 40+ coding languages. It then went through additional alignment steps, including supervised fine-tuning (SFT), direct preference optimization (DPO), and reward-aware preference optimization (RPO), using approximately 20K human-annotated samples. The result is a model aligned with human chat preferences, with improved mathematical reasoning, coding, and instruction following, that can generate high-quality synthetic data for a variety of use cases.

Model Inputs and Outputs

Inputs

Text: The model takes natural language text as input, typically in the form of prompts or conversational exchanges.

Outputs

Text: The model generates natural language text as output, which can include responses to prompts, continuations of conversations, or synthetic data.

Capabilities

The Nemotron-4-340B-Instruct model can be used for a variety of natural language processing tasks, including:

Chat and Conversation: The model is optimized for English single-turn and multi-turn chat and can hold coherent, helpful conversations.
Instruction-Following: The model can understand and follow instructions, making it useful for task-oriented applications.
Mathematical Reasoning: The alignment process improved the model's mathematical reasoning, which can be useful for educational or analytical applications.
Code Generation: The model's training on coding languages allows it to generate code, making it suitable for developer assistance and programming-related tasks.
Synthetic Data Generation: The model's alignment process makes it well suited to generating high-quality synthetic data, which can be used to train other language models.

What Can I Use It For?

The Nemotron-4-340B-Instruct model can be used for applications that require natural language understanding, generation, and task-oriented capabilities. Some potential use cases include:

Chatbots and Virtual Assistants: The model can be used to build conversational AI agents that hold helpful and coherent dialogues.
Educational and Tutoring Applications: The model's mathematical reasoning and instruction-following capabilities can be used to create educational tools and virtual tutors.
Developer Assistance: The model's code generation can be used to build tools that assist software developers with programming-related tasks.
Synthetic Data Generation: Companies and researchers can use the model to generate synthetic data for training their own language models, as described in the technical report.

Things to Try

One interesting aspect of the Nemotron-4-340B-Instruct model is its ability to follow instructions and engage in task-oriented dialogue. Try prompting the model with open-ended questions or requests and observe how it adapts to the task at hand. For example, ask it to write a short story, solve a math problem, or give step-by-step instructions for a particular task.

Another area to explore is synthetic data generation. Experiment with different prompts or techniques to guide the model's data generation, then assess the quality and usefulness of the generated samples for training your own language models.
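
Because the context length is 4,096 tokens, a multi-turn chat application has to decide which earlier turns still fit. The following is a minimal sketch of one way to do that, not NVIDIA's method. The helper names trim_history and count_tokens_whitespace are hypothetical, and the whitespace word count is only a stand-in for the model's real tokenizer, which will count differently.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def count_tokens_whitespace(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(
    turns: List[Turn],
    max_tokens: int = 4096,
    count_tokens: Callable[[str], int] = count_tokens_whitespace,
) -> List[Turn]:
    """Keep the most recent turns whose combined token count fits the budget."""
    kept: List[Turn] = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for role, text in reversed(turns):
        cost = count_tokens(text)
        if used + cost > max_tokens:
            break
        kept.append((role, text))
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

In practice you would pass a count_tokens function backed by the model's own tokenizer and reserve part of the budget for the reply.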


Updated 6/20/2024


Llama3-ChatQA-1.5-8B

nvidia

Total Score

475

The Llama3-ChatQA-1.5-8B model is a large language model developed by NVIDIA that excels at conversational question answering (QA) and retrieval-augmented generation (RAG). It is built on top of the Llama-3 base model and incorporates more conversational QA data to enhance its tabular and arithmetic calculation capabilities. A larger 70B-parameter version is also available.

Model inputs and outputs

Inputs

Text: The model accepts text input for conversational question answering and generation tasks.

Outputs

Text: The model outputs generated text responses that answer questions and provide relevant information.

Capabilities

The Llama3-ChatQA-1.5-8B model demonstrates strong performance on a variety of conversational QA and RAG benchmarks, outperforming models like ChatQA-1.0-7B, Llama-3-instruct-70b, and GPT-4-0613. It excels at tasks like document-grounded dialogue, multi-turn question answering, and open-ended conversational QA.

What can I use it for?

The Llama3-ChatQA-1.5-8B model is well suited to building conversational AI assistants, chatbots, and other applications that require natural language understanding and generation. It could power customer service chatbots, virtual assistants, educational tools, and more. The model's strong performance on QA and RAG tasks makes it a valuable resource for researchers and developers working on conversational AI systems.

Things to try

One interesting aspect of the Llama3-ChatQA-1.5-8B model is its ability to handle tabular and arithmetic calculation tasks, which can be useful for applications that require quantitative reasoning. Developers could use the model to power conversational interfaces for data analysis, financial planning, or other domains that involve numerical information.

Another area to explore is the model's performance on multi-turn dialogues and its ability to maintain context and coherence over the course of a conversation. Developers could experiment with open-ended chat, task-oriented dialogues, or other interactive scenarios to better understand its conversational capabilities.


Updated 6/1/2024


Llama3-ChatQA-1.5-70B

nvidia

Total Score

274

The Llama3-ChatQA-1.5-70B model is a large language model developed by NVIDIA that excels at conversational question answering (QA) and retrieval-augmented generation (RAG). It is built on top of the Llama-3 base model and incorporates more conversational QA data to enhance its tabular and arithmetic calculation capability. The model comes in two variants: Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B. Both models were originally trained using Megatron-LM and then converted to the Hugging Face format.

Model Inputs and Outputs

Inputs

Text: The model takes text as input, which can be in the form of a conversation or a question.

Outputs

Text: The model generates text as output, providing answers to questions or continuing a conversation.

Capabilities

The Llama3-ChatQA-1.5-70B model excels at conversational question answering and retrieval-augmented generation tasks. It has demonstrated strong performance on benchmarks such as ConvRAG, QuAC, QReCC, and ConvFinQA, outperforming other models like ChatQA-1.0-7B, Command-R-Plus, and Llama-3-instruct-70b.

What can I use it for?

The Llama3-ChatQA-1.5-70B model can be used for a variety of applications that involve question answering and conversational abilities, such as:

Building intelligent chatbots or virtual assistants
Enhancing search engines with more advanced query understanding and response generation
Developing educational tools and tutoring systems
Automating customer service and support interactions
Assisting in research and analysis tasks by providing relevant information and insights

Things to try

One interesting aspect of the Llama3-ChatQA-1.5-70B model is its ability to handle tabular and arithmetic calculations as part of its conversational QA capabilities. You could try prompting the model with questions that involve numerical data or complex reasoning, and observe how it responds.

Additionally, the model's retrieval-augmented generation capabilities allow it to provide responses that are grounded in relevant information, which can be useful for tasks that require fact-based answers.
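
In retrieval-augmented generation, the retrieved passages are passed to the model together with the conversation so the answer can be grounded in them. The following is a minimal sketch of assembling such a prompt. The function name build_rag_prompt and the System/User/Assistant layout are assumptions for illustration only; check the model card for the exact prompt format the model expects.

```python
from typing import List, Tuple


def build_rag_prompt(
    system: str,
    passages: List[str],
    turns: List[Tuple[str, str]],
) -> str:
    """Assemble one text prompt from a system line, retrieved passages, and chat turns."""
    # Drop empty passages and separate the rest with blank lines.
    context = "\n\n".join(p.strip() for p in passages if p.strip())
    dialogue = "\n\n".join(f"{role}: {text}" for role, text in turns)
    parts = [f"System: {system}"]
    if context:
        parts.append(context)
    parts.append(dialogue)
    # End with an open assistant label so the model continues from there.
    return "\n\n".join(parts) + "\n\nAssistant:"
```

Placing the passages before the dialogue keeps the most recent user question closest to the point where generation starts.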


Updated 6/1/2024


GPT-2B-001

nvidia

Total Score

191

GPT-2B-001 is a transformer-based language model developed by NVIDIA. It is part of the GPT family of models, similar to GPT-2 and GPT-3, with a total of 2 billion trainable parameters. The model was trained on 1.1 trillion tokens using NVIDIA's NeMo toolkit. GPT-2B-001 features several architectural improvements, including the SwiGLU activation function, rotary positional embeddings, and a maximum sequence length of 4,096 tokens. Models listed as similar include gemma-2b-it, prometheus-13b-v1.0, and bge-reranker-base.

Model inputs and outputs

Inputs

Text prompts of variable length, up to a maximum of 4,096 tokens.

Outputs

Continuation of the input text, generated in an autoregressive manner.

The model can be used for a variety of text-to-text tasks, such as language modeling, text generation, and question answering.

Capabilities

GPT-2B-001 can generate human-like text on a wide range of topics. It can be used for tasks such as creative writing, summarization, and code generation. Its training on 1.1 trillion tokens allows it to capture complex linguistic patterns and produce coherent, contextually relevant output.

What can I use it for?

GPT-2B-001 can be used for a variety of natural language processing tasks, including:

Content generation: The model can generate articles, stories, dialogue, and other forms of text, which can be useful for writers, content creators, and marketers.
Question answering: The model can be fine-tuned to answer questions on a wide range of topics, making it useful for building conversational agents and knowledge-based applications.
Summarization: The model can generate concise summaries of longer text, which can help researchers, students, and business professionals.
Code generation: The model can generate code snippets that assist developers in their work.

Things to try

Try prompting the model with a simple sentence or phrase and see how it expands on the idea. You can also fine-tune the model on specific datasets to see how it performs on more specialized tasks.

Another area to explore is reasoning and logical inference. Present the model with prompts that require deductive or inductive reasoning, and observe how it approaches the problem and formulates its responses.
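
Generating text "in an autoregressive manner" means the model predicts one token at a time and feeds each prediction back in as input. The following is a toy sketch of that loop, assuming a hypothetical next_token callable in place of the real model and an end-of-sequence token id of 0 chosen purely for illustration.

```python
from typing import Callable, List


def generate(
    prompt_tokens: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int,
    max_seq_len: int = 4096,
    eos_token: int = 0,
) -> List[int]:
    """Append one predicted token at a time until a stop condition is met."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Never grow past the maximum sequence length.
        if len(tokens) >= max_seq_len:
            break
        token = next_token(tokens)  # the prediction depends on everything so far
        tokens.append(token)
        if token == eos_token:
            break
    return tokens
```

A real decoder would replace next_token with a model forward pass followed by greedy selection or sampling.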


Updated 5/28/2024


canary-1b

nvidia

Total Score

191

The canary-1b model is part of the NVIDIA NeMo Canary family of multi-lingual, multi-tasking models. With 1 billion parameters, Canary-1B supports automatic speech recognition (ASR) in 4 languages (English, German, French, Spanish) and translation from English to German/French/Spanish and from German/French/Spanish to English, with or without punctuation and capitalization (PnC). The model uses a FastConformer-Transformer encoder-decoder architecture.

Model inputs and outputs

Inputs

Audio files, or a jsonl manifest file that references the audio data

Outputs

Transcribed text in the specified language (English, German, French, Spanish)
Translated text to/from the specified language pair

Capabilities

The Canary-1B model demonstrates state-of-the-art performance on multiple benchmarks for ASR and translation tasks in the supported languages. It can handle various accents, background noise, and technical language well.

What can I use it for?

The canary-1b model is well suited to research on robust, multi-lingual speech recognition and translation. It can also be fine-tuned on specific datasets to improve performance for particular domains or applications. Developers may find it useful as a pre-trained model for building ASR or translation tools, especially for the supported languages.

Things to try

You can experiment with the canary-1b model by loading it with the NVIDIA NeMo toolkit. Try transcribing or translating audio samples in different languages, and compare the results to your expectations or to other models. You can also fine-tune the model on your own data to see how it performs on specific tasks or domains.
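
A jsonl manifest is a text file with one JSON object per line, each describing one audio clip and the task to run on it. The following is a minimal sketch of writing such a file with the standard library. The field names (audio_filepath, duration, taskname, source_lang, target_lang, pnc) and the task names "asr" and "s2t_translation" are assumptions modeled on NeMo-style manifests, and manifest_entry and to_jsonl are hypothetical helpers; verify the exact keys against the model card before use.

```python
import json
from typing import Dict, Iterable

SUPPORTED_LANGS = {"en", "de", "fr", "es"}


def manifest_entry(
    audio_path: str,
    duration_s: float,
    source_lang: str,
    target_lang: str,
    pnc: bool = True,
) -> Dict[str, object]:
    """Build one manifest record for transcription or translation."""
    for lang in (source_lang, target_lang):
        if lang not in SUPPORTED_LANGS:
            raise ValueError(f"unsupported language: {lang}")
    # Translation is only supported to or from English.
    if source_lang != target_lang and "en" not in (source_lang, target_lang):
        raise ValueError("translation must be to or from English")
    return {
        "audio_filepath": audio_path,
        "duration": duration_s,
        # Same language on both sides means plain transcription (assumed task names).
        "taskname": "asr" if source_lang == target_lang else "s2t_translation",
        "source_lang": source_lang,
        "target_lang": target_lang,
        "pnc": "yes" if pnc else "no",
    }


def to_jsonl(entries: Iterable[Dict[str, object]]) -> str:
    """Serialize records as one JSON object per line."""
    return "".join(json.dumps(entry) + "\n" for entry in entries)
```

Validating the language pair up front catches unsupported requests, such as German to French, before any audio is processed.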


Updated 5/28/2024


segformer-b0-finetuned-ade-512-512

nvidia

Total Score

119

The segformer-b0-finetuned-ade-512-512 model is a version of the SegFormer model fine-tuned on the ADE20k dataset for semantic segmentation. SegFormer is a Transformer-based architecture that pairs a hierarchical Transformer encoder with a lightweight all-MLP decode head to achieve strong results on semantic segmentation benchmarks. This particular model was pre-trained on ImageNet-1k and then fine-tuned on the ADE20k dataset at a resolution of 512x512.

The SegFormer architecture is similar to the Vision Transformer (ViT) in that it treats an image as a sequence of patches and uses a Transformer encoder to process them. However, SegFormer uses a more efficient hierarchical design and a lightweight decode head, making it simpler and faster than traditional semantic segmentation models. The segformer-b2-clothes model is another example of a SegFormer variant fine-tuned for a specific task, in this case clothes segmentation.

Model inputs and outputs

Inputs

Images: The model takes images as input, which are split into a sequence of patches that are then embedded and processed by the Transformer encoder.

Outputs

Segmentation maps: The model outputs a segmentation map in which each pixel is assigned a class label for the semantic category it belongs to (e.g., person, car, building). The resolution of the output segmentation map is lower than the input image resolution, typically by a factor of 4.

Capabilities

The segformer-b0-finetuned-ade-512-512 model performs semantic segmentation, the task of assigning a semantic label to each pixel in an image. It can identify and delineate the objects, scenes, and regions present in an image. This makes it useful for applications like autonomous driving, scene understanding, and image editing.

What can I use it for?

This SegFormer model can be used for a variety of semantic segmentation tasks, such as:

Autonomous Driving: Identify and segment different objects on the road (cars, pedestrians, traffic signs, etc.) to support self-driving capabilities.
Scene Understanding: Understand the composition of a scene by segmenting it into semantic regions (sky, buildings, vegetation, etc.), which can be useful for applications like robotics and augmented reality.
Image Editing: Segment objects in an image precisely, allowing selective editing, masking, and manipulation of specific elements.

The model hub provides access to a range of SegFormer models fine-tuned on different datasets, so you can explore options that best suit your specific use case.

Things to try

One interesting aspect of the SegFormer architecture is its hierarchical Transformer encoder, which captures features at multiple scales. This helps the model use context and the relationships between semantic elements in an image, leading to more accurate and detailed segmentation.

To see this in action, try the segformer-b0-finetuned-ade-512-512 model on a diverse set of images, ranging from indoor scenes to outdoor landscapes. Observe how the model segments the objects, textures, and regions in the images, and, if you inspect the encoder's intermediate outputs, how the feature maps become coarser from one stage of the hierarchy to the next.
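
Because the output map is smaller than the input by a factor of 4, a 512x512 image yields a 128x128 map that has to be scaled back up before it can be overlaid on the image. The following is a minimal pure-Python sketch of the two steps involved: picking the highest-scoring class per pixel, then upsampling. It uses nearest-neighbor upsampling for simplicity, which is a substitution; bilinear interpolation of the class scores before the argmax is the more common choice in practice. The function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def argmax_labels(logits: List[List[List[float]]]) -> Grid:
    """Turn logits[class][y][x] into a label map of the best class per pixel."""
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def upsample_nearest(labels: Grid, factor: int = 4) -> Grid:
    """Repeat each label factor times along both axes."""
    height, width = len(labels), len(labels[0])
    return [
        [labels[y // factor][x // factor] for x in range(width * factor)]
        for y in range(height * factor)
    ]
```

For real images you would run these steps on arrays with a numerical library rather than nested lists, but the logic is the same.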


Updated 5/28/2024


Nemotron-4-340B-Base

nvidia

Total Score

111

Nemotron-4-340B-Base is a large language model (LLM) developed by NVIDIA that can be used as part of a synthetic data generation pipeline. With 340 billion parameters and support for a context length of 4,096 tokens, this multilingual model was pre-trained on a diverse dataset covering over 50 natural languages and 40 coding languages. After an initial pre-training phase of 8 trillion tokens, the model underwent continued pre-training on an additional 1 trillion tokens to improve quality.

Similar models include Nemotron-3-8B-Base-4k, a smaller enterprise-ready 8 billion parameter model, and GPT-2B-001, a 2 billion parameter multilingual model with architectural improvements.

Model Inputs and Outputs

Nemotron-4-340B-Base is a text generation model that can be used for a variety of natural language tasks. The model accepts textual inputs and generates corresponding text outputs.

Inputs

Textual prompts in over 50 natural languages and 40 coding languages

Outputs

Coherent, contextually relevant text continuations based on the input prompts

Capabilities

Nemotron-4-340B-Base handles a range of natural language tasks, including text generation, translation, and code generation. The model's large scale and broad multilingual coverage make it a versatile tool for researchers and developers building language AI applications.

What Can I Use It For?

Nemotron-4-340B-Base is well suited to use cases that require high-quality, diverse language generation, such as:

Synthetic data generation for training custom language models
Multilingual chatbots and virtual assistants
Automated content creation for websites, blogs, and social media
Code generation and programming assistants

By using the NVIDIA NeMo Framework and tools like Parameter-Efficient Fine-Tuning and Model Alignment, users can further customize Nemotron-4-340B-Base to their specific needs.

Things to Try

One interesting aspect of Nemotron-4-340B-Base is its ability to generate text in a wide range of languages. Try prompting the model with inputs in different languages and observe the quality and coherence of the generated outputs. You can also combine the model's multilingual capabilities with tasks like translation or cross-lingual information retrieval.

Another area worth exploring is synthetic data generation and customization. By fine-tuning Nemotron-4-340B-Base on specific datasets or domains, you can create custom language models tailored to your needs while retaining the broad knowledge and capabilities of the base model.


Updated 6/20/2024


parakeet-rnnt-1.1b

nvidia

Total Score

98

The parakeet-rnnt-1.1b is an ASR (Automatic Speech Recognition) model developed jointly by the NVIDIA NeMo and Suno.ai teams. It uses the FastConformer Transducer architecture, an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. This XXL model has around 1.1 billion parameters and transcribes speech into lowercase English text with high accuracy.

The model is similar to other high-performing ASR models like Canary-1B, which also uses the FastConformer architecture but supports multiple languages. In contrast, the parakeet-rnnt-1.1b focuses solely on English speech transcription.

Model Inputs and Outputs

Inputs

16000 Hz mono-channel audio (WAV files)

Outputs

Transcribed speech as a string for a given audio sample

Capabilities

The parakeet-rnnt-1.1b model demonstrates state-of-the-art performance on English speech recognition tasks. It was trained on a large, diverse dataset of 85,000 hours of speech data from various public and private sources, including LibriSpeech, Fisher Corpus, Switchboard, and more.

What Can I Use It For?

The parakeet-rnnt-1.1b model is well suited to a variety of speech-to-text applications, such as voice transcription, dictation, and audio captioning. It could be particularly useful where high-accuracy English speech recognition is required, such as in media production, customer service, or educational applications.

Things to Try

One interesting aspect of the parakeet-rnnt-1.1b model is its ability to handle a wide range of audio inputs, from clear studio recordings to noisier real-world audio. You could feed it different types of audio samples and observe its transcription accuracy and robustness.

Additionally, since the model was trained on a large and diverse dataset, you could try fine-tuning it on a more specialized domain or genre of audio to see whether that further improves performance for your specific use case.
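
Since the model expects 16000 Hz mono-channel WAV input, it is worth checking a file's format before transcribing it. The following is a minimal sketch using Python's standard wave module; check_wav_format is a hypothetical helper name, and it only inspects the file header without resampling or converting anything.

```python
import wave
from typing import List


def check_wav_format(
    path: str,
    expected_rate: int = 16000,
    expected_channels: int = 1,
) -> List[str]:
    """Return a list of format problems; an empty list means the file matches."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    problems: List[str] = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"file has {channels} channel(s), expected {expected_channels}")
    return problems
```

Files that fail the check would need to be resampled or downmixed with an audio tool before being passed to the model.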


Updated 5/28/2024


Nemotron-4-340B-Reward

nvidia

Total Score

76

The Nemotron-4-340B-Reward is a multi-dimensional reward model developed by NVIDIA. It is built on the Nemotron-4-340B-Base model, a 340 billion parameter language model trained on a diverse corpus of English and multilingual text, as well as code.

The Nemotron-4-340B-Reward model takes a conversation between a user and an assistant and rates the assistant's responses across five attributes: helpfulness, correctness, coherence, complexity, and verbosity. It outputs a scalar value for each attribute, providing a nuanced evaluation of response quality. The model can be used as part of a synthetic data generation pipeline to create training data for other language models, or as a standalone reward model for reinforcement learning from AI feedback.

The model is compatible with the NVIDIA NeMo Framework, which provides tools for customizing and deploying large language models. Similar models in the Nemotron family include Nemotron-4-340B-Base and Nemotron-3-8B-Base-4k, which are large language models that can be used as foundations for building custom AI applications.

Model Inputs and Outputs

Inputs

A conversation with multiple turns between a user and an assistant

Outputs

A scalar value (typically between 0 and 4) for each of the following attributes:

Helpfulness: Overall helpfulness of the assistant's response to the prompt
Correctness: Inclusion of all pertinent facts without errors
Coherence: Consistency and clarity of expression
Complexity: Intellectual depth required to write the response
Verbosity: Amount of detail included in the response, relative to what is asked for in the prompt

Capabilities

The Nemotron-4-340B-Reward model evaluates the quality of assistant responses along several separate dimensions. This can be useful for building AI systems that provide helpful and coherent responses, as well as for generating high-quality synthetic training data for other language models.

What Can I Use It For?

The Nemotron-4-340B-Reward model can be used in applications that require evaluating the quality of language model outputs. Some potential use cases include:

Synthetic Data Generation: The model can be used as part of a pipeline to generate high-quality training data for other language models, by providing a reward signal to guide the generation process.
Reinforcement Learning from AI Feedback (RLAIF): The model can be used as a reward model in RLAIF, where a language model is fine-tuned to optimize for the target attributes (helpfulness, correctness, etc.) as defined by the reward model.
Reward-Model-as-a-Judge: The model can be used to evaluate the outputs of other language models, providing a more nuanced assessment than a simple binary pass/fail.

Things to Try

One interesting aspect of the Nemotron-4-340B-Reward model is its multi-dimensional evaluation of language model outputs. This can be useful for understanding the strengths and weaknesses of different models and for identifying areas for improvement.

For example, you could use the model to evaluate the responses of different language models on a set of prompts and compare the scores across the attributes. This could reveal that a model produces coherent and helpful responses but struggles with factual correctness. With that insight, you could then focus on improving the model's knowledge base or fact-checking capabilities.

Additionally, you could experiment with using the Nemotron-4-340B-Reward model as part of a reinforcement learning pipeline, where its output serves as the reward signal for fine-tuning a language model. This could lead to models that are better aligned with human preferences and priorities, as defined by the reward model's attributes.
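
The comparison described above can be sketched as follows: average each attribute over a set of rated responses, then look for the lowest-scoring quality attribute. This is a minimal illustration, not part of the model's tooling. The helper names are hypothetical, and the ratings are assumed to be dictionaries keyed by the five attribute names. Complexity and verbosity are left out of the "weakest" check by default because a higher value is not necessarily better for them.

```python
from statistics import mean
from typing import Dict, List, Sequence

ATTRIBUTES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


def mean_scores(ratings: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each attribute over all rated responses."""
    return {attr: mean(r[attr] for r in ratings) for attr in ATTRIBUTES}


def weakest_attribute(
    averages: Dict[str, float],
    among: Sequence[str] = ("helpfulness", "correctness", "coherence"),
) -> str:
    """Return the quality attribute with the lowest average score."""
    return min(among, key=lambda attr: averages[attr])
```

Running mean_scores on the ratings for each candidate model gives a per-attribute profile that can be compared side by side.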


Updated 6/20/2024


parakeet-tdt-1.1b

nvidia

Total Score

61

The parakeet-tdt-1.1b is an ASR (Automatic Speech Recognition) model that transcribes speech into lowercase English text. It was jointly developed by the NVIDIA NeMo and Suno.ai teams. It uses a FastConformer-TDT (Token-and-Duration Transducer) architecture; FastConformer is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. The model has around 1.1 billion parameters.

Similar models include the parakeet-rnnt-1.1b, which is also a large ASR model developed by NVIDIA and Suno.ai. It uses a FastConformer Transducer architecture and has similar performance characteristics.

Model inputs and outputs

Inputs

16000 Hz mono-channel audio (WAV files)

Outputs

Transcribed speech as a string for a given audio sample

Capabilities

The parakeet-tdt-1.1b model transcribes English speech with high accuracy. It was trained on 64K hours of English speech from various public and private datasets.

What can I use it for?

You can use the parakeet-tdt-1.1b model for a variety of speech-to-text applications, such as transcribing audio recordings, live speech recognition, or integration into your own voice-enabled products and services. The model can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset using the NVIDIA NeMo toolkit.

Things to try

One thing to try with the parakeet-tdt-1.1b model is fine-tuning it on a specific domain or dataset, which could improve its performance on your particular use case. You could also combine the model with other components, such as language models or audio preprocessing modules, to further enhance its capabilities.


Updated 5/28/2024