Maritaca-ai

Models by this creator

sabia-7b

maritaca-ai

Total Score

79

sabia-7b is a Portuguese language model developed by Maritaca AI. It is an auto-regressive language model with the same architecture and tokenizer as LLaMA-1-7B. Starting from the LLaMA-1-7B weights, it was further pretrained for 10 billion tokens on a 7-billion-token Portuguese subset of ClueWeb22, which works out to roughly 1.4 passes over that data. Compared with similar models such as Sensei-7B-V1, sabia-7b is tailored specifically to Portuguese.

Model inputs and outputs

sabia-7b is a text-to-text model: it accepts only text input and generates text output. Its maximum sequence length is 2048 tokens.

Inputs

Text: The model accepts natural language text as input.

Outputs

Text: The model generates natural language text as output.

Capabilities

sabia-7b can perform a variety of natural language processing tasks in Portuguese, such as text generation, translation, and language understanding. Because it was further pretrained on Portuguese web text, it can generate coherent Portuguese text across a range of topics and styles.

What can I use it for?

sabia-7b can be a useful tool for developers and researchers building Portuguese language applications such as chatbots, content generation, and language understanding. The model can be fine-tuned for a specific task or used in a few-shot manner, as in the example given in the original model description.

Things to try

One interesting aspect of sabia-7b is that it reuses the LLaMA-1-7B architecture and tokenizer, which were developed primarily for English, and adapts them to Portuguese. This suggests the model may retain some cross-lingual transfer ability, so it may be worth fine-tuning it or prompting it few-shot on tasks that involve more than one language.
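The few-shot usage and the 2048-token limit described above can be sketched roughly as follows. This is a minimal illustration, not the authors' reference code: the helper names (`build_few_shot_prompt`, `fits_context`, `generate`), the "Crítica"/"Sentimento" prompt labels, and the Hugging Face repository id `maritaca-ai/sabia-7b` are assumptions made for this example. The `generate` function additionally assumes that `torch`, `transformers`, and `accelerate` are installed and that enough memory is available for a 7B model.

```python
MAX_SEQ_LEN = 2048  # maximum sequence length stated in the description


def build_few_shot_prompt(examples, query,
                          input_label="Crítica", output_label="Sentimento"):
    """Format labelled (text, label) pairs followed by an unlabelled query.

    The label names are hypothetical; any consistent pair of labels works.
    """
    blocks = [f"{input_label}: {text}\n{output_label}: {label}"
              for text, label in examples]
    # The final block is left open so the model completes the label.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


def fits_context(num_prompt_tokens, max_new_tokens, max_seq_len=MAX_SEQ_LEN):
    """Return True if the prompt plus the generated tokens fit in the window."""
    return num_prompt_tokens + max_new_tokens <= max_seq_len


def generate(prompt, max_new_tokens=10, model_id="maritaca-ai/sabia-7b"):
    """Greedy completion of a prompt; model_id is an assumed repository id."""
    # Imported lazily so the prompt helpers above work without these packages.
    import torch
    from transformers import LlamaForCausalLM, LlamaTokenizer

    tokenizer = LlamaTokenizer.from_pretrained(model_id)
    model = LlamaForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires the accelerate package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    num_prompt_tokens = inputs["input_ids"].shape[1]
    if not fits_context(num_prompt_tokens, max_new_tokens):
        raise ValueError("prompt plus max_new_tokens exceeds the context window")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                            do_sample=False)
    # Drop the prompt tokens and decode only the newly generated ones.
    return tokenizer.decode(output[0][num_prompt_tokens:],
                            skip_special_tokens=True)


if __name__ == "__main__":
    shots = [
        ("Esse filme é ótimo!", "positivo"),
        ("Não gostei do final.", "negativo"),
    ]
    print(build_few_shot_prompt(shots, "A atuação foi excelente."))
```

Since sabia-7b is described as a pretrained model rather than an instruction-tuned one, a prompt that demonstrates the task through a few examples is likely to work better than a bare instruction. Passing the printed prompt to `generate` would return the model's continuation of the final label.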

Updated 5/21/2024