Maintainer: fnlp

Total Score


Last updated 5/19/2024


Model LinkView on HuggingFace
API SpecView on HuggingFace
Github LinkNo Github link provided
Paper LinkNo paper link provided

Get summaries of the top AI models delivered straight to your inbox:

Model overview

The moss-moon-003-sft model is a conversational language model developed by fnlp. It was initialized with the CodeGen model and further pre-trained on 100B Chinese tokens and 20B English tokens, seeing a total of 700B tokens during pre-training. The model was then fine-tuned on ~1.1M multi-turn conversational data, allowing it to follow instructions in multi-turn dialogues and refuse inappropriate requests.

The moss-moon-003-sft model is part of a family of MOSS models, which also includes the moss-moon-003-base base model, the moss-moon-003-sft-plugin model fine-tuned on plugin-augmented data, and various quantized versions of these models (e.g. moss-moon-003-sft-int4, moss-moon-003-sft-int8). The final MOSS-003 model, which demonstrated better factuality, safety, and more stable response quality, will be open-sourced in the near future.

Model inputs and outputs


  • Text: The moss-moon-003-sft model takes text input in the form of prompts or dialogue history. It can handle both English and Chinese text.


  • Text: The model generates text responses in the language specified by the user, which can be either English or Chinese.


The moss-moon-003-sft model has been trained to be helpful, honest, and harmless. It can understand and communicate fluently in both English and Chinese, and perform a wide range of language-based tasks. The model can follow instructions in multi-turn dialogues, refuse inappropriate requests, and provide additional relevant details to answer questions in-depth and comprehensively.

Some example use cases for the moss-moon-003-sft model include:

  • Web search: The model can be used to search the web and provide summaries of the results, as shown in the example.
  • Math and coding: The model can solve simple math problems and write basic code, as demonstrated in the examples and examples.
  • Text-to-image generation: The model can use text-to-image plugins to generate images based on user descriptions, as shown in the example.
  • Chinese language tasks: The model has strong Chinese language capabilities, as evidenced by the examples, examples, and examples.
  • Harmlessness: The model has been trained to refuse requests for harmful or unethical actions, as shown in the example.

What can I use it for?

The moss-moon-003-sft model can be used in a variety of applications that require natural language processing and generation, such as:

  • Chatbots and virtual assistants: The model's ability to engage in multi-turn dialogues and understand both English and Chinese makes it a suitable choice for building chatbots and virtual assistants.
  • Content generation: The model can be used to generate text content, such as articles, stories, or product descriptions, in both English and Chinese.
  • Code generation: The model's capability to write basic code can be leveraged for tasks like automated programming, code completion, or code generation.
  • Multilingual translation: While the model is not specifically designed for translation, its understanding of both English and Chinese can be used for rudimentary translation between the two languages.

Things to try

One interesting aspect of the moss-moon-003-sft model is its ability to refuse inappropriate requests. This feature can be useful in building safe and ethical AI systems that prioritize user wellbeing and avoid causing harm. Developers can experiment with the model's response to different types of prompts, both benign and potentially harmful, to better understand its safety and alignment capabilities.

Another interesting aspect is the model's strong performance on Chinese language tasks. Developers working on applications targeting Chinese-speaking users can explore the model's capabilities in areas like content generation, question answering, and language understanding for the Chinese language.

Finally, the availability of quantized versions of the moss-moon-003-sft model (e.g. moss-moon-003-sft-int4, moss-moon-003-sft-int8) presents an opportunity to experiment with deploying the model on hardware with limited memory resources, such as edge devices or mobile phones. Developers can test the performance and quality trade-offs of these quantized models to find the best fit for their specific use cases.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models




Total Score


moss-moon-003-sft-plugin is a 16B parameter conversational language model developed by the FNLP team. It was initialized with CodeGen and further pre-trained on 100B Chinese tokens and 20B English tokens. This model was then fine-tuned on 1.1M multi-turn conversational data and an additional 300K plugin-augmented data, enabling it to use several external tools including a search engine, text-to-image generation, calculator, and equation solver. The moss-moon-003-sft model was fine-tuned on the same ~1.1M conversational data, but without the plugin-augmented data. It can follow instructions in multi-turn dialogues and refuse inappropriate requests. Model Inputs and Outputs Inputs Text prompts in either English or Chinese Outputs Responses generated in text form, which can include the results of using the integrated plugins Capabilities The moss-moon-003-sft-plugin model demonstrates strong conversational and task-completion abilities. It can engage in multi-turn dialogues, answer questions, provide explanations, and even generate code snippets. Importantly, the model has also been trained to be helpful, honest, and harmless, refusing to engage in unsafe or unethical requests. The plugin integration allows the model to leverage external tools to enhance its capabilities. For example, it can use the search engine plugin to find relevant information, the text-to-image plugin to generate visual outputs, and the calculator and equation solver plugins to perform mathematical computations. What can I use it for? The moss-moon-003-sft-plugin model can be used in a variety of applications that require a capable and trustworthy conversational AI assistant. Some potential use cases include: Building interactive chatbots and virtual assistants for customer service, education, or entertainment Developing AI-powered productivity tools that can help users with tasks like research, analysis, and problem-solving Integrating the model into creative applications, allowing users to generate text, images, and other media Deploying the model in language learning applications, where it can engage students in conversations and provide feedback Things to try One interesting aspect of the moss-moon-003-sft-plugin model is its ability to use external plugins to enhance its capabilities. Try prompting the model to perform tasks that require the use of these plugins, such as: "Use the search engine to find information about the history of the Forbidden City in China, and then summarize the key facts." "Calculate the area of a circle with a radius of 5 meters." "Generate an image of a futuristic city skyline." Observe how the model seamlessly integrates the plugin results into its responses, demonstrating its versatility and problem-solving skills.

Read more

Updated Invalid Date




Total Score


moss-moon-003-base is the base language model of the MOSS-003 series, which was initialized with CodeGen and further pre-trained on 100B Chinese tokens and 20B English tokens. The model has seen 700B tokens during pre-training and consumed ~6.67x1022 FLOPs in total. MOSS-003 models are designed to be helpful, honest, and harmless conversational AI assistants that can communicate fluently in multiple languages and perform a variety of language-based tasks. Model inputs and outputs MOSS-003 models take in text inputs and generate text outputs. The input format expects a conversation-style structure with markers for the human user, the model's internal thoughts, commands to external tools, and the model's final response. The output is the model's generated response. Inputs Conversational prompts**: Multi-turn dialogue prompts with markers for the human, the model's internal thoughts, commands to external tools, and the model's previous response. Outputs Text response**: The model's generated response to the input prompt, which can include the use of external tools like search engines, calculators, and text-to-image generators. Capabilities The moss-moon-003-base model and its variants demonstrate strong capabilities in language understanding and generation across a wide range of tasks, including open-ended conversation, question answering, coding, and using external tools. The model can communicate fluently in both English and Chinese, and has been trained to be helpful, honest, and harmless in its responses. What can I use it for? You can use the MOSS-003 models for a variety of language-based applications, such as building chatbots, virtual assistants, and language learning tools. The models' ability to use external tools makes them particularly useful for applications that require accessing information on the web, performing calculations, or generating images. However, it's important to note that the model's outputs may not always be fully accurate or reliable, so you should carefully review and fact-check any information generated by the model before relying on it. Things to try One interesting aspect of the MOSS-003 models is their ability to use external tools and plugins to enhance their capabilities. You could try prompting the model to use the search engine, calculator, or text-to-image features and see how it handles more complex, multi-step tasks that require accessing external information or generating new content. Additionally, you could explore the model's ability to handle open-ended conversation, follow instructions, and refuse inappropriate requests - all key features for a helpful and trustworthy AI assistant.

Read more

Updated Invalid Date




Total Score


The weblab-10b-instruction-sft is a Japanese-centric multilingual GPT-NeoX model with 10 billion parameters. Trained using code based on EleutherAI/gpt-neox, it has a 36-layer, 4864-hidden-size transformer architecture. The model was pre-trained on around 600B tokens from a mixture of the Japanese C4 and The Pile datasets. It was then finetuned on a subset of records from datasets like Alpaca (English), Alpaca (Japanese translation), and others to serve as an instruction-following conversational agent. This model can be contrasted with the japanese-gpt-neox-3.6b-instruction-sft model, which is a 3.6 billion parameter Japanese GPT-NeoX model that has also been finetuned for instruction following. The key differences are the larger parameter size and broader pre-training dataset of the weblab-10b-instruction-sft model. Model inputs and outputs Inputs Text prompts**: The model takes in text prompts, which can include multi-turn conversations or instructions for the model to follow. Outputs Generated text**: The model outputs generated text that continues or responds to the provided prompt. This can include generating coherent, contextual responses to instructions or conversational prompts. Capabilities The weblab-10b-instruction-sft model can be used for a variety of language generation and understanding tasks, particularly ones involving Japanese. It demonstrates strong performance on the JGLUE 8-task evaluation, achieving high accuracy on tasks like JCommonsenseQA, JNLI, and MARC-ja. The model's large size and broad training data allow it to generate fluent, contextual responses to open-ended prompts, making it suitable for applications like chatbots and language assistants. What can I use it for? The weblab-10b-instruction-sft model could be a good starting point for building Japanese-language chatbots, virtual assistants, or other applications that require fluent text generation and language understanding. Its multilingual capabilities also allow it to potentially be used for cross-lingual applications. However, as with any large language model, it's important to carefully curate and filter the model's outputs to ensure safety and mitigate potential biases or inaccuracies. Things to try One interesting aspect of the weblab-10b-instruction-sft model is its ability to follow instructions and engage in open-ended dialogue. Prompts that involve multi-turn conversations or provide specific tasks or objectives for the model to complete could be a productive area to explore, leveraging the model's strong performance on the JGLUE benchmarks. Experimenting with different prompting techniques and finetuning approaches may also help unlock the model's full potential for downstream applications.

Read more

Updated Invalid Date




Total Score


Nous-Hermes-2-Mixtral-8x7B-SFT is a state-of-the-art language model fine-tuned by NousResearch on over 1 million entries of high-quality data, primarily from GPT-4 generated content. This model was trained on top of the Mixtral 8x7B MoE LLM, achieving state-of-the-art performance on a variety of tasks. The model is available in both an SFT-only version (Nous-Hermes-2-Mixtral-8x7B-SFT) as well as an SFT+DPO version (Nous-Hermes-2-Mixtral-8x7B-DPO), allowing users to experiment and find the best fit for their needs. The SFT+DPO model further improves performance through the use of Diffusion Prompt Optimization. Model Inputs and Outputs Inputs Text prompt**: The model accepts text prompts as input and generates relevant, coherent responses. Outputs Textual output**: The model generates human-like text outputs, ranging from creative writing to task-oriented responses. Capabilities The Nous-Hermes-2-Mixtral-8x7B-SFT model has demonstrated strong performance across a variety of benchmarks, including GPT4All, AGIEval, and BigBench. It outperforms the base Mixtral model as well as the Mixtral Finetune by MistralAI in many areas. For example, the model achieves state-of-the-art results on tasks like ARC-challenge, ARC-easy, Hellaswag, and OpenBookQA. The model's capabilities span a wide range of applications, from writing code for data visualization to generating cyberpunk psychedelic poems. It can also perform useful tasks like backtranslation to create prompts from input text. What Can I Use It For? The Nous-Hermes-2-Mixtral-8x7B-SFT model is suitable for a variety of language-related tasks, including: Content Generation**: Create engaging and coherent text for creative writing, storytelling, and content creation. Task Completion**: Provide step-by-step instructions and solutions for complex tasks, such as software development, data analysis, and more. Question Answering**: Answer a wide range of questions by drawing upon the model's broad knowledge base. Summarization**: Condense lengthy text into concise, informative summaries. Translation**: Perform high-quality translation between languages. Things to Try One interesting aspect of the Nous-Hermes-2-Mixtral-8x7B-SFT model is its use of the ChatML prompt format, which enables more structured and interactive multi-turn dialogues with the model. By utilizing system prompts, users can steer the model's behavior and guide it to adopt specific roles, rules, and stylistic choices. Another fascinating capability of the model is its ability to generate long-form, coherent responses. This can be useful for tasks that require in-depth explanation, analysis, or storytelling. Additionally, the availability of quantized versions of the model, such as the GGUF and GPTQ variants, makes the Nous-Hermes-2-Mixtral-8x7B-SFT more accessible and deployable on a wider range of hardware configurations.

Read more

Updated Invalid Date