Maintainer: LLM360

Last updated 5/17/2024


Model Link: View on HuggingFace
API Spec: View on HuggingFace
Github Link: No Github link provided
Paper Link: No paper link provided


Model overview

CrystalCoder is a 7B parameter language model developed by LLM360, a project focused on fully open-sourced large language models. CrystalCoder is trained on a mix of the SlimPajama and StarCoder datasets, giving it a strong balance of natural language processing and coding capabilities. Despite being trained on a smaller dataset of 1.4 trillion tokens, compared to LLaMA 2's 2 trillion, CrystalCoder surpasses LLaMA 2 on some challenging English and coding tasks, with strong results on benchmarks like MMLU, HumanEval, and MBPP.

Model inputs and outputs

CrystalCoder is a text-to-text model, capable of generating natural language and code based on the provided input prompt. The model supports a wide range of inputs, including natural language instructions, code snippets, and prompts that combine both.


Inputs

  • Natural language: Prompts in English or other languages for tasks like code generation, summarization, or translation.
  • Code snippets: Partial code that the model can complete or expand upon.
  • Mixed prompts: Prompts that combine natural language and code, such as "Write a function in Python that calculates the factorial of a number."

Outputs

  • Generated text: The model can produce natural language, such as explanations, summaries, or responses to queries.
  • Generated code: The model can generate code in a variety of programming languages, including but not limited to Python, C++, Java, and JavaScript.


Capabilities

CrystalCoder's key strength is its ability to excel at both natural language processing and coding tasks. For example, the model can generate high-quality code snippets in response to natural language prompts, or provide clear and concise explanations of code. This balance is reflected in its strong performance on benchmarks like MMLU, HumanEval, and MBPP, where it outperforms LLaMA 2, a model trained on substantially more data.
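As a concrete sketch of how such a mixed prompt might look in practice, the snippet below wraps a natural-language instruction plus an optional code stub into a plain completion prompt for a causal LM loaded through Hugging Face transformers. The model id comes from the HuggingFace link above, but the comment-style prompt format and generation settings are assumptions, not documented usage; check the model card for the recommended setup.

```python
# Sketch of prompting CrystalCoder with a mixed natural-language/code prompt.
# The model id "LLM360/CrystalCoder" matches the HuggingFace link above;
# the prompt format and generation settings are illustrative assumptions.

def build_prompt(instruction: str, partial_code: str = "") -> str:
    """Turn an instruction (plus an optional code stub) into a completion prompt."""
    return f"# {instruction}\n" + partial_code

def generate(prompt: str, model_id: str = "LLM360/CrystalCoder") -> str:
    """Run the model (requires transformers and the model weights; not run here)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    ids = tok(prompt, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=128)
    return tok.decode(out[0], skip_special_tokens=True)

# Example mixed prompt, as described above:
prompt = build_prompt(
    "Write a function in Python that calculates the factorial of a number.",
    "def factorial(n):\n",
)
```

Feeding `prompt` to `generate` would then ask the model to complete the function body.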

What can I use it for?

CrystalCoder can be a powerful tool for a variety of applications, such as:

  • Code generation: Automatically generate code in multiple programming languages based on natural language descriptions or partial code.
  • Code explanation and documentation: Summarize and explain code snippets in natural language.
  • Programming assistance: Assist developers by providing code completion, translation, and debugging suggestions.
  • Educational applications: Help students learn programming concepts by providing explanations and example code.

The model's open-source nature and comprehensive documentation make it accessible for researchers and developers to explore and integrate into their own projects.

Things to try

One interesting aspect of CrystalCoder is its ability to balance natural language and coding tasks. Try using the model to generate code in response to a natural language prompt, and then ask it to explain the generated code. This can help you understand how the model is reasoning about the relationship between language and programming.

You can also experiment with providing the model with partial code snippets and asking it to complete or expand upon them. This can showcase the model's capabilities in terms of code understanding and generation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models




Amber is a 7B parameter English language model with the LLaMA architecture, developed by the team at LLM360. While not a state-of-the-art model, Amber is released to make large language model training knowledge accessible to all. It is trained on a diverse dataset including sources like Arxiv, Books, C4, Refined-Web, StarCoder, StackExchange, and Wikipedia, totaling over 1 trillion tokens. Similar models include CrystalCoder, a 7B parameter model trained on the SlimPajama and StarCoder datasets that balances natural language processing and coding capabilities, and the OpenLLaMA models, open-source reproductions of Meta's LLaMA ranging from 3B to 13B parameters.

Model inputs and outputs

  • Inputs: Raw text prompts in English.
  • Outputs: A continuation of the input text, generating coherent and relevant English.

Capabilities

Amber demonstrates solid performance on a variety of benchmark tasks, including the AI2 Reasoning Challenge (ARC-C), HellaSwag, MMLU, TruthfulQA, and WinoGrande. While not a state-of-the-art model, Amber's capabilities make it a useful tool for tasks like open-ended text generation, question answering, and language understanding.

What can I use it for?

The Amber model can be applied to a range of natural language processing tasks, such as:

  • Content generation: Produce coherent and relevant text continuations for creative writing, blog posts, product descriptions, and more.
  • Question answering: Answer questions based on provided context, useful for building chatbots or virtual assistants.
  • Language understanding: Leverage Amber's solid performance on benchmarks like MMLU to build applications that require robust natural language understanding.

Things to try

One interesting aspect of Amber is the availability of 360 different model checkpoints, allowing users to experiment with different stages of the training process.
By loading different checkpoint versions, you can explore how the model's capabilities evolve over the course of training. This provides a valuable opportunity to better understand the inner workings of large language models and how they learn.
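A minimal sketch of that experiment, assuming the checkpoints are exposed as HuggingFace revisions. The `ckpt_<step>` naming used here is a hypothetical placeholder; the actual revision names are listed on the LLM360/Amber model card.

```python
# Sketch: loading Amber at different training stages. LLM360 publishes 360
# intermediate checkpoints; the "ckpt_<step>" revision naming below is a
# hypothetical placeholder -- check the LLM360/Amber model card for the
# real revision names.

def checkpoint_revision(step: int) -> str:
    """Map a checkpoint index (0-359) to a hypothetical revision name."""
    if not 0 <= step <= 359:
        raise ValueError("Amber ships 360 checkpoints, indexed 0-359")
    return f"ckpt_{step:03d}"

def load_amber_checkpoint(step: int):
    """Load one training-stage checkpoint (requires transformers; not run here)."""
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(
        "LLM360/Amber", revision=checkpoint_revision(step)
    )
```

Running the same prompt through several checkpoints and comparing the outputs gives a direct view of how the model's behavior changes over training.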







The WizardCoder-15B-V1.0 model is a large language model (LLM) developed by the WizardLM Team and fine-tuned specifically for coding tasks using their Evol-Instruct method, which automatically generates a diverse set of code-related instructions to further train the model on instruction following. Compared to similar open-source models like CodeGen-16B-Multi, LLaMA-33B, and StarCoder-15B, WizardCoder-15B-V1.0 performs significantly better on the HumanEval benchmark, achieving a pass@1 score of 57.3 versus the 18.3-37.8 range of the other models.

Model inputs and outputs

  • Inputs: Natural language instructions describing coding tasks or problems to be solved.
  • Outputs: Generated code in a variety of programming languages (e.g. Python, Java) that attempts to solve the given problem or complete the requested task.

Capabilities

The WizardCoder-15B-V1.0 model has been specifically trained to follow code-related instructions and generate functional code for a wide range of programming problems. It is capable of tasks such as writing simple algorithms, fixing bugs in existing code, and even generating complex programs from high-level descriptions.

What can I use it for?

The WizardCoder-15B-V1.0 model could be a valuable tool for developers, students, and anyone working on code-related projects. Some potential use cases include:

  • Prototyping and rapid development of new software features
  • Automating repetitive coding tasks
  • Explaining programming concepts by generating sample code
  • Tutoring and teaching programming by providing step-by-step solutions

Things to try

One interesting thing to try with the WizardCoder-15B-V1.0 model is to provide it with vague or open-ended prompts and see how it interprets and responds to them.
For example, you could ask it to "Write a Python program that analyzes stock market data" and see the creative and functional solutions it comes up with. Another idea is to give the model increasingly complex or challenging coding problems, like those found on programming challenge websites, and test its ability to solve them. This can help uncover the model's strengths and limitations when it comes to more advanced programming tasks.
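One way to phrase such prompts: WizardCoder models are commonly driven with an Alpaca-style instruction template. The template text below matches the widely used format but is an assumption here; verify it against the WizardCoder-15B-V1.0 model card before use.

```python
# Sketch: Alpaca-style instruction prompt commonly used with WizardCoder.
# The exact wording is an assumption -- verify against the model card.

TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

def build_prompt(instruction: str) -> str:
    """Wrap a coding task in the instruction template."""
    return TEMPLATE.format(instruction=instruction)

# The open-ended example from above:
prompt = build_prompt("Write a Python program that analyzes stock market data")
```

The generated text after `### Response:` is the model's attempt at the task.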






The CodeFuse-CodeLlama-34B is a 34 billion parameter code-focused large language model (LLM) developed by codefuse-ai. It is a fine-tuned version of the CodeLlama-34b-Python model, trained on 600k instructions and answers across various programming tasks. It achieves a state-of-the-art 74.4% pass@1 on the HumanEval benchmark, outperforming open-source models like WizardCoder-Python-34B-V1.0 and, on this metric, even GPT-4.

Model inputs and outputs

  • Inputs: A concatenated string of conversation data in a specific format, including system instructions, human messages, and bot responses.
  • Outputs: Text continuations generated in response to the input prompt.

Capabilities

The CodeFuse-CodeLlama-34B model is highly capable at a variety of code-related tasks, including code completion, infilling, and following programming instructions. Its strong performance on benchmarks like HumanEval indicates an ability to synthesize and understand code. The model is also a Python specialist, making it well-suited for tasks involving the Python programming language.

What can I use it for?

The CodeFuse-CodeLlama-34B model can be used for a wide range of applications that involve code generation, understanding, and assistance. Some potential use cases include:

  • Building intelligent code editors or IDEs with advanced code completion and suggestion capabilities.
  • Developing chatbots or virtual assistants that help programmers with coding tasks, answer questions, and provide code examples.
  • Automating the generation of boilerplate code or repetitive programming tasks.
  • Enhancing existing ML/AI systems with code-generation capabilities, such as automated machine learning pipelines or data processing workflows.

Things to try

One interesting thing to try with the CodeFuse-CodeLlama-34B model is to provide it with open-ended programming challenges or tasks, and observe how it approaches and solves them.
The model's strong performance on benchmarks like HumanEval suggests it may be able to tackle a variety of programming problems in creative and novel ways. Developers could also experiment with fine-tuning or adapting the model for their specific use cases, leveraging the provided tools and resources from the codefuse-ai team.
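Since the model takes a single concatenated conversation string as input, a helper like the one below can flatten a multi-turn exchange. The role markers here are hypothetical placeholders; the real delimiter tokens are defined on the codefuse-ai model card and must be used verbatim.

```python
# Sketch: flattening a multi-turn conversation into the single input string
# that CodeFuse-CodeLlama-34B expects. The role markers below are
# hypothetical placeholders -- the actual delimiter tokens come from the
# codefuse-ai model card.

ROLE_START = "<role>"   # hypothetical marker, not the model's actual token
ROLE_END = "</role>"    # hypothetical marker, not the model's actual token

def concat_conversation(turns):
    """turns: list of (role, text) pairs, role in {'system', 'human', 'bot'}.
    Ends with an empty bot turn so the model generates the next reply."""
    parts = [f"{ROLE_START}{role}{ROLE_END}{text}" for role, text in turns]
    parts.append(f"{ROLE_START}bot{ROLE_END}")
    return "\n".join(parts)

prompt = concat_conversation([
    ("system", "You are a helpful coding assistant."),
    ("human", "Write a Python function that reverses a string."),
])
```

The trailing empty bot turn is what cues the model to produce the assistant's reply as its continuation.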







The starcoder2-15b model is a 15B parameter model trained on 600+ programming languages from The Stack v2 dataset, with opt-out requests excluded. The model uses Grouped Query Attention and a 16,384-token context window with sliding window attention of 4,096 tokens, and was trained with the Fill-in-the-Middle objective on 4+ trillion tokens using the NVIDIA NeMo Framework on the NVIDIA Eos Supercomputer, built with NVIDIA DGX H100 systems. The starcoder2-15b model is an evolution of the earlier StarCoder model, a 15.5B parameter model trained on 80+ programming languages. Both models were developed by the BigCode team.

Model inputs and outputs

  • Inputs: Text prompts in any of the 600+ programming languages the model was trained on.
  • Outputs: Generated code in response to the input prompt.

Capabilities

The starcoder2-15b model can generate code in a wide variety of programming languages. It can be used for tasks like code completion, code generation, and even open-ended programming challenges. The model's large size and extensive training data allow it to handle complex programming concepts and idioms across many languages.

What can I use it for?

The starcoder2-15b model could be useful for a variety of applications, such as:

  • Building programming assistants to help developers write code more efficiently
  • Generating example code snippets for educational or documentation purposes
  • Prototyping new ideas and quickly iterating on code-based projects
  • Integrating code generation capabilities into no-code or low-code platforms

Things to try

One interesting aspect of the starcoder2-15b model is its ability to handle long-form context. Because it was trained with a 16,384-token context window, the model can generate code that stays coherent and consistent over a large number of lines. You could try providing the model with a partially completed function or class definition and see if it can generate the remaining implementation.
Another interesting experiment would be to fine-tune the starcoder2-15b model on a specific programming language or domain-specific dataset. This could allow the model to develop specialized knowledge and skills tailored to your particular use case.
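The Fill-in-the-Middle training objective mentioned above can also be exercised directly at inference time with special sentinel tokens. The token names below follow the StarCoder family's published convention, but treat them as an assumption and verify them against the starcoder2-15b tokenizer before relying on them.

```python
# Sketch: building a Fill-in-the-Middle (FIM) prompt for starcoder2-15b.
# The sentinel names follow the StarCoder family convention
# (<fim_prefix>, <fim_suffix>, <fim_middle>); verify them against the
# starcoder2-15b tokenizer before use.

def fim_prompt(prefix: str, suffix: str) -> str:
    """Ask the model to generate the code between prefix and suffix."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# e.g. have the model fill in the middle of a function body:
prompt = fim_prompt(
    "def mean(xs):\n    return ",
    " / len(xs)\n",
)
```

Given this prompt, the model's continuation after `<fim_middle>` is its guess at the missing span, such as a `sum(xs)` expression here.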
