ibm-granite

Models by this creator

granite-timeseries-ttm-v1

ibm-granite

Total Score

109

The granite-timeseries-ttm-v1 model is a compact pre-trained model for multivariate time-series forecasting, open-sourced by IBM Research. With fewer than one million parameters, it introduces the first-ever "tiny" pre-trained models for time-series forecasting. The TinyTimeMixer (TTM) model outperforms several popular benchmarks demanding billions of parameters in zero-shot and few-shot forecasting. TTMs are lightweight forecasters, pre-trained on publicly available time-series data with various augmentations. The current open-source version supports point forecasting use cases at resolutions ranging from minutely to hourly.

Model inputs and outputs

Inputs
- Multivariate time-series data: the model takes in multivariate time-series data, where the number of input time points (context length) can range from 512 to 1024.

Outputs
- Future time-series forecasts: given the input series, the model generates forecasts for the next 96 time points (forecast length).

Capabilities

The granite-timeseries-ttm-v1 model outperforms several popular pre-trained SOTA approaches in both zero-shot and few-shot forecasting. For example, its zero-shot forecasts surpass the few-shot results of models like PatchTST, PatchTSMixer, and TimesNet. The model can also be fine-tuned quickly, with just 5% of the target data, to achieve competitive results.

What can I use it for?

You can use the granite-timeseries-ttm-v1 model for a variety of time-series forecasting applications, such as electricity demand forecasting, stock price prediction, and weather forecasting. The model's compact size and fast inference make it suitable for deployment in resource-constrained environments, such as edge devices or laptops. Additionally, the provided notebooks and scripts can help you get started with using the model for your own time-series forecasting tasks.

Things to try

One interesting aspect of the granite-timeseries-ttm-v1 model is its ability to provide state-of-the-art zero-shot forecasts: you can apply the pre-trained model directly to your target data without any fine-tuning and still get accurate predictions. You can also try fine-tuning the model with just a small portion of your target data (e.g., 5%) to further improve the forecasting accuracy. The provided notebooks showcase these capabilities and can serve as a starting point for your experiments.
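As a sketch of what the input and output shapes above imply in practice, the following shows one way to slice a multivariate series into (context, target) pairs matching the 512-point context and 96-point forecast horizon described in the model card. The function name and the non-overlapping stride are illustrative choices, not part of the model's API:

```python
import numpy as np

def make_windows(series: np.ndarray, context_length: int = 512,
                 forecast_length: int = 96):
    """Slice a (time, channels) multivariate series into (context, target)
    pairs. Defaults mirror the model card: a 512-point context (up to 1024
    is supported) and a 96-point forecast horizon."""
    contexts, targets = [], []
    window = context_length + forecast_length
    # Step by the forecast length so consecutive targets do not overlap.
    for start in range(0, len(series) - window + 1, forecast_length):
        contexts.append(series[start:start + context_length])
        targets.append(series[start + context_length:start + window])
    return np.stack(contexts), np.stack(targets)

# Example: 700 time points of a 3-channel series yields one full window.
series = np.arange(700 * 3, dtype=float).reshape(700, 3)
contexts, targets = make_windows(series)
print(contexts.shape, targets.shape)  # (1, 512, 3) (1, 96, 3)
```

Pairs like these are also how a 5% few-shot subset of the target data would typically be prepared for fine-tuning.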


Updated 6/5/2024

🔮

granite-8b-code-instruct

ibm-granite

Total Score

92

The granite-8b-code-instruct model is an 8-billion-parameter language model fine-tuned by IBM Research to enhance instruction-following capabilities, including logical reasoning and problem-solving skills. The model is built on the Granite-8B-Code-Base foundation model, which was pre-trained on a large corpus of permissively licensed code data. The fine-tuning process aimed to imbue the model with strong abilities to understand and execute coding-related instructions.

Model inputs and outputs

The granite-8b-code-instruct model is designed to accept natural language instructions and generate relevant code or text responses. Its inputs can include a wide range of coding-related prompts, such as requests to write functions, debug code, or explain programming concepts. The model's outputs are similarly broad, spanning generated code snippets, explanations, and other text-based responses.

Inputs
- Natural language instructions or prompts related to coding and software development

Outputs
- Generated code snippets
- Text-based responses explaining programming concepts
- Debugging suggestions or fixes for code issues

Capabilities

The granite-8b-code-instruct model excels at understanding and executing coding-related instructions. It can be used to build intelligent coding assistants that help with tasks like generating boilerplate code, explaining programming concepts, and debugging issues. The model's strong logical reasoning and problem-solving skills make it well suited for a variety of software development and engineering use cases.

What can I use it for?

The granite-8b-code-instruct model can be used to build a wide range of applications, from intelligent coding assistants to automated code generation tools. Developers could leverage the model to create conversational interfaces that help users write, understand, and troubleshoot code. Researchers could explore the model's capabilities in areas like program synthesis, code summarization, and language-guided software engineering.

Things to try

One interesting application of the granite-8b-code-instruct model could be to use it as a foundation for a collaborative, AI-powered coding environment. By integrating the model's instruction-following and code-generation abilities, developers could create a tool that assists with tasks like pair programming, code review, and knowledge sharing. Another potential use case is to fine-tune the model further on domain-specific datasets to create specialized code intelligence models for industries like finance, healthcare, or manufacturing.
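As a minimal sketch of the conversational-interface idea, the following shows how such a model is commonly queried through the Hugging Face transformers library. It assumes the model is published on the Hub under the id shown and that its tokenizer ships a chat template; both are assumptions to verify, and the helper names are illustrative:

```python
def build_chat(instruction: str) -> list:
    """Wrap a coding instruction in the messages format expected by
    tokenizer.apply_chat_template()."""
    return [{"role": "user", "content": instruction}]

def generate(instruction: str, max_new_tokens: int = 256) -> str:
    # Heavy imports kept local so build_chat() stays dependency-free.
    # device_map="auto" additionally requires the accelerate package.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    model_id = "ibm-granite/granite-8b-code-instruct"  # assumed Hub id
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tok.apply_chat_template(
        build_chat(instruction), add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

A call like `generate("Write a Python function that reverses a string")` would then return the model's code suggestion as plain text.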


Updated 6/9/2024

🤔

granite-34b-code-instruct

ibm-granite

Total Score

61

granite-34b-code-instruct is a 34B parameter model fine-tuned from the granite-34b-code-base model on a combination of permissively licensed instruction data to enhance its instruction-following capabilities, including logical reasoning and problem-solving skills. It was developed by IBM Research. Similar models include granite-8b-code-instruct and CodeLlama-34B-Instruct-GPTQ. The granite-8b-code-instruct model is an 8B parameter version of the code instruction model, while CodeLlama-34B-Instruct-GPTQ is a 34B parameter model developed by the community and quantized for faster inference.

Model inputs and outputs

Inputs
- Text prompts, which can include instructions or coding tasks

Outputs
- Text responses, which can include code snippets, explanations, or solutions to the given prompts

Capabilities

The granite-34b-code-instruct model is designed to excel at responding to coding-related instructions and can be used to build coding assistants. It has strong logical reasoning and problem-solving skills, allowing it to generate relevant and helpful code in response to prompts.

What can I use it for?

The granite-34b-code-instruct model could be used to develop a variety of coding assistant applications, such as:

- Code generation and completion tools
- Automated programming helpers
- Natural language-to-code translation interfaces
- Educational coding tutors

By leveraging the model's instruction-following and problem-solving capabilities, developers can create tools that make it easier for users to write and understand code.

Things to try

One interesting thing to try with the granite-34b-code-instruct model is to provide it with open-ended prompts about coding problems or tasks and see how it responds. The model's ability to understand and reason about code-related instructions could lead to creative and unexpected solutions. Another idea is to fine-tune the model further on domain-specific data or tasks, such as a particular programming language or software framework, to see if it can develop even more specialized capabilities.
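Domain-specific fine-tuning starts with training pairs in a machine-readable layout. The sketch below serializes examples in a generic instruction/response JSONL convention; the field names are an assumed common format, not an official Granite training schema:

```python
import json

def to_instruction_record(instruction: str, response: str) -> str:
    """Serialize one training example as a JSON line using the common
    {"instruction": ..., "response": ...} convention (an assumption,
    not a Granite-specific schema)."""
    return json.dumps({"instruction": instruction, "response": response})

def write_jsonl(path: str, pairs: list) -> None:
    # One JSON object per line: the usual layout for fine-tuning corpora.
    with open(path, "w", encoding="utf-8") as f:
        for instruction, response in pairs:
            f.write(to_instruction_record(instruction, response) + "\n")
```

A file built this way, containing examples from a particular language or framework, could then be fed to a standard supervised fine-tuning pipeline.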


Updated 6/13/2024