### Model and Inputs

Prithvi is a first-of-its-kind temporal Vision Transformer pre-trained by the IBM and NASA team on Harmonized Landsat Sentinel-2 (HLS) data from the contiguous United States. The model uses a self-supervised encoder built on a ViT architecture and trained with a Masked AutoEncoder (MAE) strategy and an MSE loss function. It applies spatial attention across multiple patches as well as temporal attention for each patch.

[![](/ibm-nasa-geospatial/Prithvi-100M/resolve/main/GFM.png)](/ibm-nasa-geospatial/Prithvi-100M/blob/main/GFM.png)
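The masked-autoencoder objective can be illustrated with a minimal NumPy sketch. It follows the general recipe of the original MAE work (random masking of tokens, MSE averaged over the masked tokens only). The function names `random_masking` and `masked_mse`, the token counts, and the mask ratio are illustrative assumptions, not the actual Prithvi training code.

```python
import numpy as np

def random_masking(num_tokens, mask_ratio, rng):
    """Return a 0/1 mask over tokens: 1 = masked (hidden from the encoder)."""
    num_keep = int(num_tokens * (1 - mask_ratio))
    # Shuffle token indices by sorting random noise; the first num_keep stay visible.
    noise = rng.random(num_tokens)
    ids_shuffle = np.argsort(noise)
    mask = np.ones(num_tokens, dtype=np.float32)
    mask[ids_shuffle[:num_keep]] = 0.0
    return mask

def masked_mse(pred, target, mask):
    """MSE averaged over masked tokens only.

    pred, target: (num_tokens, token_dim); mask: (num_tokens,) with 1 = masked.
    """
    per_token = ((pred - target) ** 2).mean(axis=-1)
    return float((per_token * mask).sum() / mask.sum())

rng = np.random.default_rng(0)
num_tokens, token_dim = 16, 8          # assumed sizes, for illustration only
target = rng.normal(size=(num_tokens, token_dim)).astype(np.float32)
mask = random_masking(num_tokens, mask_ratio=0.75, rng=rng)

# Errors on visible tokens do not contribute to the loss.
pred = target.copy()
pred[mask == 0] += 1.0                 # perturb only the visible tokens
loss = masked_mse(pred, target, mask)
```

Because only masked tokens enter the average, the perturbation applied to the visible tokens leaves the loss at zero in this toy case.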

The model accepts remote sensing data in a video format with shape (B, C, T, H, W): batch, channels (spectral bands), time steps, height, and width. The temporal dimension (T) is central to this application and is absent from most other remote sensing models. The ability to handle a time series of remote sensing images can benefit a variety of downstream tasks (e.g. burn scar segmentation, flood segmentation, land cover classification). The model can also handle static imagery, which is fed into the model with T=1.
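As a rough illustration of the (B, C, T, H, W) layout, the following NumPy sketch stacks per-date images into a single input array. The helper name `stack_time_series` and the 224-pixel image size are assumptions made for the example; real inputs would come from GeoTIFF files rather than zero-filled arrays.

```python
import numpy as np

def stack_time_series(frames):
    """Stack T frames of shape (C, H, W) into one batch of shape (1, C, T, H, W)."""
    video = np.stack(frames, axis=1)   # (C, T, H, W)
    return video[np.newaxis, ...]      # (1, C, T, H, W)

C, H, W = 6, 224, 224                  # six bands; 224 is an assumed image size
frames = [np.zeros((C, H, W), dtype=np.float32) for _ in range(3)]

multi = stack_time_series(frames)       # three time steps
static = stack_time_series(frames[:1])  # static imagery, T = 1
print(multi.shape, static.shape)
```

The static case is the same array layout with a time axis of length one, so no separate code path is needed to prepare it.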

### Pre-training

The model was pre-trained with NASA's HLS V2 L30 product (30 m spatial resolution) from the contiguous United States. The following bands were used:

1.  Blue
2.  Green
3.  Red
4.  Narrow NIR
5.  SWIR 1
6.  SWIR 2
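The band order matters whenever channels are selected by position. The sketch below maps band names to 0-based channel indices, assuming a file stores the six bands in exactly the order listed above; the helper `band_indices` is hypothetical and not part of the released code.

```python
# Band order as listed above. The 0-based positions only apply if the
# input file stores the bands in this same order (an assumption).
BANDS = ["Blue", "Green", "Red", "Narrow NIR", "SWIR 1", "SWIR 2"]

def band_indices(names, available=BANDS):
    """Map band names to 0-based channel indices within `available`."""
    lookup = {name: i for i, name in enumerate(available)}
    missing = [n for n in names if n not in lookup]
    if missing:
        raise KeyError(f"unknown band(s): {missing}")
    return [lookup[n] for n in names]

# Example: channel indices for an RGB preview.
rgb = band_indices(["Red", "Green", "Blue"])
print(rgb)
```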

### Code

The model follows the [original MAE repo](https://github.com/facebookresearch/mae) with some modifications including:

1.  replace 2D patch embed with 3D patch embed;
2.  replace 2D positional embed with 3D positional embed;
3.  replace 2D patchify and unpatchify with 3D;
4.  add infrared bands alongside RGB.
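To make the 3D patchify and unpatchify step concrete, here is a minimal NumPy sketch that extends the reshape-and-transpose pattern of the 2D MAE code to an extra time axis. The function names, the patch size of 16, and the tubelet size of 1 are assumptions for illustration; the released model code may organise this differently.

```python
import numpy as np

def patchify_3d(x, patch_size, tubelet_size):
    """(B, C, T, H, W) -> (B, L, tubelet_size * patch_size**2 * C)."""
    B, C, T, H, W = x.shape
    p, t = patch_size, tubelet_size
    assert T % t == 0 and H % p == 0 and W % p == 0
    gt, gh, gw = T // t, H // p, W // p          # token grid along T, H, W
    x = x.reshape(B, C, gt, t, gh, p, gw, p)
    x = x.transpose(0, 2, 4, 6, 3, 5, 7, 1)      # (B, gt, gh, gw, t, p, p, C)
    return x.reshape(B, gt * gh * gw, t * p * p * C)

def unpatchify_3d(tokens, patch_size, tubelet_size, grid, channels):
    """Inverse of patchify_3d; grid = (gt, gh, gw)."""
    B = tokens.shape[0]
    p, t = patch_size, tubelet_size
    gt, gh, gw = grid
    x = tokens.reshape(B, gt, gh, gw, t, p, p, channels)
    x = x.transpose(0, 7, 1, 4, 2, 5, 3, 6)      # (B, C, gt, t, gh, p, gw, p)
    return x.reshape(B, channels, gt * t, gh * p, gw * p)

# Round trip on a small input: 6 bands, 3 time steps, 32 x 32 pixels.
x = np.arange(1 * 6 * 3 * 32 * 32, dtype=np.float32).reshape(1, 6, 3, 32, 32)
tokens = patchify_3d(x, patch_size=16, tubelet_size=1)
restored = unpatchify_3d(tokens, 16, 1, grid=(3, 2, 2), channels=6)
print(tokens.shape, np.array_equal(x, restored))
```

Each token covers one spatial patch over `tubelet_size` time steps, so the number of tokens grows with T as well as with image size.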

### Inference and demo

The inference script (`Prithvi_run_inference.py`) runs image reconstruction on a set of HLS images assumed to be from the same location at different time steps (see the example below). Provide the images in chronological order in GeoTIFF format, including the channels described above (Blue, Green, Red, Narrow NIR, SWIR 1, SWIR 2) in reflectance units. A **demo** that uses the same code is available [here](https://huggingface.co/spaces/ibm-nasa-geospatial/Prithvi-100M-demo).

    python Prithvi_run_inference.py \
        --data_files t1.tif t2.tif t3.tif \
        --yaml_file_path /path/to/yaml/Prithvi_100.yaml \
        --checkpoint /path/to/checkpoint/Prithvi_100.pth \
        --output_dir /path/to/out/dir/ \
        --input_indices <space separated 0-based indices of channels to select from input> \
        --mask_ratio 0.5 \
        --img_size <length of one side of square input shape>

This demo is a starting point that can be generalized to different input shapes and types.
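When running inference over many scenes, the command above can be assembled programmatically. The sketch below builds the argument list using only the flags shown in the example command; the helper `build_inference_command`, the channel indices, and the image size of 224 are placeholder assumptions.

```python
import shlex

def build_inference_command(data_files, yaml_file_path, checkpoint, output_dir,
                            input_indices, mask_ratio, img_size):
    """Assemble the argument list for Prithvi_run_inference.py."""
    cmd = ["python", "Prithvi_run_inference.py", "--data_files", *data_files]
    cmd += ["--yaml_file_path", yaml_file_path]
    cmd += ["--checkpoint", checkpoint]
    cmd += ["--output_dir", output_dir]
    cmd += ["--input_indices", *[str(i) for i in input_indices]]
    cmd += ["--mask_ratio", str(mask_ratio)]
    cmd += ["--img_size", str(img_size)]
    return cmd

cmd = build_inference_command(
    data_files=["t1.tif", "t2.tif", "t3.tif"],   # chronological order
    yaml_file_path="/path/to/yaml/Prithvi_100.yaml",
    checkpoint="/path/to/checkpoint/Prithvi_100.pth",
    output_dir="/path/to/out/dir/",
    input_indices=[0, 1, 2, 3, 4, 5],            # placeholder: files already hold the six bands in order
    mask_ratio=0.5,
    img_size=224,                                # placeholder value
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when file paths contain spaces.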

### Finetuning examples

Examples of finetuning the model for image segmentation using the mmsegmentation library are available through Hugging Face (e.g. [burn scars segmentation](https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M-burn-scar), [flood mapping](https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M-sen1floods11), and [multi-temporal crop classification](https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M-multi-temporal-crop-classification)). The code used for the experiments is available on [GitHub](https://github.com/NASA-IMPACT/hls-foundation-os/tree/main/fine-tuning-examples). The repository also contains instructions for finetuning the model for flood detection on the popular open-access [sen1floods11 dataset](https://github.com/cloudtostreet/Sen1Floods11).

### Feedback

Your feedback is invaluable to us. Please share any feedback about the model by submitting issues on our open-source repository, [hls-foundation-os](https://github.com/NASA-IMPACT/hls-foundation-os/issues), on GitHub.

### Citation

If this model helped your research, please cite `Prithvi-100M` in your publications. Here are two BibTeX entries as examples:

    @article{Prithvi-100M-preprint,
        author          = {Jakubik, Johannes and Roy, Sujit and Phillips, C. E. and Fraccaro, Paolo and Godwin, Denys and Zadrozny, Bianca and Szwarcman, Daniela and Gomes, Carlos and Nyirjesy, Gabby and Edwards, Blair and Kimura, Daiki and Simumba, Naomi and Chu, Linsong and Mukkavilli, S. Karthik and Lambhate, Devyani and Das, Kamal and Bangalore, Ranjini and Oliveira, Dario and Muszynski, Michal and Ankur, Kumar and Ramasubramanian, Muthukumaran and Gurung, Iksha and Khallaghi, Sam and Li, Hanxi (Steve) and Cecil, Michael and Ahmadi, Maryam and Kordi, Fatemeh and Alemohammad, Hamed and Maskey, Manil and Ganti, Raghu and Weldemariam, Kommy and Ramachandran, Rahul},
        month           = oct,
        title           = {{Foundation Models for Generalist Geospatial Artificial Intelligence}},
        journal         = {arXiv preprint arXiv:2310.18660},
        year            = {2023}
    }
    
    @misc{Prithvi-100M,
        author          = {Jakubik, Johannes and Chu, Linsong and Fraccaro, Paolo and Gomes, Carlos and Nyirjesy, Gabby and Bangalore, Ranjini and Lambhate, Devyani and Das, Kamal and Oliveira Borges, Dario and Kimura, Daiki and Simumba, Naomi and Szwarcman, Daniela and Muszynski, Michal and Weldemariam, Kommy and Zadrozny, Bianca and Ganti, Raghu and Costa, Carlos and Edwards, Blair and Watson, Campbell and Mukkavilli, Karthik and Schmude, Johannes and Hamann, Hendrik and Robert, Parkin and Roy, Sujit and Phillips, Christopher and Ankur, Kumar and Ramasubramanian, Muthukumaran and Gurung, Iksha and Leong, Wei Ji and Avery, Ryan and Ramachandran, Rahul and Maskey, Manil and Olofossen, Pontus and Fancher, Elizabeth and Lee, Tsengdar and Murphy, Kevin and Duffy, Dan and Little, Mike and Alemohammad, Hamed and Cecil, Michael and Li, Steve and Khallaghi, Sam and Godwin, Denys and Ahmadi, Maryam and Kordi, Fatemeh and Saux, Bertrand and Pastick, Neal and Doucette, Peter and Fleckenstein, Rylie and Luanga, Dalton and Corvin, Alex and Granger, Erwan},
        doi             = {10.57967/hf/0952},
        month           = aug,
        title           = {{Prithvi-100M}},
        repository-code = {https://github.com/NASA-IMPACT/hls-foundation-os},
        year            = {2023}
    }

## Model overview

The `Prithvi-100M` model is a first-of-its-kind temporal Vision Transformer pre-trained by the IBM and NASA team on Harmonized Landsat Sentinel-2 (HLS) data from the contiguous United States. The model uses a self-supervised encoder built on a ViT architecture and trained with a Masked AutoEncoder (MAE) strategy and an MSE loss function. It applies spatial attention across multiple patches as well as temporal attention for each patch.

For comparison, [moondream1](https://aimodels.fyi/models/huggingFace/moondream1-vikhyatk) is a 1.6B-parameter model built using SigLIP, Phi-1.5 and the LLaVA training dataset, and [neural-chat-7b-v3-1](https://aimodels.fyi/models/huggingFace/neural-chat-7b-v3-1-intel) is a 7B-parameter LLM finetuned on the Intel Gaudi 2 processor; neither is designed for remote sensing data.

## Model inputs and outputs

### Inputs
- The `Prithvi-100M` model accepts remote sensing data in a video format (B, C, T, H, W), where the temporal dimension (T) is crucial for this application and not present in most other remote sensing models.
- The model can handle both time series of remote sensing images and static imagery (T=1).
- The input data includes the following bands from the NASA HLS V2 L30 product: Blue, Green, Red, Narrow NIR, SWIR 1, and SWIR 2.

### Outputs
- The model can perform image reconstruction on a set of HLS images from the same location at different time steps.
- After finetuning, the model can be used for downstream tasks such as burn scar segmentation, flood segmentation, and land cover classification.

## Capabilities

The distinguishing capability of `Prithvi-100M` is that it handles temporal remote sensing data. By combining spatial and temporal attention, the model can learn representations from time-series imagery that support analysis of land cover change, disaster events, and other environmental phenomena.

## What can I use it for?

The `Prithvi-100M` model can be used for a range of applications in the remote sensing and geospatial fields. Some potential use cases include:

- **Land Cover Classification**: The model can be finetuned on labeled land cover data to perform accurate and efficient classification of different land cover types over time.
- **Burn Scar Mapping**: The temporal capabilities of the model can be leveraged to detect and map the extent of burn scars after wildfires, which is crucial for disaster response and mitigation efforts.
- **Flood Monitoring**: By analyzing time-series remote sensing data, the model can be used to identify and track the progression of flood events, supporting flood risk assessment and emergency planning.

## Things to try

One interesting aspect of the `Prithvi-100M` model is its ability to handle both static and time-series remote sensing imagery. Researchers and developers could explore how the model's performance varies when applying it to different types of input data, such as comparing its accuracy on single-date versus multi-date land cover classification tasks.

Additionally, the model's finetuning capabilities, as demonstrated by the provided examples for burn scar segmentation, present an opportunity to investigate how the pre-trained model can be further optimized for specific downstream applications. Experimenting with different finetuning strategies and dataset compositions could yield insights into the model's adaptability and versatility.