
SEN12MS-CR DATASET

A Dataset and Remote Sensing Benchmark
for Multimodal Cloud Removal





Introduction

On average, more than half of all spaceborne optical observations of Earth are affected by clouds. Because cloud coverage severely impedes the continuous observation of Earth, the automated reconstruction of noisy or cloud-covered information is a persistent problem in signal processing and remote sensing. While classical remote sensing applications often focused on narrowly defined areas and case studies, the increasing availability of public-access, daily and large-scale satellite monitoring has shifted the community's interest towards globally applicable methods. To support the development of modern machine learning techniques for cloud removal on whole-planet satellite data, we curated a large data set for training and evaluating new approaches.


This data set, SEN12MS-CR, is a multi-modal, mono-temporal data set for cloud removal. It contains paired and co-registered spaceborne radar measurements, which are practically unaffected by clouds, as well as cloud-covered and cloud-free multi-spectral optical satellite observations. The radar and optical data are collected by the Sentinel-1 and Sentinel-2 satellites, respectively, of the European Space Agency's Copernicus mission. The Sentinel satellites provide publicly accessible data and are among the most prominent satellites in Earth observation.


SEN12MS-CR is the first public data set for cloud removal in Earth observation to provide large-scale, global and all-season coverage. Because cloud coverage varies widely in practice, the train and test data contain all scenarios, ranging from clear skies to complete coverage. By making this curated and readily pre-processed data set available to the research community, we hope to help advance automated cloud removal in optical satellite data.
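To make "clear skies to complete coverage" concrete, the following is a minimal sketch that measures the cloud coverage of a patch as the fraction of cloudy pixels in a binary mask and assigns a coarse label. It assumes such a mask (1 = cloud) is available, for example from an external cloud detector; the coverage statistics of SEN12MS-CR itself may have been computed differently, and the function names and category labels are hypothetical.

```python
from typing import Sequence

# A cloud mask is a 2-D grid of 0/1 values, where 1 marks a cloudy pixel.
# Obtaining such a mask (e.g. from a cloud detector) is outside this sketch.
Mask = Sequence[Sequence[int]]


def cloud_coverage(mask: Mask) -> float:
    """Fraction of cloudy pixels in a binary mask, between 0.0 and 1.0."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    cloudy = sum(1 for row in mask for value in row if value)
    return cloudy / total


def coverage_category(fraction: float) -> str:
    """Coarse label for a coverage fraction (illustrative categories only)."""
    if fraction == 0.0:
        return "clear"
    if fraction == 1.0:
        return "fully covered"
    return "partially covered"


def mean_coverage(masks: Sequence[Mask]) -> float:
    """Average per-patch coverage over a collection of patches."""
    if not masks:
        raise ValueError("no masks given")
    return sum(cloud_coverage(m) for m in masks) / len(masks)


clear = [[0, 0], [0, 0]]
partial = [[1, 0], [0, 0]]
full = [[1, 1], [1, 1]]
for m in (clear, partial, full):
    frac = cloud_coverage(m)
    print(f"{frac:.2f} {coverage_category(frac)}")
print(f"mean coverage: {mean_coverage([clear, partial, full]):.3f}")
```

Averaging per-patch fractions weights every patch equally, which matches how a data-set-wide coverage figure is usually reported for equally sized patches.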


Exemplary data triplets. Every sample in SEN12MS-CR is a triplet of 2-band Sentinel-1 radar measurements and cloudy and cloud-free 13-band Sentinel-2 optical observations.
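One way to picture a single sample is as a small record of three co-registered images. The sketch below is a hypothetical plain-Python container (not part of any released loader) that stores images band-first as nested lists and checks the band counts stated above, 2 for Sentinel-1 and 13 for Sentinel-2, as well as a shared spatial size. The class, field and helper names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Band-first image stored as nested lists: image[band][row][col].
Image = List[List[List[float]]]

S1_BANDS = 2   # Sentinel-1 radar bands per sample
S2_BANDS = 13  # Sentinel-2 multi-spectral bands per sample


def shape(image: Image) -> Tuple[int, int, int]:
    """Return (bands, rows, cols), assuming a rectangular nested list."""
    return len(image), len(image[0]), len(image[0][0])


@dataclass
class Triplet:
    """Hypothetical container for one co-registered SEN12MS-CR sample."""

    s1: Image            # radar measurements
    s2_cloudy: Image     # cloud-covered optical observation
    s2_cloudfree: Image  # cloud-free optical reference

    def validate(self) -> None:
        """Raise ValueError if band counts or spatial sizes disagree."""
        s1 = shape(self.s1)
        cloudy = shape(self.s2_cloudy)
        clear = shape(self.s2_cloudfree)
        if s1[0] != S1_BANDS:
            raise ValueError(f"expected {S1_BANDS} Sentinel-1 bands, got {s1[0]}")
        for name, shp in (("cloudy", cloudy), ("cloud-free", clear)):
            if shp[0] != S2_BANDS:
                raise ValueError(f"expected {S2_BANDS} {name} Sentinel-2 bands, got {shp[0]}")
        if not (s1[1:] == cloudy[1:] == clear[1:]):
            raise ValueError("the three images must share one spatial size")


def blank(bands: int, size: int) -> Image:
    """Zero-filled band-first image, for illustration only."""
    return [[[0.0] * size for _ in range(size)] for _ in range(bands)]


# Real patches are 256 x 256 pixels; a 4 x 4 patch keeps the example light.
sample = Triplet(blank(S1_BANDS, 4), blank(S2_BANDS, 4), blank(S2_BANDS, 4))
sample.validate()  # passes silently for a consistent triplet
```

In practice the bands would be read from the data set's image files into array types rather than nested lists; the checks stay the same.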




Statistics

SEN12MS-CR is a global data set for multi-modal cloud removal. It contains observations covering 175 globally distributed Regions of Interest (ROIs), each recorded in one of four seasons throughout 2018. For each region, Sentinel-1 synthetic aperture radar (SAR) data as well as cloudy and cloud-free Sentinel-2 multi-spectral optical data are provided. The full-scene images are sliced into a total of 122,218 patch triplets, each patch of size \(256 \times 256\) pixels. The samples are co-registered patch-wise and are fully compatible with the SEN12MS data set, such that semantic segmentation and scene classification can be performed using the land cover annotations available in SEN12MS. The average cloud coverage across all data is approximately 48%, ranging from clear-view images (e.g. for validation purposes), through semi-transparent haze or small clouds, to dense and extensive cloud coverage.
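Slicing full scenes into 256 × 256 patches amounts to computing a grid of top-left patch offsets and cropping each window. The sketch below illustrates this under stated assumptions: the stride is a free parameter (non-overlapping tiles by default), incomplete border patches are dropped, and the actual stride, border handling and any filtering used to build SEN12MS-CR are not specified here. The function names are hypothetical.

```python
from typing import List, Tuple


def patch_offsets(height: int, width: int, size: int = 256,
                  stride: int = 256) -> List[Tuple[int, int]]:
    """Top-left (row, col) offsets of every full size x size patch in a scene.

    Patches that would extend past the scene border are skipped. The default
    stride gives non-overlapping tiles; a smaller stride gives overlap.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    rows = range(0, height - size + 1, stride)
    cols = range(0, width - size + 1, stride)
    return [(r, c) for r in rows for c in cols]


def crop(scene, top: int, left: int, size: int = 256):
    """Cut one band-first patch (nested lists) out of a band-first scene."""
    return [[row[left:left + size] for row in band[top:top + size]]
            for band in scene]


# A 600 x 800 px scene holds 2 x 3 non-overlapping 256 px patches.
offsets = patch_offsets(600, 800)
print(len(offsets), offsets)
```

Because all three modalities are co-registered, the same offsets can be applied to the Sentinel-1, cloudy Sentinel-2 and cloud-free Sentinel-2 scenes to obtain aligned patch triplets.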

Geospatial distribution of ROIs in SEN12MS-CR. Colors indicate the season in which each ROI was collected.

Exemplary cloud removal results. Left to right: radar data, cloud-covered optical data, cloud-removed optical predictions and cloud-free optical reference data.



Benchmarking

Method                                    MAE ↓   SAM ↓    PSNR ↑ [dB]   SSIM ↑
McGAN (Enomoto et al., 2017)              0.048   15.676   25.14         0.744
SAR-Opt-cGAN (Grohnfeldt et al., 2018)    0.043   15.494   25.59         0.764
SAR2OPT (Bermudez et al., 2018)           0.042   14.788   25.87         0.793
SpA GAN (Pan, 2020)                       0.045   18.085   24.78         0.754
Simulation-Fusion GAN (Gao et al., 2020)  0.045   16.633   24.73         0.701
DSen2-CR (Meraner et al., 2020)           0.031   9.472    27.76         0.874
GLF-CR (Xu et al., 2022)                  0.028   8.981    28.64         0.885
UnCRtainTS L2 (Ebel et al., 2023)         0.027   8.320    28.90         0.880
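For readers reproducing numbers like those in the table, here is a sketch of textbook definitions for three of the four metrics: mean absolute error (MAE), peak signal-to-noise ratio (PSNR) and the spectral angle mapper (SAM, here the per-pixel angle in degrees averaged over all pixels). It assumes reflectances scaled to [0, 1], so PSNR uses a peak value of 1. The benchmark's exact evaluation protocol (value scaling, clipping, band selection) may differ, and SSIM is omitted for brevity.

```python
import math
from typing import List, Sequence, Tuple

# An image is a list of pixels; each pixel is a spectrum (one value per band).
Pixels = Sequence[Sequence[float]]


def _pairs(pred: Pixels, target: Pixels) -> List[Tuple[float, float]]:
    """Flatten both images into aligned (prediction, reference) value pairs."""
    p = [v for px in pred for v in px]
    t = [v for px in target for v in px]
    if len(p) != len(t) or not p:
        raise ValueError("images must be non-empty and of equal size")
    return list(zip(p, t))


def mae(pred: Pixels, target: Pixels) -> float:
    """Mean absolute error over all pixels and bands."""
    pairs = _pairs(pred, target)
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def psnr(pred: Pixels, target: Pixels, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming values lie in [0, peak]."""
    pairs = _pairs(pred, target)
    mse = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def sam(pred: Pixels, target: Pixels) -> float:
    """Spectral angle mapper: mean per-pixel angle between spectra, in degrees."""
    if len(pred) != len(target) or not pred:
        raise ValueError("images must be non-empty and have the same pixel count")
    angles = []
    for a, b in zip(pred, target):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        # Treat all-zero spectra as aligned; clamp to guard against rounding.
        cos = 1.0 if norm == 0 else max(-1.0, min(1.0, dot / norm))
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)


pred = [[0.1, 0.2], [0.3, 0.4]]    # two pixels, two bands each
target = [[0.1, 0.2], [0.4, 0.3]]
print(f"MAE {mae(pred, target):.3f}  PSNR {psnr(pred, target):.2f} dB  "
      f"SAM {sam(pred, target):.2f} deg")
```

MAE and PSNR compare values independently, whereas SAM compares the shape of each pixel's spectrum, so a prediction with the right overall brightness but distorted band ratios is penalized mainly by SAM.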

Feel free to report this benchmark of prior methods in your work using SEN12MS-CR.
Please reach out to us if you would like to have your work referenced and your model included in the benchmark.





Download

1. Dataset

Download links here and here (supplementary data, e.g. splits)

Note: You can also download (parts of) the data from the terminal using wget or rsync (password: m1554803), for instance via
wget "ftp://m1554803:m1554803@dataserv.ub.tum.de/ROIs1158_spring_s1.tar.gz"
rsync -chavzP --stats rsync://m1554803@dataserv.ub.tum.de/m1554803/ .
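The FTP download can also be scripted with the Python standard library. This is a hedged sketch: it reuses the host, credentials and archive name from the wget example, while the helper names are hypothetical and any other archive names you pass in are your own assumption. Unlike rsync with `-P`, it does not resume interrupted transfers.

```python
import tarfile
import urllib.request
from pathlib import Path

# Host and credentials as given in the wget example.
HOST = "dataserv.ub.tum.de"
USER = "m1554803"
PASSWORD = "m1554803"


def ftp_url(archive: str) -> str:
    """Build an FTP URL with embedded credentials for one archive."""
    return f"ftp://{USER}:{PASSWORD}@{HOST}/{archive}"


def download_and_extract(archive: str, dest: str = ".") -> Path:
    """Download one .tar.gz archive over FTP and unpack it into dest.

    Archives are large; partial downloads are not resumed.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    local_path = dest_dir / archive
    # urllib's FTP handler reads the user and password from the URL.
    urllib.request.urlretrieve(ftp_url(archive), str(local_path))
    with tarfile.open(local_path, "r:gz") as tar:
        tar.extractall(dest_dir)  # only extract archives from a trusted source
    return local_path
```

For example, `download_and_extract("ROIs1158_spring_s1.tar.gz", "SEN12MS-CR")` would fetch the archive from the wget example and unpack it into a local `SEN12MS-CR` folder.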

Update: You can now easily get the dataset via this automated download script here.

2. Code

Download Links here and here

3. Trained Models

Cloud Removal Models

DSen2-CR. Weights: Google Drive, pCloud Share.
[TensorFlow] The network trained on SEN12MS-CR by Meraner et al. (2020) on paired radar and cloudy optical satellite observations.

ResNet. Weights: pCloud Share.
[PyTorch] A ResNet16 pre-trained for mono-temporal cloud removal on SEN12MS-CR. It takes radar and multi-spectral satellite data as inputs and predicts cloud-removed multi-spectral optical images.

GLF-CR. Weights: Google Drive.
[PyTorch] The vision transformer cloud removal network trained on SEN12MS-CR by Xu et al. (2022) on paired radar and cloudy optical satellite observations. Code here.

UnCRtainTS (t=1). Weights: pCloud Share.
[PyTorch] The mono-temporal version of the uncertainty prediction network trained on SEN12MS-CR by Ebel et al. (2023) on paired radar and cloudy optical satellite observations. Code here.
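These models take radar and cloudy optical observations jointly. A common way to feed both modalities to a convolutional network is to concatenate them along the channel axis, giving 2 + 13 = 15 input channels. The sketch below shows this step in plain Python under that assumption; whether a given released model expects this exact band order, scaling or layout should be checked against its own code, and the function name is hypothetical.

```python
from typing import List

# Band-first image stored as nested lists: image[band][row][col].
Image = List[List[List[float]]]


def stack_inputs(s1: Image, s2_cloudy: Image) -> Image:
    """Concatenate radar and cloudy optical bands into one 15-channel input."""
    if len(s1) != 2 or len(s2_cloudy) != 13:
        raise ValueError("expected 2 Sentinel-1 and 13 Sentinel-2 bands")
    if (len(s1[0]), len(s1[0][0])) != (len(s2_cloudy[0]), len(s2_cloudy[0][0])):
        raise ValueError("radar and optical patches must have the same size")
    # Radar channels first, then optical; this ordering is an assumption.
    return list(s1) + list(s2_cloudy)


s1 = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
s2 = [[[1.0] * 4 for _ in range(4)] for _ in range(13)]
x = stack_inputs(s1, s2)
print(len(x))  # 15 input channels
```

With PyTorch tensors in band-first layout, the equivalent step is `torch.cat([s1, s2_cloudy], dim=0)`, or `dim=1` for batched inputs of shape N × C × H × W.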





References

1. Citation

    If you utilize this dataset in your work, please use the following citation:

@article{sen12mscr,
        title = {{Multisensor Data Fusion for Cloud Removal in Global and All-Season Sentinel-2 Imagery}},
        author = {Ebel, Patrick and Meraner, Andrea and Schmitt, Michael and Zhu, Xiao Xiang},
        journal = {IEEE Transactions on Geoscience and Remote Sensing},
        year = {2020},
        publisher = {IEEE}
}






© Patrick Ebel www.pwjebel.com.