On average, over half of all optical observations acquired by spaceborne Earth imaging sensors are affected by clouds. Because cloud coverage severely impedes the continuous observation of Earth, the automated reconstruction of noisy or cloud-covered information is a persistent problem in signal processing and remote sensing. While classical remote sensing applications often focused on narrowly defined areas and case studies, the increasing availability of public-access, daily, large-scale satellite monitoring has shifted the community's interest towards globally applicable methodology. To support the development of modern machine learning techniques for cloud removal on whole-planet satellite data, we curated a large data set for training and evaluating new approaches.
This data set, SEN12MS-CR, is a multi-modal and mono-temporal data set for cloud removal. It contains paired and co-registered spaceborne radar measurements, which are practically unaffected by clouds, as well as cloud-covered and cloud-free multi-spectral optical satellite observations. The radar and optical data are collected by the Sentinel-1 and Sentinel-2 satellites, respectively, of the European Space Agency's Copernicus mission. The Sentinel satellites provide public-access data and are among the most prominent satellites in Earth observation.
SEN12MS-CR is the first public data set for cloud removal in Earth observation to provide large-scale, global, all-season coverage. Because cloud coverage varies widely in practice, the train and test data contain all scenarios, from clear skies to complete coverage. By making this curated and pre-processed data set available to the research community, we hope to help advance automated cloud removal in optical satellite data.
SEN12MS-CR is a global data set for multi-modal cloud removal. It contains observations covering 175 globally distributed Regions of Interest, each recorded in one of four seasons during 2018. For each region, synthetic aperture radar (SAR) Sentinel-1 observations as well as cloudy and cloud-free optical multi-spectral Sentinel-2 observations are provided. The full-scene images are sliced into a total of 122,218 patch triplets, each patch of size \(256 \times 256\) px. The samples are patch-wise co-registered and fully compatible with the SEN12MS data set, so that semantic segmentation and scene classification can be performed using the available semantic land cover annotations. The cloud coverage across all data is approximately 48%, ranging from clear-view images (e.g. for validation purposes) through semi-transparent haze or small clouds to dense and wide cloud coverage.
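To make the patch-triplet structure concrete, the following is a minimal sketch of how three co-registered full scenes could be cut on a shared grid into 256 × 256 px triplets. It is an illustration only, not the pipeline used to build SEN12MS-CR: the function names `slice_into_patches` and `make_triplets`, the non-overlapping stride, the dropping of partial border tiles, and the band counts in the usage note are all assumptions.

```python
import numpy as np

PATCH = 256  # patch edge length in pixels, as stated in the description


def slice_into_patches(scene, patch=PATCH, stride=PATCH):
    """Cut a (bands, height, width) array into (n, bands, patch, patch) tiles.

    Assumption: tiles that would extend past the scene border are dropped.
    """
    bands, height, width = scene.shape
    tiles = [
        scene[:, row:row + patch, col:col + patch]
        for row in range(0, height - patch + 1, stride)
        for col in range(0, width - patch + 1, stride)
    ]
    if not tiles:
        # Scene is smaller than one patch: return an empty stack.
        return np.empty((0, bands, patch, patch), dtype=scene.dtype)
    return np.stack(tiles)


def make_triplets(s1, s2_cloudy, s2_clear, patch=PATCH, stride=PATCH):
    """Slice three co-registered scenes on the same grid into patch triplets."""
    if not (s1.shape[1:] == s2_cloudy.shape[1:] == s2_clear.shape[1:]):
        raise ValueError("scenes must share the same height and width")
    return list(zip(
        slice_into_patches(s1, patch, stride),
        slice_into_patches(s2_cloudy, patch, stride),
        slice_into_patches(s2_clear, patch, stride),
    ))
```

For example, with a hypothetical 512 × 768 px scene, `make_triplets(np.zeros((2, 512, 768)), np.zeros((13, 512, 768)), np.zeros((13, 512, 768)))` yields six triplets. Because all three scenes are sliced with identical offsets, the patches within each triplet stay pixel-aligned.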
Benchmarking
Method | MAE ↓ | SAM ↓ | PSNR ↑ | SSIM ↑ |
---|---|---|---|---|
McGAN (Enomoto et al., 2017) | 0.048 | 15.676 | 25.14 | 0.744 |
SAR-Opt-cGAN (Grohnfeldt et al., 2018) | 0.043 | 15.494 | 25.59 | 0.764 |
SAR2OPT (Bermudez et al., 2018) | 0.042 | 14.788 | 25.87 | 0.793 |
SpA GAN (Pan, 2020) | 0.045 | 18.085 | 24.78 | 0.754 |
Simulation-Fusion GAN (Gao et al., 2020) | 0.045 | 16.633 | 24.73 | 0.701 |
DSen2-CR (Meraner et al., 2020) | 0.031 | 9.472 | 27.76 | 0.874 |
GLF-CR (Xu et al., 2022) | 0.028 | 8.981 | 28.64 | 0.885 |
UnCRtainTS L2 (Ebel et al., 2023) | 0.027 | 8.320 | 28.90 | 0.880 |
DiffCR (Zou et al., 2024) | 0.019 | 5.821 | 31.77 | 0.902 |
Feel free to report this benchmarking of prior methods in your work utilizing SEN12MS-CR.
Please reach out to us if you would like to have your work referenced and your model included in the benchmark.
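For readers who want to compute comparable numbers, here is a minimal NumPy sketch of three of the reported metrics (MAE, SAM, PSNR). It is not the evaluation code behind the table. It assumes images are arrays of shape (bands, height, width) scaled to [0, 1], that SAM is the per-pixel spectral angle averaged over the image and reported in degrees, and that PSNR uses a data range of 1.0. SSIM is omitted because it is usually taken from an image-processing library.

```python
import numpy as np


def mae(pred, target):
    """Mean absolute error over all bands and pixels."""
    return float(np.mean(np.abs(pred - target)))


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB (assumes values span data_range)."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


def sam_degrees(pred, target, eps=1e-8):
    """Mean spectral angle in degrees between per-pixel band vectors."""
    dot = np.sum(pred * target, axis=0)
    norms = np.linalg.norm(pred, axis=0) * np.linalg.norm(target, axis=0)
    # eps avoids division by zero for all-zero pixels; clip guards arccos.
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return float(np.degrees(np.mean(np.arccos(cos))))
```

Lower is better for MAE and SAM and higher is better for PSNR, matching the arrows in the table header. Results will only be comparable to the table if the normalization and averaging conventions match those of the respective papers.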
1. Dataset
Download Link here and here (supplementary, e.g. splits)
Note: You can also download (parts of) the data from the terminal (password: m1554803) using wget or rsync, for instance via:
wget "ftp://m1554803:m1554803@dataserv.ub.tum.de/ROIs1158_spring_s1.tar.gz"
rsync -chavzP --stats rsync://m1554803@dataserv.ub.tum.de/m1554803/ .
Update: You can now easily get the dataset via the automated download script here.
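The wget command above can also be mirrored from Python. The sketch below uses only the standard library and only the host, credentials and archive name quoted in the note. The helper names `archive_url` and `fetch_and_extract` are hypothetical and unrelated to the automated download script mentioned above.

```python
import tarfile
import urllib.request
from pathlib import Path

HOST = "dataserv.ub.tum.de"
USER = PASSWORD = "m1554803"  # credentials quoted in the note above


def archive_url(name):
    """Build the FTP URL for one archive, mirroring the wget example."""
    return f"ftp://{USER}:{PASSWORD}@{HOST}/{name}"


def fetch_and_extract(name, dest="."):
    """Download one .tar.gz archive (if not already present) and unpack it."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / name
    if not archive.exists():
        urllib.request.urlretrieve(archive_url(name), archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest


# Example (downloads a large file, so it is left commented out):
# fetch_and_extract("ROIs1158_spring_s1.tar.gz", dest="SEN12MS_CR")
```

Skipping archives that already exist makes the function safe to re-run, but it does not resume an interrupted transfer; rsync remains the better choice for partial downloads.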
3. Trained Models
Cloud Removal Models
Model | Weights Download | Description |
---|---|---|
DSen2-CR | Google Drive, pCloud Share | [TensorFlow] The network trained on SEN12MS-CR in Meraner et al., 2020 on paired radar and cloudy optical satellite observations. |
ResNet | pCloud Share | [PyTorch] A ResNet16 pre-trained for mono-temporal cloud removal on SEN12MS-CR. Takes radar and multi-spectral satellite data as inputs and predicts cloud-removed multi-spectral optical images. |
GLF-CR | Google Drive | [PyTorch] The vision transformer cloud removal network trained on SEN12MS-CR in Xu et al., 2022 on paired radar and cloudy optical satellite observations. Code here. |
UnCRtainTS (t=1) | pCloud Share | [PyTorch] The mono-temporal version of the uncertainty prediction network trained on SEN12MS-CR in Ebel et al., 2023 on paired radar and cloudy optical satellite observations. Code here. |
4. Citation
If you utilize this dataset in your work, please use the following citation:
@article{sen12mscr,
        title = {{Multisensor Data Fusion for Cloud Removal in Global and All-Season Sentinel-2 Imagery}},
        author = {Ebel, Patrick and Meraner, Andrea and Schmitt, Michael and Zhu, Xiao Xiang},
        journal = {IEEE Transactions on Geoscience and Remote Sensing},
        year = {2020},
        publisher = {IEEE}
}