Conference papers

U-Net Transformer: Self and Cross Attention for Medical Image Segmentation

Abstract : Medical image segmentation remains particularly challenging for complex and low-contrast anatomical structures. In this paper, we introduce the U-Transformer network, which combines a U-shaped architecture for image segmentation with self- and cross-attention from Transformers. U-Transformer overcomes the inability of U-Nets to model long-range contextual interactions and spatial dependencies, which are arguably crucial for accurate segmentation in challenging contexts. To this end, attention mechanisms are incorporated at two main levels: a self-attention module leverages global interactions between encoder features, while cross-attention in the skip connections enables fine spatial recovery in the U-Net decoder by filtering out non-semantic features. Experiments on two abdominal CT-image datasets show the large performance gain brought by U-Transformer compared to U-Net and local Attention U-Nets. We also highlight the importance of using both self- and cross-attention, and the interpretability properties that U-Transformer provides.
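The two attention levels named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses a single attention head, random projection weights, no positional encoding, and a sigmoid gate on the skip features. All of those choices, along with the names `attention`, `flatten_map`, `enc`, and `skip`, are assumptions made only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q_src, kv_src, w_q, w_k, w_v):
    """Single-head scaled dot-product attention.

    q_src: (n_q, c) tokens that produce the queries.
    kv_src: (n_kv, c) tokens that produce the keys and values.
    Returns the attended tokens (n_q, c) and the attention weights (n_q, n_kv).
    """
    q, k, v = q_src @ w_q, kv_src @ w_k, kv_src @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def flatten_map(fmap):
    # Turn an (H, W, C) feature map into H*W tokens of dimension C.
    h, w, c = fmap.shape
    return fmap.reshape(h * w, c)

rng = np.random.default_rng(0)
c = 8
enc = rng.normal(size=(4, 4, c))    # hypothetical deepest encoder features
skip = rng.normal(size=(8, 8, c))   # hypothetical higher-resolution skip features
w = [rng.normal(size=(c, c)) / np.sqrt(c) for _ in range(6)]  # random projections

# Self-attention: every encoder position attends to every other position,
# which gives each token a global receptive field.
enc_tokens = flatten_map(enc)
self_out, self_w = attention(enc_tokens, enc_tokens, *w[:3])

# Cross-attention (assumed variant): skip positions query the attended
# high-level tokens; the result is squashed into a (0, 1) gate that
# re-weights the skip features before they would reach the decoder.
skip_tokens = flatten_map(skip)
cross_out, cross_w = attention(skip_tokens, self_out, *w[3:])
gate = 1.0 / (1.0 + np.exp(-cross_out))
filtered_skip = (gate * skip_tokens).reshape(skip.shape)
```

In this sketch, each row of `self_w` and `cross_w` is a distribution over source positions, so the rows can be reshaped into maps and inspected visually. That is one plausible way to obtain the kind of interpretability the abstract mentions.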
Document type :
Conference papers
Contributor : Olivier Petit
Submitted on : Wednesday, September 8, 2021 - 4:46:22 PM
Last modification on : Friday, August 5, 2022 - 2:54:00 PM
Long-term archiving on : Friday, December 10, 2021 - 9:52:49 AM




  • HAL Id : hal-03337089, version 1



Olivier Petit, Nicolas Thome, Clement Rambour, Loic Themyr, Toby Collins, et al. U-Net Transformer: Self and Cross Attention for Medical Image Segmentation. MICCAI workshop MLMI, Sep 2021, Strasbourg (virtual), France. ⟨hal-03337089⟩


