Medical Imaging Segmentation & Infrastructure
Agenda:
• 18:00 – Gathering
• 18:30 – Overview of Nvidia medical imaging products
• 18:45 – GPU optimization during inference | Assaf Nahum, Nvidia
• 19:00 – Swin UNETR | Assaf Nahum, Nvidia
• 19:30 – Segmentation of Medical Images via Patch-Wise Polygons & Prediction and Explainability Guided COVID-19 Detection in CT Scans | Tal Shaharabany, TAU
Swin UNETR by Assaf Nahum (Nvidia)
The popular "U-shaped" network architecture has achieved state-of-the-art performance on a range of 2D and 3D semantic segmentation tasks and across various imaging modalities. However, because the kernel size of convolution layers in fully convolutional neural networks (FCNNs) is limited, their ability to model long-range information is sub-optimal, which can lead to deficiencies in the segmentation of tumors of variable size. Transformer models, on the other hand, have demonstrated excellent capabilities in capturing such long-range information in multiple domains, including natural language processing and computer vision. Inspired by the success of vision transformers and their variants, we propose a novel segmentation model termed Swin UNEt TRansformers (Swin UNETR). Specifically, the task of 3D brain tumor semantic segmentation is reformulated as a sequence-to-sequence prediction problem in which multi-modal input data is projected into a 1D sequence of embeddings and used as input to a hierarchical Swin transformer encoder. The Swin transformer encoder extracts features at five different resolutions by using shifted windows to compute self-attention, and it is connected to an FCNN-based decoder at each resolution via skip connections. We participated in the BraTS 2021 segmentation challenge, and our proposed model ranks among the top-performing approaches in the validation phase.
arXiv paper link:
https://arxiv.org/abs/2201.01266
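To make the shifted-window idea in the abstract concrete, here is a minimal NumPy sketch. It is an illustration only, not the Swin UNETR implementation: the function names, the 8x8x8 toy volume, and the window size of 4 are assumptions, the attention has no learned projections, and the attention mask that real Swin layers apply to the wrapped-around voxels is omitted.

```python
import numpy as np

def window_partition(x, window):
    """Split a (D, H, W, C) volume into non-overlapping cubic windows.

    Returns an array of shape (num_windows, window**3, C).
    """
    d, h, w, c = x.shape
    assert d % window == 0 and h % window == 0 and w % window == 0
    x = x.reshape(d // window, window, h // window, window, w // window, window, c)
    # Bring the three window-index axes to the front, voxel axes after them.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, window ** 3, c)

def shifted_window_partition(x, window):
    """Cyclically shift the volume by half a window, then partition.

    Alternating regular and shifted layers lets information cross
    the window borders of the previous layer.
    """
    shift = window // 2
    rolled = np.roll(x, shift=(-shift, -shift, -shift), axis=(0, 1, 2))
    return window_partition(rolled, window)

def window_self_attention(tokens):
    """Single-head softmax self-attention inside each window (toy version)."""
    scale = tokens.shape[-1] ** -0.5
    scores = (tokens @ tokens.transpose(0, 2, 1)) * scale
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

# Toy feature volume: 8x8x8 voxels with 4 channels (assumed sizes).
rng = np.random.default_rng(0)
volume = rng.standard_normal((8, 8, 8, 4))
regular = window_partition(volume, window=4)
shifted = shifted_window_partition(volume, window=4)
attended = window_self_attention(regular)
```

Attention cost here grows with the number of voxels per window rather than with the whole volume, which is the practical reason for computing self-attention in windows on 3D data.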
Segmentation of Medical Images via Patch-Wise Polygons by Tal Shaharabany (TAU)
The leading medical image segmentation methods represent the output map as a pixel grid. We present an alternative in which the object edges are modeled, per image patch, as a polygon with k vertices that is coupled with per-patch label probabilities. The vertices are optimized by employing a differentiable neural renderer to create a raster image. The delineated region is then compared with the ground truth segmentation. Our method obtains multiple state-of-the-art results for the Gland segmentation dataset (GlaS), the Nucleus challenges (MoNuSeg), and multiple polyp segmentation datasets, as well as for non-medical benchmarks, including Cityscapes, CUB, and Vaihingen.
arXiv paper link:
https://arxiv.org/abs/2112.02535
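The "polygon to raster, then compare with ground truth" step can be sketched as follows. This is a hedged stand-in: it uses a hard even-odd point-in-polygon test, which is not differentiable, whereas the talk describes a differentiable neural renderer. The function names, the 8x8 patch, and the use of the Dice score for the comparison are assumptions made for illustration.

```python
import numpy as np

def rasterize_polygon(vertices, size):
    """Rasterize a k-vertex polygon into a (size, size) boolean patch mask.

    vertices: sequence of (x, y) pairs in pixel coordinates.
    A pixel is inside if its centre passes the even-odd crossing test.
    """
    vertices = np.asarray(vertices, dtype=float)
    ys, xs = np.mgrid[0:size, 0:size]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((size, size), dtype=bool)
    k = len(vertices)
    for i in range(k):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % k]
        # Does the edge straddle the horizontal line through the pixel centre?
        crosses = (y0 > py) != (y1 > py)
        with np.errstate(divide="ignore", invalid="ignore"):
            # x position where the edge meets that horizontal line.
            x_at = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (px < x_at)
    return inside

def dice_score(pred, target):
    """Dice overlap between two boolean masks."""
    total = pred.sum() + target.sum()
    return 2.0 * np.logical_and(pred, target).sum() / total if total else 1.0

# Toy patch: a 4-vertex polygon against a ground-truth square shifted by one pixel.
polygon = [(2, 2), (6, 2), (6, 6), (2, 6)]
pred_mask = rasterize_polygon(polygon, size=8)
gt_mask = np.zeros((8, 8), dtype=bool)
gt_mask[2:6, 3:7] = True
overlap = dice_score(pred_mask, gt_mask)
```

In a trainable setting the hard inside test would be replaced by a renderer that passes gradients back to the vertex coordinates, so that the overlap loss can move the vertices.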
Prediction and Explainability Guided COVID-19 Detection in CT Scans by Tal Shaharabany (TAU)
Radiological examination of chest CT is an effective method for screening COVID-19 cases. In this work, we overcome three challenges in the automation of this process: (i) the limited number of supervised positive cases, (ii) the lack of region-based supervision, and (iii) variability across acquisition sites. These challenges are met by incorporating a recent augmentation solution called SnapMix, a novel explainability-driven contrastive loss for patch embedding, and by performing test-time augmentation that masks out the most relevant patches in order to analyze the prediction stability. The three techniques are complementary and are all based on utilizing the heatmaps produced by the Class Activation Mapping (CAM) explainability method. State-of-the-art performance is obtained on three different datasets for COVID detection in CT scans.
arXiv paper link:
https://arxiv.org/pdf/2104.14506.pdf
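The test-time step described above, masking the most relevant patches and checking how much the prediction moves, can be sketched roughly as below. This is an assumption-laden illustration, not the authors' procedure: the CAM is the standard weighted sum of the last convolutional feature maps, each heatmap cell is assumed to correspond to one image patch, and `toy_score` is a hypothetical stand-in for a trained classifier.

```python
import numpy as np

def class_activation_map(features, class_weights):
    """CAM heatmap: weighted sum of feature maps for one class.

    features: (C, h, w) activations; class_weights: (C,) classifier weights.
    """
    return np.tensordot(class_weights, features, axes=1)

def mask_top_patches(image, cam, top_k):
    """Zero out the image patches under the top_k highest CAM cells."""
    h, w = cam.shape
    ph, pw = image.shape[0] // h, image.shape[1] // w
    masked = image.copy()
    top = np.argsort(cam, axis=None)[::-1][:top_k]
    for flat in top:
        r, c = divmod(int(flat), w)
        masked[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = 0
    return masked

def toy_score(img):
    """Hypothetical classifier score; a real model would go here."""
    return float(img.mean())

# Toy data: two 2x2 feature maps and a 4x4 image (assumed sizes).
features = np.zeros((2, 2, 2))
features[0, 0, 1] = 4.0
features[1, 1, 0] = 2.0
weights = np.array([1.0, 1.0])
image = np.ones((4, 4))

cam = class_activation_map(features, weights)
masked = mask_top_patches(image, cam, top_k=1)
# A large shift suggests the prediction depends heavily on the masked region.
score_shift = abs(toy_score(image) - toy_score(masked))
```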
