MedCondDiff: Lightweight, Robust, Semantically Guided Diffusion for Medical Image Segmentation
Published as an arXiv preprint, 2025
Abstract: We introduce MedCondDiff, a diffusion-based framework for multi-organ medical image segmentation that is efficient and anatomically grounded. The model conditions the denoising process on semantic priors extracted by a Pyramid Vision Transformer (PVT) backbone, yielding a semantically guided and lightweight diffusion architecture. This design improves robustness while reducing both inference time and VRAM usage compared to conventional diffusion models. Experiments on multi-organ, multi-modality datasets demonstrate that MedCondDiff delivers competitive performance across anatomical regions and imaging modalities, underscoring the potential of semantically guided diffusion models as an effective class of architectures for medical imaging tasks.
Recommended citation: R. Huang and J. Li, “MedCondDiff: Lightweight, Robust, Semantically Guided Diffusion for Medical Image Segmentation,” arXiv preprint arXiv:2512.00350, 2025.
Download Paper
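
The sketch below is a minimal, hypothetical illustration of the conditioning idea described in the abstract: a denoiser that predicts noise on a segmentation mask while being conditioned on semantic features from an image encoder. It is not the authors' implementation; the PVT backbone is replaced by a toy convolutional stand-in, and all class names, channel sizes, and the noise schedule are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySemanticEncoder(nn.Module):
    """Hypothetical stand-in for a PVT backbone: maps an image to a coarse semantic prior."""
    def __init__(self, in_ch=1, prior_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(32, prior_ch, 3, stride=2, padding=1), nn.GELU(),
        )

    def forward(self, x):
        return self.net(x)

class ConditionedDenoiser(nn.Module):
    """Toy denoiser: predicts the noise added to a mask, conditioned on a semantic prior."""
    def __init__(self, mask_ch=1, prior_ch=64, width=64):
        super().__init__()
        self.prior_proj = nn.Conv2d(prior_ch, width, 1)
        self.mask_in = nn.Conv2d(mask_ch, width, 3, padding=1)
        self.time_mlp = nn.Sequential(nn.Linear(1, width), nn.GELU(), nn.Linear(width, width))
        self.body = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.GELU(),
            nn.Conv2d(width, mask_ch, 3, padding=1),
        )

    def forward(self, noisy_mask, prior, t):
        # Upsample the semantic prior to mask resolution and fuse it with the noisy mask.
        prior = F.interpolate(self.prior_proj(prior), size=noisy_mask.shape[-2:],
                              mode="bilinear", align_corners=False)
        h = self.mask_in(noisy_mask) + prior
        # Inject a (toy) timestep embedding as a per-channel bias.
        h = h + self.time_mlp(t.float().view(-1, 1))[:, :, None, None]
        return self.body(h)  # predicted noise on the mask

# One DDPM-style training step on segmentation masks (illustrative only).
encoder, denoiser = ToySemanticEncoder(), ConditionedDenoiser()
image = torch.randn(2, 1, 128, 128)   # e.g. a CT/MR slice
mask = torch.rand(2, 1, 128, 128)     # ground-truth mask in [0, 1]
t = torch.randint(0, 1000, (2,))      # diffusion timesteps
noise = torch.randn_like(mask)
alpha_bar = torch.cos(t.float() / 1000 * torch.pi / 2) ** 2  # assumed cosine-like schedule
noisy_mask = (alpha_bar.sqrt().view(-1, 1, 1, 1) * mask
              + (1 - alpha_bar).sqrt().view(-1, 1, 1, 1) * noise)
pred_noise = denoiser(noisy_mask, encoder(image), t)
loss = F.mse_loss(pred_noise, noise)
loss.backward()
```

At inference, the same conditioning would be applied at every reverse-diffusion step, so the semantic prior only needs to be computed once per image; this is one plausible reading of how such a design reduces inference cost relative to unconditioned diffusion segmenters.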
