Can AI learn to read a slide like a pathologist, not just spot the obvious hotspots?

PSA-MIL: A Probabilistic Spatial Attention-Based Multiple Instance Learning for Whole Slide Image Classification

Published 3/21/2025 by Sharon Peled, Yosef E. Maruvka, Moti Freiman

Overview

PSA-MIL introduces a probabilistic approach to classifying medical whole slide images (WSIs).
The method models spatial dependencies between image regions using probabilistic attention.
It outperforms previous methods on two cancer datasets (TCGA-NSCLC and CAMELYON16).
It reaches state-of-the-art accuracy while keeping its attention maps interpretable.
It is computationally efficient compared to existing attention-based methods.

Plain English Explanation

When doctors analyze tissue samples under a microscope, they're looking for subtle patterns that might indicate cancer. The digitized versions of these slides (called whole slide images) are enormous: finding a small suspicious area in one is like trying to find a specific house on a detailed map of an entire city.

PSA-MIL helps solve this problem. Think of it as an intelligent assistant that first breaks the massive image into smaller patches, then learns which patches deserve more attention based on their relationships with neighboring regions.

The key innovation in PSA-MIL is how it handles uncertainty. Traditional methods might say "this patch is 100% important" or "this patch doesn't matter at all." But PSA-MIL recognizes that medical diagnosis isn't so black and white. It assigns probability scores to each region and considers how neighboring regions might influence each other.

This approach mirrors how actual pathologists work. Doctors don't just look at isolated cells; they examine patterns across the tissue. Similarly, PSA-MIL creates a probabilistic spatial attention mechanism that weighs the importance of different areas while accounting for their spatial relationships.

The result is more accurate cancer detection that can highlight suspicious regions for human experts to verify. It's like having a digital assistant pre-screen thousands of slides, allowing pathologists to focus their expertise where it matters most.

Key Findings

PSA-MIL achieved 95.1% accuracy on the TCGA-NSCLC lung cancer dataset, outperforming previous state-of-the-art methods
On CAMELYON16, the model reached an AUC of 0.971, indicating strong separation between tumor and normal slides
The probabilistic attention mechanism effectively captured significant tumor regions with greater precision than deterministic approaches
Visual analysis confirmed the model focused on tissue areas that pathologists would consider clinically relevant
Ablation studies showed both the probabilistic nature and spatial context components contributed significantly to performance gains
The method maintained computational efficiency with minimal additional parameters compared to baseline MIL approaches

The results validate that incorporating uncertainty through probabilistic attention helps the model make more reliable predictions when analyzing complex histopathology images. The spatial component proved particularly valuable for identifying subtle patterns spread across different tissue regions.

Technical Explanation

PSA-MIL builds upon the multiple instance learning (MIL) framework, which treats WSIs as bags of instances (patches). The architecture consists of three main components: a feature extractor, the probabilistic spatial attention module, and a classifier.
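To make the bag-of-patches framing concrete, here is a minimal sketch of plain (deterministic) attention pooling in a MIL setting. It is a generic baseline for illustration, not the PSA-MIL module: the scoring vector `w`, the function names, and the toy features are all assumptions, and a real model would learn the scoring parameters.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(bag, w):
    """Pool a bag of patch features into one slide-level vector.

    bag: list of patch feature vectors (lists of floats).
    w:   hypothetical scoring vector standing in for learned parameters.
    """
    # One scalar score per patch: the dot product of w with the patch features.
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in bag]
    weights = softmax(scores)
    dim = len(bag[0])
    # Weighted sum of patch features, dimension by dimension.
    pooled = [sum(a * x[d] for a, x in zip(weights, bag)) for d in range(dim)]
    return pooled, weights

# Toy bag of three patches with 2-dimensional features.
bag = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
pooled, weights = attention_pool(bag, w=[1.0, 1.0])
```

The pooled vector is what a slide-level classifier would consume, and the weights are what is typically visualized as an attention heatmap.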

For feature extraction, the authors used a ResNet-50 pretrained on ImageNet. Each WSI was divided into patches of 256×256 pixels at 20× magnification, creating thousands of patches per slide. These patches were embedded into a 512-dimensional feature space.
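As a rough illustration of the tiling step, the sketch below lists the top-left coordinates of non-overlapping 256×256 patches for a slide of a given pixel size. The function name and the choice to drop partial patches at the edges are assumptions; real pipelines usually also discard background patches that contain no tissue, which is omitted here.

```python
def patch_grid(width, height, patch=256):
    """Return top-left (x, y) coordinates of non-overlapping patches.

    Partial patches at the right and bottom edges are dropped
    (an assumption made for simplicity).
    """
    return [
        (x, y)
        for y in range(0, height - patch + 1, patch)
        for x in range(0, width - patch + 1, patch)
    ]

# A deliberately small "slide" so the result is easy to inspect.
coords = patch_grid(1024, 600)
```

At realistic slide sizes of tens of thousands of pixels per side, the same grid yields thousands of patches, which is why the slide is treated as a bag rather than fed to a network whole.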

The core innovation lies in the probabilistic spatial attention module, which differs from standard attention mechanisms in two key ways:

It models attention weights as probability distributions rather than deterministic values
It incorporates spatial context by considering the relationships between neighboring patches

Mathematically, the attention mechanism uses a Gaussian parameterization to produce a mean and variance for each attention score, rather than a fixed value. This allows the model to express uncertainty about which regions are important. The spatial component uses a graph neural network structure to aggregate information from surrounding patches, enhancing the context awareness.
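The two ideas in this paragraph can be sketched in a few lines, under loud assumptions. The sampling step uses the standard reparameterization of a Gaussian (mean plus standard deviation times unit noise) followed by a softmax. The spatial step is only a stand-in: it averages each patch's value with its 4-connected grid neighbors, which is far simpler than the graph-based aggregation described above. All names are hypothetical, and none of this is taken from the authors' code.

```python
import math
import random

def sample_attention(means, log_vars, rng):
    """Draw one attention vector: softmax(mu_i + sigma_i * eps_i), eps_i ~ N(0, 1)."""
    logits = [
        mu + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for mu, lv in zip(means, log_vars)
    ]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smooth_over_neighbours(values, coords):
    """Average each patch's value with its 4-connected grid neighbours."""
    index = {c: i for i, c in enumerate(coords)}
    out = []
    for (x, y), v in zip(coords, values):
        group = [v]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            j = index.get((x + dx, y + dy))
            if j is not None:
                group.append(values[j])
        out.append(sum(group) / len(group))
    return out

# Four patches on a 2x2 grid; only one has a high raw attention mean.
coords = [(0, 0), (1, 0), (0, 1), (1, 1)]
means = [0.0, 3.0, 0.0, 0.0]
log_vars = [-2.0] * 4  # small variance: the model is fairly certain

smoothed = smooth_over_neighbours(means, coords)
attn = sample_attention(smoothed, log_vars, random.Random(0))
```

Because each call draws fresh noise, repeated sampling gives a spread of attention maps. Patches with large variance fluctuate more between draws, which is one way uncertainty about a region's importance can surface.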

For training, the authors employed a combination of cross-entropy loss and a KL divergence regularization term to prevent the probability distributions from collapsing to deterministic values. They used the Adam optimizer with a learning rate of 1e-4 and batch size of 1 due to memory constraints.
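A minimal sketch of such an objective follows, assuming a binary label, a standard normal prior for each attention score, and a hypothetical weighting factor `beta`. The summary does not specify the prior or the weighting, so both are illustrative choices. The KL term is the closed-form divergence between a Gaussian with mean mu and variance sigma² and the standard normal: 0.5 · (sigma² + mu² − 1 − log sigma²).

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy for a predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ) in closed form."""
    return 0.5 * (math.exp(log_var) + mu * mu - 1.0 - log_var)

def total_loss(p, y, means, log_vars, beta=0.01):
    """Classification loss plus a beta-weighted mean KL over patches.

    beta is a hypothetical hyperparameter, not a value from the paper.
    """
    kl = sum(
        kl_to_standard_normal(m, lv) for m, lv in zip(means, log_vars)
    ) / len(means)
    return binary_cross_entropy(p, y) + beta * kl

# A near-deterministic score (very small variance) is penalized by the KL term.
collapsed = kl_to_standard_normal(0.0, -8.0)
matched = kl_to_standard_normal(0.0, 0.0)
loss = total_loss(0.9, 1, means=[0.0, 0.5], log_vars=[0.0, -1.0])
```

The `-log_var` term grows without bound as the variance shrinks toward zero, which is what discourages the distributions from collapsing to deterministic values.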

The implementation was developed in PyTorch and experiments were conducted on NVIDIA A100 GPUs. Compared to other attention-based MIL approaches, PSA-MIL required only marginally more computational resources while providing significantly better performance.

Critical Analysis

Despite its strong performance, PSA-MIL has several limitations worth considering. First, the model still requires a slide-level label for every training slide. This is far less demanding than the precise region annotations that remain expensive and time-consuming to obtain for gigapixel images, but it is still a labeling cost. A semi-supervised approach could potentially reduce it.

The paper primarily evaluated PSA-MIL on binary classification tasks (cancer vs. non-cancer), but many clinical applications require multi-class classification or regression. The authors acknowledged this limitation but didn't thoroughly explore how the probabilistic approach would scale to more complex prediction tasks.

Interpretability, while improved through the attention mechanism, remains imperfect. The model highlights regions of interest, but doesn't explicitly explain the reasoning behind these selections. For clinical deployment, more robust explanations would make it easier for pathologists to trust and work alongside the model.

The spatial contextual awareness is limited to local neighborhoods of patches. However, some pathological patterns span much larger regions, which might not be fully captured by the current approach. Future work could explore multi-scale spatial relationships to address this limitation.

Computational efficiency, while better than some competitors, still presents challenges for real-time analysis. The need to process thousands of patches sequentially creates bottlenecks that could limit clinical adoption where rapid diagnosis is essential.

Finally, the datasets used, while standard benchmarks, may not fully represent the diversity of real-world histopathology samples. More extensive validation across diverse patient populations and scanning equipment would strengthen confidence in the method's generalization capabilities.

Conclusion

PSA-MIL represents a significant advancement in computational pathology by addressing fundamental limitations in how AI systems analyze whole slide images. By modeling uncertainty through probabilistic attention and incorporating spatial context, the approach aligns more closely with how human pathologists examine tissue samples.

The improved performance on cancer detection tasks suggests that probabilistic methods may be essential for handling the inherent ambiguity in medical imaging. Not every region in a pathology slide carries the same diagnostic certainty, and PSA-MIL's ability to express confidence levels in its attention mechanism acknowledges this reality.

For the broader field of computational pathology, this research highlights the importance of moving beyond treating images as collections of independent patches. Tissue architecture and spatial relationships contain crucial diagnostic information that simpler models miss.

As digital pathology adoption grows worldwide, techniques like PSA-MIL could help address the increasing workload facing pathologists while improving diagnostic accuracy. The ability to quickly identify regions of interest in massive whole slide images could streamline workflows and potentially enable earlier cancer detection.

Future developments might combine this approach with multimodal data integration, incorporating genetic information and patient history to create more comprehensive diagnostic tools. As research continues to refine these methods, we can expect increasingly sophisticated AI assistants that augment rather than replace human expertise in pathology.

Original Paper

View on arXiv
