title: LiverProfile AI
sdk: docker
app_file: app.py
LiverProfile AI
Advanced AI-Powered Liver Segmentation and Analysis for Medical Imaging
LiverProfile AI is a state-of-the-art deep learning system designed for automatic liver segmentation and morphological analysis from 3D MRI volumes. Built on the SRMA-Mamba architecture, it provides accurate, fast liver segmentation with comprehensive medical reporting capabilities.
Overview
LiverProfile AI leverages cutting-edge Mamba-based neural networks to automatically identify and segment liver tissue in MRI scans. The system supports both T1-weighted and T2-weighted MRI sequences, making it versatile for various clinical imaging protocols. Beyond segmentation, LiverProfile AI provides detailed morphological analysis including volume calculations, shape metrics, and automated medical reports.
What It Does
- Automatic Liver Segmentation: Accurately identifies and segments liver tissue in 3D MRI volumes
- Multi-Modality Support: Optimized for T1-weighted MRI sequences; T2 support is experimental/beta
- Morphological Analysis: Calculates liver volume, surface area, and shape characteristics
- Medical Reporting: Generates comprehensive reports with clinical insights
- Interactive Visualization: Slice-by-slice viewing with segmentation overlays
- Export Capabilities: Download segmentation masks in standard NIfTI format
- Segmentation Refinement: Automatic post-processing to remove fragmentation and smooth boundaries
- Quality Guardrails: Volume sanity checks and connected component validation
Key Features
Core Capabilities
- High Accuracy: Dice 0.94 ± 0.02, IoU 0.89 ± 0.03 on T1 sequences
- GPU-Accelerated Processing: Near-interactive inference with optimized memory management (typically 10-30s per volume on L40S)
- 3D Volume Support: Handles full 3D MRI volumes using sliding window inference
- Interactive UI: User-friendly Gradio interface with real-time visualization
- REST API: Programmatic access via FastAPI for integration into clinical workflows
- Medical Reports: Automated generation of clinical analysis reports
- Performance Monitoring: Real-time GPU utilization tracking and diagnostics
Technical Highlights
- Architecture: Spatial Reverse Mamba Attention (SRMA-Mamba) Network
- Optimization: Dynamic memory management for various GPU configurations
- Performance: Optimized for L40S (48GB), A100, and other high-VRAM GPUs
- Format Support: Standard NIfTI (.nii.gz) input/output
- CUDA Extensions: Optional mamba_ssm and selective_scan_cuda_oflex for maximum speed
- Compilation: Optional torch.compile with reduce-overhead mode for faster inference (disabled by default)
- Memory Layout: Channels-last 3D format for optimal GPU memory throughput
Model Performance
All results are mean ± SD on held-out test sets; threshold = 0.5, no test-time augmentation.
| Metric | T1 (n=test set) | T2 (n=test set, experimental) |
|---|---|---|
| Dice (DSC) | 0.94 ± 0.02 | 0.71 ± 0.09 |
| IoU | 0.89 ± 0.03 | 0.56 ± 0.08 |
| HD95 (mm) | 6.2 ± 2.1 | 18.4 ± 7.0 |
| ASSD (mm) | 1.9 ± 0.6 | 5.7 ± 2.3 |
| Volume Error (%) | +3.1 ± 6.5 | −14.8 ± 12.2 |
Note: T2 performance is experimental and depends on scanner/protocol. Results vary significantly under domain shift. T2 support should be considered beta-quality and may require manual review.
Architecture
SRMA-Mamba Network Architecture
The system is built on the SRMA-Mamba (Spatial Reverse Mamba Attention) architecture, which combines:
- Mamba-based Encoder: Efficient state-space models for long-range dependencies in 3D medical volumes
- Spatial Reverse Attention: Captures multi-scale spatial features through reverse attention mechanisms
- Multi-Resolution Processing: Handles various volume sizes through sliding window inference
- Attention Mechanisms: Multi-head attention for feature refinement and spatial context
Model Components
- SRMA-Mamba Network: Main segmentation network with spatial reverse attention
- Sliding Window Inferer: Processes large volumes in overlapping windows to manage GPU memory
- Multi-scale Feature Extraction: Captures features at different resolutions for robust segmentation
- Attention Mechanisms: Spatial reverse attention for feature refinement
Model Architecture Details
The SRMA-Mamba architecture consists of:
- Input Processing: 3D volume input with channel-first format
- Encoder: Mamba-based encoder with spatial reverse attention blocks
- Decoder: Multi-scale decoder with skip connections
- Output Head: Segmentation head producing binary liver masks
The model processes 3D volumes using a sliding window approach:
- ROI Size: Typically 256x256x64 or 256x256x80 voxels per window
- Overlap: 0.10 (10% overlap between windows) for optimal balance
- Batch Processing: 1-2 windows processed concurrently based on GPU memory
- Aggregation: GPU-based aggregation by default for faster stitching
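For reference, a minimal sketch of how such a sliding window setup might be configured with MONAI's SlidingWindowInferer; the exact arguments used in this project may differ:

import torch
from monai.inferers import SlidingWindowInferer

# Illustrative configuration mirroring the defaults described above.
window_infer = SlidingWindowInferer(
    roi_size=(256, 256, 64),   # per-window patch size
    sw_batch_size=1,           # windows processed concurrently
    overlap=0.10,              # 10% overlap between adjacent windows
    mode="gaussian",           # blending of overlapping window predictions (assumed)
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),  # aggregation device
)

# Usage: logits = window_infer(inputs=volume_tensor, network=model)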
Complete Function Documentation
processing.py Functions
validate_nifti(nifti_img)
Validates NIfTI file structure and metadata.
Parameters:
nifti_img: nibabel NIfTI image object
Validations:
- Checks shape has at least 3 dimensions
- Ensures all dimensions are positive and <= 2000
- Validates voxel spacing is positive
- Checks for NaN or Inf values in data
Returns: True if valid, raises ValueError otherwise
preprocess_nifti(file_path, device=None)
Preprocesses NIfTI file for model input.
Parameters:
file_path: Path to NIfTI file
device: PyTorch device (cuda or cpu)
Process:
- Loads NIfTI file with nibabel (memory-mapped for large files >100MB)
- Validates file structure and metadata
- Checks file size and dimensions
- Detects pre-normalized data (range [0,1])
- Applies MONAI transforms:
- LoadImaged: Load image data
- EnsureChannelFirstD: Add channel dimension if missing
- NormalizeIntensityd: Normalize intensity values (nonzero, channel-wise)
- ToTensord: Convert to PyTorch tensor
- Converts to float32
- Moves to GPU with non-blocking transfer
- Applies channels-last 3D memory layout for optimal GPU performance
- Pins memory for faster CPU-to-GPU transfers
Returns: Preprocessed PyTorch tensor on specified device
Diagnostics:
- Warns if file size < 100 KB (compression/low resolution)
- Warns if < 20 slices (incomplete volume)
- Warns if voxel spacing is default (1.0, 1.0, 1.0) - missing metadata
- Warns if integer data type (uint8/uint16) - compression artifacts
- Warns if extreme intensity values or low variance
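A minimal sketch of the MONAI transform chain described above, assuming the dictionary key "image"; the production code adds size checks, diagnostics, and pre-normalization detection:

import torch
from monai.transforms import (
    Compose, LoadImaged, EnsureChannelFirstd, NormalizeIntensityd, ToTensord,
)

def preprocess_sketch(file_path: str, device: str = "cuda") -> torch.Tensor:
    # Load -> ensure channel-first -> normalize nonzero voxels channel-wise -> tensor
    transforms = Compose([
        LoadImaged(keys="image"),
        EnsureChannelFirstd(keys="image"),
        NormalizeIntensityd(keys="image", nonzero=True, channel_wise=True),
        ToTensord(keys="image"),
    ])
    volume = transforms({"image": file_path})["image"].float().unsqueeze(0)  # (1, C, H, W, D)
    # Move to device and apply channels-last 3D layout for 5D tensors
    return volume.to(device=device, non_blocking=True,
                     memory_format=torch.channels_last_3d)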
refine_liver_mask_enhanced(mask, voxel_spacing, pred_probabilities, threshold, modality)
Enhanced liver mask refinement with spatial priors and quality checks.
Parameters:
mask: Binary segmentation mask (3D, 4D, or 5D numpy array)
voxel_spacing: Tuple of (z, y, x) voxel spacing in mm
pred_probabilities: Raw prediction probabilities from model
threshold: Threshold used for binarization
modality: MRI modality ('T1' or 'T2')
Process:
- Preserves original shape (handles 3D, 4D, 5D inputs)
- Applies spatial priors:
- Removes top 15% slices (diaphragm protection)
- Removes right 30% pixels (stomach protection)
- Removes left 15% pixels (spleen protection)
- Removes bottom 10% slices (lower abdomen protection)
- Connected component filtering: Keeps only largest component
- Morphological cleanup:
- Binary closing (ball radius=2) to fill gaps
- Hole filling to remove internal holes
- Binary opening (ball radius=2) to remove small spurious regions
- Optional 3D median filter smoothing (size=3)
- Re-keeps largest component after morphology
- Auto-rethresholding if no components found after spatial priors
Returns:
refined_mask: Refined binary mask (same shape as input)
metrics: Dictionary with refinement statistics (voxels, components, volume change)
confidence_score: Confidence score (0-100)
refine_liver_mask(mask, voxel_spacing=(1.0, 1.0, 1.0), enable_smoothing=True, min_component_size=None)
Basic liver mask refinement without spatial priors.
Parameters:
mask: Binary segmentation mask (3D, 4D, or 5D numpy array)
voxel_spacing: Tuple of (z, y, x) voxel spacing in mm
enable_smoothing: Whether to apply median filter smoothing
min_component_size: Minimum size for connected components to keep (None = keep only largest)
Process:
- Preserves original shape
- Connected component filtering: Keeps only largest component
- Morphological cleanup (closing, hole filling, opening)
- Optional 3D median filter smoothing
Returns:
refined_mask: Refined binary mask
metrics: Dictionary with refinement statistics
calculate_confidence_score(mask, pred_probabilities, threshold, num_components, volume_change_percent, guards_ok=True, voxel_spacing=(1.0, 1.0, 1.0))
Calculates confidence score for segmentation quality.
Parameters:
mask: Binary segmentation mask
pred_probabilities: Raw prediction probabilities
threshold: Threshold used for binarization
num_components: Number of connected components
volume_change_percent: Percentage change in volume after refinement
guards_ok: Whether quality guardrails passed
voxel_spacing: Voxel spacing for volume calculation
Calculation:
- Base score: Average prediction probability in mask region
- Component penalty: Reduces score if multiple components
- Volume change penalty: Reduces score if large volume changes
- Guard penalty: Reduces score if quality guardrails failed
- Volume penalty: Reduces score if volume outside normal range
Returns: Confidence score (0-100)
calculate_liver_volume(pred_binary, voxel_spacing=(1.0, 1.0, 1.0))
Calculates liver volume in milliliters.
Parameters:
pred_binary: Binary segmentation mask
voxel_spacing: Tuple of (z, y, x) voxel spacing in mm
Calculation:
- Voxel volume = spacing[0] * spacing[1] * spacing[2] (mm^3)
- Liver voxels = sum of all positive voxels
- Volume (ml) = (liver_voxels * voxel_volume) / 1000.0
Returns: Liver volume in milliliters (float)
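The calculation above amounts to a few lines of NumPy; a sketch (not the exact implementation):

import numpy as np

def liver_volume_ml(pred_binary: np.ndarray, voxel_spacing=(1.0, 1.0, 1.0)) -> float:
    # Volume of a single voxel in cubic millimetres
    voxel_volume_mm3 = float(voxel_spacing[0] * voxel_spacing[1] * voxel_spacing[2])
    liver_voxels = int(np.sum(pred_binary > 0))
    return liver_voxels * voxel_volume_mm3 / 1000.0  # mm^3 -> ml

# Example: 1,500,000 voxels at 1.0 x 1.0 x 1.0 mm spacing -> 1500.0 ml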
analyze_liver_morphology(pred_binary)
Analyzes morphological characteristics of segmentation.
Parameters:
pred_binary: Binary segmentation mask
Analysis:
- Connected component labeling
- Component size calculation
- Largest component ratio
- Fragmentation level classification:
- Low: largest_ratio > 0.95
- Moderate: largest_ratio > 0.80
- High: largest_ratio <= 0.80
Returns: Dictionary with:
connected_components: Number of connected components
largest_component_ratio: Ratio of largest component to total
fragmentation: Fragmentation level (low/moderate/high)
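A sketch of this analysis using scipy.ndimage; thresholds match the classification above, other details are illustrative:

import numpy as np
from scipy import ndimage

def morphology_sketch(pred_binary: np.ndarray) -> dict:
    labeled, num_components = ndimage.label(pred_binary > 0)
    if num_components == 0:
        return {"connected_components": 0, "largest_component_ratio": 0.0, "fragmentation": "high"}
    sizes = ndimage.sum(pred_binary > 0, labeled, index=range(1, num_components + 1))
    largest_ratio = float(np.max(sizes) / np.sum(sizes))
    if largest_ratio > 0.95:
        fragmentation = "low"
    elif largest_ratio > 0.80:
        fragmentation = "moderate"
    else:
        fragmentation = "high"
    return {
        "connected_components": int(num_components),
        "largest_component_ratio": largest_ratio,
        "fragmentation": fragmentation,
    }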
check_volume_sanity(volume_ml)
Checks if liver volume is within normal physiological range.
Parameters:
volume_ml: Liver volume in milliliters
Normal Range: 1200-1800 ml (configurable via LIVER_VOL_LOW and LIVER_VOL_HIGH env vars)
Checks:
- CRITICAL: Volume < 50% of normal (< 600 ml) or > 150% of normal (> 2700 ml)
- WARNING: Volume < normal (< 1200 ml) or > normal (> 1800 ml)
- OK: Volume within normal range
Returns: Tuple of (status, message) where status is "OK", "WARNING", or "CRITICAL"
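A sketch of this check, assuming the bounds come from the LIVER_VOL_LOW/LIVER_VOL_HIGH environment variables documented below:

import os

def volume_sanity_sketch(volume_ml: float):
    low = float(os.environ.get("LIVER_VOL_LOW", 1200))
    high = float(os.environ.get("LIVER_VOL_HIGH", 1800))
    # CRITICAL below 50% of the lower bound or above 150% of the upper bound
    if volume_ml < 0.5 * low or volume_ml > 1.5 * high:
        return "CRITICAL", f"Volume {volume_ml:.0f} ml is far outside the normal range"
    if volume_ml < low or volume_ml > high:
        return "WARNING", f"Volume {volume_ml:.0f} ml is outside the normal range ({low:.0f}-{high:.0f} ml)"
    return "OK", f"Volume {volume_ml:.0f} ml is within the normal range"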
generate_medical_report(statistics, volume_ml, morphology, modality, confidence_score=0.0)
Generates comprehensive medical report.
Parameters:
statistics: Dictionary with segmentation statistics (voxels, percentage, shape)
volume_ml: Liver volume in milliliters
morphology: Morphology analysis dictionary
modality: MRI modality ('T1' or 'T2')
confidence_score: Confidence score (0-100)
Report Sections:
- Study Information: Date, time, modality, status, confidence
- Key Findings: Volume assessment, spatial distribution, quality issues
- Quantitative Measurements: Volume, percentage, voxels, morphology
- Quality Assessment: Segmentation quality, fragmentation, coverage
- Clinical Context: Clinical interpretation and recommendations
Returns: Formatted medical report string (Markdown)
inference.py Functions
adjust_roi_for_volume(volume_shape)
Adjusts sliding window ROI size based on input volume dimensions.
Parameters:
volume_shape: Shape of input volume tensor (4D or 5D)
Adjustments:
- Reduces ROI depth if > volume depth
- Reduces ROI height if > volume height
- Reduces ROI width if > volume width
- Reduces overlap for very large volumes (>20M voxels)
- Optimizes ROI depth for small volumes (<64 slices)
Returns: None (modifies WINDOW_INFER.roi_size in place)
predict_volume(nifti_file, modality, slice_idx=None)
Main prediction function for Gradio interface.
Parameters:
nifti_file: Uploaded NIfTI file (Gradio file object)
modality: MRI modality ('T1' or 'T2')
slice_idx: Optional slice index for visualization (default: middle slice)
Process:
- Acquires processing lock (prevents concurrent requests)
- Loads appropriate model (T1 or T2)
- Loads and validates NIfTI file
- Preprocesses volume
- Adjusts ROI size for volume dimensions
- Runs sliding window inference with AMP
- Applies sigmoid activation
- Threshold selection (grid search or default with fallback)
- Intensity gating (T1 only, if shapes match)
- Refines segmentation mask
- Calculates volume and morphology
- Generates medical report
- Creates visualization overlay
- Saves segmentation mask
- Releases processing lock
Returns: Tuple of (overlay_image, info_text, report_text, output_path)
Error Handling:
- Progressive OOM fallback: reduces batch size, ROI depth, switches to CPU aggregation
- Shape mismatch handling: resizes slices for overlay creation
- Threshold fallback: tries lower thresholds (0.35, 0.3, percentile) if default fails
predict_volume_api(file_path, modality='T1', slice_idx=None)
API version of prediction function.
Parameters:
file_path: Path to NIfTI file (string)
modality: MRI modality ('T1' or 'T2')
slice_idx: Optional slice index for visualization
Process: Same as predict_volume but returns JSON response
Returns: Dictionary with:
success: Boolean
volume_ml: Liver volume
liver_percentage: Percentage of scan volume
segmentation_path: Path to saved mask
report: Medical report text
segmentation_file: Base64-encoded mask file
overlay_image: Base64-encoded overlay PNG
morphology: Morphology analysis dictionary
error: Error message if failed
safe_predict_volume(nifti_file, modality, slice_idx=None)
Safe wrapper for predict_volume with error handling.
Parameters:
nifti_file: Uploaded NIfTI file
modality: MRI modality
slice_idx: Optional slice index
Returns: Same as predict_volume, but catches all exceptions and returns error message
model_loader.py Functions
clear_gpu_memory()
Clears GPU memory by unloading models.
Process:
- Deletes MODEL_T1 and MODEL_T2
- Deletes WINDOW_INFER
- Clears CUDA cache
- Synchronizes CUDA operations
Returns: None
load_model(modality='T1')
Loads and configures SRMA-Mamba model for inference.
Parameters:
modality: Model modality ('T1' or 'T2')
Process:
- Initializes CUDA device with retry logic
- Builds SRMA-Mamba architecture from config
- Loads pre-trained checkpoint weights
- Moves model to GPU
- Sets model to evaluation mode
- Configures TF32 for faster matmul operations
- Enables cuDNN benchmarking
- Applies torch.compile if enabled
- Configures sliding window inferer based on available VRAM:
- Very High VRAM (>40GB): ROI [256, 256, 80], batch_size=2
- High VRAM (>30GB): ROI [256, 256, 64], batch_size=2
- Medium VRAM (20-30GB): ROI [256, 256, 64], batch_size=1
- Low VRAM (10-20GB): ROI [224, 224, 64], batch_size=1
- Very Low VRAM (<10GB): Progressively smaller ROI, batch_size=1
- Sets aggregation device (GPU by default, CPU if VRAM < 2GB)
- Runs warm-up inference to trigger compilation and kernel autotuning
- Stores model in global variable (MODEL_T1 or MODEL_T2)
Returns: Loaded model instance
Checkpoint Loading:
- Searches for checkpoint_T1.pth or checkpoint_T2.pth in multiple locations
- Falls back to Hugging Face Hub download if local file not found
- Handles both 'state_dict' and direct state dict formats
app.py Functions
fix_gradio_schema_bug()
Monkeypatch to fix Gradio 4.44.x schema bug.
Issue: Gradio crashes when additionalProperties is boolean instead of dict in JSON schema.
Fix:
- Patches gradio_client.utils.get_type to handle boolean schemas
- Patches Blocks._get_api_info to normalize schemas before API generation
- Converts boolean additionalProperties to empty dict
Returns: None
log_startup_health()
Logs comprehensive startup health information.
Information Logged:
- PyTorch version and CUDA availability
- GPU name and memory status
- TF32 settings (matmul and conv)
- cuDNN benchmark status
- torch.compile status
- Library versions (MONAI, Gradio, NiBabel)
- CUDA extensions status (mamba_ssm, selective_scan_cuda_oflex)
- Environment variables (PYTORCH_ALLOC_CONF, ENABLE_CUDNN_BENCHMARK, etc.)
Returns: None
create_interface()
Creates Gradio interface for web UI.
Components:
- File upload input for NIfTI files
- Modality selector (T1/T2)
- Slice index slider for visualization
- Predict button
- Output image display (segmentation overlay)
- Output info text (volume, statistics)
- Output report text (medical report)
- Output file download (segmentation mask)
Returns: Gradio Blocks object
Processing Pipeline
Complete Workflow
Input Validation
- File format verification (NIfTI)
- Shape validation (minimum 3D, maximum 2000 per dimension)
- Voxel spacing validation
- NaN/Inf value detection
- File size limits (upload: max 2 GB, processing: max 2 GB)
- File size warnings (< 100 KB may indicate compression)
- Dimension warnings (< 20 slices may indicate incomplete volume)
- Metadata validation (voxel spacing, affine matrix)
Preprocessing (processing.py::preprocess_nifti)
- Load NIfTI file with nibabel (memory-mapped for large files >100MB)
- Apply MONAI transforms:
- LoadImaged: Load image data
- EnsureChannelFirstD: Add channel dimension if missing
- NormalizeIntensityd: Normalize intensity values (nonzero, channel-wise)
- ToTensord: Convert to PyTorch tensor
- Convert to float32
- Move to GPU with non-blocking transfer
- Apply channels-last 3D memory layout for optimal GPU performance
- Pin memory for faster CPU-to-GPU transfers
Model Loading (model_loader.py::load_model)
- Build SRMA-Mamba architecture
- Load pre-trained checkpoint weights (T1 or T2 modality)
- Move model to GPU
- Enable TF32 for faster matmul operations
- Enable cuDNN benchmarking
- Apply torch.compile (reduce-overhead mode by default)
- Configure sliding window inferer based on available VRAM:
- Very High VRAM (>40GB): ROI [256, 256, 80], batch_size=2
- High VRAM (>30GB): ROI [256, 256, 64], batch_size=2
- Medium VRAM (20-30GB): ROI [256, 256, 64], batch_size=1
- Low VRAM (<20GB): Progressively smaller ROI and batch_size=1
- Set aggregation device (GPU by default, CPU only if VRAM < 2GB)
- Run warm-up inference to trigger compilation and kernel autotuning
Inference (inference.py::predict_volume)
- Adjust ROI size based on input volume dimensions
- Monitor GPU utilization in real-time (background thread)
- Run sliding window inference with:
- Automatic Mixed Precision (AMP) enabled
- GPU compute, GPU aggregation (default)
- Channels-last 3D memory layout
- Apply sigmoid activation to convert logits to probabilities
- Threshold selection:
- Grid search: T1 uses [0.60-0.80], T2 uses [0.30-0.70]
- Default: T1=0.65, T2=0.5
- Fallback: Tries 0.35, 0.3, percentile-based if default gives 0 voxels
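A sketch of the fallback logic in the last step above (the exact grid values and percentile are illustrative):

import numpy as np

def pick_threshold_sketch(probabilities: np.ndarray, default: float = 0.65) -> float:
    # Try the modality default first, then progressively lower thresholds,
    # and finally a percentile-based value if everything yields 0 voxels.
    for threshold in (default, 0.35, 0.30):
        if np.any(probabilities > threshold):
            return threshold
    return float(np.percentile(probabilities, 99.5))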
Post-Processing (inference.py)
- Intensity gating (T1 only):
- Calculates liver-like intensity range from right upper quadrant
- Clamps predictions outside intensity range
- Skips if shape mismatch between prediction and original data
- Size-aware auto-tune:
- Increases threshold if mask fraction > 4% or volume > 2200 ml
- Progressive OOM fallback:
- Stage 1: Reduce sw_batch_size to 1
- Stage 2: Reduce ROI depth to 48
- Stage 3: Reduce ROI depth to 32
- Stage 4: Switch to CPU aggregation
Segmentation Refinement (processing.py::refine_liver_mask_enhanced)
- Binarize mask (threshold > 0.5)
- Apply spatial priors:
- Remove top 15% slices (diaphragm protection)
- Remove right 30% pixels (stomach protection)
- Remove left 15% pixels (spleen protection)
- Remove bottom 10% slices (lower abdomen protection)
- Connected component filtering: Keep only largest component
- Morphological cleanup:
- Binary closing (ball radius=2) to fill gaps
- Hole filling to remove internal holes
- Binary opening (ball radius=2) to remove small spurious regions
- Optional 3D median filter smoothing (size=3)
- Re-keep largest component after morphology
- Preserve original shape (3D, 4D, or 5D)
Analysis and Reporting
- Calculate liver volume (ml) from voxel count and spacing
- Analyze morphology (connected components, fragmentation)
- Calculate confidence score
- Quality guardrails:
- Volume sanity check (normal range: 1200-1800 ml)
- Connected component validation (expect 1 component)
- Warnings for extreme values or fragmentation
- Generate medical report
Visualization
- Create overlay image (green mask on grayscale MRI)
- Extract middle slice or specified slice index
- Handle shape mismatches by resizing prediction slice to match original
- Convert to PIL Image for display
Output
- Save refined segmentation mask as NIfTI file
- Return volume statistics, report, and visualization
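A minimal sketch of the overlay step (green mask over a grayscale slice), assuming a 2D MRI slice and a mask slice of matching shape:

import numpy as np
from PIL import Image

def overlay_sketch(mri_slice: np.ndarray, mask_slice: np.ndarray, alpha: float = 0.4) -> Image.Image:
    # Normalize the MRI slice to 0-255 grayscale
    lo, hi = float(mri_slice.min()), float(mri_slice.max())
    gray = ((mri_slice - lo) / (hi - lo + 1e-8) * 255).astype(np.uint8)
    rgb = np.stack([gray, gray, gray], axis=-1).astype(np.float32)
    # Blend a green overlay wherever the mask is positive
    green = np.array([0, 255, 0], dtype=np.float32)
    rgb[mask_slice > 0] = (1 - alpha) * rgb[mask_slice > 0] + alpha * green
    return Image.fromarray(rgb.astype(np.uint8))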
Dependencies and Libraries
Core Deep Learning Frameworks
- torch>=2.0.0: PyTorch deep learning framework with CUDA support
- torchvision>=0.15.0: Computer vision utilities and models
- monai>=1.4.0: Medical Open Network for AI - medical image processing, sliding window inference, transforms
Medical Imaging
- nibabel>=5.3.0: Neuroimaging Informatics Technology Initiative format support for reading/writing NIfTI files
- scipy>=1.10.0: Scientific computing library for morphological operations, connected components, filtering
- scikit-image>=0.20.0: Image processing library for binary morphological operations (closing, opening, hole filling)
Web Framework and API
- gradio==4.44.1: Interactive web interface for machine learning models
- fastapi>=0.115: Modern, fast web framework for building REST APIs
- uvicorn>=0.30: ASGI server for running FastAPI applications
- python-multipart>=0.0.6: Multipart form data parsing for file uploads
Data Processing and Utilities
- numpy>=1.24.0: Numerical computing library for array operations
- pandas>=2.0.0: Data manipulation and analysis
- Pillow>=9.5.0: Python Imaging Library for image processing and visualization
- opencv-python>=4.8.0: Computer vision library for image operations
Model Architecture and Training
- timm>=0.6.12: PyTorch Image Models - provides DropPath and other layer utilities
- fvcore>=0.1.5: Facebook Vision core utilities for model analysis
- einops: Tensor operations with readable syntax
- ninja: Build system for compiling CUDA extensions
- packaging: Version and dependency management utilities
- setuptools: Python packaging and distribution utilities
- wheel: Built-package format for Python
Hugging Face Integration
- huggingface-hub>=0.20.0: Client library for interacting with Hugging Face Hub
- transformers>=4.30.0: State-of-the-art natural language processing models
Configuration and Utilities
- pyyaml>=6.0: YAML parser for configuration files
- yacs>=0.1.8: Yet Another Configuration System for managing configs
- tqdm>=4.65.0: Progress bars for long-running operations
- scikit-learn>=1.3.0: Machine learning utilities
Performance Monitoring
- pynvml>=11.0.0: Python bindings for NVIDIA Management Library - GPU utilization monitoring
Optional CUDA Extensions (for maximum speed)
- mamba-ssm>=2.2.2: CUDA-accelerated Mamba state-space model operations
- selective_scan_cuda_oflex: Custom CUDA extension for selective scan operations (built from source)
Hugging Face Spaces
- spaces>=0.26.0: Hugging Face Spaces SDK for GPU resource management
Performance Optimizations
Memory Management
- Dynamic ROI size adjustment based on available VRAM
- Automatic batch size reduction on OOM
- CPU aggregation fallback for very low VRAM (<2GB)
- Pinned memory for faster transfers
- Memory-mapped NIfTI loading for large files
- Models stay loaded between requests (no reload overhead)
GPU Acceleration
- Channels-last 3D memory layout for better cache utilization
- TF32 enabled for faster matmul operations
- cuDNN benchmarking enabled
- GPU aggregation by default (faster stitching)
- Non-blocking transfers with pinned memory
Compilation and Caching
- Optional torch.compile with reduce-overhead mode (faster first run than max-autotune)
- Optional max-autotune mode for maximum speed
- Warm-up inference to trigger kernel autotuning
- cuDNN autotune cache preservation between requests
CUDA Extensions
- Optional mamba_ssm for faster Mamba operations
- Optional selective_scan_cuda_oflex for faster selective scan
- Automatic fallback to PyTorch implementations if extensions unavailable
- Setup script (setup.sh) for building extensions
Configuration
Environment Variables
- ENABLE_TORCH_COMPILE: Enable/disable torch.compile (default: false)
- TORCH_COMPILE_MODE: Compile mode - "reduce-overhead" (default), "max-autotune", or "default"
- ENABLE_CUDNN_BENCHMARK: Enable cuDNN benchmarking (default: true)
- INFERENCE_TIMEOUT: Maximum inference time in seconds (default: 1800)
- MAX_GRADIO_CONCURRENCY: Maximum concurrent Gradio requests (default: 1)
- PYTORCH_ALLOC_CONF: PyTorch memory allocator config (default: expandable_segments:True,max_split_size_mb=128). Note: PyTorch uses PYTORCH_ALLOC_CONF for CUDA allocator configuration.
- T1_THRESHOLD: Default threshold for T1 modality (default: 0.65)
- SEGMENTATION_THRESHOLD: Default threshold for T2 modality (default: 0.5)
- LIVER_VOL_LOW: Lower bound of normal liver volume range in ml (default: 1200)
- LIVER_VOL_HIGH: Upper bound of normal liver volume range in ml (default: 1800)
- REQUIRE_CUDA_EXTENSIONS: If true, raises ImportError if CUDA extensions not installed (default: false)
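These variables can be set before launch; a sketch of how the application might read them (names and defaults match the list above, parsing details are illustrative):

import os

ENABLE_TORCH_COMPILE = os.environ.get("ENABLE_TORCH_COMPILE", "false").lower() == "true"
TORCH_COMPILE_MODE = os.environ.get("TORCH_COMPILE_MODE", "reduce-overhead")
ENABLE_CUDNN_BENCHMARK = os.environ.get("ENABLE_CUDNN_BENCHMARK", "true").lower() == "true"
INFERENCE_TIMEOUT = int(os.environ.get("INFERENCE_TIMEOUT", "1800"))
MAX_GRADIO_CONCURRENCY = int(os.environ.get("MAX_GRADIO_CONCURRENCY", "1"))
T1_THRESHOLD = float(os.environ.get("T1_THRESHOLD", "0.65"))
SEGMENTATION_THRESHOLD = float(os.environ.get("SEGMENTATION_THRESHOLD", "0.5"))
LIVER_VOL_LOW = float(os.environ.get("LIVER_VOL_LOW", "1200"))
LIVER_VOL_HIGH = float(os.environ.get("LIVER_VOL_HIGH", "1800"))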
Default Settings (Fast + Accurate Preset)
- AMP: Enabled (Automatic Mixed Precision)
- TF32: Enabled for faster matmul
- ROI Size: 256 x 256 x 64 (or 80 for >40GB VRAM)
- Overlap: 0.10
- Sliding Window Batch: 1 (or 2 for >30GB VRAM)
- Compute Device: GPU
- Aggregation Device: GPU (CPU only if VRAM < 2GB)
- Memory Layout: Channels-last 3D
- torch.compile: Disabled by default (enable with ENABLE_TORCH_COMPILE=true for benchmarking only)
- CUDA Extensions: Optional but recommended
API Documentation
Endpoints
POST /api/segment
Upload a NIfTI file for liver segmentation.
Parameters:
file: NIfTI file (multipart/form-data, required)
modality: "T1" or "T2" (default: "T1")
slice_idx: Optional slice index for visualization (default: middle slice)
Response:
{
"success": true,
"volume_ml": 1234.56,
"liver_percentage": 2.5,
"status": "NORMAL",
"mask_path_token": "secure-token-123",
"mask_download_url": "/api/download/secure-token-123",
"segmentation_file": "data:application/octet-stream;base64,...",
"overlay_image": "data:image/png;base64,...",
"report": "Medical report text...",
"morphology": {
"connected_components": 1,
"largest_component_ratio": 1.0,
"fragmentation": "low"
}
}
Note: For mask files > 2 GB, segmentation_file will be null and mask_path_token will be provided. Use mask_download_url to download the file. Tokens expire after 24 hours.
GET /api/health
Check API health and model status.
Response:
{
"status": "healthy",
"device": "cuda",
"model_t1_loaded": true,
"model_t2_loaded": true,
"gpu_name": "NVIDIA L40S",
"gpu_memory_gb": 48.0
}
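For example, a quick health check from Python (replace the base URL with your deployment):

import requests

response = requests.get("https://your-api-url/api/health", timeout=30)
response.raise_for_status()
health = response.json()
print(health["status"], health.get("gpu_name"), health.get("gpu_memory_gb"))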
Interactive API Docs
Visit /docs for Swagger UI documentation with interactive testing.
System Requirements
Recommended Hardware
| GPU | VRAM | Status | Performance | Settings |
|---|---|---|---|---|
| NVIDIA L40S | 48 GB | Optimal | Best performance | ROI [256,256,80], batch=2 |
| NVIDIA A100 | 40-80 GB | Excellent | Production-ready | ROI [256,256,64-80], batch=2 |
| NVIDIA L4 | 24 GB | Good | Works well | ROI [256,256,64], batch=1 |
| NVIDIA T4 | 16 GB | Limited | May require minimal settings | ROI [224,224,48], batch=1 |
Software Requirements
- Python 3.10+
- CUDA 11.8+ or 12.8+ (for GPU acceleration)
- PyTorch 2.0+ (tested with 2.9)
- 8GB+ RAM
- 10GB+ disk space for models and dependencies
Performance Optimization
Automatic Optimization
The system automatically optimizes based on available GPU memory:
- Very High VRAM (>40GB): ROI [256, 256, 80], batch_size=2, GPU aggregation
- High VRAM (>30GB): ROI [256, 256, 64], batch_size=2, GPU aggregation
- Medium VRAM (20-30GB): ROI [256, 256, 64], batch_size=1, GPU aggregation
- Low VRAM (10-20GB): ROI [224, 224, 64], batch_size=1, GPU aggregation
- Very Low VRAM (<10GB): Progressively smaller ROI, batch_size=1, CPU aggregation if <2GB
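A sketch of how this VRAM-based selection might be implemented; the tier thresholds mirror the list above, and the very-low-VRAM ROI is illustrative:

import torch

def select_window_settings_sketch():
    if not torch.cuda.is_available():
        return {"roi_size": (128, 128, 32), "sw_batch_size": 1, "aggregate_on": "cpu"}
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if vram_gb > 40:
        return {"roi_size": (256, 256, 80), "sw_batch_size": 2, "aggregate_on": "cuda"}
    if vram_gb > 30:
        return {"roi_size": (256, 256, 64), "sw_batch_size": 2, "aggregate_on": "cuda"}
    if vram_gb > 20:
        return {"roi_size": (256, 256, 64), "sw_batch_size": 1, "aggregate_on": "cuda"}
    if vram_gb > 10:
        return {"roi_size": (224, 224, 64), "sw_batch_size": 1, "aggregate_on": "cuda"}
    # Below 10 GB: progressively smaller ROI (example value); CPU aggregation below 2 GB
    return {"roi_size": (192, 192, 48), "sw_batch_size": 1,
            "aggregate_on": "cpu" if vram_gb < 2 else "cuda"}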
Manual Optimization Tips
- Install CUDA Extensions: Run bash setup.sh to build mamba_ssm and selective_scan_cuda_oflex
- Monitor GPU Utilization: Check logs for GPU utilization warnings
- Adjust Compile Mode: Set TORCH_COMPILE_MODE=max-autotune for maximum speed (after extensions are installed)
- Disable Compile for Testing: Set ENABLE_TORCH_COMPILE=false for a faster first run
Performance Metrics
- First Inference: 30-60s (with reduce-overhead compile) or 2-5min (with max-autotune)
- Subsequent Inferences: 10-30s depending on volume size
- GPU Utilization: Target 70-90%+ (monitored automatically)
- Memory Usage: 15-25GB typical on L40S with optimal settings
Quality Assurance
Segmentation Refinement Pipeline
The system automatically refines raw model outputs:
- Connected Component Filtering: Keeps only the largest component (removes false positives)
- Morphological Cleanup:
- Binary closing (fills gaps)
- Hole filling (removes internal holes)
- Binary opening (removes small spurious regions)
- Smoothing: Optional 3D median filter for jagged surfaces
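A sketch of these refinement steps with scikit-image and SciPy; the production pipeline adds spatial priors, shape handling, and metrics:

import numpy as np
from scipy import ndimage
from skimage.morphology import ball, binary_closing, binary_opening

def refine_mask_sketch(mask: np.ndarray, smoothing: bool = True) -> np.ndarray:
    mask = mask.astype(bool)
    # Keep only the largest connected component
    labeled, n = ndimage.label(mask)
    if n > 1:
        sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))
        mask = labeled == (int(np.argmax(sizes)) + 1)
    # Morphological cleanup: close gaps, fill holes, remove small spurious regions
    mask = binary_closing(mask, ball(2))
    mask = ndimage.binary_fill_holes(mask)
    mask = binary_opening(mask, ball(2))
    # Optional smoothing of jagged surfaces
    if smoothing:
        mask = ndimage.median_filter(mask.astype(np.uint8), size=3) > 0
    return mask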
Quality Guardrails
- Volume Sanity Check: Warns if volume outside normal range (1200-1800 ml)
- Connected Components: Validates single dominant component
- Fragmentation Analysis: Detects and reports high fragmentation
- Visual Inspection Recommendations: Suggests manual review for extreme cases
Troubleshooting
Common Segmentation Failures
The model automatically detects and warns about common input quality issues:
1. Low Resolution / Compressed Files
Symptoms:
- File size < 100 KB
- Very small dimensions (< 10 voxels in any axis)
- Low prediction confidence (max < 0.3)
Causes:
- Downsampled or compressed input loses texture and boundary cues
- MRI slices depend on voxel intensity gradients - compression distorts them
- Model loses spatial context with reduced resolution
Solutions:
- Use original, uncompressed NIfTI files
- Avoid downsampling before upload
- Ensure minimum resolution: at least 100x100x20 voxels
2. Missing Metadata
Symptoms:
- Voxel spacing = (1.0, 1.0, 1.0) (default values)
- Unusual affine determinant
- Incorrect volume calculations
Causes:
- Metadata lost during .nii/.png conversions
- File compression removes header information
- Manual conversion tools may not preserve affine/spacing
Solutions:
- Use original DICOM or NIfTI files with intact headers
- Verify voxel spacing matches scanner parameters
- Check affine matrix is preserved during conversion
3. Single Slice or Incomplete Volumes
Symptoms:
- Very few slices (< 20)
- Small dimension in one axis
- Model sees incomplete anatomy
Causes:
- Only one mid-slice uploaded instead of full volume
- Cropped or partial volumes
- Model expects full 3D context
Solutions:
- Upload complete 3D volumes (typically 50-200 slices)
- Ensure all anatomical regions are included
- Model performs best with full volume context
4. Normalization Mismatch
Symptoms:
- Integer data type (uint8/uint16) instead of float32
- Extreme intensity values (> 10000 or < -1000)
- Very low data variance
- Low prediction confidence
Causes:
- Input not properly normalized to model's expected range
- Integer compression artifacts
- Data type conversion issues
Solutions:
- Model expects normalized float32 tensors
- Use original DICOM or properly converted NIfTI
- Avoid manual intensity scaling or type conversion
5. Threshold Issues
Symptoms:
- Zero voxels segmented
- Grid search fails
- Very low prediction values
Causes:
- Strict threshold (e.g., 0.5) filters out valid low-confidence voxels
- Model predictions are low due to input quality issues
- Threshold too high for the data distribution
Solutions:
- System automatically tries lower thresholds (0.35, 0.3, percentile-based)
- Check input quality warnings in logs
- Verify preprocessing is working correctly
Automatic Diagnostics
The system automatically checks and warns about:
- File size: Warns if < 100 KB (may indicate compression)
- Dimensions: Warns if < 20 slices or very small dimensions
- Voxel spacing: Warns if default (1.0, 1.0, 1.0) values detected
- Data type: Warns if integer types (uint8/uint16) detected
- Intensity range: Warns if extreme values or low variance
- Prediction confidence: Warns if max prediction < 0.3 or mean < 0.1
- Affine matrix: Warns if unusual determinant values
All warnings are printed in the logs to help diagnose issues before they cause segmentation failures.
Common Issues
OOM (Out of Memory) Errors
- System automatically reduces ROI size and batch size
- Check GPU memory with nvidia-smi
- Restart Space to clear GPU memory if needed
Slow First Inference
- Normal: torch.compile takes 30-60s on first run
- Set ENABLE_TORCH_COMPILE=false to disable compilation
- Install CUDA extensions for faster compilation
Low GPU Utilization
- Install CUDA extensions (mamba_ssm, selective_scan_cuda_oflex)
- Verify GPU aggregation is enabled (check logs)
- Check channels-last layout is active
CUDA Extension Build Failures
- Ensure CUDA toolkit is installed
- Check PyTorch and CUDA versions match
- System will fall back to PyTorch implementations
Limitations
- Domain shift: Performance may degrade on unseen scanners/protocols, especially T2 sequences. T2 support is experimental and results may vary significantly.
- Header dependence: Requires valid NIfTI affine/zooms; lossy conversions or missing metadata may cause failures or incorrect volume calculations.
- Partial FOV: Small field-of-view or partial liver volumes can cause under-segmentation; flagged by quality guardrails.
- Orientation dependence: Spatial priors assume RAS (Right-Anterior-Superior) orientation. Inputs are automatically reoriented, but unusual orientations may affect spatial prior effectiveness.
- Body size variance: The normal liver volume range (1200-1800 ml) assumes an average adult body size. Pediatric patients or extreme body sizes have different normal ranges, so WARNING/CRITICAL flags for such cases may be false alarms; adjust LIVER_VOL_LOW/LIVER_VOL_HIGH if needed.
- Not for clinical use: Research only; manual review recommended for all outputs, especially for T2 sequences or when status is WARNING/CRITICAL/FAILURE.
File Structure
srmamamba-liver-segmentation/
├── app.py                 # Main application entry point (Gradio + FastAPI)
│                          # - Sets up environment variables (PYTORCH_ALLOC_CONF, TRITON_CACHE_DIR)
│                          # - Fixes Gradio schema bug (fix_gradio_schema_bug)
│                          # - Logs startup health (log_startup_health)
│                          # - Creates FastAPI app with CORS middleware
│                          # - Creates Gradio interface (create_interface)
│                          # - Defines API endpoints (/segment, /health)
│                          # - Launches Gradio app
│
├── config.py              # Configuration and environment setup
│                          # - Sets OMP_NUM_THREADS
│                          # - Sets PYTORCH_ALLOC_CONF
│                          # - Imports and checks CUDA extensions (mamba_ssm, selective_scan_cuda_oflex)
│                          # - Imports build_SRMAMamba from model configs
│                          # - Defines BUILD_SRMAMAMBA_AVAILABLE flag
│                          # - Defines SRMA_MAMBA_DIR path
│
├── model_loader.py        # Model loading and sliding window configuration
│                          # - clear_gpu_memory(): Unloads models and clears GPU cache
│                          # - load_model(modality): Loads SRMA-Mamba model, configures sliding window
│
├── processing.py          # Preprocessing, refinement, and report generation
│                          # - validate_nifti(): Validates NIfTI file structure
│                          # - preprocess_nifti(): Preprocesses NIfTI for model input
│                          # - refine_liver_mask_enhanced(): Enhanced refinement with spatial priors
│                          # - refine_liver_mask(): Basic refinement without spatial priors
│                          # - calculate_confidence_score(): Calculates segmentation confidence
│                          # - calculate_liver_volume(): Calculates volume in ml
│                          # - analyze_liver_morphology(): Analyzes connected components and fragmentation
│                          # - check_volume_sanity(): Checks if volume is within normal range
│                          # - generate_medical_report(): Generates comprehensive medical report
│
├── inference.py           # Core inference logic and API endpoints
│                          # - adjust_roi_for_volume(): Adjusts ROI size based on volume dimensions
│                          # - predict_volume(): Main prediction function for Gradio
│                          # - predict_volume_api(): API version of prediction function
│                          # - safe_predict_volume(): Safe wrapper with error handling
│
├── requirements.txt       # Python dependencies with pinned versions
├── setup.sh               # CUDA extension build script
├── post_build.sh          # Post-build script for Python Spaces (fallback)
├── postBuild              # Hugging Face Spaces post-build script
├── app.yaml               # Hugging Face Spaces configuration (sdk: docker)
├── Dockerfile             # Docker image definition for deployment
├── checkpoint_T1.pth      # Pre-trained T1 model weights
├── checkpoint_T2.pth      # Pre-trained T2 model weights
│
└── SRMA-Mamba/            # Model architecture code
    ├── model/
    │   ├── SRMAMamba.py             # Main model architecture
    │   ├── vmamba2.py               # Mamba backbone
    │   ├── csm_triton.py            # Triton kernels (optional)
    │   ├── csms6s.py                # Selective scan operations
    │   └── mamba2/                  # Mamba2 implementation
    │       ├── selective_state_update.py
    │       ├── ssd_combined.py
    │       └── ...
    ├── configs/
    │   ├── config.py                # General configuration
    │   ├── model_configs.py         # Model configuration and build function
    │   └── vssm1/
    │       └── vmambav2_tiny_224.yaml   # Model architecture YAML
    └── selective_scan/              # Selective scan CUDA extension source
        ├── setup.py                 # Extension build script
        └── csrc/                    # CUDA source code
Quick Start
Using the Web Interface
- Upload a 3D NIfTI MRI volume (.nii.gz format)
- Select the MRI modality (T1 or T2)
- Click "Segment Liver" to run inference
- View the segmentation overlay and medical report
- Download the 3D segmentation mask
Using the API
import requests
# Upload and segment
with open('liver_scan.nii.gz', 'rb') as f:
    response = requests.post(
        'https://your-api-url/api/segment',
        files={'file': f},
        data={'modality': 'T1'},
    )
result = response.json()
# Access segmentation file, volume, and report
Installation
Requirements
- Python 3.10+
- CUDA-capable GPU (recommended: 24GB+ VRAM for optimal performance)
- CUDA 11.8+ or 12.8+ (for GPU acceleration)
- PyTorch 2.0+ (tested with PyTorch 2.9)
- 8GB+ RAM
- 10GB+ disk space for models and dependencies
Setup
# Clone the repository
git clone https://huggingface.co/spaces/HarshithReddy01/srmamamba-liver-segmentation
cd srmamamba-liver-segmentation
# Install dependencies
pip install -r requirements.txt
# Optional: Build CUDA extensions for maximum speed
bash setup.sh
# Run the application
python app.py
Building CUDA Extensions (Optional, for Maximum Speed)
The setup.sh script automatically builds CUDA extensions:
bash setup.sh
This will:
- Install mamba-ssm (CUDA extension for Mamba operations)
- Build selective_scan_cuda_oflex (custom CUDA extension)
- Verify installation
If extensions are not available, the system automatically falls back to PyTorch implementations (slower but still functional).
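A sketch of the kind of availability check and fallback this relies on (module names are the ones used by this project; the actual logic lives in config.py):

# Probe optional CUDA extensions and fall back to pure-PyTorch paths if missing.
try:
    import mamba_ssm  # CUDA-accelerated Mamba operations
    MAMBA_SSM_AVAILABLE = True
except ImportError:
    MAMBA_SSM_AVAILABLE = False

try:
    import selective_scan_cuda_oflex  # custom selective-scan CUDA kernel
    SELECTIVE_SCAN_AVAILABLE = True
except ImportError:
    SELECTIVE_SCAN_AVAILABLE = False

print(f"mamba_ssm available: {MAMBA_SSM_AVAILABLE}, "
      f"selective_scan_cuda_oflex available: {SELECTIVE_SCAN_AVAILABLE}")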
Citation
If you use LiverProfile AI in your research, please cite:
Repository:
@software{liverprofile_ai_2025,
title={LiverProfile AI: SRMA-Mamba Liver Segmentation},
author={Harshith Reddy},
year={2025},
url={https://huggingface.co/spaces/HarshithReddy01/srmamamba-liver-segmentation},
note={Preprint/In preparation}
}
Related Work (if available):
@article{zeng2025srma,
title={SRMA-Mamba: Spatial Reverse Mamba Attention Network for Pathological Liver Segmentation in MRI Volumes},
author={Zeng, Jun and Huang, Yannan and Keles, Elif and Aktas, Halil Ertugrul and Durak, Gorkem and Tomar, Nikhil Kumar and Trinh, Quoc-Huy and Nayak, Deepak Ranjan and Bagci, Ulas and Jha, Debesh},
journal={arXiv preprint arXiv:2508.12410},
year={2025},
note={If published, please use the published citation}
}
Disclaimer
Important: This software is intended for research purposes only. It is not approved for clinical use or diagnostic purposes without proper validation and regulatory approval. Always consult with qualified medical professionals for clinical decision-making.
Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
Contact
For questions, support, or collaboration inquiries:
- Email: harshithreddy0117@gmail.com
- Hugging Face Space: srmamamba-liver-segmentation
License
This project is provided for research and educational purposes. Please refer to the original SRMA-Mamba paper for licensing details.
LiverProfile AI - Empowering Medical Imaging with AI
Built for the medical imaging community