Supervisor:
Maxim Sharaev, Head of BIMAI-Lab
Description:
To diagnose cancer, a pathologist examines tissue under a microscope. But first, a complex chemical staining procedure must be performed, one that is expensive, takes days, and irreversibly consumes a piece of the patient's tissue.
We are teaching neural networks to predict the result of a complex molecular assay (IHC, immunohistochemistry) from a standard, low-cost stained tissue image (H&E, hematoxylin and eosin) in seconds.
The project addresses a fundamental problem in histopathology — replacing costly and labor-intensive chemical staining (IHC) with digital prediction from H&E images. We aim to overcome the limitations of existing methods and offer two research tracks that can be combined:
1. Generative Design: Research and adaptation of modern architectures — Diffusion Bridges, Flow-based models, and Rectified Flows. We work with the public IHC4BC dataset, which surpasses alternatives (BCI, MIST) but remains largely unexplored in the literature. The goal is to experiment with architectural choices, loss functions, and feature extraction to achieve biological accuracy at the cellular level.
2. Data Engineering: Creating a unique, "clean" dataset for training reliable staining models. Unlike existing work, where data is either full of artifacts or manually curated, we are developing an automated preprocessing pipeline that runs from non-rigid registration of whole slide images (WSI) to intelligent filtering algorithms. There are no established standards in this area yet, so this work could be a first in the field.
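As a rough illustration of the first track, the training target used by rectified flows can be sketched in a few lines. This is a minimal sketch, not the lab's implementation: the function names, the use of NumPy, the random stand-in patches, and the choice of the H&E patch as the starting point and the IHC patch as the endpoint are all assumptions made for illustration.

```python
import numpy as np


def rectified_flow_pair(x0, x1, t):
    """Build one rectified-flow training example from a paired sample.

    x0: source image (assumed here: an H&E patch), array of any shape
    x1: target image (assumed here: the registered IHC patch), same shape
    t:  time in [0, 1]
    """
    x0 = np.asarray(x0, dtype=np.float32)
    x1 = np.asarray(x1, dtype=np.float32)
    # Straight-line interpolation between source and target.
    x_t = (1.0 - t) * x0 + t * x1
    # Along a straight line the velocity is constant: the model is
    # trained to predict x1 - x0 given (x_t, t).
    v_target = x1 - x0
    return x_t, v_target


def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((np.asarray(v_pred) - v_target) ** 2))


# Hypothetical stand-ins for a registered H&E / IHC patch pair (C, H, W).
rng = np.random.default_rng(0)
he_patch = rng.random((3, 8, 8), dtype=np.float32)
ihc_patch = rng.random((3, 8, 8), dtype=np.float32)

x_t, v = rectified_flow_pair(he_patch, ihc_patch, t=0.25)
```

In a real experiment, `v_pred` would come from a neural network evaluated at `(x_t, t)`, and `t` would be sampled at random for each training example.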
What experience will the intern gain?
- Hands-on experience with SOTA approaches (Diffusion, Flow-based) beyond traditional GANs
- The final project could form the basis of systems that help doctors save lives
- Understanding of the specific nature of medical data
- Skills in conducting experiments, identifying weaknesses in published papers, and developing original architectural solutions
- Experience working with ultra-high-resolution images (WSI — Whole Slide Images) and preprocessing pipelines
- Opportunity to contribute to a publication in a high-impact journal or at a conference such as MICCAI or CVPR
Internship duration: 2 months
Internship start date: flexible
Requirements for the candidate:
Required:
- Confident Python, including experience writing custom modules (e.g., DataLoaders)
- Basic understanding of Generative AI (how GANs or basic diffusion approaches work)
- Knowledge of digital image processing fundamentals (OpenCV, filters, transformations)
Preferred:
- Familiarity with modern libraries (diffusers, timm, albumentations)
Plus:
- Knowledge of image registration methods or experience with libraries such as SimpleITK / Kornia
- Experience participating in competitions (Kaggle)
Monthly compensation: not available
Contact person: Maxim Sharaev, m.sharaev@skoltech.ru