
OUTCOME SPOTLIGHT
When Complexity Fails: Why Simple Augmentation Outperforms CycleGAN for Pneumonia Detection
A rigorous failure analysis demonstrating that simpler data augmentation methods can outperform complex generative models in clinically sensitive medical imaging tasks.
DEMONSTRATED CAPABILITY
Evidence-Based Model Evaluation
System-Level Insight
Designed and evaluated a medical imaging AI pipeline by stress-testing complex methods against strong baselines and diagnosing failure mechanisms that impact diagnostic reliability under clinical constraints.
Analyzed alternative modeling approaches through controlled comparisons, selecting methods based on empirical evidence rather than assumed algorithmic superiority.
Diagnosed clinically relevant failure mechanisms by combining quantitative performance analysis with visual model-behavior inspection.
Interpreted experimental evidence to make defensible trade-offs between reliability, complexity, and resource cost in high-stakes decision settings.
What This Project Achieves
This project investigates whether complex generative models meaningfully improve medical image classification over traditional data augmentation. Using pneumonia detection from chest X-rays as a case study, the work systematically compares CycleGAN-based augmentation against simpler augmentation techniques across multiple metrics. The results show that traditional methods outperform CycleGAN, delivering markedly higher diagnostic reliability at lower computational cost. By documenting why and how the generative model fails in this context, the project provides practical guidance for deploying clinically responsible and resource-efficient medical AI systems.
How This Was Built — Key Highlights
This project implemented a controlled experimental framework to evaluate data augmentation strategies for medical image classification.
Constructed a pneumonia detection pipeline using chest X-ray datasets totaling 5,840 images.
Implemented five augmentation strategies, including traditional geometric augmentation and CycleGAN-based synthetic generation.
Trained and evaluated classification models under each augmentation regime using consistent architectures and metrics.
Applied GradCAM visualization, intensity distribution analysis, and failure mode diagnostics to examine model behavior beyond accuracy.
Documented failure mechanisms such as mode collapse, contrast distortion, and attention misalignment.
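The traditional geometric augmentation referenced above can be sketched as follows. This is a minimal, hypothetical example in NumPy; the project's actual transform set, parameter ranges, and framework are not specified in this writeup.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, rng):
    """Apply simple geometric augmentations to a 2D grayscale image.

    Hypothetical sketch: random horizontal flip plus a small random
    translation. Real pipelines typically add rotations, zooms, and
    padding instead of wrap-around shifts.
    """
    out = image
    # Random horizontal flip. (For chest X-rays, flipping is a judgment
    # call because laterality markers and organ placement matter.)
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)
    # Small random translation (up to ~5% of image width), implemented
    # here as a wrap-around shift for simplicity.
    max_shift = max(1, out.shape[1] // 20)
    dx = int(rng.integers(-max_shift, max_shift + 1))
    dy = int(rng.integers(-max_shift, max_shift + 1))
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))
    return out

image = rng.random((224, 224))
augmented = augment(image, rng)
print(augmented.shape)  # (224, 224): geometric transforms preserve size
```

A relevant design property: flips and shifts only rearrange existing pixels, so they cannot corrupt intensity statistics the way a generative model can, which is one reason such methods proved safer in this study.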
Challenges
Evaluating generative models in a clinical imaging context introduced several challenges.
CycleGAN training was unstable and computationally expensive relative to dataset size.
Synthetic images exhibited subtle intensity shifts that corrupted diagnostically relevant features.
Model performance metrics alone were insufficient to explain observed failures without deeper interpretability analysis.
Balancing experimental rigor with clinical relevance required careful metric selection and validation.
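The subtle intensity shifts noted above can be made visible with a simple distribution check. The sketch below compares pixel-intensity histograms of a real and a synthetic image batch using an L1 histogram distance; the specific metric, bin count, and simulated shift are assumptions for illustration, not the project's exact diagnostics.

```python
import numpy as np

def intensity_shift(real, synthetic, bins=64):
    """Return the L1 distance between normalized intensity histograms.

    A value near 0 means the two batches share an intensity
    distribution; larger values flag a global contrast/brightness shift.
    """
    lo = min(real.min(), synthetic.min())
    hi = max(real.max(), synthetic.max())
    h_real, _ = np.histogram(real, bins=bins, range=(lo, hi))
    h_syn, _ = np.histogram(synthetic, bins=bins, range=(lo, hi))
    p = h_real / h_real.sum()
    q = h_syn / h_syn.sum()
    return float(np.abs(p - q).sum())

rng = np.random.default_rng(0)
real = rng.normal(0.5, 0.1, size=(100, 64, 64)).clip(0, 1)
# Simulate a subtle global contrast/brightness shift of the kind a GAN
# can introduce without degrading standard accuracy metrics.
synthetic = (real * 0.9 + 0.08).clip(0, 1)

print(intensity_shift(real, real))       # identical batches score 0.0
print(intensity_shift(real, synthetic))  # the shifted batch scores higher
```

Checks like this matter precisely because such shifts can leave accuracy untouched while corrupting the low-contrast features that carry diagnostic information.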
Insights
The study yielded several important insights relevant to medical AI deployment.
Increased algorithmic complexity does not guarantee improved clinical performance.
Generative models can introduce harmful artifacts that are invisible to standard performance metrics.
Failure analysis and interpretability tools are essential for validating diagnostic integrity.
Simple, well-understood methods can outperform complex models under real-world data constraints.
Project Gallery
Academic Team Feedback
Feedback from the Project Lead, a Senior Imaging Scientist at Novartis and researcher at MIT specializing in AI for clinical trials, highlighted this work as an unusually rigorous and clinically grounded analysis of model failure. Drawing on his experience translating medical imaging research into production systems, he emphasized the importance of the student’s decision to challenge prevailing assumptions about generative model superiority. The project was commended for its systematic experimentation, clear identification of failure mechanisms such as mode collapse and diagnostic contrast corruption, and strong use of interpretability tools to support conclusions. The Academic Coordinator additionally noted Sean’s ownership, analytical depth, and ability to communicate complex findings clearly while supporting peers throughout the project.
Project Reflection
This project reinforced the importance of understanding why models fail, not just whether they perform well. By dissecting failure mechanisms in complex generative models, I gained perspective on how evidence-based evaluation can guide safer and more effective deployment of AI in medical settings.






