
BEST OUTCOME
Enhancing Biomedical Literature Retrieval with Semantic Textual Similarity
A domain-aware teacher–student model improving biomedical semantic similarity and clinical text retrieval.
What This Project Achieves
This project develops a biomedical semantic textual similarity (STS) system to address the limitations of keyword-based literature search. Using a teacher–student architecture with PubMedBERT as the domain expert and BioLinkBERT as the student, the team trained a model on BioASQ snippet–answer pairs. An adjusted dual-loss strategy improved training stability and reduced overfitting, with peak performance around epochs 4–5, enabling more reliable biomedical question–answer retrieval.
How This Was Built — Key Highlights
This project uses a teacher–student modeling setup to improve biomedical semantic similarity learning. The process combines dataset construction, model training, and targeted adjustments to stabilize performance and reduce noise.
Constructed a dataset from BioASQ Task 13b questions, ideal answers, and supporting snippets, ensuring clear separation between positive and negative examples (a pair-construction sketch follows this list).
Applied PubMedBERT as the teacher model to generate similarity labels and fine-tuned BioLinkBERT as the student model.
Introduced an adjusted dual-loss method (SmoothL1 + alignment loss + adaptive weighting) to address unstable gradients and improve convergence (a training-step sketch also follows this list).
Identified that training around epochs 4–5 produced the most reliable results, with later epochs showing signs of overfitting.
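To make the dataset construction concrete, the sketch below builds positive and negative snippet–answer pairs from a BioASQ-style JSON file. The field names ("questions", "ideal_answer", "snippets", "text"), the build_pairs helper, and the negative-sampling scheme (pairing a snippet with answers from other questions) are illustrative assumptions based on the public BioASQ format, not the project's exact code.

    # Minimal sketch of building positive/negative snippet-answer pairs from a
    # BioASQ-style JSON file. Field names follow the public BioASQ layout and
    # are assumptions here; the project's pipeline may differ in detail.
    import json
    import random

    def build_pairs(path, neg_per_pos=1, seed=13):
        rng = random.Random(seed)
        with open(path) as f:
            questions = json.load(f)["questions"]

        pairs = []  # (snippet_text, answer_text, label)
        answers = [q["ideal_answer"][0] for q in questions if q.get("ideal_answer")]

        for q in questions:
            if not q.get("ideal_answer") or not q.get("snippets"):
                continue
            answer = q["ideal_answer"][0]
            for sn in q["snippets"]:
                # Positive pair: this snippet supports the question's ideal answer.
                pairs.append((sn["text"], answer, 1.0))
                # Negative pairs: the same snippet matched to answers of other questions.
                for _ in range(neg_per_pos):
                    other = rng.choice(answers)
                    if other != answer:
                        pairs.append((sn["text"], other, 0.0))
        return pairs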
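The teacher labelling and adjusted dual loss can be read roughly as follows: a frozen PubMedBERT teacher scores each snippet–answer pair, and the BioLinkBERT student is trained to match those scores (SmoothL1) while its embeddings are pulled toward the teacher's (alignment term), with learnable weights balancing the two losses. The checkpoint IDs, mean pooling, MSE alignment term, and uncertainty-style adaptive weighting below are assumptions made for illustration; the project's exact formulation may differ.

    # Sketch of one teacher-student training step with a dual loss, assuming
    # PyTorch and Hugging Face transformers. This is one plausible reading of
    # the "adjusted dual-loss" description, not the project's actual code.
    import torch
    import torch.nn.functional as F
    from transformers import AutoModel, AutoTokenizer

    TEACHER_ID = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"  # assumed checkpoint
    STUDENT_ID = "michiyasunaga/BioLinkBERT-base"                                 # assumed checkpoint

    teacher_tok = AutoTokenizer.from_pretrained(TEACHER_ID)
    student_tok = AutoTokenizer.from_pretrained(STUDENT_ID)
    teacher = AutoModel.from_pretrained(TEACHER_ID).eval()
    student = AutoModel.from_pretrained(STUDENT_ID).train()

    # Learnable log-weights give a simple form of adaptive loss balancing.
    log_w = torch.nn.Parameter(torch.zeros(2))
    optimizer = torch.optim.AdamW(list(student.parameters()) + [log_w], lr=2e-5)

    def embed(model, tok, texts):
        batch = tok(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")
        out = model(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1)
        return (out * mask).sum(1) / mask.sum(1)  # mean pooling over tokens

    def train_step(snippets, answers):
        with torch.no_grad():  # the teacher only provides soft similarity labels
            t_s = embed(teacher, teacher_tok, snippets)
            t_a = embed(teacher, teacher_tok, answers)
            teacher_sim = F.cosine_similarity(t_s, t_a)
        s_s = embed(student, student_tok, snippets)
        s_a = embed(student, student_tok, answers)
        student_sim = F.cosine_similarity(s_s, s_a)

        score_loss = F.smooth_l1_loss(student_sim, teacher_sim)   # SmoothL1 on similarity scores
        align_loss = F.mse_loss(s_s, t_s) + F.mse_loss(s_a, t_a)  # pull student embeddings toward teacher
        w = torch.exp(-log_w)                                     # adaptive weighting of the two terms
        loss = w[0] * score_loss + w[1] * align_loss + log_w.sum()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Both base checkpoints produce 768-dimensional embeddings, which is what makes the direct embedding-alignment term possible in this sketch.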
Challenges
This project encountered several issues related to model stability and data quality, which influenced training behavior and required systematic troubleshooting. These challenges highlight where semantic similarity modeling becomes sensitive to noise, inconsistency, and parameter choices.
Training instability occurred when using cosine-only or MSE-based loss functions, resulting in oscillating and unpredictable gradients.
Variations in snippet quality and structure produced inconsistent feature matches during early training iterations.
Dataset leakage across train, validation, and test splits indicated the need for stricter snippet-level deduplication (a deduplication sketch follows this list).
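One straightforward way to enforce snippet-level deduplication is sketched below, assuming examples are (snippet, answer, label) tuples as in the pair-construction sketch above: normalize each snippet, hash it, and assign whole hash groups to a single split so identical snippet text can never cross splits. The helper names and split ratios are illustrative, not the project's actual preprocessing code.

    # Sketch of snippet-level deduplication before splitting. Identical
    # (normalized) snippets are grouped so they can only land in one split,
    # which prevents leakage across train/validation/test.
    import hashlib
    import random
    import re

    def normalize(text):
        return re.sub(r"\s+", " ", text.lower()).strip()

    def split_without_leakage(pairs, ratios=(0.8, 0.1, 0.1), seed=13):
        """pairs: list of (snippet, answer, label) tuples."""
        groups = {}
        for pair in pairs:
            key = hashlib.md5(normalize(pair[0]).encode()).hexdigest()
            groups.setdefault(key, []).append(pair)

        keys = list(groups)
        random.Random(seed).shuffle(keys)
        n_train = int(len(keys) * ratios[0])
        n_val = int(len(keys) * ratios[1])

        def collect(ks):
            return [p for k in ks for p in groups[k]]

        return (collect(keys[:n_train]),
                collect(keys[n_train:n_train + n_val]),
                collect(keys[n_train + n_val:]))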
Insights
The findings from model behavior and experimentation provided practical lessons about effective training strategies and the importance of domain-specific signals. These insights guided improvements to stability, generalization, and semantic understanding.
Adaptive dual-loss training (SmoothL1 + alignment loss + adaptive weighting) produced more stable gradients and delayed overfitting.
Domain-specific supervision from PubMedBERT improved semantic understanding beyond simple keyword similarity.
Careful control of training duration, especially around epochs 4–5, maximized performance before overfitting began (an early-stopping sketch follows this list).
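A validation-driven stopping rule is one way to operationalize that insight. The sketch below assumes a PyTorch-style model with state_dict/load_state_dict and two callable hooks, run_epoch(model) and evaluate(model) (for example, returning a Spearman correlation on the validation split); these names are illustrative, not the project's actual training loop.

    # Sketch of validation-based checkpoint selection: keep the best-scoring
    # epoch and stop once the validation score fails to improve for `patience`
    # epochs, so training ends near the peak (epochs 4-5 in this project).
    import copy

    def train_with_early_stopping(model, run_epoch, evaluate, max_epochs=10, patience=2):
        best_score, best_state, best_epoch, bad_epochs = float("-inf"), None, 0, 0
        for epoch in range(1, max_epochs + 1):
            run_epoch(model)               # one pass over the training pairs
            score = evaluate(model)        # validation score, higher is better
            if score > best_score:
                best_score, best_epoch, bad_epochs = score, epoch, 0
                best_state = copy.deepcopy(model.state_dict())
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break
        model.load_state_dict(best_state)  # restore the best checkpoint
        return best_epoch, best_score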
Project Gallery
Academic Team Feedback
The academic team noted that this project demonstrated strong methodological rigor, with each architectural and training decision supported by systematic experimentation and clear analysis. The contributor showed a solid understanding of model behavior, addressed limitations thoughtfully, and articulated realistic next steps for improvement. The Project Lead, a researcher in machine learning and NLP, highlighted the work as a well-executed example of applied semantic modeling in a biomedical context.
Project Reflection
This PBL helped us understand how semantic similarity models can be built and evaluated in a real biomedical setting, moving beyond theory into practical reasoning about model behavior. Working within a teacher–student framework strengthened our ability to interpret instability, refine training strategies, and apply domain knowledge effectively.








