BEST OUTCOME

Enhancing Biomedical Literature Retrieval with Semantic Textual Similarity

A domain-aware teacher–student model improving biomedical semantic similarity and clinical text retrieval.

Attended PBL

AI in Natural Language Processing – Hugging Face Project

Build and apply AI models to analyze, classify, and understand natural language, gaining practical skills in modern NLP techniques and tools.

What This Project Achieves

This project develops a biomedical semantic textual similarity (STS) system that addresses the limitations of keyword-based literature search. Using a teacher–student architecture, with PubMedBERT as the domain-expert teacher and BioLinkBERT as the student, the team trained a model on BioASQ snippet–answer pairs. An adjusted dual-loss strategy improved training stability, reduced overfitting, and reached peak performance around epochs four to five, enabling more reliable biomedical question–answer retrieval.


How This Was Built — Key Highlights

This project uses a teacher–student modeling setup to improve biomedical semantic similarity learning. The process combines dataset construction, model training, and targeted adjustments to stabilize performance and reduce noise.


  • Constructed a dataset from BioASQ Task 13b questions, ideal answers, and supporting snippets, ensuring clear separation between positive and negative examples.

  • Applied PubMedBERT as the teacher model to generate similarity labels and fine-tuned BioLinkBERT as the student model.

  • Introduced an adjusted dual-loss method (SmoothL1 + alignment loss + adaptive weighting) to address unstable gradients and improve convergence.

  • Identified that training around epochs 4–5 produced the most reliable results, with later epochs showing signs of overfitting.
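The adjusted dual-loss method described above can be sketched as follows. This is a minimal illustration, not the team's exact implementation: the `beta` value and the warmup-based adaptive weighting schedule are assumptions introduced for the example.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    # Huber-style loss: quadratic near zero, linear for large errors,
    # which avoids the exploding gradients of plain MSE on outliers.
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta).mean()

def alignment_loss(student_emb, teacher_emb):
    # Encourage student embeddings to align with the teacher's
    # (mean cosine distance between corresponding rows).
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    return (1.0 - (s * t).sum(axis=1)).mean()

def dual_loss(pred_scores, teacher_scores, student_emb, teacher_emb,
              step, warmup=100):
    # Adaptive weighting: ramp the alignment term in gradually so early
    # gradients stay stable (a hypothetical schedule for illustration).
    alpha = min(1.0, step / warmup)
    return (smooth_l1(pred_scores, teacher_scores)
            + alpha * alignment_loss(student_emb, teacher_emb))
```

With identical student and teacher signals the combined loss is zero; as predictions drift from the teacher's similarity labels, the SmoothL1 term grows linearly rather than quadratically for large errors.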

Challenges

This project encountered several issues related to model stability and data quality, which influenced training behavior and required systematic troubleshooting. These challenges highlight where semantic similarity modeling becomes sensitive to noise, inconsistency, and parameter choices.


  • Training instability occurred when using cosine-only or MSE-based loss functions, resulting in oscillating and unpredictable gradients.

  • Variations in snippet quality and structure produced inconsistent feature matches during early training iterations.

  • Dataset leakage across train, validation, and test splits indicated the need for stricter snippet-level deduplication.
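The snippet-level deduplication called for in the last point could look like the sketch below. The whitespace/case normalization rule is an assumption; a production pipeline might normalize more aggressively (e.g. punctuation stripping).

```python
import hashlib

def normalize(snippet: str) -> str:
    # Case-fold and collapse whitespace so near-identical snippets hash alike.
    return " ".join(snippet.lower().split())

def dedup_splits(train, val, test):
    """Drop snippets from val/test that already occur in an earlier split,
    preventing snippet-level leakage across train/validation/test."""
    seen = {hashlib.sha1(normalize(s).encode()).hexdigest() for s in train}

    def filter_split(split):
        kept = []
        for s in split:
            h = hashlib.sha1(normalize(s).encode()).hexdigest()
            if h not in seen:
                kept.append(s)
                seen.add(h)  # also blocks val/test overlap
        return kept

    return train, filter_split(val), filter_split(test)
```

Hashing normalized snippets makes the check O(1) per snippet, so the whole corpus can be deduplicated in a single pass.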

Insights

The findings from model behavior and experimentation provided practical lessons about effective training strategies and the importance of domain-specific signals. These insights guided improvements to stability, generalization, and semantic understanding.


  • Adaptive dual-loss training (SmoothL1 + alignment loss + adaptive weighting) produced more stable gradients and delayed overfitting.

  • Domain-specific supervision from PubMedBERT improved semantic understanding beyond simple keyword similarity.

  • Careful control of training duration, especially around epochs 4–5, maximized performance before overfitting began.
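Controlling training duration around the best epoch amounts to simple early stopping on a validation metric. This sketch assumes a generic `eval_fn` that returns a validation score after each epoch; it is not the team's actual training loop.

```python
def train_with_early_stop(epochs, eval_fn, patience=1):
    """Stop once the validation score fails to improve for `patience`+1
    consecutive epochs; report the best epoch and its score."""
    best_score, best_epoch, stale = float("-inf"), 0, 0
    for epoch in range(1, epochs + 1):
        score = eval_fn(epoch)  # validation metric after this epoch
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale > patience:
                break  # overfitting has likely set in
    return best_epoch, best_score
```

On a validation curve that peaks at epoch 4 and then declines, as reported here, this loop would halt shortly after the peak and return epoch 4 as the checkpoint to keep.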

Academic Team Feedback

The academic team noted that this project demonstrated strong methodological rigor, with each architectural and training decision supported by systematic experimentation and clear analysis. The contributor showed a solid understanding of model behavior, addressed limitations thoughtfully, and articulated realistic next steps for improvement. The Project Lead, a researcher in machine learning and NLP, highlighted the work as a well-executed example of applied semantic modeling in a biomedical context.

Project Contributor(s)

Yuhang Zhou

Shibaura Institute of Technology • Japan

Swati Rajesh

National University of Singapore • Singapore

Jiwoo Hong

Korea University • Korea

Mariam Alamoodi

United Arab Emirates University • UAE

Shuyi Li

University of Tokyo • Japan

Project Reflection

This PBL helped us understand how semantic similarity models can be built and evaluated in a real biomedical setting, moving beyond theory into practical reasoning about model behavior. Working within a teacher–student framework strengthened our ability to interpret instability, refine training strategies, and apply domain knowledge effectively.
