Earthquake Rate and Average Magnitude Prediction Using XGBoost

OUTCOME SPOTLIGHT

Using machine learning and geospatial analysis to support earthquake preparedness and environmental risk mitigation.

Project Outcome

Attended PBL

Shell Project

AI for Energy and Sustainability

Leverage machine learning to predict and mitigate seismic activity, enhancing safety and sustainability in subsurface energy operations.

DEMONSTRATED CAPABILITY

What This Project Achieves

This project applies machine learning and geospatial analysis to understand and forecast earthquake risk associated with energy production activities. Working independently, the contributor developed XGBoost models to predict earthquake rates and average magnitudes across short- and long-term horizons. By combining predictive modeling with spatial visualization over the Los Angeles Basin, the project demonstrates how model outputs can help government agencies and planners identify high-risk zones and improve earthquake preparedness and environmental decision-making.

How This Was Built: Key Highlights

This project followed a machine learning and geospatial analysis workflow to model and communicate earthquake risk linked to energy production activities. The approach integrated data processing, feature engineering, predictive modeling, and spatial visualization to translate complex seismic patterns into interpretable insights.

Cleaned, merged, and filtered well injection, production, and earthquake datasets, converting geospatial coordinates into grid-based polygon geometries for spatial analysis.
Engineered temporal and operational features, including rolling 3-month, 1-year, and 5-year averages, lagged variables, and injection–production ratios.
Trained XGBoost regression models separately for each forecasting horizon to predict earthquake rates and average magnitudes.
Generated distribution and prediction maps across the Los Angeles Basin to visualize spatial risk patterns.
Evaluated model performance using RMSE and R² metrics, observing stronger stability and accuracy over longer forecasting windows.
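
The feature-engineering step listed above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the column names (`cell_id`, `month`, `injection`, `production`), the monthly granularity, and the window lengths expressed in months (3, 12, 60) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add rolling means, a one-month lag, and an injection/production ratio.

    Assumes one row per (cell_id, month) with the hypothetical columns
    'injection' and 'production'.
    """
    df = df.sort_values(["cell_id", "month"]).copy()
    by_cell = df.groupby("cell_id")["injection"]

    # Rolling means over 3, 12, and 60 months, computed within each grid cell.
    for window in (3, 12, 60):
        df[f"inj_mean_{window}m"] = by_cell.transform(
            lambda s, w=window: s.rolling(w, min_periods=1).mean()
        )

    # Previous month's injection volume within the same cell.
    df["inj_lag_1m"] = by_cell.shift(1)

    # Injection-to-production ratio; zero production becomes NaN, not inf.
    df["inj_prod_ratio"] = df["injection"] / df["production"].replace(0, np.nan)
    return df


# Toy data for a single grid cell (values are invented for illustration).
toy = pd.DataFrame({
    "cell_id": ["a"] * 4,
    "month": pd.date_range("2020-01-01", periods=4, freq="MS"),
    "injection": [10.0, 20.0, 30.0, 40.0],
    "production": [5.0, 0.0, 15.0, 20.0],
})
features = add_features(toy)
```

Here `min_periods=1` lets early rows use whatever history exists; a stricter setup would leave those rows as NaN until a full window is available. The resulting columns could then serve as inputs to a separate regressor per forecasting horizon.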

Challenges

This project involved several technical and analytical challenges due to the complexity of seismic data and the independent execution of the full modeling pipeline.

Integrating heterogeneous datasets required careful cleaning and alignment to ensure meaningful spatial and temporal relationships.
Designing geospatial grids and polygon-based representations introduced a steep learning curve in spatial data handling.
Balancing model complexity with interpretability was necessary to ensure results could be communicated to non-technical audiences.
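
One way to picture the grid-based polygon representation mentioned above is to snap each longitude/latitude pair to a regular cell and keep that cell's corner coordinates. The sketch below uses only the standard library; the 0.25-degree cell size, the function names, and the sample coordinates are illustrative assumptions, and the project itself may well have used a dedicated geospatial library.

```python
import math
from collections import Counter

CELL_DEG = 0.25  # assumed cell size in degrees (hypothetical choice)


def cell_index(lon, lat, size=CELL_DEG):
    """Return the (column, row) index of the grid cell containing a point."""
    return math.floor(lon / size), math.floor(lat / size)


def cell_polygon(col, row, size=CELL_DEG):
    """Return the closed ring of (lon, lat) corners for a grid cell."""
    x0, y0 = col * size, row * size
    x1, y1 = (col + 1) * size, (row + 1) * size
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]


# Invented event locations, roughly in the Los Angeles area.
events = [(-118.30, 34.05), (-118.28, 34.10), (-117.90, 33.80)]

# Count events per cell, then build one polygon per occupied cell.
counts = Counter(cell_index(lon, lat) for lon, lat in events)
polygons = {cell: cell_polygon(*cell) for cell in counts}
```

Keying both the counts and the polygons by the same cell index keeps the event table and the geometry easy to join later, which is the alignment problem the first challenge describes.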

Insights

Analysis of model behavior and experimental results pointed to several observations about seismic forecasting and applied machine learning.

Longer forecasting horizons produced more stable and reliable predictions, with RMSE and R² both improving as the forecasting window lengthened.
Operational factors such as cumulative fluid injection and production strongly influenced earthquake rates.
Combining predictive modeling with spatial visualization made the results easier to interpret and more useful in practice.
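
For reference, the two metrics cited above can be computed as in the sketch below. The observations and the two sets of predictions are made-up numbers chosen only to exercise the functions; they are not results from this project, and the names `model_a` and `model_b` are placeholders.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Invented observations and predictions, used only to exercise the functions.
observed = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [2.0, 2.0, 2.0, 4.0],
    "model_b": [1.0, 2.0, 3.0, 5.0],
}

scores = {
    name: (rmse(observed, pred), r2(observed, pred))
    for name, pred in predictions.items()
}
for name, (err, fit) in scores.items():
    print(f"{name}: RMSE={err:.3f}, R2={fit:.3f}")
```

Lower RMSE and higher R² both indicate a closer fit. R² is measured relative to always predicting the mean of the observations, so it is only comparable between models evaluated on the same target data.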

Academic Team Feedback

Feedback from the Project Lead, a researcher in earthquake hazards at MIT’s Civil and Environmental Engineering Department and Harvard’s Earth and Planetary Sciences Department, highlighted the exceptional initiative and independence demonstrated in this project. Drawing on his background in geophysics and industry experience in energy-related subsurface activities, he noted that Qi Hui’s ability to scope, execute, and advance a complex modeling project entirely on her own reflects strong technical maturity and research readiness. Although most peers collaborated in teams, she worked independently, made substantial progress, and delivered high-quality results that placed her in the top 10% of the PBL cohort. The Academic Coordinator similarly emphasized her intellectual rigor, organization, and clear sense of ownership, recognizing the project as a standout example of disciplined, self-directed applied research.

Project Contributor(s)

Qi Hui Choy

National University of Singapore • Singapore

Project Reflection

This project allowed me to apply machine learning and geospatial analysis to a real-world environmental challenge while independently managing the full project lifecycle. It strengthened my confidence in using predictive modeling and visualization together to communicate complex insights that support practical decision-making.