
BEST OUTCOME
Earthquake Rate and Average Magnitude Prediction Using XGBoost
Using machine learning and geospatial analysis to support earthquake preparedness and environmental risk mitigation.
DEMONSTRATED CAPABILITY
System-Level Perception Design
🎓 Graduate-level Systems Thinking
Designed and evaluated an interactive 3D reconstruction system by balancing neural rendering fidelity, geometric consistency, and hardware constraints to support scalable AR/VR experiences.
Analyzed and compared alternative reconstruction approaches to select methods that balance visual quality, computational efficiency, and interactivity under constrained resources.
Compared alternative technical approaches by analyzing trade-offs in accuracy, robustness, and resource constraints to inform system-level design decisions.
What This Project Achieves
This project applies machine learning and geospatial analysis to understand and forecast earthquake risk associated with energy production activities such as well injection and production. Working independently, the contributor developed XGBoost models to predict earthquake rates and average magnitudes across short- and long-term horizons. By combining predictive modeling with spatial visualization over the Los Angeles Basin, the project demonstrates how data-driven insights can help government agencies and planners identify high-risk zones and improve earthquake preparedness and environmental decision-making.
How This Was Built — Key Highlights
This project followed a machine learning and geospatial analysis workflow to model and communicate earthquake risk linked to energy production activities. The approach integrated data processing, feature engineering, predictive modeling, and spatial visualization to translate complex seismic patterns into interpretable insights.
Cleaned, merged, and filtered well injection, production, and earthquake datasets, converting geospatial coordinates into grid-based polygon geometries for spatial analysis.
Engineered temporal and operational features, including rolling 3-month, 1-year, and 5-year averages, lagged variables, and injection–production ratios.
Trained XGBoost regression models separately for each forecasting horizon to predict earthquake rates and average magnitudes.
Generated distribution and prediction maps across the Los Angeles Basin to visualize spatial risk patterns.
Evaluated model performance using RMSE and R² metrics, observing stronger stability and accuracy over longer forecasting windows.
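The feature-engineering step listed above can be sketched in pandas. This is a minimal illustration, not the project's actual code: the column names (cell_id, month, injection, production), the monthly row granularity, and the one-month lag are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def add_temporal_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add rolling means, a one-month lag, and an injection/production ratio.

    Assumes one row per (cell_id, month); all column names are hypothetical.
    """
    df = df.sort_values(["cell_id", "month"]).copy()
    by_cell = df.groupby("cell_id")["injection"]

    # Trailing windows counted in monthly rows: 3 months, 1 year, 5 years.
    # Each window includes the current month and shrinks at the start of a series.
    for label, window in {"3m": 3, "1y": 12, "5y": 60}.items():
        df[f"injection_mean_{label}"] = by_cell.transform(
            lambda s, w=window: s.rolling(w, min_periods=1).mean()
        )

    # Previous month's injection within the same cell (NaN for the first month).
    df["injection_lag_1m"] = by_cell.shift(1)

    # Leave the ratio undefined (NaN) where production is zero.
    df["inj_prod_ratio"] = df["injection"] / df["production"].replace(0, np.nan)
    return df


# Tiny made-up dataset: two grid cells, four months each.
demo = pd.DataFrame({
    "cell_id": ["A"] * 4 + ["B"] * 4,
    "month": list(pd.date_range("2020-01-01", periods=4, freq="MS")) * 2,
    "injection": [10.0, 20.0, 30.0, 40.0, 5.0, 5.0, 5.0, 5.0],
    "production": [5.0, 10.0, 0.0, 20.0, 1.0, 1.0, 1.0, 1.0],
})
features = add_temporal_features(demo)
```

Grouping by cell before rolling or shifting keeps one cell's history from leaking into another's. If the target is a future value, the rolling columns may also need to be shifted so that they only use information available at prediction time.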
Challenges
This project involved several technical and analytical challenges due to the complexity of seismic data and the independent execution of the full modeling pipeline.
Integrating heterogeneous datasets required careful cleaning and alignment to ensure meaningful spatial and temporal relationships.
Designing geospatial grids and polygon-based representations introduced a steep learning curve in spatial data handling.
Balancing model complexity with interpretability was necessary to ensure results could be communicated to non-technical audiences.
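The grid and polygon representation mentioned above can be illustrated with a small standard-library sketch. The grid origin, the 0.25-degree cell size, and both function names are hypothetical choices for this example; the project's actual grid resolution and tooling are not stated.

```python
import math
from typing import List, Tuple


def grid_cell(lon: float, lat: float, origin_lon: float, origin_lat: float,
              cell_deg: float) -> Tuple[int, int]:
    """Return the (row, col) of the square grid cell containing a point."""
    col = math.floor((lon - origin_lon) / cell_deg)
    row = math.floor((lat - origin_lat) / cell_deg)
    return row, col


def cell_polygon(row: int, col: int, origin_lon: float, origin_lat: float,
                 cell_deg: float) -> List[Tuple[float, float]]:
    """Return the cell outline as a closed ring of (lon, lat) vertices."""
    west = origin_lon + col * cell_deg
    south = origin_lat + row * cell_deg
    east = west + cell_deg
    north = south + cell_deg
    # Counter-clockwise ring; the first vertex is repeated to close it.
    return [(west, south), (east, south), (east, north), (west, north), (west, south)]


# Illustrative values only: a grid anchored at (-119.0, 33.0) with 0.25-degree cells,
# and a point near downtown Los Angeles.
ORIGIN_LON, ORIGIN_LAT, CELL_DEG = -119.0, 33.0, 0.25
row, col = grid_cell(-118.24, 34.05, ORIGIN_LON, ORIGIN_LAT, CELL_DEG)
ring = cell_polygon(row, col, ORIGIN_LON, ORIGIN_LAT, CELL_DEG)
```

Wells and earthquakes that map to the same (row, col) can then be aggregated per cell, and the ring can be handed to a geometry library for mapping.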
Insights
Analysis of model behavior and experimental results revealed important insights into both seismic forecasting and applied machine learning.
Longer forecasting horizons produced more stable and reliable predictions, with RMSE and R² both improving significantly as the horizon lengthened.
Operational factors such as cumulative fluid injection and production had a major influence on predicted earthquake rates.
Combining predictive modeling with spatial visualization greatly enhanced the interpretability and practical usefulness of the results.
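For reference, the two metrics used to compare horizons can be computed as follows. This is a plain-Python sketch with made-up numbers, not the project's evaluation code or results; the function names are chosen for the example.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: lower is better, in the units of the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 is a perfect fit, 0 matches a mean-only baseline."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Made-up values: observed rates versus a constant prediction equal to their mean.
observed = [1.0, 2.0, 3.0, 4.0]
baseline = [2.5, 2.5, 2.5, 2.5]
baseline_rmse = rmse(observed, baseline)
baseline_r2 = r_squared(observed, baseline)
```

Because RMSE depends on the scale of the target, comparing it across horizons is most meaningful when the targets are measured in the same units; R² is scale-free and can be compared more directly.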
Academic Team Feedback
Feedback from the Project Lead, a researcher in earthquake hazards at MIT’s Civil and Environmental Engineering Department and Harvard’s Earth and Planetary Sciences Department, highlighted the exceptional initiative and independence demonstrated in this project. Drawing on his background in geophysics and industry experience in energy-related subsurface activities, he noted that Qi Hui’s ability to scope, execute, and advance a complex modeling project entirely on her own reflects strong technical maturity and research readiness. Although most peers collaborated in teams, she worked independently, made substantial progress, and delivered high-quality results that placed her in the top 10% of the PBL cohort. The Academic Coordinator similarly emphasized her intellectual rigor, organization, and clear sense of ownership, recognizing the project as a standout example of disciplined, self-directed applied research.
Project Reflection
This project allowed me to apply machine learning and geospatial analysis to a real-world environmental challenge while independently managing the full project lifecycle. It strengthened my confidence in using predictive modeling and visualization together to communicate complex insights that support practical decision-making.





