Dissertations, Master's Theses and Master's Reports

A DOMAIN-ADAPTED NATURAL LANGUAGE PROCESSING FRAMEWORK FOR MINING SAFETY ANALYTICS: FROM LATENT PATTERN DISCOVERY TO AUTOMATED INFORMATION EXTRACTION

Abid Ali Khan Danish, Michigan Technological University

Date of Award

2025

Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy in Mining Engineering (PhD)

Administrative Home Department

Department of Geological and Mining Engineering and Sciences

Advisor 1

Snehamoy Chatterjee

Committee Member 1

Luke Bowman

Committee Member 2

Mohammadhossein Sadeghiamirshahidi

Committee Member 3

Sidike Paheding

Abstract

Occupational safety remains a critical global concern, particularly in high-risk sectors such as mining, where analysis of historical accident data is essential for identifying hazards and guiding preventive measures. While structured data has traditionally supported retrospective safety analytics, the rich contextual information embedded in unstructured accident narratives remains largely underutilized for proactive risk assessment and targeted safety intervention. This dissertation addresses this gap by presenting a comprehensive natural language processing (NLP) framework for mining safety analytics that integrates semantic text analysis, domain-adaptive language modeling, and automated safety information extraction. Traditional NLP techniques face significant limitations in capturing nuanced semantic relationships within unstructured text and in effectively integrating structured metadata. To address these limitations, the dissertation first introduces a clustering-based semantic analysis method that combines transformer-based sentence embeddings, nonlinear dimensionality reduction, and k-means clustering. By incorporating structured metadata into the embedding process, this unsupervised approach reveals latent accident patterns not captured by conventional techniques, while also exposing the contextual limitations of general-purpose language models in understanding domain-specific safety language. To overcome those limitations, the study applies Domain-Adaptive Pretraining (DAPT) to Bidirectional Encoder Representations from Transformers (BERT) and to its parameter-efficient variant, A Lite BERT (ALBERT), on a multi-source safety corpus spanning the mining, construction, transportation, and chemical processing sectors.
The resulting models, SafetyBERT and SafetyALBERT, demonstrate substantial improvements in both intrinsic and extrinsic evaluations, outperforming general-domain and larger models, including Llama 3.1-8B, across multiple safety-specific classification tasks in both single-task and multi-task settings. Leveraging these models, the framework is extended to an extractive question answering (QA) system that uses SafetyBERT to automatically extract critical safety information, such as incident causes, work activities, and injury types, from unstructured narratives. To reduce annotation costs, an integrated hybrid active learning (AL) strategy is proposed. The AL cold-start problem is mitigated through a strategic seed selection process using unsupervised, embedding-based clustering, followed by a hybrid querying mechanism that combines uncertainty-based and confidence-based sampling for iterative model refinement, achieving robust performance on the extractive QA task. Altogether, this dissertation contributes a robust NLP framework that advances occupational safety analytics in the mining domain. By combining unsupervised pattern discovery, domain-adaptive language modeling, and automated safety-critical information extraction, the integrated approach enables proactive hazard management and targeted safety interventions.
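
The active learning steps named in the abstract (clustering-based seed selection, then hybrid querying) can be illustrated with a minimal sketch. It is not the dissertation's implementation, and everything in it is an assumption made for illustration: the toy 2-D "embeddings", the function names `select_seeds` and `hybrid_query`, the deterministic k-means initialization, and in particular the reading of "confidence-based sampling" as picking the highest-confidence predictions alongside the least-confident ones. The abstract does not define either criterion.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means. Initialization is deterministic (evenly spaced points),
    a simplification chosen so the example is reproducible."""
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]
    assign = [0] * n
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def select_seeds(points, k):
    """Cold-start seed set: for each cluster, the index of the point
    closest to the cluster centroid (hypothetical selection rule)."""
    centroids, assign = kmeans(points, k)
    seeds = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            seeds.append(min(members, key=lambda i: sq_dist(points[i], centroids[c])))
    return sorted(seeds)


def hybrid_query(confidence, n_uncertain, n_confident):
    """Split a query batch between the least-confident and the
    most-confident unlabeled items (assumed interpretation).
    `confidence` maps an item id to the model's confidence in [0, 1]."""
    ranked = sorted(confidence, key=confidence.get)  # least confident first
    uncertain = ranked[:n_uncertain]
    rest = ranked[n_uncertain:]
    confident = rest[max(0, len(rest) - n_confident):]
    return uncertain, confident


# Toy stand-ins for narrative embeddings: two well-separated groups.
embeddings = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
seeds = select_seeds(embeddings, k=2)

# Toy model confidences for five unlabeled narratives.
confidence = {"a": 0.95, "b": 0.40, "c": 0.55, "d": 0.99, "e": 0.70}
uncertain, confident = hybrid_query(confidence, n_uncertain=2, n_confident=1)
print(seeds, uncertain, confident)
```

In a realistic setting the embeddings would come from a sentence encoder and the confidences from the QA model's answer-span scores; both are replaced by hand-written values here so the sketch runs with the standard library alone.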

Recommended Citation

Danish, Abid Ali Khan, "A DOMAIN-ADAPTED NATURAL LANGUAGE PROCESSING FRAMEWORK FOR MINING SAFETY ANALYTICS: FROM LATENT PATTERN DISCOVERY TO AUTOMATED INFORMATION EXTRACTION", Campus Access Dissertation, Michigan Technological University, 2025.

https://doi.org/10.37099/mtu.dc.etdr/1965

Download

Available for download on Saturday, August 01, 2026
