Date of Award

2024

Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy in Statistics (PhD)

Administrative Home Department

Department of Mathematical Sciences

Advisor 1

Benjamin Ong

Committee Member 1

Allan Struthers

Committee Member 2

Byung-Jun Kim

Committee Member 3

Laura Brown

Abstract

Analyzing high-dimensional data runs into the "curse of dimensionality", which makes the analysis computationally intensive. Dimension reduction techniques address this by simplifying high-dimensional data. These methods fall into two groups, linear and non-linear, with the latter accommodating complex data structures.

Our focus is on the development of an incremental non-linear dimension reduction method for streaming data based on the Geometric Multi-Resolution Analysis (GMRA) framework. The primary goal is to assess the effectiveness of incremental GMRA compared to the batch GMRA approach and to overcome key challenges specific to streaming data scenarios.

Key challenges include: incrementally updating the existing cluster map so that it aligns with a cluster map generated from a bulk dataset; incrementally updating the principal component analysis (PCA) basis vectors instead of recomputing them entirely; determining whether continuous updating of the PCA basis vectors is necessary or whether bulk updating suffices; and exploring whether wavelet coefficients must be updated and recomputed every time the GMRA structure undergoes an incremental update.

Numerical experiments assessing the proposed incremental GMRA method show that the algorithm is adaptable: it accurately represents nonlinear manifolds even with small initial sample sizes, and the final approximation closely aligns with the batch GMRA results. A distinctive advantage of the incremental approach is its ability to maintain the multiscale structure with updated basis vectors, resulting in efficient updates. Additionally, we observe a decay pattern in the wavelet coefficients, which aids in determining the required depth of approximation.

This research highlights the potential of the incremental GMRA approach for efficient dimension reduction in streaming data scenarios involving evolving and complex manifold structures, and it addresses the key challenges encountered in this context.
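
As a rough illustration of what updating PCA basis vectors incrementally, rather than recomputing them from all data, can look like, the sketch below keeps a running mean and scatter matrix for a single cluster and extracts the leading principal directions on demand. It is not the dissertation's algorithm: the class name `RunningPCA`, its methods, and the choice of an exact Welford-style covariance update followed by an eigendecomposition are all assumptions made for illustration.

```python
import numpy as np


class RunningPCA:
    """Hypothetical helper: running mean and scatter matrix for one cluster."""

    def __init__(self, dim):
        self.n = 0                           # number of points seen so far
        self.mean = np.zeros(dim)            # running mean of the points
        self.scatter = np.zeros((dim, dim))  # sum of centered outer products

    def update(self, x):
        """Fold one new point into the mean and scatter (Welford-style)."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean                # deviation from the old mean
        self.mean += delta / self.n
        # Old-mean deviation times new-mean deviation keeps the update exact.
        self.scatter += np.outer(delta, x - self.mean)

    def basis(self, k):
        """Return the top-k eigenvalues and eigenvectors of the covariance."""
        cov = self.scatter / max(self.n - 1, 1)
        vals, vecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
        order = np.argsort(vals)[::-1][:k]   # indices of the k largest
        return vals[order], vecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stream: 5-dimensional points with most variance in two axes.
    stream = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
    model = RunningPCA(dim=5)
    for point in stream:
        model.update(point)
    eigenvalues, directions = model.basis(k=2)
    print(eigenvalues)
    print(directions.shape)
```

Each update costs on the order of the squared dimension, and the eigendecomposition can be deferred until a basis is actually needed. This mirrors, in a simplified form, the trade-off between continuous and bulk updating of basis vectors that the abstract raises.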

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Available for download on Saturday, April 12, 2025
