Automatic cluster selection using gap statistics for pattern-based multi-point geostatistical simulation

Document Type


Publication Date



Department of Geological and Mining Engineering and Sciences


An automatic cluster number selection algorithm is proposed for multi-point geostatistical simulation. The multi-point simulation is performed by extracting patterns from a training image. The computational time of the pattern-based simulation is significantly reduced by reducing the dimension of the patterns with principal component analysis (PCA). Traditional PCA is used for its simplicity and computational ease. The patterns are classified by applying the k-means clustering algorithm to their principal components (PCs). The number of clusters is selected automatically by calculating the gap statistics. The conditional cumulative distribution function (ccdf) for each class is generated from the frequency of the central node value of the template. For sequential simulation, the similarity of the conditioning data to the class prototypes is measured using the L2-norm. The ccdf of the best-matched class is used to draw a pattern from that class. The algorithm is validated with examples of conditional and unconditional simulation. The results show that curvilinear structures, and hence spatial continuity, are well reproduced in all examples. The first- and second-order statistics are also reproduced very well in all examples. A comparative study with the wavesim and filtersim techniques shows that the proposed algorithm performs better than filtersim and comparably to wavesim; its computational time is similar to that of filtersim and lower than that of wavesim. The sensitivity of the algorithm to the number of PCs and the number of clusters has also been tested. The results reveal that automatic cluster selection helps to improve the performance of the proposed method.
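The pipeline of PCA, k-means clustering, and gap-based selection of the number of clusters can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`pca_scores`, `kmeans_labels`, `log_dispersion`, `select_k_by_gap`), the uniform bounding-box reference distribution, the rule of choosing the smallest k with Gap(k) >= Gap(k+1) - s(k+1), and the synthetic 3x3 patterns are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def pca_scores(X, n_pcs):
    """Project the rows of X onto their first n_pcs principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_pcs].T

def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns a cluster label for each row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def log_dispersion(X, k, rng, n_init=3):
    """log of the smallest within-cluster sum of squares over n_init restarts."""
    best = np.inf
    for _ in range(n_init):
        labels = kmeans_labels(X, k, rng)
        w = sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
                for j in np.unique(labels))
        best = min(best, w)
    return np.log(best)

def select_k_by_gap(X, k_max=5, n_ref=5, seed=0):
    """Return (selected k, list of gap values for k = 1..k_max)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = log_dispersion(X, k, rng)
        # Assumed reference data: uniform samples in the bounding box of X.
        ref = np.array([log_dispersion(rng.uniform(lo, hi, size=X.shape), k, rng)
                        for _ in range(n_ref)])
        gaps.append(ref.mean() - log_w)
        s.append(ref.std() * np.sqrt(1.0 + 1.0 / n_ref))
    # Assumed rule: smallest k with Gap(k) >= Gap(k+1) - s(k+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps

# Hypothetical demo: 150 flattened 3x3 patterns drawn around three prototypes.
demo_rng = np.random.default_rng(42)
prototypes = demo_rng.normal(scale=5.0, size=(3, 9))
patterns = np.vstack([p + demo_rng.normal(scale=0.5, size=(50, 9))
                      for p in prototypes])
scores = pca_scores(patterns, n_pcs=2)
k_selected, gaps = select_k_by_gap(scores)
print("selected number of clusters:", k_selected)
```

Clustering the PC scores rather than the raw patterns is what keeps each k-means run cheap, and the reference runs dominate the cost, so `n_ref` and `k_max` are the parameters to tune first in a sketch like this.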

Publication Title

Arabian Journal of Geosciences