Evaluation of unsupervised Machine Learning algorithms with climate data

Authors

J. S. Ramírez, N. D. Duque

DOI:

https://doi.org/10.14482/inde.40.02.622.553

Keywords:

Climate, Clustering, Machine Learning, K-means, K-medoids

Abstract

When working with climate data, researchers often have difficulty determining which clustering algorithm and which parameters perform best for a specific dataset.

We evaluate three unsupervised machine learning algorithms: K-means, K-medoids and complete linkage. They are applied to three datasets of climatological variables (temperature, rainfall, relative humidity and solar radiation) from three meteorological stations located in the department of Caldas, Colombia, at different heights above sea level. Five scenarios are defined for 2, 3 and 5 clusters for each of the two partitional algorithms, and five scenarios for the hierarchical algorithm, at each of the meteorological stations. The scenarios differ in the number and grouping of variables, and all use Euclidean distance. Cluster quality is evaluated with the Davies-Bouldin index. The evaluation also covers normalization with techniques such as range transformation and Z-transformation, the number of algorithm iterations, and dimensionality reduction with PCA. In addition, the computational cost is evaluated. This research can guide researchers in decisions about cluster analysis of meteorological data and help identify the algorithm and parameters that matter most for performance, according to the particular conditions and requirements.
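
The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the station readings are made-up values, and the function names (`z_transform`, `range_transform`, `k_means`, `davies_bouldin`) are chosen here for illustration. It covers only K-means with Euclidean distance, two normalizations, and the Davies-Bouldin index (lower is better), and it assumes Python 3.8 or later and at least two non-empty clusters.

```python
import math
import random

def z_transform(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def range_transform(rows):
    """Scale each column linearly to the interval [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(r, lows, spans)] for r in rows]

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(c) / len(c) for c in zip(*points)]

def k_means(rows, k, iterations=50, seed=0):
    """Lloyd's algorithm with Euclidean distance; returns one label per row."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # initial centers are k distinct rows
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(r, centers[j]))
                  for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = centroid(members)
    return labels

def davies_bouldin(rows, labels):
    """Mean over clusters of the largest (s_i + s_j) / d_ij ratio."""
    ids = sorted(set(labels))
    groups = {i: [r for r, lab in zip(rows, labels) if lab == i] for i in ids}
    cents = {i: centroid(g) for i, g in groups.items()}
    scatter = {i: sum(math.dist(r, cents[i]) for r in groups[i]) / len(groups[i])
               for i in ids}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in ids if j != i)
             for i in ids]
    return sum(worst) / len(worst)

# Hypothetical readings, invented for this sketch:
# (temperature in C, rainfall in mm, relative humidity in %, solar radiation in W/m^2)
readings = [
    [12.1, 8.0, 91.0, 110.0], [12.8, 6.5, 89.0, 130.0], [11.5, 9.2, 93.0, 95.0],
    [17.4, 3.1, 80.0, 210.0], [18.0, 2.4, 78.0, 230.0], [16.9, 3.8, 82.0, 200.0],
    [23.5, 0.5, 66.0, 320.0], [24.2, 0.0, 64.0, 340.0], [22.8, 1.1, 69.0, 300.0],
    [20.1, 1.9, 73.0, 270.0],
]

for name, transform in (("range", range_transform), ("z", z_transform)):
    scaled = transform(readings)
    for k in (2, 3, 5):
        score = davies_bouldin(scaled, k_means(scaled, k))
        print(f"{name}-transformation, k={k}: Davies-Bouldin = {score:.3f}")
```

Normalizing first matters here because solar radiation spans hundreds of units while rainfall spans fewer than ten, so unscaled Euclidean distances would be dominated by a single variable.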

Published

2022-07-04

How to Cite

[1]
J. S. Ramírez and N. D. Duque, “Evaluation of unsupervised Machine Learning algorithms with climate data”, Ing. y Des., vol. 40, no. 2, pp. 131–165, Jul. 2022.