Evaluation of unsupervised machine learning algorithms with climate data

Authors

J. S. Ramírez, N. D. Duque

DOI:

https://doi.org/10.14482/inde.40.02.622.553

Keywords:

Clustering, Machine learning, Climate, K-means, K-medoids

Abstract

When working with climate data, researchers have difficulty determining which clustering algorithm and which parameters perform best on a specific dataset.

This work evaluates the unsupervised machine learning algorithms K-means, K-medoids, and complete linkage, applied to three datasets of climatological variables (temperature, rainfall, relative humidity, and solar radiation) from three weather stations located in the department of Caldas, Colombia, at different elevations above sea level. For each weather station, 5 scenarios are defined with 2, 3, and 5 clusters for each of the two partitional algorithms, and 5 scenarios for the hierarchical algorithm. The scenarios differ in the number and grouping of variables, and they use Euclidean distance, the Davies-Bouldin index as the cluster quality measure, normalization by range transformation and Z-transformation, multiple iterations of the algorithm, and dimensionality reduction with PCA. Computational cost is also evaluated. This research can guide researchers in certain decisions when applying cluster analysis to meteorological data, and help them identify the algorithm and the most important parameters to consider for best performance, according to their particular conditions and requirements.
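To make the evaluation criterion concrete, the following is a minimal sketch, not the authors' implementation, of the two normalizations named in the abstract and of the Davies-Bouldin index computed with Euclidean distance. It uses only the Python standard library. The function names, the six sample rows (temperature in °C, relative humidity in %), and the two candidate labelings are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def range_transform(column):
    """Rescale a column to [0, 1] (range transformation)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def z_transform(column):
    """Standardize a column to zero mean and unit population std. deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def normalize(rows, transform):
    """Apply a column-wise transform to a list of equal-length rows."""
    columns = [transform(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def davies_bouldin(rows, labels):
    """Davies-Bouldin index with Euclidean distance; lower is better.

    Assumes at least two clusters whose centroids do not coincide.
    """
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Centroid of each cluster and mean distance of its points to the centroid.
    centers = {k: [mean(dim) for dim in zip(*pts)] for k, pts in clusters.items()}
    scatter = {
        k: mean(math.dist(p, centers[k]) for p in pts)
        for k, pts in clusters.items()
    }
    # For each cluster, take the worst (largest) similarity to any other cluster.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return mean(worst)


# Hypothetical observations: (temperature in °C, relative humidity in %).
rows = [
    (16.0, 90.0), (17.0, 88.0), (16.5, 92.0),
    (24.0, 60.0), (25.0, 62.0), (23.5, 58.0),
]
two_groups = [0, 0, 0, 1, 1, 1]  # hypothetical labeling that separates the groups
mixed = [0, 1, 0, 1, 0, 1]       # hypothetical labeling that mixes them

for name, transform in (("range", range_transform), ("z", z_transform)):
    data = normalize(rows, transform)
    print(name, davies_bouldin(data, two_groups), davies_bouldin(data, mixed))
```

A lower index means more compact and better separated clusters, so the labeling that separates the two groups should score lower than the mixed one under either normalization. In a setting like the one described, the labels would come from K-means, K-medoids, or complete linkage rather than being written by hand.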

Published

2022-07-04

How to cite

[1]
J. S. Ramírez and N. D. Duque, "Evaluación de algoritmos de Aprendizaje de Máquina no supervisados con datos climáticos," Ing. y Des., vol. 40, no. 2, pp. 131–165, Jul. 2022.

Section

Articles