5 52° North Initiative for Geospatial Open Source Software GmbH, Muenster, Germany
DOI: 10.7717/peerj.5518
Academic Editor: Tal Svoray
Subject Areas: Biogeography, Soil Science, Computational Science, Data Mining and Machine Learning, Spatial and Geographic Information Science
Keywords: Random forest, Kriging, Predictive modeling, R statistical computing, Sampling, Spatiotemporal data, Spatial data, Geostatistics, Pedometrics
Copyright: © 2018 Hengl et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
Cite this article: Hengl T, Nussbaum M, Wright MN, Heuvelink GBM, Gräler B. Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables.

Random forest and similar Machine Learning techniques are already used to generate spatial predictions, but the spatial location of points (geography) is often ignored in the modeling process. Spatial auto-correlation, especially if it still exists in the cross-validation residuals, indicates that the predictions may be biased, and this is suboptimal. This paper presents a random forest for spatial predictions framework (RFsp) where buffer distances from observation points are used as explanatory variables, thus incorporating geographical proximity effects into the prediction process. The RFsp framework is illustrated with examples that use textbook datasets and apply spatial and spatio-temporal prediction to numeric, binary, categorical, multivariate and spatiotemporal variables. Performance of the RFsp framework is compared with state-of-the-art kriging techniques using fivefold cross-validation with refitting. The results show that RFsp can obtain predictions that are as accurate and unbiased as those of different versions of kriging. Advantages of using RFsp over kriging are that it needs no rigid statistical assumptions about the distribution and stationarity of the target variable, it is more flexible towards incorporating, combining and extending covariates of different types, and it possibly yields more informative maps characterizing the prediction error. RFsp appears to be especially attractive for building multivariate spatial prediction models that can be used as “knowledge engines” in various geoscience fields. Some disadvantages of RFsp are the exponentially growing computational intensity with increase of calibration data and covariates, and the high sensitivity of predictions to input data quality. The key to the success of the RFsp framework might be the training data quality, especially the quality of spatial sampling (to minimize extrapolation problems and any type of bias in data) and the quality of model validation (to ensure that accuracy is not affected by overfitting). For many data sets, especially those with fewer points and covariates and close-to-linear relationships, model-based geostatistics can still lead to more accurate predictions than RFsp.

Kriging and its many variants have been used as the Best Unbiased Linear Prediction technique for spatial points since the 1960s (Isaaks & Srivastava, 1989; Cressie, 1990; Goovaerts, 1997). The number of published applications of kriging has steadily increased since 1980, and the technique is now used in a variety of fields, including physical geography (Oliver & Webster, 1990), geology and soil science (Goovaerts, 1999; Minasny & McBratney, 2007), hydrology (Skøien, Merz & Blöschl, 2005), epidemiology (Moore & Carpenter, 1999; Graham, Atkinson & Danson, 2004), natural hazard monitoring (Dubois, 2005) and climatology (Hudson & Wackernagel, 1994; Hartkamp et al., 1999; Bárdossy & Pegram, 2013). One of the reasons why kriging has been used so widely is its accessibility to researchers, especially thanks to the makers of gslib (Deutsch & Journel, 1998), ESRI’s Geostatistical Analyst and ISATIS, and the developers of the gstat (Pebesma, 2004; Bivand et al., 2008), geoR (Diggle & Ribeiro Jr, 2007) and geostatsp (Brown, 2015) packages for R. Since the start of the 21st century, however, there has been an increasing interest in using more computationally intensive and primarily data-driven algorithms. These techniques are also known under the name “machine learning”, and are applicable to various data mining, pattern recognition, regression and classification problems.
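
The abstract describes RFsp as using buffer distances from observation points as explanatory variables. A minimal sketch of that covariate construction is given below, assuming plain Euclidean distances in a projected coordinate system. The function name `buffer_distance_features` and the toy coordinates and values are hypothetical, chosen only for illustration; they are not taken from the paper, and the paper's own implementation may differ.

```python
import math


def buffer_distance_features(obs_xy, target_xy):
    """Return one row per target location.

    Column j of each row is the Euclidean distance from that target
    location to observation point j (an assumed, simplified form of the
    buffer-distance covariates described in the abstract).
    """
    return [[math.dist(t, o) for o in obs_xy] for t in target_xy]


# Hypothetical toy data: three observation points and their measured values.
obs_xy = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
obs_values = [1.2, 0.7, 2.5]

# Training covariates: distances from every observation point to all
# observation points (so the diagonal is zero).
X_train = buffer_distance_features(obs_xy, obs_xy)

# Prediction covariates: distances from new locations to the same
# observation points, so the columns line up with X_train.
grid_xy = [(3.0, 0.0), (0.0, 4.0)]
X_grid = buffer_distance_features(obs_xy, grid_xy)
```

In this sketch, `X_train` together with `obs_values` would be passed to a random forest regressor of your choice for fitting, and `X_grid` to its predict step. The number of distance columns equals the number of observation points, which is one way to see why the computational cost grows as calibration data are added.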