Tuesday, August 29, 2006

Predicting habitat suitability with machine learning models

An article recently published in Ecological Modelling describes the procedures used to model pine forest distribution in Spain. The authors carried out the modelling with GRASS and R. openModeller was not used, but the article should still interest those involved in ecological niche modelling. The complete article is available for download as a PDF document.

Abstract:
"We present a modelling framework for predicting forest areas. The framework is obtained by integrating a machine learning software suite within the GRASS Geographical Information System (GIS) and by providing additional methods for predictive habitat modelling. Three machine learning techniques (Tree-Based Classification, Neural Networks and Random Forest) are available in parallel for modelling from climatic and topographic variables. Model evaluation and parameter selection are measured by sensitivity-specificity ROC analysis, while the final presence and absence maps are obtained through maximisation of the kappa statistic. The modelling framework is applied at a resolution of 1 km with Iberian subpopulations of Pinus sylvestris L. forests. For this data set, the most accurate algorithm is Breiman's random forest, an ensemble method which provides automatic combination of tree-classifiers trained on bootstrapped subsamples and randomised variable sets. All models show a potential area of P. sylvestris for the Iberian Peninsula which is larger than the present one, a result corroborated by regional pollen analyses."
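
To make the last modelling step in the abstract concrete, here is a minimal sketch of choosing a presence/absence cut-off by maximising the kappa statistic. It is not the authors' code (their framework runs in GRASS and R); it is plain Python for illustration, and the function names, scores and labels are made up for the example.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 presence/absence confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present and predicted
    return (observed - expected) / (1.0 - expected)


def best_kappa_threshold(scores, labels):
    """Return (threshold, kappa) for the cut-off with the highest kappa.

    A cell is predicted "present" when its score is >= threshold.
    Ties keep the lowest threshold.
    """
    best_threshold, best_kappa = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        kappa = cohen_kappa(tp, fp, fn, tn)
        if kappa > best_kappa:
            best_threshold, best_kappa = t, kappa
    return best_threshold, best_kappa


# Hypothetical model scores per map cell and observed presence (1) / absence (0).
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, kappa = best_kappa_threshold(scores, labels)
print(threshold, kappa)
```

Applying the chosen threshold to the continuous suitability scores is what turns them into a binary presence/absence map.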

BibTeX citation:
@article{Benito2006_pred_habitat_pinus,
abstract = {We present a modelling framework for predicting forest areas. The framework is obtained by integrating a machine learning software suite within the GRASS Geographical Information System (GIS) and by providing additional methods for predictive habitat modelling. Three machine learning techniques (Tree-Based Classification, Neural Networks and Random Forest) are available in parallel for modelling from climatic and topographic variables. Model evaluation and parameter selection are measured by sensitivity-specificity ROC analysis, while the final presence and absence maps are obtained through maximisation of the kappa statistic. The modelling framework is applied at a resolution of 1 km with Iberian subpopulations of Pinus sylvestris L. forests. For this data set, the most accurate algorithm is Breiman's random forest, an ensemble method which provides automatic combination of tree-classifiers trained on bootstrapped subsamples and randomised variable sets. All models show a potential area of P. sylvestris for the Iberian Peninsula which is larger than the present one, a result corroborated by regional pollen analyses.},
author = {Garz{\'o}n, Marta B. and Blazek, Radim and Neteler, Markus and Dios, Rut S. and Ollero, Helios S. and Furlanello, Cesare},
citeulike-article-id = {608546},
doi = {10.1016/j.ecolmodel.2006.03.015},
journal = {Ecological Modelling},
keywords = {ecology gis machine-learning presence-absence-models roc},
month = {August},
number = {3-4},
pages = {383--393},
priority = {2},
title = {Predicting habitat suitability with machine learning models: The potential area of Pinus sylvestris L. in the Iberian Peninsula},
url = {http://www.sciencedirect.com/science/article/B6VBS-4JRVBDK-5/2/6b75f12e4a096f17439ecf5c766c94c1},
volume = {197},
year = {2006}
}