Identifying the origin of groundwater samples in a Multi-Layer Aquifer System with Random Forest Classification

P. Baudron; F. Alonso-Sarria; J. Garcia-Arostegui; F. Canovas-Garcia; D. Martinez-Vicente; J. Moreno-Brotons

Journal article, Journal of Hydrology, Year: 2013

P. Baudron (Author; PersonId: 931743): Fundacion Instituto Euromediterraneo del Agua; Institute for Water & Environment; Geological Survey of Spain; Géosciences Paris Sud

F. Alonso-Sarria (Author): Institute for Water & Environment

J. Garcia-Arostegui (Author): Geological Survey of Spain

F. Canovas-Garcia (Author): Institute for Water & Environment

D. Martinez-Vicente (Author): Fundacion Instituto Euromediterraneo del Agua; Institute for Water & Environment

J. Moreno-Brotons (Author): Institute for Water & Environment

Abstract

Accurate identification of the origin of groundwater samples is not always possible in complex multi-layer aquifers, which poses a major difficulty for the reliable interpretation of geochemical results. The problem is especially severe when information on tubewell design is hard to obtain. This paper presents a supervised classification method based on the Random Forest (RF) machine learning technique to identify the layer from which groundwater samples were extracted. The classification rules were based on the major ion composition of the samples. We applied this method to the Campo de Cartagena multi-layer aquifer system in southeastern Spain. A large amount of hydrogeochemical data was available, but only a limited fraction of the sampled tubewells included a reliable determination of the borehole design and, consequently, of the aquifer layer being exploited. An added difficulty was the very similar composition of water samples extracted from different aquifer layers. Moreover, not all groundwater samples included the same geochemical variables. Despite these difficulties, the Random Forest classification reached accuracies over 90%. These results were much better than those of the Linear Discriminant Analysis (LDA) and Decision Tree (CART) supervised classification methods. Of a total of 1,549 samples, 805 came from a single identified aquifer, 409 came from a possible blend of waters from several aquifers, and 335 were of unknown origin. Only 468 of the 805 single-aquifer samples included all the chemical variables needed to calibrate and validate the models. Finally, 107 of the groundwater samples of unknown origin could be classified. Most of the samples that could not be classified had an incomplete dataset. The uncertainty in the identification of training samples was taken into account to enhance the model.
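To make the classification idea in the abstract concrete, the following is a minimal sketch of a random forest written with the Python standard library only. It is not the authors' implementation: the data are synthetic, the layer names (`layer_A`, `layer_B`, `layer_C`), the four unnamed ion columns, and every parameter value are assumptions chosen for illustration, and the synthetic classes are far better separated than real aquifer waters would be. The sketch shows the three ingredients of the technique: bootstrap resampling of the training samples, a random subset of variables tried at each split, and a majority vote across trees.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return (impurity, feature, threshold) for the best split, or None."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def grow_tree(rows, labels, rng, n_try, depth=0, max_depth=4):
    """Grow one tree; a leaf is a label, an inner node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    # Only a random subset of the variables is tried at each node.
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority
    _, f, t = split
    go_left = [r[f] <= t for r in rows]
    left = grow_tree([r for r, g in zip(rows, go_left) if g],
                     [y for y, g in zip(labels, go_left) if g],
                     rng, n_try, depth + 1, max_depth)
    right = grow_tree([r for r, g in zip(rows, go_left) if not g],
                      [y for y, g in zip(labels, go_left) if not g],
                      rng, n_try, depth + 1, max_depth)
    return (f, t, left, right)

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def grow_forest(rows, labels, n_trees=25, n_try=2, seed=0):
    """Grow each tree on a bootstrap resample of the training samples."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

# Hypothetical mean concentrations of four ions per layer (synthetic, not from the paper).
LAYER_MEANS = {
    "layer_A": (300, 200, 150, 100),
    "layer_B": (600, 400, 300, 200),
    "layer_C": (900, 600, 450, 300),
}

def make_samples(n_per_layer, rng):
    """Draw synthetic samples around each layer's assumed mean composition."""
    rows, labels = [], []
    for layer, means in LAYER_MEANS.items():
        for _ in range(n_per_layer):
            rows.append(tuple(rng.gauss(m, 20.0) for m in means))
            labels.append(layer)
    return rows, labels

data_rng = random.Random(42)
train_rows, train_labels = make_samples(40, data_rng)
test_rows, test_labels = make_samples(20, data_rng)

forest = grow_forest(train_rows, train_labels)
predicted = [predict_forest(forest, r) for r in test_rows]
accuracy = sum(p == y for p, y in zip(predicted, test_labels)) / len(test_labels)
print(f"hold-out accuracy on synthetic data: {accuracy:.2f}")
```

In practice an established library implementation would be the more sensible choice. Samples with missing variables would also need to be excluded or handled before training, as the abstract notes for the incomplete records.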

Domains

Earth Sciences

https://hal.science/hal-00848590

Submitted on: Friday, 26 July 2013, 15:00:06

Last modified on: Tuesday, 2 April 2024, 15:42:11

Dates and versions

hal-00848590, version 1 (26-07-2013)

Identifiers

HAL Id: hal-00848590, version 1

Cite

P. Baudron, F. Alonso-Sarria, J. Garcia-Arostegui, F. Canovas-Garcia, D. Martinez-Vicente, et al. Identifying the origin of groundwater samples in a Multi-Layer Aquifer System with Random Forest Classification. Journal of Hydrology, 2013, 499, pp. 303-315 (IF 2.964). ⟨hal-00848590⟩

Collections

CNRS UNIV-PARIS-SACLAY GEOPS