This research developed an accurate and interpretable groundwater potential map for Iran’s Fars province using optimized machine learning models (CatBoost and Random Forest) combined with SHAP explainability and feature selection via Boruta-XGBoost.
Groundwater is like a secret underground treasure 💧—hidden beneath our feet, crucial for life, and often overlooked. In Iran's Fars province, where dry spells and over-extraction are threatening water security, scientists have turned to artificial intelligence to map where groundwater is hiding.
Let’s break down how a group of researchers used advanced machine learning (ML) models to create a smart, highly accurate map of groundwater potential—and what this means for the future of water sustainability. 🌱🤖
Groundwater makes up 99% of Earth's liquid freshwater and supports agriculture, industry, and drinking supplies. In countries like Iran, groundwater is especially vital: it supplies 60% of water needs, with agriculture gulping down 90% of that! 🧑‍🌾
But here’s the catch:
This is leading to aquifers drying up, land sinking, and water shortages. So, we need a way to know where water is and manage it better.
The researchers created a Groundwater Potential Map (GWPM) using:
📊 Machine Learning (CatBoost & Random Forest models)
🧠 Feature Selection with Boruta-XGBoost
🔬 Interpretability using SHAP values (more on this soon!)
The target? Accurately predict where water is likely to be underground and make sense of why it’s there.
Located in southern Iran, Fars Province is a mix of mountains and plains, with agriculture as its lifeline. But it’s a tough environment:
That makes it a perfect testbed for AI-powered groundwater mapping.
Let’s simplify the tech process:
From global and national databases, the researchers gathered features like:
They used the Boruta-XGBoost algorithm to cut out noise and focus on what matters. Out went features like:
Top features that stayed:
These turned out to be the most water-relevant!
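Curious how the "shadow feature" trick behind Boruta works? Here is a minimal sketch of the idea, under some loud assumptions: absolute correlation stands in for XGBoost's importance scores, a simple hit-rate cutoff replaces Boruta's statistical test, and the data (including the `noise` column and the made-up link between elevation and wells) is synthetic. It is not the study's actual pipeline.

```python
import random

def pearson_abs(xs, ys):
    """Absolute Pearson correlation, a stand-in for a model-based importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5

def shadow_feature_selection(features, target, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shuffled 'shadow' copy in most rounds."""
    rng = random.Random(seed)
    real_scores = {name: pearson_abs(col, target) for name, col in features.items()}
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        # Shuffling a column destroys its link to the target, so the best
        # shadow score shows how much "importance" pure chance can produce.
        shadow_best = 0.0
        for col in features.values():
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_best = max(shadow_best, pearson_abs(shadow, target))
        for name in features:
            if real_scores[name] > shadow_best:
                hits[name] += 1
    return [name for name in features if hits[name] / n_rounds >= min_hit_rate]

# Synthetic, hypothetical data: wells are more likely at low elevation.
rng = random.Random(42)
n = 200
elevation = [rng.uniform(0, 1) for _ in range(n)]
noise = [rng.uniform(0, 1) for _ in range(n)]
target = [1 if e + rng.gauss(0, 0.1) < 0.5 else 0 for e in elevation]

selected = shadow_feature_selection({"elevation": elevation, "noise": noise}, target)
print(selected)
```

The key design choice is the comparison against shuffled copies: a feature only survives if it is consistently more useful than random noise that happens to share its distribution.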
They trained two powerful models, each an ensemble of decision trees:
🌳 Random Forest (RF)
🐱 CatBoost (short for "Categorical Boosting")
Both models were tuned with Bayesian optimization, which searches for the hyperparameter settings that give the best performance.
Metric | Random Forest | CatBoost |
---|---|---|
AUC (area under the ROC curve, higher is better) | 0.8396 | 0.8778 🔥 |
RMSE (error, lower is better) | 0.4072 | 0.3779 ✅ |
Accuracy | 77.0% | 80.7% 📈 |
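To make the three metrics concrete, here is a small sketch that computes them by hand on six made-up points. The labels and scores are hypothetical, and it assumes RMSE is taken between predicted probabilities and 0/1 labels and that accuracy uses a 0.5 cutoff, which the summary above does not spell out.

```python
def roc_auc(y_true, y_score):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def rmse(y_true, y_score):
    """Root mean square gap between predicted probability and the 0/1 label."""
    return (sum((y - s) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)) ** 0.5

def accuracy(y_true, y_score, threshold=0.5):
    """Share of points labelled correctly after thresholding the score."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

# Hypothetical labels (1 = groundwater present) and model scores.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"AUC      = {roc_auc(y_true, y_score):.4f}")   # 8 of 9 pairs ranked correctly
print(f"RMSE     = {rmse(y_true, y_score):.4f}")
print(f"Accuracy = {accuracy(y_true, y_score):.4f}")  # 4 of 6 correct
```

Notice that AUC only cares about ranking, while accuracy depends on the chosen cutoff. That is why the two numbers in the table differ and should not be read as the same thing.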
🎯 CatBoost nailed it, especially in correctly identifying:
Using the models, researchers divided the region into 5 zones:
According to CatBoost:
🟢 25% of the area = very high potential
🔴 15% = low potential
⚫ 9% = very low potential
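Here is a rough sketch of how predicted probabilities could be turned into five zones and area shares. The equal-interval breaks, the "moderate" and "high" zone names, and the pixel values are all assumptions for illustration; the summary does not say which classification method the study used.

```python
from collections import Counter

# Zone names from lowest to highest potential (middle names are assumed).
ZONES = ["very low", "low", "moderate", "high", "very high"]

def to_zone(prob, breaks=(0.2, 0.4, 0.6, 0.8)):
    """Map a probability in [0, 1] to a zone using equal-interval breaks."""
    idx = sum(prob >= b for b in breaks)  # number of breaks at or below prob
    return ZONES[idx]

def zone_shares(probs):
    """Percentage of pixels that fall into each zone."""
    counts = Counter(to_zone(p) for p in probs)
    return {z: 100.0 * counts[z] / len(probs) for z in ZONES}

# Hypothetical per-pixel groundwater probabilities from a trained model.
pixels = [0.05, 0.15, 0.35, 0.55, 0.65, 0.85, 0.9, 0.95]
shares = zone_shares(pixels)
for zone, pct in shares.items():
    print(f"{zone:>9}: {pct:.1f}%")
```

If every pixel covers the same ground area, these pixel shares are also area shares, which is how percentages like the ones above can be read.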
One of the coolest parts? They didn't just get a prediction—they explained it too, using SHAP (SHapley Additive exPlanations). 🎯
SHAP values break down how much each feature influenced the final prediction. This is huge for building trust in ML models and making better decisions.
Most Influential Features | Least Influential Features |
---|---|
Land Use / Land Cover 🌾 | Permeability ❌ |
Terrain Roughness 🏔️ | Porosity ❌ |
Sand Content 🏜️ | Slope ❌ |
Elevation ⛰️ | Silt ❌ |
👉 SHAP maps also showed where in the region each feature had the most impact, allowing local-level decision-making.
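Want to see what a Shapley value actually is? The sketch below computes exact Shapley values by brute force for a tiny three-feature scoring function. Everything in it is hypothetical: `toy_model`, its coefficients, and the all-zero baseline are invented for illustration, and real SHAP tooling uses much faster algorithms for tree models than this enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(x)
    n = len(names)

    def value(subset):
        inputs = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi

def toy_model(f):
    # Hypothetical scoring function, NOT the study's model.
    return 0.5 * f["sand"] - 0.3 * f["elevation"] + 0.2 * f["sand"] * f["roughness"]

x = {"sand": 1.0, "elevation": 1.0, "roughness": 1.0}
baseline = {"sand": 0.0, "elevation": 0.0, "roughness": 0.0}

phi = shapley_values(toy_model, x, baseline)
for name, contribution in phi.items():
    print(f"{name:>9}: {contribution:+.3f}")
```

The contributions add up exactly to the gap between the prediction for `x` and the prediction for the baseline. The interaction term is split evenly between `sand` and `roughness`, which is the fairness property that makes Shapley values attractive for explanations.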
With these smart maps and insights, here’s how Iran and other regions can benefit:
✅ Avoid digging wells in dry zones
✅ Plan crops suited for high-potential areas
✅ Prioritize groundwater recharge projects
✅ Adapt policies based on local water potential
This is a huge leap toward sustainable water use in semi-arid regions. 🛑💦
The study opens doors to even smarter, more adaptable models by:
🌱 Integrating satellite data on land changes
☀️ Considering climate change effects
🧠 Combining AI models (ensemble learning)
📏 Using higher resolution mapping (10 m or 30 m pixels)
The team also encourages exploring more explainable AI tools to bring transparency to environmental modeling. 🌐
This research shows the power of blending environmental science with AI to solve one of humanity’s oldest challenges: finding water. By mapping the invisible underground flows using tools like CatBoost and SHAP, scientists are helping communities make smarter, more sustainable choices. 🌎💧
💧 Groundwater - Water stored underground in soil and rock layers—like a hidden reservoir beneath our feet! - More about this concept in the article "Safeguarding Groundwater from Coal Mines: How Science Battles Pollution Risks 🌊🛡️".
🗺️ Groundwater Potential Map (GWPM) - A visual map showing where groundwater is most likely to be found—high potential = more water, low potential = less water.
🤖 Machine Learning (ML) - A type of AI where computers learn from data to make predictions—like telling where groundwater might be without being told exactly how. - More about this concept in the article "How Machine Learning is Safeguarding Honey Bees from Toxic Pesticides 🐝 🍯".
🌲 Random Forest (RF) - An ML model made of lots of decision trees working together, like a forest of mini-experts voting on the best answer. - More about this concept in the article "Predicting Tomorrow Through Sentiment Analysis: How AI is Changing Stock Market Forecasting 📈🤖".
🐱 CatBoost - A smart and fast ML algorithm that handles messy or categorical data (like land use types) really well. - More about this concept in the article "Smart Energy Insights: How Machine Learning is Transforming Neighborhood Design 🏙️💡".
🎯 Feature Selection - Picking the most useful pieces of information (features) from all available data to make the model smarter and faster. - More about this concept in the article "Unlocking the Future of Gesture Control: AI-Powered Hand Recognition ✋🤖".
🌟 Boruta-XGBoost - A combo of algorithms used to find the most important features: Boruta adds shuffled “shadow” copies of the features, XGBoost scores real and shadow features alike, and only real features that consistently beat the shadows are kept.
🧠 SHAP (SHapley Additive exPlanations) - A technique that shows how much each feature influenced the prediction—like asking your model “Why did you say that?” - More about this concept in the article "Unlocking the Black Box: How Explainable AI (XAI) is Transforming Malware Detection 🦠 🤖".
📈 AUC (Area Under the Curve) - A number between 0 and 1 showing how well a model separates positive cases from negative ones: 0.5 is no better than random guessing, and the closer to 1, the better!
📉 RMSE (Root Mean Square Error) - A score that shows how far off the model’s predictions are—lower = more accurate. - More about this concept in the article "Revolutionizing Diabetes Care: AI Meets Continuous Glucose Monitoring (CGM) 🩸 📈".
🌍 Fars Province - A dry, agricultural region in southern Iran where groundwater is essential due to limited rainfall and high water use.
🌾 Land Use / Land Cover (LULC) - Describes how the land is being used—farms, forests, cities, etc.—which affects how much water can soak into the ground. - More about this concept in the article "🌊 Mapping the Future: How Geospatial Tech is Saving Bangladesh's Groundwater 🗺️".
🏔️ Terrain Roughness Index (TRI) - A measure of how bumpy or rugged the land is, which affects water flow and accumulation.
Source: Hosseini, F.S.; Jafari, A.; Zandi, I.; Alesheikh, A.A.; Rezaie, F. Groundwater Potential Mapping Using Optimized Decision Tree-Based Ensemble Learning Model with Local and Global Explainability. Water 2025, 17, 1520. https://doi.org/10.3390/w17101520
From: K. N. Toosi University of Technology; University of Tehran; Korea University of Science and Technology.