EngiSphere

Cracking the Code of Hidden Water 💧 How AI Is Mapping Groundwater

Discover how machine learning is revolutionizing water resource management in dry regions 🌍

Published May 19, 2025 By EngiSphere Research Editors
Layered Underground Terrain © AI Illustration

The Main Idea

This research developed an accurate and interpretable groundwater potential map for Iran’s Fars province using optimized machine learning models (CatBoost and Random Forest) combined with SHAP explainability and feature selection via Boruta-XGBoost.


The R&D

Groundwater is like a secret underground treasure 💧—hidden beneath our feet, crucial for life, and often overlooked. In Iran's Fars province, where dry spells and over-extraction are threatening water security, scientists have turned to artificial intelligence to map where groundwater is hiding.

Let’s break down how a group of researchers used advanced machine learning (ML) models to create a smart, highly accurate map of groundwater potential—and what this means for the future of water sustainability. 🌱🤖

💡 Why Is Groundwater Mapping So Important?

Groundwater makes up 99% of Earth's liquid freshwater and supports agriculture, industry, and drinking supplies. In countries like Iran, groundwater is especially vital—supplying 60% of water needs, with agriculture gulping down 90% of that! 🧑‍🌾

But here’s the catch:

  • Rainfall is decreasing 🌧️
  • Temperatures are rising 🌡️
  • Wells are multiplying (over 1 million!) 🕳️

This is leading to aquifers drying up, land sinking, and water shortages. So, we need a way to know where water is and manage it better.

🔍 Enter the Smart Solution: AI + Geospatial Data + Explainability

The researchers created a Groundwater Potential Map (GWPM) using:

📊 Machine Learning (CatBoost & Random Forest models)
🧠 Feature Selection with Boruta-XGBoost
🔬 Interpretability using SHAP values (more on this soon!)

The target? Accurately predict where water is likely to be underground and make sense of why it’s there.

📌 The Study Zone: Fars Province, Iran

Located in southern Iran, Fars Province is a mix of mountains and plains, with agriculture as its lifeline. But it’s a tough environment:

  • Only 201 mm rainfall/year (arid to semi-arid!) ☀️
  • Rising number of deep wells due to droughts
  • Major water stress in rural areas

That makes it a perfect testbed for AI-powered groundwater mapping.

🛠️ How the Groundwater Map Was Built (Step-by-Step)

Let’s simplify the tech process:

1. 🗺️ Collecting 22 Environmental Layers

From global and national databases, the researchers gathered features like:

  • Elevation, slope, rivers 🏞️
  • Soil texture (sand, clay, silt) 🧱
  • Land use (farms, forests, cities) 🌾🌲🏙️
  • Rainfall and terrain ruggedness ☔

2. ⚙️ Selecting the Most Useful Features

They used the Boruta-XGBoost algorithm to cut out noise and focus on what matters. Out went features like:

  • Plan curvature 📐
  • Aspect 🌄
  • Stream power index

Top features that stayed:

  • Land Use/Land Cover (LULC)
  • Terrain Roughness Index (TRI)
  • Elevation
  • Sand Content

These turned out to be the most water-relevant!
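
The shadow-feature idea behind Boruta can be sketched in a few lines. This is a simplified illustration, not the study's pipeline: it swaps in absolute Pearson correlation for XGBoost importance, replaces Boruta's statistical test with a plain hit-rate cutoff, and runs on hypothetical synthetic data with made-up feature values.

```python
import random
import statistics


def abs_corr(xs, ys):
    """Absolute Pearson correlation, a stand-in for a model-based importance score."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def shadow_select(features, target, n_rounds=30, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shuffled 'shadow' copy in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        # Shadows: each column shuffled, which destroys any link to the target.
        shadow_scores = []
        for column in features.values():
            shuffled = column[:]
            rng.shuffle(shuffled)
            shadow_scores.append(abs_corr(shuffled, target))
        threshold = max(shadow_scores)
        for name, column in features.items():
            if abs_corr(column, target) > threshold:
                hits[name] += 1
    return [name for name, h in hits.items() if h / n_rounds >= min_hit_rate]


# Hypothetical synthetic data: the target depends on elevation and sand only.
rng = random.Random(42)
n = 400
features = {
    "elevation": [rng.gauss(0, 1) for _ in range(n)],
    "sand": [rng.gauss(0, 1) for _ in range(n)],
    "aspect": [rng.gauss(0, 1) for _ in range(n)],  # pure noise by construction
}
target = [
    1.0 if 0.9 * e + 0.7 * s + rng.gauss(0, 0.5) > 0 else 0.0
    for e, s in zip(features["elevation"], features["sand"])
]

selected = shadow_select(features, target)
print(selected)
```

A feature that carries real signal should beat its shuffled rivals round after round, while a noise feature should win only by luck.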

3. 🧮 Training ML Models

They trained two decision-tree ensemble models:

🌳 Random Forest (RF)
🐱 CatBoost (short for "Categorical Boosting")

Bayesian optimization was used to tune each model's hyperparameters.
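
To get a feel for the "many trees voting" idea behind Random Forest, here is a toy sketch. It is a deliberately tiny stand-in, not the study's models: each tree is a one-split stump, the data are hypothetical values scaled to 0-1, and no Bayesian tuning is shown. Real Random Forests grow much deeper trees, and CatBoost builds its trees one after another through gradient boosting rather than voting.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_vote = Counter(left).most_common(1)[0][0]
            right_vote = Counter(right).most_common(1)[0][0] if right else left_vote
            errors = sum(y != left_vote for y in left)
            errors += sum(y != right_vote for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_vote, right_vote)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        feats = rng.sample(range(n_features), max(1, n_features // 2))
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_labels, feats))
    return forest


def predict(forest, row):
    """Majority vote across all stumps."""
    votes = [lv if row[f] <= t else rv for f, t, lv, rv in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical data: two features scaled to 0-1; label 1 means "productive well".
rng = random.Random(1)
rows = [[rng.random(), rng.random()] for _ in range(200)]
labels = [1 if a + b > 1 else 0 for a, b in rows]

forest = fit_forest(rows, labels)
print(predict(forest, [0.95, 0.95]), predict(forest, [0.05, 0.05]))
```

Each stump alone is a weak guesser, but because every one sees a different resample of the data, their combined vote is steadier than any single tree.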

📊 Results That Speak Volumes

🥇 Winner: CatBoost!

Metric               | Random Forest | CatBoost
AUC (discrimination) | 0.8396        | 0.8778 🔥
RMSE (error)         | 0.4072        | 0.3779 ✅
Accuracy             | 77.0%         | 80.7% 📈
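
For readers who want to see what these three scores measure, here is a small sketch that computes them by hand. The labels and predicted probabilities are invented for illustration and have nothing to do with the study's wells; the 0.5 cutoff for accuracy is also an assumption.

```python
def auc(labels, scores):
    """AUC: the chance that a random positive gets a higher score than a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def rmse(labels, scores):
    """Root mean square error between true labels (0/1) and predicted probabilities."""
    return (sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels)) ** 0.5


def accuracy(labels, scores, threshold=0.5):
    """Share of samples classified correctly at the given probability cutoff."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Invented example: 1 = high-potential well, 0 = low-potential well.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

print(f"AUC      = {auc(labels, scores):.4f}")       # 0.9375
print(f"RMSE     = {rmse(labels, scores):.4f}")      # 0.3536
print(f"Accuracy = {accuracy(labels, scores):.2%}")  # 75.00%
```

Note that AUC never looks at the cutoff: it only asks whether positives tend to be ranked above negatives, which is why it and accuracy can disagree.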

🎯 CatBoost nailed it, especially in correctly identifying:

  • 84% of low-potential wells
  • 62% of high-potential wells

📌 What the Map Looks Like

Using the models, researchers divided the region into 5 zones:

  1. Very High Groundwater Potential 🟢
  2. High 🔵
  3. Moderate 🟡
  4. Low 🔴
  5. Very Low ⚫

According to CatBoost:

🟢 25% of the area = very high potential
🔴 15% = low potential
⚫ 9% = very low potential
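
Turning model output into zones like these comes down to slicing predicted probabilities into classes and counting pixels. The sketch below assumes equal-interval breaks at 0.2, 0.4, 0.6 and 0.8; the article does not say which classification scheme the study used, and the pixel values are invented.

```python
from collections import Counter

ZONES = ["Very Low", "Low", "Moderate", "High", "Very High"]


def classify(prob, breaks=(0.2, 0.4, 0.6, 0.8)):
    """Map a predicted probability in [0, 1] to one of five zones."""
    index = sum(prob >= b for b in breaks)  # number of breaks reached
    return ZONES[index]


def area_share(probs):
    """Fraction of pixels falling in each zone (equal-sized pixels assumed)."""
    counts = Counter(classify(p) for p in probs)
    return {zone: counts[zone] / len(probs) for zone in ZONES}


# Invented predicted probabilities for ten map pixels.
pixels = [0.05, 0.15, 0.35, 0.55, 0.65, 0.85, 0.95, 0.90, 0.45, 0.75]
shares = area_share(pixels)

for zone, share in shares.items():
    print(f"{zone:<10} {share:.0%}")
```

Different break schemes (quantiles, natural breaks) would shift the percentages, so area shares are only comparable between maps that use the same scheme.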

🔍 Let’s Talk Explainability: Why SHAP Matters

One of the coolest parts? They didn't just get a prediction—they explained it too, using SHAP (SHapley Additive exPlanations). 🎯

SHAP values break down how much each feature influenced the final prediction. This is huge for building trust in ML models and making better decisions.
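
Under the hood, SHAP builds on Shapley values from game theory. The sketch below computes them exactly for a three-feature toy scoring rule by trying every coalition of features. Everything here is hypothetical: the scoring rule, the feature names and the baseline values are invented, and the SHAP library itself relies on much faster tree-specific algorithms rather than brute force.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                gain = value(set(subset) | {name}) - value(set(subset))
                total += weight * gain
        phi[name] = total
    return phi


def toy_model(f):
    # Hypothetical scoring rule, not the study's model.
    return 0.5 * f["sand"] - 0.3 * f["roughness"] + 0.2 * f["sand"] * f["cropland"]


x = {"sand": 0.8, "roughness": 0.5, "cropland": 1.0}         # location to explain
baseline = {"sand": 0.4, "roughness": 0.2, "cropland": 0.0}  # reference location

phi = shapley_values(toy_model, x, baseline)
for name, contribution in phi.items():
    print(f"{name:<10} {contribution:+.3f}")
```

The contributions always add up to the gap between the prediction for this location and the prediction for the baseline, which is the "additive" part of the name.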

📌 Key Takeaways from SHAP

Most Influential Features | Least Influential Features
Land Use / Land Cover 🌾  | Permeability ❌
Terrain Roughness 🏔️      | Porosity ❌
Sand Content 🏜️           | Slope ❌
Elevation ⛰️              | Silt ❌

👉 SHAP maps also showed where in the region each feature had the most impact, allowing local-level decision-making.

🌍 What Does This Mean for Water Management?

With these smart maps and insights, here’s how Iran and other regions can benefit:

✅ Avoid digging wells in dry zones
✅ Plan crops suited for high-potential areas
✅ Prioritize groundwater recharge projects
✅ Adapt policies based on local water potential

This is a huge leap toward sustainable water use in semi-arid regions. 🛑💦

🔮 Future Prospects

The study opens doors to even smarter, more adaptable models by:

🌱 Integrating satellite data on land changes
☀️ Considering climate change effects
🧠 Combining AI models (ensemble learning)
📏 Using higher resolution mapping (10 m or 30 m pixels)

The team also encourages exploring more explainable AI tools to bring transparency to environmental modeling. 🌐

🧠 Final Thoughts

This research shows the power of blending environmental science with AI to solve one of humanity’s oldest challenges: finding water. By mapping where groundwater is most likely to be found, using tools like CatBoost and SHAP, scientists are helping communities make smarter, more sustainable choices. 🌎💧


Concepts to Know

💧 Groundwater - Water stored underground in soil and rock layers—like a hidden reservoir beneath our feet! - More about this concept in the article "Safeguarding Groundwater from Coal Mines: How Science Battles Pollution Risks 🌊🛡️".

🗺️ Groundwater Potential Map (GWPM) - A visual map showing where groundwater is most likely to be found—high potential = more water, low potential = less water.

🤖 Machine Learning (ML) - A type of AI where computers learn from data to make predictions—like telling where groundwater might be without being told exactly how. - More about this concept in the article "How Machine Learning is Safeguarding Honey Bees from Toxic Pesticides 🐝 🍯".

🌲 Random Forest (RF) - An ML model made of lots of decision trees working together, like a forest of mini-experts voting on the best answer. - More about this concept in the article "Predicting Tomorrow Through Sentiment Analysis: How AI is Changing Stock Market Forecasting 📈🤖".

🐱 CatBoost - A smart and fast ML algorithm that handles messy or categorical data (like land use types) really well. - More about this concept in the article "Smart Energy Insights: How Machine Learning is Transforming Neighborhood Design 🏙️💡".

🎯 Feature Selection - Picking the most useful pieces of information (features) from all available data to make the model smarter and faster. - More about this concept in the article "Unlocking the Future of Gesture Control: AI-Powered Hand Recognition ✋🤖".

🌟 Boruta-XGBoost - The Boruta feature-selection method run with XGBoost as its importance engine: each real feature must outperform shuffled “shadow” copies of the data to be kept, which filters out the noise.

🧠 SHAP (SHapley Additive exPlanations) - A technique that shows how much each feature influenced the prediction—like asking your model “Why did you say that?” - More about this concept in the article "Unlocking the Black Box: How Explainable AI (XAI) is Transforming Malware Detection 🦠 🤖".

📈 AUC (Area Under the Curve) - A number between 0 and 1 showing how well a model separates the classes: 0.5 is no better than random guessing, and the closer to 1, the better!

📉 RMSE (Root Mean Square Error) - A score that shows how far off the model’s predictions are—lower = more accurate. - More about this concept in the article "Revolutionizing Diabetes Care: AI Meets Continuous Glucose Monitoring (CGM) 🩸 📈".

🌍 Fars Province - A dry, agricultural region in southern Iran where groundwater is essential due to limited rainfall and high water use.

🌾 Land Use / Land Cover (LULC) - Describes how the land is being used—farms, forests, cities, etc.—which affects how much water can soak into the ground. - More about this concept in the article "🌊 Mapping the Future: How Geospatial Tech is Saving Bangladesh's Groundwater 🗺️".

🏔️ Terrain Roughness Index (TRI) - A measure of how bumpy or rugged the land is, which affects water flow and accumulation.


Source: Hosseini, F.S.; Jafari, A.; Zandi, I.; Alesheikh, A.A.; Rezaie, F. Groundwater Potential Mapping Using Optimized Decision Tree-Based Ensemble Learning Model with Local and Global Explainability. Water 2025, 17, 1520. https://doi.org/10.3390/w17101520

From: K. N. Toosi University of Technology; University of Tehran; Korea University of Science and Technology.

© 2025 EngiSphere.com