Predicting the distribution of arsenic in groundwater by machine learning technique in two worst hit districts of Assam, India: a risk to public health
Arsenic (As) is a well-known human carcinogen and a significant chemical contaminant in groundwater. The spatial heterogeneity of As in groundwater makes it difficult to predict where tube wells can safely be installed for drinking and agricultural use. Geospatial machine learning techniques have been used to predict the location of areas that are safe and unsafe with respect to groundwater As contamination. Here we used a similar machine learning approach to determine the risk and extent of As >10 µg/L in groundwater at a finer spatial resolution (250 m × 250 m) in two worst-hit districts of Assam, India, to help policymakers target mitigation campaigns. A random forest model was implemented in Python to predict the probability of As occurring at concentrations >10 µg/L using several intrinsic and extrinsic predictor variables. The predictor variables were selected based on their inherent relationship with the occurrence of As in groundwater. The relationships between the predictor variables and the proportion of As occurrences >10 µg/L are consistent with the well-documented processes leading to As release in groundwater. We identified extensive areas of potential As hotspots using a probability cutoff of 0.7 for As >10 µg/L. These areas include locations that were not previously surveyed and extend beyond previously known As hotspots. Twenty-five percent of the land area (1,500 km²) was identified as a high-risk zone, with an estimated population of 155,000 potentially consuming As through drinking water or food cooked with water containing As >10 µg/L. The ternary hazard map (i.e., high, moderate, and low risk for As >10 µg/L) could help policymakers target these regions by establishing new drinking water treatment plants and supplying safe drinking water.
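The step from per-cell probabilities to a ternary hazard map and an area estimate can be illustrated with a minimal Python sketch. It does not reproduce the random forest itself. Only the 0.7 cutoff and the 250 m cell size come from the text above; the lower cutoff of 0.4 for the moderate class, the treatment of 0.7 as an inclusive bound, the function names, and the sample probabilities are assumptions made purely for illustration.

```python
# Illustrative sketch only: it is not the study's code or data.
CELL_SIZE_M = 250            # grid resolution stated in the abstract
HIGH_RISK_CUTOFF = 0.7       # probability cutoff stated in the abstract (assumed inclusive)
MODERATE_RISK_CUTOFF = 0.4   # hypothetical lower bound for "moderate"; not from the source


def classify_risk(probability):
    """Map a predicted probability of As > 10 µg/L to a ternary risk class."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= HIGH_RISK_CUTOFF:
        return "high"
    if probability >= MODERATE_RISK_CUTOFF:
        return "moderate"
    return "low"


def area_km2(n_cells, cell_size_m=CELL_SIZE_M):
    """Total area covered by n_cells square grid cells, in km²."""
    return n_cells * (cell_size_m / 1000.0) ** 2


def summarize(probabilities):
    """Count cells and area per risk class for a flat list of cell probabilities."""
    counts = {"low": 0, "moderate": 0, "high": 0}
    for p in probabilities:
        counts[classify_risk(p)] += 1
    return {cls: {"cells": n, "area_km2": area_km2(n)} for cls, n in counts.items()}


# Made-up probabilities for eight grid cells (not data from the study).
example_probabilities = [0.05, 0.35, 0.45, 0.69, 0.70, 0.82, 0.95, 0.10]
summary = summarize(example_probabilities)
```

At this resolution each cell covers 0.0625 km², so a high-risk area of 1,500 km² would correspond to 24,000 grid cells, which `area_km2(24000)` reproduces.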