Efficient water quality prediction models based on machine learning algorithms for Nainital Lake, Uttarakhand

Author(s):  
Manisha Koranga ◽  
Pushpa Pant ◽  
Tarun Kumar ◽  
Durgesh Pant ◽  
Ashutosh Kumar Bhatt ◽  
...  
2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Background: Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods: Data from 2084 patients in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients' socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and we also estimated the relative importance of each predictor variable. The best and worst performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best performing algorithms were combined in an ensemble model using stacking. Results: All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the worst performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a net reclassification improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the differences between the machine learning algorithms were in most cases modest in magnitude. The results show that a predictive accuracy similar to that of the best performing model can be achieved by combining multiple algorithms in an ensemble model.
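The net reclassification improvement (NRI) mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common two-category form of NRI with a single risk threshold; the abstract does not say which NRI variant or threshold the authors used, so the function name, the threshold of 0.5 and the toy data below are all hypothetical.

```python
def net_reclassification_improvement(y_true, p_old, p_new, threshold=0.5):
    """Two-category NRI of a new model over an old one at one risk threshold.

    Assumption: this is the categorical NRI; the paper may have used another variant.
    """
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = n_nonevent = 0
    for y, old, new in zip(y_true, p_old, p_new):
        old_high = old >= threshold
        new_high = new >= threshold
        moved_up = new_high and not old_high      # reclassified to high risk
        moved_down = old_high and not new_high    # reclassified to low risk
        if y == 1:
            n_event += 1
            up_event += moved_up
            down_event += moved_down
        else:
            n_nonevent += 1
            up_nonevent += moved_up
            down_nonevent += moved_down
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    # Events should move up, non-events should move down.
    nri_events = (up_event - down_event) / n_event
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevent
    return nri_events + nri_nonevents


# Hypothetical toy data: 1 = hospitalized, 0 = not hospitalized.
y = [1, 1, 1, 1, 0, 0, 0, 0]
p_glm = [0.6, 0.4, 0.3, 0.7, 0.6, 0.2, 0.4, 0.55]
p_gbm = [0.7, 0.6, 0.4, 0.8, 0.4, 0.3, 0.6, 0.45]
print(net_reclassification_improvement(y, p_glm, p_gbm))
```

A positive value means the new model moves more cases in the correct direction than in the wrong one; the toy numbers are not taken from the study.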


2021 ◽  
Vol 218 ◽  
pp. 44-51
Author(s):  
D. Venkata Vara Prasad ◽  
Lokeswari Y. Venkataramana ◽  
P. Senthil Kumar ◽  
G. Prasannamedha ◽  
K. Soumya ◽  
...  

Author(s):  
Sankhadeep Chatterjee ◽  
Sarbartha Sarkar ◽  
Nilanjan Dey ◽  
Amira S. Ashour ◽  
Soumya Sen

Water pollution from industrial and domestic sources severely degrades water quality. In undeveloped and developed countries alike, it has become a major cause of a number of water-borne diseases. Poor public health imposes an extra economic burden, because precautionary measures must be deployed against these diseases. Recent research has been directed toward more sustainable solutions to this problem. It has been shown that a good-quality water supply not only improves public health but also accelerates the economic growth of a region. Water quality prediction using machine learning methods is still at an early stage. Moreover, most studies have not followed any national or international standard for water quality prediction. The current work addresses both problems. First, Artificial Neural Networks (ANNs) supported by a well-known multi-objective optimization algorithm, the Non-dominated Sorting Genetic Algorithm-II (NSGA-II), have been used to classify water samples into two classes. Second, the Indian national standard for water quality (IS 10500:2012) has been used for this classification task. The hybrid NN-NSGA-II model is compared with two other well-known metaheuristic-supported ANN classifiers, namely ANN trained by Genetic Algorithm (NN-GA) and by Particle Swarm Optimization (NN-PSO). The support vector machine (SVM) has also been included in the comparative study. Besides analysing performance with several performance measures, the statistical significance of the results obtained by NN-NSGA-II has been assessed with the Wilcoxon rank sum test at the 5% significance level. The results indicate that the proposed NN-NSGA-II model outperforms the other classifiers in the study.
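The Wilcoxon rank sum test used in this abstract to compare classifiers can be sketched as follows. This is a minimal standard-library version using the normal approximation without a tie correction in the variance; the function names and the accuracy values are hypothetical and are not results from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p_value


# Hypothetical accuracies from repeated runs of two classifiers.
acc_a = [0.91, 0.93, 0.92, 0.94, 0.95, 0.96, 0.90, 0.97]
acc_b = [0.85, 0.86, 0.84, 0.88, 0.87, 0.83, 0.89, 0.82]
z, p = rank_sum_test(acc_a, acc_b)
print(f"z = {z:.3f}, p = {p:.5f}, significant at 5%: {p < 0.05}")
```

With small samples or many ties, an exact test or a tie-corrected variance would be more appropriate than this approximation.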


Author(s):  
Ruchika Malhotra ◽  
Anuradha Chug

Software maintenance is an expensive activity that consumes a major portion of the total project cost. Activities carried out during maintenance include the addition of new features, deletion of obsolete code, correction of errors, etc. Software maintainability is the ease with which these operations can be carried out. If maintainability can be measured in the early phases of software development, it helps in better planning and optimal resource utilization. Measuring design properties such as coupling and cohesion in the early phases of development often allows the corresponding maintainability to be derived with the help of prediction models. In this paper, we performed a systematic review of the existing studies related to software maintainability from January 1991 to October 2015. In total, 96 primary studies were identified, of which 47 were from journals, 36 from conference proceedings and 13 from other sources. All studies were compiled in structured form and analyzed from numerous perspectives, such as the use of design metrics, prediction models, tools, data sources, prediction accuracy, etc. According to the review results, the use of machine learning algorithms in predicting maintainability has increased since 2005. The use of evolutionary algorithms has also begun in related sub-fields since 2010. We observed that design metrics are still the most favored option for capturing the characteristics of a given piece of software before using it in a prediction model to determine the corresponding software maintainability. A significant increase in the use of public datasets for building prediction models has also been observed; in this regard, two public datasets proposed by Li and Henry, User Interface Management System (UIMS) and Quality Evaluation System (QUES), are quite popular among researchers. Although machine learning algorithms are still the most popular methods, we suggest that researchers working on software maintainability should experiment with open source datasets and hybrid algorithms. More empirical studies also need to be conducted on a large number of datasets so that a generalized theory can be formed. The current paper will be beneficial for practitioners, researchers and developers, as they can use these models and metrics for creating benchmarks and standards. The findings of this extensive review will also be useful for novices in the field of software maintainability, as it not only provides explicit definitions but also lays a foundation for further research by providing a quick link to all important studies in the field. Finally, this study compiles current trends and emerging sub-fields and identifies various opportunities for future research in the field of software maintainability.


2019 ◽  
Vol 20 (3) ◽  
pp. 177-184 ◽  
Author(s):  
Nantao Zheng ◽  
Kairou Wang ◽  
Weihua Zhan ◽  
Lei Deng

Background: Targeting critical virus-host Protein-Protein Interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. Methods: In this review, a variety of computational methods for virus-host PPI prediction have been surveyed. These methods are categorized based on the features they utilize and the machine learning algorithms they employ, including classical and novel methods. Results: We describe the pivotal and representative features extracted from relevant sources of biological data, which mainly include sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms that are used to build binary prediction models for the classification of virus-host protein pairs and discuss their abilities, weaknesses and future directions. Conclusion: The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between virus and host. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is still considerable room for improvement.

