Water Quality Prediction Using Statistical Tool and Machine Learning Algorithm

2020 ◽  
pp. 609-623
Author(s):  
Arun Kumar Beerala ◽  
Gobinath R. ◽  
Shyamala G. ◽  
Siribommala Manvitha

Water is the most valuable natural resource for all living things and the ecosystem. The quality of groundwater is changed due to change in ecosystem, industrialisation, and urbanisation, etc. In the study, 60 samples were taken and analysed for various physio-chemical parameters. The sampling locations were located using global positioning system (GPS) and were taken for two consecutive years for two different seasons, monsoon (Nov-Dec) and post-monsoon (Jan-Mar). In 2016-2017 and 2017-2018 pH, EC, and TDS were obtained in the field. Hardness and Chloride are determined using titration method. Nitrate and Sulphate were determined using Spectrophotometer. Machine learning techniques were used to train the data set and to predict the unknown values. The dominant elements of groundwater are as follows: Ca2, Mg2 for cation and Cl-, SO42, NO3− for anions. The regression value for the training data set was found to be 0.90596, and for the entire network, it was found to be 0.81729. The best performance was observed as 0.0022605 at epoch 223.

Author(s):  
Arun Kumar Beerala ◽  
Gobinath R. ◽  
Shyamala G. ◽  
Siribommala Manvitha

Water is the most valuable natural resource for all living things and the ecosystem. The quality of groundwater is changed due to change in ecosystem, industrialisation, and urbanisation, etc. In the study, 60 samples were taken and analysed for various physio-chemical parameters. The sampling locations were located using global positioning system (GPS) and were taken for two consecutive years for two different seasons, monsoon (Nov-Dec) and post-monsoon (Jan-Mar). In 2016-2017 and 2017-2018 pH, EC, and TDS were obtained in the field. Hardness and Chloride are determined using titration method. Nitrate and Sulphate were determined using Spectrophotometer. Machine learning techniques were used to train the data set and to predict the unknown values. The dominant elements of groundwater are as follows: Ca2, Mg2 for cation and Cl-, SO42, NO3− for anions. The regression value for the training data set was found to be 0.90596, and for the entire network, it was found to be 0.81729. The best performance was observed as 0.0022605 at epoch 223.


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on findings to get valuable insights on available data. The large part of Indian Cinema is Bollywood which is a multi-million dollar industry. This paper attempts to predict whether the upcoming Bollywood Movie would be Blockbuster, Superhit, Hit, Average or Flop. For this Machine Learning techniques (classification and prediction) will be applied. To make classifier or prediction model first step is the learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations. Methods: All the techniques related to classification and Prediction such as Support Vector Machine(SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, Adaboost, and KNN will be applied and try to find out efficient and effective results. All these functionalities can be applied with GUI Based workflows available with various categories such as data, Visualize, Model, and Evaluate. Result: To make classifier or prediction model first step is learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations Conclusion: This paper focuses on Comparative Analysis that would be performed based on different parameters such as Accuracy, Confusion Matrix to identify the best possible model for predicting the movie Success. By using Advertisement Propaganda, they can plan for the best time to release the movie according to the predicted success rate to gain higher benefits. Discussion: Data Mining is the process of discovering different patterns from large data sets and from that various relationships are also discovered to solve various problems that come in business and helps to predict the forthcoming trends. This Prediction can help Production Houses for Advertisement Propaganda and also they can plan their costs and by assuring these factors they can make the movie more profitable.


Author(s):  
Yanxiang Yu ◽  
◽  
Chicheng Xu ◽  
Siddharth Misra ◽  
Weichang Li ◽  
...  

Compressional and shear sonic traveltime logs (DTC and DTS, respectively) are crucial for subsurface characterization and seismic-well tie. However, these two logs are often missing or incomplete in many oil and gas wells. Therefore, many petrophysical and geophysical workflows include sonic log synthetization or pseudo-log generation based on multivariate regression or rock physics relations. Started on March 1, 2020, and concluded on May 7, 2020, the SPWLA PDDA SIG hosted a contest aiming to predict the DTC and DTS logs from seven “easy-to-acquire” conventional logs using machine-learning methods (GitHub, 2020). In the contest, a total number of 20,525 data points with half-foot resolution from three wells was collected to train regression models using machine-learning techniques. Each data point had seven features, consisting of the conventional “easy-to-acquire” logs: caliper, neutron porosity, gamma ray (GR), deep resistivity, medium resistivity, photoelectric factor, and bulk density, respectively, as well as two sonic logs (DTC and DTS) as the target. The separate data set of 11,089 samples from a fourth well was then used as the blind test data set. The prediction performance of the model was evaluated using root mean square error (RMSE) as the metric, shown in the equation below: RMSE=sqrt(1/2*1/m* [∑_(i=1)^m▒〖(〖DTC〗_pred^i-〖DTC〗_true^i)〗^2 + 〖(〖DTS〗_pred^i-〖DTS〗_true^i)〗^2 ] In the benchmark model, (Yu et al., 2020), we used a Random Forest regressor and conducted minimal preprocessing to the training data set; an RMSE score of 17.93 was achieved on the test data set. The top five models from the contest, on average, beat the performance of our benchmark model by 27% in the RMSE score. In the paper, we will review these five solutions, including preprocess techniques and different machine-learning models, including neural network, long short-term memory (LSTM), and ensemble trees. We found that data cleaning and clustering were critical for improving the performance in all models.


2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

Abstract In this article, an innovative approach to perform the sentiment analysis (SA) has been presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and it has been observed that SVM has performed best in comparison to other classifiers, and it has an accuracy of 68% for movie reviews and 82% in case of tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used in other cultural/social benefits like predicting/fighting human riots.


2021 ◽  
Author(s):  
Sophie Goliber ◽  
Taryn Black ◽  
Ginny Catania ◽  
James M. Lea ◽  
Helene Olsen ◽  
...  

Abstract. Marine-terminating outlet glacier terminus traces, mapped from satellite and aerial imagery, have been used extensively in understanding how outlet glaciers adjust to climate change variability over a range of time scales. Numerous studies have digitized termini manually, but this process is labor-intensive, and no consistent approach exists. A lack of coordination leads to duplication of efforts, particularly for Greenland, which is a major scientific research focus. At the same time, machine learning techniques are rapidly making progress in their ability to automate accurate extraction of glacier termini, with promising developments across a number of optical and SAR satellite sensors. These techniques rely on high quality, manually digitized terminus traces to be used as training data for robust automatic traces. Here we present a database of manually digitized terminus traces for machine learning and scientific applications. These data have been collected, cleaned, assigned with appropriate metadata including image scenes, and compiled so they can be easily accessed by scientists. The TermPicks data set includes 39,060 individual terminus traces for 278 glaciers with a mean and median number of traces per glacier of 136 ± 190 and 93, respectively. Across all glaciers, 32,567 dates have been picked, of which 4,467 have traces from more than one author (duplication of 14 %). We find a median error of ∼100 m among manually-traced termini. Most traces are obtained after 1999, when Landsat 7 was launched. We also provide an overview of an updated version of The Google Earth Engine Digitization Tool (GEEDiT), which has been developed specifically for future manual picking of the Greenland Ice Sheet.


Author(s):  
Ahmet ÇELİK

People learn by examining, observing and researching their environment. They actually gains experience from what they have learned. By using the experience they have gained, they can adapt to the new situation they encounter and make decisions. People always make decisions by comparing their previous knowledge while describing objects and classifying them. Similarities and differences to previously learned objects are very effective in decision making. It has been shown in the studies that the experiential learning method can also be used on machines. Intelligent machines and devices that use machine learning methods in their structure are widely used in many areas. Machine learning can be performed using different algorithms. These algorithms use the attributes of the objects in the data set when making decisions. Similarities and differences in the attributes of objects are obtained by comparing them with previous experiences. As a result of the comparison, a decision is made and predictions are made about the classes of the objects. In this study, kNN machine learning algorithm, which is a supervised learning method, was used on the Zoo dataset. In this data set, there are attributes of common living things. By using these attributes, the classes of living things in the data set are determined. The “k” neighbor value and weight parameter selected in the kNN algorithm affect the learning success. In this study, the effect of two parameters used in the kNN algorithm on learning success is shown. According to the results obtained, the "k=1" neighbor value and the "Distance Weight" parameter were selected and the highest success result was obtained.


2021 ◽  
Vol 8 (1) ◽  
pp. 28
Author(s):  
S. L. Ávila ◽  
H. M. Schaberle ◽  
S. Youssef ◽  
F. S. Pacheco ◽  
C. A. Penz

The health of a rotating electric machine can be evaluated by monitoring electrical and mechanical parameters. As more information is available, it easier can become the diagnosis of the machine operational condition. We built a laboratory test bench to study rotor unbalance issues according to ISO standards. Using the electric stator current harmonic analysis, this paper presents a comparison study among Support-Vector Machines, Decision Tree classifies, and One-vs-One strategy to identify rotor unbalance kind and severity problem – a nonlinear multiclass task. Moreover, we propose a methodology to update the classifier for dealing better with changes produced by environmental variations and natural machinery usage. The adaptative update means to update the training data set with an amount of recent data, saving the entire original historical data. It is relevant for engineering maintenance. Our results show that the current signature analysis is appropriate to identify the type and severity of the rotor unbalance problem. Moreover, we show that machine learning techniques can be effective for an industrial application.


Author(s):  
Jonathan M. Gumley ◽  
Hayden Marcollo ◽  
Stuart Wales ◽  
Andrew E. Potts ◽  
Christopher J. Carra

Abstract There is growing importance in the offshore floating production sector to develop reliable and robust means of continuously monitoring the integrity of mooring systems for FPSOs and FPUs, particularly in light of the upcoming introduction of API-RP-2MIM. Here, the limitations of the current range of monitoring techniques are discussed, including well established technologies such as load cells, sonar, or visual inspection, within the context of the growing mainstream acceptance of data science and machine learning. Due to the large fleet of floating production platforms currently in service, there is a need for a readily deployable solution that can be retrofitted to existing platforms to passively monitor the performance of floating assets on their moorings, for which machine learning based systems have particular advantages. An earlier investigation conducted in 2016 on a shallow water, single point moored FPSO employed host facility data from in-service field measurements before and after a single mooring line failure event. This paper presents how the same machine learning techniques were applied to a deep water, semi taut, spread moored system where there was no host facility data available, therefore requiring a calibrated hydrodynamic numerical model to be used as the basis for the training data set. The machine learning techniques applied to both real and synthetically generated data were successful in replicating the response of the original system, even with the latter subjected to different variations of artificial noise. Furthermore, utilizing a probability-based approach, it was demonstrated that replicating the response of the underlying system was a powerful technique for predicting changes in the mooring system.


Data Science in healthcare is a innovative and capable for industry implementing the data science applications. Data analytics is recent science in to discover the medical data set to explore and discover the disease. It’s a beginning attempt to identify the disease with the help of large amount of medical dataset. Using this data science methodology, it makes the user to find their disease without the help of health care centres. Healthcare and data science are often linked through finances as the industry attempts to reduce its expenses with the help of large amounts of data. Data science and medicine are rapidly developing, and it is important that they advance together. Health care information is very effective in the society. In a human life day to day heart disease had increased. Based on the heart disease to monitor different factors in human body to analyse and prevent the heart disease. To classify the factors using the machine learning algorithms and to predict the disease is major part. Major part of involves machine level based supervised learning algorithm such as SVM, Naviebayes, Decision Trees and Random forest.


2020 ◽  
Vol 8 (6) ◽  
pp. 4684-4688

Per the statistics received from BBC, data varies for every earthquake occurred till date. Approximately, up to thousands are dead, about 50,000 are injured, around 1-3 Million are dislocated, while a significant amount go missing and homeless. Almost 100% structural damage is experienced. It also affects the economic loss, varying from 10 to 16 million dollars. A magnitude corresponding to 5 and above is classified as deadliest. The most life-threatening earthquake occurred till date took place in Indonesia where about 3 million were dead, 1-2 million were injured and the structural damage accounted to 100%. Hence, the consequences of earthquake are devastating and are not limited to loss and damage of living as well as nonliving, but it also causes significant amount of change-from surrounding and lifestyle to economic. Every such parameter desiderates into forecasting earthquake. A couple of minutes’ notice and individuals can act to shield themselves from damage and demise; can decrease harm and monetary misfortunes, and property, characteristic assets can be secured. In current scenario, an accurate forecaster is designed and developed, a system that will forecast the catastrophe. It focuses on detecting early signs of earthquake by using machine learning algorithms. System is entitled to basic steps of developing learning systems along with life cycle of data science. Data-sets for Indian sub-continental along with rest of the World are collected from government sources. Pre-processing of data is followed by construction of stacking model that combines Random Forest and Support Vector Machine Algorithms. Algorithms develop this mathematical model reliant on “training data-set”. Model looks for pattern that leads to catastrophe and adapt to it in its building, so as to settle on choices and forecasts without being expressly customized to play out the task. After forecast, we broadcast the message to government officials and across various platforms. The focus of information to obtain is keenly represented by the 3 factors – Time, Locality and Magnitude.


Sign in / Sign up

Export Citation Format

Share Document