Harnessing Machine Learning Techniques for Mapping Aquaculture Waterbodies in Bangladesh

2021 ◽  
Vol 13 (23) ◽  
pp. 4890
Author(s):  
Hannah Ferriby ◽  
Amir Pouyan Nejadhashemi ◽  
Juan Sebastian Hernandez-Suarez ◽  
Nathan Moore ◽  
Josué Kpodo ◽  
...  

Aquaculture in Bangladesh has grown dramatically, in an unplanned manner, over the past few decades, becoming a major contributor to the rural economy in many parts of the country. National systems for the collection of statistics have been unable to keep pace with these rapid changes, and more accurate, up-to-date information is needed to inform policymakers. Using Sentinel-2 top-of-atmosphere reflectance data within Google Earth Engine, we proposed six strategies for improving fishpond detection, as existing techniques have proven unreliable: (1) identifying the best time period for image collection, (2) testing buffer sizes for threshold optimization, (3) determining the best combination of image reducer and water-identifying indices, (4) introducing a convolution filter to enhance edge detection, (5) evaluating the impact of ground-truth data on machine learning algorithm training, and (6) identifying the best machine learning classifier. Each enhancement builds on the previous one to form a comprehensive improvement strategy, the enhanced method for fishpond detection. We compared the results of each improvement strategy to known ground-truth fishponds as the metric of success. For the machine learning classifiers, we compared precision, recall, and F1 score to determine the quality of the results. Among the four machine learning methods studied here, classification and regression trees (CART) performed best, with a precision of 0.738, a recall of 0.827, and an F1 score of 0.780. Overall, the proposed strategies enhanced fishpond area detection in all districts within the study area.
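For readers unfamiliar with this evaluation step, the following is a minimal scikit-learn sketch, not the authors' Google Earth Engine pipeline: a CART classifier is trained on synthetic stand-ins for water-index features (e.g. NDWI, MNDWI) and scored with precision, recall, and F1.

```python
# A minimal sketch under stated assumptions: the features and labels below
# are synthetic stand-ins, not the study's Sentinel-2 data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # scikit-learn's CART
from sklearn.metrics import precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))            # columns: stand-ins for NDWI, MNDWI
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 1 = fishpond pixel, 0 = other

clf = DecisionTreeClassifier(max_depth=5).fit(X[:150], y[:150])
pred = clf.predict(X[150:])

print("precision:", round(precision_score(y[150:], pred), 3))  # TP / (TP + FP)
print("recall:   ", round(recall_score(y[150:], pred), 3))     # TP / (TP + FN)
print("F1:       ", round(f1_score(y[150:], pred), 3))         # 2PR / (P + R)
```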

2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

The information technology (IT) industry in India has faced a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to companies. The aim of this research is to develop a model to predict employee attrition and give organizations the opportunity to address issues and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including each employee's employment status (the response variable) at the time of collection. Accuracy results from the confusion matrix showed that the SVM model has an accuracy of 85 per cent. The results also show that the model predicts who will leave the firm better than it predicts who will stay.
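A hedged sketch of this kind of setup follows: an SVM trained on tabular HR features and evaluated via a confusion matrix. The 22 real input features are not listed in the abstract, so synthetic data stands in.

```python
# A minimal sketch, assuming 22 numeric employee features; all data are
# synthetic placeholders, not the archival HR data used in the study.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 22))  # 22 employee features (synthetic)
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)  # 1 = left

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)
model = SVC(kernel="rbf").fit(scaler.transform(X_tr), y_tr)

pred = model.predict(scaler.transform(X_te))
print(confusion_matrix(y_te, pred))  # rows: true stayed/left; cols: predicted
print("accuracy:", accuracy_score(y_te, pred))
```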


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers' tweets using machine-learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis and are thus limited in their ability to capture meaning hidden between the lines of tweets. The main aim of this research paper is to enhance the efficiency of hacker detection on the Twitter platform by combining complex-network techniques with adapted machine learning algorithms. This work presents a methodology that collects, from a hackers' community on Twitter, a list of users and their followers who share posts reflecting similar interests. The list is built from a set of suggested keywords, terms commonly used by hackers in their tweets. A complex network is then generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is fed into a machine learning stage in which different algorithms are applied. This research builds and investigates an accurate dataset containing real users who belong to a hackers' community. Correctly classified instances were measured for accuracy using the average values of the k-nearest neighbor, naive Bayes, random tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a risk to institutions and individuals, providing early warning of possible attacks.
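The complex-network step can be sketched with networkx: build a follower graph and extract degree, closeness, and betweenness centrality to rank influential users. The edge list below is hypothetical, not the collected Twitter data.

```python
# A minimal sketch of the centrality-extraction step, assuming a toy
# follower graph; user IDs and edges are invented placeholders.
import networkx as nx

edges = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u3", "u4"), ("u4", "u5")]
G = nx.Graph(edges)  # users as nodes, follower links as edges

deg = nx.degree_centrality(G)
clo = nx.closeness_centrality(G)
bet = nx.betweenness_centrality(G)

# Rank users by combined centrality as a rough proxy for influence
for u in sorted(G, key=lambda n: -(deg[n] + clo[n] + bet[n])):
    print(u, round(deg[u], 3), round(clo[u], 3), round(bet[u], 3))
```

Tweets from the top-ranked users would then feed the downstream classification stage.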


Author(s):  
K Sooknunan ◽  
M Lochner ◽  
Bruce A Bassett ◽  
H V Peiris ◽  
R Fender ◽  
...  

Abstract With the advent of powerful telescopes such as the Square Kilometre Array and the Vera C. Rubin Observatory, we are entering an era of multiwavelength transient astronomy that will lead to a dramatic increase in data volume. Machine learning techniques are well suited to address this data challenge and rapidly classify newly detected transients. We present a multiwavelength classification algorithm consisting of three steps: (1) interpolation and augmentation of the data using Gaussian processes; (2) feature extraction using wavelets; and (3) classification with random forests. Augmentation improves performance at test time by balancing the classes and adding diversity to the training set. In the first application of machine learning to the classification of real radio transient data, we apply our technique to the Green Bank Interferometer and other radio light curves. We find we are able to accurately classify most of the eleven classes of radio variables and transients after just eight hours of observations, achieving an overall test accuracy of 78%. We fully investigate the impact of the small sample size of 82 publicly available light curves and use data augmentation techniques to mitigate its effect. We also show that, on a significantly larger simulated representative training set, the algorithm achieves an overall accuracy of 97%, illustrating that the method is likely to provide excellent performance on future surveys. Finally, we demonstrate the effectiveness of simultaneous multiwavelength observations by showing how incorporating just one optical data point into the analysis improves the accuracy of the worst-performing class by 19%.
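A condensed sketch of the three-step pipeline, not the authors' code: (1) Gaussian-process interpolation of an irregular light curve, (2) wavelet feature extraction, (3) random forest classification. Toy flare-like and sinusoidal curves replace real radio data.

```python
# A minimal sketch under stated assumptions: two invented light-curve classes
# stand in for the eleven real classes of radio variables and transients.
import numpy as np
import pywt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def featurize(times, fluxes):
    # Step 1: interpolate the irregular light curve onto a regular grid
    gp = GaussianProcessRegressor().fit(times.reshape(-1, 1), fluxes)
    grid = np.linspace(times.min(), times.max(), 64).reshape(-1, 1)
    interp = gp.predict(grid)
    # Step 2: wavelet decomposition as the feature vector
    return np.concatenate(pywt.wavedec(interp, "db2", level=2))

X, y = [], []
for _ in range(30):
    t = np.sort(rng.uniform(0, 8, 20))  # irregular observation times (hours)
    X.append(featurize(t, np.exp(-t) + rng.normal(0, 0.05, 20))); y.append(0)  # flare-like
    X.append(featurize(t, np.sin(t) + rng.normal(0, 0.05, 20))); y.append(1)  # periodic

# Step 3: classify the wavelet features with a random forest
clf = RandomForestClassifier(n_estimators=100).fit(X[:40], y[:40])
print("test accuracy:", clf.score(X[40:], y[40:]))
```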


2021 ◽  
Vol 10 (7) ◽  
pp. 436
Author(s):  
Amerah Alghanim ◽  
Musfira Jilani ◽  
Michela Bertolotto ◽  
Gavin McArdle

Volunteered Geographic Information (VGI) is often collected by non-expert users, raising concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures, which compare VGI to authoritative data sources such as national mapping agencies, are common, but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures, which compare the data to heuristics or models built from the VGI data itself, are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality, as they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Using our proposed approach, we obtained an average classification accuracy of 84.12%, outperforming existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is also important. To address this issue, we have developed a new trustworthiness measure using direct and indirect characteristics of OSM data, such as its edit history, along with an assessment of the users who contributed the data. An evaluation of the impact of trustworthy data within the machine learning model shows that trusted data collected with the new approach improves the prediction accuracy of our technique: the classification accuracy of our model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. Such results can be used to assess the quality of OSM and suggest improvements to the data set.
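An illustrative sketch of intrinsic semantic inference on OSM-style data follows; it is not the authors' model, and the intrinsic features shown (way length, node count, connections to major roads) are hypothetical stand-ins derived only from the VGI data itself, with road type as the label.

```python
# A minimal sketch, assuming three hypothetical intrinsic features and a
# synthetic road-class label; real OSM features would replace these.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 600
X = np.column_stack([
    rng.gamma(2.0, 150.0, n),  # way length in metres (synthetic)
    rng.poisson(12, n),        # number of nodes in the way
    rng.integers(0, 5, n),     # count of connected major roads
])
y = (X[:, 2] + (X[:, 0] > 300)).clip(0, 3).astype(int)  # synthetic road class

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```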


Materials ◽  
2021 ◽  
Vol 14 (21) ◽  
pp. 6713
Author(s):  
Omid Khalaj ◽  
Moslem Ghobadi ◽  
Ehsan Saebnoori ◽  
Alireza Zarezadeh ◽  
Mohammadreza Shishesaz ◽  
...  

Oxide Precipitation-Hardened (OPH) alloys are a new generation of Oxide Dispersion-Strengthened (ODS) alloys recently developed by the authors. The mechanical properties of this group of alloys are significantly influenced by their chemical composition and appropriate heat treatment (HT). The main steps in producing OPH alloys are mechanical alloying (MA) and consolidation, followed by hot rolling. Toughness was obtained from standard tensile test results for different variants of OPH alloy to understand their mechanical properties. Three machine learning techniques were developed using experimental data to simulate different outcomes, and the effect of each parameter on the toughness of OPH alloys is discussed. Using the authors' experimental results, the composition of the OPH alloys (Al, Mo, Fe, Cr, Ta, Y, and O), the HT conditions, and the mechanical alloying (MA) parameters were used as inputs to train the models, with toughness as the output. The results demonstrated that all three models are suitable for predicting the toughness of OPH alloys and fulfilled all the desired requirements. However, several criteria showed that the adaptive neuro-fuzzy inference system (ANFIS) model performs better and has a stronger ability to simulate the data. The mean square error (MSE) for the artificial neural network (ANN), ANFIS, and support vector regression (SVR) models was 459.22, 0.0418, and 651.68, respectively. After performing a sensitivity analysis (SA), an optimized ANFIS model was achieved with an MSE of 0.003; the analysis demonstrated that HT temperature is the most significant of these parameters and plays a critical role in training the models.
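A hedged sketch of this kind of model comparison follows. ANFIS has no standard scikit-learn implementation, so only ANN (MLPRegressor) and SVR baselines are shown; the inputs mimic composition and heat-treatment features with synthetic values, and toughness is the regression target.

```python
# A minimal sketch, assuming nine synthetic inputs (composition, HT, MA)
# and a synthetic toughness target; not the study's experimental data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.uniform(size=(300, 9))  # e.g. Al, Mo, Fe, Cr, Ta, Y, O, HT temp, MA time
y = 50 * X[:, 7] + 20 * X[:, 0] + rng.normal(0, 2, 300)  # synthetic toughness

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
sc = StandardScaler().fit(X_tr)
for name, model in [
    ("ANN", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)),
    ("SVR", SVR(kernel="rbf", C=10.0)),
]:
    model.fit(sc.transform(X_tr), y_tr)
    mse = mean_squared_error(y_te, model.predict(sc.transform(X_te)))
    print(name, "MSE:", round(mse, 3))  # compare models by mean square error
```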


2021 ◽  
Author(s):  
Praveeen Anandhanathan ◽  
Priyanka Gopalan

Abstract Coronavirus disease (COVID-19) is spreading across the world. Since it first appeared in Wuhan, China, in December 2019, it has become a serious issue across the globe, and there are no accurate resources for predicting and detecting the disease. Knowledge of past patients' records could therefore guide clinicians in fighting the pandemic, and machine learning techniques can be implemented to predict health status from symptoms. Here we analyse only the symptoms that occur in every patient. Such predictions can help clinicians cure patients more easily. Techniques such as SVM (support vector machine), fuzzy k-means clustering, the decision tree algorithm, the random forest method, ANN (artificial neural network), KNN (k-nearest neighbour), naïve Bayes, and linear regression models have already been used to predict many diseases. Because this disease has not been faced before, it is unclear which technique will give the maximum accuracy, so we provide an efficient result by comparing all such algorithms in RStudio.
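The abstract compares these classifiers in RStudio; as a language-neutral illustration, the same side-by-side comparison can be sketched in Python. The symptom features and outcome labels below are synthetic placeholders.

```python
# A minimal sketch, assuming six binary symptom features and a synthetic
# outcome label; not the patient records used in the study.
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.integers(0, 2, size=(400, 6)).astype(float)          # 6 binary symptoms
y = (X.sum(axis=1) + rng.normal(0, 0.8, 400) > 3).astype(int)  # outcome label

models = {
    "SVM": SVC(),
    "DecisionTree": DecisionTreeClassifier(),
    "RandomForest": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "NaiveBayes": GaussianNB(),
}
for name, model in models.items():
    # 5-fold cross-validated accuracy for each candidate algorithm
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```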


2016 ◽  
Author(s):  
Bethany Signal ◽  
Brian S Gloss ◽  
Marcel E Dinger ◽  
Timothy R Mercer

Abstract
Background: The branchpoint element is required for the first lariat-forming reaction in splicing. However, because it is difficult to map experimentally at a genome-wide scale, current catalogues are incomplete.
Results: We have developed a machine-learning algorithm, trained with empirical human branchpoint annotations, to identify branchpoint elements from primary genome sequence alone. Using this approach, we can accurately locate branchpoint elements in 85% of introns in current gene annotations. Consistent with branchpoints being basal genetic elements, we find our annotation is unbiased with respect to gene type and expression level. A major fraction of introns was found to encode multiple branchpoints, raising the prospect that mutational redundancy is encoded in key genes. We also confirmed all deleterious branchpoint mutations annotated in clinical variant databases and further identified thousands of clinical and common genetic variants with similar predicted effects.
Conclusions: We propose that this broad annotation of branchpoints constitutes a valuable resource for further investigations into the genetic encoding of splicing patterns and for interpreting the impact of common and disease-causing human genetic variation on gene splicing.
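A toy sketch in the spirit of this method, not the authors' trained model: sequence windows around candidate branchpoint adenines are one-hot encoded and fed to a classifier. The training sequences and the planted motif are random placeholders, not empirical branchpoint annotations.

```python
# A minimal sketch, assuming an invented 11-nt window and a toy motif;
# real training data would be empirical human branchpoint annotations.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

BASES = "ACGT"
rng = np.random.default_rng(11)

def one_hot(seq):
    # 11-nt window -> flat 44-dimensional one-hot feature vector
    return np.eye(4)[[BASES.index(b) for b in seq]].ravel()

def random_window(positive):
    seq = list(rng.choice(list(BASES), size=11))
    if positive:
        seq[7] = "A"  # branchpoint adenine at a fixed offset (toy motif)
        seq[5] = "T"
    return "".join(seq)

X = np.array([one_hot(random_window(p)) for p in [True] * 200 + [False] * 200])
y = np.array([1] * 200 + [0] * 200)

idx = rng.permutation(400)  # shuffle before the train/test split
X, y = X[idx], y[idx]
clf = GradientBoostingClassifier().fit(X[:300], y[:300])
print("held-out accuracy:", clf.score(X[300:], y[300:]))
```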


2021 ◽  
Vol 13 (19) ◽  
pp. 11067
Author(s):  
Kaige Lei ◽  
Yifan Wu ◽  
Feng Li ◽  
Jiayu Yang ◽  
Mingtao Xiang ◽  
...  

Understanding the relationship between land use/cover pattern and water quality can provide guidelines for controlling non-point source pollution and facilitate sustainable development. Previous studies mainly relate the land use/cover of an entire region to water quality at monitoring sites, but water quality at monitoring sites does not fully reflect the water environment of the whole basin. In this study, land use/cover was monitored on Google Earth Engine in the Tang-Pu Reservoir basin, China. To reflect the water quality of the whole study area, the spatial distributions of the determinants of water quality there, total nitrogen and total phosphorus (TN&TP), were simulated by the Soil and Water Assessment Tool (SWAT). A redundancy analysis explored the correlations between land use/cover pattern and simulated TN&TP. The results showed that: (1) From 2009 to 2019, forest was the dominant land cover and there was little land use/cover change; landscape fragmentation increased and connectivity decreased. (2) About 25% of TP concentrations and nearly all TN concentrations at the monitoring points did not meet the drinking water standard, indicating that nitrogen and phosphorus pollution were the most serious problems. The highest TN and TP outputs per unit area simulated by SWAT were 44.50 kg/hm² and 9.51 kg/hm², respectively, and occurred in areas with highly fragmented landscape patterns. (3) TN&TP correlated positively with cultivated and construction land but negatively with forest. The correlation between forest and TN&TP peaked at the 500–700 m buffer, and that of construction land at the 100 m buffer. As buffer size increased, the correlation between cultivated land and TN weakened, while its correlation with TP increased. TN&TP correlated positively with Shannon's Diversity Index and negatively with the Contagion Index. This study provides a new perspective for exploring the impact of land use/cover pattern on water quality.
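A small pandas sketch of the correlation step described above: relating land-use shares around monitoring points to simulated TN export. All numbers are hypothetical placeholders for land-cover and SWAT outputs, and the study itself uses redundancy analysis rather than plain pairwise correlation.

```python
# A minimal sketch, assuming five hypothetical subbasins; the shares and
# TN exports are invented, not the Tang-Pu Reservoir basin results.
import pandas as pd

subbasins = pd.DataFrame({
    "forest_share":     [0.70, 0.55, 0.40, 0.62, 0.30],
    "cultivated_share": [0.15, 0.25, 0.40, 0.20, 0.50],
    "TN_kg_per_hm2":    [12.0, 21.5, 38.2, 18.4, 44.5],
})

# Expect a positive correlation with cultivated land and a negative one
# with forest, mirroring the sign pattern reported in the text.
print(subbasins.corr()["TN_kg_per_hm2"])
```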


2020 ◽  
Vol 7 (10) ◽  
pp. 380-389
Author(s):  
Asogwa D.C ◽  
Anigbogu S.O ◽  
Anigbogu G.N ◽  
Efozia F.N

Author age prediction is the task of determining an author's age by studying the texts they have written. Predicting an author's age can shed light on the trends, opinions, and social and political views of an age group, and marketers often use such predictions to promote a product or service to an age group according to its conveyed interests and opinions. Methodologies in natural language processing have made it possible to predict an author's age from text by examining the variation of linguistic characteristics, and many machine learning algorithms have been applied to the task. However, in social networks, computational linguists face numerous challenges, just as machine learning techniques face their own performance challenges in realistic scenarios. This work developed a model that predicts an author's age from text with a machine learning algorithm (naïve Bayes) using three types of features: content-based, style-based, and topic-based. The trained model gave a prediction accuracy of 80%.
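A minimal sketch of text-based age-group prediction with naïve Bayes, in the spirit of the model above: the tiny corpus and age labels are invented, and the study itself combines content-, style-, and topic-based features rather than raw word counts alone.

```python
# A minimal sketch, assuming an invented four-document corpus with
# hypothetical age-group labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "omg that movie was so cool lol",
    "the quarterly report shows steady growth",
    "cant wait for the weekend tbh",
    "our committee reviewed the policy draft",
]
ages = ["young", "adult", "young", "adult"]  # hypothetical age-group labels

# Bag-of-words counts feed a multinomial naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, ages)
print(model.predict(["lol this is so cool"]))  # -> ['young'] on this toy data
```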


2021 ◽  
Author(s):  
Sophie Goliber ◽  
Taryn Black ◽  
Ginny Catania ◽  
James M. Lea ◽  
Helene Olsen ◽  
...  

Abstract. Marine-terminating outlet glacier terminus traces, mapped from satellite and aerial imagery, have been used extensively to understand how outlet glaciers adjust to climate variability over a range of time scales. Numerous studies have digitized termini manually, but this process is labor-intensive and no consistent approach exists. The lack of coordination leads to duplication of effort, particularly for Greenland, which is a major focus of scientific research. At the same time, machine learning techniques are rapidly improving in their ability to automate accurate extraction of glacier termini, with promising developments across a number of optical and SAR satellite sensors. These techniques rely on high-quality, manually digitized terminus traces as training data for robust automatic tracing. Here we present a database of manually digitized terminus traces for machine learning and scientific applications. These data have been collected, cleaned, assigned appropriate metadata (including image scenes), and compiled so they can be easily accessed by scientists. The TermPicks data set includes 39,060 individual terminus traces for 278 glaciers, with a mean and median number of traces per glacier of 136 ± 190 and 93, respectively. Across all glaciers, 32,567 dates have been picked, of which 4,467 have traces from more than one author (a duplication rate of 14%). We find a median error of ∼100 m among manually traced termini. Most traces were obtained after 1999, when Landsat 7 was launched. We also provide an overview of an updated version of the Google Earth Engine Digitization Tool (GEEDiT), which has been developed specifically for future manual picking of glacier termini across the Greenland Ice Sheet.
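As an illustration of the kind of summary statistics quoted above, a small pandas sketch follows. The rows and column names are assumptions in the spirit of a trace table, not the published TermPicks schema.

```python
# A minimal sketch, assuming a hypothetical per-trace table; the rows,
# column names, and values are invented, not the TermPicks data set.
import pandas as pd

traces = pd.DataFrame({
    "glacier_id": [1, 1, 1, 2, 2, 3],
    "date": ["2000-06-01", "2000-06-01", "2001-07-15",
             "1999-05-20", "2005-09-11", "2010-07-30"],
    "author": ["A", "B", "A", "A", "C", "B"],
})

per_glacier = traces.groupby("glacier_id")["date"].nunique()
print("mean dates per glacier:", per_glacier.mean())
print("median dates per glacier:", per_glacier.median())

# Dates picked by more than one author indicate duplicated effort
dup = traces.groupby(["glacier_id", "date"])["author"].nunique().gt(1).mean()
print("duplication rate:", round(dup, 2))
```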

