Prediction ability of machine learning algorithms in Himalaya region of Pakistan for landslide susceptibility mapping

Author(s):  
Naeem Shahzad ◽  
Xiaoli Ding ◽  
Sawaid Abbas

Machine learning has proven most effective in mapping landslide susceptibility. We carry out experiments with two machine learning algorithms, SVM and MaxENT, to study their effectiveness for some mountainous areas in Pakistan. A data set of 112 historic landslides is used in the study, with 70% of the landslides used for training and the rest for validation. Fifteen landslide causative factors are used initially, and ineffective ones are eliminated based on Information Gain Ratio and multicollinearity tests. The performances of the landslide susceptibility maps generated are assessed using receiver operating characteristic (ROC) curves, confusion matrix (CM) measures (Kappa, root mean square error, mean absolute error and balanced accuracy), landslide density (LD), R-index and Pearson’s Chi-squared tests. The results show that both models work well in this area. However, the low p-value (<0.05) in the Chi-squared test shows that the two landslide models differ with statistical significance.
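Two of the confusion matrix measures named above, Kappa and balanced accuracy, can be illustrated with a minimal sketch. It assumes a binary landslide / non-landslide labelling (1 = landslide), and the labels in the example are invented for illustration; they are not the study's data, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = landslide (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def balanced_accuracy(tp, fp, fn, tn):
    """Mean of sensitivity (recall on landslides) and specificity (recall on non-landslides)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented validation labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, balanced_accuracy(*counts), cohen_kappa(*counts))
```

Kappa is useful alongside accuracy because it discounts the agreement a model would reach by chance alone.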

Author(s):  
Yuzuo Zhang ◽  
Yuanhao Li ◽  
Xinyan Zhang ◽  
Shijue Zheng

In coal-fired power generation systems, the NOx emissions of power station boilers must be predicted at the ammonia-spraying step to ensure that NOx emissions do not exceed national standards. Using traditional machine learning algorithms to model power station boilers requires feature selection and steady-state extraction, which is not suitable for practical applications. To reduce the NOx prediction error rate under variable operating conditions, a multi-model fusion algorithm, S3LX, which combines linear regression, XGBoost and a long short-term memory recurrent neural network, is proposed to model the NOx emission prediction of power station boilers. A data preprocessing scheme suitable for power station boiler data sets is proposed and implemented in this paper; it performs numerical processing, data cleaning and data standardization on the boiler’s data and features. A 7-day historical operating data set of a unit in Guangzhou Shajiao C Power Plant provided the training set and test set and, after data preprocessing, was used to build the NOx emission prediction model. Results show that, compared with traditional machine learning algorithms, S3LX has good prediction ability under varying conditions, with an average error of 4.28%. Compared with the average prediction errors of the multi-layer perceptron (9.16%) and SVM (7.37%), S3LX significantly reduces the error and satisfies actual engineering demands.


Author(s):  
Aditya Parameswaran ◽  
Dibyendu Mishra ◽  
Sanchit Bansal ◽  
Vinayak Agarwal ◽  
Anjali Goyal ◽  
...  

Background. The Office of Academic Affairs (OAA), Office of Student Life (OSL) and Information Technology Helpdesk (ITD) are support functions within a university which receive hundreds of email messages on a daily basis. A large percentage of the emails received by these departments are frequent, commonly asked queries or requests for information. Responding to every query by typing manually is a tedious and time-consuming task, and an automated approach for email response suggestion can save a lot of time. Methods. We propose an application and solution approach for automatically generating and suggesting short email responses to support queries in a university environment. Our proposed solution can be used as a one-tap or one-click solution for responding to various types of queries raised by faculty members and students in a university. We create a dataset for the application domain and make it publicly available. We apply a machine learning framework for classifying emails into categories such as office of academic affairs or information technology department. We also apply a machine learning based classification approach for sub-category level classification. We apply text pre-processing techniques, feature selection, and support vector machine and naïve Bayes classifiers. We present an approach to overcome various natural language processing based challenges in the text. Results. We conduct a series of experiments and evaluate the approach using confusion matrix and accuracy based metrics. We study the discriminatory power of features and compare their relevance for the classification task. Our experimental results reveal that the proposed approach is effective. We conclude from our experiments that discriminatory features can be extracted from the text within our specific domain and that automatic email response suggestions can be accurately created using machine learning algorithms and frameworks.
We experiment with two different learning algorithms and observe that SVM outperforms Naïve Bayes. We achieve a classification accuracy above 85% for all the classes and sub-classes. Discussion. Our experiments on email response suggestion are conducted on a corpus consisting of short and frequent emails received by a university function, but the proposed approach and techniques can be generalized to other domains. We observe that different classifiers give different results and that there is a significant difference in the predictive power of features.
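As a rough illustration of the category-level classification step, here is a minimal multinomial Naïve Bayes sketch with Laplace smoothing. It is not the authors' pipeline: it omits their pre-processing and feature selection, it covers only the weaker of their two classifiers (chosen because it fits in a few lines of standard-library Python), and the four training emails are invented. Only the category labels OAA and ITD come from the abstract.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real text pre-processing."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior under Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented example emails; not taken from the authors' dataset.
training = [
    ("how do I reset my wifi password", "ITD"),
    ("my laptop cannot connect to the network", "ITD"),
    ("when is the course registration deadline", "OAA"),
    ("how do I drop a course this semester", "OAA"),
]
model = train_naive_bayes(training)
print(predict(model, "wifi network is down"))
print(predict(model, "course registration deadline"))
```

Once an email is assigned to a category, a canned response for that category could be suggested; that mapping is outside the scope of this sketch.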


Skin disease recognition and monitoring is a major challenge faced by the medical industry. Because of increasing pollution and poor diet, the number of patients with skin-related problems is growing rapidly. Health is not the only concern; unhealthy skin also hurts our confidence. Regular and proper skin monitoring is an important step towards early detection of any harmful or incipient changes in the skin that may lead to skin disease. Machine learning methods can contribute to the development of capable systems that classify various types of skin disease. To identify skin diseases, it is first necessary to separate skin from non-skin. In this paper, five different machine learning algorithms have been selected and applied to a skin disease data set to predict the correct class of skin disease: random forest, naive Bayes, logistic regression, kernel SVM and CNN. A comparative analysis based on confusion matrix parameters and training accuracy has been performed and illustrated using graphs. CNN is found to give the best training accuracy for the correct prediction of skin diseases among all the selected algorithms.


2020 ◽  
Vol 38 (1) ◽  
pp. 65-80 ◽  
Author(s):  
Ammara Zamir ◽  
Hikmat Ullah Khan ◽  
Tassawar Iqbal ◽  
Nazish Yousaf ◽  
Farah Aslam ◽  
...  

Purpose This paper aims to present a framework to detect phishing websites using a stacking model. Phishing is a type of fraud to access users’ credentials. The attackers access users’ personal and sensitive information for monetary purposes. Phishing affects diverse fields, such as e-commerce, online business, banking and digital marketing, and is ordinarily carried out by sending spam emails and developing identical websites resembling the original websites. As people surf the targeted website, the phishers hijack their personal information. Design/methodology/approach Features of a phishing data set are analysed using feature selection techniques including information gain, gain ratio, Relief-F and recursive feature elimination (RFE). Two features are proposed combining the strongest and weakest attributes. Principal component analysis with diverse machine learning algorithms, including random forest (RF), neural network (NN), bagging, support vector machine, Naïve Bayes and k-nearest neighbour (kNN), is applied on the proposed and remaining features. Afterwards, two stacking models, Stacking1 (RF + NN + Bagging) and Stacking2 (kNN + RF + Bagging), are applied by combining the highest scoring classifiers to improve the classification accuracy. Findings The proposed features played an important role in improving the accuracy of all the classifiers. The results show that RFE plays an important role in removing the least important feature from the data set. Furthermore, Stacking1 (RF + NN + Bagging) outperformed all other classifiers in terms of classification accuracy, detecting phishing websites with 97.4% accuracy. Originality/value This research is novel in that no previous research focusses on using feed forward NN and ensemble learners for detecting phishing websites.
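Information gain and gain ratio, two of the feature selection scores named above, can be sketched for a single categorical feature as follows. This is a generic textbook formulation, not the authors' implementation; the labels and both feature columns are invented for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on the feature's values."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Information gain normalised by the entropy of the feature itself (split information)."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature_values, labels) / split_info


# Invented toy data: one perfectly informative and one uninformative feature.
labels = ["phish", "phish", "legit", "legit"]
informative = [1, 1, 0, 0]    # hypothetical binary feature, e.g. a URL-based flag
uninformative = [1, 0, 1, 0]  # hypothetical feature unrelated to the label
print(gain_ratio(informative, labels), gain_ratio(uninformative, labels))
```

Gain ratio penalises features with many distinct values, which plain information gain tends to favour.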

