A Study of Cross-National Differences in Happiness Factors Using Machine Learning Approach

2015 ◽  
Vol 25 (09n10) ◽  
pp. 1699-1702 ◽  
Author(s):  
Theresia Ratih Dewi Saputri ◽  
Seok-Won Lee

National happiness has been actively studied in recent years. The factors that determine happiness vary with human perspective. The factors used in this work cover both the physical and the mental needs of humanity, for example, the educational factor. This work identified more than 90 features that can be used to predict a country's happiness. With so many features, it is unwise to rely on manual analysis to predict national happiness. Therefore, this work used a machine learning technique, the Support Vector Machine (SVM), to learn and predict country happiness. To improve prediction accuracy, a dimensionality reduction technique, information gain, was also used. This technique was chosen for its ability to explore the interrelationships among a set of variables. Using data on 187 countries from the UN Development Project, this work is able to identify which factors a given country needs to improve to increase the happiness of its citizens.
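
The information gain step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the features have already been discretized into categories. The feature names, the toy values, and the function names are hypothetical and are not taken from the paper, which does not describe its implementation here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(table, labels):
    """Return feature names sorted by descending information gain."""
    gains = {name: information_gain(column, labels) for name, column in table.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Toy, made-up data: four countries, two hypothetical discretized features.
labels = ["happy", "happy", "unhappy", "unhappy"]
table = {
    "education_level": ["high", "high", "low", "low"],  # separates the classes
    "noise": ["a", "b", "a", "b"],                      # carries no information
}
ranking = rank_features(table, labels)
```

In a pipeline like the one the abstract describes, only the top-ranked features would then be passed to the classifier; how many to keep is a tuning choice that this sketch does not address.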

Cancers ◽  
2019 ◽  
Vol 11 (3) ◽  
pp. 431 ◽  
Author(s):  
Oneeb Rehman ◽  
Hanqi Zhuang ◽  
Ali Muhamed Ali ◽  
Ali Ibrahim ◽  
Zhongwei Li

Certain small noncoding microRNAs (miRNAs) are differentially expressed in normal tissues and cancers, which makes them great candidates for biomarkers for cancer. Previously, a selected subset of miRNAs has been experimentally verified to be linked to breast cancer. In this paper, we validated the importance of these miRNAs using a machine learning approach on miRNA expression data. We performed feature selection, using Information Gain (IG), Chi-Squared (CHI2) and Least Absolute Shrinkage and Selection Operator (LASSO), on the set of these relevant miRNAs to rank them by importance. We then performed cancer classification using these miRNAs as features using Random Forest (RF) and Support Vector Machine (SVM) classifiers. Our results demonstrated that the miRNAs ranked higher by our analysis had higher classifier performance. Performance becomes lower as the rank of the miRNA decreases, confirming that these miRNAs had different degrees of importance as biomarkers. Furthermore, we discovered that using a minimum of three miRNAs as biomarkers for breast cancers can be as effective as using the entire set of 1800 miRNAs. This work suggests that machine learning is a useful tool for functional studies of miRNAs for cancer detection and diagnosis.


Energies ◽  
2018 ◽  
Vol 11 (9) ◽  
pp. 2328 ◽  
Author(s):  
Md Shafiullah ◽  
M. Abido ◽  
Taher Abdel-Fattah

Precise fault location information plays a vital role in expediting the restoration process after any kind of fault in power distribution grids. This paper proposed a Stockwell transform (ST) based optimized machine learning approach to locate faults and identify the faulty sections in distribution grids. The research employed the ST to extract useful features from the recorded three-phase current signals and fed them as inputs to different machine learning tools (MLT), including multilayer perceptron neural networks (MLP-NN), support vector machines (SVM), and extreme learning machines (ELM). The proposed approach employed the constriction-factor particle swarm optimization (CF-PSO) technique to optimize the parameters of the SVM and ELM for better generalization performance. It then compared the results obtained on the test datasets in terms of selected statistical performance indices, including the root mean squared error (RMSE), mean absolute percentage error (MAPE), percent bias (PBIAS), RMSE-observations to standard deviation ratio (RSR), coefficient of determination (R²), Willmott’s index of agreement (WIA), and Nash–Sutcliffe model efficiency coefficient (NSEC), to confirm the effectiveness of the developed fault location scheme. The satisfactory values of these indices indicated the superiority of the optimized machine learning tools over the non-optimized tools in locating faults. In addition, this research confirmed the efficacy of the faulty section identification scheme based on overall accuracy. Furthermore, the presented results validated the robustness of the developed approach against measurement noise and uncertainties associated with pre-fault loading condition, fault resistance, and inception angle.
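
Several of the performance indices listed in the abstract have short closed forms. The sketch below uses common textbook definitions of RMSE, MAPE, RSR, and the Nash–Sutcliffe coefficient. The paper may follow slightly different conventions, and the function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n


def _sums_of_squares(observed, predicted):
    """Residual and total sums of squares, shared by RSR and NSE."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return ss_res, ss_tot


def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations."""
    ss_res, ss_tot = _sums_of_squares(observed, predicted)
    return math.sqrt(ss_res) / math.sqrt(ss_tot)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean predictor."""
    ss_res, ss_tot = _sums_of_squares(observed, predicted)
    return 1.0 - ss_res / ss_tot


# Made-up fault distances (km): true values versus a model's estimates.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
scores = {
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "RSR": rsr(observed, predicted),
    "NSE": nse(observed, predicted),
}
```

Lower RMSE, MAPE, and RSR and an NSE closer to 1 indicate a better fit, which is the sense in which optimized and non-optimized tools can be compared.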


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0241925
Author(s):  
Gerardo A. Ceballos ◽  
Luis F. Hernandez ◽  
Daniel Paredes ◽  
Luis R. Betancourt ◽  
Midhat H. Abdulreda

The application of artificial intelligence (AI) and machine learning (ML) in biomedical research promises to unlock new information from the vast amounts of data being generated through the delivery of healthcare and the expanding high-throughput research applications. Such information can aid medical diagnoses and reveal various unique patterns of biochemical and immune features that can serve as early disease biomarkers. In this report, we demonstrate the feasibility of using an AI/ML approach in a relatively small dataset to discriminate among three categories of samples obtained from mice that either rejected or tolerated their pancreatic islet allografts following transplant in the anterior chamber of the eye, and from naïve controls. We created locked software based on a support vector machine (SVM) technique for pattern recognition in electropherograms (EPGs) generated by micellar electrokinetic chromatography and laser-induced fluorescence detection (MEKC-LIFD). Predictions were made based only on the aligned EPGs obtained in microliter-size aqueous humor samples representative of the immediate local microenvironment of the islet allografts. The analysis identified discriminative peaks in the EPGs of the three sample categories. Our classifier software was tested with targeted and untargeted peaks. Working with the patterns of untargeted peaks (i.e., based on the whole pattern of EPGs), it was able to achieve a 21 out of 22 positive classification score with a corresponding 95.45% prediction accuracy among the three sample categories, and 100% accuracy between the rejecting and tolerant recipients. These findings demonstrate the feasibility of AI/ML approaches to classify small numbers of samples and they warrant further studies to identify the analytes/biochemicals corresponding to discriminative features as potential biomarkers of islet allograft immune rejection and tolerance.


Author(s):  
Mokhtar Al-Suhaiqi ◽  
Muneer A. S. Hazaa ◽  
Mohammed Albared

Due to the rapid growth of research articles in various languages, the problem of cross-lingual plagiarism detection has received increasing interest in recent years. Cross-lingual plagiarism detection is a more challenging task than monolingual plagiarism detection. This paper addresses cross-lingual plagiarism detection (CLPD) by proposing a method that combines keyphrase extraction, monolingual detection methods, and a machine learning approach. The research methodology used in this study made it possible to design, develop, and implement an efficient Arabic-English cross-lingual plagiarism detection system. The paper empirically evaluates five monolingual plagiarism detection methods: i) N-Grams Similarity, ii) Longest Common Subsequence, iii) Dice Coefficient, iv) Fingerprint-based Jaccard Similarity, and v) Fingerprint-based Containment Similarity. In addition, three machine learning classifiers, i) naïve Bayes, ii) Support Vector Machine, and iii) linear logistic regression, are used for Arabic-English cross-language plagiarism detection. Several experiments are conducted to evaluate the performance of the keyphrase extraction methods, and several more investigate the performance of the machine learning techniques to find the best method for Arabic-English cross-language plagiarism detection. In these experiments, the best result was obtained with the SVM classifier, at 92% F-measure. In addition, all classifiers achieved their best results when most of the monolingual plagiarism detection methods were used.
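
Three of the monolingual similarity measures named in the abstract can be sketched over sets of word n-grams. This is only an illustration: it assumes both texts are already in the same language (the cross-lingual step is not shown), it uses plain n-gram sets rather than the fingerprinting the paper refers to, and all function names and sample sentences are hypothetical.

```python
def word_ngrams(text, n=2):
    """Set of word n-grams (as tuples) from a whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Twice the intersection size divided by the sum of both set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def containment(suspicious, source):
    """Fraction of the suspicious text's n-grams that also occur in the source."""
    return len(suspicious & source) / len(suspicious) if suspicious else 0.0


# Made-up sentences standing in for a suspicious passage and a source passage.
suspicious = word_ngrams("the quick brown fox")
source = word_ngrams("the quick brown dog")
features = [jaccard(suspicious, source), dice(suspicious, source),
            containment(suspicious, source)]
```

Scores like these could serve as the feature vector for a classifier such as an SVM, which is one plausible reading of how the abstract combines the similarity methods with machine learning.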


Author(s):  
Ali Al-Ramini ◽  
Mohammad A Takallou ◽  
Daniel P Piatkowski ◽  
Fadi Alsaleem

Most cities in the United States lack comprehensive or connected bicycle infrastructure; therefore, inexpensive and easy-to-implement solutions for connecting existing bicycle infrastructure are increasingly being employed. Signage is one of the promising solutions. However, the necessary data for evaluating its effect on cycling ridership is lacking. To overcome this challenge, this study tests the potential of using readily-available crowdsourced data in concert with machine-learning methods to provide insight into signage intervention effectiveness. We do this by assessing a natural experiment to identify the potential effects of adding or replacing signage within existing bicycle infrastructure in 2019 in the city of Omaha, Nebraska. Specifically, we first visually compare cycling traffic changes in 2019 to those from the previous two years (2017–2018) using data extracted from the Strava fitness app. Then, we use a new three-step machine-learning approach to quantify the impact of signage while controlling for weather, demographics, and street characteristics. The steps are as follows: Step 1 (modeling and validation) build and train a model from the available 2017 crowdsourced data (i.e., Strava, Census, and weather) that accurately predicts the cycling traffic data for any street within the study area in 2018; Step 2 (prediction) use the model from Step 1 to predict bicycle traffic in 2019 while assuming new signage was not added; Step 3 (impact evaluation) use the difference in prediction from actual traffic in 2019 as evidence of the likely impact of signage. While our work does not demonstrate causality, it does demonstrate an inexpensive method, using readily-available data, to identify changing trends in bicycling over the same time that new infrastructure investments are being added.
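
The three steps above can be sketched with a deliberately simple stand-in model. The abstract does not name the model the authors used, so the one-feature least-squares line, the single weather feature, and all numbers below are hypothetical and made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, xs):
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def estimate_signage_effect(train, validation, post):
    """Each argument is a (features, trip_counts) pair for one period."""
    # Step 1: build the model on the first year and validate it on the second.
    model = fit_line(*train)
    validation_error = mean_abs_error(validation[1], predict(model, validation[0]))
    # Step 2: predict the intervention year as if no signage had been added.
    counterfactual = predict(model, post[0])
    # Step 3: the gap between actual and predicted traffic is the likely effect.
    effect = [actual - expected for actual, expected in zip(post[1], counterfactual)]
    return validation_error, effect


# Made-up data: temperature (feature) versus trip counts on one street.
train = ([10, 20, 30], [30, 50, 70])    # stands in for 2017
validation = ([15, 25], [40, 60])       # stands in for 2018
post = ([10, 20], [35, 58])             # stands in for 2019
validation_error, effect = estimate_signage_effect(train, validation, post)
```

A small validation error is what justifies reading the Step 3 gap as a signal rather than model noise. As the abstract notes, the gap is evidence of a likely impact and does not demonstrate causality.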


2020 ◽  
Vol 10 (16) ◽  
pp. 5673 ◽  
Author(s):  
Daniela Cardone ◽  
David Perpetuini ◽  
Chiara Filippini ◽  
Edoardo Spadolini ◽  
Lorenza Mancini ◽  
...  

Traffic accidents cause a large number of injuries, sometimes fatal, every year. Among the factors affecting a driver’s performance, stress plays an important role, since it can decrease decision-making capabilities and situational awareness. From this perspective, it would be beneficial to develop a non-invasive driver stress monitoring system able to recognize the driver’s altered state. In this study, a contactless procedure for assessing drivers’ stress state by means of thermal infrared imaging was investigated. Thermal imaging was acquired during an experiment on a driving simulator, and thermal features of stress were investigated in comparison with a gold-standard metric (i.e., the stress index, SI) extracted from contact electrocardiography (ECG). A data-driven multivariate machine learning approach based on non-linear support vector regression (SVR) was employed to estimate the SI from thermal features extracted from facial regions of interest (i.e., nose tip, nostrils, glabella). The predicted SI showed a good correlation with the real SI (r = 0.61, p ≈ 0). A two-level classification of the stress state (STRESS, SI ≥ 150, versus NO STRESS, SI < 150) was then performed based on the predicted SI. The ROC analysis showed a good classification performance with an AUC of 0.80, a sensitivity of 77%, and a specificity of 78%.
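
The two-level classification described in the abstract amounts to thresholding the predicted SI at 150 and comparing the result with the label derived from the ECG-based SI. A minimal sketch follows. Only the threshold comes from the abstract; the function names and the SI values are made up, and the sketch assumes both classes occur in the data.

```python
STRESS_THRESHOLD = 150  # SI cut-off stated in the abstract


def is_stressed(stress_index):
    """STRESS if SI >= 150, NO STRESS otherwise."""
    return stress_index >= STRESS_THRESHOLD


def sensitivity_specificity(true_si, predicted_si):
    """Compare labels from the reference SI with labels from the predicted SI."""
    tp = fn = tn = fp = 0
    for reference, estimate in zip(true_si, predicted_si):
        actual, predicted = is_stressed(reference), is_stressed(estimate)
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    # Assumes at least one STRESS and one NO STRESS sample in the reference labels.
    return tp / (tp + fn), tn / (tn + fp)


# Made-up SI values: ECG-derived reference versus SVR-style estimates.
true_si = [100, 160, 200, 140]
predicted_si = [120, 155, 140, 170]
sensitivity, specificity = sensitivity_specificity(true_si, predicted_si)
```

Sweeping the threshold applied to the predicted SI, instead of fixing it at 150, would trace out an ROC curve of the kind the abstract summarizes with its AUC.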


Author(s):  
Yuta Maeda ◽  
Yoshiko Yamanaka ◽  
Takeo Ito ◽  
Shinichiro Horikawa

Summary: We propose a new algorithm, focusing on spatial amplitude patterns, to automatically detect volcano seismic events from continuous waveforms. Candidate seismic events are detected based on signal-to-noise ratios. The algorithm then utilizes supervised machine learning to classify the existing candidate events into true and false categories. The input learning data are the ratios of the number of time samples with amplitudes greater than the background noise level at 1 s intervals (large amplitude ratios) given at every station site, and a manual classification table in which ‘true’ or ‘false’ flags are assigned to candidate events. A two-step approach is implemented in our procedure. First, using the large amplitude ratios at all stations, a neural network model representing a continuous spatial distribution of large amplitude probabilities is investigated at 1 s intervals. Second, several features are extracted from these spatial distributions, and a relation between the features and classification to true and false events is learned by a support vector machine. This two-step approach is essential to account for temporal loss of data, or station installation, movement, or removal. We evaluated the algorithm using data from Mt. Ontake, Japan, during the first ten days of a dense observation trial in the summit region (November 1–10, 2017). Results showed a classification accuracy of more than 97 per cent.
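
The large amplitude ratio used as input data can be sketched for a single station as follows. The abstract does not say how the background noise level is estimated or whether absolute amplitudes are used, so both are assumptions here, and the function name and sample waveform are hypothetical.

```python
def large_amplitude_ratios(samples, sampling_rate_hz, noise_level):
    """Fraction of samples per 1 s window whose absolute amplitude exceeds noise_level.

    Assumes an integer sampling rate; a trailing partial window is dropped.
    """
    window = int(sampling_rate_hz)  # number of samples in one 1 s interval
    ratios = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        above = sum(1 for amplitude in chunk if abs(amplitude) > noise_level)
        ratios.append(above / window)
    return ratios


# Made-up waveform sampled at 4 Hz: a burst in the first second, quiet afterwards.
waveform = [0, 5, 0, -5, 0, 0, 0, 0]
ratios = large_amplitude_ratios(waveform, sampling_rate_hz=4, noise_level=1)
```

Computing one such series per station gives, for each second, the set of station values that a spatial model like the one in the first step could interpolate.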

