Determining Zygosity in Infant Twins – Revisiting the Questionnaire Approach

Abstract Accurate zygosity determination is a fundamental step in twin research. Although DNA-based testing is the gold standard for determining zygosity, collecting biological samples is not feasible in all research settings or all families. Previous work has demonstrated the feasibility of zygosity estimation based on questionnaire (physical similarity) data in older twins, but the extent to which this is also a reliable approach in infancy is less well established. Here, we report the accuracy of different questionnaire-based zygosity determination approaches (traditional and machine learning) in 5.5 month-old twins. The participant cohort comprised 284 infant twin pairs (128 dizygotic and 156 monozygotic) who participated in the Babytwins Study Sweden (BATSS). Manual scoring based on an established technique validated in older twins accurately predicted 90.49% of the zygosities with a sensitivity of 91.65% and specificity of 89.06%. The machine learning approach improved the prediction accuracy to 93.10%, with a sensitivity of 91.30% and specificity of 94.29%. Additionally, we quantified the systematic impact of zygosity misclassification on estimates of genetic and environmental influences using simulation-based sensitivity analysis on a separate data set to show the implication of our machine learning accuracy gain. In conclusion, our study demonstrates the feasibility of determining zygosity in very young infant twins using a questionnaire with four items and builds a scalable machine learning model with better metrics, thus a viable alternative to DNA tests in large-scale infant twin studies.

Download Full-text

Machine Learning Model for GSM BSC Control Plane Units

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1044.0886s19 ◽

2019 ◽

Vol 8 (6S) ◽

pp. 219-223

Keyword(s):

Machine Learning ◽

Back Propagation ◽

Back Propagation Neural Network ◽

Model Parameters ◽

Large Set ◽

Data Set ◽

Wide Acceptance ◽

Machine Learning Model ◽

Machine Learning Approach ◽

Accuracy Of Prediction

At maximum traffic intensity i.e. during the busy hour, the GSM BSC signalling units (BSU) measured CPU load will be at its peak. The BSUs CPU load is a function of the number of transceivers (TRXs) mapped to it and hence the volume of offered traffic being handled by the unit. The unit CPU load is also a function of the nature of the offered load, i.e. with the same volume of offered load, the CPU load with the nominal traffic profile would be different as compared to some other arbitrary traffic profile. To manage future traffic growth, a model to estimate the BSU unit CPU load is an essential need. In recent times, using Machine Learning (ML) to develop such a model is an approach that has gained wide acceptance. Since, the estimation of CPU load is difficult as it depends on large set of parameters, machine learning approach is more scalable. In this paper, we describe a back-propagation neural network model that was developed to estimate the BSU unit CPU load. We describe the model parameters and choices and implementation architecture, and estimate its accuracy of prediction, based on an evaluation data set. We also discuss alternative ML architectures and compare their relative prediction accuracies, to the primary ML model

Download Full-text

Predicting risk of dyslexia with an online gamified test

PLoS ONE ◽

10.1371/journal.pone.0241687 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0241687

Author(s):

Luz Rello ◽

Ricardo Baeza-Yates ◽

Abdullah Ali ◽

Jeffrey P. Bigham ◽

Miquel Serra

Keyword(s):

Machine Learning ◽

Screening Tool ◽

Failure Detection ◽

Learning Disorder ◽

Desktop Computer ◽

Data Set ◽

Specific Learning Disorder ◽

Machine Learning Model ◽

Machine Learning Approach ◽

Specific Learning

Dyslexia is a specific learning disorder related to school failure. Detection is both crucial and challenging, especially in languages with transparent orthographies, such as Spanish. To make detecting dyslexia easier, we designed an online gamified test and a predictive machine learning model. In a study with more than 3,600 participants, our model correctly detected over 80% of the participants with dyslexia. To check the robustness of the method we tested our method using a new data set with over 1,300 participants with age customized tests in a different environment -a tablet instead of a desktop computer- reaching a recall of over 78% for the class with dyslexia for children 12 years old or older. Our work shows that dyslexia can be screened using a machine learning approach. An online screening tool in Spanish based on our methods has already been used by more than 200,000 people.

Download Full-text

AN EFFICIENT MACHINE LEARNING MODEL FOR PREDICTION OF ACUTE MYOCARDIAL INFARCTION

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200325104317 ◽

2020 ◽

Vol 13 ◽

Author(s):

Dhilsath Fathima.M ◽

S. Justin Samuel ◽

R. Hari Haran

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Logistic Regression ◽

Decision Tree ◽

Learning Model ◽

Training Dataset ◽

Data Set ◽

Machine Learning Model ◽

Proposed Model

Aim: This proposed work is used to develop an improved and robust machine learning model for predicting Myocardial Infarction (MI) could have substantial clinical impact. Objectives: This paper explains how to build machine learning based computer-aided analysis system for an early and accurate prediction of Myocardial Infarction (MI) which utilizes framingham heart study dataset for validation and evaluation. This proposed computer-aided analysis model will support medical professionals to predict myocardial infarction proficiently. Methods: The proposed model utilize the mean imputation to remove the missing values from the data set, then applied principal component analysis to extract the optimal features from the data set to enhance the performance of the classifiers. After PCA, the reduced features are partitioned into training dataset and testing dataset where 70% of the training dataset are given as an input to the four well-liked classifiers as support vector machine, k-nearest neighbor, logistic regression and decision tree to train the classifiers and 30% of test dataset is used to evaluate an output of machine learning model using performance metrics as confusion matrix, classifier accuracy, precision, sensitivity, F1-score, AUC-ROC curve. Results: Output of the classifiers are evaluated using performance measures and we observed that logistic regression provides high accuracy than K-NN, SVM, decision tree classifiers and PCA performs sound as a good feature extraction method to enhance the performance of proposed model. From these analyses, we conclude that logistic regression having good mean accuracy level and standard deviation accuracy compared with the other three algorithms. AUC-ROC curve of the proposed classifiers is analyzed from the output figure.4, figure.5 that logistic regression exhibits good AUC-ROC score, i.e. around 70% compared to k-NN and decision tree algorithm. Conclusion: From the result analysis, we infer that this proposed machine learning model will act as an optimal decision making system to predict the acute myocardial infarction at an early stage than an existing machine learning based prediction models and it is capable to predict the presence of an acute myocardial Infarction with human using the heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent the heart disease.

Download Full-text

MODES: model-based optimization on distributed embedded systems

Machine Learning ◽

10.1007/s10994-021-06014-6 ◽

2021 ◽

Author(s):

Junjie Shi ◽

Jiang Bian ◽

Jakob Richter ◽

Kuan-Hsun Chen ◽

Jörg Rahnenführer ◽

...

Keyword(s):

Machine Learning ◽

Embedded Systems ◽

Learning Model ◽

Black Box ◽

Distributed Embedded Systems ◽

Data Set ◽

Individual Model ◽

Model Based ◽

Machine Learning Model ◽

Distributed Machine Learning

AbstractThe predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data has to be transferred through low bandwidth connections it reduces the time available for tuning. Model-Based Optimization (MBO) is one state-of-the-art method for tuning hyper-parameters but the application on distributed machine learning models or federated learning lacks research. This work proposes a framework $$\textit{MODES}$$ MODES that allows to deploy MBO on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data. The goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) $$\textit{MODES}$$ MODES -B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) $$\textit{MODES}$$ MODES -I considers all models as clones of the same black box which allows it to efficiently parallelize the optimization in a distributed setting. We evaluate $$\textit{MODES}$$ MODES by conducting experiments on the optimization for the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with an improvement in terms of mean accuracy ($$\textit{MODES}$$ MODES -B), run-time efficiency ($$\textit{MODES}$$ MODES -I), and statistical stability for both modes, $$\textit{MODES}$$ MODES outperforms the baseline, i.e., carry out tuning with MBO on each node individually with its local sub-data set.

Download Full-text

Generation of geometric interpolations of building types with deep variational autoencoders

Design Science ◽

10.1017/dsj.2020.31 ◽

2020 ◽

Vol 6 ◽

Author(s):

Jaime de Miguel Rodríguez ◽

Maria Eugenia Villafañe ◽

Luka Piškorec ◽

Fernando Sancho Caparrini

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Large Data ◽

Learning Model ◽

Large Data Sets ◽

Data Sets ◽

Connectivity Map ◽

Data Set ◽

3D Objects ◽

Machine Learning Model

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150 k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.

Download Full-text

Machine learning identifies an immunological pattern associated with multiple juvenile idiopathic arthritis subtypes

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2018-214354 ◽

2019 ◽

Vol 78 (5) ◽

pp. 617-628 ◽

Cited By ~ 5

Author(s):

Erika Van Nieuwenhove ◽

Vasiliki Lagou ◽

Lien Van Eyck ◽

James Dooley ◽

Ulrich Bodenhofer ◽

...

Keyword(s):

Machine Learning ◽

Juvenile Idiopathic Arthritis ◽

Large Scale ◽

Inflammatory Diseases ◽

Adaptive Immune System ◽

Healthy Children ◽

Learning Approaches ◽

Data Set ◽

Immune Signature ◽

Systemic Jia

ObjectivesJuvenile idiopathic arthritis (JIA) is the most common class of childhood rheumatic diseases, with distinct disease subsets that may have diverging pathophysiological origins. Both adaptive and innate immune processes have been proposed as primary drivers, which may account for the observed clinical heterogeneity, but few high-depth studies have been performed.MethodsHere we profiled the adaptive immune system of 85 patients with JIA and 43 age-matched controls with indepth flow cytometry and machine learning approaches.ResultsImmune profiling identified immunological changes in patients with JIA. This immune signature was shared across a broad spectrum of childhood inflammatory diseases. The immune signature was identified in clinically distinct subsets of JIA, but was accentuated in patients with systemic JIA and those patients with active disease. Despite the extensive overlap in the immunological spectrum exhibited by healthy children and patients with JIA, machine learning analysis of the data set proved capable of discriminating patients with JIA from healthy controls with ~90% accuracy.ConclusionsThese results pave the way for large-scale immune phenotyping longitudinal studies of JIA. The ability to discriminate between patients with JIA and healthy individuals provides proof of principle for the use of machine learning to identify immune signatures that are predictive to treatment response group.

Download Full-text

Different firm responses to the COVID-19 pandemic shocks: machine-learning evidence on the Vietnamese labor market

International Journal of Emerging Markets ◽

10.1108/ijoem-02-2021-0292 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Lam Hoang Viet Le ◽

Toan Luu Duc Huynh ◽

Bryan S. Weber ◽

Bao Khac Quoc Nguyen

Keyword(s):

Machine Learning ◽

Labor Market ◽

Large Scale ◽

Government Support ◽

Policy Implications ◽

Machine Learning Techniques ◽

Firm Characteristics ◽

Data Set ◽

Content Type ◽

Firm Responses

PurposeThis paper aims to identify the disproportionate impacts of the COVID-19 pandemic on labor markets.Design/methodology/approachThe authors conduct a large-scale survey on 16,000 firms from 82 industries in Ho Chi Minh City, Vietnam, and analyze the data set by using different machine-learning methods.FindingsFirst, job loss and reduction in state-owned enterprises have been significantly larger than in other types of organizations. Second, employees of foreign direct investment enterprises suffer a significantly lower labor income than those of other groups. Third, the adverse effects of the COVID-19 pandemic on the labor market are heterogeneous across industries and geographies. Finally, firms with high revenue in 2019 are more likely to adopt preventive measures, including the reduction of labor forces. The authors also find a significant correlation between firms' revenue and labor reduction as traditional econometrics and machine-learning techniques suggest.Originality/valueThis study has two main policy implications. First, although government support through taxes has been provided, the authors highlight evidence that there may be some additional benefit from targeting firms that have characteristics associated with layoffs or other negative labor responses. Second, the authors provide information that shows which firm characteristics are associated with particular labor market responses such as layoffs, which may help target stimulus packages. Although the COVID-19 pandemic affects most industries and occupations, heterogeneous firm responses suggest that there could be several varieties of targeted policies-targeting firms that are likely to reduce labor forces or firms likely to face reduced revenue. In this paper, the authors outline several industries and firm characteristics which appear to more directly be reducing employee counts or having negative labor responses which may lead to more cost–effect stimulus.

Download Full-text

Math proficiency prediction in computer-based international large-scale assessments using a multi-class machine learning model

10.1109/sisy52375.2021.9582522 ◽

2021 ◽

Author(s):

Aleksandar Pejic ◽

Piroska Stanic Molcer ◽

Kristian Gulaci

Keyword(s):

Machine Learning ◽

Large Scale ◽

Learning Model ◽

Machine Learning Model ◽

Computer Based ◽

Large Scale Assessments ◽

Math Proficiency

Download Full-text

A Fast Machine Learning Model for Large-Scale Estimation of Annual Solar Irradiation on Rooftops

Proceedings of the ISES Solar World Congress 2019 ◽

10.18086/swc.2019.45.12 ◽

2019 ◽

Author(s):

Alina Walch ◽

Roberto Castello ◽

Nahid Mohajeri ◽

Jean-Louis Scartezzini

Keyword(s):

Machine Learning ◽

Large Scale ◽

Learning Model ◽

Solar Irradiation ◽

Scale Estimation ◽

Machine Learning Model

Download Full-text

Comparing the utility of in vivo transposon mutagenesis approaches in yeast species to infer gene essentiality

10.1101/732552 ◽

2019 ◽

Cited By ~ 1

Author(s):

Anton Levitan ◽

Andrew N. Gale ◽

Emma K. Dallon ◽

Darby W. Kozan ◽

Kyle W. Cunningham ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Yeast Species ◽

Transposon Mutagenesis ◽

Learning Approach ◽

Long Distance ◽

Gene Essentiality ◽

Genome Wide ◽

Machine Learning Approach

ABSTRACTIn vivo transposon mutagenesis, coupled with deep sequencing, enables large-scale genome-wide mutant screens for genes essential in different growth conditions. We analyzed six large-scale studies performed on haploid strains of three yeast species (Saccharomyces cerevisiae, Schizosaccaromyces pombe, and Candida albicans), each mutagenized with two of three different heterologous transposons (AcDs, Hermes, and PiggyBac). Using a machine-learning approach, we evaluated the ability of the data to predict gene essentiality. Important data features included sufficient numbers and distribution of independent insertion events. All transposons showed some bias in insertion site preference because of jackpot events, and preferences for specific insertion sequences and short-distance vs long-distance insertions. For PiggyBac, a stringent target sequence limited the ability to predict essentiality in genes with few or no target sequences. The machine learning approach also robustly predicted gene function in less well-studied species by leveraging cross-species orthologs. Finally, comparisons of isogenic diploid versus haploid S. cerevisiae isolates identified several genes that are haplo-insufficient, while most essential genes, as expected, were recessive. We provide recommendations for the choice of transposons and the inference of gene essentiality in genome-wide studies of eukaryotic haploid microbes such as yeasts, including species that have been less amenable to classical genetic studies.

Download Full-text