Estimating Seismic Moment Tensors based on Bayesian Machine Learning

Author(s):  
Andreas Steinberg ◽  
Hannes Vasyura-Bathke ◽  
Peter Gaebler ◽  
Lars Ceranna

Estimating earthquake source mechanisms fast is essential for near-real-time hazard assessments, which are based on shakemaps and further downstream analyses such as physics-based aftershock probability calculations. The model and data uncertainties associated with the estimated source mechanism are also crucial. We propose a Bayesian machine learning algorithm trained on normalized synthetic waveforms for estimating the full moment tensor of earthquakes almost instantaneously, together with the associated source parameter uncertainties.

A prior assumption is an appropriate location of the earthquake along with its associated uncertainties. Here, this is obtained by already established machine-learning-based algorithms. The training data set is computed by forward calculation of synthetic waveforms based on Green's functions for a specified 1-D velocity model using the Pyrocko software package. The learned labels, i.e. the information the machine learning algorithm associates with the data, are the moment tensor components, described with only five unique parameters. For predefined locations in an area of interest we train a fully independent Bayesian Convolutional Neural Network (BNN).

With variational inference, the weights of the network are not scalars but distributions of weights for the activation of neurons. Each evaluation of input data with our BNN therefore yields a set of predictions with associated probabilities. This allows us to evaluate an ensemble of possible source mechanisms for each evaluation of input waveform data.

As a test set, we trained our models for an area south of the Coso geothermal field in California for a fixed set of broadband stations at a maximum distance of 150 km. We validate our approach with a subset of earthquakes from the 2019-2020 Ridgecrest sequence. For this data set we compare the estimates of our machine-learning-based approach with independently determined focal mechanisms and moment tensors. Overall, we benchmark our approach with data unseen during the training of the machine learning models and show its capability of generating source mechanism estimates similar to those of independent studies within only a few seconds of processing time per earthquake. We finally apply the method to seismic data from a research network monitoring the area around two South German geothermal power plants. Our approach demonstrates the potential of machine learning for implementation in operational frameworks for fast earthquake source mechanism estimation with associated uncertainties.
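A minimal sketch (not the authors' implementation) of the core idea: a network whose final layer holds distributions over weights, so repeated forward passes on the same normalized waveform yield an ensemble of moment-tensor predictions with spread as an uncertainty estimate. The architecture, input shape, and the use of five output parameters are illustrative assumptions; ELBO training is omitted.

```python
import torch
import torch.nn as nn

class VariationalLinear(nn.Module):
    """Linear layer with Gaussian weight distributions, sampled per forward pass."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_logstd = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.b = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        eps = torch.randn_like(self.w_mu)                # reparameterization trick
        w = self.w_mu + torch.exp(self.w_logstd) * eps   # sampled weights
        return torch.nn.functional.linear(x, w, self.b)

class BayesianWaveformNet(nn.Module):
    """1-D CNN over waveform channels with a variational head for 5 MT parameters."""
    def __init__(self, n_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8), nn.Flatten(),
        )
        self.head = VariationalLinear(32 * 8, 5)         # five moment tensor parameters

    def forward(self, x):
        return self.head(self.features(x))

# Ensemble of predictions for one (placeholder) waveform window:
net = BayesianWaveformNet()
waveform = torch.randn(1, 3, 1024)
samples = torch.stack([net(waveform) for _ in range(100)])
print(samples.mean(0), samples.std(0))                   # parameters with uncertainty
```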

A large volume of data is generated in various fields and stored in repositories; this is commonly referred to as big data. Big data in healthcare comprises huge amounts of clinical records for every patient, maintained in Electronic Health Records (EHR). More than 80% of clinical data is in unstructured format and stored in hundreds of forms. The challenge for data storage and analysis is to handle large data sets with efficiency and scalability. The Hadoop MapReduce framework stores and processes any kind of big data quickly; it is not solely a storage system but also a platform for data processing, and it is scalable and fault-tolerant. Prediction on the data sets is handled by a machine learning algorithm. This work focuses on the Extreme Learning Machine (ELM) algorithm, which can find disease risk predictions in an optimized way by combining ELM with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM). The proposed work also considers the scalability and accuracy of big data models; the proposed algorithm handles the computational workload well and achieves good results in both veracity and efficiency.
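A minimal numpy sketch of the Extreme Learning Machine idea referenced in the abstract: hidden-layer weights are random and fixed, and only the output weights are solved in closed form. The Cuckoo-Search-optimized SVM stage and the Hadoop storage layer are not reproduced here; the data and threshold are illustrative.

```python
import numpy as np

def elm_fit(X, y, n_hidden=200, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (kept fixed)
    b = rng.normal(size=n_hidden)                # random biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage with a binary "disease risk" label encoded as 0/1:
X = np.random.rand(500, 20)
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)
W, b, beta = elm_fit(X, y)
pred = (elm_predict(X, W, b, beta) > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```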


2020 ◽  
Vol 44 (1) ◽  
pp. 231-269
Author(s):  
Rong Chen

Abstract Plural marking reaches most corners of languages. When a noun occurs with another linguistic element, which is called an associate in this paper, plural marking on the two-component structure has four logically possible patterns: doubly unmarked, noun-marked, associate-marked and doubly marked. These four patterns are not distributed homogeneously in the world's languages, because they are motivated by two competing motivations: iconicity and economy. Some patterns are preferred over others, and this preference is consistently found in languages across the world. In other words, there exists a universal distribution of the four plural marking patterns. Furthermore, holding the view that plural marking on associates expresses the plurality of nouns, I propose a hypothetical universal which uses the number of pluralized associates to predict plural marking on nouns. A data set collected from a sample of 100 languages is used to test the hypothetical universal by employing logistic regression, a machine learning algorithm.
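An illustration, with invented toy rows rather than the paper's 100-language sample, of the kind of test described: logistic regression predicting whether the noun is plural-marked from how many of its associates are plural-marked.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-language rows: [number of pluralized associates], noun marked (1) or not (0)
n_pluralized_associates = np.array([[0], [0], [1], [2], [3], [4], [5], [5]])
noun_marked = np.array([0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression().fit(n_pluralized_associates, noun_marked)
print(model.predict_proba([[2]]))  # predicted probability of noun plural marking
```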


2020 ◽  
Vol 17 (9) ◽  
pp. 4294-4298
Author(s):  
B. R. Sunil Kumar ◽  
B. S. Siddhartha ◽  
S. N. Shwetha ◽  
K. Arpitha

This paper intends to use distinct machine learning algorithms and to explore their features. The primary advantage of machine learning is that an algorithm can learn from data what to do and then perform its task automatically. This paper presents the concept of machine learning and its algorithms, which can be used for different applications such as health care, sentiment analysis and many more. Programmers are sometimes unsure which algorithm to apply to their application. This paper provides guidance on selecting an algorithm on the basis of how accurately it fits the collected data and of its pros and cons. Given the data set, a base model is developed, trained and tested. The trained model is then ready for prediction and can be deployed on the basis of feasibility.
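A generic sketch of the "develop, train, test" workflow the abstract describes, comparing a few candidate algorithms on the same data set; the scikit-learn example data and the particular algorithms are illustrative assumptions, not the paper's.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Develop: split one data set into training and test portions.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Train and test several candidates, then pick the best fit for deployment.
for name, model in [("logistic regression", LogisticRegression(max_iter=5000)),
                    ("decision tree", DecisionTreeClassifier(random_state=0)),
                    ("SVM", SVC())]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```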


Author(s):  
G. Keerthi Devipriya ◽  
E. Chandana ◽  
B. Prathyusha ◽  
T. Seshu Chakravarthy

In this paper we are interested in image classification and recognition. We examine the performance of training models using a classifier algorithm and an API that contains a set of images: an uploaded image is compared with the images available in the data set, and after its respective category has been identified, the image is placed in that category. To classify the images we use a machine learning algorithm that performs this comparison and placement.
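A minimal sketch of classifying a new image by comparing it with labelled images already in a data set, here using a k-nearest-neighbour classifier on scikit-learn's digits images; the data set and classifier choice are illustrative, not taken from the paper.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compare each query image with the stored images and assign the majority category.
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print("predicted category of an 'uploaded' image:", clf.predict(X_test[:1]))
print("test accuracy:", clf.score(X_test, y_test))
```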


2021 ◽  
Vol 8 (3) ◽  
pp. 209-221
Author(s):  
Li-Li Wei ◽  
Yue-Shuai Pan ◽  
Yan Zhang ◽  
Kai Chen ◽  
Hao-Yu Wang ◽  
...  

Abstract Objective To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy. Methods This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who had attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected for analysis, and the collected indicators were retrospectively analyzed. Based on Python, the indicators were classified and modeled using a random forest regression algorithm, and the performance of the prediction model was analyzed. Results We obtained 4806 analyzable records from 1625 pregnant women. Among these, 3265 samples with all 67 indicators were used to establish data set F1; 4806 samples with 38 identical indicators were used to establish data set F2. Each of F1 and F2 was used for training the random forest algorithm. The overall predictive accuracy of the F1 model was 93.10%, the area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy for GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%. The results thus showed that the F2 prediction model performed better than the F1 model. To explore the impact of the discarded indicators on GDM prediction, the F3 data set was established using the 3265 samples of F1 with the 38 indicators of F2. After training, the overall predictive accuracy of the F3 model was 91.60%, the AUC was 0.58, and the predictive accuracy for positive cases was 15.85%. Conclusions In this study, a model for predicting GDM with several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model exhibited good performance and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, there are certain requirements for the proportions of negative and positive cases in the sample data sets when the random forest algorithm is applied to the early prediction of GDM.
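A hedged sketch of the kind of evaluation reported: train a random forest on antenatal indicators and report overall accuracy, AUC, and the accuracy on GDM-positive cases (recall). Synthetic, class-imbalanced data stands in for the clinical records; the sample size, feature count, and class weights are placeholders echoing the abstract, not the study's data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data: most samples negative, a minority GDM-positive.
X, y = make_classification(n_samples=4806, n_features=38, weights=[0.85, 0.15],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
proba = rf.predict_proba(X_te)[:, 1]
pred = rf.predict(X_te)

print("overall accuracy:", accuracy_score(y_te, pred))
print("AUC:", roc_auc_score(y_te, proba))
print("accuracy on positive cases (recall):", recall_score(y_te, pred))
```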


2019 ◽  
Vol 5 (2) ◽  
pp. 108-119
Author(s):  
Yeslam Al-Saggaf ◽  
Amanda Davies

Purpose The purpose of this paper is to discuss the design, application and findings of a case study in which a machine learning algorithm is applied to identify grievances expressed on Twitter in an Arabian context. Design/methodology/approach To understand the characteristics of the Twitter users who expressed the identified grievances, data mining techniques and social network analysis were utilised. The study extracted a total of 23,363 tweets, which were stored as a data set. Application of the machine learning algorithm to this data set was followed by a data mining process to explore the characteristics of the Twitter feed users. The network of the users was mapped, and the individual level of interactivity and the network density were calculated. Findings The machine learning algorithm revealed 12 themes, all of which were underpinned by the blockade of Qatar by the coalition of Arab countries. The data mining analysis revealed that the tweets could be grouped into three clusters; the main cluster included users with a large number of followers and friends who nevertheless did not mention other users in their tweets. The social network analysis revealed that whilst a large proportion of users engaged in direct messages with others, the network ties between them were not strong. Practical implications Borum (2011) notes that invoking grievances is the first step in the radicalisation process. It is hoped that, by understanding these grievances, the study will shed light on what radical groups could invoke to win the sympathy of aggrieved people. Originality/value In combination, the machine learning algorithm offered insights into the grievances expressed within the tweets in an Arabian context, and the data mining and social network analyses revealed the characteristics of the Twitter users, highlighting opportunities for identifying and managing early intervention against radicalisation.
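A small illustration of the network measures mentioned (individual interactivity and network density) using networkx on a toy mention network; the usernames and edges are invented, and the theme-identification step on the tweet text is not shown here.

```python
import networkx as nx

# Each edge is one user mentioning another in a tweet.
mentions = [("userA", "userB"), ("userA", "userC"), ("userD", "userB")]
G = nx.DiGraph(mentions)

print("network density:", nx.density(G))
print("interactivity (mentions sent per user):", dict(G.out_degree()))
```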


2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, we describe in this paper the optimal among the best of the algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.


Author(s):  
Nilesh Kumar Sahu ◽  
Manorama Patnaik ◽  
Itu Snigdh

The precision of any machine learning algorithm depends on the data set, its suitability, and its volume. Therefore, data and its characteristics have currently become the predominant components of any predictive or precision-based domain like machine learning. Feature engineering refers to the process of changing and preparing this input data so that it is ready for training machine learning models. Several feature types, such as categorical, numerical, mixed, date, and time, are to be considered for feature extraction in feature engineering. Data set characteristics such as cardinality, missing data, and rare labels for categorical features, as well as distribution, outliers, and magnitude for numerical features, are currently taken into account when engineering features. This chapter discusses the various data types and the techniques for applying them in feature engineering, and it also focuses on the implementation of various data techniques for feature extraction.
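A compact pandas/scikit-learn sketch of the feature-engineering steps the chapter covers: missing-value imputation, rare-label grouping, encoding of a categorical feature, and scaling of numerical magnitudes. The toy data frame and column names are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Pune", "Goa", None, "Delhi"],  # categorical feature
    "income": [40, 42, None, 300, 45, 41],                     # numeric, one outlier
})

# Missing data: fill categorical with a placeholder, numeric with the median.
df["city"] = df["city"].fillna("missing")
df["income"] = df["income"].fillna(df["income"].median())

# Rare labels: group categories that occur only once into "other".
counts = df["city"].value_counts()
df["city"] = df["city"].where(df["city"].map(counts) > 1, "other")

# Cardinality / encoding: one-hot encode the categorical feature.
df = pd.get_dummies(df, columns=["city"])

# Magnitude: standardize the numeric feature.
df["income"] = StandardScaler().fit_transform(df[["income"]])
print(df)
```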

