Discovery of new stellar groups in the Orion complex

We test the ability of two unsupervised machine learning algorithms, EnLink and Shared Nearest Neighbor (SNN), to identify stellar groupings in the Orion star-forming complex as an application to the 5D astrometric data from Gaia DR2. The algorithms represent two distinct approaches to limiting user bias when selecting parameter values and evaluating the relative weights among astrometric parameters. EnLink adopts a locally adaptive distance metric and eliminates the need for parameter tuning through automation. The original SNN relies only on human input for parameter tuning, so we modified SNN to run in two stages. We first ran the original SNN 7000 times, each with a randomly generated sample according to within-source covariance matrices provided in Gaia DR2 and random parameter values within reasonable ranges. During the second stage, we modified SNN to identify the most repeating stellar groups from the 25 798 we obtained in the first stage. We recovered 22 spatially and kinematically coherent groups in the Orion complex, 12 of which were previously unknown. The groups show a wide distribution of distances extending as far as about 150 pc in front of the star-forming Orion molecular clouds, to about 50 pc beyond them, where we, unexpectedly, find several groups. Our results reveal the wealth of sub-structure in the OB association, within and beyond the classical Blaauw Orion OBI sub-groups. A full characterization of the new groups is essential as it offers the potential to unveil how star formation proceeds globally in large complexes such as Orion.
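
The shared-nearest-neighbor idea named in the abstract above can be sketched as follows. This is only an illustration, not the authors' two-stage pipeline: the function names, the toy 2D points, and the choice k=3 are assumptions, and the paper itself works with 5D astrometric data. The sketch counts how many of their k nearest neighbors two points have in common, which is high inside a group and low across groups.

```python
import math

def k_nearest(points, i, k):
    """Return the indices of the k points closest to points[i], excluding i itself."""
    order = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )
    return set(order[:k])

def snn_similarity(points, i, j, k):
    """Count the neighbors shared by the k-nearest-neighbor lists of points i and j."""
    return len(k_nearest(points, i, k) & k_nearest(points, j, k))

# Toy data (hypothetical): two tight clumps in 2D.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]

same_clump = snn_similarity(points, 0, 1, k=3)        # two points in one clump
different_clumps = snn_similarity(points, 0, 4, k=3)  # points in separate clumps
```

Here points in the same clump share neighbors while points in different clumps share none, which is the signal an SNN-style clustering builds on.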

Variable Selection and Parameter Tuning for BART Modeling in the Fragile Families Challenge

Socius Sociological Research for a Dynamic World ◽

10.1177/2378023119825886 ◽

2019 ◽

Vol 5 ◽

pp. 237802311982588 ◽

Cited By ~ 2

Author(s):

Nicole Bohme Carnegie ◽

James Wu

Keyword(s):

Variable Selection ◽

Missing Values ◽

Linear Models ◽

Parameter Tuning ◽

Black Box ◽

Machine Learning Algorithms ◽

Outcome Variable ◽

Fragile Families ◽

Great Flexibility ◽

Additive Regression

Our goal for the Fragile Families Challenge was to develop a hands-off approach that could be applied in many settings to identify relationships that theory-based models might miss. Data processing was our first and most time-consuming task, particularly handling missing values. Our second task was to reduce the number of variables for modeling, and we compared several techniques for variable selection: least absolute shrinkage and selection operator, regression with a horseshoe prior, Bayesian generalized linear models, and Bayesian additive regression trees (BART). We found minimal differences in final performance based on the choice of variable selection method. We proceeded with BART for modeling because it requires minimal assumptions and permits great flexibility in fitting surfaces, and because of previous success using BART in black-box modeling competitions. In addition, BART allows for probabilistic statements about the predictions and other inferences, which is an advantage over most machine learning algorithms. A drawback to BART, however, is that it is often difficult to identify or characterize individual predictors that have strong influences on the outcome variable.

Feasibility of Using Floor Vibration to Detect Human Falls

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18010200 ◽

2020 ◽

Vol 18 (1) ◽

pp. 200

Author(s):

Yu Shao ◽

Xinyue Wang ◽

Wenjie Song ◽

Sobia Ilyas ◽

Haibo Guo ◽

...

Keyword(s):

Machine Learning ◽

Human Body ◽

Nearest Neighbor ◽

Body Movement ◽

Modern Society ◽

Recognition System ◽

Machine Learning Algorithms ◽

K Nearest Neighbor ◽

Study Results ◽

Floor Vibration

With the growing aging population in modern society, falls and fall-induced injuries in elderly people have become one of the major public health problems. This study proposes a classification framework that uses floor vibrations to detect fall events as well as distinguish different fall postures. A scaled 3D-printed model with twelve fully adjustable joints that can simulate human body movement was built to generate human fall data. The mass proportions of the human body were carefully studied and reflected in the model. Object-drop and human-fall tests were carried out, and the vibration signature generated in the floor was recorded for analysis. Machine learning algorithms including the K-means algorithm and the K nearest neighbor algorithm were introduced in the classification process. Three classifiers (human walking versus human fall, human fall versus object drop, human falls from different postures) were developed in this study. Results showed that the three proposed classifiers can achieve accuracies of 100%, 85%, and 91%, respectively. This paper developed a framework that uses floor vibration to build a pattern recognition system for detecting human falls based on a machine learning approach.

A Comparative Survey of Feature Extraction and Machine Learning Methods in Diverse Acoustic Environments

Sensors ◽

10.3390/s21041274 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1274

Author(s):

Daniel Bonet-Solà ◽

Rosa Ma Alsina-Pagès

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Best Practice ◽

Nearest Neighbor ◽

Gaussian Mixture ◽

Machine Learning Algorithms ◽

Multimedia Retrieval ◽

Natural Environments ◽

K Nearest Neighbor ◽

Acoustic Environments

Acoustic event detection and analysis has been widely developed in the last few years for its valuable application in monitoring elderly or dependent people, for surveillance issues, for multimedia retrieval, or even for biodiversity metrics in natural environments. For this purpose, sound source identification is a key issue to give a smart technological answer to all the aforementioned applications. Diverse types of sounds and varied environments, together with a number of challenges in terms of application, widen the choice of artificial intelligence algorithms that can be proposed. This paper presents a comparative study on combining several feature extraction algorithms (Mel Frequency Cepstrum Coefficients (MFCC), Gammatone Cepstrum Coefficients (GTCC), and Narrow Band (NB)) with a group of machine learning algorithms (k-Nearest Neighbor (kNN), Neural Networks (NN), and Gaussian Mixture Model (GMM)), tested over five different acoustic environments. This work has the goal of detailing a best practice method and evaluating the reliability of this general-purpose algorithm for all the classes. Preliminary results show that most of the combinations of feature extraction and machine learning present acceptable results in most of the described corpora. Nevertheless, there is a combination that outperforms the others: the use of GTCC together with kNN, and its results are further analyzed for all the corpora.

Efficient detection of hacker community based on twitter data using complex networks and machine learning algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210458 ◽

2021 ◽

pp. 1-17

Author(s):

Ahmed Al-Tarawneh ◽

Ja’afer Al-Saraireh

Keyword(s):

Machine Learning ◽

Complex Networks ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Efficient Detection ◽

Suggested Keywords

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets using machine-learning techniques. Previous approaches for detecting infected tweets are based on human efforts or text analysis, and thus they are limited in capturing the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection for the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers, who share posts with similar interests from a hackers’ community on Twitter. The list is built based on a set of suggested keywords that are the terms commonly used by hackers in their tweets. After that, a complex network is generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is utilized with a machine learning process by applying different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Correctly classified instances were measured for accuracy using the average values of K-nearest neighbor, Naive Bayes, Random Tree, and the support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine if tweets pose a risk to future institutions and individuals, providing early warning of possible attacks.

Multi-objective optimization of shared nearest neighbor similarity for feature selection

Applied Soft Computing ◽

10.1016/j.asoc.2015.08.042 ◽

2015 ◽

Vol 37 ◽

pp. 751-762 ◽

Cited By ~ 9

Author(s):

Partha Pratim Kundu ◽

Sushmita Mitra

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Multi Objective Optimization ◽

Multi Objective ◽

Shared Nearest Neighbor

A Comparative Analysis of Machine Learning Algorithms Modeled from Machine Vision-Based Lettuce Growth Stage Classification in Smart Aquaponics

International Journal of Environmental Science and Development ◽

10.18178/ijesd.2020.11.9.1288 ◽

2020 ◽

Vol 11 (9) ◽

pp. 442-449 ◽

Cited By ~ 1

Author(s):

Sandy C. Lauguico ◽

Ronnie S. Concepcion II ◽

Jonnel D. Alejandrino ◽

Rogelio Ruzcko Tobias ◽

...

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Machine Vision ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Urban Farming ◽

K Nearest Neighbor ◽

Lettuce Growth

The growing problem of food scarcity drives innovation in urban farming. One urban farming method is smart aquaponics. However, for a smart aquaponics system to yield crops successfully, it needs intensive monitoring, control, and automation. An efficient way of implementing this is to use vision systems and machine learning algorithms to optimize the capabilities of the farming technique. To realize this, a comparative analysis of three machine learning estimators, Logistic Regression (LR), K-Nearest Neighbor (KNN), and Linear Support Vector Machine (L-SVM), was conducted. This was done by modeling each algorithm on features extracted by machine vision from images of lettuce raised in a smart aquaponics setup. Each model was optimized to improve its cross-validation and hold-out validation scores. The results showed that KNN with the tuned hyperparameters n_neighbors=24, weights='distance', algorithm='auto', leaf_size=10 was the most effective model for the given dataset, yielding a cross-validation mean accuracy of 87.06% and a classification accuracy of 91.67%.
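
The hyperparameter names quoted above match those of scikit-learn's KNeighborsClassifier, where weights='distance' means closer neighbors count for more in the vote. As a rough illustration of that weighting only, here is a dependency-free sketch; it is not the authors' model, and the function name, the toy features, and the "early"/"late" labels are hypothetical. Returning immediately on an exact match is a simplification made for this sketch.

```python
import math
from collections import defaultdict

def knn_predict(train_x, train_y, query, n_neighbors):
    """Classify `query` by an inverse-distance-weighted vote of its nearest neighbors."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:n_neighbors]
    votes = defaultdict(float)
    for x, label in nearest:
        d = math.dist(x, query)
        if d == 0.0:
            return label  # simplification: an exact match decides the class outright
        votes[label] += 1.0 / d  # closer neighbors contribute larger weights
    return max(votes, key=votes.get)

# Toy data (hypothetical): one nearby "early" sample, two distant "late" samples.
train_x = [(1.0, 1.0), (3.0, 3.0), (3.2, 3.0)]
train_y = ["early", "late", "late"]

prediction = knn_predict(train_x, train_y, (1.5, 1.5), n_neighbors=3)
```

With a uniform vote the two "late" samples would win by majority; with distance weighting the single close "early" sample outweighs them.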

Selection of Pairings Reaching Evenly Across the Data (SPREAD): A simple algorithm to design maximally informative fully crossed mating experiments

10.1101/009720 ◽

2014 ◽

Author(s):

Kolea Zimmerman ◽

Daniel Levitis ◽

Ethan Addicott ◽

Anne Pringle

Keyword(s):

Nearest Neighbor ◽

Simple Algorithm ◽

Two Dimensional ◽

Neighbor Distance ◽

Crossing Experiments ◽

The Mean ◽

Parameter Values ◽

Selection Of ◽

Novel Algorithm ◽

Trait Space

We present a novel algorithm for the design of crossing experiments. The algorithm identifies a set of individuals (a "crossing-set") from a larger pool of potential crossing-sets by maximizing the diversity of traits of interest, for example, maximizing the range of genetic and geographic distances between individuals included in the crossing-set. To calculate diversity, we use the mean nearest neighbor distance of crosses plotted in trait space. We implement our algorithm on a real dataset of Neurospora crassa strains, using the genetic and geographic distances between potential crosses as a two-dimensional trait space. In simulated mating experiments, crossing-sets selected by our algorithm provide better estimates of underlying parameter values than randomly chosen crossing-sets.
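
The diversity measure described above, the mean nearest neighbor distance of crosses in trait space, can be sketched as below. This is an illustrative reading of the abstract rather than the published SPREAD implementation: the function names and the two toy candidate sets are assumptions, and how the real algorithm generates its pool of candidate crossing-sets is not shown.

```python
import math

def mean_nn_distance(crosses):
    """Mean distance from each cross to its nearest other cross in trait space."""
    total = 0.0
    for i, p in enumerate(crosses):
        total += min(math.dist(p, q) for j, q in enumerate(crosses) if j != i)
    return total / len(crosses)

def pick_most_spread(candidate_sets):
    """Return the candidate crossing-set with the largest mean nearest-neighbor distance."""
    return max(candidate_sets, key=mean_nn_distance)

# Toy candidates (hypothetical): each point is a cross in a 2D trait space,
# e.g. (genetic distance, geographic distance) in arbitrary units.
clumped = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
spread = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

best = pick_most_spread([clumped, spread])
```

A set whose crosses sit far from their nearest neighbors covers trait space more evenly, so the more widely spaced candidate is preferred here.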

Book Genre Categorization Using Machine Learning Algorithms (K-Nearest Neighbor, Support Vector Machine and Logistic Regression) using Customized Dataset

International Journal of Computer Science and Mobile Computing ◽

10.47760/ijcsmc.2021.v10i03.002 ◽

2021 ◽

Vol 10 (3) ◽

pp. 14-25

Author(s):

Parilkumar Shiroya

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Logistic Regression ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor

Sensor Validation for Indoor Air Quality using Machine Learning

10.5753/eniac.2020.12174 ◽

2020 ◽

Author(s):

Vagner Seibert ◽

Ricardo Araújo ◽

Richard McElligott

Keyword(s):

Machine Learning ◽

Air Quality ◽

Indoor Air Quality ◽

Indoor Air ◽

Nearest Neighbor ◽

Contextual Information ◽

Machine Learning Algorithms ◽

K Nearest Neighbor ◽

Sensor Validation ◽

Single Reading

Guaranteeing high indoor air quality is an increasingly important task. Sensors measure pollutants in the air and allow for monitoring and controlling air quality. However, all sensors are susceptible to failures, either permanent or transitory, that can yield incorrect readings. Automatically detecting such faulty readings is therefore crucial to guarantee sensors' reliability. In this paper we evaluate three Machine Learning algorithms applied to the task of classifying a single reading from a sensor as faulty or not, comparing them to standard statistical approaches. We show that all tested machine learning methods (Multi-layer Perceptron, K-Nearest Neighbor and Random Forest) outperform their statistical counterparts, both by allowing better separation boundaries and by allowing for the use of contextual information. We further show that this result does not depend on the amount of data, but ML methods are able to continue to improve as more data is made available.

Android Malware Detection using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1011.0982s1219 ◽

2020 ◽

Vol 8 (2S12) ◽

pp. 65-70

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

User Interest ◽

Android Malware ◽

Android Malware Detection

Machine Learning is empowering many aspects of day-to-day life, from filtering the content on social networks to suggesting products that we may be looking for. This technology focuses on taking objects as image input to find new observations or show items based on user interest. The major discussion here is of Machine Learning techniques in which we use supervised learning, where the computer learns from the input data/training data and predicts results based on experience. We also discuss the machine learning algorithms Naïve Bayes Classifier, K-Nearest Neighbor, Random Forest, Decision Trees, Boosted Trees, and Support Vector Machine, and use these classifiers on the Malgenome and Drebin Android malware datasets. Android is an operating system that is gaining popularity these days, and with rising demand for these devices comes a rise in Android malware. The traditional techniques used to detect malware were unable to detect unknown applications. We have run these datasets on different machine learning classifiers and have recorded the results. The experimental results provide a comparative analysis based on performance, accuracy, and cost.
