An unsupervised machine-learning-based classification of aerosol microphysical properties over 10 years at Cabo Verde

Abstract. The Cape Verde Atmospheric Observatory (CVAO), which is influenced by both marine and desert dust air masses, was used for long-term measurements of atmospheric aerosol properties from 2008 to 2017. These properties include particle number size distributions (PNSD), light-absorbing carbon (LAC), and concentrations of cloud condensation nuclei (CCN) together with their hygroscopicity. Here we summarize the results obtained for these properties and use an unsupervised machine learning algorithm to classify aerosol types. Five aerosol types were identified: marine, freshly formed, mixture, moderate dust, and heavy dust. Air masses originate from the Atlantic Ocean during marine periods and from the Sahara during dust periods. Heavy dust was more frequent during wintertime, whereas clean marine periods were more frequent during springtime. During dust periods, CCN number concentrations at a supersaturation of 0.30 % were roughly 2.5 times higher than during marine periods, but the hygroscopicity (κ) of particles in the size range from ∼30 to ∼175 nm was comparable during marine and dust periods. The long-term data presented here, together with the aerosol classification, can serve as a basis for improving our understanding of the annual cycles of the atmospheric aerosol in the eastern tropical Atlantic and of aerosol-cloud interactions, and for driving, evaluating, and constraining atmospheric model simulations.


Analysis of the mandibular canal course using unsupervised machine learning algorithm

PLoS ONE ◽

10.1371/journal.pone.0260194 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0260194

Author(s):

Young Hyun Kim ◽

Kug Jin Jeon ◽

Chena Lee ◽

Yoon Joo Choi ◽

Hoi-In Jung ◽

...

Keyword(s):

Machine Learning ◽

Cluster Analysis ◽

Learning Algorithm ◽

Mandibular Canal ◽

Unsupervised Machine Learning ◽

Computed Tomography Images ◽

Significant Difference ◽

Cluster 2 ◽

Axial View

Objectives: Anatomical structure classification is a necessary task in the medical field, but the inevitable variability of interpretation among experts makes reliable classification difficult. This study aims to introduce cluster analysis, an unsupervised machine learning method, for the classification of three-dimensional (3D) mandibular canal (MC) courses, and to visualize standard MC courses derived from cluster analysis in the Korean population. Materials and methods: A total of 429 cone-beam computed tomography images were used. Four sites in the mandible were selected for measuring the MC course, and four parameters (two vertical and two horizontal) were measured per site. Cluster analysis was carried out as follows: parameter measurement, parameter normalization, cluster tendency evaluation, determination of the optimal number of clusters, and k-means cluster analysis. Cluster analysis classified the 3D MC courses into three types with statistically significant mean differences. Results: Cluster 1 showed a smooth line running towards the lingual side in the axial view and a steep slope in the sagittal view. Cluster 2 ran in an almost straight line closest to the lingual and inferior borders of the mandible. Cluster 3 showed a pathway with a buccal bend in the axial view and an increasing slope in the sagittal view in the posterior area. Cluster 2 was the most common (42.1%), with males (57.1%) more common than females (42.9%). Cluster 3 comprised a similar ratio of male and female cases and accounted for 31.9% of the total distribution. Cluster 1 was the least common (26.0%). The distributions of the right and left sides did not differ significantly. Conclusion: The MC courses were automatically classified into three types through cluster analysis. Cluster analysis enables unbiased classification of anatomical structures by reducing observer variability and can present representative standard information for each classified group.
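The normalization and k-means steps named in this abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the measurement values are invented, only two parameters per case are used instead of the sixteen the study measured, the farthest-point seeding is a choice made here for reproducibility, and the cluster tendency evaluation and the selection of the number of clusters are left out.

```python
from statistics import mean, pstdev

def zscore(rows):
    """Scale every column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(rows, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first row, then repeatedly add the row farthest from the chosen centers."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sqdist(r, c) for c in centers)))
    return centers

def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center updates."""
    centers = farthest_point_init(rows, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: sqdist(r, centers[j])) for r in rows]
        updated = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            updated.append([mean(c) for c in zip(*members)] if members else centers[j])
        if updated == centers:  # assignments no longer change the centers
            break
        centers = updated
    return labels, centers

# Hypothetical (vertical, horizontal) measurements for nine cases.
measurements = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
    (9.0, 1.0), (9.2, 1.1), (8.8, 0.9),
]
labels, centers = kmeans(zscore(measurements), k=3)
```

Normalizing first keeps a parameter with a large numeric range from dominating the distance computation, which is presumably why the abstract lists it before clustering.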


Machine learning algorithm improved automated droplet classification of ddPCR for detection of BRAF V600E in paraffin-embedded samples

Scientific Reports ◽

10.1038/s41598-021-92014-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Gabriel A. Colozza-Gama ◽

Fabiano Callegari ◽

Nikola Bešič ◽

Ana C. de J. Paviza ◽

Janete M. Cerutti

Keyword(s):

Machine Learning ◽

Sanger Sequencing ◽

Learning Algorithm ◽

Absolute Quantification ◽

Braf V600e Mutation ◽

Braf V600e ◽

Driver Genes ◽

Quantitative Classification ◽

Cancer Driver

Abstract. Somatic mutations in cancer driver genes can help diagnosis, prognosis and treatment decisions. Formalin-fixed paraffin-embedded (FFPE) specimens are the main source of DNA for somatic mutation detection. To overcome the constraints of DNA isolated from FFPE, we compared pyrosequencing and ddPCR analysis for absolute quantification of the BRAF V600E mutation in DNA extracted from FFPE specimens, and compared the results with the qualitative detection information obtained by Sanger sequencing. Sanger sequencing was able to detect the BRAF V600E mutation only when it was present in more than 15% of total alleles. Although the sensitivity of ddPCR was higher than that observed for Sanger sequencing, it was less consistent than pyrosequencing, likely due to droplet classification bias with FFPE-derived DNA. To address the droplet allocation bias in ddPCR analysis, we compared different algorithms for automated droplet classification and then correlated these findings with those obtained from pyrosequencing. Including non-classifiable droplets (rain) in the ddPCR analysis gave better qualitative and quantitative classification of droplets than excluding them, when judged against the pyrosequencing results. Notably, only the machine learning k-NN algorithm was able to classify the samples automatically, surpassing manual classification based on no-template controls, which shows promise for clinical practice.
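To make the k-NN idea concrete, the following sketch classifies a droplet by majority vote among its nearest labeled neighbours in a two-channel fluorescence plane. It is a minimal illustration, not the pipeline used in the study: the amplitudes, the class names, and the choice of k = 3 are all hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Return the majority label among the k training droplets nearest to query.

    train is a list of ((channel_1, channel_2), label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled droplets (fluorescence amplitudes in arbitrary units).
controls = [
    ((1000, 900), "negative"), ((1100, 1000), "negative"), ((950, 1050), "negative"),
    ((6000, 1000), "mutant"), ((6200, 1100), "mutant"), ((5900, 950), "mutant"),
    ((1000, 5000), "wild-type"), ((1100, 5200), "wild-type"), ((900, 4900), "wild-type"),
]

# A "rain" droplet lying between the negative and mutant clouds.
rain_call = knn_predict(controls, (3800, 1000))
clean_call = knn_predict(controls, (1050, 980))
```

Here the rain droplet is assigned to the mutant class because its three nearest labeled neighbours are all mutant droplets, whereas a fixed amplitude threshold could place it on either side depending on where the threshold is drawn.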


Multi-Class Assessment Based on Random Forests

Education Sciences ◽

10.3390/educsci11030092 ◽

2021 ◽

Vol 11 (3) ◽

pp. 92

Author(s):

Mehdi Berriri ◽

Sofiane Djema ◽

Gaëtan Rey ◽

Christel Dartigues-Pallez

Keyword(s):

Higher Education ◽

Machine Learning ◽

Random Forests ◽

Learning Algorithm ◽

Teaching Staff ◽

Machine Learning Algorithm ◽

Process Data ◽

Training Courses ◽

Education Courses

Today, many students enter higher education courses that do not suit them and end up failing. The first purpose of this study is to give counselors better knowledge so that they can offer future students courses that match their profile. The second objective is to allow the teaching staff to propose training courses adapted to students by anticipating their possible difficulties. This is made possible by a machine learning algorithm called Random Forest, which classifies students according to their results. We had to process data, generate models using our algorithm, and cross-reference the results obtained to reach a better final prediction. We tested our method on different use cases, from two classes to five classes. Each set of classes partitions the grade average, which ranges from 0 to 20, into intervals. An accuracy of 75% was achieved with a set of five classes, and up to 85% with sets of two and three classes.
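The class sets described above can be pictured as a binning of the 0 to 20 average. The abstract does not give the interval boundaries, so the sketch below assumes equal-width intervals purely for illustration; the function name `grade_class` is likewise hypothetical.

```python
def grade_class(average, n_classes):
    """Map a grade average in [0, 20] to a class index in 0..n_classes-1.

    Equal-width intervals are an assumption of this sketch.
    """
    if not 0 <= average <= 20:
        raise ValueError("average must lie between 0 and 20")
    width = 20 / n_classes
    # The top grade (20) would otherwise fall just outside the last interval.
    return min(int(average // width), n_classes - 1)

two_class = grade_class(9.5, 2)    # interval [0, 10)
five_class = grade_class(9.5, 5)   # interval [8, 12)
```

The same student average therefore yields a different target label in each use case, which is why accuracy figures for two, three, and five classes are not directly comparable.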


Eye-blink artifact removal from single channel EEG with k-means and SSA

Scientific Reports ◽

10.1038/s41598-021-90437-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ajay Kumar Maddirala ◽

Kalyana C Veluvolu

Keyword(s):

Machine Learning ◽

Single Channel ◽

Learning Algorithm ◽

Singular Spectrum Analysis ◽

Machine Learning Algorithm ◽

Eeg Signal ◽

Eeg Signals ◽

Unsupervised Machine Learning ◽

Eye Blink ◽

Blink Artifact

Abstract. In recent years, portable electroencephalogram (EEG) devices have become popular for both clinical and non-clinical applications. To make the subject more comfortable and to allow EEG signals to be measured for several hours, these devices usually have few EEG channels, or even a single one. However, the electrooculogram (EOG) signal, also known as the eye-blink artifact, which is produced by involuntary movement of the eyelids, always contaminates the EEG signals. Very few techniques are available to remove these artifacts from single-channel EEG, and most of them modify the uncontaminated regions of the EEG signal. In this paper, we develop a new framework that combines an unsupervised machine learning algorithm (k-means) with the singular spectrum analysis (SSA) technique to remove the eye-blink artifact without modifying the actual EEG signal. The novelty of the work lies in extracting the eye-blink artifact based on the time-domain features of the EEG signal and the unsupervised machine learning algorithm. The extracted eye-blink artifact is further processed by the SSA method and finally subtracted from the contaminated single-channel EEG signal to obtain the corrected EEG signal. Results with synthetic and real EEG signals demonstrate the superiority of the proposed method over existing methods. Moreover, the frequency-based measures, namely the power spectrum ratio (Γ) and the mean absolute error (MAE), also show that the proposed method does not modify the uncontaminated regions of the EEG signal while removing the eye-blink artifact.


Pressure pattern recognition in buildings using an unsupervised machine-learning algorithm

Journal of Wind Engineering and Industrial Aerodynamics ◽

10.1016/j.jweia.2021.104629 ◽

2021 ◽

Vol 214 ◽

pp. 104629

Author(s):

Bubryur Kim ◽

N. Yuvaraj ◽

K.T. Tse ◽

Dong-Eun Lee ◽

Gang Hu

Keyword(s):

Machine Learning ◽

Pattern Recognition ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Unsupervised Machine Learning ◽

Pressure Pattern


Classification of Daily Irradiance Profiles and the Behaviour of Photovoltaic Plant Elements: The Effects of Cloud Enhancement

Applied Sciences ◽

10.3390/app11115230 ◽

2021 ◽

Vol 11 (11) ◽

pp. 5230

Author(s):

Isabel Santiago ◽

Jorge Luis Esquivel-Martin ◽

David Trillo-Montero ◽

Rafael Jesús Real-Calvo ◽

Víctor Pallarés-López

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Automatic Classification ◽

Sampling Frequency ◽

Machine Learning Algorithms ◽

Unsupervised Machine Learning ◽

Average Efficiency ◽

Clear Sky ◽

Photovoltaic Plant

In this work, daily irradiance profiles registered over nine years at a photovoltaic installation in the south of Spain, with a sampling interval of 5 min, were classified automatically, and the operation of the installation's elements on each type of day was then analysed. The classification was based on the total daily irradiance values and the fluctuations of this parameter throughout the day. The irradiance profiles were grouped into nine categories using unsupervised machine learning algorithms for clustering, implemented in Python. The behaviour of the modules and the inverter was found to depend on the type of day: the inverter worked with a better average efficiency on days with higher irradiance and lower fluctuations, whereas the modules worked with a better average efficiency on days with irradiance fluctuations than on clear sky days. This behaviour of the modules may be due to the phenomenon known as cloud enhancement, which occurs on days with passing clouds: reflections of radiation on the edges of the clouds can raise irradiance at certain moments above the values seen on clear sky days. The effect is attributed to the higher energy generated during these irradiance peaks and to the lower temperatures that the module reaches because of the shaded areas created by the clouds, which reduce its temperature losses.
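The two quantities on which the classification is said to rest, a daily total and a measure of fluctuation, could be derived from a 5 min series roughly as follows. The abstract does not define the fluctuation measure, so the sum of absolute sample-to-sample changes used here is an assumption, as are the function name and the toy profiles.

```python
def daily_features(irradiance_w_m2, step_minutes=5):
    """Return (daily energy in Wh/m^2, variability in W/m^2) for one day.

    Energy is a rectangle-rule integral of the samples; variability is the
    sum of absolute changes between consecutive samples (an assumed measure).
    """
    energy = sum(irradiance_w_m2) * step_minutes / 60
    variability = sum(
        abs(later - earlier)
        for earlier, later in zip(irradiance_w_m2, irradiance_w_m2[1:])
    )
    return energy, variability

# Toy profiles (W/m^2): a smooth clear-sky day and a day with passing clouds.
clear_day = [0, 300, 600, 300, 0]
cloudy_day = [0, 600, 100, 700, 0]

clear_energy, clear_var = daily_features(clear_day)
cloudy_energy, cloudy_var = daily_features(cloudy_day)
```

Feature pairs of this kind, one per day, are what a clustering algorithm would then group into day types.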


Unsupervised Machine Learning and Data Mining Procedures Reveal Short Term, Climate Driven Patterns Linking Physico-Chemical Features and Zooplankton Diversity in Small Ponds

Water ◽

10.3390/w13091217 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1217

Author(s):

Nicolò Bellin ◽

Erica Racchetti ◽

Catia Maurone ◽

Marco Bartoli ◽

Valeria Rossi

Keyword(s):

Machine Learning ◽

Fuzzy Sets ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Short Term ◽

Unsupervised Machine Learning ◽

Fuzzy C Means ◽

Air Temperatures ◽

Physico Chemical ◽

Shallow Ponds

Machine Learning (ML) is an increasingly accessible discipline in computer science that develops dynamic algorithms capable of data-driven decisions, and its use in ecology is growing. Compared with other standard approaches, fuzzy sets are well suited to describing ecological communities because they can represent decisions that include elements of uncertainty and vagueness. However, fuzzy sets are rarely applied in ecology. In this work, an unsupervised machine learning algorithm (fuzzy c-means) and association rule mining were applied to assess the factors influencing the assemblage composition and distribution patterns of 12 zooplankton taxa in 24 shallow ponds in northern Italy. The fuzzy c-means algorithm was used to classify the ponds in terms of the taxa they support and to identify the influence of chemical and physical environmental features on the assemblage patterns. Data collected during 2014 and 2015 were compared, taking into account that late spring and summer air temperatures in 2014 were much lower than historical records, whereas mean monthly air temperatures in 2015 were much warmer than historical averages. In both years, fuzzy c-means shows a strong clustering of ponds into two groups, contrasting sites characterized by different physico-chemical and biological features. Climatic anomalies affecting the temperature regime, together with the main water supply to shallow ponds (e.g., surface runoff vs. groundwater), represent disturbance factors that produce large interannual differences in the chemistry, biology and short-term dynamics of small aquatic ecosystems. Unsupervised machine learning algorithms and fuzzy sets may help in capturing such apparently erratic differences.
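Unlike k-means, fuzzy c-means gives every site a degree of membership in every cluster. The sketch below implements the standard membership and center updates in plain Python to show what that means in practice. The two-feature pond data, the fuzzifier m = 2, the fixed iteration count, and the initial centers are all assumptions made for illustration, not values from the study.

```python
from math import dist

def memberships(points, centers, m=2.0):
    """u[i][j]: degree to which point i belongs to cluster j (rows sum to 1)."""
    power = 2.0 / (m - 1.0)
    table = []
    for p in points:
        d = [max(dist(p, c), 1e-12) for c in centers]  # avoid division by zero
        table.append([
            1.0 / sum((d[j] / d[k]) ** power for k in range(len(centers)))
            for j in range(len(centers))
        ])
    return table

def update_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by membership**m."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(tuple(
            sum(wi * p[a] for wi, p in zip(w, points)) / total for a in range(dims)
        ))
    return centers

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iters):
        centers = update_centers(points, memberships(points, centers, m), m)
    return memberships(points, centers, m), centers

# Hypothetical ponds described by two scaled environmental features.
ponds = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
         (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
         (3.0, 3.0)]  # the last pond sits between the two groups
u, centers = fuzzy_c_means(ponds, centers=[ponds[0], ponds[3]])
```

The intermediate pond ends up with roughly equal membership in both clusters, which is the kind of graded, uncertain assignment that a hard partition cannot express.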


Automatic Classification of Sub-Techniques in Classical Cross-Country Skiing Using a Machine Learning Algorithm on Micro-Sensor Data

Sensors ◽

10.3390/s18010075 ◽

2017 ◽

Vol 18 (2) ◽

pp. 75 ◽

Cited By ~ 10

Author(s):

Ole Rindal ◽

Trine Seeberg ◽

Johannes Tjønnås ◽

Pål Haugnes ◽

Øyvind Sandbakk

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Automatic Classification ◽

Sensor Data ◽

Machine Learning Algorithm ◽

Micro Sensor ◽

Cross Country Skiing ◽

Cross Country ◽

Classical Cross


Machine Learning Based Track Classification and Estimation using Kalman Filter

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a2616.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1700-1704

Keyword(s):

Machine Learning ◽

Kalman Filter ◽

Learning Algorithm ◽

Signal To Noise Ratio ◽

Minimum Mean Square Error ◽

Velocity Model ◽

Supervised Machine Learning ◽

Multiple Target ◽

Low Snr

Classifying a target from a mixture of multiple-target information is quite challenging. In this paper we use a supervised machine learning algorithm, namely linear regression, to classify the received data, which is a mixture of target returns with noise and clutter. The target state is estimated from the classified data using a linear Kalman filter with a constant-velocity model. Minimum mean square error (MMSE) analysis is used to measure the performance of the estimated track at various signal-to-noise ratio (SNR) levels. The results show that the error is high at low SNR and low at high SNR.
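A linear Kalman filter with a constant-velocity model can be written out without any matrix library when the state is just position and velocity in one dimension. The sketch below does so under assumptions that are not in the abstract: a single spatial dimension, position-only measurements, a diagonal process noise `q`, measurement noise `r`, and a vague initial state. The 2x2 covariance is kept as four scalars.

```python
def kalman_cv(measurements, dt=1.0, q=1e-4, r=1.0):
    """Estimate (position, velocity) after each position measurement."""
    pos, vel = 0.0, 0.0                          # initial state guess
    p00, p01, p10, p11 = 100.0, 0.0, 0.0, 100.0  # large initial uncertainty
    track = []
    for z in measurements:
        # Predict: x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        pos = pos + dt * vel
        n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + q
        # Update with H = [1, 0]: only the position is observed.
        s = n00 + r                   # innovation variance
        k0, k1 = n00 / s, n10 / s     # Kalman gain
        resid = z - pos
        pos += k0 * resid
        vel += k1 * resid
        p00, p01 = (1 - k0) * n00, (1 - k0) * n01
        p10, p11 = n10 - k1 * n00, n11 - k1 * n01
        track.append((pos, vel))
    return track

# Noise-free returns from a hypothetical target moving at 2 units per step.
returns = [2.0 * t for t in range(1, 21)]
track = kalman_cv(returns)
final_pos, final_vel = track[-1]
```

With noisy returns, the mean square error between `track` and the true trajectory is the quantity one would compare across SNR levels.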


Analyzing the Impact of Climate Factors on GNSS-Derived Displacements by Combining the Extended Helmert Transformation and XGboost Machine Learning Algorithm

Journal of Sensors ◽

10.1155/2021/9926442 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Hanlin Liu ◽

Linqiang Yang ◽

Linchao Li

Keyword(s):

Machine Learning ◽

Puerto Rico ◽

Reference Frame ◽

Learning Algorithm ◽

Virgin Islands ◽

Machine Learning Algorithm ◽

Climate Factors ◽

Helmert Transformation ◽

The Impact

A variety of climate factors influence the precision of long-term Global Navigation Satellite System (GNSS) monitoring data. To precisely analyze the effect of different climate factors on long-term GNSS monitoring records, this study combines the extended seven-parameter Helmert transformation with a machine learning algorithm named Extreme Gradient Boosting (XGBoost) to establish a hybrid model. We established a local-scale reference frame, the stable Puerto Rico and Virgin Islands reference frame of 2019 (PRVI19), using ten continuously operating long-term GNSS sites located in the rigid portion of the Puerto Rico and Virgin Islands (PRVI) microplate. The stability of PRVI19 is approximately 0.4 mm/year in the horizontal direction and 0.5 mm/year in the vertical direction. The stable reference frame PRVI19 avoids the risk of bias due to long-term plate motions when studying localized ground deformation. Furthermore, we trained the XGBoost model on the postprocessed long-term GNSS records and daily climate data, and quantitatively evaluated the importance of various daily climate factors for the GNSS time series. The results show that wind is the most influential factor, with a unitless importance index of 0.013. We also used the model with climate and GNSS records to predict the GNSS-derived displacements. The predicted displacements have a slightly lower root mean square error than the results fitted with a spline method (prediction: 0.22 versus fitted: 0.31). This indicates that the proposed model, which takes the climate records into account, yields suitable predictions for long-term GNSS monitoring.
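For readers unfamiliar with the Helmert transformation, the sketch below applies the standard seven-parameter form (three translations, one scale factor, three small rotations) to a single Cartesian coordinate. It is a generic illustration only: the abstract does not say what the "extended" variant adds, the small-angle approximation and the position-vector rotation convention are assumptions made here, and no PRVI19 parameter values are implied.

```python
def helmert7(xyz, translation, scale, rotation):
    """Apply X' = T + (1 + s) * (X + r x X) to one point.

    translation: (tx, ty, tz) in the same unit as xyz
    scale:       unitless scale correction s (e.g. 1e-9 for 1 ppb)
    rotation:    (rx, ry, rz) in radians, small-angle approximation,
                 position-vector convention (assumed in this sketch)
    """
    x, y, z = xyz
    tx, ty, tz = translation
    rx, ry, rz = rotation
    # Small-angle rotation: add the cross product r x X to X.
    xr = x + (ry * z - rz * y)
    yr = y + (rz * x - rx * z)
    zr = z + (rx * y - ry * x)
    k = 1.0 + scale
    return (tx + k * xr, ty + k * yr, tz + k * zr)

# A pure translation and a tiny rotation about the z axis.
shifted = helmert7((1.0, 2.0, 3.0), (0.5, -0.5, 1.0), 0.0, (0.0, 0.0, 0.0))
rotated = helmert7((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.0, (0.0, 0.0, 1e-6))
```

Sign conventions for the rotation angles differ between agencies, so any published parameter set should be checked against the convention it was estimated with before being passed to a function like this one.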
