Machine learning applied to emerald gemstone grading: framework proposal and creation of a public dataset

Author(s):  
F. B. Pena ◽  
D. Crabi ◽  
S. C. Izidoro ◽  
É. O. Rodrigues ◽  
G. Bernardes
2021 ◽  
Vol 7 (2) ◽  
pp. 164-168
Author(s):  
Cuong Le Dinh Phu ◽  
Dong Wang

Diabetes is a chronic disease in which blood glucose is not properly metabolized in the body. Electronic health records (EHRs) (Yadav, P. et al., 2018) for an individual or a population have become important for understanding developing disease trends. Machine learning helps provide predictions that are more accurate than actual assessments. The main problem we address is applying a machine learning model to EHRs in a way that combines the strength of the model with various features and hyperparameter optimization (tuning). The hyperparameter optimization (Feurer, M., 2019) uses random search, which minimizes a predefined loss function on given independent data. A comparison of methods indicated that the machine learning models improved on previous models in accuracy, recall, F1, and AUC score on the same reprocessed public dataset.
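The random search step mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the search space, the hyperparameter names (`learning_rate`, `subsample`) and the toy loss standing in for a validation loss are all assumptions made for illustration.

```python
import random


def random_search(loss_fn, space, n_iter=50, seed=0):
    """Sample hyperparameters uniformly at random and keep the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_iter):
        # Draw one candidate configuration from the (assumed) continuous ranges.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        loss = loss_fn(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_validation_loss(params):
    # Hypothetical stand-in for "loss of a trained model on held-out data".
    return (params["learning_rate"] - 0.1) ** 2 + (params["subsample"] - 0.8) ** 2


# Hypothetical search space: name -> (lower bound, upper bound).
SPACE = {"learning_rate": (0.001, 0.5), "subsample": (0.5, 1.0)}

best_params, best_loss = random_search(toy_validation_loss, SPACE, n_iter=200, seed=42)
```

In practice `toy_validation_loss` would be replaced by a function that trains a model with the sampled hyperparameters and returns its loss on independent validation data.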


Breast cancer, which develops in breast tissue, is one of the most dangerous cancers and a cause of death in women. In this work, a Deep Neural Network (DNN) model is implemented on the AWS machine learning platform and compared with other ML techniques, including XGBoost and Random Forest, on a public dataset. The DNN model with hyperparameter tuning gives the best breast cancer prediction results, both in the accuracy curves for the training and validation sets and in the performance evaluation metrics used to test the model.


Entropy ◽  
2021 ◽  
Vol 23 (9) ◽  
pp. 1130
Author(s):  
Jan Vrba ◽  
Matous Cejnek ◽  
Jakub Steinbach ◽  
Zuzana Krbcova

This study proposes a fully automated gearbox fault diagnosis approach that does not require knowledge about the specific gearbox construction and its load. The proposed approach is based on evaluating an adaptive filter’s prediction error. The obtained prediction error’s standard deviation is further processed with a support-vector machine to classify the gearbox’s condition. The proposed method was cross-validated on a public dataset, segmented into 1760 test samples, against two other reference methods. The accuracy achieved by the proposed method was better than the accuracies of the reference methods. The accuracy of the proposed method was on average 9% higher compared to both reference methods for different support vector settings.
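The feature described above, the standard deviation of an adaptive filter's prediction error, can be sketched as follows. The abstract does not name the filter, so a normalized LMS (NLMS) one-step-ahead predictor is assumed here; the filter order, step size and test signals are illustrative, and the support-vector classification step is omitted.

```python
import math
import random
import statistics


def nlms_prediction_errors(signal, order=4, mu=0.5, eps=1e-8):
    """One-step-ahead NLMS prediction; returns the prediction error sequence."""
    weights = [0.0] * order
    errors = []
    for n in range(order, len(signal)):
        window = signal[n - order:n][::-1]  # most recent sample first
        prediction = sum(w * x for w, x in zip(weights, window))
        error = signal[n] - prediction
        norm = eps + sum(x * x for x in window)
        # Normalized LMS weight update.
        weights = [w + mu * error * x / norm for w, x in zip(weights, window)]
        errors.append(error)
    return errors


def error_std_feature(signal, order=4, mu=0.5):
    """Scalar feature: standard deviation of the prediction error."""
    return statistics.pstdev(nlms_prediction_errors(signal, order=order, mu=mu))


# Toy signals (assumed): a clean sinusoid and the same sinusoid with added noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 0.05 * n) for n in range(2000)]
noisy = [x + rng.gauss(0.0, 0.5) for x in clean]

clean_feature = error_std_feature(clean)
noisy_feature = error_std_feature(noisy)
```

A less predictable signal yields a larger error standard deviation; features of this kind would then be passed to a classifier such as a support-vector machine.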


Author(s):  
Andrea Tundis ◽  
Leon Böck ◽  
Victoria Stanilescu ◽  
Max Mühlhäuser

Online social networks (OSNs) are powerful digital tools for communicating and quickly disseminating information in a non-official way. Because they are freely accessible and easy to use, criminals abuse them to achieve their purposes, for example by spreading propaganda and radicalising people. Unfortunately, due to their vast usage, it is not always trivial to identify criminals who use them unlawfully. Machine learning techniques have shown benefits in solving problems across different application domains where the sheer volume of data and variables makes manual assessment infeasible. However, since the OSN domain is relatively young, a variety of data availability issues make it difficult to apply such techniques and benefit from them immediately in supporting the detection of criminals on OSNs. In this perspective, this paper shares the experience gained in using a public dataset containing information related to criminals in order to (i) extract specific features and build a model for the detection of terrorists on the Facebook social network, and (ii) highlight the current limits. The research methodology and the gathered results are fully presented, and the data-related issues that emerged from this experience are then discussed.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Aurelio Cortese ◽  
Saori C. Tanaka ◽  
Kaoru Amano ◽  
Ai Koizumi ◽  
Hakwan Lau ◽  
...  

Decoded neurofeedback (DecNef) is a form of closed-loop functional magnetic resonance imaging (fMRI) combined with machine learning approaches, which holds some promise for clinical applications. Yet, currently only a few research groups have had the opportunity to run such experiments; furthermore, there is no existing public dataset for scientists to analyse and investigate some of the factors enabling the manipulation of brain dynamics. We release here the data from published DecNef studies, consisting of 5 separate fMRI datasets, each with multiple sessions recorded per participant. For each participant the data consist of a session that was used in the main experiment to train the machine learning decoder, and several (from 3 to 10) closed-loop fMRI neural reinforcement sessions. The large dataset, currently comprising more than 60 participants, will be useful to the fMRI community at large and to researchers trying to understand the mechanisms underlying non-invasive modulation of brain dynamics. Finally, the size of the data collection will increase over time as data from newly run DecNef studies are added.


2021 ◽  
Vol 13 (13) ◽  
pp. 2629
Author(s):  
Iris de Gélis ◽  
Sébastien Lefèvre ◽  
Thomas Corpetti

In the context of rapid urbanization, monitoring the evolution of cities is crucial. To do so, 3D change detection and characterization is of capital importance since, unlike 2D images, 3D data contain the vertical information needed to monitor city evolution (which occurs along both horizontal and vertical axes). Urban 3D change detection has thus received growing attention, and various methods have been published on the topic. Nevertheless, no quantitative comparison on a public dataset has been reported yet. This study presents an experimental comparison of six methods: three traditional (difference of DSMs, C2C and M3C2), one machine learning with hand-crafted features (a random forest model with a stability feature) and two deep learning (feed-forward and Siamese architectures). In order to compare these methods, we prepared five sub-datasets containing simulated pairs of 3D annotated point clouds with different characteristics: from high to low resolution, with various levels of noise. The methods have been tested on each sub-dataset for binary and multi-class segmentation. For supervised methods, we also assessed the transfer learning capacity and the influence of the training set size. The methods we used provide various kinds of results (2D pixels, 2D patches or 3D points), and each of them is impacted by the resolution of the point clouds (PCs). However, while the performance of deep learning methods depends strongly on the size of the training set, they seem to be less impacted by training on datasets with different characteristics. Conversely, conventional machine learning methods exhibit stable results, even with smaller training sets, but show limited transfer learning capacity. While the main changes in our datasets were usually identified, there were still numerous instances of false detection, especially in dense urban areas, thereby calling for further development in this field. To assist such developments, we provide a public dataset composed of pairs of point clouds with different qualities together with their change-related annotations. This dataset was built with an original simulation tool which allows one to generate bi-temporal urban point clouds under various conditions.
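Of the traditional methods listed above, cloud-to-cloud (C2C) distance is the simplest to illustrate: each point of the later cloud is assigned the distance to its nearest neighbour in the earlier cloud, and points beyond a threshold are flagged as changed. The sketch below is a brute-force version under assumed toy coordinates and an assumed threshold; it is not the implementation evaluated in the study.

```python
import math


def c2c_distances(reference, compared):
    """Nearest-neighbour distance from each compared point to the reference cloud."""
    return [min(math.dist(p, q) for q in reference) for p in compared]


def detect_changes(reference, compared, threshold):
    """Flag compared points whose C2C distance exceeds the threshold."""
    return [d > threshold for d in c2c_distances(reference, compared)]


# Toy bi-temporal clouds (assumed): flat ground, then the same ground plus one
# elevated point standing in for a new building.
epoch_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
epoch_2 = epoch_1 + [(0.5, 0.5, 3.0)]

changed = detect_changes(epoch_1, epoch_2, threshold=0.5)
```

Computed in this direction, the distance only reveals points that appeared or moved; detecting removed structures would require swapping the roles of the two clouds. Real point clouds would also call for a spatial index rather than a brute-force search.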


2020 ◽  
Vol 21 (20) ◽  
pp. 7726
Author(s):  
Minsun Jung ◽  
Insoon Jang ◽  
Kwangsoo Kim ◽  
Kyung Chul Moon

Non-muscle-invasive bladder cancer (NMIBC) consists of transcriptional subtypes that are distinguishable from those of muscle-invasive cancer. We aimed to identify genetic signatures of NMIBC related to basal (K5/6) and luminal (K20) keratin expression. Based on immunohistochemical staining, papillary high-grade NMIBC was classified into K5/6-only (K5/6High-K20Low), K20-only (K5/6Low-K20High), double-high (K5/6High-K20High), and double-low (K5/6Low-K20Low) groups (n = 4 per group). Differentially expressed genes identified between each group using RNA sequencing were subjected to functional enrichment analyses. A public dataset was used for validation. Machine learning algorithms were implemented to predict our samples against UROMOL subtypes. Transcriptional investigation demonstrated that the K20-only group was enriched in the cell cycle, proliferation, and progression gene sets, and this result was also observed in the public dataset. The K5/6-only group was closely regulated by basal-type gene sets and showed activated invasive or adhesive functions. The double-high group was enriched in cell cycle arrest, macromolecule biosynthesis, and FGFR3 signaling. The double-low group moderately expressed genes related to cell cycle and macromolecule biosynthesis. All K20-only group tumors were classified as UROMOL “class 2” by the machine learning algorithms. K5/6 and K20 expression levels indicate the transcriptional subtypes of NMIBC. The K5/6Low-K20High expression is a marker of high-risk NMIBC.


2021 ◽  
Vol 236 ◽  
pp. 04006
Author(s):  
Zhijian Lu ◽  
Gang Liu ◽  
Rongwen Liao

The problem of dataset imbalance has raised wide concern in many machine learning areas, but not in non-intrusive load monitoring (load disaggregation). In this study, a pictorial evaluation method is proposed to represent the imbalanced class distribution in datasets. We colored a Karnaugh map according to the number of occurrences of each combination of variables to give a visual impression of the whole dataset. After applying this method to a public dataset and its test results, a clear imbalance in the dataset and an exciting performance have been found. A preliminary Python package implementing this mapping method has been uploaded to GitHub.
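The counting step behind such a map can be sketched as follows. This is an independent illustration, not the authors' package: it assumes four binary variables (for example, the on/off states of four appliances) and arranges the counts in a 4x4 grid whose rows and columns follow Gray-code order, as in a Karnaugh map. The counts could then be passed to any heat-map plotting routine for coloring.

```python
from collections import Counter

# Gray-code ordering of two bits, as used for Karnaugh map rows and columns.
GRAY_2BIT = [(0, 0), (0, 1), (1, 1), (1, 0)]


def karnaugh_counts(samples):
    """Count each 4-variable combination and lay the counts out as a 4x4 K-map.

    Rows are indexed by the first two variables, columns by the last two.
    """
    counts = Counter(tuple(s) for s in samples)
    return [[counts[row + col] for col in GRAY_2BIT] for row in GRAY_2BIT]


# Hypothetical on/off observations for four appliances (A, B, C, D).
samples = [(0, 0, 0, 0)] * 6 + [(1, 1, 1, 0)] * 3 + [(0, 1, 0, 1)]
grid = karnaugh_counts(samples)
```

Cells with very large or very small counts show at a glance which state combinations dominate the dataset and which are almost absent.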


2021 ◽  
Vol 8 ◽  
Author(s):  
Li Zhang ◽  
Suraj Mishra ◽  
Tianyu Zhang ◽  
Yue Zhang ◽  
Duo Zhang ◽  
...  

Background: Today's machine-learning based dermatologic research has largely focused on pigmented/non-pigmented lesions concerning skin cancers. However, studies on machine-learning-aided diagnosis of depigmented non-melanocytic lesions, which are more difficult to diagnose by unaided eye, are very few. Objective: We aim to assess the performance of deep learning methods for diagnosing vitiligo by deploying Convolutional Neural Networks (CNNs) and comparing their diagnosis accuracy with that of human raters with different levels of experience. Methods: A Chinese in-house dataset (2,876 images) and a world-wide public dataset (1,341 images) containing vitiligo and other depigmented/hypopigmented lesions were constructed. Three CNN models were trained on close-up images in both datasets. The results by the CNNs were compared with those by 14 human raters from four groups: expert raters (>10 years of experience), intermediate raters (5–10 years), dermatology residents, and general practitioners. F1 score, the area under the receiver operating characteristic curve (AUC), specificity, and sensitivity metrics were used to compare the performance of the CNNs with that of the raters. Results: For the in-house dataset, CNNs achieved a comparable F1 score (mean [standard deviation]) with expert raters (0.8864 [0.005] vs. 0.8933 [0.044]) and outperformed intermediate raters (0.7603 [0.029]), dermatology residents (0.6161 [0.068]) and general practitioners (0.4964 [0.139]). For the public dataset, CNNs achieved a higher F1 score (0.9684 [0.005]) compared to the diagnosis of expert raters (0.9221 [0.031]). Conclusion: Properly designed and trained CNNs are able to diagnose vitiligo without the aid of Wood's lamp images and outperform human raters in an experimental setting.
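For reference, the F1 score, sensitivity and specificity used in the comparison above can be computed from a binary confusion matrix as in the following sketch. The counts are made up for illustration and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, precision and F1 from confusion-matrix counts.

    Assumes every denominator is non-zero; degenerate cases are not handled.
    """
    sensitivity = tp / (tp + fn)  # recall: true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 9 true positives, 1 false positive,
# 3 false negatives, 7 true negatives.
metrics = binary_metrics(tp=9, fp=1, fn=3, tn=7)
```

The F1 score is the harmonic mean of precision and sensitivity, so it penalizes a classifier or rater that does well on one of the two but poorly on the other.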

