Selection of breast features for young women in northwestern China based on the random forest algorithm

2021 ◽  
pp. 004051752110408
Author(s):  
Jie Zhou ◽  
Qian Mao ◽  
Jun Zhang ◽  
Newman ML Lau ◽  
Jianming Chen

In research on breast morphology, numerous breast features are measured, whereas only a few parameters are adopted for classification. How to extract the key variables from these multi-dimensional features in a rational way is therefore a central issue. This study aimed to reduce the complexity of dimensionality reduction and to further improve the objectivity and interpretability of the selected breast features. Since the random forest (RF) algorithm can quantify feature importance during training, it was adopted in this paper to determine the optimal breast features for classification and recognition. First, the anthropometric data of 360 females aged 19 to 27 years from northwestern China were measured using non-contact three-dimensional body scanning and contact manual measurement. Then, k-means clustering was applied to categorize breast shapes, and the RF algorithm was used to quantify and rank the importance of 25 breast features. Finally, to verify the effectiveness of the RF algorithm for breast feature selection, t-distributed stochastic neighbor embedding (t-SNE) was used to visualize the distribution of breast shape clusters in two dimensions. In addition, four neural networks were employed to recognize breast morphology. The results demonstrate that using fewer breast features can effectively increase the accuracy of breast shape classification and recognition. The best classification and recognition performance is obtained with 13 breast features; in this case, the average Hamming loss of the four neural networks is the smallest (0.1136). Interestingly, the bust circumference and the horizontal curve of the breasts across the bust points are found to be the most important of the 25 breast features.
The breast curve features are more important than the breast cross-sectional features, while the breast positioning features have the lowest importance. The RF algorithm is also shown to be more effective than traditional dimensionality reduction methods such as principal component analysis, hierarchical clustering, and recursive feature elimination. The approach developed in this paper can be generalized to dimensionality reduction for other body morphologies.
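The abstract ranks 25 features by RF importance, keeps the top 13, and reports Hamming loss. It does not say which importance measure was used; RF implementations commonly report either impurity-based or permutation-based importance. The following is a minimal sketch of the permutation variant, assuming any trained classifier exposed as a `predict(row)` callable. It uses Hamming loss as the error measure and selects the top-k features. All function names and the toy data are hypothetical, not the authors' code.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def hamming_loss(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label differs from the true label.

    For single-label multi-class problems this equals 1 - accuracy.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], int],
    X: Sequence[Row],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in Hamming loss when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = hamming_loss(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(hamming_loss(y, [predict(r) for r in X_perm]) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


def top_k_features(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k most important features, most important first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
    X = [[i / 10, (i * 7 % 10) / 10] for i in range(10)]
    y = [int(row[0] > 0.5) for row in X]

    def model(row: Row) -> int:  # stands in for a trained classifier
        return int(row[0] > 0.5)

    scores = permutation_importance(model, X, y)
    print(scores, top_k_features(scores, 1))
```

Because the toy model ignores feature 1, shuffling that column never changes a prediction, so its importance is exactly zero.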

Author(s):  
Erika Viktória Miszory ◽  
Melinda Járomi ◽  
Annamária Pakai

Aim: The number of Hungarian polio patients can be estimated at approximately 3000. Those affected by polio are currently 56–65 years of age. The aim of the study was to reveal the quality of life of patients living with polio in Hungary. Subject and methods: The quantitative cross-sectional study was conducted in January–April 2017 among poliomyelitis patients living in Hungary. Using non-random, targeted, expert sample selection, the target group was composed of patients infected with poliovirus (N = 268). Those who refused to sign the consent statement were excluded. Data were collected with the SF-36 questionnaire. Descriptive and inferential statistics (χ² test, significance level p < 0.05) were calculated with IBM SPSS Statistics Version 22. Results: The mean age of the examined population is 63.5 years; 68.1% were women and 31.9% were men. The majority of respondents were infected with poliovirus in 1956 (11.9%), 1957 (24.3%), and 1959 (19.5%). With the exception of two dimensions (mental health and social functioning), polio patients do not reach the “average” quality of life on a 0–100 scale (physical functioning 23 points, role-physical 36 points, role-emotional 47 points, bodily pain 48 points, general health 42 points, vitality 50 points, health change 31 points). Conclusion: The quality of life of polio patients is far below average in the physical function dimensions, while the difference in mental health compared with healthy people is minimal. It would be important to educate health professionals about the disease and to develop an effective rehabilitation method.
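The study reports χ² tests at p < 0.05, computed in SPSS. The following is a minimal stdlib sketch of Pearson's χ² test of independence for a contingency table. It assumes no continuity correction. The p-value uses the closed form of the chi-square survival function for integer degrees of freedom. The function names and the example table are hypothetical and are not study data.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with df (positive integer) degrees of freedom.

    Evaluates the regularized upper incomplete gamma Q(df/2, x/2) by starting
    from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y) and applying
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_independence(table: Sequence[Sequence[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence (no continuity correction).

    Returns (statistic, degrees of freedom, p-value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical 2x2 table: rows = sex, columns = score below / at or above 50.
    stat, df, p = chi2_independence([[120, 62], [48, 38]])
    print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.4f}, significant: {p < 0.05}")
```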


2021 ◽  
Vol 22 (5) ◽  
pp. 2704
Author(s):  
Andi Nur Nilamyani ◽  
Firda Nurul Auliah ◽  
Mohammad Ali Moni ◽  
Watshara Shoombuatong ◽  
Md Mehedi Hasan ◽  
...  

Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identifying site-specific nitration of tyrosine is a prerequisite for understanding the molecular function of nitrated proteins. Thanks to progress in machine learning, computational prediction can play a vital role before biological experimentation. Herein, we developed a computational predictor, PredNTS, by integrating multiple sequence features, including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. The important features were selected by recursive feature elimination using a random forest classifier. Finally, we linearly combined the random forest (RF) probability scores generated by separate RF models, each employing a single encoding. The resulting PredNTS predictor achieved an area under the curve (AUC) of 0.910 using five-fold cross-validation. It outperformed existing predictors on a comprehensive, independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. PredNTS is a useful computational resource for predicting nitrotyrosine sites. The PredNTS web application, together with the curated datasets, is publicly available.
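The abstract describes two steps that can be sketched: linearly combining the probability scores of several single-encoding RF models, and evaluating the result by AUC. The following is a minimal stdlib sketch. It assumes the combination is a weighted average with normalized weights; the abstract does not give the weights or how they were chosen. It computes AUC as the probability that a random positive outranks a random negative. Function names, weights, and scores are hypothetical.

```python
from typing import List, Sequence


def combine_scores(score_lists: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted linear combination of per-model probability scores.

    Weights are normalized to sum to 1, so the result stays in [0, 1].
    """
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n = len(score_lists[0])
    return [sum(w * scores[i] for w, scores in zip(weights, score_lists)) / total
            for i in range(n)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores from three single-encoding models on five candidate sites.
    kmer = [0.9, 0.2, 0.6, 0.4, 0.7]
    cksaap = [0.8, 0.3, 0.5, 0.5, 0.6]
    binary = [0.7, 0.1, 0.7, 0.3, 0.4]
    labels = [1, 0, 1, 0, 1]
    combined = combine_scores([kmer, cksaap, binary], weights=[0.5, 0.3, 0.2])
    print(combined, roc_auc(labels, combined))
```

The pairwise AUC is quadratic in the number of samples. That is fine for illustration, but a rank-based formula scales better on large datasets.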


2020 ◽  
Vol 2 (1) ◽  
pp. 23-36
Author(s):  
Syed Aamir Ali Shah ◽  
Muhammad Asif Manzoor ◽  
Abdul Bais

Forest structure estimation is very important in geological, ecological, and environmental studies. It provides the basis for carbon stock estimation and for assessing carbon sequestration, sources, and sinks. Multiple parameters, such as above-ground biomass, leaf area index, and diameter at breast height, are used to estimate forest structure. Among these parameters, vegetation height has a unique standing: in addition to supporting forest structure estimation, it provides insight into long-term historical changes and estimates of forest stand age. Multiple techniques are available to estimate canopy height. Light detection and ranging (LiDAR) based methods, although accurate and useful, are very expensive to obtain and lack global coverage. There is therefore a need for a mechanism to estimate canopy height from freely available satellite imagery such as Landsat images. Several studies have contributed to this area, the majority using Landsat images with random forest models. Although random forest based models are widely used in remote sensing applications, they cannot exploit the spatial association of neighboring pixels in the modeling process. In this research work, we define a convolutional neural network (CNN) based model and analyze it under three test configurations. We replicate the random forest based setup of Grant et al., a comparable state-of-the-art study, compare our results with it, and show that the CNN-based models not only capture the spatial association of neighboring pixels but also outperform the state of the art.
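The argument hinges on what "spatial association of neighboring pixels" means mechanically. A per-pixel model sees only the band values of one pixel. A convolutional layer computes each output from a window of neighboring pixels. The following is a minimal stdlib sketch of the sliding-window operation a single CNN channel applies, with "valid" padding. It is not the authors' architecture; the function name and example kernels are illustrative.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> List[List[float]]:
    """Slide `kernel` over `image` with 'valid' padding (no border padding).

    Like CNN convolution layers, this computes cross-correlation (the kernel is
    not flipped). Each output value depends on a kernel-sized neighborhood of
    input pixels, information a per-pixel model never receives.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    if kh > h or kw > w:
        raise ValueError("kernel is larger than the image")
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    # Illustrative single-band tile: low values on the left, high values on the right.
    tile = [[0.1, 0.1, 0.9, 0.9]] * 4
    # A vertical-edge kernel responds where neighboring pixels differ horizontally.
    edge = [[-1.0, 0.0, 1.0]] * 3
    print(conv2d_valid(tile, edge))
```

In a trained CNN the kernel weights are learned rather than hand-set, and many such channels are stacked with nonlinearities.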


2021 ◽  
Vol 11 (3) ◽  
pp. 1013
Author(s):  
Zvezdan Lončarević ◽  
Rok Pahič ◽  
Aleš Ude ◽  
Andrej Gams

Autonomous robot learning in unstructured environments often faces the problem that the search space is too high-dimensional for practical applications. Dimensionality reduction techniques have been developed to address this problem by describing motor skills in low-dimensional latent spaces. Most of these techniques require a sufficiently large database of example task executions to compute the latent space. However, generating many example task executions on a real robot is tedious and prone to errors and equipment failures. The main result of this paper is a new approach for efficient database gathering: a small number of task executions are performed with a real robot, and statistical generalization, e.g., Gaussian process regression, is applied to generate more data. Our experiments show that data generated in this way can be used for dimensionality reduction with autoencoder neural networks. The resulting latent spaces can be exploited to make robot learning more efficient. The proposed approach has been evaluated on the problem of robotic throwing at a target, with simulation and real-world results on the humanoid robot TALOS. These results confirm the effectiveness of generalization-based database acquisition and the efficiency of learning in a low-dimensional latent space.
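The core idea is to fit a regressor to a few real executions and query it to synthesize more training examples. The following is a minimal stdlib sketch of Gaussian process regression (posterior mean only) for that step. It assumes scalar inputs (e.g., a target distance), a scalar output (e.g., one motor-skill parameter), a zero-mean prior, and a squared-exponential kernel with fixed, hand-picked hyperparameters. The paper's inputs, outputs, and hyperparameter fitting are not described in the abstract, so all names and values are hypothetical.

```python
import math
from typing import List, Sequence


def rbf_kernel(a: float, b: float, length_scale: float = 1.0) -> float:
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def cholesky_solve(L: List[List[float]], b: List[float]) -> List[float]:
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior_mean(x_train: Sequence[float], y_train: Sequence[float],
                      x_query: Sequence[float], length_scale: float = 1.0,
                      noise: float = 1e-6) -> List[float]:
    """Posterior mean k(x*, X) (K + noise * I)^-1 y of a zero-mean GP."""
    n = len(x_train)
    K = [[rbf_kernel(x_train[i], x_train[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = cholesky_solve(cholesky(K), list(y_train))
    return [sum(rbf_kernel(q, x, length_scale) * a for x, a in zip(x_train, alpha))
            for q in x_query]


if __name__ == "__main__":
    # Hypothetical: three real throws (target distance -> one motor parameter).
    distances = [1.0, 2.0, 3.0]
    params = [2.1, 3.0, 3.6]
    # Synthesize parameters for unseen targets to enlarge the training database.
    queries = [1.25, 1.5, 1.75, 2.25, 2.5, 2.75]
    print(gp_posterior_mean(distances, params, queries))
```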


2018 ◽  
Author(s):  
Νικόλαος Πασσαλής

Recent advances in Deep Learning have provided powerful data analysis tools. However, the high computational complexity of Deep Learning methods significantly limits their applicability, especially when available computational resources are limited. Furthermore, the flexibility of many Deep Learning methods is significantly restricted by their inability to be combined effectively with classical Machine Learning methods. The main aim of this doctoral thesis is to develop Deep Learning methods that can be used effectively to solve various data analysis problems (classification, clustering, regression, etc.) on different kinds of data (images, video, text, time series), while also effectively addressing the limitations above. To this end, a neural extension of the Bag-of-Features model was first developed and combined with many different feature extractors, including Deep Convolutional Neural Networks. This significantly increased both the accuracy of the networks and their robustness to changes in the input distribution, while reducing the number of parameters required compared with competing methods. Next, a representation learning method was proposed that produces representations tailored to information retrieval, significantly improving their performance on the corresponding tasks. Then, a flexible and efficient knowledge transfer method was proposed, which is able to “distill” the knowledge of a large, complex neural network into a smaller and faster one. The effectiveness of the proposed method was confirmed using many different evaluation protocols.
It was also shown that, using the method described above, the dimensionality reduction problem can be expressed as a problem of transferring knowledge from a suitably defined Probability Density Function (PDF) to a Machine Learning model. This makes it possible to define a general dimensionality reduction framework, which was also combined with Deep Learning models to extract representations optimized for clustering problems. Finally, an open-source library was developed that implements this dimensionality reduction method, as well as a method for stabilizing the convergence of stochastic optimization techniques for Deep Learning architectures.
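The thesis frames both distillation and dimensionality reduction as transferring knowledge encoded in a probability distribution. The following is a minimal stdlib sketch of one way to express that idea as a loss. Each sample gets a conditional distribution over its pairwise cosine affinities to the other samples, computed separately for teacher and student features. The loss is the summed KL divergence between the two. The cosine kernel, the normalization, and all names are assumptions for illustration, not the thesis's exact formulation. Only the loss is shown; training a student would need a gradient-based framework.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_affinity(a: Vector, b: Vector) -> float:
    """Cosine similarity mapped from [-1, 1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("a zero vector has no direction")
    return 0.5 * (dot / (na * nb) + 1.0)


def conditional_probs(features: Sequence[Vector], eps: float = 1e-12) -> List[List[float]]:
    """P[i][j] = affinity(i, j) / sum_{k != i} affinity(i, k); the diagonal is 0."""
    n = len(features)
    P = []
    for i in range(n):
        row = [0.0 if j == i else cosine_affinity(features[i], features[j]) for j in range(n)]
        total = max(sum(row), eps)  # guard against an all-zero row
        P.append([v / total for v in row])
    return P


def transfer_loss(teacher: Sequence[Vector], student: Sequence[Vector], eps: float = 1e-12) -> float:
    """Sum over samples i of KL(P_teacher(. | i) || P_student(. | i))."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must describe the same samples")
    P, Q = conditional_probs(teacher), conditional_probs(student)
    loss = 0.0
    for i in range(len(P)):
        for j in range(len(P)):
            if j != i:
                loss += P[i][j] * math.log((P[i][j] + eps) / (Q[i][j] + eps))
    return loss


if __name__ == "__main__":
    # Hypothetical features: the teacher is 3-D, the student only 2-D.
    teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
    student = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
    print(transfer_loss(teacher, teacher), transfer_loss(teacher, student))
```

Only pairwise affinities are compared, so teacher and student features may have different dimensionalities. This is what lets such a loss serve for both model compression and dimensionality reduction.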

