boosted decision tree
Recently Published Documents


Total documents: 81 (five years: 62)
H-index: 8 (five years: 2)

2022
Author(s): Marwa Helmy, Eman Eldaydamony, Nagham Mekky, Mohammed Elmogy, Hassan Soliman

Abstract Identifying genes related to Parkinson's disease (PD) is an active and effective research topic in biomedical analysis and plays a critical role in diagnosis and treatment. In recent years, many studies have proposed different techniques for predicting disease-related genes. However, only a few of these techniques are designed or developed for PD gene prediction, and most of them identify only protein-coding genes and discard long non-coding RNA (lncRNA) genes, which play an essential role in biological processes and in the transformation and development of diseases. This paper proposes a novel prediction system that identifies protein-coding and lncRNA genes related to PD and can aid in early diagnosis. First, we preprocessed the genes into DNA FASTA sequences obtained from the UCSC genome browser and removed redundancies. Second, we extracted significant features from the DNA FASTA sequences using five numerical mapping techniques combined with the Fourier transform and the PyFeat method, with AdaBoost used for feature selection. Finally, the features were fed to a gradient boosted decision tree (GBDT) to diagnose the tested cases. Seven performance metrics were used to evaluate the proposed system, which achieved an average accuracy (ACC) of 78.1%, an area under the ROC curve (AUC) of 84.9%, an area under the precision-recall curve (AUPR) of 85.0%, an F1-score of 78.2%, a Matthews correlation coefficient (MCC) of 0.564, a sensitivity (SEN) of 79.1%, and a specificity (SPC) of 77.1%. The experiments demonstrate promising results compared with other systems. The top-ranked predicted protein-coding and lncRNA genes are verified through a literature review.
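Five of the seven metrics above (ACC, SEN, SPC, F1 and MCC) are computed from confusion-matrix counts at a fixed decision threshold; AUC and AUPR instead require the classifier's ranked scores and are not covered here. The following Python sketch uses the standard textbook definitions. The counts are hypothetical, and how the paper averages the metrics across runs or folds is not specified in the abstract.

```python
import math

def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Threshold-based metrics from confusion-matrix counts (positive = disease-related gene)."""
    sen = _ratio(tp, tp + fn)                       # sensitivity / recall / true-positive rate
    spc = _ratio(tn, tn + fp)                       # specificity / true-negative rate
    acc = _ratio(tp + tn, tp + fp + tn + fn)        # accuracy
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)           # F1 = harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, denom)          # Matthews correlation coefficient
    return {"ACC": acc, "SEN": sen, "SPC": spc, "F1": f1, "MCC": mcc}

# Hypothetical counts for illustration only (not taken from the paper).
print(binary_metrics(tp=8, fp=2, tn=6, fn=4))
```

For these counts the sketch reports ACC 0.70, SEN about 0.67, SPC 0.75, F1 about 0.73 and MCC about 0.41.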


Author(s): Ram C. Sharma, Hidetake Hirayama, Keitarou Hara

Advanced Land Observing Satellite 3 (ALOS-3) is capable of observing global land areas with a wide swath (4000 km in the along-track direction and 70 km in the cross-track direction) at high spatial resolution (panchromatic: 0.8 m, multispectral: 3.2 m). Maintaining and updating Land Cover and Vegetation (LCV) information at the national level is one of the major goals of the ALOS-3 mission. This paper presents the potential of simulated ALOS-3 images for the classification and mapping of LCV types. We simulated WorldView-3 images according to the configuration of the ALOS-3 satellite sensor, and the resulting ALOS-3 simulated (ALOS-3S) images were used to classify and map LCV types in two cool temperate ecosystems: 17 classes at the Hakkoda site and 25 classes at the Zao site. We employed a Gradient Boosted Decision Tree (GBDT) classifier with 10-fold cross-validation to assess the potential of the ALOS-3S images. At the Hakkoda site, we obtained an overall accuracy of 0.811 and a kappa coefficient of 0.798. At the Zao site, the overall accuracy and kappa coefficient were 0.725 and 0.711, respectively. Despite the limited number of temporal scenes available in this research, the ALOS-3S images showed high potential (a kappa coefficient of at least 0.711) for LCV classification. More temporal scenes from the ALOS-3 satellite are expected to improve the classification and mapping of LCV types in the future.
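Overall accuracy and Cohen's kappa coefficient are both computed from a multi-class confusion matrix; kappa discounts the agreement that would be expected by chance given the class totals. The sketch below assumes, as one common practice not confirmed by the abstract, that predictions from the 10 held-out folds are pooled into a single matrix. The matrix values are hypothetical.

```python
def overall_accuracy_and_kappa(cm):
    """cm[i][j] = number of reference samples of class i predicted as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / n                          # observed agreement (overall accuracy)
    row_totals = [sum(cm[i]) for i in range(k)]                        # reference counts per class
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # predicted counts per class
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n) # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa

# Hypothetical 3-class confusion matrix, pooled over all held-out folds.
cm = [[50, 3, 2],
      [5, 40, 5],
      [2, 3, 40]]
oa, kappa = overall_accuracy_and_kappa(cm)
print(f"overall accuracy = {oa:.3f}, kappa = {kappa:.3f}")
```

This gives an overall accuracy of about 0.867 and a kappa of about 0.799; kappa is lower because part of the raw agreement would occur by chance.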


2021
Vol 2021 (12)
Author(s): Yasmine Amhis, Marie Hartmann, Clément Helsens, Donal Hill, Olcyr Sumensari

Abstract This paper presents the prospects for a precise measurement of the branching fraction of the leptonic $B_c^+ \to \tau^+\nu_\tau$ decay at the Future Circular Collider (FCC-ee) running at the Z-pole. A detailed description of the simulation and analysis framework is provided. To select signal candidates, two Boosted Decision Tree algorithms are employed and optimised. The first stage suppresses inclusive $b\bar{b}$, $c\bar{c}$, and $q\bar{q}$ backgrounds using event-based topological information. A second stage utilises the properties of the hadronic $\tau^+ \to \pi^+\pi^+\pi^-\bar{\nu}_\tau$ decay to further suppress these backgrounds, and is also found to achieve high rejection for the $B^+ \to \tau^+\nu_\tau$ background. The number of $B_c^+ \to \tau^+\nu_\tau$ candidates is estimated for various Tera-Z scenarios, and the potential precision of the signal yield and branching fraction measurements is evaluated. The phenomenological impact of such measurements on various New Physics scenarios is also explored.
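The two-stage selection can be pictured as two successive cuts on BDT output scores, with the cut values tuned for the best expected precision. The sketch below is a simplified illustration, not the paper's framework: the `Candidate` scores and event weights are hypothetical, and the figure of merit is the statistical-only relative uncertainty $\sqrt{S+B}/S$ of a simple counting experiment, whereas the paper's branching-fraction precision also depends on other inputs.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    bdt1: float       # stage-1 score (event topology), hypothetical
    bdt2: float       # stage-2 score (tau -> 3 pions + neutrino), hypothetical
    is_signal: bool
    weight: float     # expected number of events this simulated candidate represents

def select(cands, cut1, cut2):
    """Apply both BDT cuts in sequence; return expected signal S and background B."""
    passed = [c for c in cands if c.bdt1 > cut1 and c.bdt2 > cut2]
    S = sum(c.weight for c in passed if c.is_signal)
    B = sum(c.weight for c in passed if not c.is_signal)
    return S, B

def relative_precision(S, B):
    """Statistical-only relative uncertainty on a counted signal yield, sqrt(S + B) / S."""
    return math.sqrt(S + B) / S if S > 0 else math.inf

def optimise_cuts(cands, grid):
    """Grid-scan both cut values and keep the pair with the best relative precision."""
    return min(((c1, c2) for c1 in grid for c2 in grid),
               key=lambda cuts: relative_precision(*select(cands, *cuts)))

cands = [
    Candidate(0.90, 0.80, True, 100.0),
    Candidate(0.95, 0.30, True, 50.0),
    Candidate(0.20, 0.90, True, 30.0),
    Candidate(0.90, 0.90, False, 300.0),
    Candidate(0.85, 0.10, False, 5000.0),
    Candidate(0.10, 0.95, False, 20000.0),
]
best = optimise_cuts(cands, grid=[0.0, 0.5])
S, B = select(cands, *best)
print(best, S, B, relative_precision(S, B))
```

With this toy grid the scan keeps both cuts at 0.5, giving S = 100, B = 300 and a relative precision of 0.2 (20%). Each stage removes a background the other one lets through, which is the point of using two stages.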


2021
Vol 16 (12)
pp. C12007
Author(s): K. Leonard DeHolton

Abstract The DeepCore sub-array within the IceCube Neutrino Observatory is a densely instrumented region of Antarctic ice designed to observe atmospheric neutrino interactions above 5 GeV via Cherenkov radiation. An essential aspect of any neutrino oscillation analysis is the ability to accurately identify the flavor of neutrino events in the detector. This task is particularly difficult at low energies, when very little light is deposited in the detector. Here we discuss the use of machine learning to perform event classification at low energies in IceCube using a boosted decision tree (BDT). A BDT is trained on reconstructed quantities to identify track-like events, which result from muon neutrino charged-current interactions. This new method improves the accuracy of particle identification compared with traditional classification methods, which rely on univariate straight cuts.
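A boosted decision tree combines many shallow trees, each one a weighted set of straight cuts, into a single score. The sketch below uses AdaBoost with depth-1 trees (stumps), one standard boosting scheme; the abstract does not say which boosting algorithm, tree depth or input variables the IceCube analysis uses, and the two-variable toy events are hypothetical. On this toy set no single univariate straight cut classifies all four events, while three boosting rounds do.

```python
import math

def fit_stump(X, y, w):
    """Weighted search over single straight cuts: predict pol if x[f] > t, else -pol."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[f] > t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best

def stump_predict(row, f, t, pol):
    return pol if row[f] > t else -pol

def adaboost(X, y, n_rounds):
    """AdaBoost with depth-1 trees; labels y are +1 / -1."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        err, f, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)          # keep log() finite for perfect or useless stumps
        alpha = 0.5 * math.log((1 - err) / err)         # vote weight of this stump
        ensemble.append((alpha, f, t, pol))
        # Up-weight misclassified events, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, t, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def ensemble_predict(ensemble, row):
    score = sum(a * stump_predict(row, f, t, pol) for a, f, t, pol in ensemble)
    return 1 if score > 0 else -1

# Hypothetical toy events with two reconstructed variables; +1 = track-like.
X = [(0.2, 0.2), (0.8, 0.2), (0.2, 0.8), (0.8, 0.8)]
y = [-1, 1, 1, 1]
err, *_ = fit_stump(X, y, [0.25] * 4)
cut_acc = 1 - err                                       # best univariate straight cut
model = adaboost(X, y, n_rounds=3)
bdt_acc = sum(ensemble_predict(model, r) == t for r, t in zip(X, y)) / len(y)
print(f"best straight cut: {cut_acc:.2f}, boosted stumps: {bdt_acc:.2f}")
```

On this toy set the best single cut reaches 0.75 training accuracy and the three-stump ensemble reaches 1.00; a real comparison would be made on held-out events rather than on the training set.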


2021
Vol 1201 (1)
pp. 012086
Author(s): A El-Menshawy, Z Gul, I El-Thalji

Abstract Most industrial systems have supervisory control and data acquisition (SCADA) systems that collect and store process parameters. SCADA data is seen as a valuable source for extracting insights about asset health condition and the associated maintenance operations. However, it is still unclear how applicable and valid the insights provided by SCADA data are. The purpose of this paper is to explore the potential benefits of SCADA data for maintenance purposes and to discuss its limitations from a machine learning perspective. In this paper, two years of SCADA data from a wind turbine generator are extracted and analysed using several machine learning algorithms, i.e., two-class boosted decision tree, two-class decision forest, and k-means clustering, in Azure Machine Learning Studio. It is concluded that SCADA data can be useful for failure detection and prediction, provided that rich training data are available. In a failure prediction context, data richness means ensuring that fault features are present in the training data. Moreover, the log files can be used as labelled data to supervise some algorithms once they are recorded more rigorously (timing, description).
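k-means clustering, one of the algorithms listed above, groups unlabelled SCADA records by similarity, which can expose operating regimes or unusual periods without fault labels. A minimal sketch of Lloyd's algorithm follows. The two features and their values are hypothetical (think of two already-scaled SCADA channels), and initializing from the first k points is a simplification; k-means++ initialization is the usual choice in practice.

```python
import math

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm. Centroids start at the first k points (a simple, deterministic choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break                      # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical, already-scaled SCADA feature pairs: a normal regime and an unusual one.
records = [(1.0, 1.0), (10.0, 10.0), (1.2, 0.8), (0.8, 1.2), (10.2, 9.8), (9.8, 10.2)]
centroids, labels = kmeans(records, k=2)
print(labels, centroids)
```

The two toy regimes end up in separate clusters with centroids near (1, 1) and (10, 10). With real SCADA data the channels would need scaling first, since k-means uses raw Euclidean distances.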


2021
Vol 2021 (11)
Author(s): A. Tumasyan, W. Adam, J. W. Andrejkovic, T. Bergauer, ...

Abstract A measurement of the cross section of the associated production of a single top quark and a W boson in final states with a muon or electron and jets in proton-proton collisions at $\sqrt{s} = 13$ TeV is presented. The data correspond to an integrated luminosity of 36 fb$^{-1}$ collected with the CMS detector at the CERN LHC in 2016. A boosted decision tree is used to separate the tW signal from the dominant $\mathrm{t\bar{t}}$ background, whilst the subleading W+jets and multijet backgrounds are constrained using data-based estimates. This result is the first observation of the tW process in final states containing a muon or electron and jets, with a significance exceeding 5 standard deviations. The cross section is determined to be 89 ± 4 (stat) ± 12 (syst) pb, consistent with the standard model.
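The cross section above is quoted with separate statistical and systematic uncertainties. A common convention for a single total uncertainty, assuming the two components are independent, is to add them in quadrature; the short calculation below applies that convention to the quoted numbers and is not a figure reported in the abstract.

```python
import math

def combine_in_quadrature(*uncertainties):
    """Total uncertainty assuming the components are independent (uncorrelated)."""
    return math.sqrt(sum(u * u for u in uncertainties))

central, stat, syst = 89.0, 4.0, 12.0          # pb, as quoted above
total = combine_in_quadrature(stat, syst)      # sqrt(4^2 + 12^2) = sqrt(160)
print(f"{central:.0f} ± {total:.1f} pb ({100 * total / central:.0f}% relative)")
```

This prints `89 ± 12.6 pb (14% relative)`, which shows that the systematic component dominates the total.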


Author(s): Nurul Farhana Hamzah, Nazri Mohd Nawi, Abdulkareem A. Hezam, ...

Heart failure, often referred to as congestive heart failure, means that the heart muscle is not pumping blood as well as it should. Congestive heart failure is a form of heart failure that requires timely medical care, although the two terms are sometimes used interchangeably. Some disorders, such as narrowed arteries in the heart (coronary artery disease) or high blood pressure, gradually leave the heart too weak or too stiff to fill and pump effectively. Early detection of heart failure using data mining techniques has gained popularity among researchers. This research applies several classification techniques to classify heart failure from medical data. It analyzes the performance of three classification algorithms, namely Support Vector Machine (SVM), Decision Forest (DF), and Boosted Decision Tree (BDT), in accurately classifying heart failure risk data. The best of the three algorithms for heart failure classification is identified at the end of this research.
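Comparing classifiers such as SVM, DF and BDT fairly requires scoring each one on data it was not trained on, for example with k-fold cross-validation. The abstract does not state the evaluation protocol, so the sketch below is only a generic harness: `fit_majority` and `fit_nearest_mean` are hypothetical stand-ins for the real classifiers, and the one-feature data are invented for illustration.

```python
from statistics import mean

def kfold_indices(n, k):
    """Fold j holds samples i with i % k == j (no shuffling, so results are reproducible)."""
    return [[i for i in range(n) if i % k == j] for j in range(k)]

def cross_val_accuracy(fit, X, y, k):
    """fit(X_train, y_train) must return a predict(x) function; returns mean fold accuracy."""
    scores = []
    for test_idx in kfold_indices(len(X), k):
        held_out = set(test_idx)
        X_tr = [x for i, x in enumerate(X) if i not in held_out]
        y_tr = [t for i, t in enumerate(y) if i not in held_out]
        predict = fit(X_tr, y_tr)
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)

def fit_majority(X_tr, y_tr):
    """Hypothetical baseline: always predict the more frequent training label (ties -> 1)."""
    label = 1 if 2 * sum(y_tr) >= len(y_tr) else 0
    return lambda x: label

def fit_nearest_mean(X_tr, y_tr):
    """Hypothetical one-feature classifier: cut halfway between the class means (class 1 above)."""
    m0 = mean(x for x, t in zip(X_tr, y_tr) if t == 0)
    m1 = mean(x for x, t in zip(X_tr, y_tr) if t == 1)
    cut = (m0 + m1) / 2
    return lambda x: 1 if x > cut else 0

# Invented one-feature data; label 1 = "high heart failure risk" in this toy example.
X = [float(i) for i in range(10)]
y = [0] * 5 + [1] * 5
results = {name: cross_val_accuracy(fit, X, y, k=5)
           for name, fit in [("majority", fit_majority), ("nearest-mean", fit_nearest_mean)]}
best = max(results, key=results.get)
print(results, best)
```

Here the nearest-mean rule scores 0.9 and the majority baseline 0.5. The nearest-mean score is not 1.0 because, in one fold, a held-out point lies exactly on the learned cut.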


Author(s): Phuong T. Nguyen, Juri Di Rocco, Ludovico Iovino, Davide Di Ruscio, Alfonso Pierantonio

Abstract Modeling is a ubiquitous activity in the software development process. In recent years, this activity has reached a high degree of intricacy, driven by the heterogeneity of components, data sources, and tasks. The democratized use of models has created the need for suitable machinery for mining modeling repositories. Among other things, classifying metamodels into independent categories facilitates personalized searches by boosting the visibility of metamodels. Nevertheless, manually classifying metamodels is not only tedious but also error-prone. According to our observation, misclassification is the norm, which reduces both the reachability and the reusability of metamodels. Handling such complexity requires suitable tooling to turn raw data into practical knowledge that can help modelers with their daily tasks. In our previous work, we proposed AURORA as a machine learning classifier for metamodel repositories. In this paper, we present a thorough evaluation of the system, taking into consideration different settings as well as evaluation metrics. More importantly, we improve the original AURORA tool by changing its internal design. Experimental results demonstrate that the proposed amendment is beneficial to the classification of metamodels. We also compared our approach with two baseline algorithms, namely gradient boosted decision tree and support vector machines. Finally, the results show that AURORA outperforms the baselines with respect to various quality metrics.


2021
Author(s): Michael T.W. McKibben, Michael S. Barker

Nearly all lineages of land plants have experienced at least one whole genome duplication (WGD) in their history. The legacy of these ancient WGDs is still observable in the diploidized genomes of extant plants. Genes originating from WGDs, known as paleologs, can be maintained in diploidized genomes for millions of years. These paleologs have the potential to shape plant evolution through sub- and neofunctionalization, increased genetic diversity, and reciprocal gene loss among lineages. Current methods for classifying paleologs often rely on only a subset of potential genomic features, have varying levels of accuracy, and often require significant data and/or computational time. Here we developed a supervised machine learning approach to classify paleologs from a target WGD in diploidized genomes across a broad range of duplication histories. We collected empirical data on syntenic block sizes and other genomic features from 27 plant species, each with a different history of paleopolyploidy. Features from these genomes were used to develop simulations of syntenic blocks and paleologs to train a gradient boosted decision tree. Using this approach, Frackify (Fractionation Classify), we were able to accurately identify and classify paleologs across a broad range of parameter space, including cases with multiple overlapping WGDs. We then compared Frackify with other paleolog inference approaches in six species with paleotetraploid and paleohexaploid ancestries. Frackify provides a way to combine multiple genomic features to quickly classify paleologs while providing a high degree of consistency with existing approaches.
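A gradient boosted decision tree builds its model one small tree at a time, each tree fitted to the negative gradient of the loss (the pseudo-residuals) of the current ensemble. The sketch below shows that mechanism for binary classification with log-loss, depth-1 trees and shrunken Newton-step leaf values. It is a generic illustration, not Frackify's implementation, whose loss, tree depth and hyperparameters are not given in the abstract, and the two toy features are hypothetical. In practice a library implementation such as scikit-learn's `GradientBoostingClassifier` would be used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_regression_stump(X, r):
    """Best single-feature split (left: x[f] <= t) minimizing squared error of residuals r."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            if not left or not right:
                continue
            sse = sum((v - sum(side) / len(side)) ** 2 for side in (left, right) for v in side)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    return best  # None if no feature can be split

def gbdt_fit(X, y, n_rounds=10, lr=0.5):
    """Gradient boosting for labels y in {0, 1} (both classes present), log-loss, depth-1 trees."""
    pos = sum(y) / len(y)
    f0 = math.log(pos / (1 - pos))                  # start from the training log-odds
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        p = [sigmoid(z) for z in F]
        r = [yi - pi for yi, pi in zip(y, p)]       # pseudo-residuals = negative gradient of log-loss
        split = fit_regression_stump(X, r)
        if split is None:
            break
        _sse, f, t = split
        leaves = []
        for go_left in (True, False):
            idx = [i for i, row in enumerate(X) if (row[f] <= t) == go_left]
            num = sum(r[i] for i in idx)
            den = sum(p[i] * (1 - p[i]) for i in idx)
            leaves.append(lr * num / max(den, 1e-12))   # shrunken Newton step for this leaf
        stumps.append((f, t, leaves[0], leaves[1]))
        F = [z + (leaves[0] if row[f] <= t else leaves[1]) for z, row in zip(F, X)]
    return f0, stumps

def gbdt_predict_proba(model, row):
    f0, stumps = model
    return sigmoid(f0 + sum(lv if row[f] <= t else rv for f, t, lv, rv in stumps))

# Hypothetical rows: (informative feature, uninformative feature); label 1 = paleolog.
X = [(1, 0.5), (2, 0.1), (3, 0.9), (4, 0.3), (5, 0.7), (6, 0.2)]
y = [0, 0, 0, 1, 1, 1]
model = gbdt_fit(X, y, n_rounds=5)
probs = [gbdt_predict_proba(model, row) for row in X]
print([round(p, 3) for p in probs])
```

Every boosting round here splits the informative feature at 3, and the predicted probabilities move further from 0.5 with each round.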

