scholarly journals XGB4mcPred: Identification of DNA N4-Methylcytosine Sites in Multiple Species Based on an eXtreme Gradient Boosting Algorithm and DNA Sequence Information

Algorithms ◽  
2021 ◽  
Vol 14 (10) ◽  
pp. 283
Author(s):  
Xiao Wang ◽  
Xi Lin ◽  
Rong Wang ◽  
Kai-Qi Fan ◽  
Li-Jun Han ◽  
...  

DNA N4-methylcytosine(4mC) plays an important role in numerous biological functions and is a mechanism of particular epigenetic importance. Therefore, accurate identification of the 4mC sites in DNA sequences is necessary to understand the functional mechanism. Although some effective calculation tools have been proposed to identifying DNA 4mC sites, it is still challenging to improve identification accuracy and generalization ability. Therefore, there is a great need to build a computational tool to accurately identify the position of DNA 4mC sites. Hence, this study proposed a novel predictor XGB4mcPred, a predictor for the identification of 4mC sites trained using an extreme gradient boosting algorithm (XGBoost) and DNA sequence information. Firstly, we used the One-Hot encoding on adjacent and spaced nucleotides, dinucleotides, and trinucleotides of the original 4mC site sequences as feature vectors. Then, the importance values of the feature vectors pre-trained by the XGBoost algorithm were used as a threshold to filter redundant features, resulting in a significant improvement in the identification accuracy of the constructed XGB4mcPred predictor to identify 4mC sites. The analysis shows that there is a clear preference for nucleotide sequences between 4mC sites and non-4mC site sequences in six datasets from multiple species, and the optimized features can better distinguish 4mC sites from non-4mC sites. The experimental results of cross-validation and independent tests from six different species show that our proposed predictor XGB4mcPred significantly outperformed other state-of-the-art predictors and was improved to varying degrees compared with other state-of-the-art predictors. Additionally, the user-friendly webserver we used to developed the XGB4mcPred predictor was made freely accessible.

2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Kai Gu ◽  
Jianqi Wang ◽  
Hong Qian ◽  
Xiaoyan Su

On basis of fault categories detection, the diagnosis of rotor fault causes is proposed, which has great contributions to the field of intelligent operation and maintenance. To improve the diagnostic accuracy and practical efficiency, a hybrid model based on the particle swarm optimization-extreme gradient boosting algorithm, namely, PSO-XGBoost is designed. XGBoost is used as a classifier to diagnose rotor fault causes, having good performance due to the second-order Taylor expansion and the explicit regularization term. PSO is used to automatically optimize the process of adjusting the XGBoost’s parameters, which overcomes the shortcomings when using the empirical method or the trial-and-error method to adjust parameters of the XGBoost model. The hybrid model combines the advantages of the two algorithms and can diagnose nine rotor fault causes accurately. Following diagnostic results, maintenance measures referring to the corresponding knowledge base are provided intelligently. Finally, the proposed PSO-XGBoost model is compared with five state-of-the-art intelligent classification methods. The experimental results demonstrate that the proposed method has higher diagnostic accuracy and practical efficiency in diagnosing rotor fault causes.


Author(s):  
Hai Tao ◽  
Maria Habib ◽  
Ibrahim Aljarah ◽  
Hossam Faris ◽  
Haitham Abdulmohsin Afan ◽  
...  

2021 ◽  
Author(s):  
Leila Zahedi ◽  
Farid Ghareh Mohammadi ◽  
M. Hadi Amini

Machine learning techniques lend themselves as promising decision-making and analytic tools in a wide range of applications. Different ML algorithms have various hyper-parameters. In order to tailor an ML model towards a specific application, a large number of hyper-parameters should be tuned. Tuning the hyper-parameters directly affects the performance (accuracy and run-time). However, for large-scale search spaces, efficiently exploring the ample number of combinations of hyper-parameters is computationally challenging. Existing automated hyper-parameter tuning techniques suffer from high time complexity. In this paper, we propose HyP-ABC, an automatic innovative hybrid hyper-parameter optimization algorithm using the modified artificial bee colony approach, to measure the classification accuracy of three ML algorithms, namely random forest, extreme gradient boosting, and support vector machine. Compared to the state-of-the-art techniques, HyP-ABC is more efficient and has a limited number of parameters to be tuned, making it worthwhile for real-world hyper-parameter optimization problems. We further compare our proposed HyP-ABC algorithm with state-of-the-art techniques. In order to ensure the robustness of the proposed method, the algorithm takes a wide range of feasible hyper-parameter values, and is tested using a real-world educational dataset.


Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1565
Author(s):  
Muhammad Aminu Lawal ◽  
Riaz Ahmed Shaikh ◽  
Syed Raheel Hassan

The advancement in IoT has prompted its application in areas such as smart homes, smart cities, etc., and this has aided its exponential growth. However, alongside this development, IoT networks are experiencing a rise in security challenges such as botnet attacks, which often appear as network anomalies. Similarly, providing security solutions has been challenging due to the low resources that characterize the devices in IoT networks. To overcome these challenges, the fog computing paradigm has provided an enabling environment that offers additional resources for deploying security solutions such as anomaly mitigation schemes. In this paper, we propose a hybrid anomaly mitigation framework for IoT using fog computing to ensure faster and accurate anomaly detection. The framework employs signature- and anomaly-based detection methodologies for its two modules, respectively. The signature-based module utilizes a database of attack sources (blacklisted IP addresses) to ensure faster detection when attacks are executed from the blacklisted IP address, while the anomaly-based module uses an extreme gradient boosting algorithm for accurate classification of network traffic flow into normal or abnormal. We evaluated the performance of both modules using an IoT-based dataset in terms response time for the signature-based module and accuracy in binary and multiclass classification for the anomaly-based module. The results show that the signature-based module achieves a fast attack detection of at least six times faster than the anomaly-based module in each number of instances evaluated. The anomaly-based module using the XGBoost classifier detects attacks with an accuracy of 99% and at least 97% for average recall, average precision, and average F1 score for binary and multiclass classification. Additionally, it recorded 0.05 in terms of false-positive rates.


2020 ◽  
Vol 11 ◽  
Author(s):  
Tianhang Chen ◽  
Xiangeng Wang ◽  
Yanyi Chu ◽  
Yanjing Wang ◽  
Mingming Jiang ◽  
...  

Author(s):  
Marco Febriadi Kokasih ◽  
Adi Suryaputra Paramita

Online marketplace in the field of property renting like Airbnb is growing. Many property owners have begun renting out their properties to fulfil this demand. Determining a fair price for both property owners and tourists is a challenge. Therefore, this study aims to create a software that can create a prediction model for property rent price. Variable that will be used for this study is listing feature, neighbourhood, review, date and host information. Prediction model is created based on the dataset given by the user and processed with Extreme Gradient Boosting algorithm which then will be stored in the system. The result of this study is expected to create prediction models for property rent price for property owners and tourists consideration when considering to rent a property. In conclusion, Extreme Gradient Boosting algorithm is able to create property rental price prediction with the average of RMSE of 10.86 or 13.30%.


2021 ◽  
Vol 12 ◽  
Author(s):  
Kun Niu ◽  
Ximei Luo ◽  
Shumei Zhang ◽  
Zhixia Teng ◽  
Tianjiao Zhang ◽  
...  

Enhancers are regulatory DNA sequences that could be bound by specific proteins named transcription factors (TFs). The interactions between enhancers and TFs regulate specific genes by increasing the target gene expression. Therefore, enhancer identification and classification have been a critical issue in the enhancer field. Unfortunately, so far there has been a lack of suitable methods to identify enhancers. Previous research has mainly focused on the features of the enhancer’s function and interactions, which ignores the sequence information. As we know, the recurrent neural network (RNN) and long short-term memory (LSTM) models are currently the most common methods for processing time series data. LSTM is more suitable than RNN to address the DNA sequence. In this paper, we take the advantages of LSTM to build a method named iEnhancer-EBLSTM to identify enhancers. iEnhancer-ensembles of bidirectional LSTM (EBLSTM) consists of two steps. In the first step, we extract subsequences by sliding a 3-mer window along the DNA sequence as features. Second, EBLSTM model is used to identify enhancers from the candidate input sequences. We use the dataset from the study of Quang H et al. as the benchmarks. The experimental results from the datasets demonstrate the efficiency of our proposed model.


2021 ◽  
Vol 25 (Spec. issue 1) ◽  
pp. 1-7
Author(s):  
Ahmet Yurttakal

The thermal conductivity estimation for the soil is an important step for many geothermal applications. But it is a difficult and complicated process since it involves a variety of factors that have significant effects on the thermal conductivity of soils such as soil moisture and granular structure. In this study, regression was performed with the extreme gradient boosting algorithm to develop a model for estimating thermal conductivity value. The performance of the model was measured on the unseen test data. As a result, the proposed algorithm reached 0.18 RMSE, 0.99 R2, and 3.18% MAE values which state that the algorithm is encouraging.


Sign in / Sign up

Export Citation Format

Share Document