Uncertain Johnson–Schumacher growth model with imprecise observations and k-fold cross-validation test

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain and results in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane decreases the ability of red blood cells to deform, giving rise to a group of hereditary haemolytic diseases caused by massive destruction of these cells in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 cells based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracies (ACC) of a 10-fold cross-validation test and an independent set test using AdaBoost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that AdaBoost could be applied to build a learning model for the prediction of inhibitors against K562 cells.


Leave-$p$-Out Cross-Validation Test for Uncertain Verhulst-Pearl Model With Imprecise Observations

IEEE Access ◽

10.1109/access.2019.2939386 ◽

2019 ◽

Vol 7 ◽

pp. 131705-131709 ◽

Cited By ~ 7

Author(s):

Shiqin Liu

Keyword(s):

Cross Validation ◽

Validation Test ◽

Imprecise Observations


RFAmyloid: A Web Server for Predicting Amyloid Proteins

International Journal of Molecular Sciences ◽

10.3390/ijms19072071 ◽

2018 ◽

Vol 19 (7) ◽

pp. 2071 ◽

Cited By ~ 14

Author(s):

Mengting Niu ◽

Yanjuan Li ◽

Chunyu Wang ◽

Ke Han

Keyword(s):

Feature Extraction ◽

Acid Composition ◽

Extraction Method ◽

Cross Validation ◽

Protein Composition ◽

Amyloid Protein ◽

Feature Extraction Method ◽

Validation Test ◽

Fibrous Protein ◽

Fold Cross Validation

Amyloid is an insoluble fibrous protein, and its mis-aggregation can lead to diseases such as Alzheimer’s disease and Creutzfeldt–Jakob disease. Therefore, the identification of amyloid is essential for the discovery and understanding of disease. We established a novel predictor called RFAmy, based on random forest, to identify amyloid. It employs the SVMProt 188-D feature extraction method, based on protein composition and physicochemical properties, and the pse-in-one feature extraction method, based on amino acid composition, autocorrelation pseudo acid composition, profile-based features and predicted structure features. In the ten-fold cross-validation test, RFAmy’s overall accuracy was 89.19% and its F-measure was 0.891. Comparison experiments with other features, classifiers, and existing methods show the effectiveness of RFAmy in predicting amyloid protein. The RFAmy proposed in this paper can be accessed through the URL http://server.malab.cn/RFAmyloid/.


Cross-Validation for the Uncertain Chapman-Richards Growth Model with Imprecise Observations

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488520500336 ◽

2020 ◽

Vol 28 (05) ◽

pp. 769-783 ◽

Cited By ~ 3

Author(s):

Zhe Liu ◽

Lifen Jia

Keyword(s):

Growth Model ◽

Regression Models ◽

Cross Validation ◽

Growth Models ◽

Growth Curves ◽

Unknown Parameters ◽

Model Selection Method ◽

Least Squares Estimates ◽

Imprecise Observations ◽

Richards Growth Model

Regression analysis, which estimates the relationships among variables, has been widely used for growth curves, and cross-validation, as a model selection method, assesses the generalization ability of regression models. Classical methods assume that the observed values of variables are precise numbers, while in many cases data are collected imprecisely. This paper therefore explores the Chapman-Richards growth model, one of the widely used growth models, with imprecise observations under the framework of uncertainty theory. The least squares estimates of the unknown parameters in this model are given. Moreover, cross-validation with imprecise observations is proposed. Furthermore, estimates of the expected value and variance of the uncertain error are obtained from the residuals. In addition, ways to predict the value of the response variable from newly observed values of the predictor variables are discussed. Finally, a numerical example illustrates the approach.
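As a rough illustration of the two steps named in this abstract (least squares estimation and cross-validation), the sketch below fits a Chapman-Richards curve and scores it by k-fold cross-validation. It rests on several assumptions that are not taken from the paper: the commonly used three-parameter form y = a(1 - exp(-bx))^c, precise (crisp) observations rather than uncertain variables, a coarse grid search over b and c, and synthetic noise-free data. All function names are hypothetical.

```python
import math

def chapman_richards(x, a, b, c):
    """Assumed three-parameter form: y = a * (1 - exp(-b * x)) ** c."""
    return a * (1.0 - math.exp(-b * x)) ** c

def fit_least_squares(xs, ys, b_grid, c_grid):
    """Grid search over (b, c); for each pair the best a has a closed form
    because the model is linear in a. Requires every x to be positive."""
    best = None
    for b in b_grid:
        for c in c_grid:
            g = [(1.0 - math.exp(-b * x)) ** c for x in xs]
            a = sum(y * gi for y, gi in zip(ys, g)) / sum(gi * gi for gi in g)
            sse = sum((y - a * gi) ** 2 for y, gi in zip(ys, g))
            if best is None or sse < best[0]:
                best = (sse, a, b, c)
    return best[1], best[2], best[3]

def k_fold_mse(xs, ys, k, b_grid, c_grid):
    """Mean squared prediction error, each point held out exactly once."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th point forms one fold
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        a, b, c = fit_least_squares(train_x, train_y, b_grid, c_grid)
        total += sum((ys[i] - chapman_richards(xs[i], a, b, c)) ** 2
                     for i in held_out)
    return total / n

# Synthetic, noise-free data generated from a = 10, b = 0.5, c = 2 (illustrative only).
xs = [float(i) for i in range(1, 11)]
ys = [chapman_richards(x, 10.0, 0.5, 2.0) for x in xs]
b_grid = [i / 10 for i in range(1, 11)]   # 0.1, 0.2, ..., 1.0
c_grid = [j / 2 for j in range(1, 7)]     # 0.5, 1.0, ..., 3.0

print(fit_least_squares(xs, ys, b_grid, c_grid))
print(k_fold_mse(xs, ys, 5, b_grid, c_grid))
```

With imprecise observations, the squared residuals would be replaced by their expected values under uncertainty theory; that step is not attempted here.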


Utilizing Collocated Crop Growth Model Simulations to Train Agronomic Satellite Retrieval Algorithms

Remote Sensing ◽

10.3390/rs10121968 ◽

2018 ◽

Vol 10 (12) ◽

pp. 1968 ◽

Cited By ~ 2

Author(s):

Nathaniel Levitan ◽

Barry Gross

Keyword(s):

Growth Model ◽

Cross Validation ◽

Regional Scale ◽

Ground Truth ◽

Crop Growth ◽

State Variables ◽

Field Scale ◽

Crop Growth Model ◽

Model Simulations ◽

Fold Cross Validation

Due to its worldwide coverage and high revisit frequency, satellite-based remote sensing provides the ability to monitor in-season crop state variables and yields globally. In this study, we presented a novel approach to training agronomic satellite retrieval algorithms by utilizing collocated crop growth model simulations and solar-reflective satellite measurements. Specifically, we showed that bidirectional long short-term memory networks (BLSTMs) can be trained to predict the in-season state variables and yields of Agricultural Production Systems sIMulator (APSIM) maize crop growth model simulations from collocated Moderate Resolution Imaging Spectroradiometer (MODIS) 500-m satellite measurements over the United States Corn Belt at a regional scale. We evaluated the performance of the BLSTMs through both k-fold cross validation and comparison to regional scale ground-truth yields and phenology. Using k-fold cross validation, we showed that three distinct in-season maize state variables (leaf area index, aboveground biomass, and specific leaf area) can be retrieved with cross-validated R² values ranging from 0.4 to 0.8 for significant portions of the season. Several other plant, soil, and phenological in-season state variables were also evaluated in the study for their retrievability via k-fold cross validation. In addition, by comparing to survey-based United States Department of Agriculture (USDA) ground truth data, we showed that the BLSTMs are able to predict actual county-level yields with R² values between 0.45 and 0.6 and actual state-level phenological dates (emergence, silking, and maturity) with R² values between 0.75 and 0.85. We believe that a potential application of this methodology is to develop satellite products to monitor in-season field-scale crop growth on a global scale by reproducing the methodology with field-scale crop growth model simulations (utilizing farmer-recorded field-scale agromanagement data) and collocated high-resolution satellite data (fused with moderate-resolution satellite data).
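For readers unfamiliar with the R² figures quoted in this abstract, here is a minimal sketch of the metric, assuming the usual coefficient-of-determination definition (one minus the ratio of residual to total sum of squares). The abstract does not say which definition the authors used, and the function name and numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Illustrative values only, not data from the study.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect prediction
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # always predicting the mean
```

In a cross-validated setting, `predicted` would hold the predictions made for each sample while it was in the held-out fold.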


Prediction Outcome for Massive Multiplayer Online Games Using Data Mining

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v11.i1.pp248-255 ◽

2018 ◽

Vol 11 (1) ◽

pp. 248 ◽

Cited By ~ 1

Author(s):

Shazwani Samsurim ◽

Nor Ashikin Mohamad Kamal ◽

Marina Ismail ◽

Norizan Mat Diah

Keyword(s):

Cross Validation ◽

Online Games ◽

Classification Algorithms ◽

Multiplayer Online Games ◽

Validation Test ◽

Prediction Outcome ◽

Proposed Model ◽

Using Data ◽

The Right ◽

Fold Cross Validation

Massive Multiplayer Online (MMO) games are among the most popular game genres with teenagers today. MMO games allow gamers to interact and play with up to a thousand players. Rainbow Six Siege (RSS) is an MMO-type game. Because many operators are available in this game, the player needs to choose the right operator to counter the enemy operator. This paper therefore attempted to predict the outcome of the game from the characteristics of the selected operator. For the prediction model, operator characteristics were extracted from 120 live stream replays, and three classification algorithms were used to predict the outcome of the game. Among these algorithms, IBK performed best on the dataset, with an accuracy of 93.75% in a 5-fold cross-validation test. This success rate indicates that the proposed model is suitable for predicting the outcome of the game.
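The 5-fold cross-validation test mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: a plain 1-nearest-neighbour rule stands in for IBK (the name suggests an instance-based learner, but the abstract does not describe it), and the two-feature toy dataset and all function names are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_neighbour_predict(train, query):
    """Return the label of the training sample closest to the query (1-NN)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    _, label = min(train, key=lambda item: sq_dist(item[0], query))
    return label

def cross_validated_accuracy(data, k=5, seed=0):
    """Fraction of samples classified correctly when each is held out once."""
    correct = 0
    for fold in k_fold_indices(len(data), k, seed):
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        for i in fold:
            features, label = data[i]
            correct += nearest_neighbour_predict(train, features) == label
    return correct / len(data)

# Hypothetical toy data: two well-separated groups of (features, outcome) pairs.
data = [((float(i), float(i % 3)), "loss") for i in range(10)]
data += [((float(i) + 100.0, float(i % 3)), "win") for i in range(10)]

print(cross_validated_accuracy(data, k=5))
```

Each sample is used for testing exactly once and for training in the other four rounds, which is what makes the reported accuracy an estimate of performance on unseen games.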


PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations

International Journal of Molecular Sciences ◽

10.3390/ijms22042120 ◽

2021 ◽

Vol 22 (4) ◽

pp. 2120

Author(s):

Firda Nurul Auliah ◽

Andi Nur Nilamyani ◽

Watshara Shoombuatong ◽

Md Ashad Alam ◽

Md Mehedi Hasan ◽

...

Keyword(s):

Cross Validation ◽

Experimental Methods ◽

Computational Algorithms ◽

Post Translational Modification ◽

Multiple Sequence ◽

Final Model ◽

Validation Test ◽

Feature Encoding ◽

Independent Test ◽

Fold Cross Validation

Pupylation is a type of reversible post-translational modification of proteins, which plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, the traditional experimental methods are laborious and time-consuming. Hence, computational algorithms that can predict potential pupylation sites from sequence features are greatly needed. In this research, a new prediction model, PUP-Fuse, has been developed for pupylation site prediction by integrating multiple sequence representations. We explored five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient of 0.768 in a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The PUP-Fuse web server, with curated datasets, is freely available.
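The Matthews correlation coefficient reported here is computed from the four cells of a binary confusion matrix. A minimal sketch follows; the function name and the example counts are hypothetical and are not taken from the PUP-Fuse study.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Illustrative counts only.
print(matthews_corrcoef(tp=50, tn=50, fp=0, fn=0))    # perfect agreement
print(matthews_corrcoef(tp=25, tn=25, fp=25, fn=25))  # no better than chance
```

Unlike plain accuracy, the coefficient stays informative when modified and unmodified sites are heavily imbalanced, which is probably why it is commonly reported for site-prediction tasks.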


Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete

International Journal of Computational Physics Series ◽

10.29167/a1i1p120-130 ◽

2018 ◽

Vol 1 (1) ◽

pp. 120-130 ◽

Cited By ~ 1

Author(s):

Chunxiang Qian ◽

Wence Kang ◽

Hao Ling ◽

Hua Dong ◽

Chengyao Liang ◽

...

Keyword(s):

Support Vector Machine ◽

Environmental Factors ◽

Cross Validation ◽

Concrete Strength ◽

Simulation Method ◽

Support Vector ◽

Svm Model ◽

Artificial Neural Network Ann ◽

Influence Degree ◽

Fold Cross Validation

A Support Vector Machine (SVM) model optimized by K-Fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other mathematical models, such as Artificial Neural Network (ANN) and Decision Tree (DT), were also built and compared with the SVM to determine which could make the most accurate predictions. Both the material factors and the environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻ and Cl⁻, the temperature, and the exposure time. The prediction results indicated that the optimized SVM model performed better than the other models in predicting the concrete strength. Based on the SVM model, a simulation method of variable limitation was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.


Rancang Bangun Sistem Informasi Untuk Menentukan Kapabilitas Konsumen Dalam Mengambil Pinjaman KPR

Jurnal ULTIMA InfoSys ◽

10.31937/si.v7i2.543 ◽

2016 ◽

Vol 7 (2) ◽

pp. 75-80

Author(s):

Adhi Kusnadi ◽

Risyad Ananda Putra

Keyword(s):

Data Mining ◽

Low Income ◽

Cross Validation ◽

Classification Tree ◽

Large Population ◽

Housing Development ◽

Good Precision ◽

Index Terms ◽

The Government ◽

Fold Cross Validation

Indonesia has a relatively large population. Over a five-year period, the government runs an annual program to procure 1 million FLPP housing units, with the aim of providing decent homes for low-income people. FLPP housing development requires good precision and speed on the part of the developer, but this is often hampered by the bank process, because the outcome and speed of data processing at the bank are difficult to predict. Knowing a consumer's ability to obtain subsidized credit has many advantages: among others, developers can plan cash flow better and can replace consumers who would be rejected before they enter the bank process. For that reason, a system was built to help developers. Many methods can be used to create this application; one of them is data mining with a classification tree. Under 10-fold cross-validation, the application achieved an accuracy of 92%. Index Terms: Data Mining, Classification Tree, Housing, FLPP, 10-Fold Cross-Validation, Consumer Capability


Klasifikasi Berita Kriminal Menggunakan Naïve Bayes Classifier (NBC) dengan Pengujian K-Fold Cross Validation

Jurnal Sains dan Informatika ◽

10.34128/jsi.v5i2.177 ◽

2019 ◽

Vol 5 (2) ◽

pp. 108-117

Author(s):

Herfia Rhomadhona ◽

Jaka Permadi

Keyword(s):

Cross Validation ◽

Online Media ◽

Bayes Classifier ◽

Naïve Bayes ◽

Fold Cross Validation

Crime news is always a trending topic in every mass media outlet, especially online mass media. Online mass media provide several facilities that make it easier for the public to search for news by topic, and they label each news item by category. However, they do not assign subcategories to the news. For example, if a user opens the crime category, all types of crime news are displayed without any specific information about the type of crime. This problem can be addressed by classifying crime news by subcategory. This study uses the Naïve Bayes Classifier (NBC) method to classify news by subcategory. There are five subcategories: corruption, narcotics, theft, rape and murder. The study aims to determine the ability of NBC to classify news by testing it with the K-Fold Cross Validation technique for values of K from 3 to 10. The test results show that NBC is capable of classifying crime news, with a precision of 98.53%, a recall of 98.44% and an accuracy of 99.38%.
