A Method for Detecting Harmful Entries on Informal School Websites Using Morphosemantic Patterns

This paper presents a novel method of analyzing morphosemantic patterns in language to detect cyberbullying, that is, frequently appearing harmful messages and entries that aim to humiliate other users. The morphosemantic patterns represent a novel concept, with the assumption that analyzed elements can be perceived as a combination of morphological information, such as parts of speech, and semantic information, such as semantic roles, categories, etc. The patterns are automatically extracted from data containing harmful entries (cyberbullying) and non-harmful entries found on the informal websites of Japanese high schools. These website data were prepared and standardized by the Human Rights Center in Mie Prefecture, Japan. The extracted patterns are then applied to a document classification task using the provided data in 10-fold cross-validation. The results indicate that morphosemantic sentence representation can be considered useful in the task of detecting the deceptive and provocative language used in cyberbullying.

A Novel Method for Gender and Age Detection Based on EEG Brain Signals

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/5/10 ◽

2021 ◽

Vol 18 (5) ◽

Author(s):

Haitham Issa ◽

Sali Issa ◽

Wahab Shah

Keyword(s):

Cross Validation ◽

Image Feature ◽

Emotional States ◽

Time Frequency ◽

Brain Signals ◽

Average Accuracy ◽

Gender And Age ◽

Novel Method ◽

Fold Cross Validation ◽

Validation Strategy

This paper presents a new gender and age classification system based on electroencephalography (EEG) brain signals. First, the Continuous Wavelet Transform (CWT) is used to obtain the time-frequency information of a single EEG electrode for eight distinct emotional states, rather than the usual neutral or relaxed states. Then, a sequence of steps is applied to extract an improved grayscale image feature. For system evaluation, a three-fold cross-validation strategy is used to build four different classifiers. The experiments show that the proposed feature combined with a Convolutional Neural Network (CNN) classifier improves both gender and age classification, achieving average accuracies of 96.3% and 89%, respectively. Moreover, the results demonstrate in practice that human gender and age can be predicted across different emotional states.

A two-step discriminated method to identify thermophilic proteins

International Journal of Biomathematics ◽

10.1142/s1793524517500504 ◽

2017 ◽

Vol 10 (04) ◽

pp. 1750050 ◽

Cited By ~ 33

Author(s):

Hua Tang ◽

Ren-Zhi Cao ◽

Wen Wang ◽

Tie-Shan Liu ◽

Li-Ming Wang ◽

...

Keyword(s):

Protein Engineering ◽

Chemical Reaction ◽

Cross Validation ◽

Promising Method ◽

Enzyme Design ◽

Relevant Field ◽

Novel Method ◽

Fold Cross Validation

Improving the thermostability of an enzyme can accelerate the relevant chemical reaction. Thus, the analysis and prediction of thermophilic proteins are conducive to protein engineering and enzyme design. In this study, a novel method based on two-step discrimination was proposed to distinguish between thermophilic and non-thermophilic proteins. The model was rigorously benchmarked on an objective dataset including 915 thermophilic proteins and 793 non-thermophilic proteins. Results showed that the overall accuracy of our method is 94.44% in 5-fold cross-validation, which is higher than that of other published methods. We believe that the two-step discriminated strategy will become a promising method in the relevant field of protein bioinformatics.
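
The 5-fold cross-validation protocol mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names (`stratified_k_fold`, `cross_validate`, `majority_baseline`) are hypothetical, the stratified split is an assumption (the abstract does not say how folds were drawn), and the majority-class "model" is only a placeholder for the two-step discriminator. Only the class sizes, 915 and 793, come from the abstract.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def cross_validate(labels, k, train_and_score, seed=0):
    """Average the score obtained when each fold is held out once as the test set."""
    folds = stratified_k_fold(labels, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k


# Label vector with the class sizes quoted in the abstract (1 = thermophilic).
labels = [1] * 915 + [0] * 793


def majority_baseline(train_idx, test_idx):
    """Placeholder model (assumption): always predict the most common training label."""
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


folds = stratified_k_fold(labels, k=5)
mean_accuracy = cross_validate(labels, 5, majority_baseline)
print([len(fold) for fold in folds], round(mean_accuracy, 4))
```

A real experiment would replace `majority_baseline` with a function that trains the classifier on `train_idx` and scores it on `test_idx`. The baseline is still a useful reference point: with these class sizes, always guessing "thermophilic" already scores a little above 50%.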

Using BERT to identify drug-target interactions from whole PubMed

10.21203/rs.3.rs-1015236/v1 ◽

2021 ◽

Author(s):

Jehad Aldahdooh ◽

Markus Vähä-Koskela ◽

Jing Tang ◽

Ziaurrehman Tanoli

Keyword(s):

Drug Target ◽

Cross Validation ◽

Drug Repurposing ◽

Experimental Information ◽

Future Studies ◽

Drug Mechanism ◽

Assay Format ◽

Large Databases ◽

Novel Method ◽

Fold Cross Validation

Background: Drug-target interactions (DTIs) are critical for drug repurposing and elucidation of drug mechanisms, and are manually curated by large databases, such as ChEMBL, BindingDB, DrugBank and DrugTargetCommons. However, the number of articles providing this data (~0.1 million) likely constitutes only a fraction of all articles on PubMed that contain experimentally determined DTIs. Finding such articles and extracting the experimental information is a challenging task, and there is a pressing need for systematic approaches to assist the curation of DTIs. To this end, we propose using Bidirectional Encoder Representations from Transformers (BERT) to identify such articles. Because DTI data depend closely on the type of assay used to generate them, we also aimed to incorporate functions to predict the assay format. Results: Our novel method identified ~2.1 million articles (along with drug and protein information) that were not previously included in public DTI databases. Using 10-fold cross-validation, we obtained ~99% accuracy for identifying articles containing quantitative drug-target profiles. The accuracy for the prediction of assay format is ~90%, which leaves room for improvement in future studies. Conclusion: The BERT model in this study is robust and the proposed pipeline can be used to identify previously overlooked articles containing quantitative DTIs. Overall, our method provides a significant advancement in machine-assisted DTI extraction and curation. We expect it to be a useful addition to drug mechanism discovery and repurposing.

Prediction of miRNA-Disease Association Using Deep Collaborative Filtering

BioMed Research International ◽

10.1155/2021/6652948 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Li Wang ◽

Cheng Zhong

Keyword(s):

Collaborative Filtering ◽

Cross Validation ◽

Kidney Neoplasms ◽

Feature Vector ◽

High Failure Rate ◽

Experimental Identification ◽

Disease Similarity ◽

Disease Associations ◽

Novel Method ◽

Fold Cross Validation

Existing studies have shown that miRNAs are related to human diseases through the regulation of gene expression. Identifying miRNA associations with diseases will contribute to the diagnosis, treatment, and prognosis of diseases. The experimental identification of miRNA-disease associations is time-consuming, extremely expensive, and has a high failure rate. In recent years, many researchers have predicted potential associations between miRNAs and diseases using computational approaches. In this paper, we propose a novel method based on deep collaborative filtering, called DCFMDA, to predict potential miRNA-disease associations. To improve prediction performance, we integrated neural network matrix factorization (NNMF) and a multilayer perceptron (MLP) in a deep collaborative filtering framework. We used known miRNA-disease associations to capture miRNA-disease interaction features with NNMF, and used miRNA similarity and disease similarity to extract the miRNA feature vector and the disease feature vector, respectively, with the MLP. Finally, we merged the outputs of the NNMF and the MLP to obtain the prediction matrix. The experimental results indicate that, compared with other existing computational methods, our method achieves an AUC of 0.9466 based on 10-fold cross-validation. In addition, case studies show that DCFMDA can effectively predict candidate miRNAs for breast neoplasms, colon neoplasms, kidney neoplasms, leukemia, and lymphoma.

ADVIAN: Alzheimer's Disease VGG-Inspired Attention Network Based on Convolutional Block Attention Module and Multiple Way Data Augmentation

Frontiers in Aging Neuroscience ◽

10.3389/fnagi.2021.687456 ◽

2021 ◽

Vol 13 ◽

Author(s):

Shui-Hua Wang ◽

Qinghua Zhou ◽

Ming Yang ◽

Yu-Dong Zhang

Keyword(s):

Alzheimer's Disease ◽

Cross Validation ◽

Data Augmentation ◽

State Of The Art ◽

Attention Network ◽

Backbone Network ◽

Novel Method ◽

Precision And Accuracy ◽

Fold Cross Validation

Aim: Alzheimer's disease (AD) is a neurodegenerative disease that causes 60–70% of all cases of dementia. This study aims to provide a novel method that can identify AD more accurately. Methods: We first propose a VGG-inspired network (VIN) as the backbone network and investigate the use of attention mechanisms. We then propose an Alzheimer's Disease VGG-Inspired Attention Network (ADVIAN), in which we integrate convolutional block attention modules on a VIN backbone. In addition, 18-way data augmentation is proposed to avoid overfitting. Ten runs of 10-fold cross-validation are carried out to report unbiased performance. Results: The sensitivity and specificity reach 97.65 ± 1.36 and 97.86 ± 1.55, respectively. The precision and accuracy are 97.87 ± 1.53 and 97.76 ± 1.13, respectively. The F1 score, MCC, and FMI are 97.75 ± 1.13, 95.53 ± 2.27, and 97.76 ± 1.13, respectively. The AUC is 0.9852. Conclusion: The proposed ADVIAN gives better results than 11 state-of-the-art methods. In addition, the experimental results demonstrate the effectiveness of 18-way data augmentation.
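
The metrics listed in the results can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the paper, which appears to report the same quantities on a percentage scale.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall / true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient: ranges from -1 to 1.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Fowlkes-Mallows index: geometric mean of precision and recall.
    fmi = math.sqrt(precision * sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "fmi": fmi,
    }


# Hypothetical counts for one test fold (not from the paper).
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because MCC uses all four cells of the confusion matrix, it is typically lower than accuracy or F1 for the same classifier. That is consistent with the abstract, where MCC is the smallest of the reported values.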

Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete

International Journal of Computational Physics Series ◽

10.29167/a1i1p120-130 ◽

2018 ◽

Vol 1 (1) ◽

pp. 120-130 ◽

Cited By ~ 1

Author(s):

Chunxiang Qian ◽

Wence Kang ◽

Hao Ling ◽

Hua Dong ◽

Chengyao Liang ◽

...

Keyword(s):

Support Vector Machine ◽

Environmental Factors ◽

Cross Validation ◽

Concrete Strength ◽

Simulation Method ◽

Support Vector ◽

SVM Model ◽

Artificial Neural Network (ANN) ◽

Influence Degree ◽

Fold Cross Validation

A Support Vector Machine (SVM) model optimized by K-fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which could make the most accurate predictions. Both the material factors and the environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻ and Cl⁻, the temperature, and the exposure time. The prediction results showed that the optimized SVM model performed better than the other models in predicting the concrete strength. Based on the SVM model, a simulation method of variable limitation was used to determine the sensitivity of the various factors and the degree of their influence on the degradation of concrete strength.

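Design and Development of an Information System for Determining Consumer Capability to Take Out a KPR Home Loan (Rancang Bangun Sistem Informasi Untuk Menentukan Kapabilitas Konsumen Dalam Mengambil Pinjaman KPR)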

Jurnal ULTIMA InfoSys ◽

10.31937/si.v7i2.543 ◽

2016 ◽

Vol 7 (2) ◽

pp. 75-80

Author(s):

Adhi Kusnadi ◽

Risyad Ananda Putra

Keyword(s):

Data Mining ◽

Low Income ◽

Cross Validation ◽

Classification Tree ◽

Large Population ◽

Housing Development ◽

Good Precision ◽

Index Terms ◽

The Government ◽

Fold Cross Validation

Indonesia has a relatively large population. Over a five-year period, the government runs an annual program to procure 1 million FLPP housing units, with the aim of providing decent homes for low-income people. FLPP housing development requires good precision and speed on the part of the developer, but it is often hampered by the bank process, because the outcome and speed of data processing at the bank are difficult to predict. Knowing in advance whether consumers are able to obtain subsidized credit has several advantages: developers can plan their cash flow better, and they can replace consumers who would be rejected before those consumers enter the bank process. For this reason, a system was built to help developers. Many methods can be used to create such an application; one of them is data mining with a classification tree. In 10-fold cross-validation, the application achieved an accuracy of 92%. Index Terms: Data Mining, Classification Tree, Housing, FLPP, 10-Fold Cross-Validation, Consumer Capability

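Classification of Crime News Using the Naïve Bayes Classifier (NBC) with K-Fold Cross-Validation Testing (Klasifikasi Berita Kriminal Menggunakan Naïve Bayes Classifier (NBC) dengan Pengujian K-Fold Cross Validation)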

Jurnal Sains dan Informatika ◽

10.34128/jsi.v5i2.177 ◽

2019 ◽

Vol 5 (2) ◽

pp. 108-117

Author(s):

Herfia Rhomadhona ◽

Jaka Permadi

Keyword(s):

Cross Validation ◽

Online Media ◽

Bayes Classifier ◽

Naïve Bayes ◽

Fold Cross Validation

Crime news is always a trending topic in every mass media outlet, particularly in online mass media. Online mass media provide several features that make it easier for the public to find news by topic, and they label each news item by category. However, they do not assign subcategories to the news. For example, if a user opens the crime category, all types of crime news are displayed without any specific information about the type of crime. This problem can be addressed by classifying crime news by subcategory. This study uses the Naïve Bayes Classifier (NBC) method to classify news by subcategory. There are five subcategories: corruption, drugs, theft, rape, and murder. The study aims to determine the ability of NBC to classify news by testing it with the K-fold cross-validation technique, with K ranging from 3 to 10. The test results show that NBC is capable of classifying crime news, with a precision of 98.53%, a recall of 98.44%, and an accuracy of 99.38%.

Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design ◽

10.2174/1381612825666191107092214 ◽

2020 ◽

Vol 25 (40) ◽

pp. 4296-4302 ◽

Cited By ~ 2

Author(s):

Yuan Zhang ◽

Zhenyan Han ◽

Qian Gao ◽

Xiaoyi Bai ◽

Chi Zhang ◽

...

Keyword(s):

Machine Learning ◽

Inclusion Bodies ◽

Cross Validation ◽

Independent Set ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Validation Test ◽

Excess Number ◽

Fold Cross Validation

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in β-globin, which reduce synthesis of the β-globin chain and result in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane decreases the ability of red blood cells to deform and leads to a group of hereditary haemolytic diseases caused by massive destruction of these cells in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 cells based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using AdaBoost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicates that AdaBoost can be applied to build a learning model for the prediction of inhibitors against K562 cells.

Environmental Sensitivity and Awareness as Differentiating Factors in the Purchase Decision-Making Process in the Smartphone Industry—Case of Polish Consumers

Sustainability ◽

10.3390/su13010348 ◽

2021 ◽

Vol 13 (1) ◽

pp. 348

Author(s):

Lukasz Skowron ◽

Monika Sak-Skowron

Keyword(s):

Expectation Maximization ◽

Cross Validation ◽

Expectation Maximization Algorithm ◽

Decision Making Process ◽

Environmental Sensitivity ◽

Significance Level ◽

On Line ◽

Purchase Process ◽

The Impact ◽

Fold Cross Validation

The first research objective discussed in this article was to analyze differences in how respondents with different levels of environmental sensitivity and awareness value particular factors influencing the purchase process in the smartphone industry, and to assess their knowledge about the impact of smartphones on the natural environment. The second objective was to determine whether the level of environmental sensitivity, awareness and knowledge about the impact of smartphones on the environment has a statistically significant influence on the respondents’ choice of smartphone brand. The survey was conducted using an on-line questionnaire, distributed by a specialized research agency to a representative sample of over 1000 Polish residents. To identify the various customer clusters, the expectation-maximization algorithm and v-fold cross-validation were used. Additionally, to analyze the significance level of differences between clusters, the nonparametric Mann-Whitney U-test was carried out. The results show unequivocally that people with different approaches to ecological issues demonstrate statistically significant differences in their purchasing behaviors in the smartphone industry. Furthermore, when some smartphone brands are compared, there is a statistically confirmed difference in the environmental sensitivity and awareness of the customers who use them. Moreover, the research has shown that in Polish customers’ consciousness smartphones are mistakenly considered to be relatively safe and environmentally friendly products.
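
The Mann-Whitney U statistic used to compare clusters can be computed directly from its pairwise definition. The sketch below is a minimal illustration under stated assumptions: the function name `mann_whitney_u` and the two samples are hypothetical and not from the survey, and the p-value step is omitted. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples.

    U_x counts the pairs (xi, yj) with xi > yj, with ties counted as one half.
    U_x + U_y always equals len(x) * len(y).
    """
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical importance ratings (1-5 scale) given by two customer clusters.
cluster_a = [5, 4, 4, 3, 5, 4]
cluster_b = [2, 3, 3, 1, 4, 2]
u_a, u_b = mann_whitney_u(cluster_a, cluster_b)
print(u_a, u_b)
```

The smaller of the two values is usually taken as the test statistic. A value close to zero means that almost every observation in one group is lower than every observation in the other.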
