ROC score

2022 ◽  
Author(s):  
Aayush Grover ◽  
Laurent Gatto

Protein subcellular localization prediction plays a crucial role in improving our understanding of different diseases and consequently assists in building drug targeting and drug development pipelines. Proteins are known to co-exist at multiple subcellular locations, which makes the task of prediction extremely challenging. A protein interaction network is a graph that captures interactions between different proteins. It is reasonable to assume that two interacting proteins share at least some subcellular locations. In this regard, we propose ProtFinder, the first deep learning-based model that relies exclusively on protein interaction networks to predict the multiple subcellular locations of proteins. We also integrate biological priors such as the cellular component of Gene Ontology to make ProtFinder a more biology-aware intelligent system. ProtFinder is trained and tested using the STRING and BioPlex databases, whereas the annotations of proteins are obtained from the Human Protein Atlas. Our model gives an AUC-ROC score of 90.00% and an MCC score of 83.42% on a held-out set of proteins. We also apply ProtFinder to annotate proteins that currently do not have confident location annotations. We observe that ProtFinder is able to confirm some of these unreliable location annotations, while in some cases complementing the existing databases with novel location annotations.
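For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how an AUC-ROC score and an MCC score can be computed for a single binary label. It is not the authors' evaluation code: the function names `auc_roc` and `mcc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and a multi-label task such as this one would typically compute the metrics per location and then aggregate.

```python
from math import sqrt


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mcc(labels, preds):
    """Matthews correlation coefficient from the binary confusion matrix."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is reported as 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy data (hypothetical): 1 = protein is at the location, 0 = it is not.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

example_auc = auc_roc(labels, scores)
example_mcc = mcc(labels, preds)
```

The AUC-ROC depends only on how the scores rank positives against negatives, whereas the MCC depends on the chosen threshold, which is why the two figures are usually reported together.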


2021 ◽  
Vol 5 (4) ◽  
pp. 448
Author(s):  
Budi Juarto ◽  
Abba Suganda Girsang

As many as 3 million news items are produced every day, giving readers many choices when selecting news according to their topic and category preferences. A recommendation system can make it easier for users to choose the news to read. Collaborative filtering is a method that can be used to provide recommendations based on user behaviour. Neural collaborative filtering, which combines collaborative filtering with neural networks, is commonly used in recommendation systems. However, this method does not take the similarity of news content, such as news titles and body text, into account when recommending news to users. This research develops neural collaborative filtering using Sentence BERT. Sentence BERT is applied to news titles and news contents, which are converted into sentence embeddings. These sentence embeddings are used in neural collaborative filtering together with item id, user id, and news category. We use a Microsoft news dataset of 50,000 users and 51,282 news items, with 5,475,542 interactions between users and news. The evaluation carried out in this study uses precision, recall, and ROC curves to predict news clicks by the user. Another evaluation uses a hit ratio with the leave-one-out method. The evaluation obtained a precision of 99.14%, recall of 92.48%, F1-score of 95.69%, and ROC score of 98%. Evaluation using hit ratio@10 produces a hit ratio of 74% at the fiftieth epoch for neural collaborative filtering with Sentence BERT, which is better than neural collaborative filtering (NCF) and NCF with news category.
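The hit ratio@10 with leave-one-out evaluation mentioned above can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes that one clicked item per user is held out, that the model returns a ranked list of candidate items for that user, and that a "hit" means the held-out item appears in the top k. The function name, user ids, and item ids are hypothetical.

```python
def hit_ratio_at_k(ranked_items_per_user, held_out_item_per_user, k=10):
    """Fraction of users whose held-out item appears in their top-k ranking."""
    if not held_out_item_per_user:
        raise ValueError("no users to evaluate")
    hits = 0
    for user, held_out in held_out_item_per_user.items():
        top_k = ranked_items_per_user[user][:k]
        if held_out in top_k:
            hits += 1
    return hits / len(held_out_item_per_user)


# Hypothetical rankings produced by some recommender, best item first.
rankings = {
    "u1": ["n7", "n2", "n9", "n4"],
    "u2": ["n3", "n1", "n8", "n5"],
    "u3": ["n6", "n2", "n1", "n9"],
    "u4": ["n5", "n4", "n3", "n2"],
}
# One held-out (left-out) clicked item per user.
held_out = {"u1": "n2", "u2": "n5", "u3": "n6", "u4": "n1"}

hr_at_2 = hit_ratio_at_k(rankings, held_out, k=2)
hr_at_4 = hit_ratio_at_k(rankings, held_out, k=4)
```

With these toy rankings, two of the four users have their held-out item in the top 2 and three of the four have it in the top 4.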


Author(s):  
Nikola Ljubešić ◽  
Nataša Logar ◽  
Iztok Kosem

Collocations play a very important role in language description, especially in identifying meanings of words. Lists of collocates ranked by some statistical measure are an indispensable part of meaning deduction in modern lexicography. In the paper, we present a comparison between two approaches to the ranking of collocates: (a) the logDice method, which is predominantly used and frequency-based, and (b) the fastText word embeddings method, which is new and semantic-based. The comparison was made on two Slovene datasets, one representing general language headwords and their collocates, and the other representing headwords and their collocates extracted from a language for special purposes corpus. In the experiment, two methods were used: for the quantitative part of the evaluation, we used supervised machine learning with the area-under-the-curve (AUC) ROC score and the support-vector machine (SVM) algorithm, and in the qualitative part the ranking results of the two methods were evaluated by lexicographers. The results were somewhat inconsistent; while the quantitative evaluation confirmed that the machine-learning-based approach produced better collocate ranking results than the frequency-based one, lexicographers in most cases considered the listings of collocates of both methods very similar.
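The abstract does not spell out the logDice score. As a hedged illustration, the commonly used definition is 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency of headword and collocate and f_x, f_y are their individual frequencies. The sketch below applies that definition to made-up counts; the function name and all frequencies are assumptions, not data from the paper.

```python
from math import log2


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2 of the Dice coefficient.

    f_xy: co-occurrence frequency of headword and collocate
    f_x, f_y: individual frequencies of headword and collocate
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("frequencies must be positive")
    return 14.0 + log2(2.0 * f_xy / (f_x + f_y))


# Hypothetical counts for one headword and three candidate collocates.
headword_freq = 100
candidates = {
    "collocate_a": (50, 100),   # (co-occurrence count, collocate frequency)
    "collocate_b": (10, 5000),
    "collocate_c": (25, 100),
}

ranked = sorted(
    candidates,
    key=lambda c: log_dice(candidates[c][0], headword_freq, candidates[c][1]),
    reverse=True,
)
```

Because the score depends only on relative frequencies, a very frequent word that co-occurs a few times (here `collocate_b`) ranks below a rarer word with a stronger association.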


Author(s):  
Shikha Bhat ◽  
Anuradha Pandey ◽  
Akshay Kanakan ◽  
Ranjeet Maurya ◽  
Janani Srinivasa Vasudevan ◽  
...  

The global coronavirus disease 2019 (COVID-19) pandemic has demonstrated the range of disease severity and pathogen genomic diversity emanating from a singular virus (severe acute respiratory syndrome coronavirus 2, SARS-CoV-2). This diversity in disease manifestations and genomic mutations has challenged healthcare management and resource allocation during the pandemic, especially for countries such as India with a large population base. Here, we undertake a combinatorial approach toward scrutinizing the diagnostic and genomic diversity to extract meaningful information from the chaos of COVID-19 in the Indian context. Using methods of statistical correlation, machine learning (ML), and genomic sequencing on a clinically comprehensive patient dataset with corresponding samples from patients with and without respiratory support, we highlight specific significant diagnostic parameters and ML models for assessing the risk of developing severe COVID-19. This information is further contextualized against the backdrop of SARS-CoV-2 genomic features in the cohort for pathogen genomic evolution monitoring. Analysis of the patient demographic features and symptoms revealed that age, breathlessness, and cough were significantly associated with severe disease; at the same time, no patient with severe disease reported an absence of physical symptoms. Observing the trends in biochemical/biophysical diagnostic parameters, we noted that the respiratory rate, total leukocyte count (TLC), blood urea levels, and C-reactive protein (CRP) levels were directly correlated with the probability of developing severe disease. Out of five different ML algorithms tested to predict patient severity, the multi-layer perceptron-based model performed the best, with a receiver operating characteristic (ROC) score of 0.96 and an F1 score of 0.791.
The SARS-CoV-2 genomic analysis highlighted a set of mutations with global frequency flips and future incorporation into variants of concern (VOCs) and variants of interest (VOIs), which can be further monitored and annotated for functional significance. In summary, our findings highlight the importance of SARS-CoV-2 genomic surveillance and statistical analysis of clinical data to develop a risk assessment ML model.


2021 ◽  
Author(s):  
Dianshuang Zhou ◽  
Xin Li ◽  
Shipeng Shang ◽  
Hui Zhi ◽  
Peng Wang ◽  
...  

Background: Long noncoding RNAs (lncRNAs) represent a large category of functional RNA molecules that play a significant role in human cancers. lncRNAs can act as gene modulators that affect the biological processes of multiple cancers. Methods: Here, we developed a computational framework that uses an lncRNA-mRNA network and mutations in individual genes of 9 cancers from TCGA to prioritize cancer lncRNA modulators. Our method screened risky cancer lncRNA regulators based on multiple integrated lncRNA functional networks and 3 network calculation methods. Results: Validation analyses revealed that our method was more effective than prioritization based on a single lncRNA network. The method showed high predictive performance, and the highest ROC score was 0.836 in breast cancer. Notably, the scores of 5 lncRNAs were abnormally high, and these lncRNAs appeared in 9 cancers. A literature search showed that these 5 lncRNAs are experimentally supported. Analyses of the prioritized lncRNAs reveal that they are enriched in various cancer-related biological processes and pathways. Conclusions: Together, these results demonstrate the ability of this method to identify candidate lncRNA molecules and provide improved insights into the pathogenesis of cancer.


2021 ◽  
Vol 15 (02) ◽  
pp. 241-262
Author(s):  
Wasif Bokhari ◽  
Ajay Bansal

In medical disease diagnosis, the cost of a false negative could greatly outweigh the cost of a false positive. This is because the former could cost a life, whereas the latter may only cause medical costs and stress to the patient. The unique nature of this problem highlights the need for asymmetric error control in binary classification applications. In this domain, traditional machine learning classifiers may not be ideal, as they do not provide a way to keep the number of false negatives below a certain threshold. This paper proposes a novel tree-based binary classification algorithm that can control the number of false negatives with a mathematical guarantee, based on the Neyman–Pearson (NP) Lemma. This classifier is evaluated on data obtained from different heart studies, and it predicts the risk of cardiac disease not only with comparable accuracy and AUC-ROC score but also with full control over the number of false negatives. The methodology used to construct this classifier can be extended to many more use cases, not only in medical disease diagnosis but also beyond, as shown by analysis on diverse datasets.
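To make the idea of asymmetric error control concrete, the sketch below picks a decision threshold from the scores of known positive cases so that the empirical false-negative rate on that calibration set stays within a chosen limit. This is a deliberately simplified stand-in, not the tree-based Neyman–Pearson algorithm proposed in the paper, and it offers only an empirical bound on the calibration data rather than a mathematical guarantee. The function names, scores, and the 20% limit are assumptions for illustration.

```python
from math import floor


def threshold_for_max_fnr(positive_scores, max_fnr):
    """Largest threshold whose empirical false-negative rate is <= max_fnr.

    A case is predicted positive when its score is >= the threshold, so a
    false negative is a true positive case scored strictly below it.
    """
    if not positive_scores:
        raise ValueError("need at least one positive calibration score")
    if not 0.0 <= max_fnr < 1.0:
        raise ValueError("max_fnr must be in [0, 1)")
    ordered = sorted(positive_scores)
    allowed_misses = floor(max_fnr * len(ordered))
    return ordered[allowed_misses]


def count_false_negatives(positive_scores, threshold):
    """Number of positive cases that would be predicted negative."""
    return sum(1 for s in positive_scores if s < threshold)


# Hypothetical risk scores for ten patients known to have the disease.
calibration_scores = [0.35, 0.92, 0.61, 0.15, 0.78, 0.55, 0.88, 0.42, 0.97, 0.70]

chosen_threshold = threshold_for_max_fnr(calibration_scores, max_fnr=0.2)
missed = count_false_negatives(calibration_scores, chosen_threshold)
```

Lowering `max_fnr` pushes the threshold down, which trades additional false positives for fewer missed cases.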


Author(s):  
Mafas Raheem

Diabetes has become a common and lethal disease in low- and medium-income countries. Current lifestyles, food habits, and genetic transmission make this condition difficult to overcome. Because the disease has no permanent cure, medical practitioners provide advice to prevent diabetes and medication to control it. However, detecting the disease is a tedious process, and deploying machine learning predictive models for smart diagnosis/detection is vital in the healthcare domain today. Although several machine learning models have been built for this purpose, deploying a Deep Neural Network has received less attention. Therefore, a Deep Neural Network model was built with the support of complete preprocessing, class balancing, normalization, feature selection, and hyper-parameter tuning using a cross-validated search technique. The model achieved 88% accuracy and a ROC score of 0.88, standing out as a promising predictive model for diagnosing/detecting diabetes.


2021 ◽  
Author(s):  
Dejian Yang ◽  
Youmin Tang ◽  
Xiu-Qun Yang ◽  
Dan Ye ◽  
Ting Liu ◽  
...  

Understanding the relationship between probabilistic and deterministic prediction skills is important for the study of seasonal forecasting and verification. Based on the Brier skill score methodology, we have previously found a theoretical relationship between the probabilistic resolution skill and the deterministic correlation (i.e., anomaly correlation; AC) skill, and a lack of a necessary or consistent relationship between the probabilistic reliability skill and the deterministic skill in dynamical seasonal prediction. Here, we further theoretically investigate the relationship between the probabilistic relative operating characteristic (ROC) skill and the deterministic skill. The ROC measures the discrimination attribute of probabilistic forecast quality, another important attribute besides resolution and reliability. With some simplifying assumptions, we first derive theoretical expressions for the hit and false-alarm rates that are the basic ingredients of the ROC curve, then demonstrate a sole dependence of the ROC curve on the AC, and finally analytically derive a relationship between the related ROC score and the AC. This theoretically derived ROC-AC relationship is then examined, and well verified, using ensemble seasonal hindcasts from dynamical models. This finding, along with our previous findings, implies that the discrimination and resolution attributes of probabilistic seasonal forecast skill are intrinsically equivalent to the corresponding deterministic skill, while reliability appears to be the fundamental attribute of the probabilistic skill that differs from the deterministic skill. This provides an understanding of the fundamental similarities and differences between the two types of seasonal forecasting skill and predictability, and can offer important implications for the study of seasonal forecasting and verification.
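As an empirical counterpart to the hit and false-alarm rates discussed above, the sketch below builds a ROC curve from probabilistic forecasts by issuing a "warning" whenever the forecast probability reaches a threshold, then integrates the curve with the trapezoidal rule. It is a generic illustration and not the theoretical derivation in the paper; it assumes the ROC score is taken as the area under the curve, and the function names, forecast probabilities, and thresholds are made up.

```python
def roc_curve_points(forecast_probs, observed, thresholds):
    """Return sorted (false-alarm rate, hit rate) points, including the corners.

    observed holds 1 where the event occurred and 0 where it did not.
    """
    events = sum(observed)
    non_events = len(observed) - events
    if events == 0 or non_events == 0:
        raise ValueError("need both events and non-events")
    points = [(0.0, 0.0), (1.0, 1.0)]
    for t in thresholds:
        hits = sum(1 for p, o in zip(forecast_probs, observed) if o == 1 and p >= t)
        false_alarms = sum(1 for p, o in zip(forecast_probs, observed) if o == 0 and p >= t)
        points.append((false_alarms / non_events, hits / events))
    return sorted(points)


def roc_area(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical forecast probabilities of an event and whether it occurred.
probs = [0.9, 0.7, 0.4, 0.2]
occurred = [1, 0, 1, 0]

curve = roc_curve_points(probs, occurred, thresholds=[0.25, 0.5, 0.75])
score = roc_area(curve)
```

An area of 0.5 corresponds to forecasts with no discrimination, and 1.0 to perfect discrimination between events and non-events.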


Author(s):  
Mohammad Hamim Zajuli Al Faroby ◽  
Mohammad Isa Irawan ◽  
Ni Nyoman Tri Puspaningsih

Protein-protein interaction (PPI) analysis can be used to identify proteins that have a supporting function for a main protein, especially in the synthesis process. Insulin is synthesized by proteins that share the same molecular function while covering different but mutually supportive roles. To identify this function, the translation of Gene Ontology (GO) assigns certain characteristics to each protein. This study aims to predict proteins that interact with insulin, using the centrality method as a feature extractor and extreme gradient boosting as the classification algorithm. Characterization using the centrality method produces  features as a central function of protein. Classification results are measured using precision, recall, and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of  and a ROC score of . The prediction model produced by XGBoost performs above the average of other machine learning methods.


Author(s):  
Mochammad Agus Afrianto ◽  
Meditya Wasesa

Background: The literature on peer-to-peer accommodation has focused substantially on the price determinants of accommodation listings. Developing prediction models related to the demand for accommodation listings is vital in revenue management because accurate price and demand forecasts help determine the best revenue management responses. Objective: This study aims to develop prediction models to determine the booking likelihood of accommodation listings. Methods: Using an Airbnb dataset, we developed four machine learning models, namely Logistic Regression, Decision Tree, K-Nearest Neighbor (KNN), and Random Forest classifiers. We assessed the models using the AUC-ROC score and the model development time, applying the ten-fold three-way split and the ten-fold cross-validation procedures. Results: In terms of average AUC-ROC score, the Random Forest classifier outperformed the other evaluated models. In the three-way split procedure, it had a 15.03% higher AUC-ROC score than Decision Tree, 2.93% higher than KNN, and 2.38% higher than Logistic Regression. In the cross-validation procedure, it had a 26.99% higher AUC-ROC score than Decision Tree, 4.41% higher than KNN, and 3.31% higher than Logistic Regression. It should be noted that the Decision Tree model has the lowest AUC-ROC score but the smallest model development time. Conclusion: The Random Forest model performs best in predicting the booking likelihood of accommodation listings. The model can be used by peer-to-peer accommodation owners to improve their revenue management responses.
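The ten-fold cross-validation procedure mentioned above can be sketched as a simple index splitter: the data are divided into ten folds, and each fold serves once as the test set while the remaining nine are used for training. This is a generic, unshuffled illustration under assumed names (`k_fold_indices`), not the study's pipeline; in practice the rows would usually be shuffled first and an AUC-ROC score computed on each test fold and then averaged.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Fold sizes differ by at most one; no shuffling is performed.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    remainder = n_samples % k
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < remainder else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Example: 25 hypothetical listings split into ten folds.
folds = list(k_fold_indices(25, k=10))
test_sizes = [len(test) for _, test in folds]
```

Each listing appears in exactly one test fold, so every observation contributes to the averaged score exactly once.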

