IIMLP: integrated information-entropy-based method for LncRNA prediction

Background: The prediction of long non-coding RNA (lncRNA) has attracted great attention from researchers, as more and more evidence indicates that various complex human diseases are closely related to lncRNAs. In the era of biomedical big data, in addition to the prediction of lncRNAs by biological experimental methods, many computational methods based on machine learning have been proposed to make better use of the sequence resources of lncRNAs. Results: We developed a lncRNA prediction method by integrating information-entropy-based features and machine learning algorithms. We calculate generalized topological entropy and generate 6 novel features for lncRNA sequences. By employing these 6 features and other features such as open reading frame, we apply support vector machine, XGBoost and random forest algorithms to distinguish human lncRNAs. We compare our method with one that uses more k-mer features, and the results show that our method achieves a higher area under the curve, up to 99.7905%. Conclusions: We develop an accurate and efficient method with novel information entropy features to analyze and classify lncRNAs. Our method can also be extended to research on other functional elements in DNA sequences.
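To make the idea of an entropy-based sequence feature concrete, here is a minimal sketch. It does not implement the generalized topological entropy used in the paper; as a simpler stand-in it computes the Shannon entropy of a sequence's k-mer distribution. The function names `kmer_entropy` and `entropy_features` and the choice of k values are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy (in bits) of the k-mer distribution of seq.

    Stand-in for illustration only; not the paper's generalized
    topological entropy.
    """
    seq = seq.upper()
    # Slide a window of length k over the sequence.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0  # sequence shorter than k
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def entropy_features(seq: str, ks=(1, 2, 3)) -> list:
    """One entropy value per k (hypothetical feature vector)."""
    return [kmer_entropy(seq, k) for k in ks]


if __name__ == "__main__":
    print(entropy_features("ACGTACGTTTGA"))
```

A vector like this could be concatenated with other features (for example, open reading frame length) before being passed to a classifier.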


Clinical Score and Machine Learning-Based Model to Predict Diagnosis of Primary Aldosteronism in Arterial Hypertension

Hypertension ◽

10.1161/hypertensionaha.121.17444 ◽

2021 ◽

Vol 78 (5) ◽

pp. 1595-1604

Author(s):

Fabrizio Buffolo ◽

Jacopo Burrello ◽

Alessio Burrello ◽

Daniel Heinrich ◽

Christian Adolf ◽

...

Keyword(s):

Machine Learning ◽

Arterial Hypertension ◽

Primary Aldosteronism ◽

Learning Algorithm ◽

Area Under The Curve ◽

Clinical Score ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Individual Risk ◽

The Individual

Primary aldosteronism (PA) is the cause of arterial hypertension in 4% to 6% of patients, and 30% of patients with PA are affected by unilateral and surgically curable forms. Current guidelines recommend screening for PA ≈50% of patients with hypertension on the basis of individual factors, while some experts suggest screening all patients with hypertension. To define the risk of PA and tailor the diagnostic workup to the individual risk of each patient, we developed a conventional scoring system and supervised machine learning algorithms using a retrospective cohort of 4059 patients with hypertension. On the basis of 6 widely available parameters, we developed a numerical score and 308 machine learning-based models, selecting the one with the highest diagnostic performance. After validation, we obtained high predictive performance with our score (optimized sensitivity of 90.7% for PA and 92.3% for unilateral PA [UPA]). The machine learning-based model provided the highest performance, with an area under the curve of 0.834 for PA and 0.905 for diagnosis of UPA, with optimized sensitivity of 96.6% for PA, and 100.0% for UPA, at validation. The application of the predicting tools allowed the identification of a subgroup of patients with very low risk of PA (0.6% for both models) and null probability of having UPA. In conclusion, this score and the machine learning algorithm can accurately predict the individual pretest probability of PA in patients with hypertension and circumvent screening in up to 32.7% of patients using a machine learning-based model, without omitting patients with surgically curable UPA.
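The two metrics reported in this abstract, area under the curve and sensitivity at a chosen cutoff, can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' scoring system: the functions `auc` and `sensitivity` and the toy scores are hypothetical, and the AUC is computed by the pairwise (Mann-Whitney) formulation.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative; ties count as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity(scores, labels, cutoff):
    """Fraction of true positives whose score is at or above cutoff."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not pos:
        raise ValueError("need at least one positive")
    return sum(s >= cutoff for s in pos) / len(pos)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_scores = [0.9, 0.8, 0.4, 0.3]
    toy_labels = [1, 0, 1, 0]
    print(auc(toy_scores, toy_labels))
    print(sensitivity(toy_scores, toy_labels, cutoff=0.5))
```

Lowering the cutoff raises sensitivity at the cost of more false positives, which is the trade-off behind an "optimized sensitivity" operating point.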


The silencing of long non-coding RNA ANRIL suppresses invasion, and promotes apoptosis of retinoblastoma cells through the ATM-E2F1 signaling pathway

Bioscience Reports ◽

10.1042/bsr20180558 ◽

2018 ◽

Vol 38 (6) ◽

Cited By ~ 8

Author(s):

Yang Yang ◽

Xiao-Wei Peng

Keyword(s):

Protein Expression ◽

Signaling Pathway ◽

Inhibitory Effect ◽

Down Regulation ◽

Reading Frame ◽

Non Coding Rna ◽

Retinoblastoma Cells ◽

Human Retinoblastoma ◽

Y79 Cells ◽

Long Non Coding Rna

As one of the most common primary intraocular carcinomas, retinoblastoma generally stems from the inactivation of the retinoblastoma RB1 gene in retinal cells. Antisense non-coding RNA in the INK4 locus (ANRIL), a long non-coding RNA (lncRNA), has been reported to affect tumorigenesis and progression of various cancers, including gastric cancer and non-small cell lung cancer. However, few investigations have examined the role of ANRIL in human retinoblastoma. Hence, the current study was intended to investigate the effects of ANRIL on the proliferation, apoptosis, and invasion of retinoblastoma HXO-RB44 and Y79 cells. A lentivirus-based packaging system was designed to up-regulate ANRIL and ATM expression or to down-regulate ANRIL in human retinoblastoma cells. Afterward, ANRIL expression, mRNA and protein expression of ATM and E2F1, and protein expression of INK4b, INK4a, alternate reading frame (ARF), p53 and retinoblastoma protein (pRB) were determined in order to elucidate the regulatory effect of ANRIL on the ATM-E2F1 signaling pathway. In addition, cell viability, apoptosis, and invasion were measured. The results indicated that the down-regulation of ANRIL or up-regulation of ATM led to an increase in the expression of ATM, E2F1, INK4b, INK4a, ARF, p53, and pRB. The silencing of ANRIL or up-regulation of ATM exerted an inhibitory effect on proliferation and invasion while promoting apoptosis of HXO-RB44 and Y79 cells. In conclusion, the key observations of our study demonstrated that ANRIL depletion could act to suppress retinoblastoma progression by activating the ATM-E2F1 signaling pathway. These results provide a potentially promising basis for the targeted intervention treatment of human retinoblastoma.


Mean Received Resources Meet Machine Learning Algorithms to Improve Link Prediction Methods

Information ◽

10.3390/info13010035 ◽

2022 ◽

Vol 13 (1) ◽

pp. 35

Author(s):

Jibouni Ayoub ◽

Dounia Lotfi ◽

Ahmed Hammouch

Keyword(s):

Machine Learning ◽

Link Prediction ◽

Learning Algorithms ◽

Area Under The Curve ◽

Machine Learning Algorithms ◽

Actual State ◽

The Future ◽

Auc Value ◽

The Mean ◽

Analysis Of Social Networks

The analysis of social networks has attracted a lot of attention during the last two decades. These networks are dynamic: links appear and disappear over time. Link prediction is the problem of inferring links that will appear in the future from the current state of the network. We use information from nodes and edges and calculate the similarity between users. The more similar two users are, the higher the probability that they will connect in the future. Similarity metrics therefore play an important role in the link prediction field. Because such metrics are simple and flexible, many authors have proposed several of them, such as Jaccard, AA, and Katz, and evaluated them using the area under the curve (AUC). In this paper, we propose a new parameterized method to enhance the AUC value of the link prediction metrics by combining them with the mean received resources (MRRs). Experiments show that the proposed method improves the performance of the state-of-the-art metrics. Moreover, we used machine learning algorithms to classify links and confirm the efficiency of the proposed combination.
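Two of the similarity metrics named in this abstract can be sketched in a few lines. The example below assumes that "AA" refers to the Adamic-Adar index and represents the network as a dictionary mapping each node to its set of neighbors; the toy graph and function names are hypothetical, and the paper's mean received resources (MRR) combination is not implemented here.

```python
from math import log


def jaccard(adj, u, v):
    """Shared neighbors divided by the union of both neighborhoods."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)


def adamic_adar(adj, u, v):
    """Sum of 1 / ln(degree) over common neighbors, so that rarely
    connected common neighbors contribute more."""
    return sum(
        1.0 / log(len(adj[z]))
        for z in adj[u] & adj[v]
        if len(adj[z]) > 1  # guard against ln(1) = 0
    )


# Toy undirected graph with edges a-b, a-c, b-c, c-d (illustrative only).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

if __name__ == "__main__":
    print(jaccard(adj, "a", "d"))
    print(adamic_adar(adj, "a", "d"))
```

Scores like these are computed for currently unconnected node pairs, and the pairs are then ranked; the AUC measures how often a pair that later connects is ranked above one that does not.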


Nutritional biomarkers and machine learning for personalized nutrition applications and health optimization

Intelligent Decision Technologies ◽

10.3233/idt-210233 ◽

2021 ◽

pp. 1-9

Author(s):

Dimitrios P. Panagoulias ◽

Dionisios N. Sotiropoulos ◽

George A. Tsihrintzis

Keyword(s):

Machine Learning ◽

Physiological State ◽

Prediction Method ◽

Disease Diagnosis ◽

Classification Model ◽

Cell Physiology ◽

Cellular Processes ◽

General Evaluation ◽

Model Training ◽

The One

The doctrine of the “one size fits all” approach in the field of disease diagnosis and patient management is being replaced by a more per-patient approach known as “personalized medicine”. In this spirit, biomarkers are key variables in the research and development of new methods for prognostic and classification model training based on advances in the field of artificial intelligence [1, 2, 3]. Metabolomics refers to the systematic study of the unique chemical fingerprints that cellular processes leave behind. The metabolic profile of a person can provide a snapshot of cell physiology and, by extension, metabolomics provides a direct “functional reading of the physiological state” of an organism. By employing machine learning methodologies, a general evaluation chart of nutritional biomarkers is formulated and an optimised prediction method for body mass index is investigated with the aim of discovering dietary patterns.


Why segmentation matters: a Machine Learning approach for predicting loan defaults in the Peer-to-Peer (P2P) Financial Ecosystem

Risk Management Magazine ◽

10.47473/2020rmm0089 ◽

2021 ◽

Vol 16 (2) ◽

pp. 35-49

Author(s):

Adamaria Perrotta ◽

Georgios Bliatsios

Keyword(s):

Machine Learning ◽

Area Under The Curve ◽

Peer To Peer ◽

Weight Of Evidence ◽

K Nearest Neighbors ◽

Default Prediction ◽

Sequential Feature Selection ◽

Loan Defaults ◽

The One ◽

Online Lending

Peer-to-Peer (P2P) lending is an online lending process allowing individuals to obtain or concede loans without the involvement of traditional financial intermediaries. It has grown quickly in recent years, with some platforms reaching billions of dollars of loans in principal in a short amount of time. Since each loan is associated with the probability of loss due to a borrower's failure, this paper addresses the borrower's default prediction problem in the P2P financial ecosystem. The main assumption, which makes this study different from the available literature, is that borrowers sharing the same homeownership status display a similar risk profile, and thus a model per segment should be developed. We estimate the Probability of Default (PD) of a borrower by using Logistic Regression (LR) coupled with Weight of Evidence encoding. The feature set is identified via Sequential Feature Selection (SFS). We compare forward against backward SFS in terms of the Area Under the Curve (AUC), and we choose the one that maximizes this statistic. Finally, we compare the results of the chosen LR approach against two other popular Machine Learning (ML) techniques: k Nearest Neighbors (k-NN) and Random Forest (RF).
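Weight of Evidence encoding, mentioned in this abstract, replaces each category of a feature with a log-ratio of its share of non-defaults to its share of defaults. The sketch below is a generic version under stated assumptions, not the authors' code: the sign convention (non-defaults over defaults), the additive `smoothing` term, and the category labels are all choices made for illustration.

```python
from collections import defaultdict
from math import log


def weight_of_evidence(categories, defaults, smoothing=0.5):
    """Map each category to ln(share of non-defaults / share of defaults).

    `smoothing` is added to every count so that a category with zero
    defaults (or zero non-defaults) does not cause division by zero.
    """
    good = defaultdict(float)  # non-defaulted loans per category
    bad = defaultdict(float)   # defaulted loans per category
    for cat, is_default in zip(categories, defaults):
        if is_default:
            bad[cat] += 1
        else:
            good[cat] += 1
    cats = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(cats)
    total_bad = sum(bad.values()) + smoothing * len(cats)
    return {
        c: log(
            ((good[c] + smoothing) / total_good)
            / ((bad[c] + smoothing) / total_bad)
        )
        for c in cats
    }


if __name__ == "__main__":
    # Hypothetical feature values and outcomes (1 = default).
    cats = ["A", "A", "A", "B", "B", "B"]
    outcome = [0, 0, 1, 1, 1, 0]
    print(weight_of_evidence(cats, outcome))
```

Under this convention a positive value marks a category with proportionally fewer defaults. The encoded values are numeric and roughly linear in the log-odds, which is why the encoding pairs naturally with logistic regression.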


The Role of Tumor-related LncRNA PART1 in cancer

Current Pharmaceutical Design ◽

10.2174/1381612827666210705161955 ◽

2021 ◽

Vol 27 ◽

Author(s):

Jinlan Chen ◽

Enqing Meng ◽

Yexiang Lin ◽

Yujie Shen ◽

Chengyu Hu ◽

...

Keyword(s):

Molecular Mechanisms ◽

Malignant Tumors ◽

Migration And Invasion ◽

Signal Pathways ◽

Toll Like Receptor ◽

Non Coding Rna ◽

Regulated Expression ◽

The One ◽

Long Non Coding Rna

Background: Long non-coding RNA (lncRNA) is known to affect tumor progression, a topic that has attracted a surge of research interest in recent years. It can also affect the growth, migration, and invasion of tumors. Abnormal expression of lncRNA is associated with malignant tumors. In addition, lncRNA has been shown to be a key target for the treatment of some diseases. PART1, a lncRNA, has been reported as a regulator in the process of tumor occurrence and development. This study aims to reveal the biological functions, specific mechanisms, and clinical significance of PART1 in various tumor cells. Methods: Through a careful search of PubMed, the mechanisms of the effect of PART1 on tumorigenesis and development are summarized. Results: On the one hand, the up-regulated expression of PART1 plays a tumor-promoting role in tumors including lung cancer, prostate cancer, and bladder cancer. On the other hand, PART1 is down-regulated in gastric cancer, glioma and other tumors, where it plays a tumor inhibitory role. In addition, PART1 regulates tumor growth mainly by targeting microRNA such as miR-635, directly regulating the expression of proteins such as FUS/EZH2, affecting signal pathways such as the Toll-like receptor pathway, or regulating immune cells. Conclusion: PART1 is closely related to tumors by regulating a variety of molecular mechanisms. In addition, PART1 can be used as a clinical marker for the early diagnosis of tumors and plays an important role in tumor-targeted therapy.


Prediction of Adverse Events in Stable Non-Variceal Gastrointestinal Bleeding Using Machine Learning

Journal of Clinical Medicine ◽

10.3390/jcm9082603 ◽

2020 ◽

Vol 9 (8) ◽

pp. 2603 ◽

Cited By ~ 1

Author(s):

Dong-Woo Seo ◽

Hahn Yi ◽

Beomhee Park ◽

Youn-Jung Kim ◽

Dae Ho Jung ◽

...

Keyword(s):

Machine Learning ◽

High Risk ◽

Adverse Events ◽

Gastrointestinal Bleeding ◽

Area Under The Curve ◽

Scoring Systems ◽

Hemodynamic Instability ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Importance Analysis

Clinical risk-scoring systems are important for identifying patients with upper gastrointestinal bleeding (UGIB) who are at a high risk of hemodynamic instability. We developed an algorithm that predicts adverse events in patients with initially stable non-variceal UGIB using machine learning (ML). Using a prospective observational registry, 1439 out of 3363 consecutive patients were enrolled. Primary outcomes included adverse events such as mortality, hypotension, and rebleeding within 7 days. Four machine learning algorithms, namely, logistic regression with regularization (LR), random forest classifier (RF), gradient boosting classifier (GB), and voting classifier (VC), were compared with the Glasgow–Blatchford score (GBS) and Rockall scores. The RF model showed the highest accuracy and a significant improvement over conventional methods for predicting mortality (area under the curve: RF 0.917 vs. GBS 0.710), but the performance of the VC model was best for hypotension (VC 0.757 vs. GBS 0.668) and rebleeding within 7 days (VC 0.733 vs. GBS 0.694). Clinically significant variables including blood urea nitrogen, albumin, hemoglobin, platelet, prothrombin time, age, and lactate were identified by the global feature importance analysis. These results suggest that ML models will be useful early predictive tools for identifying high-risk patients with initially stable non-variceal UGIB admitted at an emergency department.


COMPUTATIONAL STUDY ON THE RUPTURE RISK IN REAL CEREBRAL ANEURYSMS WITH GEOMETRICAL AND FLUID-MECHANICAL PARAMETERS USING FSI SIMULATIONS AND MACHINE LEARNING ALGORITHMS

Journal of Mechanics in Medicine and Biology ◽

10.1142/s0219519419500143 ◽

2019 ◽

Vol 19 (03) ◽

pp. 1950014

Author(s):

ALFREDO ARANDA ◽

ALVARO VALENCIA

Keyword(s):

Machine Learning ◽

Computational Study ◽

Learning Algorithms ◽

Area Under The Curve ◽

Cerebral Aneurysms ◽

Machine Learning Algorithms ◽

Maximum Height ◽

Rupture Risk ◽

Relative Residence Time ◽

Von Mises

Fluid-mechanical and morphological parameters are recognized as major factors in the rupture risk of human aneurysms. At the same time, many machine learning tools are available to study a variety of problems in many fields. In this work, fluid–structure interaction (FSI) simulations were carried out to examine a database of 60 real saccular cerebral aneurysms (30 ruptured and 30 unruptured) using reconstructions from angiography images. With the results of the simulations and geometric analyses, we applied the analysis of variance (ANOVA) statistical test to many variables and found that aspect ratio (AR), bottleneck factor (BNF), maximum height of the aneurysms (MH), relative residence time (RRT), Womersley number (WN) and Von-Mises strain (VMS) are statistically significant and good predictors for the models. Consequently, these variables were used in five machine learning algorithms to predict the rupture risk of the aneurysms, of which adaptive boosting (AdaBoost) achieved the highest area under the curve (AUC) in the receiver operating characteristic (ROC) curve (AUC 0.944).


LncPred-IEL: A Long Non-coding RNA Prediction Method using Iterative Ensemble Learning

2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm47256.2019.8982948 ◽

2019 ◽

Cited By ~ 3

Author(s):

Yanzhen Xu ◽

Xiaohan Zhao ◽

Shuai Liu ◽

Shichao Liu ◽

Yanqing Niu ◽

...

Keyword(s):

Ensemble Learning ◽

Prediction Method ◽

Non Coding Rna ◽

Long Non Coding Rna


Radiomic Features and Machine Learning for the Discrimination of Renal Tumor Histological Subtypes: A Pragmatic Study Using Clinical-Routine Computed Tomography

Cancers ◽

10.3390/cancers12103010 ◽

2020 ◽

Vol 12 (10) ◽

pp. 3010

Author(s):

Johannes Uhlig ◽

Andreas Leha ◽

Laura M. Delonge ◽

Anna-Maria Haack ◽

Brian Shuch ◽

...

Keyword(s):

Machine Learning ◽

Computed Tomography ◽

Renal Tumor ◽

Learning Algorithms ◽

Area Under The Curve ◽

Machine Learning Algorithms ◽

Clinical Routine ◽

Tumor Subtypes ◽

Chromophobe Rcc ◽

Venous Phase

This study evaluates the diagnostic performance of radiomic features and machine learning algorithms for renal tumor subtype assessment in venous computed tomography (CT) studies from clinical routine. Patients undergoing surgical resection and histopathological assessment of renal tumors at a tertiary referral center between 2012 and 2019 were included. Preoperative venous-phase CTs from multiple referring imaging centers were segmented, and standardized radiomic features extracted. After preprocessing, class imbalance handling, and feature selection, machine learning algorithms were used to predict renal tumor subtypes using 10-fold cross validation, assessed as multiclass area under the curve (AUC). In total, n = 201 patients were included (73.7% male; mean age 66 ± 11 years), with n = 131 clear cell renal cell carcinomas (ccRCC), n = 29 papillary RCC, n = 11 chromophobe RCC, n = 16 oncocytomas, and n = 14 angiomyolipomas (AML). An extreme gradient boosting algorithm demonstrated the highest accuracy (multiclass area under the curve (AUC) = 0.72). The worst discrimination was evident for oncocytomas vs. AML and oncocytomas vs. chromophobe RCC (AUC = 0.55 and AUC = 0.45, respectively). In sensitivity analyses excluding oncocytomas, a random forest algorithm showed the highest accuracy, with multiclass AUC = 0.78. Radiomic feature analyses from venous-phase CT acquired in clinical practice with subsequent machine learning can discriminate renal tumor subtypes with moderate accuracy. The classification of oncocytomas seems to be the most complex with the lowest accuracy.
