feature vectors
Recently Published Documents





2022 ◽  
Ira S. Hofer ◽  
Marina Kupina ◽  
Lori Laddaran ◽  
Eran Halperin

Abstract Introduction: Manuscripts that have successfully used machine learning (ML) to predict a variety of perioperative outcomes often use only a limited number of features selected by a clinician. We hypothesized that techniques leveraging a broad set of features for patient laboratory results, medications, and the surgical procedure name would improve performance as compared to a more limited set of features chosen by clinicians. Methods: Feature vectors for laboratory results included 702 features in total, derived from 39 laboratory tests; medications were encoded as a binary flag for each of 126 commonly used medications; and the procedure name was converted to a vector of length 100 using the Word2Vec package. Nine models were trained: baseline features, one for each of the three types of data, baseline plus each data type, all features, and all features with a feature reduction algorithm. Results: Across both outcomes, the model that contained all features (model 8) (Mortality ROC-AUC 94.42, PR-AUC 31.0; AKI ROC-AUC 92.47, PR-AUC 76.73) was superior to models with only subsets of features. Conclusion: Featurization techniques leveraging a broad array of clinical data can improve performance of perioperative prediction models.
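
The featurization described in the Methods can be illustrated with a minimal sketch: one binary flag per medication, concatenated with the other feature blocks into a single vector. The function names, the four-drug vocabulary, and the toy values are hypothetical; the abstract does not say how the 702 laboratory features or the Word2Vec embedding were computed, so both are passed in here as precomputed lists.

```python
def medication_flags(patient_meds, vocabulary):
    """Return one 0/1 flag per vocabulary entry (vocabulary assumed lowercase)."""
    present = {m.lower() for m in patient_meds}
    return [1.0 if m in present else 0.0 for m in vocabulary]


def build_feature_vector(baseline, lab_features, patient_meds,
                         vocabulary, procedure_embedding):
    """Concatenate the feature blocks into one flat feature vector.

    lab_features and procedure_embedding are assumed to be precomputed;
    this sketch does not reproduce the study's own derivation of them.
    """
    return (
        [float(v) for v in baseline]
        + [float(v) for v in lab_features]
        + medication_flags(patient_meds, vocabulary)
        + [float(v) for v in procedure_embedding]
    )


# Toy example: a hypothetical four-drug vocabulary instead of the study's 126.
vocab = ["metoprolol", "insulin", "heparin", "furosemide"]
vector = build_feature_vector(
    baseline=[65, 1],                 # e.g. age and a binary covariate
    lab_features=[0.9, 1.1, 1.0],     # stand-in for lab-derived features
    patient_meds=["insulin", "heparin"],
    vocabulary=vocab,
    procedure_embedding=[0.0] * 5,    # stand-in for a length-100 embedding
)
```

Keeping each block as a separate argument makes it easy to train models on subsets (baseline only, baseline plus one data type, all features), which mirrors the comparison the abstract describes.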

2022 ◽  
Vol 23 (1) ◽  
Li Wang ◽  
Cheng Zhong

Abstract Background Long non-coding RNAs (lncRNAs) are related to human diseases by regulating gene expression. Identifying lncRNA-disease associations (LDAs) will contribute to the diagnosis, treatment, and prognosis of diseases. However, identifying LDAs through biological experiments is time-consuming, costly and inefficient. Therefore, the development of efficient and high-accuracy computational methods for predicting LDAs is of great significance. Results In this paper, we propose a novel computational method (gGATLDA) to predict LDAs based on a graph-level graph attention network. Firstly, we extract the enclosing subgraph of each lncRNA-disease pair. Secondly, we construct the feature vectors by integrating lncRNA similarity and disease similarity as node attributes in the subgraphs. Finally, we train a graph neural network (GNN) model by feeding the subgraphs and feature vectors to it, and use the trained GNN model to predict potential lncRNA-disease association scores. The experimental results show that our method achieves a higher area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPR), accuracy and F1-score than the state-of-the-art methods in five-fold cross-validation. Case studies show that our method can effectively identify lncRNAs associated with breast cancer, gastric cancer, prostate cancer, and renal cancer. Conclusion The experimental results indicate that our method is a useful approach for predicting potential LDAs.
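
The first step, extracting an enclosing subgraph for a lncRNA-disease pair, can be sketched as follows. This assumes one common definition (the subgraph induced by all nodes within k hops of either endpoint); the abstract does not state which definition or hop count gGATLDA uses, and the function names and toy graph are hypothetical.

```python
from collections import deque


def k_hop_nodes(adj, start, k):
    """Breadth-first search: all nodes within k hops of start (inclusive)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbour in adj.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen


def enclosing_subgraph(adj, lncrna, disease, k=1):
    """Nodes and undirected edges induced by the k-hop neighbourhoods of the pair."""
    nodes = k_hop_nodes(adj, lncrna, k) | k_hop_nodes(adj, disease, k)
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adj.get(u, ())
        if v in nodes
    }
    return nodes, edges


# Toy bipartite graph: L* are lncRNAs, D* are diseases (hypothetical data).
adj = {
    "L1": ["D1", "D2"],
    "L2": ["D1", "D3"],
    "D1": ["L1", "L2"],
    "D2": ["L1"],
    "D3": ["L2"],
}
nodes, edges = enclosing_subgraph(adj, "L1", "D1", k=1)
```

In the toy graph, D3 is two hops from L1 and from D1, so it and the edge L2-D3 fall outside the 1-hop enclosing subgraph. Node attributes such as similarity scores would then be attached to the returned nodes before the subgraph is passed to a GNN.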

2022 ◽  
Vol 13 (1) ◽  

Considerable research has been done on vehicle type classification, especially due to the success of deep learning in many image classification problems. In this research, a system incorporating hybrid features is proposed to improve the performance of vehicle type classification. The feature vectors are extracted from the pre-processed images using Gabor features, a histogram of oriented gradients and a local optimal oriented pattern. The hybrid set of features contains complementary information that could help discriminate between the classes better. Further, an ant colony optimizer is utilized to reduce the dimension of the extracted feature vectors. Finally, a deep neural network is used to classify the types of vehicles in the images. The proposed approach was tested on the MIO vision traffic camera dataset and another more challenging real-world dataset consisting of videos of multiple lanes of a toll plaza. The proposed model showed an improvement in accuracy ranging from 0.28% to 8.68% on the MIO TCD dataset when compared to well-known neural network architectures.

Information ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 15
Amirata Ghorbani ◽  
Dina Berenbaum ◽  
Maor Ivgi ◽  
Yuval Dafna ◽  
James Y. Zou

Interpretability is becoming an active research topic as machine learning (ML) models are more widely used to make critical decisions. Tabular data are one of the most commonly used modes of data in diverse applications such as healthcare and finance. Many of the existing interpretability methods used for tabular data only report feature-importance scores, either locally (per example) or globally (per model), but they do not provide interpretation or visualization of how the features interact. We address this limitation by introducing Feature Vectors, a new global interpretability method designed for tabular datasets. In addition to providing feature importance, Feature Vectors discovers the inherent semantic relationship among features via an intuitive feature visualization technique. Our systematic experiments demonstrate the empirical utility of this new method by applying it to several real-world datasets. We further provide an easy-to-use Python package for Feature Vectors.

2021 ◽  
pp. 1-12
Melesio Crespo-Sanchez ◽  
Ivan Lopez-Arevalo ◽  
Edwin Aldana-Bobadilla ◽  
Alejandro Molina-Villegas

In the last few years, text analysis has become a keystone in several domains for solving many real-world problems, such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached by means of machine learning algorithms. Most of these algorithms take as input a transformation of the text in the form of feature vectors containing an abstraction of the content. Most recent vector representations focus on the semantic component of text; however, we consider that also taking into account the lexical and syntactic components in the abstraction of content could be beneficial for learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra from the lexical, syntactic, and semantic components of text, producing an abstract image that can be processed by both text and image learning algorithms. These components come from feature vectors of text. To demonstrate the merit of our proposal, we tested it on text classification and reading complexity score prediction tasks, obtaining promising results.

2021 ◽  
Vol 2021 ◽  
pp. 1-18
Ling Shu ◽  
Jinxing Shen ◽  
Xiaoming Liu

To address the defect that multiscale amplitude-aware permutation entropy (MAAPE) can only quantify the low-frequency features of a time series and ignores the equally important high-frequency features, a novel nonlinear time series feature extraction method, hierarchical amplitude-aware permutation entropy (HAAPE), is proposed. By constructing high- and low-frequency operators, this method can extract the features of different frequency bands of a time series simultaneously, so as to avoid information loss. In view of its advantages, HAAPE is introduced into the field of fault diagnosis to extract fault features from vibration signals of rotating machinery. Combined with the pairwise feature proximity (PWFP) feature selection method and a support vector machine optimized by the gray wolf algorithm (GWO-SVM), a new intelligent fault diagnosis method for rotating machinery is proposed. In our method, firstly, HAAPE is adopted to extract the original high- and low-frequency fault features of rotating machinery. After that, PWFP is used to sort the original features, and the important features are filtered to obtain low-dimensional sensitive feature vectors. Finally, the sensitive feature vectors are input into GWO-SVM for training and testing, so as to realize the fault identification of rotating machinery. The performance of the proposed method is verified using two data sets, one for a bearing and one for a gearbox. The results show that the proposed method has clear advantages over the existing methods, and the identification accuracy reaches 100%.
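
The idea of pairing low- and high-frequency operators with an entropy measure can be sketched as below. Two assumptions are made and should be read as such: the operators are the pairwise averaging and differencing used in standard hierarchical entropy, which the abstract does not confirm for HAAPE, and plain permutation entropy is substituted for the amplitude-aware variant. All function names are hypothetical.

```python
import math
from collections import Counter


def low_op(x):
    """Low-frequency operator (assumed form): average of adjacent pairs."""
    return [(x[2 * j] + x[2 * j + 1]) / 2 for j in range(len(x) // 2)]


def high_op(x):
    """High-frequency operator (assumed form): half-difference of adjacent pairs."""
    return [(x[2 * j] - x[2 * j + 1]) / 2 for j in range(len(x) // 2)]


def hierarchical_nodes(x, levels):
    """Apply both operators recursively; level n yields 2**n sub-series."""
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [op(node) for node in nodes for op in (low_op, high_op)]
    return nodes


def permutation_entropy(x, m=3, delay=1):
    """Normalised permutation entropy (plain variant, not amplitude-aware)."""
    n_windows = len(x) - (m - 1) * delay
    if n_windows < 1:
        raise ValueError("series too short for the chosen m and delay")
    patterns = Counter()
    for i in range(n_windows):
        window = [x[i + j * delay] for j in range(m)]
        # Ordinal pattern: the index order that sorts the window.
        patterns[tuple(sorted(range(m), key=window.__getitem__))] += 1
    probs = [count / n_windows for count in patterns.values()]
    entropy = sum(-p * math.log(p) for p in probs)
    return entropy / math.log(math.factorial(m))


def hierarchical_pe_features(x, levels=2, m=3):
    """One entropy value per node at the given level: the feature vector."""
    return [permutation_entropy(node, m) for node in hierarchical_nodes(x, levels)]


# Synthetic signal mixing a slow and a fast oscillation (illustrative only).
signal = [math.sin(0.3 * i) + 0.5 * math.sin(2.9 * i) for i in range(64)]
features = hierarchical_pe_features(signal, levels=2, m=3)
```

Each decomposition level halves the series length, so the number of levels is limited by how many samples remain for the ordinal-pattern windows.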

2021 ◽  
Vol 22 (1) ◽  
Junwei Luo ◽  
Hongyu Ding ◽  
Jiquan Shen ◽  
Haixia Zhai ◽  
Zhengjiang Wu ◽  

Abstract Background Structural variations (SVs) occupy a prominent position in human genetic diversity, and deletions form an important type of SV that has been suggested to be associated with genetic diseases. Although various deletion calling methods based on long reads have been proposed, a new approach is still needed to mine features in long-read alignment information. Recently, deep learning has attracted much attention in genome analysis, and it is a promising technique for calling SVs. Results In this paper, we propose BreakNet, a deep learning method that detects deletions by using long reads. BreakNet first extracts feature matrices from long-read alignments. Second, it uses a time-distributed convolutional neural network (CNN) to integrate and map the feature matrices to feature vectors. Third, BreakNet employs a bidirectional long short-term memory (BLSTM) model to analyse the produced set of continuous feature vectors in both the forward and backward directions. Finally, a classification module determines whether a region refers to a deletion. On real long-read sequencing datasets, we demonstrate that BreakNet outperforms Sniffles, SVIM and cuteSV in terms of their F1 scores. The source code for the proposed method is available from GitHub at https://github.com/luojunwei/BreakNet. Conclusions Our work shows that deep learning can be combined with long reads to call deletions more effectively than existing methods.

2021 ◽  
Vol 2137 (1) ◽  
pp. 012067
Tong Wang ◽  
Wenan Tan ◽  
Jianxin Xue

Abstract The composition of a protein is closely correlated with its function. Therefore, a method that can automatically predict protein structure is urgently needed. A fusion encoding method combining PseAA and DC was adopted to describe the protein features. Using this encoding method to express protein sequences produces high-dimensional feature vectors. This paper uses an algorithm for reducing the feature dimension of proteins. By extracting significant feature vectors from the original feature vectors, high-dimensional feature vectors are converted to low-dimensional ones. The jackknife test is adopted as the experimental method. The results indicate that the algorithm put forward here is suitable for identifying whether a given protein is a homo-oligomer or a hetero-oligomer.

Diagnostics ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 2212
Jong-Uk Park ◽  
Erdenebayar Urtnasan ◽  
Sang-Ha Kim ◽  
Kyoung-Joung Lee

(1) Purpose: This study proposes a method for predicting cardiovascular diseases (CVDs) that can develop within ten years in patients with sleep-disordered breathing (SDB). (2) Methods: For the design and evaluation of the algorithm, the Sleep Heart Health Study (SHHS) data from 3367 participants were divided into a training set, validation set, and test set in the ratio of 5:3:2. From the data during a baseline period when patients did not have any CVD, we extracted 18 features from electrocardiography (ECG) based on signal processing methods, 30 ECG features based on artificial intelligence (AI), and ten clinical risk factors for CVD. We trained the model and evaluated it using the CVD outcomes monitored in follow-ups. The optimal feature vectors were selected through statistical analysis and support vector machine recursive feature elimination (SVM-RFE) of the extracted feature vectors. Features based on AI, a novel proposal from this study, showed excellent performance among all selected feature vectors. In addition, new parameters based on AI were possibly meaningful predictors for CVD when used in addition to the predictors for CVD that are already known. The selected features were used as inputs to the SVM-based prediction model for CVD, which classified each patient as CVD-free or as developing coronary heart disease (CHD), heart failure (HF), or stroke within ten years. (3) Results: The respective recall and precision values were 82.9% and 87.5% for CVD-free; 71.9% and 63.8% for CVD; 57.2% and 55.4% for CHD; 52.6% and 40.8% for HF; 52.4% and 44.6% for stroke. The F1-score between CVD and CVD-free was 76.5%, and it was 59.1% across the four classes. (4) Conclusion: Our results confirm the excellence of the prediction model for CVD in patients with SDB and verify the possibility of predicting, within ten years, the CVDs that may occur in patients with SDB.
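
For readers relating the reported recall and precision values to the F1-scores, the sketch below computes F1 as the harmonic mean of precision and recall and averages it over classes. The abstract does not state how its aggregate F1 was computed, so the unweighted (macro) average used here is an assumption, and its result need not match the reported figure exactly. The function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class):
    """Unweighted mean of per-class F1 (assumed aggregation, not confirmed)."""
    scores = [f1_score(p, r) for p, r in per_class.values()]
    return sum(scores) / len(scores)


# (precision, recall) pairs as fractions, taken from the Results above.
binary_task = {
    "CVD-free": (0.875, 0.829),
    "CVD": (0.638, 0.719),
}
per_class_f1 = {name: f1_score(p, r) for name, (p, r) in binary_task.items()}
binary_macro_f1 = macro_f1(binary_task)
```

Because F1 is symmetric in its two arguments, the order in which precision and recall are supplied does not change the result.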
