Atomtransmachine: An atomic feature representation model for machine learning

Background: Revealing the subcellular location of a newly discovered protein can provide insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive, so computational methods that identify these locations efficiently and effectively are highly desirable. In particular, the rapidly increasing number of protein sequences entering the genome databases calls for automated analysis methods. Methods: In this review, we describe recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) protein subcellular location benchmark dataset construction, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, and v) web servers. Results & Conclusion: Given the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. The first is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from the sequences and structures of proteins. The second is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of proteins with multiple location sites.


Cancer Research Trend Analysis Based on Fusion Feature Representation

Entropy ◽

10.3390/e23030338 ◽

2021 ◽

Vol 23 (3) ◽

pp. 338

Author(s):

Jingqiao Wu ◽

Xiaoyue Feng ◽

Renchu Guan ◽

Yanchun Liang

Keyword(s):

Cancer Research ◽

Trend Analysis ◽

Research Trends ◽

Feature Representation ◽

Research Trend ◽

Good Representation ◽

Text Representation ◽

Classical Text ◽

Representation Model ◽

Text Feature

Machine learning models can automatically discover biomedical research trends and promote the dissemination of information and knowledge. Text feature representation is a critical and challenging task in natural language processing. Most methods of text feature representation are based on word representation. A good representation can capture semantic and structural information. In this paper, two fusion algorithms are proposed, namely, the Tr-W2v and Ti-W2v algorithms. They are based on the classical text feature representation model and take the importance of words into account. The results show that the two fusion text representation models are more effective than the classical text representation model, and that the Tr-W2v algorithm gives the best results. Furthermore, based on the Tr-W2v algorithm, trend analyses of cancer research are conducted, including correlation analysis, keyword trend analysis, and improved keyword trend analysis. Discovering research trends and the evolution of hotspots for cancers can help doctors and biological researchers collect information and can provide guidance for further research.
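The abstract does not say how word importance is combined with the word vectors. A minimal sketch of one common fusion, assuming that each document vector is an importance-weighted average of its word vectors, is given below. The function name, the toy vectors, and the toy weights are all hypothetical; in practice the weights might come from TF-IDF or TextRank scores and the vectors from a trained Word2vec model, but that mapping to Ti-W2v and Tr-W2v is an assumption, not something the abstract states.

```python
from typing import Dict, List


def weighted_doc_vector(
    tokens: List[str],
    vectors: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Importance-weighted average of word vectors (illustrative only)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in vectors:
            continue  # skip out-of-vocabulary words
        w = weights.get(tok, 0.0)
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no weighted words: return the zero vector
    return [x / weight_sum for x in total]


# Hypothetical 2-dimensional word vectors and importance scores.
toy_vectors = {"tumor": [1.0, 0.0], "therapy": [0.0, 1.0], "the": [0.5, 0.5]}
toy_weights = {"tumor": 3.0, "therapy": 1.0, "the": 0.0}

doc_vec = weighted_doc_vector(["the", "tumor", "therapy"], toy_vectors, toy_weights)
print(doc_vec)
```

In this toy case the stop word "the" has zero weight, so the document vector is pulled toward "tumor", the word with the highest importance score.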


Facing Erosion Identification in Railway Lines Using Pixel-Wise Deep-Based Approaches

Remote Sensing ◽

10.3390/rs12040739 ◽

2020 ◽

Vol 12 (4) ◽

pp. 739

Author(s):

Keiller Nogueira ◽

Gabriel L. S. Machado ◽

Pedro H. T. Gama ◽

Caio C. V. da Silva ◽

Remis Balaniuk ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

High Impact ◽

Automatic Machine ◽

Feature Representation ◽

Data Driven ◽

Maintenance Costs ◽

Crucial Step ◽

Machine Learning Methods ◽

High Resolution Images

Soil erosion is considered one of the most expensive natural hazards, with a high impact on several infrastructure assets. Among them, railway lines are one of the constructions most prone to erosion and, consequently, one of the most troublesome because of maintenance costs, risks of derailment, and so on. Therefore, it is fundamental to identify and monitor erosion in railway lines to prevent major consequences. Currently, erosion identification is performed manually by humans using huge image sets, a time-consuming task. Hence, automatic machine learning methods appear as an appealing alternative. A crucial step for automatic erosion identification is to create a good feature representation. Toward this objective, deep learning can learn data-driven features and classifiers. In this paper, we propose a novel deep learning-based framework capable of performing erosion identification in railway lines. Six techniques were evaluated, and the best one, Dynamic Dilated ConvNet, was integrated into this framework, which was then encapsulated in a new ArcGIS plugin to facilitate its use by non-programmers. To analyze these techniques, we also propose a new dataset composed of almost 2000 high-resolution images.


Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study (Preprint)

10.2196/preprints.15371 ◽

2019 ◽

Author(s):

Derek Howard ◽

Marta M Maslej ◽

Justin Lee ◽

Jacob Ritchie ◽

Geoffrey Woollard ◽

...

Keyword(s):

Mental Health ◽

Machine Learning ◽

Social Media ◽

Transfer Learning ◽

Computational Linguistics ◽

Feature Representation ◽

Fine Tuning ◽

Language Models ◽

Universal Sentence ◽

Text Feature

BACKGROUND Mental illness affects a significant portion of the worldwide population. Online mental health forums can provide a supportive environment for those afflicted and also generate a large amount of data that can be mined to predict mental health states using machine learning methods. OBJECTIVE This study aimed to benchmark multiple methods of text feature representation for social media posts and compare their downstream use with automated machine learning (AutoML) tools. We tested on datasets that contain posts labeled for perceived suicide risk or moderator attention in the context of self-harm. Specifically, we assessed the ability of the methods to prioritize posts that a moderator would identify for immediate response. METHODS We used 1588 labeled posts from the Computational Linguistics and Clinical Psychology (CLPsych) 2017 shared task collected from the Reachout.com forum. Posts were represented using lexicon-based tools, including Valence Aware Dictionary and sEntiment Reasoner, Empath, and Linguistic Inquiry and Word Count, and also using pretrained artificial neural network models, including DeepMoji, Universal Sentence Encoder, and Generative Pretrained Transformer-1 (GPT-1). We used Tree-based Optimization Tool and Auto-Sklearn as AutoML tools to generate classifiers to triage the posts. RESULTS The top-performing system used features derived from the GPT-1 model, which was fine-tuned on over 150,000 unlabeled posts from Reachout.com. Our top system had a macroaveraged F1 score of 0.572, providing a new state-of-the-art result on the CLPsych 2017 task. This was achieved without additional information from metadata or preceding posts. Error analyses revealed that this top system often misses expressions of hopelessness. In addition, we have presented visualizations that aid in the understanding of the learned classifiers. 
CONCLUSIONS In this study, we found that transfer learning is an effective strategy for predicting risk with relatively little labeled data and noted that fine-tuning of pretrained language models provides further gains when large amounts of unlabeled text are available.
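The headline result above is a macroaveraged F1 score. As a rough illustration of how that metric is computed, the sketch below takes the unweighted mean of per-class F1 scores over every label that appears. The label names and predictions are hypothetical toy values, and the shared task may restrict the average to a subset of classes, which this sketch does not attempt to reproduce.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical triage labels for four posts.
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "a"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because every class counts equally, a rare class that is never predicted correctly (class "c" here) lowers the score as much as a frequent one would.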


Resampled dimensional reduction for feature representation in machine learning

10.21203/rs.3.pex-1636/v1 ◽

2021 ◽

Author(s):

Herdiantri Sufriyana ◽

Yu Wei Wu ◽

Emily Chia-Yu Su

Keyword(s):

Machine Learning ◽

Parameter Estimation ◽

Prediction Model ◽

Sample Size ◽

Dimensional Reduction ◽

Latent Variables ◽

Feature Representation ◽

Estimated Parameters ◽

Representation Technique ◽

Selection Of

We aimed to provide a resampling protocol for dimensional reduction that results in a few latent variables. Its applicability focuses on, but is not limited to, developing a machine learning prediction model, where it increases the sample size relative to the number of candidate predictors. With this feature representation technique, one can improve generalization by preventing the latent variables from overfitting the data used to conduct the dimensional reduction. However, the technique may require more computational capacity and time. The key stages consist of deriving latent variables from multiple resampling subsets, estimating the parameters of the latent variables in the population, and selecting the latent variables transformed by the estimated parameters.
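The abstract names the stages but not the reduction method or the estimator. The sketch below is one possible reading, not the authors' protocol: it assumes principal component analysis as the reduction, bootstrap samples as the resampling subsets, and the mean loading across resamples as the population estimate. All function names and the toy data are hypothetical, and the selection stage is not shown.

```python
import random
from typing import List

Row = List[float]


def first_component(rows: List[Row], iters: int = 200) -> Row:
    """Leading principal direction of `rows`, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # A fixed start vector keeps the sign of the result consistent across resamples.
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # degenerate resample with no variance
        v = [x / norm for x in w]
    return v


def resampled_loading(rows: List[Row], n_resamples: int = 50, seed: int = 0) -> Row:
    """Average the leading loading over bootstrap resamples, then renormalize."""
    rng = random.Random(seed)
    total = [0.0] * len(rows[0])
    for _ in range(n_resamples):
        subset = [rng.choice(rows) for _ in rows]  # sample with replacement
        comp = first_component(subset)
        total = [t + c for t, c in zip(total, comp)]
    mean = [t / n_resamples for t in total]
    norm = sum(x * x for x in mean) ** 0.5
    return [x / norm for x in mean]


def project(rows: List[Row], loading: Row) -> List[float]:
    """Latent-variable scores: centered rows projected onto the loading."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [sum((r[j] - means[j]) * loading[j] for j in range(d)) for r in rows]


# Toy data lying close to the line y = 2x.
rows = [[0.0, 0.1], [1.0, 1.9], [2.0, 4.2], [3.0, 5.8],
        [4.0, 8.1], [5.0, 9.9], [6.0, 12.2], [7.0, 13.8]]
loading = resampled_loading(rows)
scores = project(rows, loading)
print(loading)
```

Averaging over resamples means that no single subset determines the loading, which is the intuition behind the overfitting claim in the abstract; the cost is one decomposition per resample.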


Sparse coding based dense feature representation model for hyperspectral image classification

Journal of Electronic Imaging ◽

10.1117/1.jei.24.6.063009 ◽

2015 ◽

Vol 24 (6) ◽

pp. 063009 ◽

Cited By ~ 5

Author(s):

Ender Oguslu ◽

Guoqing Zhou ◽

Zezhong Zheng ◽

Khan Iftekharuddin ◽

Jiang Li

Keyword(s):

Image Classification ◽

Sparse Coding ◽

Hyperspectral Image ◽

Feature Representation ◽

Hyperspectral Image Classification ◽

Representation Model


Novel Feature Representation and Machine Learning Methods in Computational Proteomics

Current Proteomics ◽

10.2174/157016461805210924161719 ◽

2021 ◽

Vol 18 (5) ◽

pp. 606-607

Author(s):

Lei Chen

Keyword(s):

Machine Learning ◽

Feature Representation ◽

Learning Methods ◽

Computational Proteomics ◽

Machine Learning Methods


Leveraging Natural Language Processing Applications Using Machine Learning

Handbook of Research on Emerging Trends and Applications of Machine Learning - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-9643-1.ch016 ◽

2020 ◽

pp. 338-360

Author(s):

Janjanam Prabhudas ◽

C. H. Pradeep Reddy

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Summarization ◽

Feature Representation ◽

Learning Models ◽

Primary Focus ◽

And Performance

The enormous increase in information, together with the computational abilities of machines, has created innovative applications in natural language processing that rely on machine learning models. This chapter surveys trends in natural language processing with machine learning models in the context of text summarization. It is organized to help researchers understand the technical perspectives on feature representation and the related models to consider before applying them to language-oriented tasks. Further, the chapter reviews the primary deep learning models, their applications, and their performance in the context of language processing. The primary focus of this chapter is to illustrate the technical research findings and gaps in text summarization based on deep learning, along with state-of-the-art deep learning models for text summarization.


Detection of Economy-Related Turkish Tweets Based on Machine Learning Approaches

10.4018/978-1-7998-8413-2.ch008 ◽

2022 ◽

pp. 171-195

Author(s):

Jale Bektaş

Keyword(s):

Machine Learning ◽

Text Mining ◽

Text Classification ◽

Integration Method ◽

Classification Problem ◽

Feature Representation ◽

Learning Approaches ◽

Machine Learning Methods ◽

Linguistic Approach ◽

Turkish Language

Conducting NLP for Turkish is considerably harder than for other Latin-script languages such as English. In this study, text mining techniques are used to build a preprocessing framework in which TF-IDF values are calculated in accordance with a linguistic approach on 7,731 tweets shared by 13 famous economists in Turkey, retrieved from Twitter. The classification results are then compared across four common machine learning methods (SVM, Naive Bayes, LR, and the integration of LR with SVM). The features represented by TF-IDF are tested with different N-grams. The findings show that the success of a text classification problem depends on the feature representation method, and that SVM outperforms the other individual ML methods with unigram feature representation. The best results are obtained with the integration of SVM and LR, with an accuracy of 82.9%. These results show that these methodologies are satisfactory for the Turkish language.
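To make the feature representation concrete, the sketch below computes TF-IDF weights over word N-grams. It uses one common variant (term frequency normalized by document length, unsmoothed logarithmic inverse document frequency) and whitespace tokenization; the chapter's exact formula and its linguistic preprocessing for Turkish are not given in the abstract, so both are assumptions. The function names and the English toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Contiguous word n-grams, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs: List[str], n: int = 1) -> List[Dict[str, float]]:
    """One {n-gram: weight} dictionary per document."""
    grams = [ngrams(doc.lower().split(), n) for doc in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for doc in grams for g in set(doc))
    result = []
    for doc in grams:
        tf = Counter(doc)
        result.append({
            g: (count / len(doc)) * math.log(len(docs) / df[g])
            for g, count in tf.items()
        })
    return result


# Hypothetical toy corpus standing in for the tweets.
docs = [
    "inflation rises again",
    "interest rates rise",
    "inflation and interest rates",
]
unigram = tfidf(docs, n=1)
bigram = tfidf(docs, n=2)
print(unigram[0])
print(bigram[1])
```

Switching `n` changes only the vocabulary, so the same classifier can be trained on unigram, bigram, or higher-order features and the results compared, as the study describes.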


Sentiment Analysis in Tamil Texts: A Study on Machine Learning Techniques and Feature Representation

2019 14th Conference on Industrial and Information Systems (ICIIS) ◽

10.1109/iciis47346.2019.9063341 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sajeetha Thavareesan ◽

Sinnathamby Mahesan

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Feature Representation ◽

Machine Learning Techniques ◽

Learning Techniques
