“Are we tweeting our real selves?” personality prediction of Indian Twitter users using deep learning ensemble model

2021 ◽  
pp. 107101
Author(s):  
Rhea Mahajan ◽  
Remia Mahajan ◽  
Eishita Sharma ◽  
Vibhakar Mansotra
Author(s):  
Ahmet Haşim Yurttakal ◽  
Hasan Erbay ◽  
Türkan İkizceli ◽  
Seyhan Karaçavuş ◽  
Cenker Biçer

Breast cancer, which develops from cells in the breast tissue, is the most common cancer among women. Early-stage detection could reduce death rates significantly, and the stage at detection determines the treatment process. Mammography is used to discover breast cancer at an early stage, prior to any physical sign. However, mammography might return a false negative, in which case a biopsy is recommended if the chance that the lesions are cancerous is suspected to be greater than two percent. About 30 percent of biopsies result in malignancy, which means that the rate of unnecessary biopsies is high. To reduce unnecessary biopsies, Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI) has recently been used to detect breast cancer because of its excellent capability in soft tissue imaging. Nowadays, DCE-MRI is a highly recommended method not only to identify breast cancer but also to monitor its development and to interpret tumorous regions. However, in addition to being a time-consuming process, its accuracy depends on the radiologists’ experience. Radiomic data, on the other hand, are used in medical imaging and have the potential to extract disease characteristics that cannot be seen by the naked eye. Radiomics are hard-coded features and provide crucial information about the disease where it is imaged. Conversely, deep learning methods like convolutional neural networks (CNNs) learn features automatically from the dataset. Especially in medical imaging, CNNs perform better than methods based on hard-coded features. However, combining the power of these two types of features increases accuracy significantly, which is especially critical in medicine. Herein, a stacked ensemble of gradient boosting and deep learning models was developed to classify breast tumors using DCE-MRI images. The model makes use of radiomics acquired from pixel information in breast DCE-MRI images.
Prior to training the model, factor analysis was applied to the radiomics to refine the feature set and eliminate features that were not useful. The performance metrics, as well as comparisons with some well-known machine learning methods, show that the ensemble model outperforms its counterparts. The ensemble model’s accuracy is 94.87% and its AUC value is 0.9728. The recall and precision are 1.0 and 0.9130, respectively, whereas the F1-score is 0.9545.
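
As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall. Substituting the reported precision and recall reproduces the stated F1-score. The symbols P and R are introduced here only for illustration.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9130 \times 1.0}{0.9130 + 1.0}
    = \frac{1.8260}{1.9130}
    \approx 0.9545
```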


Author(s):  
Yu Zhu

The objective is to predict and analyze the behaviors of users on social network platforms by using personality theory and computational technologies, thereby acquiring the personality characteristics of social network users more effectively. First, social network data are analyzed, which shows that text is the dominant data type. By using data mining technology, the raw data of numerous social network users can be obtained. Based on the random walk model, the text status information of social network users is analyzed, and a user personality prediction method integrating multi-label learning is proposed. In addition, the online social network platform Weibo is taken as the research object. The blog information of Weibo users is obtained through crawler technology. Then, the users are labeled in accordance with their personality characteristics. The Pearson correlation coefficient is used to evaluate the relation between the personality characteristics and the behavior characteristics of Weibo users. The correlation between the network behaviors and personality characteristics of Weibo users is analyzed, and the validity of the prediction method is verified using the Big Five Model of Personality. By applying relevant data mining and deep learning technologies and algorithms, the ability of neural networks to learn data characteristics can be improved. In analyzing the text information of social network users, the user personality prediction method of integrated multi-label learning based on the random walk model has a large advantage. For the problem of predicting the personality of social network users, combining data mining technology with deep neural network technology makes the results of processing user behavior data more accurate.
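
The Pearson correlation coefficient mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `pearson` and the sample trait scores and posting counts are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalised because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: a trait score and weekly post count for five users.
extraversion = [2.0, 3.5, 4.0, 1.5, 5.0]
posts_per_week = [3, 8, 9, 2, 14]
r = pearson(extraversion, posts_per_week)
```

A value of `r` close to 1 or -1 indicates a strong linear relation between the trait and the behavior; a value near 0 indicates none.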


Energies ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 3020
Author(s):  
Anam-Nawaz Khan ◽  
Naeem Iqbal ◽  
Atif Rizwan ◽  
Rashid Ahmad ◽  
Do-Hyeun Kim

Due to the availability of smart metering infrastructure, high-resolution electric consumption data are readily available to study the dynamics of residential electric consumption at finely resolved spatial and temporal scales. Analyzing the electric consumption data enables policymakers and building owners to understand consumers’ demand-consumption behaviors. Furthermore, analysis and accurate forecasting of electric consumption are essential for consumer involvement in time-of-use tariffs, critical peak pricing, and consumer-specific demand response initiatives. Alongside its vast economic and sustainability implications, such as energy wastage and decarbonization of the energy sector, accurate consumption forecasting facilitates power system planning and stable grid operations. Energy consumption forecasting is an active research area; despite the abundance of devised models, electric consumption forecasting in residential buildings remains challenging due to the high variability of occupant energy use behavior. Hence, the search for an appropriate model for accurate electric consumption forecasting continues. To this end, this paper presents a spatial and temporal ensemble forecasting model for short-term electric consumption forecasting. The proposed work involves exploring electric consumption profiles at the apartment level through cluster analysis based on the k-means algorithm. The ensemble forecasting model consists of two deep learning models: Long Short-Term Memory Unit (LSTM) and Gated Recurrent Unit (GRU). First, the apartment-level historical electric consumption data are clustered. Then the clusters are aggregated based on the consumption profiles of consumers. At the building and floor level, the ensemble models are trained using aggregated electric consumption data. The proposed ensemble model forecasts electric consumption at three spatial scales (apartment, building, and floor level) for hourly, daily, and weekly forecasting horizons.
Furthermore, the impact of spatial-temporal granularity and cluster analysis on the prediction accuracy is analyzed. The dataset used in this study comprises high-resolution electric consumption data acquired through smart meters and recorded on an hourly basis over a period of one year. The consumption data belong to four multifamily residential buildings situated in an urban area of South Korea. To demonstrate the effectiveness of our proposed forecasting model, we compared it with widely known machine learning models and deep learning variants. The results achieved by our proposed ensemble scheme verify that the model has learned the sequential behavior of electric consumption, producing superior performance with the lowest MAPE of 4.182 and 4.54 at building and floor level prediction, respectively. The experimental findings suggest that the model has efficiently captured the dynamic electric consumption characteristics to exploit ensemble model diversities and achieved lower forecasting error. The proposed ensemble forecasting scheme is well suited for predictive modeling and short-term load forecasting.
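
The clustering step named in the abstract, k-means on apartment-level consumption profiles, can be sketched as follows. This is a minimal illustration under stated assumptions: the three-value profiles are hypothetical, and seeding the centroids with the first k points is a simplification chosen for reproducibility, not the authors' configuration.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile goes to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical apartment profiles: mean kWh in (morning, afternoon, evening).
profiles = [
    [0.2, 0.3, 1.1],
    [1.0, 0.9, 0.4],
    [0.3, 0.2, 1.3],
    [1.1, 1.0, 0.5],
]
labels, centroids = kmeans(profiles, k=2)
```

Apartments that share a label have similar load shapes, so their consumption could then be aggregated per cluster before training a forecaster.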


Author(s):  
Mohammad Shorfuzzaman ◽  
M. Shamim Hossain ◽  
Abdulmotaleb El Saddik

Diabetic retinopathy (DR) is one of the most common causes of vision loss in people who have had diabetes for a prolonged period. Convolutional neural networks (CNNs) have become increasingly popular for computer-aided DR diagnosis using retinal fundus images. While these CNNs are highly reliable, their lack of sufficient explainability prevents them from being widely used in medical practice. In this article, we propose a novel explainable deep learning ensemble model where weights from different models are fused into a single model to extract salient features from various retinal lesions found on fundus images. The extracted features are then fed to a custom classifier for the final diagnosis of DR severity level. The model is trained on the APTOS dataset, which contains retinal fundus images of various DR grades, using a cyclical learning rate strategy with an automatic learning rate finder for decaying the learning rate to improve model accuracy. We develop an explainability approach by leveraging gradient-weighted class activation mapping and Shapley additive explanations to highlight the areas of fundus images that are most indicative of different DR stages. This allows ophthalmologists to view our model's decision in a way that they can understand. Evaluation results using three different datasets (APTOS, MESSIDOR, IDRiD) show the effectiveness of our model, achieving superior classification rates with a high degree of precision (0.970), sensitivity (0.980), and AUC (0.978). We believe that the proposed model, which jointly offers state-of-the-art diagnosis performance and explainability, will address the black-box nature of deep CNN models in robust detection of DR grading.
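
The abstract mentions a cyclical learning rate strategy without naming the policy. The sketch below shows one common variant, the triangular schedule, purely as an illustration. The function name `triangular_lr` and the bounds and cycle length are assumptions, not values from the paper.

```python
import math


def triangular_lr(step, base_lr=1e-4, max_lr=1e-2, step_size=100):
    """Triangular cyclical learning rate (assumed policy, hypothetical bounds).

    The rate climbs linearly from base_lr to max_lr over step_size steps,
    then falls back to base_lr over the next step_size steps, and repeats.
    """
    cycle = math.floor(1 + step / (2 * step_size))
    # x is 1 at the start and end of a cycle and 0 at its midpoint.
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Learning rate at a few training steps across one and a half cycles.
schedule = [triangular_lr(s) for s in (0, 50, 100, 150, 200, 300)]
```

In practice the lower and upper bounds are often picked from a learning rate range test, which is one reading of the "automatic learning rate finder" mentioned above.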


2018 ◽  
Vol 33 (1) ◽  
pp. A-H51_1-11 ◽  
Author(s):  
Koichiro Tamura ◽  
Katsuya Uenoyama ◽  
Shuhei Iitsuka ◽  
Yutaka Matsuo

2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Juhong Namgung ◽  
Siwoon Son ◽  
Yang-Sae Moon

In recent years, cyberattacks using command and control (C&C) servers have significantly increased. To hide their C&C servers, attackers often use a domain generation algorithm (DGA), which automatically generates domain names for the C&C servers. Accordingly, extensive research on DGA domain detection has been conducted. However, existing methods cannot accurately detect continuously generated DGA domains and can easily be evaded by an attacker. Recently, long short-term memory (LSTM)-based deep learning models have been introduced to detect DGA domains in real time using only domain names, without feature extraction or additional information. In this paper, we propose an efficient DGA domain detection method based on bidirectional LSTM (BiLSTM), which learns bidirectional information as opposed to the unidirectional information learned by LSTM. We further maximize the detection performance with a convolutional neural network (CNN) + BiLSTM ensemble model using an attention mechanism, which allows the model to learn both local and global information in a domain sequence. Experimental results show that existing CNN and LSTM models achieved F1-scores of 0.9384 and 0.9597, respectively, while the proposed BiLSTM and ensemble models achieved higher F1-scores of 0.9618 and 0.9666, respectively. In addition, the ensemble model achieved the best performance for most DGA domain classes, enabling more accurate DGA domain detection than existing models.
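
Models that work on domain names alone need the raw string turned into a fixed-length integer sequence first. The sketch below shows one common way to do that. The vocabulary, the id assignment, and the `max_len` value are assumptions made for illustration, not the authors' preprocessing.

```python
import string

# Hypothetical vocabulary: lowercase letters, digits, hyphen and dot.
ALPHABET = string.ascii_lowercase + string.digits + "-."
PAD_ID = 0                                   # reserved for padding
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
UNKNOWN_ID = len(ALPHABET) + 1               # any character outside ALPHABET


def encode_domain(domain, max_len=20):
    """Map a domain name to a fixed-length list of character ids."""
    ids = [CHAR_TO_ID.get(ch, UNKNOWN_ID) for ch in domain.lower()[:max_len]]
    # Right-pad short names so every sequence has the same length.
    return ids + [PAD_ID] * (max_len - len(ids))


encoded = encode_domain("ab.c", max_len=6)
```

A sequence like `encoded` would typically pass through an embedding layer before reaching the recurrent or convolutional layers.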


Author(s):  
Yiwei Li ◽  
G Brian Golding ◽  
Lucian Ilie

Motivation: Proteins usually perform their functions by interacting with other proteins, which is why accurately predicting protein–protein interaction (PPI) binding sites is a fundamental problem. Experimental methods are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods. Results: We propose DEep Learning Prediction of Highly probable protein Interaction sites (DELPHI), a new sequence-based deep learning suite for PPI-binding site prediction. DELPHI has an ensemble structure which combines a CNN and an RNN component with a fine-tuning technique. Three novel features, HSP, position information and ProtVec, are used in addition to nine existing ones. We comprehensively compare DELPHI to nine state-of-the-art programmes on five datasets, and DELPHI outperforms the competing methods in all metrics even though its training dataset shares the least similarities with the testing datasets. In the most important metrics, AUPRC and MCC, it surpasses the second best programmes by as much as 18.5% and 27.7%, respectively. We also demonstrate that the improvement is essentially due to using the ensemble model and, especially, the three new features. Using DELPHI, it is shown that there is a strong correlation between protein-binding residues (PBRs) and sites with strong evolutionary conservation. In addition, DELPHI’s predicted PBR sites closely match known data from Pfam. DELPHI is available as open-source standalone software and as a web server. Availability and implementation: The DELPHI web server can be found at delphi.csd.uwo.ca/, with all datasets and results in this study. The trained models, the DELPHI standalone source code, and the feature computation pipeline are freely available at github.com/lucian-ilie/DELPHI. Supplementary information: Supplementary data are available at Bioinformatics online.
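
MCC, one of the two metrics highlighted above, is the Matthews correlation coefficient. The sketch below computes it from binary confusion-matrix counts using the standard definition. The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # By convention, MCC is reported as 0 when any marginal is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for residues predicted as binding or non-binding.
score = mcc(tp=40, fp=10, tn=130, fn=20)
```

MCC ranges from -1 to 1 and stays informative when classes are imbalanced, as they usually are for binding sites, where most residues are non-binding.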


Symmetry ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 1942
Author(s):  
Pyae Pyae Phyo ◽  
Yung-Cheol Byun

Energy producers are required to generate the right amount of energy to meet the energy requirements on the end-user side. Consequently, energy prediction plays an essential role in the electric industrial zone. In this paper, we propose a hybrid ensemble deep learning model, which combines multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM), and hybrid CNN-LSTM to improve the forecasting performance. These DL architectures are more popular and perform better than other machine learning (ML) models for time series electrical load prediction. Therefore, hourly energy data are collected from Jeju Island, South Korea, and used for forecasting. We considered external features associated with meteorological conditions affecting energy. Two years of training data and one year of testing data are preprocessed and arranged into time series, which are then used to train each DL model. The forecasting results of the proposed ensemble model are evaluated by using mean square error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The error metrics are compared with those of stand-alone DL models such as MLP, CNN, LSTM, and CNN-LSTM. Our ensemble model performs better than the other forecasting models, achieving a minimum MAPE of 0.75%, and was proven to be inherently symmetric for forecasting time-series energy and demand data, which is of utmost concern to the power system sector.
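
The three error metrics named above follow directly from their standard definitions. The sketch below is illustrative only: the load values are hypothetical, and MAPE is returned as a percentage.

```python
def mse(actual, predicted):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical hourly load (MW) and a model's forecast for the same hours.
actual_load = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
errors = (mse(actual_load, forecast), mae(actual_load, forecast), mape(actual_load, forecast))
```

MSE penalizes large misses more heavily, MAE is in the same unit as the load, and MAPE is scale-free, which is why forecasts for different regions are often compared on MAPE.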


Personality is important for many types of cooperation; it has been useful in predicting job achievement, professional and emotional relationship success, and even preferences for different interfaces. To assess users’ personalities accurately, a personality test must be carried out. In many areas of online life, however, it is usually impractical to administer such a test. We used SVM classification, the Random Forest algorithm, the Naïve Bayes algorithm and Logistic Regression to predict users’ personalities and compare the results. The main goal of the paper is to evaluate the machine learning models using four parameters (accuracy, precision, recall, and F1 score); based on these parameters, the best machine learning model will be used to classify the Big Five personality traits of Twitter users.
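
The four evaluation parameters can be computed from true and predicted labels as sketched below. The sketch treats one trait as a binary label (1 = trait present), which is an assumption made for illustration; the labels themselves are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for one trait across eight users.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Running the same function on each classifier's predictions gives a like-for-like comparison across the four models.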

