Protein Entity Name Recognition Using Orthographic, Morphological and Proteinhood Features

Author(s):  
Sagara Sumathipala ◽  
Koichi Yamada ◽  
Muneyuki Unehara ◽  
Izumi Suzuki

Protein name identification in text is an important and challenging fundamental precursor in biomedical information processing. For example, accurate identification of protein names affects the finding of protein-protein interactions from biomedical literature. In this paper, we present an efficient protein name identification technique based on a rich set of features: orthographic and morphological features, as well as Proteinhood features, which are newly introduced in this study. The method was evaluated on the GENIA corpus using different machine learning algorithms. The highest values for precision (92.1%), recall (86.5%) and F-measure (89.2%) were achieved with Random Forest, while reducing the training and testing time significantly. We studied and showed the impact of the Proteinhood feature in protein identification as well as the effect of tuning the parameters of the machine learning algorithm.
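
The reported F-measure can be cross-checked against the reported precision and recall. The following is a minimal sketch, assuming the F-measure here is the balanced F1 score (the harmonic mean of precision and recall); the function name `f_measure` and its `beta` parameter are illustrative and do not come from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract for Random Forest.
reported_precision = 0.921
reported_recall = 0.865

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.3f}")
```

Under that assumption, the result agrees with the 89.2% stated in the abstract to within rounding.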

2021 ◽  
Author(s):  
Liu Juan ◽  
Hayat Ali ◽  
Zihyi Yang ◽  
Xiaolie Zhang ◽  
Jing Feng

Abstract: Machine learning algorithms provide significant indications in metabolomics to predict chemical compounds in metabolic pathways and in their modules. The modules in a metabolic pathway are subnetworks of functionally related genes based on rules such as protein-protein interactions, co-regulated expression, coordinated physiological activity, and successive reaction steps. Fully functional modules are helpful for studying disease processes, for drug discovery, and for predicting missing reactions. Not all modules in a metabolic pathway are functional, because of missing reaction steps. The structural mapping of chemical compounds to pathway modules helps to understand the mechanism behind predicting an unknown reaction step. The main purpose of this paper is to predict the chemical compounds in pathway modules and their classes. We have constructed binary and multi-label classification data sets to predict pathway modules and module classes, respectively. In order to identify the pathway module and its classes, we have built an ensemble Extra Trees classifier to learn the molecular and atomic properties of chemical compounds. We have also experimented with different ensemble machine learning algorithms for the prediction of pathway modules. The overall prediction rate of the classifier is 98.59%, indicating that the Extra Trees classifier's features are more interpretable and that it has high predictive performance on various tasks.


Hoax news on social media has had a dramatic effect on our society in recent years. Many people have felt its impact: anxiety, financial loss, and loss of reputation. Therefore we need a detection system that can help reduce hoax news on social media. Hoax news classification is one of the stages in the construction of a hoax news detection system; in this study, an unsupervised learning algorithm is used to create the hoax news datasets, machine learning tools are used for data processing, and text processing is used for detection. The system then classifies the input text as hoax or not hoax. Hoax news classification in this study uses six algorithms, namely Support Vector Machine, Naïve Bayes, Decision Tree, Logistic Regression, Stochastic Gradient Descent, and Neural Network (MLP). These algorithms are compared to find the one best suited to detecting hoax news, judged by accuracy, F-measure, precision, and recall. The test results show that the NN-MLP algorithm averages 93% for accuracy, F-measure, and precision, the highest compared to the five other algorithms, while the highest recall, 94%, comes from SVM. The results of this experiment show that different classifiers perform differently, and suggest that the more hoax data is used as training data, the more accurate the system becomes.


2020 ◽  
pp. 1-12
Author(s):  
Nan Lin

Our country’s economic growth is overly dependent on government investment, and bank credit and money supply lack a strict monitoring mechanism. Therefore, rapid economic growth is always accompanied by inflation risks. To study the impact of inflation expectations, this paper combines machine learning algorithms with artificial intelligence technology and constructs a central bank information disclosure index and an inflation expectations index. Moreover, this paper performs the ADF unit root test on the data. After confirming that the data are stationary, this paper uses the Markov Regime Transfer Vector Autoregressive (MSVAR) model and a state-dependent impulse response function to test and analyze the effect of China’s central bank communication in guiding the formation of inflation expectations. The research shows that the machine learning algorithm constructed in this paper has significant effects and can provide a reference for analyzing the impact of inflation expectations.


Author(s):  
Serhii Yevseiev ◽  
Anna Goloskokova ◽  
Olexander Shmatko

This article investigated the problem of using machine learning algorithms to recognize and identify a user in a video sequence. The scientific novelty lies in the proposed improved Viola-Jones method, which will allow more efficient and faster recognition of a person's face. The practical value of the results obtained in the work is determined by the possibility of using the proposed method to create systems for human face recognition. A review of existing methods of face recognition, their main characteristics, architecture and features was carried out. Based on the study of methods and algorithms for finding faces in images, the Viola-Jones method, wavelet transform and the method of principal components were chosen. These methods are among the best in terms of the ratio of recognition efficiency and work speed. Possible modifications of the Viola-Jones method are presented. The main contribution presented in this article is an experimental study of the impact of various types of noise and the improvement of company security through the development of a computer system for recognizing and identifying users in a video sequence. During the study, the following tasks were solved:
– a model of face recognition is proposed, that is, the system automatically detects a person's face in the image (scanned photos or video materials);
– an algorithm for analyzing a face is proposed, that is, a representation of a person's face in the form of 68 modal points;
– an algorithm for creating a digital fingerprint of a face, which converts the results of facial analysis into a digital code;
– development of a match search module, that is, the module compares the faceprint with the database until a match is found.


2020 ◽  
Vol 75 (9) ◽  
pp. 2677-2680 ◽  
Author(s):  
Ed Moran ◽  
Esther Robinson ◽  
Christopher Green ◽  
Matt Keeling ◽  
Benjamin Collyer

Abstract. Background: Electronic decision support systems could reduce the use of inappropriate or ineffective empirical antibiotics. We assessed the accuracy of an open-source machine-learning algorithm trained in predicting antibiotic resistance for three Gram-negative bacterial species isolated from patients’ blood and urine within 48 h of hospital admission. Methods: This retrospective, observational study used routine clinical information collected between January 2010 and October 2016 in Birmingham, UK. Patients from whose blood or urine cultures Escherichia coli, Klebsiella pneumoniae or Pseudomonas aeruginosa was isolated were identified. Their demographic, microbiology and prescribing data were used to train an open-source machine-learning algorithm (XGBoost) in predicting resistance to co-amoxiclav and piperacillin/tazobactam. Multivariate analysis was performed to identify predictors of resistance and create a point-scoring tool. The performance of both methods was compared with that of the original prescribers. Results: There were 15 695 admissions. The AUC of the receiver operating characteristic curve for the point-scoring tools ranged from 0.61 to 0.67, and performed no better than medical staff in the selection of appropriate antibiotics. The machine-learning system performed statistically but marginally better (AUC 0.70) and could have reduced the use of unnecessary broad-spectrum antibiotics by as much as 40% among those given co-amoxiclav, piperacillin/tazobactam or carbapenems. A validation study is required. Conclusions: Machine-learning algorithms have the potential to help clinicians predict antimicrobial resistance in patients found to have a Gram-negative infection of blood or urine. Prospective studies are required to assess performance in an unselected patient cohort, understand the acceptability of such systems to clinicians and patients, and assess the impact on patient outcome.
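
The AUC values quoted above summarize how well a score ranks resistant isolates above susceptible ones. As an illustration only, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are invented for this example and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = resistant, 0 = susceptible; scores are predicted risks.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.8, 0.7, 0.6, 0.2]
print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported range of 0.61 to 0.70 in context. The pairwise loop is quadratic in the number of cases, so a sort-based formulation would be preferable for a dataset the size of the one described.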


Computers ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 113
Author(s):  
James Coe ◽  
Mustafa Atay

The research aims to evaluate the impact of race in facial recognition across two types of algorithms. We give a general insight into facial recognition and discuss four problems related to facial recognition. We review our system design, development, and architectures and give an in-depth evaluation plan for each type of algorithm and dataset, along with a look at the software and its architecture. We thoroughly explain the results and findings of our experimentation and provide analysis for the machine learning algorithms and deep learning algorithms. To conclude the investigation, we compare the two kinds of algorithms on accuracy, metrics, miss rates, and performance to observe which algorithms mitigate racial bias the most. We evaluate racial bias across five machine learning algorithms and three deep learning algorithms using racially imbalanced and balanced datasets. We evaluate and compare the accuracy and miss rates between all tested algorithms and report that SVC is the superior machine learning algorithm and VGG16 is the best deep learning algorithm based on our experimental study. We conclude that the algorithm that mitigates the bias the most is VGG16, and that all our deep learning algorithms outperformed their machine learning counterparts.


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performances and the industry still struggles to predict box office performances in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting tree, and random forest) are employed, and the impact of each group is observed by comparing the performance of the models. Then the number of target classes is increased to five and eight, and the results are compared with the previously developed models in the literature.
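
The abstract does not say how attendance figures are discretized into classes. One common option, shown here purely as an assumption, is equal-frequency binning, where cut points are chosen so that each class holds roughly the same number of movies. The helper names `quantile_cut_points` and `discretize` and the attendance figures are hypothetical.

```python
from bisect import bisect_right


def quantile_cut_points(values, n_classes):
    """Return n_classes - 1 cut points that split the sorted values into
    groups of (roughly) equal size. This is an assumed binning scheme."""
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * k) // n_classes] for k in range(1, n_classes)]


def discretize(value, cut_points):
    """Map a value to a class index: 0 for the lowest class, and so on.
    A value equal to a cut point falls into the higher class."""
    return bisect_right(cut_points, value)


# Invented attendance figures (in thousands) for nine movies.
attendance = [10, 20, 30, 40, 50, 60, 70, 80, 90]
cuts = quantile_cut_points(attendance, 3)
classes = [discretize(a, cuts) for a in attendance]
print(cuts)
print(classes)
```

Moving from three to five or eight target classes, as the study does, would only require changing `n_classes` under this scheme; fixed domain thresholds would be an equally plausible alternative.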


2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Lin Lin ◽  
Xiufang Liang

The online English teaching system has certain requirements for the intelligent scoring system, and the most difficult stage of intelligent scoring in the English test is to score the English composition through the intelligent model. In order to improve the intelligence of English composition scoring, this study combines machine learning algorithms with intelligent image recognition technology, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, in order to verify whether the algorithm model proposed in this paper meets the requirements of the group text, that is, to verify the feasibility of the algorithm, the performance of the model proposed in this study is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as a constraint model. The research results show that the algorithm proposed in this paper has a certain practical effect, and that it can be applied to English assessment systems and to online assessment and homework evaluation systems.


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks, by gathering and classifying hackers’ tweets using machine-learning techniques. Previous approaches for detecting infected tweets are based on human efforts or text analysis, thus they are limited in capturing the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection for the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers who share their posts and have similar interests, from a hackers’ community on Twitter. The list is built based on a set of suggested keywords that are the terms commonly used by hackers in their tweets. After that, a complex network is generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is utilized with a machine learning process by applying different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Correctly classified instances were measured for accuracy using the average values of K-nearest neighbor, Naive Bayes, Random Tree, and the support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers, and determine if tweets pose a future risk to institutions and individuals, to provide early warning of possible attacks.
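
To make the network measures mentioned above concrete, the following sketch computes degree centrality and closeness centrality on a small undirected graph using only the standard library. It is an illustration, not the authors' pipeline: the graph, the account names, and the function names are invented, and betweenness centrality is omitted for brevity.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def closeness_centrality(graph):
    """Number of reachable nodes divided by the sum of distances to them.
    Simplification: no extra scaling for disconnected graphs."""
    result = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        total = sum(dist.values())
        reachable = len(dist) - 1
        result[node] = reachable / total if total > 0 else 0.0
    return result


# Hypothetical follower graph: "hub" is linked to three other accounts.
toy_graph = {
    "hub": ["u1", "u2", "u3"],
    "u1": ["hub"],
    "u2": ["hub"],
    "u3": ["hub"],
}
print(degree_centrality(toy_graph))
print(closeness_centrality(toy_graph))
```

In this toy graph the hub scores highest on both measures, which is the kind of signal that could be used to shortlist influential accounts before their tweets are collected and classified.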


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 656
Author(s):  
Xavier Larriva-Novo ◽  
Víctor A. Villagrá ◽  
Mario Vega-Barbas ◽  
Diego Rivera ◽  
Mario Sanz Rodrigo

Security in IoT networks is currently mandatory, due to the large amount of data that has to be handled. These systems are vulnerable to several cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques that are as accurate as possible in these scenarios have to be developed. Intrusion detection systems based on machine learning algorithms have already shown a high performance in terms of accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. For its evaluation, this research uses two benchmark datasets, namely UGR16 and UNSW-NB15, and one of the most used datasets, KDD99. The preprocessing techniques were evaluated in accordance with scalar and normalization functions. All of these preprocessing models were applied through different sets of characteristics based on a categorization composed of four groups of features: basic connection features, content characteristics, statistical characteristics and, finally, a group composed of traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, the accuracy can be enhanced by up to 45%. The preprocessing of a specific group of characteristics allows for greater accuracy, allowing the machine learning algorithm to correctly classify these parameters related to possible attacks.
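
The abstract does not specify which scalar and normalization functions were evaluated. As a hedged illustration of the kind of preprocessing involved, the sketch below implements two widely used options, min-max scaling and z-score standardization, for a single numeric feature. The function names and the sample values are assumptions made for this example.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values to zero mean and unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = variance ** 0.5
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Invented feature column, e.g. bytes sent per connection.
bytes_sent = [2, 4, 6]
print(min_max_scale(bytes_sent))
print(z_score(bytes_sent))
```

Features such as byte counts and connection durations can differ by several orders of magnitude, so bringing them onto a comparable scale is a routine step before training a neural network; which function works best for which feature group is the empirical question the study addresses.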

