Predicting the citation and impact factor of terms for scientific publications using machine learning algorithms

Author(s):  
Aleksey Klokov ◽  
Evgenii Slobodyuk ◽  
Michael Charnine

The object of this research is the corpus of text data collected together with the scientific advisor, and the natural language processing algorithms used to analyze it. A series of hypotheses was tested against computer science publications through the simulation experiments described in this dissertation. The subject of the research is the algorithms themselves and their results, aimed at predicting promising topics and terms that emerge over time in the scientific community. The result of this work is a set of machine learning models with which experiments were carried out to identify promising terms and semantic relationships in the text corpus. The resulting models can be applied to the semantic processing and analysis of other subject areas.

Author(s):  
Jia Luo ◽  
Dongwen Yu ◽  
Zong Dai

Manual methods cannot feasibly process today's huge volumes of structured and semi-structured data. This study aims to solve that problem with machine learning algorithms. We collected text data on public opinion about companies through web crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and applied fuzzy clustering to group the keywords into topics. The topic keywords then serve as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-grams, PMI, and Word2vec were compared on the new word discovery task. The experimental results show that the machine learning-based Word2vec algorithm achieves the highest accuracy, recall, and F-value.
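
For illustration, a minimal sketch of the Word2vec-based new word discovery step is shown below, assuming gensim 4.x; the corpus and seed keywords are toy stand-ins for the crawled public-opinion texts and LDA topic keywords, not the paper's actual data.

```python
# A minimal sketch of Word2vec-based new word discovery (gensim 4.x assumed).
from gensim.models import Word2Vec

# Tokenised public-opinion texts (in practice, crawled and segmented).
corpus = [
    ["company", "stock", "price", "drop"],
    ["company", "new", "product", "launch"],
    ["stock", "market", "reaction", "launch"],
]

# Train skip-gram embeddings on the corpus.
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1)

# Seed dictionary: topic keywords from LDA + fuzzy clustering.
seeds = ["company", "stock"]

# Candidate new words: terms whose embeddings are closest to a seed term.
for seed in seeds:
    for term, score in model.wv.most_similar(seed, topn=3):
        print(f"{seed} -> {term} (cosine={score:.2f})")
```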


Bibliosphere ◽  
2016 ◽  
pp. 3-8
Author(s):  
V. V. Goncharova

The interdisciplinary character of the science of language causes great difficulties for bibliographic support in this field. The object of bibliographing in linguistics is not only literature on language, but also a variety of linguistic resources, which constitute the special object of study of a branch of linguistics, lexicography. Among the humanities bibliographic complexes, the bibliography of linguistics is the field least studied by specialists. This paper is the first to study the array of domestic bibliographic sources spanning more than 150 years; the most significant of them are presented. The subject of the research is national bibliographic resources in the field of linguistics. The objective is to characterize the historical development of the linguistic bibliography in Russia. To achieve this goal, a number of tasks had to be solved: identify existing sources for ongoing historical research; trace the history of forming bibliographic sources and bibliographies of bibliographies of linguistics; form and analyze the body of bibliographic materials; and characterize the problematic areas in the bibliographic support of linguistics. Using bibliometric analysis, an array of bibliographic products published between 1860 and 2013 was studied, the dynamics of bibliographic resource formation was determined, and the degree of bibliographic support of particular topics and issues in linguistic science, as well as priority directions for their development, was revealed. The main results of the study are the following:
1. The core of fundamental indexes on general and applied linguistics covering the period 1918-1977, as well as on Slavic linguistics for 1825-1981, is singled out from the analyzed literature sources. The complex of current and retrospective bibliographic products was formed and replenished in the country in 1963-1988.
2. The largest share of bibliographic sources in linguistics consists of book and article bibliographies (over 70%), many of which remain bibliographically unrecorded and unused.
3. The following subject areas of linguistics can be considered bibliographically supported: interlinguistics, culture of speech and language norms, lexicography, linguistic geography, linguistic regional studies, and onomastics.
4. There is an obvious need to continue the index or database of bibliographic aids in the field of linguistics over the past 50 years.
5. Further development of the linguistics bibliography cannot be imagined without an electronic guide to the bibliographic resources of linguistics, which would reflect their diversity and make their rich information potential available to professionals and remote users.


Author(s):  
Angana Saikia ◽  
Vinayak Majhi ◽  
Masaraf Hussain ◽  
Sudip Paul ◽  
Amitava Datta

Tremor is an involuntary quivering movement or shake. Characteristically occurring at rest, the classic slow, rhythmic tremor of Parkinson's disease (PD) typically starts in one hand, foot, or leg and can eventually affect both sides of the body. The resting tremor of PD can also occur in the jaw, chin, mouth, or tongue. Loss of dopamine leads to the symptoms of Parkinson's disease, which may include tremor; for some people, a tremor is the first symptom of PD. Various studies have proposed measurement technologies and analyses of the characteristics of Parkinsonian tremors using different techniques. Machine learning algorithms such as support vector machines (SVMs) with three kernels, discriminant analysis, random forests, and k-nearest neighbours (kNN) have also been used to classify and identify various kinds of tremor. This chapter provides an in-depth review of the identification and classification of Parkinsonian tremors using machine learning algorithms.
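
As a concrete illustration of such a comparison, the sketch below cross-validates the classifiers named above using scikit-learn; the tremor features, labels, and hyperparameters are synthetic assumptions for demonstration only.

```python
# A hedged sketch of the classifier comparison; synthetic stand-in data.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))      # e.g. tremor amplitude/frequency features
y = rng.integers(0, 2, size=200)   # 0 = resting tremor, 1 = other tremor

models = {
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (RBF)": SVC(kernel="rbf"),
    "SVM (poly)": SVC(kernel="poly"),
    "Discriminant analysis": LinearDiscriminantAnalysis(),
    "Random forest": RandomForestClassifier(n_estimators=100),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}
for name, clf in models.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.2f} accuracy")
```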


Author(s):  
Amandeep Singh Bhatia ◽  
Renata Wong

Quantum computing is an exciting new field that can be exploited for great speed and innovation in machine learning and artificial intelligence. Quantum machine learning at this crossroads explores the interaction between quantum computing and machine learning, with each field supplementing the other to create new models and to accelerate existing machine learning models toward better and more accurate classifications. The main purpose is to explore methods, concepts, theories, and algorithms that focus on and utilize quantum computing features such as superposition and entanglement to make machine learning computations enormously faster. It is a natural goal to study how present and future quantum technologies can enhance existing classical machine learning algorithms. The objective of this chapter is to help the reader grasp the key components of the field, understand the essentials of the subject, and thus compare quantum computations with their counterpart classical machine learning algorithms.
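
To make superposition and entanglement concrete, the toy sketch below simulates a two-qubit Bell-state preparation with plain NumPy state vectors; it assumes no quantum SDK and illustrates only the quantum features the chapter discusses, not any particular quantum machine learning algorithm.

```python
# Toy state-vector demo of superposition and entanglement; pure NumPy.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])                 # control q0, target q1

# Start in |00>, put qubit 0 into superposition, then entangle.
state = np.array([1, 0, 0, 0], dtype=float)
state = np.kron(H, np.eye(2)) @ state   # H on qubit 0: (|00>+|10>)/sqrt(2)
state = CNOT @ state                    # Bell state: (|00>+|11>)/sqrt(2)

print(np.round(state, 3))  # amplitudes: [0.707, 0, 0, 0.707]
```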


2020 ◽  
Vol 10 (21) ◽  
pp. 7831
Author(s):  
Han Kyul Kim ◽  
Sae Won Choi ◽  
Ye Seul Bae ◽  
Jiin Choi ◽  
Hyein Kwon ◽  
...  

With growing interest in machine learning, text standardization is becoming an increasingly important aspect of data pre-processing within biomedical communities. As performances of machine learning algorithms are affected by both the amount and the quality of their training data, effective data standardization is needed to guarantee consistent data integrity. Furthermore, biomedical organizations, depending on their geographical locations or affiliations, rely on different sets of text standardization in practice. To facilitate easier machine learning-related collaborations between these organizations, an effective yet practical text data standardization method is needed. In this paper, we introduce MARIE (a context-aware term mapping method with string matching and embedding vectors), an unsupervised learning-based tool, to find standardized clinical terminologies for queries, such as a hospital’s own codes. By incorporating both string matching methods and term embedding vectors generated by BioBERT (bidirectional encoder representations from transformers for biomedical text mining), it utilizes both structural and contextual information to calculate similarity measures between source and target terms. Compared to previous term mapping methods, MARIE shows improved mapping accuracy. Furthermore, it can be easily expanded to incorporate any string matching or term embedding methods. Without requiring any additional model training, it is not only effective, but also a practical term mapping method for text data standardization and pre-processing.
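
The sketch below illustrates the general idea of combining string matching with embedding similarity for term mapping; the equal 0.5/0.5 weighting, the difflib matcher, and the toy three-dimensional embeddings are illustrative assumptions, not MARIE's actual formulation or real BioBERT vectors.

```python
# A hedged sketch of string + embedding term mapping, in the spirit of MARIE.
from difflib import SequenceMatcher
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def combined_score(src, tgt, emb, w=0.5):
    structural = SequenceMatcher(None, src, tgt).ratio()  # string similarity
    contextual = cosine(emb[src], emb[tgt])               # embedding similarity
    return w * structural + (1 - w) * contextual

# Toy embeddings standing in for BioBERT term vectors.
emb = {
    "hb a1c": np.array([0.9, 0.1, 0.3]),
    "hemoglobin a1c": np.array([0.85, 0.15, 0.35]),
    "heart rate": np.array([0.1, 0.9, 0.2]),
}

query = "hb a1c"                      # e.g. a hospital's own code
targets = ["hemoglobin a1c", "heart rate"]
best = max(targets, key=lambda t: combined_score(query, t, emb))
print(best)  # expected: "hemoglobin a1c"
```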


Diabetes is one of the most common diseases among humans today. This work proposes predicting it with machine learning techniques, through which the risk factors of the disease can be identified and prevented from increasing. Early prediction allows the disease to be controlled and can save lives. For early prediction we collected a dataset of 200 diabetic patients with 8 attributes. Each patient's blood sugar level is assessed from features such as the glucose content in the body and the patient's age. The main machine learning algorithms used are support vector machine (SVM), naive Bayes (NB), k-nearest neighbour (KNN), and decision tree (DT). In the existing system, naive Bayes achieves 66% accuracy and the decision tree 70-71%, so the accuracy levels are not in the proper range. With XGBoost classifiers, naive Bayes rises to 74% and the decision tree to 89-90% accuracy; the proposed system therefore shows the accuracy ranges properly and is the one adopted. A dataset of 729 patients is stored in MongoDB; of these, 129 patient records are used for prediction and the remainder for training.
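
A minimal sketch of such a classifier comparison is given below, assuming the scikit-learn and xgboost packages; the synthetic data stands in for the 8-attribute diabetes dataset, and the 129-record test split mirrors the one described above.

```python
# A hedged sketch comparing NB, DT, and XGBoost on synthetic stand-in data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(729, 8))       # 8 attributes per patient
y = rng.integers(0, 2, size=729)    # 1 = diabetic, 0 = not

# Hold out 129 records for prediction, as in the split described above.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=129, random_state=0)

for name, clf in [("Naive Bayes", GaussianNB()),
                  ("Decision tree", DecisionTreeClassifier()),
                  ("XGBoost", XGBClassifier(n_estimators=100))]:
    clf.fit(X_tr, y_tr)
    print(name, f"accuracy: {clf.score(X_te, y_te):.2f}")
```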


2019 ◽  
Vol 8 (4) ◽  
pp. 11704-11707

Cardiac arrhythmia is a condition in which a person suffers from an abnormal heart rhythm, caused by malfunctioning of the electrical impulses that coordinate the heartbeat. When this happens, the heart beats too slowly, too fast, or irregularly. The rhythm of the heart is controlled by the sinus node, located at the top of the heart, which triggers the electrical pulses that make the heart beat and pump blood to the body. Symptoms of cardiac arrhythmia include fainting, unconsciousness, shortness of breath, and unexpected behaviour of the heart; it can lead to death within minutes if medical attention is not provided. To diagnose it, doctors must study heart recordings and accurately evaluate heartbeats from different parts of the body, which takes a lot of time, so building on the research contributed in this field we propose a different approach. In this paper, we compare machine learning techniques and algorithms proposed by different authors, all of whom used ECG recordings from the same MIT-BIH database, examine their advantages and disadvantages, and propose a new system in place of the existing ones. Our initial research found that phonocardiogram (PCG) recordings provide higher fidelity and accuracy than ECG recordings. In the initial stage of this work, we take a PCG recordings dataset, convert each recording to a spectrogram image, and apply a convolutional neural network to predict whether a heartbeat is normal or abnormal.
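
The sketch below illustrates the proposed pipeline in outline: a (synthetic) signal is converted to a spectrogram with SciPy and passed through a tiny PyTorch CNN; the signal, network architecture, and hyperparameters are illustrative assumptions, not the paper's actual model.

```python
# A hedged sketch of the PCG -> spectrogram -> CNN pipeline.
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import spectrogram

# Synthetic 1-second "PCG" signal at 2 kHz standing in for a real recording.
fs = 2000
t = np.linspace(0, 1, fs, endpoint=False)
signal = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.randn(fs)

# Convert the recording to a spectrogram "image".
f, times, Sxx = spectrogram(signal, fs=fs, nperseg=128)
x = torch.tensor(np.log1p(Sxx), dtype=torch.float32)[None, None]  # (1,1,H,W)

# A tiny CNN for normal/abnormal heartbeat classification.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),  # logits: [normal, abnormal]
)
logits = model(x)
print(logits.shape)  # torch.Size([1, 2])
```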


2020 ◽  
Author(s):  
Emily Matijevich ◽  
Leon R. Scott ◽  
Peter Volgyesi ◽  
Kendall H. Derry ◽  
Karl Zelik

There are tremendous opportunities to advance science, clinical care, sports performance, and societal health if we are able to develop tools for monitoring musculoskeletal loading (e.g., forces on bones or muscles) outside the lab. While wearable sensors enable non-invasive monitoring of human movement in applied situations, current commercial wearables do not estimate tissue-level loading on structures inside the body. Here we explore the feasibility of using wearable sensors to estimate tibial bone force during running. First, we used lab-based data and musculoskeletal modeling to estimate tibial force for ten participants running across a range of speeds and slopes. Next, we converted lab-based data to signals feasibly measured with wearables (inertial measurement units on the foot and shank, and a pressure-insole) and used these data to develop two multi-sensor algorithms for estimating peak tibial force: one physics-based and one machine learning. Additionally, to reflect current running wearables that utilize foot impact metrics to infer musculoskeletal loading or injury risk, we estimated tibial force using the ground reaction force vertical average loading rate (VALR). Using VALR to estimate peak tibial force resulted in a mean absolute percent error of 9.9%, which was no more accurate than a theoretical step counter that assumed the same peak force for every running step. Our physics-based algorithm reduced error to 5.2%, and our machine learning algorithm reduced error to 2.6%. Further, to gain insights into how force estimation accuracy relates to overuse injury risk, we computed bone damage expected due to peak force. We found that modest errors in tibial force translated into large errors in bone damage estimates. For example, a 9.9% error in tibial force using VALR translated into 104% error in bone damage estimates. Encouragingly, the physics-based and machine learning algorithms reduced damage errors to 41% and 18%, respectively. This study highlights the exciting potential to combine wearables, musculoskeletal biomechanics and machine learning to develop more accurate tools for monitoring musculoskeletal loading in applied situations.
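
As an illustration of the machine learning branch of this approach, the sketch below fits a regression model mapping wearable-derived features to peak tibial force and reports mean absolute percent error; the features, synthetic data, and model choice are assumptions for demonstration, not the paper's actual algorithms.

```python
# A hedged sketch: regression from wearable features to peak tibial force.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Per-step features feasibly measured with wearables (e.g. IMU peak
# accelerations, pressure-insole force metrics) -- synthetic here.
X = rng.normal(size=(500, 5))
# Lab-derived peak tibial force (in body weights) as the training target.
y = 8 + X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)

model = GradientBoostingRegressor().fit(X[:400], y[:400])
pred = model.predict(X[400:])

# Mean absolute percent error, the accuracy metric reported above.
mape = np.mean(np.abs(pred - y[400:]) / y[400:]) * 100
print(f"MAPE: {mape:.1f}%")
```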


Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3647
Author(s):  
Sebastian Scheurer ◽  
Salvatore Tedesco ◽  
Brendan O’Flynn ◽  
Kenneth N. Brown

The distinction between subject-dependent and subject-independent performance is ubiquitous in the human activity recognition (HAR) literature. We assess whether HAR models really do achieve better subject-dependent performance than subject-independent performance, whether a model trained with data from many users achieves better subject-independent performance than one trained with data from a single person, and whether one trained with data from a single specific target user performs better for that user than one trained with data from many. To those ends, we compare four popular machine learning algorithms' subject-dependent and subject-independent performances across eight datasets using three different personalisation–generalisation approaches, which we term person-independent models (PIMs), person-specific models (PSMs), and ensembles of PSMs (EPSMs). We further consider three different ways to construct such an ensemble: unweighted, κ-weighted, and baseline-feature-weighted. Our analysis shows that PSMs outperform PIMs by 43.5% in terms of their subject-dependent performances, whereas PIMs outperform PSMs by 55.9% and κ-weighted EPSMs—the best-performing EPSM type—by 16.4% in terms of the subject-independent performance.
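
A minimal sketch of a κ-weighted EPSM is shown below: one person-specific classifier per training subject, with each vote weighted by that model's Cohen's κ on held-out data. The data, base classifier, and clipping of negative κ are illustrative assumptions, not the paper's exact construction.

```python
# A hedged sketch of a kappa-weighted ensemble of person-specific models.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
# Synthetic per-subject data: 80 samples x 4 features, binary activity label.
subjects = {s: (rng.normal(size=(80, 4)), rng.integers(0, 2, 80))
            for s in range(5)}

models, weights = [], []
for X, y in subjects.values():
    clf = LogisticRegression().fit(X[:60], y[:60])          # one PSM per subject
    kappa = cohen_kappa_score(y[60:], clf.predict(X[60:]))  # validation kappa
    models.append(clf)
    weights.append(max(kappa, 0.0))                         # clip negative kappa

# Classify a new (unseen) subject's samples by kappa-weighted majority vote.
X_new = rng.normal(size=(10, 4))
votes = np.array([clf.predict(X_new) for clf in models], dtype=float)
scores = np.average(votes, axis=0, weights=np.array(weights) + 1e-9)
print((scores > 0.5).astype(int))
```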


Healthcare ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 348
Author(s):  
Amine Rghioui ◽  
Jaime Lloret ◽  
Sandra Sendra ◽  
Abdelmajid Oumnad

Continuous monitoring of diabetic patients improves their quality of life. The use of multiple technologies such as the Internet of Things (IoT), embedded systems, communication technologies, artificial intelligence, and smart devices can reduce the economic costs of the healthcare system. Different communication technologies have made it possible to provide personalized and remote health services. To respond to the needs of future intelligent e-health applications, we must develop intelligent healthcare systems and expand the number of applications connected to the network. The 5G network should therefore support intelligent healthcare applications and meet important requirements such as high bandwidth and high energy efficiency. This article presents an intelligent architecture for monitoring diabetic patients using machine learning algorithms. The architecture's elements include smart devices, sensors, and smartphones that collect measurements from the body. The intelligent system collects the data received from the patient and performs data classification using machine learning in order to make a diagnosis. The proposed prediction system was evaluated with several machine learning algorithms, and the simulation results demonstrated that the sequential minimal optimization (SMO) algorithm gives superior classification accuracy, sensitivity, and precision compared to the other algorithms.
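
As an illustration of the classification-and-evaluation step, the sketch below trains scikit-learn's SVC, whose libsvm backend uses an SMO-type solver comparable to the SMO algorithm named above, and reports accuracy, sensitivity, and precision; the patient features and labels are synthetic assumptions.

```python
# A hedged sketch: SVM classification evaluated on the paper's three metrics.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))  # e.g. glucose, blood pressure, BMI, age
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # 1 = diabetic alert (toy rule)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pred = SVC(kernel="rbf").fit(X_tr, y_tr).predict(X_te)

print("accuracy:   ", round(accuracy_score(y_te, pred), 2))
print("sensitivity:", round(recall_score(y_te, pred), 2))   # i.e. recall
print("precision:  ", round(precision_score(y_te, pred), 2))
```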

