ECG Signal as Robust and Reliable Biometric Marker: Datasets and Algorithms Comparison

Sensors ◽  
2019 ◽  
Vol 19 (10) ◽  
pp. 2350 ◽  
Author(s):  
Mariusz Pelc ◽  
Yuriy Khoma ◽  
Volodymyr Khoma

In this paper, the possibility of using the ECG signal as an unequivocal biometric marker for authentication and identification purposes is presented. Furthermore, since the ECG signal was acquired from four sources that differ in measurement equipment, electrode positioning, number of patients, and duration of the recorded ECG, we additionally provide an estimate of the extent of information available in the ECG record. To give a more objective assessment of the credibility of the identification method, selected machine learning algorithms were used in two configurations: with and without compression. The results we obtained confirm that the ECG signal can be regarded as a valid biometric marker that is robust to hardware variations and to the presence of noise and artifacts, that is stable over time, and that scales across a fairly large (~100) number of users. Our experiments indicate that the most promising algorithms for ECG identification are LDA, KNN and MLP. Moreover, our results show that PCA compression, used as part of data preprocessing, not only brings no noticeable benefit but in some cases may even reduce accuracy.
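A minimal sketch of the kind of comparison reported above, assuming a generic per-subject feature matrix: LDA, KNN and MLP classifiers are each evaluated with and without PCA compression. The synthetic data, feature dimensionality and hyperparameters below are placeholders, not the paper's four datasets or settings.

```python
# Hypothetical comparison of LDA, KNN and MLP with and without PCA compression.
# Synthetic data stands in for per-heartbeat ECG feature vectors of ~100 users.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# ~100 "users" (classes), each with many feature vectors.
X, y = make_classification(n_samples=5000, n_features=60, n_informative=40,
                           n_classes=100, n_clusters_per_class=1, random_state=0)

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "MLP": MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=0),
}

for name, clf in classifiers.items():
    raw = make_pipeline(StandardScaler(), clf)
    compressed = make_pipeline(StandardScaler(), PCA(n_components=20), clf)
    acc_raw = cross_val_score(raw, X, y, cv=3).mean()
    acc_pca = cross_val_score(compressed, X, y, cv=3).mean()
    print(f"{name}: raw={acc_raw:.3f}  PCA-compressed={acc_pca:.3f}")
```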

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Qi Shao ◽  
Yongming Xu ◽  
Hanyi Wu

COVID-19 has swept through the world since December 2019 and caused a large number of infections and deaths. Spatial prediction of the spread of the epidemic is of great importance for disease control and management. In this study, we predicted the cumulative confirmed cases (CCCs) from Jan 17 to Mar 1, 2020, in mainland China at the city level, using machine learning algorithms, geographically weighted regression (GWR), and partial least squares regression (PLSR) based on population flow, geolocation, meteorological, and socioeconomic variables. The validation results showed that the machine learning algorithms and GWR achieved good performance. These models could not effectively predict CCCs in Wuhan, the first city to report COVID-19 cases in China, but performed well in other cities. Random Forest (RF) outperformed the other methods with a cross-validated R² (CV-R²) of 0.84. In this model, the population flow from Wuhan to other cities (WP) was the most important feature, and the other features also made considerable contributions to the prediction accuracy. Compared with RF, GWR showed a slightly worse performance (CV-R² = 0.81) but required fewer spatial independent variables. This study explored the spatial prediction of the epidemic based on multisource spatial independent variables, providing a reference for the estimation of CCCs in regions lacking accurate and timely data.
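The sketch below illustrates, under stated assumptions, the cross-validated R² evaluation and feature-importance reading described above. The column names (e.g. "wuhan_flow") and the synthetic target are placeholders, not the study's actual predictors or case counts.

```python
# Hypothetical CV-R2 evaluation of a Random Forest on city-level predictors.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_cities = 300
df = pd.DataFrame({
    "wuhan_flow": rng.gamma(2.0, 1.0, n_cities),            # population flow from Wuhan (WP)
    "gdp": rng.lognormal(3.0, 1.0, n_cities),                # socioeconomic
    "temperature": rng.normal(5.0, 4.0, n_cities),           # meteorological
    "distance_to_wuhan": rng.uniform(50, 2000, n_cities),    # geolocation
})
# Synthetic target loosely dominated by the Wuhan outflow feature.
y = (80 * df["wuhan_flow"] + 0.5 * df["gdp"]
     - 0.1 * df["distance_to_wuhan"] + rng.normal(0, 20, n_cities))

rf = RandomForestRegressor(n_estimators=500, random_state=0)
print(f"CV-R2 = {cross_val_score(rf, df, y, cv=10, scoring='r2').mean():.2f}")

# Feature importances indicate which predictors drive the model.
rf.fit(df, y)
for name, imp in sorted(zip(df.columns, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:>18}: {imp:.2f}")
```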


Author(s):  
Walaa Alkady ◽  
Muhammad Zanaty ◽  
Heba M. Afify

The coronavirus infection, a serious respiratory disease, is increasingly evolving into an international epidemic across 27 countries. Therefore, computational biology studies relating this virus to the human population are urgently needed. In this paper, machine learning algorithms are applied to classify human COVID-19 protein sequences by country. The proposed model distinguishes 9238 sequences in three stages: data preprocessing, data labeling, and classification. In the first stage, data preprocessing converts the amino acids of the COVID-19 protein sequences into eight groups of numbers based on the volume and dipole of the amino acids. In the second stage, two methods label the 27 countries from 0 to 26: the first assigns each country a single number according to its country code, while the second uses binary elements only for each country. Finally, classification algorithms are executed to distinguish the COVID-19 protein sequences according to their countries. The findings show that an accuracy of 100% is achieved by the country-based binary labeling method with the Linear Regression (LR), K-Nearest Neighbor (KNN), or Support Vector Machine (SVM) classifiers. Further, it was found that the USA, which contributes a large share of the records, is classified more reliably than countries with few records. The imbalance of the COVID-19 protein sequence data is a major issue, as the data available from the USA represent 76% of the total 9238 sequences. As a consequence, the proposed model can serve as a diagnostic bioinformatics tool for COVID-19 protein sequences across different countries.
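A minimal sketch of the three-stage pipeline described above: numeric encoding of amino acids into groups, country labeling, and classification. The eight-group mapping, toy sequences and labels below are illustrative assumptions, not the grouping or data used in the paper.

```python
# Hypothetical encoding + labeling + classification sketch for protein sequences.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stage 1: map each amino acid to one of eight numeric groups (assumed grouping).
AA_GROUP = {aa: g for g, aas in enumerate(
    ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C", "X"]) for aa in aas}

def encode(seq, length=200):
    """Encode a protein sequence as a fixed-length vector of group indices."""
    vec = [AA_GROUP.get(aa, 7) for aa in seq[:length]]
    return vec + [7] * (length - len(vec))          # pad short sequences

# Toy sequences with integer country labels (Stage 2, first labeling method).
sequences = ["MFVFLVLLPLVSSQCVNLT", "MFVFLVLLPLVSSQCVNFT",
             "MFIFLLFLTLTSGSDLDRC", "MFIFLLFLTLTSGSELDRC"] * 10
countries = [0, 0, 1, 1] * 10                       # e.g. 0 = USA, 1 = other

X = [encode(s) for s in sequences]

# Stage 3: classification with KNN and SVM, as named in the abstract.
for clf in (KNeighborsClassifier(n_neighbors=3), SVC(kernel="linear")):
    clf.fit(X, countries)
    print(type(clf).__name__, "training accuracy:", clf.score(X, countries))
```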


2021 ◽  
Vol 11 (21) ◽  
pp. 10442
Author(s):  
Karlo Babić ◽  
Milan Petrović ◽  
Slobodan Beliga ◽  
Sanda Martinčić-Ipšić ◽  
Mihaela Matešić ◽  
...  

This study aims to provide insights into COVID-19-related communication on Twitter in the Republic of Croatia. For that purpose, we developed an NLP-based framework that enables automatic analysis of a large dataset of tweets in the Croatian language. We collected and analysed 206,196 tweets related to COVID-19 and constructed a dataset of 10,000 tweets which we manually annotated with a sentiment label. We trained the Cro-CoV-cseBERT language model for the representation and clustering of tweets. Additionally, we compared the performance of four machine learning algorithms on the task of sentiment classification. After identifying the best-performing setup of NLP methods, we applied the proposed framework to the task of characterising COVID-19 tweets in Croatia. More precisely, we performed sentiment analysis and tracked the sentiment over time. Furthermore, we detected how tweets are grouped into clusters with similar themes across three pandemic waves. Additionally, we characterised the tweets by analysing the distribution of sentiment polarity (in each thematic cluster and over time) and the number of retweets (in each thematic cluster and sentiment class). These results could be useful for further research and interpretation in sociology, psychology and other sciences, as well as for the authorities, who could use them to address crisis communication problems.
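A sketch of the embed-cluster-classify workflow described above, under clearly stated assumptions: a public multilingual sentence-embedding model stands in for Cro-CoV-cseBERT, and the placeholder tweets and labels are not the annotated Croatian dataset.

```python
# Hypothetical pipeline: embed tweets, cluster them into themes,
# and train a sentiment classifier on the annotated subset.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

tweets = ["Bolnice su pune, ostanite doma.",         # placeholder tweets
          "Cjepivo je stiglo, odlicne vijesti!",
          "Opet nove mjere, nema kraja ovome."] * 20
sentiment = [0, 1, 0] * 20                            # 0 = negative, 1 = positive

# Stand-in embedding model (NOT Cro-CoV-cseBERT).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
embeddings = model.encode(tweets)

# Thematic clustering of the embedded tweets.
clusters = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(embeddings)

# Sentiment classification on the annotated subset (one of several possible
# classifiers; the study compared four).
clf = LogisticRegression(max_iter=1000)
print("sentiment CV accuracy:",
      cross_val_score(clf, embeddings, sentiment, cv=5).mean())
```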


Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6636
Author(s):  
Dylan den Hartog ◽  
Jaap Harlaar ◽  
Gerwin Smit

Stumbling during gait is commonly encountered in patients who suffer from mild to serious walking problems, e.g., after stroke, in osteoarthritis, or in amputees using a lower-leg prosthesis. Instead of self-reporting, an objective assessment of the number of stumbles in daily life would inform clinicians more accurately and enable the evaluation of treatments that aim to achieve a safer walking pattern. An easy-to-use wearable might fulfill this need. The goal of the present study was to investigate whether a single inertial measurement unit (IMU) placed at the shank and machine learning algorithms could be used to detect and classify stumbling events in a dataset comprising a wide variety of daily movements. Ten healthy test subjects were deliberately tripped by an unexpected and unseen obstacle while walking on a treadmill. The subjects stumbled a total of 276 times, using both an elevating recovery strategy and a lowering recovery strategy. Subjects also performed multiple Activities of Daily Living. During data processing, an event-defined window segmentation technique was used to trace high peaks in acceleration that could potentially be stumbles. In the reduced dataset, time windows were labelled with the aid of video annotation. Subsequently, discriminative features were extracted and fed to train seven different types of machine learning algorithms. Trained machine learning algorithms were validated using leave-one-subject-out cross-validation. Support Vector Machine (SVM) algorithms were most successful, and could detect and classify stumbles with 100% sensitivity, 100% specificity, and 96.7% accuracy in the independent testing dataset. The SVM algorithms were implemented in a user-friendly, freely available, stumble detection app named Stumblemeter. This work shows that stumble detection and classification based on SVM is accurate and ready to apply in clinical practice.
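The sketch below illustrates, on synthetic data, the event-defined windowing, feature extraction, SVM classification and leave-one-subject-out validation described above. The signal, peak threshold, window length, features and labels are illustrative assumptions, not the study's processing parameters.

```python
# Hypothetical stumble-detection sketch on a synthetic shank acceleration signal.
import numpy as np
from scipy.signal import find_peaks
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
fs = 100                                        # assumed IMU sampling rate (Hz)
acc = rng.normal(1.0, 0.2, 60 * fs)             # synthetic acceleration norm
acc[[500, 1200, 2100, 3050, 4100, 4810]] += 8.0 # injected high-acceleration events

# Event-defined segmentation: a window around every high acceleration peak.
peaks, _ = find_peaks(acc, height=5.0, distance=fs)
windows = [acc[max(p - fs, 0):p + fs] for p in peaks]

def features(w):
    """Simple per-window features: peak value, mean, std and peak-to-peak range."""
    return [w.max(), w.mean(), w.std(), w.max() - w.min()]

X = np.array([features(w) for w in windows])
y = np.array([1, 0, 1, 0, 1, 0])                # placeholder video-annotation labels
groups = np.array([1, 1, 2, 2, 3, 3])           # subject ID per window

# Leave-one-subject-out cross-validation of an SVM classifier.
scores = cross_val_score(SVC(kernel="rbf"), X, y, groups=groups,
                         cv=LeaveOneGroupOut())
print("accuracy per held-out subject:", scores)
```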


Author(s):  
Andy Zeng ◽  
Connor Brenna ◽  
Silvio Ndoja

Background: The number of unmatched Canadian Medical Graduates (CMGs) has risen dramatically over the last decade. To identify long-term solutions to this problem, an understanding of the factors contributing to these rising unmatched rates is critical. Methods: Using match and electives data from 2009-2019, we employed machine learning algorithms to identify three clusters of disciplines with distinct trends in match and electives behaviours. We assessed the relationships between unmatched rates, competitiveness, rates of parallel planning, and program selection practices at the discipline level. Results: Across Canada, growth in CMGs has outpaced growth in residency seats, narrowing the seat-to-applicant ratio. Yet not all disciplines have been affected equally: a subset of surgical disciplines experienced a consistent decline in residency seats over time. Applicants to these disciplines are also at disproportionate risk of becoming unmatched, and this is associated with lower rates of parallel planning as quantified through clinical electives and match applications. This, in turn, is associated with the program selection practices of these disciplines. Conclusion: Long-term solutions to the unmatched CMG crisis require more nuance than indiscriminately increasing residency seats and should consider cluster-specific match ratios as well as regulations around clinical electives and program selection practices.
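A minimal sketch of the discipline-clustering step, assuming per-discipline match metrics as features; the discipline names, features and values are placeholders, not the 2009-2019 match and electives data used in the study.

```python
# Hypothetical clustering of disciplines by match and electives metrics.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

disciplines = pd.DataFrame({
    "seat_to_applicant_ratio": [1.10, 0.95, 0.70, 1.20, 0.65, 1.05],
    "unmatched_rate":          [0.02, 0.05, 0.15, 0.01, 0.18, 0.03],
    "parallel_planning_rate":  [0.60, 0.40, 0.20, 0.70, 0.15, 0.55],
}, index=["FamMed", "InternalMed", "SurgA", "Psych", "SurgB", "Peds"])

# Standardise the features, then group disciplines into three clusters.
X = StandardScaler().fit_transform(disciplines)
labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(X)
print(disciplines.assign(cluster=labels))
```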


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting tree, and random forest) are employed, and the impact of each variable group is observed by comparing the performance of the resulting models. Then the number of target classes is increased to five and to eight, and the results are compared with previously developed models in the literature.
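A sketch of the classification setup described above, assuming a synthetic feature matrix: a continuous attendance figure is discretized into three classes and several classifiers are compared. It is not the 1559-movie dataset or the exact model configurations from the study.

```python
# Hypothetical discretise-then-classify sketch for box office attendance.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for pre-release / distributor / distribution features
# and a continuous attendance figure.
X, attendance = make_regression(n_samples=1559, n_features=20, n_informative=10,
                                noise=10.0, random_state=0)

# Discretise attendance into three ordered classes (low / medium / high).
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
y = disc.fit_transform(attendance.reshape(-1, 1)).ravel()

models = {
    "ANN": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=5).mean()
    print(f"{name}: CV accuracy = {acc:.3f}")
```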

