EEG-based Classification of Epileptic and Non-Epileptic Events using Multi-Array Decomposition

Author(s):  
Evangelia Pippa ◽  
Vasileios G. Kanas ◽  
Evangelia I. Zacharaki ◽  
Vasiliki Tsirka ◽  
Michael Koutroumanidis ◽  
...  

In this paper, the classification of epileptic and non-epileptic events from EEG is investigated based on temporal and spectral analysis and two different schemes for the formulation of the training set. Although the matrix representation, which treats features as concatenated vectors, allows capturing dependencies across channels, it leads to a significant increase in feature vector dimensionality and lacks a means of modeling dependencies between features. Thus, the authors compare the commonly used matrix representation with a tensor-based scheme, in which Tucker decomposition is applied to learn the essential structure of the original, high-dimensional feature space. In contrast to other relevant studies, the authors extend the non-epileptic class to include both psychogenic non-epileptic seizures and vasovagal syncope. The classification schemes were evaluated on EEG epochs from 11 subjects. The proposed tensor scheme achieved an accuracy of 97.7%, which is better than the spatiotemporal model even after the latter was improved by dimensionality reduction through principal component analysis and feature selection.
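
The tensor-versus-matrix comparison can be illustrated with a short sketch. This is not the authors' implementation: random numbers stand in for the temporal and spectral EEG features, the epoch x channel x feature layout is an assumption, and the Tucker ranks are chosen arbitrarily.

```python
import numpy as np
from tensorly.decomposition import tucker
from tensorly.tenalg import multi_mode_dot

n_epochs, n_channels, n_features = 200, 21, 12
X = np.random.randn(n_epochs, n_channels, n_features)    # stand-in for EEG features

# Matrix (spatiotemporal) scheme: concatenating channels gives long vectors
X_matrix = X.reshape(n_epochs, n_channels * n_features)

# Tensor scheme: Tucker decomposition learns channel and feature subspaces
core, factors = tucker(X, rank=[n_epochs, 5, 4])
# project every epoch onto the learned channel/feature factors
X_tensor = multi_mode_dot(X, [factors[1].T, factors[2].T], modes=[1, 2])
X_tensor = X_tensor.reshape(n_epochs, -1)

print(X_matrix.shape, X_tensor.shape)                    # (200, 252) vs (200, 20)
```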

2005 ◽  
Vol 17 (6) ◽  
pp. 645-654
Author(s):  
Ryo Fukano ◽  
Yasuo Kuniyoshi ◽  
Takuya Otani ◽  
Takumi Kobayashi ◽  
...  

We propose acquiring several properties of an unknown manipulated object without relying on prior information about it. The approach consists of explorative manipulation and observation with sensors. By observing its own motion together with the target object, the robot acquires time-series sensor data in which the motion constraints of the manipulated object are embedded. We assume that manipulation features are expressed as a cooperative relation between the fingers and that this relation is extractable as a correlation of the time-series sensor data. Higher-order local autocorrelation, widely used in image recognition, provides the feature vector from the data. In the feature space, contrasting motion constraints form axes of variance, and principal component analysis (PCA) finds the axes onto which the constraints map. Clustering is then used to form classes corresponding to the constraints in PCA space; these classes serve as a symbolic representation for the robot. The efficacy of our proposal is demonstrated through simulation and experiments in a task involving unscrewing a lid of unknown size from a bottle.
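
A rough illustration of this pipeline with synthetic finger-sensor signals in place of real robot data; the lag set, trial shapes and cluster count are assumptions, and the lagged-correlation features only approximate higher-order local autocorrelation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def lagged_autocorr_features(trial, lags=(0, 1, 2)):
    """Correlation between all sensor pairs at a few time lags (HLAC-like)."""
    n_ch, n_t = trial.shape
    feats = []
    for lag in lags:
        a, b = trial[:, :n_t - lag], trial[:, lag:]
        feats.append((a @ b.T / (n_t - lag)).ravel())    # channel x channel correlations
    return np.concatenate(feats)

rng = np.random.default_rng(0)
trials = rng.standard_normal((60, 6, 300))               # 60 trials, 6 sensors, 300 samples

X = np.stack([lagged_autocorr_features(t) for t in trials])
Z = PCA(n_components=3).fit_transform(X)                 # axes of contrasting constraints
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print(labels[:10])
```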


2008 ◽  
Author(s):  
Dirk-jan Kroon ◽  
Erik van Oort ◽  
Kees Slump

This paper presents a local-feature-vector-based method for automated Multiple Sclerosis (MS) lesion segmentation of multi-spectral MRI data. Twenty datasets from MS patients with FLAIR, T1, T2, MD and FA data and expert annotations are available as a training set from the MICCAI 2008 challenge on MS lesion segmentation, along with 24 test datasets. Our local feature vector contains neighbourhood voxel intensities, histogram and MS probability atlas information. Principal Component Analysis (PCA) with a log-likelihood ratio is used to classify each voxel. MRI suffers from intensity inhomogeneities. We try to correct this 'bias field' with three methods: a genetic algorithm, edge-preserving filtering and atlas-based correction. A large inter-observer variability exists between expert classifications, but the similarity scores between model and expert classifications are often lower. Our model gives the best classification results with raw data, because bias correction introduces artifacts at the edges and flattens large MS lesions.
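
The voxel classification step can be sketched as follows. This is an assumed simplification, not the authors' code: synthetic feature vectors stand in for the multi-spectral voxel features, and each class is modeled as a Gaussian in PCA space so that a log-likelihood ratio decides lesion versus non-lesion.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
X_lesion = rng.normal(1.0, 1.0, size=(500, 40))     # toy lesion voxel features
X_normal = rng.normal(0.0, 1.0, size=(5000, 40))    # toy non-lesion voxel features

pca = PCA(n_components=10).fit(np.vstack([X_lesion, X_normal]))
Zl, Zn = pca.transform(X_lesion), pca.transform(X_normal)

g_lesion = multivariate_normal(Zl.mean(axis=0), np.cov(Zl.T))
g_normal = multivariate_normal(Zn.mean(axis=0), np.cov(Zn.T))

def is_lesion(voxel_features, threshold=0.0):
    """Classify one voxel by its Gaussian log-likelihood ratio in PCA space."""
    z = pca.transform(voxel_features.reshape(1, -1))[0]
    return (g_lesion.logpdf(z) - g_normal.logpdf(z)) > threshold

print(is_lesion(X_lesion[0]), is_lesion(X_normal[0]))
```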


2002 ◽  
Vol 30 (4) ◽  
pp. 239-247
Author(s):  
S. M. Shamsuddin ◽  
M. Darus ◽  
M. N. Sulaiman

Data reduction is a process of feature extraction that transforms the data space into a feature space of much lower dimension than the original data space, yet retains most of the intrinsic information content of the data. This can be done by a number of methods, such as principal component analysis (PCA), factor analysis, and feature clustering. Principal components are extracted from a collection of multivariate cases as a way of accounting for as much of the variation in that collection as possible with as few variables as possible. The backpropagation network, on the other hand, has been used extensively in classification problems such as the XOR problem, share price prediction, and pattern recognition. This paper proposes an improved error signal for the backpropagation network to classify invariants reduced by principal component analysis, which extracts the bulk of the useful information present in the moment invariants of handwritten digits while leaving the redundant information behind. Higher-order centralised scale invariants are used to extract features of the handwritten digits before PCA, and the reduced invariants are sent to the improved backpropagation model for classification.
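
A minimal sketch of the reduce-then-classify idea, not the proposed improved error signal: scikit-learn's digit images stand in for the moment invariants, PCA performs the reduction, and a standard backpropagation network (MLPClassifier) does the classification.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(
    PCA(n_components=15),                      # keep the bulk of the useful variance
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```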


Author(s):  
Saban Ozturk ◽  
Umut Ozkaya ◽  
Mucahid Barstugan

Necessary screenings must be performed to control the spread of Coronavirus (COVID-19) in daily life and to make a preliminary diagnosis of suspicious cases. The long duration of pathological laboratory tests and erroneous test results led researchers to focus on other approaches. Fast and accurate diagnoses are essential for effective interventions against COVID-19. The information obtained from X-ray and Computed Tomography (CT) images is vital for making clinical diagnoses. Therefore, this work aims to develop a machine learning method for the detection of viral epidemics by analyzing X-ray images. In this study, images belonging to six categories, including coronavirus images, are classified. Since the number of images in the dataset is small and unbalanced, it is more convenient to analyze these images with hand-crafted feature extraction methods. For this purpose, features are first extracted from all images in the dataset with the help of four feature extraction algorithms. These extracted features are combined in raw form. The unbalanced data problem is eliminated by producing synthetic feature vectors with the SMOTE algorithm. Finally, the feature vector is reduced in size by using a stacked auto-encoder and principal component analysis to remove interconnected features. According to the obtained results, the proposed method performs well, especially for making the diagnosis of COVID-19 quickly and effectively.
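
The balancing and reduction steps can be sketched as below. This is not the authors' exact pipeline: random vectors stand in for the combined hand-crafted descriptors, SMOTE balances the six classes, and PCA alone (without the stacked auto-encoder) removes correlated dimensions before an SVM.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 256))                       # combined hand-crafted features (toy)
y = np.repeat(np.arange(6), [120, 90, 30, 30, 15, 15])    # six unbalanced classes

# SMOTE synthesizes minority-class feature vectors to balance the set
X_bal, y_bal = SMOTE(k_neighbors=3, random_state=0).fit_resample(X, y)

# PCA drops interconnected features before the classifier
clf = make_pipeline(PCA(n_components=50), SVC()).fit(X_bal, y_bal)
print("training accuracy:", clf.score(X_bal, y_bal))
```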


2017 ◽  
Vol 8 (2) ◽  
pp. 167-205 ◽  
Author(s):  
Boris Galitsky

To support a natural flow of conversation between humans and automated agents, the rhetorical structure of each message has to be analyzed. We classify a pair of paragraphs of text as appropriate or inappropriate for one to follow the other, based on both topic and communicative discourse considerations. To represent a multi-sentence message with respect to how it should follow a previous message in a conversation or dialogue, we build an extension of its discourse tree. The extended discourse tree is based on a discourse tree for RST relations, with labels for communicative actions and additional arcs for anaphora and ontology-based relations between entities. We refer to such trees as Communicative Discourse Trees (CDTs). We explore syntactic and discourse features that are indicative of correct vs. incorrect request-response or question-answer pairs. Two learning frameworks are used to recognize correct pairs: deterministic nearest-neighbor learning of CDTs as graphs, and tree kernel learning of CDTs, where the feature space of all CDT sub-trees is subject to SVM learning. We form the positive training set from correct pairs obtained from the Yahoo! Answers social network, corporate conversations including Enron emails, customer complaints, and interviews by journalists. The corresponding negative training set is created artificially by attaching responses to different, inappropriate requests that include relevant keywords. The evaluation showed that it is possible to recognize valid pairs in 70% of cases in domains of weak request-response agreement and 80% of cases in domains of strong agreement, which is sufficient to support automated conversations. These accuracies are comparable with the benchmark task of classifying discourse trees themselves as valid or invalid, and with the classification of multi-sentence answers in factoid question-answering systems. The applicability of the proposed machinery to chatbots, social chats, and programming via natural language is demonstrated. We conclude that learning rhetorical structures in the form of CDTs is a key source of data for answering complex questions and supporting chatbots and dialogue management.
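
A toy sketch of the tree-kernel framework: the trees, labels and kernel below are placeholders (a real CDT kernel counts common sub-trees rather than shared node labels), but the overall structure, a precomputed kernel matrix fed to an SVM, mirrors the second learning framework described above.

```python
import numpy as np
from sklearn.svm import SVC

def labels(tree):
    """Flatten a nested (label, children...) tuple into a list of node labels."""
    out = [tree[0]]
    for child in tree[1:]:
        out += labels(child)
    return out

def toy_kernel(t1, t2):
    """Placeholder for a CDT sub-tree kernel: count shared node labels."""
    l1, l2 = labels(t1), labels(t2)
    return float(sum(min(l1.count(x), l2.count(x)) for x in set(l1)))

trees = [("request", ("elaboration", ("ask",))),
         ("request", ("contrast", ("deny",))),
         ("response", ("joint", ("inform",))),
         ("response", ("attribution", ("inform",)))]
y = [1, 0, 1, 0]                                   # valid vs. invalid pair (toy labels)

K = np.array([[toy_kernel(a, b) for b in trees] for a in trees])
clf = SVC(kernel="precomputed").fit(K, y)          # SVM over the precomputed tree kernel
print(clf.predict(K))
```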


Author(s):  
Владимир Александрович Минаев ◽  
Алена Дмитриевна Реброва ◽  
Александр Викторович Симонов

The article discusses models for classifying text content and methods for its pre-processing in order to identify destructive influences in social media. It is shown that the main source of destructive content is the user profile, which is characterized by a set of personal data, the content of publications, community parameters, network accounts, messages and chats. The relevance of automated data collection and analysis using case-based and deductive learning models is discussed. Their main varieties and the tasks solved on their basis are considered, including forecasting and typology with respect to the destructive content of texts, and reducing the dimensionality of the features describing them. The main methods of text vectorization are investigated and applied: Bag of Words, TF-IDF, and Word2vec. The tasks of identifying destructive content related to radical Islam are solved on practical corpora of texts from the social network VKontakte. It is shown that, using the applied models and methods, all texts containing destructive content are classified correctly. The highest accuracy (0.97) in recognizing destructive content is achieved by combining the Bag of Words vectorization algorithm, principal component analysis for reducing the feature space of the text descriptions, and logistic regression or random forest as learning models.
It is concluded that datasets associated with Islamic radicalism are characterized by sufficiently clear features that are well computed by modern models, methods and algorithms, and can be effectively used for the automated classification of text collections in order to identify their destructive orientation. Further development of this line of research involves larger document corpora and a more detailed analysis of texts based on complex models for recognizing latent extremist propaganda, including propaganda presented in photo, audio and video formats.
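
A minimal sketch of the best-performing combination reported above, with toy English texts and illustrative labels in place of the VKontakte corpus: Bag of Words vectorization, principal component analysis to shrink the feature space, and logistic regression as the learning model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import FunctionTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["call to violent action now", "peaceful community news update",
         "radical propaganda post shared", "local sports results today"]
labels = [1, 0, 1, 0]                               # destructive vs. neutral (toy labels)

pipeline = make_pipeline(
    CountVectorizer(),                                               # Bag of Words
    FunctionTransformer(lambda m: m.toarray(), accept_sparse=True),  # densify for PCA
    PCA(n_components=2),                                             # principal components
    LogisticRegression(),
)
pipeline.fit(texts, labels)
print(pipeline.predict(["violent radical post"]))
```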


2014 ◽  
Vol 986-987 ◽  
pp. 1491-1496 ◽  
Author(s):  
Qiang Wang ◽  
Yong Bao Liu ◽  
Xing He ◽  
Shu Yong Liu ◽  
Jian Hua Liu

Selection of secondary variables is an effective way to reduce redundant information and improve efficiency in nonlinear system modeling. A combination of Kernel Principal Component Analysis (KPCA) and K-Nearest Neighbor (KNN) classification is applied to the fault diagnosis of bearings. In this approach, integral-operator kernel functions realize the nonlinear map from the raw feature space of the vibration signals to a high-dimensional feature space, and the principal component analytic method uses the structure and statistics of that feature space to extract a feature vector from the fault signal. The sensitive features obtained from the kernel principal components are then fed to a K-Nearest Neighbor classifier. Experimental results indicate that this method achieves good accuracy.
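
A compact sketch of the KPCA-plus-KNN scheme with synthetic data in place of bearing vibration features; the kernel choice, its gamma, and the number of retained components are assumptions.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 16)),          # healthy-bearing features (toy)
               rng.normal(2, 1, (100, 16))])         # faulty-bearing features (toy)
y = np.array([0] * 100 + [1] * 100)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# nonlinear map via kernel principal components, then KNN on the sensitive features
model = make_pipeline(KernelPCA(n_components=5, kernel="rbf", gamma=0.05),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```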


Author(s):  
Hyeuk Kim

Unsupervised learning in machine learning divides data into several groups, such that observations in the same group have similar characteristics and observations in different groups have different characteristics. In this paper, we classify data by partitioning around medoids (PAM), which has some advantages over k-means clustering, and apply it to baseball players in the Korea Baseball League. We also apply principal component analysis to the data and draw a graph using the first two components as axes. Through this procedure we interpret the meaning of the clustering graphically. The combination of partitioning around medoids and principal component analysis can be applied to other data as well, and the approach makes it easy to understand the characteristics of the groups.
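
A short sketch of this combination, with random numbers in place of the Korea Baseball League statistics; it assumes the KMedoids implementation from the scikit-learn-extra package.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn_extra.cluster import KMedoids   # pip install scikit-learn-extra

rng = np.random.default_rng(0)
stats = rng.standard_normal((120, 8))               # 120 players, 8 batting stats (toy)

X = StandardScaler().fit_transform(stats)
clusters = KMedoids(n_clusters=3, random_state=0).fit_predict(X)   # partitioning around medoids
pcs = PCA(n_components=2).fit_transform(X)                         # two components for the axes

plt.scatter(pcs[:, 0], pcs[:, 1], c=clusters)
plt.xlabel("PC1"); plt.ylabel("PC2")
plt.show()
```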


2020 ◽  
Author(s):  
Xin Yi See ◽  
Benjamin Reiner ◽  
Xuelan Wen ◽  
T. Alexander Wheeler ◽  
Channing Klein ◽  
...  

Herein, we describe the use of iterative supervised principal component analysis (ISPCA) in de novo catalyst design. The regioselective synthesis of 2,5-dimethyl-1,3,4-triphenyl-1H-pyrrole (C) via Ti-catalyzed formal [2+2+1] cycloaddition of phenyl propyne and azobenzene was targeted as a proof of principle. The initial reaction conditions led to an unselective mixture of all possible pyrrole regioisomers. ISPCA was conducted on a training set of catalysts, and their performance was regressed against the scores from the top three principal components. Component loadings from this PCA space, along with k-means clustering, were used to inform the design of new test catalysts. The selectivity of a prospective test set was predicted in silico using the ISPCA model, and only optimal candidates were synthesized and tested experimentally. This data-driven predictive-modeling workflow was iterated, and after only three generations the catalytic selectivity was improved from 0.5 (statistical mixture of products) to over 11 (>90% C) by incorporating 2,6-dimethyl-4-(pyrrolidin-1-yl)pyridine as a ligand. The successful development of a highly selective catalyst without resorting to long, stochastic screening processes demonstrates the inherent power of ISPCA in de novo catalyst design and should motivate the general use of ISPCA in reaction development.
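
One ISPCA-style iteration can be sketched as follows. This is a simplification, not the authors' workflow: random descriptor values stand in for the catalyst features, the supervised step simply keeps the descriptors most correlated with selectivity, and k-means in score space suggests diverse candidates for the next generation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((20, 30))                     # 20 training catalysts, 30 descriptors
selectivity = descriptors[:, 0] * 2 + rng.normal(0, 0.3, 20)    # toy selectivity response

# supervised screening: keep the descriptors most correlated with the response
corr = np.array([abs(np.corrcoef(descriptors[:, j], selectivity)[0, 1]) for j in range(30)])
keep = np.argsort(corr)[-10:]

pca = PCA(n_components=3).fit(descriptors[:, keep])
scores = pca.transform(descriptors[:, keep])
model = LinearRegression().fit(scores, selectivity)             # regress on top components

candidates = rng.standard_normal((50, 30))                      # prospective test catalysts
cand_scores = pca.transform(candidates[:, keep])
predicted = model.predict(cand_scores)                          # in-silico selectivity prediction
diverse = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(cand_scores)
print(predicted[:5], diverse[:5])
```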


2018 ◽  
Vol 21 (2) ◽  
pp. 125-137
Author(s):  
Jolanta Stasiak ◽  
Marcin Koba ◽  
Marcin Gackowski ◽  
Tomasz Baczek

Aim and Objective: In this study, chemometric methods such as correlation analysis, cluster analysis (CA), principal component analysis (PCA), and factor analysis (FA) have been used to reduce the number of chromatographic parameters (logk/logkw) and various (e.g., 0D, 1D, 2D, 3D) structural descriptors for three different groups of drugs, namely 12 analgesic drugs, 11 cardiovascular drugs and 36 "other" compounds, and in particular to choose the most important of them. Material and Methods: All chemometric analyses have been carried out, graphically presented and discussed for each group of drugs. First, the compounds' structural and chromatographic parameters were correlated. The best results of the correlation analysis were correlation coefficients of R = 0.93, R = 0.88 and R = 0.91 for the cardiac medications, the analgesic drugs, and the 36 "other" compounds, respectively. Next, part of the molecular and HPLC experimental data from each group of drugs was submitted to the FA/PCA and CA techniques. Results: For almost all results obtained by FA or PCA, the total data variance of all analyzed parameters (experimental and calculated) was explained by the first two or three factors: 84.28%, 76.38% and 69.71% for the cardiovascular drugs, the analgesic drugs and the 36 "other" compounds, respectively. Compound clustering by the CA method had characteristics similar to those obtained by FA/PCA. In this paper, the statistical classification of the mentioned drugs is characterized and discussed extensively with respect to their molecular structure and pharmacological activity. Conclusion: The proposed QSAR strategy with a reduced number of parameters could be a useful starting point for further statistical analysis as well as support for designing new drugs and predicting their possible activity.
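
The chemometric workflow can be illustrated with a short sketch; the descriptor matrix and retention parameter below are random stand-ins for the real logk/logkw and structural data, so only the sequence of steps (correlation analysis, PCA variance check, cluster analysis) reflects the study.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((36, 12))                 # 36 compounds, 12 descriptors (toy)
logkw = descriptors[:, 0] * 0.9 + rng.normal(0, 0.4, 36)    # toy chromatographic parameter

# correlation of each descriptor with the chromatographic parameter
corr = np.array([np.corrcoef(descriptors[:, j], logkw)[0, 1] for j in range(12)])
print("best |R|:", np.abs(corr).max())

# variance explained by the first few principal components
X = StandardScaler().fit_transform(descriptors)
pca = PCA(n_components=3).fit(X)
print("variance explained by 3 PCs:", pca.explained_variance_ratio_.sum())

# hierarchical cluster analysis of the compounds
clusters = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")
print(clusters)
```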

