Using principal component analysis to improve earthquake magnitude prediction in Japan

Abstract Increasing attention has been paid over the last decade to predicting earthquakes with data mining techniques. Several works have already proposed certain features as inputs for supervised classifiers. So far, however, these features have been used without any further transformation, albeit successfully. In this work, the use of principal component analysis (PCA) to reduce data dimensionality and generate new datasets is proposed. In particular, this step is inserted into a methodology that has already been used successfully to predict earthquakes. Tokyo, one of the cities in Japan most threatened by large earthquakes, is studied. Several well-known classifiers have been combined with PCA, and a noticeable improvement in the results is reported.
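
As a rough illustration of inserting a PCA step ahead of a classifier, the sketch below learns the projection on training data only and then applies it to held-out data. Everything in it is an assumption made for illustration: the synthetic features, the 95% variance target, the helper names `fit_pca` and `apply_pca`, and the nearest-centroid rule that stands in for the unnamed "well-known classifiers". It is not the paper's feature set or pipeline.

```python
import numpy as np

def fit_pca(X_train, var_target=0.95):
    """Learn a PCA projection keeping enough components to reach var_target."""
    mean = X_train.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Smallest k whose cumulative explained variance reaches the target.
    k = int(np.searchsorted(np.cumsum(explained), var_target) + 1)
    k = min(k, len(s))
    return mean, Vt[:k]

def apply_pca(X, mean, components):
    """Project data with a projection learned elsewhere (e.g. on training data)."""
    return (X - mean) @ components.T

# Synthetic stand-in data (hypothetical): 200 samples, 6 correlated features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(200, 6))
y = (latent[:, 0] > 0).astype(int)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Fit the projection on the training split only, then reuse it on the test split.
mean, components = fit_pca(X_train)
Z_train = apply_pca(X_train, mean, components)
Z_test = apply_pca(X_test, mean, components)

# Nearest-centroid rule on the reduced features (placeholder for any classifier).
centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in (0, 1)])
dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
y_pred = dists.argmin(axis=1)
accuracy = float((y_pred == y_test).mean())
```

Fitting the projection on the training split alone keeps information from the test period out of the transformed features, which matters when the goal is prediction.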

Clustering of Cardiovascular Disease Patients Using Data Mining Techniques with Principal Component Analysis and K-Medoids

10.20944/preprints202008.0074.v1 ◽ 2020 ◽ Cited By ~ 1

Author(s): Edy Irwansyah ◽ Ebiet Salim Pratama ◽ Margaretha Ohyver

Keyword(s): Data Mining ◽ Cardiovascular Disease ◽ Principal Component Analysis ◽ Data Reduction ◽ Clustering Algorithm ◽ Principal Component ◽ Component Analysis ◽ Data Mining Techniques ◽ The World ◽ Using Data

Cardiovascular disease is the number one cause of death in the world. According to the WHO, around 31% of deaths worldwide are caused by cardiovascular diseases, and more than 75% of these deaths occur in developing countries. The care of patients with cardiovascular disease produces many medical records that can be used for further patient management. This study aims to develop a data mining method that groups patients with cardiovascular disease into two clusters in order to determine the level of patient complications in each. The method applies principal component analysis (PCA) to reduce the dimensionality of the large available dataset, followed by cluster analysis using the K-Medoids algorithm. Data reduction with PCA yielded five new components with a cumulative proportion of variance of 0.8311. These five components were then used to form clusters with the K-Medoids algorithm, which produced two clusters with a silhouette coefficient of 0.35. Combining data reduction by PCA with K-Medoids clustering is a new way of grouping cardiovascular patient data based on the level of patient complications in each resulting cluster.
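
The clustering and evaluation steps can be sketched in a few lines. The code below is a minimal alternating K-Medoids on a precomputed distance matrix, plus a direct computation of the mean silhouette coefficient. It runs on synthetic five-dimensional points standing in for PCA scores. The data, the Euclidean distance, the random initialisation, and the helper names are all assumptions, since the abstract does not say which K-Medoids variant or distance the authors used.

```python
import numpy as np

def pairwise_dist(Z):
    """Euclidean distance matrix between all rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff**2).sum(axis=2))

def k_medoids(D, k, n_iter=100, seed=0):
    """Simple alternating k-medoids on a precomputed distance matrix D."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if len(members) == 0:
                continue
            # Medoid = member with the smallest total distance to its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, D[:, medoids].argmin(axis=1)

def silhouette(D, labels):
    """Mean silhouette coefficient; assumes at least two non-empty clusters."""
    clusters = np.unique(labels)
    scores = np.zeros(len(D))
    for i in range(len(D)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # a singleton cluster scores 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance to own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())

# Synthetic stand-in for five PCA component scores (hypothetical data).
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(0, 1, size=(40, 5)), rng.normal(4, 1, size=(40, 5))])
D = pairwise_dist(Z)
medoids, labels = k_medoids(D, k=2)
score = silhouette(D, labels)
```

A silhouette value near 1 indicates compact, well-separated clusters, while values near 0 indicate overlapping ones, which gives some context for the coefficient of 0.35 reported above.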

Data Mining in Analysis of Biomechanical Signals

Solid State Phenomena ◽ 10.4028/www.scientific.net/ssp.147-149.588 ◽ 2009 ◽ Vol 147-149 ◽ pp. 588-593 ◽ Cited By ~ 3

Author(s): Marcin Derlatka ◽ Jolanta Pauk

Keyword(s): Data Mining ◽ Principal Component Analysis ◽ Cerebral Palsy ◽ Spina Bifida ◽ Decision Tree ◽ Principal Component ◽ Data Preprocessing ◽ Component Analysis ◽ Kernel Principal Component Analysis

This paper proposes a procedure for processing biomechanical data. It consists of selecting suitable noise-free data, preprocessing the data by means of model identification and Kernel Principal Component Analysis, and then classifying it with a decision tree. The classification into groups (normal gait and two selected gait pathologies: Spina Bifida and Cerebral Palsy) gave very good results.

Statistical approaches in literature: An application of principal component analysis and factor analysis to analyze the different arrangements about the Quran’s Suras

Digital Scholarship in the Humanities ◽ 10.1093/llc/fqaa006 ◽ 2020

Author(s): Yanwen Wang ◽ Javad Garjami ◽ Milena Tsvetkova ◽ Nguyen Huu Hau ◽ Kim-Hung Pho

Keyword(s): Data Mining ◽ Principal Component Analysis ◽ Factor Analysis ◽ Data Analysis ◽ Principal Component ◽ Component Analysis ◽ Holy Quran ◽ The Holy Quran ◽ Statistical Approaches

Abstract Data mining, statistics, and data analysis are popular techniques for studying datasets and extracting knowledge from them. In this article, principal component analysis and factor analysis were applied to cluster thirteen different arrangements of the Suras of the Holy Quran. The results showed that these thirteen arrangements can be divided into two groups: the first includes Blachère, Davood, Grimm, Nöldeke, Bazargan, E’temad-al-Saltane, and Muir, and the second includes Ebn Nadim, Jaber, Ebn Abbas, Hazrat Ali, Khazan, and Al-Azhar.

Is it possible to detect earlier ionospheric precursors before large earthquakes using principal component analysis (PCA)?

Arabian Journal of Geosciences ◽ 10.1007/s12517-011-0419-z ◽ 2011 ◽ Vol 6 (4) ◽ pp. 1091-1100 ◽ Cited By ~ 5

Author(s): Jyh-Woei Lin

Keyword(s): Principal Component Analysis ◽ Principal Component ◽ Component Analysis ◽ Large Earthquakes

Predicting breast cancer recurrence using principal component analysis as feature extraction: an unbiased comparative analysis

International Journal of Advances in Intelligent Informatics ◽ 10.26555/ijain.v6i3.462 ◽ 2020 ◽ Vol 6 (3) ◽ pp. 313

Author(s): Zuhaira Muhammad Zain ◽ Mona Alshenaifi ◽ Abeer Aljaloud ◽ Tamadhur Albednah ◽ Reham Alghanim ◽ ...

Keyword(s): Breast Cancer ◽ Data Mining ◽ Principal Component Analysis ◽ Feature Extraction ◽ Medical Information ◽ Cancer Recurrence ◽ Principal Component ◽ Component Analysis ◽ Breast Cancer Recurrence ◽ F Measure

Breast cancer recurrence is among the most noteworthy fears faced by women. Nevertheless, with modern innovations in data mining technology, early recurrence prediction can help relieve these fears. Although medical information is typically complicated, and narrowing it down to the most relevant inputs is challenging, new sophisticated data mining techniques promise accurate predictions from high-dimensional data. This study contrasts the performance of three established data mining algorithms, Naïve Bayes (NB), k-nearest neighbor (KNN), and fast decision tree (REPTree), in predicting breast cancer recurrence when combined with principal component analysis (PCA) as the feature extraction algorithm. Models built with and without PCA were compared. The results showed that KNN produced better predictions without PCA (F-measure = 72.1%), whereas the other two techniques, NB and REPTree, improved when used with PCA (F-measure = 76.1% and 72.8%, respectively). This study can benefit the healthcare industry by assisting physicians in predicting breast cancer recurrence precisely.
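
For readers unfamiliar with the metric, the F-measure is the harmonic mean of precision and recall. The sketch below computes it for a single positive class on made-up labels. The data and the function name `f_measure` are illustrative assumptions, and the abstract does not state whether the reported figures are per-class or averaged across classes.

```python
def f_measure(y_true, y_pred, positive=1):
    """F-measure (F1) for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = recurrence, 0 = no recurrence.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
score = f_measure(y_true, y_pred)  # 3 TP, 1 FP, 1 FN
```

Comparing this score for the same classifier trained on raw features and on PCA-transformed features is the kind of with/without contrast the study describes.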

Data Mining Based on Principal Component Analysis: Application to the Nitric Oxide Response in Escherichia coli

Journal of Statistical Science and Application ◽ 10.17265/2328-224x/2014.01.001 ◽ 2014 ◽ Vol 2 (1)

Author(s): AiLing Teh ◽ Donovan Layton ◽ Daniel R. Hyduke ◽ Laura R. Jarboe ◽ Derrick K. Rollins ◽ ...

Keyword(s): Nitric Oxide ◽ Escherichia Coli ◽ Data Mining ◽ Principal Component Analysis ◽ Principal Component ◽ Component Analysis ◽ Analysis Application

Improved Weighted Page Ranking Algorithm Based on Principal Component Analysis and Map Reduce Frame work for Web Access

Asian Journal of Computer Science and Technology ◽ 10.51983/ajcst-2019.8.2.2144 ◽ 2019 ◽ Vol 8 (2) ◽ pp. 32-39

Author(s): T. Mylsami ◽ B. L. Shivakumar

Keyword(s): Data Mining ◽ Principal Component Analysis ◽ Web Mining ◽ Principal Component ◽ Component Analysis ◽ Mean Value ◽ Ranking Algorithm ◽ Mean Values ◽ Page Ranking ◽ The Web

In general, the World Wide Web has become the most useful information resource for information retrieval and knowledge discovery. However, the information on the Web keeps expanding in size and density, so retrieving the required information efficiently and effectively is a challenge, and the tremendous growth of the Web has created challenges for search engine technology. Web mining is an area that applies data mining techniques to address these requirements. Popular Web mining algorithms such as PageRanking (PR), Weighted PageRanking (WPR), and Hyperlink-Induced Topic Search (HITS) are commonly used to sort and rank search results. Among them, page ranking algorithms use web structure mining and web content mining to estimate the relevancy of a web site, but they do not deal with the scalability problem or with the visits of inlinks and outlinks of the pages. A fast and efficient page ranking algorithm for webpage retrieval therefore remains a challenge. This paper proposes a new, improved WPR algorithm, called PWPR, which uses a Principal Component Analysis technique based on the mean value of page ranks. The proposed PWPR algorithm takes into account the importance of both the number of visits of inlinks and outlinks of the pages and distributes rank scores based on the popularity of the pages. The weight values of the pages are computed from the inlinks and outlinks with their mean values. However, because new data and updates arrive constantly, the results of data mining applications under the PWPR method become stale and obsolete over time. To solve this problem, a MapReduce (MR) framework is a promising approach to refreshing mining results when mining big data. The proposed MR algorithm reduces the time complexity of the PWPR algorithm by reducing the number of iterations needed to reach a convergence point.
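
To make "iterations to reach a convergence point" concrete, the sketch below runs the standard (unweighted) PageRank power iteration on a tiny hypothetical link graph and reports how many iterations it took. It is not the PWPR algorithm. The abstract does not specify the inlink/outlink weighting, the PCA step, or the MapReduce decomposition, so none of them is reproduced here. The damping factor, tolerance, and graph are assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-8, max_iter=200):
    """Standard PageRank; links maps each page to the pages it links to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # Each page shares its rank equally among its outlinks.
                new_rank[q] += damping * rank[p] / len(outs)
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank, iteration

# Hypothetical four-page web graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank, iterations = pagerank(links)
top_page = max(rank, key=rank.get)
```

The update for each page depends only on the ranks of the pages linking to it, which is why each iteration fits a map step (emit rank shares along outlinks) followed by a reduce step (sum the shares per page).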
