Stock Trading Based on Principal Component Analysis and Clustering Analysis

Author(s):  
Yanru Guo
Author(s):  
Bingxian Leng ◽  
Yunfei Fu ◽  
Siyuan Li

This paper draws on hierarchical (pedigree) clustering analysis, gray prediction, and principal component analysis (PCA). A clustering model, a GM(1,1) model, and a PCA model were built in SPSS, and MATLAB was used to calculate the correlation matrices. The paper computes the differences in price changes of major foods across cities in January and forecasts food prices for June 2016. For the first problem, the main foods are classified and the data are preprocessed. SPSS is then used to group the 27 kinds of food into four categories with the hierarchical (system) clustering model. For the four categories, line charts of food price over time are drawn in Excel and used to analyze the characteristics of food price volatility. For the second problem, a gray prediction model is established for the price of each kind of food, based on the food classification. The original data are first accumulated, tested, and processed so that they show stronger regularity; a gray differential equation is then set up and solved in MATLAB. The residual test and the posterior-variance test give C < 0.35, which indicates good prediction accuracy. Finally, the fitted function is used to predict the price trend for June 2016. For the third problem, principal component analysis of the given data on the 27 kinds of food identifies celery, octopus, chicken (white striped chicken), duck, and Chinese cabbage as the main components. The CPI can therefore be predicted with relative accuracy by monitoring only a small number of foods.
Based on regional characteristics, Shanghai and Shenyang are selected. Using the relevant CPI and food price data, principal component analysis in SPSS is applied to analyze the impact of the CPI on several types of food, and the weights are then calculated with a MATLAB algorithm. Analysis and comparison of the resulting data show that different types of food should be selected for monitoring in different regions.
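
The GM(1,1) steps described in the abstract (accumulate the series, fit the gray differential equation, then check the posterior-variance ratio C) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code: the function names are hypothetical, the price series is made up for illustration, and the C < 0.35 threshold is simply the one quoted in the abstract.

```python
import math

def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, b) from a positive series x0."""
    n = len(x0)
    # 1-AGO: accumulated generating series x1
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values z[k] = mean of consecutive accumulated values
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Ordinary least squares for the gray equation y = -a * z + b
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    b = (sy - slope * sz) / m
    return -slope, b

def gm11_predict(x0, a, b, steps):
    """Return fitted values for x0 followed by `steps` forecasts."""
    def x1_hat(k):
        # Time response of the whitened equation dx1/dt + a*x1 = b
        return (x0[0] - b / a) * math.exp(-a * k) + b / a
    n = len(x0)
    # Restore the original series by differencing the accumulated estimate
    return [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, n + steps)]

def posterior_variance_ratio(x0, fitted):
    """C = std(residuals) / std(original data); smaller is better."""
    def std(values):
        mu = sum(values) / len(values)
        return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    residuals = [o - f for o, f in zip(x0, fitted)]
    return std(residuals) / std(x0)

# Hypothetical monthly prices, for illustration only
prices = [4.10, 4.25, 4.31, 4.48, 4.60]
a, b = gm11_fit(prices)
fitted = gm11_predict(prices, a, b, steps=1)
print("next-month forecast:", round(fitted[-1], 3))
print("C:", round(posterior_variance_ratio(prices, fitted), 3))
```

The sketch assumes a strictly positive series with a non-zero development coefficient a; a constant series would make b / a undefined.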


2021 ◽  
Vol 1 (1) ◽  
pp. 190-199
Author(s):  
Rijalul Fikri ◽  
Aswin Mushardiyanto ◽  
Mochamad Naufal Laudza’Banin ◽  
Kristiana Maureen ◽  
Harry Patria

Twenty independent variables were selected for this study from the 2020 dataset on regency/city (kabupaten/kota) poverty published by Statistics Indonesia (Badan Pusat Statistik). A correlation test among these independent variables showed that some pairs were very highly correlated, with correlation values of 0.921 (Percentage of Poor Population and P1 (Poverty Gap Index)) and 0.964 (P1 (Poverty Gap Index) and P2 (Poverty Severity Index)). Using very highly correlated variables together causes multicollinearity, so Principal Component Analysis (PCA) was chosen to remove it. Using the cumulative proportion of variance with a minimum of 80% of the data variability retained, PCA produced a new data set with three dimensions, that is, three new independent variables. With the new input variables PCA 0, PCA 1, and PCA 2, the number of clusters was determined with the Silhouette Coefficient method and the clustering was performed with K-Means, yielding four clusters: cluster 1 with 117 regencies/cities, cluster 2 with 154, cluster 3 with 173, and cluster 4 with 70.
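
Choosing the number of clusters with the Silhouette Coefficient and then clustering with K-Means can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the seeding rule (farthest-point, chosen here only to keep the run deterministic) is an assumption, and the points are synthetic stand-ins for PCA scores.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d)  # mean distance to nearest other cluster
                for lab, d in dists.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Synthetic 2-D scores: three tight groups (illustration only)
offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]

scores_by_k = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores_by_k, key=scores_by_k.get)
print("best k by silhouette:", best_k)
```

The candidate range of 2 to 5 clusters is arbitrary; in practice it would be set from the size of the data set and the purpose of the grouping.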


2020 ◽  
Vol 214 ◽  
pp. 03003
Author(s):  
Jiayi Yan ◽  
Qian Pu ◽  
Junfei Liu

Based on the knowledge of economics, this paper selects 22 macroeconomic indicators that best reflect the overall economic situation of the United States. After differencing, logarithmic, and exponential preprocessing of the original data, a power spectral analysis model is used to adaptively identify the periodicity of the selected economic indicators, and the results are visualized. This step screens out 11 indicators with obvious periodicity. To compute the weighted distance based on principal component analysis, a correlation test is first conducted on the 11 selected periodic indicators to obtain a Pearson correlation heatmap. The first five principal components are then extracted as virtual indicators representing the monthly economic situation, and the weighted distance between months is calculated and visualized. Next, we select the 36-month smoothed results for analysis, identify the time intervals with similar economic situations, and verify the conjecture of economic periodicity. Finally, based on K-means clustering analysis, the economic conditions of 352 months are classified into 3 clusters using the weighted distance after 36-month smoothing. The visualized results show two complete cycles, i.e. red-yellow-blue and red-yellow-blue, which is consistent with the conclusion of the principal component analysis model and again supports the existence of an economic cycle. In conclusion, based on the above PCA weighted distance and clustering analysis, the economic period is around 176 months, which favors the medium-to-long periodicity theory.
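
The abstract does not specify its power spectral analysis model, so the following is only a generic illustration of how a dominant period can be read off a periodogram. The function name is hypothetical, the plain discrete Fourier transform is an assumption standing in for the authors' method, and the series is synthetic.

```python
import cmath
import math

def dominant_period(series):
    """Return the period (in samples) of the strongest periodogram peak."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]  # remove the mean so frequency 0 does not dominate
    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        # Discrete Fourier coefficient at frequency k / n cycles per sample
        coef = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                   for t, v in enumerate(x))
        power = abs(coef) ** 2 / n
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Synthetic monthly indicator: a 12-month cycle plus a weaker 4-month cycle
months = range(120)
indicator = [math.sin(2 * math.pi * t / 12) + 0.3 * math.sin(2 * math.pi * t / 4)
             for t in months]
print("dominant period (months):", dominant_period(indicator))
```

A raw periodogram only resolves periods of the form n / k, so real indicators with trends or long cycles would need detrending and a longer sample before a peak can be trusted.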


2005 ◽  
Vol 5 (3-4) ◽  
pp. 197-208
Author(s):  
S. Chung ◽  
H. Lee ◽  
M. Yu ◽  
J. Koo ◽  
I. Hyun ◽  
...  

To quantify the relation between the revenue water (RW) ratio and key local factors, 90 effect factors were considered as regional characteristics for 79 Korean cities. Seven statistically significant effect factors were chosen through correlation analysis. Three principal components independently influencing the RW ratio were extracted by principal component analysis (PCA). The 79 cities were grouped into six clusters by k-means clustering (KMC) of the cities' factor scores. Key local factors were then identified and their impacts quantified by multiple regression analysis (MRA), and the results were validated by t-test and F-test. The correlation-PCA-KMC-MRA approach proved to be one scientific way of identifying key local factors. The results suggest that a shorter distribution system, a water supply with a smaller number of larger customer meters, and gravitational supply through a reservoir would be advantageous from the point of view of the RW ratio.
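
The first stage of the pipeline, screening candidate factors by the significance of their correlation with the RW ratio, might look like the sketch below. It is an assumption about how such a screen could be done, not the study's procedure: the function and factor names are hypothetical, the data are made up, and the critical t value must be supplied by the caller for the sample size in use (2.306 here is the two-sided 5% value for 8 degrees of freedom).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def screen_factors(factors, target, t_crit):
    """Keep factors whose correlation with target has |t| above t_crit."""
    n = len(target)
    kept = {}
    for name, values in factors.items():
        r = pearson_r(values, target)
        # t statistic for H0: correlation = 0, with n - 2 degrees of freedom
        t = math.inf if abs(r) >= 1 else r * math.sqrt((n - 2) / (1 - r * r))
        if abs(t) > t_crit:
            kept[name] = r
    return kept

# Made-up data for 10 hypothetical cities
rw_ratio = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
factors = {
    "factor_a": [2 * v + (0.5 if v % 2 else -0.5) for v in rw_ratio],
    "factor_b": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
}
selected = screen_factors(factors, rw_ratio, t_crit=2.306)
print(sorted(selected))
```

Factors that pass such a screen can still be correlated with each other, which is why a PCA step would follow before any regression.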


VASA ◽  
2012 ◽  
Vol 41 (5) ◽  
pp. 333-342 ◽  
Author(s):  
Kirchberger ◽  
Finger ◽  
Müller-Bühl

Background: The Intermittent Claudication Questionnaire (ICQ) is a short questionnaire for the assessment of health-related quality of life (HRQOL) in patients with intermittent claudication (IC). The objective of this study was to translate the ICQ into German and to investigate the psychometric properties of the German ICQ version in patients with IC. Patients and methods: The original English version was translated using a forward-backward method. The resulting German version was reviewed by the author of the original version and an experienced clinician. Finally, it was tested for clarity with 5 German patients with IC. The German ICQ was administered to a sample of 81 patients. The sample consisted of 58.0 % male patients with a median age of 71 years and a median IC duration of 36 months. Tests of feasibility included completeness of questionnaires, completion time, and ratings of clarity, length and relevance. Reliability was assessed through a retest in 13 patients at 14 days, and analysis of Cronbach’s alpha for internal consistency. Construct validity was investigated using principal component analysis. Concurrent validity was assessed by correlating the ICQ scores with the Short Form 36 Health Survey (SF-36) as well as clinical measures. Results: The ICQ was completely filled in by 73 subjects (90.1 %) with an average completion time of 6.3 minutes. Cronbach’s alpha coefficient reached 0.75. Intra-class correlation for test-retest reliability was r = 0.88. Principal component analysis resulted in a 3-factor solution. The first factor explained 51.5 % of the total variation and all items had loadings of at least 0.65 on it. The ICQ was significantly associated with the SF-36 and treadmill-walking distances whereas no association was found for resting ABPI. Conclusions: The German version of the ICQ demonstrated good feasibility, satisfactory reliability and good validity. Responsiveness should be investigated in further validation studies.
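
For readers unfamiliar with the internal-consistency measure used here, Cronbach's alpha can be computed from item-level responses as in the minimal sketch below. The function name and the response matrix are hypothetical and unrelated to the ICQ data; the formula is the standard one based on item variances and the variance of the total score.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per respondent)."""
    k = len(responses[0])  # number of items
    item_vars = [variance(col) for col in zip(*responses)]  # sample variance per item
    total_var = variance([sum(row) for row in responses])   # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up answers from 5 respondents to 4 items (illustration only)
answers = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
]
print("alpha:", round(cronbach_alpha(answers), 3))
```

The sketch assumes complete responses with at least two items and some variation in the total score; questionnaires with missing answers would need to be filtered or imputed first.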

