Quantification of Information Exchange in Idealized and Climate System Applications

Entropy ◽  
2019 ◽  
Vol 21 (11) ◽  
pp. 1094
Author(s):  
Praveen Kumar Pothapakula ◽  
Cristina Primo ◽  
Bodo Ahrens

Often in climate system studies, linear and symmetric statistical measures are applied to quantify interactions among subsystems or variables. However, they do not allow identification of the driving and responding subsystems. Therefore, in this study, we aimed to apply asymmetric measures from information theory, namely the axiomatically proposed transfer entropy and the first-principles-based information flow, to detect and quantify climate interactions. As their estimation is challenging, we initially tested nonparametric estimators like transfer entropy (TE)-binning, TE-kernel, and TE k-nearest neighbor and parametric estimators like TE-linear and information flow (IF)-linear on idealized two-dimensional test cases, and examined their sensitivity to sample size. Thereafter, we experimentally applied these methods to the Lorenz-96 model and to two real climate phenomena, i.e., (1) the Indo-Pacific Ocean coupling and (2) North Atlantic Oscillation (NAO)–European air temperature coupling. As expected, the linear estimators work for linear systems but fail for strongly nonlinear systems. The TE-kernel and TE k-nearest neighbor estimators are reliable for linear and nonlinear systems. Nevertheless, the nonparametric methods are sensitive to parameter selection and sample size. Thus, this work proposes a composite use of the TE-kernel and TE k-nearest neighbor estimators along with parameter testing for consistent results. The revealed information exchange in Lorenz-96 is dominated by the slow subsystem component. For real climate phenomena, the expected bidirectional information exchange between the Indian and Pacific SSTs was detected. Furthermore, the expected information exchange from the NAO to European air temperature was detected, but so was an unexpected information exchange in the reverse direction. The latter might hint at a hidden process driving both the NAO and European temperatures. Hence, the limitations of the estimators, the available time series length, and the system at hand must be taken into account before drawing any conclusions from TE and IF-linear estimations.
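To make the TE-binning idea in the abstract above concrete, the following is a minimal sketch of a plug-in (histogram) transfer entropy estimate. It is not the authors' implementation: the function names, the equal-width binning, the history length of 1, and the toy system (a uniform driver `y` that linearly forces `x` one step later) are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def discretize(series, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in series]


def transfer_entropy_binned(source, target, n_bins=4):
    """Plug-in estimate of TE(source -> target) in bits, history length 1."""
    s = discretize(source, n_bins)
    t = discretize(target, n_bins)
    n = len(t) - 1
    triples = Counter(zip(t[1:], t[:-1], s[:-1]))  # (x_next, x_now, y_now)
    pairs_xy = Counter(zip(t[:-1], s[:-1]))        # (x_now, y_now)
    pairs_xx = Counter(zip(t[1:], t[:-1]))         # (x_next, x_now)
    singles = Counter(t[:-1])                      # x_now
    te = 0.0
    for (x_next, x_now, y_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_xy[(x_now, y_now)]
        p_given_self = pairs_xx[(x_next, x_now)] / singles[x_now]
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te


# Hypothetical test system: y drives x with a one-step delay, x never drives y.
rng = random.Random(0)
n_samples = 5000
y = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
x = [0.0] * n_samples
for step in range(n_samples - 1):
    x[step + 1] = 0.8 * y[step] + rng.gauss(0.0, 0.1)

te_y_to_x = transfer_entropy_binned(y, x)
te_x_to_y = transfer_entropy_binned(x, y)
print(f"TE(y -> x) = {te_y_to_x:.3f} bits, TE(x -> y) = {te_x_to_y:.3f} bits")
```

The estimate in the driving direction should be clearly larger than the one in the reverse direction. The small positive value in the reverse direction is the finite-sample bias of the plug-in estimator, which grows with the number of bins and shrinks with sample size, in line with the sensitivity to parameters and sample size that the abstract reports for nonparametric estimators.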


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Prince Mensah Osei ◽  
Anokye M. Adam

We quantify the strength and the directionality of information transfer between the Ghana stock market index and its component stocks, and among the individual stocks on the market, using transfer entropy. The information flow between the market index and its components and among individual stocks is measured by the effective transfer entropy of the daily logarithmic returns computed from the daily market index and the prices of 32 stocks from 2nd January 2009 to 16th February 2018. We find both bidirectional and unidirectional flows of information between the GSE index and its component stocks, and the stocks dominate the information exchange. Among the individual stocks, SCB is the most active stock in the information exchange, as it receives the highest amount of information; the most informative source is EGL (an insurance company), which has the highest net information outflow, while the largest information sink is PBC, which has the highest net information inflow. We further categorize the stocks into 9 stock market sectors and find the insurance sector to be the largest source of information, which confirms our earlier findings. Surprisingly, the oil and gas sector is the information sink. Our results confirm that other sectors, including oil and gas, mitigate their risk exposures through insurance companies and are always expectant of information originating from the insurance sector in relation to regulatory compliance issues. It is our firm conviction that this study would allow stakeholders of the market to make informed buy, sell, or hold decisions.



2009 ◽  
Vol 19 (12) ◽  
pp. 4197-4215 ◽  
Author(s):  
ANGELIKI PAPANA ◽  
DIMITRIS KUGIUMTZIS

We study some of the most commonly used mutual information estimators, based on histograms of fixed or adaptive bin size, k-nearest neighbors, and kernels, and focus on the optimal selection of their free parameters. We examine the consistency of the estimators (convergence to a stable value as the time series length increases) and the degree of deviation among the estimators. The optimization of parameters is assessed by quantifying the deviation of the estimated mutual information from its true or asymptotic value as a function of the free parameter. Moreover, some commonly used criteria for parameter selection are evaluated for each estimator. The comparative study is based on Monte Carlo simulations on time series from several linear and nonlinear systems of different lengths and noise levels. The results show that the k-nearest neighbor estimator is the most stable and the least affected by the method-specific parameter. A data-adaptive criterion for optimal binning is suggested for linear systems, but it is found to be rather conservative for nonlinear systems. It turns out that the binning and kernel estimators give the least deviation in identifying the lag of the first minimum of mutual information from nonlinear systems, and are stable in the presence of noise.
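As an illustration of the simplest estimator family mentioned in the abstract above, here is a minimal sketch of a fixed-bin histogram estimate of mutual information. The function names, the equal-width binning, the default of 8 bins, and the synthetic data are assumptions made for this example and are not taken from the paper.

```python
import math
import random
from collections import Counter


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information_hist(x, y, n_bins=8):
    """Plug-in estimate of I(X; Y) in bits from a fixed-bin 2D histogram."""
    bx = equal_width_bins(x, n_bins)
    by = equal_width_bins(y, n_bins)
    n = len(bx)
    joint = Counter(zip(bx, by))
    count_x, count_y = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (count_x[i] * count_y[j]))
        for (i, j), c in joint.items()
    )


# Hypothetical data: y_dep is a noisy copy of x, y_ind is unrelated to x.
rng = random.Random(1)
x = [rng.uniform(0.0, 1.0) for _ in range(4000)]
y_dep = [v + rng.gauss(0.0, 0.05) for v in x]
y_ind = [rng.uniform(0.0, 1.0) for _ in range(4000)]

mi_dependent = mutual_information_hist(x, y_dep)
mi_independent = mutual_information_hist(x, y_ind)
print(f"dependent: {mi_dependent:.3f} bits, independent: {mi_independent:.3f} bits")
```

The number of bins is the free parameter here: too few bins blur the dependence, while too many inflate the estimate for independent data because of finite-sample bias. That trade-off is the kind of parameter sensitivity the study evaluates.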



2011 ◽  
Vol 41 (1) ◽  
pp. 73-82 ◽  
Author(s):  
Jong Su Yim ◽  
Young Hwan Kim ◽  
Sung Ho Kim ◽  
Jin Hyun Jeong ◽  
Man Yong Shin

National Forest Inventories (NFIs) have been used in many countries to assess forest resources at the national level. To facilitate the estimation of forest growing stock volume at more regional scales, the k-nearest neighbor (k-NN) technique was applied in this research to obtain estimates for unmeasured areas by using NFI field data and optical satellite data. The NFI field data were assigned to data sets of three different sample sizes to evaluate the effect of sample size on the accuracy of k-NN estimates. In small-area estimation, calibration techniques, in which samples surveyed outside a county of interest are employed to produce estimates for the county, are often adopted due to the lack of sample observations for the county of interest. Thus, the k-NN estimates, forest growing stock volume and areal proportions by forest types, were compared with estimates obtained from field data with and without calibration. The results indicated that the accuracy of k-NN estimates could be improved as sample size increased. Also, the k-NN technique provided acceptable estimates for small-area estimation. Although there was no significant difference with the calibration approach (p > 0.18), k-NN has potential for small-area estimation and is useful to generate thematic maps of forest attributes.



Signatures have been accepted in commercial transactions as a method of authentication. Digitizing credentials reduces the storage space required for the same information from a few cubic inches to so many bytes on a server. The most frequent use of offline signature authentication is to reduce the turnaround time for cheque clearance. In this paper, machine learning classifiers are used to verify signatures using four image-based features. The BHsig260 dataset (Bangla and Hindi) has been used, with signatures of 55 users each for Hindi and Bangla. Six classifiers, i.e., Boosted Tree, Random Forest classifier (RFC), K-nearest neighbor (KNN), Multilayer Perceptron (MLP), Support Vector Machine (SVM), and Naive Bayes classifier, are used in the work. The results of the writer-independent model show that the accuracy of Hindi offline signature verification is 72.3% using MLP with a signature sample size of 20, and that of Bangla is 79% using RFC with a signature sample size of 23. In the user-dependent model, for some users, we achieved an accuracy of more than 92% using KNN and SVM.



Author(s):  
M. Jeyanthi ◽  
C. Velayutham

In science and technology development, the brain-computer interface (BCI) plays a vital role in research. Classification is a data mining technique used to predict group membership for data instances. Analyses of BCI data are challenging because feature extraction and classification of these data are more difficult as compared with those applied to raw data. In this paper, we extracted statistical Haralick features from the raw EEG data. The features are then normalized, and binning is used to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes. Classification is then performed on the BCI dataset using different techniques such as Naïve Bayes, k-nearest neighbor, and SVM classifiers. Finally, we propose the SVM classification algorithm for the BCI dataset.



2020 ◽  
Vol 17 (1) ◽  
pp. 319-328
Author(s):  
Ade Muchlis Maulana Anwar ◽  
Prihastuti Harsani ◽  
Aries Maesya

Population data is individual or aggregate data that is structured as a result of Population Registration and Civil Registration activities. A birth certificate is a Civil Registration deed that records the birth of a baby whose birth is reported, so that the baby is registered on the Family Card and given a Population Identification Number (NIK) as a basis for obtaining other community services. Of the 570,637 integrated birth certificate reports in the 2018 Population Administration Information System (SIAK), 503,946 were reported late and only 66,691 were reported publicly. Clustering is a method used to group data so that similar data fall in the same group and dissimilar data fall in other groups. K-Nearest Neighbor is a method for classifying objects based on the training data that are closest in distance to the test data. k-means is a method used to divide a number of objects into groups based on existing categories by looking at the cluster midpoints. In data mining preprocessing, data is cleaned by filling in blank values with the most frequent value, and attributes are selected using the information gain method. Using 10,000 birth certificate records from 2019, the k-nearest neighbor method, applied to predict delays in reporting, performed reasonably well with an accuracy of 74.00%, and the k-means method, applied to classify priority service areas, produced a Davies-Bouldin index of 1,179 with K = 2.
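The k-nearest neighbor rule described in the abstract above (classify a test record by the labels of the closest training records) can be sketched in a few lines. The function name, the Euclidean distance, the two-feature toy records, and the labels `"on_time"` and `"late"` are hypothetical and serve only to illustrate the voting step. They do not reproduce the study's data or attributes.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-normalized two-feature records.
train_points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),   # reported on time
    (5.0, 5.0), (5.2, 4.8), (4.9, 5.1),   # reported late
]
train_labels = ["on_time"] * 3 + ["late"] * 3

prediction_a = knn_predict(train_points, train_labels, (1.1, 1.0))
prediction_b = knn_predict(train_points, train_labels, (5.1, 5.0))
print(prediction_a, prediction_b)
```

An odd `k` avoids tied votes when there are two classes. In practice the features would need to be on comparable scales before distances are computed.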



Author(s):  
S. Vijaya Rani ◽  
G. N. K. Suresh Babu

Illegal hackers penetrate the servers and networks of corporate and financial institutions to gain money and extract vital information. The hacking varies from one computing system to many systems. Hackers gain access by sending malicious packets into the network through viruses, worms, Trojan horses, etc. They scan a network with various tools and collect information about the network and its hosts. Hence, it is essential to detect attacks as they enter a network. The methods available for intrusion detection are Naive Bayes, Decision Tree, Support Vector Machine, K-Nearest Neighbor, and Artificial Neural Networks. A neural network consists of processing units connected in a complex manner and is able to store information and make it available for use. It acts like the human brain and acquires knowledge from the environment through a training and learning process. Many algorithms are available for the learning process. This work carries out research on the analysis of malicious packets and on predicting the error rate in the detection of injured packets using artificial neural network algorithms.



2015 ◽  
Vol 1 (4) ◽  
pp. 270
Author(s):  
Muhammad Syukri Mustafa ◽  
I. Wayan Simpen

This research is intended to predict the likelihood that new students will complete their studies on time, using data mining analysis to explore accumulated historical data with the K-Nearest Neighbor (KNN) algorithm. The application produced in this research uses various attributes classified in the data mining process, among others national examination (Ujian Nasional, UN) scores, school or region of origin, gender, parents' occupation and income, number of siblings, and others, so that by applying KNN analysis a prediction can be made, based on the proximity of the existing historical data to the new data, of whether a student is likely to complete their studies on time or not. Testing the KNN algorithm with sample data of alumni who graduated from 2004 through 2010 as the old cases and data of alumni who graduated in 2011 as the new cases yielded an accuracy rate of 83.36%.


