PAAIcon2016: Multivariate statistics for hydrogeology: moving forward from "the present is the key to the past"

Mapping Intimacies ◽

10.31227/osf.io/fbgmy ◽

2017 ◽

Author(s):

Dasapta Erwin Irawan

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Cluster Analysis ◽

Open Source ◽

Multivariate Statistics ◽

Family Life ◽

Principal Component ◽

Demographic Health Survey ◽

The Past ◽

The Future

This abstract was presented at the PAAI conference 2016, 16-17 Nov 2016. It consists of two parts: Part 1, Introduction to Open Science (zip file), and Part 2, Multivariate statistics in hydrogeology.

ABSTRACT Geology is one of the oldest sciences in the world. Originating in natural science, it has grown from the observation of sea shells to the sophisticated interpretation of the earth's interior. In its recent development, geology needs a more quantitative approach, driven by the need for prediction and simulation. Geology has shifted from “the present is the key to the past” towards “the present is the key to the past as the base of prediction of the future”. Hydrogeology is one of the promising branches of geology that relies more heavily on quantitative analysis, and multivariate statistics is one of the most frequently used resources in this field.

We carried out a literature search and web scraping to analyze the current situation and future trends in the application of multivariate statistics to geological synthesis. We tried several sets of keywords, but this set gave the most satisfying results: “(all in title) multivariate statistics (and) groundwater”, searched on the Google Scholar, Crossref, and ScienceOpen databases. The final selection was 164 papers. We used VosViewer and Zotero for the text mining operations.

The analysis leads to the following results. Cluster analysis and principal component analysis are still the most frequently used methods in hydrogeology. Both are mostly used to extract information from hydrochemical and isotope data in order to analyze hydrogeological conditions and groundwater flow. More machine learning methods have been introduced into hydrogeological science in the last five years. `Random forest` and `decision tree` techniques are used extensively to learn from the physical and chemical properties of groundwater. Open source tools have also displaced costly proprietary statistical and programming software such as SAS and Matlab. Python and R are the two best-known open source languages in this field. We also note an increase in papers at the intersection of hydrogeology and public health. Such methods are therefore also being used to analyze open demographic data such as the DHS (Demographic Health Survey) and the FLS (Family Life Survey). Strong programmer communities drive the exponential development of both languages through platforms such as GitHub. This has become the future of hydrogeology.

Classification of Observations through Combination of the Dimension Reduction and the Cluster Analysis

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i8.13 ◽

2017 ◽

Vol 7 (8) ◽

pp. 30

Author(s):

Hyeuk Kim

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Cluster Analysis ◽

Unsupervised Learning ◽

Principal Component ◽

Component Analysis ◽

Baseball Players ◽

Partitioning Around Medoids ◽

Different Characteristics

Unsupervised learning divides data into several groups. Observations in the same group have similar characteristics, and observations in different groups have different characteristics. In this paper, we classify data by partitioning around medoids, which has some advantages over k-means clustering. We apply it to baseball players in the Korea Baseball League. We also apply principal component analysis to the data and draw a graph using two components as the axes. Through this procedure we interpret the meaning of the clustering graphically. The combination of partitioning around medoids and principal component analysis can be applied to any other data, and the approach makes it easy to identify the characteristics of each group.
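
The combination described in this abstract can be sketched as follows. This is a minimal illustration only, not the paper's code: the data are synthetic (three made-up groups in five dimensions, not baseball statistics), the function names `pca_scores` and `pam` are hypothetical, and the medoid search is a plain greedy BUILD step followed by best-improvement SWAP steps, which is one common way to state partitioning around medoids.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project standardized data onto its leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def pam(X, k, max_iter=100):
    """Partitioning around medoids: greedy BUILD followed by SWAP."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    n = len(X)
    # BUILD: start from the most central point, then add medoids greedily.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        costs = [D[:, medoids + [c]].min(axis=1).sum() for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])
    best = D[:, medoids].min(axis=1).sum()
    # SWAP: apply the best medoid/non-medoid exchange until none lowers the cost.
    for _ in range(max_iter):
        best_swap = None
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                cost = D[:, trial].min(axis=1).sum()
                if cost < best - 1e-12:
                    best, best_swap = cost, trial
        if best_swap is None:
            break
        medoids = best_swap
    # Each observation is assigned to its nearest medoid.
    labels = D[:, medoids].argmin(axis=1)
    return np.array(medoids), labels


# Synthetic stand-in data: three groups of 20 observations in 5 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0, 0], [6, 6, 0, 0, 0], [0, 6, 6, 0, 0]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 5)) for c in centers])

medoids, labels = pam(X, k=3)
scores = pca_scores(X)  # two columns, usable as x/y axes of a cluster plot
```

Unlike k-means centroids, each medoid is an actual observation (a row index of `X`), so a cluster can be described by a real representative member. Plotting `scores[:, 0]` against `scores[:, 1]` coloured by `labels` would give the kind of two-component graph the abstract mentions.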

Analysis of the Bath Motion in the MM-SQC Dynamics Using Unsupervised Machine Learning Dimensionality Reduction Approaches: Principal Component Analysis

10.26434/chemrxiv.13332530 ◽

2020 ◽

Author(s):

Jiawei Peng ◽

Yu Xie ◽

Deping Hu ◽

Zhenggang Lan

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Collective Motion ◽

Principal Component ◽

Component Analysis ◽

Nonadiabatic Dynamics ◽

Trajectory Data ◽

Unsupervised Machine Learning ◽

Physical Knowledge ◽

Vibronic Couplings

The system-plus-bath model is an important tool for understanding nonadiabatic dynamics in large molecular systems. Understanding the collective motion of a huge number of bath modes is essential to reveal their key roles in the overall dynamics. We apply principal component analysis (PCA) to investigate the bath motion, based on the massive data generated from MM-SQC (the symmetrical quasi-classical dynamics method based on the Meyer-Miller mapping Hamiltonian) nonadiabatic dynamics simulations of excited-state energy transfer in a Frenkel-exciton model. The PCA method clearly shows that two types of bath modes, those that display strong vibronic couplings and those whose frequencies are close to the electronic transition, are very important to the nonadiabatic dynamics. These observations are fully consistent with physical insight. The conclusion is obtained purely from the PCA analysis of the trajectory data, without heavy reliance on pre-defined physical knowledge. The results show that PCA, one of the simplest unsupervised machine learning methods, is a powerful tool for analyzing complicated nonadiabatic dynamics in the condensed phase involving many degrees of freedom.
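
How PCA can single out the important coordinates in trajectory data may be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic (ten made-up "modes" sampled at 500 time steps, two of which carry a shared large-amplitude oscillation), the helper `rank_modes_by_loading` is hypothetical, and nothing is taken from the MM-SQC simulations themselves.

```python
import numpy as np


def rank_modes_by_loading(X, component=0):
    """Rank columns of X by the size of their loading on one principal component.

    X has one row per time step and one column per mode coordinate.
    The data are centered but not rescaled, so large-amplitude motion dominates.
    """
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # variance fraction per component
    order = np.argsort(-np.abs(Vt[component]))  # largest |loading| first
    return order, explained


# Synthetic stand-in for trajectory data: small noise on every mode,
# plus a common oscillation on modes 2 and 7 only.
rng = np.random.default_rng(1)
t = np.linspace(0, 10, 500)
X = rng.normal(scale=0.1, size=(500, 10))
X[:, 2] += 2.0 * np.sin(t)
X[:, 7] += 1.5 * np.sin(t)

order, explained = rank_modes_by_loading(X)
top_two = sorted(order[:2].tolist())  # indices of the two dominant modes
```

The first principal component aligns with the collective motion of the two driven columns, so their loadings stand out without any prior labelling of which modes matter. Real trajectory data would need more care, for example mass weighting or averaging over many trajectories.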

STUDY ON PLANKTON OF DIFFERENT CATEGORIES OF LAKES IN SUMMER BY MEANS OF PRINCIPAL COMPONENT ANALYSIS, FACTOR ANALYSIS AND CLUSTER ANALYSIS

Acta Hydrobiologica Sinica ◽

10.3724/sp.j.1035.2010.00043 ◽

2010 ◽

Vol 36 (1) ◽

pp. 43-50

Author(s):

Luo-Jun GONG ◽

Shi-Ping ZHANG ◽

Bang-Xi XIONG ◽

Ding-Zhu LIU ◽

Jin-Zhong LI ◽

...

Keyword(s):

Principal Component Analysis ◽

Cluster Analysis ◽

Factor Analysis ◽

Principal Component ◽

Component Analysis ◽

And Cluster Analysis

Identification of COVID-19 Clinical Phenotypes by Principal Component Analysis-Based Cluster Analysis

SSRN Electronic Journal ◽

10.2139/ssrn.3582711 ◽

2020 ◽

Author(s):

Wenjing Ye ◽

Lu Weiwei ◽

Yanping Tang ◽

Chen Guoxi ◽

Li Xiaopan ◽

...

Keyword(s):

Principal Component Analysis ◽

Cluster Analysis ◽

Principal Component ◽

Component Analysis ◽

Clinical Phenotypes

Quality evaluation of tugaksheeree samples by ATR-FTIR spectroscopy using multicomponent analysis

The Natural Products Journal ◽

10.2174/2210315510999200408125555 ◽

2020 ◽

Vol 10 ◽

Cited By ~ 1

Author(s):

Nikunj D. Patel ◽

Niranjan S. Kanaki

Keyword(s):

Principal Component Analysis ◽

Cluster Analysis ◽

Quality Evaluation ◽

Principal Component ◽

Hierarchical Cluster ◽

Attenuated Total Reflectance ◽

Chemometric Methods ◽

Total Reflectance ◽

Fingerprint Region ◽

FTIR Technique

Background: Numerous Ayurvedic formulations contain tugaksheeree as a key ingredient. Tugaksheeree is the starch obtained from the rhizomes of two plants, Curcuma angustifolia Roxb. (Zingiberaceae) and Maranta arundinacea (MA) Linn. (Marantaceae). Objective: The primary concerns in the quality assessment of tugaksheeree arise from adulteration or substitution. Method: In the current study, the Fourier transform infrared (FTIR) technique with an attenuated total reflectance (ATR) facility was used to evaluate tugaksheeree samples. A total of 10 different samples were studied, and the spectra were recorded in transmittance mode without KBr pellets. The fingerprint region of the spectra was then processed with multicomponent tools. Multivariate analysis was performed by various chemometric methods. Result: Multicomponent methods such as Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA) were used to discriminate the tugaksheeree samples using Minitab software. Conclusion: This method can be used as a tool to differentiate samples of tugaksheeree from its adulterants and substitutes.

Combining principal component analysis and fuzzy cluster analysis for China's oil security

2nd International Symposium on Information Technologies and Applications in Education (ISITAE 2008) ◽

10.1049/ic:20080300 ◽

2008 ◽

Author(s):

Wang Jian-zhou ◽

Zhu Su-ling ◽

Sun Dong-huai ◽

Lu Hai-yan

Keyword(s):

Principal Component Analysis ◽

Cluster Analysis ◽

Principal Component ◽

Component Analysis ◽

Fuzzy Cluster ◽

Fuzzy Cluster Analysis

COVID-19 Pandemic and Adaptive Shopping Patterns: An Insight from Indonesian Consumers

Global Business Review ◽

10.1177/09721509211013512 ◽

2021 ◽

pp. 097215092110135

Author(s):

Arif Hartono ◽

Asma'i Ishak ◽

Agus Abdurrahman ◽

Budi Astuti ◽

Endy Gunanto Marsasi ◽

...

Keyword(s):

Principal Component Analysis ◽

Cluster Analysis ◽

Principal Component ◽

Demographic Characteristics ◽

Depth Information ◽

Factor Scores ◽

Consumer Spending ◽

Consumer Responses ◽

Buying Behaviour ◽

MANOVA Analysis

Although consumer typologies have been studied extensively, how consumers adapted their shopping attitudes and behaviour during the COVID-19 pandemic remains unexplored. Current studies on consumer responses to the COVID-19 pandemic tend to focus on the following themes: panic buying behaviour, consumer spending and consumer consumption. This study explores a typology of adaptive shopping patterns in response to the COVID-19 pandemic. The study involved a survey of 465 Indonesian consumers. Principal component analysis is used to identify the variables related to adaptive shopping patterns. Cluster analysis of the factor scores obtained on adaptive shopping attitude and behaviour revealed the typology of Indonesian shoppers’ adaptive patterns. Multivariate analysis of variance (MANOVA) is used to profile the identified clusters based on attitude, behaviour and demographic characteristics. Results revealed five adaptive shopping patterns with substantial differences among them. This study provides in-depth information about the profile of Indonesian shoppers’ adaptive patterns that would help retailers understand consumers and choose their target group. The major contribution of this study is a segmentation of adaptive shopping patterns in the context of the COVID-19 pandemic, which presents interesting differences compared with previous studies. This study reveals new insights into shoppers’ adaptive attitudes and behaviour as consumers coped with the pandemic.

Genetic Diversity Assessment in Dolichos Bean (Lablab purpureus L.) Based on Principal Component Analysis and Single Linkage Cluster Analysis

Legume Research - An International Journal ◽

10.18805/lr-4561 ◽

2021 ◽

Author(s):

S.R. Singh ◽

S. Rajan ◽

Dinesh Kumar ◽

V.K. Soni

Keyword(s):

Genetic Diversity ◽

Principal Component Analysis ◽

Cluster Analysis ◽

Principal Component ◽

Component Analysis ◽

Single Linkage ◽

Pod Yield ◽

Diversity Assessment ◽

Single Linkage Cluster ◽

Linkage Cluster

Background: Dolichos bean occupies a unique position among the legume vegetables of Indian origin for its high nutritive value and wide climatic adaptability. Despite its wide genetic diversity, not much effort has been made towards genetic improvement of this vegetable crop. Knowledge of genetic variability is an essential prerequisite, as a hybrid between two diverse parental lines generates a broad spectrum of variability in the segregating population. The current study aims to assess the genetic diversity in dolichos genotypes to make an effective selection for yield improvement. Methods: Twenty genotypes collected from different regions were evaluated during 2016-17 and 2017-18. Data on twelve quantitative traits were analysed using principal component analysis and single linkage cluster analysis to estimate genetic diversity. Result: Principal component analysis revealed that the first five principal components had eigenvalues greater than 1 and cumulatively contributed more than 82.53% of total variability. The characters contributing positively towards PC-I to PC-V may be considered for a dolichos improvement programme, as they are the major traits involved in the genetic variation of pod yield. All genotypes were grouped into three clusters, showing non-parallelism between geographic and genetic diversity. Cluster I was best for earliness and number of clusters per plant; Cluster II for vine length, per cent fruit set, pod length, pod width, pod weight and number of seeds per pod; and Cluster III for number of pods per cluster and pod yield per plant. Selecting parent genotypes for hybridization from divergent clusters and components having more than one positive trait of interest is likely to give better progenies for the development of high-yielding varieties in Dolichos bean.
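
The two steps named in the Methods and Result, retaining principal components with eigenvalue greater than 1 and grouping genotypes by single linkage, can be sketched as below. This is an illustration under stated assumptions, not the authors' analysis: the 20 x 12 matrix is randomly generated stand-in data (only its shape mirrors the abstract), the function names `kaiser_pca` and `single_linkage` are hypothetical, and clustering on the retained component scores with Euclidean distance is an assumed choice that the abstract does not specify.

```python
import numpy as np


def kaiser_pca(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                        # eigenvalue-greater-than-1 rule
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return eigvals, keep, cumulative, Z @ eigvecs[:, keep]


def single_linkage(X, k):
    """Naive agglomerative clustering; cluster distance = closest pair of members."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)                # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = np.empty(len(X), dtype=int)
    for j, members in enumerate(clusters):
        labels[members] = j
    return labels


# Synthetic stand-in: 20 "genotypes" x 12 "traits" drawn around three group means.
rng = np.random.default_rng(2)
group_means = rng.normal(scale=3.0, size=(3, 12))
groups = np.repeat([0, 1, 2], [7, 7, 6])
X = group_means[groups] + rng.normal(size=(20, 12))

eigvals, keep, cumulative, scores = kaiser_pca(X)
labels = single_linkage(scores, k=3)
```

`cumulative[keep.sum() - 1]` gives the share of total variability explained by the retained components, the quantity the abstract reports for its first five components. Single linkage is prone to chaining, so other linkage rules may group the same data differently.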

Comparative Analysis of Machine Learning Techniques with Principal Component Analysis on Kidney and Heart Disease

10.1109/icesc51422.2021.9533011 ◽

2021 ◽

Author(s):

Reena Chandra ◽

Manoj Kapil ◽

Avinash Sharma

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Heart Disease ◽

Comparative Analysis ◽

Principal Component ◽

Component Analysis ◽

Machine Learning Techniques ◽

Learning Techniques

Combined Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA): an efficient chemometric approach in aged gel inks discrimination

Australian Journal of Forensic Sciences ◽

10.1080/00450618.2018.1466913 ◽

2018 ◽

Vol 52 (1) ◽

pp. 38-59 ◽

Cited By ~ 1

Author(s):

Muhammad Naeim Mohamad Asri ◽

Wan Nur Syuhaila Mat Desa ◽

Dzulkiflee Ismail

Keyword(s):

Principal Component Analysis ◽

Cluster Analysis ◽

Hierarchical Cluster Analysis ◽

Principal Component ◽

Hierarchical Cluster ◽

Component Analysis ◽

Chemometric Approach
