dimensionality reduction technique
Recently Published Documents


Total documents: 102 (five years: 45)
H-index: 11 (five years: 3)

Author(s):  
Nader S. Santarisi ◽  
Sinan S. Faouri

To monitor the performance and efficiency of a combined cycle power plant (CCPP), and to make the best use of its power output, it is vital to predict its full-load electrical power output. In this paper, the full-load electrical power output of a CCPP was predicted using practical, efficient machine learning algorithms: linear regression, ridge regression, lasso regression, elastic net regression, random forest regression, and gradient boosting regression. The data came from a real, confidential power plant that operated at full load for 6 years, with four main features (ambient temperature, relative humidity, atmospheric pressure, and exhaust vacuum) and one target (hourly electrical power output). Several regression performance measures were used: R2 (coefficient of determination), MAE (mean absolute error), MSE (mean squared error), RMSE (root mean squared error), and MAPE (mean absolute percentage error). The results showed that the gradient boosting regression model outperformed the other models both with and without the dimensionality reduction technique (PCA), with the highest R2 of 0.912 and 0.872, respectively, and the lowest MAPE of 0.872% and 1.039%, respectively. Moreover, prediction performance dropped slightly after applying the dimensionality reduction technique in almost all the regression algorithms used. The novelty of this work lies in predicting the electrical power output of a CCPP from a few features using simpler algorithms than the deep learning and neural network approaches reported previously. This means a lower-cost, less complicated procedure that still yields practically acceptable results according to the evaluation metrics used.
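The five metrics named in this abstract have standard definitions. The following minimal sketch, in plain Python with no third-party libraries, computes them for paired lists of observed and predicted outputs. The function name `regression_metrics` is hypothetical and this is not the authors' code; MAPE is returned in percent to match the units above, and it is undefined if any observed value is zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, RMSE and MAPE (percent) for paired observations."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE divides by the observed value, so every y_true must be non-zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot  # undefined if all observations are identical
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


observed = [450.0, 460.0, 470.0, 480.0]   # toy values, not from the study
predicted = [452.0, 458.0, 471.0, 477.0]
print(regression_metrics(observed, predicted))
```

With these toy numbers the errors are -2, 2, -1 and 3, so MAE = 2.0, MSE = 4.5 and R2 = 1 - 18/500 = 0.964.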


2021 ◽  
Vol 13 (24) ◽  
pp. 4972
Author(s):  
Nasem Badreldin ◽  
Beatriz Prieto ◽  
Ryan Fisher

Accurate information on the spatial distribution of native, mixed, and tame grasslands is essential for maintaining ecosystem health in the Prairie. This research aimed to use the latest monitoring technology to assess the remaining grasslands in Saskatchewan's mixed grassland ecoregion (MGE). The classification approach was based on 78 raster-based variables derived from big remote sensing data from multispectral optical space-borne sensors, such as MODIS and Sentinel-2, and synthetic aperture radar (SAR) space-borne sensors, such as Sentinel-1. Principal component analysis (PCA) was used as a dimensionality reduction technique to reduce the big data load and improve processing time. Random Forest (RF) was used for classification, incorporating the variables selected from the 78 satellite-based layers and 2385 reference training points. Within the MGE, the overall classification accuracy was 90.2%. Native grassland had a user's accuracy of 98.20% and a producer's accuracy of 88.40%, tame grassland had a user's accuracy of 81.4% and a producer's accuracy of 93.8%, whereas the mixed grassland class had a very low user's accuracy (45.8%) and a producer's accuracy of 82.83%. Approximately 3.46 million hectares (40.2%) of the MGE area are grasslands (33.9% native, 4% mixed, and 2.3% tame). This study establishes a novel analytical framework for reliable grassland mapping using big data, identifies future challenges, and provides valuable information for decision-makers in Saskatchewan and North America.
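User's and producer's accuracy both come from the classification confusion matrix: user's accuracy divides the correct count for a class by everything mapped as that class, while producer's accuracy divides it by everything that truly belongs to that class. A minimal sketch, assuming rows are map classes and columns are reference classes; the function name and the counts below are hypothetical, not the study's data.

```python
def accuracy_report(matrix, labels):
    """Overall, user's and producer's accuracy from a square confusion matrix.

    Convention assumed here: matrix[i][j] counts reference points of class j
    that were mapped as class i (rows = map, columns = reference).
    """
    k = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    report = {"overall": correct / total}
    for i, name in enumerate(labels):
        mapped_as_i = sum(matrix[i])                   # row total
        truly_i = sum(matrix[r][i] for r in range(k))  # column total
        report[name] = {
            "users": matrix[i][i] / mapped_as_i,   # 1 - commission error
            "producers": matrix[i][i] / truly_i,   # 1 - omission error
        }
    return report


labels = ["native", "tame", "mixed"]
counts = [[90, 5, 5],    # hypothetical counts, not the study's matrix
          [4, 80, 6],
          [2, 3, 45]]
print(accuracy_report(counts, labels))
```

A low user's accuracy, as reported for mixed grassland, means many points mapped as that class actually belong to other classes (commission error), even when most true points of the class are found (high producer's accuracy).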


2021 ◽  
Author(s):  
Natalia Favila ◽  
David Madrigal-Trejo ◽  
Daniel Legorreta ◽  
Jazmín Sánchez-Pérez ◽  
Laura Espinosa-Asuar ◽  
...  

Understanding both global and local patterns in the structure and interplay of microbial communities has been a fundamental question in ecological research. In this paper, we present a Python toolbox that combines two emerging techniques proposed as useful for analyzing compositional microbial data. On one hand, we introduce a visualization module that uses UMAP, a recent dimensionality reduction technique that focuses on local patterns, and HDBSCAN, a density-based clustering technique. On the other hand, we include a module that runs an enhanced version of the SparCC code, which supports larger datasets than before, and we couple this with network theory analyses to describe the resulting co-occurrence networks, including several novel analyses such as structural balance metrics and a proposal for discovering the underlying topology of a co-occurrence network. We validated the proposed toolbox on 1) a simple and well-described biological network of kombucha, consisting of 48 ASVs, and 2) simulated community networks with known topologies, showing that we are able to discern between network topologies. Finally, we showcase the MicNet toolbox on a large dataset from Archean Domes, consisting of more than 2,000 ASVs. Our toolbox is freely available as a GitHub repository (https://github.com/Labevo/MicNetToolbox) and is accompanied by a web dashboard (http://micnetapplb-1212130533.us-east-1.elb.amazonaws.com) that can be used in a simple and straightforward manner with relative abundance data.
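The abstract mentions structural balance metrics without defining them. One classic triangle-based notion from structural balance theory calls a triangle of signed edges balanced when the product of its three signs is positive. The sketch below is hypothetical and is not MicNet's implementation: it turns a correlation matrix into a signed network using an assumed cutoff of 0.3 and reports the fraction of balanced triangles.

```python
from itertools import combinations


def signed_network(corr, names, threshold=0.3):
    """Keep pairs with |r| >= threshold as edges signed +1 or -1 (cutoff is an assumption)."""
    edges = {}
    for i, j in combinations(range(len(names)), 2):
        r = corr[i][j]
        if abs(r) >= threshold:
            edges[frozenset((names[i], names[j]))] = 1 if r > 0 else -1
    return edges


def balanced_triangle_fraction(edges):
    """Fraction of closed triangles whose sign product is positive (O(n^3) scan)."""
    nodes = sorted({n for edge in edges for n in edge})
    balanced = total = 0
    for a, b, c in combinations(nodes, 3):
        sides = [frozenset(pair) for pair in ((a, b), (b, c), (a, c))]
        if all(s in edges for s in sides):
            total += 1
            product = edges[sides[0]] * edges[sides[1]] * edges[sides[2]]
            if product > 0:
                balanced += 1
    return balanced / total if total else float("nan")


taxa = ["A", "B", "C", "D"]   # hypothetical taxa and correlations
corr = [[1.0, 0.8, 0.6, -0.7],
        [0.8, 1.0, 0.5, -0.6],
        [0.6, 0.5, 1.0, 0.4],
        [-0.7, -0.6, 0.4, 1.0]]
print(balanced_triangle_fraction(signed_network(corr, taxa)))
```

Here triangles ABC (all positive) and ABD (one positive, two negative) are balanced, while ACD and BCD each carry a single negative edge, so half of the triangles are balanced.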


2021 ◽  
Vol 54 (1) ◽  
Author(s):  
Peter J. Schmid

Dynamic mode decomposition (DMD) is a factorization and dimensionality reduction technique for data sequences. In its most common form, it processes high-dimensional sequential measurements, extracts coherent structures, isolates dynamic behavior, and reduces complex evolution processes to their dominant features and essential components. The decomposition is intimately related to Koopman analysis and, since its introduction, has spawned various extensions, generalizations, and improvements. It has been applied to numerical and experimental data sequences taken from simple to complex fluid systems and has also had an impact beyond fluid dynamics in, for example, video surveillance, epidemiology, neurobiology, and financial engineering. This review focuses on the practical aspects of DMD and its variants, as well as on its usage and characteristics as a quantitative tool for the analysis of complex fluid processes. Expected final online publication date for the Annual Review of Fluid Mechanics, Volume 54 is January 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
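At its core, DMD fits a linear operator A so that each snapshot is approximately A times the previous one; the eigenvalues of A then give the growth rates and frequencies of the coherent structures. For high-dimensional data, practical DMD first projects onto a truncated SVD of the snapshot matrix. The minimal sketch below skips that step and handles only 2-dimensional states so it can stay in plain Python; the function name is hypothetical.

```python
import cmath


def dmd_eigenvalues_2d(snapshots):
    """Fit A with Y ~ A X from 2-D snapshots; return A and its two eigenvalues."""
    X, Y = snapshots[:-1], snapshots[1:]
    # G = X X^T and C = Y X^T, accumulated as sums of 2x2 outer products.
    g = [[0.0, 0.0], [0.0, 0.0]]
    c = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(X, Y):
        for i in range(2):
            for j in range(2):
                g[i][j] += x[i] * x[j]
                c[i][j] += y[i] * x[j]
    det_g = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    if abs(det_g) < 1e-12:
        raise ValueError("snapshots do not span two dimensions")
    g_inv = [[g[1][1] / det_g, -g[0][1] / det_g],
             [-g[1][0] / det_g, g[0][0] / det_g]]
    # Least-squares propagator A = Y X^+ = C G^{-1}.
    a = [[sum(c[i][k] * g_inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    # Eigenvalues of a 2x2 matrix from its trace and determinant.
    trace = a[0][0] + a[1][1]
    det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(trace * trace / 4 - det_a)
    return a, (trace / 2 + disc, trace / 2 - disc)


# Snapshots from a known damped rotation (toy system, not from the review).
A_true = [[0.9, -0.2], [0.2, 0.9]]
state = (1.0, 0.0)
snaps = [state]
for _ in range(19):
    state = (A_true[0][0] * state[0] + A_true[0][1] * state[1],
             A_true[1][0] * state[0] + A_true[1][1] * state[1])
    snaps.append(state)
print(dmd_eigenvalues_2d(snaps)[1])
```

For a discrete eigenvalue λ and sampling interval dt, `cmath.log(λ) / dt` gives the continuous-time growth rate (real part) and angular frequency (imaginary part).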


Author(s):  
Abyansh Roy ◽  
Heena Dhawan ◽  
Sreedevi Upadhyayula ◽  
Hariprasad Kodamana

The present work studies five Indian coals and their solvent-extracted clean coal products using Py-GCMS analysis and correlates the characterization data using theoretical principal component analysis. The pyrolysis products of the original coals and the super clean coals were classified as mono-, di-, and tri-aromatics, while other prominent products included cycloalkanes, n-alkanes, and alkenes ranging from C10 to C29. Principal component analysis, a dimensionality reduction technique, reduced the number of input variables in the characterization dataset and, through score and loading plots, gave inferences on the relative composition of constituent compounds and functional groups, as well as structural insights consistent with the experimental observations. ATR-FTIR studies confirmed the reduced ash concentration in the super clean coals and the presence of aromatics. The Py-GCMS data and the ATR-FTIR spectra led to the conclusion that the super clean coals behaved similarly for both coking and non-coking coals, with high aromatic concentrations compared with the raw coal. Neyveli lignite super clean coal showed some structural similarity with the original coals, whereas the other super clean coals showed structural similarity among themselves but not with their original coal samples, confirming the selective action of the e,N solvent in solubilizing the polycondensed aromatic structures in the coal samples.
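Scores and loadings are the two PCA outputs this abstract relies on. A minimal sketch in plain Python, assuming samples as rows and characterization variables as columns: it centers the data, builds the sample covariance matrix, and extracts components by power iteration with deflation. The function name and iteration count are assumptions. Variables on different scales are usually standardized first, which this sketch does not do, and power iteration converges slowly when two eigenvalues are nearly equal, so library SVD routines are preferable for real datasets.

```python
import math


def pca(rows, n_components=2, iterations=500):
    """Return (scores, loadings, variances) for samples given as rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the variables (p x p).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    loadings, variances = [], []
    for _ in range(n_components):
        v = [1.0 / math.sqrt(p)] * p
        for _ in range(iterations):  # power iteration -> dominant eigenvector
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        # Rayleigh quotient: variance captured by this component.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
        loadings.append(v)
        variances.append(lam)
        # Deflation removes this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(r[j] * v[j] for j in range(p)) for v in loadings] for r in centered]
    return scores, loadings, variances


toy = [[13.0, 5.0], [7.0, 5.0], [10.0, 6.0], [10.0, 4.0]]  # toy data, not coal measurements
scores, loadings, variances = pca(toy)
print(variances, [v / sum(variances) for v in variances])
```

Plotting the first two columns of `scores` gives a score plot, and plotting the entries of each loading vector per variable gives a loading plot.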


2021 ◽  
Author(s):  
Jun Zhang ◽  
Wenzheng Wang ◽  
Qiuyu Wu ◽  
Liwei Hu

Aerodynamic shape optimization (ASO) based on computational fluid dynamics simulations is extremely computationally demanding because the search must be performed in a high-dimensional design space. One solution to this problem is to reduce the dimensionality of the design space for aircraft optimization. Hence, in this study, a dimensionality reduction technique based on a generative adversarial network (GAN) is designed to facilitate ASO. The novel GAN model combines the GAN with airfoil curve parameterization and can directly produce realistic, highly accurate airfoil curves from input data of aerodynamic shapes. In addition, interpretable characteristic airfoil variables can be obtained by extracting latent codes with physical meaning, while reducing the dimensionality of the airfoil design space. The results of simulation experiments show that the proposed technique can significantly improve the optimization convergence rate of the ASO process.
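The abstract does not say which airfoil curve parameterization the GAN is combined with. Assuming a Bézier-curve parameterization (an assumption made here for illustration, not taken from the paper), the sketch below shows why a parameterization already shrinks the design space: a handful of control points expand into a dense list of curve coordinates. The GAN itself is not shown, and the function names and control points are hypothetical.

```python
def de_casteljau(control_points, t):
    """Evaluate a 2-D Bezier curve at parameter t in [0, 1] by repeated interpolation."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]


def sample_curve(control_points, n_samples=100):
    """Expand a few control points into n_samples (x, y) points along the curve."""
    if n_samples < 2:
        raise ValueError("need at least two samples")
    return [de_casteljau(control_points, i / (n_samples - 1)) for i in range(n_samples)]


# Hypothetical upper-surface control points: 4 points = 8 design numbers.
upper = [(0.0, 0.0), (0.0, 0.06), (0.4, 0.09), (1.0, 0.0)]
curve = sample_curve(upper, 100)  # 100 points = 200 coordinates
print(len(curve), curve[0], curve[-1])
```

An optimizer (or a generator network) then only has to search over the few control-point coordinates rather than every sampled point on the surface.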


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Nandita Mishra ◽  
Mohamed Nurullah ◽  
Adel Sarea

Purpose: The International Integrated Reporting Council is in its 10th year of establishment, and the integrated reporting (IR) framework released in 2013 was under revision in 2020. Despite some significant developments in the past 10 years, the authors know very little about the perception of preparers towards IR. This paper aims to study the perception of the preparers and to understand the current status of IR adoption in India.
Design/methodology/approach: The top 500 companies from the ET 500 list were analysed. Banks and financial institutions (69 in total) were excluded from the study. For the remaining 431 companies, the status of IR was checked through a questionnaire-based survey. Principal component analysis, a dimensionality reduction technique, was performed on the responses to identify the important components shaping the companies' perception. A case study methodology was also adopted to compare and analyse IR trends in the manufacturing and industrial sector.
Findings: The results show that the majority of companies have a positive opinion of IR, and that the three major components shaping their perception are concise reporting, effective and transparent reporting, and better decision-making.
Practical implications: The results of this study will be useful for policymakers, regulators, and companies that have adopted or will adopt IR. The paper also gives academicians a relevant view for assessing the effectiveness and perception of IR.
Originality/value: Very few studies in India focus on analysing the perception of preparers towards IR. Especially after SEBI's 2017 circular, it has become even more important to analyse the insight and awareness of the companies that have adopted IR. The paper is a timely and relevant contribution to the literature, providing insight into the opinions of preparers in India.
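For Likert-type survey responses, PCA is commonly run on the correlation matrix of the items, and the number of components kept is often chosen with the Kaiser rule (eigenvalue greater than 1). The abstract does not state which retention rule was used, so that rule is an assumption here. The sketch computes the correlation matrix, its eigenvalues by cyclic Jacobi rotations, and the Kaiser count; all names are hypothetical.

```python
import math


def correlation_matrix(responses):
    """Pearson correlation of survey items (columns) across respondents (rows)."""
    n, p = len(responses), len(responses[0])
    means = [sum(r[j] for r in responses) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in responses) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in responses]
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def jacobi_eigenvalues(sym, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in sym]
    p = len(a)
    for _ in range(sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j))
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Angle that zeroes a[i][j]: tan(2*theta) = 2*a_ij / (a_jj - a_ii).
                theta = 0.5 * math.atan2(2 * a[i][j], a[j][j] - a[i][i])
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(r == k) for k in range(p)] for r in range(p)]
                rot[i][i], rot[j][j], rot[i][j], rot[j][i] = c, c, s, -s
                rot_t = [list(col) for col in zip(*rot)]
                a = matmul(rot_t, matmul(a, rot))  # similarity transform J^T A J
    return sorted((a[k][k] for k in range(p)), reverse=True)


def kaiser_components(eigenvalues):
    """Number of components retained under the Kaiser rule (eigenvalue > 1)."""
    return sum(1 for lam in eigenvalues if lam > 1.0)


items = [[1.0, 0.6, 0.6], [0.6, 1.0, 0.6], [0.6, 0.6, 1.0]]  # toy correlations
eigs = jacobi_eigenvalues(items)
print(eigs, kaiser_components(eigs))
```

Each eigenvalue divided by the number of items gives the share of total variance that component explains.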


2021 ◽  
Vol 26 (3) ◽  
pp. 275-283
Author(s):  
Satla Shivaprasad ◽  
Manchala Sadanandam

Telugu is a historical language belonging to the Dravidian family. It has three dialects: Telangana, Coastal Andhra, and Rayalaseema. This paper identifies the dialects of the Telugu language. MFCC (mel-frequency cepstral coefficients), delta MFCC, and delta-delta MFCC are extracted, giving 39-dimensional feature vectors for dialect identification. ZCR (zero-crossing rate) is also used to identify the dialects, and finally all the MFCC and ZCR features are combined. A standard database was created for identifying the dialects of the Telugu language. Statistical models, namely HMM (hidden Markov model) and GMM (Gaussian mixture model), are used for classification. To improve model accuracy, the dimensionality reduction technique PCA is applied to reduce the number of features extracted from the speech signal before they are passed to the models. In this work, applying dimensionality reduction was observed to increase the accuracy of the models.
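The abstract reports 39 MFCC-based features; a common way to reach that number is 13 static coefficients plus their deltas and delta-deltas, which is assumed here. The sketch below does not compute MFCCs themselves: it takes per-frame coefficients as input, derives regression-based delta features with an assumed window of N = 2, and also shows the zero-crossing rate of a raw audio frame. Function names are hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def deltas(features, n=2):
    """Regression-based delta coefficients across frames (edges padded by repetition)."""
    T = len(features)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(T):
        frame = []
        for d in range(len(features[t])):
            num = sum(k * (features[min(t + k, T - 1)][d] - features[max(t - k, 0)][d])
                      for k in range(1, n + 1))
            frame.append(num / denom)
        out.append(frame)
    return out


def stack_39(mfcc_frames):
    """Append delta and delta-delta to 13 static MFCCs per frame, giving 39 values."""
    d1 = deltas(mfcc_frames)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(mfcc_frames, d1, d2)]


frames = [[0.1 * t] * 13 for t in range(10)]  # placeholder MFCCs, not real speech
print(len(stack_39(frames)[0]), zero_crossing_rate([0.5, -0.2, 0.1, 0.3]))
```

The combined MFCC and ZCR vectors are what PCA would then compress before classification.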


2021 ◽  
Author(s):  
Matthew P Moore ◽  
Mark Wilcox ◽  
A Sarah Walker ◽  
David W Eyre

Comparative analysis of Clostridioides difficile whole-genome sequencing (WGS) data enables fine-scale investigation of transmission and is increasingly becoming part of routine surveillance. However, these analyses are constrained by the computational requirements of the large volumes of data involved. By decomposing WGS reads or assemblies into k-mers and using the dimensionality reduction technique MinHash, it is possible to rapidly approximate genomic distances without alignment. Here we assessed the performance of MinHash, as implemented by sourmash, in predicting single nucleotide differences (SNPs) between genomes and C. difficile ribotypes (RTs). For a set of 1,905 diverse C. difficile genomes (differing by 0-168,519 SNPs), using sourmash to screen for closely related genomes at a sensitivity of 100% for pairs ≤10 SNPs reduced the number of pairs from 1,813,560 overall to 161,934, i.e., by 91%, with a positive predictive value (PPV) of 32% for correctly identifying pairs ≤10 SNPs (maximum SNP distance 4,144). At a sensitivity of 95%, pairs were reduced by 94% to 108,266 and the PPV increased to 45% (maximum SNP distance 1,009). Increasing the MinHash sketch size above 2000 produced minimal performance improvement. We also explored a MinHash similarity-based ribotype prediction method. Genomes with known ribotypes (n=3,937) were randomly split into a training set (2,937) and a test set (1,000). The training set was used to construct a sourmash index against which genomes from the test set were compared. If the closest 5 genomes in the index had the same ribotype, this was taken as the predicted ribotype of the searched genome. Using our MinHash ribotype index, predicted ribotypes were correct in 780/1000 (78%) genomes, incorrect in 20 (2%), and indeterminate in 200 (20%). Relaxing the classifier to require 4 of the 5 closest matches to share the same RT improved correct predictions to 87%. Using MinHash, it is possible to subsample C. difficile genome k-mer hashes and use them to approximate small genomic differences within minutes, significantly reducing the search space for further analysis.
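A bottom-k MinHash sketch keeps only the smallest hash values of a genome's k-mers; among the k smallest hashes of the union of two sketches, the share present in both estimates the Jaccard similarity of the full k-mer sets. The sketch below is a simplified illustration, not sourmash's implementation or API: it uses forward-strand k-mers only (no reverse-complement canonicalization), k = 21 and BLAKE2b as assumed choices, and a sketch size of 2000 echoing the abstract. It also encodes the abstract's nearest-neighbour ribotype rule, 5 of 5 matches by default and 4 of 5 when relaxed; all names and ribotype labels are hypothetical.

```python
import hashlib
from collections import Counter


def kmers(seq, k=21):
    """All length-k substrings (forward strand only in this simplified sketch)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """64-bit hash of a k-mer; BLAKE2b is an arbitrary choice for this sketch."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_sketch(seq, k=21, size=2000):
    """Bottom-k MinHash: keep the `size` smallest k-mer hashes."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=2000):
    """Among the `size` smallest hashes of the union, the fraction present in both."""
    a, b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(a | b)[:size]
    if not union_bottom:
        return 0.0
    shared = a & b
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


def predict_ribotype(closest_ribotypes, min_agree=5):
    """Ribotype shared by at least `min_agree` of the closest genomes, else None."""
    if not closest_ribotypes:
        return None
    ribotype, count = Counter(closest_ribotypes).most_common(1)[0]
    return ribotype if count >= min_agree else None


seq = "ACGTTGCAACGGATCCTAGGCATTACG" * 4  # toy sequence, not a C. difficile genome
print(jaccard_estimate(bottom_sketch(seq), bottom_sketch(seq)))
print(predict_ribotype(["RT027"] * 4 + ["RT078"], min_agree=4))
```

Returning None models the "indeterminate" outcome; lowering `min_agree` from 5 to 4 corresponds to the relaxed classifier.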

