Analysis of dual-stage filtration and validation of high-dimensional real process data for creation of machine learning algorithms

Teaching a Machine to Feel Postoperative Pain: Combining High-Dimensional Clinical Data with Machine Learning Algorithms to Forecast Acute Postoperative Pain

Pain Medicine, 2015, Vol 16 (7), pp. 1386-1401. DOI: 10.1111/pme.12713. Cited by ~22.

Author(s): Patrick J. Tighe, Christopher A. Harle, Robert W. Hurley, Haldun Aytug, Andre P. Boezaart, ...

Keyword(s): Machine Learning, Postoperative Pain, Clinical Data, Learning Algorithms, Machine Learning Algorithms, High Dimensional, Acute Postoperative Pain

Survey on Clustering High-Dimensional data using Hubness

International Journal of Scientific Research in Computer Science Engineering and Information Technology, 2020, pp. 01-07. DOI: 10.32628/cseit195671.

Author(s): Miss. Archana Chaudahri, Mr. Nilesh Vani

Keyword(s): Machine Learning, Nearest Neighbor, Learning Algorithms, High Dimensional Data, Algorithm Design, Machine Learning Algorithms, High Dimensional, K Nearest Neighbor, Weighted Voting, Conventional Machine

Most data of interest today in data-mining applications is complex and is usually represented by many different features. Such high-dimensional data is, by its very nature, often quite difficult for conventional machine-learning algorithms to handle. This is considered one aspect of the well-known curse of dimensionality. Consequently, high-dimensional data needs to be processed with care, and the design of machine-learning algorithms needs to take these factors into account. Furthermore, it has been observed that some of the properties that arise in high dimensions can in fact be exploited to improve overall algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness and refers to the emergence of very influential nodes (hubs) in k-nearest neighbor graphs. A crisp weighted voting scheme for the k-nearest neighbor classifier that exploits this notion has recently been proposed.
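
One way to exploit hubness in k-NN voting can be sketched as follows. The sketch assumes a specific weighting, w(x) = exp(-h_b(x)), where h_b(x) is the standardized "bad" k-occurrence of training point x, i.e. how often x appears among the k nearest neighbors of a point with a different label. This is one common hubness-aware scheme, not necessarily the exact one the survey describes; the function names and toy points are illustrative.

```python
import math
from collections import Counter


def k_nearest(X, query, k, exclude=None):
    """Indices of the k training points closest to `query` (Euclidean),
    optionally skipping index `exclude` (the query's own position)."""
    order = sorted((math.dist(query, x), j) for j, x in enumerate(X) if j != exclude)
    return [j for _, j in order[:k]]


def bad_hubness_weights(X, y, k):
    """w(x) = exp(-h_b(x)), with h_b the standardized count of how often x
    occurs in the k-NN list of a point carrying a different label."""
    n = len(X)
    bad = [0] * n
    for i in range(n):
        for j in k_nearest(X, X[i], k, exclude=i):
            if y[j] != y[i]:
                bad[j] += 1
    mu = sum(bad) / n
    sigma = math.sqrt(sum((b - mu) ** 2 for b in bad) / n)
    if sigma == 0:  # no bad hubs at all: fall back to plain majority voting
        return [1.0] * n
    return [math.exp(-(b - mu) / sigma) for b in bad]


def weighted_knn_predict(X, y, weights, query, k):
    """Each of the k nearest neighbors votes for its label with its hubness weight."""
    votes = Counter()
    for j in k_nearest(X, query, k):
        votes[y[j]] += weights[j]
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = [0, 0, 0, 1, 1, 1]
    w = bad_hubness_weights(X, y, k=2)
    print(weighted_knn_predict(X, y, w, (0.2, 0.2), k=2))
```

With well-separated clusters no point is a bad hub, so every weight is 1 and the classifier reduces to plain k-NN; the weighting only changes votes when some points keep appearing in the neighbor lists of the other class.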

High-dimensional sample selection models: Machine-learning algorithms in the Heckman two-step

2018. DOI: 10.14264/3105ff5.

Author(s): Tharani Ransimala Weerasooriya

Keyword(s): Machine Learning, Learning Algorithms, Sample Selection, Machine Learning Algorithms, High Dimensional, Selection Models

Localizing category-related information in speech with multi-scale analyses

PLoS ONE, 2021, Vol 16 (10), pp. e0258178. DOI: 10.1371/journal.pone.0258178.

Author(s): Sam Tilsen, Seung-Eun Kim, Claire Wang

Keyword(s): Machine Learning, Vocal Tract, Learning Algorithms, Machine Learning Algorithms, Acoustic Energy, High Dimensional, Linear Discriminant, Multi Scale, Related Information, Unseen Data

Measurements of the physical outputs of speech—vocal tract geometry and acoustic energy—are high-dimensional, but linguistic theories posit a low-dimensional set of categories such as phonemes and phrase types. How can it be determined when and where in high-dimensional articulatory and acoustic signals there is information related to theoretical categories? For a variety of reasons, it is problematic to directly quantify mutual information between hypothesized categories and signals. To address this issue, a multi-scale analysis method is proposed for localizing category-related information in an ensemble of speech signals using machine learning algorithms. By analyzing how classification accuracy on unseen data varies as the temporal extent of training input is systematically restricted, inferences can be drawn regarding the temporal distribution of category-related information. The method can also be used to investigate redundancy between subsets of signal dimensions. Two types of theoretical categories are examined in this paper: phonemic/gestural categories and syntactic relative clause categories. Moreover, two different machine learning algorithms were examined: linear discriminant analysis and neural networks with long short-term memory units. Both algorithms detected category-related information earlier and later in signals than would be expected given standard theoretical assumptions about when linguistic categories should influence speech. The neural network algorithm was able to identify category-related information to a greater extent than the discriminant analyses.
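
The core idea, restricting the temporal extent of the input and tracking accuracy on held-out data, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it swaps in a nearest-class-mean classifier in place of linear discriminant analysis or LSTM networks, uses leave-one-out evaluation, and assumes each signal is a list with one value per frame. The toy signals are invented so that category information sits only in frames 2 and 3.

```python
from statistics import mean


def centroid(vectors):
    """Component-wise mean of equally long vectors."""
    return [mean(col) for col in zip(*vectors)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(signals, labels, start, end):
    """Leave-one-out accuracy of a nearest-class-mean classifier that only
    sees frames start..end-1 of every signal."""
    X = [s[start:end] for s in signals]
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, labels)) if j != i]
        # sorted() makes tie-breaking deterministic when a window carries no information
        centroids = {c: centroid([x for x, lab in train if lab == c])
                     for c in sorted({lab for _, lab in train})}
        pred = min(centroids, key=lambda c: sq_dist(X[i], centroids[c]))
        correct += pred == labels[i]
    return correct / len(X)


def accuracy_by_window(signals, labels, width):
    """Slide a window of `width` frames over the signals; map start frame -> accuracy."""
    n_frames = len(signals[0])
    return {start: loo_accuracy(signals, labels, start, start + width)
            for start in range(n_frames - width + 1)}


# Toy data: category information is present only in frames 2 and 3.
signals = [
    [0, 0, 1.0, 1.0, 0, 0],
    [0, 0, 1.2, 0.9, 0, 0],
    [0, 0, 1.1, 1.0, 0, 0],
    [0, 0, -1.0, -1.0, 0, 0],
    [0, 0, -0.9, -1.1, 0, 0],
    [0, 0, -1.2, -1.0, 0, 0],
]
labels = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    for start, acc in accuracy_by_window(signals, labels, width=2).items():
        print(f"frames {start}-{start + 1}: accuracy {acc:.2f}")
```

Windows covering only uninformative frames (starts 0 and 4) stay at chance (0.50), while every window overlapping frame 2 or 3 reaches 1.00, which is the kind of localization of category-related information the method relies on.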

Deep learning for predicting disease status using genomic data

2018. DOI: 10.7287/peerj.preprints.27123v1.

Author(s): Qianfan Wu, Adel Boueiz, Alican Bozkurt, Arya Masoomi, Allan Wang, ...

Keyword(s): Machine Learning, Deep Learning, Rapid Development, Learning Algorithms, Genomic Data, Disease Status, Machine Learning Algorithms, High Dimensional, Learning Approach, Low Dimensional

Predicting disease status for a complex human disease using genomic data is an important, yet challenging, step in personalized medicine. Among many challenges, the so-called curse of dimensionality results in unsatisfactory performance of many state-of-the-art machine learning algorithms. A major recent advance in machine learning is the rapid development of deep learning algorithms that can efficiently extract meaningful features from high-dimensional and complex datasets through a stacked, hierarchical learning process. Deep learning has shown breakthrough performance in several areas, including image recognition, natural language processing, and speech recognition. However, the performance of deep learning in predicting disease status from genomic datasets is still not well studied. In this article, we reviewed the four relevant articles that we identified through a thorough literature search. All four articles used autoencoders to project high-dimensional genomic data into a low-dimensional space and then applied state-of-the-art machine learning algorithms to predict disease status from the low-dimensional representations. This deep learning approach outperformed existing prediction approaches, such as prediction based on probe-wise screening and prediction based on principal component analysis. The limitations of the current deep learning approach and possible improvements were also discussed.

High-dimensional hepatopath data analysis by machine learning for predicting HBV-related fibrosis

Scientific Reports, 2021, Vol 11 (1). DOI: 10.1038/s41598-021-84556-4.

Author(s): Xiangke Pu, Danni Deng, Chaoyi Chu, Tianle Zhou, Jianhong Liu

Keyword(s): Machine Learning, Liver Fibrosis, Clinical Data, Learning Algorithms, Machine Learning Algorithms, Health Concern, High Dimensional, Non Invasive, Predictive Algorithms, Invasive Method

Chronic HBV infection, the main cause of liver cirrhosis and hepatocellular carcinoma, has become a global health concern. Machine learning algorithms are particularly adept at analyzing medical phenomena because they capture complex and nonlinear relationships in clinical data. Our study proposed a predictive model, built with machine learning algorithms on 55 routine laboratory and clinical parameters, as a novel non-invasive method for diagnosing liver fibrosis. The model was further evaluated for accuracy and rationality and proved to be highly accurate and efficient for predicting HBV-related fibrosis. In conclusion, we suggest that high-dimensional clinical data can potentially be combined with machine learning predictive algorithms for liver fibrosis diagnosis.

Machine Learning Algorithms for Automatic Lithological Mapping Using Remote Sensing Data: A Case Study from Souk Arbaa Sahel, Sidi Ifni Inlier, Western Anti-Atlas, Morocco

ISPRS International Journal of Geo-Information, 2019, Vol 8 (6), pp. 248. DOI: 10.3390/ijgi8060248. Cited by ~7.

Author(s): Imane Bachri, Mustapha Hakdaoui, Mohammed Raji, Ana Cláudia Teodoro, Abdelmajid Benbouziane

Keyword(s): Machine Learning, Remote Sensing, Learning Algorithms, Remote Sensing Data, Machine Learning Algorithms, High Dimensional, Landsat 8, Landsat 8 Oli, Lithological Mapping, Sensing Data

Remote sensing data have proved to be a valuable resource in a variety of earth science applications. Using high-dimensional data with advanced methods such as machine learning algorithms (MLAs), a sub-domain of artificial intelligence, enhances lithological mapping by spectral classification. Support vector machines (SVM) are among the most popular MLAs; they can define non-linear decision boundaries in a high-dimensional feature space by solving a quadratic optimization problem. This paper describes a supervised classification method based on SVM for lithological mapping in the region of Souk Arbaa Sahel, which belongs to the Sidi Ifni inlier in southern Morocco (Western Anti-Atlas). The aims of this study were (1) to refine the existing lithological map of this region, and (2) to evaluate the performance of the SVM approach using spectral features of Landsat 8 OLI combined with geomorphometric attributes of an ALOS/PALSAR digital elevation model (DEM). We applied an SVM classification that allows the joint use of geomorphometric features and Landsat 8 OLI multispectral data. The results indicated an overall classification accuracy of 85%. From these results, we conclude that the classification approach produced an image of lithological units in which formations such as silt, alluvium, limestone, dolomite, conglomerate, sandstone, rhyolite, andesite, granodiorite, quartzite, lutite, and ignimbrite were easily identified, coinciding with those on the published geological map. This result confirms the ability of SVM as a supervised learning algorithm for lithological mapping.
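
Overall classification accuracy, as reported above, is conventionally computed from a confusion matrix as the share of reference samples that fall on its diagonal. The minimal sketch below shows that calculation, along with the per-class producer's and user's accuracies often reported next to it. The 3-class matrix is purely illustrative and is not data from the study; the function names are hypothetical.

```python
def overall_accuracy(cm):
    """Share of all reference samples that lie on the diagonal.
    cm[i][j] = number of samples of reference class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def producers_accuracy(cm):
    """Per reference class: correct / row total (1 - omission error)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def users_accuracy(cm):
    """Per predicted class: correct / column total (1 - commission error)."""
    n = len(cm)
    return [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]


# Illustrative matrix for three hypothetical classes; not the study's results.
cm = [
    [50, 5, 0],
    [10, 40, 0],
    [0, 0, 45],
]

if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(cm):.2f}")  # 135 / 150
    print("producer's:", [round(a, 3) for a in producers_accuracy(cm)])
    print("user's:", [round(a, 3) for a in users_accuracy(cm)])
```

A single overall figure can hide weak classes, which is why per-class accuracies are usually reported alongside it.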

Nonparametric Variable Selection Using Machine Learning Algorithms in High Dimensional (Large P, Small N) Biomedical Applications

Biomedical Engineering, Trends in Electronics, Communications and Software, 2011. DOI: 10.5772/13541.

Author(s): Christina M.R.

Keyword(s): Machine Learning, Variable Selection, Biomedical Applications, Learning Algorithms, Machine Learning Algorithms, High Dimensional, Large P Small N, Small N

Automated Root Cause Analysis of Non-Conformities with Machine Learning Algorithms

Journal of Machine Engineering, 2018, Vol 18 (4), pp. 60-72. DOI: 10.5604/01.3001.0012.7633. Cited by ~1.

Author(s): Tobias MUELLER, Jonathan GREIPEL, Tobias WEBER, Robert H. SCHMITT

Keyword(s): Machine Learning, Expert Knowledge, Learning Algorithms, Root Cause Analysis, Machine Learning Algorithms, Decision Tree Algorithm, Process Data, Cause Analysis, Root Cause, High Level

Detecting the root causes of non-conforming parts (parts outside the tolerance limits) in production processes requires a high level of expert knowledge. This results in high costs and low flexibility in the choice of personnel to perform the analyses. In modern production, a vast amount of process data is available, and machine learning algorithms exist that model processes empirically. The aim of this paper is to introduce a procedure for automated root cause analysis based on machine learning algorithms in order to reduce costs and the need for expert knowledge. For this purpose, a decision tree algorithm is chosen. A procedure for its application in automated root cause analysis is presented, and simulations are conducted to demonstrate its applicability. The paper identifies and simulates factors that affect detection success, e.g. the amount of data required as a function of the number of variables, the ratio of non-conforming to OK parts, and the detectable root causes. The simulations are based on a regression model that determines the roughness of drilled holes. They demonstrate the applicability of machine learning algorithms for automated root cause analysis and indicate which factors have to be considered in real scenarios.
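
A decision tree grows by repeatedly choosing the process variable and threshold that best separate non-conforming (NOK) from OK parts; the variable chosen at the root is the strongest root-cause candidate. The sketch below shows only that root-split step and assumes a CART-style Gini impurity criterion, since the abstract does not name the exact tree algorithm. The variable names `feed_rate` and `spindle_speed` and the five example parts are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels such as 'OK' / 'NOK'."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (variable, threshold, weighted Gini) of the best binary split
    'value <= threshold' over all process variables, i.e. the root node a
    CART-style tree would choose. `rows` is a list of dicts of variable values."""
    n = len(rows)
    best = None
    for var in rows[0]:
        values = sorted({r[var] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for r, lab in zip(rows, labels) if r[var] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[var] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (var, t, score)
    return best


# Hypothetical drilling data: two process variables per part and its inspection result.
rows = [
    {"feed_rate": 0.10, "spindle_speed": 3000},
    {"feed_rate": 0.12, "spindle_speed": 3100},
    {"feed_rate": 0.11, "spindle_speed": 2900},
    {"feed_rate": 0.25, "spindle_speed": 3050},
    {"feed_rate": 0.27, "spindle_speed": 2950},
]
labels = ["OK", "OK", "OK", "NOK", "NOK"]

if __name__ == "__main__":
    var, threshold, impurity = best_split(rows, labels)
    print(var, round(threshold, 3), impurity)
```

A full tree applies the same search recursively to each side of the split until the leaves are pure or a stopping rule is reached.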
