Analysis of dual-stage filtration and validation of high-dimensional real process data for creation of machine learning algorithms

Author(s):  
Dusan Strusnik ◽  
Jurij Avsec
Author(s):  
Qianfan Wu ◽  
Adel Boueiz ◽  
Alican Bozkurt ◽  
Arya Masoomi ◽  
Allan Wang ◽  
...  

Predicting disease status for a complex human disease using genomic data is an important, yet challenging, step in personalized medicine. Among many challenges, the so-called curse of dimensionality leads to unsatisfactory performance of many state-of-the-art machine learning algorithms. A major recent advance in machine learning is the rapid development of deep learning algorithms that can efficiently extract meaningful features from high-dimensional and complex datasets through a stacked and hierarchical learning process. Deep learning has shown breakthrough performance in several areas, including image recognition, natural language processing, and speech recognition. However, the performance of deep learning in predicting disease status from genomic datasets is still not well studied. In this article, we review the four relevant articles that we found through a thorough literature search. All four articles used auto-encoders to project high-dimensional genomic data to a low-dimensional space and then applied state-of-the-art machine learning algorithms to predict disease status based on the low-dimensional representations. This deep learning approach outperformed existing prediction approaches, such as prediction based on probe-wise screening and prediction based on principal component analysis. The limitations of the current deep learning approach and possible improvements are also discussed.


Pain Medicine ◽  
2015 ◽  
Vol 16 (7) ◽  
pp. 1386-1401 ◽  
Author(s):  
Patrick J. Tighe ◽  
Christopher A. Harle ◽  
Robert W. Hurley ◽  
Haldun Aytug ◽  
Andre P. Boezaart ◽  
...  

Author(s):  
Miss. Archana Chaudahri ◽  
Mr. Nilesh Vani

Most data of interest in today's data-mining applications is complex and is usually represented by many different features. Such high-dimensional data is by its very nature often difficult for conventional machine-learning algorithms to handle. This is considered an aspect of the well-known curse of dimensionality. Consequently, high-dimensional data needs to be processed with care, and the design of machine-learning algorithms needs to take these factors into account. Furthermore, it has been observed that some of the properties arising in high dimensions can in fact be exploited to improve overall algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness and refers to the emergence of very influential nodes (hubs) in k-nearest neighbor graphs. A crisp weighted voting scheme for the k-nearest neighbor classifier that exploits this notion has recently been proposed.
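
As a rough illustration of hubness-aware voting, the sketch below counts how often each training point appears as a neighbor of a differently labeled point (its "bad" k-occurrences) and down-weights its vote accordingly. The weight exp(-h_b), with h_b the standardized bad k-occurrence count, is one weighting known from the hubness literature and is an assumption here; the abstract does not specify the scheme actually proposed. The toy data and all function names are hypothetical.

```python
import math
from collections import Counter

def knn_indices(points, query, k, skip=None):
    """Indices of the k points closest to `query` (Euclidean), optionally skipping one index."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(points[i], query),
    )
    return order[:k]

def bad_hubness_weights(points, labels, k):
    """Per-point vote weight exp(-h_b), where h_b is the standardized bad k-occurrence count."""
    bad = [0] * len(points)
    for i, p in enumerate(points):
        for j in knn_indices(points, p, k, skip=i):
            if labels[j] != labels[i]:
                bad[j] += 1  # j is a neighbor of a point with a different label
    mean = sum(bad) / len(bad)
    std = math.sqrt(sum((b - mean) ** 2 for b in bad) / len(bad)) or 1.0
    return [math.exp(-(b - mean) / std) for b in bad]

def weighted_knn_predict(points, labels, weights, query, k):
    """Weighted majority vote among the k nearest training points."""
    votes = Counter()
    for j in knn_indices(points, query, k):
        votes[labels[j]] += weights[j]
    return votes.most_common(1)[0][0]

# Toy 1-D data (assumed): one "b"-labeled point sits inside the "a" cluster.
points = [(0.0,), (1.0,), (1.6,), (2.2,), (3.3,), (8.0,), (9.1,), (10.3,)]
labels = ["a", "a", "b", "a", "a", "b", "b", "b"]

weights = bad_hubness_weights(points, labels, k=2)
prediction = weighted_knn_predict(points, labels, weights, (1.55,), k=2)
```

In this toy set the point at 1.6 is a neighbor of every "a" point, so it receives the smallest weight, and a query at 1.55 whose two nearest neighbors split one vote each is resolved in favor of "a".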


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258178
Author(s):  
Sam Tilsen ◽  
Seung-Eun Kim ◽  
Claire Wang

Measurements of the physical outputs of speech—vocal tract geometry and acoustic energy—are high-dimensional, but linguistic theories posit a low-dimensional set of categories such as phonemes and phrase types. How can it be determined when and where in high-dimensional articulatory and acoustic signals there is information related to theoretical categories? For a variety of reasons, it is problematic to directly quantify mutual information between hypothesized categories and signals. To address this issue, a multi-scale analysis method is proposed for localizing category-related information in an ensemble of speech signals using machine learning algorithms. By analyzing how classification accuracy on unseen data varies as the temporal extent of training input is systematically restricted, inferences can be drawn regarding the temporal distribution of category-related information. The method can also be used to investigate redundancy between subsets of signal dimensions. Two types of theoretical categories are examined in this paper: phonemic/gestural categories and syntactic relative clause categories. Moreover, two different machine learning algorithms were examined: linear discriminant analysis and neural networks with long short-term memory units. Both algorithms detected category-related information earlier and later in signals than would be expected given standard theoretical assumptions about when linguistic categories should influence speech. The neural network algorithm was able to identify category-related information to a greater extent than the discriminant analyses.
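
The idea of restricting the temporal extent of the training input can be sketched on synthetic data. The example below is only an illustration under stated assumptions: it swaps in a simple nearest-centroid classifier for the linear discriminant analysis and LSTM networks used in the paper, and the signals, window length, and planted category difference at time steps 10 to 14 are all hypothetical.

```python
import random

def make_signal(category, rng, length=30):
    """Gaussian noise; category 1 adds an offset only at time steps 10-14 (planted for illustration)."""
    return [
        rng.gauss(0.0, 1.0) + (2.0 if category == 1 and 10 <= t < 15 else 0.0)
        for t in range(length)
    ]

def centroid(signals):
    """Element-wise mean of equally long signals."""
    return [sum(col) / len(col) for col in zip(*signals)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def window_accuracy(train, test, start, stop):
    """Held-out accuracy of a nearest-centroid classifier that only sees samples start..stop-1."""
    cents = {
        c: centroid([s[start:stop] for s, y in train if y == c]) for c in (0, 1)
    }
    hits = sum(
        min(cents, key=lambda c: sq_dist(s[start:stop], cents[c])) == y
        for s, y in test
    )
    return hits / len(test)

rng = random.Random(1)
data = [(make_signal(c, rng), c) for c in (0, 1) for _ in range(200)]
rng.shuffle(data)
train, test = data[:300], data[300:]  # accuracy is always measured on unseen signals

# Slide a 5-step window over the signal and record accuracy per window start.
accuracy_by_window = {
    start: window_accuracy(train, test, start, start + 5) for start in range(0, 30, 5)
}
best_window = max(accuracy_by_window, key=accuracy_by_window.get)
```

Windows that do not overlap the planted difference should stay near chance, so the accuracy profile over window positions indicates where in time the category-related information sits.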


2018 ◽  
Author(s):  
Qianfan Wu ◽  
Adel Boueiz ◽  
Alican Bozkurt ◽  
Arya Masoomi ◽  
Allan Wang ◽  
...  


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xiangke Pu ◽  
Danni Deng ◽  
Chaoyi Chu ◽  
Tianle Zhou ◽  
Jianhong Liu

Chronic HBV infection, the main cause of liver cirrhosis and hepatocellular carcinoma, has become a global health concern. Machine learning algorithms are particularly adept at analyzing medical phenomena by capturing complex and nonlinear relationships in clinical data. Our study proposed a predictive model, built with machine learning algorithms on the basis of 55 routine laboratory and clinical parameters, as a novel non-invasive method for liver fibrosis diagnosis. The model was further evaluated for accuracy and rationality and proved to be highly accurate and efficient for the prediction of HBV-related fibrosis. In conclusion, we suggest a potential combination of high-dimensional clinical data and machine learning predictive algorithms for liver fibrosis diagnosis.


2019 ◽  
Vol 8 (6) ◽  
pp. 248 ◽  
Author(s):  
Imane Bachri ◽  
Mustapha Hakdaoui ◽  
Mohammed Raji ◽  
Ana Cláudia Teodoro ◽  
Abdelmajid Benbouziane

Remote sensing data have proved to be a valuable resource in a variety of earth science applications. Using high-dimensional data with advanced methods such as machine learning algorithms (MLAs), a sub-domain of artificial intelligence, enhances lithological mapping by spectral classification. Support vector machines (SVMs) are among the most popular MLAs, with the ability to define non-linear decision boundaries in high-dimensional feature space by solving a quadratic optimization problem. This paper describes a supervised classification method using SVM for lithological mapping in the region of Souk Arbaa Sahel belonging to the Sidi Ifni inlier, located in southern Morocco (Western Anti-Atlas). The aims of this study were (1) to refine the existing lithological map of this region, and (2) to evaluate the performance of the SVM approach by combining spectral features of Landsat 8 OLI with digital elevation model (DEM) geomorphometric attributes of ALOS/PALSAR data. We performed an SVM classification that allows the joint use of geomorphometric features and multispectral data of Landsat 8 OLI. The results indicated an overall classification accuracy of 85%. From the results obtained, we conclude that the classification approach produced an image of lithological units in which formations such as silt, alluvium, limestone, dolomite, conglomerate, sandstone, rhyolite, andesite, granodiorite, quartzite, lutite, and ignimbrite are easily identified, coinciding with those on the published geological map. This result confirms the suitability of SVM as a supervised learning algorithm for lithological mapping.
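
For reference, the quadratic optimization problem behind an SVM is commonly written in the soft-margin form below for a two-class problem with labels y_i in {-1, +1}. This is the standard textbook formulation, not a detail taken from the paper: the feature map \phi, the penalty C, and the slack variables \xi_i are notation assumed here, and the abstract does not state which kernel or parameter values were used.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad
\xi_i \ge 0,\qquad i = 1,\dots,n
```

Multi-class tasks such as lithological mapping are typically handled by combining several such binary problems.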


2018 ◽  
Vol 18 (4) ◽  
pp. 60-72 ◽  
Author(s):  
Tobias MUELLER ◽  
Jonathan GREIPEL ◽  
Tobias WEBER ◽  
Robert H. SCHMITT

Detecting root causes of non-conforming parts (parts outside the tolerance limits) in production processes requires a high level of expert knowledge. This results in high costs and low flexibility in the choice of personnel to perform analyses. In modern production, a vast amount of process data is available, and machine learning algorithms exist that model processes empirically. The aim of this paper is to introduce a procedure for automated root cause analysis based on machine learning algorithms to reduce the costs and the necessary expert knowledge. For this purpose, a decision tree algorithm is chosen. A procedure for its application in an automated root cause analysis is presented, and simulations are conducted to prove its applicability. Influences affecting the success of detection are identified and simulated, e.g. the necessary amount of data depending on the number of variables, the ratio between categories of non-conformities and OK parts, and the detectable root causes. The simulations are based on a regression model for determining the roughness of drilled holes. They prove the applicability of machine learning algorithms for an automated root cause analysis and indicate which influences have to be considered in real scenarios.
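
A minimal sketch of the underlying idea, assuming a depth-one decision tree (a single split chosen by Gini impurity) and simulated process data: the split that best separates non-conforming from OK parts points at the candidate root cause. The variable names, value ranges, and the planted cause (feed rate above 0.22) are hypothetical and far simpler than the roughness regression model used in the paper.

```python
import random

def gini(labels):
    """Gini impurity of a list of booleans (True = non-conforming part)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return (feature, threshold, weighted Gini impurity) of the best single split."""
    best = None
    n = len(rows)
    for name in rows[0]:
        values = sorted({r[name] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[name] <= t]
            right = [y for r, y in zip(rows, labels) if r[name] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (name, t, score)
    return best

# Simulated process data (assumed): three variables, one planted root cause.
rng = random.Random(0)
rows = [
    {
        "feed_rate": rng.uniform(0.05, 0.30),
        "spindle_speed": rng.uniform(1000, 3000),
        "coolant_temp": rng.uniform(15, 30),
    }
    for _ in range(300)
]
labels = [r["feed_rate"] > 0.22 for r in rows]  # True = non-conforming

feature, threshold, score = best_split(rows, labels)
print(f"candidate root cause: {feature} > {threshold:.3f}")
```

Because the planted cause is deterministic, the best split falls on `feed_rate` with zero impurity; with noisy data, few non-conforming parts, or many variables, the split becomes less reliable, which is the kind of influence the paper's simulations examine.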

