Data-adaptive methods for high-dimensional mediation analysis: Application to a randomised trial of tuberculosis vaccination

2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Margarita Moreno-Betancur ◽  
Nicole L Messina ◽  
Kaya Gardiner ◽  
Nigel Curtis ◽  
Stijn Vansteelandt

Abstract

Focus of Presentation
Statistical methods for causal mediation analysis are useful for understanding the pathways by which a treatment or exposure impacts health outcomes. Existing methods require modelling the distribution of the mediators, which quickly becomes infeasible when the mediators are high-dimensional (e.g., biomarkers). We propose novel data-adaptive methods for estimating the indirect effect of a randomised treatment that acts via a pathway represented by a high-dimensional set of measurements. This work was motivated by the Melbourne Infant Study: BCG for Allergy and Infection Reduction (MIS BAIR), a randomised controlled trial investigating the effect of neonatal tuberculosis vaccination on clinical allergy and infection outcomes, and its mechanisms of action.

Findings
The proposed methods are doubly robust, which allows us to achieve (uniformly) valid statistical inference even when machine learning algorithms are used for the two required models. We illustrate these methods in the context of the MIS BAIR study, investigating the mediating role of immune pathways represented by a high-dimensional vector of cytokine responses under various stimulants. We confirm adequate performance of the proposed methods in an extensive simulation study.

Conclusions/Implications
The proposed methods provide a feasible and flexible analytic strategy for examining high-dimensional mediators in randomised controlled trials.

Key messages
Data-adaptive methods for mediation analysis are desirable in the context of high-dimensional mediators, such as biomarkers. We propose novel doubly robust methods, which enable valid statistical inference when using machine learning algorithms for estimation.
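The authors' high-dimensional mediation estimator is beyond a short example, but the double-robustness idea the abstract relies on can be illustrated with a simpler, related construction: a cross-fitted augmented inverse-probability-weighted (AIPW) estimator of a total treatment effect in a randomised trial, with machine-learning outcome models. Everything below (data, model choices) is a simulated sketch, not the authors' method:

```python
# Sketch of double robustness with ML nuisance models: cross-fitted AIPW for
# a *total* effect of a randomised treatment (the mediation estimator in the
# abstract is more involved). With randomisation the propensity is known,
# which is what keeps inference valid when the outcome model is ML-based.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 1000
M = rng.normal(size=(n, 20))                  # high-dimensional covariates (simulated)
A = rng.integers(0, 2, n)                     # randomised treatment, P(A=1)=0.5
Y = 0.5 * A + M[:, 0] + rng.normal(size=n)    # outcome; true effect is 0.5

pi = 0.5                                      # known randomisation probability
psi = np.zeros(n)
for train, test in KFold(5, shuffle=True, random_state=0).split(M):
    mu1 = GradientBoostingRegressor().fit(M[train][A[train] == 1], Y[train][A[train] == 1])
    mu0 = GradientBoostingRegressor().fit(M[train][A[train] == 0], Y[train][A[train] == 0])
    m1, m0 = mu1.predict(M[test]), mu0.predict(M[test])
    a, yv = A[test], Y[test]
    # AIPW influence-function values: outcome-model contrast plus IPW residual correction
    psi[test] = m1 - m0 + a * (yv - m1) / pi - (1 - a) * (yv - m0) / (1 - pi)

est = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)
print(f"ATE estimate: {est:.3f} +/- {1.96 * se:.3f}")
```

The cross-fitting (fitting nuisance models on folds not used for evaluation) is what gives valid confidence intervals despite flexible machine learning, the property the abstract emphasizes.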


Pain Medicine ◽  
2015 ◽  
Vol 16 (7) ◽  
pp. 1386-1401 ◽  
Author(s):  
Patrick J. Tighe ◽  
Christopher A. Harle ◽  
Robert W. Hurley ◽  
Haldun Aytug ◽  
Andre P. Boezaart ◽  
...  

Author(s):  
Miss. Archana Chaudahri ◽  
Mr. Nilesh Vani

Most data of interest in today's data-mining applications is complex and is usually represented by many different features. Such high-dimensional data is by its very nature often difficult for conventional machine-learning algorithms to handle, an aspect of the well-known curse of dimensionality. Consequently, high-dimensional data must be processed with care, and the design of machine-learning algorithms needs to take this into account. It has also been observed, however, that some properties that arise in high dimensions can in fact be exploited to improve algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness and refers to the emergence of very influential nodes (hubs) in k-nearest-neighbor graphs. A crisp weighted voting scheme for the k-nearest-neighbor classifier that exploits this notion has recently been proposed; a sketch of the idea follows below.
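A minimal sketch of hubness-weighted k-NN voting in the spirit of that proposal, using hw-kNN-style "bad hubness" weights; the exact crisp weighting rule in the cited scheme may differ in detail:

```python
# Hubness-aware weighted k-NN voting (illustrative sketch): points that often
# appear as label-mismatched neighbors ("bad hubs") get their votes down-weighted.
import numpy as np
from collections import Counter

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each point (excluding itself)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def bad_hubness_weights(X, y, k):
    """Weight = exp(-standardized bad k-occurrence), as in hw-kNN-style schemes.

    bad[j] counts how often point j appears among other points' k nearest
    neighbors with a *different* label than the query point.
    """
    nbrs = knn_indices(X, k)
    bad = np.zeros(len(X))
    for i, row in enumerate(nbrs):
        for j in row:
            if y[j] != y[i]:
                bad[j] += 1
    std = bad.std() if bad.std() > 0 else 1.0
    return np.exp(-(bad - bad.mean()) / std)

def predict(X_train, y_train, x, k, w):
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]
    votes = Counter()
    for j in nn:
        votes[y_train[j]] += w[j]          # weighted, not plain majority, vote
    return votes.most_common(1)[0][0]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))             # high-dimensional toy data
y = (X[:, 0] > 0).astype(int)
w = bad_hubness_weights(X, y, k=5)
print(predict(X, y, X[0], k=5, w=w), "(true:", y[0], ")")
```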


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258178
Author(s):  
Sam Tilsen ◽  
Seung-Eun Kim ◽  
Claire Wang

Measurements of the physical outputs of speech—vocal tract geometry and acoustic energy—are high-dimensional, but linguistic theories posit a low-dimensional set of categories such as phonemes and phrase types. How can it be determined when and where in high-dimensional articulatory and acoustic signals there is information related to theoretical categories? For a variety of reasons, it is problematic to quantify mutual information between hypothesized categories and signals directly. To address this issue, a multi-scale analysis method is proposed for localizing category-related information in an ensemble of speech signals using machine learning algorithms. By analyzing how classification accuracy on unseen data varies as the temporal extent of the training input is systematically restricted, inferences can be drawn about the temporal distribution of category-related information (a minimal sketch of this procedure follows below). The method can also be used to investigate redundancy between subsets of signal dimensions. Two types of theoretical categories are examined in this paper: phonemic/gestural categories and syntactic relative-clause categories. Two machine learning algorithms are also examined: linear discriminant analysis and neural networks with long short-term memory units. Both algorithms detected category-related information earlier and later in the signals than would be expected given standard theoretical assumptions about when linguistic categories should influence speech. The neural network was able to identify category-related information to a greater extent than the discriminant analyses.
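A hedged sketch of the windowed-training idea, assuming a simple trials x time x channels data layout and using linear discriminant analysis; the paper's actual signals, window scheme, and cross-validation details are not reproduced here:

```python
# Restrict the temporal extent of the input a classifier sees and track
# held-out accuracy, to localize where in the signal category-related
# information lives. Shapes and the injected effect are illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_frames, n_chan = 120, 50, 8       # trials x time x signal dims
labels = rng.integers(0, 2, n_trials)         # two hypothetical categories
signals = rng.normal(size=(n_trials, n_frames, n_chan))
# inject class information only in frames 20-30 so the method can find it
signals[labels == 1, 20:30, 0] += 1.0

window = 5
accuracy = []
for start in range(0, n_frames - window + 1):
    X = signals[:, start:start + window, :].reshape(n_trials, -1)
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5).mean()
    accuracy.append((start, acc))

# accuracy should peak for windows overlapping frames 20-30
print(max(accuracy, key=lambda t: t[1]))
```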


2018 ◽  
Author(s):  
Qianfan Wu ◽  
Adel Boueiz ◽  
Alican Bozkurt ◽  
Arya Masoomi ◽  
Allan Wang ◽  
...  

Predicting disease status for a complex human disease from genomic data is an important, yet challenging, step in personalized medicine. Among many challenges, the so-called curse of dimensionality leads to unsatisfactory performance of many state-of-the-art machine learning algorithms. A major recent advance in machine learning is the rapid development of deep learning algorithms that can efficiently extract meaningful features from high-dimensional, complex datasets through a stacked, hierarchical learning process. Deep learning has shown breakthrough performance in several areas, including image recognition, natural language processing, and speech recognition. However, the performance of deep learning in predicting disease status from genomic datasets remains understudied. In this article, we review four relevant articles identified through a thorough literature search. All four used autoencoders to project high-dimensional genomic data to a low-dimensional space and then applied state-of-the-art machine learning algorithms to predict disease status from the low-dimensional representations (a sketch of this two-stage approach follows below). This deep learning approach outperformed existing prediction approaches, such as prediction based on probe-wise screening or on principal component analysis. The limitations of the current deep learning approach and possible improvements are also discussed.
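A minimal sketch of the shared two-stage pipeline, with simulated data and illustrative hyperparameters standing in for the reviewed studies' genomic datasets and architectures:

```python
# Two-stage approach: (1) compress high-dimensional genomic features with an
# autoencoder, (2) train an ordinary classifier on the low-dimensional codes.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

torch.manual_seed(0)
n, p, latent = 300, 2000, 32                  # samples, probes, code size (assumed)
X = torch.randn(n, p)                         # stand-in for expression data
y = (X[:, 0] > 0).long().numpy()              # hypothetical disease status

encoder = nn.Sequential(nn.Linear(p, 256), nn.ReLU(), nn.Linear(256, latent))
decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, p))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for epoch in range(200):                      # unsupervised reconstruction training
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(X)), X)
    loss.backward()
    opt.step()

codes = encoder(X).detach().numpy()           # low-dimensional representation
clf = LogisticRegression(max_iter=1000).fit(codes, y)
print("training accuracy on codes:", clf.score(codes, y))
```

In practice the classifier would be evaluated on held-out samples; the point of the sketch is only the division of labor between the autoencoder and the downstream learner.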


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xiangke Pu ◽  
Danni Deng ◽  
Chaoyi Chu ◽  
Tianle Zhou ◽  
Jianhong Liu

Abstract
Chronic HBV infection, the main cause of liver cirrhosis and hepatocellular carcinoma, has become a global health concern. Machine learning algorithms are particularly adept at analyzing medical phenomena, as they capture complex and nonlinear relationships in clinical data. Our study proposed a predictive model based on 55 routine laboratory and clinical parameters, built with machine learning algorithms, as a novel non-invasive method for liver fibrosis diagnosis. The model was further evaluated for accuracy and rationality and proved highly accurate and efficient for the prediction of HBV-related fibrosis. In conclusion, we suggest combining high-dimensional clinical data with machine learning predictive algorithms for liver fibrosis diagnosis.
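The abstract does not name the learning algorithm, so the sketch below uses a random forest as a stand-in, with simulated placeholders for the 55 routine parameters; it illustrates the shape of such a non-invasive classifier, not the authors' model:

```python
# Hedged sketch of a fibrosis classifier built from routine clinical values.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_patients, n_features = 500, 55              # 55 routine lab/clinical values (simulated)
X = rng.normal(size=(n_patients, n_features))
# hypothetical fibrosis labels driven by a few of the features plus noise
fibrosis = (X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=n_patients) > 0)

model = RandomForestClassifier(n_estimators=300, random_state=0)
auc = cross_val_score(model, X, fibrosis, cv=5, scoring="roc_auc").mean()
print("cross-validated AUC:", round(auc, 3))
```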


2019 ◽  
Vol 8 (6) ◽  
pp. 248 ◽  
Author(s):  
Imane Bachri ◽  
Mustapha Hakdaoui ◽  
Mohammed Raji ◽  
Ana Cláudia Teodoro ◽  
Abdelmajid Benbouziane

Remote sensing data have proved to be a valuable resource in a variety of earth science applications. Using high-dimensional data with advanced methods such as machine learning algorithms (MLAs), a sub-domain of artificial intelligence, enhances lithological mapping by spectral classification. Support vector machines (SVMs) are among the most popular MLAs, with the ability to define non-linear decision boundaries in high-dimensional feature space by solving a quadratic optimization problem. This paper describes a supervised classification method based on SVMs for lithological mapping in the region of Souk Arbaa Sahel, part of the Sidi Ifni inlier in southern Morocco (Western Anti-Atlas). The aims of this study were (1) to refine the existing lithological map of this region, and (2) to evaluate the performance of the SVM approach using the spectral features of Landsat 8 OLI combined with digital elevation model (DEM) geomorphometric attributes derived from ALOS/PALSAR data. We performed SVM classification on the joint set of geomorphometric features and Landsat 8 OLI multispectral data (a sketch of this setup follows below). The results indicated an overall classification accuracy of 85%. We conclude that the classification approach produced a map whose lithological units readily identify formations such as silt, alluvium, limestone, dolomite, conglomerate, sandstone, rhyolite, andesite, granodiorite, quartzite, lutite, and ignimbrite, coinciding with those on the published geological map. This result confirms the suitability of SVMs as a supervised learning algorithm for lithological mapping.
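An illustrative sketch of the feature stacking and SVM setup, with simulated placeholders for the OLI bands and DEM attributes; band counts, class count, and hyperparameters are assumptions, not the paper's configuration:

```python
# Stack per-pixel spectral bands with DEM-derived geomorphometric attributes
# and classify lithology with an RBF-kernel SVM.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(7)
n_pixels = 2000
spectral = rng.normal(size=(n_pixels, 7))     # e.g. 7 OLI reflective bands
geomorph = rng.normal(size=(n_pixels, 4))     # e.g. slope, aspect, curvature, relief
X = np.hstack([spectral, geomorph])           # joint high-dimensional feature space
y = rng.integers(0, 12, n_pixels)             # 12 hypothetical lithology classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
clf.fit(X_tr, y_tr)
print("overall accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

With real imagery the labels would come from field observations or the existing geological map, and accuracy would be reported per class as well as overall.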


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Nelson Filipe Costa ◽  
Omar Yasser ◽  
Aidar Sultanov ◽  
Gheorghe Sorin Paraoanu

Abstract
Quantum phase estimation is a paradigmatic problem in quantum sensing and metrology. Here we show that adaptive methods based on classical machine learning algorithms can be used to enhance the precision of quantum phase estimation when noisy non-entangled qubits are used as sensors. We employ the Differential Evolution (DE) and Particle Swarm Optimization (PSO) algorithms for this task and identify the optimal feedback policies that minimize the Holevo variance. We benchmark these schemes in scenarios that include Gaussian and random-telegraph fluctuations as well as reduced Ramsey-fringe visibility due to decoherence. We discuss their robustness against noise in connection with real experimental setups such as Mach–Zehnder interferometry with optical photons and Ramsey interferometry in trapped ions, superconducting qubits, and nitrogen-vacancy (NV) centers in diamond.
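A heavily simplified sketch of the optimization idea: vanilla PSO choosing a fixed sequence of Ramsey measurement angles to minimize the average Holevo variance of a Bayesian phase posterior. The single-qubit model, non-adaptive angle sequence, and noise treatment are simplifications of the paper's feedback-policy setting, and the DE variant is not shown:

```python
# PSO over measurement angles for Bayesian phase estimation with reduced
# fringe visibility; objective = mean Holevo variance of the posterior.
import numpy as np

rng = np.random.default_rng(3)
phi_grid = np.linspace(-np.pi, np.pi, 256, endpoint=False)
V = 0.8                                       # reduced Ramsey-fringe visibility

def mean_holevo_variance(thetas, n_runs=100):
    """Average Holevo variance of the posterior over random true phases."""
    total = 0.0
    for _ in range(n_runs):
        phi = rng.uniform(-np.pi, np.pi)
        post = np.ones_like(phi_grid) / len(phi_grid)
        for th in thetas:
            p0 = 0.5 * (1 + V * np.cos(phi - th))     # outcome probability
            out = rng.random() < p0                   # simulated measurement
            like = 0.5 * (1 + V * np.cos(phi_grid - th))
            post *= like if out else (1 - like)       # Bayesian update
            post /= post.sum()
        m = np.abs(np.sum(post * np.exp(1j * phi_grid)))
        total += 1.0 / m**2 - 1.0                     # Holevo variance
    return total / n_runs

# vanilla PSO over a sequence of 6 measurement angles (noisy objective)
n_particles, dim, iters = 12, 6, 20
x = rng.uniform(-np.pi, np.pi, (n_particles, dim))
v = np.zeros_like(x)
pbest = x.copy()
pbest_f = np.array([mean_holevo_variance(p) for p in x])
g = pbest[pbest_f.argmin()].copy()
for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
    x = x + v
    f = np.array([mean_holevo_variance(p) for p in x])
    better = f < pbest_f
    pbest[better], pbest_f[better] = x[better], f[better]
    g = pbest[pbest_f.argmin()].copy()
print("best mean Holevo variance:", pbest_f.min())
```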

