data pruning
Recently Published Documents


Total documents: 34 (five years: 12)
H-index: 7 (five years: 2)

2021 ◽  
Vol 17 (7) ◽  
pp. e1009144
Author(s):  
George Crowley ◽  
James Kim ◽  
Sophia Kwon ◽  
Rachel Lam ◽  
David J. Prezant ◽  
...  

Biomarkers predict World Trade Center-Lung Injury (WTC-LI); however, multicollinearity remains unaddressed in the serum cytokine, chemokine, and high-throughput platform datasets we use to phenotype WTC-disease. To address this concern, we used automated, machine-learning, high-dimensional data pruning and validated the identified biomarkers. The parent cohort consisted of male, never-smoking firefighters with WTC-LI (FEV1, %Pred < lower limit of normal (LLN); n = 100) and controls (n = 127), all of whom had their biomarkers assessed. Cases and controls (n = 15/group) underwent untargeted metabolomics, followed by feature selection on metabolites, cytokines, chemokines, and clinical data. Cytokines, chemokines, and clinical biomarkers were validated in the non-overlapping parent cohort via binary logistic regression with 5-fold cross-validation. Random forests of metabolites (n = 580), clinical biomarkers (n = 5), and previously assayed cytokines and chemokines (n = 106) identified the top 5% of biomarkers important to class separation, which included pigment epithelium-derived factor (PEDF), macrophage-derived chemokine (MDC), systolic blood pressure, macrophage inflammatory protein-4 (MIP-4), growth-regulated oncogene protein (GRO), monocyte chemoattractant protein-1 (MCP-1), apolipoprotein-AII (Apo-AII), cell membrane metabolites (sphingolipids, phospholipids), and branched-chain amino acids. Models validated via confounder-adjusted (age on 9/11, BMI, exposure, and pre-9/11 FEV1, %Pred) binary logistic regression had an AUCROC of 0.90 (0.84–0.96). Decreased PEDF and MIP-4 and increased Apo-AII were associated with increased odds of WTC-LI. Increased GRO and MCP-1, together with decreased MDC, were associated with decreased odds of WTC-LI. In conclusion, automated data pruning identified novel WTC-LI biomarkers; performance was validated in an independent cohort.
One biomarker—PEDF, an antiangiogenic agent—is a novel, predictive biomarker of particulate-matter-related lung disease. Other biomarkers—GRO, MCP-1, MDC, MIP-4—reveal immune cell involvement in WTC-LI pathogenesis. Findings of our automated biomarker identification warrant further investigation into these potential pharmacotherapy targets.
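
The "top 5%" cut described in the abstract above can be illustrated with a small sketch. It assumes each feature already has an importance score (for example, from a random forest) and simply keeps the highest-ranked fraction. The function name `top_fraction`, the placeholder feature names, the synthetic scores, and the choice to round the cut-off up are all assumptions made for illustration; none of them come from the study.

```python
import math

def top_fraction(importances, fraction=0.05):
    """Return the names of the highest-scoring `fraction` of features.

    `importances` maps feature name -> importance score.
    Rounding the cut-off up is an assumption, not the study's stated rule.
    """
    n_keep = max(1, math.ceil(fraction * len(importances)))
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:n_keep]

# Feature counts reported in the abstract: metabolites, clinical, cytokines/chemokines.
n_features = 580 + 5 + 106

# Synthetic, strictly decreasing placeholder scores (not real data).
scores = {f"feature_{i}": 1.0 / (i + 1) for i in range(n_features)}

top = top_fraction(scores)
print(n_features, len(top))
```

With 691 candidate features, a 5% cut keeps 35 of them under this rounding choice.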


2021 ◽  
Vol 49 (4) ◽  
pp. 1-13
Author(s):  
Guang Li Li ◽  
Zi Wen Dong ◽  
Zi Kun Pi ◽  
Chen Luo

The factors giving rise to the safety leadership of senior managers in the coal mining industry are underexplored. We conducted 24 semistructured interviews with experienced senior managers in this industry. We applied hermeneutic content analysis to the interview content, identified five safety leadership dimensions of senior managers, and designed the Safety Leadership Scale in Coal Mines in China. Data from 845 participants in nine coal mines were analyzed with confirmatory factor analysis (CFA). We followed a two-step approach, involving data pruning and a second-order CFA, and validated a 21-item measure that shows good validity and reliability. The second-order CFA results show that the measure yielded an acceptable fit. Our results can be used to estimate the safety leadership level among senior managers in China's coal mining industry.


2020 ◽  
Vol 414 ◽  
pp. 143-152
Author(s):  
Qi Li ◽  
Pengfei Li ◽  
Kezhi Mao ◽  
Edmond Yat-Man Lo

2020 ◽  
Vol 32 (21) ◽  
Author(s):  
Jaroslav Oľha ◽  
Jana Hozzová ◽  
Jan Fousek ◽  
Jiří Filipovič

2020 ◽  
Vol 398 ◽  
pp. 45-54
Author(s):  
Yixing Li ◽  
Shuai Zhang ◽  
Xichuan Zhou ◽  
Fengbo Ren

Aerospace ◽  
2020 ◽  
Vol 7 (5) ◽  
pp. 63 ◽  
Author(s):  
Angelo Lerro ◽  
Alberto Brandl ◽  
Manuela Battipede ◽  
Piero Gili

Digital avionic solutions make advanced flight control systems available on smaller aircraft as well. One of the safety-critical segments is the air data system. Innovative architectures allow the use of synthetic sensors that can introduce significant technological and safety advances. Application to aerodynamic angles seems the most promising route towards certified applications. In this area, the best procedures for designing synthetic sensors are still an open question. An example is given by the MIDAS project, funded within the framework of Clean Sky 2. This paper proposes two data-driven methods that improve performance over the entire flight envelope, with particular attention to steady-state flight conditions. The resulting training set is considerably smaller, with a consequent reduction in computational cost. The methods are validated on a real case and will be used as part of the MIDAS life cycle. The first method, called Data-Driven Identification and Generation of Quasi-Steady States (DIGS), is based on (i) identification of the lift curve of the aircraft and (ii) augmentation of the training set with artificial flight data points. DIGS’s main aim is to mitigate the problem of an unbalanced training set. The second method, called Similar Flight Test Data Pruning (SFDP), deals with data reduction based on the isolation of quasi-unique points. The results provide evidence that the methods are valid for the MIDAS project and can be easily adopted for generic synthetic sensor design in flight control system applications.
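
The idea of reducing a dataset to quasi-unique points can be sketched as a greedy distance filter: a sample is kept only if it lies farther than a tolerance from every sample already kept. This is a generic illustration, not the SFDP algorithm itself, whose details the abstract does not give. The function name `prune_similar`, the Euclidean metric, the tolerance value, and the sample points are all assumptions.

```python
import math

def prune_similar(points, tol):
    """Greedy pruning: keep a point only if no kept point is within `tol`.

    Uses Euclidean distance (an assumption); the scan is O(n * kept).
    """
    kept = []
    for p in points:
        if all(math.dist(p, q) > tol for q in kept):
            kept.append(p)
    return kept

# Hypothetical two-dimensional samples, two of which are near-duplicates.
samples = [(0.0, 0.0), (0.01, 0.0), (1.0, 1.0), (1.0, 1.005), (2.0, 0.5)]
reduced = prune_similar(samples, tol=0.05)
print(reduced)
```

The result depends on the order in which points are visited, so a filter like this trades reproducibility across orderings for simplicity.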


Mathematics ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 286 ◽  
Author(s):  
Hamid Saadatfar ◽  
Samiyeh Khosravi ◽  
Javad Hassannataj Joloudari ◽  
Amir Mosavi ◽  
Shahaboddin Shamshirband

The K-nearest neighbors (KNN) machine learning algorithm is a well-known non-parametric classification method. However, like other traditional data mining methods, applying it to big data comes with computational challenges. KNN determines the class of a new sample from the classes of its nearest neighbors; identifying those neighbors in a large dataset imposes a computational cost so high that the method is no longer feasible on a single machine. One of the techniques proposed to make classification methods applicable to large datasets is pruning. LC-KNN is an improved KNN method that first clusters the data into smaller partitions using the K-means clustering method and then, for each new sample, applies KNN on the partition whose center is nearest. However, because the clusters have different shapes and densities, selecting the appropriate cluster is a challenge. In this paper, an approach is proposed to improve the pruning phase of the LC-KNN method by taking these factors into account. The proposed approach helps to choose a more appropriate cluster of data in which to look for neighbors, thus increasing the classification accuracy. The performance of the proposed approach is evaluated on several real datasets. The experimental results show the effectiveness of the proposed approach, with higher classification accuracy and lower time cost than other recent relevant methods.
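
The baseline LC-KNN idea described above (cluster with K-means, then run KNN only inside the cluster whose center is nearest) can be sketched in plain Python. This is a minimal illustration of the baseline, not the improved cluster-selection approach the paper proposes. All function names, the deterministic initialization from the first `k` points, and the toy data are assumptions.

```python
import math
from collections import Counter

def assign(points, centers):
    """Group point indices by their nearest center."""
    groups = [[] for _ in centers]
    for idx, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        groups[nearest].append(idx)
    return groups

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; the first k points seed the centers (assumption)."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups = assign(points, centers)
        centers = [
            tuple(sum(col) / len(g) for col in zip(*(points[i] for i in g)))
            if g else centers[c]  # an empty cluster keeps its old center
            for c, g in enumerate(groups)
        ]
    return centers, assign(points, centers)

def lc_knn_predict(x, points, labels, centers, groups, k_neighbors=3):
    """Majority vote among the nearest neighbors inside the nearest cluster."""
    c = min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
    pool = groups[c] or range(len(points))  # fall back to all points if empty
    nearest = sorted(pool, key=lambda i: math.dist(x, points[i]))[:k_neighbors]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", "a", "a", "b", "b", "b"]
centers, groups = kmeans(points, k=2)
print(lc_knn_predict((0.5, 0.5), points, labels, centers, groups))
print(lc_knn_predict((9, 9), points, labels, centers, groups))
```

Because only one partition is searched, a query near a cluster boundary can miss true neighbors that fall in an adjacent partition, which is the cluster-selection problem the abstract refers to.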


2019 ◽  
Vol 63 (11) ◽  
pp. 1668-1688
Author(s):  
Bojie Shen ◽  
Saiful Islam ◽  
David Taniar

Retrieval of arbitrary-shaped surrounding data objects has many potential applications in spatial databases, including retrieval of nearby arbitrary-shaped objects of interest surrounding a user. In this paper, we propose the directional zone concept to determine directional similarity among spatial data objects. We then propose a novel query, called direction-based spatial skyline (DSS), which retrieves non-dominated arbitrary-shaped surrounding data objects in spatial databases for a user. The proposed DSS query is rotationally invariant as well as fair. We develop efficient algorithms for processing DSS queries in spatial databases by designing novel data pruning techniques that use the R-Tree data indexing scheme. Finally, we demonstrate the effectiveness and efficiency of our approach through extensive experiments with real datasets.
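
The notion of "non-dominated" objects behind any skyline query can be shown with a generic Pareto filter. The sketch below is the classic skyline over numeric attributes where smaller is better; it does not implement directional zones, arbitrary-shaped objects, or R-Tree pruning, none of which are specified in the abstract. The function names and sample tuples are assumptions.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` everywhere and strictly better somewhere.

    Smaller values are treated as better (an assumption for this sketch).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(objects):
    """Naive O(n^2) skyline: keep every object that no other object dominates."""
    return [p for p in objects if not any(dominates(q, p) for q in objects)]

# Hypothetical objects described by two cost-like attributes.
objects = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(skyline(objects))
```

Index-based pruning, such as the R-Tree techniques the abstract mentions, aims to avoid the pairwise comparisons that this naive version performs.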

