maximal information coefficient
Recently Published Documents


TOTAL DOCUMENTS: 106 (FIVE YEARS 51)

H-INDEX: 13 (FIVE YEARS 3)

Energies ◽ 2022 ◽ Vol 15 (2) ◽ pp. 605
Author(s): Peng Chen ◽ Yumin Deng ◽ Xuegui Zhang ◽ Li Ma ◽ Yaoliang Yan ◽ ...

The harsh operating environment aggravates the degradation of pumped storage units (PSUs). Degradation trend prediction (DTP) provides important support for the condition-based maintenance of PSUs. However, the complexity of the performance degradation index (PDI) sequence poses a severe challenge to the reliability of DTP. Additionally, the accuracy of the healthy model is often ignored, resulting in an unconvincing PDI. To solve these problems, a combined DTP model that integrates the maximal information coefficient (MIC), light gradient boosting machine (LGBM), variational mode decomposition (VMD) and gated recurrent unit (GRU) is proposed. First, MIC-LGBM is used to generate a high-precision healthy model: MIC selects the most relevant working parameters, and LGBM then constructs the healthy model. Next, the PDI is generated from the LGBM healthy model and monitoring data. Finally, the VMD-GRU prediction model is designed to achieve precise DTP under the complex PDI sequence. The proposed model is verified by applying it to a PSU located in Zhejiang province, China. The results reveal that the proposed model yields the most precise healthy model and the best prediction performance among the compared models. The absolute average (|AVG|) and standard deviation (STD) of fitting errors are reduced to 0.0275 and 0.9245, and the RMSE, MAE, and R² are 0.00395, 0.0032, and 0.9226, respectively, on average for two operating conditions.
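The MIC-based parameter selection step described above can be sketched roughly as follows. This is not the authors' implementation: `approx_mic` is a simplified stand-in that scans only equal-frequency grids (the full MIC algorithm also optimizes where the grid lines fall), and the parameter names and synthetic data are hypothetical.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = min(rank * n_bins // len(values), n_bins - 1)
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) between two discrete label sequences."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def approx_mic(x, y, alpha=0.6):
    """Largest normalized MI over equal-frequency grids with nx * ny <= n**alpha.

    Simplification (assumption): bin edges are fixed at sample quantiles
    instead of being optimized as in the full MIC algorithm.
    """
    budget = max(4, int(len(x) ** alpha))
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        bx = equal_frequency_bins(x, nx)
        for ny in range(2, budget // nx + 1):
            by = equal_frequency_bins(y, ny)
            score = mutual_information(bx, by) / math.log(min(nx, ny))
            best = max(best, score)
    return best


# Hypothetical data: one parameter drives the target nonlinearly, one is noise.
rng = random.Random(0)
relevant = [rng.uniform(-1.0, 1.0) for _ in range(300)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(300)]
target = [v * v + rng.gauss(0.0, 0.05) for v in relevant]

candidates = {"relevant_param": relevant, "noise_param": noise}
ranking = sorted(
    ((name, approx_mic(values, target)) for name, values in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Parameters at the top of `ranking` would be the ones passed on to the regression model. Note that a Pearson correlation would score the quadratic relationship here close to zero, which is the usual motivation for a dependence measure such as MIC.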


2021 ◽ Vol 12 (1) ◽ pp. 23
Author(s): Jiahao Yu ◽ Rongshun Pan ◽ Yongman Zhao

Accurate quality prediction can help find and eliminate quality hazards. It is difficult to construct an accurate mathematical model of quality for small-sample, high-dimensional production data because of the influence of quality characteristics and the complex mechanism of action. In addition, overfitting is likely to occur in high-dimensional, small-sample industrial product quality prediction. This paper proposes an ensemble learning and measurement model based on stacking and selects eight algorithms as the base learning models. The maximal information coefficient (MIC) is used to obtain the correlation between the base learning models. Models with low correlation and strong predictive power are chosen to build stacking ensemble models, which effectively avoids overfitting and yields better predictive performance. With improved prediction performance as the optimization goal, boxplots, ordinary least squares (OLS), and multivariate imputation by chained equations (MICE) are used in the data preprocessing stage to detect and replace outliers. The CatBoost algorithm is used to construct combined features, and strong combined features are selected to construct a new feature set. Concrete slump data from the University of California Irvine (UCI) Machine Learning Repository are used to conduct comprehensive verification experiments. The experimental results show that, compared with the optimal single model, the minimum-correlation stacking ensemble learning model has higher precision and stronger robustness, and a new method is provided to guarantee the accuracy of final product quality prediction.
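One way to read the "low correlation, strong predictive power" selection rule is as a greedy filter over the candidate base learners. The sketch below is an assumption about how such a rule could work, not the paper's procedure: the model names, validation scores, pairwise dependence values, and the `max_dependence` threshold are all hypothetical, and the dependence values stand in for MIC scores computed between the models' predictions.

```python
def select_base_learners(scores, dependence, max_dependence=0.8, max_models=4):
    """Greedily keep the strongest models whose pairwise dependence stays low.

    scores:      {model_name: validation score, higher is better}
    dependence:  {frozenset({a, b}): dependence between models a and b, in [0, 1]}
    """
    selected = []
    # Visit candidates from strongest to weakest.
    for name in sorted(scores, key=scores.get, reverse=True):
        # Accept only if not too dependent on any model already chosen.
        if all(dependence[frozenset((name, other))] <= max_dependence
               for other in selected):
            selected.append(name)
        if len(selected) == max_models:
            break
    return selected


# Hypothetical validation scores and pairwise dependence values.
scores = {"svr": 0.90, "rf": 0.88, "gbdt": 0.87, "knn": 0.80}
dependence = {
    frozenset(("svr", "rf")): 0.60,
    frozenset(("svr", "gbdt")): 0.65,
    frozenset(("svr", "knn")): 0.50,
    frozenset(("rf", "gbdt")): 0.95,  # two tree ensembles with similar predictions
    frozenset(("rf", "knn")): 0.55,
    frozenset(("gbdt", "knn")): 0.50,
}

print(select_base_learners(scores, dependence))
```

In this toy input `gbdt` is dropped despite its good score because it is nearly redundant with `rf`, so the weaker but more diverse `knn` is kept instead.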


2021
Author(s): Yijun Ran ◽ Tianyu Liu ◽ Tao Jia ◽ Xiao-Ke Xu

Network information mining is the study of network topology, which may answer a large number of application-based questions about the structural evolution and function of a real system. Such questions can relate to how the real system evolves or how individuals interact with each other in social networks. Although the evolution of a real system may appear regular, capturing patterns across the whole process of evolution is not trivial. Link prediction is one of the most important technologies in network information mining, and it can help us understand the evolution mechanism of real-life networks. Link prediction aims to uncover missing links or quantify the likelihood of the emergence of nonexistent links from known network structures. Currently, most existing link prediction methods focus on short-path networks, which usually have a myriad of closed triangular structures. However, these algorithms perform poorly on highly sparse or long-path networks. Here, we propose a new index based on the principles of Structural Equivalence and Shortest Path Length (SESPL) to estimate the likelihood of link existence in long-path networks. In tests on 548 real networks, we find that SESPL is more effective and efficient than other similarity-based predictors in long-path networks. We also compare the performance of the SESPL predictor with that of embedding-based approaches using machine learning techniques. The results show that SESPL achieves a gain of 44.09% over GraphWave and 7.93% over Node2vec. Finally, according to the matrix of Maximal Information Coefficient (MIC) values between all the similarity-based predictors, SESPL is a new independent feature in the space of traditional similarity features.


2021 ◽ Vol 13 (22) ◽ pp. 4631
Author(s): Xiaodong Xu ◽ Hui Lin ◽ Zhaohua Liu ◽ Zilin Ye ◽ Xinyu Li ◽ ...

Remote sensing technology is becoming mainstream for mapping the growing stem volume (GSV) and overcoming the shortcomings of traditional labor-intensive approaches. Naturally, the GSV estimation accuracy obtained from remote sensing imagery is highly related to the variable selection methods and algorithms. Thus, to reduce the uncertainty caused by variables and models, this paper proposes a combined strategy involving improved variable selection with the collinearity test and the secondary ensemble algorithm to obtain the optimally combined variables and extract a reliable GSV from several base models. Our study extracted four types of alternative variables from the Sentinel-1A and Sentinel-2A image datasets, including vegetation indices, spectral reflectance variables, backscattering coefficients, and texture features. Then, an improved variable selection criterion with the collinearity test was developed and evaluated based on machine learning algorithms (classification and regression trees (CART), k-nearest neighbors (KNN), support vector regression (SVR), and artificial neural network (ANN)) considering the correlation between variables and GSV (with random forest (RF), distance correlation coefficient (DC), maximal information coefficient (MIC), and Pearson correlation coefficient (PCC) as evaluation metrics), and the collinearity among the variables. Additionally, we proposed a secondary ensemble with an improved weighted average approach (IWA) to estimate the reliable forest GSV using the first ensemble models constructed by Bagging and AdaBoost. The experimental results demonstrated that the proposed variable selection criterion efficiently obtained the optimal combined variable set without affecting the forest GSV mapping accuracy. Specifically, considering the first ensemble, the relative root mean square error (rRMSE) values ranged from 21.91% to 30.28% for Bagging and 23.33% to 31.49% for AdaBoost, respectively. After the secondary ensemble involving the IWA, the rRMSE values ranged from 18.89% to 21.34%. Furthermore, the variance of the GSV mapped by the secondary ensemble with various ranking methods was significantly reduced. The results prove that the proposed combined strategy has great potential to reduce the GSV mapping uncertainty imposed by current variable selection approaches and algorithms.
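The abstract reports rRMSE without defining it. A minimal sketch, assuming the common definition (RMSE divided by the mean of the observed values, expressed as a percentage), is shown below; the sample values are made up for illustration and are not from the study.

```python
import math


def relative_rmse(observed, predicted):
    """Relative RMSE in percent: 100 * RMSE / mean(observed).

    Assumes the common definition; the source abstract does not state one.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return 100.0 * math.sqrt(mse) / (sum(observed) / len(observed))


# Illustrative values only (e.g. stem volume per plot, arbitrary units).
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]
print(relative_rmse(observed, predicted))  # 5.0
```

Normalizing by the observed mean makes errors comparable across sites or datasets whose typical GSV differs in magnitude.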


Author(s): Vincenzo Catrambone ◽ Riccardo Barbieri ◽ Herwig Wendt ◽ Patrice Abry ◽ Gaetano Valenza

The study of functional brain–heart interplay has provided meaningful insights in cardiology and neuroscience. Regarding biosignal processing, this interplay involves predominantly neural and heartbeat linear dynamics expressed via time and frequency domain-related features. However, the dynamics of central and autonomous nervous systems show nonlinear and multifractal behaviours, and the extent to which this behaviour influences brain–heart interactions is currently unknown. Here, we report a novel signal processing framework aimed at quantifying nonlinear functional brain–heart interplay in the non-Gaussian and multifractal domains that combines electroencephalography (EEG) and heart rate variability series. This framework relies on a maximal information coefficient analysis between nonlinear multiscale features derived from EEG spectra and from an inhomogeneous point-process model for heartbeat dynamics. Experimental results were gathered from 24 healthy volunteers during a resting state and a cold pressor test, revealing that synchronous changes between brain and heartbeat multifractal spectra occur at higher EEG frequency bands and through nonlinear/complex cardiovascular control. We conclude that significant bodily, sympathovagal changes such as those elicited by cold-pressure stimuli affect the functional brain–heart interplay beyond second-order statistics, thus extending it to multifractal dynamics. These results provide a platform to define novel nervous-system-targeted biomarkers. This article is part of the theme issue ‘Advanced computation in cardiovascular physiology: new challenges and opportunities’.


2021 ◽ Vol 15
Author(s): Tie Liang ◽ Qingyu Zhang ◽ Lei Hong ◽ Xiaoguang Liu ◽ Bin Dong ◽ ...

As a common neurophysiological phenomenon, voluntary muscle fatigue is accompanied by changes in both the central nervous system and peripheral muscles. Considering the effectiveness of the muscle network and the functional corticomuscular coupling (FCMC) in analyzing motor function, muscle fatigue can be analyzed by quantitating the intermuscular coupling and corticomuscular coupling. However, existing coherence-based research on muscle fatigue is limited by the inability of the coherence algorithm to identify the coupling direction, and therefore cannot further reveal the underlying neural mechanism of muscle fatigue. To address this problem, we applied the time-delayed maximal information coefficient (TDMIC) method to quantitate the directional informational interaction in the muscle network and FCMC during a right-hand stabilized grip task. Eight healthy subjects were recruited to the present study. For the muscle networks, the beta-band information flow increased significantly due to muscle fatigue, and the information flow between the synergist muscles was stronger than that between the synergist and antagonist muscles. The information flow in the muscle network mainly flows to flexor digitorum superficialis (FDS), flexor carpi ulnaris (FCU), and brachioradialis (BR). For the FCMC, muscle fatigue caused a significant decrease in the beta- and gamma-band bidirectional information flow. Further analysis revealed that the beta-band information flow was significantly stronger in the descending direction [electroencephalogram (EEG) to surface electromyography (sEMG)] than that in the ascending direction (sEMG to EEG) during pre-fatigue tasks. After muscle fatigue, the beta-band information flow in the ascending direction was significantly stronger than that in the descending direction. The present study demonstrates the influence of muscle fatigue on information flow in muscle networks and FCMC. We propose that beta-band intermuscular and corticomuscular informational interaction plays an adjusting role in autonomous movement completion under muscle fatigue. Directed information flow analysis can be used as an effective method to explore the neural mechanism of muscle fatigue on the macroscopic scale.
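The general idea behind a time-delayed dependence analysis is to compare one signal with a shifted copy of the other and see in which direction, and at which delay, the dependence peaks. The sketch below illustrates only that lag-scanning step under stated assumptions: it is not the TDMIC method itself, the absolute Pearson correlation is used as a simple stand-in for MIC (any dependence function can be passed as `measure`), and the signals are synthetic.

```python
import math
import random


def abs_pearson(a, b):
    """Absolute Pearson correlation; a stand-in for MIC in this sketch."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return abs(cov) / math.sqrt(var_a * var_b)


def delayed_dependence(source, target, max_lag, measure=abs_pearson):
    """Return {lag: measure(source[t], target[t + lag])} for lag = 1..max_lag."""
    return {
        lag: measure(source[:-lag], target[lag:])
        for lag in range(1, max_lag + 1)
    }


# Synthetic example: `follower` repeats `driver` three samples later, plus noise.
rng = random.Random(1)
driver = [rng.gauss(0.0, 1.0) for _ in range(500)]
follower = [v + rng.gauss(0.0, 0.1) for v in [0.0] * 3 + driver[:-3]]

forward = delayed_dependence(driver, follower, max_lag=10)   # driver -> follower
backward = delayed_dependence(follower, driver, max_lag=10)  # follower -> driver
best_lag = max(forward, key=forward.get)

print("peak forward lag:", best_lag)
print("max forward:", round(max(forward.values()), 3))
print("max backward:", round(max(backward.values()), 3))
```

A clearly larger peak in one direction is what such an analysis would interpret as directed information flow. With real EEG and sEMG data, band-pass filtering and significance testing would also be needed, which this sketch omits.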

