Densely Feature Fusion Based on Convolutional Neural Networks for Motor Imagery EEG Classification

IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 132720-132730 ◽  
Author(s):  
Donglin Li ◽  
Jianhui Wang ◽  
Jiacan Xu ◽  
Xiaoke Fang

IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 18940-18950 ◽  
Author(s):  
Syed Umar Amin ◽  
Mansour Alsulaiman ◽  
Ghulam Muhammad ◽  
Mohamed A. Bencherif ◽  
M. Shamim Hossain

2019 ◽  
Vol 37 (1) ◽  
pp. 125-135 ◽  
Author(s):  
Sizhe Huang ◽  
Huosheng Xu ◽  
Xuezhi Xia ◽  
Fan Yang ◽  
Fuhao Zou

2019 ◽  
Vol 24 (12) ◽  
pp. 9243-9256
Author(s):  
Jordan J. Bird ◽  
Anikó Ekárt ◽  
Diego R. Faria

Abstract: In this work, we argue that the choice between pseudorandom and quantum-random number generators (PRNG and QRNG) affects the performance and behaviour of machine learning models that require random input, implications that had not been explored in soft computing until this work. We use a CPU and a QPU to generate random numbers for multiple machine learning techniques. The random numbers are used to initialise the weights of dense and convolutional neural networks, where results show a marked difference in learning patterns between the two sources. Across 50 dense neural networks (25 PRNG/25 QRNG), QRNG initialisation improves over PRNG by +0.1% for accent classification and by +2.82% for mental-state EEG classification. Across 50 convolutional neural networks (25 PRNG/25 QRNG), the MNIST and CIFAR-10 problems are benchmarked: on MNIST the QRNG models start at a higher accuracy than the PRNG models but ultimately exceed them by only 0.02%, while on CIFAR-10 QRNG outperforms PRNG by +0.92%. The n-random split of a Random Tree is extended into a new Quantum Random Tree (QRT) model, whose classification abilities differ from those of its classical counterpart; 200 trees are trained and compared (100 PRNG/100 QRNG). On the accent classification data set, the QRT appears inferior to the RT, performing on average 0.12% worse; the same pattern is seen in the EEG classification problem, where the QRT performs 0.28% worse than the RT. Finally, QRTs are ensembled into a Quantum Random Forest (QRF), which also behaves noticeably differently from the standard Random Forest (RF). Ensembles of 10 to 100 trees are benchmarked on the accent and EEG classification problems. In accent classification, the best RF (100 RT) outperforms the best QRF (100 QRT) by 0.14% accuracy. In EEG classification, the best RF (100 RT) outperforms the best QRF (100 QRT) by 0.08% but is considerably more complex, requiring twice as many trees in its committee. All differences are observed to be situationally positive or negative and are thus likely data-dependent in their observed behaviour.
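As an illustration of the setup described in this abstract, the sketch below shows how initial dense-layer weights could be drawn from an interchangeable random source, so the same code path can be fed by either a CPU-based PRNG or an external QRNG service. It is a minimal NumPy example under assumed names (`init_dense_weights`, `fetch_qrng_uniform`), not the authors' implementation.

```python
# Minimal sketch (not the authors' code): initial dense-layer weights drawn
# from an interchangeable random source (CPU PRNG or QPU-backed QRNG).
import numpy as np

def init_dense_weights(layer_sizes, uniform_source):
    """Build weight matrices whose entries come from `uniform_source(n)`,
    a callable returning n floats in [0, 1)."""
    weights = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        u = np.asarray(uniform_source(n_in * n_out)).reshape(n_in, n_out)
        limit = np.sqrt(6.0 / (n_in + n_out))      # Glorot-style symmetric range
        weights.append((2.0 * u - 1.0) * limit)    # map [0, 1) -> [-limit, limit)
    return weights

# PRNG source: NumPy's default PCG64 generator running on the CPU.
prng = np.random.default_rng(seed=42)
prng_weights = init_dense_weights([784, 128, 10], lambda n: prng.random(n))

# QRNG source (hypothetical): `fetch_qrng_uniform(n)` would wrap whichever
# quantum-random service is available and return n uniform floats.
# qrng_weights = init_dense_weights([784, 128, 10], fetch_qrng_uniform)
```

Keeping the random source behind a single callable is one simple way to compare the two generators while holding the rest of the training pipeline fixed.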


2020 ◽  
Vol 2020 ◽  
pp. 1-16
Author(s):  
Zhuofu Deng ◽  
Binbin Wang ◽  
Zhiliang Zhu

Maxillary sinus segmentation plays an important role in choosing therapeutic strategies for nasal disease and in treatment monitoring. Traditional approaches struggle with the highly heterogeneous intensities caused by lesions, abnormal anatomical structures, and the blurred boundaries of the cavity. 2D and 3D deep convolutional neural networks have become popular in medical image segmentation because they can exploit large labeled datasets to learn discriminative features. However, for 3D segmentation of medical images, 2D networks cannot extract sufficiently informative spatial features, while 3D networks carry a heavy computational burden, which makes maxillary sinus segmentation challenging. In this paper, we propose an end-to-end deep neural network for fully automatic 3D segmentation. First, the proposed model uses a symmetrical encoder-decoder architecture for the multitask problem of bounding box estimation and in-region 3D segmentation, which both reduces excessive computation and markedly suppresses false positives, making 3D segmentation with 3D convolutional neural networks practical. In addition, an overestimation strategy is presented to avoid the overfitting observed in conventional multitask networks. We also introduce residual dense blocks to increase the depth of the network and an attention excitation mechanism to improve bounding box estimation, both of which add little computational cost. In particular, the multilevel feature fusion structure of the pyramid network strengthens the identification of global and local discriminative features in foreground and background, yielding more accurate segmentation. Finally, to address the blurred boundaries and class imbalance common in medical images, a hybrid loss function is designed for the multiple tasks. We evaluated the proposed model against state-of-the-art methods; it performed significantly better, with an average Dice of 0.947±0.031, VOE of 10.23±5.29, and ASD of 2.86±2.11, demonstrating a promising and robust technique for practice.
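The paper's exact hybrid loss is not reproduced in this abstract; the sketch below illustrates one common hybrid for class-imbalanced 3D segmentation, a soft-Dice term combined with binary cross-entropy, written in PyTorch. The function names (`soft_dice_loss`, `hybrid_seg_loss`) and the 0.5 weighting are assumptions, not the paper's formulation.

```python
# Illustrative sketch only: a generic Dice + BCE hybrid loss often used to
# counter class imbalance and blurred boundaries in 3D segmentation.
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss over a batch of 3D volumes.
    logits, target: tensors of shape (N, 1, D, H, W); target values in {0, 1}."""
    probs = torch.sigmoid(logits)
    dims = (1, 2, 3, 4)                               # sum over all but the batch axis
    intersection = (probs * target).sum(dims)
    union = probs.sum(dims) + target.sum(dims)
    dice = (2.0 * intersection + eps) / (union + eps)
    return 1.0 - dice.mean()

def hybrid_seg_loss(logits, target, dice_weight=0.5):
    """Weighted sum of a region-overlap (Dice) term and a per-voxel (BCE) term."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return dice_weight * soft_dice_loss(logits, target) + (1.0 - dice_weight) * bce

# Example usage on random data:
logits = torch.randn(2, 1, 32, 64, 64)
target = (torch.rand(2, 1, 32, 64, 64) > 0.5).float()
loss = hybrid_seg_loss(logits, target)
```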


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 72
Author(s):  
Sanghun Jeon ◽  
Ahmed Elsharkawy ◽  
Mun Sang Kim

In visual speech recognition (VSR), speech is transcribed using only visual information to interpret tongue and teeth movements. Recently, deep learning has shown outstanding performance in VSR, with accuracy exceeding that of lipreaders on benchmark datasets. However, several problems remain when using VSR systems. A major challenge is distinguishing words with similar pronunciations (homophones), which lead to word ambiguity. Another technical limitation of traditional VSR systems is that visual information does not provide sufficient data for learning words such as “a”, “an”, “eight”, and “bin” because their durations are shorter than 0.02 s. This report proposes a novel lipreading architecture that combines three different convolutional neural networks (CNNs): a 3D CNN, a densely connected 3D CNN, and a multi-layer feature fusion 3D CNN, followed by a two-layer bi-directional gated recurrent unit. The entire network was trained using connectionist temporal classification. Standard automatic speech recognition evaluation metrics show that the proposed architecture reduced the character and word error rates of the baseline model by 5.681% and 11.282%, respectively, on the unseen-speaker dataset. The proposed architecture maintains its performance even when visual ambiguity arises, thereby increasing the reliability of VSR for practical applications.
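As a rough sketch of the pipeline described above (3D CNN front end, two-layer bidirectional GRU, CTC training), the PyTorch example below wires a single 3D-CNN branch into a BiGRU and a CTC loss. The class name `LipReadingSketch`, layer sizes, and vocabulary size are illustrative assumptions; the actual architecture fuses three different 3D CNNs.

```python
# Minimal sketch, not the authors' implementation: one 3D-CNN branch feeding
# a two-layer bidirectional GRU, trained with CTC on dummy data.
import torch
import torch.nn as nn

class LipReadingSketch(nn.Module):
    def __init__(self, vocab_size=28, hidden=256):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),          # pool spatially, keep time
        )
        self.gru = nn.GRU(input_size=32, hidden_size=hidden, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, vocab_size + 1)  # +1 for CTC blank

    def forward(self, clips):
        # clips: (N, C=3, T, H, W) video tensor.
        feats = self.frontend(clips)            # (N, 32, T, H', W')
        feats = feats.mean(dim=(3, 4))          # global spatial pooling -> (N, 32, T)
        feats = feats.transpose(1, 2)           # (N, T, 32)
        out, _ = self.gru(feats)                # (N, T, 2*hidden)
        return self.classifier(out)             # per-frame logits (N, T, vocab+1)

# CTC training step on dummy data:
model = LipReadingSketch()
clips = torch.randn(2, 3, 20, 64, 128)                        # 2 clips, 20 frames each
logits = model(clips).log_softmax(-1).transpose(0, 1)         # (T, N, C) for CTCLoss
targets = torch.randint(1, 28, (2, 10))                       # dummy character labels
loss = nn.CTCLoss(blank=28)(logits, targets,
                            input_lengths=torch.full((2,), 20, dtype=torch.long),
                            target_lengths=torch.full((2,), 10, dtype=torch.long))
```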

