pDeep3: Towards More Accurate Spectrum Prediction with Fast Few-Shot Learning

Abstract: Spectrum prediction using deep learning has attracted considerable attention in recent years. Although existing deep learning methods have dramatically increased prediction accuracy, there is still considerable room for improvement, which is presently limited by differences in fragmentation types or instrument settings. In this work, we use few-shot learning to fit the data online and compensate for this shortcoming. The method is evaluated on ten datasets from instruments including Velos, QE, Lumos, and Sciex, with different collision energy settings. Experimental results show that few-shot learning can achieve higher prediction accuracy with almost negligible computing resources. For example, on the dataset from an untrained instrument, Sciex-6600, the prediction accuracy increases from 69.7% to 86.4% within about 10 seconds; on the CID (collision-induced dissociation) dataset, the prediction accuracy of the model trained on HCD (higher-energy collisional dissociation) spectra increases from 48.0% to 83.9%. It is also shown that the method is not demanding in terms of data quality and is sufficiently efficient to fill the accuracy gap. The source code of pDeep3 is available at http://pfind.ict.ac.cn/software/pdeep3.
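The idea of adapting a pre-trained model with a handful of target examples can be illustrated with a deliberately tiny sketch. It is not pDeep3's network or training code: the linear model, the `fine_tune` helper, the learning rate, and the four toy samples are all assumptions chosen only to show that a few gradient steps on a small target set can close most of the gap left by a model trained on a different source.

```python
# Toy few-shot adaptation: a "pre-trained" linear model is nudged toward a new
# target domain using only four labelled samples. Everything here is
# hypothetical and stands in for a much larger neural network.

def predict(weights, bias, features):
    return sum(w * x for w, x in zip(weights, features)) + bias


def mean_squared_error(weights, bias, samples):
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in samples) / len(samples)


def fine_tune(weights, bias, samples, learning_rate=0.05, epochs=200):
    """Run plain stochastic gradient descent on the few target samples."""
    weights = list(weights)
    for _ in range(epochs):
        for features, target in samples:
            error = predict(weights, bias, features) - target
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Parameters assumed to come from training on a source domain.
pretrained_weights, pretrained_bias = [1.0, 0.5], 0.0

# A few samples from a target domain that follows a slightly different rule.
few_shot_samples = [
    ([1.0, 0.0], 0.9),
    ([0.0, 1.0], 1.0),
    ([1.0, 1.0], 1.8),
    ([0.5, 0.5], 0.95),
]

error_before = mean_squared_error(pretrained_weights, pretrained_bias, few_shot_samples)
tuned_weights, tuned_bias = fine_tune(pretrained_weights, pretrained_bias, few_shot_samples)
error_after = mean_squared_error(tuned_weights, tuned_bias, few_shot_samples)
print(f"MSE before: {error_before:.4f}, after: {error_after:.6f}")
```

In a real setting the error would be measured on held-out target data rather than on the few-shot samples themselves.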

The Development of an Identification Photo Booth System based on a Deep Learning Automatic Image Capturing Method

Journal of Imaging Science and Technology ◽ 10.2352/j.imagingsci.technol.2021.65.2.020403 ◽ 2020

Author(s): Yu-Xiang Zhao ◽ Yi-Zeng Hsieh ◽ Shih-Syun Lin

Keyword(s): Deep Learning ◽ Experimental Results ◽ Automatic Annotation ◽ Learning Method ◽ Facial Region ◽ Facial Landmarks ◽ Image Capturing ◽ The Face ◽ Facial Contours

With advances in technology, photo booths equipped with automatic capturing systems have gradually replaced the identification (ID) photo service provided by photography studios, thereby enabling consumers to save a considerable amount of time and money. Common automatic capturing systems employ text and voice instructions to guide users in capturing their ID photos; however, the captured results may not conform to ID photo specifications. To address this issue, this study proposes an ID photo capturing algorithm that can automatically detect facial contours and adjust the size of captured images. The authors adopted a deep learning method (You Only Look Once) to detect the face and applied a semi-automatic annotation technique for facial landmarks to locate the lip and chin regions within the facial region. In the experiments, subjects were seated at various distances and heights to test the performance of the proposed algorithm. The experimental results show that the proposed algorithm can effectively and accurately capture ID photos that satisfy the required specifications.
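Once the top of the head and the chin have been located, adjusting the image size reduces to simple geometry. The sketch below is only an illustration of that step, not the authors' algorithm: the function name `id_photo_crop`, the 70% head-height fraction, and the 35:45 aspect ratio are assumed values, since the abstract does not state the actual specification.

```python
# Hypothetical crop computation for an ID photo. Inputs are pixel coordinates
# assumed to come from a face detector and a chin landmark.

def id_photo_crop(face_top, chin_y, face_center_x,
                  head_fraction=0.7, aspect_ratio=35 / 45):
    """Return (left, top, width, height) of a crop in which the head fills
    `head_fraction` of the photo height and is centred horizontally."""
    head_height = chin_y - face_top
    if head_height <= 0:
        raise ValueError("chin_y must be below face_top")
    crop_height = head_height / head_fraction
    crop_width = crop_height * aspect_ratio
    # Split the leftover vertical space evenly above the head and below the chin.
    vertical_margin = (crop_height - head_height) / 2
    top = face_top - vertical_margin
    left = face_center_x - crop_width / 2
    return left, top, crop_width, crop_height


left, top, width, height = id_photo_crop(face_top=100.0, chin_y=380.0, face_center_x=320.0)
print(f"crop: left={left:.1f}, top={top:.1f}, width={width:.1f}, height={height:.1f}")
```

A real system would also need to clamp the rectangle to the image bounds and rescale the crop to the required output resolution.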

A two point machine learning method for spatial prediction for soil: overcoming the spatially heterogeneous distribution and relationship of soil heavy metal concentration

10.5194/ismc2021-37 ◽ 2021

Author(s): Gao Bingbo ◽ Alfred Stein ◽ Wang Jinfeng

Keyword(s): Machine Learning ◽ Heavy Metal ◽ Metal Concentration ◽ Prediction Accuracy ◽ Spatial Prediction ◽ Heavy Metal Concentration ◽ Machine Learning Method ◽ Learning Method ◽ Soil Heavy Metal ◽ The Difference

Soil heavy metal contamination has become a serious problem worldwide. Accurately predicting soil heavy metal concentration at unsampled locations from a small sample remains a challenge, because many natural and human factors produce a complex, heterogeneous pattern, and the relationships between influencing factors are also not homogeneous. To overcome these heterogeneities and improve prediction accuracy, this paper proposes a two-point machine learning method that fully leverages the spatial relationship and the similarity relationship of high-dimensional ancillary variables. It first models the difference between paired points using a machine learning model, then predicts the concentration differences between the sampled points and the unsampled points, and finally uses the predicted differences to choose near neighbors and obtain the final concentration prediction. The method introduces an innovative way of searching for near neighbors for the local model based on the difference in the response variable, which overcomes the curse of dimensionality. Its performance is illustrated in two diverse case studies, which demonstrate that the proposed method can dramatically improve the prediction accuracy for soil heavy metals. Besides spatial prediction of soil pollution, it can also be applied to spatial prediction of other elements of the earth system. Furthermore, the machine learning model used in this paper can be replaced by any other supervised learning model according to the specific situation.
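The three steps of the two-point idea can be sketched in a few lines. This is a simplified illustration under several assumptions, not the paper's implementation: a single ancillary variable stands in for the high-dimensional covariates, a least-squares slope through the origin stands in for the machine learning model, and the final step averages "neighbor value plus predicted difference" over the `k` neighbors with the smallest predicted difference. The function names and the toy data are hypothetical.

```python
from itertools import permutations


def fit_difference_model(samples):
    """Step 1: learn how concentration differences depend on covariate
    differences, using every ordered pair of sampled points.
    `samples` is a list of (covariate, concentration) tuples."""
    numerator = denominator = 0.0
    for (x_i, y_i), (x_j, y_j) in permutations(samples, 2):
        dx, dy = x_i - x_j, y_i - y_j
        numerator += dx * dy
        denominator += dx * dx
    return numerator / denominator if denominator else 0.0


def predict_two_point(samples, slope, x_new, k=3):
    """Steps 2 and 3: predict the difference to each sampled point, keep the k
    points with the smallest predicted absolute difference, and average their
    difference-corrected concentrations."""
    candidates = []
    for x, y in samples:
        predicted_difference = slope * (x_new - x)
        candidates.append((abs(predicted_difference), y + predicted_difference))
    candidates.sort(key=lambda item: item[0])
    nearest = candidates[:k]
    return sum(estimate for _, estimate in nearest) / len(nearest)


# Toy data in which concentration = 2 * covariate + 1.
sampled_points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (4.0, 9.0)]
slope = fit_difference_model(sampled_points)
prediction = predict_two_point(sampled_points, slope, x_new=3.0)
print(f"slope={slope:.2f}, predicted concentration={prediction:.2f}")
```

Note that neighbors are ranked by predicted difference in the response rather than by distance in covariate space, which is the part of the method the abstract credits with avoiding the curse of dimensionality.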

Vision-Based Classification of Mosquito Species: Comparison of Conventional and Deep Learning Methods

Applied Sciences ◽ 10.3390/app9183935 ◽ 2019 ◽ Vol 9 (18) ◽ pp. 3935 ◽ Cited By ~ 3

Author(s): Kazushige Okayasu ◽ Kota Yoshida ◽ Masataka Fuchida ◽ Akio Nakamura

Keyword(s): Deep Learning ◽ Conventional Method ◽ Data Augmentation ◽ Mosquito Species ◽ Experimental Results ◽ Support Vector ◽ Learning Method ◽ Species Classification ◽ Scale Invariant ◽ Machine Method

This study proposes a vision-based method to classify mosquito species. To investigate the efficiency of the method, we compared two different classification methods: the handcrafted feature-based conventional method and the convolutional neural network-based deep learning method. For the conventional method, 12 types of features were adopted for handcrafted feature extraction, while a support vector machine was adopted for classification. For the deep learning method, three types of architectures were adopted for classification. We built a mosquito image dataset, which included 14,400 images of three mosquito species. The dataset comprised 12,000 images for training, 1,500 images for testing, and 900 images for validation. Experimental results revealed that the accuracy of the conventional method using the scale-invariant feature transform algorithm was 82.4% at maximum, whereas the accuracy of the deep learning method was 95.5% in a residual network using data augmentation. From the experimental results, deep learning can be considered effective for classifying the mosquito species of the proposed dataset. Furthermore, data augmentation improves the accuracy of mosquito species classification.
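The reported partition of 14,400 images into 12,000 training, 1,500 testing, and 900 validation images can be reproduced mechanically as follows. This is only a sketch: the abstract does not say how the authors assigned images to each subset, so the random shuffle, the fixed seed, and the `split_dataset` helper are assumptions, and integer IDs stand in for image files.

```python
import random


def split_dataset(items, n_train, n_test, n_val, seed=0):
    """Shuffle once with a fixed seed, then cut into three disjoint subsets."""
    if n_train + n_test + n_val != len(items):
        raise ValueError("subset sizes must add up to the dataset size")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


image_ids = list(range(14400))  # placeholder identifiers for the images
train_ids, test_ids, validation_ids = split_dataset(image_ids, 12000, 1500, 900)
print(len(train_ids), len(test_ids), len(validation_ids))
```

Fixing the seed keeps the partition identical across runs, so accuracy figures computed on the test subset remain comparable.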

On the Reproducibility and Replicability of Deep Learning in Software Engineering

ACM Transactions on Software Engineering and Methodology ◽ 10.1145/3477535 ◽ 2022 ◽ Vol 31 (1) ◽ pp. 1-46

Author(s): Chao Liu ◽ Cuiyun Gao ◽ Xin Xia ◽ David Lo ◽ John Grundy ◽ ...

Keyword(s): Deep Learning ◽ Software Engineering ◽ Source Code ◽ Experimental Results ◽ Supervised Machine Learning ◽ Optimization Process ◽ Experimental Result ◽ Experimental Setup ◽ High Quality ◽ Two Factors

Context: Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years. This is because they can often solve many SE challenges without enormous manual feature engineering effort and complex domain knowledge. Objective: Although many DL studies have reported substantial advantages over other state-of-the-art models on effectiveness, they often ignore two factors: (1) reproducibility, that is, whether the reported experimental results can be obtained by other researchers using the authors’ artifacts (i.e., source code and datasets) with the same experimental setup; and (2) replicability, that is, whether the reported experimental results can be obtained by other researchers using their re-implemented artifacts with a different experimental setup. We observed that DL studies commonly overlook these two factors and declare them as minor threats or leave them for future work. This is mainly due to high model complexity with many manually set parameters and the time-consuming optimization process, unlike classical supervised machine learning (ML) methods (e.g., random forest). This study aims to investigate the urgency and importance of reproducibility and replicability for DL studies on SE tasks. Method: In this study, we conducted a literature review of 147 DL studies recently published in 20 SE venues and 20 AI (Artificial Intelligence) venues to investigate these issues. We also re-ran four representative DL models in SE to investigate important factors that may strongly affect the reproducibility and replicability of a study. Results: Our statistics show the urgency of investigating these two factors in SE, where only 10.2% of the studies investigate any research question to show that their models can address at least one issue of replicability and/or reproducibility. More than 62.6% of the studies do not even share high-quality source code or complete data to support the reproducibility of their complex models. Meanwhile, our experimental results show the importance of reproducibility and replicability: the reported performance of a DL model could not be reproduced because of an unstable optimization process. Replicability could be substantially compromised if the model training does not converge, or if performance is sensitive to the size of the vocabulary and testing data. Conclusion: It is urgent for the SE community to provide a long-lasting link to a high-quality reproduction package, enhance the stability and convergence of DL-based solutions, and avoid performance sensitivity to different sampled data.
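The effect of an unstable optimization process on reported numbers can be made concrete with a small simulation. The sketch below is not taken from the study: `noisy_training_run` is a hypothetical stand-in for a stochastic training job, and the base score of 0.80 and noise level of 0.02 are made-up values. It shows two habits that support reproducibility: fixing the random seed so a run can be repeated exactly, and reporting the spread across several seeds rather than a single score.

```python
import random
import statistics


def noisy_training_run(seed):
    """Stand-in for a stochastic training job that returns a test score.
    The score depends only on the seed, so a fixed seed repeats exactly."""
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0, 0.02)


def summarize(seeds):
    """Report the mean and sample standard deviation across seeds."""
    scores = [noisy_training_run(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


first_run = noisy_training_run(seed=1)
repeated_run = noisy_training_run(seed=1)
mean_score, score_spread = summarize(seeds=range(5))
print(f"same seed repeats: {first_run == repeated_run}")
print(f"score across seeds: {mean_score:.3f} +/- {score_spread:.3f}")
```

With a real DL framework, the framework's own random number generators and any nondeterministic hardware kernels would also need to be controlled; seeding Python's `random` module alone is not sufficient there.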

A novel deep learning approach to extract Chinese clinical entities for lung cancer screening and staging

BMC Medical Informatics and Decision Making ◽ 10.1186/s12911-021-01575-x ◽ 2021 ◽ Vol 21 (S2)

Author(s): Huanyao Zhang ◽ Danqing Hu ◽ Huilong Duan ◽ Shaolei Li ◽ Nan Wu ◽ ...

Keyword(s): Lung Cancer ◽ Decision Making ◽ Deep Learning ◽ Cancer Screening ◽ Clinical Decision Making ◽ Lung Cancer Screening ◽ Clinical Decision ◽ Experimental Results ◽ Entity Recognition ◽ Learning Method

Abstract. Background: Computed tomography (CT) reports record a large volume of valuable information about patients’ conditions and radiologists’ interpretations of radiology images, which can be used for clinical decision-making and further academic study. However, the free-text nature of clinical reports is a critical barrier to using these data more effectively. In this study, we investigate a novel deep learning method to extract entities from Chinese CT reports for lung cancer screening and TNM staging. Methods: The proposed approach presents a new named entity recognition algorithm, namely the BERT-based-BiLSTM-Transformer network (BERT-BTN) with pre-training, to extract clinical entities for lung cancer screening and staging. Specifically, instead of traditional word embedding methods, BERT is applied to learn the deep semantic representations of characters. Following the long short-term memory layer, a Transformer layer is added to capture the global dependencies between characters. In addition, a pre-training technique is employed to alleviate the problem of insufficient labeled data. Results: We verify the effectiveness of the proposed approach on a clinical dataset containing 359 CT reports collected from the Department of Thoracic Surgery II of Peking University Cancer Hospital. The experimental results show that the proposed approach achieves an 85.96% macro-F1 score under the exact match scheme, which improves the performance by 1.38%, 1.84%, 3.81%, 4.29%, 5.12%, 5.29%, and 8.84% compared to BERT-BTN, BERT-LSTM, BERT-fine-tune, BERT-Transformer, FastText-BTN, FastText-BiLSTM, and FastText-Transformer, respectively. Conclusions: In this study, we developed a novel deep learning method, i.e., BERT-BTN with pre-training, to extract clinical entities from Chinese CT reports. The experimental results indicate that the proposed approach can efficiently recognize various clinical entities related to lung cancer screening and staging, which shows its potential for further clinical decision-making and academic research.
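For readers unfamiliar with the metric, a macro-F1 score under exact matching can be computed as in the following sketch. It is a generic illustration, not the paper's evaluation script: entities are assumed to be `(start, end, type)` tuples, a prediction counts only if all three fields match a gold entity, and the entity types "nodule" and "size" are hypothetical examples.

```python
def macro_f1_exact(gold_entities, predicted_entities):
    """Average the per-type F1 scores, counting a prediction as correct only
    when its span boundaries and type all match a gold entity exactly."""
    gold = set(gold_entities)
    predicted = set(predicted_entities)
    entity_types = sorted({e[2] for e in gold} | {e[2] for e in predicted})
    f1_scores = []
    for entity_type in entity_types:
        gold_of_type = {e for e in gold if e[2] == entity_type}
        predicted_of_type = {e for e in predicted if e[2] == entity_type}
        true_positives = len(gold_of_type & predicted_of_type)
        precision = true_positives / len(predicted_of_type) if predicted_of_type else 0.0
        recall = true_positives / len(gold_of_type) if gold_of_type else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


gold = [(0, 2, "nodule"), (5, 8, "size"), (10, 12, "nodule")]
predicted = [(0, 2, "nodule"), (5, 7, "size"), (10, 12, "nodule")]
score = macro_f1_exact(gold, predicted)
print(f"macro-F1 (exact match): {score:.2f}")
```

In this example both "nodule" entities match exactly, while the "size" prediction misses the end boundary by one character and therefore counts as wrong, so the two per-type scores of 1.0 and 0.0 average to 0.5.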

6mA-Pred: identifying DNA N6-methyladenine sites based on deep learning

PeerJ ◽ 10.7717/peerj.10813 ◽ 2021 ◽ Vol 9 ◽ pp. e10813

Author(s): Qianfei Huang ◽ Wenyang Zhou ◽ Fei Guo ◽ Lei Xu ◽ Lichao Zhang

Keyword(s): Deep Learning ◽ Mus Musculus ◽ Source Code ◽ Experimental Results ◽ Individual Species ◽ Identification Method ◽ Excellent Method ◽ Multiple Species ◽ Site Recognition

With the accumulation of data on 6mA modification sites, an increasing number of scholars have begun to focus on the identification of 6mA sites. Despite the recognized importance of 6mA sites, methods for their identification remain lacking, with most existing methods being aimed at their identification in individual species. In the present study, we aimed to develop an identification method suitable for multiple species. Based on previous research, we propose a method for 6mA site recognition. Our experiments prove that the proposed 6mA-Pred method is effective for identifying 6mA sites in genes from taxa such as rice, Mus musculus, and human. A series of experimental results show that 6mA-Pred is an excellent method. We provide the source code used in the study, which can be obtained from http://39.100.246.211:5004/6mA_Pred/.

Prediction of Sex and Age from Macular Optical Coherence Tomography Images and Feature Analysis Using Deep Learning

10.1101/2020.12.23.20248805 ◽ 2020

Author(s): Kuan-Ming Chueh ◽ Yi-Ting Hsieh ◽ Homer H. Chen ◽ I-Hsin Ma ◽ Sheng-Lung Huang

Keyword(s): Optical Coherence Tomography ◽ Deep Learning ◽ Prediction Accuracy ◽ Optical Coherence ◽ Male And Female ◽ Related Information ◽ Age Related ◽ Macular Diseases ◽ En Face ◽ The Difference

Abstract: The prevalence of certain macular diseases differs between males and females. However, the actual difference in macular structure between males and females has been poorly understood. Previous studies reported that the mean retinal thickness of the macula is thinner in females, but here it was observed that the difference is not statistically large enough for sex distinction. Similarly, the age-related non-pathological change of macular structure has also been poorly understood. It has been found that the thickness of the choroid decreases with age. In this study, deep learning was applied to distinguish sex and age from macular optical coherence tomography (OCT) images of 3134 persons and achieved a sex prediction accuracy of 85.6 ± 2.1% and an age prediction error of 5.78 ± 0.29 years. A thorough analysis of the prediction accuracy and the Grad-CAM showed that 1) the foveal contour leads to a better sex distinction than the macular thickness, 2) B-scan macular OCT images contain more sex-related information than en face fundus images, and 3) the age-related characteristics of the macula are present across all layers of the retina, not just the choroid. These novel findings are useful to ophthalmologists for further investigation into the pathogenesis of sex-related and age-related macular structural diseases.

A Deep Learning Method for ECG Signal Prediction Based on VMD, Cao Method, and LSTM Neural Network

10.21203/rs.3.rs-139350/v1 ◽ 2021

Author(s): Fuying Huang ◽ Tuanfa Qin ◽ Limei Wang ◽ Haibin Wan

Keyword(s): Neural Network ◽ Deep Learning ◽ Prediction Accuracy ◽ Prediction Method ◽ Prediction Methods ◽ Ecg Signal ◽ Learning Method ◽ Signal Prediction ◽ Input Layer ◽ Ecg Data

Abstract. Background: In a body area network (BAN), accurate prediction of the ECG signal not only lets doctors know the patient's condition in advance, but also helps to reduce the energy consumption of sensors. To improve the accuracy of ECG signal prediction, this paper proposes a deep learning method for ECG signal prediction. Methods: The proposed prediction method combines variational mode decomposition (VMD), the Cao method, and a long short-term memory (LSTM) neural network. In the method, VMD decomposes ECG data into a series of intrinsic mode functions (IMFs), which reduces the non-stationary character of ECG signals and helps to improve the prediction accuracy. The Cao method is used to determine the input dimension of the LSTM input layer; namely, the minimum embedding dimension of each IMF is the input dimension of the LSTM input layer. Each IMF is predicted by an LSTM neural network that uses the Adam optimizer. The predictions for all IMFs are aggregated to obtain the final prediction result. Results: To evaluate the prediction accuracy of the proposed method, simulation experiments are carried out on ECG data from the MIT-BIH Arrhythmia Database. Experimental results show that the RMSE (root mean square error) and MAE (mean absolute error) of the proposed model are 0.001326 and 0.001044, respectively, which are more than 10 percent lower than those of the traditional prediction methods. Conclusions: Compared with some traditional prediction methods, the proposed prediction method clearly improves the prediction accuracy.
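The two error measures quoted in the results have standard definitions, shown here as a short sketch. The functions follow the usual formulas for RMSE and MAE; the three-sample signal is a made-up example and has no connection to the MIT-BIH data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual_signal = [1.0, 2.0, 3.0]
predicted_signal = [1.0, 2.0, 5.0]
signal_rmse = rmse(actual_signal, predicted_signal)
signal_mae = mae(actual_signal, predicted_signal)
print(f"RMSE={signal_rmse:.4f}, MAE={signal_mae:.4f}")
```

Because RMSE squares each error before averaging, it weights large deviations more heavily than MAE, which is why the two values differ whenever the errors are unevenly distributed.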
