Automated Detection of Paleoenvironmental Proxy, Eucampia Index, in a Microscopic Slide Using a Convolutional Neural Network System

2020 ◽  
Author(s):  
Saki Ishino ◽  
Takuya Itaki

Abstract The Eucampia Index, which is calculated from the valve ratio of varieties of the Antarctic diatom Eucampia antarctica, is expected to be a useful indicator of sea ice coverage and/or sea surface temperature variation in the Southern Ocean. Verifying the relationship between the index value and these environmental factors requires considerable effort to classify and count the valves of E. antarctica in a very large number of samples. In this study, to realize automated detection of the Eucampia Index, we constructed a deep-learning-based model (deep learning being one of the learning methods of artificial intelligence) for identifying Eucampia valves among the various particles on a diatom slide. The microfossil Classification and Rapid Accumulation Device (miCRAD) system, which can scan a slide and crop images of particles automatically, was employed to collect images for the training dataset for the model and the test dataset for model verification. When the particle images in the test dataset were classified by the initial model "Eant_1000px_200616", the accuracy was 78.8%. The Eucampia Index value prepared in the test dataset was 0.80, and the value predicted by the developed model from the same dataset was 0.76. The predicted value was within the range of manual counting error. These results suggest that the classification performance of the model is similar to that of a human expert. This study showed for the first time that a model capable of detecting the ratio of two diatom taxa can be constructed using the miCRAD system. The miCRAD system connected with the developed model can classify particle images automatically at the same time as capturing them, so the system can be applied to large-scale analysis of the Eucampia Index in the Southern Ocean. Depending on the setting of the classification categories, a similar method is relevant to investigators who must process large numbers of diatom samples, for example to detect specific species for biostratigraphic and paleoenvironmental studies.
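To make the index computation concrete, here is a minimal Python sketch of deriving an Eucampia Index from per-particle CNN predictions. This is not the authors' code: the class labels and the exact index formula (here, the fraction of one variety's valves among all E. antarctica valves, consistent with the 0-1 values reported above) are illustrative assumptions.

```python
# Hypothetical sketch: compute an Eucampia Index from per-particle
# classifier output. Labels and formula are assumptions, not the
# authors' published definitions.
from collections import Counter

def eucampia_index(predicted_labels):
    """Fraction of one E. antarctica variety among all Eucampia valves."""
    counts = Counter(predicted_labels)
    var_a = counts.get("E_antarctica_var_antarctica", 0)  # hypothetical label
    var_b = counts.get("E_antarctica_var_recta", 0)       # hypothetical label
    total = var_a + var_b
    if total == 0:
        raise ValueError("No Eucampia valves detected on this slide.")
    return var_a / total

# Example: 76 valves of variety A among 100 Eucampia valves -> index 0.76
labels = ["E_antarctica_var_antarctica"] * 76 + ["E_antarctica_var_recta"] * 24
print(f"Eucampia Index: {eucampia_index(labels):.2f}")
```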

Electronics ◽  
2021 ◽  
Vol 10 (18) ◽  
pp. 2268
Author(s):  
Sangsoo Han ◽  
Youngwon Kim ◽  
Soojin Lee

How to deal with rare and unknown data in traffic classification has a decisive influence on classification performance. Rare data make it difficult to generate validation datasets to prevent overfitting, and unknown data interfere with learning and degrade the performance of the model. This paper presents a model generation method that accurately classifies rare data and new types of attacks and does not result in overfitting. First, we use oversampling methods to solve the data imbalance caused by rare data. Second, we separate the training dataset into a training dataset and a validation dataset, and create the model using these separate datasets; the test dataset is used only for evaluating the performance of the classification models, so that it remains independent of learning. We also use a softmax function that numerically indicates the probability that the model's predictive results are accurate, in order to detect new, unknown attacks. When the proposed method is applied to the NSL_KDD dataset, the accuracy is 91.66%, an improvement of 6–16% over existing methods.
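The workflow described above might be sketched in Python as follows; this is an illustrative reconstruction, not the authors' code, and the use of SMOTE, the 80/20 split, and the 0.9 softmax threshold are assumptions.

```python
# Hypothetical sketch: oversample rare classes, carve a validation set out
# of the training data so the test set stays independent of learning, and
# flag low-confidence softmax predictions as unknown attacks.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

def prepare_splits(X_train_full, y_train_full, val_ratio=0.2, seed=42):
    # SMOTE balances rare classes before the train/validation split.
    X_res, y_res = SMOTE(random_state=seed).fit_resample(X_train_full, y_train_full)
    return train_test_split(X_res, y_res, test_size=val_ratio,
                            random_state=seed, stratify=y_res)

def classify_with_unknowns(softmax_probs, threshold=0.9):
    """Return class indices; -1 marks samples treated as unknown attacks."""
    top = softmax_probs.max(axis=1)        # top softmax probability per sample
    preds = softmax_probs.argmax(axis=1)
    return np.where(top >= threshold, preds, -1)
```

In this sketch, samples whose top softmax probability falls below the threshold are flagged as unknown (new attack types) rather than forced into a known class.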


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Zhiyuan Liu ◽  
Zekun Jiang ◽  
Li Meng ◽  
Jun Yang ◽  
Ying Liu ◽  
...  

Objective. The purpose of this study was to investigate the feasibility of applying handcrafted radiomics (HCR) and deep learning-based radiomics (DLR) for the accurate preoperative classification of glioblastoma (GBM) and solitary brain metastasis (BM). Methods. A retrospective analysis of the magnetic resonance imaging (MRI) data of 140 patients (110 in the training dataset and 30 in the test dataset) with GBM and 128 patients (98 in the training dataset and 30 in the test dataset) with BM confirmed by surgical pathology was performed. The regions of interest (ROIs) on T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), and contrast-enhanced T1WI (T1CE) were drawn manually, and then HCR and DLR analyses were performed. On this basis, different machine learning algorithms were implemented and compared to find the optimal modeling method. The final classifiers were identified and validated for different MRI modalities using HCR features and HCR + DLR features. By analyzing the receiver operating characteristic (ROC) curve, the area under the curve (AUC), accuracy, sensitivity, and specificity were calculated to evaluate the predictive efficacy of the different methods. Results. In multiclassifier modeling, random forest modeling showed the best distinguishing performance among all MRI modalities. HCR models already showed good results for distinguishing between the two types of brain tumors in the test dataset (T1WI, AUC = 0.86; T2WI, AUC = 0.76; T1CE, AUC = 0.93). By adding DLR features, all AUCs showed significant improvement (T1WI, AUC = 0.87; T2WI, AUC = 0.80; T1CE, AUC = 0.97; p < 0.05). The T1CE-based radiomic model showed the best classification performance (AUC = 0.99 in the training dataset and AUC = 0.97 in the test dataset), surpassing the other MRI modalities (p < 0.05). The multimodality radiomic model also showed robust performance (AUC = 1 in the training dataset and AUC = 0.84 in the test dataset). Conclusion. Machine learning models using MRI radiomic features can help distinguish GBM from BM effectively, especially the combination of HCR and DLR features.
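As an illustration of the best-performing configuration, the following Python sketch trains a random forest on concatenated HCR and DLR feature vectors and reports the test AUC; the feature arrays and hyperparameters are placeholders, not the study's actual settings.

```python
# Hypothetical sketch: random forest on fused HCR + DLR radiomic features,
# evaluated by ROC AUC for the binary GBM-vs-BM task.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def train_and_evaluate(hcr_train, dlr_train, y_train, hcr_test, dlr_test, y_test):
    X_train = np.hstack([hcr_train, dlr_train])  # HCR + DLR feature fusion
    X_test = np.hstack([hcr_test, dlr_test])
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    clf.fit(X_train, y_train)
    # Probability of the positive class (e.g., GBM = 1, BM = 0)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    return clf, auc
```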


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Yue Hu ◽  
Ge Peng ◽  
Zehua Wang ◽  
Yanrong Cui ◽  
Hang Qin

As datasets grow ever larger, the k-nearest neighbors (KNN) algorithm becomes a particularly expensive operation for both classification and regression predictive problems. To predict the values of new data points, it must compute the feature similarity between each object in the test dataset and every object in the training dataset. Owing to this computational cost, a single computer cannot cope with large-scale datasets. In this paper, we propose an adaptive vKNN algorithm, which builds on the Voronoi diagram under the MapReduce parallel framework and makes full use of the advantages of parallel computing in processing large-scale data. In the partition-selection step, we design a new predictive strategy that finds the optimal relevant partition for each sample point. We can thereby effectively filter out irrelevant data, reduce the KNN join computation, and improve operating efficiency. Finally, we conduct extensive experiments on a cluster using large 54-dimensional datasets. The experimental results show that our proposed method is effective and scalable while preserving accuracy.
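The core pruning idea can be illustrated with a single-machine Python sketch (the paper distributes this under MapReduce); the pivot selection and partition-pruning rule here are simplified assumptions.

```python
# Hypothetical sketch: Voronoi-based partitioning for KNN. Training points
# are assigned to their nearest pivot; a query searches only its own cell,
# skipping irrelevant partitions.
import numpy as np

def build_partitions(X_train, pivots):
    # Assign every training point to its nearest pivot (its Voronoi cell).
    d = np.linalg.norm(X_train[:, None, :] - pivots[None, :, :], axis=2)
    return d.argmin(axis=1)

def knn_query(x, X_train, y_train, pivots, cells, k=5):
    # Search only the query's own Voronoi cell.
    cell = np.linalg.norm(pivots - x, axis=1).argmin()
    idx = np.where(cells == cell)[0]
    d = np.linalg.norm(X_train[idx] - x, axis=1)
    nearest = idx[np.argsort(d)[:k]]
    return y_train[nearest]
```

Note that a query near a cell border may have true neighbors in adjacent cells; the paper's predictive strategy for selecting the optimal relevant partition addresses exactly this case, which the simplification above omits.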


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Zhigao Xu ◽  
Lili Zhao ◽  
Guoqiang Yang ◽  
Ying Ren ◽  
Jinlong Wu ◽  
...  

The coronavirus disease of 2019 (COVID-19) has evolved into a worldwide pandemic. Although CT is sensitive in detecting lesions and assessing their severity, this assessment mainly depends on radiologists' subjective judgment, which is inefficient in the case of a large-scale outbreak. This work focuses on developing a CT-based radiomics model to assess whether COVID-19 patients are in the early, progressive, severe, or absorption stage of the disease. We retrospectively analyzed the CT images of 284 COVID-19 patients, divided into four groups (0-3) according to the progression of the disease and the CT features: early (n = 75), progressive (n = 58), severe (n = 75), and absorption (n = 76). Each category was split randomly into training and test datasets at a fixed ratio of 7:3. Thirty-eight radiomic features were selected from 1688 radiomic features using the select-K-best method and the ElasticNet algorithm. On this basis, a support vector machine (SVM) classifier was trained to build the model. Receiver operating characteristic (ROC) curves were generated to determine the diagnostic performance of the various models. The macroaverage precision, recall, and F1-score of the classification model were 0.82, 0.82, and 0.81 and the microaverage values were 0.81, 0.81, and 0.81 for the training dataset; for the test dataset, the macroaverage values were 0.75, 0.73, and 0.73 and the microaverage values were 0.72, 0.72, and 0.72. The AUCs for groups 0, 1, 2, and 3 on the training dataset were 0.99, 0.97, 0.96, and 0.93, with a microaverage AUC of 0.97 and a macroaverage AUC of 0.97. On the test dataset, the AUCs for each group were 0.97, 0.86, 0.83, and 0.89, with a microaverage AUC of 0.89 and a macroaverage AUC of 0.90. The CT-based radiomics model proved efficacious in assessing the severity of COVID-19.
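The described pipeline of coarse filtering, ElasticNet-based selection down to 38 features, and an SVM classifier might be assembled as follows in Python; the scoring function, ElasticNet settings, and SVM kernel are assumptions, not the study's reported choices.

```python
# Hypothetical sketch: select-K-best filter, ElasticNet-driven selection of
# 38 features out of 1688, then a 4-class SVM stage classifier.
import numpy as np
from sklearn.feature_selection import SelectKBest, SelectFromModel, f_classif
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("kbest", SelectKBest(f_classif, k=200)),  # coarse filter (k assumed)
    # ElasticNet coefficients rank features; keep the top 38 by magnitude.
    ("enet", SelectFromModel(ElasticNet(alpha=0.01),
                             threshold=-np.inf, max_features=38)),
    ("svm", SVC(kernel="rbf", probability=True)),  # stages 0-3 classifier
])
# Usage: pipeline.fit(X_train, y_train); pipeline.predict(X_test)
```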


2013 ◽  
Vol 2013 ◽  
pp. 1-8
Author(s):  
Lei Chen ◽  
Tao Huang ◽  
Jian Zhang ◽  
Ming-Yue Zheng ◽  
Kai-Yan Feng ◽  
...  

A drug side effect is an undesirable effect that occurs in addition to the intended therapeutic effect of the drug. The unexpected side effects that many patients suffer are a major cause of large-scale drug withdrawal. To address this problem, the pharmaceutical industry urgently needs computational methods for predicting the side effects of drugs. In this study, a novel computational method was developed to predict the side effects of drug compounds by hybridizing chemical-chemical and protein-chemical interactions. Unlike most previous works, our method can rank the potential side effects of any query drug according to their predicted level of risk. A training dataset and test datasets were constructed from a benchmark dataset containing 835 drug compounds to evaluate the method. In a jackknife test on the training dataset, the first-order prediction accuracy was 86.30%, while it was 89.16% on the test dataset. It is expected that the new method may become a useful tool for drug design, and that the findings obtained by hybridizing various interactions in a network system may also provide useful insights for in-depth pharmacological research, particularly at the level of systems biomedicine.
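The ranking idea can be illustrated with a hedged Python sketch in which a query drug's candidate side effects accumulate interaction-weighted votes from neighboring drugs in the network; the data structures and scoring function are hypothetical placeholders, not the authors' precise hybridization scheme.

```python
# Hypothetical sketch: rank candidate side effects of a query drug by
# interaction-weighted votes from network neighbours.
from collections import defaultdict

def rank_side_effects(query_drug, interaction_score, known_side_effects, neighbours):
    """interaction_score(a, b): chemical-chemical/protein-chemical weight.
    known_side_effects: dict drug -> set of side-effect terms."""
    risk = defaultdict(float)
    for other in neighbours[query_drug]:
        w = interaction_score(query_drug, other)
        for effect in known_side_effects.get(other, ()):
            risk[effect] += w
    # Highest accumulated score first = highest predicted level of risk.
    return sorted(risk.items(), key=lambda kv: kv[1], reverse=True)
```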


2020 ◽  
Vol 2020 (10) ◽  
pp. 181-1-181-7
Author(s):  
Takahiro Kudo ◽  
Takanori Fujisawa ◽  
Takuro Yamaguchi ◽  
Masaaki Ikehara

Image deconvolution has recently become an important issue. It has two kinds of approaches: non-blind and blind. Non-blind deconvolution is the classic image deblurring problem, which assumes that the point spread function (PSF) is known and spatially invariant. Recently, convolutional neural networks (CNNs) have been used for non-blind deconvolution. Although CNNs can deal with complex changes in unknown images, some conventional CNN-based methods can only handle small PSFs and do not consider the large PSFs encountered in the real world. In this paper we propose a non-blind deconvolution framework based on a CNN that can remove large-scale ringing from a deblurred image. Our method has three key points. The first is a network architecture that preserves both large and small features in the image. The second is a training dataset created to preserve details. The third is that we extend the images to minimize the effects of large ringing at the image borders. In our experiments, we used three kinds of large PSFs and observed high-precision results from our method, both quantitatively and qualitatively.
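The third key point, border extension, can be sketched as follows in Python; the padding mode and margin size are assumptions rather than the authors' exact settings.

```python
# Hypothetical sketch: extend the blurred image beyond its borders before
# deconvolution so that ringing falls in the margin, then crop it away.
import numpy as np

def deconvolve_with_extension(blurred, psf, deconvolve_fn):
    """deconvolve_fn: any non-blind deconvolution routine (e.g., a CNN)."""
    ph, pw = psf.shape
    my, mx = ph * 2, pw * 2                    # margin scaled to the PSF size
    padded = np.pad(blurred, ((my, my), (mx, mx)), mode="edge")
    restored = deconvolve_fn(padded, psf)
    return restored[my:-my, mx:-mx]            # crop border ringing
```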


2020 ◽  
Vol 27 ◽  
Author(s):  
Zaheer Ullah Khan ◽  
Dechang Pi

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid formation) of proteins is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-lab methods, several computational models have been developed for predicting sulfenylation cysteine (SC) sites. However, the performance of these models has not been satisfactory, owing to inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine. Objective: Our motivation in this study is to establish a strong and novel computational predictor for discriminating sulfenylation from non-sulfenylation sites. Methods: We report an innovative bioinformatics predictor, named DeepSSPred, in which the encoded features are obtained via an n-segmented hybrid feature scheme, and the resampling technique called synthetic minority oversampling (SMOTE) is employed to cope with the severe imbalance between SC-sites (minority class) and non-SC sites (majority class). A state-of-the-art 2D convolutional neural network was employed, with rigorous 10-fold jackknife cross-validation for model validation and authentication. Results: Within the proposed framework, the strong discrete presentation of the feature space, the machine learning engine, and the unbiased presentation of the underlying training data yielded an excellent model that outperforms all existing established studies. The proposed approach is 6% higher in MCC than the first-best method; on an independent dataset, the existing first-best study failed to provide sufficient details. Compared with the second-best method, the model achieved increases of 7.5% in accuracy, 1.22% in Sn, 12.91% in Sp, and 13.12% in MCC on the training data, and 12.13% in accuracy, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset. These empirical analyses show the superlative performance of the proposed model on both the training and independent datasets in comparison with existing literature studies. Conclusion: In this research, we have developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. The empirical simulation outcomes with a training dataset and an independent validation dataset have revealed the efficacy of the proposed theoretical model. The good performance of DeepSSPred stems from several factors, such as the novel discriminative feature encoding schemes, the SMOTE technique, and the careful construction of the prediction model through the tuned 2D-CNN classifier. We believe that this work will provide useful insight into the further prediction of S-sulfenylation characteristics and functionalities, and hope that the developed predictor will significantly help with the large-scale discrimination of unknown SC-sites in particular and with designing new pharmaceutical drugs in general.
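The resampling step can be reproduced with the SMOTE implementation in the imbalanced-learn library (the paper names the technique, not a specific library); the feature arrays are placeholders.

```python
# Minimal sketch of the SMOTE resampling step; X holds encoded peptide
# features, y holds labels (1 = SC-site minority class, 0 = non-SC).
from collections import Counter
from imblearn.over_sampling import SMOTE

def balance_sc_sites(X, y):
    X_res, y_res = SMOTE(random_state=1).fit_resample(X, y)
    print("before:", Counter(y), "after:", Counter(y_res))
    return X_res, y_res
```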


2020 ◽  
pp. bjophthalmol-2020-317825
Author(s):  
Yonghao Li ◽  
Weibo Feng ◽  
Xiujuan Zhao ◽  
Bingqian Liu ◽  
Yan Zhang ◽  
...  

Background/aims: To apply deep learning technology to develop an artificial intelligence (AI) system that can identify vision-threatening conditions in high myopia patients based on optical coherence tomography (OCT) macular images. Methods: In this cross-sectional, prospective study, a total of 5505 qualified OCT macular images obtained from 1048 high myopia patients admitted to Zhongshan Ophthalmic Centre (ZOC) from 2012 to 2017 were selected for the development of the AI system. The independent test dataset included 412 images obtained from 91 high myopia patients recruited at ZOC from January 2019 to May 2019. We adopted the InceptionResnetV2 architecture to train four independent convolutional neural network (CNN) models to identify the following four vision-threatening conditions in high myopia: retinoschisis, macular hole, retinal detachment and pathological myopic choroidal neovascularisation. Focal Loss was used to address class imbalance, and optimal operating thresholds were determined according to the Youden Index. Results: In the independent test dataset, the areas under the receiver operating characteristic curves were high for all conditions (0.961 to 0.999). Our AI system achieved sensitivities equal to or even better than those of retina specialists as well as high specificities (greater than 90%). Moreover, our AI system provided a transparent and interpretable diagnosis with heatmaps. Conclusions: We used OCT macular images for the development of CNN models to identify vision-threatening conditions in high myopia patients. Our models achieved reliable sensitivities and high specificities, comparable to those of retina specialists, and may be applied for large-scale high myopia screening and patient follow-up.
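Threshold selection via the Youden Index (J = sensitivity + specificity - 1), as described above, can be sketched in Python as follows; the scores and labels are placeholders.

```python
# Sketch: pick the operating threshold maximizing Youden's J on the ROC.
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_score):
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    j = tpr - fpr                  # Youden's J at every candidate threshold
    best = np.argmax(j)
    return thresholds[best], tpr[best], 1 - fpr[best]  # threshold, Sn, Sp
```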


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mu Sook Lee ◽  
Yong Soo Kim ◽  
Minki Kim ◽  
Muhammad Usman ◽  
Shi Sub Byon ◽  
...  

Abstract We examined the feasibility of explainable computer-aided detection of cardiomegaly in routine clinical practice using segmentation-based methods. Overall, 793 retrospectively acquired posterior–anterior (PA) chest X-ray images (CXRs) of 793 patients were used to train deep learning (DL) models for lung and heart segmentation. The training dataset included PA CXRs from two public datasets and in-house PA CXRs. Two fully automated segmentation-based methods using state-of-the-art DL models for lung and heart segmentation were developed. The diagnostic performance was assessed and the reliability of the automatic cardiothoracic ratio (CTR) calculation was determined using the mean absolute error and paired t-test. The effects of thoracic pathological conditions on performance were assessed using subgroup analysis. One thousand PA CXRs of 1000 patients (480 men, 520 women; mean age 63 ± 23 years) were included. The CTR values derived from the DL models and diagnostic performance exhibited excellent agreement with reference standards for the whole test dataset. Performance of segmentation-based methods differed based on thoracic conditions. When tested using CXRs with lesions obscuring heart borders, the performance was lower than that for other thoracic pathological findings. Thus, segmentation-based methods using DL could detect cardiomegaly; however, the feasibility of computer-aided detection of cardiomegaly without human intervention was limited.
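A minimal Python sketch of the automatic CTR calculation from binary heart and lung segmentation masks is shown below; defining each width as the maximal horizontal extent of the mask is a common simplification and an assumption here, not necessarily the study's exact measurement.

```python
# Hypothetical sketch: cardiothoracic ratio = maximal horizontal heart
# width divided by maximal thoracic (lung) width, from binary masks.
import numpy as np

def width(mask):
    cols = np.where(mask.any(axis=0))[0]   # columns containing the structure
    return cols.max() - cols.min() + 1 if cols.size else 0

def cardiothoracic_ratio(heart_mask, lung_mask):
    thoracic = width(lung_mask)
    if thoracic == 0:
        raise ValueError("Empty lung mask.")
    return width(heart_mask) / thoracic    # cardiomegaly often flagged at CTR > 0.5
```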


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3813
Author(s):  
Athanasios Anagnostis ◽  
Aristotelis C. Tagarakis ◽  
Dimitrios Kateris ◽  
Vasileios Moysiadis ◽  
Claus Grøn Sørensen ◽  
...  

This study proposes an approach for orchard tree segmentation from aerial images based on a deep learning convolutional neural network variant, namely the U-net. The purpose was the automated detection and localization of the canopies of orchard trees under various conditions (i.e., different seasons, different tree ages, and different levels of weed coverage). The dataset was composed of images from three different walnut orchards, and its variability yielded images falling under seven different use cases. The best-trained model achieved 91%, 90%, and 87% accuracy for training, validation, and testing, respectively. The trained model was also tested on never-before-seen orthomosaic images of orchards using two methods (oversampling and undersampling) to tackle issues with transparent pixels outside the field boundary. Even though the training dataset did not contain orthomosaic images, the model achieved performance levels of up to 99%, demonstrating the robustness of the proposed approach.
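A hedged Python sketch of applying a trained segmentation model to a large orthomosaic by tiling, while masking transparent out-of-field pixels, is given below; the tile size, Keras-style predict call, and alpha-channel convention are assumptions, and edge-remainder tiles are skipped for brevity.

```python
# Hypothetical sketch: tile a large RGBA orthomosaic, run a trained U-net
# style model on each tile, and zero out transparent out-of-field pixels.
import numpy as np

def predict_orthomosaic(rgba, model, tile=256):
    h, w = rgba.shape[:2]
    canopy = np.zeros((h, w), dtype=np.uint8)
    alpha = rgba[..., 3]                       # 0 = out-of-field transparency
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            patch = rgba[y:y + tile, x:x + tile, :3]
            # Keras-style call assumed; output is a per-pixel canopy mask.
            pred = model.predict(patch[None] / 255.0)[0, ..., 0] > 0.5
            canopy[y:y + tile, x:x + tile] = pred
    canopy[alpha == 0] = 0                     # ignore transparent pixels
    return canopy
```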

