Feature Selection for the Automated Detection of Metaphase Chromosomes: Performance Comparison Using a Receiver Operating Characteristic Method

2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Yuchen Qiu ◽  
Jie Song ◽  
Xianglan Lu ◽  
Yuhua Li ◽  
Bin Zheng ◽  
...  

Background. The purpose of this study is to identify a set of features that optimizes the performance of metaphase chromosome detection under high throughput scanning microscopy. In the development of a computer-aided detection (CAD) scheme, feature selection is critically important, as it directly determines the accuracy of the scheme. Although many features have been examined previously, selecting optimal features is often application oriented. Methods. In this experiment, 200 bone marrow cells were first acquired by a high throughput scanning microscope. Then 9 different features were applied individually to group the captured images into clinically analyzable and unanalyzable classes. The performance of these features was assessed by a receiver operating characteristic (ROC) method. Results. The results show that the number of labeled regions on each acquired image is suitable for the first, on-line CAD scheme. For the second, off-line CAD scheme, the results suggest combining four feature extraction methods: the number of labeled regions, the average region area, the average region pixel value, and the standard deviation of either region distance or circularity. Conclusion. This study demonstrates an effective method of feature selection and comparison to facilitate the future optimization of CAD schemes for high throughput scanning microscopes.
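As a rough illustration of assessing features one at a time with an ROC method, the sketch below computes the area under the ROC curve (AUC) for a single feature used directly as a decision score, then ranks features by that value. The feature names and values are hypothetical toy data, not taken from the study, and the abstract does not state that AUC was the exact summary statistic used.

```python
def auc_single_feature(scores, labels):
    """AUC of one feature used directly as a decision score.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one (ties count as 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = analyzable image, 0 = unanalyzable image.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "feature_a": [0.9, 0.8, 0.4, 0.5, 0.3, 0.2],
    "feature_b": [5, 3, 4, 4, 6, 2],
}

# Rank features from most to least discriminative.
ranked = sorted(
    features,
    key=lambda name: auc_single_feature(features[name], labels),
    reverse=True,
)
```

An AUC near 0.5 means the feature separates the two classes no better than chance. A value well below 0.5 indicates a feature that discriminates in the opposite direction, so in practice one might compare `max(auc, 1 - auc)` across features.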

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Joffrey L. Leevy ◽  
John Hancock ◽  
Richard Zuech ◽  
Taghi M. Khoshgoftaar

Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate the effect of ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is big data (about 16,000,000 instances), publicly available, modern, and covers a wide range of realistic attack types. Our contribution is centered around answers to three research questions. The first question is, “Does feature selection impact performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact performance of LightGBM and CatBoost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier (Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), CatBoost, LightGBM, or XGBoost) significantly impact performance in terms of AUC and F1-score?” These research questions are all answered in the affirmative and provide valuable, practical information for the development of an efficient intrusion detection model. To the best of our knowledge, we are the first to use an ensemble feature selection technique with the CSE-CIC-IDS2018 dataset.
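The abstract does not describe how the ensemble feature selection works, so the following is only one common way to build such an ensemble: several feature rankings are combined by mean rank and the top k features are kept. The F1-score helper follows the standard definition. All feature names other than Destination_Port, and every ranking and label below, are hypothetical.

```python
def mean_rank_ensemble(rankings, k):
    """Combine several feature rankings (best first) and keep the top k.

    Each feature is scored by its mean position across all rankings;
    lower is better. Ties are broken alphabetically for determinism.
    """
    features = set(rankings[0])
    if any(set(r) != features for r in rankings):
        raise ValueError("all rankings must cover the same features")
    mean_pos = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }
    return sorted(features, key=lambda f: (mean_pos[f], f))[:k]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical rankings produced by three different selection techniques.
rankings = [
    ["Destination_Port", "flow_duration", "pkt_len_mean", "fwd_pkts"],
    ["flow_duration", "Destination_Port", "fwd_pkts", "pkt_len_mean"],
    ["flow_duration", "pkt_len_mean", "Destination_Port", "fwd_pkts"],
]
selected = mean_rank_ensemble(rankings, k=2)

# Hypothetical labels and predictions (1 = attack, 0 = benign).
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Mean-rank aggregation is used here because it needs only the orderings, not comparable scores, from each technique. Other aggregation rules, such as voting on membership in each technique's top k, are equally plausible.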


2012 ◽  
Vol 532-533 ◽  
pp. 1191-1195 ◽  
Author(s):  
Zhen Yan Liu ◽  
Wei Ping Wang ◽  
Yong Wang

This paper introduces the design of a text categorization system based on the Support Vector Machine (SVM). It analyzes the high-dimensional nature of text data and explains why SVM is suitable for text categorization. The system is constructed according to its data flow and consists of three subsystems: text representation, classifier training, and text classification. The core of the system is classifier training, but text representation directly influences the accuracy of the classifier and the performance of the system. The text feature vector space can be built by different kinds of feature selection and feature extraction methods. No research indicates which method is best, so many feature selection and feature extraction methods are implemented in this system. For a specific classification task, every feature selection method and every feature extraction method is tested, and the set of best-performing methods is then adopted.
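To make the text representation subsystem concrete, here is a minimal sketch that builds a feature vector space using document-frequency feature selection. Document frequency is an assumption chosen for brevity; the abstract does not name the methods the system implements. The function names and the sample documents are hypothetical.

```python
from collections import Counter


def select_by_document_frequency(docs, k):
    """Return the k terms that occur in the most documents.

    Ties are broken alphabetically so the vocabulary order is stable.
    """
    df = Counter()
    for doc in docs:
        # Count each term at most once per document.
        df.update(set(doc.lower().split()))
    ordered = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ordered[:k]]


def to_vector(doc, vocabulary):
    """Represent a document as term counts over the selected vocabulary."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocabulary]


# Hypothetical training documents.
docs = [
    "cheap pills buy now",
    "buy cheap watches",
    "meeting agenda for monday",
    "monday meeting notes",
]
vocabulary = select_by_document_frequency(docs, k=4)
vector = to_vector("buy cheap cheap pills", vocabulary)
```

Vectors of this form would then be passed to the classifier training subsystem. Swapping `select_by_document_frequency` for another selection function with the same signature is one way to test each method in turn on a given task.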


2018 ◽  
Vol 17 (1) ◽  
pp. 37-49 ◽  
Author(s):  
Abdolrazagh Hashemi Shahraki ◽  
Subba Rao Chaganti ◽  
Daniel Heath

The characterization of microbial community dynamics using genomic methods is rapidly expanding, impacting many fields including medical, ecological, and environmental research and applications. One of the biggest challenges for such studies is the isolation of environmental DNA (eDNA) from a variety of samples, diverse microbes, and widely variable community compositions. The current study developed environmentally friendly, user-safe, economical, and high-throughput eDNA extraction methods for mixed aquatic microbial communities and tested them using 16S rRNA gene meta-barcoding. Five different lysis buffers, (1) cetyltrimethylammonium bromide (CTAB), (2) digestion buffer (DB), (3) guanidinium isothiocyanate (GITC), (4) sucrose lysis (SL), and (5) SL-CTAB, coupled with four different purification methods, (1) phenol-chloroform-isoamyl alcohol (PCI), (2) magnetic Bead-Robotic, (3) magnetic Bead-Manual, and (4) membrane-filtration, were tested for their efficacy in extracting eDNA from recreational freshwater samples. Results indicated that the CTAB-PCI and SL-Bead-Robotic methods yielded the highest genomic eDNA concentrations and succeeded in detecting the core microbial community, including the rare microbes. However, our study recommends the SL-Bead-Robotic eDNA extraction protocol because this method is safe, environmentally friendly, rapid, high-throughput, and inexpensive.


Viruses ◽  
2018 ◽  
Vol 10 (10) ◽  
pp. 566 ◽  
Author(s):  
Siemon Ng ◽  
Cassandra Braxton ◽  
Marc Eloit ◽  
Szi Feng ◽  
Romain Fragnoud ◽  
...  

A key step for broad viral detection using high-throughput sequencing (HTS) is optimizing the sample preparation strategy for extracting viral-specific nucleic acids, since viral genomes are diverse: they can be single-stranded or double-stranded RNA or DNA, and can vary from a few thousand bases to more than a million bases, which might introduce biases during nucleic acid extraction. In addition, viral particles can be enveloped or non-enveloped with variable resistance to pre-treatment, which may influence their susceptibility to extraction procedures. Since the identity of potential adventitious agents is unknown prior to their detection, efficient sample preparation should be unbiased toward all viral types in order to maximize the probability of detecting any potential adventitious viruses using HTS. Furthermore, quality assessment of each sample processing step is a critical but challenging aspect. This paper presents our current perspectives on optimizing upstream sample processing and library preparation as part of the discussion in the Advanced Virus Detection Technologies Interest Group (AVDTIG). The topics include the use of nuclease treatment to enrich for encapsidated nucleic acids, techniques for amplifying low amounts of virus nucleic acids, selection of different extraction methods, relevant controls, the use of spike recovery experiments, and quality control measures during library preparation.

