test dataset
Recently Published Documents

Zainab Mushtaq

Abstract: Malware is routinely used for illegal purposes, and new malware variants are discovered every day. Computer vision applied to computer security is one of the most significant research areas today, and it has grown tremendously over the past decade because of its efficacy. We applied machine-learning and deep-learning techniques, namely Logistic Regression, ANN, CNN, transfer learning on CNN, and LSTM, to arrive at our conclusions. We report analysis-based results for a range of classification models from the literature. InceptionV3 was trained using a transfer learning technique, which yielded reasonable results compared with other methods such as LSTM. The transfer learning technique was about 98.76 percent accurate on the test dataset and around 99.6 percent accurate on the training dataset.
Keywords: Malware, illegal activity, Deep learning, Network Security

Entropy, 2022, Vol 24 (1), pp. 132
Eyad Alsaghir, Xiyu Shi, Varuna De Silva, Ahmet Kondoz

Deep learning, in general, is built on input data transformation and presentation, model training with parameter tuning, and recognition of new observations using the trained model. However, this comes with a high computational cost due to the extensive input databases and the long training times required. Although the model learns its parameters from the transformed input data, no research has directly investigated the mathematical relationship between the transformed information (i.e., features, excitation) and the model’s learnt parameters (i.e., weights). This research aims to explore a mathematical relationship between the input excitations and the weights of a trained convolutional neural network. The objective is to investigate three aspects of this assumed feature-weight relationship: (1) the mathematical relationship between the training input images’ features and the model’s learnt parameters, (2) the mathematical relationship between the image features of a separate test dataset and a trained model’s learnt parameters, and (3) the mathematical relationship between the difference of training and testing image features and the model’s learnt parameters with a separate test dataset. Using ANOVA, the paper empirically demonstrates the existence of this mathematical relationship between the test image features and the model’s learnt weights.
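The abstract does not say how features and weights were grouped for the ANOVA. As a hedged illustration of the test itself only, the sketch below computes the one-way ANOVA F statistic from scratch (between-group mean square over within-group mean square). The function name `one_way_anova_f` and the toy groups are hypothetical; in practice `scipy.stats.f_oneway` returns both the statistic and a p-value.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square.

    `groups` is a list of lists of numbers (hypothetical measurements per group).
    """
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares, k - 1 degrees of freedom
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares, n - k degrees of freedom
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy groups: F = (54 / 2) / (6 / 6) = 27
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

A large F means the group means differ by much more than the spread within groups would suggest; turning F into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom.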

Materials, 2022, Vol 15 (2), pp. 643
Paul Meißner, Jens Winter, Thomas Vietor

A neural network (NN)-based method is presented in this paper that allows the identification of parameters for material cards used in Finite Element simulations. In contrast to conventional, computationally intensive material parameter identification (MPI) by numerical optimization with in-house or commercial software, a machine learning (ML)-based method saves time when used repeatedly. This article presents a self-developed ML-based Python framework that offers advantages especially in the development of structural components in early development phases. In this procedure, different machine learning methods are used and adapted to the specific MPI problem considered herein. Using both the developed NN-based method and the common optimization-based method with LS-OPT, the material parameters of the LS-DYNA material card MAT_187_SAMP-1 and the failure model GISSMO were calibrated, as an example, for a virtually generated test dataset. Parameters describing elasticity, plasticity, tension–compression asymmetry, variable plastic Poisson’s ratio (VPPR), strain rate dependency, and failure were taken into account. The focus of this paper is a comparative study of the two MPI methods with varying settings (algorithms, hyperparameters, etc.). Furthermore, the applicability of the NN-based procedure to both material cards was investigated. The studies demonstrate the general applicability of the method for calibrating a complex material card, using MAT_187_SAMP-1 as an example.
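The core idea behind ML-based MPI, as described, is to learn an inverse mapping from simulated responses back to material parameters, so that identification becomes a cheap evaluation once the mapping is trained. The sketch below illustrates only that idea under strong assumptions: a toy two-parameter stress–strain law stands in for LS-DYNA simulations, and a linear least-squares map replaces the paper's neural network. All names (`forward_model`, `fit_inverse_map`, `identify`) and parameter ranges are hypothetical.

```python
import numpy as np

STRAINS = np.linspace(0.01, 0.2, 20)  # hypothetical strain sampling points


def forward_model(params):
    """Toy stand-in for an FE simulation: stress = a * strain + b * sqrt(strain)."""
    a, b = params
    return a * STRAINS + b * np.sqrt(STRAINS)


def fit_inverse_map(n_samples=200, seed=0):
    """Fit a linear map from simulated curves (plus a bias term) back to parameters."""
    rng = np.random.default_rng(seed)
    # Sample a virtual training set of parameter vectors (hypothetical ranges)
    params = rng.uniform([100.0, 10.0], [1000.0, 100.0], size=(n_samples, 2))
    curves = np.array([forward_model(p) for p in params])
    X = np.hstack([curves, np.ones((n_samples, 1))])  # bias column
    # rcond discards near-zero singular values caused by collinear curve samples
    W, *_ = np.linalg.lstsq(X, params, rcond=1e-8)
    return W


def identify(W, curve):
    """Estimate parameters for a new (e.g. measured) curve with the trained map."""
    return np.append(curve, 1.0) @ W


W = fit_inverse_map()
print(identify(W, forward_model((500.0, 50.0))))  # should be close to [500, 50]
```

The linear surrogate works here only because the toy response is linear in its parameters; plasticity and failure models respond nonlinearly, which is where a neural network becomes the natural choice for the inverse map.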

2022, Vol 12 (1)
Akitoshi Shimazaki, Daiju Ueda, Antoine Choppin, Akira Yamamoto, Takashi Honjo, et al.

Abstract: We developed and validated a deep learning (DL)-based model using a segmentation method and assessed its ability to detect lung cancer on chest radiographs. Chest radiographs for the training and test datasets were collected separately at our hospital from January 2006 to June 2018. The training dataset was used to train and validate the DL-based model with five-fold cross-validation. Model sensitivity and mean false-positive indications per image (mFPI) were assessed on the independent test dataset. The training dataset included 629 radiographs with 652 nodules/masses, and the test dataset included 151 radiographs with 159 nodules/masses. The DL-based model had a sensitivity of 0.73 with 0.13 mFPI on the test dataset. Sensitivity was lower for lung cancers that overlapped with blind spots such as the pulmonary apices, pulmonary hila, chest wall, heart, and sub-diaphragmatic space (0.50–0.64) than for those in non-overlapping locations (0.87). The mean Dice coefficient for the 159 malignant lesions was 0.52. The DL-based model was able to detect lung cancers on chest radiographs with low mFPI.
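For readers unfamiliar with these detection metrics, the sketch below shows one common way to compute them; it is not the authors' evaluation code. Lesion-level sensitivity is taken as detected lesions over all lesions, mFPI as false-positive marks divided by the number of images, and the Dice coefficient as 2|A∩B| / (|A| + |B|) on binary masks. The function names and toy inputs are hypothetical.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as nested lists of 0/1."""
    pred_flat = [v for row in pred_mask for v in row]
    truth_flat = [v for row in true_mask for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same shape")
    overlap = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    total = sum(pred_flat) + sum(truth_flat)
    # Convention: two empty masks count as a perfect match
    return 1.0 if total == 0 else 2.0 * overlap / total


def sensitivity_and_mfpi(detected_lesions, total_lesions, false_positive_marks, n_images):
    """Lesion-level sensitivity and mean false-positive indications per image."""
    return detected_lesions / total_lesions, false_positive_marks / n_images


print(dice_coefficient([[1, 1], [0, 0]], [[1, 0], [0, 0]]))  # 2 * 1 / (2 + 1) ≈ 0.667
print(sensitivity_and_mfpi(3, 4, 2, 10))                      # (0.75, 0.2)
```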

2022, Vol 8
Danyan Li, Xiaowei Han, Jie Gao, Qing Zhang, Haibo Yang, et al.

Background: Multiparametric magnetic resonance imaging (mpMRI) plays an important role in the diagnosis of prostate cancer (PCa) in current clinical practice. However, the performance of mpMRI usually varies with the experience of the interpreting radiologist; thus, the demand for MRI interpretation warrants further analysis. In this study, we developed a deep learning (DL) model to improve PCa diagnostic ability using mpMRI and whole-mount histopathology data.
Methods: A total of 739 patients, including 466 with PCa and 273 without PCa, were enrolled from January 2017 to December 2019. The mpMRI data (T2-weighted imaging, diffusion-weighted imaging, and apparent diffusion coefficient sequences) were randomly divided into training (n = 659) and validation (n = 80) datasets. Based on the whole-mount histopathology, a DL model comprising independent segmentation and classification networks was developed to extract the gland and PCa areas for PCa diagnosis. The area under the curve (AUC) was used to evaluate the performance of the prostate classification network. The proposed DL model was subsequently used in clinical practice (independent test dataset; n = 200), and the PCa detection/diagnostic performance of the DL model and of radiologists at different levels was evaluated based on sensitivity, specificity, precision, and accuracy.
Results: The AUC of the prostate classification network was 0.871 in the validation dataset and 0.797 for the DL model in the test dataset. The sensitivity, specificity, precision, and accuracy of the DL model for diagnosing PCa in the test dataset were 0.710, 0.690, 0.696, and 0.700, respectively. For the junior radiologist without and with DL model assistance, these values were 0.590, 0.700, 0.663, and 0.645 versus 0.790, 0.720, 0.738, and 0.755, respectively. For the senior radiologist, the values were 0.690, 0.770, 0.750, and 0.730 versus 0.810, 0.840, 0.835, and 0.825, respectively. The radiologists' diagnostic performance with DL model assistance was significantly higher than without assistance (P < 0.05).
Conclusion: The diagnostic performance of the DL model is higher than that of junior radiologists, and the model can improve PCa diagnostic accuracy for both junior and senior radiologists.
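The four test-set metrics reported above all derive from the confusion matrix of predicted versus histopathology-confirmed labels. The sketch below spells out the standard formulas; the function name and the counts in the example are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate (recall)
        "specificity": tn / (tn + fp),  # true-negative rate
        "precision": tp / (tp + fp),    # positive predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts: 10 cancer cases and 10 non-cancer cases
for name, value in diagnostic_metrics(tp=8, fp=3, tn=7, fn=2).items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are conditioned on the true label, whereas precision is conditioned on the prediction, which is why precision also depends on the proportion of cancer cases in the test set.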

2021, pp. 00452-2021
Akihiro Shiroshita, Yuya Kimura, Hiroshi Shiba, Chigusa Shirakawa, Kenya Sato, et al.

Introduction: There is no established clinical prediction model for in-hospital death among patients with pneumonic chronic obstructive pulmonary disease (COPD) exacerbation. We aimed to externally validate BAP-65 and CURB-65 and to develop a new model based on the eXtreme Gradient Boosting (XGBoost) algorithm.
Methods: This multicentre cohort study included patients aged ≥40 years with pneumonic COPD exacerbation. The input data were age, sex, activities of daily living, mental status, systolic and diastolic blood pressure, respiratory rate, heart rate, peripheral blood eosinophil count, and blood urea nitrogen. The primary outcome was in-hospital death. BAP-65 and CURB-65 were externally validated using the area under the receiver operating characteristic curve (AUROC) in the whole dataset. We used XGBoost to develop a new prediction model and compared its AUROC with those of BAP-65 and CURB-65 in the test dataset using bootstrap sampling.
Results: We included 1190 patients with pneumonic COPD exacerbation. The in-hospital mortality was 7% (88/1190). In the external validation, the AUROCs (95% confidence interval [CI]) of BAP-65 and CURB-65 were 0.69 (0.66–0.72) and 0.69 (0.66–0.72), respectively. XGBoost showed an AUROC of 0.71 (0.62–0.81) in the test dataset. There was no significant difference in AUROC between XGBoost and BAP-65 (absolute difference, 0.054; 95% CI, −0.057 to 0.16) or between XGBoost and CURB-65 (absolute difference, 0.0021; 95% CI, −0.091 to 0.088).
Conclusion: BAP-65, CURB-65, and XGBoost showed low predictive performance for in-hospital death in pneumonic COPD exacerbation. Further large-scale studies including more variables are warranted.
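The abstract compares AUROCs using bootstrap sampling but does not give the exact procedure. The sketch below is a generic paired percentile bootstrap for the difference in AUROC between two models scored on the same patients; the AUROC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half). The function names, number of resamples, and seed are assumptions.

```python
import random


def auroc(labels, scores):
    """AUROC via pairwise comparison: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_difference(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Point estimate and 95% percentile-bootstrap CI for AUROC(a) - AUROC(b)."""
    if set(labels) != {0, 1}:
        raise ValueError("labels must contain both classes 0 and 1")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        # Resample patients with replacement; both models see the same resample
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 not in y or 1 not in y:
            continue  # AUROC is undefined if a resample contains only one class
        diffs.append(auroc(y, [scores_a[i] for i in idx]) - auroc(y, [scores_b[i] for i in idx]))
    diffs.sort()
    ci = (diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1])
    return auroc(labels, scores_a) - auroc(labels, scores_b), ci


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Pairing matters: because each resample scores both models on the same patients, their correlation is preserved, so a CI for the difference that spans zero (as in the results above) indicates no significant difference.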

2021, Vol 1 (1), pp. 407-413
Nur Heri Cahyana, Yuli Fauziah, Agus Sasmito Aribowo

This study aims to determine the best tree-based ensemble machine learning method for classifying the datasets used, a total of 34 datasets. It also examines the relationship between the number of records and columns of the test dataset and the number of estimators (trees) for each ensemble model, namely Random Forest, Extra Trees Classifier, AdaBoost, and Gradient Boosting. The four methods are compared in terms of maximum accuracy and the number of estimators used when classifying the test dataset. The experiments identified the best tree-based ensemble method and the best number of estimators for each dataset used in the study. The Extra Trees method is the best classifier for both binary-class and multi-class problems. Random Forest performs well on multi-class problems, and AdaBoost performs fairly well on binary-class problems. The numbers of rows, columns, and classes are positively correlated with the number of estimators: a dataset with many rows, columns, or classes requires more estimators than a dataset with few. However, the number of classes is negatively correlated with accuracy: accuracy decreases as the number of classes to be distinguished increases.
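The kind of experiment described, varying the number of estimators for the four ensembles and keeping the best test accuracy, can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' setup: the breast-cancer dataset bundled with scikit-learn stands in for the study's 34 datasets, and the estimator grid, split ratio, and seed are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.model_selection import train_test_split

MODELS = {
    "RandomForest": RandomForestClassifier,
    "ExtraTrees": ExtraTreesClassifier,
    "AdaBoost": AdaBoostClassifier,
    "GradientBoosting": GradientBoostingClassifier,
}


def best_estimator_counts(X, y, grid=(10, 50, 100), seed=0):
    """For each ensemble, return (best n_estimators, its test accuracy) over the grid."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    results = {}
    for name, cls in MODELS.items():
        # Accuracy on the held-out split for each number of trees
        scores = {n: cls(n_estimators=n, random_state=seed).fit(X_tr, y_tr).score(X_te, y_te)
                  for n in grid}
        best_n = max(scores, key=scores.get)
        results[name] = (best_n, scores[best_n])
    return results


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    for name, (n, acc) in best_estimator_counts(X, y).items():
        print(f"{name}: best n_estimators={n}, test accuracy={acc:.3f}")
```

Picking `n_estimators` by test accuracy mirrors the procedure described in the abstract but gives an optimistic estimate; a separate validation split or cross-validation would keep the test score unbiased.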

2021, Vol 2021, pp. 1-12
Fangpeng Ming, Liang Tan, Xiaofan Cheng

Big data has been developing for nearly a decade, and the volume of information on the network is exploding. Faced with complex and massive data, people find it difficult to obtain the information they need quickly, and recommendation algorithms have become one of the important methods for addressing information overload. In particular, the rise of the e-commerce industry has promoted the development of recommendation algorithms. Traditional single recommendation algorithms often suffer from problems such as cold start, data sparsity, and long-tail items. Current hybrid recommendation algorithms can effectively avoid some of the drawbacks of a single algorithm. To address these problems and the shortcomings of a single collaborative model, this paper proposes IA-CN, a hybrid recommendation algorithm based on deep learning. The algorithm first uses an integration strategy to fuse user-based and item-based collaborative filtering and to generalize and classify their outputs. Deeper, more abstract nonlinear interactions between users and items are then captured with improved deep learning techniques. Finally, experiments compare the algorithm with the benchmark algorithm on the Amazon item rating dataset, and the results show that IA-CN achieves better rating-prediction performance on the test dataset.
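The abstract does not specify IA-CN's integration strategy or its deep-learning component. As a hedged sketch of only the first stage, fusing user-based and item-based collaborative filtering, the code below blends two neighbourhood predictions with a fixed weight `alpha`. The cosine similarity, the linear blend, the rating matrix, and all function names are assumptions; the deep-learning stage is omitted entirely.

```python
import numpy as np


def cosine_similarity_rows(M):
    """Cosine similarity between rows of M (unrated entries stored as 0)."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = M / norms
    return unit @ unit.T


def neighbourhood_predict(R, row, col):
    """Similarity-weighted average of R[other, col] over other rows that rated col."""
    sims = cosine_similarity_rows(R)[row]
    others = [r for r in range(R.shape[0]) if r != row and R[r, col] > 0]
    weight = sum(sims[r] for r in others)
    if weight == 0:
        return None  # no usable neighbours
    return sum(sims[r] * R[r, col] for r in others) / weight


def hybrid_predict(R, user, item, alpha=0.5):
    """Blend user-based and item-based predictions; fall back to whichever exists."""
    user_based = neighbourhood_predict(R, user, item)    # neighbours are similar users
    item_based = neighbourhood_predict(R.T, item, user)  # neighbours are similar items
    if user_based is None or item_based is None:
        return user_based if item_based is None else item_based
    return alpha * user_based + (1 - alpha) * item_based


# Hypothetical 4-user x 4-item rating matrix (0 = unrated)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
print(hybrid_predict(R, user=1, item=1))
```

Because ratings are non-negative, every similarity is non-negative and each neighbourhood prediction stays within the range of the neighbours' ratings; a learned, input-dependent weighting would replace the fixed `alpha` in a more realistic hybrid.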

2021
Ksenia Guseva, Sean Darcy, Eva Simon, Lauren V. Alteio, Alicia Montesinos-Navarro, et al.

Network analysis has been used for many years in ecological research to analyze organismal associations, for example in food webs and plant-plant or plant-animal interactions. Although network analysis is widely applied in microbial ecology, it has only recently entered soil microbial ecology, as shown by a rapid rise in studies applying co-occurrence analysis to soil microbial communities. While this application offers great potential for deeper insights into the ecological structure of soil microbial ecosystems, it also brings new challenges related to the specific characteristics of soil datasets and the types of ecological questions that can be addressed. In this Perspectives Paper, we assess the challenges of applying network analysis to soil microbial ecology that arise from the small-scale heterogeneity of the soil environment and the nature of soil microbial datasets. We review the network construction approaches commonly applied to soil microbial datasets and discuss their features and limitations. Using a test dataset of microbial communities from two depths of a forest soil, we demonstrate how different experimental designs and network construction algorithms affect the structure of the resulting networks, and how this in turn may influence ecological conclusions. We also show how the assumptions of the construction method, the methods used to prepare the dataset, and the definitions of thresholds affect the network structure. Finally, we discuss the particular questions in soil microbial ecology that can be approached by analyzing and interpreting specific network properties. Targeting these network properties in a meaningful way will allow this technique to be applied not merely in descriptive but in hypothesis-driven research.
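To make the point about thresholds concrete, the sketch below builds a co-occurrence network in one of the simplest possible ways: an edge joins two taxa when the absolute Pearson correlation of their abundances across samples meets a hard threshold. This is only one of many construction approaches, and it ignores issues such as compositionality and sparsity that affect real microbial count data; the toy abundance table, taxon names, and function name are hypothetical.

```python
import numpy as np


def cooccurrence_edges(abundance, taxa, threshold):
    """Return (taxon_i, taxon_j, r) for every pair with |Pearson r| >= threshold.

    `abundance` is a taxa x samples array; each row is one taxon.
    """
    r = np.corrcoef(abundance)  # pairwise correlations between rows (taxa)
    edges = []
    for i in range(len(taxa)):
        for j in range(i + 1, len(taxa)):
            if abs(r[i, j]) >= threshold:
                edges.append((taxa[i], taxa[j], round(float(r[i, j]), 3)))
    return edges


# Hypothetical abundances of four taxa across six soil samples
abundance = np.array([
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],
    [6, 5, 4, 3, 2, 1],
    [1, 3, 2, 5, 4, 6],
], dtype=float)
taxa = ["A", "B", "C", "D"]

for t in (0.9, 0.8):
    print(t, len(cooccurrence_edges(abundance, taxa, t)))  # 3 edges at 0.9, 6 at 0.8
```

Lowering the cut-off from 0.9 to 0.8 connects taxon D to every other taxon and doubles the edge count, a small-scale version of how threshold choice reshapes network structure.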
