testing dataset
Recently Published Documents





P. Vijayalakshmi ◽  
K. Muthumanickam ◽  
G. Karthik ◽  
S. Sakthivel

Adenomyosis is an abnormality in the uterine wall of women that adversely affects their normal lifestyle. If not treated properly, it may lead to severe health issues. It is a gynaecological disease that may lead to infertility, and its symptoms are identified from MRI images. The presence of red dots in the uterus is the major symptom of adenomyosis, and the extent of these red dots, extracted from MRI images, shows how significantly the uterus deviates from normality. We therefore propose an entroxon-based bio-inspired intelligent water drop back-propagation neural network (BIWDNN) model to estimate the probability that infertility is caused by adenomyosis and endometriosis. First, vital features are extracted from the images and segmented, and then they are classified using the fuzzy C-means clustering algorithm. The extracted features are then expressed as attributes and compared with the attributes extracted from a normal person. The BIWDNN model is evaluated on training and testing datasets, and the predictions are estimated on the testing dataset. The proposed model improves the diagnostic precision rate for infertility.
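The abstract names fuzzy C-means as the clustering step but gives no further detail. The following is a minimal sketch on one-dimensional feature values (for example, hypothetical per-region intensities), assuming the common fuzzifier m = 2 and a fixed iteration count instead of a convergence test. It is an illustration of the technique, not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple


def fuzzy_c_means(points: Sequence[float], c: int, m: float = 2.0,
                  iters: int = 100, seed: int = 0) -> Tuple[List[float], List[List[float]]]:
    """Cluster 1-D values into c fuzzy clusters.

    Returns (centers, memberships), where memberships[i][j] is the degree to
    which points[i] belongs to cluster j; every membership row sums to 1.
    """
    rng = random.Random(seed)
    n = len(points)
    # Random initial membership matrix, each row normalised to sum to 1.
    u: List[List[float]] = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = [0.0] * c
    for _ in range(iters):
        # Center update: mean of the points weighted by membership ** m.
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            centers[j] = sum(w * x for w, x in zip(weights, points)) / sum(weights)
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i, x in enumerate(points):
            d = [abs(x - cj) for cj in centers]
            if 0.0 in d:
                # The point sits exactly on a center: give it a crisp membership.
                k = d.index(0.0)
                u[i] = [1.0 if j == k else 0.0 for j in range(c)]
                continue
            u[i] = [1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                    for j in range(c)]
    return centers, u
```

Each point receives a membership degree in every cluster rather than a single hard label; taking the largest membership per point turns the result into a crisp segmentation.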

2021 ◽  
Vol 38 (6) ◽  
pp. 1699-1711
Devanshu Tiwari ◽  
Manish Dixit ◽  
Kamlesh Gupta

This paper presents a fully automated breast cancer detection system, "Deep Multi-view Breast cancer Detection", based on deep transfer learning. The deep transfer learning model Visual Geometry Group 16 (VGG 16) is used in this approach to classify breast thermal images as either normal or abnormal. The VGG 16 model is trained on a dataset of static and dynamic breast thermal images that contains both multi-view and single-view images. The multi-view breast thermal images are generated, for the first time, by concatenating the conventional left, frontal and right views of breast thermal images taken from the Database for Mastology Research with Infrared image, in order to produce a more informative and complete thermal temperature map of the breast and improve the accuracy of the overall system. For a fair comparison, three other popular deep transfer learning models, Residual Network 50 (ResNet50V2), InceptionV3 and Visual Geometry Group 19 (VGG 19), are also trained on the same augmented dataset of multi-view and single-view breast thermal images. The VGG 16-based Deep Multi-view Breast cancer Detection system delivers the best training, validation and testing accuracies of all the deep transfer learning models compared. VGG 16 achieves a testing accuracy of 99% on the dynamic breast thermal image testing dataset using the multi-view images as input, whereas VGG 19, ResNet50V2 and InceptionV3 achieve testing accuracies of 95%, 94% and 89%, respectively, on the same testing dataset with the same multi-view input.
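The multi-view input is described as a concatenation of the left, frontal and right views. The sketch below shows one way to do that, assuming each view is a grid of temperature values with the same height and that the views are placed side by side; the `Image` representation and the horizontal axis are assumptions, since the abstract does not specify them.

```python
from typing import List

# Hypothetical representation: a thermal image as rows of temperature values.
Image = List[List[float]]


def concat_views(left: Image, frontal: Image, right: Image) -> Image:
    """Place three equally tall views side by side: left | frontal | right."""
    if not (len(left) == len(frontal) == len(right)):
        raise ValueError("all views must have the same height")
    # Join row by row so each output row spans all three views.
    return [l_row + f_row + r_row for l_row, f_row, r_row in zip(left, frontal, right)]
```

Side-by-side concatenation is only one plausible layout; any resizing to the network's input size would happen after this step and is likewise not described in the abstract.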

2021 ◽  
Vol 1 (1) ◽  
pp. 146-176
Israa Nadher ◽  
Mohammad Ayache ◽  
Hussein Kanaan

Abstract. Information decision support systems are increasingly used as we live in an era of digital data and the rise of artificial intelligence. Heart disease, one of the best-known and most dangerous diseases, is receiving considerable attention, and this attention has been translated into digital prediction systems that detect the presence of the disease from the available data and information. Such systems have faced many problems since they first appeared, but with the development of machine learning we now use it to build new models that detect this disease. Besides the algorithms, data is very important and forms the heart of prediction systems: prediction algorithms make decisions, these decisions must be based on facts, and these facts are extracted from data, so data is the starting point of every system. In this paper we propose a Heart Disease Prediction System using machine learning algorithms. For data, we used the Cleveland dataset, which was normalized and then divided into three training/testing scenarios: 80%-20%, 50%-50% and 30%-70%. These three scenarios are applied to both the normalized and the unnormalized dataset. For every scenario we used three machine learning algorithms, SVM, SMO and MLP, and tested the results with two different kernels. Combining these gives, at the top level, two dataset types (normalized and unnormalized); for each type, three scenarios according to the sizes of the training and testing datasets; and for each of these, two scenarios according to the kernel type, for 30 scenarios in total. Our proposed system showed higher accuracy than previous works.
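The abstract does not name the normalization method. The sketch below assumes min-max scaling to [0, 1], fitted on the whole dataset before splitting (matching "normalized then divided"), followed by shuffled splits at the three reported training fractions. The row layout and the seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Row = List[float]

# Training fractions for the 80%-20%, 50%-50% and 30%-70% scenarios.
SCENARIOS = [0.8, 0.5, 0.3]


def min_max_normalize(rows: Sequence[Row]) -> List[Row]:
    """Rescale every column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def train_test_split(rows: Sequence[Row], train_fraction: float,
                     seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of the rows, then cut it at train_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Running the unnormalized scenarios simply means passing the raw rows to `train_test_split` instead of the output of `min_max_normalize`.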

Tomography ◽  
2021 ◽  
Vol 7 (4) ◽  
pp. 950-960
Aymen Meddeb ◽  
Tabea Kossen ◽  
Keno K. Bressem ◽  
Bernd Hamm ◽  
Sebastian N. Nagel

The aim of this study was to develop a deep learning-based algorithm for fully automated spleen segmentation on CT images and to evaluate its performance in conditions directly or indirectly affecting the spleen (e.g., splenomegaly, ascites). For this, a 3D U-Net was trained on an in-house dataset (n = 61) that included diseases with and without splenic involvement (in-house U-Net), and on an open-source dataset from the Medical Segmentation Decathlon (open dataset, n = 61) without splenic abnormalities (open U-Net). Both datasets were split into a training (n = 32; 52%), a validation (n = 9; 15%) and a testing dataset (n = 20; 33%). The segmentation performance of the two models was measured using four established metrics, including the Dice Similarity Coefficient (DSC). On the open test dataset, the in-house and open U-Net achieved a mean DSC of 0.906 and 0.897, respectively (p = 0.526). On the in-house test dataset, the in-house U-Net achieved a mean DSC of 0.941, whereas the open U-Net obtained a mean DSC of 0.648 (p < 0.001), showing very poor segmentation results in patients with abnormalities in or surrounding the spleen. Thus, for reliable, fully automated spleen segmentation in clinical routine, the training dataset of a deep learning-based algorithm should include conditions that directly or indirectly affect the spleen.
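The Dice Similarity Coefficient compares a predicted mask A with a reference mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). A minimal sketch for flattened binary masks follows; returning 1.0 when both masks are empty is a convention assumed here, not something the abstract specifies.

```python
from typing import Sequence


def dice(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

For example, masks that each mark two voxels but share only one give 2 × 1 / (2 + 2) = 0.5.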

2021 ◽  
Vol 7 (12) ◽  
Mao Peng ◽  
Ronald P. de Vries

Pectinolytic enzymes are a variety of enzymes involved in breaking down pectin, a complex and abundant plant cell-wall polysaccharide. In nature, pectinolytic enzymes play an essential role in allowing bacteria and fungi to depolymerize and utilize pectin. In addition, pectinases have been widely applied in various industries, such as the food, wine, textile, and paper and pulp industries. Due to their important biological function and increasing industrial potential, the discovery of novel pectinolytic enzymes has received global interest. However, traditional enzyme characterization relies heavily on biochemical experiments, which are time-consuming, laborious and expensive. An automatic approach is needed to accelerate the identification of novel pectinolytic enzymes. We developed a machine learning (ML) approach for predicting pectinases in the industrial workhorse fungus Aspergillus niger. The prediction integrated a diverse range of features, including evolutionary profile, gene expression, transcriptional regulation and biochemical characteristics. On both the training and the independent testing datasets, our method achieved over 90% accuracy and recalled over 60% of pectinolytic genes. Applying the ML model to the A. niger genome identified 83 pectinases, covering both previously described pectinases and novel pectinases that do not belong to any known pectinolytic enzyme family. Our study demonstrates the tremendous potential of ML for discovering new industrial enzymes by integrating heterogeneous (post-)genomics data.
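Accuracy and recall measure different things, which is why a model can exceed 90% accuracy while recalling only about 60% of the positive class when positives are rare. The sketch below computes both from confusion-matrix counts; the counts used in the example are hypothetical toy values, not results from the study.

```python
from typing import Tuple


def accuracy_and_recall(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (accuracy, recall) from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all predictions that are correct
    recall = tp / (tp + fn)  # share of true positives (e.g. real pectinases) that are found
    return accuracy, recall


# Hypothetical, imbalanced toy counts: 10 true positives among 100 genes.
acc, rec = accuracy_and_recall(tp=6, fp=5, tn=85, fn=4)
print(f"accuracy={acc:.2f} recall={rec:.2f}")  # accuracy=0.91 recall=0.60
```

Because the many correctly rejected negatives dominate the accuracy, recall on the minority class is the more informative figure for enzyme discovery.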

Temitayo O. Oyegoke ◽  
Kehinde K. Akomolede ◽  
Adesola G. Aderounmu ◽  
Emmanuel R. Adagunodo

This study developed an e-mail classification model to preempt fraudulent activities. E-mail is so pervasive that it is attractive to cyber-fraudsters. This research combined two databases, the CLAIR fraudulent e-mail dataset and the Spambase dataset, to create the training and testing datasets. The CLAIR dataset consists of raw e-mails from users' inboxes, which were pre-processed into structured form using Natural Language Processing (NLP) techniques. This dataset was then consolidated with the Spambase dataset into a single dataset. The study deployed a Multi-Layer Perceptron (MLP) architecture trained with the back-propagation algorithm to build the fraud detection model. The model was simulated with two splits: 70% for training and 30% for testing, and 80% for training and 20% for testing. The performance of the models was compared using several evaluation metrics. The study concluded that the proposed MLP model is effective for fraud detection on e-mail data.
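The abstract does not say which NLP features were extracted from the raw CLAIR e-mails. One way to make raw text mergeable with Spambase is to compute Spambase-style word-frequency attributes, defined in that dataset as 100 × (occurrences of the word) / (total words), where a word is a run of alphanumeric characters. The sketch below assumes that approach; the vocabulary passed in is hypothetical.

```python
import re
from typing import Dict, Iterable


def word_freq_features(text: str, vocabulary: Iterable[str]) -> Dict[str, float]:
    """Percentage of words in `text` matching each vocabulary word (Spambase-style)."""
    # A "word" is any run of alphanumeric characters; matching is case-insensitive.
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    total = len(words)
    features: Dict[str, float] = {}
    for vocab_word in vocabulary:
        count = sum(1 for w in words if w == vocab_word.lower())
        features[f"word_freq_{vocab_word}"] = 100.0 * count / total if total else 0.0
    return features
```

The resulting feature vectors can then be stacked with Spambase rows before the 70/30 and 80/20 splits are made.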

2021 ◽  
pp. 1-14
Vanitha Lingaraj ◽  
Kalaiselvi Kaliannan ◽  
Venmathi Asirvatham Rohini ◽  
Rajesh Kumar Thevasigamani ◽  
Karthikeyan Chinnasamy ◽  

Flow state assessment is essential for understanding an individual's involvement in an assigned task. If there is no involvement in the assigned task, the individual may in due course be affected by psychological or physiological illnesses. National Crime Records Bureau (NCRB) statistics show that non-involvement in tasks drives individuals to a depressive state and subsequently to attempted suicide. It is therefore essential to detect a decrease in flow level at an early stage and take remedial steps. Many invasive methods exist for determining the flow state, but they are not preferred; the commonly used non-invasive methods, questionnaires and interviews, are subjective and retroactive, so the results are more easily faked. Hence, the main objective of our work is to design an efficient flow level measurement system that measures flow objectively and classifies it in real time. Accurate classification is achieved by designing an Expert Active k-Nearest Neighbour (EAkNN) classifier that classifies an individual's flow state towards the assigned task into nine states using non-invasive physiological electrocardiogram (ECG) signals. The ECG parameters are obtained during the performance of FSCWT. This work thus combines psychological theory, physiological signals and machine learning concepts. The classifier uses a modified voting rule instead of the default majority voting rule, in which the contribution probability of the nearest points to the new data is considered. The dataset is divided into a training dataset (75%) and a testing dataset (25%). The classifier is trained and tested on this dataset, and its classification efficiency is 95%.
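The abstract replaces majority voting with a rule that weighs each neighbour's contribution but does not give the formula. As a stand-in, the sketch below uses standard inverse-distance weighting, a well-known weighted k-NN variant; it is an assumption, not the EAkNN rule itself, and the feature vectors and labels are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def weighted_knn_predict(train: Sequence[Tuple[Sequence[float], str]],
                         query: Sequence[float], k: int = 3) -> str:
    """Classify `query` by inverse-distance-weighted votes of its k nearest neighbours."""
    # Sort training points by Euclidean distance to the query.
    neighbours = sorted((math.dist(x, query), label) for x, label in train)[:k]
    votes: Dict[str, float] = defaultdict(float)
    for distance, label in neighbours:
        if distance == 0.0:
            return label  # exact match: take its label directly
        votes[label] += 1.0 / distance  # closer neighbours contribute more
    return max(votes, key=votes.get)
```

With one very close neighbour of class "A" and two distant neighbours of class "B", plain majority voting returns "B", whereas the weighted rule returns "A"; this is the kind of difference a modified voting rule is meant to capture.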

2021 ◽  
Niaz Muhammad Shahani ◽  
Xigui Zheng

Abstract Sedimentary rocks provide information on past environments on the surface of the earth. As a result, they are the principal narrators of former climate, life and important events on the earth's surface. The complexity and expense of direct destructive laboratory tests aggravate the problem of data scarcity, making the development of intelligent indirect methods an integral step in addressing the problems faced by rock engineering projects. This study established an artificial neural network (ANN) approach to predict the uniaxial compressive strength (UCS, in MPa) of soft sedimentary rocks from three input parameters: dry density (ρd, in g/cm3), Brazilian tensile strength (BTS, in MPa) and point load index (Is(50), in MPa). The ANN models M1, M2 and M3 were developed using, respectively, the overall dataset; a split of 70% training and 30% testing data; and a split of 60% training and 40% testing data. In addition, multiple linear regression (MLR) was performed for comparison with the proposed ANN models to verify the accuracy of the predicted values. Performance indices were calculated for all established models. On the testing dataset, the M3 ANN model achieved the highest coefficient of correlation (R2), the smallest root mean squared error (RMSE), the highest variance accounted for (VAF) and a reliable a10-index, with values of 0.99, 0.00060, 0.99 and 0.99, respectively. It was therefore proposed as the best-fit prediction model, among the models developed in this study, for the UCS of soft sedimentary rocks at the Thar Coalfield, Pakistan. Moreover, sensitivity analysis showed that BTS and Is(50) were the most influential parameters in predicting UCS.
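The four performance indices can be computed as below. This sketch assumes common definitions: R2 as the coefficient of determination (some studies instead square the Pearson correlation), VAF as a fraction rather than a percentage (matching the 0.99 reported), and the a10-index as the share of samples whose predicted/measured ratio lies between 0.9 and 1.1. The authors' exact formulas are not given in the abstract.

```python
import math
from statistics import mean, pvariance
from typing import Sequence


def r2(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root mean squared error, in the units of y (MPa for UCS)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def vaf(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Variance accounted for, as a fraction (multiply by 100 for a percentage)."""
    return 1.0 - pvariance([a - b for a, b in zip(y, yhat)]) / pvariance(y)


def a10_index(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Share of samples whose predicted/measured ratio is within [0.9, 1.1]."""
    return sum(1 for a, b in zip(y, yhat) if 0.9 <= b / a <= 1.1) / len(y)
```

A perfect model scores R2 = 1, RMSE = 0, VAF = 1 and a10-index = 1, which is the direction the M3 results approach.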

2021 ◽  
Vol 14 (12) ◽  
pp. 7411-7424
Moritz Lange ◽  
Henri Suominen ◽  
Mona Kurppa ◽  
Leena Järvi ◽  
Emilia Oikarinen ◽  

Abstract. Running large-eddy simulations (LESs) can be burdensome and too computationally expensive for applications such as supporting urban planning. In this study, regression models are used to replicate air pollutant concentrations modelled by LES in urban boulevards. We study the performance of regression models and discuss how to detect situations where the models are applied outside their training domain and their outputs cannot be trusted. Regression models from 10 different model families are trained, and a cross-validation methodology is used to evaluate their performance and to find the best set of features needed to reproduce the LES outputs. We also test the regression models on an independent testing dataset. Our results suggest that, in general, log-linear regression gives the best and most robust performance on new independent data. It clearly outperforms the dummy model, which predicts constant concentrations for all locations (multiplicative minimum RMSE (mRMSE) of 0.76 vs. 1.78 for the dummy model). Furthermore, we demonstrate that it is possible to detect concept drift, i.e. situations where the model is applied outside its training domain and a new LES run may be necessary to obtain reliable results. Regression models can replace LES simulations for estimating air pollutant concentrations unless higher accuracy is needed. To obtain reliable results, however, model and feature selection must be done carefully to avoid overfitting, and methods to detect concept drift should be used.
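Log-linear regression fits a linear model to the logarithm of the target, so positive concentrations are modelled multiplicatively. The sketch below fits log(y) = a + b·x by ordinary least squares for a single hypothetical feature x and compares it with a constant dummy baseline (the geometric mean of the targets) using RMSE on the log scale. It does not reproduce the paper's mRMSE metric or feature set, which the abstract does not define.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for log(y) = a + b * x; every y must be > 0."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_log) for x, ly in zip(xs, logs))
    b = sxy / sxx
    a = mean_log - b * mean_x
    return a, b


def log_rmse(ys: Sequence[float], preds: Sequence[float]) -> float:
    """RMSE between log observations and log predictions."""
    return math.sqrt(sum((math.log(y) - math.log(p)) ** 2
                         for y, p in zip(ys, preds)) / len(ys))


def dummy_prediction(ys: Sequence[float]) -> float:
    """Constant baseline: the geometric mean of the training targets."""
    return math.exp(sum(math.log(y) for y in ys) / len(ys))
```

Any useful regression model should beat the constant baseline on held-out data; a model that does not is a sign of poor feature choice or of inputs outside the training domain.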

2021 ◽  
Vol 2128 (1) ◽  
pp. 012024
M Solehin Shamsudin ◽  
Fitri Yakub ◽  
M Ibrahim Shapiai ◽  
Azlan Mohmad ◽  
N Amirah Abd Hamid

Abstract Dissolved Gas Analysis (DGA), used to determine the ageing and degradation of a transformer, is a standard, routine periodic maintenance task. In general, there are two DGA methods: conventional (lab-based) analysis and online monitoring. DGA monitoring can detect incipient faults and transformer failure. Several techniques are available to analyse, interpret and diagnose DGA results, such as the IEEE standard, the IEC 60599 standard, the Key Gas Method and the Duval methods. Several Machine Learning (ML) techniques have been explored to determine transformer condition, including fault diagnosis and fault detection, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), K-Nearest Neighbours (KNN), Random Neural Network (RNN) and Fuzzy Logic. However, using a commercial device to determine the Health Index (HI) of a transformer remains unexplored. In this study, an ML network is trained on the input features available from a commercial device to determine the HI. A benchmark dataset from existing work is employed to validate the proposed investigation. It contains 730 samples in five classes, 1) Very Good, 2) Good, 3) Fair, 4) Poor and 5) Very Poor, used to determine the HI of a transformer. The conventional 70:30 ratio is used to partition the training and testing datasets. The maximum accuracy and corresponding method for each configuration are: 1) M1, 66.67% (ANN); 2) M2, 68.49% (ANN); 3) M3, 76.71% (KNN); 4) M5, 76.26% (ANN); 5) M6, 79.00% (ANN); and 6) M7, 86.30% (ANN). In conclusion, the multi-gas device achieves good accuracy and provides a good HI indicator for classifying the condition of the transformer, which can be used for preventive maintenance.
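With five HI classes that may be unevenly represented, a 70:30 partition is often stratified so that each class keeps its proportion in both sets. The abstract does not say whether the split was stratified; the sketch below assumes it, operating on a list of class labels and returning sample indices.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) keeping each class's share in both sets."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):  # fixed class order keeps the split reproducible
        idx = by_class[label]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)
```

Stratification matters most for small classes such as "Very Poor": an unstratified random cut can leave few or none of them in the test set, which distorts the reported accuracy.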
