Predicting Alert Source Device using Machine Learning Algorithms

In a large distributed virtualized environment, predicting the alerting source device from alert text is a daunting task. This paper explores the use of machine learning algorithms to solve this problem. Unfortunately, our training dataset is highly imbalanced: 96% of alerting data is reported by 24% of alerting sources. This is the expected distribution in any live distributed virtualized environment, where newer device versions produce relatively few alerts compared to older devices. Multi-class classification with such an imbalanced dataset presents a different set of challenges from binary classification. This type of skewed data distribution makes conventional machine learning less effective, especially when predicting alerts from minority device types. Our challenge is to build a robust model that can cope with this imbalanced dataset and achieve a relatively high level of prediction accuracy. This research started with traditional regression and classification algorithms using a bag-of-words model. Then word2vec and doc2vec models were used to represent the words in vector form, which preserves the semantic meaning of a sentence; alerting texts with similar messages thus have similar vector representations. This vectorized alerting text was used with logistic regression for model building. This yields better accuracy, but the model is relatively complex and demands more computational resources. Finally, a simple neural network was applied to this multi-class text classification problem using the Keras and TensorFlow libraries. A simple two-layer neural network yielded 99% accuracy, even though our training dataset was not balanced. This paper presents a qualitative evaluation of the different machine learning algorithms and their respective results. The two-layer neural network is selected as the final solution, since it requires relatively little resource and time while achieving better accuracy.
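A minimal sketch of the kind of two-layer Keras network the abstract describes, assuming vectorized alert text as input; the feature dimension, class count, layer sizes, and placeholder data are illustrative assumptions, not values from the paper:

```python
import numpy as np
from tensorflow import keras

NUM_FEATURES = 5000   # assumed size of the bag-of-words / doc2vec vector
NUM_CLASSES = 50      # assumed number of alert source device types

# Two-layer network: one hidden layer plus a softmax output layer.
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(NUM_FEATURES,)),
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# x_train: vectorized alert text; y_train: integer device-type labels.
x_train = np.random.rand(1000, NUM_FEATURES).astype("float32")  # placeholder data
y_train = np.random.randint(0, NUM_CLASSES, size=1000)
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.1)
```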

2020
Vol 12 (11)
pp. 1838
Author(s):
Zhao Zhang
Paulo Flores
C. Igathinathane
Dayakar L. Naik
Ravi Kiran
...  

The current mainstream approach of using manual measurements and visual inspections for crop lodging detection is inefficient, time-consuming, and subjective. An innovative method for wheat lodging detection that can overcome or alleviate these shortcomings would be welcomed. This study proposed a systematic approach for wheat lodging detection in research plots (372 experimental plots), which consisted of using unmanned aerial systems (UAS) for aerial imagery acquisition, manual field evaluation, and machine learning algorithms to detect the presence or absence of lodging. UAS imagery was collected on three different dates (23 and 30 July 2019, and 8 August 2019) after lodging occurred. Traditional machine learning and deep learning were evaluated and compared in this study in terms of classification accuracy and standard deviation. For traditional machine learning, five types of features (i.e., gray-level co-occurrence matrix, local binary pattern, Gabor, intensity, and Hu moment) were extracted and fed into three traditional machine learning algorithms (i.e., random forest (RF), neural network, and support vector machine) for detecting lodged plots. For the datasets on each imagery collection date, the accuracies of the three algorithms were not significantly different from each other. For each of the three algorithms, accuracies on the first- and last-date datasets had the lowest and highest values, respectively. Incorporating standard deviation as a measure of performance robustness, RF was determined to be the most satisfactory. Regarding deep learning, three different convolutional neural networks (a simple convolutional neural network, VGG-16, and GoogLeNet) were tested. For each of the single-date datasets, GoogLeNet consistently outperformed the other two methods. Further comparisons between RF and GoogLeNet demonstrated that the detection accuracies of the two methods were not significantly different from each other (p > 0.05); hence, choosing either would not affect the final detection accuracies. However, considering that the average accuracy of GoogLeNet (93%) was higher than that of RF (91%), GoogLeNet is recommended for wheat lodging detection. This research demonstrated that UAS RGB imagery, coupled with the GoogLeNet machine learning algorithm, can be a novel, reliable, objective, simple, low-cost, and effective (accuracy > 90%) tool for wheat lodging detection.
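As a hedged illustration of the traditional pipeline described above, a sketch extracting two of the five feature types (GLCM and LBP, via scikit-image, function names as in recent versions) and feeding them to a random forest; all parameter values and the placeholder images are assumptions:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern
from sklearn.ensemble import RandomForestClassifier

def texture_features(gray):
    """GLCM statistics plus an LBP histogram for one grayscale plot image."""
    glcm = graycomatrix(gray, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    glcm_feats = [graycoprops(glcm, p)[0, 0]
                  for p in ("contrast", "homogeneity", "energy")]
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([glcm_feats, lbp_hist])

# images: 8-bit grayscale plot crops; labels: 1 = lodged, 0 = not lodged.
images = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(20)]  # placeholders
labels = np.random.randint(0, 2, 20)
X = np.array([texture_features(img) for img in images])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
```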


2021
Vol 11 (10)
pp. 2573-2583
Author(s):
P. Deepika
P. Pabitha

This research aims to evaluate the possibilities of classifying fetal ultrasound images as normal or abnormal using machine learning algorithms. Most earlier research works produced a high percentage of false-negative classification results; recent research has aimed to reduce the rate of false-negative diagnoses. In addition, the number of sonologists available worldwide for analyzing prenatal ultrasound is very small; this can be addressed by developing an efficient algorithm that reduces the percentage of false negatives in the diagnosis output. Several earlier research works focused on analyzing either fetal abdominal images or fetal head images, forcing the medical industry to use two different diagnostic modules separately. This work aims to design and implement a convolutional framework, named the two-Convolutional-Neural-Network (tCNN) model, for diagnosing any fetal image. The proposed tCNN model diagnoses fetal abdominal and fetal brain images and classifies them as normal or abnormal. CNN1 of the tCNN performs segmentation and classification based on abdomen circumference and measurements of the stomach bubble, umbilical vein, and amniotic fluid. CNN2 classifies based on head and abdominal circumference together with measured femur, crown-rump, and humerus lengths. With clinical validation, an extensive experiment was carried out and the results were compared with those of experts in terms of segmentation accuracy and obstetric measurements. This paper provides a foundation for future multi-classification research on diagnosing fetal intracranial abnormalities and differential diagnosis using machine learning algorithms.
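The tCNN architecture itself is not specified in the abstract; purely as a hedged sketch, a minimal binary normal/abnormal CNN of the general kind each sub-network would use, with input size and layer choices as assumptions:

```python
from tensorflow import keras

# One sub-network, sketched as a plain binary classifier.
cnn = keras.Sequential([
    keras.layers.Input(shape=(128, 128, 1)),       # assumed grayscale ultrasound crop
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),   # normal vs. abnormal
])
cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```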


Author(s):  
Anoop Kumar Tiwari
Abhigyan Nath
Karthikeyan Subbiah
Kaushal Kumar Shukla

An imbalanced dataset affects the learning of classifiers, and this imbalance problem is almost ubiquitous in biological datasets. Resampling is one of the common methods of dealing with the imbalanced dataset problem. In this study, we explore learning performance by varying the balancing ratio of training datasets, consisting of the observed and absent peptides in mass spectrometry experiments, across different machine learning algorithms. It was observed that the ideal (1:1) balancing ratio yielded better performance than the imbalanced dataset, but it was not the best when compared with some intermediate ratios. By experimenting with the Synthetic Minority Oversampling Technique (SMOTE) at different balancing ratios, we obtained the best results with the boosted random forest algorithm, achieving a sensitivity of 92.1%, specificity of 94.7%, overall accuracy of 93.4%, MCC of 0.869, and AUC of 0.982. This study also identifies the most discriminating features by applying a feature ranking algorithm. From the results of the current experiments, it can be inferred that the performance of machine learning algorithms on classification tasks can be enhanced by selecting an optimally balanced training dataset, which can be obtained by suitably modifying the class distribution.
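A hedged sketch of such a balancing-ratio sweep using imbalanced-learn's SMOTE; the ratio grid, the synthetic data, and the gradient-boosting stand-in for the paper's boosted random forest are assumptions:

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the peptide dataset (90:10 class imbalance).
X, y = make_classification(n_samples=5000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for ratio in (0.25, 0.5, 0.75, 1.0):  # minority:majority ratios to try
    X_bal, y_bal = SMOTE(sampling_strategy=ratio, random_state=0).fit_resample(X_tr, y_tr)
    clf = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
    proba = clf.predict_proba(X_te)[:, 1]
    print(ratio,
          matthews_corrcoef(y_te, clf.predict(X_te)),  # MCC on the untouched test set
          roc_auc_score(y_te, proba))                   # AUC on the untouched test set
```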


2021
Vol 2066 (1)
pp. 012041
Author(s):  
Yiqiang Lai

Abstract Neural networks have strong capabilities for processing data and information. At the same time, current computer technology is very advanced, and many powerful information technologies have been developed under the impetus of modern science and technology. Researchers have therefore fused these advanced technologies with neural network structures and, on this basis, established artificial neural networks. In a broad sense, machine learning refers to enabling a machine to acquire relevant knowledge through autonomous learning, with the aim of giving the machine skills similar to those people need in order to acquire knowledge. This article explores machine learning algorithms based on neural network technology. Through literature research and case analysis, it develops an in-depth understanding of machine learning algorithms in neural network technology, analyzes their learning advantages and influencing factors, and on that basis designs and experiments with a neural-network-based machine learning algorithm. The experimental results show that LSTM performs well on the copy (replication) task, where its performance even far exceeds that of the NTM (Neural Turing Machine), but its performance on addition and multiplication tasks is much lower than that of the NTM. Although the accuracy of the NTM on the test set is higher than that of LSTM and RNN, the performance of the model is still relatively poor.
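A hedged sketch of a copy (replication) task of the kind the experiments compare LSTM and NTM on; the sequence length, bit width, and layer sizes are illustrative assumptions:

```python
import numpy as np
from tensorflow import keras

def make_copy_batch(n, seq_len=8, bits=8):
    """Input: random bit sequence, a delimiter frame, then blanks.
    Target: blanks until the delimiter has passed, then the copied sequence."""
    seq = np.random.randint(0, 2, size=(n, seq_len, bits)).astype("float32")
    blank = np.zeros_like(seq)
    delim = np.ones((n, 1, bits), dtype="float32")   # all-ones frame marks "now recall"
    x = np.concatenate([seq, delim, blank], axis=1)
    y = np.concatenate([blank, np.zeros_like(delim), seq], axis=1)
    return x, y

model = keras.Sequential([
    keras.layers.LSTM(128, return_sequences=True, input_shape=(17, 8)),  # 8 + 1 + 8 steps
    keras.layers.TimeDistributed(keras.layers.Dense(8, activation="sigmoid")),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["binary_accuracy"])
x, y = make_copy_batch(4096)
model.fit(x, y, epochs=10, batch_size=64)
```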


Author(s):  
Ayomide Emmanuel Adesiyan

Manufacturing today considers data-driven business operations at different levels, leading to the growth of various manufacturing paradigms, from which smart manufacturing emerged. Data can be used to predict equipment failure rates, streamline and optimize inventory management, and prioritize processes. This work uses parameter tuning and optimization, grid search, and cross-validation to identify the best-performing machine learning algorithm. It evaluates potential failure rates over time against production lines whose output peaks and drops depending on the Remaining Useful Life (RUL) of their components. The accuracy of the machine learning algorithms employed in this study is evaluated using two metrics: MCC and AUC-ROC. This study analyzed and evaluated an anonymized dataset from a manufacturing company, using these metrics and machine learning algorithms to predict the performance of its production lines using unsupervised learning. This study should serve as a good reference for anyone wanting to select the best-performing model for further research work.
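A hedged sketch of the grid-search-with-cross-validation model selection the abstract mentions, scored with MCC and ROC AUC; the estimator choice, parameter grid, and synthetic data are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the anonymized failure dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring={"mcc": make_scorer(matthews_corrcoef), "auc": "roc_auc"},
    refit="auc",      # pick the best hyperparameters by ROC AUC
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```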


2020
pp. 1-11
Author(s):
Jie Liu
Lin Lin
Xiufang Liang

The online English teaching system places certain requirements on the intelligent scoring system, and the most difficult stage of intelligent scoring in an English test is scoring English compositions with an intelligent model. In order to improve the intelligence of English composition scoring, this study combines machine learning algorithms with intelligent image recognition technology and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional-neural-network-based pseudo-character region filtering algorithm. In addition, in order to verify whether the proposed algorithm model meets the task requirements, that is, to verify the feasibility of the algorithm, the performance of the proposed model is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as constraints. The research results show that the proposed algorithm has a practical effect and can be applied to English assessment and online homework evaluation systems.
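As a hedged illustration of the MSER character-candidate step (the paper's specific improvements are not described in the abstract), a minimal OpenCV sketch; the file path is a placeholder and default MSER parameters are assumed:

```python
import cv2

# Load a scanned composition image in grayscale (path is a placeholder).
gray = cv2.imread("composition_page.png", cv2.IMREAD_GRAYSCALE)

# MSER proposes stable regions that often correspond to characters.
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)

# Candidate boxes would next go to the CNN pseudo-character filter.
for (x, y, w, h) in bboxes:
    cv2.rectangle(gray, (x, y), (x + w, y + h), 255, 1)
cv2.imwrite("candidates.png", gray)
```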


2015
Vol 32 (6)
pp. 821-827
Author(s):
Enrique Audain
Yassel Ramos
Henning Hermjakob
Darren R. Flower
Yasset Perez-Riverol

Abstract Motivation: In any macromolecular polyprotic system—for example protein, DNA or RNA—the isoelectric point—commonly referred to as the pI—can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge—and thus the electrophoretic mobility—of the ampholyte sums to zero. Different modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel, and proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Therefore, accurate theoretical prediction of pI would expedite such analysis. While such pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods. Results: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset and their resulting performance will strongly depend on the quality of that data. In contrast with iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction. Contact: [email protected] Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR. Supplementary information: Supplementary data are available at Bioinformatics online.
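None of the benchmarked methods is reproduced here, but the underlying iterative calculation is standard: find the pH at which the Henderson-Hasselbalch net charge is zero. A hedged sketch with an illustrative pKa basis set (the exact values vary by basis set, which is the paper's point):

```python
# Illustrative pKa values; real basis sets (EMBOSS, Bjellqvist, etc.) differ.
POS = {"K": 10.8, "R": 12.5, "H": 6.5}          # positively ionizable residues
NEG = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}  # negatively ionizable residues
NTERM, CTERM = 8.6, 3.6                           # terminal group pKa values

def net_charge(seq, pH):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    pos = 1 / (1 + 10 ** (pH - NTERM))
    pos += sum(1 / (1 + 10 ** (pH - POS[a])) for a in seq if a in POS)
    neg = 1 / (1 + 10 ** (CTERM - pH))
    neg += sum(1 / (1 + 10 ** (NEG[a] - pH)) for a in seq if a in NEG)
    return pos - neg

def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection: net charge decreases monotonically with pH, so find its zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(isoelectric_point("ACDEFGHIKLMNPQRSTVWY"), 2))
```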


Author(s):  
E. Yu. Shchetinin

The recognition of human emotions is one of the most relevant and dynamically developing areas of modern speech technologies, and recognition of emotions in speech (RER) is the most demanded part of it. In this paper, we propose a computer model of emotion recognition based on an ensemble of a bidirectional recurrent neural network with LSTM memory cells and the deep convolutional neural network ResNet18. Computer studies are carried out on the RAVDESS database, which contains emotional human speech. RAVDESS is a dataset containing 7356 files; the recordings cover the following emotions: 0 – neutral, 1 – calm, 2 – happiness, 3 – sadness, 4 – anger, 5 – fear, 6 – disgust, 7 – surprise. The speech-only portion used here comprises 16 classes (8 emotions, divided into male and female) for a total of 1440 samples. To train machine learning algorithms and deep neural networks to recognize emotions, the existing audio recordings must be pre-processed to extract the main characteristic features of the emotions. This was done using Mel-frequency cepstral coefficients, chroma coefficients, and characteristics of the frequency spectrum of the audio recordings. Computer studies of various neural network models for emotion recognition are carried out on the data described above. In addition, machine learning algorithms were used for comparative analysis. Thus, the following models were trained during the experiments: logistic regression (LR), a classifier based on the support vector machine (SVM), decision tree (DT), random forest (RF), gradient boosting over trees (XGBoost), the convolutional neural network CNN, the recurrent neural network RNN (ResNet18), as well as an ensemble of convolutional and recurrent networks, Stacked CNN-RNN. The results show that the neural networks achieved much higher accuracy in recognizing and classifying emotions than the machine learning algorithms used. Of the three neural network models presented, the CNN + BLSTM ensemble showed the highest accuracy.
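A hedged sketch of the feature-extraction step the paper describes (MFCCs plus chroma coefficients via librosa); the coefficient counts, sample rate, time-averaging, and the placeholder RAVDESS filename are assumptions:

```python
import numpy as np
import librosa

def extract_features(path):
    """Mean MFCC and chroma vectors for one audio file."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)    # cepstral features
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # pitch-class energy
    return np.concatenate([mfcc.mean(axis=1), chroma.mean(axis=1)])

features = extract_features("03-01-05-01-01-01-01.wav")  # placeholder RAVDESS file
print(features.shape)  # (52,) = 40 MFCCs + 12 chroma bins
```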


2021
Author(s):
Yingxian Liu
Cunliang Chen
Hanqing Zhao
Yu Wang
Xiaodong Han

Abstract Fluid properties are key factors in predicting single-well productivity, well test interpretation, and oilfield recovery prediction, and they directly affect the success of ODP program design. The most accurate and direct acquisition method is underground sampling. However, not every well has samples, due to technical reasons such as excessive well deviation or high cost during the exploration stage. Therefore, analogies or empirical formulas often have to be adopted. But a large number of oilfield developments have shown that the errors caused by these methods are very large. Therefore, how to quickly and accurately obtain fluid physical properties is of great significance. In recent years, with the development and improvement of artificial intelligence and machine learning algorithms, their applications in oilfields have become more and more extensive. This paper proposes a method for predicting crude oil physical properties based on machine learning algorithms. The method uses PVT data from nearly 100 wells in the Bohai Oilfield; 75% of the data is used for training to obtain the prediction model, and the remaining 25% is used for testing. Practice shows that the prediction results of the machine learning algorithm are very close to the actual data, with very small errors. Finally, the method was applied to the preliminary plan design of the BZ29 oilfield, a new oilfield; in particular, fluid physical properties were predicted for the unsampled sand bodies. The paper also compares the influence of the analogy method on the scheme, providing potential and risk analysis for scheme design. This method will be applied to more oil fields in the Bohai Sea in the future and has important promotion value.
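A hedged sketch of the 75/25 train/test workflow the abstract describes; the regressor choice, the error metric, and the placeholder PVT features (e.g. depth, temperature, pressure, gas-oil ratio as inputs; one physical property as target) are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

# Placeholder stand-ins for per-well PVT records.
rng = np.random.default_rng(0)
X = rng.random((100, 4))   # assumed inputs: depth, temperature, pressure, GOR
y = rng.random(100)        # assumed target: one crude oil physical property

# 75% for training, 25% for testing, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("MAPE:", mean_absolute_percentage_error(y_te, model.predict(X_te)))
```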


2021
Vol 2021
pp. 1-12
Author(s):
Muhammad Waqar
Hassan Dawood
Hussain Dawood
Nadeem Majeed
Ameen Banjar
...  

Cardiac disease treatment often involves the acquisition and analysis of vast quantities of digital cardiac data, which can be utilized for various beneficial purposes. Utilizing these data becomes even more important for critical conditions like heart attack, where the patient's life is often at stake. Machine learning and deep learning are two famous techniques that help make raw data useful. Some of the biggest problems arising from the usage of these techniques are massive resource utilization, extensive data preprocessing, the need for feature engineering, and ensuring reliable classification results. The proposed research presents a cost-effective solution for predicting heart attack with high accuracy and reliability. It uses a UCI dataset to predict heart attack via various machine learning algorithms without any feature engineering. Moreover, the given dataset has an unequal distribution of positive and negative classes, which can reduce performance; the proposed work uses the synthetic minority oversampling technique (SMOTE) to handle this imbalance. The proposed system discards the need for feature engineering in classifying the given dataset, leading to an efficient solution, as feature engineering often proves to be a costly process. The results show that, among all machine learning algorithms, a SMOTE-based artificial neural network, when tuned properly, outperformed all other models and many existing systems. The high reliability of the proposed system ensures that it can be used effectively in heart attack prediction.
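A hedged sketch of the SMOTE-plus-neural-network combination the abstract describes, using imbalanced-learn's pipeline so oversampling is applied only to the training folds; the MLP stand-in, its layer sizes, and the synthetic stand-in for the UCI data are assumptions:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the UCI heart dataset (13 raw features, imbalanced labels).
X, y = make_classification(n_samples=303, n_features=13, weights=[0.7], random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("smote", SMOTE(random_state=0)),  # balances training folds only, never the test fold
    ("ann", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)),
])
print(cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean())
```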

