Mammography Image-Based Diagnosis of Breast Cancer Using Machine Learning: A Pilot Study

A tumor is an abnormal tissue classified as either benign or malignant. Breast tumors are among the most common tumors in women. Radiologists use mammograms to identify and classify breast tumors, a process that is time-consuming and prone to error because of the complexity of the tumor. In this study, we applied machine learning-based techniques to assist the radiologist in reading mammogram images and classifying the tumor within a reasonable time. We extracted several features from the region of interest in the mammogram, which the radiologist manually annotated. These features are fed into a classification engine to train and build the proposed classification models. Following standard model evaluation schemes, we evaluated the accuracy of the proposed system on a dataset that the model had not previously seen. The study found that various factors could affect performance, and we mitigated them after experimenting with a range of alternatives. Finally, this study recommends using the optimized Support Vector Machine or Naïve Bayes, which produced 100% accuracy after integrating the feature selection and hyper-parameter optimization schemes.

Financial Context News Sentiment Analysis for the Lithuanian Language

Applied Sciences ◽

10.3390/app11104443 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4443

Author(s):

Rokas Štrimaitis ◽

Pavel Stefanovič ◽

Simona Ramanauskaitė ◽

Asta Slotkienė

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Experimental Investigations ◽

Support Vector ◽

Applied Machine Learning ◽

Bayes Algorithm ◽

Website Content

Financial analysis is not limited to enterprise performance analysis. It is worth analyzing as wide an area as possible to obtain a full impression of a specific enterprise. News website content is a data source that expresses the public's opinion on enterprise operations, status, etc. Therefore, it is worth analyzing the text of news portal articles. Sentiment analysis methods for English texts and financial texts exist and are accurate, but work on the more complex Lithuanian language has mostly concentrated on sentiment analysis of comment texts and does not provide high accuracy. Therefore, in this paper, a supervised machine learning model was implemented to assign sentiment to financial context news gathered from Lithuanian language websites. The analysis was made using three classification algorithms commonly used in the field of sentiment analysis. Hyperparameter optimization using grid search was performed to discover the best parameters of each classifier. All experimental investigations were made using newly collected datasets from four Lithuanian news websites. The results of the applied machine learning algorithms show that the highest accuracy is obtained using a non-balanced dataset with the multinomial Naive Bayes algorithm (71.1%). The accuracies of the other algorithms were slightly lower: long short-term memory (71%) and support vector machine (70.4%).
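
The grid search step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English corpus, the smoothing grid, and the function names (`train_nb`, `predict_nb`, `grid_search`) are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha):
    """Fit a multinomial Naive Bayes model with additive smoothing alpha."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha

def predict_nb(model, tokens):
    """Return the label with the highest log-posterior for a token list."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                num = word_counts[label][tok] + alpha
                den = total_words + alpha * len(vocab)
                score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)

def grid_search(train, valid, alphas):
    """Try every alpha in the grid; keep the first one with the best validation accuracy."""
    best = None
    for alpha in alphas:
        model = train_nb(train[0], train[1], alpha)
        acc = accuracy(model, valid[0], valid[1])
        if best is None or acc > best[1]:
            best = (alpha, acc)
    return best

# Hypothetical toy data (not from the paper's Lithuanian datasets).
TRAIN = ([["profit", "rises"], ["strong", "growth"],
          ["loss", "widens"], ["debt", "default"]],
         ["pos", "pos", "neg", "neg"])
VALID = ([["profit", "growth"], ["loss", "default"]], ["pos", "neg"])
ALPHAS = [0.1, 0.5, 1.0]  # alpha must be > 0 to avoid log(0)

best_alpha, best_acc = grid_search(TRAIN, VALID, ALPHAS)
```

In practice the grid would cover each classifier's own hyperparameters and the score would usually come from cross-validation rather than a single validation split.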

Automatic Tomato Plant Leaf Disease Classification using Multi-Kernel Support Vector Machine

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.e9689.069520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 560-565

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Learning Algorithm ◽

Early Stage ◽

Region Of Interest ◽

Input Image ◽

Disease Classification ◽

Support Vector ◽

Leaf Disease ◽

Kernel Support Vector Machine

In agriculture, leaf disease is a major problem, and identifying diseases at an early stage increases yield. Identifying the various diseases is therefore very important for reducing losses. In this work, an efficient technique for identifying unhealthy tomato leaves using a machine learning algorithm is proposed. Support Vector Machines (SVM) are a machine learning methodology that has been successfully applied in a number of applications to identify and classify regions of interest. The proposed algorithm has three main stages: preprocessing, feature extraction, and classification. In preprocessing, the images are converted to RGB and an average filter is used to eliminate noise in the input image. After the preprocessing stage, features such as texture, color, and shape are extracted from each image. The extracted features are then presented to the classifier, which classifies an input tomato leaf image as healthy or unhealthy. For classification, a multi-kernel support vector machine (MKSVM) is used. The performance of the proposed method is analysed on the basis of different metrics, such as accuracy, sensitivity, and specificity. The images used in the test are collected from Plant Village. The proposed method is implemented in MATLAB.
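
Two of the building blocks named above, the average filter and the accuracy, sensitivity, and specificity metrics, can be sketched in a few lines. This is an illustrative Python version rather than the paper's MATLAB code; the function names, the 3 x 3 window, the edge handling, and the choice to return 0.0 for an undefined ratio are assumptions.

```python
def mean_filter(image, size=3):
    """Replace each pixel with the mean of its size x size neighbourhood.

    `image` is a list of rows of numbers. At the borders only the pixels
    that actually exist are averaged (an assumed edge-handling choice).
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(window) / len(window)
    return out

def _ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }
```

For a colour image the filter would be applied to each channel separately; here "unhealthy" would typically be treated as the positive class.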

The Application of Machine Learning to a General Risk–Need Assessment Instrument in the Prediction of Criminal Recidivism

Criminal Justice and Behavior ◽

10.1177/0093854820969753 ◽

2020 ◽

pp. 009385482096975

Author(s):

Mehdi Ghasemi ◽

Daniel Anvari ◽

Mahshid Atapour ◽

J. Stephen Wormith ◽

Keira C. Stockdale ◽

...

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Characteristic Curve ◽

Assessment Instrument ◽

Support Vector ◽

Data Set ◽

Applied Machine Learning ◽

Vector Machines ◽

Individual Scores

The Level of Service/Case Management Inventory (LS/CMI) is one of the most frequently used tools to assess criminogenic risk–need in justice-involved individuals. Meta-analytic research demonstrates strong predictive accuracy for various recidivism outcomes. In this exploratory study, we applied machine learning (ML) algorithms (decision trees, random forests, and support vector machines) to a data set with nearly 100,000 LS/CMI administrations to provincial corrections clientele in Ontario, Canada, and approximately 3 years follow-up. The overall accuracies and areas under the receiver operating characteristic curve (AUCs) were comparable, although ML outperformed LS/CMI in terms of predictive accuracy for the middle scores where it is hardest to predict the recidivism outcome. Moreover, ML improved the AUCs for individual scores to near 0.60, from 0.50 for the LS/CMI, indicating that ML also improves the ability to rank individuals according to their probability of recidivating. Potential considerations, applications, and future directions are discussed.
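
The AUC used above to compare ranking ability has a direct probabilistic reading: it is the chance that a randomly chosen recidivist receives a higher risk score than a randomly chosen non-recidivist, with ties counted as one half. A minimal sketch of that definition follows; the function name and the 0/1 label encoding are assumptions, and the quadratic loop is only suitable for small samples.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` uses 1 for the positive outcome and 0 for the negative one.
    An AUC of 0.5 corresponds to ranking no better than chance.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))
```

This explains why an AUC of 0.50 within a single score is expected for the instrument itself: individuals with the same score are all tied, so every pair counts one half.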

Machine-Learning Based Hybrid-Feature Analysis for Liver Cancer Classification Using Fused (MR and CT) Images

Applied Sciences ◽

10.3390/app10093134 ◽

2020 ◽

Vol 10 (9) ◽

pp. 3134 ◽

Cited By ~ 5

Author(s):

Samreen Naeem ◽

Aqib Ali ◽

Salman Qadri ◽

Wali Khan Mashwani ◽

Nasser Tairan ◽

...

Keyword(s):

Machine Learning ◽

Liver Cancer ◽

CT Scan ◽

Region Of Interest ◽

Hepatocellular Adenoma ◽

Cancer Classification ◽

Support Vector ◽

Hybrid Features ◽

Feature Selection Technique ◽

Probability Of Error

The purpose of this research is to demonstrate the ability of machine-learning (ML) methods to classify liver cancer using a fused dataset of two-dimensional (2D) computed tomography (CT) scans and magnetic resonance imaging (MRI). Datasets of benign (hepatocellular adenoma, hemangioma, cyst) and malignant (hepatocellular carcinoma, hepatoblastoma, metastasis) liver cancer were acquired at Bahawal Victoria Hospital (BVH), Bahawalpur, Pakistan. The final dataset was generated by fusion of 1200 (100 × 6 × 2) MR and CT-scan images, with 200 images (100 MRI and 100 CT-scan) of size 512 × 512 for each class of cancer. The acquired dataset was preprocessed by employing Gabor filters to reduce noise and by extracting automated regions of interest (ROIs) using an Otsu thresholding-based segmentation approach. The preprocessed dataset was used to acquire 254 hybrid features for each ROI, a combination of histogram, wavelet, co-occurrence, and run-length features, from which 10 optimized hybrid features were selected by employing the probability of error plus average correlation feature selection technique. For classification, we deployed this optimized hybrid-feature dataset to four ML classifiers: multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), and J48, using a ten-fold cross-validation method. MLP showed an overall accuracy of 95.78% on MRI and 97.44% on CT. Unfortunately, the obtained results were not promising, and there were some limitations due to the different modalities of the dataset. Thereafter, a fusion of MRI and CT-scan datasets generated the fused optimized hybrid-feature dataset. Among all the deployed classifiers, the MLP showed a promising accuracy of 99%.
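
The Otsu thresholding step mentioned above picks the grey level that maximises the between-class variance of the two resulting pixel groups. The following is a minimal sketch of the standard algorithm, not the authors' segmentation pipeline; the function name, the flat list of 8-bit integer intensities, and the convention that pixels at or below the threshold are background are assumptions.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises between-class variance.

    `pixels` is a flat list of integer intensities in [0, levels).
    Pixels <= t are treated as background, pixels > t as foreground.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_bg = 0        # number of background pixels so far
    sum_bg = 0.0    # sum of background intensities so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue            # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break               # everything is background from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def segment(pixels, threshold):
    """Binary mask: 1 where the pixel is brighter than the threshold."""
    return [1 if p > threshold else 0 for p in pixels]
```

An ROI can then be taken as the bounding region of the foreground mask, although how the ROI was derived from the mask in the study is not stated in the abstract.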

Spatiotemporal Approaches for Quality Control and Error Correction of Atmospheric Data through Machine Learning

Computational Intelligence and Neuroscience ◽

10.1155/2020/7980434 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12 ◽

Cited By ~ 2

Author(s):

Hye-Jin Kim ◽

Sung Min Park ◽

Byung Jin Choi ◽

Seung-Hyun Moon ◽

Yong-Hyuk Kim

Keyword(s):

Machine Learning ◽

Time Series ◽

Quality Control ◽

Mean Squared Error ◽

Machine Learning Algorithms ◽

Support Vector ◽

Weather Element ◽

Applied Machine Learning ◽

Squared Error ◽

Atmospheric Data

We propose three quality control (QC) techniques using machine learning that depend on the type of input data used for training. These include QC based on time series of a single weather element, QC based on time series in conjunction with other weather elements, and QC using spatiotemporal characteristics. We performed machine learning-based QC on each weather element of atmospheric data, such as temperature, acquired from seven types of IoT sensors and applied machine learning algorithms, such as support vector regression, on data with errors to make meaningful estimates from them. By using the root mean squared error (RMSE), we evaluated the performance of the proposed techniques. As a result, the QC done in conjunction with other weather elements had 0.14% lower RMSE on average than QC conducted with only a single weather element. In the case of QC with spatiotemporal characteristic considerations, the QC done via training with AWS data showed performance with 17% lower RMSE than QC done with only raw data.
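
The RMSE used above to score each QC technique can be written out as a short sketch. The helper names are assumptions, and the relative-reduction formula shown is one common way to express "x% lower RMSE"; the abstract does not say exactly how its percentages were computed.

```python
import math

def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def relative_reduction(baseline_rmse, improved_rmse):
    """Percentage by which improved_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - improved_rmse) / baseline_rmse
```

Because errors are squared before averaging, a few large sensor errors raise the RMSE more than many small ones.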

Sentiment Analysis on Twitter Hashtag Datasets

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39201 ◽

2021 ◽

Vol 9 (12) ◽

pp. 278-281

Author(s):

Ganesh K. Shinde

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Business Intelligence ◽

Opinion Mining ◽

Specific Area ◽

Machine Learning Algorithms ◽

Support Vector ◽

Applied Machine Learning ◽

Machine Learning Approach ◽

N Gram

Sentiment analysis has brought improvements to online shopping platforms, scientific surveys, political polls, business intelligence, etc. In this work, we analyse Twitter posts about hashtags such as #MakeinIndia using a machine learning approach. By doing opinion mining in a specific area, it is possible to identify the effect of area information in sentiment analysis. We put forth a feature vector for classifying tweets as positive, negative, or neutral. We then applied two machine learning algorithms, MaxEnt and SVM. We used unigram, bigram, and trigram features to generate a feature set for training linear MaxEnt and SVM classifiers. Finally, we measured the performance of the classifiers in terms of overall accuracy.

Keywords: Sentiment analysis, support vector machine, maximum entropy, N-gram, Machine Learning
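
The unigram, bigram, and trigram features described above can be sketched as a simple counting step. This is an illustration only: the whitespace tokenisation, lower-casing, and function names are assumptions, and the paper's actual preprocessing is not described in the abstract.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(text, max_n=3):
    """Count every 1..max_n-gram in a lower-cased, whitespace-tokenised text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts
```

The resulting counts form a sparse feature vector per tweet, which a linear classifier such as MaxEnt or SVM can consume after being mapped to a shared vocabulary index.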

Machine Learning Methods to Classify Mushrooms for Edibility-A Review

International Journal for Modern Trends in Science and Technology - RTT2020 ◽

10.46501/ijmtst060909 ◽

2020 ◽

Vol 06 (09) ◽

pp. 54-58

Author(s):

Rakesh Kumar Y and Dr. V. Chandrasekhar

Keyword(s):

Neural Network ◽

Machine Learning ◽

Literature Review ◽

Edible Mushroom ◽

Machine Learning Techniques ◽

Support Vector ◽

Applied Machine Learning ◽

Learning Techniques ◽

Artificial Neural Network (ANN)

There are thousands of species of mushrooms in the world; some are edible, while others are non-edible because they are poisonous. It is difficult for a non-expert to manually distinguish poisonous from edible mushrooms across all species. A computer-aided system with software or an algorithm is therefore required to classify poisonous and nonpoisonous mushrooms. This paper presents a literature review on the classification of poisonous and nonpoisonous mushrooms. Most research on classifying mushroom type has applied machine learning techniques such as Naïve Bayes, K-Neural Network, Support Vector Machine (SVM), Artificial Neural Network (ANN), and Decision Tree techniques. This literature review presents a summary and comparison of the different mushroom classification techniques in terms of their performance parameters, and of the merits and demerits encountered when classifying mushrooms using machine learning techniques.

On the Analysis of Machine Learning Classifiers to Detect Traffic Congestion in Vehicular Networks

10.5753/eniac.2019.9290 ◽

2019 ◽

Author(s):

Lucas Carvalho ◽

Maycon Silva ◽

Edimilson Santos ◽

Daniel Guidoni

Keyword(s):

Machine Learning ◽

Traffic Congestion ◽

Vehicular Networks ◽

Naive Bayes ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbors ◽

Applied Machine Learning ◽

Routing Methods

Problems related to traffic congestion and management have become common in many cities. Thus, vehicle re-routing methods have been proposed to minimize congestion. Some of these methods have applied machine learning techniques, more specifically classifiers, to verify road conditions and detect congestion. However, better results may be obtained by applying a classifier more suitable to the domain. In this sense, this paper presents an evaluation of different classifiers applied to the identification of the level of road congestion. Our main goal is to analyze the characteristics of each classifier in this task. The classifiers involved in the experiments are: Multiple Layer Neural Network (MLP), K-Nearest Neighbors (KNN), Decision Trees (J48), Support Vector Machines (SVM), Naive Bayes, and Tree Augmented Naive Bayes.

Ensemble of SVM Classifiers for Spam Filtering

Encyclopedia of Artificial Intelligence ◽

10.4018/978-1-59904-849-9.ch086 ◽

2011 ◽

pp. 561-566

Author(s):

Ángela Blanco ◽

Manuel Martín-Merino

Keyword(s):

Machine Learning ◽

False Positive ◽

Machine Learning Techniques ◽

Support Vector ◽

Applied Machine Learning ◽

Internet Users ◽

Learning Techniques ◽

SVM Algorithm ◽

Misclassification Errors ◽

Voting Strategy

Unsolicited commercial email, also known as spam, is becoming a serious problem for Internet users and providers (Fawcett, 2003). Several researchers have applied machine learning techniques in order to improve the detection of spam messages. Naive Bayes models are the most popular (Androutsopoulos, 2000), but other authors have applied Support Vector Machines (SVM) (Drucker, 1999), boosting, and decision trees (Carreras, 2001) with remarkable results. SVM has proved particularly attractive in this application because it is robust against noise and is able to handle a large number of features (Vapnik, 1998). Errors in anti-spam email filtering are strongly asymmetric: false positive errors, that is, valid messages that are blocked, are prohibitively expensive. Several authors have proposed new versions of the original SVM algorithm that help to reduce the false positive errors (Kolz, 2001, Valentini, 2004 & Kittler, 1998). In particular, it has been suggested that combining non-optimal classifiers can help to reduce the variance of the predictor (Valentini, 2004 & Kittler, 1998) and consequently the misclassification errors. In order to achieve this goal, different versions of the classifier are usually built by sampling the patterns or the features (Breiman, 1996). However, in our application it is expected that the aggregation of strong classifiers will further reduce the false positive errors (Provost, 2001 & Hershop, 2005). In this paper, we address the problem of reducing the false positive errors by combining classifiers based on multiple dissimilarities. To this end, a diverse set of classifiers is built considering dissimilarities that reflect different features of the data. The dissimilarities are first embedded into a Euclidean space, where an SVM is adjusted for each measure. Next, the classifiers are aggregated using a voting strategy (Kittler, 1998). The proposed method has been applied to the Spam UCI machine learning database (Hastie, 2001) with remarkable results.
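
The final aggregation step, combining the per-dissimilarity classifiers by voting, can be sketched as a plain majority vote. This is not the authors' method in detail: the function names are hypothetical, the base classifiers are stand-in callables rather than SVMs, and breaking ties in favour of "ham" is an assumption motivated by the high cost of false positives noted above.

```python
from collections import Counter

def majority_vote(predictions, tie_label="ham"):
    """Return the most frequent label; on a tie for first place return tie_label.

    Defaulting ties to "ham" (legitimate mail) is an assumed design choice
    that avoids blocking a valid message when the ensemble is undecided.
    """
    ranked = Counter(predictions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]

def ensemble_predict(classifiers, message):
    """Ask every base classifier for a label and aggregate by majority vote."""
    return majority_vote([clf(message) for clf in classifiers])

# Hypothetical stand-ins for classifiers trained on different dissimilarities.
toy_classifiers = [
    lambda m: "spam" if "free" in m else "ham",
    lambda m: "spam" if "winner" in m else "ham",
    lambda m: "spam" if len(m) > 1000 else "ham",
]
```

With an odd number of base classifiers and two classes a tie cannot occur, so the tie rule only matters for even-sized ensembles.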

Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review

Biology ◽

10.3390/biology9120453 ◽

2020 ◽

Vol 9 (12) ◽

pp. 453

Author(s):

Petar Tonkovic ◽

Slobodan Kalajdziski ◽

Eftim Zdravevski ◽

Petre Lameski ◽

Roberto Corizzo ◽

...

Keyword(s):

Machine Learning ◽

Language Processing ◽

Scoping Review ◽

Digital Libraries ◽

Research Field ◽

Time Interval ◽

Research Papers ◽

Data Set ◽

Practical Applications ◽

Applied Machine Learning

Applied machine learning in bioinformatics is growing as computer science slowly invades all research spheres. With the arrival of modern next-generation DNA sequencing algorithms, metagenomics is becoming an increasingly interesting research field as it finds countless practical applications exploiting the vast amounts of generated data. This study aims to scope the scientific literature in the field of metagenomic classification in the time interval 2008–2019 and provide an evolutionary timeline of data processing and machine learning in this field. This study follows the scoping review methodology and PRISMA guidelines to identify and process the available literature. Natural Language Processing (NLP) is deployed to ensure efficient and exhaustive search of the literary corpus of three large digital libraries: IEEE, PubMed, and Springer. The search is based on keywords and properties looked up using the digital libraries’ search engines. The scoping review results reveal an increasing number of research papers related to metagenomic classification over the past decade. The research is mainly focused on metagenomic classifiers, identifying scope specific metrics for model evaluation, data set sanitization, and dimensionality reduction. Out of all of these subproblems, data preprocessing is the least researched with considerable potential for improvement.
