Automatic Classification of Web Images as UML Static Diagrams Using Machine Learning Techniques

Our purpose in this research is to develop a method to automatically and efficiently classify web images as Unified Modeling Language (UML) static diagrams, and to produce a computer tool that implements this function. The tool receives a bitmap file (in different formats) as input and reports whether the image corresponds to a diagram. For pragmatic reasons, we restricted ourselves to the simplest kinds of diagrams that are most useful for automated software reuse: computer-edited 2D representations of static diagrams. The tool does not require the images to be explicitly or implicitly tagged as UML diagrams. It extracts graphical characteristics from each image (such as the grayscale histogram, color histogram and elementary geometric forms) and uses a combination of rules to classify it. The rules are obtained with machine learning techniques (rule induction) from a sample of 19,000 web images manually classified by experts. In this work, we do not consider the textual contents of the images. Our tool reaches nearly 95% agreement with manually classified instances, improving on the effectiveness of related research. Moreover, with a training dataset 15 times larger, the time required to process each image and extract its graphical features (0.680 s) is seven times shorter.
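To make the idea of "graphical characteristics plus rules" concrete, the following is a minimal sketch that computes a normalized grayscale histogram and applies one hand-written rule. The function names, the 8-bin layout and both thresholds are hypothetical placeholders chosen for illustration; they are not the rules induced in the paper, which combine several feature types.

```python
from typing import List

# Grayscale image as rows of pixel intensities in 0..255 (assumed input format).
Image = List[List[int]]


def gray_histogram(img: Image, bins: int = 8) -> List[float]:
    """Normalized grayscale histogram with `bins` equal-width bins."""
    counts = [0] * bins
    total = 0
    for row in img:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("empty image")
    return [c / total for c in counts]


def looks_like_diagram(img: Image, white_min: float = 0.6,
                       max_occupied_bins: int = 3) -> bool:
    """Hypothetical rule: mostly near-white background and few gray levels.

    The thresholds are illustrative placeholders, not induced rules.
    """
    hist = gray_histogram(img)
    white_fraction = hist[-1]                  # brightest bin (224..255 for 8 bins)
    occupied = sum(1 for h in hist if h > 0.01)
    return white_fraction >= white_min and occupied <= max_occupied_bins
```

For example, a white canvas with one black line passes the rule, while an image whose gray levels are spread evenly across the range does not. In the paper's setting, rule induction would learn which features and thresholds to combine instead of fixing them by hand.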

Machine Learning Generalisation across Different 3D Architectural Heritage

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9060379 ◽

2020 ◽

Vol 9 (6) ◽

pp. 379 ◽

Cited By ~ 4

Author(s):

Eleonora Grilli ◽

Fabio Remondino

Keyword(s):

Machine Learning ◽

Point Cloud ◽

Machine Learning Techniques ◽

Training Dataset ◽

High Complexity ◽

Architectural Heritage ◽

Learning Techniques ◽

Machine Learning Model ◽

Point Cloud Classification

The use of machine learning techniques for point cloud classification has been investigated extensively in the geospatial community over the last decade, while in the cultural heritage field it has only recently begun to be explored. The high complexity and heterogeneity of 3D heritage data, the diversity of possible scenarios, and the different classification purposes that each case study might present make it difficult to build a large training dataset for learning purposes. An important practical issue that has not yet been explored is the application of a single machine learning model across large and different architectural datasets. This paper tackles this issue by presenting a methodology that successfully generalises a random forest model trained on a specific dataset to unseen scenarios. This is achieved by looking for the features best suited to identifying the classes of interest (e.g., walls, windows, roofs and columns).
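The abstract does not list the selected features. As an assumption, the sketch below computes three covariance (eigenvalue-based) shape features that are commonly used in point cloud classification: linearity, planarity and sphericity of a local neighbourhood. The eigenvalues of the symmetric 3x3 covariance matrix are obtained with the standard closed-form trigonometric method; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def covariance(points: Sequence[Point]) -> List[List[float]]:
    """Population covariance matrix (3x3) of a neighbourhood of points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov


def sym_eigenvalues(a: List[List[float]]) -> List[float]:
    """Eigenvalues of a symmetric 3x3 matrix, returned in descending order."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2 * p1
    p = math.sqrt(p2 / 6)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2))  # clamp against rounding error
    phi = math.acos(r) / 3
    e1 = q + 2 * p * math.cos(phi)
    e3 = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    e2 = 3 * q - e1 - e3  # the trace is preserved
    return [e1, e2, e3]


def shape_features(points: Sequence[Point]) -> Dict[str, float]:
    """Linearity, planarity and sphericity from the sorted eigenvalues."""
    l1, l2, l3 = sym_eigenvalues(covariance(points))
    if l1 <= 0.0:
        raise ValueError("degenerate neighbourhood")
    return {"linearity": (l1 - l2) / l1,
            "planarity": (l2 - l3) / l1,
            "sphericity": l3 / l1}
```

Points along a column edge would score high linearity, points on a wall high planarity. Because such features are ratios of eigenvalues, they do not depend on absolute scale, which is one plausible reason features of this kind can transfer between datasets; whether they are among the features chosen in the paper is not stated in the abstract.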

Physician-Friendly Machine Learning: A Case Study with Cardiovascular Disease Risk Prediction

Journal of Clinical Medicine ◽

10.3390/jcm8071050 ◽

2019 ◽

Vol 8 (7) ◽

pp. 1050 ◽

Cited By ~ 3

Author(s):

Meghana Padmanabhan ◽

Pengyu Yuan ◽

Govind Chada ◽

Hien Van Nguyen

Keyword(s):

Machine Learning ◽

Graduate Student ◽

Disease Risk ◽

Cardiovascular Disease Risk ◽

Machine Learning Techniques ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Learning Techniques ◽

Standard Code ◽

Time Required

Machine learning is often perceived as a sophisticated technology accessible only to highly trained experts. This perception prevents many physicians and biologists from using it in their research. The goal of this paper is to dispel this outdated perception. We argue that recent developments in automated machine learning (AutoML) enable biomedical researchers to quickly build competitive machine learning classifiers without in-depth knowledge of the underlying algorithms. We study the case of predicting the risk of cardiovascular disease. To support our claim, we compare AutoML techniques against a graduate student using several important metrics, including the total time required to build machine learning models and the final classification accuracy on unseen test datasets. Specifically, the graduate student spends one month manually building multiple machine learning classifiers and tuning their parameters with scikit-learn, a popular machine learning library, to obtain the models that perform best on two given, publicly available datasets. We run an AutoML library called auto-sklearn on the same datasets. Our experiments find that AutoML takes 1 h to produce classifiers that perform better than those built by the graduate student in one month. More importantly, building these classifiers requires only a few lines of standard code. Our findings are expected to change the way physicians see machine learning and to encourage wide adoption of Artificial Intelligence (AI) techniques in clinical domains.
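The manual work that AutoML replaces is essentially a search over models and hyperparameters scored on held-out data. As an illustration only (not the study's code, and far simpler than what auto-sklearn searches), the sketch below tunes a single hyperparameter, the number of neighbours k of a k-nearest-neighbour classifier, by validation accuracy. The function names and candidate values are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

Vector = Sequence[float]


def knn_predict(train_X: Sequence[Vector], train_y: Sequence[Hashable],
                x: Vector, k: int) -> Hashable:
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def grid_search_k(train_X, train_y, val_X, val_y,
                  candidates: Tuple[int, ...] = (1, 3, 5, 7)):
    """Return (best_k, best_accuracy) measured on the validation split."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        correct = sum(knn_predict(train_X, train_y, x, k) == t
                      for x, t in zip(val_X, val_y))
        acc = correct / len(val_y)
        if acc > best_acc:  # strict '>' keeps the earliest candidate among ties
            best_k, best_acc = k, acc
    return best_k, best_acc
```

An AutoML system repeats this kind of loop automatically across many model families, preprocessing steps and hyperparameters, within a fixed time budget, which is what allows it to compete with weeks of manual tuning.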

Navigating Virtual Environments Using Leg Poses and Smartphone Sensors

Sensors ◽

10.3390/s19020299 ◽

2019 ◽

Vol 19 (2) ◽

pp. 299 ◽

Cited By ~ 2

Author(s):

Georgios Tsaramirsis ◽

Seyed Buhari ◽

Mohammed Basheri ◽

Milos Stojmenovic

Keyword(s):

Machine Learning ◽

Virtual Environments ◽

Sensory Information ◽

Operating Conditions ◽

Identification Accuracy ◽

Sensor Data ◽

Machine Learning Techniques ◽

Training Dataset ◽

Learning Techniques ◽

The Right

Realizing navigation in virtual environments remains a challenge because it involves complex operating conditions. This complexity can be decomposed by fusing sensors with machine learning techniques. Identifying the right combination of sensory information and the appropriate machine learning technique is vital for translating physical actions into virtual movements. The contributions of our work include: (i) synchronization of actions and movements using suitable multiple sensor units, and (ii) selection of the significant features and an appropriate algorithm to process them. This work proposes an innovative approach that allows users to move in virtual environments simply by moving their legs in the desired direction. The only hardware required is a smartphone strapped to the subject’s lower leg. Data from the gyroscope, accelerometer and compass sensors of the mobile device are transmitted to a PC, where the movement is accurately identified using a combination of machine learning techniques. Once the desired movement is identified, the virtual avatar moves accordingly in the virtual environment. After pre-processing the sensor data using the box plot outlier approach, Artificial Neural Networks provided the highest movement identification accuracy: 84.2% on the training dataset and 84.1% on the testing dataset.
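The box plot outlier approach mentioned above usually refers to Tukey's rule: values outside the whiskers [Q1 - 1.5 IQR, Q3 + 1.5 IQR] are discarded. The sketch below assumes that rule and the default 1.5 whisker factor, and that it is applied to one sensor channel (for example, one accelerometer axis) at a time; the function name is hypothetical.

```python
import statistics
from typing import List, Sequence


def remove_boxplot_outliers(values: Sequence[float], whisker: float = 1.5) -> List[float]:
    """Keep only values inside the box plot whiskers (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles; needs >= 2 values
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if low <= v <= high]
```

For instance, `remove_boxplot_outliers([1, 2, 3, 4, 5, 100])` drops the spike at 100. Note that `statistics.quantiles` uses the "exclusive" interpolation method by default, so quartile values may differ slightly from those of other tools.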

Comparing Classical and Modern Machine Learning Techniques for Monitoring Pedestrian Workers in Top-View Construction Site Video Sequences

Applied Sciences ◽

10.3390/app10238466 ◽

2020 ◽

Vol 10 (23) ◽

pp. 8466

Author(s):

Marcel Neuhausen ◽

Dennis Pawlowski ◽

Markus König

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Kalman Filter ◽

Safety Management ◽

Machine Learning Techniques ◽

Training Dataset ◽

Learning Approaches ◽

Construction Sites ◽

Learning Techniques ◽

Different Characteristics

Keeping an overview of all ongoing processes on construction sites is almost infeasible, especially for the construction workers executing their tasks. It is difficult for workers to concentrate on their work while paying attention to other processes. If workflows in hazardous areas do not run properly, dangerous accidents can result. Tracking pedestrian workers could improve productivity and safety management on construction sites. Vision-based tracking approaches are suitable for this, but training and evaluating such a system requires a large amount of data originating from construction sites. Such data are rarely available, which complicates deep learning approaches. We therefore use a small generic dataset and juxtapose a deep learning detector with an approach based on classical machine learning techniques. We identify workers using a YOLOv3 detector and compare its performance with an approach based on a soft cascaded classifier. Tracking is then performed by a Kalman filter. In our experiments, the classical approach outperforms YOLOv3 on the detection task given a small training dataset. However, the Kalman filter is sufficiently robust to compensate for the drawbacks of YOLOv3. We found that both approaches generally yield satisfying tracking performance but exhibit different characteristics.
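The abstract does not specify the filter's motion model. A common choice for tracking detections in video is a constant-velocity Kalman filter; the sketch below assumes that model for a single image coordinate, with the 2x2 covariance algebra written out by hand. The noise values `q` and `r` and the class name are hypothetical; a 2D tracker could run one such filter per axis or use a joint four-state filter.

```python
from dataclasses import dataclass


@dataclass
class CVKalman1D:
    """Constant-velocity Kalman filter for one coordinate (state: p, v)."""
    p: float           # position estimate
    v: float = 0.0     # velocity estimate
    a: float = 10.0    # P[0][0]: position variance
    b: float = 0.0     # P[0][1] = P[1][0]: position/velocity covariance
    d: float = 10.0    # P[1][1]: velocity variance
    q: float = 1e-3    # process noise added to the diagonal (assumed)
    r: float = 1.0     # measurement noise variance (assumed)

    def predict(self, dt: float = 1.0) -> None:
        """x <- F x and P <- F P F^T + Q with F = [[1, dt], [0, 1]]."""
        self.p += dt * self.v
        self.a += 2 * dt * self.b + dt * dt * self.d + self.q  # uses old b, d
        self.b += dt * self.d                                  # uses old d
        self.d += self.q

    def update(self, z: float) -> None:
        """Correct the prediction with a position measurement z (H = [1, 0])."""
        s = self.a + self.r                # innovation variance
        k0, k1 = self.a / s, self.b / s    # Kalman gain
        y = z - self.p                     # innovation
        self.p += k0 * y
        self.v += k1 * y
        # P <- (I - K H) P, expanded for the 2x2 case (d first: it needs old b)
        self.d -= k1 * self.b
        self.b *= 1 - k0
        self.a *= 1 - k0
```

Usage: create `CVKalman1D(p=first_detection)` and, for every new frame, call `predict()` followed by `update(detection)`; when a detector misses a frame, calling only `predict()` bridges the gap, which is one way a tracker can compensate for missed detections.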

Determining the extent and drivers of attrition losses from wind using long-term datasets and machine learning techniques

Forestry An International Journal of Forest Research ◽

10.1093/forestry/cpy047 ◽

2019 ◽

Vol 92 (4) ◽

pp. 425-435 ◽

Cited By ~ 2

Author(s):

John Moore ◽

Yue Lin

Keyword(s):

Machine Learning ◽

Basal Area ◽

Wind Damage ◽

Machine Learning Techniques ◽

Training Dataset ◽

Validation Dataset ◽

Gradient Boosting ◽

Factors Associated ◽

Learning Techniques

In addition to causing large-scale catastrophic damage to forests, wind can also cause damage to individual trees or small groups of trees. Over time, the cumulative effect of this wind-induced attrition can result in a significant reduction in yield in managed forests. Better understanding of the extent of these losses and the factors associated with them can aid better forest management. Information on wind damage attrition is often captured in long-term growth monitoring plots, but analysing these large datasets to identify factors associated with the damage can be problematic. Machine learning techniques offer the potential to overcome some of the challenges with analysing these datasets. In this study, we applied two commonly available machine learning algorithms (Random Forests and Gradient Boosting Trees) to a large, long-term dataset of tree growth for radiata pine (Pinus radiata D. Don) in New Zealand containing more than 157 000 observations. Both algorithms identified stand density and height-to-diameter ratio as the two most important variables associated with the proportion of basal area lost to wind. The algorithms differed in their ease of parameterization and processing time as well as in their overall ability to predict wind damage loss. The Random Forest model was able to predict ~43 per cent of the variation in the proportion of basal area lost to wind damage in the training dataset (a random sample of 80 per cent of the original data) and 45 per cent of the validation dataset (the remaining 20 per cent of the data). Conversely, the Gradient Boosting Tree model was able to predict more than 99 per cent of the variation in wind damage loss in the training dataset, but only ~49 per cent of the variation in the validation dataset, which highlights the potential for overfitting models to specific datasets.
When applying these techniques to long-term datasets, it is also important to be aware of potential issues with the underlying data such as missing observations resulting from plots being abandoned without measurement when damage levels have been very high.
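"Per cent of the variation predicted" is the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below shows the 80/20 random split and the R^2 computation described above; comparing training and validation R^2 is how the overfitting of the Gradient Boosting model shows up (above 0.99 versus about 0.49). The function names and the fixed seed are assumptions for illustration, and the models themselves are omitted.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(rows: Sequence[T], train_frac: float = 0.8,
                           seed: int = 0) -> Tuple[List[T], List[T]]:
    """Random split into training and validation parts (80/20 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Proportion of the variance in y_true explained by y_pred."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot
```

A large gap between `r_squared` on the training rows and on the validation rows signals overfitting. With long-term plot data, splitting by plot rather than by individual observation may give a more honest validation estimate; that choice is a design consideration, not something stated in the abstract.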

Myocardial Infarction Prediction Using Hybrid Machine Learning Techniques

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i3.1716 ◽

2021 ◽

Vol 12 (3) ◽

pp. 4251-4260

Author(s):

Vaddi Niranjan Reddy et al.

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Supervised Learning ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Training Dataset ◽

Learning Classifier ◽

Learning Techniques ◽

Accuracy Performance ◽

Graphical Presentation

Myocardial infarction prediction is an important task in today's healthcare domain, and predicting cardiovascular diseases is a critical challenge in clinical data analysis. It is difficult for physicians to predict myocardial infarction from large volumes of health records. To overcome this complexity, we need an automatic heart disease prediction system that notifies patients so that they can seek treatment and recover. To build such an automatic system, we use machine learning techniques to perform myocardial infarction prediction more easily. Machine learning techniques can be divided into several types, such as supervised and unsupervised learning classifiers. Supervised learning techniques work with structured data, which makes them the recommended choice for implementing these classifiers. This system therefore uses the supervised learning techniques k-nearest neighbours (KNN), random forest (RF), neural network (NN), decision tree (DT), naive Bayes (NB) and support vector machine (SVM). To predict myocardial infarction, the system uses a training dataset obtained from the UCI ML repository. The system also compares the accuracy of the various machine learning algorithms and presents the accuracy results graphically. This makes it possible to assess the risk of the disease in its early stages and to try to save the patient before any loss occurs.
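Of the classifiers listed, naive Bayes is the easiest to show end to end. The sketch below assumes the Gaussian variant (each numeric feature is modelled as normally distributed within each class) and adds a small constant to every variance to avoid division by zero; both are assumptions, since the abstract does not say which NB variant or settings were used. Function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Model = Dict[Hashable, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[Hashable],
                    var_smoothing: float = 1e-9) -> Model:
    """Per class: log prior, feature means and feature variances."""
    groups: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n = len(X)
    model: Model = {}
    for label, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / len(c) + var_smoothing
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / n), means, variances)
    return model


def predict_gaussian_nb(model: Model, x: Sequence[float]) -> Hashable:
    """Class with the highest log prior plus summed Gaussian log-likelihoods."""
    def log_posterior(label: Hashable) -> float:
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)
```

Comparing algorithms then amounts to computing the fraction of correct predictions on the same held-out test set for each classifier and plotting those accuracies side by side.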

Model-Based Resource Utilization and Performance Risk Prediction using Machine Learning Techniques

JOIV International Journal on Informatics Visualization ◽

10.30630/joiv.1.3.35 ◽

2017 ◽

Vol 1 (3) ◽

pp. 101

Author(s):

Haitham A.M Salih ◽

Hany H Ammar

Keyword(s):

Machine Learning ◽

Resource Utilization ◽

Performance Prediction ◽

Machine Learning Techniques ◽

Training Dataset ◽

Software System ◽

Performance Risk ◽

Learning Techniques ◽

Uml Diagrams ◽

And Performance

The growing complexity of modern software systems makes performance prediction a challenging activity. Traditional performance prediction techniques have several drawbacks, such as being time-consuming and being unable to cover the entire software system when it is large-scale. To help solve these problems, we adopt a model-based approach for resource utilization and performance risk prediction. First, we model the software system as annotated UML diagrams. Second, a performance model is derived from the UML diagrams so that it can be evaluated. Third, we generate a performance and resource utilization training dataset by varying the workload. Finally, when new instances are presented, we predict resource utilization and performance risk using machine learning techniques. The approach is intended to support the work of human experts and to improve the efficiency of software system performance prediction. In this paper, we illustrate the approach on a case study: a performance training dataset is generated, and three machine learning techniques are applied to predict resource utilization and the performance risk level. Our approach achieves prediction accuracies between 68.9% and 93.1%.
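The abstract does not say which kind of performance model is derived from the UML diagrams. Purely as an assumption, the sketch below uses a single M/M/1 queue per resource (utilization U = lambda * D, mean response time R = D / (1 - U)) to show how sweeping the workload can generate labelled training rows. The risk thresholds and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mm1_metrics(arrival_rate: float, service_demand: float) -> Tuple[float, float]:
    """Utilization and mean response time of an M/M/1 queue."""
    utilization = arrival_rate * service_demand
    if utilization >= 1.0:
        return utilization, math.inf  # saturated: no steady state
    return utilization, service_demand / (1.0 - utilization)


def risk_level(utilization: float, medium: float = 0.5, high: float = 0.8) -> str:
    """Hypothetical risk label derived from utilization thresholds."""
    if utilization < medium:
        return "low"
    return "medium" if utilization < high else "high"


def generate_dataset(service_demand: float,
                     arrival_rates: Sequence[float]) -> List[Tuple[float, float, float, str]]:
    """One row (workload, utilization, response time, risk) per workload level."""
    rows = []
    for lam in arrival_rates:
        u, r = mm1_metrics(lam, service_demand)
        rows.append((lam, u, r, risk_level(u)))
    return rows
```

Rows produced this way could then serve as training data for a classifier (risk level) or a regressor (utilization), which mirrors the third and fourth steps of the approach at a much smaller scale.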

Machine Learning Techniques for Automated Software Fault Detection via Dynamic Execution Data

Proceedings of the 6th International Conference on Engineering & MIS 2020 ◽

10.1145/3410352.3410747 ◽

2020 ◽

Author(s):

Rafig Almaghairbe ◽

Marc Roper ◽

Tahani Almabruk

Keyword(s):

Machine Learning ◽

Fault Detection ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Software Fault ◽

Dynamic Execution ◽

Automated Software

Using machine learning techniques to reduce data annotation time

PsycEXTRA Dataset ◽

10.1037/e577762012-020 ◽

2006 ◽

Author(s):

Christopher Schreiner ◽

Kari Torkkola ◽

Mike Gardner ◽

Keshu Zhang

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Data Annotation ◽

Learning Techniques

Using Machine Learning Algorithms on Prediction of Stock Price

Journal of Modeling and Optimization ◽

10.32732/jmo.2020.12.2.84 ◽

2020 ◽

Vol 12 (2) ◽

pp. 84-99

Author(s):

Li-Pang Chen

Keyword(s):

Machine Learning ◽

Stock Price ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Short Term ◽

Learning Techniques ◽

Historical Database ◽

Long Short Term Memory

In this paper, we investigate the analysis and prediction of time-dependent data. We focus on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN) and Support Vector Regression (SVR). By treating the close price, open price, daily low, daily high, adjusted close price and trading volume as predictors in the machine learning methods, we show that prediction accuracy is improved.
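Using the six daily predictors requires turning the price series into supervised (input, target) pairs. A minimal sketch, assuming a sliding window of the previous `window` days and the next day's close as the target (the window scheme, the dictionary keys and the function name are assumptions, not taken from the paper):

```python
from typing import Dict, List, Sequence, Tuple

# Column names are assumptions about how the downloaded rows are keyed.
FEATURES = ("open", "high", "low", "close", "adj_close", "volume")


def make_windows(rows: Sequence[Dict[str, float]], window: int,
                 target: str = "close") -> Tuple[List[List[float]], List[float]]:
    """Turn a daily price series into (X, y) pairs for supervised learning.

    Each sample flattens the six predictors of the previous `window` days;
    the label is the `target` value on the following day.
    """
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(window, len(rows)):
        past = rows[t - window:t]
        X.append([day[f] for day in past for f in FEATURES])
        y.append(rows[t][target])
    return X, y
```

The flattened vectors suit SVR directly, whereas an LSTM or CNN would typically reshape each sample to `window x 6` so that the temporal order stays explicit. Splitting such samples chronologically rather than randomly avoids leaking future prices into training.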