Machine learning techniques for SAR data augmentation

Author(s):  
Anurag Yedla ◽  
Fatemeh Davoudi Kakhki ◽  
Ali Jannesari

Mining is known to be one of the most hazardous occupations in the world, and many serious mining accidents have occurred worldwide over the years. Although there have been efforts to create a safer work environment for miners, the number of accidents at mining sites is still significant. Machine learning techniques and predictive analytics are becoming leading resources for creating safer work environments in the manufacturing and construction industries, where they are leveraged to generate actionable insights that improve decision-making. A large amount of mining safety-related data is available, and machine learning algorithms can be used to analyze it, so the mining industry stands to benefit significantly from these techniques. Decision tree, random forest, and artificial neural network models were implemented to analyze the outcomes of mining accidents and to predict days away from work. An accidents dataset provided by the Mine Safety and Health Administration was used to train the models, which were trained separately on tabular data and on narratives. A synthetic data augmentation technique based on word embedding was also investigated to tackle the data imbalance problem. The performance of all the models was compared with that of a traditional logistic regression model. The results show that models trained on narratives performed better than models trained on structured/tabular data in predicting the outcome of the accident. This higher predictive power led to the conclusion that the narratives contain additional information relevant to the outcome of the injury compared to the tabular entries. In predicting days away from work, however, the models trained on tabular data had a lower mean squared error than the models trained on narratives.
The results highlight the importance of predictors such as shift start time, accident time, and mining experience in predicting days away from work. The F1 score of all the underrepresented classes except one improved after the data augmentation technique was applied. This approach gave greater insight into the factors influencing the outcome of the accident and days away from work.
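
The abstract does not describe how the word-embedding augmentation works in detail. The following is only a minimal sketch of one common variant, assuming that new narratives are produced by swapping words for their nearest neighbor in embedding space. The tiny `EMBEDDINGS` table, its values, and the function names are hypothetical and chosen for illustration; a real setup would load pretrained vectors.

```python
import math

# Hypothetical 2-D embeddings for illustration only (not from the paper).
EMBEDDINGS = {
    "fell": (0.9, 0.1),
    "slipped": (0.85, 0.2),
    "hand": (0.1, 0.9),
    "finger": (0.2, 0.85),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbor(word):
    """Return the most similar other vocabulary word, or the word itself
    if it has no embedding."""
    if word not in EMBEDDINGS:
        return word
    target = EMBEDDINGS[word]
    candidates = [w for w in EMBEDDINGS if w != word]
    return max(candidates, key=lambda w: cosine(target, EMBEDDINGS[w]))


def augment(narrative):
    """Build a synthetic narrative by replacing every in-vocabulary word
    with its nearest neighbor in embedding space."""
    return " ".join(nearest_neighbor(tok) for tok in narrative.split())


synthetic = augment("employee fell from ladder and cut hand")
print(synthetic)
```

Synthetic narratives produced this way would be added only to the underrepresented classes, so that the class distribution seen during training becomes more balanced.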


2020 ◽  
Author(s):  
Rija Tonny Christian Ramarolahy ◽  
Esther Opoku Gyasi ◽  
Alessandro Crimi

Abstract Background: Recent studies use machine-learning techniques to detect parasites in microscopy images automatically. However, these tools are trained and tested on specific datasets. Indeed, even if over-fitting is avoided during the development of computer vision applications, large differences in performance are expected on other data. These differences might be related to camera settings (exposure, white balance, etc.) and to differences in blood film slide preparation. Moreover, generative adversarial networks offer new opportunities in microscopy: data homogenization and an increase in the number of images in cases of imbalanced or small samples. Methods: Taking all those aspects into consideration, in this paper we describe a more complete view that includes both detection and the generation of synthetic images: i) automated detection of malaria parasites on stained blood smear images using machine learning techniques, tested on several datasets; ii) an investigation of transfer learning and further testing on different unseen datasets with different staining, microscope, resolution, etc.; iii) a generative approach to create synthetic images that can deceive experts. Results: The tested architecture achieved 0.98 and 0.95 area under the ROC curve in classifying thin and thick smear images, respectively. Moreover, the generated images proved to be very similar to the originals and difficult for an expert microscopist to distinguish: the expert correctly identified the real data for one dataset but had 50% misclassification for another dataset of images. Conclusion: The proposed deep-learning architecture performed well on a malaria parasite classification task. Automated malaria detection can reduce the technician's workload and does not require the presence of experts. Moreover, generative networks can also be applied to blood smear images to generate useful images for microscopists, opening new ways for data augmentation, translation, and homogenization.


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5866
Author(s):  
Gonzalo De-Las-Heras ◽  
Javier Sánchez-Soriano ◽  
Enrique Puertas

Among the reasons for traffic accidents, distractions are the most common. Although the many traffic signs on the road contribute to safety, variable message signs (VMSs) require special attention, which turns into distraction. Advanced driver assistance systems (ADAS) perceive the environment and assist the driver for comfort or safety. This project aims to develop a prototype of a VMS reading system using machine learning techniques, which have so far seen little use for this purpose. The assistant consists of two parts: one that recognizes the sign on the street and another that extracts its text and converts it into speech. For the first part, a set of images was assembled and labeled in PASCAL VOC format through manual annotation, scraping, and data augmentation. With this dataset, the VMS recognition model was trained: a RetinaNet based on ResNet50 pretrained on the COCO dataset. In the reading process, the images were first preprocessed and binarized to achieve the best possible quality. Finally, the text was extracted with the Tesseract OCR model, version 4.0, and the speech was produced by the IBM Watson Text to Speech cloud service.
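
The abstract does not say which binarization method was used before OCR. As an illustration only, the sketch below assumes Otsu's method, a common global thresholding choice, and implements it in plain Python on a tiny hypothetical grayscale image. The function names and pixel values are assumptions, not taken from the paper.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes the between-class
    variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of gray levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is background; nothing left to split
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map a 2-D list of gray levels to 0/1 using Otsu's threshold."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p > t else 0 for p in row] for row in image]


# Hypothetical 3x3 crop: dark background with bright sign pixels.
image = [
    [10, 12, 200],
    [11, 210, 205],
    [9, 13, 198],
]
binary = binarize(image)
print(binary)
```

In a full pipeline, the binarized image would then be passed to the OCR engine. That step is omitted here because it depends on an external Tesseract installation.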


2021 ◽  
Author(s):  
Luciano V. B. Espiridião ◽  
Laura L. Dias ◽  
Anderson A. Ferreira

Author name ambiguity is one of the most challenging issues that can compromise information quality in a scholarly digital library. For years, researchers have searched for solutions to this problem. Despite the many methods already proposed, the question remains open. In this study, we address the issue of producing a more accurate disambiguation function by applying data augmentation to the training data set. We also propose a SyGAR-based data augmentation approach and evaluate our proposal on three collections commonly used in work on the author name disambiguation task. The experimental results show scenarios where improvements are possible in the author name disambiguation task. The proposed data augmentation outperforms other data augmentation approaches and also improves some machine learning techniques that were not specifically designed for the author name disambiguation task.


2006 ◽  
Author(s):  
Christopher Schreiner ◽  
Kari Torkkola ◽  
Mike Gardner ◽  
Keshu Zhang

2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate the analysis and prediction of time-dependent data. We focus our attention on four different stocks selected from the Yahoo Finance historical database. To build models and predict the future stock price, we consider three different machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in the machine learning methods, it can be shown that the prediction accuracy is improved.
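
The abstract does not state how the daily predictors are arranged into training samples. One common arrangement, shown here purely as a sketch under that assumption, is a sliding window: the predictors from the previous `window` days are flattened into one feature vector, and the next day's close price is the target. The function name, the window length, and the price rows are hypothetical.

```python
def make_supervised(rows, target_index=0, window=2):
    """Convert daily feature rows into (X, y) training pairs.

    Each sample in X is `window` consecutive rows flattened into one list;
    the matching entry in y is column `target_index` of the following day.
    """
    X, y = [], []
    for i in range(len(rows) - window):
        flat = [value for row in rows[i:i + window] for value in row]
        X.append(flat)
        y.append(rows[i + window][target_index])
    return X, y


# Hypothetical daily rows: (close, open, volume in thousands).
rows = [
    (10.0, 9.8, 120),
    (10.4, 10.1, 150),
    (10.2, 10.5, 90),
    (10.9, 10.3, 200),
]

X, y = make_supervised(rows, target_index=0, window=2)
print(X)
print(y)
```

The resulting `X` and `y` could then be passed to any of the regressors named in the abstract. Any train/test split should respect time order so that future prices never leak into the training samples.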

