Applying Data Augmentation for Disambiguating Author Names

2021 ◽  
Author(s):  
Luciano V. B. Espiridião ◽  
Laura L. Dias ◽  
Anderson A. Ferreira

Author name ambiguity is one of the most challenging issues that can compromise information quality in a scholarly digital library. For years, researchers have searched for solutions to this problem. Despite the many methods already proposed, the question remains open. In this study, we address the issue of producing a more accurate disambiguation function by applying data augmentation to the training data set. We also propose a SyGAR-based data augmentation approach and evaluate our proposal on three collections commonly used in work on the author name disambiguation task. The experimental results show scenarios where improvements are possible in the author name disambiguation task. The proposed data augmentation approach outperforms another data augmentation approach and improves some machine learning techniques that were not specifically designed for the author name disambiguation task.

Author(s):  
Anurag Yedla ◽  
Fatemeh Davoudi Kakhki ◽  
Ali Jannesari

Mining is known to be one of the most hazardous occupations in the world. Many serious accidents have occurred worldwide over the years in mining. Although there have been efforts to create a safer work environment for miners, the number of accidents occurring at mining sites is still significant. Machine learning techniques and predictive analytics are becoming leading resources for creating safer work environments in the manufacturing and construction industries. These techniques are leveraged to generate actionable insights to improve decision-making. A large amount of mining safety-related data is available, and machine learning algorithms can be used to analyze the data. The use of machine learning techniques can significantly benefit the mining industry. Decision tree, random forest, and artificial neural network models were implemented to analyze the outcomes of mining accidents. These machine learning models were also used to predict days away from work. An accidents dataset provided by the Mine Safety and Health Administration was used to train the models. The models were trained separately on tabular data and narratives. The use of a synthetic data augmentation technique using word embedding was also investigated to tackle the data imbalance problem. The performance of all the models was compared with the performance of the traditional logistic regression model. The results show that models trained on narratives performed better than the models trained on structured/tabular data in predicting the outcome of the accident. The higher predictive power of the models trained on narratives led to the conclusion that the narratives contain additional information relevant to the outcome of injury compared to the tabular entries. The models trained on tabular data had a lower mean squared error than the models trained on narratives when predicting the days away from work. The results highlight the importance of predictors such as shift start time, accident time, and mining experience in predicting the days away from work. It was found that the F1 score of all the underrepresented classes except one improved after the use of the data augmentation technique. This approach gave greater insight into the factors influencing the outcome of the accident and days away from work.
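The abstract does not describe how the word-embedding augmentation was implemented. The following is only a minimal sketch of one common variant, assuming that words in a minority-class narrative are swapped for their nearest neighbor in embedding space. The embedding table `TOY_EMBEDDINGS`, the helper names, and the `replace_prob` parameter are all hypothetical and chosen for illustration; a real study would load pretrained vectors.

```python
import math
import random

# Hypothetical toy embedding table (illustration only, not from the paper).
TOY_EMBEDDINGS = {
    "fell": (0.9, 0.1, 0.0),
    "slipped": (0.85, 0.15, 0.05),
    "ladder": (0.1, 0.9, 0.0),
    "stairs": (0.15, 0.85, 0.1),
    "injured": (0.0, 0.2, 0.9),
    "hurt": (0.05, 0.25, 0.85),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbor(word, embeddings):
    """Return the most similar other word, or None if the word is unknown."""
    if word not in embeddings:
        return None
    target = embeddings[word]
    candidates = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    return max(candidates)[1]


def augment(narrative, embeddings, replace_prob=0.5, rng=None):
    """Create a synthetic narrative by replacing known words with neighbors.

    Each word found in the embedding table is replaced with probability
    replace_prob; unknown words are always kept unchanged.
    """
    rng = rng or random.Random(0)
    out = []
    for token in narrative.split():
        neighbor = nearest_neighbor(token, embeddings)
        if neighbor is not None and rng.random() < replace_prob:
            out.append(neighbor)
        else:
            out.append(token)
    return " ".join(out)
```

Under these assumptions, calling `augment` repeatedly on narratives from an underrepresented class, with different random seeds, would yield extra training samples that keep the original label.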


2020 ◽  
Author(s):  
Rija Tonny Christian Ramarolahy ◽  
Esther Opoku Gyasi ◽  
Alessandro Crimi

Background: Recent studies use machine-learning techniques to detect parasites in microscopy images automatically. However, these tools are trained and tested on specific datasets. Indeed, even if over-fitting is avoided during the improvement of computer vision applications, large differences are expected. Differences might be related to camera settings (exposure, white balance, etc.) and to different preparation of blood film slides. Moreover, generative adversarial networks offer new opportunities in microscopy: data homogenization and an increase in the number of images in the case of imbalanced or small samples. Methods: Taking all those aspects into consideration, in this paper we describe a more complete view including both detection and the generation of synthetic images: i) an automated method to detect malaria parasites in stained blood smear images using machine learning techniques, tested on several datasets; ii) an investigation of transfer learning and further testing on different unseen datasets with different staining, microscope, resolution, etc.; iii) a generative approach to create synthetic images which can deceive experts. Results: The tested architecture achieved 0.98 and 0.95 area under the ROC curve in classifying images of thin and thick smears, respectively. Moreover, the generated images proved to be very similar to the originals and difficult for an expert microscopist to distinguish: the expert correctly identified the real data for one dataset but had 50% misclassification for another dataset of images. Conclusion: The proposed deep-learning architecture performed well on a classification task for malaria parasites. Automated detection of malaria can reduce the technician's workload and does not require the presence of experts. Moreover, generative networks can also be applied to blood smear images to generate useful images for microscopists, opening new ways to data augmentation, translation and homogenization.


Author(s):  
Ramgopal Kashyap

Rapid advances in hardware, software, and communication technologies have enabled the rise of internet-connected sensory devices that provide observation and data measurement from the physical world. It is estimated that the total number of internet-connected devices in use will be between 25 and 50 billion. As the numbers grow and the technologies become more mature, the volume of data published will increase. The technology of internet-connected devices, referred to as the internet of things (IoT), continues to extend the present internet by providing connectivity and interaction between the physical and digital worlds. In addition to increased volume, the IoT produces big data characterized by velocity in terms of time and location dependency, with a variety of multiple modalities and varying data quality. Intelligent processing and analysis of this big data is the key to developing smart IoT applications. This chapter evaluates the different machine learning techniques that deal with the challenges in IoT data.


2021 ◽  
Vol 9 (1) ◽  
pp. 519-525
Author(s):  
B. Hemalatha, Dr. M. Renukadevi

Alzheimer's Disease (AD) is one of the most common neurodegenerative disorders and causes permanent damage to memory-related brain cells and thinking skills. There is a 99.6 percent failure rate in clinical trials of Alzheimer's disease drugs, perhaps because AD patients are difficult to identify at an early stage. This study analyzed machine learning strategies that use empirical data to forecast the progression of AD in future years. Diagnosis of AD is often difficult, particularly at an early stage of the disease, because of its overlap with mild cognitive impairment (MCI). However, it is at this point that treatment is most likely to be successful, so improving the diagnosis process would bring great benefits. Research in this area aims to identify the most complex mechanisms directly related to changes in AD. Various imaging methods are used to diagnose AD, and imaging modalities play a key role in the diagnosis. This paper uses Positron Emission Tomography (PET) images to detect AD early. PET images are often used to show how organs and tissues function in the human body. This research study analyses prediction approaches using various kinds of machine learning algorithms to solve AD diagnostic problems. Artificial Neural Networks are one of these algorithms. Recent research has shown that deep learning is a proficient technique for solving numerous image recognition problems, but most of the published approaches owe their performance to training on a very large number of data samples.


Author(s):  
Muhammad Yasir Bilal ◽  
Rana Muhammad Amir Latif ◽  
N. Z. Jhanjhi ◽  
Mamoona Humayun

Measuring and analyzing a student's visual attention are significant challenges in the e-learning environment. Machine learning techniques and multimedia tools can be used to examine the visual attention of a student. Emotions play a vital role in understanding or judging the attention of the student in the class. If a student is interested in the lecture, the teacher can judge this by reading the student's emotions; learning then improves and students can pay more attention in the classroom, the authors say. The study explores the effect of information and communication technology (ICT), e-service quality, and e-information quality on the brand reputation of universities by focusing on e-learning and the fulfillment of students.


2018 ◽  
Vol 7 (3) ◽  
pp. 1136
Author(s):  
V Devasekhar ◽  
P Natarajan

Data mining is the extraction of important knowledge from various databases using different kinds of approaches. In multi-agent distributed mining, knowledge aggregation is one of the challenging tasks. This paper tries to optimize the aggregation problem and reduces it to a solution derived from the machine learning statistical features of each agent. To this end, a novel optimization algorithm called Multi-Agent Based Data Mining Aggregation (MABDA) is used for present-day scenarios. The MABDA algorithm has agents which collect extracted knowledge and summarize the various levels of the agents' cluster data into an aggregation with maximum accuracy. To prove the effectiveness of the proposed algorithm, the experimental results are compared with existing methods.


2020 ◽  
Vol 10 (18) ◽  
pp. 6452 ◽  
Author(s):  
Yong-Hyuk Kim ◽  
Seung-Hyun Moon ◽  
Yourim Yoon

The lidar ceilometer estimates cloud height by analyzing backscatter data. This study examines weather detectability using a lidar ceilometer by making an unprecedented attempt at detecting weather phenomena through the application of machine learning techniques to the backscatter data obtained from a lidar ceilometer. This study investigates the weather phenomena of precipitation and fog, which are expected to greatly affect backscatter data. In this experiment, the backscatter data obtained from the lidar ceilometer, CL51, installed in Boseong, South Korea, were used. For validation, the data from the automatic weather station for precipitation and visibility sensor PWD20 for fog, installed at the same location, were used. The experimental results showed potential for precipitation detection, which yielded an F1 score of 0.34. However, fog detection was found to be very difficult and yielded an F1 score of 0.10.
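For readers unfamiliar with the metric reported above, the F1 score is the harmonic mean of precision and recall. The sketch below shows the standard definition computed from confusion-matrix counts; the function name and the example counts are hypothetical and are not taken from the study.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives.

    Returns 0.0 when there are no true positives, since precision and
    recall are then both zero (or undefined).
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration: half of the detections are correct
# and half of the true events are found.
example = f1_score(tp=1, fp=1, fn=1)
```

Because the harmonic mean is dominated by the smaller of the two values, a low F1 score such as those reported can result from poor precision, poor recall, or both.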

