A Review of Infrastructures to Process Big Multimedia Data

2020 ◽  
pp. 1-12
Author(s):  
Jaime Salvador ◽  
Zoila Ruiz ◽  
Jose Garcia-Rodriguez

In recent years, the volume of information has been growing faster than ever before, moving from small to huge datasets and from structured to unstructured data such as text, images, audio and video. The purpose of processing these data is to extract relevant information on trends, challenges and opportunities, all of which requires studying large volumes of data. The increase in parallel computing power has enabled Machine Learning (ML) techniques to take advantage of the processing capabilities that new architectures offer on large volumes of data. For this reason, mechanisms are needed to classify and organize these data so that users can more easily extract the information they require. Processing these data requires classification techniques, which are reviewed here. This work analyzes studies on the use of ML for processing large volumes of data (Big Multimedia Data) and proposes a classification of them, using as its criterion the hardware infrastructure employed by parallel machine learning approaches applied to large volumes of data.
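The review classifies parallel ML approaches by the hardware they run on. As a minimal illustration that is not taken from the paper, the sketch below shows data parallelism, a pattern common to many such approaches: the data are split into shards, each worker computes partial gradient sums of a linear least-squares model on its shard, and the partial results are combined into one full-batch update. The function names, learning rate, epoch count, and toy data are all assumptions. Threads keep the example portable; for CPU-bound pure-Python work, a process pool, GPU, or cluster framework would be needed for a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_gradient(shard, w, b):
    """Gradient sums of the squared error for y ~ w*x + b over one data shard."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb, len(shard)


def split_into_shards(data, n_shards):
    """Round-robin partition, so every worker gets a similar amount of data."""
    return [data[i::n_shards] for i in range(n_shards)]


def data_parallel_gd(data, n_workers=4, lr=0.5, epochs=1000):
    """Full-batch gradient descent in which each step combines per-shard gradients."""
    w = b = 0.0
    shards = split_into_shards(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(epochs):
            # Map: each worker computes gradient sums on its own shard.
            parts = list(pool.map(partial_gradient, shards,
                                  [w] * n_workers, [b] * n_workers))
            # Reduce: combine the sums into the mean gradient over all data.
            n = sum(p[2] for p in parts)
            w -= lr * sum(p[0] for p in parts) / n
            b -= lr * sum(p[1] for p in parts) / n
    return w, b


if __name__ == "__main__":
    # Hypothetical toy data drawn from y = 3x + 1.
    data = [(i / 100, 3.0 * (i / 100) + 1.0) for i in range(100)]
    w, b = data_parallel_gd(data)
    print(round(w, 3), round(b, 3))
```

Because the reduce step sums the shard results before dividing by the total count, the update is identical to serial full-batch gradient descent; only where the work happens changes.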


Author(s):  
Amisha Sinha ◽  
Mohnish Raval ◽  
S Sindhu

Social media plays a vital role in connecting people around the world and developing relationships. It has a huge potential audience, so any information that circulates on it can affect a large population. With the surge of Covid-19, a lot of fake news and tweets about remedies, medicine, and general information related to the pandemic have been circulating. In this paper, we present machine learning-based detection of deceptive information about Covid-19, describing a project that automatically detects whether a tweet is fake or real. We use a labeled dataset of tweets extracted from the arXiv repository, to which various methods are applied for cleaning, training, and testing. Preprocessing, including tokenization and stemming/removal of stop words, is performed to extract the most relevant information from the dataset and to achieve better accuracy than the existing system. For feature extraction, we use two techniques, TF-IDF and bag-of-words. For classification, we use two methods, SVM and Random Forest, and achieve an F1-score of 0.94 using SVM.
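The pipeline outlined above (preprocessing, TF-IDF or bag-of-words features, then an SVM evaluated by F1-score) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the toy tweets, the tiny stop-word list, the suffix-stripping `crude_stem` (a stand-in for a real stemmer), and the hyperparameters `lam` and `epochs` are all hypothetical, and the linear SVM is trained with the Pegasos sub-gradient method rather than whatever library the authors used. Random Forest is not shown.

```python
import math
import random
import re
from collections import Counter

# Deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "it", "of", "on", "the", "to"}

# Hypothetical toy tweets: +1 = fake, -1 = real (not the paper's dataset).
TRAIN_TEXTS = [
    ("Drinking hot water cures the virus", 1),
    ("Garlic kills the virus instantly", 1),
    ("Miracle drink cures covid overnight", 1),
    ("Health officials report new vaccination sites", -1),
    ("Officials report daily case numbers", -1),
    ("Wear masks and wash hands", -1),
]


def crude_stem(token):
    """Rough suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lower-case, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def bag_of_words(doc):
    """Bag-of-words features: raw term counts."""
    return dict(Counter(doc))


def fit_idf(docs):
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(doc, idf):
    """L2-normalized TF-IDF vector as a sparse dict; terms unseen in fitting are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: SGD on the L2-regularized hinge loss; labels +1/-1, no bias term."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0  # margin check with current weights
            for k in w:
                w[k] *= 1.0 - eta * lam  # regularization shrink
            if violated:
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w


def predict(w, x):
    return 1 if dot(w, x) > 0 else -1


def f1_score(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    docs = [preprocess(text) for text, _ in TRAIN_TEXTS]
    labels = [label for _, label in TRAIN_TEXTS]
    idf = fit_idf(docs)
    X = [tfidf_vector(d, idf) for d in docs]
    w = train_linear_svm(X, labels)
    print("training F1:", f1_score(labels, [predict(w, x) for x in X]))
```

Swapping `tfidf_vector` for `bag_of_words` gives the count-based variant. The IDF formula here is the smoothed one that scikit-learn's `TfidfVectorizer` uses by default; a real evaluation would of course score a held-out split rather than the training tweets.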


Author(s):  
Sumi Helal ◽  
Flavia C. Delicato ◽  
Cintia B. Margi ◽  
Satyajayant Misra ◽  
Markus Endler

Diagnostics ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 40
Author(s):  
Meike Nauta ◽  
Ricky Walsh ◽  
Adam Dubowski ◽  
Christin Seifert

Machine learning models have been successfully applied to the analysis of skin images. However, due to the black box nature of such deep learning models, it is difficult to understand their underlying reasoning. This prevents a human from validating whether the model is right for the right reasons. Spurious correlations and other biases in data can cause a model to base its predictions on such artefacts rather than on the truly relevant information. These learned shortcuts can in turn cause incorrect performance estimates and can result in unexpected outcomes when the model is applied in clinical practice. This study presents a method to detect and quantify this shortcut learning in trained classifiers for skin cancer diagnosis, since it is known that dermoscopy images can contain artefacts. Specifically, we train a standard VGG16-based skin cancer classifier on the public ISIC dataset, in which colour calibration charts (elliptical, coloured patches) occur only in benign images and not in malignant ones. Our methodology artificially inserts such patches and uses inpainting to automatically remove patches from images, in order to assess the resulting changes in predictions. We find that our standard classifier partly bases its predictions of benign images on the presence of such a coloured patch. More importantly, by artificially inserting coloured patches into malignant images, we show that shortcut learning results in a significant increase in misdiagnoses, making the classifier unreliable when used in clinical practice. With these results, we therefore aim to raise awareness of the risks of using black box machine learning models trained on potentially biased datasets. Finally, we present a model-agnostic method to neutralise shortcut learning by removing the bias in the training dataset: coloured patches are replaced with benign skin tissue using image inpainting, and the classifier is re-trained on this de-biased dataset.
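The insertion half of the test described above can be sketched as follows: paint an elliptical patch into each malignant image and compare the share of malignant images predicted "benign" before and after. This is a minimal illustration, not the authors' implementation. The nested-list image representation, the patch colour, and `shortcut_classifier` (a deliberately extreme stand-in for a trained VGG16 model that looks only at the patch) are hypothetical, and the inpainting-based removal is not shown.

```python
# An RGB pixel is an (r, g, b) tuple; an image is a list of rows of pixels.
PATCH_COLOUR = (0, 0, 255)  # hypothetical calibration-patch colour


def insert_elliptical_patch(image, center, axes, colour=PATCH_COLOUR):
    """Return a copy of `image` with a filled, axis-aligned ellipse painted on it."""
    cy, cx = center
    ry, rx = axes
    out = [list(row) for row in image]
    for y, row in enumerate(out):
        for x in range(len(row)):
            if ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2 <= 1.0:
                row[x] = colour
    return out


def misdiagnosis_rates(predict, malignant_images, center, axes):
    """Share of malignant images predicted 'benign' before and after patch insertion."""
    n = len(malignant_images)
    before = sum(predict(img) == "benign" for img in malignant_images) / n
    after = sum(
        predict(insert_elliptical_patch(img, center, axes)) == "benign"
        for img in malignant_images
    ) / n
    return before, after


def shortcut_classifier(image):
    """Hypothetical shortcut learner: 'benign' whenever at least 5% of pixels
    have the patch colour, 'malignant' otherwise."""
    pixels = [p for row in image for p in row]
    share = sum(p == PATCH_COLOUR for p in pixels) / len(pixels)
    return "benign" if share >= 0.05 else "malignant"


if __name__ == "__main__":
    skin = (200, 150, 130)
    lesion = [[skin] * 8 for _ in range(8)]  # toy 8x8 "malignant" image
    print(misdiagnosis_rates(shortcut_classifier, [lesion], center=(4, 4), axes=(2, 2)))
```

A large gap between the two rates signals that predictions depend on the patch rather than on the lesion; a real classifier that only partly relies on the shortcut would show a smaller, but still measurable, increase.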


2021 ◽  
Vol 10 (18) ◽  
pp. 4245
Author(s):  
Jörn Lötsch ◽  
Constantin A. Hintschich ◽  
Petros Petridis ◽  
Jürgen Pade ◽  
Thomas Hummel

Chronic rhinosinusitis (CRS) is often treated by functional endoscopic paranasal sinus surgery, which improves endoscopic parameters and quality of life; olfactory function has been suggested as a further criterion of treatment success. In a prospective cohort study, 37 parameters from four categories were recorded from 60 men and 98 women before and four months after endoscopic sinus surgery, including endoscopic measures of nasal anatomy/pathology, assessments of olfactory function, quality of life, and socio-demographic or concomitant conditions. Parameters containing relevant information about changes associated with surgery were examined using unsupervised and supervised methods, including machine-learning techniques for feature selection. The analyzed cohort included 52 men and 38 women. Changes in the endoscopic Lildholdt score allowed baseline data to be separated from postoperative data with a cross-validated accuracy of 85%. Further relevant information included primary nasal symptoms from SNOT-20 assessments and self-assessments of olfactory function. Overall improvement in these relevant parameters was observed in 95% of patients. A ranked list of criteria was developed as a proposal for assessing the outcome of functional endoscopic sinus surgery in CRS patients with nasal polyposis. The criteria capture three facets: the Lildholdt score as an endoscopic measure, disease-specific quality of life, and subjectively perceived olfactory function.
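The abstract does not say which classifier produced the 85% cross-validated accuracy, so the sketch below only illustrates what "cross-validated accuracy" on a single feature means: a one-feature threshold rule is fitted on k-1 folds and scored on the held-out fold, and each sample is tested exactly once. The threshold rule, fold count, seed, and toy scores are assumptions, not the study's method or data.

```python
import random


def predict(x, thr, direction):
    """direction=1: values above thr are class 1; direction=-1: values below thr are class 1."""
    return 1 if direction * (x - thr) > 0 else 0


def fit_threshold(xs, ys):
    """Pick the cut point (a midpoint between sorted unique values) and direction
    that maximize training accuracy. Label 1 = postoperative, 0 = baseline."""
    vals = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
    best_acc, best = -1.0, None
    for thr in candidates:
        for direction in (1, -1):
            acc = sum(predict(x, thr, direction) == y for x, y in zip(xs, ys)) / len(ys)
            if acc > best_acc:
                best_acc, best = acc, (thr, direction)
    return best


def k_fold_accuracy(xs, ys, k=5, seed=0):
    """Accuracy pooled over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        thr, d = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(xs[i], thr, d) == ys[i] for i in test)
    return correct / len(xs)


if __name__ == "__main__":
    # Hypothetical endoscopic scores: baseline visits first, postoperative visits second.
    baseline = [4, 5, 6, 3, 5, 4, 2, 6, 5, 4]
    post = [0, 1, 2, 0, 3, 1, 0, 2, 1, 4]
    xs, ys = baseline + post, [0] * 10 + [1] * 10
    print(k_fold_accuracy(xs, ys))
```

Fitting the threshold only on the training folds is what makes the estimate honest: choosing it on all the data and then scoring the same data would overstate accuracy.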


Author(s):  
Alex Freitas ◽  
André C.P.L.F. de Carvalho

In machine learning and data mining, most work on classification deals with flat classification, where each instance is assigned to one of a set of possible classes and there is no hierarchical relationship between the classes. There are, however, more complex classification problems in which the classes to be predicted are hierarchically related. This chapter presents a tutorial on the hierarchical classification techniques found in the literature. We also discuss how hierarchical classification techniques have been applied in bioinformatics (particularly to the prediction of protein function), where hierarchical classification problems are common.
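One widely used family of hierarchical methods is the top-down approach, with a local classifier at each parent node that picks one child; the sketch below shows that approach only and is not necessarily the formulation used in the chapter. The class names, the made-up features `f0` and `f1`, and the threshold rules standing in for trained local classifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """A class in the hierarchy. Internal nodes carry a local classifier that
    picks one of their children; leaves carry none."""
    name: str
    choose_child: Optional[Callable[[dict], str]] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def predict_top_down(root: Node, x: dict) -> List[str]:
    """Descend from the root, letting each local classifier choose the next class.
    The returned path is consistent with the hierarchy by construction."""
    path, node = [], root
    while node.children:
        node = node.children[node.choose_child(x)]
        path.append(node.name)
    return path


# Hypothetical two-level hierarchy with threshold rules on made-up features f0, f1.
TREE = Node(
    "root",
    choose_child=lambda x: "enzyme" if x["f0"] > 0.5 else "non_enzyme",
    children={
        "enzyme": Node(
            "enzyme",
            choose_child=lambda x: "hydrolase" if x["f1"] > 0.5 else "transferase",
            children={"hydrolase": Node("hydrolase"), "transferase": Node("transferase")},
        ),
        "non_enzyme": Node("non_enzyme"),
    },
)

if __name__ == "__main__":
    print(predict_top_down(TREE, {"f0": 0.9, "f1": 0.2}))
```

The design guarantees that a predicted subclass is always a child of the predicted parent class, which flat classification does not; the trade-off is that a mistake at an upper level cannot be corrected further down.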

