Quality of sentiment analysis tools

In this paper, we present a comprehensive study that evaluates six state-of-the-art sentiment analysis tools on five public datasets, based on the quality of predictive results in the presence of semantically equivalent documents, i.e., how consistent existing tools are in predicting the polarity of documents based on paraphrased text. We observe that sentiment analysis tools exhibit intra-tool inconsistency, which is the prediction of different polarity for semantically equivalent documents by the same tool, and inter-tool inconsistency, which is the prediction of different polarity for semantically equivalent documents across different tools. We introduce a heuristic to assess the data quality of an augmented dataset and a new set of metrics to evaluate tool inconsistencies. Our results indicate that tool inconsistency is still an open problem, and they point towards promising research directions and accuracy improvements that can be obtained if such inconsistencies are resolved.
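The two kinds of inconsistency defined in this abstract can be illustrated with a small sketch. The abstract does not give the paper's actual metrics, so the two rates below are simplified stand-ins, and the function names, the label strings and the input layout are all assumptions made for illustration only.

```python
def intra_tool_inconsistency(groups):
    """Share of paraphrase groups for which ONE tool predicts more than one polarity.

    groups: list of lists; each inner list holds the labels that a single tool
    assigned to documents assumed to be semantically equivalent.
    """
    if not groups:
        return 0.0
    inconsistent = sum(1 for labels in groups if len(set(labels)) > 1)
    return inconsistent / len(groups)


def inter_tool_inconsistency(predictions_by_tool):
    """Share of documents on which at least two tools disagree.

    predictions_by_tool: dict mapping a (hypothetical) tool name to its list of
    labels, with all lists aligned by document. Comparing tools on the same
    document is the simplest case of "semantically equivalent" input.
    """
    per_document = list(zip(*predictions_by_tool.values()))
    if not per_document:
        return 0.0
    disagreements = sum(1 for labels in per_document if len(set(labels)) > 1)
    return disagreements / len(per_document)


# Toy data, invented for the example.
one_tool_groups = [
    ["pos", "pos"],         # consistent
    ["pos", "neg"],         # inconsistent
    ["neg", "neg", "neg"],  # consistent
    ["neu", "pos"],         # inconsistent
]
two_tools = {
    "tool_a": ["pos", "neg", "pos"],
    "tool_b": ["pos", "pos", "pos"],
}

intra_rate = intra_tool_inconsistency(one_tool_groups)
inter_rate = inter_tool_inconsistency(two_tools)
```

A group-level rate was chosen here because it is easy to read; a pairwise rate over all document pairs in a group would weight large paraphrase groups more heavily.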

Research directions in data wrangling: Visualizations and transformations for usable and credible data

Information Visualization ◽

10.1177/1473871611415994 ◽

2011 ◽

Vol 10 (4) ◽

pp. 271-288 ◽

Cited By ~ 121

Author(s):

Sean Kandel ◽

Jeffrey Heer ◽

Catherine Plaisant ◽

Jessie Kennedy ◽

Frank van Ham ◽

...

Keyword(s):

Data Quality ◽

State Of The Art ◽

Interactive Systems ◽

Research Directions ◽

Data Transformations ◽

Challenges And Opportunities ◽

Interactive Visualizations ◽

Research Questions ◽

Quality Issues ◽

Integrate Data

In spite of advances in technologies for working with data, analysts still spend an inordinate amount of time diagnosing data quality issues and manipulating data into a usable form. This process of ‘data wrangling’ often constitutes the most tedious and time-consuming aspect of analysis. Though data cleaning and integration are longstanding issues in the database community, relatively little research has explored how interactive visualization can advance the state of the art. In this article, we review the challenges and opportunities associated with addressing data quality issues. We argue that analysts might more effectively wrangle data through new interactive systems that integrate data verification, transformation, and visualization. We identify a number of outstanding research questions, including how appropriate visual encodings can facilitate apprehension of missing data, discrepant values, and uncertainty; how interactive visualizations might facilitate data transform specification; and how recorded provenance and social interaction might enable wider reuse, verification, and modification of data transformations.

Calibration of CO, NO2, and O3 Using Airify: A Low-Cost Sensor Cluster for Air Quality Monitoring

Sensors ◽

10.3390/s21237977 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7977

Author(s):

Marian-Emanuel Ionascu ◽

Nuria Castell ◽

Oana Boncalo ◽

Philipp Schneider ◽

Marius Darie ◽

...

Keyword(s):

Air Quality ◽

Data Quality ◽

State Of The Art ◽

Low Cost ◽

Unit Cost ◽

Measured Data ◽

Quality Monitoring ◽

Air Quality Monitoring ◽

Similar Accuracy

During the last decade, extensive research has been carried out on the subject of low-cost sensor platforms for air quality monitoring. A key aspect when deploying such systems is the quality of the measured data. Calibration is especially important to improve the data quality of low-cost air monitoring devices. The measured data quality must comply with regulations issued by national or international authorities in order to be used for regulatory purposes. This work discusses the challenges and methods suitable for calibrating a low-cost sensor platform developed by our group, Airify, that has a unit cost five times less expensive than the state-of-the-art solutions (approximately €1000). The evaluated platform can integrate a wide variety of sensors capable of measuring up to 12 parameters, including the regulatory pollutants defined in the European Directive. In this work, we developed new calibration models (multivariate linear regression and random forest) and evaluated their effectiveness in meeting the data quality objective (DQO) for the following parameters: carbon monoxide (CO), ozone (O3), and nitrogen dioxide (NO2). The experimental results show that the proposed calibration achieved an improvement of 12% for the CO and O3 gases and a similar accuracy for the NO2 gas compared to similar state-of-the-art studies. The evaluated parameters had different calibration accuracies because the sensors were exposed to non-identical levels of gas concentration during the model’s training phase. After the calibration algorithms were applied to the evaluated platform, its performance met the DQO criteria despite the overall low price level of the platform.
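As a rough illustration of what a multivariate linear regression calibration involves, the sketch below fits reference concentrations as a linear function of raw sensor features by ordinary least squares. It is not the authors' model: the choice of features (a raw signal and a temperature), the helper names and the synthetic numbers are all assumptions, and the random forest variant mentioned in the abstract is not shown.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left unmodified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_linear_calibration(features, reference):
    """Least-squares fit of reference = b0 + b1*f1 + b2*f2 + ... (normal equations)."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 is the intercept term
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, reference)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def apply_calibration(coeffs, feature_row):
    """Map one row of raw features to a calibrated concentration estimate."""
    return coeffs[0] + sum(c * f for c, f in zip(coeffs[1:], feature_row))


# Synthetic, noise-free data invented for the example:
# each row is (raw sensor signal, temperature in degrees C).
features = [(0.10, 15.0), (0.25, 18.0), (0.40, 22.0),
            (0.55, 19.0), (0.70, 25.0), (0.85, 21.0)]
reference = [0.2 + 1.5 * s - 0.01 * t for s, t in features]

coeffs = fit_linear_calibration(features, reference)
estimate = apply_calibration(coeffs, (0.50, 20.0))
```

With real co-location data the reference values would come from a regulatory-grade instrument and the fit would be evaluated on held-out measurements; the noise-free data here only shows that the fit recovers the coefficients used to generate it.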

Automatic Signature Verification on Handheld Devices

Multimodality in Mobile Computing and Mobile Devices ◽

10.4018/978-1-60566-978-6.ch014 ◽

2010 ◽

pp. 321-338

Author(s):

Marcos Martinez-Diaz ◽

Julian Fierrez ◽

Javier Ortega-Garcia

Keyword(s):

State Of The Art ◽

Multimodal Interfaces ◽

Signature Verification ◽

Handheld Devices ◽

Research Directions ◽

Verification System ◽

Corporate Environments ◽

Available Resources

Automatic signature verification on handheld devices can be seen as a means to improve usability in consumer applications and a way to reduce costs in corporate environments. It can be easily integrated in touchscreen devices, for example, as a part of combined handwriting and keypad-based multimodal interfaces. In the last few decades, several approaches to the problem of signature verification have been proposed. However, most research has been carried out considering signatures captured with digitizing tables, in which the quality of the captured data is much higher than in handheld devices. Signature verification on handheld devices represents a new scenario both for researchers and vendors. In this chapter, we introduce automatic signature verification as a component of multimodal interfaces; we analyze the applications and challenges of signature verification and give an overview of available resources and research directions. A case study is also given, in which a state-of-the-art signature verification system adapted to handheld devices is presented.

Cold-Start Aware Deep Memory Network for Multi-Entity Aspect-Based Sentiment Analysis

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/722 ◽

2019 ◽

Cited By ~ 1

Author(s):

Kaisong Song ◽

Wei Gao ◽

Lujun Zhao ◽

Jun Lin ◽

Changlong Sun ◽

...

Keyword(s):

Sentiment Analysis ◽

State Of The Art ◽

Cold Start ◽

Experimental Results ◽

Target Information ◽

Data Sparsity ◽

Memory Network ◽

Encoding Method ◽

Public Datasets ◽

Cold Start Problem

Various types of target information have been considered in aspect-based sentiment analysis, such as entities and aspects. Existing research has realized the importance of targets and developed methods with the goal of precisely modeling their contexts via generating target-specific representations. However, all these methods ignore that these representations cannot be learned well due to the lack of sufficient human-annotated target-related reviews, which leads to the data sparsity challenge, referred to here as the cold-start problem. In this paper, we focus on a more general multiple entity aspect-based sentiment analysis (ME-ABSA) task, which aims at identifying the sentiment polarity of different aspects of multiple entities in their context. Faced with a severe cold-start scenario, we develop a novel and extensible deep memory network framework with cold-start aware computational layers, which use a frequency-guided attention mechanism to emphasize the most related targets and then compose their representations into a complementary vector for enhancing the representations of cold-start entities and aspects. To verify the effectiveness of the framework, we instantiate it with a concrete context encoding method and then apply the model to the ME-ABSA task. Experiments conducted on two public datasets demonstrate that the proposed approach outperforms state-of-the-art baselines on the ME-ABSA task.

Survey of Post-OCR Processing Approaches

ACM Computing Surveys ◽

10.1145/3453476 ◽

2021 ◽

Vol 54 (6) ◽

pp. 1-37

Author(s):

Thi Tuyet Hai Nguyen ◽

Adam Jatowt ◽

Mickael Coustaty ◽

Doucet Antoine

Keyword(s):

Language Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

State Of The Art ◽

Current Trend ◽

Language Resources ◽

Research Directions ◽

Historical Materials ◽

Machine Readable

Optical character recognition (OCR) is one of the most popular techniques used for converting printed documents into machine-readable ones. While OCR engines can do well with modern text, their performance is unfortunately significantly reduced on historical materials. Additionally, many texts have already been processed by various out-of-date digitisation techniques. As a consequence, digitised texts are noisy and need to be post-corrected. This article clarifies the importance of enhancing the quality of OCR results by studying their effects on information retrieval and natural language processing applications. We then define the post-OCR processing problem, illustrate its typical pipeline, and review the state-of-the-art post-OCR processing approaches. Evaluation metrics, accessible datasets, language resources, and useful toolkits are also reported. Furthermore, the work identifies the current trend and outlines some research directions of this field.

The Influence Of The LED Luminaires Electrical Parameters On Their Correlated Colour Temperature During Operation Mode

Light & Engineering ◽

10.33383/2020-025 ◽

2020 ◽

pp. 89-96

Author(s):

Sergei S. Kapitonov ◽

Alexei S. Vinokurov ◽

Sergei V. Prytkov ◽

Sergei Yu. Grigorovich ◽

Anastasia V. Kapitonova ◽

...

Keyword(s):

Operation Mode ◽

Light Sources ◽

Electrical Parameters ◽

Colour Temperature ◽

Radiation Spectra ◽

Service Period ◽

Close Proximity ◽

Definition Of ◽

Comprehensive Study

The article describes the results of a comprehensive study aimed at improving the quality of LED luminaires and determining how their correlated colour temperature (CCT) changes in the course of operation. The dependences of CCT for LED luminaires with remote and close location of phosphor were obtained over 10 thousand hours of operation in different electric modes; the initial and final radiation spectra of the luminaires are compared; the variation of luminaire CCT over the service period claimed by the manufacturer is forecast using mathematical statistics methods; and the least favourable electric operation modes, those with the highest observed CCT variation, are identified. The results confirm that the CCT of LED luminaires varies during operation. A possible solution is to use higher-quality, and therefore more expensive, LEDs with phosphor in close proximity or LEDs with remote phosphor. The article may be of interest to both manufacturers and consumers of LED light sources and the lighting devices that use them.

Improving data quality of a trauma register

10.26226/morressier.58f5b02fd462b80296c9e0d7 ◽

2017 ◽

Author(s):

Estefania Rabaneda Romero

Keyword(s):

Data Quality

A Comprehensive Study on Sentiment Analysis Using Deep Forest

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i8.115123 ◽

2018 ◽

Vol 6 (8) ◽

pp. 115-123 ◽

Cited By ~ 1

Author(s):

Krishna Priya S ◽

Shaksham Kapoor ◽

Kavita S Oza ◽

R.K. Kamat

Keyword(s):

Sentiment Analysis ◽

Deep Forest ◽

Comprehensive Study

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.31232/osf.io/4pxq2 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Ferdinand Filip ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

State Of The Art ◽

Science Methods ◽

Learning Models ◽

Diverse Range ◽

Hybrid Machine ◽

Economics Research

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes: deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancement of sophisticated hybrid deep learning models.

Designing Information Product (IP) Maps On the Process of Data Processing and Academic Information

International Journal of New Media Technology ◽

10.31937/ijnmt.v4i1.534 ◽

2017 ◽

Vol 4 (1) ◽

pp. 25-31 ◽

Cited By ~ 1

Author(s):

Diana Effendi

Keyword(s):

Data Quality ◽

Data Management ◽

Information Management ◽

Information Quality ◽

Quality Data ◽

Management Approach ◽

Quality Of Data ◽

Information Product ◽

Academic Activities

The Information Product Approach (IP Approach) is an information management approach that can be used to manage product information and to analyze data quality. Organizations can use an IP-Map to facilitate knowledge management when collecting, storing, maintaining, and using data in an organized way. The management of academic activity data at X University does not yet use the IP Approach, and X University has not paid attention to managing the quality of its information. Until now, X University has been concerned only with the system applications that support the automation of data management in its academic processes. The IP-Map produced in this paper can serve as a basis for analyzing the quality of data and information. With the IP-Map, X University is expected to learn which parts of the process need improvement in data and information quality management. Index terms: IP Approach, IP-Map, information quality, data quality.
