Big Data and Cognitive Computing

An Efficient Multi-Scale Anchor Box Approach to Detect Partial Faces from a Video Sequence

Big Data and Cognitive Computing ◽

10.3390/bdcc6010009 ◽

2022 ◽

Vol 6 (1) ◽

pp. 9

Author(s):

Dweepna Garg ◽

Priyanka Jain ◽

Ketan Kotecha ◽

Parth Goel ◽

Vijayakumar Varadarajan

Keyword(s):

Deep Learning ◽

Face Detection ◽

Machine Learning Techniques ◽

Face Shape ◽

Detection Algorithms ◽

Accuracy And Precision ◽

Learning Techniques ◽

Proposed Model ◽

The Face ◽

Scale Anchor

In recent years, face detection has achieved considerable attention in the field of computer vision using traditional machine learning techniques and deep learning techniques. Deep learning is used to build the most recent and powerful face detection algorithms. However, partial face detection still remains to achieve remarkable performance. Partial faces are occluded due to hair, hat, glasses, hands, mobile phones, and side-angle-captured images. Fewer facial features can be identified from such images. In this paper, we present a deep convolutional neural network face detection method using the anchor boxes section strategy. We limited the number of anchor boxes and scales and chose only relevant to the face shape. The proposed model was trained and tested on a popular and challenging face detection benchmark dataset, i.e., Face Detection Dataset and Benchmark (FDDB), and can also detect partially covered faces with better accuracy and precision. Extensive experiments were performed, with evaluation metrics including accuracy, precision, recall, F1 score, inference time, and FPS. The results show that the proposed model is able to detect the face in the image, including occluded features, more precisely than other state-of-the-art approaches, achieving 94.8% accuracy and 98.7% precision on the FDDB dataset at 21 frames per second (FPS).

Download Full-text

Infusing Autopoietic and Cognitive Behaviors into Digital Automata to Improve Their Sentience, Resilience, and Intelligence

Big Data and Cognitive Computing ◽

10.3390/bdcc6010007 ◽

2022 ◽

Vol 6 (1) ◽

pp. 7

Author(s):

Rao Mikkilineni

Keyword(s):

General Theory ◽

Computer Science ◽

Common Knowledge ◽

Physical World ◽

Cognitive Behaviors ◽

Symbolic Computing ◽

New Science ◽

Theory Of Information ◽

Living Organisms ◽

Classical Computer

All living beings use autopoiesis and cognition to manage their “life” processes from birth through death. Autopoiesis enables them to use the specification in their genomes to instantiate themselves using matter and energy transformations. They reproduce, replicate, and manage their stability. Cognition allows them to process information into knowledge and use it to manage its interactions between various constituent parts within the system and its interaction with the environment. Currently, various attempts are underway to make modern computers mimic the resilience and intelligence of living beings using symbolic and sub-symbolic computing. We discuss here the limitations of classical computer science for implementing autopoietic and cognitive behaviors in digital machines. We propose a new architecture applying the general theory of information (GTI) and pave the path to make digital automata mimic living organisms by exhibiting autopoiesis and cognitive behaviors. The new science, based on GTI, asserts that information is a fundamental constituent of the physical world and that living beings convert information into knowledge using physical structures that use matter and energy. Our proposal uses the tools derived from GTI to provide a common knowledge representation from existing symbolic and sub-symbolic computing structures to implement autopoiesis and cognitive behaviors.

Download Full-text

An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

Big Data and Cognitive Computing ◽

10.3390/bdcc6010008 ◽

2022 ◽

Vol 6 (1) ◽

pp. 8

Author(s):

Roberta Rodrigues de Lima ◽

Anita M. R. Fernandes ◽

James Roberto Bombasar ◽

Bruno Alves da Silva ◽

Paul Crocker ◽

...

Keyword(s):

International Trade ◽

Great Promise ◽

Classification Problems ◽

Empirical Comparison ◽

Legal Implications ◽

Supervised Learning Algorithms ◽

Import And Export ◽

Correct Category ◽

Real Challenge

Classification problems are common activities in many different domains and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the complexity involved in assigning the correct category codes to a good, especially considering the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for tax classification of goods with MCN codes which are the official classification system for import and export products in Brazil. In particular, this article presents results from using a specific Portuguese-language-pretrained BERT model, as well as results from using a multilingual-pretrained BERT model. Experimental results show that Portuguese model had a slightly better performance than the multilingual model, achieving an MCC 0.8491, and confirms that the classifiers could be used to improve specialists’ performance in the classification of goods.

Download Full-text

On Developing Generic Models for Predicting Student Outcomes in Educational Data Mining

Big Data and Cognitive Computing ◽

10.3390/bdcc6010006 ◽

2022 ◽

Vol 6 (1) ◽

pp. 6

Author(s):

Gomathy Ramaswami ◽

Teo Susnjak ◽

Anuradha Mathrani

Keyword(s):

At Risk ◽

Predictive Model ◽

Operating Characteristic ◽

Prediction Models ◽

Characteristic Curve ◽

At Risk Students ◽

Educational Data Mining ◽

Generic Model ◽

Generic Models ◽

Excellent Candidate

Poor academic performance of students is a concern in the educational sector, especially if it leads to students being unable to meet minimum course requirements. However, with timely prediction of students’ performance, educators can detect at-risk students, thereby enabling early interventions for supporting these students in overcoming their learning difficulties. However, the majority of studies have taken the approach of developing individual models that target a single course while developing prediction models. These models are tailored to specific attributes of each course amongst a very diverse set of possibilities. While this approach can yield accurate models in some instances, this strategy is associated with limitations. In many cases, overfitting can take place when course data is small or when new courses are devised. Additionally, maintaining a large suite of models per course is a significant overhead. This issue can be tackled by developing a generic and course-agnostic predictive model that captures more abstract patterns and is able to operate across all courses, irrespective of their differences. This study demonstrates how a generic predictive model can be developed that identifies at-risk students across a wide variety of courses. Experiments were conducted using a range of algorithms, with the generic model producing an effective accuracy. The findings showed that the CatBoost algorithm performed the best on our dataset across the F-measure, ROC (receiver operating characteristic) curve and AUC scores; therefore, it is an excellent candidate algorithm for providing solutions on this domain given its capabilities to seamlessly handle categorical and missing data, which is frequently a feature in educational datasets.

Download Full-text

A Hierarchical Hadoop Framework to Process Geo-Distributed Big Data

Big Data and Cognitive Computing ◽

10.3390/bdcc6010005 ◽

2022 ◽

Vol 6 (1) ◽

pp. 5

Author(s):

Giuseppe Di Modica ◽

Orazio Tomarchio

Keyword(s):

Big Data ◽

High Performance ◽

Small Scale ◽

Test Results ◽

Computing Power ◽

The Past ◽

Broadband Network ◽

Geographically Distributed ◽

Mapreduce Paradigm ◽

Hadoop Framework

In the past twenty years, we have witnessed an unprecedented production of data worldwide that has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to efficiently and quickly obtain insights on such a Big Data. State-of-the-art parallel computing techniques such as the MapReduce guarantee high performance in scenarios where involved computing nodes are equally sized and clustered via broadband network links, and the data are co-located with the cluster of nodes. Unfortunately, the mentioned techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are geographically distributed across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that obtain awareness of the constraints imposed in those scenarios (such as the imbalance of nodes computing power and of interconnecting links) to enforce smart task scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the potential of the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we advocate the opportunity of fragmenting the data in a smart way so that the scheduler produces a fairer distribution of the workload among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. Test results are discussed in the last part of the paper.

Download Full-text

Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals

Big Data and Cognitive Computing ◽

10.3390/bdcc6010004 ◽

2022 ◽

Vol 6 (1) ◽

pp. 4

Author(s):

Dmitry Soshnikov ◽

Tatiana Petrova ◽

Vickie Soshnikova ◽

Andrey Grunin

Keyword(s):

Artificial Intelligence ◽

Named Entity Recognition ◽

Treatment Strategies ◽

Main Idea ◽

Signs And Symptoms ◽

Entity Recognition ◽

Text Corpus ◽

Scientific Papers ◽

Fast Processing ◽

Structured Information

Since the beginning of the COVID-19 pandemic almost two years ago, there have been more than 700,000 scientific papers published on the subject. An individual researcher cannot possibly get acquainted with such a huge text corpus and, therefore, some help from artificial intelligence (AI) is highly needed. We propose the AI-based tool to help researchers navigate the medical papers collections in a meaningful way and extract some knowledge from scientific COVID-19 papers. The main idea of our approach is to get as much semi-structured information from text corpus as possible, using named entity recognition (NER) with a model called PubMedBERT and Text Analytics for Health service, then store the data into NoSQL database for further fast processing and insights generation. Additionally, the contexts in which the entities were used (neutral or negative) are determined. Application of NLP and text-based emotion detection (TBED) methods to COVID-19 text corpus allows us to gain insights on important issues of diagnosis and treatment (such as changes in medical treatment over time, joint treatment strategies using several medications, and the connection between signs and symptoms of coronavirus, etc.).

Download Full-text

Analyzing Political Polarization on Social Media by Deleting Bot Spamming

Big Data and Cognitive Computing ◽

10.3390/bdcc6010003 ◽

2022 ◽

Vol 6 (1) ◽

pp. 3

Author(s):

Riccardo Cantini ◽

Fabrizio Marozzo ◽

Domenico Talia ◽

Paolo Trunfio

Keyword(s):

Social Media ◽

Presidential Election ◽

Opinion Mining ◽

Political Support ◽

Political Polarization ◽

Political Issues ◽

Political Factions ◽

Us Presidential Election ◽

Twitter Users ◽

Rich Data

Social media platforms are part of everyday life, allowing the interconnection of people around the world in large discussion groups relating to every topic, including important social or political issues. Therefore, social media have become a valuable source of information-rich data, commonly referred to as Social Big Data, effectively exploitable to study the behavior of people, their opinions, moods, interests and activities. However, these powerful communication platforms can be also used to manipulate conversation, polluting online content and altering the popularity of users, through spamming activities and misinformation spreading. Recent studies have shown the use on social media of automatic entities, defined as social bots, that appear as legitimate users by imitating human behavior aimed at influencing discussions of any kind, including political issues. In this paper we present a new methodology, namely TIMBRE (Time-aware opInion Mining via Bot REmoval), aimed at discovering the polarity of social media users during election campaigns characterized by the rivalry of political factions. This methodology is temporally aware and relies on a keyword-based classification of posts and users. Moreover, it recognizes and filters out data produced by social media bots, which aim to alter public opinion about political candidates, thus avoiding heavily biased information. The proposed methodology has been applied to a case study that analyzes the polarization of a large number of Twitter users during the 2016 US presidential election. The achieved results show the benefits brought by both removing bots and taking into account temporal aspects in the forecasting process, revealing the high accuracy and effectiveness of the proposed approach. Finally, we investigated how the presence of social bots may affect political discussion by studying the 2016 US presidential election. Specifically, we analyzed the main differences between human and artificial political support, estimating also the influence of social bots on legitimate users.

Download Full-text

Early Diagnosis of Alzheimer’s Disease Using Cerebral Catheter Angiogram Neuroimaging: A Novel Model Based on Deep Learning Approaches

Big Data and Cognitive Computing ◽

10.3390/bdcc6010002 ◽

2021 ◽

Vol 6 (1) ◽

pp. 2

Author(s):

Maha Gharaibeh ◽

Mothanna Almahmoud ◽

Mustafa Ali ◽

Amer Al-Badarneh ◽

Mwaffaq El-Heis ◽

...

Keyword(s):

Alzheimer’S Disease ◽

Alzheimer's Disease ◽

Blood Flow ◽

Cerebral Blood Flow ◽

Early Diagnosis ◽

Real Space ◽

Learning Approaches ◽

Neural Structure ◽

Novel Approach ◽

Multiple Frames

Neuroimaging refers to the techniques that provide efficient information about the neural structure of the human brain, which is utilized for diagnosis, treatment, and scientific research. The problem of classifying neuroimages is one of the most important steps that are needed by medical staff to diagnose their patients early by investigating the indicators of different neuroimaging types. Early diagnosis of Alzheimer’s disease is of great importance in preventing the deterioration of the patient’s situation. In this research, a novel approach was devised based on a digital subtracted angiogram scan that provides sufficient features of a new biomarker cerebral blood flow. The used dataset was acquired from the database of K.A.U.H hospital and contains digital subtracted angiograms of participants who were diagnosed with Alzheimer’s disease, besides samples of normal controls. Since each scan included multiple frames for the left and right ICA’s, pre-processing steps were applied to make the dataset prepared for the next stages of feature extraction and classification. The multiple frames of scans transformed from real space into DCT space and averaged to remove noises. Then, the averaged image was transformed back to the real space, and both sides filtered with Meijering and concatenated in a single image. The proposed model extracts the features using different pre-trained models: InceptionV3 and DenseNet201. Then, the PCA method was utilized to select the features with 0.99 explained variance ratio, where the combination of selected features from both pre-trained models is fed into machine learning classifiers. Overall, the obtained experimental results are at least as good as other state-of-the-art approaches in the literature and more efficient according to the recent medical standards with a 99.14% level of accuracy, considering the difference in dataset samples and the used cerebral blood flow biomarker.

Download Full-text

AGR4BS: A Generic Multi-Agent Organizational Model for Blockchain Systems

Big Data and Cognitive Computing ◽

10.3390/bdcc6010001 ◽

2021 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Hector Roussille ◽

Önder Gürcan ◽

Fabien Michel

Keyword(s):

Social Systems ◽

Generic Model ◽

Multi Agent Systems ◽

Dynamic Interactions ◽

The Public ◽

Agent Systems ◽

Unified Field ◽

Multi Agent ◽

The Rich ◽

Agent Group

Blockchain is a very attractive technology since it maintains a public, append-only, immutable and ordered log of transactions which guarantees an auditable ledger accessible by anyone. Blockchain systems are inherently interdisciplinary since they combine various fields such as cryptography, multi-agent systems, distributed systems, social systems, economy, and finance. Furthermore, they have a very active and dynamic ecosystem where new blockchain platforms and algorithms are developed continuously due to the interest of the public and the industries to the technology. Consequently, we anticipate a challenging and interdisciplinary research agenda in blockchain systems, built upon a methodology that strives to capture the rich process resulting from the interplay between the behavior of agents and the dynamic interactions among them. To be effective, however, modeling studies providing insights into blockchain systems, and appropriate description of agents paired with a generic understanding of their components are needed. Such studies will create a more unified field of blockchain systems that advances our understanding and leads to further insight. According to this perspective, in this study, we propose using a generic multi-agent organizational modeling for studying blockchain systems, namely AGR4BS. Concretely, we use the Agent/Group/Role (AGR) organizational modeling approach to identify and represent the generic entities which are common to blockchain systems. We show through four real case studies how this generic model can be used to model different blockchain systems. We also show briefly how it can be used for modeling three well-known attacks on blockchain systems.

Download Full-text

Clustering Algorithm to Measure Student Assessment Accuracy: A Double Study

Big Data and Cognitive Computing ◽

10.3390/bdcc5040081 ◽

2021 ◽

Vol 5 (4) ◽

pp. 81

Author(s):

Sónia Rolland Sobral ◽

Catarina Félix de Oliveira

Keyword(s):

Clustering Algorithm ◽

Student Assessment ◽

Academic Learning ◽

First Year ◽

Computer Engineering ◽

Self Assessment ◽

Final Grade ◽

Self Evaluation ◽

Active Teaching ◽

The Difference

Self-assessment is one of the strategies used in active teaching to engage students in the entire learning process, in the form of self-regulated academic learning. This study aims to assess the possibility of including self-evaluation in the student’s final grade, not just as a self-assessment that allows students to predict the grade obtained but also as something to weigh on the final grade. Two different curricular units are used, both from the first year of graduation, one from the international relations course (N = 29) and the other from the computer science and computer engineering courses (N = 50). Students were asked to self-assess at each of the two evaluation moments of each unit, after submitting their work/test and after knowing the correct answers. This study uses statistical analysis as well as a clustering algorithm (K-means) on the data to try to gain deeper knowledge and visual insights into the data and the patterns among them. It was verified that there are no differences between the obtained grade and the thought grade by gender and age variables, but a direct correlation was found between the thought grade averages and the grade level. The difference is less accentuated at the second moment of evaluation—which suggests that an improvement in the self-assessment skill occurs from the first to the second evaluation moment.

Download Full-text

Big Data and Cognitive Computing
Latest Publications

TOTAL DOCUMENTS

H-INDEX

Published By Mdpi Ag

An Efficient Multi-Scale Anchor Box Approach to Detect Partial Faces from a Video Sequence

Infusing Autopoietic and Cognitive Behaviors into Digital Automata to Improve Their Sentience, Resilience, and Intelligence

An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

On Developing Generic Models for Predicting Student Outcomes in Educational Data Mining

A Hierarchical Hadoop Framework to Process Geo-Distributed Big Data

Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals

Analyzing Political Polarization on Social Media by Deleting Bot Spamming

Early Diagnosis of Alzheimer’s Disease Using Cerebral Catheter Angiogram Neuroimaging: A Novel Model Based on Deep Learning Approaches

AGR4BS: A Generic Multi-Agent Organizational Model for Blockchain Systems

Clustering Algorithm to Measure Student Assessment Accuracy: A Double Study

Export Citation Format

Big Data and Cognitive ComputingLatest Publications

TOTAL DOCUMENTS

H-INDEX

Published By Mdpi Ag

An Efficient Multi-Scale Anchor Box Approach to Detect Partial Faces from a Video Sequence

Infusing Autopoietic and Cognitive Behaviors into Digital Automata to Improve Their Sentience, Resilience, and Intelligence

An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

On Developing Generic Models for Predicting Student Outcomes in Educational Data Mining

A Hierarchical Hadoop Framework to Process Geo-Distributed Big Data

Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals

Analyzing Political Polarization on Social Media by Deleting Bot Spamming

Early Diagnosis of Alzheimer’s Disease Using Cerebral Catheter Angiogram Neuroimaging: A Novel Model Based on Deep Learning Approaches

AGR4BS: A Generic Multi-Agent Organizational Model for Blockchain Systems

Clustering Algorithm to Measure Student Assessment Accuracy: A Double Study

Big Data and Cognitive Computing
Latest Publications