Embedding Non-planar Graphs: Storage and Representation

Author(s):  
Ðorže Klisura

In this paper, we propose a convention for representing non-planar graphs and their least-crossing embeddings in a canonical way. We achieve this by using state-of-the-art tools such as canonical labelling of graphs, Nauty's Graph6 string, and combinatorial representations for planar graphs. To the best of our knowledge, this has not been done before. In addition, we implement the mentioned procedure in the SageMath language and compute embeddings for certain classes of cubic, vertex-transitive, and general graphs. Our main contribution is an extension of one of the graph data sets hosted on MathDataHub, and a step towards extending the SageMath codebase.
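To illustrate the canonical-labelling idea the abstract relies on, the sketch below computes a brute-force canonical form for tiny graphs in pure Python: the lexicographically smallest edge list over all vertex relabelings, so any two isomorphic graphs map to the same representative. Nauty's actual algorithm is far more sophisticated, and the edge lists here are hypothetical examples, not data from the paper.

```python
from itertools import permutations

def canonical_form(n, edges):
    """Brute-force canonical form of a small graph: the lexicographically
    smallest sorted edge list over all n! vertex relabelings. Nauty computes
    the same kind of invariant far more efficiently."""
    edge_set = {frozenset(e) for e in edges}
    best = None
    for perm in permutations(range(n)):
        relabeled = sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edge_set)
        if best is None or relabeled < best:
            best = relabeled
    return tuple(best)

# The same graph (a triangle with a pendant edge) written down with two
# different vertex orderings yields one canonical form:
g1 = canonical_form(4, [(0, 1), (0, 2), (0, 3), (1, 2)])
g2 = canonical_form(4, [(3, 2), (3, 1), (3, 0), (2, 1)])
assert g1 == g2  # isomorphic graphs share one canonical form
```

A canonical form like this is what makes a stored embedding reproducible: the same graph always hashes to the same key, regardless of how its vertices were labelled on input.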

Author(s):  
K Sobha Rani

Collaborative filtering suffers from the problems of data sparsity and cold start, which dramatically degrade recommendation performance. To help resolve these issues, we propose TrustSVD, a trust-based matrix factorization technique. By analyzing the social trust data from four real-world data sets, we conclude that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model. Hence, we build on top of a state-of-the-art recommendation algorithm, SVD++, which inherently involves the explicit and implicit influence of rated items, by further incorporating both the explicit and implicit influence of trusted users on the prediction of items for an active user. To our knowledge, the work reported is the first to extend SVD++ with social trust information. Experimental results on the four data sets demonstrate that our approach TrustSVD achieves better accuracy than ten other counterparts, and better handles the aforementioned issues.
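The predictor described, SVD++'s user factor (explicit factors plus implicit feedback from rated items) augmented with an implicit-influence term from trusted users, can be sketched as follows. This is a simplified reading of the TrustSVD rating prediction, not the authors' implementation, and the toy factor vectors in the usage example are hypothetical.

```python
import math

def predict(mu, b_u, b_j, q_j, p_u, y_items, w_trustees):
    """TrustSVD-style rating prediction: the SVD++ user factor (explicit
    factors p_u plus normalised implicit feedback y_i from rated items) is
    further augmented with normalised implicit factors w_v of trusted users."""
    f = len(q_j)
    user = list(p_u)
    if y_items:  # implicit influence of rated items (the SVD++ term)
        s = 1.0 / math.sqrt(len(y_items))
        for y in y_items:
            for k in range(f):
                user[k] += s * y[k]
    if w_trustees:  # implicit influence of trusted users (the trust term)
        s = 1.0 / math.sqrt(len(w_trustees))
        for w in w_trustees:
            for k in range(f):
                user[k] += s * w[k]
    return mu + b_u + b_j + sum(q_j[k] * user[k] for k in range(f))

# Hypothetical two-factor example: global mean 3.5, one rated item, one trustee.
r_hat = predict(3.5, 0.1, -0.2, [0.5, 0.5], [0.2, 0.1], [[0.1, 0.0]], [[0.0, 0.1]])
assert abs(r_hat - 3.65) < 1e-9
```

In training, the trust matrix is also factorized (trustees are predicted from the same user factors), which is what regularizes cold-start users who have trust links but few ratings.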


2021 ◽  
Vol 29 ◽  
pp. 115-124
Author(s):  
Xinlu Wang ◽  
Ahmed A.F. Saif ◽  
Dayou Liu ◽  
Yungang Zhu ◽  
Jon Atli Benediktsson

BACKGROUND: DNA sequence alignment is one of the most fundamental and important operations for identifying which gene family may contain a given sequence, and pattern matching for DNA sequences has been a fundamental issue in biomedical engineering, biotechnology, and health informatics. OBJECTIVE: To solve this problem, this study proposes an optimal multi-pattern matching algorithm with wildcards for DNA sequences. METHODS: The proposed method packs the patterns and a sliding window of text, and the window slides along the given packed text, matching against the stored packed patterns. RESULTS: Three data sets were used to test the performance of the proposed algorithm, which proved more efficient than its competitors because its operations are close to machine level. CONCLUSIONS: Theoretical analysis and experimental results both demonstrate that the proposed method outperforms state-of-the-art methods and is especially effective for DNA sequences.
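The pack-and-slide idea belongs to the family of bit-parallel matchers. As one minimal illustration of why such methods run close to machine level (a single shift and AND per text character advances every partial match at once), the sketch below shows a standard Shift-And matcher with wildcard support; it is not the authors' exact algorithm.

```python
def shift_and_wildcards(text, pattern, wildcard="?"):
    """Bit-parallel Shift-And matching: pattern positions are packed into
    the bits of a machine word. Bit i of masks[c] is set when the pattern
    may have character c at position i; a wildcard accepts any character.
    One shift and one AND per text character update all partial matches."""
    m = len(pattern)
    alphabet = set(text)
    masks = {c: 0 for c in alphabet}
    for i, p in enumerate(pattern):
        for c in alphabet:
            if p == wildcard or p == c:
                masks[c] |= 1 << i
    hits, state, accept = [], 0, 1 << (m - 1)
    for j, c in enumerate(text):
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(j - m + 1)  # start index of a completed match
    return hits

assert shift_and_wildcards("GATTACAGATA", "GA?") == [0, 7]
```

Extending this to multiple packed patterns amounts to concatenating their bit masks into one wider word, which is the packing step the abstract describes.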


2021 ◽  
Vol 11 (6) ◽  
pp. 2511
Author(s):  
Julian Hatwell ◽  
Mohamed Medhat Gaber ◽  
R. Muhammad Atif Azad

This research presents Gradient Boosted Tree High Importance Path Snippets (gbt-HIPS), a novel, heuristic method for explaining gradient boosted tree (GBT) classification models by extracting a single classification rule (CR) from the ensemble of decision trees that make up the GBT model. This CR contains the most statistically important boundary values of the input space as antecedent terms. The CR represents a hyper-rectangle of the input space inside which the GBT model is, very reliably, classifying all instances with the same class label as the explanandum instance. In a benchmark test using nine data sets and five competing state-of-the-art methods, gbt-HIPS offered the best trade-off between coverage (0.16–0.75) and precision (0.85–0.98). Unlike competing methods, gbt-HIPS is also demonstrably guarded against under- and over-fitting. A further distinguishing feature of our method is that, unlike much prior work, our explanations also provide counterfactual detail in accordance with widely accepted recommendations for what makes a good explanation.
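The rule-as-hyper-rectangle representation, and the coverage/precision trade-off it is scored on, can be illustrated with a minimal sketch; the rule bounds, data, and labels below are hypothetical, and the statistical extraction of the rule from the GBT ensemble is not reproduced.

```python
def covers(rule, x):
    """A classification rule as a hyper-rectangle: a mapping from feature
    index to (low, high) bounds. An instance is covered when every bounded
    feature falls inside its interval; unbounded features are unconstrained."""
    return all(lo <= x[f] <= hi for f, (lo, hi) in rule.items())

def coverage_precision(rule, X, y, label):
    """Coverage: fraction of all instances inside the rectangle.
    Precision: fraction of covered instances sharing the target label."""
    inside = [i for i, x in enumerate(X) if covers(rule, x)]
    cov = len(inside) / len(X)
    prec = sum(y[i] == label for i in inside) / len(inside) if inside else 0.0
    return cov, prec

rule = {0: (0.5, 2.0)}                # hypothetical antecedent: 0.5 <= x0 <= 2.0
X = [[0.1], [0.7], [1.5], [3.0]]
y = [0, 1, 1, 0]
cov, prec = coverage_precision(rule, X, y, label=1)
assert (cov, prec) == (0.5, 1.0)
```

The trade-off reported above (coverage 0.16–0.75 versus precision 0.85–0.98) is exactly this pair evaluated per explanation: a tighter rectangle raises precision but covers fewer instances.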


2021 ◽  
pp. 1-13
Author(s):  
Qingtian Zeng ◽  
Xishi Zhao ◽  
Xiaohui Hu ◽  
Hua Duan ◽  
Zhongying Zhao ◽  
...  

Word embeddings have been successfully applied in many natural language processing tasks due to their effectiveness. However, the state-of-the-art algorithms for learning word representations from large amounts of text documents ignore emotional information, which is a significant research problem that must be addressed. To solve the above problem, we propose an emotional word embedding (EWE) model for sentiment analysis in this paper. This method first applies pre-trained word vectors to represent document features using two different linear weighting methods. Then, the resulting document vectors are input to a classification model and used to train a text sentiment classifier, which is based on a neural network. In this way, the emotional polarity of the text is propagated into the word vectors. The experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model achieves superior performance on text sentiment prediction, text similarity calculation, and word emotional expression tasks compared to other state-of-the-art models.
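The first step described, turning pre-trained word vectors into document features by linear weighting, might look like the sketch below. The two-dimensional toy embeddings are hypothetical, and the paper's two specific weighting schemes are not reproduced; uniform weights give a plain average, while per-word weights (e.g. tf-idf) emphasise informative words.

```python
def document_vector(tokens, embeddings, weights=None):
    """Linear weighting of pre-trained word vectors into one document
    feature vector. Out-of-vocabulary tokens are skipped; the result is
    normalised by the total weight so documents of different lengths
    are comparable."""
    dim = len(next(iter(embeddings.values())))
    vec, total = [0.0] * dim, 0.0
    for t in tokens:
        if t not in embeddings:
            continue
        w = 1.0 if weights is None else weights.get(t, 1.0)
        total += w
        for k in range(dim):
            vec[k] += w * embeddings[t][k]
    return [v / total for v in vec] if total else vec

emb = {"good": [1.0, 0.0], "bad": [0.0, 1.0]}              # hypothetical vectors
assert document_vector(["good", "bad"], emb) == [0.5, 0.5]
assert document_vector(["good", "bad"], emb, {"good": 3.0}) == [0.75, 0.25]
```

Such document vectors are then the input to the neural sentiment classifier; backpropagating its loss into the word vectors is what injects emotional polarity into the embedding space.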


2021 ◽  
Vol 7 (2) ◽  
pp. 21
Author(s):  
Roland Perko ◽  
Manfred Klopschitz ◽  
Alexander Almer ◽  
Peter M. Roth

Many scientific studies deal with person counting and density estimation from single images. Recently, convolutional neural networks (CNNs) have been applied to these tasks. Even though better results are often reported, it is often unclear where the improvements come from and whether the proposed approaches would generalize. Thus, the main goal of this paper was to identify the critical aspects of these tasks and to show how they limit state-of-the-art approaches. Based on these findings, we show how to mitigate these limitations. To this end, we implemented a CNN-based baseline approach, which we extended to deal with the identified problems. These include the discovery of bias in the reference data sets, ambiguity in ground truth generation, and mismatching of evaluation metrics w.r.t. the training loss function. The experimental results show that our modifications allow for significantly outperforming the baseline in terms of the accuracy of person counts and density estimation. In this way, we get a deeper understanding of CNN-based person density estimation beyond the network architecture. Furthermore, our insights can help advance the field of person density estimation in general by highlighting current limitations in the evaluation protocols.
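One of the identified problems, ambiguity in ground-truth generation, stems from how point annotations are converted into density maps. A common construction (not necessarily the one the authors critique) places a normalised Gaussian at each annotated head position, so the map integrates to the person count; the kernel bandwidth sigma is exactly the kind of free choice that makes the ground truth ambiguous.

```python
import math

def density_map(points, h, w, sigma=1.0):
    """Ground-truth density map for crowd counting: a Gaussian kernel,
    normalised to unit mass over the image grid, is added at each
    annotated head position, so the map sums to the person count."""
    d = [[0.0] * w for _ in range(h)]
    for (py, px) in points:
        weights = {}
        for y in range(h):
            for x in range(w):
                weights[(y, x)] = math.exp(-((y - py) ** 2 + (x - px) ** 2)
                                           / (2 * sigma ** 2))
        total = sum(weights.values())  # normalise: each person contributes 1
        for (y, x), wgt in weights.items():
            d[y][x] += wgt / total
    return d

dm = density_map([(2, 2), (5, 5)], h=8, w=8)
count = sum(sum(row) for row in dm)
assert abs(count - 2.0) < 1e-9  # the map integrates to the person count
```

A counting network regressed against such maps inherits every choice made here, which is why a mismatch between the training loss (per-pixel map error) and the evaluation metric (count error) can distort comparisons.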


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Itziar Irigoien ◽  
Basilio Sierra ◽  
Concepción Arenas

In the problem of one-class classification (OCC), one of the classes, the target class, has to be distinguished from all other possible objects, considered as nontargets. This situation arises in many biomedical problems, for example, in diagnosis, image-based tumor recognition, or the analysis of electrocardiogram data. In this paper, an approach to OCC based on a typicality test is experimentally compared with reference state-of-the-art OCC techniques—Gaussian, mixture of Gaussians, naive Parzen, Parzen, and support vector data description—using biomedical data sets. We evaluate the ability of the procedures using twelve experimental data sets with not necessarily continuous data. As there are few benchmark data sets for one-class classification, all data sets considered in the evaluation have multiple classes. Each class in turn is considered as the target class, and the units in the other classes are considered as new units to be classified. The results of the comparison show the good performance of the typicality approach, which is available for high-dimensional data; it is worth mentioning that it can be used for any kind of data (continuous, discrete, or nominal), whereas the application of the state-of-the-art approaches is not straightforward when nominal variables are present.
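As a point of reference, one of the baseline techniques named above, the one-class Gaussian model, can be sketched in a few lines: fit a Gaussian on target-class data only, then accept new units whose density clears a threshold. The diagonal covariance and the worst-training-score threshold are simplifying assumptions of this sketch, not choices from the paper.

```python
import math

def fit_gaussian_occ(targets):
    """One-class Gaussian baseline: mean and (diagonal) variance
    estimated from target-class training data only."""
    n, dim = len(targets), len(targets[0])
    mean = [sum(x[k] for x in targets) / n for k in range(dim)]
    var = [max(sum((x[k] - mean[k]) ** 2 for x in targets) / n, 1e-12)
           for k in range(dim)]
    return mean, var

def log_density(x, mean, var):
    """Log-density of x under the fitted diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

# Accept a unit as a target when its log-density clears a threshold,
# here (simplistically) the worst score seen on the training data.
train = [[0.0, 0.1], [0.1, -0.1], [-0.1, 0.0], [0.05, 0.05]]
mean, var = fit_gaussian_occ(train)
threshold = min(log_density(x, mean, var) for x in train)
assert log_density([0.0, 0.0], mean, var) >= threshold   # near the targets
assert log_density([5.0, 5.0], mean, var) < threshold    # clear nontarget
```

The abstract's point stands out against this sketch: a density model like this needs continuous features, whereas the typicality test, being distance-based, applies to discrete and nominal data as well.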


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1260
Author(s):  
Savanna Denega Machado ◽  
João Elison da Rosa Tavares ◽  
Márcio Garcia Martins ◽  
Jorge Luis Victória Barbosa ◽  
Gabriel Villarrubia González ◽  
...  

New Internet of Things (IoT) applications are enabling the development of projects that help with monitoring people with different diseases in their daily lives. Alzheimer's is a disease that affects neurological functions, and since a cure or reversal of symptoms has not yet been discovered, patients need support to maintain maximum independence and security during this stage of life. IoT-based monitoring systems can support caregivers in monitoring people with Alzheimer's disease (AD). This paper presents an ontology-based computational model that receives physiological data from external IoT applications, allowing the identification of potentially dangerous behaviors in patients with AD. The main scientific contribution of this work is the specification of a model focusing on Alzheimer's disease that uses the analysis of context histories and context prediction and which, considering the state of the art, is the only one that uses the analysis of context histories to perform predictions. In this research, we also propose a simulator to generate activities of the daily life of patients, allowing the creation of data sets. These data sets were used to evaluate the contributions of the model and were generated according to the standardization of the ontology. The simulator generated 1026 scenarios applied to guide the predictions, which achieved an average accuracy of 97.44%. The experiments also yielded 20 relevant lessons on technological, medical, and methodological aspects, which are recorded in this article.


Author(s):  
Therese Rieckh ◽  
Jeremiah P. Sjoberg ◽  
Richard A. Anthes

We apply the three-cornered hat (3CH) method to estimate refractivity, bending angle, and specific humidity error variances for a number of data sets widely used in research and/or operations: radiosondes, radio occultation (COSMIC, COSMIC-2), NCEP global forecasts, and nine reanalyses. We use a large number and combinations of data sets to obtain insights into the impact of the error correlations among different data sets that affect 3CH estimates. Error correlations may be caused by actual correlations of errors, representativeness differences, or imperfect co-location of the data sets. We show that the 3CH method discriminates among the data sets and how error statistics of observations compare to state-of-the-art reanalyses and forecasts, as well as reanalyses that do not assimilate satellite data. We explore results for October and November 2006 and 2019 over different latitudinal regions and show error growth of the NCEP forecasts with time. Because of the importance of tropospheric water vapor to weather and climate, we compare error estimates of refractivity for dry and moist atmospheric conditions.
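The core 3CH identity estimates one data set's error variance from the three pairwise difference series, under the assumption the abstract highlights: that the errors of the three data sets are mutually uncorrelated. A minimal sketch, with the toy error series in the test constructed to be exactly uncorrelated:

```python
def three_cornered_hat(a, b, c):
    """3CH estimate of the error variance of data set a, assuming the
    errors of a, b and c are mutually uncorrelated:
    0.5 * (var(a-b) + var(a-c) - var(b-c)).
    Correlated errors bias this estimate, which is why the study uses
    many data-set combinations."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    ab = var([x - y for x, y in zip(a, b)])
    ac = var([x - y for x, y in zip(a, c)])
    bc = var([x - y for x, y in zip(b, c)])
    return 0.5 * (ab + ac - bc)

# Toy check: a common truth plus three mutually uncorrelated error series.
truth = [0.0, 1.0, 2.0, 3.0]
ea = [0.1, -0.1, 0.1, -0.1]    # var = 0.01, uncorrelated with eb and ec
eb = [0.2, 0.2, -0.2, -0.2]
ec = [-0.1, 0.1, 0.1, -0.1]
a = [t + e for t, e in zip(truth, ea)]
b = [t + e for t, e in zip(truth, eb)]
c = [t + e for t, e in zip(truth, ec)]
assert abs(three_cornered_hat(a, b, c) - 0.01) < 1e-12
```

Note that the common truth cancels in every difference series, which is what lets 3CH estimate error variances without a reference data set assumed to be perfect.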


2018 ◽  
Vol 14 (4) ◽  
pp. 423-437 ◽  
Author(s):  
David Prantl ◽  
Martin Prantl

Purpose – The purpose of this paper is to examine and verify the competitive intelligence tools Alexa and SimilarWeb, which are broadly used for website traffic data estimation. The tested tools belong to the state of the art in this area.
Design/methodology/approach – The authors use a quantitative approach. Research was conducted on a sample of Czech websites for which accurate traffic data values exist, against which the less accurate data sets provided by Alexa and SimilarWeb were compared.
Findings – The results show that neither tool can accurately determine the ranking of websites on the internet. However, it is possible to approximately determine the significance of a particular website. These results are useful for other research studies that use data from Alexa or SimilarWeb. Moreover, the results show that it is still not possible to accurately estimate the website traffic of any website in the world.
Research limitations/implications – The limitation of the research lies in the fact that it was conducted solely in the Czech market.
Originality/value – A significant number of research studies use data sets provided by Alexa and SimilarWeb. However, none of these studies focus on the quality of the website traffic data acquired by Alexa or SimilarWeb, nor do any of them refer to other studies that deal with this issue. Furthermore, the authors describe approaches to measuring website traffic and, based on the analysis, discuss the possible usability of these methods.


2018 ◽  
Vol 30 (4) ◽  
pp. 1080-1103 ◽  
Author(s):  
Kun Zhan ◽  
Jinhui Shi ◽  
Jing Wang ◽  
Haibo Wang ◽  
Yuange Xie

Most existing multiview clustering methods require that graph matrices in different views are computed beforehand and that each graph is obtained independently. However, this requirement ignores the correlation between multiple views. In this letter, we tackle the problem of multiview clustering by jointly optimizing the graph matrix to make full use of the data correlation between views. With the inter-view correlation, a concept factorization–based multiview clustering method is developed for data integration, and the adaptive method correlates the affinity weights of all views. This method differs from nonnegative matrix factorization–based clustering methods in that it is applicable to data sets containing negative values. Experiments are conducted to demonstrate the effectiveness of the proposed method in comparison with state-of-the-art approaches in terms of accuracy, normalized mutual information, and purity.
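The single-view building block, concept factorization X ≈ XWVᵀ with multiplicative updates that tolerate negative data (in the style of convex NMF, splitting K = XᵀX into positive and negative parts), can be sketched as follows. The multiview coupling of affinity weights is not reproduced here, and the toy matrix is hypothetical.

```python
import numpy as np

def concept_factorization(X, k, iters=100, seed=0):
    """Concept factorization X ≈ X W V^T with convex-NMF-style
    multiplicative updates: K = X^T X is split into positive and negative
    parts, so X itself may contain negative values while the factors
    W and V stay nonnegative."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X
    Kp, Kn = np.maximum(K, 0.0), np.maximum(-K, 0.0)
    W = rng.random((n, k))
    V = rng.random((n, k))
    eps = 1e-9
    for _ in range(iters):
        VtV = V.T @ V
        W *= np.sqrt((Kp @ V + Kn @ W @ VtV + eps) /
                     (Kn @ V + Kp @ W @ VtV + eps))
        V *= np.sqrt((Kp @ W + V @ (W.T @ Kn @ W) + eps) /
                     (Kn @ W + V @ (W.T @ Kp @ W) + eps))
    return W, V

# Mixed-sign toy data: the reconstruction error should not increase
# with more update steps from the same random initialisation.
X = np.array([[1.0, -0.5, 2.0],
              [0.5, 1.5, -1.0],
              [2.0, 0.0, 1.0]])
err = lambda W, V: float(np.linalg.norm(X - X @ W @ V.T))
W5, V5 = concept_factorization(X, k=2, iters=5)
W100, V100 = concept_factorization(X, k=2, iters=100)
assert err(W100, V100) <= err(W5, V5) + 1e-6
```

In the clustering setting, the rows of V serve as the soft cluster indicators; the multiview method described above additionally learns per-view affinity weights so that all views share one such factorization.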

