Data Representations
Recently Published Documents

TOTAL DOCUMENTS: 256 (FIVE YEARS: 113)

H-INDEX: 19 (FIVE YEARS: 7)

2022 ◽  
Vol 16 (4) ◽  
pp. 1-19
Author(s):  
Hanrui Wu ◽  
Michael K. Ng

Hypergraphs have shown great power in representing high-order relations among entities, and many hypergraph-based deep learning methods have been proposed to learn informative data representations for the node classification problem. However, most of these approaches do not fully consider either the hyperedge information or the original relationships among nodes and hyperedges. In this article, we present a simple yet effective semi-supervised node classification method named Hypergraph Convolution on Nodes-Hyperedges network, which performs filtering on both nodes and hyperedges and recovers the original hypergraph with the least information loss. Instead of only minimising the cross-entropy loss over the labeled samples, as most previous approaches do, we additionally treat the hypergraph reconstruction loss as prior information to improve prediction accuracy. By taking both the cross-entropy loss on the labeled samples and the hypergraph reconstruction loss into consideration, we obtain discriminative latent data representations for training a classifier. We perform extensive experiments on the semi-supervised node classification problem and compare the proposed method with state-of-the-art algorithms. The promising results demonstrate the effectiveness of the proposed method.
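The joint objective described above, a supervised cross-entropy term over the labeled nodes plus a hypergraph reconstruction term, can be sketched as follows. This is an illustrative sketch only: the function names, the squared-error reconstruction measure, and the weighting `lam` are assumptions, not the authors' implementation.

```python
import numpy as np

def cross_entropy_loss(probs, labels, labeled_idx):
    # mean negative log-likelihood over the labeled nodes only
    return -np.mean(np.log(probs[labeled_idx, labels[labeled_idx]] + 1e-12))

def reconstruction_loss(H, H_hat):
    # squared error between the original and reconstructed incidence matrices
    return np.mean((H - H_hat) ** 2)

def total_loss(probs, labels, labeled_idx, H, H_hat, lam=0.1):
    # joint objective: supervised loss plus hypergraph reconstruction penalty
    return cross_entropy_loss(probs, labels, labeled_idx) + lam * reconstruction_loss(H, H_hat)
```

When the reconstruction is perfect the second term vanishes and only the supervised loss drives training; `lam` trades off the two terms.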


2022 ◽  
Author(s):  
Christopher Graney-Ward ◽  
Biju Issac ◽  
Lida Ketsbaia ◽  
Seibu Mary Jacob

Due to the recent popularity and growth of social media platforms such as Facebook and Twitter, cyberbullying is becoming more and more prevalent. Existing research on cyberbullying, and the NLP techniques used to classify this kind of online behaviour, was first reviewed. This paper discusses experimentation with the combined Twitter datasets from Maryland and Cornell universities using different classification approaches, including classical machine learning, RNN, CNN, and pretrained transformer-based classifiers. A state-of-the-art (SOTA) solution was achieved by optimising BERTweet with a one-cycle learning-rate policy and a decoupled-weight-decay optimiser (AdamW), improving the previous F1-score by up to 8.4% and reaching 64.8% macro F1. Particle Swarm Optimisation was then used to optimise the ensemble model. The ensemble, built from the optimised BERTweet model and a collection of models with varying data representations, outperformed the standalone BERTweet model by 0.53% (65.33% macro F1) on the TweetEval dataset and by 0.55% (68.1% macro F1) on the combined datasets.
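The ensemble-weight search described above can be illustrated with a minimal Particle Swarm Optimisation over soft-voting weights. This is a sketch only: the accuracy objective (the paper optimises macro F1), the inertia and acceleration constants, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_accuracy(weights, probs_list, labels):
    # weighted soft-vote: combine per-model class-probability matrices
    w = np.clip(weights, 0, None)
    w = w / (w.sum() + 1e-12)
    combined = sum(wi * p for wi, p in zip(w, probs_list))
    return np.mean(combined.argmax(axis=1) == labels)

def pso_optimise(probs_list, labels, n_particles=20, n_iters=50):
    # search for ensemble weights maximising validation accuracy
    dim = len(probs_list)
    pos = rng.random((n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array(
        [ensemble_accuracy(p, probs_list, labels) for p in pos])
    gbest = pbest[pbest_val.argmax()].copy()
    gbest_val = pbest_val.max()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia plus cognitive and social attraction terms
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([ensemble_accuracy(p, probs_list, labels) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        if vals.max() > gbest_val:
            gbest, gbest_val = pos[vals.argmax()].copy(), vals.max()
    return gbest, gbest_val
```

In practice the objective would be macro F1 on a held-out validation split, and each entry of `probs_list` would hold one member model's predicted class probabilities.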



2022 ◽  
Vol 9 (1) ◽  
pp. 205395172110707
Author(s):  
Richard Milne ◽  
Alessia Costa ◽  
Natassia Brenman

In this paper, we examine the practice and promises of digital phenotyping. We build on work on the ‘data self’ to focus on a medical domain in which the value and nature of knowledge and relations with data have been played out with particular persistence, that of Alzheimer's disease research. Drawing on research with researchers and developers, we consider the intersection of hopes and concerns related to both digital tools and Alzheimer's disease using the metaphor of the ‘data shadow’. We suggest that as a tool for engaging with the nature of the data self, the shadow is usefully able to capture both the dynamic and distorted nature of data representations, and the unease and concern associated with encounters between individuals or groups and data about them. We then consider what the data shadow ‘is’ in relation to ageing data subjects, and the nature of the representation of the individual's cognitive state and dementia risk that is produced by digital tools. Second, we consider what the data shadow ‘does’, through researchers and practitioners’ discussions of digital phenotyping practices in the dementia field as alternately empowering, enabling and threatening.


2021 ◽  
Vol 11 (1) ◽  
pp. 19
Author(s):  
Arianna D’Ulizia ◽  
Patrizia Grifoni ◽  
Fernando Ferri

The increasing use of social media and recent advances in geo-positioning technologies have produced a great amount of geosocial data, consisting of spatial, textual, and social information, to be managed and queried. In this paper, we focus on the issue of query processing by providing a systematic literature review of geosocial data representations, query processing methods, and evaluation approaches published over the last two decades (2000–2020). Our analysis identifies the categories of geosocial queries proposed by the surveyed studies, the query primitives and kinds of access methods used to retrieve query results, the common evaluation metrics and datasets used to assess the performance of query processing methods, and the main open challenges to be faced in the near future. Given the ongoing interest in this research topic, the results of this survey are valuable to researchers and practitioners seeking an in-depth understanding of the geosocial querying process, its applications, and its possible future perspectives.
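A typical geosocial query combines all three ingredients: a spatial predicate, a textual predicate, and a social predicate. A minimal sketch, assuming a flat list of posts rather than a real spatial access method (all names and the post schema are invented for illustration):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance between two (lat, lon) points in kilometres
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def geosocial_query(posts, lat, lon, radius_km, keyword, friends):
    # keep posts within the radius that mention the keyword and were
    # written by someone in the querying user's social circle
    return [p for p in posts
            if p["user"] in friends
            and keyword in p["text"].lower()
            and haversine_km(lat, lon, p["lat"], p["lon"]) <= radius_km]
```

A production system would replace the linear scan with a hybrid index (e.g. a spatial tree augmented with textual and social filters), which is precisely the kind of access method the surveyed studies compare.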


2021 ◽  
Vol 20 (2) ◽  
pp. 16
Author(s):  
Soledad Estrella ◽  
Andrea Vergara ◽  
Orlando González

In order to study the manifestation of data sense and identify ways of thinking about variability in authentically realistic problems in a group of Chilean fifth-grade students, a lesson plan was designed and implemented, within the framework of statistical literacy and using the “lesson study” modality, in which students were urged to make inferences based on the analysis of data corresponding to the tsunami that struck the Chilean coast in 2010. This article focuses on the qualitative study of the data representations produced by two groups of students during the implementation of the lesson plan. The analysis of the behavior of the tsunami carried out by the students led them to work simultaneously with nominal qualitative, ordinal qualitative, discrete quantitative, and continuous quantitative variables; to create new variables; to construct representations of data (multiple bar graphs and frequency tables); and to make inferences based on the data. We conclude that the use of an authentic context and the construction of their own representations promoted data sense in students and facilitated the development of their statistical thinking, through which they were able to recognize, describe, and explain the variability of the phenomenon.


Author(s):  
Christopher Jenkins ◽  
Aaron Stump

Abstract Guided by Tarski’s fixpoint theorem in order theory, we show how to derive monotone recursive types with constant-time roll and unroll operations within Cedille, an impredicative, constructive, and logically consistent pure typed lambda calculus. This derivation takes place within the preorder on Cedille types induced by type inclusions, a notion which is expressible within the theory itself. As applications, we use monotone recursive types to generically derive two recursive representations of data in lambda calculus, the Parigot and Scott encodings. For both encodings, we prove induction and examine the computational and extensional properties of their destructor, iterator, and primitive recursor in Cedille. For our Scott encoding in particular, we translate into Cedille a construction due to Lepigre and Raffalli (2019) that equips Scott naturals with primitive recursion, then extend this construction to derive a generic induction principle. This allows us to give efficient and provably unique (up to function extensionality) solutions for the iteration and primitive recursion schemes for Scott-encoded data.
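In an untyped setting, the constant-time destructor of the Scott encoding is easy to see: each numeral carries its own predecessor, so case analysis needs no traversal. A minimal sketch using Python lambdas (illustrative only; Cedille's typed, provably unique constructions are not captured here):

```python
# Scott-encoded naturals: a numeral takes a zero-branch value and a
# successor-branch function, and dispatches on its own shape
zero = lambda z: lambda s: z
def succ(n):
    return lambda z: lambda s: s(n)

def case(n, if_zero, if_succ):
    # constant-time pattern match: a successor hands over its predecessor
    return n(if_zero)(if_succ)

def to_int(n):
    # fold the encoding back into an ordinary Python int
    return case(n, 0, lambda pred: 1 + to_int(pred))

def primrec(n, base, step):
    # primitive recursion: step sees both the predecessor and the
    # recursive result, which iteration alone cannot provide efficiently
    return case(n, base, lambda pred: step(pred, primrec(pred, base, step)))
```

Unlike the Church encoding, where computing a predecessor takes linear time, `case` here inspects only the outermost constructor; the Lepigre-Raffalli construction mentioned above recovers primitive recursion for such data in a typed, logically consistent setting.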


2021 ◽  
Vol 3 ◽  
Author(s):  
Shima Hosseinzadeh ◽  
Mehrdad Biglari ◽  
Dietmar Fey

Non-volatile memory (NVM) technologies offer a number of advantages over conventional memory technologies such as SRAM and DRAM. These include a smaller area requirement, a lower energy requirement for reading and partly for writing, and, of course, non-volatility and especially the qualitative advantage of multi-bit capability. It is expected that memristors based on resistive random access memories (ReRAMs), phase-change memories, or spin-transfer torque random access memories will replace conventional memory technologies in certain areas or complement them in hybrid solutions. To support the design of systems that use NVMs, there is still research to be done on the modeling side. In this paper, we focus in particular on multi-bit ternary memories. Ternary NVMs allow the implementation of extremely memory-efficient ternary weights in neural networks, which have sufficiently high accuracy in inference, or they are part of carry-free fast ternary adders. Furthermore, we focus on the technology side of memristive ReRAMs. In this paper, a novel memory model at the circuit level is presented to support the design of systems that profit from ternary data representations. This model considers two read methods of ternary ReRAMs, namely, serial read and parallel read. They are extensively studied and compared in this work, as is the write-verification method that is often used in NVMs to reduce device stress and increase endurance. In addition, a comprehensive tool for the ternary model was developed, which is capable of performing energy, performance, and area estimation for a given setup. In this work, three case studies were conducted, namely, area cost per trit, excessive parameter selection for the write-verification method, and the assessment of pulse-width variation and its energy-latency trade-off for the write-verification method in ReRAM.
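The write-verification scheme mentioned above can be illustrated with a toy program-and-verify loop: pulse the cell, read back, and stop once the stored level is close enough to the target or the pulse budget is spent. This is a sketch under assumed behaviour; the `Cell` convergence rate, tolerance, and pulse budget are invented for illustration and do not model any real ReRAM device.

```python
def write_verify(read_fn, pulse_fn, target, tol=0.05, max_pulses=10):
    # iteratively pulse the cell and read back until the stored level is
    # within tolerance of the target, or the pulse budget is exhausted
    for pulses_used in range(1, max_pulses + 1):
        pulse_fn(target)
        if abs(read_fn() - target) <= tol:
            return pulses_used
    return None  # verification failed within the pulse budget

class Cell:
    # toy cell whose level moves 60% of the way toward the target per pulse
    def __init__(self):
        self.level = 0.0
    def pulse(self, target):
        self.level += 0.6 * (target - self.level)
    def read(self):
        return self.level
```

The pulse count returned by such a loop is exactly what drives the energy-latency trade-off studied in the paper: wider or stronger pulses converge in fewer iterations but stress the device more.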


Author(s):  
Teodor Vernica ◽  
Robert Lipman ◽  
Thomas Kramer ◽  
Soonjo Kwon ◽  
William Bernstein

Abstract Augmented reality (AR) has already helped manufacturers realize value across a variety of domains, including assistance in maintenance, process monitoring, and product assembly. However, coordinating traditional engineering data representations into AR systems without loss of context and information remains a challenge. A major barrier is the lack of interoperability between manufacturing-specific data models and AR-capable data representations. In response, we present a pipeline for porting standards-based design and inspection data into an AR scene. As a result, product manufacturing information with three-dimensional (3D) model data and corresponding inspection results are successfully overlaid onto a physical part. We demonstrate our pipeline by interacting with annotated parts while continuously tracking their pose and orientation. We then validate the pipeline by testing against six fully toleranced design models, accompanied by idealized inspection results. Our work (1) provides insight into how to address fundamental issues related to interoperability between domain-specific models and AR systems and (2) establishes an open software pipeline that others can implement and further develop.
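Overlaying model-space annotations onto a tracked physical part reduces, at its core, to applying the part's estimated pose as a homogeneous transform. A minimal sketch (a standard 4x4 pose-matrix convention is assumed; this is not the paper's actual pipeline code):

```python
import numpy as np

def pose_matrix(R, t):
    # build a 4x4 homogeneous transform from a 3x3 rotation and a translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def overlay_points(model_pts, T):
    # map annotation points from model coordinates into the tracked
    # part's pose in the AR world frame
    homo = np.hstack([model_pts, np.ones((len(model_pts), 1))])
    return (T @ homo.T).T[:, :3]
```

In an AR system the pose `T` is re-estimated every frame by the tracker, so the overlaid product manufacturing information stays registered to the physical part as it moves.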


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Alice R. Owens ◽  
Caitríona E. McInerney ◽  
Kevin M. Prise ◽  
Darragh G. McArt ◽  
Anna Jurek-Loughrey

Abstract Background Liver cancer (hepatocellular carcinoma; HCC) prevalence is increasing and, with poor clinical outcomes expected, a greater understanding of HCC aetiology is urgently required. This study explored a deep learning solution to detect biologically important features that distinguish prognostic subgroups. A novel architecture of an Artificial Neural Network (ANN) trained with a customised objective function (LRSC) was developed. The ANN should discover new data representations to detect patient subgroups that are biologically homogeneous (clustering loss) and similar in survival (survival loss), while removing noise from the data (reconstruction loss). The model was applied to TCGA-HCC multi-omics data and benchmarked against baseline models that use only a reconstruction objective function (BCE, MSE) for learning. With the baseline models, the new features are then filtered based on survival information and used for clustering patients. Different variants of the customised objective function, incorporating only reconstruction and clustering losses (LRC), and reconstruction and survival losses (LRS), were also evaluated. Robust features consistently detected were compared between models and validated in the TCGA and LIRI-JP HCC cohorts. Results The combined loss (LRSC) discovered highly significant prognostic subgroups (P-value = 1.55E−77) with more accurate sample assignment (Silhouette scores: 0.59–0.7) compared to baseline models (0.18–0.3). All LRSC bottleneck features (N = 100) were significant for survival, compared to only 11–21 for the baseline models. Prognostic subgroups were not explained by disease grade or risk factors. Instead, LRSC identified robust features including 377 mRNAs, many of which were novel (61.27%) compared to those identified by the other losses. Some 75 mRNAs were prognostic in TCGA, while 29 were also prognostic in LIRI-JP. 
LRSC also identified 15 robust miRNAs, including two novel ones (hsa-let-7g; hsa-mir-550a-1), and 328 methylation features, with 71% being prognostic. Gene-enrichment and Functional Annotation Analysis identified seven pathways differentiating the prognostic clusters. Conclusions Combining cluster and survival metrics with the reconstruction objective function facilitated superior prognostic subgroup identification. The hybrid model identified more homogeneous clusters that were consequently more biologically meaningful. The novel and prognostic robust features extracted provide additional information to improve our understanding of a complex disease and help reveal its aetiology. Moreover, the gene features identified may have clinical applications as therapeutic targets.
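The three-term objective described above (reconstruction, clustering, and survival losses) can be sketched as follows. This is an illustrative sketch of the general technique, not the authors' exact LRSC: the MSE reconstruction term, the k-means-style clustering term, the Cox partial likelihood for the survival term, and the weights `alpha`/`beta` are all assumptions.

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    # penalise inaccurate reconstruction of the multi-omics input
    return np.mean((x - x_hat) ** 2)

def clustering_loss(z, centroids):
    # squared distance of each embedding to its nearest centroid (k-means style)
    d = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).mean()

def cox_survival_loss(risk, times, events):
    # negative Cox partial log-likelihood; sample i's risk set is
    # every patient whose survival time is >= t_i
    loss, n_events = 0.0, 0
    for i in np.where(events == 1)[0]:
        risk_set = risk[times >= times[i]]
        loss -= risk[i] - np.log(np.exp(risk_set).sum())
        n_events += 1
    return loss / max(n_events, 1)

def combined_loss(x, x_hat, z, centroids, risk, times, events, alpha=1.0, beta=1.0):
    # joint objective: denoise, form tight clusters, and align with survival
    return (reconstruction_loss(x, x_hat)
            + alpha * clustering_loss(z, centroids)
            + beta * cox_survival_loss(risk, times, events))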

