Unsupervised Chunking Based on Graph Propagation from Bilingual Corpus

2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Ling Zhu ◽  
Derek F. Wong ◽  
Lidia S. Chao

This paper presents a novel approach for an unsupervised shallow parsing model trained on the unannotated Chinese text of a parallel Chinese-English corpus. In this approach, no annotated information from the Chinese side is used. The exploitation of graph-based label propagation for bilingual knowledge transfer, along with the use of the projected labels as features in the unsupervised model, contributes to better performance. Experimental comparisons with state-of-the-art algorithms show that the proposed approach achieves markedly higher accuracy in terms of F-score.
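
As a rough illustration of graph-based label propagation in general, the sketch below spreads label distributions from a few seed nodes to unlabeled neighbors by repeated weighted averaging. It is not the authors' method: the graph, the edge weights, the function name `propagate_labels`, and the two-label setup are all assumptions made for the example.

```python
def propagate_labels(neighbors, seeds, num_labels, iterations=50):
    """Iterative label propagation on a weighted graph (illustrative sketch).

    neighbors: dict mapping node -> list of (neighbor, weight) pairs
    seeds:     dict mapping node -> known label index (clamped throughout)
    Returns a dict mapping node -> list of label probabilities.
    """
    uniform = [1.0 / num_labels] * num_labels
    dist = {}
    for node in neighbors:
        if node in seeds:
            # Seed nodes start with, and keep, a one-hot distribution.
            dist[node] = [1.0 if k == seeds[node] else 0.0 for k in range(num_labels)]
        else:
            dist[node] = list(uniform)

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            total = sum(weight for _nbr, weight in nbrs)
            if node in seeds or total == 0:
                updated[node] = dist[node]
                continue
            # Unlabeled node: weighted average of its neighbors' distributions.
            updated[node] = [
                sum(weight * dist[nbr][k] for nbr, weight in nbrs) / total
                for k in range(num_labels)
            ]
        dist = updated
    return dist


# Hypothetical chain a - b - c - d with seeds at both ends.
graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 1.0)],
    "d": [("c", 1.0)],
}
result = propagate_labels(graph, seeds={"a": 0, "d": 1}, num_labels=2)
```

In this toy chain, the node nearer to each seed ends up leaning toward that seed's label. In a bilingual setting the seeds would plausibly come from labels projected across word alignments, but that construction is not specified in the abstract.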

2021 ◽  
Vol 15 (5) ◽  
pp. 1-32
Author(s):  
Quang-huy Duong ◽  
Heri Ramampiaro ◽  
Kjetil Nørvåg ◽  
Thu-lan Dam

Dense subregion (subgraph and subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions because of the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection, and can perform well in many applications. However, most existing works rely on the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can estimate multiple subtensors, they can give a guarantee on the density with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both a theoretical and a practical solution for estimating multiple dense subtensors in tensor data and giving a higher lower bound on the density. In particular, we guarantee and prove a higher lower bound on the density of the estimated subgraphs and subtensors. We also propose a novel approach to show that there are multiple dense subtensors with a guarantee on their density that is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.
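
For context, the greedy 2-approximation mentioned above is commonly associated with a peeling procedure on graphs: repeatedly remove the minimum-degree vertex and keep the densest intermediate subgraph. The sketch below shows that baseline for the plain graph case, assuming density is defined as edges divided by vertices. It is not the multi-subtensor method proposed in the paper, and the function name is made up for the example.

```python
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Greedy peeling baseline for the densest-subgraph problem (sketch).

    Density is taken to be |E| / |V| (an assumption for this example).
    Returns (vertex_set, density) of the best subgraph seen while peeling.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_size = 0.0, len(remaining)
    removal_order = []

    while remaining:
        density = num_edges / len(remaining)
        if density > best_density:
            best_density, best_size = density, len(remaining)
        # Linear scan for the minimum-degree vertex; a bucket queue would be faster.
        u = min(remaining, key=lambda x: len(adj[x]))
        num_edges -= len(adj[u])
        for w in adj[u]:
            adj[w].discard(u)
        adj[u].clear()
        remaining.remove(u)
        removal_order.append(u)

    # The best subgraph consists of the last `best_size` vertices to be removed.
    kept = set(removal_order[len(removal_order) - best_size:])
    return kept, best_density


# A 4-clique {0, 1, 2, 3} with a pendant path 3 - 4 - 5 attached.
example_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
dense_vertices, dense_value = greedy_densest_subgraph(example_edges)
```

On this toy graph the peeling removes the two path vertices first and returns the 4-clique, with 6 edges over 4 vertices. Note that the procedure yields a single subgraph, which is the limitation the abstract describes.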


Author(s):  
Gaetano Rossiello ◽  
Alfio Gliozzo ◽  
Michael Glass

We propose a novel approach to learn representations of relations expressed by their textual mentions. We assume that if two pairs of entities belong to the same relation, then those two pairs are analogous. We collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. This dataset is used to train a hierarchical siamese network in order to learn entity-entity embeddings that encode relational information across the different linguistic paraphrases expressing the same relation. The model can be used to generate pre-trained embeddings that provide a valuable signal when integrated into an existing neural-based model, outperforming the state-of-the-art methods on a relation extraction task.


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3529 ◽  
Author(s):  
Rabih Younes ◽  
Mark Jones ◽  
Thomas Martin

Most activity classifiers focus on recognizing application-specific activities that are mostly performed in a scripted manner, where there is very little room for variation within the activity. These classifiers are mainly good at recognizing short scripted activities that are performed in a specific way. In reality, especially when considering daily activities, humans perform complex activities in a variety of ways. In this work, we aim to make activity recognition more practical by proposing a novel approach to recognize complex heterogeneous activities that could be performed in a wide variety of ways. We collect data from 15 subjects performing eight complex activities and test our approach while analyzing it from different aspects. The results show the validity of our approach. They also show how it performs better than the state-of-the-art approaches that tried to recognize the same activities in a more controlled environment.


2019 ◽  
Vol 9 (20) ◽  
pp. 4316 ◽  
Author(s):  
Loise ◽  
Caputo ◽  
Porto ◽  
Calandra ◽  
Angelico ◽  
...  

This review aims to explore the state of knowledge and the state of the art regarding bitumen rejuvenation. In particular, attention is paid to clarifying the rejuvenator mechanism of action. Frequently, the terms rejuvenator and flux oil, or oil (i.e., softening agent), are used as if they were synonymous. To our knowledge, these two terms refer to substances producing different modifications to the aged bitumen: they can decrease the viscosity (softening agents), or, in addition to this, restore the original microstructure (real rejuvenators). In order to deal with the argument in its entirety, the bitumen is investigated in terms of chemical structure and microstructural features. Proper investigating tools are, therefore, needed to distinguish the different mechanisms of action of the various types of bitumen, so attention is focused on recent research and the use of different investigation techniques to distinguish between various additives. Methods based on organic synthesis can also be used to prepare ad hoc rejuvenating molecules with higher performance. The interplay of chemical interaction, structural changes and overall effect of the additive is then presented in terms of the modern concepts of complex systems, which furnish valid arguments to suggest X-ray scattering and Nuclear Magnetic Resonance relaxometry experiments as forefront tools to study bitumen. Far from being a standard review, this work represents a critical analysis of the state of the art, taking into account the molecular basis at the origin of the observed behavior. Furnishing a novel viewpoint for the study of bitumen based on the concepts of complex systems in physics, it constitutes a novel approach for the study of these systems.


2021 ◽  
Vol 13 (4) ◽  
pp. 663
Author(s):  
Runze Fan ◽  
Ting-Bing Xu ◽  
Zhenzhong Wei

This article addresses the challenge of 6D aircraft pose estimation from a single RGB image during flight. Many recent works have shown that keypoint-based approaches, which first detect keypoints and then estimate the 6D pose, achieve remarkable performance. However, it is hard to locate the keypoints precisely in complex weather scenes. In this article, we propose a novel approach, called Pose Estimation with Keypoints and Structures (PEKS), which leverages multiple intermediate representations to estimate the 6D pose. Unlike previous works, our approach simultaneously locates keypoints and structures to recover the pose parameters of the aircraft through a Perspective-n-Point Structure (PnPS) algorithm. These representations integrate the local geometric information of the object and the topological relationship between components of the target, which effectively improves the accuracy and robustness of 6D pose estimation. In addition, we contribute a dataset for aircraft pose estimation which consists of 3681 real images and 216,000 rendered images. Extensive experiments on our own aircraft pose dataset and multiple open-access pose datasets (e.g., ObjectNet3D, LineMOD) demonstrate that our proposed method can accurately estimate 6D aircraft pose in various complex weather scenes while achieving performance comparable with the state-of-the-art pose estimation methods.


Author(s):  
Ximing Li ◽  
Yang Wang

Partial Multi-label Learning (PML) aims to induce a multi-label predictor from datasets with noisy supervision, where each training instance is associated with several candidate labels, only some of which are valid. To address the noise issue, existing PML methods basically recover the ground-truth labels by leveraging the ground-truth confidence of each candidate label, i.e., the likelihood of a candidate label being a ground-truth one. However, they neglect the information from non-candidate labels, which potentially contributes to the ground-truth label recovery. In this paper, we propose to recover the ground-truth labels, i.e., to estimate the ground-truth confidences, from the label enrichment, composed of the relevance degrees of candidate labels and irrelevance degrees of non-candidate labels. Building on this observation, we further develop a novel two-stage PML method, namely Partial Multi-Label Learning with Label Enrichment-Recovery (PML3ER): in the first stage, it estimates the label enrichment with unconstrained label propagation; it then jointly learns the ground-truth confidence and the multi-label predictor given the label enrichment. Experimental results validate that PML3ER outperforms the state-of-the-art PML methods.


2018 ◽  
Vol 7 (3.20) ◽  
pp. 6
Author(s):  
Juhaida Abu Bakar ◽  
Khairuddin Khairuddin ◽  
Mohammad Faidzul Nasrudin ◽  
Mohd Zamri Murah

The Malay language is written in Jawi and Roman scripts. In the past, Jawi writing was widely used by the Malay community and foreigners, as can be seen in old documents. Old documents face the risk of background damage. In order to preserve this valuable information, there is a significant need for automated processing of Jawi materials. Based on previous literature, POS tagging is known as the first phase in automated text analysis, and the development of language technologies can barely begin without this phase. We highlight the existing POS tagging approaches and suggest the development of a Malay Jawi POS tagger using an extended ME-based approach on the NUWT Corpus. Results have shown that the proposed model yielded a higher accuracy in comparison to the state-of-the-art model.


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Abdu Gumaei ◽  
Rachid Sammouda ◽  
Abdul Malik S. Al-Salman ◽  
Ahmed Alsanad

A multispectral palmprint recognition system (MPRS) is an essential technology for effective human identification and verification tasks. To improve the accuracy and performance of MPRS, a novel approach based on an autoencoder (AE) and a regularized extreme learning machine (RELM) is proposed in this paper. The proposed approach is intended to make recognition faster by reducing the number of palmprint features without degrading the accuracy of the classifier. To achieve this objective, first, the region of interest (ROI) is extracted from palmprint images by David Zhang’s method. Second, an efficient normalized Gist (NGist) descriptor is used for palmprint feature extraction. Then, the dimensionality of the extracted features is reduced using an optimized AE. Finally, the reduced features are fed to the RELM for classification. A comprehensive set of experiments is conducted on the benchmark MS-PolyU dataset. The results are significantly higher than those of the state-of-the-art approaches, and the robustness and efficiency of the proposed approach are demonstrated.


Author(s):  
Julian Hatwell ◽  
Mohamed Medhat Gaber ◽  
R. Muhammad Atif Azad

Background: Computer Aided Diagnostics (CAD) can support medical practitioners to make critical decisions about their patients’ disease conditions. Practitioners require access to the chain of reasoning behind CAD to build trust in the CAD advice and to supplement their own expertise. Yet, CAD systems might be based on black box machine learning models and high dimensional data sources such as electronic health records, magnetic resonance imaging scans, cardiotocograms, etc. These foundations make interpretation and explanation of the CAD advice very challenging. This challenge is recognised throughout the machine learning research community. eXplainable Artificial Intelligence (XAI) is emerging as one of the most important research areas of recent years because it addresses the interpretability and trust concerns of critical decision makers, including those in clinical and medical practice.

Methods: In this work, we focus on AdaBoost, a black box model that has been widely adopted in the CAD literature. We address the challenge of explaining AdaBoost classification with a novel algorithm that extracts simple, logical rules from AdaBoost models. Our algorithm, Adaptive-Weighted High Importance Path Snippets (Ada-WHIPS), makes use of AdaBoost’s adaptive classifier weights. Using a novel formulation, Ada-WHIPS uniquely redistributes the weights among individual decision nodes of the internal decision trees of the AdaBoost model. Then, a simple heuristic search of the weighted nodes finds a single rule that dominated the model’s decision. We compare the explanations generated by our novel approach with the state of the art in an experimental study. We evaluate the derived explanations with simple statistical tests of well-known quality measures, precision and coverage, and a novel measure, stability, that is better suited to the XAI setting.

Results: Experiments on 9 CAD-related data sets showed that Ada-WHIPS explanations consistently generalise better (mean coverage 15%-68%) than the state of the art while remaining competitive for specificity (mean precision 80%-99%). A very small trade-off in specificity is shown to guard against over-fitting, which is a known problem in the state-of-the-art methods.

Conclusions: The experimental results demonstrate the benefits of using our novel algorithm for explaining CAD AdaBoost classifiers widely found in the literature. Our tightly coupled, AdaBoost-specific approach outperforms model-agnostic explanation methods and should be considered by practitioners looking for an XAI solution for this class of models.
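
To make the two quality measures named above concrete, the sketch below computes precision and coverage for a single rule over a small dataset. The definitions used here are the usual ones and are an assumption about how the paper applies them: coverage is the share of instances the rule applies to, and precision is the share of covered instances carrying the rule's target class. The function name, the feature names, and the data are hypothetical.

```python
def rule_precision_coverage(rows, labels, rule, target):
    """Precision and coverage of a conjunctive rule (illustrative sketch).

    rows:   list of feature dicts
    labels: class label for each row (e.g. the model's prediction)
    rule:   list of predicates; a row is covered when all of them hold
    target: the class the rule is supposed to explain
    """
    covered = [y for x, y in zip(rows, labels) if all(cond(x) for cond in rule)]
    coverage = len(covered) / len(rows) if rows else 0.0
    precision = sum(y == target for y in covered) / len(covered) if covered else 0.0
    return precision, coverage


# Hypothetical patient records and a rule "age > 60 AND bp > 140 => class 1".
patients = [
    {"age": 70, "bp": 150},
    {"age": 65, "bp": 120},
    {"age": 40, "bp": 160},
    {"age": 30, "bp": 110},
]
predicted = [1, 0, 1, 0]
rule = [lambda x: x["age"] > 60, lambda x: x["bp"] > 140]
precision, coverage = rule_precision_coverage(patients, predicted, rule, target=1)
```

Here the rule covers one of the four records and that record has the target class, so a narrow rule can score high precision with low coverage. That tension is the trade-off the results discuss.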


2020 ◽  
Vol 34 (07) ◽  
pp. 13114-13121 ◽  
Author(s):  
Zhihui Zhu ◽  
Xinyang Jiang ◽  
Feng Zheng ◽  
Xiaowei Guo ◽  
Feiyue Huang ◽  
...  

Although great progress in supervised person re-identification (Re-ID) has been made recently, Re-ID remains a massive visual challenge because of the viewpoint variation of a person. Most existing viewpoint-based person Re-ID methods project images from each viewpoint into separate and unrelated sub-feature spaces. They only model the identity-level distribution inside an individual viewpoint but ignore the underlying relationship between different viewpoints. To address this problem, we propose a novel approach, called Viewpoint-Aware Loss with Angular Regularization (VA-reID). Instead of one subspace for each viewpoint, our method projects the features from different viewpoints into a unified hypersphere and effectively models the feature distribution on both the identity level and the viewpoint level. In addition, rather than modeling different viewpoints as hard labels used for conventional viewpoint classification, we introduce viewpoint-aware adaptive label smoothing regularization (VALSR), which assigns an adaptive soft label to the feature representation. VALSR can effectively resolve the ambiguity of the viewpoint cluster label assignment. Extensive experiments on the Market1501 and DukeMTMC-reID datasets demonstrate that our method outperforms the state-of-the-art supervised Re-ID methods.
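
The contrast between hard labels and soft labels can be illustrated with plain, uniform label smoothing, sketched below. This is the standard fixed-epsilon variant, not VALSR: the abstract does not say how the adaptive soft labels are computed, so the smoothing rule, the value of `epsilon`, and the function names are assumptions.

```python
import math


def smooth_labels(true_class, num_classes, epsilon):
    """Uniform label smoothing: put 1 - epsilon on the true class and
    spread epsilon evenly over all classes (illustrative, non-adaptive)."""
    base = epsilon / num_classes
    soft = [base] * num_classes
    soft[true_class] += 1.0 - epsilon
    return soft


def soft_cross_entropy(logits, soft_targets):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for z, t in zip(logits, soft_targets))


# Four hypothetical viewpoint classes; the observed viewpoint is class 1.
soft = smooth_labels(true_class=1, num_classes=4, epsilon=0.2)
loss = soft_cross_entropy([0.0, 0.0, 0.0, 0.0], soft)
```

With a hard label the target would be one-hot. With smoothing, some probability mass stays on the other viewpoints, which penalizes over-confident assignments when viewpoint boundaries are ambiguous.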

