A large and evolving cognate database

Author(s):  
Khuyagbaatar Batsuren ◽  
Gábor Bella ◽  
Fausto Giunchiglia

We present CogNet, a large-scale, automatically built database of sense-tagged cognates, that is, words of common origin and meaning across languages. CogNet is continuously evolving: its current version contains over 8 million cognate pairs across 338 languages and 35 writing systems, with new releases already in preparation. The paper presents the algorithm and input resources used for its computation, an evaluation of the result, as well as a quantitative analysis of cognate data leading to novel insights on language diversity. Furthermore, as an example of the use of large-scale cross-lingual knowledge bases for improving the quality of multilingual applications, we present a case study on the use of CogNet for bilingual lexicon induction in the framework of cross-lingual transfer learning.
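As an illustrative sketch of how a sense-tagged cognate resource can feed bilingual lexicon induction, the snippet below groups translation candidates by the concept that tags each pair. The tuple layout and the concept identifiers are assumptions for illustration, not the CogNet release format.

```python
# Hypothetical cognate entries: (lang1, word1, lang2, word2, shared concept id).
from collections import defaultdict

cognate_pairs = [
    ("en", "night", "de", "Nacht", "concept:night"),
    ("en", "night", "it", "notte", "concept:night"),
    ("de", "Nacht", "it", "notte", "concept:night"),
]

def induce_lexicon(pairs, src, tgt):
    """Collect src->tgt translation candidates from sense-tagged cognate pairs."""
    lexicon = defaultdict(set)
    for l1, w1, l2, w2, concept in pairs:
        if (l1, l2) == (src, tgt):
            lexicon[w1].add((w2, concept))
        elif (l1, l2) == (tgt, src):
            lexicon[w2].add((w1, concept))
    return lexicon

print(induce_lexicon(cognate_pairs, "en", "de"))  # {'night': {('Nacht', 'concept:night')}}
```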

2018 ◽  
Vol 64 (247) ◽  
pp. 811-821 ◽  
Author(s):  
STEFAN LIPPL ◽  
SAURABH VIJAY ◽  
MATTHIAS BRAUN

Despite their importance for mass-balance estimates and the progress in techniques based on optical and thermal satellite imagery, mapping debris-covered glacier boundaries remains a challenging task, and the need for manual corrections hampers regular updates. In this study, we present an automatic approach to delineating glacier outlines using interferometric synthetic aperture radar (InSAR) coherence, slope and morphological operations. InSAR coherence detects the temporally decorrelated surface (e.g. the glacial extent) irrespective of its surface type and separates it from the highly coherent surrounding areas. We tested the impact of different processing settings, for example resolution, coherence window size and topographic phase removal, on the quality of the generated outlines. We found only a minor influence of the topographic phase, but a combination of strong multi-looking during interferogram generation and additional averaging during coherence estimation strongly deteriorated the coherence at the glacier edges. We analysed the performance of X-, C- and L-band radar data. The C-band Sentinel-1 data outlined the glacier boundary with the fewest misclassifications and a type II error of 0.47% compared with Global Land Ice Measurements from Space inventory data. Our study shows the potential of the Sentinel-1 mission, together with our automatic processing chain, to provide regular updates for land-terminating glaciers on a large scale.
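A minimal sketch of the kind of processing chain described above, assuming co-registered coherence and slope rasters are available as NumPy arrays; the thresholds and structuring elements are illustrative assumptions, not the values used in the study.

```python
import numpy as np
from scipy import ndimage

def delineate_glacier(coherence, slope_deg,
                      coh_thresh=0.35, max_slope=24.0, min_area_px=500):
    # Glacier surfaces decorrelate between acquisitions -> low coherence.
    candidate = coherence < coh_thresh
    # Exclude very steep terrain, which can also decorrelate.
    candidate &= slope_deg < max_slope
    # Morphological opening/closing to remove speckle and fill small holes.
    candidate = ndimage.binary_opening(candidate, structure=np.ones((3, 3)))
    candidate = ndimage.binary_closing(candidate, structure=np.ones((5, 5)))
    # Keep only connected components large enough to be glaciers.
    labels, n = ndimage.label(candidate)
    sizes = ndimage.sum(candidate, labels, range(1, n + 1))
    return np.isin(labels, 1 + np.flatnonzero(sizes >= min_area_px))

# Example with synthetic rasters:
coh = np.random.rand(200, 200)
slope = np.random.rand(200, 200) * 40
mask = delineate_glacier(coh, slope)
```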


Author(s):  
Wagner Al Alam ◽  
Francisco Carvalho Junior

The efforts to make cloud computing suitable for the requirements of HPC applications have motivated us to design HPC Shelf, a cloud computing platform of services for building and deploying parallel computing systems for large-scale parallel processing. We introduce Alite, the system of contextual contracts of HPC Shelf, aimed at selecting component implementations according to the requirements of applications, the features of target parallel computing platforms (e.g. clusters), QoS (Quality-of-Service) properties and cost restrictions. It is evaluated through a small-scale case study employing a component-based framework for matrix multiplication based on the BLAS library.
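To make the idea of contract-driven selection concrete, here is a small sketch in the spirit of what is described above: candidate component implementations are filtered by platform, QoS and cost, and the cheapest feasible one is chosen. The contract fields, scoring rule and implementation names are assumptions for illustration, not the actual Alite contract language.

```python
from dataclasses import dataclass

@dataclass
class Contract:
    platform: str            # e.g. "cluster"
    min_gflops: float        # QoS: required sustained performance
    max_cost_per_hour: float # cost restriction

@dataclass
class Implementation:
    name: str
    platform: str
    gflops: float
    cost_per_hour: float

def select(contract, candidates):
    """Pick the cheapest implementation that satisfies the contract."""
    feasible = [c for c in candidates
                if c.platform == contract.platform
                and c.gflops >= contract.min_gflops
                and c.cost_per_hour <= contract.max_cost_per_hour]
    return min(feasible, key=lambda c: c.cost_per_hour, default=None)

impls = [
    Implementation("blas-openmp", "cluster", 450.0, 2.0),
    Implementation("blas-mpi",    "cluster", 900.0, 3.5),
]
print(select(Contract("cluster", 800.0, 4.0), impls))  # -> blas-mpi
```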


2020 ◽  
Vol 11 (3) ◽  
pp. 49-65
Author(s):  
Emily Ng K.L.

The resources and time constraints of assessing large classes are always weighed against the validity, reliability, and learning outcomes of the assessment tasks. With the digital revolution of the 21st century, educators can use computer technology to carry out large-scale assessment in higher education more efficiently. In this article, we present an in-depth case study of a nursing school that has integrated online assessment initiatives into its nursing program. To assess a large class of first-year nursing students, a series of non-proctored multiple-choice online quizzes is administered using a learning management system. Validity and reliability are commonly used to measure the quality of an assessment, and the aim of the present article is to analyze these non-proctored multiple-choice online assessments in terms of content validity and reliability. We use this case study to examine online assessment in nursing education, exploring its benefits and challenges. We conclude that instructors have to determine how to use the full potential of online assessment while also ensuring validity and reliability.
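As a small worked example of one standard reliability estimate for dichotomously scored multiple-choice quizzes, the sketch below computes KR-20; the score matrix is invented and the formula is standard psychometrics, not data or methodology taken from the article.

```python
import numpy as np

def kr20(scores):
    """KR-20 reliability; scores is a students x items matrix of 0/1 item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    p = scores.mean(axis=0)                      # proportion correct per item
    q = 1.0 - p
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
    return (k / (k - 1.0)) * (1.0 - (p * q).sum() / total_var)

quiz = [[1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0]]
print(round(kr20(quiz), 3))
```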


Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1722
Author(s):  
Ivan Kovačević ◽  
Stjepan Groš ◽  
Karlo Slovenec

Intrusion Detection Systems (IDSs) automatically analyze event logs and network traffic in order to detect malicious activity and policy violations. Because IDSs produce large numbers of false positives and false negatives, and because the technical nature of their alerts requires a lot of manual analysis, researchers have proposed approaches that automate the analysis of alerts in order to detect large-scale attacks and predict the attacker’s next steps. Unfortunately, many such approaches use unique datasets and success metrics, making comparison difficult. This survey provides an overview of the state of the art in detecting and projecting cyberattack scenarios, with a focus on evaluation and the corresponding metrics. Representative papers were collected using Google Scholar and Scopus searches. Mutually comparable success metrics were calculated and several comparison tables are provided. Our results show that commonly used metrics are saturated on popular datasets and cannot assess the practical usability of the approaches. In addition, approaches with knowledge bases require constant maintenance, while data mining and ML approaches depend on the quality of available datasets, which, at the time of writing, are not representative enough to provide general knowledge regarding attack scenarios, so more emphasis needs to be placed on researching the behavior of attackers.
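A minimal sketch of the kind of mutually comparable metrics such a survey can recompute, assuming each approach outputs (alert, scenario) assignments that can be compared against a labelled ground truth; the data below is invented and the computation is just standard precision/recall/F1.

```python
def precision_recall_f1(true_pairs, predicted_pairs):
    tp = len(true_pairs & predicted_pairs)
    fp = len(predicted_pairs - true_pairs)
    fn = len(true_pairs - predicted_pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# (alert_id, scenario_id): which alerts a method groups into which attack scenario.
truth = {(1, "scanA"), (2, "scanA"), (3, "bruteB")}
pred  = {(1, "scanA"), (3, "bruteB"), (4, "bruteB")}
print(precision_recall_f1(truth, pred))
```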


2020 ◽  
Vol 34 (01) ◽  
pp. 115-122 ◽  
Author(s):  
Baijun Ji ◽  
Zhirui Zhang ◽  
Xiangyu Duan ◽  
Min Zhang ◽  
Boxing Chen ◽  
...  

Transfer learning between different language pairs has shown its effectiveness for Neural Machine Translation (NMT) in low-resource scenarios. However, existing transfer methods involving a common target language are far from successful in the extreme scenario of zero-shot translation, due to the language space mismatch problem between the transferor (the parent model) and the transferee (the child model) on the source side. To address this challenge, we propose an effective transfer learning approach based on cross-lingual pre-training. Our key idea is to make all source languages share the same feature space and thus enable a smooth transition for zero-shot translation. To this end, we introduce one monolingual pre-training method and two bilingual pre-training methods to obtain a universal encoder for different languages. Once the universal encoder is constructed, the parent model built on this encoder is trained with large-scale annotated data and then directly applied to the zero-shot translation scenario. Experiments on two public datasets show that our approach significantly outperforms strong pivot-based baselines and various multilingual NMT approaches.
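A minimal PyTorch-style sketch of the transfer recipe described above: the parent model is built on a shared, cross-lingually pre-trained encoder, and the child (zero-shot) model reuses that encoder while inheriting the parent's target side. The module choices and layer sizes here are illustrative assumptions, not the paper's architecture.

```python
import torch.nn as nn

d_model, tgt_vocab = 64, 1000
# Stands in for the universal encoder obtained by cross-lingual pre-training.
universal_encoder = nn.GRU(d_model, d_model, batch_first=True)

def build_nmt(shared_encoder):
    return nn.ModuleDict({
        "encoder": shared_encoder,   # same feature space for all source languages
        "decoder": nn.GRU(d_model, d_model, batch_first=True),
        "generator": nn.Linear(d_model, tgt_vocab),
    })

parent = build_nmt(universal_encoder)   # trained on large-scale parent-pair data
child = build_nmt(universal_encoder)    # zero-shot child: new source language, same encoder

# Direct transfer: the child keeps the universal encoder and inherits the
# parent's decoder and output projection without further training.
child["decoder"].load_state_dict(parent["decoder"].state_dict())
child["generator"].load_state_dict(parent["generator"].state_dict())
```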


2001 ◽  
Vol 2 (4) ◽  
pp. 196-206 ◽  
Author(s):  
Christian Blaschke ◽  
Alfonso Valencia

The Dictionary of Interacting Proteins (DIP) (Xenarios et al., 2000) is a large repository of protein interactions: its March 2000 release included 2379 protein pairs whose interactions have been detected by experimental methods. Even though many of these correspond to poorly characterized proteins (the result of massive yeast two-hybrid screenings), as many as 851 correspond to interactions detected using direct biochemical methods. We used information retrieval technology to search automatically for sentences in Medline abstracts that support these 851 DIP interactions. Surprisingly, we found correspondence between DIP protein pairs and Medline sentences describing their interactions in only 30% of the cases. This low coverage has interesting consequences regarding the quality of the annotations (references) introduced in the database and the limitations of applying information extraction (IE) technology to Molecular Biology. It is clear that the limitation of analyzing abstracts rather than full papers and the lack of standard protein names are difficulties of considerably more importance than the limitations of the IE methodology employed. A positive finding is the capacity of the IE system to identify new relations between proteins, even in a set of proteins previously characterized by human experts; these identifications are made with a considerable degree of precision. This is, to our knowledge, the first large-scale assessment of the capacity of IE to detect previously known interactions: we thus propose the use of the DIP data set as a biological reference to benchmark IE systems.
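A simplified sketch of the retrieval step described above: scan abstract sentences for co-mentions of a DIP protein pair together with an interaction keyword. The abstract text, protein names and cue list are invented examples, and this keyword matching is only a stand-in for the IE system actually used.

```python
import re

INTERACTION_CUES = {"interacts", "binds", "associates", "complex"}

def supporting_sentences(abstract, protein_a, protein_b):
    """Return sentences that mention both proteins and an interaction cue."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        low = sentence.lower()
        if (protein_a.lower() in low and protein_b.lower() in low
                and any(cue in low for cue in INTERACTION_CUES)):
            hits.append(sentence)
    return hits

abstract = ("Cdc28 binds Cln2 during late G1. "
            "No evidence was found that Cdc28 regulates Swi6 directly.")
print(supporting_sentences(abstract, "Cdc28", "Cln2"))
```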


10.29007/ppgx ◽  
2020 ◽  
Author(s):  
Yan Wu ◽  
Jinchuan Chen ◽  
Plarent Haxhidauti ◽  
Vinu Ellampallil Venugopal ◽  
Martin Theobald

Domain-oriented knowledge bases (KBs) such as DBpedia and YAGO are largely constructed by applying a set of predefined extraction rules to the semi-structured contents of Wikipedia articles. Although both of these large-scale KBs achieve very high average precision values (above 95% for YAGO3), subtle mistakes in a few of the underlying extraction rules may still introduce a substantial number of systematic extraction mistakes for specific relations. For example, by applying the same regular expressions to extract person names of both Asian and Western nationality, YAGO erroneously swaps most of the family and given names of Asian person entities. For traditional rule-learning approaches based on Inductive Logic Programming (ILP), it is very difficult to detect these systematic extraction mistakes, since they usually occur only in a relatively small subdomain of the relations’ arguments. In this paper, we thus propose a guided form of ILP, coined “GILP”, that iteratively asks for small amounts of user feedback over a given KB to learn a set of data-cleaning rules that (1) best match the feedback and (2) also generalize to a larger portion of facts in the KB. We propose both algorithms and respective metrics to automatically assess the quality of the learned rules with respect to the user feedback.
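An illustrative sketch of the two scoring criteria described above for candidate cleaning rules: agreement with the user feedback and generalization over the KB. Rules are modelled here simply as predicates over triples; this representation, the facts and the feedback are assumptions for illustration, not the paper's formalism.

```python
def score_rule(rule, kb_facts, feedback):
    """feedback maps a fact to True (correct) / False (erroneous extraction)."""
    flagged = [fact for fact in kb_facts if rule(fact)]
    # (1) Agreement with feedback: the rule should flag exactly the erroneous facts.
    agree = sum(1 for fact, ok in feedback.items() if rule(fact) == (not ok))
    feedback_score = agree / len(feedback) if feedback else 0.0
    # (2) Generalization: share of KB facts the rule applies to.
    coverage = len(flagged) / len(kb_facts) if kb_facts else 0.0
    return feedback_score, coverage

kb = [("Li_Wei", "givenName", "Li"),        # swapped by the extraction rule
      ("Li_Wei", "familyName", "Wei"),      # swapped by the extraction rule
      ("John_Smith", "givenName", "John")]  # correct
fb = {("Li_Wei", "givenName", "Li"): False,
      ("John_Smith", "givenName", "John"): True}

# Candidate rule: flag givenName facts whose value is the first token of the entity id.
candidate = lambda f: f[1] == "givenName" and f[0].split("_")[0] == f[2]
print(score_rule(candidate, kb, fb))  # agrees with half the feedback, covers 2/3 of the KB
```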

