Key Research Issues and Related Technologies in Crowdsourcing Data Collection

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Yunhui Li ◽  
Liang Chang ◽  
Long Li ◽  
Xuguang Bao ◽  
Tianlong Gu

Crowdsourcing provides a distributed method for solving tasks that are difficult for computers to complete and that require human intelligence. Because it is fast and inexpensive, crowdsourcing is widely used to collect metadata and data annotations in many fields, such as information retrieval, machine learning, recommender systems, and natural language processing. Crowdsourcing enables the collection of rich, large-scale data, which promotes the development of data-driven research. In recent years, considerable effort has been devoted to crowdsourcing for data collection, addressing challenges that include quality control, cost control, efficiency, and privacy protection. In this paper, we introduce the concept and workflow of crowdsourcing data collection. Furthermore, we review the key research topics and related technologies in its workflow, including task design, task-worker matching, response aggregation, incentive mechanisms, and privacy protection. Finally, we discuss the limitations of existing work and identify future development directions.
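
As an illustration of the response aggregation step named in this abstract, the following is a minimal sketch of majority voting, one common baseline. It is not taken from the paper; the function name `majority_vote`, the `(task_id, worker_id, label)` triple format, and the tie-breaking rule are assumptions made for the example.

```python
from collections import Counter, defaultdict


def majority_vote(responses):
    """Aggregate (task_id, worker_id, label) triples into one label per task.

    Hypothetical helper for illustration. Ties are broken by picking the
    label that sorts first, so the result is deterministic.
    """
    votes = defaultdict(Counter)
    for task_id, _worker_id, label in responses:
        votes[task_id][label] += 1

    aggregated = {}
    for task_id, counter in votes.items():
        top = max(counter.values())
        # Among the labels with the highest count, take the smallest one.
        aggregated[task_id] = min(
            label for label, n in counter.items() if n == top
        )
    return aggregated


# Toy data: three workers answer task t1, two workers answer task t2.
responses = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "cat"),
]
labels = majority_vote(responses)
```

Majority voting treats every worker as equally reliable; weighting votes by an estimated worker accuracy is a typical refinement when quality control matters.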

2020 ◽  
Vol 34 (05) ◽  
pp. 9346-9353
Author(s):  
Bingcong Xue ◽  
Sen Hu ◽  
Lei Zou ◽  
Jiashu Cheng

Paraphrases, i.e., differing textual realizations of the same meaning, have proven useful for many natural language processing (NLP) applications. Collecting paraphrases for predicates in knowledge bases (KBs) is key to comprehending the RDF triples in KBs. Existing works have published paraphrase datasets automatically extracted from large corpora, but these contain too many redundant pairs or do not cover enough predicates; such shortcomings cannot be fixed by machines alone and require human help. This paper presents a full process for collecting large-scale, high-quality paraphrase dictionaries for predicates in knowledge bases, which takes advantage of existing datasets and combines machine mining with crowdsourcing. Our dataset comprises 2284 distinct predicates in DBpedia and 31130 paraphrase pairs in total, with quality that is a great leap over previous works. We then demonstrate that such paraphrase dictionaries can greatly help natural language processing tasks such as question answering and language generation. We also publish our dictionary for further research.


Author(s):  
Georgi Derluguian

The author develops ideas about the origin of social inequality during the evolution of human societies and reflects on the possibilities of overcoming it. What makes human beings different from other primates is a high level of egalitarianism and altruism, which contributed to the more successful adaptability of human collectives at early stages of the development of society. The transition to agriculture, coupled with substantially increasing population density, was marked by the emergence and institutionalisation of social inequality based on the inequality of tangible assets and symbolic wealth. Then, new institutions of warfare came into existence, aimed at conquering and enslaving neighbours engaged in productive labour. While exercising control over nature, people also established and strengthened their power over other people. The chiefdom came into being as a new type of polity. Elementary forms of power (political, economic and ideological) served as a basis for the formation of early states. The societies in those states were characterised by social inequality and cruelties, including slavery, mass violence and numerous victims. Nowadays, the old elementary forms of power inherent in the personalistic chiefdom still function alongside modern institutions of public and private bureaucracy. This constitutes the key contradiction of our time: the juxtaposition of individual despotic power and public infrastructural power. However, society is evolving towards an ever more efficient combination of social initiatives with the sustainability and viability of large-scale organisations.


Author(s):  
Robert Boyd

Human beings have evolved to become the most dominant species on Earth. This astonishing transformation is usually explained in terms of cognitive ability—people are just smarter than all the rest. But this book argues that culture—our ability to learn from each other—has been the essential ingredient of our remarkable success. The book shows how a unique combination of cultural adaptation and large-scale cooperation has transformed our species and assured our survival—making us the different kind of animal we are today. The book is based on the Tanner Lectures delivered at Princeton University, featuring challenging responses across the chapters.


2015 ◽  
Vol 3 (2) ◽  
pp. 262-265
Author(s):  
Dr. Navdeep Kaur

Since the evolution of man, the environment has remained a matter of both awe and concern to him. The frontier attitude of industrialized society towards nature has not only endangered the survival of all other life forms but also threatened the very existence of human life. The realization of this potential danger has necessitated the dissemination of knowledge and skills on environmental protection at all stages of learning. Learners at all stages therefore need to be sensitized with missionary zeal, which may help transform students into committed citizens who can avert a global environmental crisis. The advancement of science and technology has made life more and more comfortable, and man has become more and more ambitious. With such development, human dependence on the environment has increased: man has consumed more resources, and the effect of his activities on the environment has become more and more detectable. The environment covers all the things present around living beings, above the land, on the surface of the earth, and under the earth. It comprises, in total, all the peripheral forces, pressures, and circumstances that affect the life, nature, behaviour, growth, development, and maturation of living beings. Irrational exploitation (not utilization) of natural resources for our greed (not need) has endangered our survival and caused incalculable harm. Environmental Education is a science: a well-thought, permanent, lasting, and integrated process of providing learning experiences through which learners gain awareness, knowledge, understanding, skills, values, technical expertise, and involvement, with desirable attitudinal changes about their relationship with their natural and biophysical environment. Environmental Education is an organized effort to educate the masses about the environment, its functions, need, and importance, and especially about how human beings can manage their behaviour in order to live in a sustainable manner.
The term 'environmental awareness' refers to creating general awareness of environmental issues and their causes by bringing about changes in perception, attitude, values, and the skills necessary to solve environment-related problems. Moreover, it is the first step leading to the formation of responsible environmental behaviour (Stern, 2000). With the ever-increasing development driven by modern man, large-scale degradation of natural resources has occurred; the public has to be educated about the fact that if we degrade our environment, we are actually harming ourselves. To encourage meaningful public participation in environmental protection, it is necessary to create awareness about environmental pollution and its adverse effects. This is a crucial time for cultivating environmental awareness and environmental sensitivity among the masses, particularly among youths. To raise awareness across society, it is essential to work at the grassroots level, so that the whole society can work to save the environment.


2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
Bhaskar Mitra

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents, or short passages, in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms not seen during training, such as a person's name or a product model number, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
Our key contribution towards improving the effectiveness of deep ranking models is developing the Duet principle [Mitra et al., 2017] which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any arbitrary deep model that enables large-scale precomputation and the use of inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
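
To make the query term independence idea concrete, here is a minimal sketch, not the thesis's implementation. It assumes the document score decomposes into a sum of per-term scores, so that each term-document score can be precomputed offline and stored in an inverted index. The names `build_index`, `rank`, and `toy_score` are hypothetical, and `toy_score` (relative term frequency) merely stands in for a learned deep model.

```python
from collections import defaultdict


def build_index(docs, term_doc_score, vocabulary):
    """Precompute score(term, doc) offline and store it in an inverted index."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in vocabulary:
            score = term_doc_score(term, text)
            if score > 0.0:  # keep the postings lists sparse
                index[term][doc_id] = score
    return index


def rank(query, index):
    """Score each document as the sum of its precomputed per-term scores."""
    totals = defaultdict(float)
    for term in query.split():
        for doc_id, score in index.get(term, {}).items():
            totals[doc_id] += score
    # Highest score first; document id breaks ties deterministically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def toy_score(term, text):
    # Hypothetical stand-in for a learned model: relative term frequency.
    tokens = text.split()
    return tokens.count(term) / len(tokens)


docs = {
    "d1": "neural ranking models",
    "d2": "inverted index for fast retrieval",
    "d3": "neural index",
}
vocabulary = {"neural", "index", "retrieval"}
index = build_index(docs, toy_score, vocabulary)
results = rank("neural index", index)
```

Because the model never sees the full query at scoring time, all expensive computation happens in `build_index`, and `rank` only walks postings lists and adds numbers.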


Technologies ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 2
Author(s):  
Ashish Jaiswal ◽  
Ashwin Ramesh Babu ◽  
Mohammad Zaki Zadeh ◽  
Debapriya Banerjee ◽  
Fillia Makedon

Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as supervision and using the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by different architectures that have been proposed so far. Next, we present a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.
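
The pull-together, push-apart objective described in this abstract can be sketched with an InfoNCE-style loss, one common formulation of contrastive learning. The abstract itself does not name a specific loss, so this choice, the function names, the toy two-dimensional embeddings, and the temperature value of 0.1 are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor.

    Computes -log softmax of the anchor-positive similarity against the
    anchor-negative similarities. Lower is better.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_sum - logits[0]


# Toy embeddings: the anchor and its augmented view should end up close,
# while embeddings of other samples act as negatives.
anchor = [1.0, 0.0]
good_positive = [0.9, 0.1]   # nearly aligned with the anchor
bad_positive = [0.0, 1.0]    # orthogonal to the anchor
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_aligned = info_nce(anchor, good_positive, negatives)
loss_misaligned = info_nce(anchor, bad_positive, negatives)
```

The loss is small when the positive pair is more similar than every negative and grows as the positive drifts towards the negatives, which is the behaviour the abstract describes in words.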


Author(s):  
Ekaterina Kochmar ◽  
Dung Do Vu ◽  
Robert Belfer ◽  
Varun Gupta ◽  
Iulian Vlad Serban ◽  
...  

Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning as compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, we propose a machine learning approach to generate personalized feedback in an automated way, which takes the individual needs of students into account while alleviating the need for expert intervention and the design of hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in , a large-scale dialogue-based ITS with around 20,000 students launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.


Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 79 ◽  
Author(s):  
Xiaoyu Han ◽  
Yue Zhang ◽  
Wenkai Zhang ◽  
Tinglei Huang

Relation extraction is a vital task in natural language processing. It aims to identify the relationship between two specified entities in a sentence. Besides the information contained in the sentence, additional information about the entities has been shown to be helpful in relation extraction. Additional information such as the entity type obtained by NER (Named Entity Recognition) and the description provided by a knowledge base both have their limitations. Nevertheless, there exists another way to provide additional information that can overcome these limitations in Chinese relation extraction. Because Chinese characters usually have explicit meanings and can carry more information than English letters, we suggest that the characters that constitute the entities can provide additional information that is helpful for the relation extraction task, especially in large-scale datasets. This assumption has never been verified before. The main obstacle is the lack of large-scale Chinese relation datasets. In this paper, first, we generate a large-scale Chinese relation extraction dataset based on a Chinese encyclopedia. Second, we propose an attention-based model using the characters that compose the entities. The results on the generated dataset show that these characters can provide useful information for the Chinese relation extraction task. By using this information, the attention mechanism can recognize the crucial part of the sentence that expresses the relation. The proposed model outperforms other baseline models on our Chinese relation extraction dataset.

