Automatic construction of academic profile: A case of information science domain

2021 ◽  
pp. 016555152199804
Author(s):  
Qian Geng ◽  
Ziang Chuai ◽  
Jian Jin

To provide junior researchers with domain-specific concepts efficiently, an automatic approach for academic profiling is needed. First, to obtain personal records of a given scholar, typical supervised approaches often utilise structured data, such as the Wikipedia infobox, as the training dataset, but this may lead to a severe mis-labelling problem when such data are used to train a model directly. To address this problem, a new relation embedding method is proposed for fine-grained entity typing, in which the initial vector of entities and a new penalty scheme are considered, based on the semantic distance between entities and relations. Also, to highlight critical concepts relevant to renowned scholars, scholars’ selective bibliographies, which contain massive numbers of academic terms, are analysed by a newly proposed extraction method based on logistic regression, the AdaBoost algorithm and learning-to-rank techniques. This bridges the gap that conventional supervised methods only return binary classification results and fail to help researchers understand the relative importance of the selected concepts. Experiments on several academic-profiling tasks and corresponding benchmark datasets demonstrate that the proposed approaches notably outperform existing methods. The proposed techniques give junior researchers an automatic way to obtain organised knowledge in a specific domain, including scholars’ background information and domain-specific concepts.
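
As a rough illustration of how such an extraction step might be scored, the sketch below (not the authors' implementation) ranks candidate academic concepts by pointwise relevance, combining logistic regression and AdaBoost probability estimates; the feature layout, the `rank_concepts` helper and the averaging scheme are assumptions for illustration only.

```python
# Minimal sketch: pointwise ranking of candidate academic concepts by
# combining logistic regression and AdaBoost scores. Feature names and the
# rank_concepts helper are illustrative assumptions, not the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import AdaBoostClassifier

def rank_concepts(X_train, y_train, X_candidates, candidate_terms):
    """Score candidate terms and return them ordered by estimated relevance."""
    lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    ada = AdaBoostClassifier(n_estimators=100).fit(X_train, y_train)
    # Average the two probability estimates as a simple relevance score.
    scores = 0.5 * (lr.predict_proba(X_candidates)[:, 1]
                    + ada.predict_proba(X_candidates)[:, 1])
    order = np.argsort(-scores)
    return [(candidate_terms[i], float(scores[i])) for i in order]

# Example with random stand-in features (e.g. TF-IDF weight, citation count).
rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 2)), rng.integers(0, 2, 200)
X_cand = rng.random((5, 2))
terms = ["topic model", "entity typing", "knowledge graph", "h-index", "ontology"]
print(rank_concepts(X_train, y_train, X_cand, terms))
```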

2009 ◽  
Vol 25 (3) ◽  
pp. 194-203 ◽  
Author(s):  
Shulamith Kreitler ◽  
Hernan Casakin

In view of unclear previous findings about the validity of self-assessed creativity, the hypothesis guiding the present study was that validity would be demonstrated if self-assessed creativity was examined with respect to a specific domain, a specific product, specific aspects of creativity, and in terms of specific criteria. The participants were 52 architecture students. The experimental task was to design a small museum in a described context. After completing the task, the students self-assessed their creativity in designing by means of seven open-ended questions, the Self-Assessment of Creative Design questionnaire, and a list of seven items tapping affective metacognitive aspects of the design process. Thus, 21 creativity indicators were formed. Four expert architects, working independently, assessed the designs on nine creativity indicators: fluency, flexibility, elaboration, functionality, innovation, fulfilling specified design requirements, considering context, mastery of skills concerning the esthetics of the design representation, and overall creativity. The agreement among the architects’ evaluations was very high. The correlations between the nine corresponding indicators in the students’ assessments of their designs and those of the experts were positive and significant for three indicators: fluency, flexibility, and overall creativity. In contrast, the correlations of the remaining, non-corresponding indicators with those of the experts were not significant. The findings support the validity of self-assessed creativity, with specific restrictions.


2019 ◽  
Vol 1 (3) ◽  
Author(s):  
A. Aziz Altowayan ◽  
Lixin Tao

We consider the following problem: given neural language models (embeddings), each trained on an unknown dataset, how can we determine which model would provide a better result when used for feature representation in a downstream task such as text classification or entity recognition? In this paper, we assess the word similarity measure by analyzing its impact on word embeddings learned from various datasets and how they perform in a simple classification task. Word representations were learned and assessed under the same conditions. For training word vectors, we used the implementation of Continuous Bag of Words described in [1]. To assess the quality of the vectors, we applied the analogy-questions test for word similarity described in the same paper. Further, to measure the retrieval rate of an embedding model, we introduced a new metric (Average Retrieval Error), which measures the percentage of missing words in the model. We observe that scoring a high accuracy on syntactic and semantic similarities between word pairs is not an indicator of better classification results. This observation can be justified by the fact that a domain-specific corpus contributes more to performance than a general-purpose corpus. For reproducibility, we release our experiment scripts and results.
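
A minimal sketch of an out-of-vocabulary measure in the spirit of the Average Retrieval Error described above (the percentage of query words missing from the embedding model); the exact definition in the paper may differ, and the vocabulary set below is a stand-in for a trained model's vocabulary.

```python
# Minimal sketch of an out-of-vocabulary metric in the spirit of the paper's
# Average Retrieval Error. The vocabulary object is a stand-in for a trained
# embedding model's vocabulary (e.g. the set of words it can retrieve).
def average_retrieval_error(model_vocab, query_words):
    """Fraction of test words the embedding model cannot retrieve."""
    missing = sum(1 for w in query_words if w not in model_vocab)
    return missing / len(query_words)

vocab = {"king", "queen", "man", "woman"}  # stand-in vocabulary
print(average_retrieval_error(vocab, ["king", "prince", "duke"]))  # ~0.667
```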


2021 ◽  
pp. 026142942110463
Author(s):  
Dean Keith Simonton

The terms giftedness and genius entered the research literature in the 19th century. Although not synonymous, both terms were defined according to potential or actual achievement in a specific domain. However, in the early 20th century, both terms became defined according to performance on domain-generic IQ tests. Given the weak empirical relations between achievement and intelligence, this transfer of meaning is unjustified. Both giftedness and genius must be defined with respect to potential or actual domain-specific achievements.


2010 ◽  
pp. 185-203
Author(s):  
Terence R. Smith ◽  
Marcia Lei Zeng

We describe a digital learning environment (DLE) organized around sets of concepts that represent a specific domain of knowledge. A prototype DLE developed by the Alexandria Project currently supports teaching at the University of California at Santa Barbara. Its distinguishing strength is an underlying abstract model of key aspects of any concept and its relationship to other concepts. Similar models of concepts are evolving simultaneously in a variety of disciplines. Our strongly-structured model (SSM) of concepts is based on the viewpoint that scientific concepts and their interrelationships provide the most powerful level of granularity with which to support effective access and use of knowledge in DLEs. The SSM integrates a taxonomy (or thesaurus), metadata (or attribute-value pairs), domain-specific mark-up languages, and specific models for learning scientific concepts. It is focused on attributes of concepts that include objective representations, operational semantics, use, and interrelationships to other concepts. The DLE integrates various semantic tools facilitating the creation, merging, and use of heterogeneous learning materials from distributed sources, as well as their access in terms of our SSM of concepts by both instructors and students. Evidence indicates that undergraduate instructional activities are enhanced with the use of such integrated semantic tools.
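
One possible, purely illustrative encoding of such a strongly-structured concept record is sketched below; the field names are assumptions and do not reflect the Alexandria Project's actual schema.

```python
# Illustrative sketch only: a concept record carrying attribute-value
# metadata, taxonomy links, interrelationships, and representations, in the
# spirit of the strongly-structured model (SSM). Field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    metadata: dict = field(default_factory=dict)          # attribute-value pairs
    broader: list = field(default_factory=list)           # taxonomy / thesaurus links
    related: list = field(default_factory=list)           # other interrelationships
    representations: list = field(default_factory=list)   # objective representations

watershed = Concept(
    name="watershed",
    metadata={"discipline": "geography",
              "operational_semantics": "drainage basin upstream of a point"},
    broader=["hydrologic unit"],
    related=["stream network", "digital elevation model"],
    representations=["polygon delineated from a DEM"],
)
print(watershed.name, "->", watershed.broader)
```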


Author(s):  
Terry Gao

In this paper, cow recognition and tracking in video sequences is studied. In the recognition phase, different classification and feature-extraction algorithms are discussed and analysed, and cow detection is cast as a binary classification problem. The detection method extracts the cow's features through multiple-feature fusion; these features include edge characteristics reflecting the body contour, grey values, and spatial position relationships. The cow body is then detected with a classifier trained by the Gentle AdaBoost algorithm. Experiments show that the method has good detection performance when the target is deformed or the contrast between target and background is low. Compared with general target-detection algorithms, this method reduces the miss rate and improves detection precision; the detection rate reaches 97.3%. In the tracking phase, an improved version of the popular compressive tracking (CT) algorithm is proposed. The learning rate is adjusted adaptively by calculating the pap distance of the image block, and the target-model update is stopped when the classification response values are negative, to avoid introducing error and noise. Experimental results show that the improved tracking algorithm effectively avoids mistaken target-model updates under large occlusions or frequent pose changes. For cow detection and tracking, a combined detection-and-tracking framework is built in which the detector is integrated with the tracker. Tests on video sequences captured in complex environments indicate that the algorithm based on improved compressive sensing tracks well against changing and cluttered backgrounds.
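
A hedged sketch of the update rule described above (not the authors' code): the appearance model is blended with new features only when the classifier response is positive, and frozen otherwise; the classifier, feature vectors and learning rate are stand-ins.

```python
# Hedged sketch: skip the appearance-model update whenever the classifier
# response for the tracked window is negative, so occlusions or large pose
# changes are not learned into the model. All inputs here are stand-ins.
import numpy as np

def update_model(model, patch_features, response, learning_rate):
    """Blend new features into the target model only for positive responses."""
    if response < 0:          # negative classification response: freeze the model
        return model
    return (1 - learning_rate) * model + learning_rate * patch_features

model = np.zeros(50)                             # compressed feature template
patch = np.random.default_rng(1).random(50)      # features of the current window
model = update_model(model, patch, response=+0.8, learning_rate=0.15)  # updated
model = update_model(model, patch, response=-0.3, learning_rate=0.15)  # unchanged
```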


Entropy ◽  
2019 ◽  
Vol 21 (10) ◽  
pp. 925 ◽  
Author(s):  
Stephen Guth ◽  
Themistoklis P. Sapsis

The ability to characterize and predict extreme events is a vital topic in fields ranging from finance to ocean engineering. Typically, the most extreme events are also the rarest, and it is this property that makes data collection and direct simulation challenging. We consider the problem of deriving optimal predictors of extremes directly from data characterizing a complex system, by formulating the problem in the context of binary classification. Specifically, we assume that a training dataset consists of: (i) indicator time series specifying whether or not an extreme event occurs; and (ii) time series of observables, which are employed to formulate efficient predictors. We employ and assess standard binary classification criteria for the selection of optimal predictors, such as total and balanced error and area under the curve, in the context of extreme event prediction. For physical systems for which there is sufficient separation between the extreme and regular events, i.e., extremes are distinguishably larger than regular events, we prove the existence of optimal extreme event thresholds that lead to efficient predictors. Moreover, motivated by the special character of extreme events, i.e., the very low rate of occurrence, we formulate a new objective function for the selection of predictors. This objective is constructed from the same principles as receiver operating characteristic curves, and exhibits a geometric connection to the regime separation property. We demonstrate the application of the new selection criterion to the advance prediction of intermittent extreme events in two challenging complex systems: the Majda–McLaughlin–Tabak model, a 1D nonlinear dispersive wave model, and the 2D Kolmogorov flow model, which exhibits extreme dissipation events.
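
As a minimal illustration of selecting an extreme-event threshold with one of the standard criteria mentioned above, the sketch below scans predictor thresholds and scores each with the balanced error rate; the synthetic indicator and observable series are purely illustrative, and this is not the paper's ROC-based objective.

```python
# Minimal sketch: threshold selection for extreme-event prediction scored by
# the balanced error rate, one of the standard binary classification criteria
# assessed in the paper. The synthetic data is purely illustrative.
import numpy as np

def balanced_error(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 0.5 * (fn / (tp + fn) + fp / (tn + fp))

rng = np.random.default_rng(0)
observable = rng.normal(size=5000)                              # predictor series
indicator = (observable + 0.5 * rng.normal(size=5000)) > 2.0    # rare "extremes"

thresholds = np.linspace(0.0, 3.0, 61)
errors = [balanced_error(indicator.astype(int), (observable > t).astype(int))
          for t in thresholds]
best = thresholds[int(np.argmin(errors))]
print(f"optimal threshold ~ {best:.2f}, balanced error {min(errors):.3f}")
```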


1994 ◽  
Vol 16 (3) ◽  
pp. 306-325 ◽  
Author(s):  
Herbert W. Marsh

Theoretical models of relations between specific components of physical self-concept, global physical self-concept, and global esteem are evaluated. Self-concept models posit that the effect of a specific domain (e.g., strength, endurance, or appearance) on global components should vary with the importance an individual places on the specific domain, but empirical support for this prediction is weak. Fox (1990) incorporated a related assumption into his hierarchical model of physical self-concept, but did not test this assumption. In empirical tests based on responses to the newly developed Physical Self-Description Questionnaire, relations between specific and global components of physical self-concept did not vary with the perceived importance of the specific component, and unweighted averages of specific components were as highly related to global components as importance weighted averages. These results provide no support for the importance of importance in modifying relations between domain-specific and general components of self-concept.


2020 ◽  
Vol 14 (02) ◽  
pp. 199-222
Author(s):  
Md Enamul Haque ◽  
Eddie C. Ling ◽  
Aminul Islam ◽  
Mehmet Engin Tozal

Microblog activity logs are useful for determining users’ interests and sentiments towards specific and broader categories of events, such as natural disasters and national elections. In this paper, we present a corpus model showing how personal attitudes can be predicted from social media or microblog activity for a specific domain of events such as natural disasters. More specifically, given a user’s tweet and an event, the model is used to predict whether the user will be willing to help or show a positive attitude towards that event or similar events in the future. We present a new dataset related to a specific natural disaster event, i.e. Hurricane Harvey, that distinguishes users’ tweets into positive and non-positive attitudes. We build Term Embeddings for Tweet (TEmT) to generate features to model personal attitudes for arbitrary users’ tweets. In addition, we present sentiment analysis on the same disaster-event dataset using enhanced feature learning on the TEmT-generated features by applying a Convolutional Neural Network (CNN). Finally, we evaluate the effectiveness of our method by employing multiple classification techniques and comparative methods on the newly created dataset.
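
A hedged sketch of the CNN stage described above, assuming TensorFlow/Keras is available: a small 1D convolutional network over per-token embedding features standing in for TEmT vectors; the layer sizes, sequence length and embedding dimension are illustrative assumptions.

```python
# Hedged sketch: a 1D CNN over per-token embedding features (stand-ins for
# TEmT vectors) for positive vs non-positive attitude classification.
# Layer sizes, sequence length and embedding dimension are assumptions.
import numpy as np
import tensorflow as tf

seq_len, emb_dim = 30, 100            # tokens per tweet, embedding dimension
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu",
                           input_shape=(seq_len, emb_dim)),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # positive vs non-positive
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train on random stand-in features; real inputs would be per-token TEmT vectors.
X = np.random.rand(256, seq_len, emb_dim).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
```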


Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1083 ◽  
Author(s):  
Zakria ◽  
Jianhua Deng ◽  
Jingye Cai ◽  
Muhammad Umar Aftab ◽  
Muhammad Saddam Khokhar ◽  
...  

Vehicle re-identification (Re-Id) is a key module in an intelligent transportation system (ITS). Due to its versatile applicability in metropolitan cities, this task has received increasing attention in recent years. It aims to identify whether a specific vehicle has already appeared over the surveillance network or not. Mostly, vehicle Re-Id methods are evaluated on a single dataset, in which training and testing of the model are performed on the same dataset. In practice, however, this negatively affects the model’s generalization ability because of biased datasets and the significant difference between training and testing data; hence, the model becomes weak in a practical environment. To demonstrate this issue, we empirically show that current vehicle Re-Id datasets are usually strongly biased. In this regard, we also conduct an extensive study on cross-dataset and same-dataset settings to examine the impact on the performance of the vehicle Re-Id system with existing methods. To address the problem, we propose an approach that augments the training dataset to reduce the influence of pose, angle, camera color response, and background information in vehicle images, while spatio-temporal patterns of unlabelled target datasets are learned by transferring siamese neural network classifiers trained on a source-labelled dataset. We finally calculate a composite similarity score from the spatio-temporal patterns and the siamese neural-network-based visual features. Extensive experiments on multiple datasets suggest that the proposed approach generalizes adequately.
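
An illustrative sketch of the fusion step described above: a siamese-network visual similarity is combined with a spatio-temporal transition likelihood into a composite score; the weighting scheme and the transition-probability lookup are assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of a composite similarity score: weighted fusion of a
# siamese-network visual similarity with a spatio-temporal transition
# probability between two camera observations. The weighting and the lookup
# table are assumptions for illustration.
def composite_score(visual_sim, st_prob, alpha=0.7):
    """Weighted fusion of visual similarity and spatio-temporal likelihood."""
    return alpha * visual_sim + (1 - alpha) * st_prob

# Example: transition probabilities estimated from camera/time-gap statistics.
transition_prob = {("cam1", "cam3", "0-5min"): 0.6, ("cam1", "cam7", "0-5min"): 0.05}
score = composite_score(visual_sim=0.82,
                        st_prob=transition_prob[("cam1", "cam3", "0-5min")])
print(round(score, 3))  # 0.754
```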


2019 ◽  
Vol 54 (1) ◽  
pp. 34-63 ◽  
Author(s):  
Xiaoming Zhang ◽  
Mingming Meng ◽  
Xiaoling Sun ◽  
Yu Bai

Purpose
With the advent of the era of Big Data, the scale of knowledge graphs (KGs) in various domains is growing rapidly, and they hold a huge amount of knowledge that surely benefits question answering (QA) research. However, a KG, which is constituted of entities and relations, is structurally inconsistent with a natural language query; thus, QA systems based on KGs still face difficulties. The purpose of this paper is to propose a method to answer domain-specific questions based on a KG, making information queries over a domain KG more convenient.

Design/methodology/approach
The authors propose a method, FactQA, to answer factual questions about a specific domain. A series of logical rules are designed to transform factual questions into triples, in order to resolve the structural inconsistency between the user’s question and the domain knowledge. Then, query expansion strategies and filtering strategies are proposed at two levels (i.e. words and triples in the question). For matching the question with domain knowledge, not only the similarity values between the words in the question and the resources in the domain knowledge but also the tag information of these words is considered; the tag information is obtained by parsing the question with Stanford CoreNLP. In this paper, a KG in the metallic materials domain is used to illustrate the FactQA method.

Findings
The designed logical rules are stable over time for transforming factual questions into triples. Additionally, after filtering the synonym-expansion results of the words in the question, the quality of the triple representation of the question is improved. The tag information of the words in the question is considered in the data-matching process, which helps to filter out wrong matches.

Originality/value
Although FactQA is proposed for domain-specific QA, it can also be applied to any domain besides metallic materials. For a question that cannot be answered, FactQA generates a new, related question to answer, providing the user with as much of the information they probably need as possible. FactQA could facilitate users’ information queries over emerging KGs.
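
A hedged sketch of the rule-based transformation step described above: a factual question is mapped to a (subject, predicate, object) triple pattern before matching against the domain KG; the single rule, the predicate naming and the example question are illustrative only (the paper relies on Stanford CoreNLP tagging and a metallic-materials KG).

```python
# Hedged sketch: map a factual question to a triple pattern with a hand-written
# rule, in the spirit of FactQA's question-to-triple step. The rule base and
# example below are illustrative assumptions, not the paper's actual rules.
import re

RULES = [
    re.compile(r"what is the (?P<prop>[\w\s]+) of (?P<ent>[\w\s]+)\?", re.I),
]

def question_to_triple(question):
    """Apply logical rules to turn a factual question into a triple pattern."""
    for pattern in RULES:
        m = pattern.match(question.strip())
        if m:
            subject = m.group("ent").strip()
            predicate = m.group("prop").strip().replace(" ", "_")
            return (subject, predicate, "?x")   # ?x marks the value to retrieve
    return None

print(question_to_triple("What is the melting point of titanium?"))
# ('titanium', 'melting_point', '?x')
```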

