Bucketed common vector scaling for authorship attribution in heterogeneous web collections: A scaling approach for authorship attribution

2019 ◽  
Vol 46 (5) ◽  
pp. 683-695 ◽  
Author(s):  
Hayri Volkan Agun ◽  
Ozgur Yilmazel

Domain, genre and topic influences on author style adversely affect the performance of authorship attribution (AA) in multi-genre and multi-domain data sets. Although recent approaches to AA tasks focus on suggesting new feature sets and sampling techniques to improve the robustness of a classification system, they do not incorporate domain-specific properties to reduce the negative impact of irrelevant features on AA. This study presents a novel scaling approach, namely, bucketed common vector scaling, to efficiently reduce negative domain influence without reducing the dimensionality of existing features; therefore, this approach is easily transferable and applicable in a classification system. Classification performances on English-language competition data sets consisting of emails and articles and Turkish-language web documents consisting of blogs, articles and tweets indicate that our approach is highly competitive with top-performing approaches on the English competition data sets and significantly improves the top classification performance in mixed-domain experiments on blogs, articles and tweets.

2020 ◽  
pp. 147592172097970
Author(s):  
Liangliang Cheng ◽  
Vahid Yaghoubi ◽  
Wim Van Paepegem ◽  
Mathias Kersemans

The Mahalanobis–Taguchi system is considered a promising and powerful tool for handling binary classification cases. However, the Mahalanobis–Taguchi system has several restrictions in screening useful features and determining the decision boundary in an optimal manner. In this article, an integrated Mahalanobis classification system is proposed which builds on the concept of Mahalanobis distance and its space. The integrated Mahalanobis classification system integrates the decision boundary searching process, based on a particle swarm optimizer, directly into the feature selection phase for constructing the Mahalanobis distance space. This integration (a) avoids the need for user-dependent input parameters and (b) improves the classification performance. For the feature selection phase, both a binary particle swarm optimizer and a binary gravitational search algorithm are investigated. To deal with possible overfitting problems in the case of sparse data sets, k-fold cross-validation is considered. The integrated Mahalanobis classification system procedure is benchmarked against the classical Mahalanobis–Taguchi system as well as the recently proposed two-stage Mahalanobis classification system in terms of classification performance. Results are presented on both an experimental case study of complex-shaped metallic turbine blades with various damage types and a synthetic case study of cylindrical dogbone samples with creep and microstructural damage. The results indicate that the proposed integrated Mahalanobis classification system shows good and robust classification performance.
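The core idea of classifying by distance from a reference group can be illustrated with a minimal Python sketch. It is not the integrated system described in the abstract: the function names, the fixed threshold and the toy reference samples are assumptions made for illustration, and the particle swarm search over features and decision boundary is left out entirely.

```python
import numpy as np


def fit_reference_space(reference):
    """Estimate the mean and inverse covariance of the reference group.

    `reference` is an (n_samples, n_features) array of samples from the
    reference class (hypothetical toy data in the example below).
    """
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    # The pseudo-inverse keeps the sketch usable when cov is near-singular.
    inv_cov = np.linalg.pinv(np.atleast_2d(cov))
    return mean, inv_cov


def mahalanobis_distance(x, mean, inv_cov):
    """Mahalanobis distance of each row of `x` from the reference space."""
    diff = np.atleast_2d(x) - mean
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(squared, 0.0))


def classify(x, mean, inv_cov, threshold):
    """Flag samples whose distance exceeds a threshold.

    The threshold is fixed by hand here (an assumption); the abstract
    describes searching for the boundary with an optimizer instead.
    """
    return mahalanobis_distance(x, mean, inv_cov) > threshold


if __name__ == "__main__":
    reference = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
    mean, inv_cov = fit_reference_space(reference)
    queries = np.array([[1.0, 1.0], [3.0, 1.0], [11.0, 1.0]])
    print(mahalanobis_distance(queries, mean, inv_cov))
    print(classify(queries, mean, inv_cov, threshold=3.0))
```

A hand-picked threshold is exactly the kind of user-dependent input the abstract says the integrated system avoids; it is used here only to keep the example short.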


2021 ◽  
Vol 14 (1) ◽  
pp. 205979912098776
Author(s):  
Joseph Da Silva

Interviews are an established research method across multiple disciplines. Such interviews are typically transcribed orthographically in order to facilitate analysis. Many novice qualitative researchers’ experiences of manual transcription are that it is tedious and time-consuming, although it is generally accepted within much of the literature that quality of analysis is improved through researchers performing this task themselves. This is despite the potential for the exhausting nature of bulk transcription to conversely have a negative impact upon quality. Other researchers have explored the use of automated methods to ease the task of transcription, more recently using cloud-computing services, but such services present challenges to ensuring confidentiality and privacy of data. In the field of cyber-security, these are particularly concerning; however, any researcher dealing with confidential participant speech should also be uneasy with third-party access to such data. As a result, researchers, particularly early-career researchers and students, may find themselves with no option other than manual transcription. This article presents a secure and effective alternative, building on prior work published in this journal, to present a method that significantly reduced, by more than half, interview transcription time for the researcher yet maintained security of audio data. It presents a comparison between this method and a fully manual method, drawing on data from 10 interviews conducted as part of my doctoral research. The method presented requires an investment in specific equipment which currently only supports the English language.


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 434
Author(s):  
Anca Nicoleta Marginean ◽  
Delia Doris Muntean ◽  
George Adrian Muntean ◽  
Adelina Priscu ◽  
Adrian Groza ◽  
...  

It has recently been shown that the interpretation by partial differential equations (PDEs) of a class of convolutional neural networks (CNNs) supports definition of architectures such as parabolic and hyperbolic networks. These networks have provable properties regarding the stability against the perturbations of the input features. Aiming for robustness, we tackle the problem of detecting changes in chest X-ray images that may be suggestive of COVID-19 with parabolic and hyperbolic CNNs and with domain-specific transfer learning. To this end, we compile public data on patients diagnosed with COVID-19, pneumonia, and tuberculosis, along with normal chest X-ray images. The negative impact of the small number of COVID-19 images is reduced by applying transfer learning in several ways. For the parabolic and hyperbolic networks, we pretrain the networks on normal and pneumonia images and further use the obtained weights as the initializers for the networks to discriminate between COVID-19, pneumonia, tuberculosis, and normal aspects. For DenseNets, we apply transfer learning twice. First, the ImageNet pretrained weights are used to train on the CheXpert dataset, which includes 14 common radiological observations (e.g., lung opacity, cardiomegaly, fracture, support devices). Then, the weights are used to initialize the network which detects COVID-19 and the three other classes. The resulting networks are compared in terms of how well they adapt to the small number of COVID-19 images. According to our quantitative and qualitative analysis, the resulting networks are more reliable compared to those obtained by direct training on the targeted dataset.


2021 ◽  
Vol 22 (4) ◽  
pp. 1988
Author(s):  
Francesco Lotti ◽  
Sara Marchiani ◽  
Giovanni Corona ◽  
Mario Maggi

Metabolic syndrome (MetS) and infertility are two afflictions with a high prevalence in the general population. MetS is a global health problem increasing worldwide, while infertility affects up to 12% of men. Despite the high prevalence of these conditions, the possible impact of MetS on male fertility has been investigated by a few authors only in the last decade. In addition, underlying mechanism(s) connecting the two conditions have been investigated in few preclinical studies. The aim of this review is to summarize and critically discuss available clinical and preclinical studies on the role of MetS (and its treatment) in male fertility. An extensive Medline search was performed identifying studies in the English language. While several studies support an association between MetS and hypogonadism, contrasting results have been reported on the relationship between MetS and semen parameters/male infertility, and the available studies considered heterogeneous MetS definitions and populations. So far, only two meta-analyses in clinical and preclinical studies, respectively, evaluated this topic, reporting a negative association between MetS and sperm parameters, testosterone and FSH levels, advocating, however, larger prospective investigations. In conclusion, a possible negative impact of MetS on male reproductive potential was reported; however, larger studies are needed.


2001 ◽  
Vol 221 (5-6) ◽  
Author(s):  
Elizabeth Kremp ◽  
Elmar Stöß

Summary: This paper investigates the borrowing behavior of 2,900 French and 1,300 German firms over the 1987-95 period. Both samples, based on data sets of the Banque de France and the Deutsche Bundesbank, include not only large but also small and medium-sized enterprises. Applying GMM techniques, we estimate identical debt equations for the two total samples and by size class. Despite the large differences between the two countries in terms of debt trends over time and size class, the main result is the similarity of a few determinants between France and Germany. For example, we find that firm growth has a positive impact on borrowing, in line with the theory of signalling, whereas the negative correlation of profit and debt supports the pecking order approach; the cost of finance has a negative impact on leverage, too. Additionally, the study can provide some insights into the monetary transmission mechanism in both EMU member countries.


2021 ◽  
pp. 136700692110345
Author(s):  
Van H Tran ◽  
Cen Wang ◽  
Sharynne McLeod ◽  
Sarah Verdon

Aim: To explore Vietnamese–Australian children’s proficiency and use of Vietnamese and English and identify associated factors that are related to demographics, language practices, language ideologies, and language management. Methodology: Vietnamese–Australian parents (n = 151) completed a questionnaire (in English or Vietnamese) regarding their child’s language proficiency and use, demographic details and a range of factors as conceptualized by Spolsky’s language policy theory: language practices; language ideologies; and language management. Data and analysis: Bivariate analyses (Pearson’s correlation and analysis of variance) and multiple regression models were conducted to explore associations between language proficiency and use and associated factors and identify the most significant factors. Findings/conclusions: Factors associated with children’s Vietnamese language proficiency (oral/written) included: demographic factors; language practices; language ideologies; and language management. In contrast, children’s English language proficiency (oral/written) was linked to demographic factors and language practices. Children’s Vietnamese language use was not significantly correlated with demographics but rather with language practices, language ideologies, and language management. Children’s home language use and proficiency did not have a negative impact upon their English proficiency. Originality: This study is the first to consider factors associated with Vietnamese–Australian children’s language proficiency and use. Significance/implications: Demographic factors, language practices, language ideologies, and language management were associated with children’s language proficiency and use. The results can be used by parents, educators, policy-makers, speech–language pathologists and other professionals to support Vietnamese–Australian and multilingual children around the world to develop and maintain their home and majority languages.


Author(s):  
Maja Radović ◽  
Nenad Petrović ◽  
Milorad Tošić

The requirements of state-of-the-art curricula and teaching processes in medical education have both introduced new assessment methods and improved existing ones. Recently, several promising methods have emerged, among them the Comprehensive Integrative Puzzle (CIP), which shows great potential. However, the construction of such questions requires considerable effort from a team of experts and is time-consuming. Furthermore, although English is accepted as an international language, for educational purposes there is also a need to represent data and knowledge in native languages. In this paper, we present an approach for automatic generation of CIP assessment questions based on using ontologies for knowledge representation. In this way, it is possible to provide multilingual support in the teaching and learning process because the same ontological concept can be applied to corresponding language expressions in different languages. The proposed approach shows promising results, indicated by a dramatic speed-up in the construction of CIP questions compared to manual methods. The presented results are a strong indication that adoption of ontologies for knowledge representation may enable scalability in multilingual domain-specific education regardless of the language used. The high level of automation in the assessment process, demonstrated on the CIP method in medical education, one of the most challenging domains, promises high potential for new innovative teaching methodologies in other educational domains as well.


2020 ◽  
Author(s):  
Harith Al-Sahaf ◽  
Mengjie Zhang ◽  
M Johnston

In machine learning, it is common to require a large number of instances to train a model for classification. In many cases, it is hard or expensive to acquire a large number of instances. In this paper, we propose a novel genetic programming (GP) based method for the problem of automatic image classification that adopts a one-shot learning approach. The proposed method relies on the combination of GP and Local Binary Patterns (LBP) techniques to detect a predefined number of informative regions that aim at maximising the between-class scatter and minimising the within-class scatter. Moreover, the proposed method uses only two instances of each class to evolve a classifier. To test the effectiveness of the proposed method, four different texture data sets are used and the performance is compared against two other GP-based methods, namely Conventional GP and Two-tier GP. The experiments revealed that the proposed method outperforms these two methods on all the data sets. Moreover, Naïve Bayes, Support Vector Machine, and Decision Trees (J48) methods achieved better performance when using features extracted by the proposed method than when using domain-specific and Two-tier GP extracted features. © Springer International Publishing 2013.
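For readers unfamiliar with Local Binary Patterns, the basic 8-neighbour descriptor can be sketched in a few lines of Python. This shows only the standard LBP code and its histogram, not the GP-based region detection the abstract proposes; the function names, the clockwise bit ordering and the `>=` comparison are assumptions of this illustration.

```python
import numpy as np

# Neighbour offsets (row, col), clockwise from the top-left pixel.
# The ordering only fixes which bit each neighbour sets; it is an
# assumption of this sketch, not taken from the abstract.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """Return the 8-bit LBP code of every interior pixel of a 2-D image."""
    img = np.asarray(image, dtype=float)
    height, width = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(OFFSETS):
        # Shifted view of the image aligned with the interior pixels.
        neighbour = img[1 + dy:height - 1 + dy, 1 + dx:width - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright.
        codes |= (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes


def lbp_histogram(image):
    """256-bin histogram of LBP codes, usable as a texture feature vector."""
    return np.bincount(lbp_codes(image).ravel(), minlength=256)


if __name__ == "__main__":
    flat = np.full((4, 5), 7)
    print(lbp_codes(flat))            # every neighbour ties with the centre
    print(lbp_histogram(flat).sum())  # one count per interior pixel
```

A histogram computed over a whole image, or over a selected region, gives a fixed-length feature vector that a conventional classifier can consume.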


2019 ◽  
Author(s):  
Jamal Kaid Mohammed Ali ◽  
S. Imtiaz Hasnain ◽  
M. Salim Beg

Overindulgence in social networking, in general, and texting, in particular, is much in practice. It is cutting across various population boundaries and has almost assumed an endemic proportion. Its consequential impact on the standard language has acquired greater importance. This paper aims to determine the perceptions and attitudes of English Second Language (ESL) learners at Aligarh Muslim University towards the consequences of texting on Standard English. The data were collected through a five-point scale questionnaire from ninety students who were enrolled at Aligarh Muslim University during the academic year 2010-2011. The respondents completed a 16-item questionnaire. The students from whom the data were collected were grouped according to their levels. The results indicate the negative impact of this new usage of the language in breaking the rules of the English language and influencing their literacy. Moreover, the questionnaire results show that, regardless of their heavy use of texting, most respondents have a negative attitude towards texting and view it as a threat to Standard English.


2015 ◽  
Vol 1 (2) ◽  
pp. 6-12
Author(s):  
Evgeniya Bolshakova

Although a variety of written English-language olympiads (language competitions) exist, fairly little is known about how they differ from traditional forms of language assessment. In Russia, olympiads in the English language are now gaining currency because they provide an opportunity to reveal the creative thinking and intellectual abilities of pupils. The present study examined major differences between language olympiads and traditional forms of language assessment. A comparison of five main olympiads in the English language in terms of their levels, assessed skills and task types is made and their distinctive features are outlined. The results of testing a new written olympiad of the Higher School of Economics, “Vysshaya proba” (Highest Degree), in the English language are analyzed. A set of test items was developed for 120 secondary school pupils in Moscow to find out whether they can easily cope with a non-traditional form of assessment such as a language olympiad. The results indicate that language competition as a form of alternative assessment may be introduced at schools to encourage better learning.

