Authorship Attribution for a Resource Poor Language—Urdu

Zulqarnain Nazir ◽  
Khurram Shahzad ◽  
Muhammad Kamran Malik ◽  
Waheed Anwar ◽  
Imran Sarwar Bajwa ◽  

Authorship attribution refers to examining the writing style of authors to determine the most likely original author of a document from a given set of potential authors. Due to the wide range of authorship attribution applications, a plethora of studies have been conducted for various Western as well as Asian languages. However, authorship attribution research in the Urdu language has only just begun, although Urdu is widely acknowledged as a prominent South Asian language. Furthermore, the existing studies on authorship attribution in Urdu have addressed a considerably easier problem with fewer than 20 candidate authors, which is far from real-world settings. Therefore, the findings from these studies may not be applicable to real-world settings. To that end, we have made three key contributions. First, we have developed a large authorship attribution corpus for Urdu, which is a low-resource language. The corpus is composed of over 2.6 million tokens and 21,938 news articles by 94 authors, which makes it a closer substitute for real-world settings. Second, we have analyzed hundreds of stylometry features used in the literature to identify 194 features that are applicable to the Urdu language and developed a taxonomy of these features. Finally, we have performed 66 experiments using two heterogeneous datasets to evaluate the effectiveness of four traditional and three deep learning techniques. The experimental results show the following: (a) our developed corpus is many times larger than the existing corpora, and it is more challenging than its counterparts for the authorship attribution task; and (b) Convolutional Neural Networks are the most effective technique, achieving a nearly perfect F1 score of 0.989 on an existing corpus and 0.910 on our newly developed corpus.
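
As a rough illustration of how stylometric features can drive attribution, the sketch below represents each document by relative character-trigram frequencies (one common stylometric feature; the paper's 194-feature taxonomy is not reproduced here) and assigns an unseen document to the candidate author with the most similar averaged profile. The function names and the nearest-profile classifier are illustrative assumptions, far simpler than the traditional and deep learning models the study evaluates. Character n-grams work on any Unicode string, so the same code runs unchanged on Urdu text.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Relative frequencies of character n-grams (a common stylometric feature)."""
    text = " ".join(text.split())  # normalise runs of whitespace
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_profiles(training):
    """training: dict author -> list of documents; returns one averaged profile per author."""
    profiles = {}
    for author, docs in training.items():
        merged = Counter()
        for doc in docs:
            merged.update(char_ngrams(doc))  # adds per-document frequencies
        profiles[author] = {g: v / len(docs) for g, v in merged.items()}
    return profiles


def attribute(doc, profiles):
    """Return the candidate author whose profile is most similar to doc."""
    features = char_ngrams(doc)
    return max(profiles, key=lambda author: cosine(features, profiles[author]))
```

Averaging per-author profiles keeps this hypothetical classifier trivial; with 94 candidate authors, a trained model would normally be preferred.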

BMJ Open ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. e038530
Francesca L Cavallaro ◽  
Ruth Gilbert ◽  
Linda Wijlaars ◽  
Eilis Kennedy ◽  
Ailsa Swarbrick ◽  

Introduction: Almost 20 000 babies are born to teenage mothers each year in England, with poorer outcomes for mothers and babies than among older mothers. A nurse home visitation programme in the USA was found to improve a wide range of outcomes for young mothers and their children. However, a randomised controlled trial in England found no effect on short-term primary outcomes, although cognitive development up to age 2 showed improvement. Our study will use linked routinely collected health, education and social care data to evaluate the real-world effects of the Family Nurse Partnership (FNP) on child outcomes up to age 7, with a focus on identifying whether the FNP works better for particular groups of families, thereby informing programme targeting and resource allocation. Methods and analysis: We will construct a retrospective cohort of all women aged 13–24 years giving birth in English NHS hospitals between 2010 and 2017, linking information on mothers and children from FNP programme data, Hospital Episode Statistics and the National Pupil Database. To assess the effectiveness of FNP, we will compare outcomes for eligible mothers ever and never enrolled in FNP, and their children, using two analysis strategies to adjust for measured confounding: propensity score matching and analyses adjusting for maternal characteristics up to enrolment/28 weeks' gestation. Outcomes of interest include early childhood development, childhood unplanned hospital admissions for injury or maltreatment-related diagnoses and children in care. Subgroup analyses will determine whether the effect of FNP varies according to maternal characteristics (eg, age and education). Ethics and dissemination: The Nottingham Research Ethics Committee approved this study. Mothers participating in FNP were supportive of our planned research. Results will inform policy-makers about targeting home visiting programmes.
Methodological findings on the accuracy and reliability of cross-sectoral data linkage will be of interest to researchers.
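
Propensity score matching, one of the two adjustment strategies named above, pairs each enrolled mother with a never-enrolled mother who had a similar estimated probability of enrolment. The sketch below is a minimal greedy 1:1 nearest-neighbour match without replacement. It assumes the propensity scores have already been estimated (for example by a logistic regression on maternal characteristics); the caliper of 0.05 and all names are hypothetical choices, not the study's protocol.

```python
def match_on_propensity(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dict id -> propensity score (estimated probability of enrolment).
    Returns (treated_id, control_id) pairs whose scores differ by at most `caliper`;
    treated units with no control inside the caliper are left unmatched.
    """
    available = dict(controls)
    pairs = []
    # Process treated units from the highest score down, a common greedy ordering.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement: each control used once
    return pairs


def mean_difference(pairs, outcome):
    """Average treated-minus-control outcome over matched pairs."""
    diffs = [outcome[t] - outcome[c] for t, c in pairs]
    return sum(diffs) / len(diffs) if diffs else float("nan")
```

Greedy matching is order-dependent; optimal matching or matching with replacement are common alternatives when control units are scarce.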

2014 ◽  
Vol 2014 ◽  
pp. 1-6 ◽  
Xia-Xia Zhao ◽  
Jian-Zhong Wang

Information plays an important role in modern society. In this paper, we present a mathematical model of information spreading with isolation. We find that the model has rich dynamics, including a Hopf bifurcation. The results show that, for a wide range of parameters, there is a bistable phenomenon in the process of information spreading, and thus the information cannot be well controlled. Moreover, the model has a limit cycle, which implies that information exhibits periodic outbreaks, consistent with observations in the real world.
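
The abstract does not give the model's equations, so the sketch below uses the textbook normal form of a supercritical Hopf bifurcation, not the authors' model, to show what a limit cycle means here. In polar coordinates, dr/dt = mu*r - r^3 and dtheta/dt = omega: for mu < 0 trajectories spiral into the equilibrium, while for mu > 0 they settle onto a periodic orbit of radius sqrt(mu), the kind of sustained oscillation the authors interpret as periodic outbreaks. It does not reproduce the reported bistability, which would need a different (for example subcritical) form. The function name and step sizes are assumptions.

```python
import math


def simulate_hopf(mu, omega=1.0, r0=0.1, dt=0.001, steps=50_000):
    """Euler-integrate the supercritical Hopf normal form in polar coordinates:
        dr/dt = mu*r - r**3,   dtheta/dt = omega
    Returns the final (x, y) position and the final radius r."""
    r, theta = r0, 0.0
    for _ in range(steps):
        r += dt * (mu * r - r ** 3)  # radial dynamics decide growth or decay
        theta += dt * omega          # constant rotation gives the oscillation
    return r * math.cos(theta), r * math.sin(theta), r
```

With mu = 0.25 the radius converges to 0.5, a closed orbit; with mu = -0.25 it decays towards zero, so the oscillation dies out.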

2020 ◽  
Vol 12 (02) ◽  
pp. 189-218
Eleonora Milazzo

The concept of solidarity has been receiving growing attention from scholars in a wide range of disciplines. While this trend coincides with widespread unsuccessful attempts to achieve solidarity in the real world, the failure of solidarity as such remains a relatively unexplored topic. In the case of the so-called European Union (EU) refugee crisis, the fact that EU member states failed to fulfil their commitment to solidarity is now regarded as established wisdom. But as we try to come to terms with failing solidarity in the EU we are faced with a number of important questions: are all instances of failing solidarity equally morally reprehensible? Are some motivations for resorting to unsolidaristic measures more valid than others? What claims have an effective countervailing force against the commitment to act in solidarity?

2016 ◽  
Vol 2016 (3) ◽  
pp. 155-171 ◽  
Rebekah Overdorf ◽  
Rachel Greenstadt

Stylometry is a form of authorship attribution that relies on linguistic information to attribute documents of unknown authorship based on the writing styles of a suspect set of authors. This paper focuses on the cross-domain subproblem, where the known and suspect documents differ in the setting in which they were created. Three distinct domains, Twitter feeds, blog entries, and Reddit comments, are explored in this work. We determine that state-of-the-art methods in stylometry do not perform as well in cross-domain situations (34.3% accuracy) as they do in in-domain situations (83.5% accuracy), and we propose methods that improve performance in the cross-domain setting with both feature-level and classification-level techniques, which can increase accuracy to as much as 70%. In addition to testing these approaches on a large real-world dataset, we also examine real-world adversarial cases where an author is actively attempting to hide their identity. Being able to identify authors across domains facilitates linking identities across the Internet, making this a key security and privacy concern; users can take other measures to ensure their anonymity, but due to their unique writing style, they may not be as anonymous as they believe.
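
One simple feature-level adjustment for cross-domain stylometry is to standardize each feature within its own domain, so that differences in register (short tweets versus longer blog entries) are less likely to be mistaken for differences between authors. The sketch below is an assumed illustration of that idea, not necessarily one of the techniques the paper proposes; features are plain numeric vectors and the function name is hypothetical.

```python
import statistics


def standardize_per_domain(samples):
    """samples: list of (domain, feature_vector) pairs.

    Z-scores every feature using the mean and population standard deviation
    of that feature within the sample's own domain. Constant features get a
    divisor of 1.0 to avoid division by zero."""
    by_domain = {}
    for domain, vec in samples:
        by_domain.setdefault(domain, []).append(vec)

    stats = {}
    for domain, vecs in by_domain.items():
        columns = list(zip(*vecs))  # one tuple per feature
        means = [statistics.fmean(col) for col in columns]
        stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
        stats[domain] = (means, stdevs)

    out = []
    for domain, vec in samples:
        means, stdevs = stats[domain]
        out.append((domain, [(v - m) / s for v, m, s in zip(vec, means, stdevs)]))
    return out
```

After this step, a classifier trained on one domain sees features on a comparable scale in another, although genuine stylistic shifts between domains remain.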

Jiakai Wang

Although deep neural networks (DNNs) have already achieved remarkable results and had a very wide impact, their vulnerability has attracted considerable research interest in artificial intelligence (AI) safety and robustness this year. A series of works reveals that current DNNs can easily be misled by carefully designed adversarial examples. Unfortunately, this weakness also affects real-world AI applications and puts them at potential risk. We are more interested in physical attacks because they can be implemented in the real world. The study of physical attacks can effectively promote the application of AI techniques, which is of great significance to the secure development of AI.
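
To make "carefully designed adversarial examples" concrete, the sketch below implements the fast gradient sign method (FGSM, Goodfellow et al.), a standard digital attack swapped in here for illustration, against a logistic-regression classifier whose input gradient has a closed form. Physical attacks, the focus above, add constraints such as printability and changing viewpoints that this sketch does not model; all names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(g):
    """Return -1, 0 or 1 according to the sign of g."""
    return (g > 0) - (g < 0)


def fgsm_logistic(x, y, w, b, eps):
    """FGSM against a logistic-regression classifier.

    For binary cross-entropy loss, d(loss)/dx = (sigmoid(w.x + b) - y) * w,
    so each feature is shifted by eps in the direction that increases the loss."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]
```

For example, with weights [2.0, -1.0] and input [0.5, 0.2] of true class 1, a perturbation of eps = 0.3 per feature is enough to push the predicted probability below 0.5.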

David R Mills

Projects are increasingly being used to provide a richer experience in physics teaching laboratories, and in the higher years these may well approximate the real world of industry and research. In first year, however, a wide range of approaches is used, from projects to open-ended experiments, yet questions remain about how students can best acquire a range of desired scientific abilities. Recent physics education research has suggested tools and approaches to help develop and measure abilities such as those needed to design and implement an experiment. Examples from several countries illustrate the need to match the task to students' capabilities, and show how various goals for student learning may be achieved in the laboratory.

2019 ◽  
Vol 17 (1) ◽  
pp. 5-14
Robert Pratten
The Real ◽  

In participatory transmedia experiences a wide range of player agency is desirable but can be problematic if the game and storyworld boundaries are unknown or ignored. Players breaking the world boundaries can mean an experience must be aborted or stops being fun. Yet breaking the rules is fun and in learning & development experiences like wargaming it might even be part of the goal. How then can authors of participatory experiences that play out in the real world allow players to break the rules but not break the world? How can we design an experience for the greatest player agency and the broadest scope of emergent stories yet prevent the world from travelling so far from the author’s intended state that it becomes unrecognisable, unplayable or unsuitable? This paper introduces the concept of an elastic storyworld as an alternative to a persistent storyworld: a world that stretches to accommodate unexpected player actions and yet restores itself over time. Drawing on definitions of elasticity from physics, the paper suggests ways in which authors might classify and detect player-enacted distortions and how participatory experiences might be designed to be more resilient to the stresses and strains of player agency. 

Tanzila Khan ◽  
H. Christopher Frey

With more stringent U.S. fuel economy (FE) standards, the effect of auxiliary devices such as air-conditioning (AC) has received increased attention. AC is the largest auxiliary engine load for light duty gasoline vehicles (LDGVs). However, there are few data on the effect of AC operation on FE for LDGVs based on real-world measurements, especially for recent model year vehicles. The Motor Vehicle Emission Simulator (MOVES) is a regulatory model for estimating on-road vehicle energy use and emissions. MOVES adjusts vehicle energy-use rates for AC effects. However, MOVES-predicted FE with AC has not been evaluated against empirical measurements. The research objectives are to quantify the LDGV FE penalty from AC and to assess the accuracy of MOVES2014a-predicted FE with AC. The AC effect on real-world fleet-average FE was quantified by comparing 78 AC-off vehicles with 55 AC-on vehicles, measured with onboard instruments on defined study routes. The MOVES2014a-based FE penalty from AC was evaluated against real-world estimates and against chassis dynamometer-based FE test results used for FE ratings. The real-world FE penalty ranges between 1.3% and 7.5% across a wide range of driving cycles. Fuel consumption at idle is 13% higher with AC on. MOVES underestimates the real-world FE with AC by 6%, on average. MOVES overestimates the AC effect on cycle-average FE ranging between 13.5% and 18.5% for real-world and MOVES default cycles, and between 11.1% and 14.5% for standard cycles.
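
The penalties above compare fuel economy (distance per unit of fuel) with AC off and on, whereas the idle result compares fuel consumption rates; the two percentages are not interchangeable. The sketch below shows the arithmetic, with hypothetical function names and made-up example values rather than the study's data. Because fuel used per unit distance is the reciprocal of FE, an FE penalty of p percent corresponds to a fuel-use increase of 1/(1 - p/100) - 1, slightly more than p.

```python
def fe_penalty_percent(fe_ac_off, fe_ac_on):
    """Relative fuel-economy loss from running AC, in percent (FE e.g. in mpg)."""
    return 100.0 * (fe_ac_off - fe_ac_on) / fe_ac_off


def fuel_use_increase_percent(rate_ac_off, rate_ac_on):
    """Relative increase in a fuel consumption rate (e.g. g/s at idle), in percent."""
    return 100.0 * (rate_ac_on - rate_ac_off) / rate_ac_off


def fuel_increase_from_fe_penalty(penalty_percent):
    """Convert an FE penalty into the equivalent increase in fuel per distance, in percent."""
    return 100.0 * (1.0 / (1.0 - penalty_percent / 100.0) - 1.0)
```

For instance, a hypothetical drop from 30.0 to 28.5 mpg is a 5% FE penalty, equivalent to roughly 5.26% more fuel per mile.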

Bioimpacts ◽  
2021 ◽  
Yosef Masoudi-Sobhanzadeh ◽  
Hosein Esmaeili ◽  
Ali Masoudi-Nejad

Introduction: COVID-19 has spread around the world and seriously disrupted human activities. Because it is a newly discovered disease, many aspects of it remain unknown, and there is no effective medication to cure it. Moreover, designing a drug is a time-consuming process that requires large investment. Hence, drug repurposing techniques, which are employed to discover hidden benefits of existing drugs, may be a useful option for treating COVID-19. Methods: The present study exploits drug repositioning concepts and introduces some candidate drugs that may be effective in controlling COVID-19. The suggested method consists of three main steps. First, the required data, such as the amino acid sequences of targets and drug-target interactions, are extracted from public databases. Second, the similarity score between the targets (proteins/enzymes) and the genome of SARS-CoV-2 is computed using the proposed fuzzy logic-based method; the fuzzy technique is intended to address the problem that classical approaches may yield outcomes that are not useful for real-world applications. Third, after ranking targets based on the obtained scores, the usefulness of the drugs affecting them is examined for managing COVID-19. Results: The results indicate that antiviral medicines designed for curing hepatitis C may also cure COVID-19. According to the findings, ribavirin, simeprevir, danoprevir, and XTL-6865 may be helpful in controlling the disease. Conclusion: Similarity-based drug repurposing techniques may be the most suitable option for managing emerging diseases such as COVID-19 and can be applied to a wide range of data. Also, fuzzy logic-based scoring methods can produce outcomes that are more consistent with real-world biological applications than other methods.
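
The abstract does not specify the fuzzy scoring method. As a loosely analogous sketch, the code below scores each target with a generic sequence-similarity ratio (Python's `difflib.SequenceMatcher`, a stand-in for a proper biological alignment such as Smith-Waterman), maps that ratio through a simple fuzzy membership function for "high similarity", and ranks targets by the resulting degree. The thresholds, names, and sequences are hypothetical.

```python
from difflib import SequenceMatcher


def ramp_membership(x, low=0.3, high=0.7):
    """Degree (0..1) to which similarity x counts as 'high': 0 at or below `low`,
    1 at or above `high`, linear in between (a simple fuzzy membership function)."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def rank_targets(viral_seq, targets):
    """targets: dict name -> sequence string. Returns (name, membership) pairs, best first."""
    scores = {}
    for name, seq in targets.items():
        similarity = SequenceMatcher(None, viral_seq, seq).ratio()  # 0..1
        scores[name] = ramp_membership(similarity)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A graded membership, unlike a hard similarity cutoff, lets borderline targets keep partial weight in the ranking.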

2020 ◽  
Florian Ulrich Jehn ◽  
Lutz Breuer ◽  
Philipp Kraft ◽  
Konrad Bestian ◽  
Tobias Houska

Hydrology, and especially hydrological models, often treat catchments as if they were leaky buckets. But do we find such catchments in the real world, or is this just a convenient simplification? Moreover, if we find them, what attributes allow these catchments to show such simple behavior? To study this, we look at 27-year time series for 90 catchments in Hesse, Germany, which include droughts and years of abundant precipitation. In addition, the state of Hesse provides a wide range of catchment attributes, such as geology, soils and land use, while having a relatively similar climate. Using discharge, evapotranspiration and precipitation, we calculate the cumulative storage change for each year separately and use it as a proxy for storage. We group the 90 catchments by the complexity of their storage-discharge relationship, which we define as how well the relationship can be modelled by an exponential function. We find that climate and the physical attributes of the catchments seem to have a similar influence on the overall complexity of the storage-discharge relationship. However, we could also identify catchments that show consistent behavior, mostly independent of climate: they behave either always complexly or always simply in all the years considered. These catchments differ in their permeability, conductivity, geology, soil and, to a lesser extent, their shape. We show that bucket-like catchments exist in the real world and that they can be found by looking for oval catchments with good permeability in regions of igneous geology and clay silt soil texture.
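
The storage proxy and the complexity measure described above can be sketched as follows. Cumulative storage change is the running sum of P - ET - Q, and an exponential storage-discharge relationship Q = a*exp(b*S) becomes linear after taking ln Q, so ordinary least squares on ln Q gives a and b. Using the coefficient of determination of that log-space fit as the "how well" measure is an assumption, since the abstract does not state the study's exact metric; the function names are hypothetical.

```python
import math


def cumulative_storage(precip, et, discharge):
    """Running sum of P - ET - Q (all in the same units, e.g. mm/day): a storage-change proxy."""
    storage, total = [], 0.0
    for p, e, q in zip(precip, et, discharge):
        total += p - e - q
        storage.append(total)
    return storage


def exponential_fit_r2(storage, discharge):
    """Fit Q = a * exp(b * S) by least squares on ln(Q); return (a, b, r_squared).

    Requires strictly positive discharge values and at least two distinct storage values."""
    ys = [math.log(q) for q in discharge]
    n = len(storage)
    mx, my = sum(storage) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in storage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(storage, ys))
    b = sxy / sxx
    ln_a = my - b * mx
    ss_res = sum((y - (ln_a + b * x)) ** 2 for x, y in zip(storage, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return math.exp(ln_a), b, r2
```

Under this assumed measure, a "simple", bucket-like catchment-year would give an R² close to 1, and lower values would indicate a more complex storage-discharge relationship.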
