UrduAI: Writeprints for Urdu Authorship Identification

Author(s):  
Raheem Sarwar ◽  
Saeed-Ul Hassan

The authorship identification task aims to identify the original author of an anonymous text sample from a set of candidate authors. It has several application domains, such as digital text forensics and information retrieval. These application domains are not limited to a specific language. However, most authorship identification studies focus on English, and limited attention has been paid to Urdu. Moreover, existing Urdu authorship identification solutions lose accuracy as the number of training samples per candidate author decreases and as the number of candidate authors increases. Consequently, these solutions are inapplicable to real-world cases. In addition, due to the unavailability of reliable POS taggers or sentence segmenters, all existing authorship identification studies on Urdu text are limited to word n-gram features. To overcome these limitations, we formulate a stylometric feature space that is not limited to word n-gram features. Based on this feature space, we use an authorship identification solution that transforms each text sample into a point set, retrieves candidate text samples, and relies on the nearest neighbors classifier to predict the original author of the anonymous text sample. To evaluate our solution, we create a significantly larger corpus than those used in existing studies and conduct several experimental studies, which show that our solution overcomes the limitations of existing studies and achieves an accuracy of 94.03%, higher than that of all previous authorship identification works.
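
The pipeline described in this abstract (text sample to point set, then a nearest neighbors decision) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three toy features in `chunk_features`, the fixed chunk size, the one-directional set distance, and the function names themselves. The paper's actual feature space and retrieval step are not specified in the abstract.

```python
import math


def chunk_features(words):
    """Map a chunk of words to a small stylometric vector.

    Illustrative features only (assumed, not the paper's feature space):
    average word length, type-token ratio, and share of short words.
    """
    n = len(words)
    avg_len = sum(len(w) for w in words) / n
    type_token = len(set(words)) / n
    short_ratio = sum(1 for w in words if len(w) <= 3) / n
    return (avg_len, type_token, short_ratio)


def to_point_set(text, chunk_size=5):
    """Split a text into fixed-size word chunks and turn each chunk into a point."""
    words = text.split()
    chunks = [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]
    return [chunk_features(c) for c in chunks if c]


def set_distance(a, b):
    """Mean distance from each point in a to its closest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def predict_author(anonymous_text, labeled_samples):
    """Return the author of the labeled sample whose point set is nearest.

    labeled_samples is a list of (author, text) pairs.
    """
    query = to_point_set(anonymous_text)
    best = min(labeled_samples,
               key=lambda s: set_distance(query, to_point_set(s[1])))
    return best[0]
```

Representing a sample as several points rather than one vector lets a short anonymous text match only part of a longer known sample, which is one plausible reason a point-set formulation helps when training samples are scarce.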

2019 ◽  
Vol 9 (22) ◽  
pp. 4749
Author(s):  
Lingyun Jiang ◽  
Kai Qiao ◽  
Linyuan Wang ◽  
Chi Zhang ◽  
Jian Chen ◽  
...  

Decoding human brain activities, especially reconstructing human visual stimuli via functional magnetic resonance imaging (fMRI), has gained increasing attention in recent years. However, the high dimensionality and small quantity of fMRI data restrict satisfactory reconstruction, especially for deep learning reconstruction methods, which require huge amounts of labelled samples. In contrast to deep learning methods, humans can recognize a new image because the human visual system is naturally capable of extracting features from any object and comparing them. Inspired by this visual mechanism, we introduce the mechanism of comparison into the deep learning method to achieve better visual reconstruction, making full use of each sample and of the relationship within each sample pair by learning to compare. On this basis, we propose a Siamese reconstruction network (SRN) method. Using the SRN, we obtained improved results on two fMRI recording datasets, reaching 72.5% accuracy on the digit dataset and 44.6% accuracy on the character dataset. Essentially, this approach can increase the training data roughly from n samples to 2n sample pairs, which takes full advantage of the limited quantity of training samples. The SRN learns to pull together sample pairs of the same class and push apart sample pairs of different classes in feature space.
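
The pairwise objective mentioned at the end of this abstract can be sketched with a standard contrastive loss. This is an assumption: the abstract does not state which loss the SRN uses, and the `margin` parameter and function name below are illustrative only.

```python
import math


def contrastive_loss(x1, x2, same_class, margin=1.0):
    """Contrastive loss for one pair of feature vectors.

    Same-class pairs are penalized by their squared distance, so training
    pulls them together. Different-class pairs are penalized only while
    they are closer than the margin, so training pushes them apart.
    This is a generic formulation, not necessarily the loss used by the SRN.
    """
    d = math.dist(x1, x2)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

A different-class pair that is already farther apart than `margin` contributes zero loss, so the network stops spending capacity on pairs that are already well separated.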


2021 ◽  
Vol 25 (12) ◽  
pp. 1665-1665
Author(s):  
Emi Furukawa ◽  
Brent Alsop ◽  
Shizuka Shimabukuro ◽  
Paula Sowerby ◽  
Stephanie Jensen ◽  
...  

Background: Research on altered motivational processes in ADHD has focused on reward. The sensitivity of children with ADHD to punishment has received limited attention. We evaluated the effects of punishment on the behavioral allocation of children with and without ADHD from the United States, New Zealand, and Japan, applying the generalized matching law. Methods: Participants in two studies (Furukawa et al., 2017, 2019) were 210 English-speaking (145 ADHD) and 93 Japanese-speaking (34 ADHD) children. They completed an operant task in which they chose between playing two simultaneously available games. Rewards became available every 10 seconds on average, arranged equally across the two games. Responses on one game were punished four times as often as responses on the other. The asymmetrical punishment schedules should bias responding to the less punished alternative. Results: Compared with controls, children with ADHD from both samples allocated significantly more responses to the less frequently punished game, suggesting greater behavioral sensitivity to punishment. For these children, the bias toward the less punished alternative increased with time on task. Avoiding the more punished game resulted in missed reward opportunities and reduced earnings. English-speaking controls showed some preference for the less punished game. The behavior of Japanese controls was not significantly influenced by the frequency of punishment, despite slowed response times after punished trials and immediate shifts away from the punished game, indicating awareness of punishment. Conclusion: Punishment exerted greater control over the behavior of children with ADHD, regardless of their cultural background. This may be a common characteristic of the disorder. Avoidance of punishment led to poorer task performance. Caution is required in the use of punishment, especially with children with ADHD. 
The group difference in punishment sensitivity was more pronounced in the Japanese sample; this may create a negative halo effect for children with ADHD in this culture.


2020 ◽  
Vol 34 (07) ◽  
pp. 12975-12983
Author(s):  
Sicheng Zhao ◽  
Guangzhi Wang ◽  
Shanghang Zhang ◽  
Yang Gu ◽  
Yaxian Li ◽  
...  

Deep neural networks suffer from performance decay when there is domain shift between the labeled source domain and the unlabeled target domain, which motivates research on domain adaptation (DA). Conventional DA methods usually assume that the labeled data is sampled from a single source distribution. However, in practice, labeled data may be collected from multiple sources, and naive application of single-source DA algorithms may lead to suboptimal solutions. In this paper, we propose a novel multi-source distilling domain adaptation (MDDA) network, which not only considers the different distances among multiple sources and the target, but also investigates the different similarities of the source samples to the target ones. Specifically, the proposed MDDA includes four stages: (1) pre-train the source classifiers separately using the training data from each source; (2) adversarially map the target into the feature space of each source by minimizing the empirical Wasserstein distance between source and target; (3) select the source training samples that are closer to the target to fine-tune the source classifiers; and (4) classify each encoded target feature with the corresponding source classifier, and aggregate the different predictions using the respective domain weights, which correspond to the discrepancy between each source and the target. Extensive experiments are conducted on public DA benchmarks, and the results demonstrate that the proposed MDDA significantly outperforms the state-of-the-art approaches. Our source code is released at: https://github.com/daoyuan98/MDDA.
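
Stage (4) above, the weighted aggregation of per-source predictions, can be illustrated with a small sketch. The weighting rule used here, `exp(-discrepancy)` normalized to sum to one, is an assumption chosen only so that a smaller discrepancy yields a larger weight; the abstract does not give the exact formula, and the function names are hypothetical rather than taken from the released code.

```python
import math


def domain_weights(discrepancies):
    """Turn per-source discrepancies into weights that sum to 1.

    Assumed rule: weight is proportional to exp(-discrepancy), so sources
    that are closer to the target count more.
    """
    raw = [math.exp(-d) for d in discrepancies]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(predictions, discrepancies):
    """Combine class-probability vectors from several source classifiers.

    predictions[i] is the probability vector produced by source classifier i
    for one target sample. Returns (predicted class index, combined vector).
    """
    weights = domain_weights(discrepancies)
    n_classes = len(predictions[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, predictions))
        for c in range(n_classes)
    ]
    return combined.index(max(combined)), combined
```

With two sources that disagree, the prediction follows whichever source has the smaller discrepancy to the target, which is the behavior the abstract describes qualitatively.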


Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3687
Author(s):  
Joanna Sekulska-Nalewajko ◽  
Jarosław Gocławski ◽  
Ewa Korzeniewska

Pilling is caused by friction pulling and fuzzing the fibers of a material. Pilling is normally evaluated by visually counting the pills on a flat fabric surface. Here, we propose an objective method of pilling assessment, based on the textural characteristics of the fabric shown in optical coherence tomography (OCT) images. The pilling layer is first identified above the fabric surface. The percentage of protruding fiber pixels and Haralick’s textural features are then used as pilling descriptors. Principal component analysis (PCA) is employed to select strongly correlated features and then reduce the feature space dimensionality. The first principal component is used to quantify the intensity of fabric pilling. The results of experimental studies confirm that this method can determine the intensity of pilling. Unlike traditional methods of pilling assessment, it can also detect pilling in its early stages. The approach could help to prevent overestimation of the degree of pilling, thereby avoiding unnecessary procedures, such as mechanical removal of entangled fibers. However, the research covered a narrow group of fabrics, and wider conclusions about the usefulness and limitations of this method can be drawn only after examining fabrics of different thicknesses and fiber chemical compositions.
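
Using the first principal component as a single intensity score can be sketched as follows. This is a minimal illustration under stated assumptions: it finds the leading eigenvector of the sample covariance matrix by power iteration, it does not standardize the features first (which a real analysis of descriptors on different scales would likely need), and the function name and iteration count are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows is a list of equal-length feature vectors, one per sample.
    The direction is the leading eigenvector of the sample covariance
    matrix, found by power iteration; scores are the projections of the
    centered samples onto that direction.
    Assumes the data have non-zero variance and that the all-ones start
    vector is not orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

When the descriptors are strongly correlated, as the abstract says they are selected to be, most of their variance lies along this one direction, so the projection onto it behaves like a single ordering of samples from least to most pilled.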


2014 ◽  
Vol 644-650 ◽  
pp. 2160-2163 ◽  
Author(s):  
Shi Min Liu ◽  
Yan Ni Deng ◽  
Yuan Xing Lv

The locally linear embedding (LLE) algorithm makes up for the shortcoming that manifold learning algorithms can be applied only to training samples and cannot be extended to test samples. However, its low-dimensional feature space contains redundant information, and the class information of the samples is not integrated into the low-dimensional embedding. To address this shortcoming, we introduce two improved algorithms, the local linear maximum dispersion matrix algorithm (FSLLE) and the adaptive algorithm (ALLE), as well as the combination of the two. In the experiments, Garbol is combined with the locally linear embedding (LLE) algorithm to compare the results of each variant. The results show that the approach effectively eliminates redundant information among basis vectors and improves the recognition rate.


Author(s):  
V. B. Melekhin ◽  
V. M. Khachumov

Objective. The objective of the study is to determine various stable characteristics of images (semi-invariants and invariants) as descriptors necessary for the formation of a feature space of standards intended for recognizing images of different nature belonging to different classes of objects. Methods. The authors propose metrics for evaluating the proximity of the recognized image to a given standard in the space of covariance matrices, based on the obtained descriptors as a methodological basis for constructing image recognition methods. Results. The content of the main stages of selecting descriptors for a given class of objects is developed, taking into account the different illumination of the recognized images. The effectiveness of the results obtained is confirmed by experimental studies related to the solution of the problem of recognition of special images - facies. Conclusions. The definition of stable image descriptors as invariants or semi-invariants to zoom and brightness transformations allows solving the problems of facies classification in conditions of the unstable shooting of recognized images. The images can be rotated and shifted in any way. In general, the proposed approach allows developing an effective image recognition system in the presence of various types of interference on the recognized images. 


Author(s):  
Gede Aditra Pradnyana ◽  
I Komang Agus Suryantara ◽  
I Gede Mahendra Darmawiguna

An impression can be interpreted as a psychological feeling toward a product, and it plays an important role in decision making. Therefore, understanding data in the domain of impressions is very useful. This research aimed to evaluate the performance of the K-Nearest Neighbors method in classifying endek image impressions, using the K-Fold Cross Validation method. The images were taken from 3 locations, namely CV. Artha Dharma, Agung Bali Collection, and Pengrajin Sri Rejeki. The image impressions were obtained by consulting an endek expert, Dr. D.A Tirta Ray, M.Si. Data mining was performed with the K-Nearest Neighbors method, a classification method that classifies new objects based on their attributes and on training samples that have been classified previously. K-Fold Cross Validation testing obtained an accuracy of 91% with K values of 3, 4, 7, and 8 in K-Nearest Neighbors.
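
The evaluation procedure in this abstract, K-Nearest Neighbors scored with K-Fold Cross Validation, can be sketched in a few lines. The feature vectors, labels, Euclidean distance, and the interleaved fold assignment are all assumptions made for illustration; the abstract does not describe how the endek images were turned into features or how folds were formed.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict a label by majority vote among the k nearest training items.

    train is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(data, k_neighbors, n_folds):
    """Estimate accuracy with n_folds-fold cross validation.

    Item i is assigned to fold i % n_folds (a simple, assumed split).
    Each fold is held out once while the rest is used for training.
    """
    correct = 0
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [item for i, item in enumerate(data) if i % n_folds != fold]
        correct += sum(
            knn_predict(train, features, k_neighbors) == label
            for features, label in test
        )
    return correct / len(data)
```

Running `k_fold_accuracy` for several values of `k_neighbors` and comparing the results mirrors how the study reports one accuracy figure for a set of K values.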


2019 ◽  
Vol 11 (6) ◽  
pp. 626 ◽  
Author(s):  
Wenping Ma ◽  
Yunta Xiong ◽  
Yue Wu ◽  
Hui Yang ◽  
Xiangrong Zhang ◽  
...  

Homogeneous image change detection research has been well developed, and many methods have been proposed. However, change detection between heterogeneous images is challenging since heterogeneous images are in different domains. Therefore, direct comparison of heterogeneous images is difficult. In this paper, a method for heterogeneous synthetic aperture radar (SAR) image and optical image change detection is proposed, which is based on a pixel-level mapping method and a capsule network with a deep structure. The proposed mapping method transforms an image from one feature space to another feature space. Then, the images can be compared directly in a similarly transformed space. In the mapping process, some image blocks in unchanged areas are selected, and these blocks are only a small part of the image. Then, the weighted parameters are acquired by calculating the Euclidean distances between the pixel to be transformed and the pixels in these blocks. The Euclidean distance calculated according to the weighted coordinates is taken as the pixel gray value in another feature space. The other image is transformed in a similar manner. In the transformed feature space, these images are compared, and the fusion of the two different images is achieved. The two experimental images are input to a capsule network, which has a deep structure. The image fusion result is taken as the training labels. The training samples are selected according to the ratio of the center pixel label and its neighboring pixels’ labels. The capsule network can improve the detection result and suppress noise. Experiments on remote sensing datasets show the final detection results, and the proposed method obtains satisfactory performance.


2019 ◽  
Vol 2019 ◽  
pp. 1-14 ◽  
Author(s):  
Waheed Anwar ◽  
Imran Sarwar Bajwa ◽  
Shabana Ramzan

In this paper, a novel approach is presented for authorship identification in English and Urdu text using the LDA model with n-gram texts of authors and cosine similarity. The proposed approach uses similarity metrics to identify various learned representations of stylometric features and uses them to identify the writing style of a particular author. The proposed LDA-based approach emphasizes instance-based and profile-based classifications of an author’s text. Here, LDA suitably handles high-dimensional and sparse data by allowing a more expressive representation of text. The presented approach is an unsupervised computational methodology that can handle the heterogeneity of the dataset, diversity in writing, and the inherent ambiguity of the Urdu language. A large corpus has been used for performance testing of the presented approach. The experimental results show the superiority of the proposed approach over the state-of-the-art representations and other algorithms used for authorship identification. The main contribution of the presented work is the use of cosine similarity with n-gram-based LDA topics to measure similarity between vectors of text documents. The approach achieves an overall accuracy of 84.52% on the PAN12 datasets and 93.17% on Urdu news articles without using any labels for the authorship identification task.
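
The similarity step named in this abstract can be sketched on its own, assuming the LDA topic distributions have already been computed. The topic vectors, the per-author profile dictionary, and the function names are hypothetical; the abstract does not describe how profiles are built or compared beyond the use of cosine similarity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_author(doc_topics, author_profiles):
    """Return the author whose topic profile is most similar to the document.

    author_profiles maps an author name to a topic-distribution vector
    (assumed to come from a previously fitted LDA model).
    """
    return max(
        author_profiles,
        key=lambda name: cosine_similarity(doc_topics, author_profiles[name]),
    )
```

Cosine similarity depends only on the direction of the topic vectors, so a short document and a long profile with the same topic mix score as highly similar regardless of length.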

