content similarity
Recently Published Documents


TOTAL DOCUMENTS: 139 (FIVE YEARS 38)

H-INDEX: 13 (FIVE YEARS 2)

PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0262556
Author(s):  
Andrew Kapinos ◽  
Pauline Aghamalian ◽  
Erika Capehart ◽  
Anya Alag ◽  
Heather Angel ◽  
...  

Bacteriophages exhibit a vast spectrum of relatedness and there is increasing evidence of close genomic relationships independent of host genus. The variability in phage similarity at the nucleotide, amino acid, and gene content levels confounds attempts at quantifying phage relatedness, especially as more novel phages are isolated. This study describes three highly similar novel Arthrobacter globiformis phages–Powerpuff, Lego, and YesChef–which were assigned to Cluster AZ using a nucleotide-based clustering parameter. Phages in Cluster AZ, Microbacterium Cluster EH, and the former Microbacterium singleton Zeta1847 exhibited low nucleotide similarity. However, their gene content similarity was in excess of the recently adopted Microbacterium clustering parameter, which ultimately resulted in the reassignment of Zeta1847 to Cluster EH. This finding further highlights the importance of using multiple metrics to capture phage relatedness. Additionally, Clusters AZ and EH phages encode a shared integrase indicative of a lysogenic life cycle. In the first experimental verification of a Cluster AZ phage’s life cycle, we show that phage Powerpuff is a true temperate phage. It forms stable lysogens that exhibit immunity to superinfection by related phages, despite lacking identifiable repressors typically required for lysogenic maintenance and superinfection immunity. The ability of phage Powerpuff to undergo and maintain lysogeny suggests that other closely related phages may be temperate as well. Our findings provide additional evidence of significant shared phage genomic content spanning multiple actinobacterial host genera and demonstrate the continued need for verification and characterization of life cycles in newly isolated phages.


Author(s):  
Niloufar Shoeibi ◽  
Nastaran Shoeibi ◽  
Pablo Chamoso ◽  
Zakie AlizadehSani ◽  
Juan M. Corchado

Social media platforms have been an undeniable part of everyday life for the past decade. Analyzing the information being shared is a crucial step toward understanding human behavior. Social media analysis aims to provide a better user experience and increase user satisfaction. First, however, it is necessary to know how, and along which aspects, to compare users. In this paper, an intelligent system is proposed to measure the similarity of Twitter profiles. First, the timeline of each profile is extracted using the official Twitter API, and all of this information is passed to the proposed system. Next, three aspects of a profile are derived in parallel. Behavioral ratios are time-series information showing the consistency and habits of the user; dynamic time warping is used to compare the behavioral ratios of two profiles. The audience network is extracted for each user, and Jaccard similarity is used to estimate the similarity of two such sets. Finally, for content similarity, the tweets are preprocessed as required by the feature extraction method; TF-IDF and DistilBERT are employed for feature extraction, and the resulting features are compared using cosine similarity. Results show that TF-IDF performs slightly better, so the more straightforward solution is selected for the model. The results also report the similarity level of different profiles. In the case study, a Random Forest classification model trained on almost 20,000 users achieved 97.24% accuracy. This comparison makes it possible to find duplicate profiles with nearly the same behavior and content.
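As a rough illustration of two of the measures named in this abstract, the sketch below computes Jaccard similarity between two audience sets and cosine similarity between TF-IDF vectors. It is not the authors' implementation: the function names, the whitespace tokenizer, and the smoothed IDF formula are assumptions made for this example only.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here for two empty sets
    return len(a & b) / len(a | b)


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption) so shared terms keep a nonzero weight.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With these helpers, two profiles could be compared by calling `jaccard` on their follower sets and `cosine` on the TF-IDF vectors of their concatenated tweets; how the scores are combined is left open here.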


2021 ◽  
Vol 23 (11) ◽  
pp. 612-618
Author(s):  
K. Pon Karthika ◽  
Dr. S. Kavi Priya ◽  

The proposed work deals with finding related reviews posted on various online forums. Conventional methods for matching related documents compute the content similarity over the entire review instead of partitioning it into segments that reveal different intentions. In this work, intention-based similarity clustering is introduced to find the relatedness of two documents. This method forms the document clusters based on the similarity of the segments with similar intentions. The segmentation points are identified using a number of text features that indicate where segmentation should occur. Finally, the document clusters are formed by grouping the segments with similar intentions in the same cluster, and then the similarities among the segments with the same intention are computed. The proposed model is trained on the TripAdvisor and Yelp Open Review datasets to evaluate its performance, and the evaluation results show that the model produces more precise results in mining documents related to the user’s interest.


Author(s):  
Christoph Mauritz ◽  
Martin Nienhaus ◽  
Christopher Oehler

We analyze the extent to which individual audit partners influence the audited narrative disclosures in their clients’ financial reports. Using a sample of 3,281,423 private and public client firm-pairs, we find that the similarity among audited narrative disclosures is higher when two client firms share the same audit partner. Specifically, we find that the wording similarity of management reports (notes) increases by 30 (48) percent, the content similarity by 29 (49) percent, and the structure similarity by 48 (121) percent. Moreover, we find that audit partners in particular are relevant for their clients’ narrative disclosures because the increase in narrative disclosure similarity when sharing the same audit partner is nine (four) times greater than when sharing the same audit firm (audit office). We show that this influence of audit partners goes beyond adding boilerplate statements and, using novel field evidence, we shed light on the underlying mechanisms. Our findings are economically relevant because a stronger involvement of audit partners with their clients’ narratives is associated with a higher quality of narrative disclosures, which helps users better predict the future profitability of client firms.


2021 ◽  
Vol 29 (3) ◽  
Author(s):  
Chun Then Lim ◽  
Chih How Bong ◽  
Wee Sian Wong ◽  
Nung Kion Lee

Automated Essay Scoring (AES) is a service or software that can predictively grade essays based on a pre-trained computational model. It has attracted considerable research interest in educational institutions because it expedites grading and reduces the effort of human raters while producing scores close to human decisions. Despite its strong appeal, its implementation varies widely according to researchers’ preferences. This critical review examines AES development milestones, focusing on the different methodologies and attributes used in deriving essay scores. To generalize existing AES systems according to their constructs, we attempted to fit all of them into three frameworks: content similarity, machine learning, and hybrid. In addition, we presented and compared common evaluation metrics for measuring the efficiency of AES and proposed Quadratic Weighted Kappa (QWK) as the standard evaluation metric, since it corrects for agreement occurring purely by chance when estimating the degree of agreement between two raters. In conclusion, the paper proposes the hybrid framework as the potential upcoming standard AES framework, as it is capable of aggregating both style and content to predict essay grades. Thus, the main objective of this study is to discuss critical issues pertaining to the current development of AES, which yielded our recommendations for future AES development.
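The chance correction that QWK applies can be made concrete with a small sketch. The version below follows the standard definition of quadratic weighted kappa for integer ratings. The function name and signature are hypothetical and are not taken from the reviewed paper.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Agreement between two raters, corrected for chance, with quadratic weights."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of essays")
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("at least two rating levels are required")
    total = len(rater_a)

    # Observed confusion matrix between the two raters.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal histograms give the agreement expected by chance.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # larger disagreements cost more
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        return 1.0  # both raters used one identical rating throughout
    return 1.0 - numerator / denominator
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.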


Author(s):  
Petar Juric ◽  
Marija Brkic Bakaric ◽  
Maja Matetic

In order to make e-learning systems more readily available for use, the majority of new systems are being developed in a form suitable for mobile learning, i.e. m-learning. The paper puts focus on the parts of the implementation of an e-learning system which is not restricted to desktop platforms, but works equally well on smartphones and tablets in the form of m-learning. The implemented system uses educational computer games for learning Mathematics in primary schools and has an integrated social network, which is used for communication and publishing of the content related to the game. Besides analysing the platforms used for accessing the system (desktop/mobile), since students are given a choice, the paper also questions how to interpret messages when they contain concepts in student jargon or generally unknown to teachers, and shows that these messages can be interpreted by applying neural networks.


Author(s):  
Niloufar Shoeibi ◽  
Nastaran Shoeibi ◽  
Pablo Chamoso ◽  
Zakie Alizadehsani ◽  
Juan M. Corchado

Social media platforms have been an undeniable part of everyday life over the past decade. Analyzing the information being shared is a crucial step toward understanding human behavior. Social media analysis aims to provide a better user experience and increase user satisfaction. First, however, it is necessary to know how, and along which aspects, to compare users with each other. In this paper, an intelligent system is proposed to measure the similarity of Twitter profiles. First, the timeline of each profile is extracted using the official Twitter API, and all of this information is passed to the proposed system. Next, three aspects of a profile are derived in parallel. Behavioral ratios are time-series information showing the consistency and habits of the user; dynamic time warping is used to compare the behavioral ratios of two profiles. Graph network analysis is used to monitor the interactions between the user and their audience, and Jaccard similarity is used to estimate the similarity of the graphs. Finally, for content similarity, natural language processing techniques are employed for preprocessing and TF-IDF for feature extraction, and the resulting features are compared using cosine similarity. The results present the similarity level of different profiles. In the case study, people with the same interests show higher similarity. This way of comparing profiles is helpful in many other areas. It also makes it possible to find duplicate profiles, that is, profiles with almost the same behavior and content.
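The dynamic time warping step mentioned in this abstract can be sketched as follows. This is the textbook dynamic-programming formulation with an absolute-difference local cost. The abstract does not specify the authors' variant, so the cost function and the absence of a warping window are assumptions.

```python
def dtw_distance(x, y):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances while y repeats
                cost[i][j - 1],      # y advances while x repeats
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]
```

Because the alignment may stretch either series, two behavioral ratios with the same shape but different pacing yield a small distance, which a plain point-by-point comparison would not capture.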


2021 ◽  
Vol 115 ◽  
pp. 102000 ◽  
Author(s):  
Xianglin Wei ◽  
Jianwei Liu ◽  
Yangang Wang ◽  
Chaogang Tang ◽  
Yongyang Hu

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Marcos Parras-Moltó ◽  
Daniel Aguirre de Cárcer

In this report we use available curated phylogenies, taxonomy, and genome annotations to assess the phylogenetic and gene content similarity associated with each different taxon and taxonomic rank. Subsequently, we employ the same data to assess the frontiers of functional coherence along the bacterial phylogeny. Our results show that within-group phylogenetic and gene content similarity of taxa in the same rank are not homogeneous, and that these values show extensive overlap between ranks. Functional coherence along the 16S rRNA gene-based phylogeny was limited to 44 particular nodes presenting large variations in phylogenetic depth. For instance, the deep subtree affiliated to class Actinobacteria presented functional coherence, while the shallower family Enterobacteriaceae-affiliated subtree did not. On the other hand, functional coherence along the genome-based phylogeny delimited deep subtrees affiliated to phyla Actinobacteriota, Deinococcota, Chloroflexota, Firmicutes, and a subtree containing the rest of the bacterial phyla. The results presented here can be used to guide the exploration of results in many microbial ecology and evolution research scenarios. Moreover, we provide dedicated scripts and files that can be used to continue the exploration of functional coherence along the bacterial phylogeny employing different parameters or input data (https://git.io/Jec5U).


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248418
Author(s):  
Stephanie Demo ◽  
Andrew Kapinos ◽  
Aaron Bernardino ◽  
Kristina Guardino ◽  
Blake Hobbs ◽  
...  

Bacteriophages (phages) exhibit high genetic diversity, and the mosaic nature of the shared genetic pool makes quantifying phage relatedness a shifting target. Early parameters for clustering of related Mycobacteria and Arthrobacter phage genomes relied on nucleotide identity thresholds but, more recently, clustering of Gordonia and Microbacterium phages has been performed according to shared gene content. Singleton phages lack the nucleotide identity and/or shared gene content required for clustering newly sequenced genomes with known phages. Whole genome metrics of novel Arthrobacter phage BlueFeather, originally designated a putative singleton, showed low nucleotide identity but high amino acid and gene content similarity with Arthrobacter phages originally assigned to Clusters FE and FI. Gene content similarity revealed that BlueFeather shared genes with these phages in excess of the parameter for clustering Gordonia and Microbacterium phages. Single gene analyses revealed evidence of horizontal gene transfer between BlueFeather and phages in unique clusters that infect a variety of bacterial hosts. Our findings highlight the advantage of using shared gene content to study seemingly genetically isolated phages and have resulted in the reclustering of BlueFeather, a putative singleton, as well as former Cluster FI phages, into a newly expanded Cluster FE.
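To make the idea of clustering by shared gene content concrete, the sketch below computes a gene content similarity score as the average of the proportions of gene families each genome shares with the other. This definition, the function name, and the representation of genomes as sets of gene-family identifiers are assumptions for illustration. The abstract does not state the exact formula or the clustering threshold, so the threshold is left to the caller.

```python
def gene_content_similarity(families_a, families_b):
    """Average of the fraction of gene families each genome shares with the other.

    Each argument is an iterable of gene-family identifiers (hypothetical labels).
    """
    a, b = set(families_a), set(families_b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return 0.5 * (shared / len(a) + shared / len(b))


def exceeds_clustering_parameter(families_a, families_b, threshold):
    """True if the similarity is above a caller-supplied clustering threshold."""
    return gene_content_similarity(families_a, families_b) > threshold
```

Two genomes can score highly on this measure while showing low nucleotide identity, which is the situation the abstract describes for BlueFeather.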

