Query Expansion for Transliterated Text Retrieval

Author(s):  
Dinesh Kumar Prabhakar ◽  
Sukomal Pal ◽  
Chiranjeev Kumar

With Web 2.0, there has been exponential growth in the number of Web users and the volume of Web content. Most of these users are not only consumers of information but also its generators. People often express themselves in colloquial languages, but write them in Roman script (transliteration). These texts are mostly informal and casual, and therefore seldom follow grammar rules. Moreover, there is no prescribed set of spelling rules for transliterated text. This freedom leads to large-scale spelling variation, which is a major challenge in mixed-script information processing. This article studies several existing phonetic algorithms for handling spelling variation, points out their limitations, and proposes a novel phonetic encoding approach, in two flavors, in the context of Hindi transliteration. Experiments on Hindi song lyrics retrieval in the mixed-script domain with three different retrieval models show that the proposed approaches outperform the existing techniques in a majority of cases (sometimes with statistical significance) on metrics such as nDCG@1, nDCG@5, nDCG@10, MAP, MRR, and Recall.
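The abstract does not name the existing phonetic algorithms it studies, and the authors' own encoding is not described here. Purely to illustrate the general idea, the following is a minimal Python sketch of classic American Soundex, one well-known phonetic algorithm, showing how it collapses spelling variants into a shared code. The romanized Hindi spellings used below are assumed examples, not data from the article.

```python
# American Soundex digit classes; vowels, y, h and w have no digit.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}

def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of an ASCII word."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]          # the first letter is kept verbatim
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:     # collapse runs of the same digit
            result.append(code)
        if ch not in "hw":            # h/w do not separate equal digits; vowels do
            prev = code
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")

print(soundex("pyaar"), soundex("pyar"))          # P600 P600
print(soundex("mohabbat"), soundex("muhabbat"))   # M130 M130
print(soundex("kabhi"), soundex("qabhi"))         # K100 Q100
```

The last line hints at the kind of limitation such work targets: Soundex was designed for English surnames and keeps the first letter verbatim, so hypothetical variants like "kabhi" and "qabhi" receive different codes.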

2008 ◽  
Vol 68 (4) ◽  
pp. 875-883 ◽  
Author(s):  
LH. Sipaúba-Tavares ◽  
AML. Pereira

Large-scale laboratory cultures of Ankistrodesmus gracilis and Diaphanosoma birgei were evaluated by studying the biology, biochemical composition and production costs of the species. Ankistrodesmus gracilis showed exponential growth until the 6th day, reaching approximately 144 × 10⁴ cells·mL⁻¹, followed by a sharp decrease to 90 × 10⁴ cells·mL⁻¹ (8th day). Algal cell density tended to increase again from the 11th day and reached a maximum of 135 × 10⁴ cells·mL⁻¹ on the 17th day. The D. birgei culture showed exponential growth until the 9th day, with 140 × 10² individuals·L⁻¹, and increased again from the 12th day. The alga A. gracilis and the zooplankter D. birgei contain 47 to 70% protein and over 5% carbohydrate on a dry-weight basis. Among variable costs, the most expensive items were labor and electricity. The data suggest that temperature, nutrients, light availability and culture management were determining factors for productivity. The results indicate that NPK (20-5-20) may be used directly as a good low-cost alternative for mass cultivation, promoting adequate growth and nutritional value in cultured A. gracilis and D. birgei.


2014 ◽  
Vol 37 (1) ◽  
pp. 20-21 ◽  
Author(s):  
Roy F. Baumeister ◽  
Kathleen D. Vohs ◽  
E. J. Masicampo

Psychologists debate whether consciousness or unconsciousness is more central to human behavior. Our goal, instead, is to figure out how they work together. Conscious processes are partly produced by unconscious processes, and much information processing occurs outside of awareness. Yet consciousness has advantages that the unconscious does not. We discuss how consciousness causes behavior, drawing conclusions from large-scale literature reviews.


Author(s):  
Khadija Ateya Almohsen ◽  
Huda Kadhim Al-Jobori

The increasing use of e-commerce websites has led to the emergence of Recommender Systems (RSs), which aim to personalize web content for each user. One successful RS technique is Collaborative Filtering (CF), which makes recommendations for users based on what other like-minded users have preferred. However, as the world enters the Big Data era, CF faces challenges such as scalability, sparsity and cold start. Thus, new approaches that overcome these problems, such as Singular Value Decomposition (SVD), have been studied. This chapter surveys the RS literature, reviews the current state of RSs and the main concerns surrounding them due to Big Data, investigates SVD thoroughly, and provides an implementation of it using Apache Hadoop and Spark. This is intended to validate the applicability of existing contributions to the field of SVD-based RSs, as well as the effectiveness of Hadoop and Spark in developing large-scale systems. The results demonstrated the scalability of the SVD-based RS and its applicability to Big Data.
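The chapter's Hadoop/Spark implementation is not reproduced here. As a minimal single-machine sketch of the idea behind SVD-based CF, the following Python fits a low-rank latent-factor model to only the observed ratings using stochastic gradient descent, the variant often called "Funk SVD" in the recommender literature. It is not an exact SVD, and the function names, toy ratings and hyperparameters (`k`, `lr`, `reg`, `epochs`) are illustrative assumptions.

```python
import random

def predict(mu, P, Q, u, i):
    """Predicted rating: global mean plus the dot product of the latent factors."""
    return mu + sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def train_latent_factors(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
                         epochs=2000, seed=0):
    """Fit r_ui ~ mu + p_u . q_i by SGD over observed (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(mu, P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # L2-regularized gradient step on both factor vectors
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q

# Toy sparse rating matrix: 4 users x 4 items, 12 of 16 cells observed (1-5 scale).
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 3, 1),
    (1, 0, 4), (1, 2, 1), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 1, 1), (3, 2, 4), (3, 3, 4),
]
mu, P, Q = train_latent_factors(ratings, n_users=4, n_items=4)
rmse = (sum((r - predict(mu, P, Q, u, i)) ** 2 for u, i, r in ratings)
        / len(ratings)) ** 0.5
print(f"training RMSE = {rmse:.3f}")
print(f"predicted rating of user 0 for unseen item 2: {predict(mu, P, Q, 0, 2):.2f}")
```

Fitting only the observed cells is what lets this formulation cope with sparsity. A distributed version would partition the rating triples across workers; Spark's MLlib, for example, provides an alternating least squares (ALS) factorizer for this family of models.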


2010 ◽  
Vol 6 (4) ◽  
pp. 25-44 ◽  
Author(s):  
Rafael A. Gonzalez ◽  
Alexander Verbraeck ◽  
Ajantha Dahanayake

Coordinating the response of multiple public agencies to a large-scale crisis is a challenge that has been studied predominantly according to the information-processing view. In this paper, the authors extend this view with the notion of emergence, giving special attention to information and communication technology (ICT). The extended framework is applied in a case study of crisis response exercises in the public sector. The findings suggest that current practices concentrate on standards and hierarchy, but mutual adjustment and emergent coordination also occur, are susceptible to analysis, and are equally relevant to understanding coordination practices. In addition, ICT can provide information processing capabilities needed for coordination but may also create information processing needs by increasing the volume of data and the interconnectedness of responders. Applying the extended framework improves the understanding of coordination and forms the basis for its future use in designing ICT to support coordination in crisis response and e-government.


1997 ◽  
Vol 20 (4) ◽  
pp. 694-695
Author(s):  
Paul L. Nunez

Well-posed questions about information processing may require physiologically based, quantitative models of large-scale neocortical dynamic function. “Synchronization” of these dynamics can be viewed in different contexts of the binding problem.


Author(s):  
Min Pan ◽  
Yue Zhang ◽  
Qiang Zhu ◽  
Bo Sun ◽  
Tingting He ◽  
...  

Background: To better help doctors make decisions in the clinical setting, research is needed to connect electronic health records (EHRs) with the biomedical literature. Pseudo Relevance Feedback (PRF) is a classical query modification technique that has been shown to be effective in many retrieval models and is thus suitable for handling the terse language and clinical jargon in EHRs. Previous work has introduced a set of constraints (axioms) for the traditional PRF model. However, two factors in the feedback documents matter: the importance of a candidate term, and the co-occurrence relationship between a candidate term and a query term. Most methods do not consider both. Intuitively, terms that co-occur more strongly with a query term are more likely to be related to the query topic. Methods: In this paper, we incorporate the original HAL (Hyperspace Analogue to Language) model into Rocchio's model and propose a new concept of term proximity feedback weight. The resulting HAL-based Rocchio model for query expansion is called HRoc. We also design three normalization methods to better incorporate proximity information into query expansion. Finally, we introduce an adaptive parameter to replace the fixed sliding-window length of the HAL model, so that the window size is selected according to document length. Results: On the 2016 TREC Clinical Support medicine dataset, experimental results demonstrate that the proposed HRoc and HRoc_AP models outperform other advanced models, such as the PRoc2 and TF-PRF methods, on various evaluation metrics. Compared with the PRoc2 and TF-PRF models, our model increases MAP by 8.5% and 12.24%, respectively, and the F1 score by 7.86% and 9.88%, respectively. Conclusions: The proposed HRoc model can effectively improve the precision and recall of information retrieval and produces more precise results than other models.
Furthermore, after introducing the self-adaptive parameter, the advanced HRoc_AP model uses fewer hyperparameters than other models while achieving equivalent performance, which greatly improves the efficiency and applicability of the model and thus helps clinicians retrieve clinical support documents effectively.
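The abstract does not give the HRoc formulas or its three normalization methods, so the following is only a hedged Python sketch of the general idea: Rocchio expansion over pseudo-relevant feedback documents, where each candidate term's normalized frequency is multiplied by a HAL-style proximity score (in HAL, a term at distance d inside a window of size W contributes W - d + 1). The function names, the product-of-normalized-scores combination, and the parameters `alpha`, `beta`, `window` and `top_k` are assumptions for illustration, not the paper's definitions.

```python
from collections import Counter, defaultdict

def hal_weights(tokens, query_terms, window=3):
    """HAL-style proximity: a term at distance d (1..window) from a query-term
    occurrence receives window - d + 1 points, so closer terms weigh more."""
    scores = defaultdict(float)
    for i, tok in enumerate(tokens):
        if tok not in query_terms:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            d = abs(i - j)
            if d == 0 or tokens[j] in query_terms:
                continue
            scores[tokens[j]] += window - d + 1
    return scores

def rocchio_hal_expand(query, feedback_docs, alpha=1.0, beta=0.75,
                       window=3, top_k=3):
    """Rocchio-style expansion: alpha * original query terms plus the top_k
    candidates from the beta-scaled centroid of per-document feedback weights."""
    q_terms = set(query)
    new_q = Counter({t: alpha for t in q_terms})
    fb = defaultdict(float)
    for doc in feedback_docs:
        tf = Counter(doc)
        prox = hal_weights(doc, q_terms, window)
        max_prox = max(prox.values(), default=0.0) or 1.0
        for term, count in tf.items():
            if term in q_terms:
                continue
            # candidate importance (normalized tf) times normalized proximity
            fb[term] += (count / len(doc)) * (prox.get(term, 0.0) / max_prox)
    for term in fb:                      # Rocchio centroid over |R| documents
        fb[term] *= beta / len(feedback_docs)
    ranked = sorted(fb.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    for term, w in ranked:
        if w > 0:
            new_q[term] += w
    return dict(new_q)

query = ["chest", "pain"]
docs = [
    "patient reports chest pain radiating to left arm".split(),
    "acute chest pain suggests myocardial infarction".split(),
]
print(rocchio_hal_expand(query, docs))
```

Without stopword removal or idf weighting, a generic word such as "suggests" gets selected here; a real system would add those, and the adaptive-window idea behind HRoc_AP would replace the fixed `window` argument with one derived from document length.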


1985 ◽  
Vol 8 (2) ◽  
pp. 193-219 ◽  
Author(s):  
Arthur R. Jensen

Although the black and white populations in the United States differ, on average, by about one standard deviation (equivalent to 15 IQ points) on current IQ tests, they differ by various amounts on different tests. The present study examines the nature of the highly variable black–white difference across diverse tests and indicates the major systematic source of this between-population variation, namely, Spearman's g. Charles Spearman originally suggested in 1927 that the varying magnitude of the mean difference between black and white populations on a variety of mental tests is directly related to the size of the test's loading on g, the general factor common to all complex tests of mental ability. Eleven large-scale studies, each comprising anywhere from 6 to 13 diverse tests, show a significant and substantial correlation between tests' g loadings and the mean black–white difference (expressed in standard score units) on the various tests. Hence, in accord with Spearman's hypothesis, the average black–white difference on diverse mental tests may be interpreted as chiefly a difference in g, rather than as a difference in the more specific sources of test score variance associated with any particular informational content, scholastic knowledge, specific acquired skill, or type of test. The results of recent chronometric studies of relatively simple cognitive tasks suggest that the g factor is related, at least in part, to the speed and efficiency of certain basic information-processing capacities. The consistent relationship of these processing variables to g and to Spearman's hypothesis suggests the hypothesis that the differences between black and white populations in the rate of information processing may account for a part of the average black–white difference on standard IQ tests and their educational and occupational correlates.


2009 ◽  
Vol 07 (04) ◽  
pp. 811-820 ◽  
Author(s):  
Feng Mei ◽  
Ya-Fei Yu ◽  
Zhi-Ming Zhang

Large-scale quantum information processing requires stable and long-lived quantum memories. Here, using atom-photon entanglement, we propose an experimentally feasible scheme to realize decoherence-free quantum memory with atomic ensembles, and show one of its applications, remote transfer of an unknown quantum state, based on laser manipulation of atomic ensembles, photonic state operations through optical elements, and single-photon detection with moderate efficiency. The scheme, with inherent fault tolerance to practical noise and imperfections, allows one to retrieve the information in the memory for further quantum information processing within the reach of current technology.

