CodeMatcher: Searching Code Based on Sequential Semantics of Important Query Words

2022 ◽  
Vol 31 (1) ◽  
pp. 1-37
Author(s):  
Chao Liu ◽  
Xin Xia ◽  
David Lo ◽  
Zhiwe Liu ◽  
Ahmed E. Hassan ◽  
...  

To accelerate software development, developers frequently search and reuse existing code snippets from a large-scale codebase, e.g., GitHub. Over the years, researchers have proposed many information retrieval (IR)-based models for code search, but these fail to bridge the semantic gap between query and code. An early successful deep learning (DL)-based model, DeepCS, addressed this issue by learning the relationship between pairs of code methods and their corresponding natural language descriptions. Two major advantages of DeepCS are its capability of understanding irrelevant/noisy keywords and of capturing sequential relationships between words in query and code. In this article, we propose an IR-based model, CodeMatcher, that inherits the advantages of DeepCS (i.e., the capability of understanding the sequential semantics in important query words) while leveraging the indexing technique of IR-based models to substantially accelerate search response time. CodeMatcher first collects metadata for query words to identify irrelevant/noisy ones, then iteratively performs fuzzy search with the important query words on a codebase indexed by the Elasticsearch tool, and finally reranks the set of returned candidate code according to how the tokens in each candidate code snippet sequentially match the important words in the query. We verified its effectiveness on a large-scale codebase with ~41K repositories. Experimental results showed that CodeMatcher achieves an MRR (a widely used accuracy measure for code search) of 0.60, outperforming DeepCS, CodeHow, and UNIF by 82%, 62%, and 46%, respectively. Our proposed model is over 1.2K times faster than DeepCS. Moreover, CodeMatcher outperforms two existing online search engines (GitHub and Google search) by 46% and 33%, respectively, in terms of MRR. We also observed that fusing the advantages of IR-based and DL-based models is promising, and that improving the quality of method naming helps code search, since the method name plays an important role in connecting query and code.
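Since the results above are reported in terms of MRR, a minimal sketch of how mean reciprocal rank is conventionally computed may help. The function name and the input format (one 1-based rank of the first relevant result per query, or `None` when nothing relevant was returned) are assumptions made for illustration, not part of the article.

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """Compute MRR from the rank of the first relevant result per query.

    Each entry is a 1-based rank, or None if no relevant result was
    returned for that query (such a query contributes 0 to the sum).
    """
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank is not None)
    return total / len(first_relevant_ranks)


# Hypothetical ranks for four queries; the third query had no relevant hit.
example_mrr = mean_reciprocal_rank([1, 2, None, 4])
```

A higher value means that the first relevant snippet tends to appear nearer the top of the result list.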

Author(s):  
A. V. Ponomarev

Introduction: Large-scale human-computer systems that involve people of varying skill and motivation in information processing are currently used in a wide spectrum of applications. An acute problem in such systems is assessing the expected quality of each contributor, for example, in order to penalize incompetent or inaccurate contributors and to promote diligent ones. Purpose: To develop a method of assessing the expected contributor’s quality in community tagging systems. This method should use only the generally unreliable and incomplete information provided by contributors (with ground truth tags unknown). Results: A mathematical model is proposed for community image tagging (including the model of a contributor), along with a method of assessing the expected contributor’s quality. The method is based on comparing tag sets provided by different contributors for the same images, and is a modification of the pairwise comparison method in which the preference relation is replaced by a special domination characteristic. Expected contributors’ quality is evaluated as a positive eigenvector of a pairwise domination characteristic matrix. Community tagging simulation has confirmed that the proposed method gives an adequate estimate of the expected quality of community tagging system contributors (provided that the contributors' behavior fits the proposed model). Practical relevance: The obtained results can be used in the development of systems based on coordinated efforts of a community (primarily, community tagging systems).
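To illustrate what evaluating quality "as a positive eigenvector" of a pairwise matrix can look like, here is a minimal sketch using power iteration. It assumes a square matrix with strictly positive entries; the example matrix is hypothetical, and the abstract does not specify how the domination characteristic itself is computed.

```python
def principal_eigenvector(matrix, iterations=1000, tol=1e-12):
    """Approximate the positive (Perron) eigenvector by power iteration.

    `matrix` is a square list of lists with strictly positive entries.
    The returned vector is normalized so that its entries sum to 1.
    """
    n = len(matrix)
    vector = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        scale = sum(product)
        updated = [value / scale for value in product]
        if max(abs(a - b) for a, b in zip(updated, vector)) < tol:
            return updated
        vector = updated
    return vector


# Hypothetical 2x2 pairwise matrix: contributor 0 "dominates" contributor 1
# twice as strongly as the reverse.
scores = principal_eigenvector([[1.0, 2.0], [0.5, 1.0]])
```

In this toy case the scores come out proportional to 2:1, matching the ratio encoded in the matrix.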


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Xiang Yin ◽  
Zhiyi Meng ◽  
Xin Yi ◽  
Yong Wang ◽  
Xia Hua

China has made great efforts to alleviate poverty in rural ethnic minority areas and aimed to complete its poverty-alleviation task by the end of 2020. Aba, Ganzi, and Liangshan, three of the poorest ethnic prefectures in Sichuan Province, Southwest China, have all implemented “Internet+” tactics since 2013, which have had the positive effect of increasing family revenues by improving communication infrastructure and encouraging the large-scale use of e-commerce. This paper aims to comprehensively investigate whether “Internet+” tactics play a key role in poverty alleviation in Sichuan’s rural ethnic minority areas and to propose further measures to enhance the efficiency of e-commerce practice. To this end, we conduct an analysis using the framework of classic growth theory and use panel data from 2000 to 2018 to examine the relationship between Communication Infrastructure Investment (CII) and a set of poverty-alleviation indicators, including local GDP growth rate (LGGR), local government revenue (LGR), and per-capita income of residents (PCIR). The results indicate that strengthening CII improves the PCIR and local economic growth, playing a key role in poverty alleviation. However, the stimulating effect of CII on LGGR and LGR wanes as time passes. More financial and technical actions will be needed to improve the efficiency and quality of current strategies for sustainable development in those areas.


Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 79 ◽  
Author(s):  
Xiaoyu Han ◽  
Yue Zhang ◽  
Wenkai Zhang ◽  
Tinglei Huang

Relation extraction is a vital task in natural language processing. It aims to identify the relationship between two specified entities in a sentence. Besides the information contained in the sentence, additional information about the entities has been shown to be helpful in relation extraction. Additional information such as entity types obtained by NER (Named Entity Recognition) and descriptions provided by a knowledge base both have their limitations. Nevertheless, there exists another way to provide additional information that can overcome these limitations in Chinese relation extraction. Because Chinese characters usually have explicit meanings and can carry more information than English letters, we suggest that the characters that constitute the entities can provide additional information that is helpful for the relation extraction task, especially in large-scale datasets. This assumption has never been verified before. The main obstacle is the lack of large-scale Chinese relation datasets. In this paper, first, we generate a large-scale Chinese relation extraction dataset based on a Chinese encyclopedia. Second, we propose an attention-based model using the characters that compose the entities. The results on the generated dataset show that these characters can provide useful information for the Chinese relation extraction task. By using this information, the attention mechanism can recognize the crucial part of the sentence that expresses the relation. The proposed model outperforms other baseline models on our Chinese relation extraction dataset.


2020 ◽  
Vol 7 ◽  
pp. 233339362093002
Author(s):  
Susanne Winther ◽  
Mia Fredens ◽  
Marie Brund Hansen ◽  
Kirstine Skov Benthien ◽  
Camilla Palmhøj Nielsen ◽  
...  

Proactive Health Support (PaHS) is a large-scale intervention in Denmark carried out by registered nurses (RNs) who provide self-management support to people at risk of hospital admission to enhance their health, coping, and quality of life. PaHS is initiated with a face-to-face session followed by telephone conversations. We aimed to explore the start-up sessions, including if and how the relationship between participants and RNs developed at the onset of PaHS. We used an ethnographic design including observations and informal interviews. Data were analyzed using a phenomenological–hermeneutical approach. The study showed that contexts such as hospitals and RNs legitimized the intervention. Face-to-face communication contributed to credibility, just as the same RN throughout the intervention ensured continuity. We conclude that start-up sessions before telephone-based self-management support enable a trust-based relationship between participants and RNs. Continuous contact with the same RNs throughout the session promoted participation in the intervention.


2019 ◽  
Vol 50 (1) ◽  
pp. 161-169 ◽  
Author(s):  
Haruka Kasamatsu ◽  
Akiko Tsuchida ◽  
Kenta Matsumura ◽  
Moeko Shimao ◽  
Kei Hamazaki ◽  
...  

Background: Postpartum depression is a major mental health issue. It adversely affects not only the mother's quality of life but also mother-infant bonding. However, the relationship between postpartum depression (at multiple points after childbirth) and mother-infant bonding failure one year after birth is not well understood. This study investigates the relationship between postpartum depression at 1 month and 6 months after birth and mother-infant bonding failure at 1 year after birth in a large cohort. Methods: Data from 83 109 mothers from the Japan Environment and Children's Study were analyzed. Mother-infant bonding 1 year after delivery was assessed using the Mother-to-Infant Bonding Scale Japanese version (MIBS-J). Postpartum depression was measured using the Edinburgh Postnatal Depression Scale (EPDS) at 1 month and 6 months after delivery. Twenty covariates during pregnancy and one month after delivery were controlled for in deriving the odds ratios (ORs) relating postpartum depression to mother-infant bonding. Results: EPDS Total Score crude ORs and adjusted ORs against the MIBS-J Total Score at 1 month and 6 months after delivery were calculated. Crude ORs were 1.111 (95% CI 1.110–1.112) and 1.122 (95% CI 1.121–1.124), respectively. In the fully adjusted model, ORs were 1.088 (95% CI 1.086–1.089) and 1.085 (95% CI 1.083–1.087), respectively. Conclusions: This study demonstrated prospectively, in a large-scale cohort, that depression at multiple postpartum points, including associations with each EPDS and MIBS-J factors, may be a robust predictor of mother-infant bonding failure 1 year after birth.


Author(s):  
Deepak Babu Sam ◽  
Neeraj N Sajjan ◽  
Himanshu Maurya ◽  
R. Venkatesh Babu

We present an unsupervised learning method for dense crowd count estimation. Marred by large variability in the appearance of people and extreme overlap in crowds, enumerating people proves to be a difficult task even for humans. This implies that creating large-scale annotated crowd data is expensive, and the resulting small datasets directly take a toll on the performance of existing CNN-based counting models. Motivated by these challenges, we develop a Grid Winner-Take-All (GWTA) autoencoder to learn several layers of useful filters from unlabeled crowd images. Our GWTA approach divides a convolution layer spatially into a grid of cells. Within each cell, only the maximally activated neuron is allowed to update the filter. Almost 99.9% of the parameters of the proposed model are trained without any labeled data, while the remaining 0.1% are tuned with supervision. The model achieves superior results compared to other unsupervised methods and stays reasonably close to the accuracy of the supervised baseline. Furthermore, we present comparisons and analyses regarding the quality of learned features across various models.
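The grid-wise winner selection described above can be sketched on a single 2D activation map as follows. This is only an illustration under stated assumptions: the function name, the square cell size, and the choice to zero out non-winning activations (so that only winners would contribute to a subsequent update) are hypothetical and not taken from the paper.

```python
def grid_winner_take_all(feature_map, cell):
    """Keep only the largest activation inside each cell-by-cell grid cell.

    `feature_map` is a 2D list of numbers; all other positions are set to 0.
    Cells at the right or bottom border may be smaller than `cell`.
    """
    height, width = len(feature_map), len(feature_map[0])
    masked = [[0.0] * width for _ in range(height)]
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            winner = (top, left)
            for row in range(top, min(top + cell, height)):
                for col in range(left, min(left + cell, width)):
                    if feature_map[row][col] > feature_map[winner[0]][winner[1]]:
                        winner = (row, col)
            masked[winner[0]][winner[1]] = feature_map[winner[0]][winner[1]]
    return masked


# Hypothetical 4x4 activation map split into four 2x2 cells.
activations = [
    [1, 2, 0, 5],
    [3, 4, 6, 1],
    [0, 0, 1, 1],
    [9, 0, 1, 2],
]
sparse = grid_winner_take_all(activations, cell=2)
```

Each 2x2 cell of `sparse` retains exactly one nonzero entry, the cell's maximum.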


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Jin-xiao Li ◽  
Qian Yan ◽  
Na Liu ◽  
Wen-jiang Zheng ◽  
Man Hu ◽  
...  

Objective. At present, the relationship between autophagosomes and the prognosis of various cancers has become a subject of active investigation. A series of studies have demonstrated the correlation between autophagy microtubule-associated protein light chain 3 (LC-3), Beclin-1, and colorectal cancer (CRC). Since autophagy has dual regulatory roles in tumors, the results of this correlation are also uncertain. Hence, we summarized the relationship between Beclin-1, LC-3, and CRC using systematic review and meta-analysis to clarify their prognostic significance in CRC. Methods. PubMed, EMBASE, Cochrane Library, and Web of Science databases were searched online up to April 1, 2019. The quality of the included studies was assessed against the Newcastle-Ottawa Scale (NOS). Pooled hazard ratio (HR) and 95% confidence interval (CI) in a fixed or random effects model were used to assess the strength of correlation between Beclin-1, LC-3, and CRC. Results. A total of 9 articles were collected, involving 2,297 patients. Most studies scored more than 6 points, suggesting that the quality of the included research was acceptable. Our findings suggested that the expression of Beclin-1 was not associated with overall survival (OS) (HR = 0.68, 95% CI (0.31–1.52), P=0.351). Nonetheless, LC-3 expression exerted a significant impact on OS (HR = 0.51, 95% CI (0.35–0.74), P<0.05). Subgroup analysis showed that Beclin-1 expression was associated with OS at TNM stage III (HR = 0.04, 95% CI = 0.02–0.08, P<0.05), surgical treatment (HR = 1.53, 95% CI (1.15–2.02), P=0.003), and comprehensive treatment (HR = 0.27, 95% CI (0.08–0.92), P=0.036), respectively. Similarly, the results showed that increased LC-3 expression in CRC was related to OS in multivariate analyses (HR = 0.44, 95% CI (0.34–0.57), P<0.05), stages (HR = 0.51, 95% CI (0.35–0.74), P<0.05), and comprehensive treatment (HR = 0.44, 95% CI (0.34–0.57), P<0.05). Conclusions. The autophagy-related protein LC-3 might be an important marker of CRC progression. However, since the number of original studies was limited, more well-designed, large-scale, high-quality studies are warranted to provide more convincing and reliable information.
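For readers unfamiliar with how a pooled hazard ratio is obtained, the following is a minimal sketch of standard fixed-effect inverse-variance pooling on the log scale. It assumes each study reports an HR with a 95% CI; the input values are hypothetical, and the sketch does not cover the random-effects model that the abstract also mentions.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def pool_fixed_effect(studies):
    """Pool hazard ratios with inverse-variance weights on the log scale.

    `studies` is a list of (hr, ci_lower, ci_upper) tuples with 95% CIs.
    Returns (pooled_hr, pooled_lower, pooled_upper).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for hr, lower, upper in studies:
        # Recover the standard error of log(HR) from the CI width.
        standard_error = (math.log(upper) - math.log(lower)) / (2 * Z_95)
        weight = 1.0 / standard_error ** 2
        weighted_sum += weight * math.log(hr)
        weight_total += weight
    pooled_log = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )


# Hypothetical inputs: two studies reporting the same HR and CI.
pooled = pool_fixed_effect([(0.5, 0.25, 1.0), (0.5, 0.25, 1.0)])
```

Pooling two identical studies leaves the point estimate unchanged and narrows the interval, which is the expected behavior of inverse-variance weighting.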


2014 ◽  
Vol 18 (03) ◽  
pp. 1440001
Author(s):  
ALKA ASHWINI NAND ◽  
PRAKASH J. SINGH ◽  
ANANYA BHATTACHARYA

Organisations lack clear guidance on how they can become more innovative at the operational level. The operations strategy literature shows that organisations compete on four generic capabilities: cost efficiency, quality of products or services, speed of delivery, and flexibility of operations. Whether organisations should choose between these capabilities, i.e., trade them off and focus on one capability (the "trade-off" model), or combine them, thereby competing on multiple capabilities simultaneously (the "cumulative capabilities" model), remains an unresolved issue. Our paper addresses this by empirically testing the relationship between the four operations capabilities and innovation performance through a large-scale global study of manufacturing plants. Our results support the cumulative capabilities model and not the trade-off model. Furthermore, both delivery and flexibility capabilities are comparatively stronger predictors of innovativeness than cost efficiency and quality capabilities. This study provides insights for practitioners and managers by generating clearer guidelines as to what organisations need to do with their key operational capabilities in order to become more innovative.


2021 ◽  
Vol 12 ◽  
Author(s):  
Vera Békés ◽  
Katie Aafjes-van Doorn ◽  
Xiaochen Luo ◽  
Tracy A. Prout ◽  
Leon Hoffman

Therapists’ forced transition to provide psychotherapy remotely during the COVID-19 pandemic offers a unique opportunity to examine therapists’ views and challenges with online therapy. This study aimed to investigate the main challenges experienced by therapists during the transition from in-person to online therapy at the beginning of the pandemic and 3 months later, and the association between these challenges and therapists’ perception of the quality of the relationship with their online patients, and therapists’ attitudes and views about online therapy and its efficacy at these two timepoints. As part of a large-scale international longitudinal survey, we collected data from 1,257 therapists at two timepoints: at the start of COVID-19, when many therapists switched from providing in-person therapy to online therapy, as well as 3 months later, when they had had the opportunity to adjust to the online therapy format. At both timepoints, therapists reported on perceived challenges, quality of working alliance and real relationship, attitudes toward online therapy, and their views on online therapy’s efficacy compared to in-person therapy. Factor analysis of individual survey items at both timepoints identified four different types of challenges among this therapist sample: Emotional connection (feeling connected with patients, reading emotions, expressing or feeling empathy), Distraction during sessions (therapist or patient), Patients’ privacy (private space, confidentiality), and Therapists’ boundaries (professional space, boundary setting). Older and more experienced therapists perceived fewer challenges in their online sessions. At baseline, all four types of challenges were associated with lower perceived quality of the therapeutic relationship (working alliance and real relationship), and more negative attitudes toward online therapy and its efficacy. After 3 months, perceived challenges in three domains (Emotional connection, Patients’ privacy, and Therapists’ boundaries) decreased significantly, whereas challenges in the fourth domain (Distraction) increased. In our study, therapists’ concerns about being able to connect with patients online appeared to be the most impactful, in that they predicted negative attitudes toward online therapy and its perceived efficacy 3 months later, above and beyond the effect of therapists’ age and clinical experience. Clinical and training implications are discussed.


2021 ◽  
Vol 72 (06) ◽  
pp. 601-605
Author(s):  
SNEZHINA ANDONOVA ◽  
ALEKSEY STEFANOV ◽  
IVAN AMUDZHEV

The process of thermo-mechanical fusing (TMF) is one of the major technological processes in the sewing industry. The quality of the sewn article as a whole depends largely on the effective implementation of this process. The good appearance of the finished product and the preservation of its given shape during use depend on the proper choice of the TMF parameters. It is therefore important to carry out research to optimize this process. On the other hand, new and different textile materials (TM) with more complex structure and multicomponent composition have appeared in recent years, so each TM has different properties. Therefore, it is extremely important to conduct numerous preliminary studies and analyses to determine the specific effective values that define the TMF process for a particular type of TM. This is especially important for large-scale companies. In this context, it is of particular interest to study the TMF process for an innovative TM (with complex structure and multicomponent composition) registered with a patent for an invention in recent years. The purpose of the present work is to investigate and analyse the nature of the change in temperature between the basic and adhesive TM during TMF of an innovative TM (complex in composition and structure). As a result of the performed research and analysis, a method for establishing continuous feedback with the processed textile materials during TMF has been proposed. The nature of the temperature change of the treated innovative TM has been defined. The relationship between the duration of the TMF process and the temperature of the pressing plate for the respective innovative TM has been established.

