candidate set
Recently Published Documents

TOTAL DOCUMENTS: 111 (five years: 46)
H-INDEX: 8 (five years: 3)
Author(s): Yong Yang, Young Chun Ko

With the rapid development of online e-commerce, traditional collaborative filtering algorithms suffer from reduced data sets and sparse rating matrices and cannot meet users' requirements. Taking handicrafts as an example, this paper proposes the design and application of a handicraft recommendation system based on an improved hybrid algorithm. Building on e-commerce system theory and the traditional user-based collaborative filtering algorithm, a personalized e-commerce system based on the hybrid algorithm is designed and analyzed. The component model of the business recommendation system and the specific steps of the improved hybrid algorithm based on user information are given. Finally, an experimental analysis of the improved hybrid algorithm is carried out. The results show that the algorithm can effectively improve the effectiveness and efficiency of recommending handicrafts. Moreover, it can reduce the number of user-item ratings in the candidate set and improve the accuracy of the forecast recommendations.
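The user-based collaborative filtering step that the hybrid algorithm builds on can be illustrated with a minimal sketch. The rating data, function names, and cosine-similarity choice below are illustrative assumptions, not details taken from the paper:

```python
import math

def cosine_sim(a, b):
    # Similarity over the items rated by both users; 0.0 if no overlap.
    common = set(a) & set(b)
    if not common:
        return 0.0
    num = sum(a[i] * b[i] for i in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den

def predict(ratings, user, item):
    # Predicted rating = similarity-weighted average of the ratings
    # that other users gave the item.
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = cosine_sim(ratings[user], r)
        num += w * r[item]
        den += abs(w)
    return num / den if den else 0.0
```

A hybrid variant would blend such user-based predictions with item-based or content-based scores before ranking the candidate set.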


Complexity, 2021, Vol 2021, pp. 1-11
Author(s): Yufeng Jia, Sang-Bing Tsai

With the development of the Internet, the amount of information on the network has grown rapidly, making it increasingly difficult to obtain effective information. Especially for individuals, enterprises, and institutions facing large amounts of information, integrating and analyzing Internet information by human effort alone is an almost impossible task. Mining and analysis of Internet hot events can effectively solve these problems by alleviating information overload, integrating redundant information, and refining core information. In this paper, we study hot event topic sentence generation in the field of hot event mining and design a hybrid event candidate set construction algorithm based on topic core word mapping and event triple selection. The algorithm uses the PAT-tree technique to extract high-frequency core words from topic hotspots and maps these words into sentences to generate one part of the event core sentences. The other part is obtained by treating event triples as candidate elements and extracting the sentences that contain event elements from the topic hotspots. The sets of event core sentences generated by the two methods are merged, filtered, and sorted to obtain the candidate set, which is then used to build a word-graph-based main service channel (MSC) model. We also propose an improved word-graph-based MSC model and use it to extract event topic sentences. Based on the above research, a hot event analysis system is implemented. The system analyzes existing topic data and uses the topic sentence generation algorithm studied in this paper to generate titles for hot spots, that is, hot events. At the same time, the topics are displayed from different dimensions, and data visualization is provided, covering the trend of event hotness, the trend of event sentiment polarity, and the distribution of event article sources.
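The core-word-to-sentence mapping step can be sketched roughly as follows. A plain frequency count stands in for the PAT-tree extraction, and the function names and sample data are illustrative assumptions:

```python
import re
from collections import Counter

def core_words(sentences, k=3):
    # Frequency count as a stand-in for PAT-tree high-frequency extraction.
    counts = Counter(w for s in sentences
                     for w in re.findall(r"\w+", s.lower()))
    return [w for w, _ in counts.most_common(k)]

def candidate_sentences(sentences, words):
    # Map each core word back to the sentences containing it; these
    # become one part of the event core sentence candidate set.
    return [s for s in sentences if any(w in s.lower() for w in words)]
```

In the full pipeline, this set would then be mixed with the triple-based sentences, filtered, and sorted before feeding the MSC model.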


2021, Vol 11 (24), pp. 11974
Author(s): Shijie Zhang, Gang Wu

Logs, which record system runtime information, are frequently used to ensure software system reliability. As the first and foremost step of typical log analysis, automated log parsing has attracted many data-driven methods. Most existing log parsers work offline, requiring a time-consuming training process and retraining whenever the system is upgraded. Meanwhile, the state-of-the-art online log parsers are tree-based, with remaining defects in robustness and efficiency. To overcome these limitations, we abandon the tree structure and propose a hash-like method. In this paper, we present LogPunk, an efficient online log parsing method. The core of LogPunk is a novel log signature method based on log punctuation and length features. Using the signature, we can quickly find a small candidate set of templates. The most suitable template is then returned by traversing the candidate set with our log similarity function. We evaluated LogPunk on 16 public datasets from LogHub against five other log parsers. LogPunk achieves the best parsing accuracy of 91.9%. Evaluation results also demonstrate its superiority in terms of robustness and efficiency.
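A punctuation-and-length signature of the kind the abstract describes might look like this minimal sketch. The exact signature LogPunk computes is not specified here, so the concrete definition below (set of punctuation marks plus token count) is an assumption:

```python
import string

PUNCT = set(string.punctuation)

def signature(log_line):
    # Signature = (sorted punctuation marks present, number of tokens).
    # Lines produced by the same template tend to share both features.
    puncts = "".join(sorted({c for c in log_line if c in PUNCT}))
    return (puncts, len(log_line.split()))

def find_candidates(log_line, templates):
    # Hash-like lookup: every template bucketed under this signature
    # is a candidate; a similarity function would pick the best one.
    return templates.get(signature(log_line), [])
```

Because the lookup is a single dict access, the candidate set stays small and the parser avoids walking a tree for every incoming line.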


Author(s): Lutz Kämmerer, Felix Krahmer, Toni Volkmer

In this paper, a sublinear-time algorithm is presented for the reconstruction of functions that can be represented by just a few out of a potentially large candidate set of Fourier basis functions in high spatial dimensions, a so-called high-dimensional sparse fast Fourier transform. In contrast to many other such algorithms, our method works for arbitrary candidate sets and makes no additional structural assumptions on the candidate set. Our transform significantly improves upon the other approaches available for such a general framework in terms of the scaling of the sample complexity. The algorithm is based on sampling the function along multiple rank-1 lattices with random generators. Combined with a dimension-incremental approach, it yields a sparse Fourier transform whose computational complexity grows only mildly with the dimension and can hence be computed efficiently even in high dimensions. Our theoretical analysis establishes that any Fourier s-sparse function can be accurately reconstructed with high probability. This guarantee is complemented by several numerical tests demonstrating high efficiency and versatile applicability in both the exactly sparse and the compressible case.
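The sampling pattern at the heart of the method, a rank-1 lattice, is simple to generate. The sketch below only constructs the nodes x_j = (j·z mod M)/M in [0,1)^d; the generator vector and lattice size are arbitrary examples, and the reconstruction machinery of the paper is not reproduced here:

```python
def rank1_lattice(z, M):
    # Nodes of a rank-1 lattice in [0,1)^d with generator z and size M:
    # x_j = ((j * z_1 mod M)/M, ..., (j * z_d mod M)/M), j = 0..M-1.
    d = len(z)
    return [tuple(((j * z[i]) % M) / M for i in range(d))
            for j in range(M)]
```

Sampling a function only at these M nodes (for several random generators z) is what keeps the overall sample complexity low even in high dimensions.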


2021, Vol 11 (22), pp. 10968
Author(s): Jiancheng Yin, Yuqing Li, Rixin Wang, Minqiang Xu

As task requirements grow more complex, operation under multiple conditions has gradually become a common scenario for equipment. However, the degradation trend of monitoring data cannot be accurately extracted for life prediction under multiple operating conditions, because some of the monitoring data is affected by the operating conditions themselves. To address this problem, this paper proposes an improved similarity trajectory method that can use the monitoring data under multiple operating conditions directly for life prediction. The morphological pattern and symbolic aggregate approximation-based similarity measurement method (MP-SAX) is first used to measure the similarity between the monitoring data under multiple operating conditions. Then, the similar life candidate set and corresponding weights are obtained according to the MP-SAX. Finally, the life prediction results for equipment under multiple operating conditions are calculated by aggregating the similar life candidate set. The proposed method is validated on public datasets from the NASA Ames Prognostics Data Repository. The results show that the proposed method can directly and effectively use the original monitoring data for life prediction without extracting its degradation trend.
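The symbolic aggregate approximation (SAX) half of MP-SAX can be illustrated with a minimal sketch. The three-symbol breakpoints below are the standard Gaussian equiprobable ones (±0.43), and the similarity function is a simplified stand-in for the paper's measure, not its actual definition:

```python
import statistics

def sax(series, breakpoints=(-0.43, 0.43), alphabet="abc"):
    # z-normalise the series, then map each value to a symbol
    # according to which breakpoint interval it falls in.
    mu = statistics.mean(series)
    sd = statistics.pstdev(series) or 1.0
    out = []
    for v in series:
        z = (v - mu) / sd
        idx = sum(z > b for b in breakpoints)
        out.append(alphabet[idx])
    return "".join(out)

def symbol_distance(s1, s2):
    # Crude similarity: number of positions with identical symbols.
    return sum(a == b for a, b in zip(s1, s2))
```

Comparing symbol strings rather than raw values is what lets trajectories recorded under different operating conditions be matched against the candidate set.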


2021, Vol 2078 (1), pp. 012031
Author(s): Ani Song, Xiaoxia Jia, Wei Jiang

With the development of military intelligence, higher requirements are placed on automatic term recognition in the military field. Given the flexible and diverse naming in military requirement documents and the absence of an annotated corpus, the method in this paper uses an existing military-domain core database, matching the data set against the core database with the Aho-Corasick algorithm and word segmentation, so that the terms to be recognized in the data set can be divided into three types. Possible word-formation rules of military terms are summarized, and phrases that conform to these rules are found in the documents as the term candidate set. The core database and the TF-IDF method are used to score the candidate terms, and candidates whose score exceeds a threshold are selected iteratively as real terms. The experimental results show that the F1 value of this method reaches 0.719, which is better than the traditional C-value method. Therefore, the method proposed in this paper achieves better automatic term recognition for military requirement documents without annotation.
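The TF-IDF thresholding step can be sketched as follows. The toy documents, the threshold, and the `select_terms` helper are illustrative, not taken from the paper:

```python
import math
from collections import Counter

def tfidf_scores(docs):
    # docs: list of token lists. Returns each term's best tf-idf
    # score across the collection.
    N = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = {}
    for d in docs:
        tf = Counter(d)
        for t, c in tf.items():
            s = (c / len(d)) * math.log(N / df[t])
            scores[t] = max(scores.get(t, 0.0), s)
    return scores

def select_terms(docs, threshold):
    # Candidates scoring above the threshold are kept as real terms;
    # the paper iterates this selection with its core database.
    return {t for t, s in tfidf_scores(docs).items() if s > threshold}
```

Terms that occur in most documents get a low idf and drop out, while terms concentrated in few documents survive the threshold.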


Author(s): Anirban Mondal, Ayaan Kakkar, Nilesh Padhariya, Mukesh Mohania

Next-generation enterprise management systems are beginning to be developed based on the Systems of Engagement (SOE) model. We visualize an SOE as a set of entities. Each entity is modeled by a single parent document with dynamic embedded links (i.e., child documents) that contain multi-modal information about the entity from various networks. Since entities in an SOE are generally queried using keywords, our goal is to efficiently retrieve the top-k entities related to a given keyword-based query by considering the relevance scores of both their parent and child documents. Furthermore, we extend this problem to the case where the entities are geo-tagged. The main contributions of this work are four-fold. First, it proposes an efficient bitmap-based approach for quickly identifying the candidate set of entities whose parent documents contain all queried keywords. A variant of this approach is also proposed to reduce memory consumption by exploiting skews in keyword popularity. Second, it proposes the two-tier HI-tree index, which uses both hashing and inverted indexes, for efficient document relevance score lookups. Third, it proposes an R-tree-based approach to extend the aforementioned approaches to geo-tagged entities. Fourth, it presents comprehensive experiments with both real and synthetic datasets demonstrating that the proposed schemes provide good top-k result recall within acceptable query response times.
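The bitmap-based candidate identification described as the first contribution can be sketched with plain Python integers acting as bitmaps. The data and helper names are hypothetical:

```python
def build_bitmaps(docs):
    # One integer bitmap per keyword; bit i is set if entity i's
    # parent document contains the keyword.
    bitmaps = {}
    for i, words in enumerate(docs):
        for w in set(words):
            bitmaps[w] = bitmaps.get(w, 0) | (1 << i)
    return bitmaps

def candidates(bitmaps, query):
    # AND the per-keyword bitmaps (query must be non-empty);
    # the surviving bits are exactly the entities whose parent
    # documents contain all queried keywords.
    acc = -1
    for w in query:
        acc &= bitmaps.get(w, 0)
    return [i for i in range(acc.bit_length()) if acc >> i & 1]
```

Each conjunctive query then costs one bitwise AND per keyword, which is why the candidate set can be identified so quickly before any relevance scoring happens.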


Webology, 2021, Vol 18 (Special Issue 04), pp. 752-764
Author(s): R. Deeptha

Routing is one of the most important prevailing challenges in research on multi-hop networks in a wireless environment. Opportunistic routing (OR) is an emerging research area, owing to its improved communication reliability compared with traditional routing models. The central idea of OR is to determine a group of neighboring candidate nodes, called the candidate set, exploiting the broadcast capability of the wireless medium so that data packets are transmitted towards the destination collaboratively, through the coordination of the forwarding candidate set. The design and performance of OR protocols over multi-hop wireless networks depend mainly on how forwarding candidates are selected and how priorities are assigned, and researchers have designed several algorithms for these processes. In this paper, after a short outline of traditional routing and OR protocols, the metrics involved in the design of existing OR protocols, the classification of OR-based protocols, and the hurdles in designing OR protocols over multi-hop wireless networks are examined. More precisely, the OR protocols are divided into two categories based on their forwarding candidate set selection and forwarding candidate coordination methods. Furthermore, the most significant challenges of OR protocol design, such as prioritizing forwarding candidates, utilizing cross-layer approaches for candidate coordination, and achieving quality of service, are also investigated.


2021
Author(s): Shunxiang Zhang, Hanqing Xu, Guangli Zhu, Xiang Chen, Kuang Ching Li

New sentiment words in product reviews are valuable resources that are directly close to users. Extracting new sentiment words can better support information services for users and provide theoretical support for related research in edge computing. Traditional methods for extracting new sentiment words generally ignore context and syntactic information, which leads to low accuracy and recall. To tackle this issue, we propose a data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews. First, the probability that a new word is a sentiment word is calculated through location rules derived from the sequence labeling result, and the candidate set of new sentiment words is obtained according to this probability. Then, the candidate set is supplemented by matching appositive words based on edit distance. Finally, the final set of new sentiment words is collected through fine-grained filtering, including the calculation of Pointwise Mutual Information (PMI) and the difference coefficient of the positive and negative corpus (DC-PNC). The experimental results illustrate the effectiveness of the new sentiment words extracted by the proposed method, which can markedly improve the accuracy and recall of sentiment analysis.
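The PMI part of the fine-grained filtering step can be sketched as follows. Sentence-level co-occurrence is an assumed granularity, and the corpus and function shape are illustrative, not the paper's exact formulation:

```python
import math

def pmi(word, seed, corpus):
    # PMI(word, seed) = log2( p(word, seed) / (p(word) * p(seed)) ),
    # with probabilities estimated from sentence co-occurrence.
    # corpus: list of token sets, one per sentence/review.
    n = len(corpus)
    pw = sum(word in s for s in corpus) / n
    ps = sum(seed in s for s in corpus) / n
    pws = sum(word in s and seed in s for s in corpus) / n
    if not (pw and ps and pws):
        return 0.0  # no evidence of association
    return math.log2(pws / (pw * ps))
```

A candidate new word with high PMI against known sentiment seed words is likely itself a sentiment word, which is the intuition behind using PMI as a filter.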


Author(s): Hankook Lee, Sungsoo Ahn, Seung-Woo Seo, You Young Song, Eunho Yang, ...

Retrosynthesis, whose goal is to find a set of reactants for synthesizing a target product, is an emerging research area of deep learning. While existing approaches have shown promising results, they currently lack the ability to consider the availability (e.g., stability or purchasability) of reactants or to generalize to unseen reaction templates (i.e., chemical reaction rules). In this paper, we propose a new approach that mitigates these issues by reformulating retrosynthesis as a problem of selecting reactants from a candidate set of commercially available molecules. To this end, we design an efficient reactant selection framework, named RetCL (retrosynthesis via contrastive learning), which scores all candidate molecules with selection scores computed by graph neural networks. To learn the score functions, we also propose a novel contrastive training scheme with hard negative mining. Extensive experiments demonstrate the benefits of the proposed selection-based approach. For example, when all 671k reactants in the USPTO database are given as candidates, RetCL achieves a top-1 exact match accuracy of 71.3% on the USPTO-50k benchmark, while a recent transformer-based approach achieves 59.6%. We also demonstrate that RetCL generalizes well to unseen templates in various settings, in contrast to template-based approaches.
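The selection-from-candidates formulation can be illustrated with a toy sketch. Cosine similarity over made-up embedding vectors stands in for the GNN-computed selection scores; none of the names or data below come from RetCL itself:

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = (math.sqrt(sum(a * a for a in u))
           * math.sqrt(sum(b * b for b in v)))
    return num / den if den else 0.0

def select_reactants(target_emb, candidate_embs, k=1):
    # Rank every commercially available candidate by its selection
    # score against the target product; keep the top-k.
    ranked = sorted(candidate_embs,
                    key=lambda c: cosine(target_emb, candidate_embs[c]),
                    reverse=True)
    return ranked[:k]
```

Because selection only ranks an existing catalogue, every returned reactant is purchasable by construction, which is the availability advantage the abstract highlights.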

