content extraction
Recently Published Documents


TOTAL DOCUMENTS: 269 (FIVE YEARS 44)

H-INDEX: 15 (FIVE YEARS 1)

2021 ◽  
Vol 13 (6) ◽  
pp. 1-13
Author(s):  
Guangxuan Chen ◽  
Guangxiao Chen ◽  
Lei Zhang ◽  
Qiang Liu

To address repeated acquisition, data redundancy, and low efficiency in website forensics, this paper proposes an incremental acquisition method oriented to dynamic websites. The method achieves incremental collection on dynamically updated websites by acquiring and parsing web pages, URL deduplication, web page denoising, web page content extraction, and hashing. Experiments show that the algorithm has relatively high acquisition precision and recall, and can be combined with other data to perform effective digital forensics on dynamically updated real-time websites.
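
The abstract names URL deduplication and hashing as steps but gives no implementation details. The following is a minimal sketch of how such an incremental check might look, not the authors' algorithm. The names `normalize_url`, `content_hash`, and `IncrementalStore` are hypothetical, and the choice of SHA-256 over whitespace-normalized text is an assumption.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings deduplicate."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    # Lowercase scheme and host, drop the fragment, keep the query string.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def content_hash(text: str) -> str:
    """Hash the extracted page content after collapsing whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class IncrementalStore:
    """Remembers the last content hash seen for each normalized URL."""

    def __init__(self):
        self.seen = {}  # normalized URL -> hex digest of last stored content

    def should_acquire(self, url: str, extracted_text: str) -> bool:
        """Return True only if the page is new or its content has changed."""
        key = normalize_url(url)
        digest = content_hash(extracted_text)
        if self.seen.get(key) == digest:
            return False  # unchanged since the last acquisition, skip it
        self.seen[key] = digest
        return True
```

Hashing the denoised, extracted content rather than the raw HTML means that changes confined to boilerplate (rotating advertisements, timestamps in the template) would not trigger a new acquisition, which is presumably why denoising precedes hashing in the pipeline described above.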


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0258907
Author(s):  
Can Zhao ◽  
Jiabing Liu ◽  
Fuyong Zheng ◽  
Dejun Wang ◽  
Bo Meng

Efficiency and privacy are the key aspects of content extraction signatures. In this study, we propose a Secure and Efficient and Certificateless Content Extraction Signature with Privacy Protection (SECCESPP), in which elliptic curve scalar multiplication replaces the inefficient bilinear pairing of certificateless public key cryptosystems, and the idea of signcryption is borrowed to implement privacy protection for signed messages. The correctness of the SECCESPP scheme is demonstrated by the consistency of the message and the accuracy of the equation. The security of the scheme is demonstrated based on the elliptic curve discrete logarithm problem in the random oracle model, and its privacy is formally analyzed with the formal analysis tool ProVerif. Theoretical and experimental analysis shows that the SECCESPP scheme is more efficient than other schemes.
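
For readers unfamiliar with the primitive the scheme relies on, here is a minimal sketch of elliptic curve scalar multiplication using double-and-add. It is not the SECCESPP scheme itself. The curve y^2 = x^3 + 2x + 2 over GF(17) with base point (5, 1) of order 19 is a common textbook toy example chosen here as an assumption, and it offers no security. The function names are hypothetical.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P_MOD); illustrative parameters only.
P_MOD = 17
A = 2
G = (5, 1)  # base point of order 19 on this curve


def point_add(p, q):
    """Add two curve points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p and q are inverses of each other
    if p == q:
        # Tangent slope for point doubling.
        s = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Chord slope for adding two distinct points.
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with the double-and-add method."""
    result = None
    addend = point
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result
```

Double-and-add needs a number of point operations proportional to the bit length of k. A production implementation would use a standardized curve and constant-time arithmetic from a vetted library rather than code like this.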


2021 ◽  
Author(s):  
Nouf Alrasheed ◽  
Shivika Prasanna ◽  
Ryan Rowland ◽  
Praveen Rao ◽  
Viviana Grieco ◽  
...  

2021 ◽  
Vol 35 (4) ◽  
pp. 325-330
Author(s):  
Gowrisankar Kalakoti ◽  
Prabakaran G

In modern computing, detecting multiple objects in video is a critical task. Recognising and distinguishing the multiple elements of a video quickly and reliably is crucial for interacting with one's environment (objects). The core issue is that, in principle, to ensure that no significant element is missed, all parts of a video's content must be scanned for elements at many different scales. Classifying the content of a given region takes time and effort, however, and both the time and the computational resources that an operator can spend on classification are limited. The proposed method applies two procedures for accelerating the standard detector and demonstrates their capability in terms of both detection efficiency and speed. The first improvement, to our group-based classifier, accelerates the grouping of sub-features by framing the problem as a selection procedure over consecutive features. The second improvement provides better multiscale features for detecting objects of all sizes without rescaling the input image from a video. A video is a sequence of successive images at a constant time interval, so it can provide more information about its contents as situations change over time. Handling such content and its features manually is therefore impractical. In the proposed work, a Group-based Video Content Extraction Classifier (GbCCE) extracts content from a video by extracting relevant features using a group-based classifier. The proposed method is distinct from conventional approaches, and the findings indicate that it produces better output.


2021 ◽  
Vol 15 (6) ◽  
pp. 1-105
Author(s):  
Julián Alarte ◽  
Josep Silva

The main content of a webpage is often surrounded by other boilerplate elements related to the template, such as menus, advertisements, copyright notices, and comments. For crawlers and indexers, isolating the main content from the template and other noisy information is an essential task, because processing and storing noisy information wastes resources such as bandwidth, storage space, and computing time. In addition, detecting and extracting the main content is useful in areas such as data mining, web summarization, and content adaptation to low resolutions. This work introduces a new technique for main content extraction. In contrast to most techniques, it extracts not only text but also other types of content, such as images and animations. It is a Document Object Model-based page-level technique, so it needs to load only a single webpage to extract the main content. As a consequence, it is efficient enough to be used online (in real time). We have empirically evaluated the technique using a suite of real heterogeneous benchmarks, producing very good results compared with other well-known content extraction techniques.
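
To illustrate what a DOM-based, page-level approach can look like, here is a minimal sketch that builds a small element tree with Python's standard `html.parser` and descends toward the smallest element that still holds most of the page's non-link text. This is a simple heuristic chosen for illustration, not the technique proposed in the paper, and unlike that technique it handles text only. The names `Node`, `TreeBuilder`, `text_of`, and `main_content`, the list of skipped tags, and the 0.6 threshold are all assumptions.

```python
from html.parser import HTMLParser

SKIP = {"script", "style", "nav", "header", "footer", "aside"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mix of Node objects and text strings, in order


class TreeBuilder(HTMLParser):
    """Builds a lightweight DOM-like tree from an HTML string."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:  # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def text_of(node, skip_links=False):
    """Concatenate the text under a node, ignoring boilerplate tags."""
    if isinstance(node, str):
        return node
    if node.tag in SKIP or (skip_links and node.tag == "a"):
        return ""
    parts = (text_of(child, skip_links) for child in node.children)
    return " ".join(part for part in parts if part)


def main_content(html, threshold=0.6):
    """Descend while one child holds at least `threshold` of the non-link text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    node = builder.root
    while True:
        total = len(text_of(node, skip_links=True))
        dominant = None
        for child in node.children:
            if isinstance(child, Node) and total and \
                    len(text_of(child, skip_links=True)) >= threshold * total:
                dominant = child
                break
        if dominant is None:
            return text_of(node)
        node = dominant
```

Because the heuristic inspects only the single page it is given, it shares the page-level property described above: no second page from the same site is needed to infer the template. On real pages, a single dominant paragraph can cause the descent to go too deep, so the threshold would need tuning.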


Author(s):  
Zhongguo Yang ◽  
Mingzhu Zhang ◽  
Zhongmei Zhang ◽  
Han Li ◽  
Chen Liu ◽  
...  

Information services are always a hot topic, especially now that the Web is accessible anywhere. At universities, lecture information is very important for students and teachers who want to take part in academic meetings. Lecture news extraction is therefore an important task. Many open information extraction methods have been proposed, but because websites are highly heterogeneous, the task remains a challenge. In this paper, we propose a method based on fusing multiple features to locate lecture news on university websites. These features include the link relationship between a parent webpage and its child webpages, the visual similarity, and the semantics of webpages. Additionally, this paper provides an information service based on a main content extraction algorithm for extracting the lecture information. Stable and invariant features enable the proposed method to adapt to various kinds of campus websites. Experiments conducted on 50 websites show the effectiveness and efficiency of the provided service.


2021 ◽  
Author(s):  
Heejung Yang ◽  
Beomjun Park ◽  
Jinyoung Park ◽  
Jiho Lee ◽  
Hyeon Seok Jang ◽  
...  

Biomedical databases grow by more than a thousand new publications every day. The large volume of biomedical literature that is being published at an unprecedented rate hinders the discovery of relevant knowledge from keywords of interest to gather new insights and form hypotheses. A text-mining tool, PubTator, helps to automatically annotate bioentities, such as species, chemicals, genes, and diseases, from PubMed abstracts and full-text articles. However, the manual re-organization and analysis of bioentities is a non-trivial and highly time-consuming task. ChexMix was designed to extract the unique identifiers of bioentities from query results. Herein, ChexMix was used to construct a taxonomic tree with allied species among Korean native plants and to extract the medical subject headings unique identifier of the bioentities, which co-occurred with the keywords in the same literature. ChexMix discovered the allied species related to a keyword of interest and experimentally proved its usefulness for multi-species analysis.


2021 ◽  
Vol 22 (5) ◽  
pp. 2261
Author(s):  
Zhongxin Liang ◽  
Hongrui Liang ◽  
Yizhan Guo ◽  
Dong Yang

Cyanidin 3-O-galactoside (Cy3Gal) is one of the most widespread anthocyanins that positively impacts the health of animals and humans. Since it is available from a wide range of natural sources, such as fruits (apples and berries in particular), substantial studies have been performed to investigate its biosynthesis, chemical stability, natural occurrences and content, extraction methods, physiological functions, and potential applications. In this review, we focus on presenting the previous studies on these aspects of Cy3Gal. In conclusion, Cy3Gal shares a common biosynthesis pathway and analogous stability with other anthocyanins. Galactosyltransferase utilizing uridine diphosphate galactose (UDP-galactose) and cyanidin as substrates is unique to Cy3Gal biosynthesis. Extraction employing different methods reveals chokeberry as the most practical natural source for mass production of this compound. The antioxidant properties and other health effects, including anti-inflammatory, anticancer, antidiabetic, anti-toxicity, cardiovascular, and nervous protective capacities, are highlighted for purified Cy3Gal and for its combination with other polyphenols. These unique properties of Cy3Gal are discussed and compared with those of other anthocyanins of related structure for an in-depth evaluation of its potential value as a food additive or health supplement. Emphasis is laid on the description of its physiological functions confirmed via various approaches.

