Page-Level Main Content Extraction From Heterogeneous Webpages

2021 ◽  
Vol 15 (6) ◽  
pp. 1-105
Author(s):  
Julián Alarte ◽  
Josep Silva

The main content of a webpage is often surrounded by boilerplate elements related to the template, such as menus, advertisements, copyright notices, and comments. For crawlers and indexers, isolating the main content from the template and other noisy information is an essential task, because processing and storing noisy information wastes resources such as bandwidth, storage space, and computing time. In addition, detecting and extracting the main content is useful in areas such as data mining, web summarization, and content adaptation to low resolutions. This work introduces a new technique for main content extraction. In contrast to most techniques, it extracts not only text but also other types of content, such as images and animations. It is a Document Object Model-based, page-level technique, so it needs to load only a single webpage to extract the main content. As a consequence, it is efficient enough to be used online (in real time). We have empirically evaluated the technique on a suite of real heterogeneous benchmarks, and it produces very good results compared with other well-known content extraction techniques.
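To make the idea of page-level, DOM-based extraction concrete, here is a minimal sketch in Python. It is not the authors' technique: it swaps in a generic text-per-element density heuristic, and the names `Node`, `DomBuilder`, `extract_main`, and the `CANDIDATE_TAGS` set are all hypothetical. It only illustrates that a single page is enough to score DOM subtrees, and that returning a DOM node (rather than a string) keeps non-text content such as images.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Hypothetical choice of elements allowed to be reported as "main content".
CANDIDATE_TAGS = {"div", "article", "section", "main", "td"}


class Node:
    """One DOM element with per-subtree counters."""

    def __init__(self, tag, attrs=(), parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text_chars = 0  # characters of text outside links, whole subtree
        self.elements = 1    # number of elements in the subtree, itself included

    def ancestors_and_self(self):
        node = self
        while node is not None:
            yield node
            node = node.parent


class DomBuilder(HTMLParser):
    """Builds a small DOM tree from a single page."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        for ancestor in self.current.ancestors_and_self():
            ancestor.elements += 1
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for node in self.current.ancestors_and_self():
            if node.tag == tag and node is not self.root:
                self.current = node.parent
                return

    def handle_data(self, data):
        length = len(data.strip())
        chain = list(self.current.ancestors_and_self())
        # Skip whitespace, scripts, styles, and link text (typical of menus).
        if length == 0 or any(n.tag in {"script", "style", "a"} for n in chain):
            return
        for node in chain:
            node.text_chars += length


def walk(node):
    """Yield a node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def extract_main(html):
    """Return the candidate element with the highest text-per-element density."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    candidates = [n for n in walk(builder.root) if n.tag in CANDIDATE_TAGS]
    return max(candidates, key=lambda n: n.text_chars / n.elements, default=None)
```

Calling `extract_main(page_source)` returns a `Node` (or `None` when the page has no candidate element); its `children` still include elements such as `img`, so images inside the selected block are not lost. A real system would need a more robust score than this single ratio.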

Author(s):  
Jean Claude Turiho ◽  
Wilson Cheruiyot ◽  
Anne Kibe ◽  
Irénée Mungwarakarama ◽  
...  

2021 ◽  
pp. 073490412199344
Author(s):  
Wolfram Jahn ◽  
Frane Sazunic ◽  
Carlos Sing-Long

Synthesising data from fire scenarios using fire simulations requires iterative running of these simulations. For real-time synthesising, faster-than-real-time simulations are thus necessary. In this article, different model types are assessed according to their complexity to determine the trade-off between the accuracy of the output and the required computing time. A threshold grid size for real-time computational fluid dynamics simulations is identified, and the implications of simplifying existing field fire models by turning off sub-models are assessed. In addition, a temperature correction for two-zone models, based on the conservation of energy of the hot layer, is introduced to account for spatial variations of temperature in the near field of the fire. The main conclusions are that real-time fire simulations with spatial resolution are possible and that it is not necessary to solve all fine-scale physics to reproduce temperature measurements accurately. There remains, however, a gap in performance between computational fluid dynamics models and zone models that must be explored to achieve faster-than-real-time fire simulations.


Author(s):  
Wenqiang Chen ◽  
Lin Chen ◽  
Meiyi Ma ◽  
Farshid Salemi Parizi ◽  
Shwetak Patel ◽  
...  

Wearable devices, such as smartwatches and head-mounted devices (HMD), demand new input methods that offer a natural, subtle, and easy-to-use way to enter commands and text. In this paper, we propose and investigate ViFin, a new technique for command input and text entry, which harnesses finger-movement-induced vibration to track continuous micro finger-level writing with a commodity smartwatch. Inspired by the recurrent neural aligner and transfer learning, ViFin recognizes continuous finger writing, works across different users, and achieves an accuracy of 90% and 91% for recognizing numbers and letters, respectively. We quantify our approach's accuracy through real-time system experiments in different arm positions, writing speeds, and smartwatch position displacements. Finally, a real-time writing system and two user studies on real-world tasks are implemented and assessed.


Data ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 1
Author(s):  
Ahmed Elmogy ◽  
Hamada Rizk ◽  
Amany M. Sarhan

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data, image processing, fraud detection, intrusion detection, and so forth. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively, on request and in a timely manner, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records, and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.


2021 ◽  
Vol 13 (10) ◽  
pp. 1884
Author(s):  
Jingjing Hu ◽  
Yansong Bao ◽  
Jian Liu ◽  
Hui Liu ◽  
George P. Petropoulos ◽  
...  

The acquisition of real-time temperature and relative humidity (RH) profiles in the Arctic is of great significance for the study of the Arctic's climate and for Arctic scientific research. However, the operational algorithm of Fengyun-3D only takes into account areas within 60°N. The innovation of this work is a new technique based on a Neural Network (NN) algorithm, which can retrieve these parameters in real time from the Fengyun-3D Hyperspectral Infrared Radiation Atmospheric Sounding (HIRAS) observations in the Arctic region. Considering the difficulty of obtaining a large number of actual observations (such as radiosonde data) in the Arctic region, collocated ERA5 data from the European Centre for Medium-Range Weather Forecasts (ECMWF) and HIRAS observations were used to train the neural networks (NNs). Brightness temperature and training targets were classified using two variables: season (warm season and cold season) and surface type (ocean and land). NN-based retrievals were compared with ERA5 data and radiosonde observations (RAOBs) independent of the NN training sets. Results showed that (1) the accuracy of the NN retrievals is generally higher in the warm season and over ocean; (2) the root-mean-square error (RMSE) of retrieved profiles is generally slightly higher in the RAOB comparisons than in the ERA5 comparisons, but the variation trend of errors with height is consistent; (3) the profiles retrieved by the NN method are closer to ERA5 than the AIRS products are. All the results demonstrated the potential value in time and space of the NN algorithm in retrieving temperature and relative humidity profiles of the Arctic region from HIRAS observations under clear-sky conditions.
As such, the proposed NN algorithm provides a valuable pathway for reliably retrieving temperature and RH profiles from HIRAS observations in the Arctic region, providing information of practical value in a wide spectrum of applications and research investigations alike. All in all, our work has important implications for broadening Fengyun-3D's operational implementation range from within 60°N to the Arctic region.


2020 ◽  
Vol 12 (11) ◽  
pp. 1747
Author(s):  
Yin Zhang ◽  
Qiping Zhang ◽  
Yongchao Zhang ◽  
Jifang Pei ◽  
Yulin Huang ◽  
...  

Deconvolution methods can be used to improve the azimuth resolution in airborne radar imaging. Due to the sparsity of targets in airborne radar imaging, an L1 regularization problem usually needs to be solved. Recently, the Split Bregman algorithm (SBA) has been widely used to solve L1 regularization problems. However, due to the high computational complexity of matrix inversion, the efficiency of the traditional SBA is low, which seriously restricts its real-time performance in airborne radar imaging. To overcome this disadvantage, a fast split Bregman algorithm (FSBA) is proposed in this paper to achieve real-time imaging with an airborne radar. First, under the regularization framework, the problem of azimuth resolution improvement is converted into an L1 regularization problem. Then, the L1 regularization problem is solved with the proposed FSBA. By exploiting the low displacement rank of Toeplitz matrices, the proposed FSBA realizes fast matrix inversion using a Gohberg–Semencul (GS) representation. Through simulated and real data processing experiments, we show that the proposed FSBA significantly improves the resolution compared with the Wiener filtering (WF), truncated singular value decomposition (TSVD), Tikhonov regularization (REGU), Richardson–Lucy (RL), and iterative adaptive approach (IAA) algorithms. The computational advantage of FSBA grows as the echo dimension increases. Its computational efficiency is 51 times and 77 times that of the traditional SBA for echoes with dimensions of 218 × 400 and 400 × 400, respectively, improving both the image quality and the computing time. In addition, for a specific hardware platform, the proposed FSBA can process echoes of greater dimensions than the traditional SBA. Furthermore, the proposed FSBA causes little performance degradation compared with the traditional SBA.
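The role of the matrix inversion in the traditional SBA can be seen in a minimal sketch of the textbook split Bregman iteration for an L1-regularized least-squares problem. This is an illustration under assumptions, not the paper's FSBA: it assumes real-valued data, uses NumPy, and solves the linear system with a generic dense solver instead of the Gohberg–Semencul representation. The function names and the parameters `lam`, `mu`, and `n_iter` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise shrinkage, the closed-form solution of the L1 subproblem."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def split_bregman_l1(A, y, lam, mu=1.0, n_iter=200):
    """Minimise 0.5*||A x - y||^2 + lam*||x||_1 using the splitting d = x."""
    n = A.shape[1]
    x = np.zeros(n)
    d = np.zeros(n)  # auxiliary variable standing in for x inside the L1 term
    b = np.zeros(n)  # Bregman variable
    # This matrix is the same in every iteration; solving with it is the
    # costly step that a structured (e.g. Toeplitz) solver would speed up.
    lhs = A.T @ A + mu * np.eye(n)
    Aty = A.T @ y
    for _ in range(n_iter):
        x = np.linalg.solve(lhs, Aty + mu * (d - b))  # quadratic subproblem
        d = soft_threshold(x + b, lam / mu)           # L1 subproblem
        b = b + x - d                                 # Bregman update
    return x
```

When `A` is the identity, the minimiser is simply the soft-thresholded input, which gives an easy sanity check for the iteration. In a deconvolution setting, `A` would be the convolution matrix built from the antenna pattern, which is where the Toeplitz structure mentioned in the abstract comes from.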


2014 ◽  
Vol 556-562 ◽  
pp. 2940-2943
Author(s):  
Wei Dai ◽  
Gang Xie ◽  
Bai Qin Zhao

Sensor-based gas alarms are widely used, but they still have some limitations. In the real-time gas monitoring system designed here, the gas alarms within a certain range transmit their collected data to a remote computer over a wired or wireless link. The remote computer can monitor the gas environment by receiving and monitoring the data collected by the alarms. To ensure the system's reliability, a handshake-based transmission mechanism is used. Data compression is also included to reduce the storage space required for the stored data.
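The abstract does not specify the handshake or the compression scheme, so the following is only a sketch of one way the two ideas could fit together. Everything in it is an assumption: the frame layout (sequence number, payload length, compressed payload, CRC-32), the ACK/NAK reply bytes, and the names `build_frame`, `receive_frame`, and `send_with_retry`. It uses only the Python standard library (`struct`, `zlib`).

```python
import struct
import zlib

ACK, NAK = b"\x06", b"\x15"  # ASCII ACK / NAK control bytes (assumed replies)


def build_frame(seq, readings):
    """Pack readings as 32-bit floats, compress them, add a header and CRC-32."""
    payload = zlib.compress(struct.pack(f"<{len(readings)}f", *readings))
    header = struct.pack("<HI", seq, len(payload))  # 2-byte seq, 4-byte length
    crc = struct.pack("<I", zlib.crc32(header + payload))
    return header + payload + crc


def receive_frame(frame):
    """Verify a frame; return (reply, seq, readings), with None values on failure."""
    if len(frame) < 10:  # shorter than header (6 bytes) plus CRC (4 bytes)
        return NAK, None, None
    body, crc = frame[:-4], frame[-4:]
    if struct.unpack("<I", crc)[0] != zlib.crc32(body):
        return NAK, None, None
    seq, length = struct.unpack("<HI", body[:6])
    if length != len(body) - 6:
        return NAK, None, None
    raw = zlib.decompress(body[6:])
    readings = list(struct.unpack(f"<{len(raw) // 4}f", raw))
    return ACK, seq, readings


def send_with_retry(channel, seq, readings, max_tries=3):
    """Send a frame through `channel` until it is acknowledged.

    `channel` is any callable that takes the frame bytes and returns the
    receiver's reply; the number of attempts used is returned.
    """
    frame = build_frame(seq, readings)
    for attempt in range(1, max_tries + 1):
        if channel(frame) == ACK:
            return attempt
    raise ConnectionError("no acknowledgement received")
```

In this sketch the alarm side would call `send_with_retry` with a function that writes to the wired or wireless link and reads back the reply, while the remote computer would run `receive_frame` on each incoming frame and store the compressed payload as received.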


2013 ◽  
Vol 14 (1) ◽  
pp. 156
Author(s):  
David Mayerich ◽  
Michael Walsh ◽  
Matthew Schulmerich ◽  
Rohit Bhargava
