Hot News Recommendation System across Heterogeneous Websites Using Hadoop

2014 ◽  
Vol 989-994 ◽  
pp. 4704-4707
Author(s):  
Sheng Wu Xu ◽  
Zheng You Xia

Most current news recommendation systems are suitable only for news that comes from a single news website, not for news drawn from many different news websites. Little research has been reported on utilizing hundreds of news websites to provide top hot news services for group customers (e.g., government staff). In this paper, we present a hot news recommendation system based on Hadoop that aggregates news from hundreds of different news websites. We discuss our news recommendation system architecture based on Hadoop and conclude that Hadoop is an excellent tool for web big data analytics, scaling well with increasing data set size and number of nodes in the cluster. Experimental results demonstrate the reliability and effectiveness of our method.
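The abstract does not detail the Hadoop jobs themselves; a minimal pure-Python sketch of the map/reduce idea, assuming hotness is approximated by how many distinct websites report a given title (the records, site names, and titles below are illustrative, not from the paper), might look like:

```python
from collections import defaultdict

# Toy records: (website, news_title) pairs crawled from many sites.
records = [
    ("site-a.com", "Flood warnings issued"),
    ("site-b.com", "Flood warnings issued"),
    ("site-c.com", "Local team wins cup"),
    ("site-d.com", "Flood warnings issued"),
]

def map_phase(records):
    """Map: emit (title, website) pairs keyed by title."""
    for website, title in records:
        yield title, website

def reduce_phase(pairs):
    """Reduce: count the distinct reporting websites per title."""
    sites = defaultdict(set)
    for title, website in pairs:
        sites[title].add(website)
    return {title: len(s) for title, s in sites.items()}

counts = reduce_phase(map_phase(records))
hot = max(counts, key=counts.get)  # title reported by the most websites
```

In an actual Hadoop deployment the map and reduce functions would run as distributed tasks (e.g. via Hadoop Streaming), with the shuffle phase replacing the in-memory dictionary.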

2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Zhengyou Xia ◽  
Shengwu Xu ◽  
Ningzhong Liu ◽  
Zhengkang Zhao

Most current news recommendation systems are suitable only for news that comes from a single news website, not for news drawn from different heterogeneous news websites. Previous research on news recommender systems has proposed different strategies for providing personalized news services to online readers. However, little work has been reported on utilizing hundreds of heterogeneous news websites to provide top hot news services for group customers (e.g., government staff). In this paper, we propose a hot news recommendation model, based on a Bayesian model, that draws on hundreds of different news websites. In the model, we determine whether a news item is hot news by calculating its joint probability. We evaluate and compare our proposed recommendation model with the results of human experts on real data sets. Experimental results demonstrate the reliability and effectiveness of our method. We also deployed this model in the hot news recommendation system of the Hangzhou city government in 2013, where it achieved very good results.
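The paper's exact Bayesian formulation is not given in the abstract; a minimal naive-Bayes-style sketch, in which the joint probability of a news item's observed features is compared under the "hot" and "not hot" hypotheses (all likelihoods, feature names, and the prior below are assumed, illustrative values), could look like:

```python
import math

# Assumed feature likelihoods (illustrative, not from the paper).
p_feature_given_hot = {"many_sites": 0.8, "many_comments": 0.7, "front_page": 0.6}
p_feature_given_cold = {"many_sites": 0.2, "many_comments": 0.3, "front_page": 0.4}
p_hot = 0.1  # assumed prior that an arbitrary news item is hot

def joint_log_prob(features, likelihoods, prior):
    """Joint log-probability of the observed features under one hypothesis,
    assuming conditional independence (naive Bayes)."""
    logp = math.log(prior)
    for f in features:
        logp += math.log(likelihoods[f])
    return logp

def is_hot(features):
    """Label an item hot when the 'hot' hypothesis has higher joint probability."""
    return (joint_log_prob(features, p_feature_given_hot, p_hot)
            > joint_log_prob(features, p_feature_given_cold, 1 - p_hot))
```

An item observed on many sites with many comments and front-page placement is classified hot; one with only front-page placement is not, given these assumed numbers.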


Author(s):  
Ponugupati Narendra Mohan et al.

The recent global crises in the environment (emission of pollutants) and in health (the COVID-19 pandemic) have created a recession in all sectors. Innovations in technology have led to heavy competition in the global market, forcing manufacturers to develop new variants, especially in the automobile sector. This creates turbulence in demand for the production of new models and the maintenance of existing models that became obsolete with the implementation of the BS-VI standard of India's automobile regulatory authority. This research work develops a novel value analysis model that integrates a multi-objective function with multi-criteria decision-making analysis, incorporating big data analytics with green supply chain management, to bridge the demand gap in an Indian manufacturing sector. Using a firm-level data set and a matrix chain multiplication dynamic programming algorithm, the computational results illustrate that the proposed algorithm is effective.
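The abstract names a matrix chain multiplication dynamic programming algorithm; the standard textbook form of that algorithm (a sketch of the generic technique, not the authors' integrated value-analysis model) computes the minimum number of scalar multiplications needed to evaluate a chain of matrix products:

```python
def matrix_chain_cost(p):
    """Minimum scalar multiplications to multiply matrices A_1..A_n,
    where A_i has dimensions p[i-1] x p[i]."""
    n = len(p) - 1  # number of matrices in the chain
    m = [[0] * n for _ in range(n)]  # m[i][j]: min cost of A_{i+1}..A_{j+1}
    for length in range(2, n + 1):          # chain length
        for i in range(n - length + 1):
            j = i + length - 1
            # Try every split point k and keep the cheapest parenthesization.
            m[i][j] = min(
                m[i][k] + m[k + 1][j] + p[i] * p[k + 1] * p[j + 1]
                for k in range(i, j)
            )
    return m[0][n - 1]
```

For dimensions [10, 20, 30, 40], multiplying (A·B) first and then by C costs 10·20·30 + 10·30·40 = 18000 scalar multiplications, which the table confirms is optimal.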


Author(s):  
Irfan Ali Kandhro ◽  
Sahar Zafar Jumani ◽  
Kamlash Kumar ◽  
Abdul Hafeez ◽  
Fayyaz Ali

This paper presents an automated tool for the classification of text with respect to predefined categories. Text classification has always been considered a vital method for managing and processing the huge number of digital documents that are widespread and continuously increasing. Most research in text classification has been done on Urdu, English, and other languages, but limited work has been carried out on Roman Urdu data. Technically, the text classification process follows two steps: the first step chooses the main features from all available features of the text documents using feature extraction techniques; the second step applies classification algorithms to those chosen features. The data set was collected with scraping tools from the popular news websites Awaji Awaze and Daily Jhoongar, and split into training and testing sets of 70% and 30%, respectively. In this paper, deep learning models such as RNN, LSTM, and CNN are used for the classification of Roman Urdu headline news. The testing accuracies are 81% for RNN, 82% for LSTM, and 79% for CNN; the experimental results demonstrate that the LSTM method outperforms both CNN and RNN.
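The 70%/30% split step can be sketched in plain Python (a generic illustration; the paper's actual preprocessing and the RNN/LSTM/CNN models themselves are not reproduced here, and the placeholder headlines are invented):

```python
import random

def train_test_split(data, train_frac=0.7, seed=42):
    """Shuffle a data set deterministically and split it into
    training and testing portions."""
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

# Placeholder (headline, class) pairs standing in for scraped Roman Urdu news.
headlines = [("headline %d" % i, i % 3) for i in range(100)]
train, test = train_test_split(headlines)
```

A fixed seed makes the split reproducible, which matters when comparing models such as RNN, LSTM, and CNN on the same partition.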


2019 ◽  
Vol 8 (4) ◽  
pp. 7356-7360

Data analytics is a scientific as well as an engineering tool used to investigate raw data and refine it into information and, ultimately, knowledge. It is normally associated with obtaining knowledge from reliable information sources, rapid information processing, and prediction from the analyzed data. Big data analytics is evolving rapidly, characterized by the features of volume, velocity, and variety. Many organizations now concentrate on analyzing raw data and are keen to deploy analytics to cope with forthcoming issues and challenges. A prediction model, or intelligent model, is proposed in this research that applies machine learning algorithms to the data set; the results are then interpreted to assess the quality of the forecast. The major objective of this research work is to find the optimum prediction from a medical data set using machine learning techniques.


2021 ◽  
Author(s):  
Maryam Bagheri ◽  
Shahram Jamali ◽  
Reza Fotohi

Nowadays, with the development of technology and ubiquitous Internet access, interest in getting news from newspapers and other traditional media is decreasing. The popularity of news websites is therefore rising as newspapers change into electronic versions. News websites can be accessed from anywhere, i.e., any country, city, or region, so presenting news according to where the reader is from is a worthwhile research area: faced with a variety of news topics, readers prefer websites whose home pages more often show the news they are interested in. Based on this idea, we present a technique to find the favorite topics of Twitter users in certain geographical districts, giving news websites a way of increasing their popularity. In this work we processed tweets. Although individual tweets are small, we found that processing them takes a lot of time, owing to the many repetitions of the algorithm and the many searches to be done; we therefore treated the task as a big data problem and developed our work in the Spark framework. Our technique includes two phases: a Feature Extraction Phase and a Topic Discovery Phase. Our analysis shows that the technique achieves accuracy between 68% and 76% across three configurations: 3-fold, 5-fold, and 10-fold.
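The 3-fold, 5-fold, and 10-fold evaluations mentioned above follow standard k-fold cross-validation; a minimal sketch, with a pluggable `predict` function standing in for the authors' topic model (the majority-label predictor used below is only a stand-in), is:

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation
    over n samples; fold sizes differ by at most one."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def cross_val_accuracy(predict, data, labels, k):
    """Mean accuracy over k folds; predict(train_data, train_labels, x)
    returns a label for sample x."""
    accs = []
    for train, test in k_fold_indices(len(data), k):
        correct = sum(
            predict([data[i] for i in train],
                    [labels[i] for i in train],
                    data[j]) == labels[j]
            for j in test
        )
        accs.append(correct / len(test))
    return sum(accs) / k
```

In a Spark setting the per-fold evaluations are independent and can be distributed across the cluster.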


2016 ◽  
Vol 3 (2) ◽  
pp. 82-100 ◽  
Author(s):  
Arushi Jain ◽  
Vishal Bhatnagar

Movies have been a great source of entertainment ever since their inception in the late 19th century. The term movie is very broad, covering languages and genres such as drama, comedy, science fiction, and action. The data about movies accumulated over the years is vast, and analyzing it requires breaking away from traditional analytics techniques and adopting big data analytics. In this paper the authors take a data set on movies and analyze it against various queries to uncover real nuggets from the data set, supporting an effective recommendation system and ratings for upcoming movies.
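The kind of aggregation query run against such a movie data set can be sketched as follows (the records, fields, and ratings below are illustrative inventions, not the authors' data set):

```python
from collections import defaultdict

# Toy movie records with a genre and a rating per title.
movies = [
    {"title": "A", "genre": "drama",  "rating": 7.9},
    {"title": "B", "genre": "comedy", "rating": 6.5},
    {"title": "C", "genre": "drama",  "rating": 8.4},
    {"title": "D", "genre": "action", "rating": 7.1},
]

def avg_rating_by_genre(records):
    """Group records by genre and return the mean rating of each group."""
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [sum, count]
    for r in records:
        totals[r["genre"]][0] += r["rating"]
        totals[r["genre"]][1] += 1
    return {g: s / n for g, (s, n) in totals.items()}

by_genre = avg_rating_by_genre(movies)
```

At scale the same group-and-aggregate shape maps naturally onto big data frameworks.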


2020 ◽  
Vol 13 (3) ◽  
pp. 381-393
Author(s):  
Farhana Fayaz ◽  
Gobind Lal Pahuja

Background: The Static VAR Compensator (SVC) can improve the reliability, operation, and control of the transmission system, thereby improving the dynamic performance of the power system. The SVC is a widely used shunt FACTS device and an important tool for reactive power compensation in high-voltage AC transmission systems. Transmission lines compensated with an SVC may experience faults and hence need a protection system that limits the damage caused by these faults while providing an uninterrupted supply of power.
Methods: The research work reported in this paper is a successful attempt to reduce the time to detect faults on an SVC-compensated transmission line to less than a quarter of a cycle. The relay algorithm involves two ANNs, one for detection and the other for classification of faults, including identification of the faulted phase or phases. RMS (root mean square) values of line voltages and ratios of sequence components of line currents are used as inputs to the ANNs. Extensive training and testing of the two ANNs were carried out using data generated by simulating an SVC-compensated transmission line in PSCAD at a signal sampling frequency of 1 kHz; the back-propagation method was used for training and testing. The criticality of the existing relay and the modified relay was also analyzed using three fault-tree importance measures: Fussell-Vesely (FV) Importance, Risk Achievement Worth (RAW), and Risk Reduction Worth (RRW).
Results: The relay detects any type of fault occurring anywhere on the line with 100% accuracy within 4 ms. It also classifies the type of fault and indicates the faulted phase or phases, as the case may be, with 100% accuracy within 15 ms, well before a circuit breaker can clear the fault. As demonstrated, fault detection and classification by ANNs is reliable and accurate when a large data set is available for training. The criticality analysis shows that the criticality ranking varies between the two designs (the existing relay and the modified relay), and the ranking of the improved measurement system in the modified relay changes from 2 to 4.
Conclusion: A relaying algorithm is proposed for the protection of a transmission line compensated with a Static VAR Compensator (SVC), and a criticality ranking of the different failure modes of a digital relay is carried out. The proposed scheme has significant advantages over more traditional relaying algorithms: it is suitable for high-resistance faults and is affected neither by the inception angle nor by the location of the fault.
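The RMS line-voltage inputs fed to the ANNs are computed in the standard way; a minimal sketch for one 50 Hz cycle sampled at the paper's 1 kHz rate (the 230 V level and 50 Hz frequency are illustrative assumptions) is:

```python
import math

def rms(samples):
    """Root-mean-square value of a sampled waveform,
    e.g. a line voltage sampled at 1 kHz."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# One 50 Hz cycle at 1 kHz sampling is 20 samples.
amplitude = 230.0 * math.sqrt(2)  # peak value of a 230 V RMS sine
cycle = [amplitude * math.sin(2 * math.pi * 50 * t / 1000.0)
         for t in range(20)]
```

Over a whole cycle the RMS of the sine recovers exactly 230 V, which is why a sliding full-cycle (or sub-cycle) RMS window is a natural ANN input for fast fault detection.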


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2532
Author(s):  
Encarna Quesada ◽  
Juan J. Cuadrado-Gallego ◽  
Miguel Ángel Patricio ◽  
Luis Usero

Anomaly detection research focuses on the development and application of methods that identify data that are sufficiently different, compared with the rest of the data set being analyzed, to be considered anomalies (or, as they are more commonly called, outliers). These values mainly originate from two sources: they may be errors introduced during the collection or handling of the data, or they may be correct but very different from the rest of the values. Correctly identifying each type is essential: in the first case the values must be removed from the data set, while in the second they must be carefully analyzed and taken into account. The correct selection and use of the model applied to a specific problem is fundamental to the success of an anomaly detection study, and in many cases a single model cannot provide sufficient results, which can only be reached with a mixture model built by integrating existing and/or ad hoc-developed models. That is the kind of model developed and applied to the problem presented in this paper. This study defines and applies an anomaly detection model that combines statistical models with a new method defined by the authors, the Local Transilience Outlier Identification Method, in order to improve the identification of outliers in the sensor-obtained values of variables that affect the operation of wind tunnels. Correct detection of outliers in the variables involved in wind tunnel operation is very important for the industrial ventilation systems industry, especially for vertical wind tunnels, which are used as training facilities for indoor skydiving, where incorrect performance of such devices may put human lives at risk. The presented model for outlier detection may therefore have a high impact in this industrial sector.
In this research work, a proof of concept is carried out using data from a real installation, in order to test the proposed anomaly analysis method and its application to monitoring the correct performance of wind tunnels.
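The authors' Local Transilience Outlier Identification Method is not specified in the abstract; as an illustration of the statistical component such a combined model might include, a standard z-score outlier test over sensor readings (the values and threshold below are illustrative) looks like:

```python
import math

def z_score_outliers(values, threshold=3.0):
    """Return indices of values whose z-score exceeds the threshold,
    a classic statistical outlier test."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return []  # constant signal: nothing stands out
    return [i for i, v in enumerate(values)
            if abs(v - mean) / std > threshold]
```

A reading of 50 in a stream hovering around 10 is flagged, while normal fluctuations are not; a mixture model would combine several such tests with local, neighborhood-based criteria.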


Author(s):  
Jungeui Hong ◽  
Elizabeth A. Cudney ◽  
Genichi Taguchi ◽  
Rajesh Jugulum ◽  
Kioumars Paryani ◽  
...  

The Mahalanobis-Taguchi System is a diagnostic and predictive method for analyzing patterns in multivariate cases. The goal of this study is to compare the ability of the Mahalanobis-Taguchi System and a neural network to discriminate using small data sets. We examine discriminant ability as a function of data set size using an application area where reliable data are publicly available: the Wisconsin Breast Cancer study, with nine attributes and one class.
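The Mahalanobis distance at the heart of the Mahalanobis-Taguchi System can be sketched for the two-dimensional case in plain Python (a generic illustration of the distance itself, not the full MTS procedure with its Taguchi orthogonal arrays and signal-to-noise ratios):

```python
import math

def mean_vec(X):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def cov_2x2(X, mu):
    """Sample covariance matrix of a 2-dimensional data set."""
    n = len(X)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for row in X:
        d = [row[0] - mu[0], row[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                c[i][j] += d[i] * d[j]
    return [[c[i][j] / (n - 1) for j in range(2)] for i in range(2)]

def inv_2x2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[ m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det,  m[0][0] / det]]

def mahalanobis(x, mu, cov_inv):
    """Mahalanobis distance of x from the distribution (mu, cov)."""
    d = [x[0] - mu[0], x[1] - mu[1]]
    q = sum(d[i] * cov_inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.sqrt(q)

# Reference ("normal") group; new observations are scored against it.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mu = mean_vec(X)
cov_inv = inv_2x2(cov_2x2(X, mu))
```

In MTS, distances computed against the healthy reference group define the "Mahalanobis space", and large distances flag abnormal cases.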

