Word pattern prediction using Big Data frameworks

2020 ◽  
Vol 12 (1) ◽  
pp. 51-69
Author(s):  
Bence Szabari ◽  
Attila Kiss

Abstract Software applications and services that recommend words, or even word patterns, have become part of our lives. These services appear in many forms in daily life: think of smartphone keyboards or Google search suggestions, among many others. With the help of these tools, we can not only find a suitable word that fits our sentence, but also express ourselves in a more nuanced and diverse way. To provide this kind of recommendation service, we use an algorithm capable of recommending words in response to word pattern queries. Word pattern queries can be expressed as a combination of words, part-of-speech (POS) tags and wildcard words. Since there are many possible patterns and sentences, we use Big Data frameworks to handle this large amount of data. In this paper, we compare two popular frameworks, Hadoop and Spark, using the proposed algorithm and recommend some enhancements to achieve faster word pattern generation.
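As an illustration of what a word pattern query might look like, the following minimal Python sketch matches a pattern against a tiny hand-made POS-tagged corpus and ranks the matching phrases by frequency. It is not the authors' algorithm and does not use Hadoop or Spark. The pattern syntax (a lowercase literal word, a POS tag in angle brackets, or `*` for any single word), the function names and the corpus are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical pattern syntax (an assumption for this sketch):
#   "play"   -> literal word (lowercase)
#   "<ADJ>"  -> any word carrying that POS tag
#   "*"      -> wildcard, any single word


def token_matches(pattern_token, word, tag):
    """Return True if one pattern token accepts one (word, tag) pair."""
    if pattern_token == "*":
        return True
    if pattern_token.startswith("<") and pattern_token.endswith(">"):
        return pattern_token[1:-1] == tag
    return pattern_token == word.lower()


def match_pattern(pattern, tagged_sentence):
    """Yield every contiguous phrase in the sentence that fits the pattern."""
    n = len(pattern)
    for start in range(len(tagged_sentence) - n + 1):
        window = tagged_sentence[start:start + n]
        if all(token_matches(p, w, t) for p, (w, t) in zip(pattern, window)):
            yield " ".join(w.lower() for w, _ in window)


def recommend(pattern, corpus, top_k=3):
    """Count matching phrases over the corpus and return the most frequent."""
    counts = Counter()
    for sentence in corpus:
        counts.update(match_pattern(pattern, sentence))
    return counts.most_common(top_k)


# Toy, hand-tagged corpus (illustrative data only).
CORPUS = [
    [("We", "PRON"), ("play", "VERB"), ("an", "DET"), ("important", "ADJ"), ("role", "NOUN")],
    [("They", "PRON"), ("play", "VERB"), ("a", "DET"), ("major", "ADJ"), ("role", "NOUN")],
    [("You", "PRON"), ("play", "VERB"), ("an", "DET"), ("important", "ADJ"), ("role", "NOUN")],
]

if __name__ == "__main__":
    print(recommend(["play", "*", "<ADJ>", "role"], CORPUS))
```

In a distributed setting, the per-sentence matching would correspond to a map step and the counting to a reduce step, which is one reason such a workload suits frameworks like Hadoop and Spark.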

Author(s):  
А.И. Сотников

The speed at which humanity accumulates information every day, together with the unpredictability of tomorrow, shows that traditional technologies are no longer sufficient for forecasting big data time series and that new processing methods are needed. This raises the question: which methods can currently be used to obtain reliable forecasts of big data time series?


Lupus ◽  
2017 ◽  
Vol 26 (8) ◽  
pp. 886-889 ◽  
Author(s):  
M Radin ◽  
S Sciascia

Objective: People affected by chronic rheumatic conditions, such as systemic lupus erythematosus (SLE), frequently rely on the Internet and search engines to look for terms related to their disease and its possible causes, symptoms and treatments. ‘Infodemiology’ and ‘infoveillance’ are two recent terms created to describe a new developing approach for public health, based on Big Data monitoring and data mining. In this study, we aim to investigate trends in Internet searches linked to SLE and symptoms associated with the disease, applying a Big Data monitoring approach. Methods: We analysed the large amount of data generated by Google Trends, considering the search terms ‘lupus’, ‘relapse’ and ‘fatigue’ over a 10-year period. Google Trends automatically normalizes data for the overall number of searches and presents them as relative search volumes, so that variations of different search terms can be compared across regions and periods. The Mann–Kendall test was used to evaluate the overall seasonal trend of each search term and possible correlation between search terms. Results: We observed seasonality in Google search volumes for lupus-related terms. In the Northern hemisphere, relative search volumes for ‘lupus’ were correlated with ‘relapse’ (τ = 0.85; p = 0.019) and with ‘fatigue’ (τ = 0.82; p = 0.003), whereas in the Southern hemisphere we observed a significant correlation between ‘fatigue’ and ‘relapse’ (τ = 0.85; p = 0.018). Similarly, a significant correlation between ‘fatigue’ and ‘relapse’ (τ = 0.70; p < 0.001) was also seen in the Northern hemisphere. Conclusion: Despite the intrinsic limitations of this approach, Internet-acquired data might represent a real-time surveillance tool and an alert for healthcare systems, helping them plan the most appropriate resources for specific periods of higher disease burden.
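To make the statistics in this abstract concrete, the sketch below computes the basic Mann–Kendall trend statistic for one series and Kendall's τ between two series, using only the Python standard library. It is a simplified illustration, not the authors' analysis: it assumes no tied values, does not implement the seasonal variant of the test, and the two short series are invented numbers, not Google Trends data.

```python
import math


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def mann_kendall(series):
    """Basic Mann-Kendall trend test, assuming no ties in the series.

    Returns (S, Z, two-sided p-value under the normal approximation).
    """
    n = len(series)
    s = sum(
        sign(series[j] - series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S without ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p_value


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = sum(
        sign(x[j] - x[i]) * sign(y[j] - y[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    return score / (n * (n - 1) / 2)


# Invented relative search volumes (illustrative only).
lupus = [10, 12, 15, 19, 24]
relapse = [3, 4, 6, 5, 9]

if __name__ == "__main__":
    print(mann_kendall(lupus))
    print(kendall_tau(lupus, relapse))
```

For the invented series above, every later `lupus` value exceeds every earlier one, so S takes its maximum value of 10 for five points, and nine of the ten pairs are concordant with `relapse`, giving τ = 0.8.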


Author(s):  
Kamalendu Pal

Global retail business has become diverse, and the latest Information Technology (IT) advancements have created new possibilities for managing the deluge of data generated by the world-wide business operations of its supply chain. In this business, external data from social media and supplier networks provide a huge influx to augment existing data. This is combined with data from sensors and intelligent machines, commonly known as Internet of Things (IoT) data. This data, originating from the global retail supply chain, is known as Big Data because of its enormous volume, the velocity with which it arrives in the global retail business environment, its veracity with respect to quality-related issues, and the value it generates for the global supply chain. Many retail product manufacturing companies are trying to find ways to enhance the quality of their operational performance while reducing business support costs. They do this primarily by improving defect tracking and forecasting. These manufacturing and operational improvements, along with a favorable customer experience, remain crucial to thriving in global competition. In recent years, Big Data and its associated technologies have attracted huge research interest from academics, industry practitioners, and government agencies. Big Data-based software applications are widely used within retail supply chain management in recommendation, prediction, and decision support systems. The spectacular growth of these software systems has enormous potential for improving the daily performance of retail product and service companies. However, data quality problems are increasingly resulting in erroneous testing costs in retail Supply Chain Management (SCM). The heavy investment made in Big Data-based software applications puts increasing pressure on management to justify the quality assurance in these software systems. This chapter discusses data quality and the dimensions of data quality for Big Data applications.
It also examines some of the challenges presented by managing the quality and governance of Big Data, and how those can be balanced with the need to deliver usable Big Data-based software systems. Finally, the chapter highlights the importance of data governance and covers some issues in Big Data managerial practice, with their justifications, for achieving application software quality assurance.


2017 ◽  
Vol 121 (4) ◽  
pp. 726-735 ◽  
Author(s):  
Christine Ma-Kellams ◽  
Brianna Bishop ◽  
Mei Fong Zhang ◽  
Brian Villagrana

To what extent could “Big Data” predict the results of the 2016 U.S. presidential election better than more conventional sources of aggregate measures? To test this idea, the present research used Google search trends versus other forms of state-level data (i.e., both behavioral measures like the incidence of hate crimes, hate groups, and police brutality and implicit measures like Implicit Association Test (IAT) data) to predict each state’s popular vote for the 2016 presidential election. Results demonstrate that, when taken in isolation, zero-order correlations reveal that prevalence of hate groups, prevalence of hate crimes, Google searches for racially charged terms (i.e., related to White supremacy groups, racial slurs, and the Nazi movement), and political conservatism were all significant predictors of popular support for Trump. However, subsequent hierarchical regression analyses show that when these predictors are considered simultaneously, only Google search data for historical White supremacy terms (e.g., “Adolf Hitler”) uniquely predicted election outcomes over and beyond political conservatism. Thus, Big Data, in the form of Google search, emerged as a more potent predictor of political behavior than other aggregate measures, including implicit attitudes and behavioral measures of racial bias. Implications for the role of racial bias in the 2016 presidential election in particular and the utility of Google search data more generally are discussed.


Author(s):  
Amir Adel Mabrouk Eldeib, Moulay Ibrahim El-Khalil Ghembaza

The science of diacritical marks is closely related to the Holy Quran, where the marks were used to remove confusion and error from the reader's pronunciation. Introducing any technique for processing Quranic texts will therefore facilitate the work of researchers in Quranic studies, whether by helping readers of the Quran recite accurately and correctly or by helping tutors compile suitable examples for training. The importance of this research lies in employing automated text-processing algorithms to determine the locations of the Nunation vowelization types in the Holy Quran, and in the possibility of computerizing them, in order to facilitate accurate recitation of the Holy Quran and, at the same time, to collect training examples in a database or corpus for future use in research and software applications for the Holy Quran and its sciences. This research presents a new idea by proposing a framework architecture that automatically identifies and discovers the locations and types of Nunation in the Holy Quran. The framework uses a part-of-speech tagging algorithm for the Arabic language to determine the type of each word, then uses a knowledge base to discover the appropriate Nunation words and their locations, and finally discovers the type of Nunation in order to determine the vowelization of the last letter of each Nunation word according to the science of Quran diacritical marks. A further benefit is linking search processes with Quranic texts in order to extract the composition Nunation and the sequence Nunations in the Holy Quran, which emerge from the science of Quran diacritical marks, and to display them as data according to a set of options selected by the user through suitable application interfaces.
The basic elements that the results of searching Quranic texts should display are highlighted, in order to extract the positions and types of Nunation vowelizations. In addition, a template is given for the results of searching all types of Nunation in a specific Quranic chapter, with several possible options for retrieving all data in detail.


2021 ◽  
Author(s):  
A Ponmalar ◽  
V Dhanakoti

Abstract The growing popularity of the internet and network services has resulted in an increase in data in all fields. The volume of data grows daily and at high speed, which creates daunting issues such as security and storage. Detecting intrusions in big data in an ultra-high-speed environment is therefore a critical task. Several intrusion detection methods have been applied to classify big data as containing an intrusion or not. The optimum accuracy of big data classification, however, has yet to be achieved. Hence we propose a novel ensemble SVM model, in which the ensemble SVM is combined with the Chaos Game Optimization (CGO) algorithm to enhance classification accuracy. Our method also classifies intrusions by type, covering nine attack types: Exploits, DoS, Backdoor, Generic, Worms, Analysis, Fuzzers, Shellcode and Reconnaissance. The experimental analysis is carried out on the UNSW-NB15 big data dataset. The performance metrics precision, accuracy, recall and F-score are analyzed and compared with state-of-the-art works such as BAMS-OIF, SAD, SMLsmBDA, and BDPM. The outcomes show that the proposed work outperforms all the other existing works in terms of classification accuracy.
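The evaluation metrics named in this abstract can be illustrated with a short, standard-library Python sketch. It shows how accuracy and per-class precision, recall and F-score are computed from true and predicted labels in a multi-class setting. It does not reproduce the ensemble SVM or the CGO algorithm, and the label lists are invented for the example rather than taken from UNSW-NB15.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Invented labels (illustrative only, not UNSW-NB15 records).
Y_TRUE = ["DoS", "DoS", "Normal", "Worms", "DoS", "Normal"]
Y_PRED = ["DoS", "Normal", "Normal", "DoS", "DoS", "Normal"]

if __name__ == "__main__":
    print("accuracy:", accuracy(Y_TRUE, Y_PRED))
    print("DoS (precision, recall, F1):", per_class_metrics(Y_TRUE, Y_PRED, "DoS"))
```

Reporting the per-class values alongside accuracy matters for intrusion data, where rare attack types such as Worms can be misclassified without visibly lowering the overall accuracy.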


2019 ◽  
Vol 8 (S3) ◽  
pp. 90-93
Author(s):  
K. Rohitha ◽  
V. Bhagyasree ◽  
K. Kusuma ◽  
S. Kokila

Big data analytics plays a major role in today’s industry, which relies on it to analyze historical data. Patient record keeping is very important for tracking a patient's history, and decisions are made according to the patient's previous records. Large volumes of data are created on a daily basis, and this data is used in the decision-making process. However, the health care industry has not yet realized the potential benefits of big data analytics. To address this need, four big data analytics capabilities were identified. In addition to these four, five capabilities were proposed that provide practical insights for administrators. Data security also plays a key role in the health care industry. To address it, a new architecture is proposed for implementing IoT and processing scalable sensor data for health care systems. This paper focuses on data security so that the potential capabilities and benefits of big data analytics can be put to better use.


Author(s):  
Tapotosh Ghosh ◽  
Md. Hasan Al Banna ◽  
Md. Jaber Al Nahian ◽  
Kazi Abu Taher ◽  
M Shamim Kaiser ◽  
...  

The novel coronavirus disease (COVID-19) pandemic is having a widespread effect on mental health because of reduced interaction among people, economic collapse, negativity, fear of losing jobs, and the death of near and dear ones. People often use social media as one of their preferred means of expressing their mental state. Due to reduced outdoor activities, people are spending more time on social media than usual and expressing emotions of anxiety, fear, and depression. About 2.5 quintillion bytes of data are generated on social media every day, and analyzing this big data can be an excellent means of evaluating the effect of COVID-19 on mental health. In this work, we have analyzed data from the Twitter microblog (tweets) to find out the effect of COVID-19 on people's mental health, with a special focus on depression. We propose a novel pipeline, based on a recurrent neural network (in the form of long short-term memory, or LSTM) and a convolutional neural network, capable of identifying depressive tweets with an accuracy of 99.42%. The tweets were preprocessed using various natural language processing techniques, with the aim of detecting depressive emotion in them. Analyzing over 571 thousand tweets posted between October 2019 and May 2020 by 482 users, we observed a significant rise in depressive tweets between February and May of 2020, which indicates an impact of the long-running COVID-19 pandemic.

