Discussion on Data Features and Construction Models of Translation Corpus in the Era of Big Data

With the rapid development of artificial intelligence in the era of big data, the construction of translation corpora has become a key factor in achieving highly intelligent translation. The data sources and data types of translation corpora are becoming increasingly diversified, which will inevitably change how such corpora are built. Corpus construction in the era of big data can draw on multiple modes, including third-party open-source data, crowd-sourced translation, machine closed-loop, and human-machine collaboration, to improve the quality of translation corpora and better serve translation practice.


MATCHING ALTERNATIVE ADDRESSES: A SEMANTIC WEB APPROACH

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xl-1-w5-63-2015 ◽

2015 ◽

Vol XL-1-W5 ◽

pp. 63-66

Author(s):

S. Ariannamazi ◽

F. Karimipour ◽

F. Hakimpour

Keyword(s):

Semantic Web ◽

Information Integration ◽

Spatial Data ◽

Rapid Development ◽

Crowd Sourcing ◽

Knowledge Modeling ◽

Data Types ◽

Geospatial Information ◽

Multiple Data ◽

Web Standards

Rapid development of crowd-sourcing or volunteered geographic information (VGI) provides opportunities for authorities that deal with geospatial information. Heterogeneity of multiple data sources and inconsistency of data types are key characteristics of VGI datasets. The expansion of cities has resulted in a growing number of POIs in OpenStreetMap, a well-known VGI source, which causes the datasets to become outdated within short periods of time. Changes made to spatial and aspatial attributes of features, such as names and addresses, might cause confusion or ambiguity in processes that require a feature’s literal information, like addressing and geocoding. VGI sources neither conform to specific vocabularies nor remain in a specific schema for a long period of time. As a result, the integration of VGI sources is crucial and inevitable in order to avoid duplication and the waste of resources. Information integration can be used to match features and qualify different annotation alternatives for disambiguation. This study enhances the search capabilities of geospatial tools with applications able to understand user terminology, in pursuit of an efficient way of finding desired results. The Semantic Web is a capable tool for developing technologies that deal with lexical and numerical calculations and estimations. There is a vast amount of literal-spatial data representing the capability of linguistic information in knowledge modeling, but these resources need to be harmonized based on Semantic Web standards. The process of making addresses homogeneous generates a helpful tool based on spatial data integration and lexical annotation matching and disambiguation.


Issues in security and privacy of big data

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i12.482 ◽

2018 ◽

Vol 7 (12) ◽

pp. 1

Author(s):

Shaveta Bhatia

Keyword(s):

Cloud Computing ◽

Big Data ◽

Approximate Method ◽

Biomedical Research ◽

Cyber Security ◽

Data Science ◽

Third Party ◽

Security And Privacy ◽

Security Threats ◽

The Third

The era of big data presents many opportunities for development in data science, biomedical research, cyber security, and cloud computing. Big data has recently gained popularity, but it also brings many challenges and consequences for security and privacy. Various types of threats and attacks, such as data leakage, unauthorized third-party access, viruses, and vulnerabilities, stand against the security of big data. This paper discusses the security threats and their approximate method in the fields of biomedical research, cyber security, and cloud computing.


Chinese Open Source Data Collection, Big Data, And Private Enterprise Work For State Intelligence and Security: The Case of Shenzhen Zhenhua

SSRN Electronic Journal ◽

10.2139/ssrn.3691999 ◽

2020 ◽

Author(s):

Christopher Balding

Keyword(s):

Big Data ◽

Data Collection ◽

Open Source ◽

Private Enterprise ◽

Open Source Data ◽

Source Data


Construction of a multi-source heterogeneous hybrid platform for big data

Journal of Computational Methods in Sciences and Engineering ◽

10.3233/jcm-215138 ◽

2021 ◽

pp. 1-10

Author(s):

Ying Wang ◽

Yiding Liu ◽

Minna Xia

Keyword(s):

Big Data ◽

Data Analysis ◽

Forest Fire ◽

Original Data ◽

Big Data Analysis ◽

Multiple Sources ◽

Data Types ◽

Fire Monitoring ◽

Data Platform

Big data is characterized by multiple sources and heterogeneity. Based on the big data platforms Hadoop and Spark, a hybrid analysis platform for forest fires is built in this study. This platform combines big data analysis and processing technology and draws on research results from different technical fields, such as forest fire monitoring. In this system, the HDFS component of Hadoop is used to store all kinds of data, the Spark module is used to provide various big data analysis methods, and visualization tools such as ECharts, ArcGIS and Unity3D are used to visualize the analysis results. Finally, an experiment on forest fire point detection is designed to corroborate the feasibility and effectiveness of the platform, and to provide some meaningful guidance for follow-up research and for the establishment of a big data platform for forest fire monitoring and visualized early warning. However, this experiment has two shortcomings: more data types should be selected, and compatibility would be better if the original data were converted to XML format. It is expected that these problems can be solved in follow-up research.


Big Data Analytics for Search Engine Optimization

Big Data and Cognitive Computing ◽

10.3390/bdcc4020005 ◽

2020 ◽

Vol 4 (2) ◽

pp. 5 ◽

Cited By ~ 1

Author(s):

Ioannis C. Drivas ◽

Damianos P. Sakas ◽

Georgios A. Giannakopoulos ◽

Daphne Kyriaki-Manessi

Keyword(s):

Big Data ◽

Cultural Heritage ◽

Search Engine ◽

Data Analytics ◽

User Behavior ◽

Big Data Analytics ◽

Data Types ◽

Search Engine Optimization ◽

The Impact ◽

The Web

In the Big Data era, search engine optimization deals with the encapsulation of datasets that are related to website performance in terms of architecture, content curation, and user behavior, with the purpose to convert them into actionable insights and improve visibility and findability on the Web. In this respect, big data analytics expands the opportunities for developing new methodological frameworks that are composed of valid, reliable, and consistent analytics that are practically useful to develop well-informed strategies for organic traffic optimization. In this paper, a novel methodology is implemented in order to increase organic search engine visits based on the impact of multiple SEO factors. In order to achieve this purpose, the authors examined 171 cultural heritage websites and their retrieved data analytics about their performance and user experience inside them. Massive amounts of Web-based collections are included and presented by cultural heritage organizations through their websites. Subsequently, users interact with these collections, producing behavioral analytics in a variety of different data types that come from multiple devices, with high velocity, in large volumes. Nevertheless, prior research efforts indicate that these massive cultural collections are difficult to browse while expressing low visibility and findability in the semantic Web era. Against this backdrop, this paper proposes the computational development of a search engine optimization (SEO) strategy that utilizes the generated big cultural data analytics and improves the visibility of cultural heritage websites. One step further, the statistical results of the study are integrated into a predictive model that is composed of two stages. First, a fuzzy cognitive mapping process is generated as an aggregated macro-level descriptive model. Secondly, a micro-level data-driven agent-based model follows up. The purpose of the model is to predict the most effective combinations of factors that achieve enhanced visibility and organic traffic on cultural heritage organizations’ websites. To this end, the study contributes to the knowledge expansion of researchers and practitioners in the big cultural analytics sector with the purpose to implement potential strategies for greater visibility and findability of cultural collections on the Web.


Bibliometric analysis and critical review of the research on big data in the construction industry

Engineering Construction & Architectural Management ◽

10.1108/ecam-01-2021-0005 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Yusheng Lu ◽

Jiantong Zhang

Keyword(s):

Big Data ◽

Construction Industry ◽

Bibliometric Analysis ◽

Critical Review ◽

Rapid Development ◽

Heterogeneous Data ◽

The Body ◽

Future Research ◽

Temporal And Spatial Distribution ◽

Content Type

Purpose: The digital revolution, and the use of big data (BD) in particular, has important applications in the construction industry. In construction, massive amounts of heterogeneous data need to be analyzed to improve onsite efficiency. This article presents a systematic review and identifies future research directions, presenting valuable conclusions derived from rigorous bibliometric tools. The results of this study may provide guidelines for construction engineering and global policymaking to change the current low efficiency of construction sites. Design/methodology/approach: This study identifies research trends from 1,253 peer-reviewed papers, using general statistics, keyword co-occurrence analysis, critical review, and qualitative-bibliometric techniques in two rounds of search. Findings: The number of studies in this area rapidly increased from 2012 to 2020. A significant number of publications originated in the UK, China, the US, and Australia, and the smallest number from one of these countries is more than twice the largest number in the remaining countries. Keyword co-occurrence is divided into three clusters: BD application scenarios, emerging technology in BD, and BD management. Currently developing approaches in BD analytics include machine learning, data mining, and heuristic-optimization algorithms such as graph convolutional and recurrent neural networks and natural language processing (NLP). Studies have focused on safety management, energy reduction, and cost prediction. Blockchain integrated with BD is a promising means of managing construction contracts. Research limitations/implications: The study of BD is in a stage of rapid development, and this bibliometric analysis is only a part of the necessary practical analysis. Practical implications: National policies, temporal and spatial distribution, and BD flow are interpreted, and the results of this study may provide guidelines for policymakers. Overall, this work may develop the body of knowledge, producing a reference point and identifying future development. Originality/value: To our knowledge, this is the first bibliometric review of BD in the construction industry. This study can also benefit construction practitioners by providing them a focused perspective of BD for emerging practices in the construction industry.


Big Data Usage Patterns in the Health Care Domain: A Use Case Driven Approach Applied to the Assessment of Vaccination Benefits and Risks

Yearbook of Medical Informatics ◽

10.15265/iy-2014-0016 ◽

2014 ◽

Vol 23 (01) ◽

pp. 27-35 ◽

Cited By ~ 16

Author(s):

S. de Lusignan ◽

S-T. Liaw ◽

C. Kuziemsky ◽

F. Mold ◽

P. Krause ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

Real Time ◽

Predictive Analytics ◽

Regulatory Compliance ◽

Use Cases ◽

Data Sources ◽

Use Case ◽

Crowd Sourcing ◽

Big Data Processing

Summary Background: Generally benefits and risks of vaccines can be determined from studies carried out as part of regulatory compliance, followed by surveillance of routine data; however there are some rarer and more long term events that require new methods. Big data generated by increasingly affordable personalised computing, and from pervasive computing devices is rapidly growing and low cost, high volume, cloud computing makes the processing of these data inexpensive. Objective: To describe how big data and related analytical methods might be applied to assess the benefits and risks of vaccines. Method: We reviewed the literature on the use of big data to improve health, applied to generic vaccine use cases, that illustrate benefits and risks of vaccination. We defined a use case as the interaction between a user and an information system to achieve a goal. We used flu vaccination and pre-school childhood immunisation as exemplars. Results: We reviewed three big data use cases relevant to assessing vaccine benefits and risks: (i) Big data processing using crowd-sourcing, distributed big data processing, and predictive analytics, (ii) Data integration from heterogeneous big data sources, e.g. the increasing range of devices in the “internet of things”, and (iii) Real-time monitoring for the direct monitoring of epidemics as well as vaccine effects via social media and other data sources. Conclusions: Big data raises new ethical dilemmas, though its analysis methods can bring complementary real-time capabilities for monitoring epidemics and assessing vaccine benefit-risk balance.


Study on the Competence Factors of Clothing Store Online Based on Website

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.644-650.5644 ◽

2014 ◽

Vol 644-650 ◽

pp. 5644-5647

Author(s):

Kang Shao ◽

Hui Xu ◽

Kun Wang

Keyword(s):

Service Quality ◽

Rapid Development ◽

Third Party ◽

Target Market ◽

Factors Affecting ◽

Online Sales ◽

Logistics Services ◽

Clothing Store ◽

Principal Factors ◽

Store Management

With the rapid development of e-commerce, online stores, characterized by low investment and operating flexibility, are favored by more entrepreneurs. This paper analyzes the principal factors affecting the competitiveness of online clothing shops that conduct online sales on third-party platforms, and the intrinsic links between these factors. Emphasis should be placed on online store management, target market, product value, service quality, online store promotion, prestige image and logistics services to cultivate and raise the competence of the online store so as to gain a unique competitive edge.


Research on Big Data Digging of Hot Topics about Recycled Water Use on Micro-Blog Based on Particle Swarm Optimization

Sustainability ◽

10.3390/su10072488 ◽

2018 ◽

Vol 10 (7) ◽

pp. 2488 ◽

Cited By ~ 64

Author(s):

Hanliang Fu ◽

Zhaoxing Li ◽

Zhijian Liu ◽

Zelin Wang

Keyword(s):

Big Data ◽

Particle Swarm Optimization ◽

Key Words ◽

Water Use ◽

Particle Swarm ◽

Recycled Water ◽

Swarm Optimization ◽

The Public ◽

Key Factor ◽

Important Significance

The public’s acceptance level of recycled water use is a key factor that affects the popularization of this technology; therefore, it is critical to know the public’s attitude in order to make guiding policies effectively and scientifically. To examine the major focuses and hot topics among the public about recycled water use, one of the major platforms for social opinion in China, the micro blog, is used as a source to obtain data related to the topic. Through the “follow-be followed” and “forward-dialogue” behaviors, a network of discussion of recycled water use among micro-blog users has been constructed. Improved particle swarm optimization has been used to allow deep digging for key words. Ultimately, key words about the topic have been clustered into three categories, namely, the popularization status of recycled water use, the main application, and the public’s attitude. The conclusion accurately describes the concerns of Chinese citizens regarding recycled water use, and has important significance for the popularization of this technology.
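
For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of standard (not "improved") particle swarm optimization. It does not reproduce the authors' method or their key-word mining pipeline, which the abstract does not describe. The function name `pso_minimize`, the parameter values, and the toy `sphere` objective are all illustrative assumptions.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO (illustrative; not the paper's improved variant)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    lo, hi = bounds
    # Random starting positions inside the bounds, zero starting velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares one global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective (an assumption for this sketch): minimum 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```

In a key-word setting, the objective would presumably score candidate key-word selections rather than points on a toy surface; that mapping is not specified in the abstract and is left out here.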


Light-Weight, Self-Powered Sensor Based on Triboelectric Nanogenerator for Big Data Analytics in Sports

Electronics ◽

10.3390/electronics10192322 ◽

2021 ◽

Vol 10 (19) ◽

pp. 2322

Author(s):

Xiaofei Ma ◽

Xuan Liu ◽

Xinxing Li ◽

Yunfei Ma

Keyword(s):

Big Data ◽

Real Time ◽

Data Analytics ◽

Mechanical Energy ◽

Rapid Development ◽

Big Data Analytics ◽

Triboelectric Nanogenerator ◽

Light Weight ◽

Table Tennis ◽

Self Powered

With the rapid development of the Internet of Things (IoT), big data analytics has been widely used in the field of sports. In this paper, a light-weight, self-powered sensor based on a triboelectric nanogenerator for big data analytics in sports is demonstrated. The weight of each sensing unit is ~0.4 g. The friction material consists of polyaniline (PANI) and polytetrafluoroethylene (PTFE). Based on the triboelectric nanogenerator (TENG), the device can convert small amounts of mechanical energy into an electrical signal, which contains information about the hitting position and hitting velocity of table tennis balls. By collecting data from daily table tennis training in real time, the personalized training program can be adjusted. A practical application is exhibited for collecting table tennis information in real time; based on these data, coaches can develop personalized training for amateurs to enhance their hand control, which can improve their table tennis skills. This work opens up a new direction in intelligent athletic facilities and big data analytics.
