Discussion on Data Features and Construction Models of Translation Corpus in the Era of Big Data

2021 ◽  
Vol 251 ◽  
pp. 01030
Author(s):  
Qinqi Kang ◽  
Zhao Kang

With the rapid development of artificial intelligence in the current era of big data, the construction of translation corpora has become a key factor in effectively achieving highly intelligent translation. In the era of big data, the data sources and data types of translation corpora are becoming more and more diversified, which will inevitably bring about a new revolution in their construction. The construction of a translation corpus in the era of big data can rely on multiple modes, such as third-party open source data, crowd-sourced translation, machine closed-loop and human-machine collaboration, to comprehensively improve the quality of corpus construction and better serve translation practice.

Author(s):  
S. Ariannamazi ◽  
F. Karimipour ◽  
F. Hakimpour

Rapid development of crowd-sourcing or volunteered geographic information (VGI) provides opportunities for authorities that deal with geospatial information. Heterogeneity of multiple data sources and inconsistency of data types are key characteristics of VGI datasets. The expansion of cities has resulted in a growing number of POIs in OpenStreetMap, a well-known VGI source, which causes the datasets to become outdated in short periods of time. Changes made to spatial and aspatial attributes of features, such as names and addresses, might cause confusion or ambiguity in processes that require a feature’s literal information, like addressing and geocoding. VGI sources will neither conform to specific vocabularies nor remain in a specific schema for a long period of time. As a result, the integration of VGI sources is crucial and inevitable in order to avoid duplication and the waste of resources. Information integration can be used to match features and qualify different annotation alternatives for disambiguation. This study enhances the search capabilities of geospatial tools with applications able to understand user terminology, in pursuit of an efficient way of finding desired results. The Semantic Web is a capable tool for developing technologies that deal with lexical and numerical calculations and estimations. There is a vast amount of literal-spatial data representing the capability of linguistic information in knowledge modeling, but these resources need to be harmonized based on Semantic Web standards. The process of making addresses homogeneous generates a helpful tool based on spatial data integration and lexical annotation matching and disambiguation.


Author(s):  
Shaveta Bhatia

The epoch of big data presents many opportunities for development in data science, biomedical research, cyber security, and cloud computing. Big data has recently gained popularity, but it also raises many challenges and consequences for the security and privacy of the data. Various types of threats and attacks, such as data leakage, third-party access attempts, viruses, and vulnerabilities, stand against the security of big data. This paper discusses these security threats and their approximate methods in the fields of biomedical research, cyber security and cloud computing.


Author(s):  
Ying Wang ◽  
Yiding Liu ◽  
Minna Xia

Big data is characterized by multiple sources and heterogeneity. Based on the big data platform of Hadoop and Spark, a hybrid analysis platform for forest fires is built in this study. The platform combines big data analysis and processing technology, and draws on research results from different technical fields, such as forest fire monitoring. In this system, the HDFS component of Hadoop is used to store all kinds of data, the Spark module is used to provide various big data analysis methods, and visualization tools such as Echarts, ArcGIS and Unity3D are used to visualize the analysis results. Finally, an experiment on forest fire point detection is designed to corroborate the feasibility and effectiveness of the platform, and to provide some meaningful guidance for follow-up research and for the establishment of a big data platform for forest fire monitoring and visualized early warning. However, there are two shortcomings in this experiment: more data types should have been selected, and compatibility would be better if the original data were converted to XML format. It is expected that these problems can be solved in follow-up research.


2020 ◽  
Vol 4 (2) ◽  
pp. 5 ◽  
Author(s):  
Ioannis C. Drivas ◽  
Damianos P. Sakas ◽  
Georgios A. Giannakopoulos ◽  
Daphne Kyriaki-Manessi

In the Big Data era, search engine optimization deals with the encapsulation of datasets that are related to website performance in terms of architecture, content curation, and user behavior, with the purpose to convert them into actionable insights and improve visibility and findability on the Web. In this respect, big data analytics expands the opportunities for developing new methodological frameworks that are composed of valid, reliable, and consistent analytics that are practically useful to develop well-informed strategies for organic traffic optimization. In this paper, a novel methodology is implemented in order to increase organic search engine visits based on the impact of multiple SEO factors. In order to achieve this purpose, the authors examined 171 cultural heritage websites and their retrieved data analytics about their performance and user experience inside them. Massive amounts of Web-based collections are included and presented by cultural heritage organizations through their websites. Subsequently, users interact with these collections, producing behavioral analytics in a variety of different data types that come from multiple devices, with high velocity, in large volumes. Nevertheless, prior research efforts indicate that these massive cultural collections are difficult to browse while expressing low visibility and findability in the semantic Web era. Against this backdrop, this paper proposes the computational development of a search engine optimization (SEO) strategy that utilizes the generated big cultural data analytics and improves the visibility of cultural heritage websites. One step further, the statistical results of the study are integrated into a predictive model that is composed of two stages. First, a fuzzy cognitive mapping process is generated as an aggregated macro-level descriptive model. Secondly, a micro-level data-driven agent-based model follows up. 
The purpose of the model is to predict the most effective combinations of factors that achieve enhanced visibility and organic traffic on cultural heritage organizations’ websites. To this end, the study contributes to the knowledge expansion of researchers and practitioners in the big cultural analytics sector with the purpose to implement potential strategies for greater visibility and findability of cultural collections on the Web.
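
As a rough illustration of the first stage named in the abstract above, a fuzzy cognitive map can be iterated as in the sketch below. This is not the authors' model: the three concepts, the weight values, and the function names `fcm_step` and `run_fcm` are hypothetical, and the update rule is the common sigmoid-thresholded form, assumed here for illustration.

```python
import math

def fcm_step(state, weights):
    """One synchronous update: each concept sums its weighted incoming
    influences and squashes the total with a sigmoid."""
    n = len(state)
    return [
        1.0 / (1.0 + math.exp(-sum(weights[j][i] * state[j] for j in range(n))))
        for i in range(n)
    ]

def run_fcm(state, weights, max_iter=100, tol=1e-6):
    """Iterate until activations change by less than tol or max_iter is reached."""
    for _ in range(max_iter):
        new_state = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state
        state = new_state
    return state

# Hypothetical concepts: 0 = page load speed, 1 = content quality, 2 = organic traffic.
# weights[j][i] is the assumed influence of concept j on concept i.
weights = [
    [0.0, 0.0, 0.6],
    [0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0],
]
final_state = run_fcm([1.0, 1.0, 0.0], weights)
print(final_state)
```

In practice the weights would come from expert judgment or from the statistical results of a study rather than from invented values like these.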


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yusheng Lu ◽  
Jiantong Zhang

Purpose: The digital revolution, and the use of big data (BD) in particular, has important applications in the construction industry. In construction, massive amounts of heterogeneous data need to be analyzed to improve onsite efficiency. This article presents a systematic review and identifies future research directions, presenting valuable conclusions derived from rigorous bibliometric tools. The results of this study may provide guidelines for construction engineering and global policymaking to change the current low efficiency of construction sites. Design/methodology/approach: This study identifies research trends from 1,253 peer-reviewed papers, using general statistics, keyword co-occurrence analysis, critical review, and qualitative-bibliometric techniques in two rounds of search. Findings: The number of studies in this area rapidly increased from 2012 to 2020. A significant number of publications originated in the UK, China, the US, and Australia, and the smallest number from one of these countries is more than twice the largest number in the remaining countries. Keyword co-occurrence is divided into three clusters: BD application scenarios, emerging technology in BD, and BD management. Currently developing approaches in BD analytics include machine learning, data mining, and heuristic-optimization algorithms such as graph convolutional and recurrent neural networks and natural language processing (NLP). Studies have focused on safety management, energy reduction, and cost prediction. Blockchain integrated with BD is a promising means of managing construction contracts. Research limitations/implications: The study of BD is in a stage of rapid development, and this bibliometric analysis is only a part of the necessary practical analysis. Practical implications: National policies, temporal and spatial distribution, and BD flow are interpreted, and the results of this study may provide guidelines for policymakers.
Overall, this work may develop the body of knowledge, producing a reference point and identifying future development. Originality/value: To our knowledge, this is the first bibliometric review of BD in the construction industry. This study can also benefit construction practitioners by providing them a focused perspective of BD for emerging practices in the construction industry.
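
Keyword co-occurrence analysis, as named in the abstract above, starts from counts of how often two keywords appear on the same paper. The sketch below shows only that counting step under simple assumptions. The keyword lists are invented for illustration, the function name `keyword_cooccurrence` is hypothetical, and nothing here reproduces the study's data or tooling.

```python
from collections import Counter
from itertools import combinations

def keyword_cooccurrence(papers):
    """Count how often each unordered keyword pair appears on the same paper."""
    counts = Counter()
    for keywords in papers:
        # Normalize case and deduplicate so a pair is counted once per paper.
        unique = sorted({k.strip().lower() for k in keywords})
        counts.update(combinations(unique, 2))
    return counts

# Hypothetical keyword lists, invented for illustration only.
papers = [
    ["big data", "machine learning", "safety management"],
    ["big data", "blockchain", "contracts"],
    ["Big Data", "machine learning", "cost prediction"],
]
pairs = keyword_cooccurrence(papers)
print(pairs.most_common(1))
```

The resulting pair counts form a weighted graph, which is the usual input to a clustering step that groups keywords into themes.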


2014 ◽  
Vol 23 (01) ◽  
pp. 27-35 ◽  
Author(s):  
S. de Lusignan ◽  
S-T. Liaw ◽  
C. Kuziemsky ◽  
F. Mold ◽  
P. Krause ◽  
...  

Summary Background: Generally benefits and risks of vaccines can be determined from studies carried out as part of regulatory compliance, followed by surveillance of routine data; however there are some rarer and more long term events that require new methods. Big data generated by increasingly affordable personalised computing, and from pervasive computing devices is rapidly growing and low cost, high volume, cloud computing makes the processing of these data inexpensive. Objective: To describe how big data and related analytical methods might be applied to assess the benefits and risks of vaccines. Method: We reviewed the literature on the use of big data to improve health, applied to generic vaccine use cases, that illustrate benefits and risks of vaccination. We defined a use case as the interaction between a user and an information system to achieve a goal. We used flu vaccination and pre-school childhood immunisation as exemplars. Results: We reviewed three big data use cases relevant to assessing vaccine benefits and risks: (i) Big data processing using crowd-sourcing, distributed big data processing, and predictive analytics, (ii) Data integration from heterogeneous big data sources, e.g. the increasing range of devices in the “internet of things”, and (iii) Real-time monitoring for the direct monitoring of epidemics as well as vaccine effects via social media and other data sources. Conclusions: Big data raises new ethical dilemmas, though its analysis methods can bring complementary real-time capabilities for monitoring epidemics and assessing vaccine benefit-risk balance.


2014 ◽  
Vol 644-650 ◽  
pp. 5644-5647
Author(s):  
Kang Shao ◽  
Hui Xu ◽  
Kun Wang

With the rapid development of e-commerce, online stores, characterized by low investment and operating flexibility, are favored by more entrepreneurs. This paper analyzes the principal factors affecting the competitiveness of online clothing shops that sell through third-party platforms, and the intrinsic links between them. Emphasis should be placed on network store management, target market, product value, service quality, network store promotion, prestige image and logistics services to cultivate and raise the competence of the network store so as to gain a unique competitive edge.


2018 ◽  
Vol 10 (7) ◽  
pp. 2488 ◽  
Author(s):  
Hanliang Fu ◽  
Zhaoxing Li ◽  
Zhijian Liu ◽  
Zelin Wang

The public’s acceptance level of recycled water use is a key factor that affects the popularization of this technology; therefore, it is critical to know the public’s attitude in order to make guiding policies effectively and scientifically. To examine the major focuses and hot topics among the public about recycled water use, one of the major platforms for social opinion in China, the micro blog, is used as a source to obtain data related to the topic. Through the “follow-be followed” and “forward-dialogue” behaviors, a network of discussion of recycled water use among micro-blog users has been constructed. Improved particle swarm optimization has been used to allow deep digging for key words. Ultimately, key words about the topic have been clustered into three categories, namely, the popularization status of recycled water use, the main application, and the public’s attitude. The conclusion accurately describes the concerns of Chinese citizens regarding recycled water use, and has important significance for the popularization of this technology.


Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2322
Author(s):  
Xiaofei Ma ◽  
Xuan Liu ◽  
Xinxing Li ◽  
Yunfei Ma

With the rapid development of the Internet of Things (IoT), big data analytics has been widely used in the field of sports. In this paper, a light-weight, self-powered sensor based on a triboelectric nanogenerator for big data analytics in sports has been demonstrated. The weight of each sensing unit is ~0.4 g. The friction material consists of polyaniline (PANI) and polytetrafluoroethylene (PTFE). Based on the triboelectric nanogenerator (TENG), the device can convert small amounts of mechanical energy into an electrical signal, which contains information about the hitting position and hitting velocity of table tennis balls. By collecting data from daily table tennis training in real time, the personalized training program can be adjusted. A practical application has been exhibited for collecting table tennis information in real time and, according to these data, coaches can develop personalized training for an amateur to enhance their hand control, which can improve their table tennis skills. This work opens up a new direction in intelligent athletic facilities and big data analytics.

