Extracting numerical data from unstructured Arabic texts (ENAT)

Author(s):  
Abeer K. AL-Mashhadany ◽  
Dalal N. Hamood ◽  
Ahmed T. Sadiq Al-Obaidi ◽  
Waleed K. Al-Mashhadany

Unstructured data has become a challenge because recent years have seen the ability to gather massive amounts of data from annotated documents. This paper is concerned with the analysis of unstructured Arabic text. Manipulating unstructured text and converting it into a form understandable by computers is a high-level aim, and an important step toward this aim is understanding numerical phrases. This paper aims to extract numerical data from unstructured Arabic text in general. The work attempts to recognize numerical phrases, analyze them, and convert them into integer values. The inference engine is based on Arabic linguistic and morphological rules. The applied method combines rules for numerical nouns with Arabic morphological rules in order to achieve a highly accurate extraction method. Arithmetic operations are applied to convert each numerical phrase into an integer value, with the proper operation determined by linguistic and morphological rules. It is shown that applying Arabic linguistic rules together with arithmetic operations succeeds in extracting numerical data from unstructured Arabic text with accuracy reaching 100%.
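As a rough illustration of the idea (a minimal Python sketch, not the paper's inference engine; the toy lexicon, the conjunction-prefix rule, and the example phrases are illustrative assumptions), a numeral phrase can be converted to an integer by summing unit and tens words and multiplying at scale words:

```python
# A minimal sketch, assuming a toy lexicon: unit and tens words are summed,
# and scale words (hundred, thousand) multiply the group before them.
WORD_VALUES = {
    "واحد": 1, "اثنان": 2, "ثلاثة": 3, "أربعة": 4, "خمسة": 5,
    "ستة": 6, "سبعة": 7, "ثمانية": 8, "تسعة": 9, "عشرة": 10,
    "عشرون": 20, "ثلاثون": 30, "مائة": 100, "ألف": 1000, "آلاف": 1000,
}
SCALE_WORDS = {"مائة", "ألف", "آلاف"}

def normalize(word: str) -> str:
    # Morphological rule (simplified): strip the conjunction prefix "و" ("and")
    # only when the remainder is a known numeral word.
    if word not in WORD_VALUES and word.startswith("و") and word[1:] in WORD_VALUES:
        return word[1:]
    return word

def phrase_to_int(phrase: str) -> int:
    total, group = 0, 0
    for raw in phrase.split():
        word = normalize(raw)
        value = WORD_VALUES.get(word)
        if value is None:
            continue  # a full system would apply further morphological rules here
        if word in SCALE_WORDS:
            total += max(group, 1) * value  # multiply: "ثلاثة آلاف" -> 3 * 1000
            group = 0
        else:
            group += value                  # add: "خمسة وعشرون" -> 5 + 20
    return total + group

print(phrase_to_int("خمسة وعشرون"))              # "five and twenty" -> 25
print(phrase_to_int("ثلاثة آلاف وخمسة وعشرون"))  # -> 3025
```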

AI Magazine ◽  
2015 ◽  
Vol 36 (1) ◽  
pp. 75-86 ◽  
Author(s):  
Jennifer Sleeman ◽  
Tim Finin ◽  
Anupam Joshi

We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the data's ontologies are unknown, inaccessible, or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and Arnetminer using DBpedia as the background knowledge base.
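A minimal sketch of the general approach (the toy data, labels, and classifier are illustrative assumptions, not the authors' feature set): represent each entity by the names of its attributes and relations, then train a supervised classifier to predict a fine-grained type drawn from a background knowledge base such as DBpedia:

```python
# A minimal sketch, assuming toy data and scikit-learn: entities are described
# by their attribute/relation names, and a supervised classifier predicts a
# fine-grained, DBpedia-style type.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

entities = [
    "birthDate birthPlace team position",  # attributes typical of an athlete
    "birthDate almaMater party office",    # attributes typical of a politician
    "foundingYear headquarters industry",  # attributes typical of a company
]
types = ["dbo:Athlete", "dbo:Politician", "dbo:Company"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(entities, types)

# An unseen entity is typed by the attribute names it shares with training data.
print(model.predict(["birthPlace team position height"]))  # -> ['dbo:Athlete']
```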


2019 ◽  
Vol 4 (2) ◽  
pp. 97-104
Author(s):  
Lazuardi Umar ◽  
Yanuar Hamzah ◽  
Rahmondia N. Setiadi

This paper describes the design of a fry counter intended for use by farmers of fish raised for consumption. Until now, almost all fry counting has been done manually by humans. This requires much energy and high concentration, and can therefore cause a high level of exhaustion in fry-counting workers. Moreover, human counting capability and capacity are limited. The fry counter designed in this study utilizes a multi-channel optocoupler sensor to increase counting capacity. The multi-channel counting system is developed as a solution to the limited capacity of available fry counters. The design uses an input-signal extender system on the controller, including an interrupt system. The experiments show a high accuracy level in counting and channel detection; therefore, this design can be implemented and could help farmers increase the production capacity of fish raised for consumption.
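As a loose illustration of interrupt-driven multi-channel counting (a hedged sketch, not the paper's firmware; the RPi.GPIO library, pin numbers, and debounce time are stand-in assumptions for the paper's controller):

```python
# A hedged sketch: counting fry across multiple optocoupler channels using
# edge-detect interrupts, with the Raspberry Pi RPi.GPIO library standing in
# for the paper's microcontroller setup.
import RPi.GPIO as GPIO

CHANNEL_PINS = [17, 27, 22, 23]          # assumed GPIO pins, one per sensor channel
counts = {pin: 0 for pin in CHANNEL_PINS}

def on_fry_detected(pin):
    # Each falling edge means one fry interrupted the optocoupler beam.
    counts[pin] += 1

GPIO.setmode(GPIO.BCM)
for pin in CHANNEL_PINS:
    GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    # Hardware interrupt per channel; bouncetime debounces the sensor signal.
    GPIO.add_event_detect(pin, GPIO.FALLING, callback=on_fry_detected, bouncetime=5)

try:
    input("Counting... press Enter to stop.\n")
finally:
    GPIO.cleanup()
    print("Total fry counted:", sum(counts.values()))
```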


Author(s):  
Yanchun Sun ◽  
Hang Yin ◽  
Jiu Wen ◽  
Zhiyu Sun

Urban region functions are the types of potential activities in an urban region, such as residence, commerce, transportation, entertainment, etc. A service that mines urban region functions is of great value for various applications, including urban planning and transportation management. Many studies have been carried out to mine different regions' functions, but few are based on social media text analysis. Considering that the semantic information embedded in social media texts is very useful for inferring an urban region's main functions, we design a service that extracts human activities from Sina Weibo (www.weibo.com; the largest Chinese microblog system, similar to Twitter) posts with location information and further describes a region's main functions with a function vector based on those activities. First, we predefine a variety of human activities and use an urban function classification model to obtain the activities corresponding to each Weibo post. Second, urban regions' function vectors are generated, with which we can easily do high-level work such as similar-place recommendation. Finally, with the function vectors generated, we develop a Web application for urban region function querying. We also conduct a case study among the urban regions in Beijing, and the experimental results demonstrate the feasibility of our method.
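A minimal sketch of the function-vector step (the activity labels and toy post data are assumptions): aggregate each region's classified post activities into a normalized vector over the predefined functions, then compare regions by cosine similarity for similar-place recommendation:

```python
# A minimal sketch, assuming toy data: per-region activity counts become a
# normalized function vector, and cosine similarity compares regions.
from collections import Counter
import math

ACTIVITIES = ["residence", "commerce", "transportation", "entertainment"]

def function_vector(post_activities):
    # Normalize activity counts into a distribution over predefined functions.
    counts = Counter(post_activities)
    total = sum(counts.values()) or 1
    return [counts[a] / total for a in ACTIVITIES]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

region_a = function_vector(["commerce", "commerce", "entertainment"])
region_b = function_vector(["commerce", "entertainment", "entertainment"])
print(cosine(region_a, region_b))  # high similarity -> candidate recommendation
```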


Author(s):  
Caterina Paola Venditti ◽  
Paolo Mele

In the era of digital archaeology, the communication of archaeological data, contexts, and work can be enhanced by cloud computing, AI, and other emergent technologies. The authors explore the most recent and efficient examples, starting from intrinsic properties of AI, i.e., the capabilities to sense, comprehend, and act, and looking at their application in communication both among specialists of the archaeological sector and from them to other recipients. The chapter also provides a high-level overview of solutions for extracting knowledge from large volumes of structured and unstructured data and making it available through software applications that perform automated tasks. Archaeologists must be ready to go down into the trenches and communicate their studies with a deep awareness of the opportunities offered by these technologies, and with adequate skills to master them.


2020 ◽  
Vol 39 (4) ◽  
pp. 727-742 ◽  
Author(s):  
Joachim Büschken ◽  
Greg M. Allenby

User-generated content in the form of customer reviews, blogs, and tweets is an emerging and rich source of data for marketers. Topic models have been successfully applied to such data, demonstrating that empirical text analysis benefits greatly from a latent variable approach that summarizes high-level interactions among words. We propose a new topic model that allows for serial dependency of topics in text. That is, topics may carry over from word to word in a document, violating the bag-of-words assumption in traditional topic models. In the proposed model, topic carryover is informed by sentence conjunctions and punctuation. Typically, such observed information is eliminated prior to analyzing text data (i.e., preprocessing) because words such as “and” and “but” do not differentiate topics. We find that these elements of grammar contain information relevant to topic changes. We examine the performance of our models using multiple data sets and establish boundary conditions for when our model leads to improved inference about customer evaluations. Implications and opportunities for future research are discussed.
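To make the carryover idea concrete (a toy generative sketch under assumed carryover probabilities, not the authors' Bayesian model): each word tends to inherit the previous word's topic, and a preceding conjunction or punctuation mark weakens that carryover, making a topic switch likelier:

```python
# A toy sketch: topics carry over word to word, and grammar cues ("and",
# "but", punctuation) lower the carryover probability, signaling topic change.
import random

CONJUNCTIONS_PUNCT = {"and", "but", ".", ","}

def topic_sequence(tokens, n_topics=3, p_carry=0.9, p_carry_after_cue=0.3):
    """Sample one topic per token with serial dependency between topics."""
    topics, current = [], random.randrange(n_topics)
    for i, tok in enumerate(tokens):
        cue = i > 0 and tokens[i - 1] in CONJUNCTIONS_PUNCT
        p = p_carry_after_cue if cue else p_carry
        if random.random() > p:               # carryover broken: resample topic
            current = random.randrange(n_topics)
        topics.append(current)
    return topics

tokens = "the battery lasts long but the price is high .".split()
print(list(zip(tokens, topic_sequence(tokens))))
```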


2018 ◽  
Vol 7 (2.21) ◽  
pp. 417
Author(s):  
K Kousalya ◽  
Shaik Javed Parvez

In the present scenario, growing data are naturally unstructured, and handling such a wide range of data is difficult. This paper proposes processing unstructured text data effectively in Hadoop MapReduce using Python. Apache Hadoop is an open-source platform, and it widely uses the MapReduce framework. MapReduce is popular and effective for processing unstructured data in a parallel manner. There are two stages in MapReduce, namely transform and repository. The input is split into small blocks, and worker nodes process individual blocks in parallel. MapReduce is generally based on Java, while Hadoop Streaming allows writing the mapper and reducer in other languages such as Python. In this paper, we show an alternative way of processing growing unstructured content data using Python. We also compare the performance of Java-based and non-Java-based programs.
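A standard Hadoop Streaming word-count pair illustrates the pattern the paper describes (an illustrative example, not the paper's code):

```python
#!/usr/bin/env python3
# mapper.py -- Hadoop Streaming mapper: emit "word<TAB>1" for each word.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- Hadoop Streaming reducer: sum counts per word. Streaming
# sorts mapper output by key, so identical words arrive consecutively.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

The pair is launched with the hadoop-streaming JAR, passing the scripts via -files and naming them with -mapper and -reducer alongside the -input and -output HDFS paths.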


2005 ◽  
Vol 15 (3) ◽  
pp. 353-401 ◽  
Author(s):  
CLEMENS GRELCK

Classical application domains of parallel computing are dominated by processing large arrays of numerical data. Whereas most functional languages focus on lists and trees rather than on arrays, SAC is tailor-made in design and in implementation for efficient high-level array processing. Advanced compiler optimizations yield performance levels that are often competitive with low-level imperative implementations. Based on SAC, we develop compilation techniques and runtime system support for the compiler-directed parallel execution of high-level functional array processing code on shared memory architectures. Competitive sequential performance gives us the opportunity to exploit the conceptual advantages of the functional paradigm for achieving real performance gains with respect to existing imperative implementations, not only in comparison with uniprocessor runtimes. While the design of SAC facilitates parallelization, the particular challenge posed by high sequential performance is that achieving satisfactory speedups through parallelization becomes substantially more difficult. We present an initial compilation scheme and multi-threaded execution model, which we refine step by step to reduce organizational overhead and to improve parallel performance. We close with a detailed analysis of the impact of certain design decisions on runtime performance, based on a series of experiments.
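As a loose Python analogy of the multi-threaded execution model (not SAC itself; the chunk count and element-wise operation are illustrative, and CPython's GIL prevents real speedup here, so this only sketches the chunked shared-memory scheme the compiler would generate):

```python
# A loose analogy: a high-level element-wise array operation is split into
# chunks that worker threads process in place over a shared buffer.
from concurrent.futures import ThreadPoolExecutor
from array import array

def apply_chunk(data, start, stop):
    # Each worker updates its own slice of the shared buffer in place.
    for i in range(start, stop):
        data[i] = data[i] * data[i] + 1.0

def parallel_map(data, n_workers=4):
    chunk = (len(data) + n_workers - 1) // n_workers
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for w in range(n_workers):
            pool.submit(apply_chunk, data, w * chunk, min((w + 1) * chunk, len(data)))
    # The with-block waits for all workers before returning.

a = array("d", range(1_000_000))
parallel_map(a)
print(a[0], a[1], a[2])  # -> 1.0 2.0 5.0
```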

