Extracting numerical data from unstructured Arabic texts (ENAT)

Author(s):  
Abeer K. AL-Mashhadany ◽  
Dalal N. Hamood ◽  
Ahmed T. Sadiq Al-Obaidi ◽  
Waleed K. Al-Mashhadany

Unstructured data has become a challenge because recent years have seen the ability to gather massive amounts of data from annotated documents. This paper is concerned with the analysis of unstructured Arabic text. Manipulating unstructured text and converting it into a form understandable by computers is a high-level aim, and an important step toward this aim is understanding numerical phrases. This paper aims to extract numerical data from unstructured Arabic text in general. The work attempts to recognize numerical phrases, analyze them, and convert them into integer values. The inference engine is based on Arabic linguistic and morphological rules. The applied method combines rules for numerical nouns with Arabic morphological rules in order to achieve a highly accurate extraction method. Arithmetic operations are applied to convert each numerical phrase into an integer value, with the proper operation determined by linguistic and morphological rules. It is shown that applying Arabic linguistic rules together with arithmetic operations succeeds in extracting numerical data from unstructured Arabic text with accuracy reaching 100%.
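As a rough illustration of the idea (a minimal Python sketch, not the paper's inference engine; the toy lexicon, the conjunction-prefix rule, and the example phrases are illustrative assumptions), a numeral phrase can be converted to an integer by summing unit and tens words and multiplying at scale words:

```python
# A minimal sketch, assuming a toy lexicon: unit and tens words are summed,
# and scale words (hundred, thousand) multiply the group before them.
WORD_VALUES = {
    "واحد": 1, "اثنان": 2, "ثلاثة": 3, "أربعة": 4, "خمسة": 5,
    "ستة": 6, "سبعة": 7, "ثمانية": 8, "تسعة": 9, "عشرة": 10,
    "عشرون": 20, "ثلاثون": 30, "مائة": 100, "ألف": 1000, "آلاف": 1000,
}
SCALE_WORDS = {"مائة", "ألف", "آلاف"}

def normalize(word: str) -> str:
    # Morphological rule (simplified): strip the conjunction prefix "و" ("and")
    # only when the remainder is a known numeral word.
    if word not in WORD_VALUES and word.startswith("و") and word[1:] in WORD_VALUES:
        return word[1:]
    return word

def phrase_to_int(phrase: str) -> int:
    total, group = 0, 0
    for raw in phrase.split():
        word = normalize(raw)
        value = WORD_VALUES.get(word)
        if value is None:
            continue  # a full system would apply further morphological rules here
        if word in SCALE_WORDS:
            total += max(group, 1) * value  # multiply: "ثلاثة آلاف" -> 3 * 1000
            group = 0
        else:
            group += value                  # add: "خمسة وعشرون" -> 5 + 20
    return total + group

print(phrase_to_int("خمسة وعشرون"))              # "five and twenty" -> 25
print(phrase_to_int("ثلاثة آلاف وخمسة وعشرون"))  # -> 3025
```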

AI Magazine ◽  
2015 ◽  
Vol 36 (1) ◽  
pp. 75-86 ◽  
Author(s):  
Jennifer Sleeman ◽  
Tim Finin ◽  
Anupam Joshi

We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the data's ontologies are unknown, inaccessible, or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and Arnetminer using DBpedia as the background knowledge base.
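A minimal sketch of the general approach (the toy data, labels, and classifier are illustrative assumptions, not the authors' feature set): represent each entity by the names of its attributes and relations, then train a supervised classifier to predict a fine-grained type drawn from a background knowledge base such as DBpedia:

```python
# A minimal sketch, assuming toy data and scikit-learn: entities are described
# by their attribute/relation names, and a supervised classifier predicts a
# fine-grained, DBpedia-style type.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

entities = [
    "birthDate birthPlace team position",  # attributes typical of an athlete
    "birthDate almaMater party office",    # attributes typical of a politician
    "foundingYear headquarters industry",  # attributes typical of a company
]
types = ["dbo:Athlete", "dbo:Politician", "dbo:Company"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(entities, types)

# An unseen entity is typed by the attribute names it shares with training data.
print(model.predict(["birthPlace team position height"]))  # -> ['dbo:Athlete']
```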


2019 ◽  
Vol 4 (2) ◽  
pp. 97-104
Author(s):  
Lazuardi Umar ◽  
Yanuar Hamzah ◽  
Rahmondia N. Setiadi

This paper describes the design of a fry counter intended for use by farmers of fish raised for consumption. Until now, almost all fry counting has been done manually by humans. This requires much energy and high concentration, and can therefore cause a high level of exhaustion in fry-counting workers. Moreover, human counting capability and capacity are limited. The fry counter designed in this study utilizes a multi-channel optocoupler sensor to increase counting capacity. The multi-channel counting system is developed as a solution to the limited capacity of available fry counters. The design uses an input-signal extender system on the controller, including an interrupt system. The experiments show a high accuracy level in counting and channel detection; therefore, this design can be implemented and could help farmers increase the production capacity of fish raised for consumption.
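As a loose illustration of interrupt-driven multi-channel counting (a hedged sketch, not the paper's firmware; the RPi.GPIO library, pin numbers, and debounce time are stand-in assumptions for the paper's controller):

```python
# A hedged sketch: counting fry across multiple optocoupler channels using
# edge-detect interrupts, with the Raspberry Pi RPi.GPIO library standing in
# for the paper's microcontroller setup.
import RPi.GPIO as GPIO

CHANNEL_PINS = [17, 27, 22, 23]          # assumed GPIO pins, one per sensor channel
counts = {pin: 0 for pin in CHANNEL_PINS}

def on_fry_detected(pin):
    # Each falling edge means one fry interrupted the optocoupler beam.
    counts[pin] += 1

GPIO.setmode(GPIO.BCM)
for pin in CHANNEL_PINS:
    GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    # Hardware interrupt per channel; bouncetime debounces the sensor signal.
    GPIO.add_event_detect(pin, GPIO.FALLING, callback=on_fry_detected, bouncetime=5)

try:
    input("Counting... press Enter to stop.\n")
finally:
    GPIO.cleanup()
    print("Total fry counted:", sum(counts.values()))
```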


Author(s):  
Yanchun Sun ◽  
Hang Yin ◽  
Jiu Wen ◽  
Zhiyu Sun

Urban region functions are the types of potential activities in an urban region, such as residence, commerce, transportation, entertainment, etc. A service that mines urban region functions is of great value for various applications, including urban planning and transportation management. Many studies have been carried out to mine different regions' functions, but few are based on social media text analysis. Considering that the semantic information embedded in social media texts is very useful for inferring an urban region's main functions, we design a service that extracts human activities from Sina Weibo (www.weibo.com; the largest Chinese microblog system, similar to Twitter) posts with location information and further describes a region's main functions with a function vector based on those activities. First, we predefine a variety of human activities and use an urban function classification model to obtain the activities corresponding to each Weibo post. Second, urban regions' function vectors are generated, with which we can easily do high-level work such as similar-place recommendation. Finally, with the function vectors generated, we develop a Web application for urban region function querying. We also conduct a case study among the urban regions in Beijing, and the experimental results demonstrate the feasibility of our method.
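A minimal sketch of the function-vector step (the activity labels and toy post data are assumptions): aggregate each region's classified post activities into a normalized vector over the predefined functions, then compare regions by cosine similarity for similar-place recommendation:

```python
# A minimal sketch, assuming toy data: per-region activity counts become a
# normalized function vector, and cosine similarity compares regions.
from collections import Counter
import math

ACTIVITIES = ["residence", "commerce", "transportation", "entertainment"]

def function_vector(post_activities):
    # Normalize activity counts into a distribution over predefined functions.
    counts = Counter(post_activities)
    total = sum(counts.values()) or 1
    return [counts[a] / total for a in ACTIVITIES]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

region_a = function_vector(["commerce", "commerce", "entertainment"])
region_b = function_vector(["commerce", "entertainment", "entertainment"])
print(cosine(region_a, region_b))  # high similarity -> candidate recommendation
```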


Author(s):  
Caterina Paola Venditti ◽  
Paolo Mele

In the era of digital archaeology, the communication of archaeological data, contexts, and work can be enhanced by cloud computing, AI, and other emergent technologies. The authors explore the most recent and efficient examples, starting from intrinsic properties of AI, i.e., the capabilities to sense, comprehend, and act, and looking at their application in communication both among specialists of the archaeological sector and from them to other recipients. The chapter also provides a high-level overview of solutions for extracting knowledge from large volumes of structured and unstructured data and making it available through software applications that perform automated tasks. Archaeologists must be ready to go down into the trenches and communicate their studies with a deep awareness of the opportunities offered by these technologies, and with adequate skills to master them.


2020 ◽  
Vol 39 (4) ◽  
pp. 727-742 ◽  
Author(s):  
Joachim Büschken ◽  
Greg M. Allenby

User-generated content in the form of customer reviews, blogs, and tweets is an emerging and rich source of data for marketers. Topic models have been successfully applied to such data, demonstrating that empirical text analysis benefits greatly from a latent variable approach that summarizes high-level interactions among words. We propose a new topic model that allows for serial dependency of topics in text. That is, topics may carry over from word to word in a document, violating the bag-of-words assumption in traditional topic models. In the proposed model, topic carryover is informed by sentence conjunctions and punctuation. Typically, such observed information is eliminated prior to analyzing text data (i.e., preprocessing) because words such as “and” and “but” do not differentiate topics. We find that these elements of grammar contain information relevant to topic changes. We examine the performance of our models using multiple data sets and establish boundary conditions for when our model leads to improved inference about customer evaluations. Implications and opportunities for future research are discussed.
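To make the carryover idea concrete (a toy generative sketch under assumed carryover probabilities, not the authors' Bayesian model): each word tends to inherit the previous word's topic, and a preceding conjunction or punctuation mark weakens that carryover, making a topic switch likelier:

```python
# A toy sketch: topics carry over word to word, and grammar cues ("and",
# "but", punctuation) lower the carryover probability, signaling topic change.
import random

CONJUNCTIONS_PUNCT = {"and", "but", ".", ","}

def topic_sequence(tokens, n_topics=3, p_carry=0.9, p_carry_after_cue=0.3):
    """Sample one topic per token with serial dependency between topics."""
    topics, current = [], random.randrange(n_topics)
    for i, tok in enumerate(tokens):
        cue = i > 0 and tokens[i - 1] in CONJUNCTIONS_PUNCT
        p = p_carry_after_cue if cue else p_carry
        if random.random() > p:               # carryover broken: resample topic
            current = random.randrange(n_topics)
        topics.append(current)
    return topics

tokens = "the battery lasts long but the price is high .".split()
print(list(zip(tokens, topic_sequence(tokens))))
```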


2018 ◽  
Vol 7 (2.21) ◽  
pp. 417
Author(s):  
K Kousalya ◽  
Shaik Javed Parvez

In the present scenario, growing data are naturally unstructured, and handling such a wide range of data is difficult. This paper proposes processing unstructured text data effectively in Hadoop MapReduce using Python. Apache Hadoop is an open-source platform, and it widely uses the MapReduce framework. MapReduce is popular and effective for processing unstructured data in a parallel manner. There are two stages in MapReduce, namely transform and repository. The input is split into small blocks, and worker nodes process individual blocks in parallel. MapReduce is generally based on Java, while Hadoop Streaming allows writing the mapper and reducer in other languages such as Python. In this paper, we show an alternative way of processing growing unstructured content data using Python. We also compare the performance of Java-based and non-Java-based programs.
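A standard Hadoop Streaming word-count pair illustrates the pattern the paper describes (an illustrative example, not the paper's code):

```python
#!/usr/bin/env python3
# mapper.py -- Hadoop Streaming mapper: emit "word<TAB>1" for each word.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- Hadoop Streaming reducer: sum counts per word. Streaming
# sorts mapper output by key, so identical words arrive consecutively.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

The pair is launched with the hadoop-streaming JAR, passing the scripts via -files and naming them with -mapper and -reducer alongside the -input and -output HDFS paths.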


2005 ◽  
Vol 15 (3) ◽  
pp. 353-401 ◽  
Author(s):  
CLEMENS GRELCK

Classical application domains of parallel computing are dominated by processing large arrays of numerical data. Whereas most functional languages focus on lists and trees rather than on arrays, SAC is tailor-made in design and in implementation for efficient high-level array processing. Advanced compiler optimizations yield performance levels that are often competitive with low-level imperative implementations. Based on SAC, we develop compilation techniques and runtime system support for the compiler-directed parallel execution of high-level functional array processing code on shared memory architectures. Competitive sequential performance gives us the opportunity to exploit the conceptual advantages of the functional paradigm for achieving real performance gains with respect to existing imperative implementations, not only in comparison with uniprocessor runtimes. While the design of SAC facilitates parallelization, the particular challenge posed by high sequential performance is that achieving satisfactory speedups through parallelization becomes substantially more difficult. We present an initial compilation scheme and multi-threaded execution model, which we refine step by step to reduce organizational overhead and to improve parallel performance. We close with a detailed analysis of the impact of certain design decisions on runtime performance, based on a series of experiments.
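As a loose Python analogy of the multi-threaded execution model (not SAC itself; the chunk count and element-wise operation are illustrative, and CPython's GIL prevents real speedup here, so this only sketches the chunked shared-memory scheme the compiler would generate):

```python
# A loose analogy: a high-level element-wise array operation is split into
# chunks that worker threads process in place over a shared buffer.
from concurrent.futures import ThreadPoolExecutor
from array import array

def apply_chunk(data, start, stop):
    # Each worker updates its own slice of the shared buffer in place.
    for i in range(start, stop):
        data[i] = data[i] * data[i] + 1.0

def parallel_map(data, n_workers=4):
    chunk = (len(data) + n_workers - 1) // n_workers
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for w in range(n_workers):
            pool.submit(apply_chunk, data, w * chunk, min((w + 1) * chunk, len(data)))
    # The with-block waits for all workers before returning.

a = array("d", range(1_000_000))
parallel_map(a)
print(a[0], a[1], a[2])  # -> 1.0 2.0 5.0
```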

