Parsing Twitter Using the Left-Corner Parsing Method with a POS Tagger

Repositor ◽  
2020 ◽  
Vol 2 (7) ◽  
pp. 897
Author(s):  
Dyah Anitia ◽  
Yuda Munarko ◽  
Yufis Azhar

Abstract. In this study, we investigated a parser based on the left-corner approach for Indonesian-language tweets. The collection consisted of 850 tweets, divided into three data sets: training data for the POS tagger, training data for the parser, and test data. Left-corner parsing combines two methods, top-down and bottom-up: the top-down step is used to recognize word classes, attaching a part-of-speech tag to each word in a sequence, and the bottom-up step is used to recognize sentence structure. The top-down step used 23 tags (the English version of the abstract states 41), and the phrases used to determine sentence structure were noun phrases, verb phrases, adjective phrases, adverb phrases, and prepositional phrases. The left-corner approach achieved a precision of 88.29%, a recall of 68.3%, and an F1 measure of 77.02%. These values are higher than those of the bottom-up approach, which achieved a precision of 68.79%, a recall of 47.12%, and an F1 measure of 55.9%. The difference arises because the word classes assigned in the top-down step affect the sentence structure recognized in the bottom-up step.
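
The reported F1 measures can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch, taking the bottom-up precision as 68.79%; the function name `f1_score` is introduced here for illustration and is not from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values in percent, as reported in the abstract.
left_corner_f1 = f1_score(88.29, 68.3)
bottom_up_f1 = f1_score(68.79, 47.12)

print(f"left-corner F1: {left_corner_f1:.2f}%")
print(f"bottom-up F1:   {bottom_up_f1:.2f}%")
```

Both results agree with the reported figures (77.02% and 55.9%) after rounding.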

2014 ◽  
Vol 2 ◽  
pp. 23-33 ◽  
Author(s):  
Emad Mohamed

In order to examine whether Arabic has Heavy Noun Phrase Shifting (HNPS), I extracted from the Prague Arabic Dependency Treebank a data set in which a verb governs either an object NP and an adjunct phrase (PP or AdvP) or a subject NP and an adjunct phrase. I used binary logistic regression in which the criterion variable is whether the subject/object NP shifts, and the predictor variables are heaviness (the number of tokens per NP or adjunct), part-of-speech tag, verb disposition (i.e., whether the verb has a history of taking double objects or sentential objects), NP number, NP definiteness, and the presence of referring pronouns in either the NP or the adjunct. The results show that only object heaviness and adjunct heaviness are useful predictors of object HNPS, while subject heaviness, adjunct heaviness, subject part-of-speech tag, definiteness, and adjunct head POS tag are active predictors of subject HNPS. I also show that HNPS can in principle be predicted from sentence structure.


2018 ◽  
Vol 2 (3) ◽  
pp. 157
Author(s):  
Ahmad Subhan Yazid ◽  
Agung Fatwanto

Indonesian plays a fundamental role in communication, but ambiguity is a problem when the language is processed by machine. In Natural Language Processing, Part-of-Speech (POS) tagging helps reduce this problem. This study uses a rule-based method to determine the best word class for ambiguous words in Indonesian. The research follows these stages: knowledge inventory, algorithm design, implementation, testing, analysis, and conclusions. The initial data are an Indonesian corpus developed by the language department of the Faculty of Computer Science, University of Indonesia. The data are then processed and presented descriptively according to certain rules and specifications. The result is a POS tagging algorithm comprising 71 rules, expressed as flowcharts and descriptive sentences. In testing, the algorithm successfully labeled 92 of 100 tested words (92%). The implementation results are influenced by the availability of rules, word-class tagsets, and corpus data.


2020 ◽  
Vol 9 (2) ◽  
pp. 303
Author(s):  
I Gde Made Hendra Pradiptha ◽  
Ngurah Agus Sanjaya ER

Part-of-Speech (POS) tagging, or word class labeling, is the process of assigning a word class to each word in a sentence. Previous research on POS taggers, especially for Indonesian, has used various approaches and obtained high accuracy. However, few researchers have built a POS tagger for Balinese. In this article, we build a POS tagger for Balinese using a probabilistic approach, specifically the Hidden Markov Model (HMM). HMM was selected to deal with ambiguity because it gives higher accuracy and fast processing time. We used k-fold cross-validation (with k = 10) and a tagged corpus of around 3669 tokens with 21 tags. In the experiments conducted, the HMM method obtained an accuracy of 68.56%.
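
How an HMM tagger resolves ambiguity can be illustrated with Viterbi decoding, the standard way to find the most probable tag sequence under a first-order HMM. The sketch below is not the authors' implementation: the two-tag model, the English toy words, all probabilities, and the `unk` smoothing constant are hypothetical values chosen only for illustration.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for `words` under a first-order HMM.

    `unk` is an assumed small emission probability for unseen (tag, word) pairs.
    """
    if not words:
        return []
    # best[i][t] = (log-probability of the best path ending in tag t at i, previous tag)
    best = [{}]
    for t in tags:
        score = math.log(start_p[t]) + math.log(emit_p[t].get(words[0], unk))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = math.log(emit_p[t].get(words[i], unk))
            best[i][t] = max(
                (best[i - 1][p][0] + math.log(trans_p[p][t]) + emit, p) for p in tags
            )
    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


# Hypothetical toy model; a real tagger estimates these from a tagged corpus.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.7, "VERB": 0.3}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
EMIT = {
    "NOUN": {"dogs": 0.5, "bark": 0.1, "cats": 0.4},
    "VERB": {"bark": 0.6, "chase": 0.4},
}

print(viterbi(["dogs", "bark"], TAGS, START, TRANS, EMIT))
```

In this toy model "bark" can be a noun or a verb; the transition probabilities make the verb reading more likely after a noun, which is the kind of contextual disambiguation the abstract refers to.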


Author(s):  
Stefania Mereu ◽  
Matt Newman ◽  
Michelle Peterson ◽  
Eric Taylor ◽  
Jessica White-Sustaita ◽  
...  

Within the fast-paced world of Lean and Agile software development, researchers are always on the lookout for methods that allow for rapid data gathering and analysis, while still yielding robust design recommendations. This paper considers the use cases for “top-down” hypothesis testing and “bottom-up” statistical cluster analysis, within survey research on user behaviors and needs. Comparing the application of each method on the same data set shows that statistical cluster analysis can create rich data-driven personas that inform user needs and preferences and provide design teams with insightful recommendations in a short amount of time. This method also increases the potential for gaining unexpected information from quantitative data—an achievement typically viewed as within the purview of qualitative research alone. Using both approaches to the same dataset allowed us to both answer specific questions for the design team, and learn new insights from the bottom up.


2019 ◽  
Vol 25 (5) ◽  
pp. 651-674 ◽  
Author(s):  
Katja Zupan ◽  
Nikola Ljubešić ◽  
Tomaž Erjavec

Abstract. Part-of-speech (PoS) tagging of non-standard language with models developed for standard language is known to suffer from a significant decrease in accuracy. Two methods are typically used to improve it: word normalisation, which decreases the out-of-vocabulary rate of the PoS tagger, and domain adaptation where the tagger is made aware of the non-standard language variation, either through supervision via non-standard data being added to the tagger’s training set, or via distributional information calculated from raw texts. This paper investigates the two approaches, normalisation and domain adaptation, on carefully constructed data sets encompassing historical and user-generated Slovene texts, in particular focusing on the amount of labour necessary to produce the manually annotated data sets for each approach and comparing the resulting PoS accuracy. We give quantitative as well as qualitative analyses of the tagger performance in various settings, showing that on our data set closed and open class words exhibit significantly different behaviours, and that even small inconsistencies in the PoS tags in the data have an impact on the accuracy. We also show that to improve tagging accuracy, it is best to concentrate on obtaining manually annotated normalisation training data for short annotation campaigns, while manually producing in-domain training sets for PoS tagging is better when a more substantial annotation campaign can be undertaken. Finally, unsupervised adaptation via Brown clustering is similarly useful regardless of the size of the training data available, but improvements tend to be bigger when adaptation is performed via in-domain tagging data.


2017 ◽  
Author(s):  
Peter Bergamaschi ◽  
Ute Karstens ◽  
Alistair J. Manning ◽  
Marielle Saunois ◽  
Aki Tsuruta ◽  
...  

Abstract. We present inverse modelling (top-down) estimates of European methane (CH4) emissions for 2006–2012 based on a new quality-controlled and harmonized in-situ data set from 18 European atmospheric monitoring stations. We applied an ensemble of seven inverse models and performed four inversion experiments, investigating the impact of different sets of stations and the use of a priori information on emissions. The inverse models infer total CH4 emissions of 26.7 (20.2–29.7) Tg CH4 yr−1 (mean, 10th and 90th percentiles from all inversions) for the EU-28 for 2006–2012 from the four inversion experiments. For comparison, total anthropogenic CH4 emissions reported to UNFCCC (bottom-up, based on statistical data and emissions factors) amount to only 21.3 Tg CH4 yr−1 (2006) to 18.8 Tg CH4 yr−1 (2012). A potential explanation for the higher range of top-down estimates compared to bottom-up inventories could be the contribution from natural sources, such as peatlands, wetlands, and wet soils. Based on seven different wetland inventories from the Wetland and Wetland CH4 Inter-comparison of Models Project (WETCHIMP), total wetland emissions of 4.3 (2.3–8.2) Tg CH4 yr−1 from the EU-28 are estimated. The hypothesis of significant natural emissions is supported by the finding that several inverse models yield significant seasonal cycles of derived CH4 emissions with maxima in summer, while anthropogenic CH4 emissions are assumed to have much lower seasonal variability. Furthermore, we investigate potential biases in the inverse models by comparison with regular aircraft profiles at four European sites and with vertical profiles obtained during the Infrastructure for Measurement of the European Carbon Cycle (IMECC) aircraft campaign. We present a novel approach to estimate the biases in the derived emissions, based on the comparison of simulated and measured enhancements of CH4 compared to the background, integrated over the entire boundary layer and over the lower troposphere. This analysis identifies regional biases for several models at the aircraft profile sites in France, Hungary and Poland.


MATICS ◽  
2013 ◽  
Author(s):  
Fachry Khusaini ◽  
Fachrul Kurniawan
Keyword(s):  
Top Down ◽  

English is an international language used by people all over the world to interact with one another, and learning it is very necessary today. Many methods are used to learn English. Games are an enjoyable learning medium, especially for children: through games, children can learn while playing as they develop their abilities. The game built in this work is an English grammar learning game that uses the left-corner parsing algorithm to check the sentences the player creates. Many algorithms can be used to check a sentence, and left-corner parsing is one of them. The left-corner parsing algorithm combines two algorithms, top-down parsing and bottom-up parsing. Its task is to examine each word in a sentence and then match a grammar pattern against the result of that examination. This checking process serves as the word checker in the game. In the trials conducted, the left-corner parsing method recognized grammar patterns very well, but recognizing meaning in order to form a correct sentence remains a weakness.
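
The combination of bottom-up and top-down steps in left-corner parsing can be sketched as a small recognizer. This is an illustration under stated assumptions, not the game's implementation: the toy grammar `RULES`, the `LEXICON`, and the function names are hypothetical, and the sketch assumes a grammar without empty or cyclic unary rules.

```python
# Hypothetical toy grammar: (parent, right-hand side). The first symbol of each
# right-hand side is the rule's "left corner".
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
]
LEXICON = {
    "the": ["Det"],
    "dog": ["N"],
    "cat": ["N"],
    "chased": ["V"],
    "barked": ["V"],
}


def parse(goal, tokens, i):
    """Yield every end position j such that tokens[i:j] can be parsed as `goal`."""
    if i < len(tokens):
        # Bottom-up step: read the next word and look up its category.
        for cat in LEXICON.get(tokens[i], ()):
            yield from complete(cat, goal, tokens, i + 1)


def complete(cat, goal, tokens, i):
    """Grow the recognized constituent `cat` upward until it becomes `goal`."""
    if cat == goal:
        yield i
    for parent, rhs in RULES:
        if rhs[0] == cat:
            # Top-down step: the rule predicts the remaining right-hand side.
            for j in parse_seq(rhs[1:], tokens, i):
                yield from complete(parent, goal, tokens, j)


def parse_seq(cats, tokens, i):
    """Parse a sequence of categories one after another, starting at position i."""
    if not cats:
        yield i
    else:
        for j in parse(cats[0], tokens, i):
            yield from parse_seq(cats[1:], tokens, j)


def recognize(sentence):
    """Return True if the whole sentence matches the start symbol S."""
    tokens = sentence.split()
    return any(j == len(tokens) for j in parse("S", tokens, 0))


print(recognize("the dog chased the cat"))
print(recognize("dog the barked"))
```

As the abstract notes, a recognizer of this kind only checks grammatical pattern: a sentence such as "the cat barked" would be accepted even though its meaning is odd.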


2021 ◽  
Vol 9 (1) ◽  
pp. 137-156
Author(s):  
Tendai Makoni ◽  
Delson Chikobvu ◽  
Caston Sigauke

The objectives of this paper are to (1) adopt hierarchical forecasting methods for modelling and forecasting international tourist arrivals in Zimbabwe, and (2) produce prediction intervals (PIs) for Zimbabwe’s international tourist arrivals by applying Quantile Regression Averaging (QRA) to the hierarchical tourism forecasts. Zimbabwe’s monthly international tourist arrivals data from January 2002 to December 2018 were used. The data set predates the COVID-19 period and was disaggregated according to the purpose of visit (POV). Three hierarchical forecasting approaches, namely top-down, bottom-up and optimal combination, were applied to the data. The results showed the superiority of the bottom-up approach over both the top-down and optimal combination approaches. Forecasts indicate a general increase in the aggregate series. The combined methods provide new insight into modelling tourist arrivals. The approach is useful to the government, tourism stakeholders, and investors, among others, for decision-making, resource mobilisation and allocation. The Zimbabwe Tourism Authority (ZTA) could adopt the forecasting techniques to produce informative and precise tourism forecasts. Because the data set predates the COVID-19 pandemic, the models indicate what could happen outside the pandemic. During the pandemic the country was under lockdown, with no tourist arrivals to report on. The models are useful for planning purposes beyond the COVID-19 pandemic.
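
The difference between the bottom-up and top-down hierarchical approaches can be sketched on a two-level hierarchy. Everything specific here is an assumption for illustration: the purpose-of-visit names, the arrival figures, and the placeholder `naive_forecast` (a mean of the last three observations) are hypothetical, and the abstract does not state which base forecasting models the authors used. The top-down variant shown splits the aggregate forecast by historical shares, which is one common choice among several.

```python
def naive_forecast(series):
    """Placeholder base forecaster (assumption): mean of the last three observations."""
    window = series[-3:]
    return sum(window) / len(window)


def bottom_up(bottom_series):
    """Forecast each disaggregated series, then sum to obtain the aggregate forecast."""
    parts = {name: naive_forecast(values) for name, values in bottom_series.items()}
    return sum(parts.values()), parts


def top_down(bottom_series):
    """Forecast the aggregate series, then split it using historical shares."""
    total_series = [sum(values) for values in zip(*bottom_series.values())]
    total_forecast = naive_forecast(total_series)
    history_total = sum(total_series)
    parts = {
        name: total_forecast * sum(values) / history_total
        for name, values in bottom_series.items()
    }
    return total_forecast, parts


# Hypothetical monthly arrivals by purpose of visit.
arrivals = {
    "holiday": [100, 110, 120, 130],
    "business": [50, 50, 50, 50],
}

print(bottom_up(arrivals))
print(top_down(arrivals))
```

With these numbers both approaches give the same aggregate forecast but different forecasts per purpose of visit, because the top-down split relies on historical shares while the bottom-up approach models each series directly.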


2018 ◽  
Vol 18 (2) ◽  
pp. 901-920 ◽  
Author(s):  
Peter Bergamaschi ◽  
Ute Karstens ◽  
Alistair J. Manning ◽  
Marielle Saunois ◽  
Aki Tsuruta ◽  
...  

Abstract. We present inverse modelling (top down) estimates of European methane (CH4) emissions for 2006–2012 based on a new quality-controlled and harmonised in situ data set from 18 European atmospheric monitoring stations. We applied an ensemble of seven inverse models and performed four inversion experiments, investigating the impact of different sets of stations and the use of a priori information on emissions. The inverse models infer total CH4 emissions of 26.8 (20.2–29.7) Tg CH4 yr−1 (mean, 10th and 90th percentiles from all inversions) for the EU-28 for 2006–2012 from the four inversion experiments. For comparison, total anthropogenic CH4 emissions reported to UNFCCC (bottom up, based on statistical data and emissions factors) amount to only 21.3 Tg CH4 yr−1 (2006) to 18.8 Tg CH4 yr−1 (2012). A potential explanation for the higher range of top-down estimates compared to bottom-up inventories could be the contribution from natural sources, such as peatlands, wetlands, and wet soils. Based on seven different wetland inventories from the Wetland and Wetland CH4 Inter-comparison of Models Project (WETCHIMP), total wetland emissions of 4.3 (2.3–8.2) Tg CH4 yr−1 from the EU-28 are estimated. The hypothesis of significant natural emissions is supported by the finding that several inverse models yield significant seasonal cycles of derived CH4 emissions with maxima in summer, while anthropogenic CH4 emissions are assumed to have much lower seasonal variability. Taking into account the wetland emissions from the WETCHIMP ensemble, the top-down estimates are broadly consistent with the sum of anthropogenic and natural bottom-up inventories. However, the contribution of natural sources and their regional distribution remain rather uncertain. 
Furthermore, we investigate potential biases in the inverse models by comparison with regular aircraft profiles at four European sites and with vertical profiles obtained during the Infrastructure for Measurement of the European Carbon Cycle (IMECC) aircraft campaign. We present a novel approach to estimate the biases in the derived emissions, based on the comparison of simulated and measured enhancements of CH4 compared to the background, integrated over the entire boundary layer and over the lower troposphere. The estimated average regional biases range between −40 and 20 % at the aircraft profile sites in France, Hungary and Poland.

