PARSING TWITTER MENGGUNAKAN METODE LEFT-CORNER PARSING DENGAN MEMANFAATKAN POS TAGGER

Abstract. This study investigates a left-corner parser for Indonesian-language tweets. The collection comprises 850 tweets divided into three data sets: training data for the POS tagger, training data for the parser, and test data. Left-corner parsing combines two methods, top-down and bottom-up: the top-down step recognizes word classes, attaching a part-of-speech tag to each word in the sequence, and the bottom-up step recognizes sentence structure. The tagset contains 23 tags according to the Indonesian abstract (41 according to the English one), and the phrases used to define sentence structure are noun phrases, verb phrases, adjective phrases, adverb phrases and prepositional phrases. The left-corner approach achieves a precision of 88.29%, a recall of 68.3% and an F1 measure of 77.02%, higher than the bottom-up approach, which achieves a precision of 68.79%, a recall of 47.12% and an F1 measure of 55.9%. The difference arises because the word classes assigned in the top-down process affect the sentence structure built in the bottom-up process.
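As a quick consistency check on the figures above, the F1 measure is the harmonic mean of precision and recall. The sketch below recomputes F1 from the reported precision and recall for both approaches; the function name `f1_measure` is ours and not taken from the paper, and the bottom-up precision is taken as 68.79%.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract (percent).
left_corner = f1_measure(88.29, 68.3)
bottom_up = f1_measure(68.79, 47.12)

print(f"left-corner F1: {left_corner:.2f}")
print(f"bottom-up F1:   {bottom_up:.2f}")
```

After rounding, both values agree with the reported F1 measures of 77.02% and 55.9%.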

Object and subject Heavy-NP shift in Arabic

Research in Corpus Linguistics, 2014, Vol 2, pp. 23-33. DOI: 10.32714/ricl.02.03. Cited by ~1.

Author(s): Emad Mohamed

Keyword(s): Logistic Regression, Noun Phrase, Binary Logistic Regression, Predictor Variables, Sentence Structure, Criterion Variable, Data Set, Part Of Speech, History Of, The Subject

In order to examine whether Arabic has Heavy Noun Phrase Shifting (HNPS), I have extracted from the Prague Arabic Dependency Treebank a data set in which a verb governs either an object NP and an adjunct phrase (PP or AdvP) or a subject NP and an adjunct phrase. I have used binary logistic regression in which the criterion variable is whether the subject/object NP shifts, and the predictor variables are heaviness (the number of tokens per NP or adjunct), part-of-speech tag, verb disposition (i.e., whether the verb has a history of taking double objects or sentential objects), NP number, NP definiteness, and the presence of referring pronouns in either the NP or the adjunct. The results show that only object heaviness and adjunct heaviness are useful predictors of object HNPS, while subject heaviness, adjunct heaviness, subject part-of-speech tag, definiteness, and adjunct head POS tag are active predictors of subject HNPS. I also show that HNPS can in principle be predicted from sentence structure.

Pola Karangan Argumentatif Mahasiswa Prodi Bahasa Indonesia: Analisis Teks Model 'Top-Down' dan 'Bottom-Up'

Komposisi Jurnal Pendidikan Bahasa Sastra dan Seni, 2007, Vol 4 (1), pp. 9. DOI: 10.24036/komposisi.v4i1.6446.

Author(s): Safnil Safnil

Keyword(s): Top Down, Bottom Up, Bahasa Indonesia

PENENTUAN KELAS KATA PADA PART OF SPEECH TAGGING KATA AMBIGU BAHASA INDONESIA

JISKA (Jurnal Informatika Sunan Kalijaga), 2018, Vol 2 (3), pp. 157. DOI: 10.14421/jiska.2018.23-05.

Author(s): Ahmad Subhan Yazid, Agung Fatwanto

Keyword(s): Language Processing, Word Class, Rule Based, Part Of Speech Tagging, POS Tagging, Part Of Speech, Ambiguous Words, Computer Science Faculty, Speech Tagging, Bahasa Indonesia

Indonesian plays a fundamental role in communication, but ambiguity is a problem in its machine learning implementation. In Natural Language Processing, Part of Speech (POS) tagging helps reduce this problem. This study uses a rule-based method to determine the best word class for ambiguous words in Indonesian. The research follows several stages: knowledge inventory, algorithm design, implementation, testing, analysis, and conclusions. The initial data is an Indonesian corpus developed by the language department of the Faculty of Computer Science, Universitas Indonesia. The data is then processed and presented descriptively according to certain rules and specifications. The result is a POS tagging algorithm comprising 71 rules, expressed as flowcharts and in descriptive sentence notation. In testing, the algorithm successfully labeled 92 of 100 tested words (92%). The implementation results are influenced by the availability of rules, word class tagsets and corpus data.

Building Balinese Part-of-Speech Tagger Using Hidden Markov Model (HMM)

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana), 2020, Vol 9 (2), pp. 303. DOI: 10.24843/jlk.2020.v09.i02.p18.

Author(s): I Gde Made Hendra Pradiptha, Ngurah Agus Sanjaya ER

Keyword(s): Markov Model, Hidden Markov Model, Hidden Markov, Probabilistic Approach, Word Class, Part Of Speech Tagging, Part Of Speech, Fast Processing, POS Tagger, Speech Tagging

Part-of-speech (POS) tagging, or word class labeling, is the process of assigning a word class to each word in a sentence. Previous research on POS taggers, especially for Indonesian, has used various approaches and obtained high accuracy. However, few researchers have built a POS tagger for Balinese. In this article, we build a POS tagger for Balinese using a probabilistic approach, specifically the Hidden Markov Model (HMM). HMM was selected to deal with ambiguity because it offers higher accuracy and fast processing time. We used k-fold cross-validation (with k = 10) on a tagged corpus of around 3,669 tokens with 21 tags. In the experiments conducted, the HMM method obtained an accuracy of 68.56%.
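To illustrate how an HMM tagger resolves ambiguity, here is a minimal Viterbi decoding sketch. It is not the authors' implementation: the two-tag tagset, the English toy words and every probability below are hypothetical values chosen for illustration, and the `unseen` floor for unknown words is a simplifying assumption.

```python
def viterbi(words, states, start_p, trans_p, emit_p, unseen=1e-6):
    """Return the most probable tag sequence for `words` under an HMM."""
    if not words:
        return []
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s].get(words[0], unseen), None)
             for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s].get(word, unseen), prev)
        best.append(column)
    # Follow the back-pointers from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical toy model: "bark" can be a noun or a verb.
STATES = ["NOUN", "VERB"]
START = {"NOUN": 0.8, "VERB": 0.2}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
EMIT = {"NOUN": {"dogs": 0.5, "cats": 0.4, "bark": 0.1},
        "VERB": {"bark": 0.7, "chase": 0.3}}

tags = viterbi(["dogs", "bark"], STATES, START, TRANS, EMIT)
print(tags)
```

In this toy model the ambiguous word "bark" is tagged as a verb because the NOUN to VERB transition and the VERB emission together outweigh the noun reading. A real tagger would estimate the probabilities from the tagged corpus and work in log space to avoid underflow on long sentences.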

Top-Down vs Bottom-Up Approaches to User Segmentation: The Best of Both Worlds

Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 2017, Vol 61 (1), pp. 515-519. DOI: 10.1177/1541931213601613.

Author(s): Stefania Mereu, Matt Newman, Michelle Peterson, Eric Taylor, Jessica White-Sustaita, ...

Keyword(s): Cluster Analysis, Data Gathering, Design Teams, Top Down, Data Set, Bottom Up, User Behaviors, Statistical Cluster, Rich Data, Agile Software

Within the fast-paced world of Lean and Agile software development, researchers are always on the lookout for methods that allow for rapid data gathering and analysis, while still yielding robust design recommendations. This paper considers the use cases for “top-down” hypothesis testing and “bottom-up” statistical cluster analysis, within survey research on user behaviors and needs. Comparing the application of each method on the same data set shows that statistical cluster analysis can create rich data-driven personas that inform user needs and preferences and provide design teams with insightful recommendations in a short amount of time. This method also increases the potential for gaining unexpected information from quantitative data—an achievement typically viewed as within the purview of qualitative research alone. Using both approaches to the same dataset allowed us to both answer specific questions for the design team, and learn new insights from the bottom up.

How to tag non-standard language: Normalisation versus domain adaptation for Slovene historical and user-generated texts

Natural Language Engineering, 2019, Vol 25 (5), pp. 651-674. DOI: 10.1017/s1351324919000366. Cited by ~1.

Author(s): Katja Zupan, Nikola Ljubešić, Tomaž Erjavec

Keyword(s): Domain Adaptation, Training Data, Data Sets, Standard Language, Data Set, Standard Data, POS Tagging, Part Of Speech, Versus Domain, POS Tagger

Abstract. Part-of-speech (PoS) tagging of non-standard language with models developed for standard language is known to suffer from a significant decrease in accuracy. Two methods are typically used to improve it: word normalisation, which decreases the out-of-vocabulary rate of the PoS tagger, and domain adaptation where the tagger is made aware of the non-standard language variation, either through supervision via non-standard data being added to the tagger’s training set, or via distributional information calculated from raw texts. This paper investigates the two approaches, normalisation and domain adaptation, on carefully constructed data sets encompassing historical and user-generated Slovene texts, in particular focusing on the amount of labour necessary to produce the manually annotated data sets for each approach and comparing the resulting PoS accuracy. We give quantitative as well as qualitative analyses of the tagger performance in various settings, showing that on our data set closed and open class words exhibit significantly different behaviours, and that even small inconsistencies in the PoS tags in the data have an impact on the accuracy. We also show that to improve tagging accuracy, it is best to concentrate on obtaining manually annotated normalisation training data for short annotation campaigns, while manually producing in-domain training sets for PoS tagging is better when a more substantial annotation campaign can be undertaken. Finally, unsupervised adaptation via Brown clustering is similarly useful regardless of the size of the training data available, but improvements tend to be bigger when adaptation is performed via in-domain tagging data.

Inverse modelling of European CH4 emissions during 2006–2012 using different inverse models and reassessed atmospheric observations

2017. DOI: 10.5194/acp-2017-273. Cited by ~2.

Author(s): Peter Bergamaschi, Ute Karstens, Alistair J. Manning, Marielle Saunois, Aki Tsuruta, ...

Keyword(s): A Priori, Lower Troposphere, Inverse Modelling, Top Down, Data Set, Bottom Up, CH4 Emissions, In Situ Data, Inverse Models, The Impact

Abstract. We present inverse modelling (top-down) estimates of European methane (CH4) emissions for 2006–2012 based on a new quality-controlled and harmonized in-situ data set from 18 European atmospheric monitoring stations. We applied an ensemble of seven inverse models and performed four inversion experiments, investigating the impact of different sets of stations and the use of a priori information on emissions. The inverse models infer total CH4 emissions of 26.7 (20.2–29.7) Tg CH4 yr−1 (mean, 10th and 90th percentiles from all inversions) for the EU-28 for 2006–2012 from the four inversion experiments. For comparison, total anthropogenic CH4 emissions reported to UNFCCC (bottom-up, based on statistical data and emissions factors) amount to only 21.3 Tg CH4 yr−1 (2006) to 18.8 Tg CH4 yr−1 (2012). A potential explanation for the higher range of top-down estimates compared to bottom-up inventories could be the contribution from natural sources, such as peatlands, wetlands, and wet soils. Based on seven different wetland inventories from the Wetland and Wetland CH4 Inter-comparison of Models Project (WETCHIMP), total wetland emissions of 4.3 (2.3–8.2) Tg CH4 yr−1 from the EU-28 are estimated. The hypothesis of significant natural emissions is supported by the finding that several inverse models yield significant seasonal cycles of derived CH4 emissions with maxima in summer, while anthropogenic CH4 emissions are assumed to have much lower seasonal variability. Furthermore, we investigate potential biases in the inverse models by comparison with regular aircraft profiles at four European sites and with vertical profiles obtained during the Infrastructure for Measurement of the European Carbon Cycle (IMECC) aircraft campaign. We present a novel approach to estimate the biases in the derived emissions, based on the comparison of simulated and measured enhancements of CH4 compared to the background, integrated over the entire boundary layer and over the lower troposphere. This analysis identifies regional biases for several models at the aircraft profile sites in France, Hungary and Poland.

IMPLEMENTASI LEFT CORNER PARSING UNTUK PEMBELAJARAN GRAMMAR BAHASA INGGRIS PADA GAME 3D ADVENTURE “GO TO LONDON”

MATICS, 2013. DOI: 10.18860/mat.v0i0.2427.

Author(s): Fachry Khusaini, Fachrul Kurniawan

Keyword(s): Top Down, Bottom Up, Left Corner

English is an international language used by people all over the world to interact with one another, and learning it is essential today. Many methods are used to learn English. Games are an enjoyable learning medium, especially for children: through games, children can learn while playing as they develop their abilities. The game built here is an English grammar learning game that uses the left-corner parsing algorithm to check the sentences that players construct. Many algorithms can be used to check a sentence, and left-corner parsing is one of them. The left-corner parsing algorithm combines two algorithms, top-down parsing and bottom-up parsing. Its task is to examine each word in a sentence and then match a grammar pattern against the result of that examination. This checking process serves as the word checker in the game. In the trials conducted, the left-corner parsing method recognized grammar patterns very well, but recognizing meaning in order to form a correct sentence remains a weakness.
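The combination of bottom-up and top-down steps can be sketched as a small left-corner recognizer. This is only an illustration under stated assumptions, not the game's implementation: the toy grammar, the tag names and the function names are all hypothetical, the input is assumed to be a sequence of POS tags already assigned to the words, and the grammar must contain no empty rules or unary cycles.

```python
# Hypothetical toy grammar over POS tags: (left-hand side, right-hand side).
GRAMMAR = [
    ("S", ["NP", "VP"]),
    ("NP", ["DET", "NOUN"]),
    ("NP", ["NOUN"]),
    ("VP", ["VERB", "NP"]),
    ("VP", ["VERB"]),
]

def parse(goal, tags, i):
    """Yield each end position j such that tags[i:j] can be analysed as `goal`."""
    if i < len(tags):
        # Bottom-up step: the next tag is the left corner of whatever follows.
        yield from complete(tags[i], goal, tags, i + 1)

def complete(sub, goal, tags, i):
    """A `sub` has been found ending at position i; try to grow it into `goal`."""
    if sub == goal:
        yield i
    for lhs, rhs in GRAMMAR:
        if rhs[0] == sub:
            # Top-down step: predict the remaining symbols of the rule.
            for j in parse_seq(rhs[1:], tags, i):
                yield from complete(lhs, goal, tags, j)

def parse_seq(symbols, tags, i):
    """Yield each end position after recognizing `symbols` in order from i."""
    if not symbols:
        yield i
        return
    for j in parse(symbols[0], tags, i):
        yield from parse_seq(symbols[1:], tags, j)

def recognizes(tags, start="S"):
    """True if the whole tag sequence can be analysed as `start`."""
    return any(j == len(tags) for j in parse(start, tags, 0))

print(recognizes(["DET", "NOUN", "VERB", "NOUN"]))
print(recognizes(["VERB", "DET"]))
```

The first call accepts a determiner-noun-verb-noun sequence and the second rejects a sequence that matches no rule. A practical parser would add a left-corner table to prune rules that cannot lead to the current goal and would build a tree instead of only accepting or rejecting.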

Hierarchical Forecasting of the Zimbabwe International Tourist Arrivals

Statistics Optimization & Information Computing, 2021, Vol 9 (1), pp. 137-156. DOI: 10.19139/soic-2310-5070-959.

Author(s): Tendai Makoni, Delson Chikobvu, Caston Sigauke

Keyword(s): Optimal Combination, Top Down, General Increase, Data Set, Bottom Up, International Tourist, The Government, Resource Mobilisation, Insight Into, Modelling And Forecasting

The objectives of the paper are to: (1) adopt hierarchical forecasting methods in modelling and forecasting international tourist arrivals in Zimbabwe; and (2) produce Prediction Intervals (PIs) for Zimbabwe international tourist arrivals by applying Quantile Regression Averaging (QRA) to hierarchical tourism forecasts. Zimbabwe’s monthly international tourist arrivals data from January 2002 to December 2018 were used. The data set predates the COVID-19 period and was disaggregated according to the purpose of visit (POV). Three hierarchical forecasting approaches, namely top-down, bottom-up and optimal combination, were applied to the data. The results showed the superiority of the bottom-up approach over both the top-down and optimal combination approaches. Forecasts indicate a general increase in the aggregate series. The combined methods provide new insight into modelling tourist arrivals. The approach is useful to the government, tourism stakeholders and investors, among others, for decision-making, resource mobilisation and allocation. The Zimbabwe Tourism Authority (ZTA) could adopt the forecasting techniques to produce informative and precise tourism forecasts. Because the data set predates the COVID-19 pandemic, the models indicate what could happen outside the pandemic. During the pandemic the country was under lockdown with no tourist arrivals to report on. The models are useful for planning purposes beyond the COVID-19 pandemic.
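The difference between the bottom-up and top-down approaches can be shown with a two-level hierarchy. The sketch below is illustrative only: the purpose-of-visit categories and all numbers are hypothetical, and the top-down split uses proportions of historical totals, which is one common variant and not necessarily the one used in the paper.

```python
def bottom_up(child_forecasts):
    """Aggregate forecast = sum of the forecasts made for each child series."""
    return sum(child_forecasts.values())

def top_down(total_forecast, history):
    """Split an aggregate forecast by each child's share of the historical total."""
    grand_total = sum(sum(series) for series in history.values())
    return {name: total_forecast * sum(series) / grand_total
            for name, series in history.items()}

# Hypothetical monthly arrivals by purpose of visit.
history = {"leisure": [60, 60], "business": [40, 40]}

# Bottom-up: forecast each purpose separately, then add the forecasts up.
print(bottom_up({"leisure": 120, "business": 80}))

# Top-down: forecast the aggregate, then distribute it by historical shares.
print(top_down(300, history))
```

Bottom-up keeps the information in each disaggregated series, while top-down only needs a single aggregate model; the optimal combination approach mentioned in the abstract reconciles forecasts made at all levels.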

Inverse modelling of European CH4 emissions during 2006–2012 using different inverse models and reassessed atmospheric observations

Atmospheric Chemistry and Physics, 2018, Vol 18 (2), pp. 901-920. DOI: 10.5194/acp-18-901-2018. Cited by ~23.

Author(s): Peter Bergamaschi, Ute Karstens, Alistair J. Manning, Marielle Saunois, Aki Tsuruta, ...

Keyword(s): Regional Distribution, Inverse Modelling, Top Down, Data Set, Bottom Up, Natural Sources, CH4 Emissions, Inverse Models, The Impact, The EU

Abstract. We present inverse modelling (top down) estimates of European methane (CH4) emissions for 2006–2012 based on a new quality-controlled and harmonised in situ data set from 18 European atmospheric monitoring stations. We applied an ensemble of seven inverse models and performed four inversion experiments, investigating the impact of different sets of stations and the use of a priori information on emissions. The inverse models infer total CH4 emissions of 26.8 (20.2–29.7) Tg CH4 yr−1 (mean, 10th and 90th percentiles from all inversions) for the EU-28 for 2006–2012 from the four inversion experiments. For comparison, total anthropogenic CH4 emissions reported to UNFCCC (bottom up, based on statistical data and emissions factors) amount to only 21.3 Tg CH4 yr−1 (2006) to 18.8 Tg CH4 yr−1 (2012). A potential explanation for the higher range of top-down estimates compared to bottom-up inventories could be the contribution from natural sources, such as peatlands, wetlands, and wet soils. Based on seven different wetland inventories from the Wetland and Wetland CH4 Inter-comparison of Models Project (WETCHIMP), total wetland emissions of 4.3 (2.3–8.2) Tg CH4 yr−1 from the EU-28 are estimated. The hypothesis of significant natural emissions is supported by the finding that several inverse models yield significant seasonal cycles of derived CH4 emissions with maxima in summer, while anthropogenic CH4 emissions are assumed to have much lower seasonal variability. Taking into account the wetland emissions from the WETCHIMP ensemble, the top-down estimates are broadly consistent with the sum of anthropogenic and natural bottom-up inventories. However, the contribution of natural sources and their regional distribution remain rather uncertain. 
Furthermore, we investigate potential biases in the inverse models by comparison with regular aircraft profiles at four European sites and with vertical profiles obtained during the Infrastructure for Measurement of the European Carbon Cycle (IMECC) aircraft campaign. We present a novel approach to estimate the biases in the derived emissions, based on the comparison of simulated and measured enhancements of CH4 compared to the background, integrated over the entire boundary layer and over the lower troposphere. The estimated average regional biases range between −40 and 20 % at the aircraft profile sites in France, Hungary and Poland.
