File Format Conversion Technology Research on Uygur Text Corpus Construction

Author(s):  
Askar Yakup ◽
Xiangwei Qi ◽  
Haijun Zhang ◽  
Yusup Abaydulla
2019 ◽  
Author(s):  
Andrew Webb ◽  
Jared Knoblauch ◽  
Nitesh Sabankar ◽  
Apeksha Sukesh Kallur ◽  
Jody Hey ◽  
...  

Abstract: Here we present the Pop-Gen Pipeline Platform (PPP), a software platform with the goal of reducing the computational expertise required for conducting population genomic analyses. The PPP was designed as a collection of scripts that facilitate common population genomic workflows in a consistent and standardized Python environment. Functions were developed to encompass entire workflows, including input preparation, file format conversion, various population genomic analyses, output generation, and visualization. By facilitating entire workflows, the PPP offers several benefits to prospective end users: it reduces the need for redundant in-house software and scripts, which take time to develop and may be error-prone or incorrect. The platform has also been developed with reproducibility and extensibility of analyses in mind. The PPP is an open-source package that is available for download and use at https://ppp.readthedocs.io/en/latest/PPP_pages/install.html


Applied Laser ◽
2014 ◽  
Vol 34 (4) ◽  
pp. 366-370
Author(s):  
罗曼 Luo Man ◽  
潘涌 Pan Yong ◽  
张瑄珺 Zhang Xuanjun ◽  
李浩 Li Hao ◽  
安博言 An Boyan

2008 ◽  
Vol 3 (2) ◽  
pp. 98-99
Author(s):  
Chinnaiah Swaminathan Vinobha ◽  
Maruthamuthu Rajadurai ◽  
Ekambaram Rajasekaran

Author(s):  
Shunsuke Kozawa ◽  
Hitomi Tohyama ◽  
Kiyotaka Uchimoto ◽  
Shigeki Matsubara

Recently, language resources have become indispensable for linguistic research. However, existing language resources are seldom fully utilized because the range of ways in which they can be used is not well known, which also suggests that their intrinsic value is not widely recognized. To address this issue, lists of usage information could improve language resource searches and lead to more efficient use. In this research, we therefore collect usage information for each language resource from academic articles to promote the efficient utilization of language resources. This paper describes the construction of a text corpus annotated with usage information (UI corpus). Specifically, we automatically extract sentences containing language resource names from academic articles. The extracted sentences are then annotated with usage information by two annotators in a cascaded manner. We show that the UI corpus contributes to efficient language resource searches by combining it with a metadata database of language resources and comparing the number of language resources retrieved with and without the UI corpus.
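The automatic extraction step described in this abstract can be illustrated with a minimal Python sketch. It assumes a plain-text article and a hand-supplied list of resource names, and it uses naive regular-expression sentence splitting with case-sensitive, whole-word matching. The function names and the sample resource names are hypothetical illustrations, not the authors' actual extraction method.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace.

    Assumption: a real system would need a proper sentence tokenizer that
    handles abbreviations such as "e.g." or "Fig.".
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extract_resource_sentences(
    text: str, resource_names: List[str]
) -> Dict[str, List[str]]:
    """Map each (hypothetical) resource name to the sentences mentioning it.

    Matching is case-sensitive and restricted to whole words, so "WordNet"
    does not match inside a longer token such as "WordNets".
    """
    # One compiled pattern per name; re.escape guards against regex metacharacters.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b") for name in resource_names
    }
    hits: Dict[str, List[str]] = {name: [] for name in resource_names}
    for sentence in split_sentences(text):
        for name, pattern in patterns.items():
            if pattern.search(sentence):
                hits[name].append(sentence)
    return hits


if __name__ == "__main__":
    article = (
        "We trained the parser on the Penn Treebank. "
        "Synonyms were taken from WordNet. "
        "Results are shown in Table 2."
    )
    found = extract_resource_sentences(article, ["Penn Treebank", "WordNet", "FrameNet"])
    for name, sentences in found.items():
        print(f"{name}: {sentences}")
```

In this sketch, each extracted sentence would then be handed to the annotators. Whole-word matching is a deliberately simple choice; names that end in punctuation (for example "C++") would need a different boundary rule.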
