CORPUS-DRIVEN BAMBARA SPELLING DICTIONARY

Author(s):  
V. F. Vydrin ◽  
J. J. Méric ◽  

A model for the development of a corpus-driven spelling dictionary for the Bambara language is described. First, a list of about 4,000 lexemes characterized by spelling variability is extracted from an electronic Bambara-French dictionary. At the next stage, a script is applied to determine the number of occurrences of each spelling variant in the Bambara Reference Corpus, separately for the entire Corpus (more than 11 million words) and for its disambiguated subcorpus (about 1.5 million words). Statistics on the diversity of sources and authors are also obtained automatically. The statistical data are then sorted manually into two lists of lexemes: those whose standard spelling can be established statistically, and those requiring evaluation by expert linguists. Some difficult cases are discussed in the paper. At the final stage, a representative expert commission will discuss all those lexemes for which statistical data alone do not suffice to define a standard spelling variant, before taking a final decision on each. The resulting Bambara spelling dictionary will be published electronically and on paper.
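The counting stage described above can be illustrated with a minimal sketch. It is not the authors' script: the input formats, the function name `count_variants`, and the placeholder tokens are all assumptions made for illustration. It tallies, for each spelling variant, the number of occurrences and the number of distinct sources in which the variant appears.

```python
from collections import Counter, defaultdict


def count_variants(variant_groups, documents):
    """Count occurrences and distinct sources for each spelling variant.

    variant_groups: dict mapping a lexeme id to its list of spelling
        variants (hypothetical format, not taken from the paper).
    documents: iterable of (source_id, tokens) pairs, where tokens is a
        list of word forms (hypothetical format).
    Returns {lexeme_id: {variant: (occurrences, distinct_sources)}}.
    """
    all_variants = {v for variants in variant_groups.values() for v in variants}
    occurrences = Counter()
    sources = defaultdict(set)

    for source_id, tokens in documents:
        for token in tokens:
            if token in all_variants:
                occurrences[token] += 1
                sources[token].add(source_id)

    return {
        lexeme: {v: (occurrences[v], len(sources[v])) for v in variants}
        for lexeme, variants in variant_groups.items()
    }


if __name__ == "__main__":
    # Placeholder tokens, not real Bambara word forms.
    groups = {"lex1": ["abc", "abk"]}
    docs = [("src1", ["abc", "zzz", "abc"]), ("src2", ["abk", "abc"])]
    print(count_variants(groups, docs))
```

Running the same function once on the full corpus and once on the disambiguated subcorpus would give the two sets of counts mentioned in the abstract; how the real corpus is stored and tokenized is not specified here.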

2019 ◽  
Vol 26 (12) ◽  
pp. 1618-1626 ◽  
Author(s):  
Davy Weissenbacher ◽  
Abeed Sarker ◽  
Ari Klein ◽  
Karen O’Connor ◽  
Arjun Magge ◽  
...  

Abstract. Objective: Twitter posts are now recognized as an important source of patient-generated data, providing unique insights into population health. A fundamental step toward incorporating Twitter data in pharmacoepidemiologic research is to automatically recognize medication mentions in tweets. Given that lexical searches for medication names suffer from low recall due to misspellings or ambiguity with common words, we propose a more advanced method to recognize them. Materials and Methods: We present Kusuri, an Ensemble Learning classifier able to identify tweets mentioning drug products and dietary supplements. Kusuri (薬, “medication” in Japanese) is composed of 2 modules: first, 4 different classifiers (lexicon based, spelling variant based, pattern based, and a weakly trained neural network) are applied in parallel to discover tweets potentially containing medication names; second, an ensemble of deep neural networks encoding morphological, semantic, and long-range dependencies of important words in the tweets makes the final decision. Results: On a class-balanced (50-50) corpus of 15 005 tweets, Kusuri demonstrated performance close to that of human annotators, with an F1 score of 93.7%, the best score achieved thus far on this corpus. On a corpus made of all tweets posted by 112 Twitter users (98 959 tweets, with only 0.26% mentioning medications), Kusuri obtained an F1 score of 78.8%. To the best of our knowledge, Kusuri is the first system to achieve this score on such an extremely imbalanced dataset. Conclusions: The system identifies tweets mentioning drug names with performance high enough to ensure its usefulness, and is ready to be integrated in pharmacovigilance, toxicovigilance, or more generally, public health pipelines that depend on medication name mentions.
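The results above are reported as F1 scores. For reference, the standard definition (the harmonic mean of precision and recall) can be sketched as follows; the function name and the counts in the example are illustrative assumptions and are not figures from the paper.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Return F1, the harmonic mean of precision and recall.

    Inputs are raw counts for the positive class (here, tweets that
    mention a medication). Returns 0.0 when F1 is undefined.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative counts only: 8 correct detections, 2 false alarms, 2 misses.
    print(f1_score(8, 2, 2))
```

Because F1 ignores true negatives, it is not inflated by the large number of tweets without medication mentions, which is why it is a common choice for heavily imbalanced corpora.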


Turyzm ◽  
2018 ◽  
Vol 28 (2) ◽  
pp. 73-84
Author(s):  
Sylwia Żakowska ◽  
Katarzyna Podhorodecka

This article presents the correlation between natural and non-natural tourism assets and the distribution of tourist accommodation in the 24 powiats (districts) of Łódź Province. The authors, having divided these assets into natural and non-natural, discuss their occurrence in individual powiats. Next, tourist accommodation in Łódź Province is described, along with a presentation of statistical data. An important part of the paper is the presentation of the research results obtained by means of the point bonitation method. At the final stage, Spearman rank correlation coefficients are calculated, showing the strength of the relationship between selected tourism assets and the distribution of tourist accommodation.
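As an illustration of the final step, a Spearman rank correlation coefficient can be computed as the Pearson correlation of the ranks, with tied values receiving their average rank. The sketch below is a generic implementation, not the authors' procedure, and the sample scores are invented placeholders rather than data from the article.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks.

    Assumes x and y have equal length and neither is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Invented placeholder scores for five districts.
    asset_scores = [1, 2, 3, 4, 5]
    accommodation = [2, 1, 4, 3, 5]
    print(spearman(asset_scores, accommodation))
```

A value near 1 indicates that districts ranked high on an asset also rank high on accommodation, a value near -1 indicates the opposite ordering, and a value near 0 indicates no monotonic relationship.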


Author(s):  
Yuri Ovchinnikov

The author discusses the question of enforcing the procedural independence of investigators which, in the current situation, has practically ceased to exist. Using the results of a sociological survey and statistical data, the author shows that there is a considerable shortage of personnel in the investigatory departments of internal affairs bodies and the Investigatory Committee of the Russian Federation, which leads to a high workload for investigators of criminal cases. The author describes the procedure by which the prosecutor assigns the investigator inappropriate functions, such as handing the accused a copy of the indictment or ensuring his/her appearance before the prosecutor to obtain a copy of the final decision (using the examples of investigative divisions of the Internal Affairs Department for Primorsky Region). The author concludes that, in order to encourage experienced investigators to stay in their jobs, and to enhance their prestige and authority, it is necessary to consider increasing their salaries compared to other personnel of the Agency, which will eliminate the shortage of staff, and also to include in legislation criminal law mechanisms for increasing the procedural independence of the investigator.


1968 ◽  
Vol 11 (1) ◽  
pp. 204-218 ◽  
Author(s):  
Elizabeth Dodds ◽  
Earl Harford

Persons with a high frequency hearing loss are difficult cases for whom to find suitable amplification. We have experienced some success with this problem in our Hearing Clinics using a specially designed earmold with a hearing aid. Thirty-five cases with high frequency hearing losses were selected from our clinical files for analysis of test results using standard, vented, and open earpieces. A statistical analysis of test results revealed that PB scores in sound field, using an average conversational intensity level (70 dB SPL), were enhanced when utilizing any one of the three earmolds. This result was due undoubtedly to increased sensitivity provided by the hearing aid. Only the open earmold used with a CROS hearing aid resulted in a significant improvement in discrimination when compared with the group’s unaided PB score under earphones or when comparing inter-earmold scores. These findings suggest that the inclusion of the open earmold with a CROS aid in the audiologist’s armamentarium should increase his flexibility in selecting hearing aids for persons with a high frequency hearing loss.


1989 ◽  
Vol 28 (02) ◽  
pp. 69-77 ◽  
Author(s):  
R. Haux

Abstract: Expert systems in medicine are frequently restricted to assisting the physician to derive a patient-specific diagnosis and therapy proposal. In many cases, however, there is a clinical need to use these patient data for other purposes as well. The intention of this paper is to show how and to what extent patient data in expert systems can additionally be used to create clinical registries and for statistical data analysis. At first, the pitfalls of goal-oriented mechanisms for the multiple usability of data are shown by means of an example. Then a data acquisition and inference mechanism is proposed, which includes a procedure for controlling selection bias, the so-called knowledge-based attribute selection. The functional view and the architectural view of expert systems suitable for the multiple usability of patient data are outlined in general and then by means of an application example. Finally, the ideas presented are discussed and compared with related approaches.


1976 ◽  
Vol 15 (01) ◽  
pp. 36-42 ◽  
Author(s):  
J. Schlörer

From a statistical data bank containing only anonymous records, the records sometimes may be identified and then retrieved, as personal records, by on line dialogue. The risk mainly applies to statistical data sets representing populations, or samples with a high ratio n/N. On the other hand, access controls are unsatisfactory as a general means of protection for statistical data banks, which should be open to large user communities. A threat monitoring scheme is proposed, which will largely block the techniques for retrieval of complete records. If combined with additional measures (e.g., slight modifications of output), it may be expected to render, from a cost-benefit point of view, intrusion attempts by dialogue valueless, if not absolutely impossible. The bona fide user has to pay by some loss of information, but considerable flexibility in evaluation is retained. The proposal of controlled classification included in the scheme may also be useful for off line dialogue systems.


2003 ◽  
pp. 136-146
Author(s):  
K. Liuhto

Statistical data on reserves, production and exports of Russian oil are provided in the article. The author pays special attention to the expansion of opportunities for sea oil transportation through the construction of new oil terminals in the North-West of the country, above all the largest terminal in Murmansk. In his opinion, one of the main problems in this sphere is the prevention of ecological accidents in the process of oil transportation through the Baltic Sea ports.

