Large Linguistic Corpus Reduction with SCP Algorithms

2015 ◽  
Vol 41 (3) ◽  
pp. 355-383 ◽  
Author(s):  
Nelly Barbot ◽  
Olivier Boëffard ◽  
Jonathan Chevelu ◽  
Arnaud Delhay

Linguistic corpus design is a critical concern for building rich annotated corpora useful in various application domains. For example, speech technologies such as ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) need a huge amount of speech data to train data-driven models or to produce synthetic speech. Collecting data always incurs costs (recording speech, verifying annotations, etc.), and as a rule of thumb, the more data you gather, the more costly your application will be. In this context, we present solutions that reduce the amount of linguistic text content while maintaining the level of linguistic richness required by a model or an application. This problem can be formalized as a Set Covering Problem (SCP), and we evaluate two algorithmic heuristics for designing large text corpora in English and French that cover phonological information or POS labels. The first algorithm is a standard greedy solution with an agglomerative/splitting strategy; the second, which we propose, is based on Lagrangian relaxation. The latter approach provides a lower bound on the cost of each covering solution. This lower bound can be used as a metric to evaluate the quality of a reduced corpus, whatever algorithm is applied. Experiments show that a suboptimal algorithm such as the greedy algorithm achieves good results: the cost of its solutions is close to the lower bound (a gap of about 4.35% for 3-phoneme coverings). Constraints in SCP are usually binary; here we propose a generalization in which the constraint on each covering feature can be multi-valued.
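
The abstract names the two ingredients without spelling them out. As a rough illustration only, the Python sketch below writes the multi-valued covering problem as min sum_j c_j x_j subject to sum_j a_ij x_j >= b_i with x_j in {0, 1}, where sentence j contributes a_ij occurrences of feature i (for example a phoneme) and b_i is the required count. `greedy_cover` is a generic greedy heuristic followed by a redundancy-removal pass, which is our assumed reading of "agglomerative/splitting". `lagrangian_lower_bound` is a textbook subgradient method on the Lagrangian dual. Neither is the authors' implementation, and the function names, the step-size schedule and the cost model are hypothetical.

```python
from typing import Dict, List

Features = Dict[str, int]  # feature -> count in a sentence, or required count


def greedy_cover(costs: List[float], contrib: List[Features], demand: Features) -> List[int]:
    """Greedy multi-cover (costs assumed positive): add the sentence with the best
    useful-coverage/cost ratio until demand is met, then drop redundant sentences."""
    remaining = dict(demand)
    chosen: List[int] = []
    candidates = set(range(len(costs)))
    while any(v > 0 for v in remaining.values()):
        best, best_ratio = None, 0.0
        for j in sorted(candidates):
            # Count only the part of the sentence's content that is still needed.
            gain = sum(min(c, remaining.get(f, 0)) for f, c in contrib[j].items())
            if gain > 0 and gain / costs[j] > best_ratio:
                best, best_ratio = j, gain / costs[j]
        if best is None:
            raise ValueError("the corpus cannot satisfy the demand")
        chosen.append(best)
        candidates.remove(best)
        for f, c in contrib[best].items():
            if f in remaining:
                remaining[f] = max(0, remaining[f] - c)

    def covers(sel: List[int]) -> bool:
        return all(sum(contrib[k].get(f, 0) for k in sel) >= b for f, b in demand.items())

    # Splitting phase: try removing the most expensive sentences first.
    for j in sorted(chosen, key=lambda k: -costs[k]):
        rest = [k for k in chosen if k != j]
        if covers(rest):
            chosen = rest
    return chosen


def lagrangian_lower_bound(costs: List[float], contrib: List[Features], demand: Features,
                           iters: int = 200, step0: float = 1.0) -> float:
    """Subgradient ascent on the Lagrangian dual. Every L(u) with u >= 0 is a valid
    lower bound on the optimal cover cost; the best value seen is returned."""
    feats = list(demand)
    u = {f: 0.0 for f in feats}
    best = float("-inf")
    for t in range(iters):
        reduced = [c - sum(u[f] * contrib[j].get(f, 0) for f in feats) for j, c in enumerate(costs)]
        x = [1 if r < 0 else 0 for r in reduced]  # minimiser of the relaxed problem
        value = sum(u[f] * demand[f] for f in feats) + sum(min(0.0, r) for r in reduced)
        best = max(best, value)
        for f in feats:
            g = demand[f] - sum(contrib[j].get(f, 0) * x[j] for j in range(len(costs)))
            u[f] = max(0.0, u[f] + step0 / (t + 1) * g)  # diminishing step size
    return best
```

Comparing the cost of `greedy_cover`'s solution with `lagrangian_lower_bound` yields an optimality gap of the kind the abstract reports. The bound stays valid even when the subgradient iterations stop early.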

Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1839
Author(s):  
Broderick Crawford ◽  
Ricardo Soto ◽  
José Lemus-Romani ◽  
Marcelo Becerra-Rozas ◽  
José M. Lanza-Gutiérrez ◽  
...  

One of the central issues that must be resolved for a metaheuristic optimization process to work well is balancing exploration and exploitation. Metaheuristics (MH) that achieve this balance can be called balanced MH. A Q-Learning (QL) integration framework was previously proposed to select metaheuristic operators that promote this balance, in particular the binarization scheme used when a continuous metaheuristic solves a binary combinatorial problem. In this work, the framework is extended to other recent metaheuristics, demonstrating that integrating QL into operator selection improves the exploration-exploitation balance. Specifically, the Whale Optimization Algorithm and the Sine-Cosine Algorithm are tested on the Set Covering Problem, showing statistical improvements in this balance and in the quality of the solutions.
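
The abstract does not detail the framework. The following Python sketch is a hypothetical reading of it. Each action is a binarization scheme, formed by pairing an S- or V-shaped transfer function with a "standard" or "elitist" rule. A tabular Q-learning agent picks one action per state using an epsilon-greedy policy and the standard update Q(s,a) <- Q(s,a) + alpha * [r + gamma * max Q(s',.) - Q(s,a)]. The action set, the state definition (for example a discretized population-diversity level), the reward and all names are assumptions, not the authors' choices.

```python
import math
import random
from typing import List, Sequence, Tuple

Action = Tuple[str, str]  # (transfer function, binarization rule)


def s_shape(x: float) -> float:
    """S-shaped transfer function (logistic sigmoid), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def v_shape(x: float) -> float:
    """V-shaped transfer function |tanh(x)|."""
    return abs(math.tanh(x))


TRANSFER = {"S": s_shape, "V": v_shape}
# Hypothetical action set: every combination of transfer function and rule.
ACTIONS: List[Action] = [("S", "standard"), ("S", "elitist"), ("V", "standard"), ("V", "elitist")]


def binarize(x: Sequence[float], best_bits: Sequence[int], action: Action,
             rng: random.Random) -> List[int]:
    """Turn a continuous position into a 0/1 vector with the chosen scheme."""
    tf = TRANSFER[action[0]]
    bits = []
    for xd, bd in zip(x, best_bits):
        hit = rng.random() < tf(xd)
        if action[1] == "standard":
            bits.append(1 if hit else 0)
        else:  # "elitist": copy the best-known solution's bit, else 0
            bits.append(bd if hit else 0)
    return bits


class QSelector:
    """Tabular Q-learning over binarization schemes with an epsilon-greedy policy."""

    def __init__(self, n_states: int, n_actions: int, alpha: float = 0.1,
                 gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0) -> None:
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state: int) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[state]))  # explore
        row = self.q[state]
        return max(range(len(row)), key=row.__getitem__)  # exploit

    def update(self, state: int, action: int, reward: float, next_state: int) -> None:
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

Inside a WOA or SCA iteration, one would call `choose` before binarizing each solution. One would then call `update` with a reward such as +1 when the best cost improves and -1 otherwise. That reward design is illustrative only.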


Author(s):  
Er. Meenakshi ◽
Dr. Satpal

Today the internet stores a huge amount of data that users must sift through, which creates a problem for them; recommendation systems address this problem. A recommendation system helps a user find products and content by predicting the user's rating of each item and showing the items the user would likely rate highly. Recommendation systems are everywhere: with online shopping, customers have nearly infinite choices, and no one has enough time to try every product for sale. Recommendation systems play an important role in helping users find the products and content they care about. Recommendation is a process of filtering information that deals with the information-overload problem. Recommendation systems are important for both users and service providers: they reduce the transaction cost of finding and selecting items online, improve the quality of the decision-making process, and are now an effective means for providers to sell their products. Over-emphasizing the user, however, is not good for a recommendation system. To solve recommendation-system problems such as data sparsity, we use one of the best techniques, collaborative filtering.
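
The abstract does not say which collaborative-filtering variant is used. Below is a minimal user-based sketch in Python. It assumes ratings are stored as nested dicts and that similarity is cosine over co-rated items. A user's rating of an unseen item is predicted as the similarity-weighted average of the ratings given by the k most similar users who rated it. The names `predict`, `recommend` and `k` are hypothetical.

```python
import math
from typing import Dict, List, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity computed over the items both users rated."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0


def predict(ratings: Ratings, user: str, item: str, k: int = 20) -> Optional[float]:
    """Similarity-weighted average of the k most similar users' ratings of item."""
    neighbours = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    top = sorted((p for p in neighbours if p[0] > 0), reverse=True)[:k]
    total = sum(s for s, _ in top)
    if total == 0:
        return None  # nobody similar rated the item (the sparsity problem)
    return sum(s * x for s, x in top) / total


def recommend(ratings: Ratings, user: str, n: int = 5) -> List[str]:
    """Return up to n unseen items with the highest predicted rating."""
    unseen = {i for r in ratings.values() for i in r} - set(ratings[user])
    scored = []
    for item in unseen:
        p = predict(ratings, user, item)
        if p is not None:
            scored.append((p, item))
    scored.sort(reverse=True)
    return [item for _, item in scored[:n]]
```

Returning `None` when no similar user has rated the item makes the data-sparsity limitation visible rather than hiding it behind a default score.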


2019 ◽  
Vol 2019 ◽  
pp. 1-16 ◽  
Author(s):  
José García ◽  
Paola Moraga ◽  
Matias Valenzuela ◽  
Broderick Crawford ◽  
Ricardo Soto ◽  
...  

The integration of machine learning techniques and metaheuristic algorithms is an area of interest because of its great potential for applications. In particular, using these hybrid techniques to solve combinatorial optimization problems (COPs), improving solution quality and convergence times, is of great interest in operations research. In this article, the db-scan unsupervised learning technique is explored with the goal of using it in the binarization process of continuous swarm intelligence metaheuristic algorithms. The contribution of the db-scan operator to the binarization process is analyzed systematically through the design of random operators. Additionally, the behavior of this algorithm is studied and compared with other binarization methods based on clustering and transfer functions (TFs). To verify the results, the well-known set covering problem is addressed, and a real-world problem is solved. The results show that integrating the db-scan technique produces consistently better results, in both computation time and solution quality, than TFs and random operators. Furthermore, compared with other clustering techniques, it achieves significantly better convergence times.
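
The abstract only summarizes the db-scan binarization step. The Python sketch below is a hypothetical reading of it that proceeds in three steps:

1. Cluster the magnitudes of each dimension's continuous movement with a plain one-dimensional DBSCAN.
2. Rank the clusters by mean magnitude.
3. Flip each bit with a probability attached to its cluster's rank.

The probability list `probs`, the rule that noise points use the largest probability, the bit-flip rule and the function names are all assumptions, not the authors' operator.

```python
import random
from typing import List, Sequence

NOISE, UNVISITED = -1, -2


def dbscan_1d(values: Sequence[float], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN on scalars; returns a cluster id per point, -1 for noise."""
    n = len(values)
    labels = [UNVISITED] * n

    def region(i: int) -> List[int]:
        return [j for j in range(n) if abs(values[j] - values[i]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        seeds = region(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_samples:  # core point: keep expanding
                queue.extend(neighbours)
        cluster += 1
    return labels


def dbscan_binarize(bits: Sequence[int], velocity: Sequence[float], probs: Sequence[float],
                    eps: float, min_samples: int, rng: random.Random) -> List[int]:
    """Hypothetical operator: flip each bit with a probability tied to its cluster."""
    mags = [abs(v) for v in velocity]
    labels = dbscan_1d(mags, eps, min_samples)
    ids = {lab for lab in labels if lab != NOISE}
    mean = {c: sum(m for m, l in zip(mags, labels) if l == c) / labels.count(c) for c in ids}
    rank = {c: r for r, c in enumerate(sorted(ids, key=mean.__getitem__))}
    out = []
    for bit, lab in zip(bits, labels):
        # Larger movements get a higher rank and flip probability; noise uses the max.
        r = len(probs) - 1 if lab == NOISE else min(rank[lab], len(probs) - 1)
        out.append(1 - bit if rng.random() < probs[r] else bit)
    return out
```

A library implementation such as scikit-learn's `DBSCAN` could replace `dbscan_1d`. The hand-written version only keeps the sketch dependency-free.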


Author(s):  
Nur Maimun ◽  
Jihan Natassa ◽  
Wen Via Trisna ◽  
Yeye Supriatin

Accurate assignment of diagnosis codes is an important matter for medical recorders, and data quality is the most important concern of health information management in medical records. This study aims to assess coder competency with respect to the accuracy and precision of ICD 10 coding at X Hospital in Pekanbaru. It used a qualitative case-study method with five informants. The results show that medical personnel (doctors) had never received training in coding; doctors' handwriting was hard to read; diagnosis or procedure codes were sometimes not assigned; doctors used non-standard abbreviations; some officers still did not understand the nomenclature or master anatomy and pathology; and the facilities and infrastructure supported the accuracy and precision of the existing codes. Coding errors always occur because of human error. The accuracy and precision of coding strongly influence INA CBGs costs. The medical staff and the committee did most of the work for severity-level III cases, while the medical records unit monitored and evaluated coding implementation. When a resume is unclear, the case-mix team checks the required medical record file to confirm that the diagnoses or codes conform. Keywords: coder competency, accuracy and precision of coding, ICD 10


2017 ◽  
pp. 139-145
Author(s):  
R. I. Hamidullin ◽  
L. B. Senkevich

This study examines the quality of construction cost-estimate documentation at all stages of large projects in the oil and gas industry. The main problems that arise in construction organizations are identified. An analysis is performed to choose the most suitable mathematical-modeling methodology for the business process under study, with the aims of improving budget calculations, assessing the quality of estimates, and defining criteria for automating design estimates.


2015 ◽  
Vol 6 (1) ◽  
pp. 50-57
Author(s):  
Rizqa Raaiqa Bintana ◽  
Putri Aisyiyah Rakhma Devi ◽  
Umi Laili Yuhana

The quality of software can be measured by its return on investment (ROI). Factors that may affect ROI are tangible factors (such as cost) and intangible factors (such as the impact of the software on users or stakeholders). Factors of the software itself are assessed through reviewing, testing, process audits, and performance evaluation. This paper discusses ROI assessment criteria derived from the software and its users. These criteria indicate that the approach may support a rational consideration of all relevant criteria when evaluating software, and the paper shows examples of actual ROI models. An analysis is conducted of the assessment criteria that affect ROI: if meeting these criteria requires disproportionate effort, the software's ROI decreases. Index Terms - Assessment criteria, Quality assurance, Return on Investment, Software product
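
The abstract gives no formula. A minimal Python sketch follows. It assumes the common definition ROI = (gain - cost) / cost and assumes intangible benefits can be expressed as a monetized estimate. It shows how disproportionate assessment effort lowers ROI. The function names and all figures are hypothetical.

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment as a ratio: (gain - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


def software_roi(tangible_gain: float, intangible_gain: float,
                 build_cost: float, assessment_effort: float) -> float:
    """Hypothetical model: intangible impact is a monetized estimate, and the cost of
    reviews, testing, process audits and performance checks is added to the build cost."""
    return roi(tangible_gain + intangible_gain, build_cost + assessment_effort)


# Same benefits, growing assessment effort: ROI shrinks.
print(software_roi(150_000, 30_000, 100_000, 20_000))  # 0.5
print(software_roi(150_000, 30_000, 100_000, 80_000))  # 0.0
```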


1991 ◽  
Vol 24 (10) ◽  
pp. 269-276
Author(s):  
J. R. Lawrence ◽  
N. C. D. Craig

The public has ever-rising expectations for the environmental quality of the North Sea and hence for ever-reducing anthropogenic inputs; by implication, society must be willing to accept the cost of reduced contamination. The chemical industry accepts that it has an important part to play in meeting these expectations, but it is essential that proper scientific consideration is given to the potential transfer of contamination from one medium to another before changes are made. A strategy for North Sea protection is put forward as a set of seven principles that must govern the management decisions that are made. Some areas of uncertainty are identified as important research targets. It is concluded that although there have been many improvements over the last two decades, there is more to be done. A systematic and less emotive approach is required to continue the improvement process.


2020 ◽  
pp. 61-73
Author(s):  
Yu. M. Tsygalov

The forced switch of Russian universities to remote work during the COVID-19 pandemic has generated much discussion about the benefits of this new form of education. Initial results have been summarized and reported, and the reports show that the main goal of online education, preventing the spread of infection, has been achieved. Against this background, proposals and publications have appeared arguing for the effectiveness of a mass introduction of distance learning in Russia, including in higher education. However, the assessment of such training by the population and students, in publications and on social networks, was predominantly negative and showed that the emerging problems outweigh the possible benefits of the new educational technology. Based on an analysis of published materials and personal experience of teaching online, the potential benefits and problems of distance learning in Russian higher education are considered. It is proposed to consider the effects separately for the suppliers of the new technology (government, universities) and its consumers (students, teachers, society). It is argued that a mass introduction of online education would not only reduce the negative consequences of epidemics but also allow reductions in budgetary funding for universities, optimization of the age composition of teaching staff, and lower costs for maintaining educational buildings. However, the quality of education would be leveled (averaged out), and responsibility for the quality of training would shift from the state and universities to students. The critical shortcomings of online education are the low readiness of the digital infrastructure, the lack of a mechanism for identifying and monitoring students' work, information security problems, and the population's lack of trust in such training.
The mass use of online education creates a number of risks for the country, the most critical of which are the destruction of the higher education system and a drop in the effectiveness of personnel training. If this risk materializes, its consequences cannot be offset by any possible budget savings.

