Plural marking patterns of nouns and their associates in the world’s languages

2020 ◽  
Vol 44 (1) ◽  
pp. 231-269
Author(s):  
Rong Chen

Abstract Plural marking reaches most corners of languages. When a noun occurs with another linguistic element, which is called associate in this paper, plural marking on the two-component structure has four logically possible patterns: doubly unmarked, noun-marked, associate-marked and doubly marked. These four patterns do not distribute homogeneously in the world’s languages, because they are motivated by two competing motivations iconicity and economy. Some patterns are preferred over others, and this preference is consistently found in languages across the world. In other words, there exists a universal distribution of the four plural marking patterns. Furthermore, holding the view that plural marking on associates expresses plurality of nouns, I propose a hypothetical universal which uses the number of pluralized associates to predict plural marking on nouns. A data set collected from a sample of 100 languages is used to test the hypothetical universal, by employing the machine learning algorithm logistic regression.

Author(s):  
Kunal Parikh ◽  
Tanvi Makadia ◽  
Harshil Patel

Dengue is unquestionably one of the biggest health concerns in India and for many other developing countries. Unfortunately, many people have lost their lives because of it. Every year, approximately 390 million dengue infections occur around the world among which 500,000 people are seriously infected and 25,000 people have died annually. Many factors could cause dengue such as temperature, humidity, precipitation, inadequate public health, and many others. In this paper, we are proposing a method to perform predictive analytics on dengue’s dataset using KNN: a machine-learning algorithm. This analysis would help in the prediction of future cases and we could save the lives of many.


2022 ◽  
Vol 11 (1) ◽  
pp. 325-337
Author(s):  
Natalia Gil ◽  
Marcelo Albuquerque ◽  
Gabriela de

<p style="text-align: justify;">The article aims to develop a machine-learning algorithm that can predict student’s graduation in the Industrial Engineering course at the Federal University of Amazonas based on their performance data. The methodology makes use of an information package of 364 students with an admission period between 2007 and 2019, considering characteristics that can affect directly or indirectly in the graduation of each one, being: type of high school, number of semesters taken, grade-point average, lockouts, dropouts and course terminations. The data treatment considered the manual removal of several characteristics that did not add value to the output of the algorithm, resulting in a package composed of 2184 instances. Thus, the logistic regression, MLP and XGBoost models developed and compared could predict a binary output of graduation or non-graduation to each student using 30% of the dataset to test and 70% to train, so that was possible to identify a relationship between the six attributes explored and achieve, with the best model, 94.15% of accuracy on its predictions.</p>


A large volume of datasets is available in various fields that are stored to be somewhere which is called big data. Big Data healthcare has clinical data set of every patient records in huge amount and they are maintained by Electronic Health Records (EHR). More than 80 % of clinical data is the unstructured format and reposit in hundreds of forms. The challenges and demand for data storage, analysis is to handling large datasets in terms of efficiency and scalability. Hadoop Map reduces framework uses big data to store and operate any kinds of data speedily. It is not solely meant for storage system however conjointly a platform for information storage moreover as processing. It is scalable and fault-tolerant to the systems. Also, the prediction of the data sets is handled by machine learning algorithm. This work focuses on the Extreme Machine Learning algorithm (ELM) that can utilize the optimized way of finding a solution to find disease risk prediction by combining ELM with Cuckoo Search optimization-based Support Vector Machine (CS-SVM). The proposed work also considers the scalability and accuracy of big data models, thus the proposed algorithm greatly achieves the computing work and got good results in performance of both veracity and efficiency.


Author(s):  
Alexandre Todorov

The aim of the RELIEF algorithm is to filter out features (e.g., genes, environmental factors) that are relevant to a trait of interest, starting from a set of that may include thousands of irrelevant features. Though widely used in many fields, its application to the study of gene-environment interaction studies has been limited thus far. We provide here an overview of this machine learning algorithm and some of its variants. Using simulated data, we then compare of the performance of RELIEF to that of logistic regression for screening for gene-environment interactions in SNP data. Even though performance degrades in larger sets of markers, RELIEF remains a competitive alternative to logistic regression, and shows clear promise as a tool for the study of gene-environment interactions. Areas for further improvements of the algorithm are then suggested.


Author(s):  
Sercan Demirci ◽  
Durmuş Özkan Şahin ◽  
Ibrahim Halil Toprak

Skin cancer, which is one of the most common types of cancer in the world, is a malignant growth seen on the skin due to various reasons. There was an increase in the number of the cases of skin cancer nearly 200% between 2004-2009. Since the ozone layer is depleting, harmful rays reflected from the sun cannot be filtered. In this case, the likelihood of skin cancer will increase over the years and pose more risks for human beings. Early diagnosis is very significant as in all types of cancers. In this study, a mobile application is developed in order to detect whether the skin spots photographed by using the machine learning technique for early diagnosis have a suspicion of skin cancer. Thus, an auxiliary decision support system is developed that can be used both by the clinicians and individuals. For cases that are predicted to have a risk higher than a certain rate by the machine learning algorithm, early diagnosis could be initiated for the patients by consulting a physician when the case is considered to have a higher risk by machine learning algorithm.


2020 ◽  
Vol 17 (9) ◽  
pp. 4294-4298
Author(s):  
B. R. Sunil Kumar ◽  
B. S. Siddhartha ◽  
S. N. Shwetha ◽  
K. Arpitha

This paper intends to use distinct machine learning algorithms and exploring its multi-features. The primary advantage of machine learning is, a machine learning algorithm can predict its work automatically by learning what to do with information. This paper reveals the concept of machine learning and its algorithms which can be used for different applications such as health care, sentiment analysis and many more. Sometimes the programmers will get confused which algorithm to apply for their applications. This paper provides an idea related to the algorithm used on the basis of how accurately it fits. Based on the collected data, one of the algorithms can be selected based upon its pros and cons. By considering the data set, the base model is developed, trained and tested. Then the trained model is ready for prediction and can be deployed on the basis of feasibility.


Author(s):  
G. Keerthi Devipriya ◽  
E. Chandana ◽  
B. Prathyusha ◽  
T. Seshu Chakravarthy

Here by in this paper we are interested for classification of Images and Recognition. We expose the performance of training models by using a classifier algorithm and an API that contains set of images where we need to compare the uploaded image with the set of images available in the data set that we have taken. After identifying its respective category the image need to be placed in it. In order to classify images we are using a machine learning algorithm that comparing and placing the images.


2019 ◽  
Vol 8 (4) ◽  
pp. 2299-2302

Implementing a machine learning algorithm gives you a deep and practical appreciation for how the algorithm works. This knowledge can also help you to internalize the mathematical description of the algorithm by thinking of the vectors and matrices as arrays and the computational intuitions for the transformations on those structures. There are numerous micro-decisions required when implementing a machine learning algorithm, like Select programming language, Select Algorithm, Select Problem, Research Algorithm, Unit Test and these decisions are often missing from the formal algorithm descriptions. The notion of implementing a job recommendation (a classic machine learning problem) system using to two algorithms namely, KNN [3] and logistic regression [3] in more than one programming language (C++ and python) is introduced and we bring here the analysis and comparison of performance of each. We specifically focus on building a model for predictions of jobs in the field of computer sciences but they can be applied to a wide range of other areas as well. This paper can be used by implementers to deduce which language will best suite their needs to achieve accuracy along with efficiency We are using more than one algorithm to establish the fact that our finding is not just singularly applicable.


2021 ◽  
Vol 8 (3) ◽  
pp. 209-221
Author(s):  
Li-Li Wei ◽  
Yue-Shuai Pan ◽  
Yan Zhang ◽  
Kai Chen ◽  
Hao-Yu Wang ◽  
...  

Abstract Objective To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy. Methods This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who had attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected for analysis, and the collected indicators were retrospectively analyzed. Based on Python, the indicators were classified and modeled using a random forest regression algorithm, and the performance of the prediction model was analyzed. Results We obtained 4806 analyzable data from 1625 pregnant women. Among these, 3265 samples with all 67 indicators were used to establish data set F1; 4806 samples with 38 identical indicators were used to establish data set F2. Each of F1 and F2 was used for training the random forest algorithm. The overall predictive accuracy of the F1 model was 93.10%, area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy of GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%. The results thus showed that the F2 prediction model performed better than the F1 model. To explore the impact of sacrificial indicators on GDM prediction, the F3 data set was established using 3265 samples (F1) with 38 indicators (F2). After training, the overall predictive accuracy of the F3 model was 91.60%, AUC was 0.58, and the predictive accuracy of positive cases was 15.85%. Conclusions In this study, a model for predicting GDM with several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model exhibited a good performance and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, there are certain requirements for the proportions of negative and positive cases in sample data sets when the random forest algorithm is applied to the early prediction of GDM.


Sign in / Sign up

Export Citation Format

Share Document