Learning Predictors from Multidimensional Data with Tensor Factorizations

Soo Min Kwon; Anand D. Sarwate

doi:10.14713/arestyrurj.v1i3.165

Aresty Rutgers Undergraduate Research Journal ◽

10.14713/arestyrurj.v1i3.165 ◽

2021 ◽

Vol 1 (3) ◽

Author(s):

Soo Min Kwon ◽

Anand D. Sarwate

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Multidimensional Data ◽

Tensor Structure ◽

Statistical Machine Learning ◽

Machine Learning Classification ◽

Independent Variables ◽

Tensor Factorizations ◽

The Relationship

Statistical machine learning algorithms often involve learning a linear relationship between dependent and independent variables. This relationship is modeled as a vector of numerical values, commonly referred to as weights or predictors. These weights allow us to make predictions, and the quality of these weights influences the accuracy of our predictions. However, when the dependent variable inherently possesses a more complex, multidimensional structure, it becomes increasingly difficult to model the relationship with a vector. In this paper, we address this issue by investigating machine learning classification algorithms with multidimensional (tensor) structure. By imposing tensor factorizations on the predictors, we can better model the relationship, as the predictors would take the form of the data in question. We empirically show that our approach works more efficiently than the traditional machine learning method when the data possesses both an exact and an approximate tensor structure. Additionally, we show that estimating predictors with these factorizations also allows us to solve for fewer parameters, making computation more feasible for multidimensional data.
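The parameter saving mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a matrix-shaped (two-mode) predictor with a rank-R factorization W = sum_r a_r b_r^T, and the function names, sizes, and random data are all hypothetical choices made for illustration.

```python
import numpy as np

def full_weights(factors_a, factors_b):
    """Form W = sum_r a_r b_r^T (shape m x n) from the two factor matrices."""
    return factors_a @ factors_b.T

def score(x, factors_a, factors_b):
    """Linear score <W, X> for one m x n sample, computed without forming W."""
    # <sum_r a_r b_r^T, X> = sum_r a_r^T X b_r
    return float(np.sum((factors_a.T @ x) * factors_b.T))

def parameter_counts(m, n, rank):
    """Return (unstructured count, factorized count) for an m x n predictor."""
    return m * n, rank * (m + n)

rng = np.random.default_rng(0)
m, n, rank = 28, 28, 2           # illustrative sizes, not taken from the paper
A = rng.normal(size=(m, rank))   # hypothetical mode-1 factors (m x R)
B = rng.normal(size=(n, rank))   # hypothetical mode-2 factors (n x R)
X = rng.normal(size=(m, n))      # one matrix-shaped sample

# The factorized score should agree with the score from the dense weights.
dense_score = float(np.sum(full_weights(A, B) * X))
print(np.isclose(dense_score, score(X, A, B)))
print(parameter_counts(m, n, rank))
```

With these illustrative sizes, the unstructured predictor has 784 parameters and the rank-2 factorized one has 112, which is the kind of reduction the abstract refers to.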

Relevant Independent Variables on MOBA Video Games to Train Machine Learning Algorithms

10.24132/csrn.2021.3101.19 ◽

2021 ◽

Author(s):

Juan Guillermo López Guzmán ◽

Cesar Julio Bustacara Medina

Keyword(s):

Machine Learning ◽

Video Games ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Multidimensional Data ◽

Data Sets ◽

Network Architectures ◽

Independent Variables ◽

Learning Techniques ◽

Multidimensional Data Sets

The popularity of Multiplayer Online Battle Arena (MOBA) video games has grown considerably. In recent years, this popularity and the complexity of their gameplay have attracted researchers from various areas of knowledge, who have applied different machine learning techniques to these games. The papers reviewed mainly look for patterns in multidimensional data sets. However, this previous research does not present a way to select the independent variables (predictors) used to train the models. For this reason, this paper proposes a list of variables based on the techniques used and the objectives of the research, providing a set of variables for finding patterns in MOBA video games. To obtain this list, the consulted works were grouped by the machine learning techniques used, ranging from rule-based systems to complex neural network architectures. A second grouping is applied based on the objective of each study.

Restaurants Rating Prediction using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f3754.049620 ◽

2020 ◽

Vol 9 (6) ◽

pp. 466-469

Keyword(s):

Machine Learning ◽

Food Quality ◽

Average Cost ◽

Machine Learning Algorithms ◽

Regression Problem ◽

Independent Variable ◽

Quality Of Food ◽

Rating Prediction ◽

The Relationship

Restaurant rating has become the most commonly used parameter by which individuals judge a restaurant. A lot of research has been done on different restaurants and the quality of the food they serve. The rating of a restaurant depends on factors such as reviews, location, average cost for two people, votes, cuisines, and the type of restaurant. The aim of the project is to find the relationship between the dependent and independent variables. The proposed project is a machine learning regression problem that uses a restaurant rating dataset. Based on various attributes, such as the food quality, price, and ambience of the restaurant, it predicts the restaurant rating.

Algorithmic and human prediction of success in human collaboration from visual features

Scientific Reports ◽

10.1038/s41598-021-81145-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Martin Saveski ◽

Edmond Awad ◽

Iyad Rahwan ◽

Manuel Cebrian

Keyword(s):

Machine Learning ◽

Visual Cues ◽

Success Factors ◽

Group Performance ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Adventure Game ◽

Group Success ◽

The Relationship ◽

Better Than

Abstract As groups are increasingly taking over individual experts in many tasks, it is ever more important to understand the determinants of group success. In this paper, we study the patterns of group success in Escape The Room, a physical adventure game in which a group is tasked with escaping a maze by collectively solving a series of puzzles. We investigate (1) the characteristics of successful groups, and (2) how accurately humans and machines can spot them from a group photo. The relationship between these two questions is based on the hypothesis that the characteristics of successful groups are encoded by features that can be spotted in their photo. We analyze >43K group photos (one photo per group) taken after groups have completed the game, from which all explicit performance-signaling information has been removed. First, we find that groups that are larger, older and more gender but less age diverse are significantly more likely to escape. Second, we compare humans and off-the-shelf machine learning algorithms at predicting whether a group escaped or not based on the completion photo. We find that individual guesses by humans achieve 58.3% accuracy, better than random, but worse than machines which display 71.6% accuracy. When humans are trained to guess by observing only four labeled photos, their accuracy increases to 64%. However, training humans on more labeled examples (eight or twelve) leads to a slight, but statistically insignificant improvement in accuracy (67.4%). Humans in the best training condition perform on par with two, but worse than three out of the five machine learning algorithms we evaluated. Our work illustrates the potentials and the limitations of machine learning systems in evaluating group performance and identifying success factors based on sparse visual cues.

Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs

Animals ◽

10.3390/ani11010241 ◽

2021 ◽

Vol 11 (1) ◽

pp. 241

Author(s):

Dongwon Seo ◽

Sunghyun Cho ◽

Prabuddha Manjula ◽

Nuri Choi ◽

Young-Kuk Kim ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Fixation Index ◽

Machine Learning Classification ◽

Genetic Components ◽

Marker Combination ◽

A Genome ◽

Minimum Number ◽

Native Chickens

A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence with respect to the origin of the population. This would facilitate the protection of native genetic resources in the market of each country. In this study, a total of 283 samples from 20 lines, which consisted of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were analyzed to determine the optimal marker combination comprising the minimum number of markers, using a 600 k high-density single nucleotide polymorphism (SNP) array. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group for comparison with control chicken groups. In the processing of marker selection, a total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance and fixation index (Fst) values between the case and control groups, and they reduced the number of genetic components required, confirming that efficient classification of the groups was possible by using a small number of marker sets. In a verification study including additional chicken breeds and samples (12 lines and 182 samples), the accuracy did not significantly change, and the target chicken group could be clearly distinguished from the other populations. 
The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to determine the optimal marker combination with the minimum number of markers that can distinguish the target population among a large number of SNP markers.

Assessing automated CMR contouring algorithms using systematic contour quality scoring analysis

European Heart Journal - Cardiovascular Imaging ◽

10.1093/ehjci/jeaa356.434 ◽

2021 ◽

Vol 22 (Supplement_1) ◽

Author(s):

M Omer ◽

A Amir-Khalili ◽

A Sojoudi ◽

T Thao Le ◽

S A Cook ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Random Order ◽

Standard Operating Procedure ◽

Left Ventricular ◽

Quality Score ◽

Machine Learning Algorithms ◽

Dice Similarity Coefficient ◽

National Budget

Abstract

Funding Acknowledgements: Type of funding sources: Public grant(s) – National budget only. Main funding source(s): SmartHeart EPSRC programme grant (www.nihr.ac.uk), London Medical Imaging and AI Centre for Value-Based Healthcare.

Background: Quality measures for machine learning algorithms include clinical measures such as end-diastolic (ED) and end-systolic (ES) volume, volumetric overlaps such as Dice similarity coefficient, and surface distances such as Hausdorff distance. These measures capture differences between manually drawn and automated contours but fail to capture a clinician's trust in an automatically generated contour.

Purpose: We propose to directly capture clinicians’ trust in a systematic way. We display manual and automated contours sequentially in random order and ask the clinicians to score the contour quality. We then perform statistical analysis for both sources of contours and stratify results based on contour type.

Data: The data selected for this experiment came from the National Health Center Singapore. It constitutes CMR scans from 313 patients with diverse pathologies including: healthy, dilated cardiomyopathy (DCM), hypertension (HTN), hypertrophic cardiomyopathy (HCM), ischemic heart disease (IHD), left ventricular non-compaction (LVNC), and myocarditis. Each study contains a short axis (SAX) stack, with ED and ES phases manually annotated. Automated contours are generated for each SAX image for which manual annotation is available. For this, a machine learning algorithm trained at Circle Cardiovascular Imaging Inc. is applied and the resulting predictions are saved to be displayed in the contour quality scoring (CQS) application.

Methods: The CQS application displays manual and automated contours in a random order and presents the user an option to assign a contour quality score (1: Unacceptable, 2: Bad, 3: Fair, 4: Good). The UK Biobank standard operating procedure is used for assessing the quality of the contoured images. Quality scores are assigned based on how the contour affects clinical outcomes. However, as images are presented independent of spatiotemporal context, contour quality is assessed based on how well the area of the delineated structure is approximated. Consequently, small contours and small deviations are rarely assigned a quality score of less than 2, as they are not clinically relevant. Special attention is given to the RV-endo contours as often, mostly in basal images, two separate contours appear. In such cases, a score of 3 is given if the two disjoint contours sufficiently encompass the underlying anatomy; otherwise they are scored as 2 or 1.

Results: A total of 50991 quality scores (24208 manual and 26783 automated) are generated by five expert raters. The mean scores for all manual and automated contours are 3.77 ± 0.48 and 3.77 ± 0.52, respectively. The breakdown of mean quality scores by contour type is included in Fig. 1a, while the distribution of quality scores for various raters is shown in Fig. 1b.

Conclusion: We proposed a method of comparing the quality of manual versus automated contouring methods. Results suggest similar statistics in quality scores for both sources of contours.

Abstract Figure 1
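For readers unfamiliar with the volumetric overlap measure named in the Background, the Dice similarity coefficient between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition only; it is not the study's pipeline, and the function name, the toy masks, and the convention for two empty masks are assumptions made for illustration.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A and B| / (|A| + |B|) for two boolean masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total

# Toy masks standing in for a manual and an automated contour region.
manual = np.zeros((6, 6), dtype=bool)
manual[1:5, 1:5] = True        # 16 pixels
automated = np.zeros((6, 6), dtype=bool)
automated[2:6, 1:5] = True     # 16 pixels, shifted one row down

print(dice_coefficient(manual, automated))  # 12 shared pixels -> 24 / 32 = 0.75
```

A score like this measures pixel overlap only, which is the limitation the abstract points to: it does not say whether a clinician would accept the contour.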

PSIX-15 Assessment of machine learning algorithms for prediction of Aleutian disease in American mink

Journal of Animal Science ◽

10.1093/jas/skab235.484 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 264-265

Author(s):

Duy Ngoc Do ◽

Guoyu Hu ◽

Younes Miar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Models ◽

American Mink ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Enzyme Linked Immunosorbent Assay ◽

Linear Discriminant ◽

Machine Learning Classification

Abstract American mink (Neovison vison) is the major source of fur for the fur industry worldwide, and Aleutian disease (AD) is causing severe financial losses to the mink industry. Different methods have been used to diagnose AD in mink, but a combination of several methods may be the most appropriate approach for the selection of AD-resilient mink. The iodine agglutination test (IAT) and counterimmunoelectrophoresis (CIEP) methods are commonly employed in a test-and-remove strategy; meanwhile, the enzyme-linked immunosorbent assay (ELISA) and packed-cell volume (PCV) methods are complementary. However, using multiple methods is expensive, which hinders the correct use of AD tests in selection. This research presents an assessment of AD classification based on machine learning algorithms. Aleutian disease was tested on 1,830 individuals using these tests in an AD-positive mink farm (Canadian Centre for Fur Animal Research, NS, Canada). The accuracy of classification for CIEP was evaluated based on the sex information and the IAT, ELISA, and PCV test results, implemented in seven machine learning classification algorithms (Random Forest, Artificial Neural Networks, C50Tree, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) using the Caret package in R. The accuracy of prediction varied among the methods. Overall, Random Forest was the best-performing algorithm for the current dataset, with an accuracy of 0.89 in the training data and 0.94 in the testing data. Our work demonstrated the utility and relative ease of using machine learning algorithms to assess the CIEP information, consequently reducing the cost of AD tests. However, further work requires the inclusion of production and reproduction information in the models and the extension of phenotypic collection to increase the accuracy of current methods.

Prediction on water quality of a lake in Chennai, India using machine learning algorithms

10.5004/dwt.2021.26970 ◽

2021 ◽

Vol 218 ◽

pp. 44-51

Author(s):

D. Venkata Vara Prasad ◽

Lokeswari Y. Venkataramana ◽

P. Senthil Kumar ◽

G. Prasannamedha ◽

K. Soumya ◽

...

Keyword(s):

Machine Learning ◽

Water Quality ◽

Learning Algorithms ◽

Machine Learning Algorithms

A Review On The Relationship Between Human And Artificial Imagination With Their Implementations In Current Machine Learning Algorithms

10.1109/icccnt51525.2021.9579860 ◽

2021 ◽

Author(s):

Md. Abdullah-Al-Kafi ◽

Md. Fahad Hossain ◽

Sheak Rashed Haider Noori

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

The Relationship

A Data-Driven Approach to Determine the Single Droplet Post-Impingement Pattern on a Dry Wall Using Statistical Machine Learning Classification Methods

10.4271/2021-01-0552 ◽

2021 ◽

Author(s):

Jiachen Zhai ◽

Seong-Young Lee

Keyword(s):

Machine Learning ◽

Data Driven ◽

Classification Methods ◽

Single Droplet ◽

Statistical Machine Learning ◽

Machine Learning Classification ◽

Data Driven Approach

Identification of Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs

10.21203/rs.3.rs-95706/v1 ◽

2020 ◽

Author(s):

Dongwon Seo ◽

Sunghyun Cho ◽

Prabuddha Manjula ◽

Nuri Choi ◽

Young Kuk Kim ◽

...

Keyword(s):

Machine Learning ◽

Snp Array ◽

Machine Learning Algorithms ◽

Case Group ◽

Machine Learning Classification ◽

Genetic Components ◽

Native Chicken ◽

Marker Combination ◽

A Genome ◽

Minimum Number

Abstract

Background: A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence with respect to the origin of the population. This would also facilitate the protection of genetic resources, especially in developing countries.

Methods: In this study, a total of 283 samples from 20 lines, consisting of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were used to find the minimum number of marker combinations using a 600k high-density single nucleotide polymorphism (SNP) array. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group from control chicken groups. To verify the selected markers, a total of 182 samples from 12 lines were used to confirm the change in the accuracy of target chicken breed identification.

Results: A total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance between the case and control groups and reduced the number of genetic components, confirming that efficient classification of the groups was possible using a small number of marker sets. In a verification study including additional chicken breeds and samples, the accuracy did not significantly change, and the target chicken group could be clearly distinguished from the other populations.

Conclusions: The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to explore the minimum combination of markers that can distinguish varieties among a large number of SNP markers.
