Machine Learning Technique
Recently Published Documents





2022 ◽  
Vol 54 (8) ◽  
pp. 1-36
Shubhra Kanti Karmaker (“Santu”) ◽  
Md. Mahadi Hassan ◽  
Micah J. Smith ◽  
Lei Xu ◽  
Chengxiang Zhai ◽  

As big data becomes ubiquitous across domains, and more and more stakeholders aspire to make the most of their data, demand for machine learning tools has spurred researchers to explore the possibilities of automated machine learning (AutoML). AutoML tools aim to make machine learning accessible for non-machine learning experts (domain experts), to improve the efficiency of machine learning, and to accelerate machine learning research. But although automation and efficiency are among AutoML’s main selling points, the process still requires human involvement at a number of vital steps, including understanding the attributes of domain-specific data, defining prediction problems, creating a suitable training dataset, and selecting a promising machine learning technique. These steps often require a prolonged back-and-forth that makes this process inefficient for domain experts and data scientists alike and keeps so-called AutoML systems from being truly automatic. In this review article, we introduce a new classification system for AutoML systems, using a seven-tiered schematic to distinguish these systems based on their level of autonomy. We begin by describing what an end-to-end machine learning pipeline actually looks like, and which subtasks of the machine learning pipeline have been automated so far. We highlight those subtasks that are still done manually—generally by a data scientist—and explain how this limits domain experts’ access to machine learning. Next, we introduce our novel level-based taxonomy for AutoML systems and define each level according to the scope of automation support provided. Finally, we lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline and discussing important challenges that stand in the way of this ambitious goal.

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Brady Lund ◽  
Jinxuan Ma

Purpose: This literature review explores the definitions and characteristics of cluster analysis, a machine-learning technique that is frequently used to identify groupings in big datasets, and its applicability to library and information science (LIS) research. This overview is intended for researchers who are interested in expanding their data analysis repertoire to include cluster analysis, rather than for existing experts in this area. Design/methodology/approach: A review is performed of LIS articles in the Library and Information Source (EBSCO) database that employ cluster analysis. The review presents an overview of cluster analysis in general (how it works from a statistical standpoint, and how researchers can perform it), the most popular cluster analysis techniques, and the uses of cluster analysis in LIS. Findings: The number of LIS studies that employ a cluster analytic approach has grown from about 5 per year in the early 2000s to an average of 35 studies per year in the mid- and late-2010s. The journal Scientometrics has published the most LIS articles that use cluster analysis (102 studies). Scientometrics is also the most common subject area to employ a cluster analytic approach (152 studies). The findings of this review indicate that cluster analysis could make LIS research more accessible by providing an innovative and insightful process of knowledge discovery. Originality/value: This review is the first to present cluster analysis as an accessible data analysis approach, specifically from an LIS perspective.
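As a rough illustration of how one popular cluster analysis technique works, the sketch below implements k-means (Lloyd's algorithm) in plain Python. The abstract does not say which techniques the review covers in detail, so the choice of k-means, the function names, and the toy data are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means sketch; points is a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct observations as starting centroids
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids

# Hypothetical toy data: two well-separated groups of observations.
data = [(0, 0), (0, 1), (1, 0), (20, 20), (20, 21), (21, 20)]
labels, centroids = kmeans(data, k=2)
```

In practice a researcher would more likely call an established statistics package than write this loop by hand; the sketch is only meant to show the assign-then-update cycle.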

Atmosphere ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 1312
Bogdan Bochenek ◽  
Zbigniew Ustrnul ◽  
Agnieszka Wypych ◽  
Danuta Kubacka

Extreme weather phenomena such as wind gusts, heavy precipitation, hail, thunderstorms, tornadoes, and many others usually occur when there is a change in air mass and the passing of a weather front over a certain region. The climatology of weather fronts is difficult, since they are usually drawn onto maps manually by forecasters; therefore, the data concerning them are limited and the process itself is very subjective in nature. In this article, we propose an objective method for determining the position of weather fronts based on the random forest machine learning technique, digitized fronts from the DWD database, and ERA5 meteorological reanalysis. Several aspects leading to the improvement of scores are presented, such as adding new fields or dates to the training database or using the gradients of fields.
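The abstract mentions that using the gradients of fields improved scores. The sketch below shows one way a gradient-magnitude feature could be derived from a gridded field with finite differences. The grid layout (a list of rows), the uniform spacing parameters, and the function name are assumptions for illustration, not the authors' actual preprocessing.

```python
import math

def gradient_magnitude(field, dx=1.0, dy=1.0):
    """Return |grad f| on a regular grid (needs at least 2 points per axis).

    Uses central differences in the interior and one-sided differences
    at the edges. field[j][i] is the value at row j (y) and column i (x).
    """
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            i0, i1 = max(i - 1, 0), min(i + 1, nx - 1)
            j0, j1 = max(j - 1, 0), min(j + 1, ny - 1)
            dfdx = (field[j][i1] - field[j][i0]) / ((i1 - i0) * dx)
            dfdy = (field[j1][i] - field[j0][i]) / ((j1 - j0) * dy)
            out[j][i] = math.hypot(dfdx, dfdy)
    return out

# Hypothetical temperature field that rises by 2 units per grid step in x.
temperature = [[2.0 * i for i in range(4)] for _ in range(3)]
grad = gradient_magnitude(temperature)
```

A sharp frontal zone would show up as a band of large values in such a feature, which is presumably why gradients help a classifier, although the abstract does not give details.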

2021 ◽  
Vol 3 ◽  
Leonid Kholkine ◽  
Thomas Servotte ◽  
Arie-Willem de Leeuw ◽  
Tom De Schepper ◽  
Peter Hellinckx ◽  

Professional road cycling is a very competitive sport, and many factors influence the outcome of the race. These factors can be internal (e.g., psychological preparedness, physiological profile of the rider, and the preparedness or fitness of the rider) or external (e.g., the weather or strategy of the team) to the rider, or even completely unpredictable (e.g., crashes or mechanical failure). This variety makes perfectly predicting the outcome of a certain race an impossible task and the sport even more interesting. Nonetheless, before each race, journalists, ex-pro cyclists, websites and cycling fans try to predict the possible top 3, 5, or 10 riders. In this article, we use easily accessible data on road cycling from the past 20 years and the Machine Learning technique Learn-to-Rank (LtR) to predict the top 10 contenders for 1-day road cycling races. We accomplish this by mapping a relevancy weight to the finishing place in the first 10 positions. We assess the performance of this approach on 2018, 2019, and 2021 editions of six spring classic 1-day races. In the end, we compare the output of the framework with a mass fan prediction on the Normalized Discounted Cumulative Gain (NDCG) metric and the number of correct top 10 guesses. We found that our model, on average, has slightly higher performance on both metrics than the mass fan prediction. We also analyze which variables of our model have the most influence on the prediction of each race. This approach can give interesting insights to fans before a race but can also be helpful to sports coaches to predict how a rider might perform compared to other riders outside of the team.
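The two evaluation measures named above can be sketched as follows. The relevance mapping used here (10 for first place down to 1 for tenth, 0 otherwise) is a hypothetical choice, since the abstract does not state the actual weights; the rider names and function names are likewise placeholders.

```python
import math

def relevance_from_finish(place, top_n=10):
    """Hypothetical weights: 1st -> top_n, ..., top_n-th -> 1, otherwise 0."""
    if place is None or place < 1 or place > top_n:
        return 0
    return top_n - place + 1

def dcg(relevances):
    """Discounted cumulative gain of a relevance list ordered by predicted rank."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg_at_k(predicted_order, actual_place, k=10):
    """NDCG@k for a predicted ranking, given a rider -> finishing place mapping."""
    gains = [relevance_from_finish(actual_place.get(r)) for r in predicted_order[:k]]
    ideal = sorted(
        (relevance_from_finish(p) for p in actual_place.values()), reverse=True
    )[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def correct_top_k(predicted_order, actual_place, k=10):
    """Number of predicted top-k riders who actually finished in the top k."""
    actual_top = {r for r, p in actual_place.items() if 1 <= p <= k}
    return len(set(predicted_order[:k]) & actual_top)

# Placeholder riders and result.
result = {"rider_a": 1, "rider_b": 2, "rider_c": 3}
score = ndcg_at_k(["rider_b", "rider_a", "rider_c"], result)
hits = correct_top_k(["rider_b", "rider_a", "rider_x"], result)
```

NDCG rewards placing the strongest finishers near the top of the predicted list, whereas the count of correct guesses ignores order, so the two measures can disagree.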

2021 ◽  
Abdulmalik Ibragimov ◽  
Andrey Kan

Fields whose production is constrained by the gas-handling capacity of surface facilities must optimize well rates by reducing the gas-oil ratio (GOR) of field production. The best way to reduce GOR at field level is not to close the wells with the highest GOR but to choke back wells where gas breakthrough has occurred and the well's GOR is rate dependent [1]. In this paper, the authors model and analyze wells with gas breakthrough in single-porosity and dual-porosity sector models. The analysis showed that single-porosity models underestimate the severity of gas breakthrough and fail to predict the rate-dependent GOR of a well in the field. In addition, a machine-learning technique was applied to the sector model to develop an empirical equation that estimates the rate-dependent GOR of a well; this equation can then be used in field-level production optimization to reach maximum liquid production under gas-processing constraints.

2021 ◽  
Vol 10 (12) ◽  
pp. e444101220561
Jonathan Wagner de Medeiros ◽  
Anthony José da Cunha Carneiro Lins ◽  
Oluwarotimi Williams Samuel ◽  
Elker Lene Santos de Lima ◽  
Maria Luiza Tabosa de Carvalho Galvão ◽  

Introduction: Burkitt lymphoma belongs to the group of non-Hodgkin lymphomas. Although it is curable in 80% of cases at less advanced stages, it presents at advanced stages in about 75% of cases in Brazil’s Northeast region, requiring urgent and intensive care in the early stages of treatment. Objectives: This study therefore aimed to verify the participation of MBL2 gene polymorphisms in the development of Burkitt lymphoma. Methods: Computational approaches based on machine learning were used: we implemented the Random Forest and KMeans algorithms to classify patterns of individuals diagnosed with the disease and thereby differentiate them from healthy individuals. A group of 56 patients aged 0 to 18 years with Burkitt lymphoma, from a reference hospital for the treatment of childhood cancer, was evaluated together with a control group of 150 samples, all of which were tested for polymorphisms in exon 1 and in the -221 and -550 regions of the MBL2 gene. Results: First, an unsupervised classification was performed, which identified two as the number of groups that best represents the data in our database and reached 72.81% accuracy in separating patients from controls. Then a supervised classification was performed, in which the classifier obtained a 70.97% success rate; with cross-validation, the best GridSearch configuration reached 75% accuracy. Conclusion: It was not yet possible to draw conclusions about the participation of the evaluated polymorphisms in the development of BL; however, the computational techniques used proved very promising for studies of this nature.
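One plausible reading of the accuracy reported for the unsupervised step is the agreement between the two cluster assignments and the patient/control labels. Because cluster numbers are arbitrary, such a score has to be taken under the better of the two possible cluster-to-class mappings. The sketch below assumes that reading; the function name and the toy data are hypothetical, and the abstract does not describe how the authors computed their figure.

```python
def two_cluster_accuracy(cluster_ids, true_labels):
    """Agreement between a 2-cluster assignment and binary labels (both 0/1).

    Cluster IDs carry no meaning of their own, so the score is computed
    for both possible mappings and the higher one is returned.
    """
    if len(cluster_ids) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(c == t for c, t in zip(cluster_ids, true_labels))
    n = len(true_labels)
    return max(agree, n - agree) / n

# Hypothetical toy example: 1 = patient, 0 = control.
clusters = [0, 0, 1, 1, 1]
labels = [1, 1, 0, 0, 1]
accuracy = two_cluster_accuracy(clusters, labels)
```

With unbalanced groups such as 56 patients and 150 controls, a score of this kind should be compared against the majority-class share before it is interpreted.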

2021 ◽  
Vol 42 (9) ◽  
pp. e1286-e1292
Hajime Koyama ◽  
Anjin Mori ◽  
Daisuke Nagatomi ◽  
Takeshi Fujita ◽  
Kazuya Saito ◽  

Michael Shmoish ◽  
Alina German ◽  
Nurit Devir ◽  
Anna Hecht ◽  
Gary Butler ◽  
