Learning Technique
Recently Published Documents





2022 ◽  
Vol 54 (8) ◽  
pp. 1-36
Shubhra Kanti Karmaker (“Santu”) ◽  
Md. Mahadi Hassan ◽  
Micah J. Smith ◽  
Lei Xu ◽  
Chengxiang Zhai ◽  

As big data becomes ubiquitous across domains, and more and more stakeholders aspire to make the most of their data, demand for machine learning tools has spurred researchers to explore the possibilities of automated machine learning (AutoML). AutoML tools aim to make machine learning accessible for non-machine learning experts (domain experts), to improve the efficiency of machine learning, and to accelerate machine learning research. But although automation and efficiency are among AutoML’s main selling points, the process still requires human involvement at a number of vital steps, including understanding the attributes of domain-specific data, defining prediction problems, creating a suitable training dataset, and selecting a promising machine learning technique. These steps often require a prolonged back-and-forth that makes this process inefficient for domain experts and data scientists alike and keeps so-called AutoML systems from being truly automatic. In this review article, we introduce a new classification system for AutoML systems, using a seven-tiered schematic to distinguish these systems based on their level of autonomy. We begin by describing what an end-to-end machine learning pipeline actually looks like, and which subtasks of the machine learning pipeline have been automated so far. We highlight those subtasks that are still done manually—generally by a data scientist—and explain how this limits domain experts’ access to machine learning. Next, we introduce our novel level-based taxonomy for AutoML systems and define each level according to the scope of automation support provided. Finally, we lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline and discussing important challenges that stand in the way of this ambitious goal.

2021 ◽  
Vol 10 (10) ◽  
pp. 697
Massimiliano Pepe ◽  
Domenica Costantino ◽  
Vincenzo Saverio Alfio ◽  
Gabriele Vozza ◽  
Elena Cartellino

The aim of the paper is to identify a suitable method for constructing a 3D city model from stereo satellite imagery. To reach this goal, a workflow consisting of three main steps is needed: (1) increasing the geometric resolution of the color images through pan-sharpening techniques, (2) identifying building footprints through deep-learning techniques and, finally, (3) building an algorithm in GIS (Geographic Information System) to extract building elevations. The developed method was applied to stereo imagery acquired by WorldView-2 (WV-2), a commercial Earth-observation satellite. A comparison of the different pan-sharpening techniques showed that the Gram–Schmidt method provided better-quality color images than the other techniques examined; this result was deduced from both visual analysis of the orthophotos and analysis of quality indices (RMSE, RASE and ERGAS). Subsequently, a deep-learning technique was applied to the pan-sharpened image to extract building footprints. Performance indices (precision, recall, overall accuracy and the F1 measure) showed high accuracy in the automatic recognition of buildings. Finally, starting from the Digital Surface Model (DSM) generated from the satellite imagery, an algorithm built in the GIS environment extracted building heights from the elevation model. In this way, a 3D city model in which buildings are represented as prismatic solids with flat roofs could be built quickly and precisely.
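The abstract ranks pan-sharpening methods with RMSE, RASE and ERGAS. As a rough guide to what these indices measure, the sketch below implements their commonly cited definitions in plain Python; it is not the authors' code. Bands are assumed to be flattened lists of co-registered pixel values, and `ratio` is assumed to be the panchromatic-to-multispectral pixel-size ratio (for example 0.25 for a 1:4 resolution ratio). For all three indices, lower values mean the fused image is closer to the reference.

```python
import math
from typing import List, Sequence

Band = Sequence[float]  # one spectral band, flattened to a list of pixel values


def rmse(reference: Band, fused: Band) -> float:
    """Root-mean-square error between a reference band and a fused band."""
    if len(reference) != len(fused) or not reference:
        raise ValueError("bands must be non-empty and the same size")
    squared = sum((r - f) ** 2 for r, f in zip(reference, fused))
    return math.sqrt(squared / len(reference))


def rase(ref_bands: List[Band], fused_bands: List[Band]) -> float:
    """Relative average spectral error, in percent.

    M is the mean pixel value over all N reference bands.
    """
    n = len(ref_bands)
    m = sum(sum(b) / len(b) for b in ref_bands) / n
    mean_sq = sum(rmse(r, f) ** 2 for r, f in zip(ref_bands, fused_bands)) / n
    return 100.0 / m * math.sqrt(mean_sq)


def ergas(ref_bands: List[Band], fused_bands: List[Band], ratio: float) -> float:
    """Erreur Relative Globale Adimensionnelle de Synthèse.

    ratio: panchromatic pixel size divided by multispectral pixel size (h / l),
    an assumed input.
    """
    n = len(ref_bands)
    total = 0.0
    for r, f in zip(ref_bands, fused_bands):
        band_mean = sum(r) / len(r)
        total += (rmse(r, f) / band_mean) ** 2  # per-band relative error, squared
    return 100.0 * ratio * math.sqrt(total / n)


if __name__ == "__main__":
    reference = [[100, 100, 100, 100], [200, 200, 200, 200]]  # hypothetical bands
    fused = [[110, 90, 110, 90], [200, 200, 200, 200]]
    print(rmse(reference[0], fused[0]))             # 10.0
    print(round(rase(reference, fused), 3))          # 4.714
    print(round(ergas(reference, fused, 0.25), 3))   # 1.768
```

In this hypothetical example only the first band deviates from the reference, so the second band contributes nothing to either aggregate index.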

Chaolun Ma ◽  
Yongxin Peng ◽  
Lingtao Wu ◽  
Xiaoyu Guo ◽  
Xiubin Wang ◽  

Distraction occurs when a driver’s attention is diverted from driving to a secondary task. The number of distraction-affected crashes has been increasing in recent years, so accurately predicting them is critical for roadway agencies seeking to reduce distracted driving behaviors and the resulting crashes. Emerging phone-use data and machine learning techniques are increasingly available to safety researchers and could improve the prediction of distraction-affected crashes. This study therefore first examines whether phone-use events provide essential information about distraction-affected crashes. The authors apply a machine learning technique (XGBoost) under two scenarios, with and without phone-use events, and compare its performance with two conventional statistical models: a logistic regression model and a mixed-effects logistic regression model. The comparison demonstrates the superiority of XGBoost over logistic regression on a high-dimensional, unbalanced dataset. The study further uses SHAP (SHapley Additive exPlanations) to interpret the results, analyze the importance of individual features related to distraction-affected crashes, and test its ability to improve prediction accuracy. The trained XGBoost model achieves a sensitivity of 91.59%, a specificity of 85.92%, and an accuracy of 88.72%. The XGBoost and SHAP results suggest that (1) phone-use information is an important factor associated with the occurrence of distraction-affected crashes, and (2) distraction-affected crashes are more likely to occur on roadway segments with higher exposure (i.e., length and traffic volume), uneven traffic flow conditions, or medium truck volume.
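The reported sensitivity, specificity and accuracy all come from a binary confusion matrix. The sketch below shows the standard definitions using hypothetical counts, not the study's data. It also encodes the identity accuracy = prevalence × sensitivity + (1 − prevalence) × specificity, which is why, on an unbalanced dataset like the one described, accuracy alone can hide weak performance on the minority class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier; 'positive' = distraction-affected crash."""
    tp: int  # positives correctly predicted as positive
    fn: int  # positives missed by the model
    tn: int  # negatives correctly predicted as negative
    fp: int  # negatives wrongly flagged as positive


def sensitivity(c: ConfusionCounts) -> float:
    """True-positive rate: share of actual positives that are detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True-negative rate: share of actual negatives correctly rejected."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def accuracy_from_rates(sens: float, spec: float, prevalence: float) -> float:
    """Accuracy as a prevalence-weighted average of sensitivity and specificity."""
    return prevalence * sens + (1.0 - prevalence) * spec


if __name__ == "__main__":
    counts = ConfusionCounts(tp=90, fn=10, tn=80, fp=20)  # hypothetical counts
    print(sensitivity(counts), specificity(counts), accuracy(counts))  # 0.9 0.8 0.85
```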

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Brady Lund ◽  
Jinxuan Ma

Purpose: This literature review explores the definitions and characteristics of cluster analysis, a machine-learning technique frequently used to identify groupings in big datasets, and its applicability to library and information science (LIS) research. The overview is intended for researchers who want to expand their data analysis repertoire to include cluster analysis, rather than for existing experts in this area.
Design/methodology/approach: A review is performed of LIS articles in the Library and Information Source (EBSCO) database that employ cluster analysis. The review presents an overview of cluster analysis in general (how it works from a statistical standpoint and how researchers can perform it), the most popular cluster analysis techniques, and the uses of cluster analysis in LIS.
Findings: The number of LIS studies that employ a cluster analytic approach has grown from about 5 per year in the early 2000s to an average of 35 per year in the mid- and late 2010s. The journal Scientometrics has published the most LIS articles that use cluster analysis (102 studies), and scientometrics is the subject area that most commonly employs a cluster analytic approach (152 studies). The findings indicate that cluster analysis could make LIS research more accessible by providing an innovative and insightful process of knowledge discovery.
Originality/value: This review is the first to present cluster analysis as an accessible data analysis approach specifically from an LIS perspective.
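The review discusses cluster analysis in general terms without committing to one algorithm. As a concrete illustration, the sketch below implements k-means (Lloyd's algorithm), one widely used clustering technique, in plain Python. The choice of k-means, the initialization from randomly sampled input points, and the function names are assumptions for illustration, not methods taken from the review.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(members: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(members[0])
    return tuple(sum(p[i] for p in members) / len(members) for i in range(dim))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means: returns a cluster label per point and the centroids."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)  # k input points as seeds
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = _centroid(members)
    return labels, centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, centres = kmeans(pts, k=2)
    print(labels, centres)
```

For the two well-separated hypothetical groups above, the first three points end up sharing one label and the last three the other; which group receives label 0 depends on the random seed.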

2021 ◽  
Lei Tong ◽  
Adam Corrigan ◽  
Navin Rathna Kumar ◽  
Kerry Hallbrook ◽  
Jonathon Orme ◽  

Cell line authentication is important in the biomedical field to ensure that researchers are not working with misidentified cells. Short tandem repeat profiling is the gold-standard method, but it has limitations, including being expensive and time-consuming. Deep neural networks have achieved great success in analyzing cellular images cost-effectively. However, because centralized datasets are lacking, whether cell line authentication can be replaced or supported by cell image classification remains an open question. Moreover, the relationship between incubation time and cellular images has not been explored in previous studies. In this study, we automated cell line authentication using deep learning analysis of brightfield cell line images. We proposed a novel multi-task framework that identifies cell lines from cell images and simultaneously predicts how long the cell lines have been incubated. Using data from thirty cell lines in the AstraZeneca Cell Bank, we demonstrated that the proposed method identifies cell lines from brightfield images with 99.8% accuracy and predicts incubation durations with a coefficient of determination (R²) of 0.927. Because new cell lines are continually added to the AstraZeneca Cell Bank, we integrated transfer learning into the proposed system to handle data from new cell lines not included in the pre-trained model. The method achieved a sensitivity of 97.7% and a specificity of 95.8% in detecting 14 new cell lines. These results demonstrate that the proposed framework can effectively identify cell lines from brightfield images.
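The incubation-time prediction is scored with the coefficient of determination. A minimal sketch of the standard definition, R² = 1 − SS_res / SS_tot, follows; the incubation times used below are hypothetical values, not data from the study.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need at least two paired observations")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares about the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    hours_true = [24, 48, 72, 96]  # hypothetical incubation times (hours)
    hours_pred = [26, 46, 70, 98]  # hypothetical model predictions
    print(round(r2_score(hours_true, hours_pred), 4))  # 0.9944
```

A value of 1 means perfect prediction, 0 means the model does no better than always predicting the mean, and negative values indicate a model worse than that baseline.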

Kelen C. Teixeira Vivaldini ◽  
Gustavo Franco Barbosa ◽  
Igor Araujo Dias Santos ◽  
Pedro H. C. Kim ◽  
Grayson McMichael ◽  

G. Suseendran ◽  
D. Akila ◽  
Hannah Vijaykumar ◽  
T. Nusrat Jabeen ◽  
R. Nirmala ◽  

PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258361
Ashit Kumar Dutta

In recent years, advances in Internet and cloud technologies have led to a significant increase in electronic commerce, in which consumers make online purchases and transactions. This growth has also increased unauthorized access to users’ sensitive information and damage to enterprise resources. Phishing is one of the most common attacks: it tricks users into accessing malicious content and then harvests their information. In terms of website interface and uniform resource locator (URL), most phishing webpages look identical to the legitimate webpages they imitate. Various strategies for detecting phishing websites, such as blacklist-based and heuristic approaches, have been suggested. However, because existing security technologies are inefficient, the number of victims has increased exponentially, and the anonymous, uncontrollable framework of the Internet leaves it highly vulnerable to phishing attacks. Existing research shows that the performance of phishing detection systems is limited, so an intelligent technique is needed to protect users from cyber-attacks. In this study, the author proposes a URL detection technique based on machine learning, in which a recurrent neural network is employed to detect phishing URLs. The author evaluated the proposed method on 7,900 malicious and 5,800 legitimate sites. The experimental results show that the proposed method outperforms recent approaches to malicious URL detection.

Asad Khattak ◽  
Muhammad Zubair Asghar ◽  
Mushtaq Ali ◽  
Ulfat Batool

2021 ◽  
Vol 12 ◽  
Qian Su ◽  
Rui Zhao ◽  
ShuoWen Wang ◽  
HaoYang Tu ◽  
Xing Guo ◽  

Strategies to diagnose patients and predict neurological recovery in cervical spondylotic myelopathy (CSM) using MR images of the cervical spine are currently urgently needed. This study therefore aimed to explore potential preoperative brain biomarkers for diagnosing CSM and predicting neurological recovery, using functional connectivity (FC) analysis of resting-state functional MRI (rs-fMRI) data. Participants from two independent datasets, a total of 53 patients with CSM and 47 age- and sex-matched healthy controls (HCs), underwent rs-fMRI scanning, with patients scanned preoperatively. FC was calculated from the automated anatomical labeling (AAL) template and used as features for machine learning analysis. Three analyses were then performed, each within and across datasets: classification of CSM patients versus healthy adults using a support vector machine (SVM), prediction of preoperative neurological function in CSM patients via support vector regression (SVR), and prediction of neurological recovery in CSM patients via SVR. CSM patients were distinguished from HCs with high classification accuracy (84.2% for dataset 1, 95.2% for dataset 2, and 73.0% for cross-site validation). Furthermore, rs-FC combined with SVR successfully predicted neurological recovery in CSM patients, and the cross-site validation analyses showed good reproducibility and generalization across the two datasets. These findings provide preliminary evidence toward novel strategies for predicting neurological recovery in CSM patients using rs-fMRI and machine learning techniques.
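The abstract says FC computed over the AAL template served as machine learning features but does not give the exact construction. A common convention, assumed here rather than taken from the paper, is to correlate the mean rs-fMRI time series of every pair of template regions (Pearson's r) and flatten the upper triangle of the resulting matrix into one feature vector per subject, which would then be passed to an SVM or SVR. With R regions, the vector has R(R − 1)/2 entries. The function names and inputs below are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length time series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


def fc_feature_vector(region_series: List[Sequence[float]]) -> List[float]:
    """Upper triangle (i < j) of the region-by-region correlation matrix.

    region_series[i] is the mean BOLD time series of region i (hypothetical input).
    """
    return [pearson_r(region_series[i], region_series[j])
            for i, j in combinations(range(len(region_series)), 2)]


if __name__ == "__main__":
    # Three hypothetical regions, four time points each.
    series = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
    print([round(r, 6) for r in fc_feature_vector(series)])  # [1.0, -1.0, -1.0]
```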
