Image Sensors for Wave Monitoring in Shore Protection: Characterization through a Machine Learning Algorithm

Waves propagating on the water surface can be considered as propagating in a dispersive medium, where gravity and surface tension at the air–water interface act as restoring forces. The velocity at which energy is transported in water waves is defined by the group velocity. The paper reports the use of video-camera observations to study the impact of water waves on an urban shore. The video-monitoring system consists of two separate cameras equipped with progressive RGB CMOS sensors that allow 1080p HDTV video recording. The sensing system delivers video signals that are processed by a machine learning technique. The scope of the research is to identify features of water waves that cannot be normally observed. First, a conventional modelling was performed using data delivered by image sensors together with additional data such as temperature, and wind speed, measured with dedicated sensors. Stealth waves are detected, as are the inverting phenomena encompassed in waves. This latter phenomenon can be detected only through machine learning. This double approach allows us to prevent extreme events that can take place in offshore and onshore areas.

Download Full-text

Machine-learning annotation of human splicing branchpoints

10.1101/094003 ◽

2016 ◽

Cited By ~ 3

Author(s):

Bethany Signal ◽

Brian S Gloss ◽

Marcel E Dinger ◽

Timothy R Mercer

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Gene Splicing ◽

Genetic Encoding ◽

Genome Wide ◽

Common Genetic Variants ◽

A Genome ◽

Wide Scale ◽

The Impact ◽

Splicing Patterns

ABSTRACTBackgroundThe branchpoint element is required for the first lariat-forming reaction in splicing. However due to difficulty in experimentally mapping at a genome-wide scale, current catalogues are incomplete.ResultsWe have developed a machine-learning algorithm trained with empirical human branchpoint annotations to identify branchpoint elements from primary genome sequence alone. Using this approach, we can accurately locate branchpoints elements in 85% of introns in current gene annotations. Consistent with branchpoints as basal genetic elements, we find our annotation is unbiased towards gene type and expression levels. A major fraction of introns was found to encode multiple branchpoints raising the prospect that mutational redundancy is encoded in key genes. We also confirmed all deleterious branchpoint mutations annotated in clinical variant databases, and further identified thousands of clinical and common genetic variants with similar predicted effects.ConclusionsWe propose the broad annotation of branchpoints constitutes a valuable resource for further investigations into the genetic encoding of splicing patterns, and interpreting the impact of common- and disease-causing human genetic variation on gene splicing.

Download Full-text

Analyzing the Impact of Climate Factors on GNSS-Derived Displacements by Combining the Extended Helmert Transformation and XGboost Machine Learning Algorithm

Journal of Sensors ◽

10.1155/2021/9926442 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Hanlin Liu ◽

Linqiang Yang ◽

Linchao Li

Keyword(s):

Machine Learning ◽

Puerto Rico ◽

Reference Frame ◽

Learning Algorithm ◽

Virgin Islands ◽

Machine Learning Algorithm ◽

Climate Factors ◽

Helmert Transformation ◽

The Impact

A variety of climate factors influence the precision of the long-term Global Navigation Satellite System (GNSS) monitoring data. To precisely analyze the effect of different climate factors on long-term GNSS monitoring records, this study combines the extended seven-parameter Helmert transformation and a machine learning algorithm named Extreme Gradient boosting (XGboost) to establish a hybrid model. We established a local-scale reference frame called stable Puerto Rico and Virgin Islands reference frame of 2019 (PRVI19) using ten continuously operating long-term GNSS sites located in the rigid portion of the Puerto Rico and Virgin Islands (PRVI) microplate. The stability of PRVI19 is approximately 0.4 mm/year and 0.5 mm/year in the horizontal and vertical directions, respectively. The stable reference frame PRVI19 can avoid the risk of bias due to long-term plate motions when studying localized ground deformation. Furthermore, we applied the XGBoost algorithm to the postprocessed long-term GNSS records and daily climate data to train the model. We quantitatively evaluated the importance of various daily climate factors on the GNSS time series. The results show that wind is the most influential factor with a unit-less index of 0.013. Notably, we used the model with climate and GNSS records to predict the GNSS-derived displacements. The results show that the predicted displacements have a slightly lower root mean square error compared to the fitted results using spline method (prediction: 0.22 versus fitted: 0.31). It indicates that the proposed model considering the climate records has the appropriate predict results for long-term GNSS monitoring.

Download Full-text

Quantifying changes in bicycle volumes using crowdsourced data

Environment and Planning B Urban Analytics and City Science ◽

10.1177/23998083211066103 ◽

2022 ◽

pp. 239980832110661

Author(s):

Ali Al-Ramini ◽

Mohammad A Takallou ◽

Daniel P Piatkowski ◽

Fadi Alsaleem

Keyword(s):

Machine Learning ◽

The United States ◽

Crowdsourced Data ◽

Machine Learning Approach ◽

Bicycle Infrastructure ◽

The Difference ◽

Infrastructure Investments ◽

Using Data ◽

The Impact ◽

The City

Most cities in the United States lack comprehensive or connected bicycle infrastructure; therefore, inexpensive and easy-to-implement solutions for connecting existing bicycle infrastructure are increasingly being employed. Signage is one of the promising solutions. However, the necessary data for evaluating its effect on cycling ridership is lacking. To overcome this challenge, this study tests the potential of using readily-available crowdsourced data in concert with machine-learning methods to provide insight into signage intervention effectiveness. We do this by assessing a natural experiment to identify the potential effects of adding or replacing signage within existing bicycle infrastructure in 2019 in the city of Omaha, Nebraska. Specifically, we first visually compare cycling traffic changes in 2019 to those from the previous two years (2017–2018) using data extracted from the Strava fitness app. Then, we use a new three-step machine-learning approach to quantify the impact of signage while controlling for weather, demographics, and street characteristics. The steps are as follows: Step 1 (modeling and validation) build and train a model from the available 2017 crowdsourced data (i.e., Strava, Census, and weather) that accurately predicts the cycling traffic data for any street within the study area in 2018; Step 2 (prediction) use the model from Step 1 to predict bicycle traffic in 2019 while assuming new signage was not added; Step 3 (impact evaluation) use the difference in prediction from actual traffic in 2019 as evidence of the likely impact of signage. While our work does not demonstrate causality, it does demonstrate an inexpensive method, using readily-available data, to identify changing trends in bicycling over the same time that new infrastructure investments are being added.

Download Full-text

Impact of Encoding of High Cardinality Categorical Data to Solve Prediction Problems

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9044 ◽

2020 ◽

Vol 17 (9) ◽

pp. 4197-4201

Author(s):

Heena Gupta ◽

V. Asha

Keyword(s):

Machine Learning ◽

Categorical Data ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Prediction Problem ◽

Encoding Scheme ◽

The Impact ◽

Prediction Problems

The prediction problem in any domain is very important to assess the prices and preferences among people. This issue varies for different kinds of data. Data may be nominal or ordinal, it may involve more categories or less. For any category to be considered by a machine learning algorithm, it needs to be encoded before any other operation can be further performed. There are various encoding schemes available like label encoding, count encoding and one hot encoding. This paper aims to understand the impact of various encoding schemes and the accuracy among the prediction problems of high cardinality categorical data. The paper also proposes an encoding scheme based on curated strings. The domain chosen for this purpose is predicting doctors’ fees in various cities having different profiles and qualification.

Download Full-text

A Daily Covid-19 Cases Prediction System using Data Mining and Machine Learning Algorithm

10.5121/csit.2021.112320 ◽

2021 ◽

Author(s):

Yiqi Jack Gao ◽

Yu Sun

Keyword(s):

Machine Learning ◽

Random Forest ◽

Hospital Admissions ◽

Polynomial Regression ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Policy Makers ◽

Diverse Range ◽

Using Data

The start of 2020 marked the beginning of the deadly COVID-19 pandemic caused by the novel SARS-COV-2 from Wuhan, China. As of the time of writing, the virus had infected over 150 million people worldwide and resulted in more than 3.5 million global deaths. Accurate future predictions made through machine learning algorithms can be very useful as a guide for hospitals and policy makers to make adequate preparations and enact effective policies to combat the pandemic. This paper carries out a two pronged approach to analyzing COVID-19. First, the model utilizes the feature significance of random forest regressor to select eight of the most significant predictors (date, new tests, weekly hospital admissions, population density, total tests, total deaths, location, and total cases) for predicting daily increases of Covid-19 cases, highlighting potential target areas in order to achieve efficient pandemic responses. Then it utilizes machine learning algorithms such as linear regression, polynomial regression, and random forest regression to make accurate predictions of daily COVID-19 cases using a combination of this diverse range of predictors and proved to be competent at generating predictions with reasonable accuracy.

Download Full-text

A Machine-Learning Algorithm for Estimating and Ranking the Impact of Environmental Risk Factors in Exploratory Epidemiological Studies

Statistical Modeling for Biological Systems ◽

10.1007/978-3-030-34675-1_8 ◽

2020 ◽

pp. 137-156

Author(s):

Jessica G. Young ◽

Alan E. Hubbard ◽

Brenda Eskenazi ◽

Nicholas P. Jewell

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Environmental Risk ◽

Learning Algorithm ◽

Epidemiological Studies ◽

Environmental Risk Factors ◽

Machine Learning Algorithm ◽

The Impact

Download Full-text

EQUIPMENT CONDITION MODELING BASED ON RANDOM FOREST AND ARIMA MACHINE LEARNING ALGORITHM STACKING

Cherepovets State University Bulletin ◽

10.23859/1994-0637-2020-4-97-3 ◽

2020 ◽

Vol 4 (97) ◽

pp. 32-40

Author(s):

EVGENY V. ERSHOV ◽

OLGA V. YUDINA ◽

LYUDMILA N. VINOGRADOVA ◽

NIKITA I. SHAKHANOV

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithm ◽

Block Diagram ◽

Approaches To Learning ◽

Machine Learning Algorithm ◽

Industrial Equipment ◽

Condition Modeling ◽

Simulation Results ◽

Using Data

The article discusses algorithms for constructing predicative models of the industrial equipment condition using data analysis and machine learning. The model is based on Random Forest (RF) and ARIMA (AR) algorithms. The authors consider approaches to learning algorithms and optimizing parameters. A block diagram of a time series predictive model applying stacking is presented, as well as an assessment of the simulation results.

Download Full-text

Machine Learning Prediction of Stroke Mechanism in Embolic Strokes of Undetermined Source

Stroke ◽

10.1161/strokeaha.120.029305 ◽

2020 ◽

Vol 51 (9) ◽

Cited By ~ 1

Author(s):

Hooman Kamel ◽

Babak B. Navi ◽

Neal S. Parikh ◽

Alexander E. Merkler ◽

Peter M. Okin ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Cross Validation ◽

Learning Algorithm ◽

Random Search ◽

Model Performance ◽

Area Under The Curve ◽

Independent Set ◽

Predicted Probability ◽

Using Data

Background and Purpose: One-fifth of ischemic strokes are embolic strokes of undetermined source (ESUS). Their theoretical causes can be classified as cardioembolic versus noncardioembolic. This distinction has important implications, but the categories’ proportions are unknown. Methods: Using data from the Cornell Acute Stroke Academic Registry, we trained a machine-learning algorithm to distinguish cardioembolic versus non-cardioembolic strokes, then applied the algorithm to ESUS cases to determine the predicted proportion with an occult cardioembolic source. A panel of neurologists adjudicated stroke etiologies using standard criteria. We trained a machine learning classifier using data on demographics, comorbidities, vitals, laboratory results, and echocardiograms. An ensemble predictive method including L1 regularization, gradient-boosted decision tree ensemble (XGBoost), random forests, and multivariate adaptive splines was used. Random search and cross-validation were used to tune hyperparameters. Model performance was assessed using cross-validation among cases of known etiology. We applied the final algorithm to an independent set of ESUS cases to determine the predicted mechanism (cardioembolic or not). To assess our classifier’s validity, we correlated the predicted probability of a cardioembolic source with the eventual post-ESUS diagnosis of atrial fibrillation. Results: Among 1083 strokes with known etiologies, our classifier distinguished cardioembolic versus noncardioembolic cases with excellent accuracy (area under the curve, 0.85). Applied to 580 ESUS cases, the classifier predicted that 44% (95% credibility interval, 39%–49%) resulted from cardiac embolism. Individual ESUS patients’ predicted likelihood of cardiac embolism was associated with eventual atrial fibrillation detection (OR per 10% increase, 1.27 [95% CI, 1.03–1.57]; c-statistic, 0.68 [95% CI, 0.58–0.78]). ESUS patients with high predicted probability of cardiac embolism were older and had more coronary and peripheral vascular disease, lower ejection fractions, larger left atria, lower blood pressures, and higher creatinine levels. Conclusions: A machine learning estimator that distinguished known cardioembolic versus noncardioembolic strokes indirectly estimated that 44% of ESUS cases were cardioembolic.

Download Full-text

Conversion Uplift in E-Commerce: A Systematic Benchmark of Modeling Strategies

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622019500172 ◽

2019 ◽

Vol 18 (03) ◽

pp. 747-791 ◽

Cited By ~ 3

Author(s):

Robin Gubela ◽

Artem Bequé ◽

Stefan Lessmann ◽

Fabian Gebert

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Model Performance ◽

Predictive Performance ◽

Academic Disciplines ◽

Response Models ◽

Uplift Modeling ◽

Online Retailers ◽

Using Data ◽

Modeling Strategy

Uplift modeling combines machine learning and experimental strategies to estimate the differential effect of a treatment on individuals’ behavior. The paper considers uplift models in the scope of marketing campaign targeting. Literature on uplift modeling strategies is fragmented across academic disciplines and lacks an overarching empirical comparison. Using data from online retailers, we fill this gap and contribute to literature through consolidating prior work on uplift modeling and systematically comparing the predictive performance and utility of available uplift modeling strategies. Our empirical study includes three experiments in which we examine the interaction between an uplift modeling strategy and the underlying machine learning algorithm to implement the strategy, quantify model performance in terms of business value and demonstrate the advantages of uplift models over response models, which are widely used in marketing. The results facilitate making specific recommendations how to deploy uplift models in e-commerce applications.

Download Full-text

Day-Ahead Electricity Price And Spike Forecasting Using Machine Learning Techniques

10.32920/ryerson.14649129.v1 ◽

2021 ◽

Author(s):

Harmanjot Singh Sandhu

Keyword(s):

Neural Network ◽

Machine Learning ◽

Electricity Markets ◽

Electricity Market ◽

Numerical Experiments ◽

Status Report ◽

Electricity Prices ◽

Forecasting Accuracy ◽

Using Data ◽

The Impact

Various machine learning-based methods and techniques are developed for forecasting day-ahead electricity prices and spikes in deregulated electricity markets. The wholesale electricity market in the Province of Ontario, Canada, which is one of the most volatile electricity markets in the world, is utilized as the case market to test and apply the methods developed. Factors affecting electricity prices and spikes are identified by using literature review, correlation tests, and data mining techniques. Forecasted prices can be utilized by market participants in deregulated electricity markets, including generators, consumers, and market operators. A novel methodology is developed to forecast day-ahead electricity prices and spikes. Prices are predicted by a neural network called the base model first and the forecasted prices are classified into the normal and spike prices using a threshold calculated from the previous year’s prices. The base model is trained using information from similar days and similar price days for a selected number of training days. The spike prices are re-forecasted by another neural network. Three spike forecasting neural networks are created to test the impact of input features. The overall forecasting is obtained by combining the results from the base model and a spike forecaster. Extensive numerical experiments are carried out using data from the Ontario electricity market, showing significant improvements in the forecasting accuracy in terms of various error measures. The performance of the methodology developed is further enhanced by improving the base model and one of the spike forecasters. The base model is improved by using multi-set canonical correlation analysis (MCCA), a popular technique used in data fusion, to select the optimal numbers of training days, similar days, and similar price days and by numerical experiments to determine the optimal number of neurons in the hidden layer. The spike forecaster is enhanced by having additional inputs including the predicted supply cushion, mined from information publicly available from the Ontario electricity market’s day-ahead System Status Report. The enhanced models are employed to conduct numerical experiments using data from the Ontario electricity market, which demonstrate significant improvements for forecasting accuracy.

Download Full-text