Music Genre Classifier using Machine Learning Algorithms

Author(s):  
Sparsh Nagpal

Audio data extraction and analysis are important yet less explored than other forms of data analysis. Here we use an audio dataset (GTZAN) to extract musical information and categorize the musical genre based on parameters of the audio. We compared seven machine learning algorithms and tested them on unseen user data to assess model performance, with the algorithms' accuracy ranging from 45% to 87%.
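
For illustration, the sketch below shows one way such a genre classifier could be assembled: per-track MFCC summary features extracted with librosa and fed to a random forest from scikit-learn. The GTZAN-style folder layout, the feature choice, and the single classifier are assumptions made for this example, not the authors' exact pipeline.

```python
# Minimal sketch, assuming a GTZAN-style layout "genres/<genre>/<track>.wav"
# and that librosa and scikit-learn are installed. Illustrative only.
import glob, os
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def extract_features(path, sr=22050):
    """Summarise a track as the mean/std of its MFCCs (one common audio representation)."""
    y, sr = librosa.load(path, sr=sr, duration=30)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

X, y = [], []
for path in glob.glob("genres/*/*.wav"):                  # hypothetical dataset location
    X.append(extract_features(path))
    y.append(os.path.basename(os.path.dirname(path)))     # genre = parent folder name

X_train, X_test, y_train, y_test = train_test_split(np.array(X), y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```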

2021
Vol 42 (Supplement_1)
Author(s):
H Lea
E Hutchinson
A Meeson
S Nampally
G Dennis
...  

Abstract

Background and introduction: Accurate identification of clinical outcome events is critical to obtaining reliable results in cardiovascular outcomes trials (CVOTs). Current processes for event adjudication are expensive and hampered by delays. As part of a larger project to more reliably identify outcomes, we evaluated the use of machine learning to automate event adjudication using data from the SOCRATES trial (NCT01994720), a large randomized trial comparing ticagrelor and aspirin in reducing the risk of major cardiovascular events after acute ischemic stroke or transient ischemic attack (TIA).

Purpose: We studied whether machine learning algorithms could replicate the outcome of the expert adjudication process for clinical events of ischemic stroke and TIA. Could classification models be trained on historical CVOT data and demonstrate performance comparable to human adjudicators?

Methods: Using data from the SOCRATES trial, multiple machine learning algorithms were tested using grid search and cross-validation. Models tested included Support Vector Machines, Random Forest and XGBoost. Performance was assessed on a validation subset of the adjudication data not used for training or testing in model development. Metrics used to evaluate model performance were the Receiver Operating Characteristic (ROC), Matthews Correlation Coefficient, Precision and Recall. The contribution of features (attributes of the data used by the algorithm as it is trained to classify an event) was examined using both Mutual Information and Recursive Feature Elimination.

Results: Classification models were trained on historical CVOT data using the adjudicator consensus decision as the ground truth. Best performance was observed for models trained to classify ischemic stroke (ROC 0.95) and TIA (ROC 0.97). Top-ranked features contributing to the classification of ischemic stroke or TIA corresponded to the site investigator decision or to variables used to define the event in the trial charter, such as duration of symptoms. Model performance was comparable across the different machine learning algorithms tested, with XGBoost demonstrating the best ROC on the validation set for correctly classifying both stroke and TIA.

Conclusions: Our results indicate that machine learning may augment or even replace clinician adjudication in clinical trials, with the potential to gain efficiencies, speed up clinical development, and retain reliability. Our current models demonstrate good performance at binary classification of ischemic stroke and TIA within a single CVOT, with high consistency and accuracy between automated and clinician adjudication. Further work will focus on harmonizing features between multiple historical clinical trials and training models to classify several different endpoint events across trials. Our aim is to utilize these clinical trial datasets to optimize the delivery of CVOTs in further cardiovascular drug development.

Funding acknowledgement: Type of funding sources: Private company. Main funding source(s): AstraZeneca Plc
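
The abstract includes no code, but the model-selection loop it describes (grid search with cross-validation, then evaluation of ROC AUC, Matthews correlation, precision and recall on a held-out validation subset) could look roughly like the sketch below. The trial data are not public, so a synthetic, class-imbalanced dataset and a single random forest stand in for the adjudication data and the algorithm suite.

```python
# Hedged sketch of grid search + cross-validation followed by held-out validation.
# Synthetic placeholder data; not the SOCRATES adjudication dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, matthews_corrcoef, precision_score, recall_score

X, y = make_classification(n_samples=2000, n_features=40, weights=[0.9, 0.1], random_state=0)
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [200, 500], "max_depth": [None, 10]},
    scoring="roc_auc", cv=5,
)
grid.fit(X_dev, y_dev)

pred = grid.predict(X_val)
proba = grid.predict_proba(X_val)[:, 1]
print("ROC AUC  :", roc_auc_score(y_val, proba))
print("MCC      :", matthews_corrcoef(y_val, pred))
print("Precision:", precision_score(y_val, pred))
print("Recall   :", recall_score(y_val, pred))
```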


2021
Author(s):
Ali Sakhaee
Anika Gebauer
Mareike Ließ
Axel Don

Abstract. Soil organic carbon (SOC), as the largest terrestrial carbon pool, has the potential to influence climate change and its mitigation, and consequently SOC monitoring is important within the frameworks of different international treaties. There is therefore a need for high-resolution SOC maps. Machine learning (ML) offers new opportunities to produce them because of its capability for data mining of large datasets. The aim of this study was therefore to test three algorithms commonly used in digital soil mapping, random forest (RF), boosted regression trees (BRT) and support vector machine for regression (SVR), on the first German Agricultural Soil Inventory to model agricultural topsoil SOC content. Nested cross-validation was implemented for model evaluation and parameter tuning. Moreover, grid search and a differential evolution algorithm were applied to ensure that each algorithm was tuned and optimised suitably. The SOC content of the German Agricultural Soil Inventory was highly variable, ranging from 4 g kg−1 to 480 g kg−1. However, only 4 % of all soils contained more than 87 g kg−1 SOC and were considered organic or degraded organic soils. The results show that SVR provided the best performance, with an RMSE of 32 g kg−1, when the algorithms were trained on the full dataset. However, the average RMSE of all algorithms decreased by 34 % when mineral and organic soils were modelled separately, with the best result from SVR with an RMSE of 21 g kg−1. Model performance is often limited by the size and quality of the soil dataset available for calibration and validation. Therefore, the impact of enlarging the training data was tested by including 1223 data points from the European Land Use/Land Cover Area Frame Survey for agricultural sites in Germany. Model performance improved by at most 1 % for mineral soils and 2 % for organic soils. Despite the capability of machine learning algorithms in general, and of SVR in particular, for modelling SOC at the national scale, the study showed that the most important step for improving model performance was the separate modelling of mineral and organic soils.
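
A minimal sketch of the nested cross-validation scheme mentioned above, assuming scikit-learn and synthetic regression data in place of the (non-public) soil inventory: an inner grid search tunes the SVR hyper-parameters, and an outer loop reports RMSE on folds the tuning never saw.

```python
# Nested cross-validation sketch for SVR; synthetic data, illustrative hyper-parameter grid.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)   # tuning folds
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)   # evaluation folds

svr = make_pipeline(StandardScaler(), SVR())
tuned = GridSearchCV(svr, {"svr__C": [1, 10, 100], "svr__epsilon": [0.1, 1.0]},
                     scoring="neg_root_mean_squared_error", cv=inner_cv)

outer_scores = cross_val_score(tuned, X, y, scoring="neg_root_mean_squared_error", cv=outer_cv)
print("nested-CV RMSE: %.1f +/- %.1f" % (-outer_scores.mean(), outer_scores.std()))
```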


2021
Vol 29 (Supplement_1)
pp. i18-i18
Author(s):
N Hassan
R Slight
D Weiand
A Vellinga
G Morgan
...  

Abstract

Introduction: Sepsis is a life-threatening condition that is associated with increased mortality. Artificial intelligence tools can inform clinical decision making by flagging patients who may be at risk of developing infection and subsequent sepsis, and can assist clinicians with their care management.

Aim: To identify the optimal set of predictors used to train machine learning algorithms to predict the likelihood of an infection and subsequent sepsis and inform clinical decision making.

Methods: This systematic review was registered in the PROSPERO database (CRD42020158685). We searched three large databases, Medline, the Cumulative Index of Nursing and Allied Health Literature, and Embase, using appropriate search terms. We included quantitative primary research studies that focused on sepsis prediction associated with bacterial infection in adult populations (>18 years) in all care settings and that included data on the predictors used to develop machine learning algorithms. The search covered 1 January 2000 to 25 November 2019. Data extraction was performed using a data extraction sheet, and a narrative synthesis of eligible studies was undertaken. Narrative analysis was used to arrange the data into key areas and to compare and contrast the content of the included studies. Quality assessment was performed using the Newcastle-Ottawa Quality Assessment scale, which was used to evaluate the quality of non-randomised studies. Bias was not assessed due to the non-randomised nature of the included studies.

Results: Fifteen articles met our inclusion criteria (Figure 1). We identified 194 predictors that were used to train machine learning algorithms to predict infection and subsequent sepsis, with 13 predictors used on average across all included studies. The most significant predictors included age, gender, smoking, alcohol intake, heart rate, blood pressure, lactate level, cardiovascular disease, endocrine disease, cancer, chronic kidney disease (eGFR <60 ml/min), white blood cell count, liver dysfunction, surgical approach (open or minimally invasive), and pre-operative haematocrit <30%. These predictors were used in the development of all the algorithms in the fifteen articles. All included studies used artificial intelligence techniques to predict the likelihood of sepsis, with an average sensitivity of 77.5±19.27 and an average specificity of 69.45±21.25.

Conclusion: The type of predictors used was found to influence the predictive power and predictive timeframe of the developed machine learning algorithms. Two strengths of our review were that we included studies published since the first definition of sepsis was published in 2001 and that we identified factors that can improve the predictive ability of algorithms. However, we note that the included studies had some limitations: three studies did not validate the models they developed, and many tools were limited by reduced specificity or sensitivity or both. This work has important implications for practice, as predicting the likelihood of sepsis can help inform the management of patients and concentrate finite resources on those patients who are most at risk. Producing a set of predictors can also guide future studies in developing more sensitive and specific algorithms with an increased predictive time window to allow for preventive clinical measures.


2022
pp. 320-336
Author(s):  
Asiye Bilgili

Health informatics is an interdisciplinary field spanning the computer and health sciences. Health informatics, which enables the effective use of medical information, has the potential to reduce both the cost of healthcare and the burden on healthcare workers during the pandemic. Using the machine learning algorithms support vector machines, naive Bayes, k-nearest neighbor, and C4.5, a model performance evaluation was carried out to identify the algorithm with the highest performance for predicting the disease. Three separate training and test datasets were created with 70%-30%, 75%-25%, and 80%-20% splits, respectively. The implementation phase of the study followed the CRISP-DM steps, and the analyses were performed in the R language. Examination of the model performance evaluation criteria shows that the C4.5 algorithm performed best with the 70% training dataset.
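
The evaluation design described above (several fixed train/test splits, four classifiers compared on each) could be sketched as follows. The study itself used R; scikit-learn is used here purely for illustration, with DecisionTreeClassifier standing in for C4.5 (which scikit-learn does not implement exactly) and a synthetic dataset in place of the study data.

```python
# Compare four classifiers across 70/30, 75/25 and 80/20 train/test splits (sketch only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
models = {"SVM": SVC(), "NaiveBayes": GaussianNB(), "kNN": KNeighborsClassifier(),
          "C4.5-like tree": DecisionTreeClassifier(criterion="entropy")}

for test_size in (0.30, 0.25, 0.20):                     # 70/30, 75/25, 80/20 splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=0)
    for name, model in models.items():
        acc = accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
        print(f"split {1-test_size:.0%}/{test_size:.0%}  {name}: {acc:.3f}")
```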


2021
Author(s):
Nuno Moniz
Susana Barbosa

The Dansgaard-Oeschger (DO) events are one of the most striking examples of abrupt climate change in the Earth's history, representing temperature oscillations of about 8 to 16 degrees Celsius within a few decades. DO events have been studied extensively in paleoclimatic records, particularly in ice core proxies. Examples include the Greenland NGRIP record of oxygen isotopic composition.

This work addresses the anticipation of DO events using machine learning algorithms. We consider the NGRIP time series from 20 to 60 kyr b2k with the GICC05 timescale and 20-year temporal resolution. Forecasting horizons range from 0 (nowcasting) to 400 years. We adopt three different machine learning algorithms (random forests, support vector machines, and logistic regression) trained on windows of 5 kyr. We perform validation on subsequent test windows of 5 kyr, based on the timestamps of previous DO events classified in Greenland by Rasmussen et al. (2014). We perform experiments with both sliding and growing windows.

Results show that predictions on sliding windows are better overall, indicating that modelling is affected by non-stationary characteristics of the time series. The three algorithms' predictive performance is similar, with slightly better performance of random forest models for shorter forecast horizons. The models' predictive capability decreases as the forecasting horizon grows but remains reasonable up to 120 years. Degradation of model performance is mostly related to imprecision in determining the start and end times of events and to identifying some periods as DO events when they are not.
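
A rough sketch of the sliding- versus growing-window evaluation described above. The proxy features and DO-event labels below are synthetic placeholders at a 20-year step; only the windowing logic (5 kyr training window, subsequent 5 kyr test window) follows the description.

```python
# Sliding vs growing training windows for event classification on a time series (sketch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n = 2000                                    # 2000 samples at 20-yr resolution = 40 kyr
X = rng.normal(size=(n, 5))                 # placeholder proxy features
y = (rng.random(n) < 0.05).astype(int)      # placeholder DO-event labels

window = 250                                # 250 samples * 20 yr = 5 kyr

def evaluate(mode):
    scores = []
    for start in range(0, n - 2 * window, window):
        train_lo = start if mode == "sliding" else 0   # growing window keeps all history
        train = slice(train_lo, start + window)
        test = slice(start + window, start + 2 * window)
        clf = RandomForestClassifier(random_state=0).fit(X[train], y[train])
        scores.append(f1_score(y[test], clf.predict(X[test]), zero_division=0))
    return np.mean(scores)

print("sliding:", evaluate("sliding"), " growing:", evaluate("growing"))
```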


2019
Vol 31 (4)
pp. 568-578
Author(s):
Anshit Goyal
Che Ngufor
Panagiotis Kerezoudis
Brandon McCutcheon
Curtis Storlie
...  

OBJECTIVE Nonhome discharge and unplanned readmissions represent important cost drivers following spinal fusion. The authors sought to utilize different machine learning algorithms to predict discharge to rehabilitation and unplanned readmissions in patients receiving spinal fusion.

METHODS The authors queried the 2012–2013 American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) for patients undergoing cervical or lumbar spinal fusion. Outcomes assessed included discharge to a nonhome facility and unplanned readmissions within 30 days after surgery. A total of 7 machine learning algorithms were evaluated. Predictive hierarchical clustering of procedure codes was used to increase model performance. Model performance was evaluated using overall accuracy and area under the receiver operating characteristic curve (AUC), as well as sensitivity, specificity, and positive and negative predictive values. These performance metrics were computed for both the imputed and unimputed (missing values dropped) datasets.

RESULTS A total of 59,145 spinal fusion cases were analyzed. The incidence rates of discharge to a nonhome facility and 30-day unplanned readmission were 12.6% and 4.5%, respectively. All classification algorithms showed excellent discrimination (AUC > 0.80, range 0.85–0.87) for predicting nonhome discharge. The generalized linear model showed performance comparable to the other machine learning algorithms. By comparison, all models showed poorer predictive performance for unplanned readmission, with AUC ranging between 0.63 and 0.66. Better predictive performance was noted with models using imputed data.

CONCLUSIONS In an analysis of patients undergoing spinal fusion, multiple machine learning algorithms were found to reliably predict nonhome discharge, with modest performance noted for unplanned readmissions. These results provide early evidence regarding the feasibility of modern machine learning classifiers in predicting these outcomes and serving as possible clinical decision support tools to facilitate shared decision making.
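
The imputed-versus-unimputed comparison reported above could be reproduced in outline as below. The ACS-NSQIP data require a data use agreement, so a synthetic dataset with injected missingness is used, and a single gradient boosting classifier stands in for the seven algorithms evaluated; mean imputation is a deliberate simplification.

```python
# AUC with imputation vs AUC with missing-value rows dropped (sketch on synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=5000, n_features=25, weights=[0.87, 0.13], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan          # inject ~5% missingness

def auc(Xd, yd):
    X_tr, X_te, y_tr, y_te = train_test_split(Xd, yd, stratify=yd, random_state=0)
    clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

X_imp = SimpleImputer(strategy="mean").fit_transform(X)   # 1) imputed dataset
keep = ~np.isnan(X).any(axis=1)                           # 2) rows with no missing values

print("AUC, imputed      :", auc(X_imp, y))
print("AUC, rows dropped :", auc(X[keep], y[keep]))
```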


Molecules
2020
Vol 25 (21)
pp. 4987
Author(s):
Hongyan Zhu
Jun-Li Xu

Different varieties and geographical origins of walnut usually lead to different nutritional values, contributing to a big difference in the final price. Conventional analytical techniques have some unavoidable limitations; for example, chemical analysis is usually time-consuming and labor-intensive. Therefore, this work aims to apply Fourier transform mid-infrared spectroscopy coupled with machine learning algorithms for the rapid and accurate classification of walnuts from ten varieties produced in four provinces. Three types of models were developed using five machine learning classifiers to (1) differentiate the four geographical origins; (2) identify varieties produced from the same origin; and (3) classify all 10 varieties from the four origins. Prior to modeling, the wavelet transform algorithm was used to smooth and denoise the spectra. The results showed that the identification of varieties within the same origin performed the best (i.e., accuracy = 100% for some origins), followed by the classification of the four different origins (i.e., accuracy = 96.97%), while the discrimination of all 10 varieties was the least successful (i.e., accuracy = 87.88%). Our results indicate that using the full spectral range of 700–4350 cm−1 is inferior to using subsets of the optimal spectral variables for some classifiers. Additionally, it is demonstrated that the back propagation neural network (BPNN) delivered the best model performance, while random forests (RF) produced the worst outcome. Hence, this work showed that the authentication and provenance determination of walnut can be achieved effectively based on Fourier transform mid-infrared spectroscopy combined with machine learning algorithms.
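
As a sketch of the preprocessing-plus-classification chain described above: wavelet denoising of each spectrum (PyWavelets), followed by a back propagation neural network (scikit-learn's MLPClassifier as a stand-in for BPNN). The spectra, wavelet choice, and network size below are placeholders, not the settings used in the study.

```python
# Wavelet denoising of spectra followed by neural-network classification (sketch, synthetic data).
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def wavelet_denoise(spectrum, wavelet="db4", level=3, thresh=0.1):
    """Soft-threshold the detail coefficients and reconstruct the spectrum."""
    coeffs = pywt.wavedec(spectrum, wavelet, level=level)
    coeffs[1:] = [pywt.threshold(c, thresh * np.max(np.abs(c)), mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(spectrum)]

rng = np.random.default_rng(0)
n_samples, n_wavenumbers, n_varieties = 330, 900, 10               # placeholder dimensions
labels = rng.integers(0, n_varieties, n_samples)
spectra = rng.normal(size=(n_samples, n_wavenumbers)) + labels[:, None] * 0.05  # weak class signal

X = np.array([wavelet_denoise(s) for s in spectra])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, stratify=labels, random_state=0)
bpnn = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, bpnn.predict(X_te)))
```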


2021
Vol 13 (12)
pp. 2242
Author(s):
Jianzhao Liu
Yunjiang Zuo
Nannan Wang
Fenghui Yuan
Xinhao Zhu
...  

The net ecosystem CO2 exchange (NEE) is a critical parameter for quantifying terrestrial ecosystems and their contributions to ongoing climate change. The accumulation of ecological data calls for more advanced quantitative approaches to assist NEE prediction. In this study, we applied two widely used machine learning algorithms, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), to build models for simulating NEE in major biomes based on the FLUXNET dataset. Both models accurately predicted NEE in all biomes, while XGBoost had higher computational efficiency (6~62 times faster than RF). Among the environmental variables, net solar radiation, soil water content, and soil temperature were the most important variables, while precipitation and wind speed were less important for simulating temporal variations of site-level NEE, as shown by both models. Both models perform consistently well for extreme climate conditions. Extreme heat and dryness led to much worse model performance in grassland (extreme heat: R2 = 0.66~0.71, normal: R2 = 0.78~0.81; extreme dryness: R2 = 0.14~0.30, normal: R2 = 0.54~0.55), but the impact on forests was smaller (extreme heat: R2 = 0.50~0.78, normal: R2 = 0.59~0.87; extreme dryness: R2 = 0.86~0.90, normal: R2 = 0.81~0.85). Extreme wet conditions did not change model performance in forest ecosystems (with R2 changing −0.03~0.03 compared with normal) but led to a substantial reduction in model performance in cropland (with R2 decreasing 0.20~0.27 compared with normal). Extreme cold conditions did not lead to much change in model performance in forest and woody savannas (with R2 decreasing 0.01~0.08 and 0.09 compared with normal, respectively). Our study showed that both models need more than 2.5 years of daily training samples to reach good model performance and more than 5.4 years of daily samples to reach optimal model performance. In summary, both RF and XGBoost are applicable machine learning algorithms for predicting ecosystem NEE, and the XGBoost algorithm is preferable to RF in terms of accuracy and computational efficiency.
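
A small sketch of the RF-versus-XGBoost comparison on a regression task, with synthetic data standing in for the FLUXNET driver variables and NEE. It assumes the xgboost package is installed and is meant only to illustrate the accuracy/runtime comparison, not to reproduce the reported numbers.

```python
# Fit-time and R2 comparison between Random Forest and XGBoost regressors (sketch).
import time
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from xgboost import XGBRegressor

X, y = make_regression(n_samples=20000, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("RF", RandomForestRegressor(n_estimators=300, random_state=0)),
                    ("XGBoost", XGBRegressor(n_estimators=300, random_state=0))]:
    t0 = time.time()
    model.fit(X_tr, y_tr)
    r2 = r2_score(y_te, model.predict(X_te))
    print(f"{name}: R2={r2:.2f}, fit time={time.time() - t0:.1f}s")
```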


Music makes up a huge portion of the content stored and used over the internet, with several sites and applications developed solely to provide music-related services to their users and customers. Some of the most challenging tasks in this setting include music classification based on language and genre, playlist suggestions based on listening history, song suggestions based on playlist contents, ranking of top genres and songs based on listeners' ratings, likes, number of streams, and song loops, and measuring the popularity of artists based on the number of songs released per year, hit songs per year, and so on. One of the most important stages in solving the above-mentioned challenges is music genre classification. It would be impractical to analyze each and every song in a given database to identify and classify music genres, even though human beings are better at performing such tasks. Hence, suitable Machine Learning algorithms and Deep Learning approaches may be used to accomplish such tasks with ease. A thorough analysis of the different uses of Machine Learning and Deep Learning algorithms, and of the relevance of each algorithm to a given situation, would be made to highlight and contrast the advantages and disadvantages of each approach. The outcomes of the optimized models would be visualized and compared to the expected outcomes for better interpretation.

