Hierarchical vs. flat n-gram-based text categorization: Can we do better?

Hierarchical text categorization (HTC) refers to assigning a text document to one or more of the most suitable categories in a hierarchical category space. In this paper we present two HTC techniques that use kNN and SVM machine learning techniques for the categorization process and a byte n-gram-based document representation. They are fully language independent and do not require any text preprocessing steps or any prior information about document content or language. The effectiveness of the presented techniques and their language independence are demonstrated in experiments performed on five tree-structured benchmark category hierarchies that differ in many aspects: Reuters-Hier1, Reuters-Hier2, 15NGHier and 20NGHier in English, and TanCorpHier in Chinese. The results obtained are compared with the corresponding flat categorization techniques applied to the leaf-level categories of the considered hierarchies. While kNN-based flat text categorization produced slightly better results than kNN-based HTC on the largest datasets, TanCorpHier and 20NGHier, the SVM-based HTC results do not differ considerably from those of the corresponding flat techniques, owing to the shallow hierarchies; still, they outperform both kNN-based flat and hierarchical categorization on all corpora except the smallest, Reuters-Hier1 and Reuters-Hier2. Formal evaluation confirmed that the proposed techniques obtained state-of-the-art results.
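To make the contrast between flat and hierarchical kNN concrete, here is a minimal sketch, not the paper's implementation. It represents each document by the frequencies of its overlapping byte n-grams of the UTF-8 encoding, which needs no tokenization and therefore treats English and Chinese text the same way. It then classifies either flat, over full leaf paths, or top-down, one level at a time. The choices of n = 3, cosine similarity, a similarity-weighted vote, and "/"-separated category paths are assumptions for illustration; the SVM variant is not shown.

```python
from collections import Counter
import math

def byte_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping byte n-grams of the UTF-8 encoding; no tokenization or preprocessing."""
    data = text.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(v * b[g] for g, v in a.items() if g in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_vote(profile, examples, k=3):
    """Similarity-weighted vote among the k training profiles most similar to `profile`."""
    nearest = sorted(((cosine(profile, p), label) for p, label in examples), reverse=True)[:k]
    scores = Counter()
    for sim, label in nearest:
        scores[label] += sim
    return max(scores, key=scores.get)

def flat_knn(profile, train, k=3):
    """Flat baseline: vote directly over full leaf paths such as 'sport/football'."""
    return knn_vote(profile, train, k)

def hierarchical_knn(profile, train, k=3):
    """Top-down: at each level, vote only among training docs under the current node."""
    prefix = []
    while True:
        pool = [(p, path.split("/")) for p, path in train]
        pool = [(p, parts) for p, parts in pool
                if parts[:len(prefix)] == prefix and len(parts) > len(prefix)]
        if not pool:  # no deeper categories: we are at a leaf
            return "/".join(prefix)
        # Vote on the next path component (child of the current node).
        prefix.append(knn_vote(profile, [(p, parts[len(prefix)]) for p, parts in pool], k))

if __name__ == "__main__":
    docs = [  # hypothetical toy corpus with a two-level hierarchy
        ("the striker scored a late goal in the football match", "sport/football"),
        ("the tennis player won the final set on grass", "sport/tennis"),
        ("the central bank raised interest rates to curb inflation", "economy/banking"),
        ("oil prices rose as crude supply tightened", "economy/energy"),
    ]
    train = [(byte_ngrams(t), path) for t, path in docs]
    query = byte_ngrams("the football team scored a goal")
    print("flat:", flat_knn(query, train, k=1))
    print("hierarchical:", hierarchical_knn(query, train, k=1))
```

The flat variant compares a document against all leaf categories at once. The top-down variant makes one smaller decision per level, so a wrong branch chosen early cannot be corrected later. This is one reason a hierarchical result need not beat the flat one.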

Advances in the Application of Machine Learning Techniques for Power System Analytics: A Survey

Energies ◽

10.3390/en14164776 ◽

2021 ◽

Vol 14 (16) ◽

pp. 4776

Author(s):

Seyed Mahdi Miraftabzadeh ◽

Michela Longo ◽

Federica Foiadelli ◽

Marco Pasetti ◽

Raul Igual

Keyword(s):

Machine Learning ◽

Power Systems ◽

Smart Grids ◽

State Of The Art ◽

Smart Cities ◽

Power Grids ◽

Machine Learning Techniques ◽

Learning Techniques ◽

New Research ◽

Traditional Approaches

The recent advances in computing technologies and the increasing availability of large amounts of data in smart grids and smart cities are generating new research opportunities in the application of Machine Learning (ML) for improving the observability and efficiency of modern power grids. However, as the number and diversity of ML techniques increase, questions arise about their performance and applicability, and about the most suitable ML method for a specific application. To address these questions, this manuscript presents a systematic review of the state-of-the-art studies implementing ML techniques in the context of power systems, with a specific focus on the analysis of power flows, power quality, photovoltaic systems, intelligent transportation, and load forecasting. For each of the selected topics, the survey investigates the most recent and promising ML techniques proposed in the literature, highlighting their main characteristics and relevant results. The review revealed that, compared to traditional approaches, ML algorithms can handle massive quantities of high-dimensional data, allowing the identification of hidden characteristics of (even) complex systems. In particular, even though very different techniques can be used for each application, hybrid models generally show better performance than single ML-based models.

A state-of-the-art review of machine learning techniques for fraud detection research

Proceedings of the 2018 International Conference on Software Engineering in Africa - SEiA '18 ◽

10.1145/3195528.3195534 ◽

2018 ◽

Cited By ~ 2

Author(s):

Sinayobye Janvier Omar ◽

Kiwanuka Fred ◽

Kaawaase Kyanda Swaib

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Fraud Detection ◽

Machine Learning Techniques ◽

Learning Techniques

State of the Art in Artificial Intelligence and Machine Learning Techniques for Improving Patient Outcomes Pertaining to the Cardiovascular and Respiratory Systems

Cardiac Bioelectric Therapy ◽

10.1007/978-3-030-63355-4_24 ◽

2021 ◽

pp. 335-352

Author(s):

Wan-Tai M. Au-Yeung ◽

Rahul Kumar Sevakula ◽

Jagmeet P. Singh ◽

E. Kevin Heist ◽

Eric M. Isselbacher ◽

...

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Patient Outcomes ◽

State Of The Art ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Respiratory Systems

Machine Learning Techniques in IoT Applications: A State of The Art

IoT Applications, Security Threats, and Countermeasures ◽

10.1201/9781003124252-6 ◽

2021 ◽

pp. 105-117

Author(s):

Shaw Laxmi ◽

Narayan Sahoo Rudra ◽

K. Hemachandran ◽

Kumar Nanda Santosh

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Machine Learning Techniques ◽

IoT Applications ◽

Learning Techniques

Multi-Hazard Exposure Mapping Using Machine Learning Techniques: A Case Study from Iran

Remote Sensing ◽

10.3390/rs11161943 ◽

2019 ◽

Vol 11 (16) ◽

pp. 1943 ◽

Cited By ~ 15

Author(s):

Omid Rahmati ◽

Saleh Yousefi ◽

Zahra Kalantari ◽

Evelyn Uuemaa ◽

Teimur Teimurian ◽

...

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Characteristic Curve ◽

Machine Learning Techniques ◽

Support Vector ◽

Mountainous Area ◽

Data Set ◽

Boosted Regression Tree ◽

Hazard Exposure ◽

Learning Techniques

Mountainous areas are highly prone to a variety of nature-triggered disasters, which often cause disabling harm, death, destruction, and damage. In this work, an attempt was made to develop an accurate multi-hazard exposure map for a mountainous area (Asara watershed, Iran) based on state-of-the-art machine learning techniques. Hazard modeling for avalanches, rockfalls, and floods was performed using three state-of-the-art models: support vector machine (SVM), boosted regression tree (BRT), and generalized additive model (GAM). Topo-hydrological and geo-environmental factors were used as predictors in the models. A flood dataset (n = 133 flood events), prepared using Sentinel-1-based processing and ground-based information, was applied. In addition, snow avalanche (n = 58) and rockfall (n = 101) data sets were used. The data set of each hazard type was randomly divided into two groups: training (70%) and validation (30%). Model performance was evaluated using the true skill score (TSS) and the area under the receiver operating characteristic curve (AUC). Using an exposure map, the multi-hazard map was converted into a multi-hazard exposure map. According to both validation methods, the SVM model showed the highest accuracy for avalanches (AUC = 92.4%, TSS = 0.72) and rockfalls (AUC = 93.7%, TSS = 0.81), while BRT demonstrated the best performance for flood hazards (AUC = 94.2%, TSS = 0.80). Overall, multi-hazard exposure modeling revealed that valleys and areas close to the Chalous Road, one of the most important roads in Iran, were associated with high and very high levels of risk. The proposed multi-hazard exposure framework can help support decision making on mountain social-ecological systems facing multiple hazards.
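The validation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code:

* TSS is computed in its usual form, sensitivity + specificity - 1, at a fixed 0.5 probability threshold. The threshold is an assumption.
* AUC is computed as the probability that a randomly chosen hazard location receives a higher score than a randomly chosen non-hazard location, with ties counted as one half.
* The 70/30 split uses a fixed random seed.

The scores and labels are toy values, not the study's data.

```python
import random

def split_70_30(items, seed=42):
    """Randomly divide a data set into training (70%) and validation (30%) groups."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def tss(labels, scores, threshold=0.5):
    """TSS = sensitivity + specificity - 1 at a fixed threshold (assumes both classes present)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn) + tn / (tn + fp) - 1

def auc(labels, scores):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]              # 1 = hazard observed (toy data)
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # model probabilities (toy data)
    print(round(tss(labels, scores), 3), round(auc(labels, scores), 3))  # 0.333 0.889
    train, valid = split_70_30(range(133))   # e.g. the 133 flood events
    print(len(train), len(valid))            # 93 40
```

TSS depends on the chosen threshold, while AUC summarizes ranking quality over all thresholds. This is why the two criteria can rank models differently.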

Machine learning techniques for hate speech classification of twitter data: State-of-the-art, future challenges and research directions

Computer Science Review ◽

10.1016/j.cosrev.2020.100311 ◽

2020 ◽

Vol 38 ◽

pp. 100311

Author(s):

Femi Emmanuel Ayo ◽

Olusegun Folorunso ◽

Friday Thomas Ibharalu ◽

Idowu Ademola Osinuga

Keyword(s):

Machine Learning ◽

Hate Speech ◽

State Of The Art ◽

Machine Learning Techniques ◽

Research Directions ◽

Twitter Data ◽

Learning Techniques ◽

Future Challenges ◽

Speech Classification

Survey of Machine Learning Techniques in Drug Discovery

Current Drug Metabolism ◽

10.2174/1389200219666180820112457 ◽

2019 ◽

Vol 20 (3) ◽

pp. 185-193 ◽

Cited By ~ 47

Author(s):

Natalie Stephenson ◽

Emily Shane ◽

Jessica Chase ◽

Jason Rowland ◽

David Ries ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Large Scale ◽

State Of The Art ◽

New Drugs ◽

Machine Learning Techniques ◽

Current Status ◽

Learning Techniques ◽

Pharmaceutical Industries ◽

Current Stage

Background: Drug discovery, the process of discovering new candidate medications, is very important for pharmaceutical industries. At its current stage, discovering new drugs is still a very expensive and time-consuming process, requiring Phase I, II and III clinical trials. Recently, machine learning techniques in Artificial Intelligence (AI) have been widely applied and have achieved state-of-the-art performance in different fields, such as speech recognition, image classification, and bioinformatics. This applies especially to deep learning techniques, which use computational models with multiple processing layers. One very important application of these AI techniques is in the field of drug discovery. Methods: We performed a large-scale literature search of existing scientific websites (e.g., ScienceDirect, arXiv) and startup companies to understand the current status of machine learning techniques in drug discovery. Results: Our experiments demonstrated that there are different patterns in machine learning fields and drug discovery fields. For example, keywords like prediction, brain, discovery, and treatment usually appear in drug discovery fields. Also, the total number of papers published in drug discovery fields with machine learning techniques is increasing every year. Conclusion: The main focus of this survey is to understand the current status of machine learning techniques in the drug discovery field within both academic and industrial settings, and to discuss its potential future applications. Several interesting patterns for machine learning techniques in drug discovery fields are discussed in this survey.

State‐of‐the‐Art Machine Learning Techniques Aiming to Improve Patient Outcomes Pertaining to the Cardiovascular System

Journal of the American Heart Association ◽

10.1161/jaha.119.013924 ◽

2020 ◽

Vol 9 (4) ◽

Cited By ~ 7

Author(s):

Rahul Kumar Sevakula ◽

Wan‐Tai M. Au‐Yeung ◽

Jagmeet P. Singh ◽

E. Kevin Heist ◽

Eric M. Isselbacher ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular System ◽

Patient Outcomes ◽

State Of The Art ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Improve Patient

Blue Skies: A Methodology for Data-Driven Clear Sky Modelling

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/528 ◽

2017 ◽

Author(s):

Kartik Palani ◽

Ramachandra Kota ◽

Amar Prakash Azad ◽

Vijay Arya

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Weather Conditions ◽

Data Driven ◽

Machine Learning Techniques ◽

Clear Sky ◽

Learning Techniques ◽

Atmospheric Data ◽

Average Irradiance ◽

Photovoltaic

One of the major challenges confronting the widespread adoption of solar energy is the uncertainty of production. The energy generated by photovoltaic systems is a function of the received solar irradiance, which varies with atmospheric and weather conditions. A key component required for forecasting irradiance accurately is the clear sky model, which estimates the average irradiance at a location at a given time in the absence of clouds. Current methods for modelling clear sky irradiance are either inaccurate or require extensive atmospheric data, which tends to vary with location and is often unavailable. In this paper, we present a data-driven methodology, Blue Skies, for modelling clear sky irradiance based solely on historical irradiance measurements. Using machine learning techniques, Blue Skies generates clear sky models that are more accurate spatio-temporally than the state of the art, reducing errors by almost 50%.
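The abstract does not describe how Blue Skies builds its models. For contrast, the sketch below shows a much simpler data-driven baseline that relies only on historical irradiance, which is the same kind of input Blue Skies uses. It is an assumption for illustration, not the Blue Skies method.

The baseline estimates the clear-sky irradiance for each hour of the day as a high quantile of past measurements in that hour, on the reasoning that the largest readings most likely come from cloud-free periods. It then expresses a new measurement as a clear-sky index, the ratio of measured to clear-sky irradiance. The hourly bins, the 0.95 quantile, and all function names are hypothetical.

```python
from collections import defaultdict

def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def clear_sky_profile(history, q=0.95):
    """history: iterable of (hour_of_day, irradiance in W/m^2) -> {hour: clear-sky estimate}."""
    by_hour = defaultdict(list)
    for hour, ghi in history:
        by_hour[hour].append(ghi)
    # A high quantile per hour approximates the cloud-free envelope of the measurements.
    return {hour: quantile(vals, q) for hour, vals in by_hour.items()}

def clear_sky_index(hour, measured, profile):
    """Measured / clear-sky irradiance; values near 1 suggest clear conditions."""
    expected = profile.get(hour, 0.0)
    return measured / expected if expected > 0 else 0.0

if __name__ == "__main__":
    history = [(12, 910.0), (12, 640.0), (12, 955.0), (13, 880.0), (13, 400.0)]  # toy data
    profile = clear_sky_profile(history)
    print(profile)
    print(round(clear_sky_index(12, 480.0, profile), 2))
```

This baseline ignores seasonal changes in sun position and differences between locations. It therefore only indicates the problem a clear sky model solves, not the accuracy the paper reports.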

Deep learning regression model for antimicrobial peptide design

10.1101/692681 ◽

2019 ◽

Cited By ~ 3

Author(s):

Jacob Witten ◽

Zack Witten

Keyword(s):

Neural Network ◽

Antimicrobial Peptides ◽

Network Architecture ◽

State Of The Art ◽

Machine Learning Techniques ◽

Neural Network Architecture ◽

Antibiotic Resistant ◽

E. coli ◽

Naturally Occurring ◽

Learning Techniques

Antimicrobial peptides (AMPs) are naturally occurring or synthetic peptides that show promise for treating antibiotic-resistant pathogens. Machine learning techniques are increasingly used to identify naturally occurring AMPs, but there is a dearth of purely computational methods for designing novel effective AMPs, which would speed AMP development. We collected a large database, the Giant Repository of AMP Activities (GRAMPA), containing AMP sequences and their associated minimum inhibitory concentrations (MICs). We designed a convolutional neural network that performs combined classification and regression on peptide sequences to quantitatively predict AMP activity against Escherichia coli. Our predictions outperformed the state of the art in AMP classification and were also effective at regression, for which no publicly available comparisons existed. We then used our model to design novel AMPs and experimentally demonstrated the activity of these AMPs against the pathogens E. coli, Pseudomonas aeruginosa, and Staphylococcus aureus. Data, code, and neural network architecture and parameters are available at https://github.com/zswitten/Antimicrobial-Peptides.
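The abstract does not give the network's layers or loss. Those details are in the authors' repository linked above. As a hedged, pure-Python sketch of what "combined classification and regression on peptide sequences" can look like, the code below has three parts:

1. A one-hot encoding of a peptide over the 20 standard amino acids.
2. A single convolutional filter with ReLU and global max pooling, the basic operation of a 1-D CNN.
3. A joint loss that adds binary cross-entropy on an active/inactive label to squared error on log10 MIC, with the regression term counted only for active peptides.

The maximum length, loss weighting, log transform, and masking rule are assumptions, not the authors' design.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq, max_len=50):
    """Encode a peptide as max_len rows x 20 columns; zero-padded, standard residues only."""
    rows = [[0.0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq[:max_len]):
        rows[pos][INDEX[aa]] = 1.0  # raises KeyError for non-standard residues
    return rows

def conv1d_maxpool(x, kernel):
    """One filter of width len(kernel) over all 20 channels, ReLU, then global max pooling."""
    w = len(kernel)
    outs = []
    for start in range(len(x) - w + 1):
        s = sum(x[start + j][c] * kernel[j][c]
                for j in range(w) for c in range(len(AMINO_ACIDS)))
        outs.append(max(s, 0.0))  # ReLU
    return max(outs) if outs else 0.0

def joint_loss(p_active, pred_log_mic, is_active, true_log_mic, reg_weight=1.0, eps=1e-7):
    """Hypothetical objective: BCE on activity + weighted squared error on log10 MIC if active."""
    p = min(max(p_active, eps), 1 - eps)  # clip to avoid log(0)
    bce = -(is_active * math.log(p) + (1 - is_active) * math.log(1 - p))
    mse = (pred_log_mic - true_log_mic) ** 2 if is_active else 0.0
    return bce + reg_weight * mse

if __name__ == "__main__":
    x = one_hot("KKLLKKLLKKLL")  # toy sequence, not from GRAMPA
    # A width-2 filter that responds to two adjacent lysines (K).
    kk_filter = [[1.0 if aa == "K" else 0.0 for aa in AMINO_ACIDS] for _ in range(2)]
    print(conv1d_maxpool(x, kk_filter))
    print(joint_loss(0.8, 1.2, 1, 1.0))
```

Masking the regression term for inactive peptides reflects that a measured MIC is only meaningful for peptides that actually show activity. A real model would learn many such filters and both output heads jointly.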