Model-Driven Data Warehouse Automation

Author(s):  
Moez Essaidi ◽  
Aomar Osmani ◽  
Céline Rouveirol

Transformation design is a key step in model-driven engineering and a very challenging task, particularly in the context of the model-driven data warehouse. Currently, this process is carried out by human experts. The authors propose a new methodology that uses machine learning techniques to automatically derive the transformation rules to be applied in the model-driven data warehouse process. The proposed solution allows for a simple design of decision support systems and reduces development time and costs. The authors use the inductive logic programming framework to learn these transformation rules from examples drawn from previous projects. They then find that, in model-driven data warehouse applications, dependencies exist between transformations. Therefore, the authors investigate a new machine learning methodology, dependent-concept learning, that is suitable for this kind of problem. The experimental evaluation shows that the dependent-concept learning approach gives significantly better results.
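
The idea of inducing transformation rules from examples can be illustrated with a deliberately tiny sketch (this is not the authors' ILP system, and the example mappings are hypothetical): given positive examples of source elements that were mapped to the same target construct, a candidate rule body keeps only the properties all examples share.

```python
# Toy illustration (not the authors' ILP system): induce a transformation
# rule by generalizing positive examples of source-to-target mappings.
# A rule keeps only the source properties shared by all examples.

def generalize(examples):
    """Return the attribute-value pairs common to every source element."""
    common = dict(examples[0]["source"])
    for ex in examples[1:]:
        common = {k: v for k, v in common.items()
                  if ex["source"].get(k) == v}
    return common

# Hypothetical examples: model classes that were mapped to fact tables.
examples = [
    {"source": {"kind": "Class", "has_measures": True, "abstract": False},
     "target": "FactTable"},
    {"source": {"kind": "Class", "has_measures": True, "abstract": True},
     "target": "FactTable"},
]

rule_body = generalize(examples)
print(rule_body)  # {'kind': 'Class', 'has_measures': True}
```

A real ILP learner works over first-order clauses and also uses negative examples to avoid over-general rules; the dict intersection above only conveys the generalization step.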

Author(s):  
Niddal Imam ◽  
Biju Issac ◽  
Seibu Mary Jacob

Twitter has changed the way people get information by allowing them to share opinions and comments in daily tweets. Unfortunately, Twitter's high popularity has made it very attractive to spammers, and Twitter spam has become a serious issue in the last few years. The large number of users and the high volume of information shared on Twitter play an important role in accelerating the spread of spam. To protect users, Twitter and the research community have been developing spam detection systems based on different machine-learning techniques. However, a recent study showed that current machine learning-based detection systems cannot detect spam accurately because spam tweet characteristics vary over time, an issue called “Twitter Spam Drift”. In this paper, a semi-supervised learning approach (SSLA) is proposed to tackle this problem. The new approach uses unlabeled data to learn the structure of the domain. Experiments on English and Arabic datasets show that the proposed SSLA can reduce the effect of Twitter spam drift and outperform existing techniques.
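
A minimal self-training loop conveys the general semi-supervised idea (this is an illustration of the family of methods, not the paper's exact SSLA, and the tweets below are invented): confidently classified unlabeled tweets are pseudo-labeled and folded back into the training set.

```python
# Minimal self-training sketch: pseudo-label unlabeled tweets that a
# simple bag-of-words centroid classifier scores confidently, retrain.
from collections import Counter

def centroid(texts):
    c = Counter()
    for t in texts:
        c.update(t.lower().split())
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

def score(text, cent):
    return sum(cent.get(w, 0.0) for w in text.lower().split())

labeled = {"spam": ["win free money now", "free prize click now"],
           "ham":  ["meeting moved to noon", "see you at lunch"]}
unlabeled = ["click to win free money", "lunch at noon today", "free money"]

for _ in range(2):  # a couple of self-training rounds
    cents = {y: centroid(ts) for y, ts in labeled.items()}
    still = []
    for t in unlabeled:
        s = {y: score(t, c) for y, c in cents.items()}
        best = max(s, key=s.get)
        if s[best] - min(s.values()) > 0.1:  # confident: adopt pseudo-label
            labeled[best].append(t)
        else:
            still.append(t)
    unlabeled = still

print(sorted(labeled["spam"]))
# ['click to win free money', 'free money', 'free prize click now', 'win free money now']
```

Because the pseudo-labeled tweets reshape the class centroids each round, the classifier can track vocabulary that was absent from the original labeled set, which is the intuition behind countering spam drift.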


Author(s):  
Zhao Zhang ◽  
Yun Yuan ◽  
Xianfeng (Terry) Yang

Accurate and timely estimation of freeway traffic speeds over short segments plays an important role in traffic monitoring systems. In the literature, the ability of machine learning techniques to capture the stochastic characteristics of traffic has been proven. Also, the deployment of intelligent transportation systems (ITSs) has provided enriched traffic data, which enables a variety of machine learning methods to be adopted for estimating freeway traffic speeds. However, limitations of data quality and coverage remain a big challenge in current traffic monitoring systems. To overcome this problem, this study develops a hybrid machine learning approach that creates a new training variable based on a second-order traffic flow model to improve the accuracy of traffic speed estimation. Grounded on a novel integrated framework, the estimation is performed using three machine learning techniques: Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN). All three models are trained with the integrated dataset, which includes the traffic flow model estimates and the iPeMS and PeMS data from the Utah Department of Transportation (DOT). Using the PeMS data as the ground truth for model evaluation, comparisons between the hybrid approach and pure machine learning models show that the hybrid approach can effectively capture the time-varying pattern of traffic and help improve estimation accuracy.
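
The core of the hybrid idea, feeding a physics-based estimate into a learned model as an extra feature, can be sketched in a few lines (a simplification, not the paper's RF/XGBoost/ANN pipeline; all numbers are invented):

```python
# Sketch of the hybrid idea: use a traffic-flow-model speed estimate as an
# extra input alongside a raw detector reading, and fit a tiny linear
# model by stochastic gradient descent. Speeds (mph) are invented.

# (model_estimate, detector_speed) -> ground-truth speed (PeMS-like)
data = [((62.0, 58.0), 60.0), ((30.0, 41.0), 35.0),
        ((55.0, 50.0), 53.0), ((20.0, 33.0), 25.0)]

w = [0.5, 0.5]                       # weights on the two features
lr = 0.0001
for _ in range(5000):
    for (x1, x2), y in data:
        err = w[0] * x1 + w[1] * x2 - y
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2

pred = w[0] * 40.0 + w[1] * 45.0     # estimate for an unseen segment
print(round(pred, 1))
```

In the study itself the physics-derived variable augments the training set of nonlinear learners; the linear blend above only shows how a model-based estimate and sensor data can be fused in one regression target.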


Pollution exposure and human health in industry-contaminated areas are an abiding concern. The need for industrialization urges a focus on the sustainable life of residents in the vicinity of industrial areas rather than opposition to industrialists. The epidemiological literature reveals that air pollution is one of the major health risks faced by residents of industrial areas. The main pollutants in industry-related air pollution are particulate matter (PM2.5, PM10), SO2, NO2, and other pollutants depending on the industry. Data for epidemiological studies, obtained from different sources with limited public access, include residents’ sociodemographic characteristics, health problems, and air quality indices for personal exposure to pollutants. These combined data and the limited resources make the analysis so complex that statistical methods alone cannot compensate. Our review finds a growing literature that evaluates the connection between ambient air pollution exposure and the associated health events of residents in industrially polluted areas using statistical methods, mainly regression models. Very few studies apply machine learning techniques to assess the impact of common air pollution exposure on human health, and most machine learning approaches in epidemiological studies stop at monitoring air pollution exposure rather than correlating it with diseases. A machine learning approach to epidemiological studies can automatically characterize residents’ exposure to pollutants and its associated health effects. The uniqueness of such a model depends on appropriately exhaustive data characterizing the features and on the machine learning algorithm used to build the model.
In this contribution, we discuss various existing approaches that evaluate residents’ health effects and sources of irritation in association with air pollution exposure, focusing on machine learning techniques and the mathematical background of epidemiological studies for residents’ sustainable life.
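
The regression models the review discusses typically relate a resident's pollutant exposure to a binary health outcome. A hedged sketch of that idea, with invented data and a hand-rolled logistic regression, looks like this:

```python
# Illustrative only: logistic regression relating PM2.5 exposure to a
# binary respiratory-symptom outcome. All data points are invented.
import math

# (mean PM2.5 exposure in ug/m3, reported respiratory symptoms: 1/0)
residents = [(12, 0), (18, 0), (25, 0), (35, 1),
             (48, 1), (60, 1), (22, 0), (55, 1)]

b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    for x, y in residents:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        b0 += lr * (y - p)
        b1 += lr * (y - p) * x

# A positive slope b1 means higher exposure -> higher symptom odds.
print(b1 > 0, round(1.0 / (1.0 + math.exp(-(b0 + b1 * 50))), 2))
```

Real epidemiological models adjust for sociodemographic confounders with many more covariates; the single-predictor fit above only shows the exposure-outcome association the review centres on.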


2020 ◽  
Vol 13 (9) ◽  
pp. 204
Author(s):  
Rodrigo A. Nava Lara ◽  
Jesús A. Beltrán ◽  
Carlos A. Brizuela ◽  
Gabriel Del Rio

Polypharmacologic human-targeted antimicrobials (polyHAM) are potentially useful in the treatment of complex human diseases where the microbiome is important (e.g., diabetes, hypertension). We previously reported a machine-learning approach to identify polyHAM among FDA-approved human-targeted drugs using a heterologous approach (training with peptides and non-peptide compounds). Here we discover that polyHAM are more likely to be found among antimicrobials displaying broad-spectrum antibiotic activity and that topological, but not chemical, features are most informative for classifying this activity. A heterologous machine-learning approach was trained with broad-spectrum antimicrobials and tested with human metabolites; these metabolites were labeled as antimicrobials or non-antimicrobials based on a naïve text-mining approach. Human metabolites are not commonly recognized as antimicrobials, yet they circulate in the human body where microbes are found, and our heterologous model was able to classify those with antimicrobial activity. These results provide the basis for developing applications aimed at designing human diets that purposely alter the proportions of metabolic compounds as a way to control the human microbiome.
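
A naive text-mining labeling step of the kind described can be sketched as a simple keyword match (the keyword set and metabolite descriptions below are invented for illustration, not taken from the study):

```python
# Sketch of naive text-mining labeling: a metabolite is tagged
# "antimicrobial" if its literature description mentions any
# antimicrobial-related keyword. Descriptions are invented.

KEYWORDS = {"antimicrobial", "antibacterial", "antifungal", "bactericidal"}

def label(description):
    words = {w.strip(".,").lower() for w in description.split()}
    return "antimicrobial" if words & KEYWORDS else "non-antimicrobial"

metabolites = {
    "compound_A": "Exhibits antibacterial activity against E. coli.",
    "compound_B": "Intermediate of the citric acid cycle.",
}
for name, desc in metabolites.items():
    print(name, "->", label(desc))
```

Such keyword labels are noisy by construction, which is why the study treats them only as weak supervision for evaluating the heterologous classifier.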


2018 ◽  
Vol 16 (06) ◽  
pp. 1840027 ◽  
Author(s):  
Wen Juan Hou ◽  
Bamfa Ceesay

Information on changes in a drug’s effect when taken in combination with a second drug, known as a drug–drug interaction (DDI), is relevant to the pharmaceutical industry. DDIs can delay, decrease, or enhance the absorption of either drug and thus decrease or increase their action or cause adverse effects. Information Extraction (IE) can be of great benefit in identifying and extracting relevant information on DDIs. Here we propose an approach for extracting DDIs from text that uses neural word embeddings to train a machine learning system. Results show that our system is competitive against other systems for the task of extracting DDIs, and that significant improvements can be achieved by learning from word features and using a deep-learning approach. Our study demonstrates that machine learning techniques such as neural networks and deep learning methods can efficiently aid IE from text, and the proposed approach is well suited to play a significant role in future research.
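
The embedding idea can be conveyed with a toy sketch (not the paper's network: the 3-d "embeddings", sentences, and perceptron below are invented for illustration): a candidate sentence is represented by the average of its word vectors, and a linear classifier is trained on that representation.

```python
# Toy word-embedding classifier for DDI-style sentences. The tiny 3-d
# vectors are made up; real systems use learned embeddings and deeper nets.

EMB = {"aspirin": [1.0, 0.2, 0.0], "warfarin": [0.9, 0.1, 0.1],
       "increases": [0.1, 1.0, 0.3], "bleeding": [0.0, 0.9, 0.8],
       "risk": [0.1, 0.8, 0.6], "patient": [0.0, 0.1, 1.0],
       "took": [0.2, 0.0, 0.9], "daily": [0.1, 0.1, 0.8]}

def embed(sentence):
    vecs = [EMB[w] for w in sentence.split() if w in EMB]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

train = [("aspirin increases warfarin bleeding risk", 1),  # DDI mention
         ("patient took aspirin daily", 0)]                # no DDI

w, b = [0.0, 0.0, 0.0], 0.0
for _ in range(20):                      # simple perceptron updates
    for sent, y in train:
        x = embed(sent)
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        for i in range(3):
            w[i] += (y - pred) * x[i]
        b += (y - pred)

x = embed("warfarin increases bleeding risk")
print(1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0)  # -> 1
```

Averaging embeddings discards word order; the paper's deep-learning approach exists precisely because richer encoders capture context that this bag-of-vectors baseline cannot.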


2019 ◽  
Vol 5 (1) ◽  
pp. 7
Author(s):  
Priyanka Rathord ◽  
Dr. Anurag Jain ◽  
Chetan Agrawal

With the help of the Internet, online news can be spread around the world instantly. Most people now have the habit of reading and sharing news online, for instance on social media such as Twitter and Facebook. Typically, news popularity is indicated by the number of reads, likes, or shares. For online news stakeholders such as content providers and advertisers, it is very valuable if the popularity of a news article can be accurately predicted prior to publication. Thus, it is interesting and meaningful to use machine learning techniques to predict the popularity of online news articles. Various works have addressed the prediction of online news popularity. The popularity of news depends on various features, such as sharing of the news on social media, visitors’ comments, and likes for the article. It is necessary to know what makes one online news article more popular than another, so that unpopular articles can be optimized to improve their popularity. In this paper, different methodologies that predict the popularity of online news articles are analyzed. These methodologies are compared, their parameters are considered, and improvements are suggested. The proposed methodology describes an online news popularity prediction system.
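
The task shape, mapping article features to a popular/unpopular label, can be illustrated with a minimal decision stump (a generic sketch, not any surveyed paper's model; the features and data are invented):

```python
# Hedged sketch: predict whether an article becomes popular from
# pre-publication features. A decision stump searches for the single
# best feature/threshold split. All data points are invented.

# (title_words, num_keywords, weekend_publish) -> popular (1) / not (0)
articles = [((9, 5, 0), 0), ((12, 8, 1), 1), ((7, 4, 0), 0),
            ((11, 9, 1), 1), ((10, 7, 0), 1), ((8, 3, 0), 0)]

def stump_accuracy(feat, thresh):
    correct = sum(1 for x, y in articles if (x[feat] > thresh) == bool(y))
    return correct / len(articles)

best = max(((f, t) for f in range(3) for t in range(10)),
           key=lambda ft: stump_accuracy(*ft))
print("best feature:", best[0], "threshold:", best[1],
      "accuracy:", stump_accuracy(*best))
```

The surveyed methodologies use far richer feature sets and ensemble learners; the stump only makes the feature-to-popularity mapping concrete.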


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Apostolos Ampountolas ◽  
Mark P. Legg

Purpose: This study aims to predict hotel demand through text analysis by investigating keyword series to increase the precision of demand predictions. To do so, this paper presents a framework for modeling hotel demand that incorporates machine learning techniques.
Design/methodology/approach: The empirical forecasting is conducted with a segmented machine learning approach that ties hierarchical clustering to machine learning and deep learning techniques, allowing the model to yield more precise estimates. This study evaluates an extensive range of social media–derived words with the greatest probability of gradually establishing an understanding of an optimal outcome. Analyses were performed on a major hotel chain in an urban market setting within the USA.
Findings: The findings indicate that while traditional methods, namely the naïve approach and ARIMA models, struggled with forecasting accuracy, segmented boosting methods (XGBoost) leveraging social media predict hotel occupancy with greater precision for all examined time horizons. Additionally, the segmented learning approach improved the forecasts’ stability and robustness while mitigating common overfitting issues within a highly dimensional data set.
Research limitations/implications: Incorporating social media into a segmented learning framework can augment the accuracy of the current generation of forecasting methods. Moreover, the segmented learning approach mitigates the negative effects of market shifts (e.g. COVID-19) that can reduce the life-cycles of in-production forecasts. The ability to be more robust to market deviations will allow hospitality firms to minimize development time.
Originality/value: The results are expected to generate insights by providing revenue managers with an instrument for predicting demand.
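
The segmentation step, grouping keyword series by similarity before forecasting each group, can be sketched with simple agglomerative (hierarchical) clustering (an illustration of the general technique, not the study's pipeline; the keyword series are invented):

```python
# Sketch: cluster social-media keyword series so each cluster can feed
# its own forecasting model. Weekly counts below are hypothetical.

series = {"pool":    [2, 3, 8, 9, 9, 8],
          "beach":   [1, 2, 7, 9, 8, 7],
          "museum":  [5, 5, 4, 5, 5, 4],
          "theater": [6, 5, 5, 4, 5, 5]}

def dist(a, b):  # Euclidean distance between two series
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

clusters = [[k] for k in series]          # start: one cluster per keyword
while len(clusters) > 2:                  # merge until 2 clusters remain
    pairs = [(i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))]
    # single-linkage: distance between the closest members of two clusters
    i, j = min(pairs, key=lambda p: min(
        dist(series[a], series[b])
        for a in clusters[p[0]] for b in clusters[p[1]]))
    clusters[i] += clusters.pop(j)

print(sorted(sorted(c) for c in clusters))
# [['beach', 'pool'], ['museum', 'theater']]
```

Here the seasonal keywords ("pool", "beach") separate from the steady ones, which is the kind of grouping that lets a segmented boosting model fit each demand pattern independently.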


Author(s):  
Kayalvizhi S. ◽  
Thenmozhi D.

Catch phrases are the important phrases that precisely explain a document; they represent the context of the whole document. They can also be used by judges and lawyers to retrieve relevant prior cases and so help assure justice in the domain of law. Currently, catch phrases are extracted using statistical methods, machine learning techniques, and deep learning techniques. The authors propose a sequence-to-sequence (Seq2Seq) deep neural network to extract catch phrases from legal documents. They employ several layers, namely an embedding layer, an encoder-decoder layer, a projection layer, and a loss layer, to build the deep neural network. The methodology is evaluated on the IRLeD@FIRE-2017 dataset, obtaining mean average precision and recall scores of 0.787 and 0.607, respectively. Results show that the proposed method outperforms existing systems.
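
The layer stack named above can be sketched architecturally at toy scale (untrained, fixed weights, tiny dimensions; this shows only the data flow, not the authors' trained network, whose parameters come from the loss layer during training):

```python
# Architectural sketch only: embedding -> encoder -> decoder -> projection.
# Weights are fixed toy values, so the decoded output is not meaningful.
import math

VOCAB = ["<s>", "</s>", "court", "held", "liable", "negligence"]
V, D, H = len(VOCAB), 3, 3

# embedding layer: one D-dim vector per vocabulary word (toy values)
E = [[0.1 * (i + j) for j in range(D)] for i in range(V)]

def rnn_step(x, h, w=0.5, u=0.3):     # simple Elman-style recurrence
    return [math.tanh(w * xi + u * hi) for xi, hi in zip(x, h)]

def encode(token_ids):                # encoder: fold the input sequence
    h = [0.0] * H
    for t in token_ids:
        h = rnn_step(E[t], h)
    return h

def project(h):                       # projection layer: state -> vocab scores
    return [sum(hi * E[v][i] for i, hi in enumerate(h)) for v in range(V)]

def decode(h, max_len=4):             # greedy decoding from encoder state
    out, tok = [], 0                  # start from <s>
    for _ in range(max_len):
        h = rnn_step(E[tok], h)
        scores = project(h)
        tok = max(range(V), key=scores.__getitem__)
        if VOCAB[tok] == "</s>":
            break
        out.append(VOCAB[tok])
    return out

state = encode([2, 3, 4, 5])          # "court held liable negligence"
print(decode(state))                  # untrained output: not meaningful
```

A production Seq2Seq model replaces the toy recurrence with learned LSTM/GRU cells and trains all layers end to end against the loss layer; this skeleton only makes the encode-then-decode data flow concrete.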

