Model-Driven Data Warehouse Automation

Author(s):  
Moez Essaidi ◽  
Aomar Osmani ◽  
Céline Rouveirol

Transformation design is a key step in model-driven engineering and a very challenging task, particularly in the context of the model-driven data warehouse. Currently, this process is carried out by human experts. The authors propose a new methodology that uses machine learning techniques to automatically derive the transformation rules to be applied in the model-driven data warehouse process. The proposed solution allows for a simple design of decision support systems and reduces development time and costs. The authors use the inductive logic programming framework to learn these transformation rules from examples drawn from previous projects. They then find that, in model-driven data warehouse applications, dependencies exist between transformations. Therefore, the authors investigate a new machine learning methodology, dependent-concept learning, that is suitable for this kind of problem. The experimental evaluation shows that the dependent-concept learning approach gives significantly better results.
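
The idea of inducing transformation rules from examples can be illustrated with a deliberately tiny sketch (this is not the authors' ILP system, and the example mappings are hypothetical): given positive examples of source elements that were mapped to the same target construct, a candidate rule body keeps only the properties all examples share.

```python
# Toy illustration (not the authors' ILP system): induce a transformation
# rule by generalizing positive examples of source-to-target mappings.
# A rule keeps only the source properties shared by all examples.

def generalize(examples):
    """Return the attribute-value pairs common to every source element."""
    common = dict(examples[0]["source"])
    for ex in examples[1:]:
        common = {k: v for k, v in common.items()
                  if ex["source"].get(k) == v}
    return common

# Hypothetical examples: model classes that were mapped to fact tables.
examples = [
    {"source": {"kind": "Class", "has_measures": True, "abstract": False},
     "target": "FactTable"},
    {"source": {"kind": "Class", "has_measures": True, "abstract": True},
     "target": "FactTable"},
]

rule_body = generalize(examples)
print(rule_body)  # {'kind': 'Class', 'has_measures': True}
```

A real ILP learner works over first-order clauses and also uses negative examples to avoid over-general rules; the dict intersection above only conveys the generalization step.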

Author(s):  
Niddal Imam ◽  
Biju Issac ◽  
Seibu Mary Jacob

Twitter has changed the way people get information by allowing them to share opinions and comments in daily tweets. Unfortunately, Twitter's high popularity has made it very attractive to spammers, and Twitter spam has become a serious issue in the last few years. The large number of users and the high volume of information shared on Twitter play an important role in accelerating the spread of spam. To protect users, Twitter and the research community have been developing spam detection systems based on different machine-learning techniques. However, a recent study showed that current machine learning-based detection systems cannot detect spam accurately because spam tweet characteristics vary over time, an issue called “Twitter Spam Drift”. In this paper, a semi-supervised learning approach (SSLA) is proposed to tackle this problem. The new approach uses unlabeled data to learn the structure of the domain. Experiments on English and Arabic datasets show that the proposed SSLA can reduce the effect of Twitter spam drift and outperform existing techniques.
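
A minimal self-training loop conveys the general semi-supervised idea (this is an illustration of the family of methods, not the paper's exact SSLA, and the tweets below are invented): confidently classified unlabeled tweets are pseudo-labeled and folded back into the training set.

```python
# Minimal self-training sketch: pseudo-label unlabeled tweets that a
# simple bag-of-words centroid classifier scores confidently, retrain.
from collections import Counter

def centroid(texts):
    c = Counter()
    for t in texts:
        c.update(t.lower().split())
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

def score(text, cent):
    return sum(cent.get(w, 0.0) for w in text.lower().split())

labeled = {"spam": ["win free money now", "free prize click now"],
           "ham":  ["meeting moved to noon", "see you at lunch"]}
unlabeled = ["click to win free money", "lunch at noon today", "free money"]

for _ in range(2):  # a couple of self-training rounds
    cents = {y: centroid(ts) for y, ts in labeled.items()}
    still = []
    for t in unlabeled:
        s = {y: score(t, c) for y, c in cents.items()}
        best = max(s, key=s.get)
        if s[best] - min(s.values()) > 0.1:  # confident: adopt pseudo-label
            labeled[best].append(t)
        else:
            still.append(t)
    unlabeled = still

print(sorted(labeled["spam"]))
# ['click to win free money', 'free money', 'free prize click now', 'win free money now']
```

Because the pseudo-labeled tweets reshape the class centroids each round, the classifier can track vocabulary that was absent from the original labeled set, which is the intuition behind countering spam drift.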


Author(s):  
Zhao Zhang ◽  
Yun Yuan ◽  
Xianfeng (Terry) Yang

Accurate and timely estimation of freeway traffic speeds over short segments plays an important role in traffic monitoring systems. In the literature, the ability of machine learning techniques to capture the stochastic characteristics of traffic has been proven. Also, the deployment of intelligent transportation systems (ITSs) has provided enriched traffic data, which enables a variety of machine learning methods to be adopted for estimating freeway traffic speeds. However, limitations of data quality and coverage remain a big challenge in current traffic monitoring systems. To overcome this problem, this study develops a hybrid machine learning approach that creates a new training variable based on a second-order traffic flow model to improve the accuracy of traffic speed estimation. Grounded on a novel integrated framework, the estimation is performed using three machine learning techniques: Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN). All three models are trained with the integrated dataset, which includes the traffic flow model estimates and the iPeMS and PeMS data from the Utah Department of Transportation (DOT). Using the PeMS data as the ground truth for model evaluation, comparisons between the hybrid approach and pure machine learning models show that the hybrid approach can effectively capture the time-varying pattern of traffic and help improve estimation accuracy.
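
The core of the hybrid idea, feeding a physics-based estimate into a learned model as an extra feature, can be sketched in a few lines (a simplification, not the paper's RF/XGBoost/ANN pipeline; all numbers are invented):

```python
# Sketch of the hybrid idea: use a traffic-flow-model speed estimate as an
# extra input alongside a raw detector reading, and fit a tiny linear
# model by stochastic gradient descent. Speeds (mph) are invented.

# (model_estimate, detector_speed) -> ground-truth speed (PeMS-like)
data = [((62.0, 58.0), 60.0), ((30.0, 41.0), 35.0),
        ((55.0, 50.0), 53.0), ((20.0, 33.0), 25.0)]

w = [0.5, 0.5]                       # weights on the two features
lr = 0.0001
for _ in range(5000):
    for (x1, x2), y in data:
        err = w[0] * x1 + w[1] * x2 - y
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2

pred = w[0] * 40.0 + w[1] * 45.0     # estimate for an unseen segment
print(round(pred, 1))
```

In the study itself the physics-derived variable augments the training set of nonlinear learners; the linear blend above only shows how a model-based estimate and sensor data can be fused in one regression target.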


Pollution exposure and human health in industry-contaminated areas are an abiding concern. The need for industrialization urges a focus on the sustainable life of residents in the vicinity of industrial areas rather than opposition to industrialists. The epidemiological literature reveals that air pollution is one of the major health risks faced by residents of industrial areas. The main pollutants in industry-related air pollution are particulate matter (PM2.5, PM10), SO2, NO2, and other pollutants depending on the industry. Data for epidemiological studies, obtained from different sources with limited public access, include residents’ sociodemographic characteristics, health problems, and air quality indices for personal exposure to pollutants. These combined data and the limited resources make the analysis so complex that statistical methods alone cannot compensate. Our review finds a growing literature that evaluates the connection between ambient air pollution exposure and the associated health events of residents in industrially polluted areas using statistical methods, mainly regression models. Very few studies apply machine learning techniques to assess the impact of common air pollution exposure on human health, and most machine learning approaches in epidemiological studies stop at monitoring air pollution exposure rather than correlating it with diseases. A machine learning approach to epidemiological studies can automatically characterize residents’ exposure to pollutants and its associated health effects. The uniqueness of such a model depends on appropriately exhaustive data characterizing the features and on the machine learning algorithm used to build the model.
In this contribution, we discuss various existing approaches that evaluate residents’ health effects and sources of irritation in association with air pollution exposure, focusing on machine learning techniques and the mathematical background of epidemiological studies for residents’ sustainable life.
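
The regression models the review discusses typically relate a resident's pollutant exposure to a binary health outcome. A hedged sketch of that idea, with invented data and a hand-rolled logistic regression, looks like this:

```python
# Illustrative only: logistic regression relating PM2.5 exposure to a
# binary respiratory-symptom outcome. All data points are invented.
import math

# (mean PM2.5 exposure in ug/m3, reported respiratory symptoms: 1/0)
residents = [(12, 0), (18, 0), (25, 0), (35, 1),
             (48, 1), (60, 1), (22, 0), (55, 1)]

b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    for x, y in residents:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        b0 += lr * (y - p)
        b1 += lr * (y - p) * x

# A positive slope b1 means higher exposure -> higher symptom odds.
print(b1 > 0, round(1.0 / (1.0 + math.exp(-(b0 + b1 * 50))), 2))
```

Real epidemiological models adjust for sociodemographic confounders with many more covariates; the single-predictor fit above only shows the exposure-outcome association the review centres on.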


2020 ◽  
Vol 13 (9) ◽  
pp. 204
Author(s):  
Rodrigo A. Nava Lara ◽  
Jesús A. Beltrán ◽  
Carlos A. Brizuela ◽  
Gabriel Del Rio

Polypharmacologic human-targeted antimicrobials (polyHAM) are potentially useful in the treatment of complex human diseases where the microbiome is important (e.g., diabetes, hypertension). We previously reported a machine-learning approach to identify polyHAM among FDA-approved human-targeted drugs using a heterologous approach (training with peptides and non-peptide compounds). Here we discover that polyHAM are more likely to be found among antimicrobials displaying broad-spectrum antibiotic activity and that topological, but not chemical, features are most informative for classifying this activity. A heterologous machine-learning approach was trained with broad-spectrum antimicrobials and tested with human metabolites; these metabolites were labeled as antimicrobials or non-antimicrobials based on a naïve text-mining approach. Human metabolites are not commonly recognized as antimicrobials, yet they circulate in the human body where microbes are found, and our heterologous model was able to classify those with antimicrobial activity. These results provide the basis for developing applications aimed at designing human diets that purposely alter the proportions of metabolic compounds as a way to control the human microbiome.
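
A naive text-mining labeling step of the kind described can be sketched as a simple keyword match (the keyword set and metabolite descriptions below are invented for illustration, not taken from the study):

```python
# Sketch of naive text-mining labeling: a metabolite is tagged
# "antimicrobial" if its literature description mentions any
# antimicrobial-related keyword. Descriptions are invented.

KEYWORDS = {"antimicrobial", "antibacterial", "antifungal", "bactericidal"}

def label(description):
    words = {w.strip(".,").lower() for w in description.split()}
    return "antimicrobial" if words & KEYWORDS else "non-antimicrobial"

metabolites = {
    "compound_A": "Exhibits antibacterial activity against E. coli.",
    "compound_B": "Intermediate of the citric acid cycle.",
}
for name, desc in metabolites.items():
    print(name, "->", label(desc))
```

Such keyword labels are noisy by construction, which is why the study treats them only as weak supervision for evaluating the heterologous classifier.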


2018 ◽  
Vol 16 (06) ◽  
pp. 1840027 ◽  
Author(s):  
Wen Juan Hou ◽  
Bamfa Ceesay

Information on changes in a drug’s effect when taken in combination with a second drug, known as a drug–drug interaction (DDI), is relevant to the pharmaceutical industry. DDIs can delay, decrease, or enhance the absorption of either drug and thus decrease or increase their action or cause adverse effects. Information Extraction (IE) can be of great benefit in identifying and extracting relevant information on DDIs. Here we propose an approach for extracting DDIs from text that uses neural word embeddings to train a machine learning system. Results show that our system is competitive against other systems for the task of extracting DDIs, and that significant improvements can be achieved by learning from word features and using a deep-learning approach. Our study demonstrates that machine learning techniques such as neural networks and deep learning methods can efficiently aid IE from text, and the proposed approach is well suited to play a significant role in future research.
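
The embedding idea can be conveyed with a toy sketch (not the paper's network: the 3-d "embeddings", sentences, and perceptron below are invented for illustration): a candidate sentence is represented by the average of its word vectors, and a linear classifier is trained on that representation.

```python
# Toy word-embedding classifier for DDI-style sentences. The tiny 3-d
# vectors are made up; real systems use learned embeddings and deeper nets.

EMB = {"aspirin": [1.0, 0.2, 0.0], "warfarin": [0.9, 0.1, 0.1],
       "increases": [0.1, 1.0, 0.3], "bleeding": [0.0, 0.9, 0.8],
       "risk": [0.1, 0.8, 0.6], "patient": [0.0, 0.1, 1.0],
       "took": [0.2, 0.0, 0.9], "daily": [0.1, 0.1, 0.8]}

def embed(sentence):
    vecs = [EMB[w] for w in sentence.split() if w in EMB]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

train = [("aspirin increases warfarin bleeding risk", 1),  # DDI mention
         ("patient took aspirin daily", 0)]                # no DDI

w, b = [0.0, 0.0, 0.0], 0.0
for _ in range(20):                      # simple perceptron updates
    for sent, y in train:
        x = embed(sent)
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        for i in range(3):
            w[i] += (y - pred) * x[i]
        b += (y - pred)

x = embed("warfarin increases bleeding risk")
print(1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0)  # -> 1
```

Averaging embeddings discards word order; the paper's deep-learning approach exists precisely because richer encoders capture context that this bag-of-vectors baseline cannot.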


2019 ◽  
Vol 5 (1) ◽  
pp. 7
Author(s):  
Priyanka Rathord ◽  
Dr. Anurag Jain ◽  
Chetan Agrawal

With the help of the Internet, online news can be spread around the world instantly. Most people now have the habit of reading and sharing news online, for instance on social media such as Twitter and Facebook. Typically, news popularity is indicated by the number of reads, likes, or shares. For online news stakeholders such as content providers and advertisers, it is very valuable if the popularity of a news article can be accurately predicted prior to publication. Thus, it is interesting and meaningful to use machine learning techniques to predict the popularity of online news articles. Various works have addressed the prediction of online news popularity. The popularity of news depends on various features, such as sharing of the news on social media, visitors’ comments, and likes for the article. It is necessary to know what makes one online news article more popular than another, so that unpopular articles can be optimized to improve their popularity. In this paper, different methodologies that predict the popularity of online news articles are analyzed. These methodologies are compared, their parameters are considered, and improvements are suggested. The proposed methodology describes an online news popularity prediction system.
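
The task shape, mapping article features to a popular/unpopular label, can be illustrated with a minimal decision stump (a generic sketch, not any surveyed paper's model; the features and data are invented):

```python
# Hedged sketch: predict whether an article becomes popular from
# pre-publication features. A decision stump searches for the single
# best feature/threshold split. All data points are invented.

# (title_words, num_keywords, weekend_publish) -> popular (1) / not (0)
articles = [((9, 5, 0), 0), ((12, 8, 1), 1), ((7, 4, 0), 0),
            ((11, 9, 1), 1), ((10, 7, 0), 1), ((8, 3, 0), 0)]

def stump_accuracy(feat, thresh):
    correct = sum(1 for x, y in articles if (x[feat] > thresh) == bool(y))
    return correct / len(articles)

best = max(((f, t) for f in range(3) for t in range(10)),
           key=lambda ft: stump_accuracy(*ft))
print("best feature:", best[0], "threshold:", best[1],
      "accuracy:", stump_accuracy(*best))
```

The surveyed methodologies use far richer feature sets and ensemble learners; the stump only makes the feature-to-popularity mapping concrete.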


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Apostolos Ampountolas ◽  
Mark P. Legg

Purpose: This study aims to predict hotel demand through text analysis by investigating keyword series to increase the precision of demand predictions. To do so, this paper presents a framework for modeling hotel demand that incorporates machine learning techniques.
Design/methodology/approach: The empirical forecasting is conducted with a segmented machine learning approach that ties hierarchical clustering to machine learning and deep learning techniques, allowing the model to yield more precise estimates. This study evaluates an extensive range of social media–derived words with the greatest probability of gradually establishing an understanding of an optimal outcome. Analyses were performed on a major hotel chain in an urban market setting within the USA.
Findings: The findings indicate that while traditional methods, namely the naïve approach and ARIMA models, struggled with forecasting accuracy, segmented boosting methods (XGBoost) leveraging social media predict hotel occupancy with greater precision for all examined time horizons. Additionally, the segmented learning approach improved the forecasts’ stability and robustness while mitigating common overfitting issues within a highly dimensional data set.
Research limitations/implications: Incorporating social media into a segmented learning framework can augment the accuracy of the current generation of forecasting methods. Moreover, the segmented learning approach mitigates the negative effects of market shifts (e.g. COVID-19) that can reduce the life-cycles of in-production forecasts. The ability to be more robust to market deviations will allow hospitality firms to minimize development time.
Originality/value: The results are expected to generate insights by providing revenue managers with an instrument for predicting demand.
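
The segmentation step, grouping keyword series by similarity before forecasting each group, can be sketched with simple agglomerative (hierarchical) clustering (an illustration of the general technique, not the study's pipeline; the keyword series are invented):

```python
# Sketch: cluster social-media keyword series so each cluster can feed
# its own forecasting model. Weekly counts below are hypothetical.

series = {"pool":    [2, 3, 8, 9, 9, 8],
          "beach":   [1, 2, 7, 9, 8, 7],
          "museum":  [5, 5, 4, 5, 5, 4],
          "theater": [6, 5, 5, 4, 5, 5]}

def dist(a, b):  # Euclidean distance between two series
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

clusters = [[k] for k in series]          # start: one cluster per keyword
while len(clusters) > 2:                  # merge until 2 clusters remain
    pairs = [(i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))]
    # single-linkage: distance between the closest members of two clusters
    i, j = min(pairs, key=lambda p: min(
        dist(series[a], series[b])
        for a in clusters[p[0]] for b in clusters[p[1]]))
    clusters[i] += clusters.pop(j)

print(sorted(sorted(c) for c in clusters))
# [['beach', 'pool'], ['museum', 'theater']]
```

Here the seasonal keywords ("pool", "beach") separate from the steady ones, which is the kind of grouping that lets a segmented boosting model fit each demand pattern independently.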


Author(s):  
Kayalvizhi S. ◽  
Thenmozhi D.

Catch phrases are the important phrases that precisely explain a document; they represent the context of the whole document. They can also be used by judges and lawyers to retrieve relevant prior cases and so help assure justice in the domain of law. Currently, catch phrases are extracted using statistical methods, machine learning techniques, and deep learning techniques. The authors propose a sequence-to-sequence (Seq2Seq) deep neural network to extract catch phrases from legal documents. They employ several layers, namely an embedding layer, an encoder-decoder layer, a projection layer, and a loss layer, to build the deep neural network. The methodology is evaluated on the IRLeD@FIRE-2017 dataset, obtaining mean average precision and recall scores of 0.787 and 0.607, respectively. Results show that the proposed method outperforms existing systems.
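
The layer stack named above can be sketched architecturally at toy scale (untrained, fixed weights, tiny dimensions; this shows only the data flow, not the authors' trained network, whose parameters come from the loss layer during training):

```python
# Architectural sketch only: embedding -> encoder -> decoder -> projection.
# Weights are fixed toy values, so the decoded output is not meaningful.
import math

VOCAB = ["<s>", "</s>", "court", "held", "liable", "negligence"]
V, D, H = len(VOCAB), 3, 3

# embedding layer: one D-dim vector per vocabulary word (toy values)
E = [[0.1 * (i + j) for j in range(D)] for i in range(V)]

def rnn_step(x, h, w=0.5, u=0.3):     # simple Elman-style recurrence
    return [math.tanh(w * xi + u * hi) for xi, hi in zip(x, h)]

def encode(token_ids):                # encoder: fold the input sequence
    h = [0.0] * H
    for t in token_ids:
        h = rnn_step(E[t], h)
    return h

def project(h):                       # projection layer: state -> vocab scores
    return [sum(hi * E[v][i] for i, hi in enumerate(h)) for v in range(V)]

def decode(h, max_len=4):             # greedy decoding from encoder state
    out, tok = [], 0                  # start from <s>
    for _ in range(max_len):
        h = rnn_step(E[tok], h)
        scores = project(h)
        tok = max(range(V), key=scores.__getitem__)
        if VOCAB[tok] == "</s>":
            break
        out.append(VOCAB[tok])
    return out

state = encode([2, 3, 4, 5])          # "court held liable negligence"
print(decode(state))                  # untrained output: not meaningful
```

A production Seq2Seq model replaces the toy recurrence with learned LSTM/GRU cells and trains all layers end to end against the loss layer; this skeleton only makes the encode-then-decode data flow concrete.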

