Potential of Benford's Law and Machine Learning Based Verification in Agricultural Logistics

Author(s):  
Stanislav Levičar

Food supply chains are becoming increasingly complex, giving rise to new threats and risks for the stakeholders involved. At the same time, information technology has accelerated the development of new and more productive ways for organizations (members of supply chains) to collaborate and has helped them optimize their processes. Tighter collaboration among these companies is only possible if a sufficient level of trust is established among them, an obstacle that is not easily overcome. Since individual companies in a supply chain are unable to verify, and therefore to rely on, the data provided by third parties, the potential advantages are not fully realized. In this article we identify a possibility to remove one important element of this obstacle by using Benford's law as the basis for a general-purpose verification tool, further enhanced by statistics-based machine learning algorithms that can be implemented in IT-supported business operations. The potential usefulness of these methods lies in the fact that they can identify patterns and correlations without explicit user input.
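The verification idea above rests on a simple property: in many naturally occurring datasets, the first significant digit d appears with frequency log10(1 + 1/d). A minimal sketch of such a check, written here as an illustration rather than the article's actual tool (the function names and the deviation measure are our own assumptions):

```python
import math
from collections import Counter

# Expected first-digit frequencies under Benford's law:
# P(d) = log10(1 + 1/d) for d in 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Return the first significant digit of a nonzero number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def benford_deviation(values):
    """Mean absolute deviation between the observed first-digit
    distribution of `values` and Benford's expected frequencies.
    Larger values suggest the data may be manipulated or atypical."""
    digits = Counter(first_digit(v) for v in values if v != 0)
    n = sum(digits.values())
    return sum(abs(digits.get(d, 0) / n - BENFORD[d])
               for d in range(1, 10)) / 9
```

Applied to, say, reported invoice amounts or shipment weights from a supply-chain partner, a deviation far above that of comparable historical data would flag the records for closer inspection; a formal test (e.g. chi-squared) would replace the simple deviation in practice.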

Information ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 98 ◽  
Author(s):  
Tariq Ahmad ◽  
Allan Ramsay ◽  
Hanady Ahmed

Assigning sentiment labels to documents is, at first sight, a standard multi-label classification task. Many approaches have been used for this task, and the current state-of-the-art solutions use deep neural networks (DNNs), so it seems likely that such standard machine learning algorithms will provide an effective approach. We describe an alternative approach, which uses probabilities to construct a weighted lexicon of sentiment terms, then modifies the lexicon and calculates optimal thresholds for each class. We show that this approach outperforms DNNs and other standard algorithms. We believe that DNNs are not a universal panacea and that paying attention to the nature of the data you are trying to learn from can be more important than trying ever more powerful general-purpose machine learning algorithms.
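The core of a weighted-lexicon approach can be sketched in a few lines. This is a simplified illustration of the general idea (estimate per-word label probabilities, score documents, apply per-class thresholds), not the authors' exact method; all function names and the toy thresholds are hypothetical:

```python
from collections import defaultdict

def build_lexicon(docs, labels):
    """Estimate P(label | word) from labelled training documents,
    yielding a weighted sentiment lexicon."""
    word_label = defaultdict(lambda: defaultdict(int))
    word_total = defaultdict(int)
    for doc, label in zip(docs, labels):
        for word in set(doc.lower().split()):
            word_label[word][label] += 1
            word_total[word] += 1
    return {w: {lab: c / word_total[w] for lab, c in labs.items()}
            for w, labs in word_label.items()}

def score(doc, lexicon, label):
    """Sum the lexicon weights for one label over the document's words."""
    return sum(lexicon.get(w, {}).get(label, 0.0)
               for w in doc.lower().split())

def classify(doc, lexicon, thresholds):
    """Multi-label decision: assign every label whose score clears its
    per-class threshold (thresholds would be tuned on held-out data)."""
    return {lab for lab, t in thresholds.items()
            if score(doc, lexicon, lab) >= t}
```

Because each class gets its own threshold, a document can receive several labels at once, which is what makes the task multi-label rather than multi-class.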


Author(s):  
Parag Jain

Most popular machine learning algorithms, such as k-nearest neighbours, k-means, and SVMs, use a metric to measure the distance (or similarity) between data instances. The performance of these algorithms clearly depends heavily on the metric being used. In the absence of prior knowledge about the data, we can only use general-purpose metrics such as Euclidean distance, cosine similarity, or Manhattan distance, but these metrics often fail to capture the true behaviour of the data, which directly affects the performance of the learning algorithm. The solution is to tune the metric to the data and the problem; however, manually deriving a metric for high-dimensional data, which is often difficult even to visualize, is not only tedious but extremely difficult. This motivates work on metric learning, which seeks a metric that respects the data geometry. The goal of a metric learning algorithm is to learn a metric that assigns small distances to similar points and relatively large distances to dissimilar points.
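A tiny example of why the metric matters. Below, a diagonal Mahalanobis-style metric (one learned weight per feature, a special case of full metric learning) down-weights a noisy feature; the points and weights are invented for illustration:

```python
import math

def euclidean(a, b):
    """General-purpose Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_distance(a, b, w):
    """Diagonal Mahalanobis-style metric: a non-negative weight per
    feature. A metric learning algorithm would fit `w` from labelled
    similar/dissimilar pairs; here it is chosen by hand."""
    return math.sqrt(sum(wi * (x - y) ** 2
                         for wi, x, y in zip(w, a, b)))
```

With a query point q = (0, 0), a same-class neighbour (0.1, 8) that differs only on a large-scale noise feature, and a different-class point (3, 0), Euclidean distance ranks the wrong point closer, while a learned weighting w = (1.0, 0.001) that suppresses the noise feature recovers the correct neighbour.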


2019 ◽  
Vol 11 (17) ◽  
pp. 2247-2253 ◽  
Author(s):  
Alfonso T García-Sosa

Aim: The explosion of data-based technology has accelerated pattern mining. However, it is clear that the quality and bias of data impact all machine learning and modeling. Results & methodology: A technique is presented for using the distribution of first significant digits of medicinal chemistry features (log P, log S, and pKa, both experimental and predicted) to assess whether they follow Benford's law, as seen in many natural phenomena. Conclusion: Data quality depends on dataset size, diversity, and magnitude. Profiling based on drugs alone may be too small or narrow; using larger sets of experimentally determined or predicted values recovers the distribution seen in other natural phenomena. This technique may be used to improve profiling, machine learning, large-dataset assessment, and other data-based methods for better (automated) data generation and compound design.


Author(s):  
Hong Cui

Despite the sub-language nature of taxonomic descriptions of animals and plants, researchers have warned about the existence of large variations among different description collections in terms of information content and its representation. These variations pose a serious threat to the development of automatic tools for structuring large volumes of text-based descriptions. This paper presents a general approach to marking up different collections of taxonomic descriptions with XML, using two large-scale floras as examples. The markup system, MARTT, is based on machine learning methods and enhanced by machine-learned domain rules and conventions. Experiments show that our simple and efficient machine learning algorithms significantly outperform general-purpose algorithms, and that rules learned from one flora can be used when marking up a second flora and help to improve markup performance, especially for elements with sparse training examples.


2019 ◽  
Author(s):  
André Dalmora ◽  
Tiago Tavares

Music lyrics can convey a great part of the meaning in popular songs. Such meaning is important for humans to understand songs as related to typical narratives, such as romantic interests or life stories, and this understanding is part of the affective aspects that can be used to choose songs to play in particular situations. This paper analyzes the effectiveness of using text mining tools to classify lyrics according to their narrative contexts. For this, we built a dataset of Brazilian popular music lyrics whose context and valence were voted on by online raters, and applied several machine learning algorithms in a pipeline in which lyrics are projected into a vector space and then classified using general-purpose algorithms. We experimented with document representations based on sparse topic models [11, 12, 13, 14], which aim to find groups of words that typically appear together in the dataset; we also extracted part-of-speech tags for each lyric and used their histogram as features in the classification process. We compared the classification results to those of a typical human, and we compared the problems of identifying narrative contexts and of identifying lyric valence. Our results indicate that narrative contexts can be identified more consistently than valence. We also show that human-based classification typically does not reach high accuracy, which suggests an upper bound for automatic classification.
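The "project into a vector space, then classify" pipeline can be illustrated with the simplest possible representation, raw term counts, and a nearest-centroid decision. This is only a schematic stand-in for the paper's actual features (sparse topic models, part-of-speech histograms); vocabulary and class names are invented:

```python
import math
from collections import Counter

def vectorize(text, vocab):
    """Project a lyric into a vector space of term counts over `vocab`."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Per-dimension mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest_class(vec, class_centroids):
    """Assign the class whose centroid is most cosine-similar to `vec`."""
    return max(class_centroids,
               key=lambda c: cosine(vec, class_centroids[c]))
```

In the paper's setting, the vectorizer would be replaced by topic-model activations or tag histograms, and the nearest-centroid rule by a general-purpose classifier, but the pipeline shape is the same.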


2012 ◽  
pp. 1652-1686
Author(s):  
Réal Carbonneau ◽  
Rustam Vahidov ◽  
Kevin Laframboise

Managing supply chains in today’s complex, dynamic, and uncertain environment is one of the key challenges affecting the success of businesses. One of the crucial determinants of effective supply chain management is the ability to recognize customer demand patterns and react accordingly to changes in the face of intense competition. Thus, the ability of the participants in a supply chain to adequately predict demand is vital to the survival of businesses. Demand prediction is aggravated by the fact that the communication patterns that emerge between participants in a supply chain tend to distort the original consumer demand and create high levels of noise. Distortion and noise negatively impact the forecast quality of the participants. This work investigates the applicability of machine learning (ML) techniques and compares their performance with more traditional methods in order to improve demand forecast accuracy in supply chains. To this end we used two data sets from particular companies (a chocolate manufacturer and a toner cartridge manufacturer), as well as data from the Statistics Canada manufacturing survey. A representative set of traditional and ML-based forecasting techniques was applied to the demand data, and the accuracy of the methods was compared. As a group, ML techniques outperformed traditional techniques in terms of overall average, but not in terms of overall ranking. We also found that a support vector machine (SVM) trained on multiple demand series produced the most accurate forecasts.
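Training one model on multiple demand series, as in the SVM result above, starts by turning each series into lagged (inputs, target) pairs and pooling them. A minimal sketch of that preprocessing step, with the function names and the moving-average baseline being our own illustration rather than the authors' code:

```python
def make_windows(series, n_lags):
    """Turn one demand series into (lagged inputs, next value) pairs:
    each window of n_lags past observations predicts the next one."""
    return [(series[i:i + n_lags], series[i + n_lags])
            for i in range(len(series) - n_lags)]

def pool_series(all_series, n_lags):
    """Pool windows from several demand series into one training set,
    as when a single model is trained across multiple series."""
    pooled = []
    for s in all_series:
        pooled.extend(make_windows(s, n_lags))
    return pooled

def naive_forecast(window):
    """Moving-average baseline over the lag window, the kind of
    traditional method an ML model would be compared against."""
    return sum(window) / len(window)
```

The pooled pairs would then be fed to any regressor (an SVM with a nonlinear kernel, in the paper's best configuration), with forecast accuracy compared against baselines like the moving average above.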


Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7294
Author(s):  
Adrián Campazas-Vega ◽  
Ignacio Samuel Crespo-Martínez ◽  
Ángel Manuel Guerrero-Higueras ◽  
Camino Fernández-Llamas

Advanced persistent threats (APTs) are a growing concern in cybersecurity. Many companies and governments have reported incidents related to these threats. Throughout the life cycle of an APT, one of the most commonly used techniques for gaining access is network attacks. Tools based on machine learning are effective in detecting these attacks. However, researchers usually have problems finding suitable datasets for fitting their models, and the problem is even harder when flow data are required. In this paper, we describe a framework for gathering flow datasets using a NetFlow sensor, and we present DOROTHEA (Docker-based framework for gathering NetFlow data), a Docker-based solution implementing the above framework. This tool aims to easily generate taggable network traffic to build suitable datasets for fitting classification models. To demonstrate that datasets gathered with DOROTHEA can be used for fitting classification models for malicious-traffic detection, several models were built using the model evaluator (MoEv), a general-purpose tool for training machine learning algorithms. After carrying out the experiments, four models obtained detection rates higher than 93%, demonstrating the validity of the datasets gathered with the tool.


2021 ◽  
Vol 13 (12) ◽  
pp. 6812
Author(s):  
Nesrin Ada ◽  
Yigit Kazancoglu ◽  
Muruvvet Deniz Sezer ◽  
Cigdem Ede-Senturk ◽  
Idil Ozer ◽  
...  

The concept of the circular economy (CE) has recently gained importance worldwide, since it offers a wider perspective on promoting sustainable production and consumption with limited resources. However, few studies have investigated the barriers to CE in circular food supply chains. Accordingly, this paper presents a systematic literature review of 136 papers from 2010 to 2020 from the WOS and Scopus databases regarding these barriers, in order to understand CE implementation in food supply chains. The barriers are classified under seven categories: “cultural”, “business and business finance”, “regulatory and governmental”, “technological”, “managerial”, “supply-chain management”, and “knowledge and skills”. The findings show the need to identify barriers preventing the transition to CE. The findings also indicate that these challenges can be overcome through Industry 4.0, which includes a variety of technologies, such as the Internet of Things (IoT), cloud technologies, machine learning, and blockchain. Specifically, machine learning can offer support by making workflows more efficient through the forecasting and analytical capabilities of food supply chains. Blockchain and big data analytics can provide the necessary support to establish legal systems and improve environmental regulations, since transparency is a crucial issue for taxation and incentive systems. Thus, CE can be promoted via adequate laws, policies, and innovative technologies.

