Technology of the Surround

In addressing the issue of harmful bias in AI systems, this paper asks for a consideration of a generatively wild AI that exceeds the framework of predictive machine learning. The argument positions supervised learning, with its labeled training data, as primarily a form of reproduction of a status quo. Based on this framework, the paper moves through an analysis of two AI modalities, supervised learning (e.g., machine vision) and unsupervised learning (e.g., game play), to demonstrate the potential of AI as a mechanism that creates patterns of association outside of a purely reproductive condition. This analysis is followed by an introduction to the concept of the technology of the surround, where the paper turns toward theoretical positions that unbind categorical logics, moving toward other possible positionalities: the surround (Harney and Moten), alien intelligence (Parisi), and intra-actions of subject/object resolution (Barad). The paper frames two key concepts in relation to an AI in the wild: the colonial sublime and black techné. The paper concludes with a summation of what AI in the wild can contribute to the subversion of technologies of oppression toward a liberatory potential of AI.

Mengenal Machine Learning Dengan Teknik Supervised Dan Unsupervised Learning Menggunakan Python

BINA INSANI ICT JOURNAL ◽

10.51211/biict.v7i2.1422 ◽

2020 ◽

Vol 7 (2) ◽

pp. 156

Author(s):

Endang Retnoningsih ◽

Rully Pramudita

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Clustering Algorithm ◽

Training Data ◽

Abstract Machine ◽

Dbscan Clustering ◽

Learning Techniques ◽

Learning Technique ◽

Learning Programming

Abstract: Machine learning is a system that can learn on its own to make decisions without having to be repeatedly programmed by humans, so that computers become smarter by learning from the data they have. Learning techniques fall into two groups: supervised learning uses a dataset (training data) that is already labeled, while unsupervised learning draws conclusions from the dataset itself. The dataset serves as the input from which the machine learning system produces its analysis. The problem addressed is the iris flower (iris tectorum), which has flowers of various colors and whose sepals and petals indicate the species; an appropriate method is needed to group the flowers into the species iris-setosa, iris-versicolor, or iris-virginica. The solution uses Python, which provides the algorithms and libraries needed to build machine learning programs. The KNN Classifier algorithm was chosen for the supervised learning technique and the DBSCAN Clustering algorithm for the unsupervised learning technique. The results show that Python provides a complete set of libraries (NumPy, Pandas, matplotlib, sklearn) for machine learning programming: the KNN algorithm is called with from sklearn import neighbors as a supervised technique, and DBSCAN is called with from sklearn.cluster import DBSCAN as an unsupervised technique. Python produces output that matches the input dataset, yielding decisions in the form of classification and clustering. Keywords: DBSCAN, KNN, machine learning, python.
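As a rough illustration of the two techniques named in this abstract, the sketch below implements a k-nearest-neighbours classifier and a minimal DBSCAN in plain Python, rather than through the scikit-learn classes the paper reports using. The measurements (petal length and petal width in cm) are illustrative values, not the paper's dataset, and the function names and the parameter values `k`, `eps`, and `min_pts` are assumptions made for this example.

```python
import math
from collections import Counter

# Illustrative (petal length, petal width) values in cm; NOT the paper's data.
TRAIN = [
    ((1.4, 0.2), "iris-setosa"), ((1.5, 0.2), "iris-setosa"), ((1.3, 0.3), "iris-setosa"),
    ((4.5, 1.5), "iris-versicolor"), ((4.7, 1.4), "iris-versicolor"), ((4.4, 1.3), "iris-versicolor"),
    ((6.0, 2.5), "iris-virginica"), ((5.9, 2.1), "iris-virginica"), ((6.1, 2.3), "iris-virginica"),
]


def knn_predict(train, point, k=3):
    """Supervised: return the majority label among the k nearest labelled examples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def dbscan(points, eps=0.5, min_pts=3):
    """Unsupervised: density-based clustering. Returns one cluster id per point, -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, so keep expanding
    return labels


print(knn_predict(TRAIN, (4.6, 1.4)))
print(dbscan([features for features, _ in TRAIN]))
```

The classifier needs the species labels in `TRAIN`, whereas `dbscan` receives only the measurements and has to discover the groups by density alone, which is the supervised/unsupervised distinction the abstract draws.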

Application of Machine Learning in Animal Disease Analysis and Prediction

Current Bioinformatics ◽

10.2174/1574893615999200728195613 ◽

2020 ◽

Vol 15 ◽

Author(s):

Shuwen Zhang ◽

Qiang Su ◽

Qin Chen

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Clustering Algorithm ◽

Principal Component ◽

Support Vector ◽

Animal Disease ◽

Human Beings ◽

Animal Diseases ◽

Disease Analysis

Abstract: Major animal diseases pose a great threat to animal husbandry and human beings. With the deepening of globalization and the abundance of data resources, the prediction and analysis of animal diseases using big data are becoming more and more important. The focus of machine learning is to make computers learn from data and use the learned experience to analyze and predict. This paper first introduces the animal epidemic situation and machine learning, and then briefly introduces the application of machine learning in animal disease analysis and prediction. Machine learning is mainly divided into supervised learning and unsupervised learning. Supervised learning includes support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. Unsupervised learning includes the expectation-maximization algorithm, principal component analysis, hierarchical clustering, and MaxEnt. Through this discussion, readers should gain a clearer understanding of machine learning and of its application prospects in animal diseases.

A review: preprocessing techniques and data augmentation for sentiment analysis

Computational Social Networks ◽

10.1186/s40649-020-00080-x ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Huu-Thanh Duong ◽

Tram-Anh Nguyen-Thi

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Supervised Learning ◽

Data Augmentation ◽

Original Data ◽

Training Data ◽

Unseen Data ◽

Augmentation Techniques ◽

User Intervention

Abstract: In the literature, machine learning-based studies of sentiment analysis usually rely on supervised learning, which requires pre-labeled datasets that are large enough for the domain in question. Such datasets are tedious, expensive, and time-consuming to build, and unseen data remain hard to handle. This paper applies semi-supervised learning to Vietnamese sentiment analysis, for which datasets are limited. We summarize many preprocessing techniques used to clean and normalize data, including negation handling and intensification handling, to improve performance. Moreover, data augmentation techniques, which generate new data from the original data to enrich the training data without user intervention, are also presented. In experiments, we evaluate various aspects and obtain competitive results, which may motivate further proposals.
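To make the preprocessing and augmentation ideas in this abstract concrete, here is a minimal sketch of one negation-handling step and one augmentation step. The abstract does not say which specific techniques the authors used, so both functions are assumptions: `mark_negation` prefixes tokens that follow a negator, and `random_swap` creates a new example by swapping token positions. The English negator list stands in for whatever Vietnamese lexicon the paper relies on.

```python
import random
import re

NEGATORS = {"not", "no", "never"}  # assumed English stand-ins for a negation lexicon
PUNCTUATION = set(".,!?;")


def mark_negation(text):
    """Prefix every token after a negator with NOT_ until the next punctuation mark."""
    tokens, negating = [], False
    for token in re.findall(r"\w+|[.,!?;]", text.lower()):
        if token in PUNCTUATION:
            negating = False  # a punctuation mark ends the negation scope
            tokens.append(token)
        elif token in NEGATORS:
            negating = True
            tokens.append(token)
        else:
            tokens.append("NOT_" + token if negating else token)
    return tokens


def random_swap(tokens, n_swaps=1, seed=0):
    """Return a new token list with n_swaps random pairs of positions exchanged."""
    rng = random.Random(seed)
    tokens = list(tokens)  # copy, so the original example is left untouched
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


marked = mark_negation("I do not like this film, but the music is good.")
print(marked)
print(random_swap(marked, n_swaps=2))
```

Marking negated tokens lets a bag-of-words model tell "like" from "NOT_like", and the augmented copy keeps the label of the original sentence, so no further user intervention is needed.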

Evaluating disaster-related tweet credibility using content-based and user-based features

Information Discovery and Delivery ◽

10.1108/idd-04-2020-0044 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Nasser Assery ◽

Yuan (Dorothy) Xiaohong ◽

Qu Xiuli ◽

Roy Kaushik ◽

Sultan Almalki

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Emergency Response ◽

Learning Model ◽

Performance Comparison ◽

Supervised Machine Learning ◽

Learning Methods ◽

Content Type ◽

Machine Learning Classification

Purpose: This study aims to propose an unsupervised learning model to evaluate the credibility of disaster-related Twitter data and to present a performance comparison with commonly used supervised machine learning models.

Design/methodology/approach: First, historical tweets on two recent hurricane events are collected via the Twitter API. Then a credibility scoring system is implemented in which the tweet features are analyzed to assign a credibility score and a credibility label to each tweet. After that, supervised machine learning classification is implemented using various classification algorithms, and their performances are compared.

Findings: The proposed unsupervised learning model could enhance emergency response by providing a fast way to determine the credibility of disaster-related tweets. Additionally, the comparison of the supervised classification models reveals that the Random Forest classifier performs significantly better than the SVM and Logistic Regression classifiers in classifying the credibility of disaster-related tweets.

Originality/value: In this paper, an unsupervised 10-point scoring model is proposed to evaluate the tweets’ credibility based on user-based and content-based features. This technique could be used to evaluate the credibility of disaster-related tweets on future hurricanes and has the potential to enhance emergency response during critical events. The comparative study of different supervised learning methods has revealed effective supervised learning methods for evaluating the credibility of Twitter data.
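A rule-based scoring model of the kind described here can be sketched in a few lines. The abstract does not list the actual features, weights, or cut-off, so every field of `Tweet`, every point value, and the `threshold` below are hypothetical; the only property taken from the source is that user-based and content-based features add up to a score out of 10, from which a label is derived.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # User-based features (hypothetical)
    verified: bool
    followers: int
    account_age_days: int
    # Content-based features (hypothetical)
    has_url: bool
    has_media: bool
    exclamation_marks: int


def credibility_score(tweet):
    """Add up hypothetical rule weights; the maximum possible total is 10."""
    score = 0
    score += 3 if tweet.verified else 0
    score += 2 if tweet.followers >= 1000 else 0
    score += 1 if tweet.account_age_days >= 365 else 0
    score += 2 if tweet.has_url else 0
    score += 1 if tweet.has_media else 0
    score += 1 if tweet.exclamation_marks <= 1 else 0
    return score


def credibility_label(score, threshold=5):
    """Turn the score into a label; the threshold is an assumed cut-off."""
    return "credible" if score >= threshold else "not credible"


agency = Tweet(True, 5000, 800, True, True, 0)
unknown = Tweet(False, 10, 5, False, False, 4)
for tweet in (agency, unknown):
    score = credibility_score(tweet)
    print(score, credibility_label(score))
```

Because no labelled tweets are needed to compute the score, the resulting labels can then serve as targets for the supervised classifiers that the study compares.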

Machine Learning for Non-Intrusive Speech Quality Assessment

10.26686/wgtn.16985584 ◽

2021 ◽

Author(s):

Mouna Hakami

Keyword(s):

Machine Learning ◽

Quality Assessment ◽

Unsupervised Learning ◽

Supervised Learning ◽

Latent Variable ◽

Generative Models ◽

Speech Quality ◽

Speech Signals ◽

Latent Space ◽

Speech Quality Assessment

This thesis presents two studies on non-intrusive speech quality assessment methods. The first applies supervised learning methods to speech quality assessment, which is a common approach in machine learning based quality assessment. To outperform existing methods, we concentrate on enhancing the feature set. In the second study, we analyse quality assessment from a different point of view inspired by the biological brain and present the first unsupervised learning based non-intrusive quality assessment that removes the need for labelled training data.

Supervised learning based, non-intrusive quality predictors generally involve the development of a regressor that maps signal features to a representation of perceived quality. The performance of the predictor largely depends on 1) how sensitive the features are to the different types of distortion, and 2) how well the model learns the relation between the features and the quality score. We improve the performance of the quality estimation by enhancing the feature set and using a contemporary machine learning model that fits this objective. We propose an augmented feature set that includes raw features that are presumably redundant. The speech quality assessment system benefits from this redundancy, as it reduces the impact of unwanted noise in the input. Feature set augmentation generally leads to the inclusion of features that have non-smooth distributions. We introduce a new pre-processing method and re-distribute the features to facilitate the training. The evaluation of the system on the ITU-T Supplement 23 database illustrates that the proposed system outperforms the popular standards and contemporary methods in the literature.

The unsupervised learning quality assessment approach presented in this thesis is based on a model that is learnt from clean speech signals. Consequently, it does not need to learn the statistics of any corruption that exists in the degraded speech signals and is trained only with unlabelled clean speech samples. The quality has a new definition, which is based on the divergence between 1) the distribution of the spectrograms of test signals, and 2) the pre-existing model that represents the distribution of the spectrograms of good quality speech. The distributions of speech spectrograms are complex, and hence comparing them is not trivial. To tackle this problem, we propose to map the spectrograms of speech signals to a simple latent space. Generative models that map simple latent distributions into complex distributions are excellent platforms for our work. Generative models that are trained on the spectrograms of clean speech signals learn to map the latent variable $Z$ from a simple distribution $P_Z$ into a spectrogram $X$ from the distribution of good quality speech. Consequently, an inference model is developed by inverting the pre-trained generator, which maps the spectrogram of the signal under test, $X_t$, into its relevant latent variable, $Z_t$, in the latent space. We postulate that the divergence between the distribution of the latent variable and the prior distribution $P_Z$ is a good measure of the quality of speech. Generative adversarial nets (GAN) are an effective training method and work well in this application. The proposed system is a novel application for a GAN. The experimental results with the TIMIT and NOIZEUS databases show that the proposed measure correlates positively with the objective quality scores.
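The latent-space comparison described above can be illustrated with a small numerical sketch. The abstract names neither the divergence nor the prior, so the choices here are assumptions: the prior $P_Z$ is taken to be a standard normal, the latent variables $Z_t$ are summarised by a fitted diagonal Gaussian, and the divergence is the closed-form KL divergence between the two. The inverted generator is not modelled; the latent vectors are simulated.

```python
import math
import random
from statistics import fmean, pvariance


def kl_to_standard_normal(latents):
    """KL( N(mu, diag(var)) || N(0, I) ) for a diagonal Gaussian fitted to latent vectors."""
    total = 0.0
    for values in zip(*latents):  # iterate over latent dimensions
        mu = fmean(values)
        var = pvariance(values, mu)
        # Closed form per dimension: 0.5 * (var + mu^2 - 1 - ln var)
        total += 0.5 * (var + mu * mu - 1.0 - math.log(var))
    return total


rng = random.Random(0)
# Simulated latents: one set distributed like the assumed prior, one shifted and spread out.
near_prior = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2000)]
far_from_prior = [[rng.gauss(1.5, 2.0) for _ in range(2)] for _ in range(2000)]

print(kl_to_standard_normal(near_prior))
print(kl_to_standard_normal(far_from_prior))
```

Latents that follow the prior give a divergence close to zero, while latents that drift away from it give a clearly larger value, which is the kind of signal the thesis proposes to read as a quality measure.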

Fault-Guided Seismic Stratigraphy Interpretation via Semi-Supervised Learning

10.2118/207218-ms ◽

2021 ◽

Author(s):

Haibin Di ◽

Chakib Kada Kloucha ◽

Cen Li ◽

Aria Abubakar ◽

Zhun Li ◽

...

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Model Building ◽

Structural Information ◽

Mapping Function ◽

Seismic Stratigraphy ◽

Training Data ◽

Entire Study ◽

Depositional Process ◽

Convolutional Autoencoder

Abstract: Delineating seismic stratigraphic features and depositional facies is important for successful reservoir mapping and identification in the subsurface. Robust seismic stratigraphy interpretation is confronted with two major challenges. The first is to automate the process as far as possible, particularly given the increasing size of seismic data and the complexity of target stratigraphies; the second is to efficiently incorporate available structures into stratigraphy model building. Machine learning, particularly the convolutional neural network (CNN), has been introduced to assist seismic stratigraphy interpretation through supervised learning. However, the small amount of available expert labels greatly restricts the performance of such supervised CNNs. Moreover, most of the existing CNN implementations are based only on amplitude and thus fail to use necessary structural information, such as faults, for constraining the machine learning. To resolve both challenges, this paper presents a semi-supervised learning workflow for fault-guided seismic stratigraphy interpretation, which consists of two components. The first component is seismic feature engineering (SFE), which aims at learning the provided seismic and fault data through an unsupervised convolutional autoencoder (CAE). The second is stratigraphy model building (SMB), which aims at building an optimal mapping function between the features extracted from the SFE CAE and the target stratigraphic labels provided by an experienced interpreter, through a supervised CNN. The two components are connected by embedding the encoder of the SFE CAE into the SMB CNN, which forces the SMB learning to be based on features that exist throughout the entire study area rather than only in the limited training data; correspondingly, the risk of overfitting is greatly reduced. More innovatively, the fault constraint is introduced by customizing the SMB CNN with two output branches, one to match the target stratigraphies and the other to reconstruct the input fault, so that the fault continues contributing to the process of SMB learning. The performance of such fault-guided seismic stratigraphy interpretation is validated by an application to a real seismic dataset, and the machine prediction not only matches the manual interpretation accurately but also clearly illustrates the depositional process in the study area.

TOPICAL ISSUES OF APPLICATION OF MACHINE LEARNING METHODS IN ECONOMY

Innovative Aspects of the Development of Science and Technology: Collected Articles of the VIII International Scientific and Practical Conference [online electronic edition] / Ed. N.V. Emelyanov. Moscow: “KDU”, “Dobrosvet”, 2021. 149 pp. ◽

10.31453/kdu.ru.978-5-7913-1176-4-2021-28-33 ◽

2021 ◽

Author(s):

Natalia Pavlovna Persteneva ◽

Darya Dmitrievn Skryleva ◽

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Learning Model ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Machine Learning Model ◽

Popular Classes ◽

Machine Learning Models

The article discusses machine learning methods using the example of two popular classes: supervised learning and unsupervised learning. Variants of the main types of machine learning models for each class are presented. A generalized algorithm for building any machine learning model is formulated.

Semantics-Based Document Categorization Employing Semi-Supervised Learning

Advances in Linguistics and Communication Studies - Modern Computational Models of Semantic Discovery in Natural Language ◽

10.4018/978-1-4666-8690-8.ch005 ◽

2015 ◽

pp. 112-140 ◽

Cited By ~ 1

Author(s):

Jan Žižka ◽

František Dařena

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Real World ◽

Supervised Machine Learning ◽

The Internet ◽

Learning Method ◽

Label Information ◽

Document Categorization

The automated categorization of unstructured textual documents according to their semantic contents plays an important role, particularly given the ever-growing volume of such data originating from the Internet. Given a sufficient number of labeled examples, a suitable supervised machine learning-based classifier can be trained. When no labeling is available, an unsupervised learning method can be applied; however, the missing label information often leads to worse classification results. This chapter demonstrates a method based on semi-supervised learning in which a smallish set of manually labeled examples improves the categorization process in comparison with clustering, and the results are comparable with the supervised learning output. For illustration, a real-world dataset coming from the Internet is used as the input of the supervised, unsupervised, and semi-supervised learning. The results are shown for different numbers of starting labeled samples used as “seeds” to automatically label the remaining volume of unlabeled items.
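The idea of using a few labeled "seeds" to label the remaining documents can be sketched as a simple self-training loop. This is an assumed stand-in, not the chapter's method: documents are compared by Jaccard similarity over their word sets, and in each round the unlabeled document most similar to any already-labeled document inherits that document's label. The example texts and category names are invented.

```python
def jaccard(a, b):
    """Word-set overlap between two documents, in the range 0..1."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def self_train(seeds, unlabeled):
    """seeds: {text: label}; unlabeled: list of texts. Returns {text: label} for all texts."""
    labeled = dict(seeds)
    pool = list(unlabeled)
    while pool:
        # Label the most confident item first: the closest (unlabeled, labeled) pair.
        doc, source = max(
            ((d, s) for d in pool for s in labeled),
            key=lambda pair: jaccard(pair[0], pair[1]),
        )
        labeled[doc] = labeled[source]
        pool.remove(doc)
    return labeled


seeds = {
    "cheap flights hotel deals": "travel",
    "football match score goal": "sport",
}
unlabeled = [
    "hotel booking cheap rooms",
    "goal scored late in match",
    "rooms with sea view booking",
    "late penalty decided the score",
]
for text, label in self_train(seeds, unlabeled).items():
    print(label, "|", text)
```

Note that "rooms with sea view booking" shares no words with either seed; it is labeled only because an earlier round labeled a similar document, which is how a small seed set can spread across a larger unlabeled collection.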

Classification Methods

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch032 ◽

2011 ◽

pp. 196-201 ◽

Cited By ~ 2

Author(s):

Aijun An

Keyword(s):

Unsupervised Learning ◽

Supervised Learning ◽

Credit Card ◽

A Priori ◽

Training Data ◽

Classification Model ◽

Highly Active ◽

Data Object ◽

Patient Database ◽

Data Objects

Generally speaking, classification is the action of assigning an object to a category according to the characteristics of the object. In data mining, classification refers to the task of analyzing a set of pre-classified data objects to learn a model (or a function) that can be used to classify an unseen data object into one of several predefined classes. A data object, referred to as an example, is described by a set of attributes or variables. One of the attributes describes the class that an example belongs to and is thus called the class attribute or class variable. Other attributes are often called independent or predictor attributes (or variables). The set of examples used to learn the classification model is called the training data set. Tasks related to classification include regression, which builds a model from training data to predict numerical values, and clustering, which groups examples to form categories. Classification belongs to the category of supervised learning, distinguished from unsupervised learning. In supervised learning, the training data consists of pairs of input data (typically vectors), and desired outputs, while in unsupervised learning there is no a priori output. Classification has various applications, such as learning from a patient database to diagnose a disease based on the symptoms of a patient, analyzing credit card transactions to identify fraudulent transactions, automatic recognition of letters or digits based on handwriting samples, and distinguishing highly active compounds from inactive ones based on the structures of compounds for drug discovery.

Classification Based on Unsupervised Learning

Statistical Techniques for Network Security ◽

10.4018/978-1-59904-708-9.ch010 ◽

2011 ◽

pp. 348-395

Author(s):

Yu Wang

Keyword(s):

Network Security ◽

Unsupervised Learning ◽

Supervised Learning ◽

Network Traffic ◽

High Speed ◽

Ad Hoc ◽

Training Data ◽

Traffic Data ◽

Response Variable ◽

Learning Techniques

The requirement of the supervised learning technique for a labeled response variable in the training data may not be satisfied in some situations, particularly in dynamic, short-term, and ad-hoc wireless network access environments. Being able to conduct classification without a labeled response variable is an essential challenge for modern network security and intrusion detection. In this chapter we will discuss some unsupervised learning techniques, including probability, similarity, and multidimensional models, that can be applied in network security. These methods also provide a different angle from which to analyze network traffic data. For comprehensive knowledge on unsupervised learning techniques please refer to the machine learning references listed in the previous chapter; for their applications in network security see Carmines, Edward & McIver (1981), Lane & Brodley (1997), Herrero, Corchado, Gastaldo, Leoncini, Picasso & Zunino (2007), and Dhanalakshmi & Babu (2008). Unlike in supervised learning, where for each vector $X = (x_1, x_2, \ldots, x_n)$ we have a corresponding observed response, $Y$, in unsupervised learning we only have $X$, and $Y$ is not available, either because we could not observe it or because its frequency is too low to be fitted with a supervised learning approach. Unsupervised learning is of great practical significance because in many circumstances the available network traffic data may not include any anomalous events or known anomalous events (e.g., traffic collected from a newly constructed network system). As high-speed mobile wireless and ad-hoc network systems have become popular, the need to develop new unsupervised learning methods that can model network traffic using anomaly-free training data has significantly increased.
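Learning from anomaly-free traffic when only $X$ is available can be shown with a very small sketch. It is an assumption made for illustration, not one of the chapter's models: a per-feature mean and standard deviation are fitted to normal traffic vectors, and a new vector is scored by its largest absolute z-score. The two features (packets per second, mean packet size in bytes), the sample values, and the cut-off of 3 are hypothetical.

```python
from statistics import fmean, pstdev


def fit_profile(X):
    """Per-feature mean and standard deviation learned from anomaly-free vectors X."""
    columns = list(zip(*X))
    means = [fmean(column) for column in columns]
    stds = [pstdev(column) or 1.0 for column in columns]  # avoid dividing by zero
    return means, stds


def anomaly_score(x, profile):
    """Largest absolute z-score of x across features; no response variable Y is used."""
    means, stds = profile
    return max(abs(value - mean) / std for value, mean, std in zip(x, means, stds))


# Hypothetical anomaly-free traffic: (packets per second, mean packet size in bytes).
normal_traffic = [(100, 500), (110, 520), (95, 480), (105, 510), (90, 490)]
profile = fit_profile(normal_traffic)

THRESHOLD = 3.0  # assumed cut-off, in standard deviations
for x in [(102, 505), (400, 60)]:
    score = anomaly_score(x, profile)
    print(x, round(score, 2), "anomalous" if score > THRESHOLD else "normal")
```

The profile is built without any example of an attack, which matches the situation the chapter describes for newly constructed networks where no known anomalous events have been recorded.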
