Care and Scale: Decorrelative Ethics in Algorithmic Recommendation

2021 ◽  
Vol 36 (3) ◽  
Author(s):  
Nick Seaver

The people who make algorithmic recommender systems want apparently incompatible things: they pride themselves on the scale at which their software works, but they also want to treat their materials and users with care. Care and scale are commonly understood as contradictory goals: to be careful is to work at small scale, while working at large scale requires abandoning the small concerns of care. Drawing together anthropological work on care and scale, this article analyzes how people who make music recommender systems try to reconcile these values, reimagining what care and scale mean and how they relate to each other in the process. It describes decorrelation, an ethical technique that metaphorically borrows from the mathematics of machine learning, which practitioners use to reimagine how values might relate with each other. This “decorrelative ethics” facilitates new arrangements of care and scale, which challenge conventional anthropological theorizing.

2021 ◽  
Vol 11 (2) ◽  
pp. 472
Author(s):  
Hyeongmin Cho ◽  
Sangkyun Lee

Machine learning has proven effective in various application areas, such as object and speech recognition on mobile systems. Since the availability of large training datasets is a critical key to machine learning success, many datasets are being disclosed and published online. From a data consumer's or manager's point of view, measuring data quality is an important first step in the learning process: we need to determine which datasets to use, update, and maintain. However, few practical ways to measure data quality are available today, especially for large-scale, high-dimensional data such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping, with statistical benefits on large-scale, high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale, high-dimensional datasets.
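As a rough illustration of the idea (not the paper's exact measures), the sketch below computes a separability score from between- and within-class scatter in a randomly projected space, and a bootstrapped within-class spread; the projection dimension and bootstrap count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_project(X, k=16):
    # Random Gaussian projection to k dimensions (Johnson-Lindenstrauss style)
    P = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return X @ P

def class_separability(X, y):
    # Ratio of between-class to within-class scatter in the projected space
    Z = random_project(X)
    mu = Z.mean(axis=0)
    between = within = 0.0
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        between += len(Zc) * float(np.sum((mc - mu) ** 2))
        within += float(np.sum((Zc - mc) ** 2))
    return between / within

def in_class_variability(X, y, n_boot=20):
    # Bootstrap estimate of the mean within-class spread
    Z = random_project(X)
    spreads = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(Z), len(Z))
        Zb, yb = Z[idx], y[idx]
        spreads.append(np.mean([Zb[yb == c].std() for c in np.unique(yb)]))
    return float(np.mean(spreads))

# Toy data: two 200-sample classes in 100 dimensions
X = np.vstack([rng.normal(0, 1, (200, 100)), rng.normal(3, 1, (200, 100))])
y = np.repeat([0, 1], 200)
print(class_separability(X, y), in_class_variability(X, y))
```

Projecting first keeps both computations linear in the original dimensionality, which is the point of the random-projection step for images and videos.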


2021 ◽  
Author(s):  
Aleksandar Kovačević ◽  
Jelena Slivka ◽  
Dragan Vidaković ◽  
Katarina-Glorija Grujić ◽  
Nikola Luburić ◽  
...  

<p>Code smells are structures in code that often have a negative impact on its quality. Manually detecting code smells is challenging, so researchers have proposed many automatic code smell detectors. Most studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluating the detectors on small-scale case studies and in inconsistent experimental settings. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Thus, researchers have recently started experimenting with machine learning (ML) based code smell detection. </p><p>This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for the detection of God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT).<br></p><p>We perform our experiments on the large-scale, manually labeled MLCQ dataset. We consider the binary classification problem: we classify code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as the measure of performance. In our experiments, the ML classifier trained using CuBERT source code embeddings achieved the best performance for both God Class (F-measure of 0.53) and Long Method detection (F-measure of 0.75). With the help of a domain expert, we perform an error analysis to discuss the advantages of the CuBERT approach.<br></p><p>To the best of our knowledge, this study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection. A secondary contribution of our study is the systematic evaluation of the effectiveness of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.<br></p>
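A minimal sketch of the evaluation setup described above, with synthetic vectors standing in for CuBERT embeddings of MLCQ code samples (the classifier choice and data here are illustrative, not the study's exact configuration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: rows are code-embedding vectors, labels 1 = smelly
# (minority class), 0 = non-smelly; the +1.0 shift gives the smelly class signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 64))
y = (rng.random(400) < 0.2).astype(int)
X[y == 1] += 1.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

# Report the F1-measure of the minority (smelly) class, as in the study
f1 = f1_score(y_te, clf.predict(X_te), pos_label=1)
print(f"minority-class F1: {f1:.2f}")
```

Scoring only the minority class avoids the inflated numbers an accuracy or macro average would give on such an imbalanced dataset.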


LITIGASI ◽  
2016 ◽  
Vol 15 (1) ◽  
Author(s):  
Tuti Rastuti

Leaders with integrity are prepared through upbringing within the family. States use various methods to determine their future leaders, whether through cadre regeneration, appointment, or election. An election is a means to select who will lead and hold the mandate of the people; it is a process, but the essence of choosing a leader must be judged by the quality of his or her character. Communities are often faced with the dilemma of what makes a good leader. The values of Islamic leadership in the family answer what a leader with high integrity is and how such a leader acts, for the family is the smallest unit, a miniature of the community. A leader with integrity is a leader who is committed to his or her family. Leadership of the ummah at the large scale is determined by leadership at the small scale, namely by the leader applying Islamic values within the family.

Keywords: Islamic Values; Family; Leadership; Legislative



Author(s):  
Niall Sharples

In this book I have attempted to create a new agenda for the study of Britain in the last millennium BC. The book consciously sets out, in its structure and content, to direct attention away from the nature of the archaeological record towards the nature of past human societies. This does not mean I am not interested in the archaeological record, and readers will have noted there is a considerable amount of detail in the text, perhaps too much for some people; but the data has to be examined in relation to the people who lived in a particular place at a particular time: ‘the archaeologist is digging up, not things, but people’ (Wheeler 1954b: v). The objective has been to outline the overall constraints of place and time (Chapter 2) and to see how these created a distinctive archaeological record that differed not only from other areas of Britain, but which varied significantly within the region. I examine how people created communities (Chapter 3) and explore how the mechanisms used to organize human relationships, within that society, changed through time. These changes were partly brought about through events outside their control, but always in a way that was affected by their own particular circumstances. I consider how the most ubiquitous architectural form in later prehistory, the house, was used to structure social relationships on a daily basis in relation to the family, and how this provided a template for thinking about the world (Chapter 4). The analysis concludes with an examination of how these societies considered individual freedom and connectedness, and how the complex variability of individual agency provides an internal dynamic to social change that was influenced by external events, but not led by them (Chapter 5). When I originally conceived of this book the structure was reversed: I started with the individual and worked up to the organization of the larger landscapes. 
At first sight this may sound like a more sensible way of presenting the evidence, moving from small-scale structures to large-scale processes, but during the writing of the book I found that it did not seem to work.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sachin Saini ◽  
Doordarshi Singh

Purpose – The purpose of this study is to identify the critical barriers to implementing Lean manufacturing practices in small and medium enterprises (SMEs) in the context of a developing economy. The advancement of SMEs is of utmost importance for a surge in exports while competing with other countries, and these barriers must be given due attention because they play a major role in stalling the overall development of SMEs.
Design/methodology/approach – In the present investigation, 26 barriers to Lean implementation in SMEs were identified through an extensive study of the available literature. The influential barriers were then investigated through the Analytic Hierarchy Process combined with the Technique for Order Preference by Similarity to Ideal Solution (AHP-TOPSIS), using priority weightages given by different industry experts. The ranking of the barriers produced by the AHP-TOPSIS method was validated by sensitivity analysis.
Findings – The investigation reveals that the successful implementation of Lean manufacturing practices depends heavily on the will of management, individual willpower, and the contribution of the people, apart from other barriers such as flexibility, the expertise of the people, resources, and the resistance people offer to new programs. Solutions for overcoming these barriers are also provided in this study, and a model has been suggested for the same.
Research limitations/implications – This work was devoted to evaluating and prioritizing the obstacles to introducing Lean practices, but it was limited to small- and medium-scale organizations located in Northern India. Further studies can expand the scope to large-scale units in the field; moreover, since this study was confined to the manufacturing sector, future studies can extend it to non-manufacturing environments such as the service sector and health care. This investigation was based on the judgments of industry experts and academicians; another approach, such as Viekriterijumsko kompromisno rangiranje (VIKOR), could be used for future investigations.
Originality/value – This study is significant when keeping in mind the contribution of SMEs to a country's economy, especially in the Indian context.
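A compact sketch of the TOPSIS ranking step, assuming criterion weights already derived by AHP; the barrier scores, weights, and criteria below are invented for illustration:

```python
import numpy as np

def topsis_rank(matrix, weights, benefit):
    """Closeness of each alternative (row) to the ideal solution.
    matrix: alternatives x criteria scores; weights: AHP-derived criterion
    weights; benefit: True where larger is better, False where smaller is."""
    M = np.asarray(matrix, dtype=float)
    # 1. Vector-normalize each criterion column, then apply the weights
    V = M / np.linalg.norm(M, axis=0) * np.asarray(weights)
    # 2. Ideal best/worst per criterion depend on its direction
    best = np.where(benefit, V.max(axis=0), V.min(axis=0))
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))
    # 3. Closeness coefficient: distance to worst / (to best + to worst)
    d_best = np.linalg.norm(V - best, axis=1)
    d_worst = np.linalg.norm(V - worst, axis=1)
    return d_worst / (d_best + d_worst)

# Hypothetical toy data: 3 barriers scored on 2 criteria (severity, frequency)
scores = [[7, 9], [4, 3], [8, 8]]
cc = topsis_rank(scores, weights=[0.6, 0.4], benefit=[True, True])
print(cc)  # higher closeness = higher-priority barrier
```

Sensitivity analysis, as in the study, would rerun this with perturbed weights and check whether the ranking of the barriers stays stable.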


2020 ◽  
Author(s):  
Cemal Erdem ◽  
Ethan M. Bensman ◽  
Arnab Mutsuddy ◽  
Michael M. Saint-Antoine ◽  
Mehdi Bouhaddou ◽  
...  

ABSTRACT
The current era of big biomedical data accumulation and availability brings opportunities to integrate data and leverage its totality to make new discoveries and/or clinically predictive models. Black-box statistical and machine learning methods are powerful for such integration, but often cannot provide mechanistic reasoning, particularly on the single-cell level. While single-cell mechanistic models clearly enable such reasoning, they are predominantly “small-scale” and struggle with the scalability and reusability required for meaningful data integration. Here, we present an open-source pipeline for scalable, single-cell mechanistic modeling from simple, annotated input files that can serve as a foundation for mechanistic data integration. As a test case, we convert one of the largest existing single-cell mechanistic models to this format, demonstrating the robustness and reproducibility of the approach. We show that the model's cell line context can be changed with simple replacement of input file parameter values. We next use this new model to test alternative mechanistic hypotheses for the experimental observations that interferon-gamma (IFNG) inhibits epidermal growth factor (EGF)-induced cell proliferation. Model-based analysis suggested, and experiments support, that these observations are better explained by IFNG-induced SOCS1 expression sequestering activated EGF receptors, thereby downregulating AKT activity, than by direct IFNG-induced upregulation of p21 expression. Overall, this new pipeline enables large-scale, single-cell, and mechanistically transparent modeling as a data integration modality complementary to machine learning.
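A toy sketch of the idea of switching cell-line context by selecting a parameter-value column from an annotated input table; the file layout, cell-line names, and parameter names are hypothetical, not the pipeline's actual format:

```python
import csv
import io

# Hypothetical annotated input file: one row per model parameter, one
# value column per cell-line context.
params_tsv = """parameter\tMCF10A\tHeLa
EGFR_total\t93000\t121000
SOCS1_k_on\t0.001\t0.0015
"""

def load_context(text, cell_line):
    """Return {parameter: value} for the chosen cell-line column."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {r["parameter"]: float(r[cell_line]) for r in rows}

# Changing context is just reading a different column; the model
# equations themselves stay untouched.
mcf10a = load_context(params_tsv, "MCF10A")
hela = load_context(params_tsv, "HeLa")
```

Keeping context in data rather than code is what makes such a pipeline reusable: a new cell line is a new column, not a new model.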


2019 ◽  
Vol 1 ◽  
pp. 1-2 ◽  
Author(s):  
Izabela Karsznia ◽  
Karolina Sielicka

<p><strong>Abstract.</strong> The decision about removing or maintaining an object while changing the level of detail requires taking into account many features of the object itself and its surroundings. Automatic generalization is the optimal way to obtain maps at various scales based on a single spatial database that stores up-to-date information with a high level of spatial accuracy. Researchers agree on the need for fully automating the generalization process (Stoter et al., 2016). Numerous research centres, cartographic agencies, as well as commercial companies have undertaken successful attempts at implementing certain generalization solutions (Stoter et al., 2009, 2014, 2016; Regnauld, 2015; Burghardt et al., 2008; Chaudhry and Mackaness, 2008). Nevertheless, an effective and consistent methodology for generalizing small-scale maps has not gained enough attention so far, as most of the conducted research has focused on the acquisition of large-scale maps (Stoter et al., 2016). The presented research aims to fill this gap by exploring new variables which are of key importance in the automatic settlement selection process at small scales. Addressing this issue is an essential step towards proposing new algorithms for effective and automatic settlement selection that will contribute to enriching the sparsely filled small-scale generalization toolbox.</p><p>The main idea behind this research is to use machine learning (ML) to explore new variables which can be important in automatic settlement generalization at small scales. To automate the generalization process, cartographic knowledge has to be collected and formalized. So far, a few approaches based on the use of ML have been proposed. One of the first attempts to determine generalization parameters with the use of ML was performed by Weibel et al. (1995); the learning material was the observation of cartographers' manual work.
Mustière also tried to identify the optimal sequence of generalization operators for roads using ML (1998). A different approach was presented by Sester (2000), whose goal was to extract cartographic knowledge from spatial data characteristics, especially from the attributes and geometric properties of objects and from the regularities and repetitive patterns that govern object selection, with the use of decision trees. Lagrange et al. (2000) and Balboa and López (2008) also used ML techniques, namely neural networks, to generalize line objects. Recently, Sester et al. (2018) proposed the application of deep learning for the task of building generalization. As noticed by Sester et al. (2018), these ideas, although interesting, remained proofs of concept only. Moreover, they concerned topographic databases and large-scale maps. Promising results for automatic settlement selection at small scales were reported by Karsznia and Weibel (2018), who used data enrichment and ML to improve the settlement selection process. Thanks to classification models based on decision trees, they explored new variables that are decisive in the settlement selection process. However, they also concluded that there is probably still more “deep knowledge” to be discovered, possibly linked to further variables that were not included in their research. The motivation for this research is thus to fill this gap and look for additional, essential variables governing settlement selection at small scales.</p>
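A hedged sketch of the decision-tree approach to settlement selection; the candidate variables and the labeling rule below are invented stand-ins (the actual studies learn from cartographers' decisions and enriched attribute data):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: one row per settlement with candidate
# variables (population, administrative rank, number of road connections);
# label 1 = retain at the smaller scale, 0 = omit.
rng = np.random.default_rng(1)
population = rng.integers(100, 100_000, 300)
admin_rank = rng.integers(0, 4, 300)
roads = rng.integers(1, 8, 300)
X = np.column_stack([population, admin_rank, roads])
# Toy rule standing in for cartographers' manual selections
y = ((population > 50_000) | (admin_rank >= 3)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# Feature importances hint at which variables drive the selection decision
print(dict(zip(["population", "admin_rank", "roads"], tree.feature_importances_)))
```

Inspecting the learned splits and importances is how such models surface previously unformalized "deep knowledge" about which variables govern selection.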


Author(s):  
FC Ezeh ◽  
PA Ogwo

Desertification is a major challenge for the people of the Sudano-Sahelian region. Its effects on them include hunger arising from degraded soil, the absence of potable water, and general poverty. Incidentally, human activity has been implicated as a major causative factor of desertification. Therefore, this study aimed to investigate whether the bearers of the burden of desertification are also the major causes of the problem. Zamfara state was randomly picked from among the eleven frontline states that fall within the Sudano-Sahelian region. Applying Taro Yamane's formula, 500 farmers comprising 50 large-scale farmers and 450 small-scale farmers were selected and interviewed using a structured questionnaire. Data were computed using SPSS version 20, while correlation and regression analyses were applied to test the hypothesis of a significant relationship between desertification and the perception of the people. The results indicated no significant relationship (p>0.054) between desertification and the perception of the people. In conclusion, although the people are aware that their activities contribute to desertification, they lack the resources to fight it; hence they continue with their way of life, thereby aggravating an already bad situation.
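Yamane's sample-size formula referenced above is n = N / (1 + N·e²), where N is the population size and e the margin of error. A worked example with illustrative figures, since the abstract does not report the population size or margin of error used:

```python
# Taro Yamane's sample-size formula: n = N / (1 + N * e**2)
def yamane(N, e=0.05):
    """Sample size for population N at margin of error e."""
    return N / (1 + N * e ** 2)

# e.g. a hypothetical farming population of 100,000 at a 5% margin of error
n = yamane(100_000)
print(round(n))  # about 398 respondents
```

Note that as N grows, n approaches 1/e² (400 at e = 0.05), which is why large-population surveys at a 5% margin settle on a few hundred respondents.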

