The Feature Selection Problem in Computer–Assisted Cytology

2018 ◽  
Vol 28 (4) ◽  
pp. 759-770 ◽  
Author(s):  
Marek Kowal ◽  
Marcin Skobel ◽  
Norbert Nowicki

Abstract: Modern cancer diagnostics is based heavily on cytological examinations. Unfortunately, visual inspection of cytological preparations under the microscope is a tedious and time-consuming process. Moreover, intra- and inter-observer variations in cytological diagnosis are substantial. Cytological diagnostics can be facilitated and objectified by using automatic image analysis and machine learning methods. Computerized systems usually preprocess cytological images, segment and detect nuclei, extract and select features, and finally classify the sample. Although many computerized methods and systems have already been proposed for cytology, they are still not routinely used because their accuracy needs to be improved. This contribution focuses on computerized breast cancer classification. The task is to classify cellular samples obtained by fine-needle biopsy as either benign or malignant. For this purpose, we compare 5 methods of nuclei segmentation and detection, 4 methods of feature selection and 4 methods of classification. Nuclei detection and segmentation methods are compared with respect to recall and the F1 score based on the Jaccard index. Feature selection and classification methods are compared with respect to classification accuracy. Nevertheless, the main contribution of our study is to determine which features of nuclei reliably indicate the type of cancer. We also check whether the quality of nuclei segmentation/detection significantly affects the accuracy of cancer classification. On the test set, the average accuracy of cancer classification is around 76%. Spearman’s correlation and the chi-square test identify significantly better features than forward feature selection.
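As a minimal sketch of correlation-based feature ranking, assuming numeric nuclear features and a benign/malignant label coded 0/1, each feature can be scored by the absolute value of Spearman's rho with the label, that is, the Pearson correlation of the tie-averaged ranks. The function names and toy values are hypothetical and do not reproduce the authors' pipeline.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_features(columns: List[Sequence[float]],
                  labels: Sequence[int]) -> List[Tuple[int, float]]:
    """Sort (feature index, rho) pairs by |rho| with the class label, strongest first."""
    scores = [(j, spearman(col, labels)) for j, col in enumerate(columns)]
    return sorted(scores, key=lambda t: abs(t[1]), reverse=True)
```

For example, with a hypothetical nucleus-area column that grows with malignancy and a texture column that does not, `rank_features([area, texture], labels)` lists the area column first; keeping the first k indices gives the selected subset.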

2021 ◽  
Vol 13 (5) ◽  
pp. 2563
Author(s):  
Małgorzata Ćwiek ◽  
Katarzyna Maj-Waśniowska ◽  
Katarzyna Stabryła-Chudzio

This article examines how local self-government units assess the significance of poverty as a social challenge, and how the assessment of its incidence differs depending on the type of municipality. The authors also analyse the relationships between population ageing and the municipalities' assessment of the extent of poverty. This problem has not previously been analysed in depth, and this article aims to fill that research gap. The empirical part is based on the results of our own research, conducted using the Computer-Assisted Web Interview (CAWI) method on a sample of 144 municipalities of the Małopolskie Voivodship (Poland). To verify whether the qualitative variables studied are related, the chi-square test of independence was used. To determine the relationships between the categories of variables characterising the scale of the incidence of poverty and the remaining variables, a correspondence analysis was conducted. The research shows that municipalities regard poverty as one of the most important social problems. The degree of population ageing also affects how municipalities assess poverty among the elderly.
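A minimal sketch of the chi-square test of independence, assuming the survey answers have already been cross-tabulated into an r × c table of counts (for example, municipality type by assessed incidence of poverty; the layout is hypothetical). Each expected count under independence is row total × column total / N, the statistic sums (O − E)²/E over all cells, and it has (r − 1)(c − 1) degrees of freedom. The p-value helper uses the series for the regularized lower incomplete gamma function and is only meant for moderate statistics; in practice a library routine such as `scipy.stats.chi2_contingency` is the usual choice (it applies Yates' continuity correction to 2 × 2 tables by default, so its statistic differs from this uncorrected one).

```python
from math import exp, lgamma, log
from typing import Sequence, Tuple


def chi_square_independence(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is positive (no empty categories).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail p-value P(X > x) for X ~ chi-square(dof).

    Series for the regularized lower incomplete gamma P(a, z), a = dof/2, z = x/2.
    Suitable for moderate x only: p-values below about 1e-15 round to 0, and
    very large x (z above roughly 700) overflows.
    """
    if x <= 0:
        return 1.0
    a, z = dof / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * exp(a * log(z) - z - lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))
```

For a 2 × 2 table the statistic has 1 degree of freedom, and independence is rejected at α = 0.05 when it exceeds 3.841.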


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Hongyan Zhang ◽  
Lanzhi Li ◽  
Chao Luo ◽  
Congwei Sun ◽  
Yuan Chen ◽  
...  

In efforts to discover disease mechanisms and improve clinical diagnosis of tumors, it is useful to mine expression profiles for informative genes with definite biological meanings and to build robust classifiers with high precision. In this study, we developed a new method for tumor-gene selection, the Chi-square test-based integrated rank gene and direct classifier (χ2-IRG-DC). First, we obtained the weighted integrated rank of gene importance from chi-square tests of single and pairwise gene interactions. Then, we sequentially introduced the ranked genes and removed redundant genes by using leave-one-out cross-validation of the chi-square test-based Direct Classifier (χ2-DC) within the training set to obtain informative genes. Finally, we determined the accuracy on independent test data by using the genes obtained above with χ2-DC. Furthermore, we analyzed the robustness of χ2-IRG-DC by comparing the generalization performance of different models, the efficiency of different feature-selection methods, and the accuracy of different classifiers. An independent test on ten multiclass tumor gene-expression datasets showed that χ2-IRG-DC could efficiently control overfitting and had higher generalization performance. The informative genes selected by χ2-IRG-DC could dramatically improve the independent test precision of other classifiers; meanwhile, the informative genes selected by other feature selection methods also performed well in χ2-DC.
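The sequential inclusion step can be sketched under loud assumptions: genes arrive already ranked, and a nearest-centroid rule stands in for the χ2-DC classifier, whose decision rule the abstract does not spell out. A gene is kept only if adding it raises leave-one-out cross-validation (LOOCV) accuracy on the training set, so genes that add nothing are dropped. All function names are hypothetical.

```python
from typing import List, Sequence


def nearest_centroid_predict(train_x: List[Sequence[float]], train_y: List[int],
                             x: Sequence[float], feats: List[int]) -> int:
    """Hypothetical stand-in classifier: the class whose mean on `feats` is nearest to x."""
    best_label, best_dist = train_y[0], float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        dist = 0.0
        for f in feats:
            centroid = sum(r[f] for r in rows) / len(rows)
            dist += (x[f] - centroid) ** 2
        if dist < best_dist:  # strict comparison: on ties the smaller label wins
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(xs: List[Sequence[float]], ys: List[int], feats: List[int]) -> float:
    """Hold out each sample once and predict it from the remaining ones.

    Assumes every class has at least two samples, so no class vanishes from training.
    """
    hits = 0
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, xs[i], feats) == ys[i]
    return hits / len(xs)


def forward_inclusion(xs: List[Sequence[float]], ys: List[int],
                      ranked_feats: List[int]) -> List[int]:
    """Add features in rank order; keep one only if it raises LOOCV accuracy."""
    selected: List[int] = []
    best = 0.0
    for f in ranked_feats:
        acc = loocv_accuracy(xs, ys, selected + [f])
        if acc > best:
            selected, best = selected + [f], acc
    return selected
```

Because the criterion is computed only within the training set, the independent test data stay untouched until the final evaluation, which matches the order of steps the abstract describes.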


2020 ◽  
Vol 12 (15) ◽  
pp. 6035
Author(s):  
Maksymilian Czeczotko ◽  
Hanna Górska-Warsewicz ◽  
Wacław Laskowski

Consumer behavior towards private labels (PLs) is constantly changing, accompanied by a shift from generic products offered at very low prices towards sustainable PLs. Our study aimed to analyze the behavior of British and Polish consumers towards the PLs of retail chains. To achieve this, special attention was given to the following issues: frequency of purchasing PLs by food category, motives for purchasing PL products, opinions on the current development of PLs, and the length of time consumers have been purchasing PL products. We also presented the socioeconomic features of Polish and British consumers purchasing PL products using a correspondence graph. The research was conducted on a sample of 500 adults from Poland and 500 adults from the UK using the Computer-Assisted Web Interviewing (CAWI) method. The questionnaire was addressed only to adults who declared that they purchase PL products. For a detailed analysis of consumer choices and service quality assessment, we used Pearson’s chi-square test, as well as Kohonen’s neural network and multi-dimensional cluster analysis. We divided the sample into 4 clusters based on 6 household characteristics: education level, income, place of residence, age, gender, and period of buying PL products. Our study indicates that Polish consumers are more likely to pay attention to the lower prices of PLs, while British consumers point to their quality compared with manufacturers’ brands. In the opinion of Polish consumers, an improvement in quality is only just beginning. This means that PLs available on the British market are at a more advanced stage of development towards sustainable PLs.
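For readers unfamiliar with Kohonen's network, a minimal one-dimensional self-organizing map (SOM) is sketched below, assuming respondents' characteristics have already been encoded as numeric vectors (a hypothetical preprocessing step; the paper's encoding, map size and training schedule are not given here). Each input pulls its best-matching unit (BMU) towards it, and neighbouring units on the map are pulled too, by a Gaussian factor whose width and learning rate shrink over training. Afterwards, each respondent is assigned to the cluster of its BMU.

```python
import math
import random
from typing import List, Sequence


def best_matching_unit(weights: List[List[float]], x: Sequence[float]) -> int:
    """Index of the unit whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(data: List[Sequence[float]], n_units: int, epochs: int = 50,
              lr0: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Train a 1-D Kohonen map with a Gaussian neighbourhood (illustrative settings)."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    # Linear initialisation: spread units evenly between per-coordinate min and max.
    weights = [[lo[d] + (hi[d] - lo[d]) * u / max(n_units - 1, 1) for d in range(dim)]
               for u in range(n_units)]
    radius0 = max(n_units / 2, 1.0)
    order = list(range(len(data)))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks, floored at 0.5
        rng.shuffle(order)                       # present inputs in a new order each epoch
        for i in order:
            x = data[i]
            bmu = best_matching_unit(weights, x)
            for u in range(n_units):
                # Units near the BMU on the map grid also move towards x, but less.
                h = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    weights[u][d] += lr * h * (x[d] - weights[u][d])
    return weights
```

After training, `best_matching_unit(weights, respondent)` gives the cluster index; the subsequent multi-dimensional cluster analysis the abstract mentions would then profile these groups.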


2019 ◽  
Vol 74 (4) ◽  
pp. 861-871 ◽  
Author(s):  
Diana Dryglas ◽  
Adrian Lubowiecki-Vikuk

Purpose The purpose of this paper is to identify Poland’s image as a medical tourism destination (MTD). Design/methodology/approach Survey data were collected from 282 German and British medical tourists using a self-administered questionnaire. The Computer-Assisted Web Interviewing method was used to conduct the survey. The responses were then analysed using advanced statistical tools (McNemar’s exact test, Cochran’s Q test and the chi-square test). Findings Before visiting Poland, the respondents perceived the country through the prism of medical attributes, whereas after the visit, they perceived it through the prism of non-medical attributes. Research limitations/implications Identification of a set of MTD image characteristics has important implications for scholars, allowing them to understand the attributes that shape projected and perceived MTD image. Such a construct can also be a useful tool for marketing planners, destination managers and marketers to create an effective marketing policy and projected image of MTDs based on these features. Originality/value The study fills an important gap regarding the lack of conceptual and empirical content allowing for exploration of MTD image.
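McNemar's exact test compares paired yes/no answers, such as whether a respondent associated an attribute with Poland before and after the visit (a hypothetical framing of the variables). Only discordant pairs matter: with b respondents answering yes only before and c answering yes only after, min(b, c) follows Binomial(b + c, 1/2) under the null hypothesis of no change. Cochran's Q extends this to k related binary variables, Q = (k − 1)[k Σ C_j² − N²] / (k N − Σ R_i²), with column totals C_j, row totals R_i and grand total N, and is referred to a chi-square distribution with k − 1 degrees of freedom. A minimal sketch:

```python
from math import comb
from typing import Sequence


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of change
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k), X ~ Bin(n, 1/2)
    return min(1.0, 2 * tail)


def cochran_q(rows: Sequence[Sequence[int]]) -> float:
    """Cochran's Q for k related 0/1 outcomes; each row is one subject.

    Undefined (division by zero) if every subject answers all 0s or all 1s.
    """
    k = len(rows[0])
    col = [sum(r[j] for r in rows) for j in range(k)]
    row = [sum(r) for r in rows]
    n = sum(row)
    denom = k * n - sum(r * r for r in row)
    return (k - 1) * (k * sum(c * c for c in col) - n * n) / denom
```

With k = 2, Cochran's Q reduces to the asymptotic McNemar statistic (b − c)² / (b + c), which is a quick consistency check between the two functions.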


Energies ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 1597
Author(s):  
Anna Justyna Parzonko ◽  
Agata Balińska ◽  
Anna Sieczko

The research reported here aims to investigate the pro-environmental behavior of respondents in the context of the concept of homo socio-oeconomicus. The main research question addressed in this paper concerns the pro-environmental behavior of Generation Z representatives, because this age group is believed to display different behavior patterns. To identify the differences in the pro-environmental behaviors of Generation Z, the results obtained from this group were compared with the declarations of respondents from an older group (aged 25 to 65). Research on the pro-environmental behavior of households in Poland conducted so far has not treated Generation Z as a separate demographic, so this study aims to help fill this research gap. The data on the surveyed population were obtained through a standardized research questionnaire. The survey was carried out using an internet survey technique, the computer-assisted web interview (CAWI). This paper uses descriptive, tabular and graphic methods to analyze and present the collected material. The analysis of the dataset used basic descriptive statistics (mean, median, mode) together with Pearson’s chi-square test and the Mann–Whitney U test. The conducted research has shown that representatives of Generation Z are less engaged in pro-environmental behavior than people from the older age group. Their pro-environmental actions mainly included turning off lights when leaving a room and choosing public transportation as the basic means of transport. For the whole surveyed sample, the most highly rated pro-environmental behaviors were those imposed by legal regulations and those that bring financial benefits in the form of lower maintenance costs. The main motivating and demotivating factors determining pro-environmental behavior were predominantly economic in nature.
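The Mann–Whitney U test compares two independent groups (here, Generation Z versus the older group) without assuming normality. A minimal sketch, assuming one ordinal or numeric score per respondent: U counts, over all cross-group pairs, how often the first group's value is larger, with ties counted as one half. For large samples U is approximately normal with mean n1·n2/2 and variance n1·n2·(n1 + n2 + 1)/12. This sketch omits the tie correction to the variance, which matters for Likert-type data with many ties, so a library routine such as `scipy.stats.mannwhitneyu` is preferable for real analyses.

```python
from math import sqrt
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, z).

    U_x counts pairs (a from x, b from y) with a > b, ties counting 1/2.
    z is the normal approximation, WITHOUT the tie correction to the variance.
    """
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u_x, (u_x - mean) / sd
```

The pairwise count is O(n1·n2), which is fine for survey-sized samples; the equivalent rank-sum formula, U_x = R_x − n1(n1 + 1)/2, scales better. Note also that U_x + U_y = n1·n2.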


A Long Short-Term Memory (LSTM) deep learning network is used to classify differentially expressed genes that cause certain abnormalities in the human body. The LSTM is combined with the K-Nearest Neighbour (KNN) algorithm to improve classification precision. The feature selection process plays a vital role, as some existing algorithms tend to neglect the relevant features. The classification in turn supports an enhanced prediction method. The K-Nearest Neighbour method is used to filter values by their degree of correlation with the target value. This hybrid algorithm has a clear advantage over existing methods. Feature selection combines Principal Component Analysis (PCA) with the chi-square test. This hybrid approach yields a good feature subset, which eases the subsequent classification and prediction steps. The eigenvalues and eigenvectors are computed to identify the principal components. The chi-square test is used to compute feature scores; the features are ranked by these scores, and the highest-scoring ones are used for training. The algorithms employed in this work have a clear advantage over Bayesian networks, since Bayesian networks are prone to errors within the layers that may cause the values to explode or vanish. The classification and prediction accuracy achieved is higher than that of existing methods.
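The step of computing eigenvalues and eigenvectors to identify principal components can be illustrated with a minimal sketch, assuming a small numeric feature matrix. Power iteration stands in here for a full eigensolver such as `numpy.linalg.eigh`: it finds only the leading principal component, the eigenvector of the sample covariance matrix with the largest eigenvalue. How the work combines the components with the chi-square scores is not specified, so that step is omitted.

```python
from typing import List, Sequence, Tuple


def covariance_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance (divisor n - 1) of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenpair(m: List[List[float]], iters: int = 500) -> Tuple[float, List[float]]:
    """Power iteration: multiply by m and renormalise until the direction settles.

    Assumes the start vector (all ones) is not orthogonal to the leading
    eigenvector and that m is not the zero matrix.
    """
    p = len(m)
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # The Rayleigh quotient v^T m v is the eigenvalue for the unit vector v.
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v
```

Further components can be obtained by deflation: subtract λvvᵀ from the covariance matrix and run the iteration again.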


2020 ◽  
Vol 15 ◽  
Author(s):  
Chaokun Yan ◽  
Jingjing Ma ◽  
Bin Wu ◽  
Jianlin Wang ◽  
Ge Zhang ◽  
...  

Aims: Microarray data is widely used in disease analysis and diagnosis. However, these data can contain thousands of genes but only a small number of samples, and some existing models cannot capture the patterns in these datasets accurately without a feature selection method. Background: Feature selection is an important stage in data preprocessing. Given the limitations of employing filter or wrapper approaches individually for feature selection, it is promising to combine filter and wrapper into a hybrid algorithm that exploits their respective advantages to search for optimal feature subsets. Objective: The primary objective of this study is to design a good feature selection strategy for high-dimensional biomedical datasets. Method: A novel hybrid filter-wrapper approach is proposed for high-dimensional datasets. First, the Chi-square Test is used to filter out most of the irrelevant or redundant features. Next, an improved binary Fruit Fly Optimization algorithm is used to further search for an optimal feature subset without degrading the classification accuracy. The KNN classifier with 10-fold cross-validation (CV) is used to evaluate classification accuracy. Result: Experimental results show that the proposed method (CS-IFOA) uses fewer features while achieving higher classification accuracy. Furthermore, the standard deviation of the results is relatively small, indicating that the repeated 10-fold CV is reliable and the proposed algorithm is relatively robust. Conclusion: The proposed strategy can be used as an ideal pre-processing tool to help optimize the feature selection process for high-dimensional biomedical datasets, which further indicates that integrating a filter method into a wrapper model can enhance the performance of feature subset selection. Other: In future work, the proposed strategy will be applied to many other biological datasets, and other classifiers can also be combined with this strategy to verify and extend this approach.
The findings of our study could provide a basis for further research on hybrid feature selection approaches.
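The wrapper stage can be sketched under stated assumptions: the chi-square filter has already reduced the data to `n_features` candidate genes, and a simplified binary swarm search stands in for the improved binary Fruit Fly Optimization algorithm, whose update rules the abstract does not give. In each iteration, several "flies" flip one random bit of the swarm's current feature mask (a search around the swarm location), and the swarm moves to the best fly only if it improves the 10-fold CV accuracy of a KNN classifier, with fewer features breaking ties. The function names and the fold scheme (every tenth sample) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def knn_predict(train_x: List[Sequence[float]], train_y: List[int],
                x: Sequence[float], feats: List[int], k: int = 3) -> int:
    """Majority vote among the k training rows nearest to x on the chosen features."""
    nearest = sorted(
        (sum((r[f] - x[f]) ** 2 for f in feats), y) for r, y in zip(train_x, train_y)
    )[:k]
    votes = [y for _, y in nearest]
    return max(set(votes), key=votes.count)


def cv_accuracy(xs: List[Sequence[float]], ys: List[int], feats: List[int],
                folds: int = 10, k: int = 3) -> float:
    """k-NN accuracy under `folds`-fold CV; fold i holds samples i, i+folds, ..."""
    if not feats:
        return 0.0
    hits = 0
    for i in range(folds):
        test = set(range(i, len(xs), folds))
        tr_x = [x for j, x in enumerate(xs) if j not in test]
        tr_y = [y for j, y in enumerate(ys) if j not in test]
        hits += sum(knn_predict(tr_x, tr_y, xs[j], feats, k) == ys[j] for j in test)
    return hits / len(xs)


def wrapper_search(xs: List[Sequence[float]], ys: List[int], n_features: int,
                   n_flies: int = 10, iters: int = 30,
                   seed: int = 0) -> Tuple[List[int], float]:
    """Hypothetical binary swarm search over feature masks."""
    rng = random.Random(seed)

    def evaluate(mask: List[bool]) -> Tuple[float, int]:
        feats = [f for f in range(n_features) if mask[f]]
        # Objective: CV accuracy first, then fewer features as the tie-breaker.
        return (cv_accuracy(xs, ys, feats), -len(feats))

    swarm = [True] * n_features  # start from the full (already filtered) set
    swarm_key = evaluate(swarm)
    for _ in range(iters):
        flies = []
        for _ in range(n_flies):
            fly = swarm[:]
            bit = rng.randrange(n_features)
            fly[bit] = not fly[bit]  # random step around the swarm location
            flies.append((evaluate(fly), fly))
        best_key, best_fly = max(flies, key=lambda t: t[0])
        if best_key > swarm_key:  # move the swarm only on improvement
            swarm, swarm_key = best_fly, best_key
    return [f for f in range(n_features) if swarm[f]], swarm_key[0]
```

Running the filter first matters for cost: every candidate mask triggers a full cross-validation, so shrinking thousands of genes to a few hundred before the search is what makes the wrapper affordable.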


Author(s):  
Hadeel N. Alshaer ◽  
Mohammed A. Otair ◽  
Laith Abualigah

Feature selection is one of the most important problems in the text and data mining domain. This paper presents a comparative study of feature selection methods for Arabic text classification. Five feature selection methods were selected: improved chi-square (ICHI), chi-square (CHI), Information Gain, Mutual Information and Wrapper. They were tested with five classification algorithms: Bayes Net, Naive Bayes, Random Forest, Decision Tree and Artificial Neural Networks. The methods were evaluated on an Arabic data collection of 9055 documents and compared using four criteria: Precision, Recall, F-measure and time to build the model. The results showed that the improved ICHI feature selection achieved the best results in almost all cases compared with the other methods.
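For reference, the standard CHI score of a term t for a category c is computed from a 2 × 2 table of document counts: χ²(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)), where A counts documents in c containing t, B documents outside c containing t, C documents in c without t, and D the remaining documents. A minimal sketch, assuming documents are already tokenized into sets of terms; the toy tokens are hypothetical placeholders for Arabic stems, and the ICHI variant is not reproduced because the abstract does not define it.

```python
from typing import List, Set, Tuple


def chi_term_category(a: int, b: int, c: int, d: int) -> float:
    """CHI score of one term for one category from its 2x2 document counts.

    a: docs in the category containing the term
    b: docs outside the category containing the term
    c: docs in the category without the term
    d: docs outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def rank_terms(docs: List[Set[str]], labels: List[str],
               category: str) -> List[Tuple[str, float]]:
    """Score every vocabulary term against one category; highest first, ties by term."""
    vocab = set().union(*docs)
    in_cat = [lab == category for lab in labels]
    n_cat = sum(in_cat)
    scores = []
    for term in vocab:
        a = sum(1 for doc, ic in zip(docs, in_cat) if ic and term in doc)
        b = sum(1 for doc, ic in zip(docs, in_cat) if not ic and term in doc)
        c = n_cat - a
        d = len(docs) - n_cat - b
        scores.append((term, chi_term_category(a, b, c, d)))
    return sorted(scores, key=lambda t: (-t[1], t[0]))
```

Because the numerator squares AD − CB, a term that is consistently absent from a category (such as "market" for "sport" in the test data) scores as high as one that is consistently present; keeping the top-scoring terms per category yields the reduced vocabulary passed to the classifiers.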

