Clustering of the Multi-Value Documents based on Probabilistic Features Association Mechanism

Data Mining and Homeland Security

Encyclopedia of Digital Government, 2011, pp. 277-282
DOI: 10.4018/978-1-59140-789-8.ch042
Cited By ~ 2
Author(s): J. W. Seifert
Keyword(s): Data Mining, Data Analysis, Homeland Security, Predictive Analytics, New Technology, Analytical Techniques, Data Sources, High Expectations, Multiple Data, Factual Data

Considerable attention is focused on how to better collect, analyze, and disseminate information. In this effort, technology is increasingly regarded as both a tool and, in some cases, a substitute for human resources. One such technology playing a prominent role in homeland security initiatives is data mining. As with the concept of homeland security itself, data mining is widely mentioned in a growing number of bills, laws, reports, and other policy documents, yet an agreed-upon definition or conceptualization of it appears to be generally lacking within the policy community (Relyea, 2002). While data mining initiatives are usually purported to provide insightful, carefully constructed analysis, data mining itself is variously described as a technology, a process, and/or a productivity tool. In other words, data mining (also sometimes called factual data analysis or predictive analytics) means different things to different people. Regardless of which definition one prefers, a common theme is the ability to collect and combine, virtually if not physically, multiple data sources for the purpose of analyzing the actions of individuals. This reflects an implicit belief in the power of information and suggests a continuing trend in the growth of “dataveillance,” the monitoring and collection of the data trails left by a person’s activities (Clarke, 1988). More importantly, expectations are clearly high that data mining, or factual data analysis, will be an effective tool. Data mining is not a new technology, but its use is growing significantly in both the private and public sectors. Industries such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs, enhance research, and increase sales.
In the public sector, data mining applications were initially used to detect fraud and waste, but they are now also used for purposes such as measuring and improving program performance. While not completely without controversy, these types of applications have gained greater acceptance. However, some national defense and homeland security data mining applications represent a significant expansion in the quantity and scope of data to be analyzed. Moreover, because of their security-related nature, the details of these initiatives (e.g., data sources, analytical techniques, and access and retention practices) are usually less transparent.

Mini-Batch Normalized Mutual Information: A Hybrid Feature Selection Method

IEEE Access, 2019, Vol 7, pp. 116875-116885
DOI: 10.1109/access.2019.2936346
Cited By ~ 4
Author(s): G. S. Thejas, Sajal Raj Joshi, S. S. Iyengar, N. R. Sunitha, Prajwal Badrinath
Keyword(s): Feature Selection, Mutual Information, Feature Selection Method, Selection Method, Normalized Mutual Information

Normalized mutual information feature selection for electroencephalogram data based on Grassberger entropy estimator

2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2017
DOI: 10.1109/bibm.2017.8217730
Author(s): Xiaowei Zhang, Yuan Yao, Manman Wang, Jian Shen, Lei Feng, ...
Keyword(s): Feature Selection, Mutual Information, Normalized Mutual Information, Entropy Estimator, Selection For

Optimized automatic sleep stage classification using the normalized mutual information feature selection (NMIFS) method

2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2017
DOI: 10.1109/embc.2017.8037511
Cited By ~ 2
Author(s): Dongrae Cho, Boreom Lee
Keyword(s): Feature Selection, Mutual Information, Sleep Stage, Normalized Mutual Information, Stage Classification, Sleep Stage Classification

A Novel Feature Selection Algorithm with Supervised Mutual Information for Classification

International Journal of Artificial Intelligence Tools, 2013, Vol 22 (04), pp. 1350027
DOI: 10.1142/s0218213013500279
Author(s): Jaganathan Palanichamy, Kuppuchamy Ramasamy
Keyword(s): Machine Learning, Data Mining, Feature Selection, Mutual Information, Selection Algorithm, Feature Selection Algorithm, Class A, Selection Algorithms, The Relationship, Class Variable

Feature selection is essential in data mining and pattern recognition, especially for database classification. In recent years, several feature selection algorithms have been proposed to measure the relevance of individual features to each class. A suitable feature selection algorithm normally maximizes the relevancy and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features over the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on the maximum-relevance, minimum-redundancy criterion. Mutual information is used to measure the relevancy of each feature to the class variable, and redundancy is calculated from the relationship between candidate features, selected features, and class variables. The effectiveness of the algorithm is tested on ten benchmark datasets available in the UCI Machine Learning Repository. The experimental results show better performance than some existing algorithms.
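
The maximum-relevance, minimum-redundancy idea described in this abstract can be illustrated with a minimal sketch. This is the generic greedy formulation (relevance to the class minus mean redundancy with already selected features), not the paper's own algorithm, whose redundancy term is not given here. The function names `mutual_information` and `mrmr_select` and the toy data are assumptions made for illustration, and the estimator assumes discrete feature values.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) / (p(x) p(y)) simplifies to count_xy * n / (count_x * count_y).
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick up to k feature names by relevance minus mean redundancy."""
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []
    while len(selected) < min(k, len(features)):

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # Ties are broken by dictionary order (first candidate wins).
        best = max((f for f in features if f not in selected), key=score)
        selected.append(best)
    return selected


# Hypothetical toy data: the class is determined jointly by "a" and "b";
# "a_copy" duplicates "a" and is therefore fully redundant with it.
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "a_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [0, 1, 2, 3, 0, 1, 2, 3]

print(mrmr_select(features, labels, 2))  # prints ['a', 'b']
```

All three toy features are equally relevant to the class on their own, yet the duplicate is skipped in the second round because its redundancy with the already selected feature cancels its relevance.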

Improved Feature Selection Based on Normalized Mutual Information

2015 14th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), 2015
DOI: 10.1109/dcabes.2015.135
Cited By ~ 3
Author(s): Li Yin, Ma Xingfei, Yang Mengxi, Zhao Wei, Gu Wenqiang
Keyword(s): Feature Selection, Mutual Information, Normalized Mutual Information

EEG feature selection based on weighted-normalized mutual information for mental fatigue classification

2016 IEEE International Instrumentation and Measurement Technology Conference Proceedings, 2016
DOI: 10.1109/i2mtc.2016.7520423
Cited By ~ 3
Author(s): Pengbo Zhang, Xue Wang, Xuanping Li, Peng Dai
Keyword(s): Feature Selection, Mutual Information, Mental Fatigue, Normalized Mutual Information

Feature selection based on weighted conditional mutual information

Applied Computing and Informatics, 2020, Vol ahead-of-print (ahead-of-print)
DOI: 10.1016/j.aci.2019.12.003
Author(s): Hongfang Zhou, Xiqian Wang, Yao Zhang
Keyword(s): Data Mining, Feature Selection, Standard Deviation, Mutual Information, Classification Accuracy, Feature Selection Method, Selection Method, Conditional Mutual Information, The Core, Essential Step

Feature selection is an essential step in data mining. Its core is to analyze and quantify the relevancy and redundancy between the features and the classes. Work on the CFR feature selection method rarely considers which feature to choose when two or more features receive the same value under the evaluation criterion. To address this problem, the standard deviation is employed to adjust the relative importance of relevancy and redundancy. Based on this idea, a novel feature selection method named Feature Selection Based on Weighted Conditional Mutual Information (WCFR) is introduced. Experimental results on ten datasets show that the proposed method achieves higher classification accuracy.
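
The quantity this method builds on, conditional mutual information, can be sketched for discrete data as follows. This is only an empirical estimator of I(X; Y | Z); it is not the WCFR criterion, and the standard-deviation weighting mentioned in the abstract is not reproduced because its formula is not given here. The function name `conditional_mutual_information` and the XOR toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X; Y | Z) in bits for three discrete sequences."""
    n = len(xs)
    pz = Counter(zs)
    pxz = Counter(zip(xs, zs))
    pyz = Counter(zip(ys, zs))
    pxyz = Counter(zip(xs, ys, zs))
    # p(z) p(x,y,z) / (p(x,z) p(y,z)) simplifies to a ratio of raw counts.
    return sum(
        (c / n) * log2(c * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
        for (x, y, z), c in pxyz.items()
    )


# Hypothetical toy data: the class y is the XOR of feature x and feature z.
x = [0, 0, 1, 1]
z = [0, 1, 0, 1]
y = [0, 1, 1, 0]

print(conditional_mutual_information(x, y, z))  # prints 1.0
```

In this XOR example the candidate feature x carries no information about the class y on its own, but a full bit once z is known, which is why criteria of this family condition on already selected features.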
