Evidence Collection Agent Model Design for Big Data Forensic Analysis

Author(s): Zhihao Yuan, Hao Li, Xian Li


2018, Vol 18 (03), pp. e23
Author(s): María José Basgall, Waldo Hasperué, Marcelo Naiouf, Alberto Fernández, Francisco Herrera

The volume of data in today's applications has changed the way Machine Learning problems are addressed. The Big Data scenario imposes scalability requirements that can only be met through intelligent model design and the use of distributed technologies. In this context, solutions based on the Spark platform have established themselves as a de facto standard. In this contribution, we focus on a very important problem within Big Data Analytics, namely classification with imbalanced datasets. The main characteristic of this problem is that one of the classes is underrepresented, which usually makes it harder to find a model that identifies that class correctly. For this reason, it is common to apply preprocessing techniques such as oversampling to balance the distribution of examples across classes. In this work we present SMOTE-BD, a fully scalable preprocessing approach for imbalanced classification in Big Data. It is based on one of the most widespread preprocessing solutions for imbalanced classification, the SMOTE algorithm, which creates new synthetic instances according to the neighborhood of each example of the minority class. Our novel implementation is designed to be independent of the number of partitions or processes created, in order to achieve a higher degree of efficiency. Experiments conducted on different standard and Big Data datasets show the quality of the proposed design and implementation.
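As a rough, single-machine illustration of the neighborhood-based interpolation that SMOTE performs, the Python sketch below generates synthetic minority-class samples. It is not the authors' SMOTE-BD implementation, which targets Spark and is designed to be independent of partitioning. The helper names `smote` and `k_nearest` are hypothetical, distances are assumed to be Euclidean, and picking base samples at random (rather than iterating over every minority example) is an assumption borrowed from common SMOTE variants.

```python
import math
import random


def k_nearest(points, idx, k):
    """Return indices of the k nearest neighbors of points[idx] (Euclidean), excluding itself."""
    base = points[idx]
    dists = [(math.dist(base, p), j) for j, p in enumerate(points) if j != idx]
    dists.sort()  # ascending distance; ties broken by index
    return [j for _, j in dists[:k]]


def smote(minority, n_new, k=5, seed=None):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbors.
    Random base selection is an assumption of this sketch, not SMOTE-BD.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # All-pairs neighbor search: O(n^2), fine for a sketch but not for Big Data.
    neighbors = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbors[i])
        gap = rng.random()  # uniform in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


if __name__ == "__main__":
    minority = [(1.0, 1.0), (1.5, 1.2), (2.2, 0.8), (1.2, 2.0)]
    for point in smote(minority, n_new=3, k=2, seed=0):
        print(point)
```

The all-pairs neighbor search grows quadratically with the number of minority examples, which is one reason a naive version like this does not carry over directly to Big Data settings.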


2021, Vol 20, pp. 82-87
Author(s): Stella Vetova

The paper deals with the integration and sorting of Covid-19 data. The data file contains fifteen data fields, and for the design of the integration and sorting model each field is configured with its data type, format, and field length. Talend Open Studio is used to design the model. The model performs four main tasks: data integration, data sorting, result display, and output in .xls file format. Two sorting rules are defined in accordance with medical and biomedical requirements: the report date is sorted in descending order and the Country Name field in alphabetical order.
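The two sorting rules can be illustrated outside Talend with a short Python sketch. The `CovidRecord` type, its field names, and the `sort_records` helper are hypothetical stand-ins for the fifteen configured fields, and treating the country name as a secondary key (applied among rows that share a report date) is an assumption about how the two rules combine.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CovidRecord:
    # Hypothetical subset of the fifteen data fields; names are illustrative only.
    report_date: date
    country_name: str
    cases: int


def sort_records(records):
    """Sort by report date (newest first), then by country name (A to Z).

    Assumption: the country name acts as a tie-breaker within the same date.
    """
    # Negating the date ordinal makes an ascending sort run newest-first,
    # while the country name key stays ascending (alphabetical, case-insensitive).
    return sorted(records, key=lambda r: (-r.report_date.toordinal(), r.country_name.casefold()))


if __name__ == "__main__":
    rows = [
        CovidRecord(date(2020, 3, 1), "Spain", 10),
        CovidRecord(date(2020, 3, 2), "Italy", 20),
        CovidRecord(date(2020, 3, 2), "Austria", 5),
    ]
    for row in sort_records(rows):
        print(row.report_date, row.country_name)
```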


2018, Vol 24 (3), pp. 1603-1607
Author(s): Zul-Azri Ibrahim, Fiza Abdul Rahim, Roslan Ismail, Asmidar Abu Bakar
