Data Mining
Recently Published Documents





The emergence of online education has greatly helped improve the quality of traditional English teaching. However, it only moves the teaching process from offline to online and does not change the essence of traditional English teaching. In this work, we study an intelligent English teaching method to further improve teaching quality. Specifically, a random forest is first used to analyze and extract the grammatical and syntactic features of English text. A decision-tree-based method is then proposed to predict grammar or syntax issues in the text. The evaluation results indicate that the proposed method can effectively improve the accuracy of English grammar and syntax recognition.

Vinod Gendre

Abstract: Crime is a major issue of concern to individuals, local communities, and governments. Crime prediction uses past data and, after analyzing that data, predicts future crime by location and time. Serial criminal cases now occur in rapid succession, so predicting future crime accurately and with good performance is a challenging task. This paper examines various approaches to crime prediction and detection. An efficient crime prediction system speeds up the process of solving crimes. Such a system takes historical data, analyzes it using several analysis techniques, and can then predict patterns and trends of crime using any of the approaches mentioned below. Keywords: Crime Analysis, Data Mining, Classification, Clustering

Tuğçe Ayhan, Tamer Uçar

The demand for credit is increasing constantly. Banks are looking for credit evaluation methods that provide the most accurate results in a shorter period in order to minimize their rising risks. This study focuses on methods that enable banks to increase their asset quality without market loss in the credit allocation process. These methods enable the automatic evaluation of loan applications in line with sector practices and the determination of credit policies and strategies based on actual needs. Within the scope of this study, the relationship between predetermined attributes and credit limit outputs is analyzed using a sample data set of consumer loans. Random forest (RF), sequential minimal optimization (SMO), PART, decision table (DT), J48, multilayer perceptron (MP), JRip, naïve Bayes (NB), one rule (OneR) and zero rule (ZeroR) algorithms were used in this process. According to this analysis, SMO, PART and random forest are the top three approaches for determining customer credit limits.
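The two simplest baselines named in the abstract above, ZeroR and OneR, are small enough to sketch in plain Python. This is an illustration only, not the study's implementation: the attribute values, class labels, and function names (`zero_r`, `one_r`, `predict_one_r`) are hypothetical, and the abstract does not describe the real data set's attributes.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """ZeroR: ignore all attributes and always predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, labels):
    """OneR: keep the single attribute whose value -> majority-class rule
    makes the fewest errors on the training data."""
    default = zero_r(labels)
    best = None  # (errors, attribute index, rule)
    for attr in range(len(rows[0])):
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row, label in zip(rows, labels) if rule[row[attr]] != label)
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    _, attr, rule = best
    return attr, rule, default

def predict_one_r(model, row):
    """Apply the OneR rule; fall back to the ZeroR class for unseen values."""
    attr, rule, default = model
    return rule.get(row[attr], default)

# Hypothetical toy data: (income band, housing status) -> credit limit class.
toy_rows = [("low", "renter"), ("low", "owner"), ("low", "renter"),
            ("high", "renter"), ("high", "owner")]
toy_labels = ["small", "small", "small", "large", "large"]
toy_model = one_r(toy_rows, toy_labels)
```

Baselines like these are useful mainly as a floor: a more complex classifier that cannot beat OneR on held-out data is probably not learning much beyond a single attribute.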

Özerk Yavuz

Epidemic diseases can be extremely dangerous because of their harmful effects. They may negatively affect economies, businesses, the environment, people, and the workforce. In this paper, some of the factors related to the COVID-19 pandemic are examined using data mining methodologies and approaches. The analysis uncovered a number of rules and insights, and the performance of the data mining algorithms was evaluated. According to the results, the JRip algorithm had the highest correct classification rate and the lowest root mean squared error (RMSE). On both measures, JRip can be considered an effective method for understanding the factors related to deaths caused by the coronavirus.
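The two evaluation measures named above can be written out in a few lines. The sketch below is an illustration, not the paper's evaluation code: it assumes a binary outcome coded 0/1 and an RMSE computed against predicted probabilities, which the abstract does not specify. The function names are hypothetical.

```python
import math

def classification_rate(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def rmse(y_true, y_prob):
    """Root mean squared error between 0/1 targets and predicted probabilities."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_prob))
    return math.sqrt(squared / len(y_true))
```

A classifier can rank first on one measure and not the other, because the classification rate only looks at the final label while RMSE also penalizes overconfident probabilities.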

Fa Zhang, Shi-Hui Wu, Zhi-Hua Song

Multi-agent based simulation (MABS) is an important approach for studying complex systems. An agent-based model often contains many parameters; these parameters are usually not independent, differ in their ranges, and may be subject to constraints. How to use MABS to investigate complex systems effectively remains a challenge. The common tasks of MABS include summarizing the macroscopic patterns of the system, identifying key factors, establishing a meta-model, and optimization. We propose a framework of experimental design and data mining for MABS. In the framework, an experimental design method is used to generate experiment points in the parameter space, simulation data are then generated at those points, and finally data mining techniques are used to analyze the data. With this framework, a complex system can be explored and analyzed iteratively. Using central composite discrepancy (CCD) as the measure of uniformity, we designed an experimental design algorithm in which the parameters can satisfy arbitrary constraints. We discuss the relationship between the tasks of complex system simulation and data mining, such as using cluster analysis to classify the macro patterns of the system; using CART, PCA, ICA and other dimensionality reduction methods to identify key factors; and using linear regression, stepwise regression, SVM, neural networks, etc. to build the meta-model of the system. This framework integrates MABS with experimental design and data mining to provide a reference for complex system exploration and analysis.
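The first step of the framework above, generating experiment points in a constrained parameter space, can be illustrated with plain rejection sampling. This is a deliberately simpler stand-in and not the CCD-based algorithm the abstract describes: it does not optimize uniformity at all. The function name, parameter ranges, and constraint are hypothetical.

```python
import random

def sample_design(ranges, constraint, n_points, seed=0, max_tries=100_000):
    """Draw n_points from the box given by `ranges`, keeping only candidates
    that satisfy `constraint` (simple rejection sampling)."""
    rng = random.Random(seed)
    points = []
    tries = 0
    while len(points) < n_points:
        if tries >= max_tries:
            raise RuntimeError("feasible region too small for rejection sampling")
        tries += 1
        candidate = tuple(rng.uniform(lo, hi) for lo, hi in ranges)
        if constraint(candidate):
            points.append(candidate)
    return points

# Hypothetical model with two parameters that must satisfy a joint budget.
toy_ranges = [(0.0, 1.0), (0.0, 10.0)]

def toy_constraint(p):
    return 10.0 * p[0] + p[1] <= 12.0

toy_design = sample_design(toy_ranges, toy_constraint, n_points=20)
```

Each point in `toy_design` would then be passed to the simulation, and the resulting outputs collected for clustering, factor screening, or meta-model fitting.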

2022, Vol 3 (2), pp. 39-45
Muhammad Farid Satrio Wibowo, Nila Feby Puspitasari, Barka Satya

Choosing a concentration or field of study is not easy for a student in a university department. Students try to choose the concentration they consider most appropriate to their competence and study interests, because the chosen concentration affects their motivation to learn, achievement, length of study, and grade point average (GPA). Given the importance of choosing a concentration for students at higher education institutions, a model is needed that can help students choose a concentration in line with their competence and study interests. The researchers therefore built a system for student concentration selection using the Naïve Bayes algorithm as the classification method. To support the concentration selection decision, this study uses data mining techniques as a process of searching for desired patterns in a large database. Tests on a sample dataset of 1534 records using the Naïve Bayes algorithm showed that the predictions for determining concentration reached an accuracy of 84.27%. The variables affect the resulting accuracy: a narrow or small set of variables leads to poorer accuracy, whereas a broader set of variables can produce more optimal output accuracy.
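As an illustration of the classification method named in this abstract, here is a minimal categorical Naïve Bayes classifier with Laplace smoothing. It is a sketch under assumptions, not the authors' system: the attributes (two course-grade bands), the concentration names, and the function names are all hypothetical, and the abstract does not list the real variables.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)
        for i, value in enumerate(row):
            k = len(values_seen[i])
            score += math.log((value_counts[(i, label)][value] + 1) / (n_label + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (programming grade band, networking grade band).
toy_rows = [("high", "low"), ("high", "low"), ("low", "high"),
            ("low", "high"), ("high", "high")]
toy_labels = ["software", "software", "network", "network", "software"]
toy_model = train_naive_bayes(toy_rows, toy_labels)
```

Adding more informative attributes gives the classifier more evidence per student, which is consistent with the abstract's remark that a broader set of variables tends to improve accuracy.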

Ольга Герасименко

The article examines how to embed the concept of geomarketing into the system of strategic marketing planning in order to increase a company's competitiveness. Selected results of a content analysis of the term "geomarketing" (in the English-language version of a search engine) are described, and the author's own definition of geomarketing is proposed. The attributes of the 4G geomarketing concept are presented: Geomodeling Intelligence, Geo Product, Geotake Value, Geo Data Mining. The methodological basis of the study comprises the following groups of methods: marketing (expert survey, marketing research, geomarketing analysis, survey, sociological research), geographic (cartographic, GIS), and digital (spatial data processing, Big Data, software modeling). Conceptual representations of geomarketing within the system of strategic marketing management are developed. Conclusions are drawn about the applicability of geomarketing research, using fishing stores in the city of Belgorod as an example. The author's hypotheses were verified by selecting the optimal location for opening a store at the address: Belgorod, ul. Yesenina, 9, building 3. The tool for conducting the geomarketing research is the author's own software, registered as a patent.

2022, Vol 8 (2), pp. 81-84
Henry George Maquera Quispe, Richard Yuri Mercado Rivas, José Luis Cerrón Pérez

This research was undertaken because of the importance today of protecting postpartum (puerperal) mothers, since maternal deaths in our region have been increasing in recent years. Maternal deaths are today considered factors of underdevelopment. The Hospital Regional Docente Materno Infantil currently has an information system called SIP-2000 (Sistema Informático Perinatal – 2000), developed by the Ministry of Health in collaboration with international organizations such as USAID. SIP-2000 records various data on the postpartum women who attend the health center, from their prenatal check-ups through delivery and the postpartum period. The recorded information covers multiple factors, such as those of high obstetric risk (ARO, alto riesgo obstétrico), all related to the woman, for example: abortions, cesarean sections, age, place of origin, number of previous births, venereal morbidities, etc. All these data can be processed and analyzed to determine the factors that lead to death in postpartum women. Using the data analysis models provided by data mining, the research will yield the direct high-obstetric-risk variables that influence maternal death. These factors should be taken into account by the respective levels of government in order to change the distressing situation in our central region.

2022
Priyanka Srivastava, Chitra Bamba, Seema Chopra, Kausik Mandal

There is a plethora of publications on the role of miRNA gene polymorphisms and their association with recurrent pregnancy loss (RPL), but the available studies lack uniformity because of variable subject populations, heterogeneity, and contradictory findings on significance. Rigorous data mining was done through PubMed, SCOPUS, Cochrane Library, Elsevier and Google Scholar to extract the studies of interest published until June 2021. A total of eight miRNA SNPs were included, each with ≥2 studies available. Analysis was based on pooled odds ratios and 95% CIs. This is the first meta-analysis of miRNA SNPs in RPL, and it suggests that the rs11614913, rs3746444 and rs2292832 biomarkers may decrease the risk of RPL under different genetic models.
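A pooled odds ratio with a 95% CI can be computed in several ways. The sketch below uses fixed-effect inverse-variance pooling on the log odds ratio scale, which is an assumption: the abstract does not say whether a fixed-effect or random-effects model was used. The 2x2 counts and function names are hypothetical and are not taken from the included studies.

```python
import math

def log_odds_ratio(a, b, c, d):
    """a, b: cases with/without the variant; c, d: controls with/without.
    Returns the log odds ratio and its approximate variance."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def pooled_odds_ratio(tables, z=1.96):
    """Fixed-effect inverse-variance pooling of per-study log odds ratios.
    Returns (pooled OR, lower 95% bound, upper 95% bound)."""
    weight_sum = 0.0
    weighted_sum = 0.0
    for a, b, c, d in tables:
        log_or, variance = log_odds_ratio(a, b, c, d)
        weight = 1.0 / variance
        weight_sum += weight
        weighted_sum += weight * log_or
    pooled = weighted_sum / weight_sum
    se = math.sqrt(1.0 / weight_sum)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)

# Hypothetical counts for two studies of one SNP (not real data).
toy_tables = [(30, 70, 45, 55), (20, 80, 35, 65)]
toy_or, toy_low, toy_high = pooled_odds_ratio(toy_tables)
```

A pooled odds ratio below 1 whose upper confidence bound is also below 1 is the pattern usually read as a reduced risk; with substantial between-study heterogeneity, a random-effects model would give wider intervals than this sketch.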
