On solving large p-median problems

2019 · Vol 47 (6) · pp. 981-996
Author(s): Wangshu Mu, Daoqin Tong

Incorporating big data in urban planning has great potential for better modeling of urban dynamics and more efficient allocation of limited resources. However, big data also present new challenges for solving the associated problems. This research focuses on the p-median problem, one of the most widely used location models in urban and regional planning. Like many other location models, the p-median problem is non-deterministic polynomial-time hard (NP-hard), and solving large p-median problems is difficult. This research proposes a high performance computing-based algorithm, random sampling and spatial voting, to solve large p-median problems. Instead of solving a large p-median problem directly, a random sampling scheme is introduced to create smaller sub-p-median problems that can be solved efficiently in parallel. A spatial voting strategy is designed to evaluate the candidate facility sites for inclusion in the final problem solution. Tests with the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) data set show that random sampling and spatial voting provides high-quality solutions and reduces computing time significantly. Tests also demonstrate the dynamic scalability of the algorithm: it can start with a small amount of computing resources and scale up and down flexibly depending on resource availability.
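As a rough illustration of the sample, solve, and vote idea (a sketch under stated assumptions, not the authors' implementation), the snippet below draws random subsets of demand points, solves each small subproblem with a simple greedy p-median heuristic, and tallies votes for the facility sites selected across samples; the function names, the greedy solver, and all parameter values are assumptions.

```python
# Sketch of "sample, solve, vote" for large p-median instances.
import numpy as np

def greedy_p_median(points, candidates, p):
    """Greedy heuristic: repeatedly add the candidate site that most
    reduces total demand-to-nearest-facility distance."""
    dist = np.linalg.norm(points[:, None, :] - candidates[None, :, :], axis=2)
    chosen = []
    best = np.full(len(points), np.inf)     # current nearest-facility distance
    for _ in range(p):
        costs = np.minimum(best[:, None], dist).sum(axis=0)  # cost if site added
        costs[chosen] = np.inf                               # skip already-chosen sites
        j = int(np.argmin(costs))
        chosen.append(j)
        best = np.minimum(best, dist[:, j])
    return chosen

def sample_and_vote(points, p, n_samples=20, sample_size=500, seed=None):
    """Solve many small sampled subproblems and vote on selected sites."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(points))
    for _ in range(n_samples):
        idx = rng.choice(len(points), size=min(sample_size, len(points)), replace=False)
        sub = points[idx]
        chosen = greedy_p_median(sub, sub, p)   # sampled points double as candidates
        votes[idx[chosen]] += 1                 # vote for the selected sites
    return np.argsort(votes)[-p:]               # keep the p most-voted sites

if __name__ == "__main__":
    pts = np.random.default_rng(0).random((5000, 2))
    print(sample_and_vote(pts, p=10, seed=0))
```

Each subproblem is independent, so the loop over samples is the part that would be distributed across workers in a high performance computing setting.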

Author(s): Jia Zhao, Jia Sun, Yunan Zhai, Yan Ding, Chunyi Wu, ...

Data are expanding rapidly nowadays, which makes it very difficult to extract valuable information from big data. Most existing data mining algorithms handle big data problems at a large cost in time and space. This paper focuses on the sampling problem for big data and puts forward an efficient heuristic, Cluster Sampling Arithmetic (CSA). Much previous work adopted random sampling to extract an initial sample set from the original data and then processed that sample in various ways to obtain a corresponding minimum sample set, regarded as a representation of the original big data set. However, the final results are severely affected by the random sampling performed at the beginning, leading to lower comprehensiveness and quality of the final results and longer processing time. Based on this observation, CSA introduces the idea of clustering to obtain the minimum sample set of big data, in contrast to the random sampling methods in the current literature. CSA performs cluster analysis on the original data set and selects the center of each cluster as a member of the minimum sample set. The aim is to ensure that the sample distribution accords with the characteristics of the original data, to guarantee data integrity, and to reduce processing time. The max–min distance method from pattern recognition is integrated into the clustering process to choose the cluster centers and to prevent the algorithm from falling into local optima. The experimental results show that, compared with existing work, the CSA algorithm can efficiently reflect the characteristics of the original data and reduce data processing time. The obtained minimum sample set also achieves good results in classification.
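A minimal sketch of the max–min distance rule for picking cluster centers, the initialization step the abstract describes; the function name, synthetic data, and parameter values are illustrative assumptions, not the CSA implementation.

```python
# Max-min distance center selection: each new center maximizes its
# distance to the nearest already-chosen center.
import numpy as np

def max_min_centers(data, k, seed=None):
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(data)))]             # arbitrary first center
    d = np.linalg.norm(data - data[centers[0]], axis=1)  # distance to nearest center
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                          # farthest point so far
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(data - data[nxt], axis=1))
    return data[centers]                                 # centers form the minimum sample set

if __name__ == "__main__":
    X = np.random.default_rng(1).random((10000, 4))
    print(max_min_centers(X, k=50, seed=1).shape)        # (50, 4)
```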


Author(s): Yihao Tian

Big data is an unstructured data set of considerable volume, coming from various sources such as the internet and business organizations, in various formats. Predicting consumer behavior is a core responsibility for most dealers. Market research can reveal consumer intentions, but it can be a tall order for even a well-designed research project to penetrate the veil that protects real customer motivations from closer scrutiny. Customer behavior modeling usually relies on customer data mining, and each model is structured at one stage to answer one query. Customer behavior prediction is a complex and unpredictable challenge. This paper applies advanced mathematical and big data analytics (BDA) methods to predict customer behavior. Predictive behavior analytics can provide modern marketers with multiple insights to optimize efforts in their strategies. The model goes beyond analyzing historical evidence to make mathematically informed projections about what will happen in the future. Although the underlying method is complex, its results are straightforward for most customers to use. As a result, most consumer behavior models combine many variables and, with big data, produce predictions that are usually quite accurate. This paper attempts to develop an association rule mining model to predict customers' behavior, improve accuracy, and derive major consumer data patterns. The findings show that the recommended BDA method improves big data analytics usability in the organization (98.2%), risk management ratio (96.2%), operational cost (97.1%), customer feedback ratio (98.5%), and demand prediction ratio (95.2%).
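A minimal sketch of association rule mining over customer transactions, the technique the paper builds on; the toy transactions, item names, and support/confidence thresholds are illustrative assumptions, not the paper's data or model.

```python
# Mine simple {a} -> {b} rules from transaction baskets using
# support and confidence thresholds.
from itertools import combinations

transactions = [
    {"phone", "case", "charger"},
    {"phone", "charger"},
    {"laptop", "mouse"},
    {"phone", "case"},
    {"laptop", "mouse", "bag"},
]

def support(itemset, transactions):
    return sum(itemset <= t for t in transactions) / len(transactions)

rules = []
items = {i for t in transactions for i in t}
for a, b in combinations(sorted(items), 2):
    for ante, cons in ((frozenset({a}), frozenset({b})),
                       (frozenset({b}), frozenset({a}))):
        s = support(ante | cons, transactions)
        if s >= 0.4:                                   # minimum support
            conf = s / support(ante, transactions)     # rule confidence
            if conf >= 0.6:                            # minimum confidence
                rules.append((set(ante), set(cons), round(s, 2), round(conf, 2)))

for ante, cons, s, c in rules:
    print(f"{ante} -> {cons}  support={s}  confidence={c}")
```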


2021 · pp. 58-60
Author(s): Naziru Fadisanku Haruna, Ran Vijay Kumar Singh, Samsudeen Dahiru

In This paper a modied ratio-type estimator for nite population mean under stratied random sampling using single auxiliary variable has been proposed. The expression for mean square error and bias of the proposed estimator are derived up to the rst order of approximation. The expression for minimum mean square error of proposed estimator is also obtained. The mean square error the proposed estimator is compared with other existing estimators theoretically and condition are obtained under which proposed estimator performed better. A real life population data set has been considered to compare the efciency of the proposed estimator numerically.


Author(s): Kristen Weidner, Joneen Lowman, Anne Fleischer, Kyle Kosik, Peyton Goodbread, ...

Purpose: Telepractice was extensively utilized during the COVID-19 pandemic. Little is known about issues experienced during the wide-scale rollout of a service delivery model that was novel to many. Social media research is a way to unobtrusively analyze public communication, including during a health crisis. We investigated the characteristics of tweets about telepractice through the lens of an established health technology implementation framework. Results can help guide efforts to support and sustain telehealth beyond the pandemic context. Method: We retrieved a historical Twitter data set containing tweets about telepractice from the early months of the pandemic. Tweets were analyzed using a concurrent mixed-methods content analysis design informed by the nonadoption, abandonment, scale-up, spread, and sustainability (NASSS) framework. Results: Approximately 2,200 Twitter posts were retrieved, and 820 original tweets were analyzed qualitatively. The volume of tweets about telepractice increased in the early months of the pandemic. The largest group of Twitter users tweeting about telepractice was clinical professionals. Tweet content reflected many, but not all, domains of the NASSS framework. Conclusions: Twitter posting about telepractice increased during the pandemic. Although many tweets represented topics expected in technology implementation, some reflected phenomena potentially unique to speech-language pathology. Certain technology implementation topics, notably sustainability, were not found in the data. Implications for future telepractice implementation and further research are discussed.


2019 · Vol 2 · pp. 1-6
Author(s): Wenjuan Lu, Aiguo Liu, Chengcheng Zhang

With the development of geographic information technology, the ways of acquiring geographic information keep expanding, spatiotemporal data are growing explosively, and more and more scholars have entered the field of spatiotemporal data processing and analysis. Traditional data visualization techniques are popular and easy to understand: simple pie charts and histograms can reveal and analyze characteristics of the data themselves, but they still cannot be combined with maps to display the hidden spatiotemporal information and realize its full application value. How to fully explore the spatiotemporal information contained in massive data and accurately reveal the spatial distribution and variation patterns of geographical objects and phenomena is a key research problem at present. Based on this, this paper designs and constructs a general thematic data visual analysis system that supports the full workflow of data warehousing, data management, data analysis and data visualization. Taking Weifang city as the study area and starting from rainfall interpolation analysis and comprehensive population analysis of Weifang, the authors realize fast and efficient display over a big data set and fully present the characteristics of spatiotemporal data through thematic visualization. At the same time, the Cassandra distributed database is adopted in this research to store, manage and analyze the big data. To a certain extent, this reduces the pressure of front-end map drawing and provides good query efficiency and fast processing capability.
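A minimal sketch of how thematic spatiotemporal records might be stored and queried in Cassandra with the DataStax Python driver; the keyspace, table, columns, coordinates, and contact point are illustrative assumptions, not the schema of the system described above.

```python
# Store and query per-station rainfall observations in Cassandra.
from datetime import datetime
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # assumed local single-node cluster
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS thematic
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS thematic.rainfall (
        station_id text,
        observed_at timestamp,
        lon double,
        lat double,
        rainfall_mm double,
        PRIMARY KEY (station_id, observed_at)  -- partition by station, cluster by time
    )
""")
session.execute(
    "INSERT INTO thematic.rainfall (station_id, observed_at, lon, lat, rainfall_mm) "
    "VALUES (%s, %s, %s, %s, %s)",
    ("WF-001", datetime(2019, 7, 1, 8, 0), 119.16, 36.71, 12.4),
)
rows = session.execute(
    "SELECT observed_at, rainfall_mm FROM thematic.rainfall WHERE station_id = %s",
    ("WF-001",),
)
for r in rows:
    print(r.observed_at, r.rainfall_mm)
cluster.shutdown()
```

Partitioning by station and clustering by time keeps each station's time series together, which is one common way to make front-end map queries cheap.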


A large volume of data is generated in various fields and stored across many systems; such collections are called big data. Big data in healthcare includes the clinical records of every patient in huge amounts, maintained as Electronic Health Records (EHRs). More than 80% of clinical data are in unstructured format and reside in hundreds of forms. The challenge for data storage and analysis is to handle such large data sets efficiently and scalably. The Hadoop MapReduce framework stores and processes any kind of big data quickly; it is not solely a storage system but also a platform for data storage as well as processing, and it is scalable and fault-tolerant. Prediction over these data sets is handled by machine learning algorithms. This work focuses on the Extreme Learning Machine (ELM) algorithm, used in an optimized way to predict disease risk by combining ELM with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM). The proposed work also considers the scalability and accuracy of big data models; the proposed algorithm carries out the computation effectively and achieves good performance in both veracity and efficiency.
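A minimal sketch of a basic Extreme Learning Machine classifier (random hidden-layer weights, closed-form output weights via a pseudoinverse); the synthetic data and the plain least-squares fit are illustrative assumptions and do not reproduce the paper's CS-SVM hybrid or its Hadoop deployment.

```python
# Basic ELM: hidden layer is random and fixed; only the output weights
# are learned, in closed form.
import numpy as np

class ELM:
    def __init__(self, n_hidden=100, seed=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # sigmoid activations

    def fit(self, X, y):
        n_features = X.shape[1]
        self.W = self.rng.normal(size=(n_features, self.n_hidden))  # never trained
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y          # least-squares output weights
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0.5).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 8))                  # stand-in for clinical features
    y = (X[:, 0] + X[:, 1] > 0).astype(float)      # stand-in risk label
    model = ELM(n_hidden=64, seed=0).fit(X[:300], y[:300])
    print("accuracy:", (model.predict(X[300:]) == y[300:]).mean())
```

Because training reduces to one matrix pseudoinverse, ELM scales well, which is why it is attractive for large EHR-style data sets.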


2021 · Vol 2021 · pp. 1-11
Author(s): Lin Yang

In recent years, people have paid more and more attention to cloud data. However, because users do not have absolute control over data stored on a cloud server, the cloud storage server must provide evidence that the data are stored intact so that users retain control over their data. Users are given full management rights: they can independently install operating systems and applications and can choose self-service platforms and various remote management tools to manage and control the host according to personal preference. This paper mainly introduces a cloud data integrity verification algorithm for the accounting informatization of sustainable computing and studies the advantages and disadvantages of existing data integrity proof mechanisms as well as the new requirements of the cloud storage environment. An LBT-based big data integrity proof mechanism is proposed, which introduces a multibranch path tree as the data structure used in the integrity proof mechanism and proposes a rank-annotated multibranch path structure together with a data integrity detection algorithm. The proposed data integrity verification algorithm and two other integrity verification algorithms are compared in simulation experiments. The experimental results show that the proposed scheme is about 10% better than scheme 1 and about 5% better than scheme 2 in computing time for 500 data blocks; as the number of operated data blocks grows, the execution times of scheme 1 and scheme 2 increase with the number of data blocks, while the execution time of the proposed scheme remains unchanged, and its computational cost is also lower than that of scheme 1 and scheme 2. The scheme in this paper not only verifies the integrity of cloud storage data but also offers clear verification advantages, which is of practical significance for big data integrity verification.
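A minimal sketch of a multibranch (k-ary) hash tree root computation, the kind of structure used in such integrity proof mechanisms; the branching factor, hashing scheme, and block layout are illustrative assumptions, not the paper's rank-annotated LBT construction.

```python
# Compute the root digest of a k-ary hash tree over data blocks;
# any change to a block changes the root, which is what the verifier checks.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def multibranch_root(blocks, k=4):
    """Hash the blocks, then repeatedly hash groups of k child digests
    until a single root digest remains."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        level = [h(b"".join(level[i:i + k])) for i in range(0, len(level), k)]
    return level[0]

if __name__ == "__main__":
    blocks = [f"block-{i}".encode() for i in range(500)]
    root = multibranch_root(blocks, k=4)
    tampered = blocks.copy()
    tampered[42] = b"tampered"
    print(root.hex()[:16], multibranch_root(tampered, k=4).hex()[:16])  # digests differ
```

A larger branching factor makes the tree shallower, which is the usual motivation for multibranch trees over binary Merkle trees in integrity proofs.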


Author(s): Hena Iqbal, Sujni Paul, Khaliquzzaman Khan

Evaluation is an analytical and organized process to identify the present positive influences, favourable future prospects, existing shortcomings, and underlying complexities of any plan, program, practice, or policy. Policy evaluation is an essential and vital process required to measure the performance or progress of a scheme. The main purpose of policy evaluation is to empower the various stakeholders and enhance their socio-economic environment. A large number of policies and schemes in different areas are launched by governments with citizen welfare in view. Although governmental policies intend to improve people's quality of life, they may also affect their everyday lives in unintended ways. Saubhagya, a recent scheme launched by the Indian government in 2017, has been selected for evaluation by applying opinion mining techniques. The data set of public opinion associated with this scheme has been collected from Twitter. The primary intent is to offer opinion mining as a smart city technology that harnesses user-generated big data and analyses it to support a sustainable governance model.
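A minimal sketch of lexicon-based opinion scoring over tweets, one simple form of opinion mining; the word lists and example tweets are illustrative assumptions, not the paper's data or pipeline.

```python
# Score tweet polarity by counting positive and negative lexicon hits.
POSITIVE = {"good", "great", "benefit", "helpful", "improved", "thanks"}
NEGATIVE = {"bad", "poor", "delay", "failure", "corruption", "waste"}

def polarity(tweet: str) -> int:
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = [
    "Saubhagya brought great benefit to our village, thanks",
    "Still waiting, another delay and poor execution",
]
for t in tweets:
    score = polarity(t)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    print(label, "|", t)
```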


2016
Author(s): Rui J. Costa, Hilde Wilkinson-Herbots

The isolation-with-migration (IM) model is commonly used to make inferences about gene flow during speciation, using polymorphism data. However, Becquet and Przeworski (2009) report that the parameter estimates obtained by fitting the IM model are very sensitive to the model's assumptions (including the assumption of constant gene flow until the present). This paper is concerned with the isolation-with-initial-migration (IIM) model of Wilkinson-Herbots (2012), which drops precisely this assumption. In the IIM model, one ancestral population divides into two descendant subpopulations, between which there is an initial period of gene flow and a subsequent period of isolation. We derive a very fast method of fitting an extended version of the IIM model, which also allows for asymmetric gene flow and unequal population sizes. This is a maximum-likelihood method, applicable to data on the number of segregating sites between pairs of DNA sequences from a large number of independent loci. In addition to obtaining parameter estimates, our method can also be used to distinguish between alternative models representing different evolutionary scenarios, by means of likelihood ratio tests. We illustrate the procedure on pairs of Drosophila sequences from approximately 30,000 loci. The computing time needed to fit the most complex version of the model to this data set is only a couple of minutes. The R code to fit the IIM model can be found in the supplementary files of this paper.
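The model comparison mentioned above relies on standard likelihood ratio tests between nested models; as a reminder of the general form (not the IIM-specific likelihood), with d the difference in the number of free parameters and under the usual regularity conditions:

```latex
\[
\Lambda \;=\; 2\left[\,\ell\!\left(\hat{\theta}_{\text{full}}\right)
  \;-\; \ell\!\left(\hat{\theta}_{\text{restricted}}\right)\right]
\;\xrightarrow{d}\; \chi^{2}_{d},
\]
```

for example, comparing the full asymmetric-migration IIM model against a restricted variant with symmetric gene flow or equal population sizes.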

