Robust Semi-Automatic Annotation of Object Data Sets with Bounding Rectangles

Author(s):  
Abdelhamid ZAIDI

Abstract: Object datasets used in the construction of object detectors are typically annotated manually with horizontal or rotated bounding rectangles. An annotation is optimal when it fulfils two conditions: (i) the rectangle covers the whole object; (ii) the area of the rectangle is minimal. Building a large-scale object dataset requires annotators with equal manual dexterity to carry out this tedious work. When an object is horizontal, it is easy for the annotator to reach the optimal bounding box within a reasonable time. However, if the object is rotated, the annotator needs additional time to decide whether the object should be annotated with a horizontal or a rotated rectangle. Moreover, in both cases, the final decision is not based on any objective argument, and the annotation is generally not optimal. In this study, we propose a new method of annotation by rectangles, called Robust Semi-Automatic Annotation, which combines speed and robustness. Our method has two phases. In the first phase, the annotator is invited to click on the most relevant points located on the contour of the object. In the second phase, an algorithm we develop, called RANGE-MBR, determines from the selected contour points a rectangle enclosing them in linear time. The rectangle returned by RANGE-MBR always satisfies optimality condition (i). We prove that optimality condition (ii) is always satisfied for objects with isotropic shapes. For objects with anisotropic shapes, we study optimality condition (ii) by simulation. We show that the rectangle returned by RANGE-MBR is quasi-optimal with respect to condition (ii) and that its performance improves with dilated objects, which is the case for most objects appearing on images collected by aerial photography.
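The abstract does not describe RANGE-MBR's internals, but the classical result behind condition (ii) is that a minimum-area enclosing rectangle has one side collinear with an edge of the convex hull of the points. A minimal sketch of that idea follows; `convex_hull` and `min_area_rect` are illustrative names, and this version runs in O(n log n), not the linear time claimed for RANGE-MBR:

```python
import math

def convex_hull(points):
    # Andrew's monotone chain, O(n log n)
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_area_rect(points):
    """Smallest-area (possibly rotated) rectangle enclosing the points:
    try each hull edge as a candidate rectangle orientation."""
    hull = convex_hull(points)
    best = None
    n = len(hull)
    for i in range(n):
        x1, y1 = hull[i]
        x2, y2 = hull[(i+1) % n]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(-theta), math.sin(-theta)
        xs = [c*x - s*y for x, y in hull]
        ys = [s*x + c*y for x, y in hull]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        if best is None or area < best[0]:
            best = (area, theta)
    return best  # (area, angle of one rectangle side)
```

For an axis-aligned 2×1 rectangle of clicked points the returned area is 2.0, and the same holds for the 45°-rotated unit-diagonal square, whose optimal rectangle is itself rotated.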

Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has long been studied by researchers in numerous fields. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA, which self-organizes and recognizes the number of clusters k based on the similarities in the data; it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperform those of existing algorithms under both sequential and parallel conditions.
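The covering algorithm itself is not specified in the abstract. As a rough stand-in for the two-phase idea, a greedy covering pass with a radius threshold can discover k and the initial centers without either being prespecified, after which the Lloyd iteration refines them as usual. A sketch under that assumption (`greedy_cover` and its `radius` parameter are hypothetical, not the paper's CA):

```python
import math

def greedy_cover(points, radius):
    """Covering-style initialization: any point not within `radius` of an
    existing center starts a new cluster, so k emerges from the data."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > radius for c in centers):
            centers.append(p)
    return centers

def lloyd(points, centers, iters=20):
    """Standard Lloyd iteration: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
                   for cl, c in zip(clusters, centers)]
    return centers
```

On two well-separated blobs, the covering pass finds k = 2 on its own, and Lloyd then converges to the blob means.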


2019 ◽  
Vol 12 (4) ◽  
pp. 53-66 ◽  
Author(s):  
Hui Cai ◽  
Yi Lu ◽  
Hugo Sheward

Objectives: To provide a historical review of the evolution of contemporary Chinese nursing unit design and the contextual factors that drive the design and its changes. Background: China is undergoing a major healthcare construction boom. A systematic investigation of the characteristics and development of Chinese nursing unit design is warranted to help U.S. healthcare designers provide design that fits the local context. Methods: The investigation was developed in two phases. The first phase was a large-scale spatial analysis of 176 Chinese acute care unit layouts from three periods: 1989–1999, 1999–2004, and 2005–2015. In addition to qualitative descriptions of the nursing unit typologies, the percentage of various typologies, patient room (PR) types, the number of beds, visibility from the nurse station (NS) to PRs, and access to natural light during each period were evaluated quantitatively. The second phase defined the key factors that shape Chinese nursing unit design through expert interviews. Results: Significant differences were found between designs from these three periods. Chinese nursing unit size has grown continuously in the number of beds. Most PRs have shifted from three-bed to double-bed rooms. Most Chinese hospitals use single-corridor, racetrack, and mutated-racetrack layouts, and the mutated racetrack has overtaken the single corridor as the dominant configuration. Access to southern sunlight remains important. The average visibility from the NS to some PRs is restricted by the preference for allocating most PRs on the south side of a unit. Conclusions: Chinese nursing unit design has undergone transformations to fit the local cultural and socioeconomic context and staffing model.


2020 ◽  
Vol 34 (04) ◽  
pp. 4412-4419 ◽  
Author(s):  
Zhao Kang ◽  
Wangtao Zhou ◽  
Zhitong Zhao ◽  
Junming Shao ◽  
Meng Han ◽  
...  

A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years, with researchers managing to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms typically have quadratic or even cubic complexity, making them inefficient and inherently difficult to apply at large scale. In the era of big data, this computational issue becomes critical. To fill the gap, we propose a large-scale MVSC (LMVSC) algorithm with linear-order complexity. Inspired by the idea of the anchor graph, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate those graphs so that spectral clustering can be implemented on a smaller graph. Interestingly, it turns out that our model also applies to the single-view scenario. Extensive experiments on various large-scale benchmark data sets validate the effectiveness and efficiency of our approach with respect to state-of-the-art clustering methods.
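The anchor-graph idea can be illustrated as follows: for each view, build an n × m similarity matrix Z against m ≪ n anchor points, stack the per-view graphs, and use singular vectors of the stacked matrix as a spectral embedding, avoiding any n × n affinity matrix. This is a hedged sketch of the general technique, not the paper's exact LMVSC formulation; the Gaussian kernel and `sigma` are assumptions:

```python
import numpy as np

def anchor_graph(X, anchors, sigma=1.0):
    """Z[i, j]: similarity of sample i to anchor j (n x m), rows
    normalized so each sample's affinities sum to 1."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / (2 * sigma ** 2))
    return Z / Z.sum(1, keepdims=True)

def spectral_from_anchors(Zs, k):
    """Concatenate per-view anchor graphs; the top-k left singular
    vectors of the n x (views*m) matrix act as spectral embeddings,
    computed without ever forming an n x n graph."""
    Z = np.hstack(Zs)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k]  # in practice, run k-means on these rows
```

Samples from the same cluster get near-identical anchor-affinity rows, so their embedding rows end up close together.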


Identifying communities has always been a fundamental task in the analysis of complex networks. Currently used algorithms that identify the community structures in large-scale real-world networks either require a priori information, such as the number and sizes of communities, or are computationally expensive. Among them, the label propagation algorithm (LPA) brings great scalability, but it is not accurate enough because of its randomness. In this paper, we study the equivalence properties of nodes on social network graphs according to the labeling criteria in order to shorten social network graphs, and we develop label propagation algorithms on the shortened graphs to discover effective social networking communities without requiring optimization of an objective function or a priori information about the communities. Test results on sample data sets show that the proposed algorithm's execution time is significantly reduced compared to previously published algorithms. The proposed algorithm runs in almost linear time and improves the overall quality of the identified communities in complex networks with a clear community structure.
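The baseline LPA referred to here is simple to state: every node starts with its own label and repeatedly adopts the label most common among its neighbours, with random tie-breaking, which is the source of the randomness criticized above. A minimal sketch of asynchronous LPA:

```python
import random
from collections import Counter

def label_propagation(adj, max_iter=100, seed=0):
    """Asynchronous LPA: each node repeatedly adopts the label most
    common among its neighbours; each sweep is near-linear in edges."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}      # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)            # random visit order, as in standard LPA
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            new = rng.choice(best)    # random tie-break: LPA's instability
            if new != labels[v]:
                labels[v], changed = new, True
        if not changed:               # stop when a full sweep changes nothing
            break
    return labels
```

On a graph made of two disjoint triangles, each triangle converges to a single shared label while the two components keep distinct labels.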


Author(s):  
Kim Coutts ◽  
Mershen Pillay

Background: The bedside assessment is often seen as a screener because of its high variability in sensitivity and specificity, whilst instrumental measures are viewed as gold standards because of the speech-language therapist's (SLT's) ability to visualise the swallow more objectively. Objectives: This research article explores how value needs to be placed on the decision-making abilities of the SLT rather than on the assessment measure itself. Method: A mixed-methodology concurrent triangulation design was employed to collect data in two phases: the first phase included observing seven SLTs conducting assessments using a standardised bedside measure together with pulse oximetry and cervical auscultation. The second phase was a focus group discussion based on the findings from the first phase. Data were analysed thematically using a bottom-up approach. Results: The following factors were found to influence the decision-making process at the bedside: bedside assessment data sets, the patient, the multidisciplinary team, the context and then the SLT. The availability of more data from the assessment, drawn from different data sets, improved the confidence of the SLT at the bedside when needing to make clinical decisions. Clinical instincts are developed through experience and through observation of those more experienced, and these skills need to be developed from the junior years. Conclusion: This research study showed that a bedside assessment can provide valuable information that allows diagnostic decisions to be made at the bedside. It also highlighted the importance of critical thinking using clinical instincts; these are the factors that need to be valued and emphasised rather than the assessment measures themselves.


Author(s):  
Amit Kumar ◽  
Manish Kumar ◽  
Nidhya R.

In recent years, a huge increase in the demand for medical data has been reported. As a result, research in medical disease diagnosis has emerged as one of the most demanding research domains. The research reported in this chapter develops an ACO (ant colony optimization)-based Bayesian hybrid prediction model for medical disease diagnosis. The proposed model has two phases. In the first phase, the authors perform feature selection by applying a nature-inspired algorithm known as ACO. In the second phase, they use the obtained feature subset as input to a naïve Bayes (NB) classifier to enhance classification performance on medical-domain data sets. They considered 12 datasets from different organizations for experimental purposes. The experimental analysis supports the superiority of the presented model in dealing with medical data for disease prediction and diagnosis.
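Full ACO involves ants, pheromone trails, and heuristic desirability terms, none of which are detailed in the abstract. The sketch below substitutes a much-simplified pheromone-guided subset search wrapped around a tiny hand-rolled Gaussian naïve Bayes; `aco_select`, `gnb_fit`, and every parameter are illustrative assumptions, not the chapter's model:

```python
import math
import random

def gnb_fit(X, y):
    """Gaussian naive Bayes: per-class prior, mean and variance per feature."""
    model = {}
    for c in set(y):
        rows = [x for x, yi in zip(X, y) if yi == c]
        means = [sum(col) / len(col) for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / len(col) + 1e-6
                 for col, m in zip(zip(*rows), means)]
        model[c] = (len(rows) / len(y), means, vars_)
    return model

def gnb_predict(model, x, feats):
    """Classify x using only the selected feature indices `feats`."""
    def logp(c):
        prior, means, vars_ = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * vars_[j])
            - (x[j] - means[j]) ** 2 / (2 * vars_[j])
            for j in feats)
    return max(model, key=logp)

def aco_select(X, y, n_ants=10, n_iter=15, rho=0.2, seed=1):
    """Pheromone-guided subset search (a simplified stand-in for ACO):
    each 'ant' samples a feature subset with probability driven by the
    pheromone on each feature; training accuracy reinforces used features."""
    rng = random.Random(seed)
    d = len(X[0])
    tau = [1.0] * d
    model = gnb_fit(X, y)
    best, best_acc = list(range(d)), 0.0
    for _ in range(n_iter):
        for _ in range(n_ants):
            feats = [j for j in range(d) if rng.random() < tau[j] / (tau[j] + 1)]
            if not feats:
                continue
            acc = sum(gnb_predict(model, x, feats) == yi
                      for x, yi in zip(X, y)) / len(y)
            for j in feats:   # evaporate and deposit on used features only
                tau[j] = (1 - rho) * tau[j] + rho * (1 + acc)
            if acc > best_acc:
                best, best_acc = feats, acc
    return best, best_acc
```

On toy data where only the first feature separates the classes, the search settles on subsets containing that feature.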


Author(s):  
R. Vishnu Priya ◽  
A.Vadivel ◽  
R. S. Thakur

Knowledge discovery from large databases is useful for decision making in real-time industrial problems. Given a large, voluminous transaction database, knowledge is discovered by extracting maximal patterns after some analysis. Various methods have been proposed for extracting maximal patterns, including FP-trees and CP-trees, but the time these methods take for mining has been found to be large. This paper modifies the tree-construction strategy of the CP-tree for mining maximal patterns so that mining takes less time. The proposed modified CP-tree is constructed in two phases. The first phase constructs the tree based on a user-given item order along with its corresponding item list. In the second phase, each node in each branch of the constructed tree is dynamically rearranged based on the sorted item list. The maximal patterns are retrieved from the proposed tree using the FPmax algorithm. The proposed tree has been built to support both interactive and incremental mining. The performance is evaluated using both dense and sparse benchmark data sets, namely CHESS, MUSHROOM, CONNECT-4, PUMSB, and RETAIL. The performance of the modified CP-tree is encouraging compared to some recently proposed approaches.
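The CP-tree construction itself cannot be reconstructed from the abstract, but the target of FPmax, the maximal patterns, is easy to pin down: frequent itemsets with no frequent proper superset. A brute-force illustration of that definition (exponential in the number of distinct items, so for intuition only, not a substitute for the tree-based miners):

```python
from itertools import combinations

def maximal_frequent(transactions, minsup):
    """Brute-force maximal frequent itemsets: enumerate all itemsets,
    keep the frequent ones, then drop any with a frequent proper superset."""
    items = sorted({i for t in transactions for i in t})
    frequent = []
    for r in range(1, len(items) + 1):
        for cand in combinations(items, r):
            s = set(cand)
            support = sum(s <= set(t) for t in transactions)
            if support >= minsup:
                frequent.append(s)
    return [f for f in frequent
            if not any(f < g for g in frequent)]  # maximality filter
```

For transactions {abc, ab, ac, bc} with minimum support 2, all three pairs are frequent but abc is not, so the maximal patterns are exactly ab, ac, and bc.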


2020 ◽  
Vol 37 (10) ◽  
pp. 3047-3060
Author(s):  
Xiang Ji ◽  
Zhenyu Zhang ◽  
Andrew Holbrook ◽  
Akihiko Nishimura ◽  
Guy Baele ◽  
...  

Abstract: Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N²) operations, where N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree, without the need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithm significantly improves inference efficiency, with a 126- to 234-fold increase in maximum-likelihood optimization speed and a 16- to 33-fold computational performance increase in a Bayesian framework.
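For a single branch under the simplest substitution model (Jukes–Cantor), the branch-length gradient of the log-likelihood has a closed form, which makes the analytic-versus-numeric comparison underlying such algorithms easy to illustrate. A two-taxon sketch, where JC69 is an assumption for illustration (the paper handles general, possibly non-reversible models): with branch length t in expected substitutions per site, P_same(t) = 1/4 + (3/4)e^(-4t/3) and P_diff(t) = 1/4 - (1/4)e^(-4t/3).

```python
import math

def jc69_loglik(t, n, k):
    """Two-taxon JC69 log-likelihood, up to a t-independent constant:
    n aligned sites, k of which differ between the two sequences."""
    e = math.exp(-4 * t / 3)
    p_same, p_diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return (n - k) * math.log(p_same) + k * math.log(p_diff)

def jc69_grad(t, n, k):
    """Analytic d loglik / d t, using d/dt e^{-4t/3} = -(4/3) e^{-4t/3},
    so dP_same/dt = -e and dP_diff/dt = e/3."""
    e = math.exp(-4 * t / 3)
    p_same, p_diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return (n - k) * (-e) / p_same + k * (e / 3) / p_diff
```

The gradient vanishes exactly at the familiar JC69 distance estimate t = -(3/4) ln(1 - 4k/(3n)), and a central finite difference reproduces the analytic value.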


2021 ◽  
Vol 2 (2) ◽  
pp. 115-123
Author(s):  
Basil Thomas

The surplus of comic magazines that existed in Keralam even before the official formation of the state in 1956 reinstates the affinity of the Malayalam-speaking people towards cartoons. The social situation and the functioning of institutions such as education and employment were a source of inspiration for cartoonists from Keralam. The high literacy rate of Keralam is not the product of a single day; its foundation was laid even before the formation of the state. This paper analyses the education system of Keralam as portrayed in the cartoons of Toms, Aravindan and Thomas, whose social cartoons depict the day-to-day life of Keralam, including the school and college life of the second half of the twentieth century. There are two phases in the development of the education sector in Keralam: the first phase focused on mass education, in which the major capital investor was the state itself and private investment was not encouraged; the second phase witnessed fast-paced growth and urbanization after the 1970s because of large-scale migrant remittances. The changing face of the education system of Keralam can be seen in the cartoons of these select cartoonists.


2013 ◽  
Vol 427-429 ◽  
pp. 2618-2621 ◽  
Author(s):  
Ling Shen ◽  
Qing Xi Peng

As emerging data-intensive applications have received more and more attention from researchers, near-duplicate text detection for large-scale data has become a severe challenge. This paper presents an algorithm based on MapReduce and ontology for near-duplicate text detection, computing pairwise document similarity in large-scale document collections. We map the words in each document to their synonyms and then calculate the similarity between documents. MapReduce is a programming model and an associated implementation for processing and generating large data sets: users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. In large-scale tests, experimental results demonstrate that this approach outperforms other state-of-the-art solutions. Advantages such as linear time and accuracy make the algorithm valuable in actual practice.
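The map/reduce decomposition described above fits near-duplicate detection naturally: map each document to its word shingles, group documents by shared shingle, and reduce the shared-shingle counts into Jaccard similarities. A single-process sketch of that pipeline (the shingle length, threshold, and omitted synonym-mapping step are all assumptions, not the paper's exact method):

```python
from collections import defaultdict
from itertools import combinations

def shingles(text, k=3):
    """The set of k-word shingles of a document."""
    words = text.lower().split()
    return {' '.join(words[i:i + k]) for i in range(len(words) - k + 1)}

def near_duplicates(docs, threshold=0.5, k=3):
    """Inverted-index Jaccard, in the spirit of a map phase (document ->
    shingle postings) followed by a reduce phase (shingle -> co-occurring
    document pairs, merged into intersection counts)."""
    sets = {d: shingles(t, k) for d, t in docs.items()}
    postings = defaultdict(set)          # "map" output, grouped by shingle key
    for d, sh in sets.items():
        for s in sh:
            postings[s].add(d)
    inter = defaultdict(int)             # "reduce": count shared shingles
    for ds in postings.values():
        for a, b in combinations(sorted(ds), 2):
            inter[(a, b)] += 1
    pairs = []
    for (a, b), i in inter.items():
        j = i / (len(sets[a]) + len(sets[b]) - i)   # Jaccard similarity
        if j >= threshold:
            pairs.append((a, b, j))
    return pairs
```

Two sentences differing in one word share most of their shingles and are flagged, while an unrelated document shares none and never even generates a candidate pair.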

