Text Mining Methods for Hierarchical Document Indexing

Author(s):  
Han-Joon Kim

We have recently seen a tremendous growth in the volume of online text documents from networked resources such as the Internet, digital libraries, and company-wide intranets. One of the most common and successful methods of organizing such huge amounts of documents is to hierarchically categorize documents according to topic (Agrawal, Bayardo & Srikant, 2000; Kim & Lee, 2003). The documents indexed according to a hierarchical structure (termed ‘topic hierarchy’ or ‘taxonomy’) are kept in internal categories as well as in leaf categories, in the sense that documents at a lower category have increasing specificity. Through the use of a topic hierarchy, users can quickly navigate to any portion of a document collection without being overwhelmed by a large document space. As is evident from the popularity of web directories such as Yahoo (http://www.yahoo.com/) and Open Directory Project (http://www.dmoz.org/), topic hierarchies have increased in importance as a tool for organizing or browsing a large volume of electronic text documents. Currently, the topic hierarchies maintained by most information systems are manually constructed and maintained by human editors. The topic hierarchy should be continuously subdivided to cope with the high rate of increase in the number of electronic documents. For example, the topic hierarchy of the Open Directory Project has now reached about 590,000 categories. However, manually maintaining the hierarchical structure incurs several problems. First, such a manual task is prohibitively costly as well as time-consuming. Until now, large search portals such as Yahoo have invested significant time and money into maintaining their taxonomy, but obviously they will not be able to keep up with the pace of growth and change in electronic documents through such manual activity.
Moreover, for a dynamic networked resource (e.g., World Wide Web) that contains highly heterogeneous documents accompanied by frequent content changes, maintaining a ‘good’ hierarchy is fraught with difficulty, and oftentimes is beyond the human experts’ capabilities. Lastly, since human editors’ categorization decisions are not only highly subjective but also vary over time, it is difficult to maintain a reliable and consistent hierarchical structure. The above limitations call for information systems that can provide intelligent organization capabilities with topic hierarchies. Related commercial systems include Verity Knowledge Organizer (http://www.verity.com/), Inktomi Directory Engine (http://www.inktomi.com/), and Inxight Categorizer (http://www.inxight.com/), which enable a browsable web directory to be automatically built. However, these systems did not address the (semi-)automatic evolving capabilities of organizational schemes and classification models at all. This is one of the reasons why the commercial taxonomy-based services do not tend to be as popular as their manually constructed counterparts, such as Yahoo.



10.2196/24291 ◽  
2020 ◽  
Vol 6 (4) ◽  
pp. e24291
Author(s):  
Lei Qin ◽  
Yidan Wang ◽  
Qiang Sun ◽  
Xiaomei Zhang ◽  
Ben-Chang Shia ◽  
...  

Background: Since the outbreak of COVID-19 in December 2019 in Wuhan, Hubei Province, China, frequent interregional contacts and the high rate of infection spread have catalyzed the formation of an epidemic network.
Objective: The aim of this study was to identify influential nodes and highlight the hidden structural properties of the COVID-19 epidemic network, which we believe is central to prevention and control of the epidemic.
Methods: We first constructed a network of the COVID-19 epidemic among 31 provinces in mainland China; after some basic characteristics were revealed by the degree distribution, the k-core decomposition method was employed to provide static and dynamic evidence to determine the influential nodes and hierarchical structure. We then exhibited the influence power of the above nodes and the evolution of this power.
Results: Only a small fraction of the provinces studied showed relatively strong outward or inward epidemic transmission effects. The three provinces of Hubei, Beijing, and Guangzhou showed the highest out-degrees, and the three highest in-degrees were observed for the provinces of Beijing, Henan, and Liaoning. In terms of the hierarchical structure of the COVID-19 epidemic network over the whole period, more than half of the 31 provinces were located in the innermost core. Considering the correlation of the characteristics and coreness of each province, we identified some significant negative and positive factors. Specific to the dynamic transmission process of the COVID-19 epidemic, three provinces of Anhui, Beijing, and Guangdong always showed the highest coreness from the third to the sixth week; meanwhile, Hubei Province maintained the highest coreness until the fifth week and then suddenly dropped to the lowest in the sixth week. We also found that the out-strengths of the innermost nodes were greater than their in-strengths before January 27, 2020, at which point a reversal occurred.
Conclusions: Increasing our understanding of how epidemic networks form and function may help reduce the damaging effects of COVID-19 in China as well as in other countries and territories worldwide.
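
For readers unfamiliar with the k-core decomposition mentioned in the abstract above, the following is a minimal sketch of the standard peeling procedure. It assumes an undirected, unweighted graph given as a list of edge pairs, whereas the study works with a directed network of provinces with in- and out-strengths; the function name `core_numbers` and the toy node labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: coreness} for an undirected simple graph.

    Illustrative peeling algorithm: repeatedly remove a node of minimum
    remaining degree; its coreness is the largest minimum degree seen so far.
    """
    # Build an adjacency map, ignoring self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick a node with the smallest degree among nodes not yet removed.
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        # Removing the node lowers the degree of its surviving neighbors.
        for neighbor in adj[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


# Toy graph (hypothetical): a triangle A-B-C with a pendant node D attached to A.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
toy_core = core_numbers(toy_edges)
```

In this toy graph the triangle nodes form the innermost core, while the pendant node sits in the outer shell. The linear scan for the minimum-degree node keeps the sketch short; a bucket-based queue would be the usual choice for large graphs.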


Author(s):  
Janusz Adam Frykowski

Abstract: The following paper traces the history of the Saint Simeon Stylites Uniate Parish in Rachanie from its first appearance in historical sources until 1811, when it ceased to be an independent church unit. The introduction describes the geographical location of the parish, its size, and its position within the hierarchical structure of the Church. Based on an analysis of the post-visit inspection protocols left by the Chelm bishops, the appearance, fittings, and ancillary equipment of the church in Rachanie in that period are reported. Moreover, the list of 4 local clergymen is recreated and their benefice is determined. As far as possible, both the number of worshipers and the number of Holy Communion receivers are determined.


1993 ◽  
Vol 18 (2-4) ◽  
pp. 129-149
Author(s):  
Serge Garlatti

Representation systems based on inheritance networks are founded on the hierarchical structure of knowledge. Such a representation is composed of a set of objects and a set of is-a links between them. Objects are generally defined by means of a set of properties. An inheritance mechanism enables us to share properties across the hierarchy, called an inheritance graph. It is often difficult, or even impossible, to define classes by means of a set of necessary and sufficient conditions. For this reason, exceptions must be allowed, and they induce nonmonotonic reasoning. Many researchers have used default logic to give them formal semantics and to define sound inferences. In this paper, we propose a survey of the different models of nonmonotonic inheritance systems by means of default logic. A comparison between default theories and inheritance mechanisms is made. In conclusion, the ability of default logic to take some inheritance mechanisms into account is discussed.
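
To make the idea of inheritance with exceptions concrete, here is a minimal sketch of an is-a hierarchy in which a more specific node may override a property inherited from its ancestors. It assumes single inheritance and a simple "most specific value wins" lookup; it is not an implementation of default logic or of any model surveyed in the paper, and the class name `Node` and the bird/penguin example are illustrative assumptions.

```python
class Node:
    """A node in a single-inheritance is-a hierarchy (illustrative only)."""

    def __init__(self, name, parent=None, **props):
        self.name = name
        self.parent = parent      # the is-a link to the more general node
        self.props = props        # properties asserted locally at this node

    def lookup(self, prop):
        """Return the value of prop from the most specific node that defines it."""
        node = self
        while node is not None:
            if prop in node.props:
                return node.props[prop]
            node = node.parent
        raise KeyError(prop)


# Hypothetical example: birds fly by default, penguins are an exception.
bird = Node("bird", flies=True, has_feathers=True)
penguin = Node("penguin", parent=bird, flies=False)
tweety = Node("tweety", parent=bird)
opus = Node("opus", parent=penguin)
```

Here `tweety.lookup("flies")` inherits the default from `bird`, while `opus.lookup("flies")` is blocked by the exception stated at `penguin`. Adding the exception retracts a conclusion that would otherwise hold, which is the nonmonotonic behavior the abstract refers to; multiple inheritance, where conflicting defaults can arise, is deliberately left out of this sketch.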

