Lexicons on Demand: Neural Word Embeddings for Large-Scale Text Analysis

Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by learning a neural embedding across billions of words on the web. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated, such as neglect, government, and social media. We show that Empath's data-driven, human-validated categories are highly correlated (r=0.906) with similar categories in LIWC.
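The seed-expansion step described in this abstract can be illustrated with a small nearest-neighbor search. This is a minimal sketch, not Empath's implementation: the three-dimensional vectors are made-up stand-ins for a learned embedding, the function names are hypothetical, and the crowd-powered validation step is not modeled.

```python
import math

# Toy vectors standing in for a learned neural embedding (values are invented).
TOY_EMBEDDING = {
    "bleed":  [0.90, 0.10, 0.00],
    "punch":  [0.80, 0.20, 0.10],
    "stab":   [0.85, 0.15, 0.05],
    "wound":  [0.90, 0.20, 0.00],
    "senate": [0.00, 0.90, 0.10],
    "tweet":  [0.10, 0.10, 0.90],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def expand_category(seeds, embedding, top_k=2):
    """Return the top_k non-seed words closest to the centroid of the seeds."""
    dims = len(embedding[seeds[0]])
    centroid = [sum(embedding[s][i] for s in seeds) / len(seeds) for i in range(dims)]
    scored = [
        (cosine(centroid, vec), word)
        for word, vec in embedding.items()
        if word not in seeds
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:top_k]]

# Seed terms taken from the abstract's "violence" example.
violence_terms = expand_category(["bleed", "punch"], TOY_EMBEDDING)
print(violence_terms)
```

In a real pipeline the candidate terms returned here would still need human filtering, which is the role the abstract assigns to the crowd-powered validation.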

Comparative study of deep learning models for sentiment analysis

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.24459 ◽

2018 ◽

Vol 7 (2.14) ◽

pp. 5726

Author(s):

Oumaima Hourrane ◽

El Habib Benlahmar ◽

Ahmed Zellou

Keyword(s):

Deep Learning ◽

Comparative Study ◽

Sentiment Analysis ◽

Language Processing ◽

Word Embeddings ◽

Human Language ◽

Learning Models ◽

Automatic Learning ◽

Learning Capability ◽

The Web

Sentiment analysis is one of the newer and more absorbing areas of natural language processing, having emerged alongside community sites on the web. Taking advantage of the amount of information now available, research and industry have been seeking ways to automatically analyze the sentiments expressed in texts. The challenges for this task are the ambiguity of human language and the lack of labeled data. To address these issues, sentiment analysis has been combined with deep learning, whose models are effective because of their automatic learning capability. In this paper, we provide a comparative study on the IMDB movie review dataset: we compare word embeddings and further deep learning models on sentiment analysis, and give broad empirical results for those keen on taking advantage of deep learning for sentiment analysis in real-world settings.

A Large-scale Text Analysis with Word Embeddings and Topic Modeling

Journal of Cognitive Science ◽

10.17791/jcs.2019.20.1.147 ◽

2019 ◽

Vol 20 (1) ◽

pp. 147-188

Author(s):

Won-Joon Choi ◽

Euhee Kim

Keyword(s):

Text Analysis ◽

Topic Modeling ◽

Large Scale ◽

Word Embeddings

Mapping global water projects: improving access to donor investment information on the web

Water Policy ◽

10.2166/wp.2003.0012 ◽

2003 ◽

Vol 5 (3) ◽

pp. 203-212

Author(s):

J. Lisa Jorgenson

Keyword(s):

Water Resources ◽

Web Sites ◽

Large Scale ◽

Technical Assistance ◽

Water Withdrawal ◽

Small Scale ◽

Water Projects ◽

Essential Information ◽

Water Project ◽

The Web

This paper discusses how web sites now report international water project information, and maps the combined donor investment in more than 6000 water projects, active since 1995. The maps show donor investment:
• has addressed water scarcity,
• has improved access to improvised water resources,
• correlates with growth in GDP,
• appears to show a correlation with growth in net private capital flow,
• does NOT appear to correlate with growth in GNI.

Evaluation indicates problems in the combined water project portfolios for major donor organizations:
• difficulties in grouping projects across differing sector classifications; food security or agriculture/irrigation is the most difficult.
• inability to map donor projects at the country or river basin level because 60% of the donor projects include no location data (town, province, watershed) in the title or abstracts available on the web sites.
• no means to identify donor projects with utilization of water resources from training or technical assistance.
• no information on the source of water (river, aquifer, rainwater catchment).
• an identifiable quantity of water (withdrawal amounts, or increased water efficiency) is not provided.
• differentiation between large-scale versus small-scale projects.

Recommendation: Major donors need to look at how the web harvests and combines their information, and look at ways to agree on a standard template for project titles to include more essential information. The Japanese (JICA) and the Asian Development Bank provide good models.

Data-driven State Space Reconstruction of Mobility on Demand Systems for Sizing-Rebalancing Analysis

Proceedings of the 2018 Symposium on Simulation for Architecture and Urban Design (SimAUD 2018) ◽

10.22360/simaud.2018.simaud.029 ◽

2018 ◽

Keyword(s):

State Space ◽

Data Driven ◽

Demand Systems ◽

State Space Reconstruction ◽

On Demand

Accelerating In-Transit Co-Processing for Scientific Simulations Using Region-Based Data-Driven Analysis

Algorithms ◽

10.3390/a14050154 ◽

2021 ◽

Vol 14 (5) ◽

pp. 154

Author(s):

Marcus Walldén ◽

Masao Okita ◽

Fumihiko Ino ◽

Dimitris Drikakis ◽

Ioannis Kokkinakis

Keyword(s):

Large Scale ◽

Data Driven ◽

Data Sets ◽

Output Constraints ◽

Data Driven Approach ◽

Scientific Simulations ◽

Multiple Metrics ◽

In Transit ◽

Multiple Compression ◽

Large Scale Simulations

Increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing data sets of simulations on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method can expeditiously identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.
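The idea of choosing a compression method per region from an importance score can be sketched as follows. Everything specific here is an assumption made for illustration: the importance metric (value range), the threshold, the choice of exact 64-bit versus lossy 16-bit encoding, and all function names are hypothetical and are not taken from the paper. Load balancing is not modeled.

```python
import struct
import zlib

def importance(region):
    """Assumed importance metric: the value range of a non-empty region."""
    return max(region) - min(region)

def compress_region(region, threshold=0.5):
    """Compress one region, picking the method from its importance score.

    Important regions keep full 64-bit precision (lossless); the rest are
    stored as 16-bit floats (lossy) before the byte stream is deflated.
    """
    if importance(region) >= threshold:
        raw = struct.pack(f"{len(region)}d", *region)  # exact doubles
        return "lossless", zlib.compress(raw)
    raw = struct.pack(f"{len(region)}e", *region)      # half precision
    return "lossy", zlib.compress(raw)

def decompress_region(method, payload):
    """Invert compress_region for either method."""
    raw = zlib.decompress(payload)
    fmt = "d" if method == "lossless" else "e"
    count = len(raw) // struct.calcsize(fmt)
    return list(struct.unpack(f"{count}{fmt}", raw))

# Hypothetical regions: one with strong variation, one nearly uniform.
regions = {
    "shock_front": [0.0, 0.4, 1.3, 2.1],
    "far_field": [1.0, 1.001, 1.0, 1.002],
}
compressed = {name: compress_region(values) for name, values in regions.items()}
restored = {name: decompress_region(*item) for name, item in compressed.items()}
```

With these inputs the high-variation region round-trips exactly, while the near-uniform region is restored only approximately, which is the trade-off a per-region scheme is meant to expose.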

Automated Data-Driven Generation of Personalized Pedagogical Interventions in Intelligent Tutoring Systems

International Journal of Artificial Intelligence in Education ◽

10.1007/s40593-021-00267-x ◽

2021 ◽

Author(s):

Ekaterina Kochmar ◽

Dung Do Vu ◽

Robert Belfer ◽

Varun Gupta ◽

Iulian Vlad Serban ◽

...

Keyword(s):

Machine Learning ◽

Student Performance ◽

Language Processing ◽

Intelligent Tutoring Systems ◽

Large Scale ◽

Intelligent Tutoring ◽

Performance Outcomes ◽

Data Driven ◽

Personalized Feedback ◽

Tutoring Systems

Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning as compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, in this paper we propose a machine learning approach to generate personalized feedback in an automated way, which takes individual needs of students into account, while alleviating the need of expert intervention and design of hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in , a large-scale dialogue-based ITS with around 20,000 students launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.

Visualization products on-demand through the Web

Proceedings of the third symposium on Virtual reality modeling language - VRML '98 ◽

10.1145/271897.271904 ◽

1998 ◽

Cited By ~ 6

Author(s):

Suzana Djurcilov ◽

Alex Pang

Keyword(s):

On Demand ◽

The Web

Data-Driven Energy Use Estimation in Large Scale Transportation Networks

Proceedings of the 2nd ACM/EIGSCC Symposium on Smart Cities and Communities - SCC '19 ◽

10.1145/3357492.3358632 ◽

2019 ◽

Author(s):

Bin Wang ◽

Cy Chan ◽

Divya Somasi ◽

Jane Macfarlane ◽

Eric Rask

Keyword(s):

Large Scale ◽

Energy Use ◽

Transportation Networks ◽

Data Driven

Improving the management of type 2 diabetes through large-scale general practice: the role of a data-driven and technology-enabled education programme

BMJ Open Quality ◽

10.1136/bmjoq-2020-001087 ◽

2021 ◽

Vol 10 (1) ◽

pp. e001087

Author(s):

Tarek F Radwan ◽

Yvette Agyako ◽

Alireza Ettefaghian ◽

Tahira Kamran ◽

Omar Din ◽

...

Keyword(s):

Type 2 Diabetes ◽

Primary Care ◽

Large Scale ◽

Education Programme ◽

Educational Programme ◽

Data Driven ◽

Treatment Targets ◽

Care Processes ◽

Data Driven Approach

A quality improvement (QI) scheme was launched in 2017, covering a large group of 25 general practices working with a deprived registered population. The aim was to improve the measurable quality of care in a population where type 2 diabetes (T2D) care had previously proved challenging. A complex set of QI interventions were co-designed by a team of primary care clinicians and educationalists and managers. These interventions included organisation-wide goal setting, using a data-driven approach, ensuring staff engagement, implementing an educational programme for pharmacists, facilitating web-based QI learning at-scale and using methods which ensured sustainability. This programme was used to optimise the management of T2D through improving the eight care processes and three treatment targets which form part of the annual national diabetes audit for patients with T2D. With the implemented improvement interventions, there was significant improvement in all care processes and all treatment targets for patients with diabetes. Achievement of all the eight care processes improved by 46.0% (p<0.001) while achievement of all three treatment targets improved by 13.5% (p<0.001). The QI programme provides an example of a data-driven large-scale multicomponent intervention delivered in primary care in ethnically diverse and socially deprived areas.

EZ-ALBI Score for Predicting Hepatocellular Carcinoma Prognosis

Liver Cancer ◽

10.1159/000508971 ◽

2020 ◽

Vol 9 (6) ◽

pp. 734-743

Author(s):

Kazuya Kariyama ◽

Kazuhiro Nouso ◽

Atsushi Hiraoka ◽

Akiko Wakuta ◽

Ayano Oonishi ◽

...

Keyword(s):

Hepatocellular Carcinoma ◽

Liver Function ◽

Large Scale ◽

Regression Coefficient ◽

Information Criterion ◽

Proportional Hazard ◽

Cox Proportional Hazard ◽

Highly Correlated ◽

Survival Risk ◽

Good Agreement

Introduction: The ALBI score is acknowledged as the gold standard for the assessment of liver function in patients with hepatocellular carcinoma (HCC). Unlike the Child-Pugh score, the ALBI score uses only objective parameters, albumin (Alb) and total bilirubin (T.Bil), enabling a better evaluation. However, the complex calculation of the ALBI score limits its applicability. Therefore, we developed a simplified ALBI score, based on data from a large-scale HCC database. We used the data of 5,249 naïve HCC cases registered in eight collaborating hospitals. Methods: We developed a new score, the EZ (Easy)-ALBI score, based on regression coefficients of Alb and T.Bil for survival risk in a multivariate Cox proportional hazard model. We also developed the EZ-ALBI grade and EZ-ALBI-T grade as alternative options for the ALBI grade and ALBI-T grade and evaluated their stratifying ability. Results: The equation used to calculate the EZ-ALBI score was simple {[T.Bil (mg/dL)] – [9 × Alb (g/dL)]}; this value highly correlated with the ALBI score (correlation coefficient, 0.981; p < 0.0001). The correlation was preserved across different Barcelona Clinic Liver Cancer grade scores (regression coefficient, 0.93–0.98) and across different hospitals (regression coefficient, 0.98–0.99), indicating good generalizability. Although a good agreement was observed between ALBI and EZ-ALBI, discrepancies were observed in patients with poor liver function (T.Bil, ≥3 mg/dL; regression coefficient, 0.877). The stratifying ability of the EZ-ALBI grade and EZ-ALBI-T grade was good, and their Akaike’s information criterion values (35,897 and 34,812, respectively) were comparable with those of the ALBI grade and ALBI-T grade (35,914 and 34,816, respectively). Conclusions: The EZ-ALBI score, EZ-ALBI grade, and EZ-ALBI-T grade are useful, simple scores, which might replace the conventional ALBI score in the future.
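The EZ-ALBI equation quoted in the Results can be written as a one-line function. This is a minimal sketch for illustration only: the function name and the example laboratory values are hypothetical, and no grade cutoffs are included because the abstract does not state them.

```python
def ez_albi_score(total_bilirubin_mg_dl: float, albumin_g_dl: float) -> float:
    """EZ-ALBI score as given in the abstract: T.Bil (mg/dL) - 9 x Alb (g/dL)."""
    if total_bilirubin_mg_dl < 0 or albumin_g_dl < 0:
        raise ValueError("laboratory values must be non-negative")
    return total_bilirubin_mg_dl - 9.0 * albumin_g_dl

# Hypothetical example values, not taken from the study data.
example_score = ez_albi_score(total_bilirubin_mg_dl=1.0, albumin_g_dl=4.0)
print(example_score)
```

Under this formula, lower albumin or higher bilirubin raises the score (makes it less negative).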
