Visual Summary Identification From Scientific Publications via Self-Supervised Learning

Author(s):  
Shintaro Yamamoto ◽  
Anne Lauscher ◽  
Simone Paolo Ponzetto ◽  
Goran Glavaš ◽  
Shigeo Morishima

The exponential growth of scientific literature creates the need to support users in both effectively and efficiently analyzing and understanding a body of research work. This exploratory process can be facilitated by providing graphical abstracts, i.e., visual summaries of scientific publications. Accordingly, previous work recently presented an initial study on the automatic identification of a central figure in a scientific publication, to be used as the publication's visual summary. This study, however, was limited to a single (biomedical) domain. This is primarily because the current state of the art relies on supervised machine learning, which typically requires large amounts of labeled data: until now, the only existing annotated data set covered biomedical publications exclusively. In this work, we build a novel benchmark data set for visual summary identification from scientific publications, consisting of papers presented at conferences from several areas of computer science. We couple this contribution with a new self-supervised learning approach that learns a heuristic matching of in-text figure references with figure captions. Our self-supervised pre-training, executed on a large unlabeled collection of publications, attenuates the need for large annotated data sets for visual summary identification and facilitates domain transfer for this task. We evaluate our self-supervised pre-training for visual summary identification on both the existing biomedical data set and our newly presented computer science data set. The experimental results suggest that the proposed method is able to outperform the previous state of the art without any task-specific annotations.
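
The heuristic matching mentioned in the abstract, pairing sentences that cite a figure with that figure's caption, could be sketched roughly as below. This is an illustrative reconstruction only; the helper names, the regular expression, and the negative-sampling strategy are assumptions and not the authors' implementation.

```python
import re

def extract_reference_sentences(body_text):
    """Collect sentences that explicitly reference a figure, keyed by figure number."""
    pairs = {}
    for sentence in re.split(r"(?<=[.!?])\s+", body_text):
        match = re.search(r"\b(?:Figure|Fig\.)\s*(\d+)", sentence, flags=re.IGNORECASE)
        if match:
            pairs.setdefault(int(match.group(1)), []).append(sentence)
    return pairs

def build_pretraining_pairs(body_text, captions):
    """Create (reference sentence, caption, label) triples for self-supervised pre-training.

    `captions` is assumed to be a dict {figure_number: caption_text}. Sentences that cite
    a figure form positive pairs with that figure's caption; pairing them with a different
    figure's caption yields negatives.
    """
    references = extract_reference_sentences(body_text)
    examples = []
    for fig_no, sentences in references.items():
        for sent in sentences:
            for cand_no, caption in captions.items():
                examples.append((sent, caption, int(cand_no == fig_no)))
    return examples
```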

2021 ◽  
Vol 4 (3) ◽  
pp. 1-56
Author(s):  
Agathe Balayn ◽  
Jie Yang ◽  
Zoltan Szlavik ◽  
Alessandro Bozzon

The automatic detection of conflictual languages (harmful, aggressive, abusive, and offensive languages) is essential to provide a healthy conversation environment on the Web. To design and develop detection systems that are capable of achieving satisfactory performance, a thorough understanding of the nature and properties of the targeted type of conflictual language is of great importance. The scientific communities investigating human psychology and social behavior have studied these languages in detail, but their insights have only partially reached the computer science community. In this survey, we aim both at systematically characterizing the conceptual properties of online conflictual languages and at investigating the extent to which they are reflected in state-of-the-art automatic detection systems. Through an analysis of the psychology literature, we provide a reconciled taxonomy that denotes the ensemble of conflictual languages typically studied in computer science. We then characterize the conceptual mismatches that can be observed in the main semantic and contextual properties of these languages and their treatment in computer science works, and systematically uncover the resulting technical biases in the design of machine learning classification models and the datasets created for their training. Finally, we discuss diverse research opportunities for the computer science community and reflect on broader technical and structural issues.


2019 ◽  
Author(s):  
Joël Legrand ◽  
Romain Gogdemir ◽  
Cédric Bousquet ◽  
Kevin Dalleau ◽  
Marie-Dominique Devignes ◽  
...  

Pharmacogenomics (PGx) studies how individual gene variations impact drug response phenotypes, which makes knowledge related to PGx a key component towards precision medicine. A significant part of the state-of-the-art knowledge in PGx is accumulated in scientific publications, where it is hardly reusable by humans or software. Natural language processing techniques have been developed and are employed to guide experts who curate this body of knowledge. However, existing works are limited by the absence of high-quality annotated corpora focusing on the domain. In particular, this absence restricts the use of supervised machine learning approaches. This article introduces PGxCorpus, a manually annotated corpus designed for the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs, and phenotypes) and relationships between those. In this article, we present the method used to annotate the texts consistently, and a baseline experiment that illustrates how this resource may be leveraged to synthesize and summarize PGx knowledge.


Author(s):  
Rohit Rastogi ◽  
Devendra Kumar Chaturvedi ◽  
Mayank Gupta

Many machine learning-based apps and analyzers have already been designed to help manage and alleviate stress, which is increasing rapidly. The project is based on experimental research work that the authors performed at the Research Labs and Scientific Spirituality Centers of Dev Sanskriti VishwaVidyalaya, Haridwar, and Patanjali Research Foundations, Uttarakhand. In their research work, the correctness and accuracy of two biofeedback devices, electromyography (EMG) and galvanic skin response (GSR), which can operate in three modes (audio, visual, and audio-visual), were studied and compared with the help of a data set of tension-type headache (TTH) patients. Through this research, the authors realized that people today carry a great deal of stress in their lives, so they planned to apply their technical knowledge of computer science to help reduce people's stress levels. In their project, the authors built a website that contains a closed set of questionnaires, with a weight associated with each question.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Joël Legrand ◽  
Romain Gogdemir ◽  
Cédric Bousquet ◽  
Kevin Dalleau ◽  
Marie-Dominique Devignes ◽  
...  

Pharmacogenomics (PGx) studies how individual gene variations impact drug response phenotypes, which makes PGx-related knowledge a key component towards precision medicine. A significant part of the state-of-the-art knowledge in PGx is accumulated in scientific publications, where it is hardly reusable by humans or software. Natural language processing techniques have been developed to guide experts who curate this amount of knowledge. But existing works are limited by the absence of a high-quality annotated corpus focusing on the PGx domain. In particular, this absence restricts the use of supervised machine learning. This article introduces PGxCorpus, a manually annotated corpus, designed to fill this gap and to enable the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs and phenotypes), and relationships between those. In this article, we present the corpus itself, its construction, and a baseline experiment that illustrates how it may be leveraged to synthesize and summarize PGx knowledge.


2020 ◽  
Vol 13 (3) ◽  
pp. 381-393
Author(s):  
Farhana Fayaz ◽  
Gobind Lal Pahuja

Background: The Static VAR Compensator (SVC) has the capability of improving the reliability, operation, and control of the transmission system, thereby improving the dynamic performance of the power system. The SVC is a widely used shunt FACTS device and an important tool for reactive power compensation in high-voltage AC transmission systems. Transmission lines compensated with an SVC may experience faults and hence need a protection system against the damage caused by these faults, while providing an uninterrupted supply of power.

Methods: The research work reported in the paper is a successful attempt to reduce the time to detect faults on an SVC-compensated transmission line to less than a quarter of a cycle. The relay algorithm involves two ANNs, one for detection and the other for classification of faults, including the identification of the faulted phase or phases. RMS (Root Mean Square) values of line voltages and ratios of sequence components of line currents are used as inputs to the ANNs. Extensive training and testing of the two ANNs have been carried out using data generated by simulating an SVC-compensated transmission line in PSCAD at a signal sampling frequency of 1 kHz. The back-propagation method has been used for training and testing. Criticality analysis of the existing relay and the modified relay has also been carried out using three fault tree importance measures, i.e., Fussell-Vesely (FV) Importance, Risk Achievement Worth (RAW), and Risk Reduction Worth (RRW).

Results: It is found that the relay detects any type of fault occurring anywhere on the line with 100% accuracy within a short time of 4 ms. It also classifies the type of fault and indicates the faulted phase or phases, as the case may be, with 100% accuracy within 15 ms, which is well before a circuit breaker can clear the fault. As demonstrated, fault detection and classification by the use of ANNs is reliable and accurate when a large data set is available for training. The results from the criticality analysis show that the criticality ranking varies between the two designs (the existing relay and the modified relay), and the ranking of the improved measurement system in the modified relay changes from 2 to 4.

Conclusion: A relaying algorithm is proposed for the protection of a transmission line compensated with a Static VAR Compensator (SVC), and a criticality ranking of different failure modes of a digital relay is carried out. The proposed scheme has significant advantages over more traditional relaying algorithms. It is suitable for high-resistance faults and is affected neither by the fault inception angle nor by the fault location.
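
As a rough illustration of the relay inputs described in the Methods section, the sketch below computes RMS line voltages and ratios of current sequence components from one signal window. The helper names, the particular ratios, and the use of phasor inputs are assumptions made for illustration, not the exact feature set fed to the two ANNs.

```python
import numpy as np

A = np.exp(2j * np.pi / 3)  # 120-degree rotation operator for symmetrical components

def rms(samples):
    """Root-mean-square of one window of voltage samples (e.g., sampled at 1 kHz)."""
    samples = np.asarray(samples, dtype=float)
    return np.sqrt(np.mean(samples ** 2))

def sequence_components(ia, ib, ic):
    """Zero-, positive- and negative-sequence phasors from three phase-current phasors."""
    i0 = (ia + ib + ic) / 3
    i1 = (ia + A * ib + A ** 2 * ic) / 3
    i2 = (ia + A ** 2 * ib + A * ic) / 3
    return i0, i1, i2

def relay_features(va, vb, vc, ia, ib, ic):
    """Assemble an input vector of the kind the abstract describes:
    RMS line voltages plus ratios of current sequence components."""
    i0, i1, i2 = sequence_components(ia, ib, ic)
    eps = 1e-9  # avoid division by zero for unfaulted, balanced conditions
    return np.array([
        rms(va), rms(vb), rms(vc),
        abs(i2) / (abs(i1) + eps),   # negative- over positive-sequence current
        abs(i0) / (abs(i1) + eps),   # zero- over positive-sequence current
    ])
```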


2020 ◽  
Vol 11 (1) ◽  
pp. 237
Author(s):  
Abdallah Namoun ◽  
Abdullah Alshanqiti

The prediction of student academic performance has drawn considerable attention in education. However, although learning outcomes are believed to improve learning and teaching, prognosticating the attainment of student outcomes remains underexplored. A decade of research work conducted between 2010 and November 2020 was surveyed to present a fundamental understanding of the intelligent techniques used for the prediction of student performance, where academic success is strictly measured using student learning outcomes. The electronic bibliographic databases searched include ACM, IEEE Xplore, Google Scholar, Science Direct, Scopus, Springer, and Web of Science. Eventually, we synthesized and analyzed a total of 62 relevant papers with a focus on three perspectives: (1) the forms in which the learning outcomes are predicted, (2) the predictive analytics models developed to forecast student learning, and (3) the dominant factors impacting student outcomes. The best practices for conducting systematic literature reviews, e.g., PICO and PRISMA, were applied to synthesize and report the main results. The attainment of learning outcomes was measured mainly as performance class standings (i.e., ranks) and achievement scores (i.e., grades). Regression and supervised machine learning models were frequently employed to classify student performance. Finally, student online learning activities, term assessment grades, and student academic emotions were the most evident predictors of learning outcomes. We conclude the survey by highlighting some major research challenges and offering a summary of significant recommendations to motivate future works in this field.


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1807
Author(s):  
Sascha Grollmisch ◽  
Estefanía Cano

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. The commonality between recent SSL methods is that they strongly rely on the augmentation of unannotated data, which remains largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks covering music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset, acoustic scene classification, showing that there is still room for improvement.
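
For readers unfamiliar with FixMatch, a minimal sketch of its unlabeled objective is given below, assuming a PyTorch classifier and pre-computed weak/strong augmentations of the same audio clips (e.g., spectrogram views). The confidence threshold and function names are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, weak_batch, strong_batch, threshold=0.95):
    """Core FixMatch objective on an unlabeled batch (rough sketch).

    Pseudo-labels come from predictions on weakly augmented clips; only confident
    ones supervise the predictions on strongly augmented versions of the same clips.
    """
    with torch.no_grad():
        probs = torch.softmax(model(weak_batch), dim=-1)
        confidence, pseudo_labels = probs.max(dim=-1)
        mask = (confidence >= threshold).float()  # keep only confident pseudo-labels
    logits_strong = model(strong_batch)
    per_example = F.cross_entropy(logits_strong, pseudo_labels, reduction="none")
    return (per_example * mask).mean()
```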


Author(s):  
V.T Priyanga ◽  
J.P Sanjanasri ◽  
Vijay Krishna Menon ◽  
E.A Gopalakrishnan ◽  
K.P Soman

The widespread use of social media like Facebook, Twitter, WhatsApp, etc. has changed the way news is created and published; accessing news has become easy and inexpensive. However, the scale of usage and the inability to moderate content have made social media a breeding ground for the circulation of fake news. Fake news is deliberately created either to increase readership or to disrupt order in society for political and commercial benefits. It is of paramount importance to identify and filter out fake news, especially in democratic societies. Most existing methods for detecting fake news involve traditional supervised machine learning, which has been quite ineffective. In this paper, we analyze word embedding features that can tell fake news apart from true news. We use the LIAR and ISOT data sets. We churn out highly correlated news data from the entire data set by using cosine similarity and other such metrics, in order to distinguish their domains based on central topics. We then employ auto-encoders to detect and differentiate between true and fake news while also exploring their separability through network analysis.
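
The filtering step described above, selecting highly correlated news before training, could look roughly like the sketch below. The abstract mentions word-embedding features; TF-IDF vectors are used here only to keep the example self-contained, and the threshold value and function name are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_correlated_articles(texts, threshold=0.4):
    """Keep articles whose cosine similarity to at least one other article exceeds
    `threshold`, approximating the 'highly correlated news' filtering step."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(texts)
    sims = cosine_similarity(vectors)
    np.fill_diagonal(sims, 0.0)          # ignore self-similarity
    keep = sims.max(axis=1) >= threshold
    return [text for text, flag in zip(texts, keep) if flag]
```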


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2532
Author(s):  
Encarna Quesada ◽  
Juan J. Cuadrado-Gallego ◽  
Miguel Ángel Patricio ◽  
Luis Usero

Anomaly detection research is focused on the development and application of methods that allow for the identification of data that are different enough, compared with the rest of the data set being analyzed, to be considered anomalies (or, as they are more commonly called, outliers). These values mainly originate from two sources: they may be errors introduced during the collection or handling of the data, or they can be correct but very different from the rest of the values. It is essential to correctly identify each type as, in the first case, they must be removed from the data set but, in the second case, they must be carefully analyzed and taken into account. The correct selection and use of the model to be applied to a specific problem is fundamental for the success of the anomaly detection study and, in many cases, the use of only one model cannot provide sufficient results, which can only be reached by using a mixture model resulting from the integration of existing and/or ad hoc-developed models. This is the kind of model that is developed and applied to solve the problem presented in this paper. This study deals with the definition and application of an anomaly detection model that combines statistical models and a new method defined by the authors, the Local Transilience Outlier Identification Method, in order to improve the identification of outliers in the sensor-obtained values of variables that affect the operations of wind tunnels. The correct detection of outliers for the variables involved in wind tunnel operations is very important for the industrial ventilation systems industry, especially for vertical wind tunnels, which are used as training facilities for indoor skydiving, as the incorrect performance of such devices may put human lives at risk. In consequence, the use of the presented model for outlier detection may have a high impact on this industrial sector. In this research work, a proof of concept is carried out using data from a real installation, in order to test the proposed anomaly analysis method and its application to control the correct performance of wind tunnels.
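
To make the idea of mixing a global statistical test with a locally defined criterion concrete, a generic sketch is shown below. It does not reproduce the authors' Local Transilience Outlier Identification Method; the window size, both thresholds, and the MAD-based local test are placeholder assumptions.

```python
import numpy as np

def flag_outliers(values, z_threshold=3.0, window=50, local_threshold=4.0):
    """Flag a sensor reading as an outlier if it fails either a global z-score test
    or a local, window-based test against its neighbourhood's median."""
    x = np.asarray(values, dtype=float)
    global_flags = np.abs(x - x.mean()) > z_threshold * x.std()

    local_flags = np.zeros_like(global_flags)
    for i in range(len(x)):
        lo, hi = max(0, i - window), min(len(x), i + window + 1)
        neighbourhood = np.delete(x[lo:hi], i - lo)  # exclude the point itself
        mad = np.median(np.abs(neighbourhood - np.median(neighbourhood))) + 1e-9
        local_flags[i] = abs(x[i] - np.median(neighbourhood)) > local_threshold * mad

    return global_flags | local_flags
```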


Logistics ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. 8
Author(s):  
Hicham Lamzaouek ◽  
Hicham Drissi ◽  
Naima El Haoud

The bullwhip effect is a pervasive phenomenon in all supply chains, causing excessive inventory, delivery delays, deterioration of customer service, and high costs. Some researchers have studied this phenomenon from a financial perspective by shedding light on the phenomenon of the cash flow bullwhip (CFB). The objective of this article is to provide the state of the art in relation to research work on CFB. Our ambition is not to provide an exhaustive list, but to synthesize the main contributions so as to identify other interesting research perspectives. In this regard, certain lines of research remain insufficiently explored, such as the role that supply chain digitization could play in controlling CFB, the impact of CFB on the profitability of companies, or the impact of omnichannel commerce on CFB.

