positive class
Recently Published Documents





Saurabh R. Sangwan ◽  
M. P. S. Bhatia

Cyberspace has been recognized as a conducive environment for the use of various hostile, direct, and indirect behavioural tactics to target individuals or groups. Denigration is one of the most frequently used cyberbullying ploys to actively damage, humiliate, and disparage the online reputation of a target by sending, posting, or publishing cruel rumours, gossip, and untrue statements. Previous pertinent studies report detecting profane, vulgar, and offensive words primarily in the English language. This research puts forward a model to detect online denigration bullying in the low-resource Hindi language using attention residual networks. The proposed model, Hindi Denigrate Comment–Attention Residual Network (HDC-ARN), intends to uncover defamatory posts (denigrate comments) written in Hindi which stake and vilify a person or an entity in public. Data with 942 denigrate comments and 1499 non-denigrate comments is scraped using certain hashtags from two recent trending events in India: Tablighi Jamaat spiked Covid-19 (April 2020, Event 1) and Sushant Singh Rajput Death (June 2020, Event 2). Only text-based features, that is, the actual content of the post, are considered. The pre-trained word embedding for Hindi from fastText is used. The model has three ResNet blocks with an attention layer that generates a post vector for a single input, which is passed through a sigmoid activation function to get the final output as either denigrate (positive class) or non-denigrate (negative class). An F1 score of 0.642 is achieved on the dataset.
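The final stage described in this abstract (an attention layer that pools token representations into one post vector, followed by a sigmoid) can be sketched roughly as below. This is only an illustration, not the authors' HDC-ARN implementation: the dot-product scoring, the `query` vector, the `out_weights` output layer, and the 0.5 threshold are all assumptions, and the ResNet blocks and fastText embeddings are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(token_vectors, query):
    """Pool token vectors into one post vector (assumed dot-product attention)."""
    scores = [sum(t * q for t, q in zip(tok, query)) for tok in token_vectors]
    weights = softmax(scores)
    dim = len(token_vectors[0])
    post = [sum(w * tok[d] for w, tok in zip(weights, token_vectors))
            for d in range(dim)]
    return post, weights


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify(token_vectors, query, out_weights, bias=0.0, threshold=0.5):
    """Return (probability, label); the threshold is a hypothetical choice."""
    post, _ = attention_pool(token_vectors, query)
    logit = sum(w * p for w, p in zip(out_weights, post)) + bias
    prob = sigmoid(logit)
    label = "denigrate" if prob >= threshold else "non-denigrate"
    return prob, label
```

With a zero query vector every token receives equal attention, so the post vector is simply the mean of the token vectors.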

Molecules ◽  
2021 ◽  
Vol 27 (1) ◽  
pp. 41
Brandan Dunham ◽  
Madhavi K. Ganapathiraju

Protein–protein interactions (PPIs) perform various functions and regulate processes throughout cells. Knowledge of the full network of PPIs is vital to biomedical research, but most PPIs are still unknown. As it is infeasible to discover all of them experimentally due to technical and resource limitations, computational prediction of PPIs is essential, and the performance of algorithms must be assessed accurately before further application or translation. However, many published methods compose their evaluation datasets incorrectly, using a higher proportion of positive class data than occurs naturally, leading to exaggerated performance. We re-implemented various published algorithms and evaluated them on datasets with realistic data compositions and found that their performance is overstated in the original publications, with several methods outperformed by our control models built on ‘illogical’ and random number features. We conclude that these methods are influenced by an over-characterization of some proteins in the literature and by the scale-free nature of the PPI network, and that they fail when tested on all possible protein pairs. Additionally, we found that sequence-only-based algorithms performed worse than those that employ functional and expression features. We present a benchmark evaluation of many published algorithms for PPI prediction. The source code of our implementations and the benchmark datasets created here are made available as open source.
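The effect of positive-class proportion on reported performance can be illustrated with a short calculation. The sketch below holds a classifier's sensitivity and specificity fixed and recomputes precision at different prevalences; the numbers used (0.9 sensitivity and specificity, 50% versus 0.1% positives) are hypothetical and are not taken from the paper.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Expected precision of a classifier on data with the given positive rate.

    Derived from Bayes' rule: TP rate = sensitivity * prevalence,
    FP rate = (1 - specificity) * (1 - prevalence).
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier evaluated on a balanced set and on a skewed set.
balanced = precision_at_prevalence(0.9, 0.9, 0.5)
skewed = precision_at_prevalence(0.9, 0.9, 0.001)
print(f"precision at 50% positives:  {balanced:.3f}")
print(f"precision at 0.1% positives: {skewed:.4f}")
```

The same classifier looks far better on the balanced set, which is the kind of inflation the abstract attributes to unrealistic evaluation data.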

2021 ◽  
Vol 4 (1) ◽  
pp. 17-22
Zetta Nillawati Reyka Putri ◽  
Muhammad Muhajir

At the end of 2020, Habib Rizieq's return to Indonesia drew criticism from the public for causing crowds during the Covid-19 pandemic. News and opinions about Habib Rizieq fill internet platforms, including Twitter. This study classifies Twitter opinion text about Habib Rizieq's return into positive and negative sentiments using the Support Vector Machine (SVM) method. Because the opinion data comes from Twitter, it is analyzed by text mining after a preprocessing stage. The SVM classification of the unbalanced data resulted in 95.06% accuracy; for the negative class, precision was 84%, higher than the 72% recall, while for the positive class, precision was 96%, 2 percentage points lower than the 98% recall. With the oversampling method, the SVM classification reached 100% accuracy, precision, and recall. The positive sentiments indicate that the public will continue to support Rizieq and want his freedom, while the negative sentiments indicate that many people are disappointed with Rizieq over the lies about his swab test results.

2021 ◽  
Vol 11 (1) ◽  
Rafael Mamede ◽  
Florbela Pereira ◽  
João Aires-de-Sousa

Abstract: Machine learning (ML) algorithms were explored for the classification of the UV–Vis absorption spectrum of organic molecules based on molecular descriptors and fingerprints generated from 2D chemical structures. Training and test data (~75 k molecules and associated UV–Vis data) were assembled from a database with lists of experimental absorption maxima. Molecules were labeled as positive class (related to photoreactive potential) if an absorption maximum is reported in the range between 290 and 700 nm (UV/Vis) with a molar extinction coefficient (MEC) above 1000 L mol−1 cm−1, and as negative if no such peak is in the list. Random forests were selected among several algorithms. The models were validated with two external test sets comprising 998 organic molecules, obtaining a global accuracy of up to 0.89, sensitivity of 0.90, and specificity of 0.88. The ML output (UV–Vis spectrum class) was explored as a predictor of the 3T3 NRU phototoxicity in vitro assay for a set of 43 molecules. Comparable results were observed with the classification directly based on experimental UV–Vis data in the same format.
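The labeling rule stated in this abstract can be written as a small function. This is a minimal sketch under stated assumptions: the input format (a list of `(wavelength_nm, mec)` pairs) and the inclusive treatment of the 290 and 700 nm boundaries are choices made here for illustration, not details confirmed by the abstract.

```python
# Thresholds as stated in the abstract; boundary inclusivity is an assumption.
UV_VIS_RANGE_NM = (290.0, 700.0)
MEC_THRESHOLD = 1000.0  # L mol^-1 cm^-1


def is_positive(maxima):
    """Label a molecule from its list of (wavelength_nm, mec) absorption maxima.

    Positive if any maximum lies within the UV/Vis range with MEC above the
    threshold; negative if no such peak is in the list.
    """
    low, high = UV_VIS_RANGE_NM
    return any(low <= wavelength <= high and mec > MEC_THRESHOLD
               for wavelength, mec in maxima)


# Hypothetical molecules, for illustration only.
print(is_positive([(254.0, 12000.0), (340.0, 2500.0)]))  # peak at 340 nm qualifies
print(is_positive([(254.0, 12000.0), (340.0, 800.0)]))   # 340 nm peak is too weak
```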

2021 ◽  
Vol 11 (21) ◽  
pp. 10268
Parag Verma ◽  
Ankur Dumka ◽  
Rajesh Singh ◽  
Alaknanda Ashok ◽  
Anita Gehlot ◽  

The Internet of Things (IoT) has gained significant importance due to its applicability in diverse environments. Another reason for the influence of the IoT is its use of a flexible and scalable framework. The extensive and diversified use of the IoT in the past few years has attracted cyber-criminals. They exploit the vulnerabilities of the open-source IoT framework that arise from the absence of robust and standard security protocols, discouraging existing and potential stakeholders. The authors propose a binary classifier approach developed from a machine learning ensemble method to filter and dump malicious traffic, preventing malicious actors from accessing the IoT network and its peripherals. The gradient boosting machine (GBM) ensemble approach is used to train the binary classifier on pre-processed recorded data packets to detect anomalies and protect IoT networks from zero-day attacks. The positive class performance metrics of the model were an accuracy of 98.27%, a precision of 96.40%, and a recall of 95.70%. The simulation results indicate the effectiveness of the proposed model against cyber threats, making it suitable for critical IoT applications.

2021 ◽  
Vol 4 ◽  
Florian Beck ◽  
Johannes Fürnkranz

Inductive rule learning is arguably among the most traditional paradigms in machine learning. Although we have seen considerable progress over the years in learning rule-based theories, all state-of-the-art learners still learn descriptions that directly relate the input features to the target concept. In the simplest case, concept learning, this is a disjunctive normal form (DNF) description of the positive class. While it is clear that this is sufficient from a logical point of view because every logical expression can be reduced to an equivalent DNF expression, it could nevertheless be the case that more structured representations, which form deep theories by forming intermediate concepts, could be easier to learn, in very much the same way as deep neural networks are able to outperform shallow networks, even though the latter are also universal function approximators. However, there are several non-trivial obstacles that need to be overcome before a sufficiently powerful deep rule learning algorithm could be developed and compared to the state of the art in inductive rule learning. In this paper, we therefore take a different approach: we empirically compare deep and shallow rule sets that have been optimized with a uniform general mini-batch based optimization algorithm. In our experiments on both artificial and real-world benchmark data, deep rule networks outperformed their shallow counterparts, which we take as an indication that it is worthwhile to devote more effort to learning deep rule structures from data.
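A DNF description of the positive class, as mentioned in this abstract, is a disjunction of conjunctive rules: an example is positive if at least one rule has all its conditions satisfied. The sketch below shows one way to evaluate such a shallow rule set; the dictionary-based rule representation and the feature names are hypothetical and do not come from the paper.

```python
def rule_fires(conjunction, example):
    """True if every (feature, required_value) condition holds for the example."""
    return all(example.get(feature) == value
               for feature, value in conjunction.items())


def dnf_predict(rules, example):
    """Predict the positive class if any conjunctive rule fires (logical OR)."""
    return any(rule_fires(rule, example) for rule in rules)


# Hypothetical rule set: (a AND b) OR (c AND NOT d)
rules = [
    {"a": True, "b": True},
    {"c": True, "d": False},
]

print(dnf_predict(rules, {"a": True, "b": True, "c": False, "d": False}))
print(dnf_predict(rules, {"a": True, "b": False, "c": True, "d": True}))
```

A deep rule set in the sense of the abstract would instead feed the outputs of such rules into further rule layers as intermediate concepts; that extension is not shown here.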

Geosphere ◽  
2021 ◽  
Giovanny Jiménez ◽  
Helbert García-Delgado ◽  
John W. Geissman

We report paleomagnetic results from the Jurassic to Lower Cretaceous continental sedimentary succession exposed in the eastern limb of the Los Yariguíes anticlinorium, Eastern Cordillera, Colombia. About 820 m of a stratigraphic section of the upper part of the Girón Group (Angostura del Río Lebrija and Los Santos Formations) was sampled to construct a magnetic polarity stratigraphy. A total of 199 independent samples that yield interpretable and acceptable data have a characteristic remanent magnetization component (ChRM) isolated between 400 °C and 680 °C in progressive thermal demagnetization. Demagnetization behavior and rock magnetic properties are interpreted to indicate that hematite is the principal magnetization carrier with a possible contribution by magnetite in some parts of the section. After tilt correction, 123 samples are of normal polarity (declination [D] = 44.9°, inclination [I] = +9.7°, R = 110.64, k = 9.87, and α95 = 4.3°), and the other 76 accepted samples are of reverse polarity (D = 216.4°, I = −6.1°, R = 68.29, k = 9.72, and α95 = 5.5°). The statistical reversal test conducted on virtual geomagnetic poles is positive (class B). Based on paleontologic age estimates for the Cumbre and Rosablanca Formations, we assume a Berriasian age for the Los Santos Formation. The magnetostratigraphic data from the Girón Group strata are interpreted to suggest an age for the sampled part of the section between early Kimmeridgian and early Valanginian (ca. 157–139 Ma). The age of the Angostura del Río Lebrija Formation is estimated as between early Kimmeridgian and early Tithonian (ca. 157–146.5 Ma). The age of the Los Santos Formation is estimated between early Tithonian and early Valanginian (146.5–139.3 Ma). With our proposed, but nonunique, correlation with the Geomagnetic Polarity Time Scale, the Jurassic-Cretaceous boundary is interpreted to be located within the Los Santos Formation.
The Girón Group is characterized by two periods of high (>8 cm/k.y.) and two periods of low (< 2 cm/k.y.) sedimentation rates. An inferred clockwise rotation of ~44°, based on paleomagnetic declination data from the Girón Group, is similar to rotation estimates reported in some previous studies in the general area, and this facet of deformation could be related to local and regional response to displacement along regional-scale strike-slip faults.
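As a rough illustration of how normal and reverse mean directions are compared, the sketch below computes the angle between the normal-polarity mean direction and the antipode of the reverse-polarity mean direction, using the tilt-corrected D and I values quoted in the abstract. Note the assumptions: the abstract states that the reversal test was run on virtual geomagnetic poles, not on directions, and the test's classification criteria are not reproduced here, so this is only a geometric check and not the authors' procedure.

```python
import math


def direction_to_unit_vector(declination_deg, inclination_deg):
    """Convert declination/inclination (degrees) to a north-east-down unit vector."""
    dec = math.radians(declination_deg)
    inc = math.radians(inclination_deg)
    return (math.cos(inc) * math.cos(dec),
            math.cos(inc) * math.sin(dec),
            math.sin(inc))


def angle_between(dir_a, dir_b):
    """Angle in degrees between two (declination, inclination) directions."""
    a = direction_to_unit_vector(*dir_a)
    b = direction_to_unit_vector(*dir_b)
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


def antipode(direction):
    """Flip a direction to the opposite polarity."""
    dec, inc = direction
    return ((dec + 180.0) % 360.0, -inc)


normal_mean = (44.9, 9.7)     # D, I from the abstract (normal polarity)
reverse_mean = (216.4, -6.1)  # D, I from the abstract (reverse polarity)

angle = angle_between(normal_mean, antipode(reverse_mean))
print(f"angle between normal mean and flipped reverse mean: {angle:.1f} degrees")
```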

Maciej A. Wujec

This article uses the deep neural network BERT (Bidirectional Encoder Representations from Transformers) and the cumulative abnormal return of stocks to analyze the sentiment of financial texts. The proposed approach, unlike those used so far, does not require the creation of dictionaries, takes into account the broad context of words and their meaning in financial texts, eliminates the problem of ambiguity of words in various contexts, does not require manual labelling of data, and is free from the subjective assessment of the researcher. The sentiment of financial texts, in the meaning presented in this paper, is directly related to the market reaction to the information contained in these texts. For texts belonging to one of the two classes (positive or negative) with the highest probability, the BERT model yields predictions with a precision of 62.38% for the positive class and 55% for the negative class. Results at this level can be used in event studies, market efficiency research, investment strategy development, or support of investment analysts using fundamental analysis.

2021 ◽  
Vol 4 ◽  
pp. 76-82
Wilma Latuny ◽  
Victor O. Lawalata ◽  
Daniel B. Paillin ◽  
Rahman Ohoirenan

UD Sinar Baru sells eucalyptus oil products in sizes from 30 ml to 550 ml, of which the 550 ml size is the most consumed. However, this product has been criticized by consumers for packaging that does not meet their expectations. This study aims to obtain an accurate method of classifying consumer sentiment and to identify features that affect the redesign of the 550 ml eucalyptus oil product packaging. Data were collected through an online survey on Facebook, with consumer comments retrieved using power queries. Data analysis uses the Support Vector Machine (SVM) method with the support of the WEKA application to provide sentiment analysis and the accuracy of consumer comments. The results present the tendency of comments on each attribute, with an accuracy of 83% for the entire class, 3% for positive-class comments, and 57% for negative-class comments. Sentiment indicating that the packaging is normal accounts for 20%, which is interpreted as neutral. The study concludes that SMO has a very accurate prediction rate for analyzing consumer sentiment about the features of the 550 ml eucalyptus oil packaging, and that the current packaging needs to be redesigned with attention to shape, color, size, and efficiency.

2021 ◽  
Vol 11 (1) ◽  
Asmaa Ali ◽  
Mona Hasan ◽  
Shaimaa Hamed ◽  
Amir Elhamy

Abstract. Background: Around 25% of the world population is affected by metabolic-related fatty liver disorder. Hepatic steatosis is frequently observed in conjunction with hypertension, obesity comorbidities, and diabetes. We evaluate the frequency of hepatic steatosis found in chest CT exams of COVID-19-positive cases compared to non-infected controls and evaluate the related increased prevalence and severity of COVID. Results: Our research includes 355 subjects, 158 with positive PCR for COVID-19 (case group) and 197 with negative PCR and negative CT chest (control group). The mean age in the positive group was 50.6 ± 16 years, and in the control group it was 41.3 ± 16 years (p < 0.001). Our study consists of 321 men (90.5%) and 34 women (9.5%). Males outnumbered females in both the case and control groups: in the case group, 93% men vs. 6.9% women, while in controls, 88.3% men vs. 11.6% women, p < 0.001. CT revealed normal results in 55.5% of individuals (i.e., CO-RADs 1) and abnormal findings in 45.5% of participants (i.e., CO-RADs 2–5). Among abnormal scans, CO-RADs 2 was 13.92%, while CO-RADs 3–4 were 20.89% of cases. CO-RADs 5 comprised 65.19% of all cases. Approximately 42.6% of cases had severe disease (CT score ≥ 20), all of which were CO-RADs 5. The PCR-positive class had a greater prevalence of hepatic steatosis than controls (28.5% vs. 12.2%, p < 0.001). CO-RADs 2 represented 11.1%, CO-RADs 3–4 represented 15.6%, and CO-RADs 5 represented 73.3% of the hepatic steatosis cases. The mean hepatic attenuation value in the case group was 46.79 ± 12.68 and in the control group 53.34 ± 10.28 (p < 0.001). When comparing patients with a higher severity score (CT score ≥ 20) to those with non-severe pneumonia, hepatic steatosis was found to be more prevalent (73.2% vs. 26.8%). Conclusions: Steatosis was shown to be substantially more prevalent in COVID-19-positive individuals. There is a relation among metabolic syndrome, steatosis of the liver, and obesity, as well as COVID-19 severity.
