Datasets | Division Recherche et Développement en Sciences de l'Information et Humanités Numériques

"AraCOVID19-MFH" is an annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset. It contains 10,828 Arabic tweets annotated with 10 different labels. The labels were designed to cover aspects relevant to the fact-checking task, such as a tweet's check-worthiness, positivity/negativity, and factuality. To confirm the dataset's practical utility, we used it to train and evaluate several classification models and reported the results. Though the dataset is mainly designed for fake news detection, it can also be used for hate speech detection, opinion/news classification, dialect identification, and many other tasks.

Download link

License:

https://creativecommons.org/licenses/by-nc-sa/4.0/

Cite the paper:

https://www.sciencedirect.com/science/article/pii/S1877050921012059

Many social media users employ sarcasm to convey their intended meaning in a humorous and indirect way, making it hard for computer-based applications to automatically identify their goal and the level of harm they can inflict. Motivated by the emerging need for annotated datasets that tackle these kinds of problems in the context of COVID-19, we built and released AraCOVID19-SSD, a manually annotated Arabic COVID-19 sarcasm and sentiment detection dataset containing 5,162 tweets. To confirm the practical utility of the dataset, it was carefully analyzed and tested using several classification models.
