L-HSAB: A Levantine Twitter Dataset for Hate Speech and Abusive Language

dc.contributor.author: Mulki, Hala
dc.contributor.author: Haddad, Hatem
dc.contributor.author: Ali, Chedi Bechikh
dc.contributor.author: Alshabani, Halima
dc.date.accessioned: 2025-01-21T16:33:07Z
dc.date.available: 2025-01-21T16:33:07Z
dc.date.issued: 2019
dc.department: Kırıkkale Üniversitesi
dc.description: 3rd Workshop on Abusive Language Online -- AUG 01, 2019 -- Florence, ITALY
dc.description.abstract: Hate speech and abusive language have become a common phenomenon on Arabic social media. Automatic detection systems for hate speech and abusive language can facilitate the prohibition of toxic textual content. The complexity, informality, and ambiguity of the Arabic dialects have hindered the provision of the resources needed for research on Arabic abusive/hate speech detection. In this paper, we introduce the first publicly available Levantine Hate Speech and Abusive (L-HSAB) Twitter dataset, intended as a benchmark dataset for the automatic detection of online Levantine toxic content. We further provide a detailed review of the data collection steps and of how we designed the annotation guidelines so that reliable dataset annotation is guaranteed. This reliability was confirmed by a comprehensive evaluation of the annotations: the agreement metrics Cohen's kappa (κ) and Krippendorff's alpha (α) indicated that the annotations are consistent.
dc.description.sponsorship: UCLA, Google, Facebook, Element AI, Aylien
dc.identifier.endpage: 118
dc.identifier.isbn: 978-1-950737-43-7
dc.identifier.startpage: 111
dc.identifier.uri: https://hdl.handle.net/20.500.12587/23730
dc.identifier.wos: WOS:000538480400012
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.language.iso: en
dc.publisher: Assoc Computational Linguistics-Acl
dc.relation.ispartof: Third Workshop on Abusive Language Online
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_20241229
dc.title: L-HSAB: A Levantine Twitter Dataset for Hate Speech and Abusive Language
dc.type: Conference Object
