L-HSAB: A Levantine Twitter Dataset for Hate Speech and Abusive Language

Hate speech and abusive language have become common on Arabic social media. Automatic systems for detecting hate speech and abusive language can facilitate the prohibition of toxic textual content. However, the complexity, informality, and ambiguity of the Arabic dialects have hindered the provision of the resources needed for research on Arabic abusive language and hate speech detection. In this paper, we introduce the first publicly available Levantine Hate Speech and Abusive (L-HSAB) Twitter dataset, intended as a benchmark for the automatic detection of toxic online Levantine content. We further provide a detailed review of the data collection steps and of how we designed the annotation guidelines to ensure reliable annotation. This reliability is supported by a comprehensive evaluation of the annotations: the agreement metrics, Cohen's kappa (κ) and Krippendorff's alpha (α), indicated that the annotations are consistent.

Description

3rd Workshop on Abusive Language Online -- AUG 01, 2019 -- Florence, ITALY

Source

Third Workshop on Abusive Language Online

WoS Q Value

N/A

Link

https://hdl.handle.net/20.500.12587/23730

Collection

WoS-Indexed Publications Collection
Conference Paper and Presentation Collection
