Offensive Language Detection System

Studies have shown that cyberhate, online harassment, and the use of offensive language in social media are on the rise. Offensive language, even online, creates an exclusionary environment and can foster real-world violence. Despite significant advances in Natural Language Processing (NLP), online offensive language detection remains one of the most challenging text classification tasks because of the ambiguity and informality of user-generated content and the social context of its users. Automatic offensive language detection is further complicated in languages with diverse forms and limited resources, such as Arabic. Arabic is spoken widely in the Middle East, encompasses multiple dialects and cultures, and represents a multitude of nationalities. The wide adoption and heterogeneity of Arabic also affect the social context needed to interpret and automatically recognize offensive language.

This project aims to develop novel methods and resources for automatic detection of offensive Arabic language in social media. It explores transfer learning approaches to tackle this challenging task in a resource-constrained language that is rich in dialectal and cultural variation.

Social Media and Healthcare

Social media has been transforming numerous activities of everyday life, including healthcare. However, few studies investigate how medical practitioners and patients use social media for medical purposes. To understand this behavior, this project conducts user studies. Through an online survey, content analysis, and interviews, it aims to harness the benefits that social media applications can offer patients and healthcare practitioners. Detailed investigation of user behavior has the potential to inform future solutions that emphasize features supporting medical purposes and reduce possible privacy risks.

Kuwaiti Dialect Linguistic Resources Development

This project will extend the state of the art in Arabic natural language processing by providing academic researchers with the first Arabic dialectal dataset for the Kuwaiti dialect. It will create a labelled dataset of 10,000 tweets covering multiple automatic text classification tasks and diverse themes. The dataset will allow researchers to develop systems that automate the identification of analytical and linguistic features specific to the Kuwaiti dialect. The project will also support researchers working in Arabic natural language processing in general by enriching the available Arabic linguistic resources. Its main goal is to expand natural language processing capabilities for the Kuwaiti dialect and improve automated analytical processing techniques for it.