Overview
This theme drives DASCLAB's commitment to learning-by-doing. All courses include a substantial hands-on computing component, with a particular focus on developing language models for low-resource African languages.
Objectives
- Deliver hands-on NLP workshops for 5 African languages annually
- Develop and release datasets for Swahili, Kikuyu, Dholuo, and Luganda NLP
- Train language models competitive with multilingual baselines
- Integrate applied projects into all postgraduate coursework
Methods & Approaches
- Transfer learning from multilingual base models (mBERT, XLM-R)
- Crowd-sourced data collection and annotation pipelines
- GPU training on the KENET HPC cluster
- Evaluation on sentiment analysis, NER, and machine translation
Related Projects
Swahili NLP Corpus Development
Completed
Crowd-sourced named entity recognition corpus annotated by 200+ volunteers.
2023