Thematic Research Area

Hands-On Training

Applied language models for African language NLP

Overview

This theme drives DASCLAB's commitment to learning-by-doing. All courses include a substantial hands-on computing component, with a particular focus on developing language models for low-resource African languages.

Objectives

  • Deliver hands-on NLP workshops for 5 African languages annually
  • Develop and release datasets for Swahili, Kikuyu, Dholuo, and Luganda NLP
  • Train language models competitive with multilingual baselines
  • Integrate applied projects into all postgraduate coursework

Methods & Approaches

  • Transfer learning from multilingual base models (mBERT, XLM-R)
  • Crowd-sourced data collection and annotation pipelines
  • GPU training on the KENET HPC cluster
  • Evaluation on sentiment analysis, NER, and machine translation

Related Projects

Swahili NLP Corpus Development
Completed

Crowd-sourced named entity recognition corpus annotated by 200+ volunteers.

2023