Project

In the project work, your task will be to

The projects can be carried out individually or in groups of 2–4 students.

Preliminary project description

Please submit a short text that describes what you intend to work on. For suggestions of topics, see below.

The text should include a project title and about one page giving a rough outline of how you intend to work. If you have found any related work or relevant datasets, please mention them. The deadline for this preliminary description is 2024-05-10; we will give you comments on your description before the second meeting (16–17 May).

The report

The report should be structured as a typical technical report, including an abstract, a problem statement, a method description, experiments and results, and a conclusion. The deadline for submitting the report is 2024-06-14. If you do not manage to submit by that date, there is a second chance on 2024-08-31.

Finding a topic

You can work on any project that is small and manageable enough for a short project of this kind. Ideally, try to find a topic that is relevant to your own research interests. It’s OK if you end up with preliminary results, as long as you have some tidbit to present at the end.

If you need help deciding on a topic, here are a few suggestions. Some of these are very easy and some are more challenging; the list is not sorted by difficulty.

  • Try out a benchmark. There are some popular benchmark sets, such as GLUE and the more recently released SuperGLUE, that are used to evaluate language understanding systems. Build a model and evaluate it on one of these benchmarks.
  • Crosslingual applications. For some application, e.g. a categorization or tagging task we’ve seen in the course, investigate how well crosslingual representations such as multilingual BERT allow us to train on one language and evaluate on other languages.
  • Shared tasks. Every year, several NLP competitions are arranged at SemEval. Get the data for a task from a previous year. Additionally, there are shared tasks organized by CoNLL, BioNLP and other conferences, where datasets can often be accessed easily.

Alternatively, if you don’t have an idea and you don’t like the suggested topics, just talk to Marco or Richard.