Internship Project
Computer Sciences

Crosslingual Natural Language Processing with the Flair NLP Framework

Institution
Humboldt-Universität zu Berlin,
Department of Computer Science
Subject Area
Natural Language Processing (NLP)
Availability
Position filled
Project Supervisor(s)
Prof. Dr. Alan Akbik
Academic Level
  • Advanced undergraduate students (from third year) 
  • Master's students 
  • Ph.D. students 
Language
English required, German helpful
Project Type
Academic Research
Project Content
The Flair NLP framework is developed by the Machine Learning Chair at Humboldt-Universität zu Berlin together with the open source community. It is a deep learning-based framework for natural language processing (NLP) tasks such as text classification, sequence labeling, language modeling, and text embedding. It achieves state-of-the-art results on many NLP tasks such as named entity recognition (NER) and is frequently cited as one of the most popular NLP frameworks. The framework is built on PyTorch.

Our research at the Machine Learning Group is in the area of language modeling, sample-efficient learning (few-shot and zero-shot learning), multilinguality and never-ending learning. Most of our work is integrated into Flair to allow the research community to reproduce and build upon our research, and to enable projects to use our research in applications.

For this internship project, we focus on multilinguality by researching and developing new NLP components that work across different languages without requiring language-specific training data. For instance, one goal might be a text analysis component for German-language data in a use case where no German training data is available. We investigate a range of machine learning methods to solve this problem.

Flair Framework: https://github.com/flairNLP/flair
Tasks for Interns
The project is about developing cross-lingual components (NLP methods that work across multiple human languages) in the Flair framework. The basis for this project is multilingual transformer architectures such as XLM-RoBERTa, as well as recent work on annotation projection across languages and, potentially, multi-task learning.
Requirements
  • Very strong Python development skills (required);
  • ideally, experience collaborating on open source projects;
  • ideally, machine learning / deep learning knowledge;
  • ideally, experience with deep learning frameworks such as PyTorch.
Expected Preparation
Complete the Flair tutorials and engage with the Flair open source community. If you have no prior PyTorch experience, also complete online PyTorch tutorials. In addition, read recent literature on transformers, in particular BERT and XLM-RoBERTa.

For more information on the Humboldt Internship Program or the project, please contact the program coordinator.