UCREL NLP Summer School

Last week I attended the UCREL Summer School in corpus-based natural language processing (NLP). The summer school is taught by leading experts in the field, from both Lancaster University and other institutions.

Here are a few of my thoughts and takeaways from the week.

Ethics

Cambridge Analytica has provided a perfect example of how not to use data ethically. It serves as an important reminder to always think about how you want to use the data before starting any analysis, and to keep your research questions in mind throughout.

P-values

P-values have been a hot topic in statistics lately. It’s interesting to see this discussion start to move into other areas including NLP.

Reproducibility

Another topic close to my heart ❤️.

Quick tips

  • Document EVERYTHING! From how you scraped and cleaned the data, to creating that pretty plot.
  • Release all code and data with any papers you write. I mean, if it isn’t on GitHub, is it even research?
  • Before releasing a corpus publicly, think carefully about possible ethical and legal issues.
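
As a rough illustration of the "document everything" tip, here is a minimal sketch of one way to record what went into an analysis: a small JSON manifest listing the processing steps and a checksum for each data file. This is my own example rather than something shown at the summer school, and the function names (sha256_of, build_manifest, write_manifest) and the manifest layout are hypothetical.

```python
import hashlib
import json
import platform
import sys
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files, steps):
    """Describe the data files and processing steps behind an analysis.

    The layout of this dictionary is an assumption made for illustration,
    not an established standard.
    """
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "steps": list(steps),
        # Keyed by file name only, so names must be unique across folders.
        "files": {Path(f).name: sha256_of(f) for f in data_files},
    }


def write_manifest(manifest, out_path):
    """Save the manifest as human-readable JSON next to the results."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
```

Committing a file like this alongside the code lets a reader check that they are working from the same data you were, and see in what order the cleaning steps were applied.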

Be an interdisciplinary hero

Get involved! There are so many application areas that make for interesting NLP projects. Just during this week, we have had talks from the biosciences, accounting and finance, geography and the publishing industry, to name a few. However, it’s important to work closely with the domain experts and make use of their understanding of the area.