Tidy Tuesday

When people ask me “How do I learn R?”, I always point them towards the excellent R for Data Science book. It’s freely available and I love the order of the chapters, starting with visualisation and tidy data before delving into the details of programming with R.

Recently a community has developed around this book, where I act as a mentor. One of the regular community events is Tidy Tuesday, a weekly challenge to take a dataset and create a visualisation.

I had a go at the week 4 submission on Global Mortality data.

I decided to attempt to make some sparkline graphics. I used code from Dr Lukasz Piwek and his Tufte in R project. It’s a great project and I hope I get the chance to play with some of the other plots in the future.

I started off by considering deaths from cardiovascular diseases.

I wrote some code which identified the 5 countries with the largest increase in the share of deaths from cardiovascular diseases, and also the 5 countries with the largest decrease. Each of these 10 countries was then plotted as a sparkline, with the minimum and maximum values highlighted.
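A minimal base R sketch of that selection step, for illustration only. It uses made-up data and hypothetical column names (country, year, share) rather than the real dataset or the code in the repository, and it assumes "increase" means the change between the first and last year in the data.

```r
# Toy data: made-up numbers, NOT the Tidy Tuesday mortality dataset.
countries <- paste0("Country_", LETTERS[1:12])
mortality <- expand.grid(country = countries, year = 1990:2016,
                         stringsAsFactors = FALSE)

# Give each country its own linear trend so the ranking is easy to check.
slope <- seq(-0.6, 0.5, length.out = length(countries))
names(slope) <- countries
mortality$share <- 30 + unname(slope[mortality$country]) * (mortality$year - 1990)

# Change in share between the first and last year in the data.
# Assumes every country has a row for both of those years.
first <- mortality[mortality$year == min(mortality$year), c("country", "share")]
last  <- mortality[mortality$year == max(mortality$year), c("country", "share")]
changes <- merge(first, last, by = "country", suffixes = c("_first", "_last"))
changes$change <- changes$share_last - changes$share_first
changes <- changes[order(changes$change, decreasing = TRUE), ]

n <- 5
top_increase <- head(changes$country, n)  # largest increases
top_decrease <- tail(changes$country, n)  # largest decreases
selected <- mortality[mortality$country %in% c(top_increase, top_decrease), ]

# One row per country for the minimum and the maximum share:
# these are the points a sparkline would highlight.
by_country <- split(selected, selected$country)
mins <- do.call(rbind, lapply(by_country, function(d) d[which.min(d$share), ]))
maxs <- do.call(rbind, lapply(by_country, function(d) d[which.max(d$share), ]))
```

The `selected`, `mins` and `maxs` data frames are then the inputs a plotting layer would need: one line per country, plus one highlighted point each for the lowest and highest value.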

I then wrapped this code in a function, and generated plots for all possible causes of death to look for interesting findings. Full code is available on GitHub.
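Wrapping the logic in a function and looping over causes might look roughly like the sketch below. The helper name top_changers, the cause column names and the values are all hypothetical, and the function returns the selected countries instead of drawing a plot so that the loop is easy to inspect.

```r
# Toy wide table: one column per cause of death.
# Values are random and made up for illustration only.
set.seed(42)
countries <- paste0("Country_", LETTERS[1:8])
deaths <- expand.grid(country = countries, year = 1990:2016,
                      stringsAsFactors = FALSE)
deaths$cardiovascular <- runif(nrow(deaths), 20, 40)
deaths$cancers        <- runif(nrow(deaths), 10, 25)
deaths$road_accidents <- runif(nrow(deaths), 0, 3)

# Hypothetical helper: for one cause column, return the countries with the
# n largest increases and the n largest decreases between the first and
# last year. Assumes every country has a row for both years.
top_changers <- function(df, cause, n = 5) {
  first <- df[df$year == min(df$year), c("country", cause)]
  last  <- df[df$year == max(df$year), c("country", cause)]
  both <- merge(first, last, by = "country", suffixes = c("_first", "_last"))
  change <- both[[paste0(cause, "_last")]] - both[[paste0(cause, "_first")]]
  ranked <- both$country[order(change, decreasing = TRUE)]
  list(increase = head(ranked, n), decrease = tail(ranked, n))
}

# Every column that is not an identifier is treated as a cause of death.
causes <- setdiff(names(deaths), c("country", "year"))
results <- lapply(causes, function(cause) top_changers(deaths, cause, n = 3))
names(results) <- causes
```

Here `results$cancers$increase`, for example, holds the three toy countries with the largest rise for that cause. A plotting call could replace or follow the `top_changers()` call inside the loop.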

As a quick plot, I’m fairly happy with this. However, as always, there are many ways to improve it.

  • Looping over all possible causes of death generates some weird plots, especially when the percentages are small.
  • The sparkline plots should probably be more information-rich and have less white space to make Tufte happy.
  • My function to create the sparkline plots should probably be split into smaller functions.

Let me know what you think on Twitter. Suggestions/pull requests welcome!