In last month's blog post, we referred to the influential Economist article stating that data is the new oil. One interpretation of this metaphor is that data can be seen as "fuel" for today's economy. This also applies to the nonprofit sector. The reference to oil does not, however, necessarily imply that "digging for data" is costly in terms of acquiring and collecting it. There is quite a lot of open data around that might be useful in a fundraising context. We see a broad range of possible use cases when it comes to open data in a fundraising context:
We will rush through two hands-on examples to illustrate how to obtain, process and visualize open data with possible value for fundraising decision makers and analysts.
Example 1: Visualizing income data on regional level
Geographical disparities can be relevant for certain fundraising practices such as events or contact with High Net Worth Individuals (HNWI). Some regions are "wealthier" than others in terms of their average income levels. This information allows some conclusions about the overall fundraising potential in a given area.
An aggregation level that we find useful as a good "common denominator" is the set of so-called NUTS regions used by the European Union. NUTS sounds like an English acronym; however, it is a French abbreviation and stands for Nomenclature des unités territoriales statistiques, in other words territorial units for statistics.
The European Union's statistical office is called Eurostat. It offers a huge database that can be accessed online and is free of charge in most cases. We not only download data on the regional distribution at the level of NUTS2 areas but also use the dedicated R package eurostat to do so. NUTS2 areas are quite "intuitive" in our eyes. In Austria, for instance, they correspond to the federal provinces, whereas Germany with its 16 federal states is split into 38 regions. For the sake of completeness, we show how we searched for the respective income data in the code snippet below. Our code is, by the way, inspired by this recommendable tutorial on eurostat.
Code Snippet 1.1: Load libraries and search for data
The query above returns 3 tables that contain income data. We decided to use the table with the index tgs00026, which contains data on disposable household income at the regional level.
Code Snippet 1.2: Obtain both income and geospatial data
We now have two dataframes, one for the income data with a regional variable and one for the actual geospatial data. We merge the two on the regional variable and dive into the visualization immediately:
Code Snippet 1.3: Merge datasets into one and visualize data
As we have already used a little French today, we can now say voilà, as the overview visualization we were striving for is finished and presentable:
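The merge step in Example 1 boils down to a left join of the geospatial rows with the income rows on a shared region code. The following is only a minimal sketch of that idea, written in Python with the standard library rather than in R with the eurostat package. The column names `geo`, `name` and `income`, the function `left_join` and the income figures are illustrative assumptions and not actual Eurostat values; only the three Austrian NUTS2 codes are real.

```python
# Geospatial side: one row per NUTS2 region (geometry omitted for brevity).
geo_rows = [
    {"geo": "AT11", "name": "Burgenland"},
    {"geo": "AT12", "name": "Niederösterreich"},
    {"geo": "AT13", "name": "Wien"},
]

# Income side: placeholder values, NOT real Eurostat figures.
income_rows = [
    {"geo": "AT11", "income": 100},
    {"geo": "AT13", "income": 120},
]


def left_join(left, right, key="geo"):
    """Keep every row of `left` and attach the matching `income` from `right`.

    Regions without an income record get income=None, so they can still be
    drawn on the map (for example in a neutral colour).
    """
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key], {})
        merged.append({**row, "income": match.get("income")})
    return merged


merged = left_join(geo_rows, income_rows)
for row in merged:
    print(row["geo"], row["name"], row["income"])
```

Keeping all geospatial rows, including those without income data, is a deliberate choice here: a map with a few uncoloured regions is usually easier to read than one with holes in it.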
Example 2: What about large companies and their CEOs?
The name Forbes might ring a bell if you think of lists of the wealthiest people on the planet. Forbes also publishes data on the largest companies on an annual basis. We signed up at the platform data.world - which we can also recommend - and obtained the 2018 dataset for the 2,000 largest corporations (here - signup is necessary). Luckily, this data contains country information and also names the respective CEO - but let us go step by step. As we are working with a flat file, we did some data preparation in it before loading it into R; we end up with a dataset that contains the following variables:
Code Snippet 2.1.: Read the file and select one country
So far, so good. We now have a condensed dataframe in R that contains the Dutch corporations listed in the Forbes 2000, i.e. 22 firms. We prepare a bubble chart with company revenues and profits on the axes. The market value (mostly in shares) is reflected by the actual size of the bubble.
Code Snippet 2.2.: Draw the bubble chart
This is our result:
We can also spot a cluster of companies with high profit volumes in the lower right of the bubble chart. This cluster contains "big names" like Unilever but also firms that are not as widely known:
Even though it might not be easy to meet some of the CEOs right away, it might be worthwhile to research whether those big and powerful corporations have CSR departments, foundations etc. that one can get in touch with.
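To make the two processing steps of Example 2 more concrete, here is a minimal sketch of selecting one country and deriving bubble sizes from market value. It is written in Python with the standard library and is not the R code used for the chart. The column names, the helper functions `select_country` and `bubble_radius`, and the firms and figures are all hypothetical. The sketch also assumes that market value should map to bubble area rather than radius, which is why the radius grows with the square root of the value.

```python
import math

# Hypothetical rows; firm names and numbers are made up for illustration.
firms = [
    {"company": "Firm A", "country": "Netherlands",
     "revenue": 50.0, "profit": 5.0, "market_value": 100.0},
    {"company": "Firm B", "country": "Netherlands",
     "revenue": 20.0, "profit": 1.0, "market_value": 25.0},
    {"company": "Firm C", "country": "Germany",
     "revenue": 80.0, "profit": 6.0, "market_value": 90.0},
]


def select_country(rows, country):
    """Return only the rows that belong to the given country."""
    return [row for row in rows if row["country"] == country]


def bubble_radius(value, max_value, max_radius=30.0):
    """Scale the radius so that bubble AREA is proportional to `value`."""
    return max_radius * math.sqrt(value / max_value)


dutch = select_country(firms, "Netherlands")
largest = max(row["market_value"] for row in dutch)

# One (x, y, radius) triple per firm: revenue, profit, scaled market value.
bubbles = [
    (row["revenue"], row["profit"], bubble_radius(row["market_value"], largest))
    for row in dutch
]
for company, bubble in zip((row["company"] for row in dutch), bubbles):
    print(company, bubble)
```

Scaling by area avoids visually exaggerating large firms: a company with four times the market value gets a bubble with twice the radius, not four times.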
That was it for this month's post. We hope you stay "open" over the summer break, not only towards data but also towards our blog :-).
I have to be honest at this time of the year. Now that most European football leagues are about to close as they have found their winners and the Champions League final is near, I must admit that summer will feel somewhat empty, as I am a football fan. Being realistic at the same time, it is hard to think too idealistically about football clubs, players and their fans. Football has become a billion-euro business. Renowned clubs nowadays have to think in terms of a globalized market. Regardless of that, I have the impression that football is still frequently associated with concepts like intuition (of coaches and players) and talent (particularly in the case of players), and that there is still a lot of both in the game. Both are hard to measure and will keep the sport as fascinating as it is. However, a revolution has taken place over the last years that should not be overlooked. This revolution is about using data.
The American author Michael Lewis wrote the book Moneyball, which tells the story of Billy Beane, the general manager of the low-budget baseball team Oakland A’s. The team started using rigorous statistical performance measures to evaluate players, which gave it a competitive edge even against the richer clubs in Major League Baseball. For those interested in the story, the 2011 movie Moneyball starring Brad Pitt can be recommended.
Data science has arrived in modern football. Manchester City, for example, employs 11 data scientists according to a Guardian article from 2014. Liverpool FC introduced the position of Director of Research some years ago; it is currently held by a theoretical physicist who earned his PhD at the University of Cambridge. Training has also been “datafied” at many clubs such as TSG Hoffenheim, who secured a fourth-place finish in the German Bundesliga this season and will therefore play qualifying matches for the Champions League. The amount of data collected (often through wearable devices) and analysed is huge: some seven million data points are derived from a 10-minute session with 10 players and three balls. The data is accessible in real time and often used in player feedback to help players work on their weaknesses. TSG Hoffenheim works closely with software giant SAP, which is not really a big surprise as the company is the club’s main sponsor.
What is surprising is that data and football – particularly in Britain – have a rather long history together. It was in 1950 that Charles Reep, a World War II veteran and trained accountant, got so frustrated with one performance of his club Swindon Town that he started collecting data about the match in the second half. He concluded from the data that it would be reasonable to move the ball toward the goal with as few passes as possible in order to score more often and thereby win the match. Reep’s ideas had a strong influence on British football and shaped what was long known as the British “kick and rush” way of playing the game. Today, methods are more advanced and the amount of data available for analysis is immense. However, the underlying ideas and aims of number crunching in the context of football have not changed a lot. Take a look at this paper published in connection with this year’s Sports Analytics Conference hosted by the MIT Sloan School of Management. That’s counting a lot of passes from a lot of players in a highly advanced manner!
If you are a data scientist interested in football and want to play around a bit (maybe in the football-free zone over summer), you might wish to get inspired by people like Martin Eastwood and his talk Predicting football using R. Everything you need to reproduce the example can be found in this GitHub repository. Martin also has a blog on football analytics that is worth reading; it is called Pena.lt/y. If you want to dive deeper (what a wordplay in a football-related blog post 😊), there is, for instance, a whole Sport Analytics package in R, which is a collection of data sets, functions to fetch sports data, examples, and demos. Not to mention the countless possibilities of scraping online football databases with R.
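To get a feeling for the data volume mentioned above, the quoted figures can be turned into a rate with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes that the seven million data points are spread evenly over the session and over the 13 tracked objects (10 players plus three balls), which the source does not state.

```python
# Figures quoted in the text.
DATA_POINTS = 7_000_000        # data points per session
SESSION_SECONDS = 10 * 60      # a 10-minute session
TRACKED_OBJECTS = 10 + 3       # 10 players and three balls

# Assumption: data points are spread evenly over time and over all objects.
points_per_second = DATA_POINTS / SESSION_SECONDS
points_per_object_per_second = points_per_second / TRACKED_OBJECTS

print(round(points_per_second))             # roughly 11,667 per second overall
print(round(points_per_object_per_second))  # roughly 897 per object and second
```

Under these assumptions, each player or ball would generate on the order of several hundred data points every second.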
So, let’s keep the ball rolling. I wish you a nice start into the summer.
Artist R. Luke DuBois transforms data into artworks. In his talk, DuBois shares nine very different projects, ranging from American presidents to Britney Spears. He also reflects critically on the way we use data in our culture. Just click the picture above to see the artist in action. By the way, it is worthwhile to check out his website, which contains other projects: lukedubois.com/
The media confront us with bad news every day. In the face of political uncertainties and extremism, climate change, growing inequalities etc., it sometimes seems hard not to see the world as a rather hopeless place. However, we must not forget that - with regard to many aspects of human life such as health, education, life expectancy etc. - we actually live in the best world and time since the beginning of mankind. The good news is: there are tons of data that support this view of the world! There are some people and pages I can really recommend in this context.
If you are on TED and watch talks there, then you might know Hans Rosling as an inspiring and humorous speaker. Hans is one of the founders of the Gapminder Foundation (www.gapminder.org/). The foundation's page offers great data visualization tools and contains a large number of datasets that can be downloaded for free.
Oxford economist Max Roser (www.maxroser.com/), by the way an alumnus of Innsbruck University like myself, is the mastermind behind the platform Our World in Data (https://ourworldindata.org/) which also offers a lot of material and functionalities that can be used to develop and sustain a fact-based world view.