Ask your data
  • Blog
  • About & Contact

Must-watch movies for analytics lovers and data aficionados

9/26/2020

Autumn is approaching in many parts of the world. As the days grow shorter and darker, many people like to get comfortable on the sofa and watch an interesting movie. If you are into data analytics, statistics and artificial intelligence, we have a few recommendations for you.

Moneyball

Released: 2011

Big names: Brad Pitt, Philip Seymour Hoffman

IMDb rating: 7.6/10

Plot in a nutshell: The movie is based on the book Moneyball: The Art of Winning an Unfair Game by Michael Lewis. Its protagonist is Billy Beane, who became General Manager of the Oakland Athletics baseball club in 1997. Beane faced the challenge of building a competitive team with very limited financial resources and introduced predictive modelling and data-driven decision making to assess the performance and potential of players. Beane and his colleagues succeeded and reached the Major League Baseball playoffs several years in a row.

Why you should watch this movie: Moneyball highlights the importance of communication skills and persistence for people aiming to drive change using data science.

The Imitation Game

Released: 2014

Big names: Benedict Cumberbatch, Keira Knightley

IMDb rating: 8.0/10

Plot in a nutshell: The Imitation Game is based on the real-life story of the British mathematician Alan Turing, who is regarded as the father of modern computer science and is known for the test named after him. The film is centered on Turing and his team of code-breakers as they work to decipher Enigma, the encryption used by the Nazi German military. To crack the code, Turing builds an early machine that can check possible settings far faster than any human could. The code-breakers at Bletchley Park succeeded, and their work not only helped the Allied forces defeat the Wehrmacht but also helped to shorten the horrors of the Second World War.

Why you should watch this movie: It is a (too) late tribute to Alan Turing, who was prosecuted for his homosexuality after WWII and eventually committed suicide. The film is also about the power of machines and about ethical perspectives in analytics.

Margin Call

Released: 2011

Big names: Paul Bettany, Stanley Tucci, Demi Moore

IMDb rating: 7.1/10

Plot in a nutshell: Margin Call is set during the first days of the 2008 global financial crisis. A junior analyst at a large Wall Street investment bank discovers a major flaw in the bank's risk evaluation model. The story unfolds over one night as the young employee informs senior managers that the bank is close to financial disaster, knowing that the firm's bankruptcy would set off a dramatic chain reaction in the market and affect millions of lives.

Why you should watch this movie: The film shows to what extent algorithms dominate decision-making in the financial industry. It also portrays the interplay between supposedly objective models and human beings driven by emotions and interests.

21

Released: 2008

Big names: Kate Bosworth, Laurence Fishburne

IMDb rating: 6.8/10

Plot in a nutshell: Six students at the renowned Massachusetts Institute of Technology (MIT) are trained in card counting and win big at the blackjack tables of various Las Vegas casinos. The film is based on a true story.

Why you should watch this movie: It is an entertaining, fun movie. In addition, it features some interesting mathematical concepts such as the Fibonacci sequence and the Monty Hall problem.

We hope you find our tips useful and enjoy the films. 📺🎬🍿☕🍷
Take care and all the best, on behalf of joint systems,
Johannes

Using Open Data for Fundraising

6/27/2019

In last month's blog post, we referred to the influential Economist article stating that data is the new oil. One interpretation of this metaphor is that data can be seen as "fuel" for today's economy, and this also applies to the nonprofit sector. The reference to oil does not necessarily mean, however, that "digging for data" comes with high costs for acquiring and collecting it. There is quite a lot of open data around that might be useful for fundraising, and we see a broad range of possible use cases:
  • Market Research
  • Input for Fundraising Strategy Development and Planning
  • Comparative Analyses and Benchmarking
  • Prospect Research
  • ...

We will rush through two hands-on examples to illustrate how to obtain, process and visualize open data that may be of value to fundraising decision makers and analysts.

Example 1: Visualizing income data on regional level

Geographical disparities can be relevant for certain fundraising practices such as events or contact with High Net Worth Individuals (HNWI). Some regions are "wealthier" than others in terms of their average income levels, which allows some conclusions about the overall fundraising potential of a given area.

An aggregation level that we find useful and a good "common denominator" is the system of so-called NUTS regions used by the European Union. NUTS sounds like an English acronym; however, it is a French abbreviation and stands for Nomenclature des unités territoriales statistiques, in English the Nomenclature of Territorial Units for Statistics.
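The NUTS levels can be read directly off the region codes: the two-letter country code is level 0, and each additional character moves one level down. Austria (AT), for instance, contains the NUTS1 region AT1, which contains the NUTS2 region AT13 (Vienna), which in turn contains the NUTS3 region AT130. The post itself works in R; the following is only a minimal sketch in Python, assuming the region codes arrive as plain strings (the function name `nuts_level` is hypothetical):

```python
def nuts_level(code: str) -> int:
    """Return the NUTS level encoded in a region code such as 'AT13'.

    The first two characters are the country code (level 0); every further
    character adds one level, down to NUTS3 (e.g. 'AT130').
    """
    if not 2 <= len(code) <= 5:
        raise ValueError(f"not a NUTS code: {code!r}")
    return len(code) - 2


codes = ["AT", "AT1", "AT13", "AT130", "DE", "DE21"]
# Keep only the NUTS2 regions, dropping country totals and finer units.
nuts2_codes = [c for c in codes if nuts_level(c) == 2]
print(nuts2_codes)  # ['AT13', 'DE21']
```

In a downloaded table that mixes country totals with regional rows, a filter like this would keep only the NUTS2 rows, e.g. the nine Austrian federal provinces.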

The European Union's statistical office is called Eurostat. It offers a huge database that can be accessed online and is free of charge in most cases. To download data on the regional distribution at the level of NUTS2 areas, we use the dedicated R package eurostat. NUTS2 areas are quite "intuitive" in our eyes: in Austria, for instance, they correspond to the nine federal provinces, whereas Germany with its 16 federal states is split into 38 regions. For the sake of completeness, the code snippet below shows how we searched for suitable income data. Our code, by the way, is inspired by this recommendable tutorial on eurostat.
Code Snippet 1.1: Load libraries and search for data

    
The query above returns three tables that contain income data. We decided to use the table with the code tgs00026, which contains data on the disposable income of private households at the regional level.
Code Snippet 1.2: Obtain both income and geospatial data

    
We now have two dataframes: one with the income data, including a region identifier, and one with the actual geospatial data. We merge the two and dive straight into the visualization:
Code Snippet 1.3: Merge datasets into one and visualize data

    
As we have already used a little French today, we can now say voilà: the overview visualization we were striving for is finished and presentable:
[Figure: map of disposable household income by NUTS2 region]
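The merge behind such a map is essentially a left join on the region code: every region in the geospatial data keeps its shape, and regions without income data simply end up with an empty value. A minimal sketch in Python, with plain lists of dictionaries standing in for the two dataframes; the key name `geo`, the column `income`, the helper `left_join` and all figures are hypothetical:

```python
from typing import Any

Row = dict[str, Any]


def left_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Attach columns from `right` to every row of `left` sharing the same key.

    Rows of `left` without a partner get None for the right-hand columns,
    so every region can still be drawn (e.g. greyed out) on the map.
    """
    lookup = {row[key]: row for row in right}  # last row wins on duplicate keys
    right_cols = {col for row in right for col in row if col != key}
    merged = []
    for row in left:
        partner = lookup.get(row[key], {})
        extra = {col: partner.get(col) for col in right_cols}
        merged.append({**row, **extra})
    return merged


# Hypothetical example data: geometry placeholders and made-up income values.
geodata = [
    {"geo": "AT13", "geometry": "<polygon AT13>"},
    {"geo": "AT33", "geometry": "<polygon AT33>"},
    {"geo": "DE21", "geometry": "<polygon DE21>"},
]
income = [
    {"geo": "AT13", "income": 21000},
    {"geo": "DE21", "income": 25000},
]

merged = left_join(geodata, income, key="geo")
for row in merged:
    print(row["geo"], row["income"])
```

In practice an income table like tgs00026 holds several years per region, so it would be filtered to a single year before the join; otherwise the lookup above silently keeps only the last row per region code.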
Example 2: What about large companies and their CEOs?

The name Forbes might ring a bell when it comes to lists of the wealthiest people on the planet. Forbes also publishes data on the largest companies every year. We signed up on the platform data.world, which we can also recommend, and obtained the 2018 dataset on the 2,000 largest corporations (available here; sign-up is necessary). Luckily, this data contains country information and also names the respective CEO, but let's take it step by step. As we are working with a flat file, we did some data preparation before loading it into R and ended up with a dataset that contains the following variables:
  • Rank
  • Name
  • Country
  • Profits
  • Position
  • HQ
  • Revenue
  • MarketValue
  • Assets
  • Ceo
  • Industry
Now we go ahead and load and filter the data for the visualization:
Code Snippet 2.1: Read the file and select one country

    
So far, so good. We now have a condensed dataframe in R that contains the 22 Dutch corporations listed in the Forbes 2000. We prepare a bubble chart with the companies' revenues and profits on the axes, and the market value (essentially the value of the company's shares) is reflected by the size of each bubble.
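Selecting one country from the flat file boils down to a filter on the Country column, plus converting the numeric columns used for plotting. A minimal sketch in Python using only the standard library; the CSV excerpt, company names, figures and the helper `companies_in` are invented for illustration, and only the column names follow the variable list above:

```python
import csv
import io

# Hypothetical excerpt of the prepared flat file (invented rows and figures).
RAW = """Rank,Name,Country,Profits,Revenue,MarketValue
1,Company A,Netherlands,10.5,200.0,150.0
2,Company B,Germany,8.0,180.0,90.0
3,Company C,Netherlands,1.2,30.0,25.0
"""


def companies_in(country: str, text: str) -> list[dict]:
    """Return the rows of `text` whose Country column equals `country`."""
    selected = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["Country"] == country:
            # csv yields strings; convert the columns needed for the chart.
            for col in ("Profits", "Revenue", "MarketValue"):
                row[col] = float(row[col])
            selected.append(row)
    return selected


dutch = companies_in("Netherlands", RAW)
print([c["Name"] for c in dutch])  # ['Company A', 'Company C']
```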
Code Snippet 2.2: Draw the bubble chart

    
This is our result:
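[Figure: bubble chart of revenue and profits for the 22 Dutch Forbes 2000 companies, bubble size showing market value]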
We see that the largest firm in the Netherlands by revenue is Royal Dutch Shell, whose CEO is Bernardus Cornelis Margriet van Beurden, or just "Ben" van Beurden. Not only is Shell's revenue of some 320 billion US dollars impressive, but so is its market value of some 300 billion, which makes it the largest bubble in the plot.
[Image]
We can also spot a cluster of companies with high profits in the lower right of the bubble chart. This cluster includes "big names" like Unilever but also firms that are less widely known:
[Image]
Even though it might not be easy to meet some of these CEOs right away, it could be worthwhile to research whether these large and powerful corporations have CSR departments, foundations etc. that one can get in touch with.
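A note on bubble charts like the one above: if market value is to be reflected by bubble size, it should drive the bubble's area rather than its radius; otherwise the largest companies look disproportionately dominant. ggplot2, for instance, maps values to area by default via `scale_size()`. A minimal Python sketch of the underlying arithmetic, where `bubble_radii` and its `max_radius` parameter are hypothetical:

```python
import math


def bubble_radii(values: list[float], max_radius: float = 30.0) -> list[float]:
    """Return radii whose circle AREAS are proportional to `values`.

    Area grows with the square of the radius, so the radius must grow with
    the square root of the value: four times the market value yields twice
    the radius and therefore four times the area.
    """
    if any(v < 0 for v in values):
        raise ValueError("bubble sizes need non-negative values")
    largest = max(values)
    if largest == 0:
        raise ValueError("need at least one positive value")
    # The largest value gets max_radius; all others are scaled relative to it.
    return [max_radius * math.sqrt(v / largest) for v in values]


print(bubble_radii([400.0, 100.0, 25.0]))  # [30.0, 15.0, 7.5]
```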

That's it for this month's post. We hope you stay "open" over the summer break, not only towards data but also towards our blog :-).

The Datafication of Football

5/28/2017

I have to be honest at this time of the year. Now that most European football leagues are coming to an end, their champions have been decided and the Champions League final is approaching, I must admit that, as a football fan, I will find the summer somewhat empty. Being realistic at the same time, it is hard to think too idealistically about football clubs, players and their fans: football has become a billion-euro business, and renowned clubs nowadays have to think in terms of a globalized market. Regardless of that, football is still frequently associated with concepts like intuition (of coaches and players) and talent (particularly in the case of players), and I have the impression that there is still a lot of both in the game. Both are essentially unmeasurable and will keep the sport as fascinating as it is. However, a revolution has taken place over the last years that should not be overlooked: the use of data.

The American author Michael Lewis wrote the book Moneyball, which tells the story of Billy Beane, the general manager of the mediocre baseball team Oakland A’s. The team started using rigorous statistical performance measures to evaluate players, which gave it a competitive edge even against richer clubs in Major League Baseball. For those interested in the story, I can recommend the 2011 movie Moneyball starring Brad Pitt.

Data science has arrived in modern football. Manchester City, for example, employs 11 data scientists according to a Guardian article from 2014. Liverpool FC introduced the position of Director of Research some years ago; it is currently held by a theoretical physicist who earned his PhD at Cambridge University. Training has also been “datafied” at many clubs, such as TSG Hoffenheim, who secured a fourth-place finish in the German Bundesliga this season and will therefore play qualifying matches for the Champions League. The amount of data collected (often through wearable devices) and analysed in real time is huge: a 10-minute session with 10 players and three balls yields some seven million data points. The data is accessible in real time and is often used for player feedback to help players work on their weaknesses. TSG Hoffenheim works closely with the software giant SAP, which is not really a surprise as the company is the club’s main sponsor.

What is surprising is that data and football, particularly in Britain, have a rather long history together. It was in 1950 that Charles Reep, a World War II veteran and trained accountant, got so frustrated with a performance of his club Swindon Town that he started collecting data on the match during the second half. He concluded from the data that a team should move the ball toward the goal with as few passes as possible in order to score more often and thereby win matches. Reep’s ideas had a strong influence on British football and shaped what was long known as the British “kick and rush” style of play. Today the methods are more advanced and the amount of data available for analysis is immense. However, the underlying ideas and aims of number crunching in football have not changed much. Take a look at this paper published in connection with this year’s Sports Analytics Conference hosted by the MIT Sloan School of Management. That’s counting a lot of passes from a lot of players in a highly advanced manner!

If you are a data scientist interested in football and want to play around a bit (maybe in the football-free zone over the summer), you might want to get inspired by people like Martin Eastwood and his talk Predicting Football Using R. Everything you need to re-do the example can be found in this GitHub repository. Martin also has a football analytics blog worth reading, called Pena.lt/y. If you want to dive deeper (what a wordplay in a football-related blog post 😊), there is, for instance, a whole SportsAnalytics package for R, a selection of data sets, functions to fetch sports data, examples, and demos. Not to mention the countless possibilities of scraping online football databases with R.

So, let’s keep the ball rolling. I wish you a good start to the summer.

Turning data into art

1/21/2017

[Image]

Artist R. Luke DuBois transforms data into artworks. In his talk, DuBois presents nine very different projects, ranging from American presidents to Britney Spears, and reflects critically on the way we use data in our culture. Just click the picture above to see the artist in action. By the way, his website, which features further projects, is well worth a visit: lukedubois.com/

Good News!

1/19/2017

Media confront us with bad news every day. In the face of political uncertainties and extremism, climate change, growing inequalities etc., it sometimes seems hard not to see the world as a rather hopeless place. However, we must not forget that, with regard to many aspects of human life such as health, education and life expectancy, we actually live in the best world and time since the beginning of mankind. The good news is: there is plenty of data that supports this view of the world! There are some people and websites I can really recommend in this context.

If you watch TED talks, you might know Hans Rosling as an inspiring and humorous speaker. Hans is one of the founders of the Gapminder Foundation (www.gapminder.org/). The foundation's website offers great data visualization tools and a large number of datasets that can be downloaded for free.

Oxford economist Max Roser (www.maxroser.com/), by the way an alumnus of Innsbruck University like myself, is the mastermind behind the platform Our World in Data (https://ourworldindata.org/), which also offers a lot of material and features that can be used to develop and sustain a fact-based world view.

Copyright © 2018