## If you are interested in the art of

| Week | Day | Type | Content & Link |
|---|---|---|---|
| 1 | Monday | 📝 Blog | Daydreaming Numbers: bit.ly/2S928qS |
| 1 | Tuesday | 🎬 Video | TED Talk by David McCandless: bit.ly/2HFgBZy |
| 1 | Wednesday | 📝 Blog | Visual Capitalist: bit.ly/3kU0gON |
| 1 | Thursday | 📰 Article | HBR: A Data Scientist's real job: bit.ly/3jcEbL6 |
| 1 | Friday | 🎬 Video | TED Talk by Hans Rosling: bit.ly/30fz19v |
| 2 | Monday | 📰 Article | Narrative Visualization: stanford.io/2HFEK27 |
| 2 | Tuesday | 📝 Blog | Make it conversational: bit.ly/3cIsuJx |
| 2 | Wednesday | 🎬 Video | TED Talk by Tommy McCall: bit.ly/3cC0tn5 |
| 2 | Thursday | 🎨 Gallery | Collection of infographics: bit.ly/3kZt4p5 |
| 2 | Friday | 💻 PPT | Berkeley: Data Journalism: bit.ly/30j7Z1g |
| 3 | Monday | 🎬 Video | Data Storytelling: bit.ly/33arivv |
| 3 | Tuesday | 📝 Blog | Impact on Audience: bit.ly/338dIbQ |
| 3 | Wednesday | 🎨 Gallery | Juice Analytics Gallery: bit.ly/2G6nL8I |
| 3 | Thursday | 📕 Book | Data Journalism Handbook: bit.ly/2S94Hcd |
| 3 | Friday | 🎬 Video | TED Talk by Aaron Koblin: bit.ly/2EFZWDY |
| 4 | Monday | 🎨 Gallery | DataViz Catalogue: bit.ly/34mdy0b |
| 4 | Tuesday | 📝 Blog | Data Visualization Checklist: bit.ly/3cQ0d45 |
| 4 | Wednesday | 🎬 Video | TED Talk by Chris Jordan: bit.ly/3kTaaQT |
| 4 | Thursday | 📝 Blog | Toolbox for Data Storytelling: bit.ly/3mZrd5H |
| 4 | Friday | 🎬 Video | Storytelling with Data: bit.ly/3jd2W9Q |

**Moneyball**

**Released:** 2011

**Big names:** Brad Pitt, Philip Seymour Hoffman

**IMDb Rating:** 76%

**Plot in a nutshell:** The movie is based on the book *Moneyball: The Art of Winning an Unfair Game* by Michael Lewis. Its main protagonist is Billy Beane, who became General Manager of the baseball club Oakland Athletics in 1997. Beane faced the challenge of building a team with very limited financial resources and introduced predictive modelling and data-driven decision making to assess the performance and potential of players. Beane and his peers were successful and reached the Major League Baseball playoffs several times in a row.


**Why you should watch this movie:** *Moneyball* highlights the importance of communication skills and persistence for people aiming to drive change using data science.

__The Imitation Game__

**Released:** 2014

**Big names:** Benedict Cumberbatch, Keira Knightley

**IMDB Rating: 80%**

**Plot in a nutshell:** *The Imitation Game* is based on the real-life story of British mathematician Alan Turing, who is known as the father of modern computer science and for the test named after him. The film is centered on Turing and his team of code-breakers working hard to decipher Enigma, the encryption used by the Nazi German military. To crack the code, Turing builds a primitive computer that can test permutations far faster than any human could. The code-breakers at Bletchley Park succeeded and thereby not only helped the Allied forces secure victory over the *Wehrmacht* but also helped shorten the horrors of the Second World War.


**Why you should watch this movie:** It is a (too) late tribute to Alan Turing, who was prosecuted for his homosexuality after WWII and eventually committed suicide. The film is also about the power of machines and ethical perspectives in analytics.

__Margin Call__

**Released:** 2011

**Big names:** Paul Bettany, Stanley Tucci, Demi Moore

**IMDB Rating: 71%**

**Plot in a nutshell:** *Margin Call* is set during the first days of the 2008 global financial crisis. A junior analyst at a large Wall Street investment bank discovers a major flaw in the bank's risk evaluation model. The story unfolds overnight as the young employee informs senior managers that the bank is close to a financial disaster. They know that the firm's bankruptcy would trigger a dramatic chain reaction in the market, affecting millions of lives.


**Why you should watch this movie:** The film shows the extent to which algorithms dominate decision making in the financial industry. It also portrays the interplay between supposedly objective models and human beings driven by emotions and interests.

__21__

**Released:** 2008

**Big names:** Kate Bosworth, Laurence Fishburne

**IMDB Rating: 68%**

**Plot in a nutshell:** Six students at the renowned Massachusetts Institute of Technology (MIT) are trained in card counting and rip off Las Vegas casinos at various blackjack tables. The film is based on a true story.


**Why you should watch this movie:** It is an entertaining and fun movie. It also contains some interesting mathematical concepts, such as the Fibonacci sequence and the Monty Hall problem.
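For readers who want to check the Monty Hall result themselves, here is a small sketch that enumerates every case exactly rather than simulating it. It assumes the standard rules of the puzzle, which the post does not spell out. The car is placed uniformly at random and the first pick is uniform. The host always opens a door that hides a goat and is not the contestant's pick, choosing uniformly when two such doors exist. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def monty_hall_win_probabilities() -> dict:
    """Exact win probabilities for 'stay' and 'switch' under the standard rules."""
    doors = (0, 1, 2)
    wins = {"stay": Fraction(0), "switch": Fraction(0)}
    # Car position and first pick are independent and uniform: 9 cases of 1/9 each.
    for car, pick in product(doors, doors):
        # The host may open any door that is neither the pick nor the car.
        host_options = [d for d in doors if d not in (pick, car)]
        for opened in host_options:
            weight = Fraction(1, 9) / len(host_options)
            # Switching means taking the one door that is neither picked nor opened.
            switched = next(d for d in doors if d not in (pick, opened))
            wins["stay"] += weight * (pick == car)
            wins["switch"] += weight * (switched == car)
    return wins


print(monty_hall_win_probabilities())
```

Staying wins only when the first pick was already right (1/3). Switching wins in every other case, which gives 2/3.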

## We hope our tips are valuable to you and that you enjoy some of the flicks. 📺🎬 🍿☕🍷

**Take care and all the best on behalf of joint systems**

Johannes

## Particularly in uncertain times like these, organizations strive to predict the future as well as possible. We have previously explored **how** to forecast future income from the past income trajectory several times, for instance in these blog posts. We now want to go a step further and investigate the **relationship between fundraising income and the general economic climate**. In particular, we explore whether economic indicators carry extra information that can improve income forecasting tools.


**Introduction**

We analyse the **amount of sporadic donations** to a charitable non-profit organization over the **period from 2000 to 2019**, together with **three economic data sets from the country the NPO is based in**: the **national unemployment rate**, the **national stock market index** and an **index of economic activity in retail**, published by the national bureau of statistics. We chose these data sets for three reasons. They paint a picture of the general economic climate, and they are relatively easy to access down to the monthly level. Finally, to the extent that they exhibit seasonality, they follow the same yearly rhythm as the sporadic donations, which simplifies the statistical analyses. All statistical analyses and models in this blog post were implemented in **Python**, making heavy use of the *statsmodels* library.

All four data sets are **time series**, i.e. sets of data points collected at discrete, ordered points in time. We have already looked at time series in detail in a past entry on this blog. An important property of time series is **stationarity**, which we discussed here. As in that post, we use the **Dickey-Fuller test** to investigate whether our time series are stationary, which can be done in Python like this:


If a time series turns out to be non-stationary, **differencing** it (subtracting the previous value from the current one) can help make the series stationary. However, our time series exhibit strong yearly patterns, so in our case it also makes sense to take their **seasonal differences**, subtracting the value from the same month of the previous year.
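Both transformations map directly onto pandas' `Series.diff`. The sketch below assumes monthly data, so the seasonal lag is 12. The function name is illustrative.

```python
import numpy as np
import pandas as pd


def make_stationary(series: pd.Series, season_length: int = 12) -> pd.Series:
    """First difference followed by a seasonal difference (lag 12 for monthly data)."""
    first_diff = series.diff(1)                     # x_t - x_{t-1}: removes a (local) trend
    seasonal_diff = first_diff.diff(season_length)  # subtract the same month one year earlier
    return seasonal_diff.dropna()                   # the first 1 + season_length values are lost
```

For a series made of a linear trend plus a fixed yearly pattern, this combination removes both components entirely. The result is zero everywhere, which is an easy way to sanity-check the order of operations.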

**Granger causality**

Using **Pearson’s Chi-Squared test**, we examine the (now stationary) series for **Granger causality**, which tells us whether past values of one series help in forecasting the future trajectory of another series. The test is applied pairwise to two time series, with the null hypothesis being that the second time series does not Granger-cause the first. The following function returns a data frame with the p-values of the tests investigating whether the column variable Granger-causes the row variable. The first row is of particular interest to us, as it shows whether or not the economic data sets Granger-cause the amount of sporadic donations. At a significance level of 0.05, we cannot reject the null hypothesis of no Granger causality between the economic indicators and the sporadic donations for any of the three time series.


**Cointegration**

A second property we examine is **cointegration**. We consider a set of non-stationary time series **cointegrated** if there exists a linear combination of them that is stationary. Cointegrated time series share a common long-term equilibrium, and we can use them to predict each other’s future trajectory using a process called **Vector autoregression (VAR)**. A common test for cointegration is the **Johansen cointegration test**. In the following, we define a function that returns a data frame with the test statistic and the critical values of the Johansen test, leading to these results:


## Vector autoregression (VAR)

Given these test results, there is little justification for building a **VAR-model**. If we ignore our test results for a moment and build one anyway, we can immediately see that the model falls catastrophically short. The black line in the graph below shows the actual time series of sporadic donations. We used the data up to 2016 as our training set and constructed a **VAR-model** with the Python library *statsmodels*. It forecasts the sporadic donations from 2017 to 2019 (the red line in the graph), and we evaluate it against the actual data for those years. As we can see, the model does not forecast the amount of sporadic donations very well: it captures the seasonality but fails to accurately predict the trend and overall trajectory.

**ARIMA and ARIMAX**

We therefore turn to the **ARIMAX**-model. It generalises the **ARIMA**-model, which we have used previously on this blog, by also taking into account data from external variables: in our case, the economic time series.

**ARIMA** models are composed of three parts:

- **AR** for autoregression, indicating a regression on the time series’ past values,
- **I** for integration, signifying differencing terms in the case of a non-stationary time series, and
- **MA** for moving average, a regression on past white noise (error) terms.

**ARIMAX** takes all three of those terms and adds data from external variables (other time series) to better forecast the time series at hand. Both **ARIMA** and **ARIMAX** are implemented in Python as part of the *statsmodels* library. The *pmdarima* library adds an *auto_arima* function modelled on R’s *auto.arima* function, which allows a quick search through the possible parameters of the **ARIMA(X)** model.

We used all four time series to construct an **ARIMAX**-model, using the economic data to help forecast the amount of sporadic donations. Again, we used the data up to 2016 as our training set and the data from 2017 to 2019 as a test set to evaluate the results. We also built a standard **ARIMA**-model that forecasts the sporadic donations based only on the time series’ own history. Interestingly, the two models’ forecasts did not differ much from each other:

**It thus seems that the economic data we provided (the unemployment rate, the stock index and a retail index) did not add much information for forecasting the amount of sporadic donations.** Upon closer inspection, the **ARIMA**-model, which relies solely on the historical data of the time series itself, even performed slightly better. Combined with the previous test results (the lack of Granger causality and cointegration), this suggests that those economic indicators have little measurable effect on the development of sporadic donations. They are therefore of little use for improving forecasting models for the amount of sporadic donations.
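The post does not say which error measure was used to compare the two forecasts. A common choice is to compute a few standard metrics on the test period, as in this sketch. MAE, RMSE and MAPE are our assumption here, and MAPE requires non-zero actual values.

```python
import numpy as np


def forecast_errors(actual, forecast) -> dict:
    """Standard accuracy metrics for a forecast against the actual test-period values."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    errors = actual - forecast
    return {
        "MAE": float(np.mean(np.abs(errors))),                  # mean absolute error
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),           # penalises large misses more
        "MAPE_%": float(100 * np.mean(np.abs(errors / actual))),  # scale-free, needs actual != 0
    }
```

Comparing `forecast_errors(actual, arima_forecast)` with `forecast_errors(actual, arimax_forecast)` makes "slightly better" quantifiable; lower values are better for all three metrics.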

**However, we believe that the approach of using time series analysis for fundraising income predictions hand in hand with open (economic) data deserves further focus as other base data, time frames, data sources etc. might lead to different results!**

**COVID-19 impacts on fundraising - the case of Face-to-Face**

The ongoing crisis caused by the Corona pandemic has brought huge challenges for many people all over the globe and disruption in all types of industries. Its evident impacts, and perhaps those yet to come, pose serious threats for numerous fundraising nonprofit organizations. The pandemic has significantly changed the conditions under which widespread fundraising channels can be used. With lockdowns all over the world drastically reducing mobility, Corona has most obviously affected **Face-to-Face fundraising (F2F)**. Its contemporary form was introduced in the 1990s (in Austria, by the way, where the askyourdata-team is based), and since then F2F has become an **enormously important channel for many charities**, particularly for the acquisition of regular supporters.

In the majority of countries affected by COVID-19, people were not completely forced to stay inside but were allowed to go out for certain purposes (work, groceries, walking etc.). One can get an idea of the impact on people's mobility using the **currently publicly available mobility data from Google**. You can download a flat file to play with on this website. We obtained the global dataset and put together the following **dashboard**, which we invite you to take a closer look at. Just click the two little arrows in the bottom right corner of the dashboard or follow this link.
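If you want to reproduce a national view from the flat file, the sketch below shows one possible preparation step. It assumes the column layout of Google's global mobility CSV as we know it: `country_region`, `sub_region_1` (empty for national rows), `date` and one `..._percent_change_from_baseline` column per place category. Please check these names against the file you download. The function name is illustrative, and the sketch is not the code behind our dashboard.

```python
import pandas as pd


def national_weekly_mobility(
    report: pd.DataFrame,
    country: str,
    column: str = "retail_and_recreation_percent_change_from_baseline",
) -> pd.Series:
    """Weekly average mobility change (vs. baseline) for one country's national rows."""
    # National-level rows are assumed to have no sub-region entry.
    national = report[(report["country_region"] == country) & report["sub_region_1"].isna()]
    series = national.set_index(pd.to_datetime(national["date"]))[column]
    # Calendar weeks ending on Sunday; the mean smooths out day-of-week effects.
    return series.resample("W").mean()
```

Usage: `national_weekly_mobility(pd.read_csv("Global_Mobility_Report.csv", low_memory=False), "Austria")`, where the file name is an assumption.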

If you are looking for an **insightful situation report** on the **state of Face-to-Face fundraising in times of Corona** from a global perspective, we can recommend the **recording of a recent panel discussion hosted by** *The Resource Alliance*. In short, F2F teams all across the world have already proven their adaptability in many ways...

**What Now?**

I cannot count how many times I have recently come across quotes about the opportunities that lie in crises. In many cases, they were mere platitudes. At the same time, I deeply believe that the world will gradually get closer to how it was before Corona. This will be reflected by people sitting in cafes after some relaxed high-street shopping, enjoying the sun... completely mask-free.

**Will Face-to-Face fundraising be exactly the same then?**

Let us start tackling this question with an analogy. COVID seems to have changed almost everything in our lives, but **the world keeps turning, for better and for worse**. This means, for instance, that climate change will not pause just because we are busy with another crisis. The same applies, in a more positive way, to the ongoing digital revolution and the expansion of analytics and data science across all types of industries. From my point of view, **F2F fundraising has been keeping pace with technological developments quite well in the recent past**. In many countries, you are quite likely to come across F2F agents using tablets, simple and customer-oriented processes, instant messaging services etc. Our hypothesis, however, is that there is scope for even further innovation...

**The Power of Where: Using Location Intelligence in F2F?**

This nice newspaper article illustrates how companies use geolocation data to target their (potential) customers. One of our favourite blogs, *Towards Data Science*, has summarised the Power of Where and goes as far as to postulate that location analytics will change the world. Location analytics also has the potential to make contributions during and after this pandemic, as outlined in this recent article by the platform Carto.

Seen from a practical perspective, what might use cases of Location Intelligence in F2F fundraising look like? Many **mobile network providers** across the globe have started offering services in the context of **Mobile Location Analytics**, as US provider Verizon calls it. These services are typically not advertised as prominently as other products and tools, but they are there. What might "Mobile Location Analytics" mean? Well, in a retail context, interesting "research questions" might be:

- How do mobile users get to brick-and-mortar stores?
- Where do they come from and where do they go subsequently?
- Which locations do they frequently visit?
- ...

Seizing this idea, **would it not be interesting to know who is moving across the (high-)street or the shopping center, and when, where the next large-scale F2F campaign will take place?** Of course, nonprofits taking up such services need **awareness of data protection and privacy** (although this is something the network providers have to take care of) as well as prepared donor communication.

**Admittedly, we are suggesting a somewhat ambiguous and maybe even controversial approach as a potential add-on to professionalized Face-to-Face fundraising. What is your opinion?**

## In the modern economy, producing forecasts as the best possible predictions of future income has become imperative across industries. The charitable nonprofit sector, too, has seen increasing adoption of forecasting methods in recent years, which is why we already dealt with this topic on this blog some time ago. Forecasting is challenging enough in times of economic stability but seems almost impossible after the advent of a “black swan” like the coronavirus. Of course, the future is uncertain, just as Ilya Prigogine said. At the same time, there is a family of statistical models that not only provides "well-informed" income predictions but is particularly helpful for finding out to what extent current data deviates from the "expected normal". Let us therefore take a closer look at **fundraising income forecasting using time series.**

**The basics**

In their book Financial Management for Nonprofit Organizations from 2018 (a recommendable read, by the way), Zietlow et al. differentiate between several forecasting types:

A **causal model** is one in which an analyst or data scientist has defined one (simple regression) or several (multiple regression) causal factors for the dependent variable she or he is trying to predict. For income prediction on an aggregated level, such drivers (i.e. independent variables) might be found both at the donor level and among exogenous factors.

**Time series models** work differently and tend to be more complex. They essentially use historical data to predict future outcomes. They are widely applied to so-called non-stationary data, which usually have to be made stationary first. A stationary time series is one whose statistical properties do not depend on the point in time at which it is observed. From a statistical standpoint, a time series is stationary if its mean, variance and autocovariance are time-invariant. The requirement of stationarity (i.e. stability) makes intuitive sense. Time series models use previous lags to model developments, and modelling stable series with consistent properties implies lower overall uncertainty.
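Written out, the three conditions above are what is usually called weak (covariance) stationarity. For a series $X_t$, where the symbols $\mu$, $\sigma^2$ and $\gamma(h)$ are notation introduced here, they read:

```latex
\begin{aligned}
\mathbb{E}[X_t] &= \mu
  && \text{for all } t \quad \text{(constant mean)},\\
\operatorname{Var}(X_t) &= \sigma^2 < \infty
  && \text{for all } t \quad \text{(constant, finite variance)},\\
\operatorname{Cov}(X_t, X_{t+h}) &= \gamma(h)
  && \text{for all } t \text{ and every lag } h \quad \text{(autocovariance depends only on } h\text{)}.
\end{aligned}
```

A series with an upward trend violates the first condition, because its mean grows with $t$. This is why differencing, which removes such a trend, is a common preparatory step.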

**Time Series in a Nutshell**

Time series are often comprised of the following components, although not all time series will necessarily have all or any of them.

- **Trend:** The time series contains a trend when there is a long-term increase or decrease in the data. This trend does not have to be linear.
- **Seasonal:** A *seasonal* pattern exists when a time series is affected by seasonal factors such as the month of the year or the day of the week.
- **Cycle:** A cycle is present when the data shows rises and falls that do not have a fixed frequency. These fluctuations are often related to economic conditions.
- **Noise:** The remaining random variation in the data.

The model we will take a closer look at for predicting fundraising income is ARIMA, which stands for Auto-Regressive Integrated Moving Average. ARIMA models can be seen as one of the most general approaches to modelling time series. Whether the ARIMA algorithm can be applied to the historic income data right away can be evaluated with statistical methods such as the Dickey-Fuller test, whose null hypothesis is that the series is non-stationary. If this hypothesis cannot be rejected (i.e. stationarity cannot be assumed for the data under scrutiny), there are still ways to make the time series usable. Differencing and log transformations of the data can be applied in a preparatory step.

**Applied Example**

The raw data used for the following example is very simple. Historic fundraising income from 2012 to 2019 is extracted in a simple structure (Donor ID, payment date, amount). Next, the data is aggregated by date. Having distinct dates in the dataset allows both a weekly and a monthly perspective on the data. The aggregated series is then decomposed into its components. The resulting charts look as follows:

The top chart shows the observed series (*No. donations*). The second chart from the top shows an overall upward trend in the data.
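The aggregation step described above (payments per date, then per week) could be sketched in pandas as follows. The column names `donor_id`, `payment_date` and `amount` are hypothetical stand-ins for the extract's Donor ID, payment date and amount fields.

```python
import pandas as pd


def weekly_income(payments: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw payments (donor_id, payment_date, amount) to weekly count and sum."""
    daily = (
        payments.assign(payment_date=pd.to_datetime(payments["payment_date"]))
        .groupby("payment_date")["amount"]
        .agg(n_donations="count", income="sum")  # one row per distinct date
    )
    # Weeks end on Sunday; weeks without payments become 0 rather than missing.
    return daily.resample("W").sum()
```

Replacing `"W"` with a monthly frequency gives the monthly perspective mentioned above. The alias is `"M"` in older pandas versions and `"ME"` in pandas 2.2 and later.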

For research purposes, we used the data from 2012 to 2018 as a "training set" and applied ARIMA to generate a prediction for the already completed year 2019. We did this both for the accumulated donation counts and for the donation sum per week. The week-based prediction for the donations looks like this, with the **blue line being the prediction and the red line being the actual data:**

**The chart above shows that the blue predicted line generally runs close to the red line representing the actual data.** The actual data also oscillates within the forecast's 80% and 95% prediction intervals. This is what the actual and predicted data look like for accumulated income over the weeks of 2019.

**Conclusion**

**To what extent can a time series approach inform fundraising planning and decision making in a highly dynamic environment shaped by the Corona pandemic?** Well, it is still quite unclear how the global economy and individual countries will develop in the near future. It also remains to be seen how the Corona crisis will affect fundraising markets in the medium and long run.

**In essence, time series models offer an interesting approach for a sophisticated analysis of how far income deviates from its normal level.** Such a deviation is most probably caused by Corona, either directly (e.g. face-to-face fundraising currently stopped) or indirectly (e.g. rising unemployment)...
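One simple way to express "deviation from the normal level" is to compare actual income with a forecast produced before the crisis. The forecast and its prediction interval are taken as the "expected normal". The sketch below computes the relative deviation and flags periods where the actual value leaves the interval. The function and its inputs are illustrative assumptions, not a method from the original post.

```python
import numpy as np


def deviation_report(actual, predicted, lower, upper):
    """Relative deviation from the forecast and a flag for values outside the interval."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    relative_deviation = (actual - predicted) / predicted  # e.g. -0.2 means 20% below forecast
    outside = (actual < np.asarray(lower)) | (actual > np.asarray(upper))
    return relative_deviation, outside
```

A run of consecutive weeks flagged as outside the (say) 95% interval is a stronger signal of a structural shift than a single outlier.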

**One must have a tough mind - and a soft heart.**

**The good old days of Web 1.0**

A world without the WWW is hard to imagine for most of us, although it is not that far in the past. I get a little nostalgic when I think of setting up my first email address back in 1999. Back then the world was like this (OK, this __is__ quite some time ago...).

Some 20 years ago, the overall number of websites was ridiculously low by today's standards. To get any idea of how a page was performing, web marketers had to rely on things like the notorious visitor counters or on wrangling log files created by web servers. The latter was definitely a hassle. It required access to these files in the first place as well as configuration-file hacking or programming, and it only produced static reports.

**Data-driven Digital Fundraising today**

Nowadays, the common denominator of most organizations with a digital presence is running a website. In marketing and sales as well as in fundraising, this page often acts as a kind of hub to which other digital channels and communication activities point. Running a website requires sound management. And as management is in essence a data-driven discipline, an evidence-based approach is also key in a web context...

This is where **Google Analytics** comes into play. It can answer questions like these for online fundraisers and decision makers in general:

- *How did users find and get to the site?*
- *Who visits the website and when?*
- *What kind of traffic does the site generate?*
- *How do users behave once they are on the site?*
- *How do users interact with the website, and how engaged are they?*
- *What are the most and least interesting pages?*
- *What drives conversions? Who is most likely to convert?*
- *...*

Google Analytics is free in its base version, which suits many website use cases. For big data applications, there is the even more powerful **Google Analytics 360** as part of the **Google Marketing Platform**.

To set up Google Analytics (free version), all you need is a Google Account and the possibility to embed the little JavaScript code snippet that does the magic, apart from a website you own, of course. This can be done either by yourself or by a web developer. Platforms like *WordPress* or *Weebly* (on which this blog runs) make life even easier with setup assistants. For those interested in what is happening under the hood, this infographic might be interesting.

**Google Analytics in a nutshell**

Google Analytics is embedded in an ecosystem together with tools like **Google Tag Manager**, **Google AdWords** etc. This is particularly relevant in a more advanced and differentiated online fundraising context.

Google Analytics offers a lot of **potentially insightful reports** out of the box, which can be used from day one.

**The main areas are**:

- **Realtime**: This is where you will find live data, i.e. how many users are on the site right now, where they are located, which devices they use, where they came from etc. It is a good point of entry, particularly for beginners, not least because it is fun to play with. Real-life use cases might include portals with peak days of traffic (e.g. Black Friday, Giving Tuesday etc.).
- **Audience**: This is the node where you will find demographic, geographic, structural and technological information about the users on your site. The menu looks impressive at first glance, but certain information (e.g. age and gender) requires activating additional tracking in the respective account. This might affect the privacy policy of your site. Sociodemographic data like age and gender are mainly derived from people who are logged in to a Google account and from third-party DoubleClick cookies (user tracking cookies).
- **Acquisition**: This node helps you understand where your organization is acquiring its visitors, i.e. whether they come via search engines, display advertising, social media, partner links etc. This is also where to look for specific campaign data, if this is relevant for your site.
- **Behaviour**: This node contains reports on how visitors interact with the content on the respective website. It includes information on the “user journey” of visitors and might provide insights into how they experience your site.
- **Conversion**: This node holds more advanced reports that require the setup of Goals and/or Ecommerce Tracking. If you have one or several clear-cut “fundraising sales objectives” for your site, using these reports is definitely beneficial. The Conversion section not only provides information on so-called Goal Completion but also shows the path users took until they converted, i.e. the so-called sales funnel.

**Learn more**

There is a plethora of resources from Google and other providers to take a deep dive into Google Analytics. These are a few:

- Google Analytics Academy
- Google Analytics Support
- YouTube Channel about Google Analytics
- Blog about Google Analytics

**Advanced Analytics @ Google Analytics**

Google Analytics can also be interesting in a broader analytics and data science context. Think of the following use cases:

- **Create insightful and easy-to-use dashboards** for various audiences using tools like Google Data Studio or, our favourite, Microsoft Power BI.
- Combined with a CRM system, analysts and decision makers will be interested in what the **real digital sources of “fundraising sales”** (e.g. committed gifts) are. In many cases, the respective sign-ups are labelled as generally as “digital”. This certainly does not account for all the digital channels, their specifics and the differentiated measures they require.
- Get raw data from Google Analytics to **apply data science methods**, such as time series analyses to predict website traffic, clustering methods, or attribution models tailored to insights from your conversion data.
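To illustrate what "tailoring attribution models" can mean once raw conversion paths are available, here is a minimal sketch comparing two common heuristic models. *Last touch* gives all credit to the final channel before conversion. *Linear* splits the credit equally across all touchpoints. The input format, a list of channel sequences with a conversion value, is a hypothetical simplification. It is not the structure of a Google Analytics export.

```python
from collections import defaultdict


def attribute_conversions(paths, model="linear"):
    """Distribute conversion value across channels.

    paths: list of (channel_sequence, conversion_value) tuples,
           e.g. (["search", "social", "email"], 50.0)
    model: "last_touch" credits the final channel only;
           "linear" splits the value equally across all touchpoints.
    """
    credit = defaultdict(float)
    for channels, value in paths:
        if not channels:
            continue  # a conversion without recorded touchpoints cannot be attributed
        if model == "last_touch":
            credit[channels[-1]] += value
        elif model == "linear":
            share = value / len(channels)
            for channel in channels:
                credit[channel] += share
        else:
            raise ValueError(f"unknown model: {model}")
    return dict(credit)


paths = [(["search", "social", "email"], 30.0), (["email"], 10.0)]
print(attribute_conversions(paths, "last_touch"))
print(attribute_conversions(paths, "linear"))
```

The gap between the two outputs is exactly the question a "digital" label hides. Under last touch, email looks like the only channel that matters. The linear view shows that search and social also contributed to the gifts.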

**To cut a long story short: in addition to the digital fundraising experts taking care of the website and digital channels, analysts, data scientists etc. working in a fundraising context should care about digital data analyses in general and Google Analytics in particular. As 2020 has just started, let's get cracking! :-)**

All the best and keep in touch at askyourdata.co or johannes.spiess@sos-kd.org.

Johannes
