COVID-19 impacts on fundraising - the case of Face-to-Face
The ongoing crisis caused by the Corona pandemic has brought huge challenges for people all over the globe and dislocation in all types of industries. The impacts already evident, and perhaps those yet to come, pose serious threats for numerous fundraising nonprofit organizations. The pandemic has significantly affected the conditions under which widespread fundraising channels can be used. With lockdowns all over the world drastically reducing mobility, Corona has most obviously affected Face-to-Face fundraising (F2F). Since the introduction of its contemporary form in the 1990s (by the way in Austria, where the askyourdata team is based), F2F has become an enormously important channel for many charities, particularly for the acquisition of regular supporters.
In the majority of countries affected by COVID-19, people were not completely forced to stay inside but allowed to move for certain purposes (work, groceries, walking etc.). One can get an idea of the impact on people's mobility using the mobility data Google currently makes publicly available. You can go ahead and download a flat file to play with on this website. We obtained the global dataset and put together the following dashboard, which we invite you to take a closer look at. Just click the two little arrows in the bottom right corner of the dashboard or follow this link.
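If you would like to explore the data yourself before opening the dashboard, a minimal Python sketch along these lines might be a starting point (the file name and column names reflect the Community Mobility Reports format at the time of writing and may change):

```python
import pandas as pd

# Global mobility report, downloadable from https://www.google.com/covid19/mobility/
df = pd.read_csv("Global_Mobility_Report.csv", parse_dates=["date"], low_memory=False)

# Country-level rows for Austria (sub_region_1 is empty for the national aggregate)
austria = df[(df["country_region"] == "Austria") & (df["sub_region_1"].isna())]

# Weekly average change in visits to retail & recreation locations vs. baseline
weekly = (
    austria.set_index("date")["retail_and_recreation_percent_change_from_baseline"]
    .resample("W")
    .mean()
)
print(weekly.tail())
```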
If you are looking for an insightful situation report on the state of Face-to-Face fundraising in times of Corona from a global perspective, we can recommend the recording of a recent panel discussion hosted by The Resource Alliance. In short, F2F teams all across the world have already proved their adaptability in many ways...
I have lost count of how many times I have recently come across quotes about the opportunities that lie in crises. In many cases they were mere platitudes. At the same time, I deeply believe that the world will gradually get closer to how it was before Corona. This will be reflected by people sitting in cafes after some relaxed high-street shopping, enjoying the sun ... everything completely mask-free. Will Face-to-Face fundraising be exactly the same then?
Let us approach this question with an analogy. COVID seems to have changed almost everything in our lives - but the world keeps turning, for the good and the bad. This means, for instance, that Climate Change will not pause just because we are busy with another crisis. The same applies - in a more positive way - to the ongoing digital revolution as well as the expansion of analytics and data science across all types of industries. From my point of view, F2F fundraising has kept pace with technological developments quite well in the recent past. The chances of coming across F2F agents using tablets, simple and customer-oriented processes, instant messaging services etc. are quite high in many countries. Our hypothesis is, however, that there is scope for even further innovation ...
The Power of Where: Using Location Intelligence in F2F?
This nice newspaper article illustrates examples of how companies use geolocation data to target their (potential) customers. One of our favourite blogs, Towards Data Science, has summarised the Power of Where and goes as far as to postulate that location analytics will change the world. Location analytics also has the potential to make contributions during and after this pandemic, as outlined in this recent article by the platform Carto.
Seen from a practical perspective, what might use cases of Location Intelligence be in F2F fundraising? Many mobile network providers across the globe have started offering services in the context of Mobile Location Analytics, as US-provider Verizon calls it. These services are typically not as prominently advertised as other products and tools - but they are there. What might "Mobile Location Analytics" mean? Well, in a retail context, interesting "research questions" might be:
Building on this idea, would it not be interesting to know who is moving when across the high street or the shopping centre where the next large-scale F2F campaign will take place? Of course, nonprofits following up on such services need to be aware of data protection and privacy requirements (even though these are primarily the networks' responsibility) and should have their donor communication prepared.
Admittedly, we are raising a somewhat ambiguous and maybe even controversial approach as potential add-on to professionalized Face-to-Face fundraising. What is your opinion?
In the modern economy, coming up with forecasts as the best possible predictions of future income has become an imperative across industries. The charitable nonprofit sector, too, has seen increasing adoption of forecasting methods in recent years, which is why we already dealt with this topic on this blog some time ago. The endeavour of forecasting is challenging enough in times of economic stability but seems almost impossible after the advent of a "black swan" like the coronavirus. Of course, the future is just as uncertain as Ilya Prigogine said. At the same time, there is a family of statistical models which not only provides "well-informed" income predictions but also helps in finding out to what extent current data deviates from the "expected normal". Let us therefore take a closer look at fundraising income forecasting using time series.
In their book Financial Management for Nonprofit Organizations from 2018 (by the way a recommendable read), Zietlow et al. distinguish between several forecasting types:
Statistical forecasting methods can generally be divided into causal (also known as regression) methods and time series methods. A causal model is one in which an analyst or data scientist has defined one (simple regression) or several causal factors (multiple regression) for the dependent variable she or he is trying to predict. In the case of income prediction on an aggregated level, drivers (i.e. independent variables) might be found both on the level of donors and among exogenous factors.
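To make this tangible, here is a minimal, purely illustrative multiple-regression sketch in Python; the drivers, column names and figures are hypothetical and only stand in for whatever data an organization actually has:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical monthly data: income to be explained by two illustrative drivers
df = pd.DataFrame({
    "active_regular_donors": [11800, 11950, 12100, 12230, 12400, 12510],
    "consumer_confidence":   [99.1, 99.4, 98.7, 99.0, 99.6, 99.9],
    "monthly_income":        [152000, 154500, 153200, 155800, 158100, 159000],
})

# Multiple regression: monthly_income ~ active_regular_donors + consumer_confidence
X = sm.add_constant(df[["active_regular_donors", "consumer_confidence"]])
model = sm.OLS(df["monthly_income"], X).fit()
print(model.summary())

# Prediction for a new month under assumed driver values
new_month = sm.add_constant(
    pd.DataFrame({"active_regular_donors": [12650], "consumer_confidence": [100.2]}),
    has_constant="add",
)
print(model.predict(new_month))
```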
Time series models work differently and tend to be more complex. They essentially use a series' own history to come up with predictions of future outcomes. Real-world data is often non-stationary, which is why the concept of stationarity matters: a stationary time series is one whose properties do not depend on the point on the timeline at which its data is observed. From a statistical standpoint, a time series is stationary if its mean, variance, and autocovariance are time-invariant. The requirement of stationarity (i.e. stability) makes intuitive sense, as time series models use previous lags to model developments, and modelling stable series with consistent properties implies lower overall uncertainty.
Time Series in a Nutshell
Time series are often composed of the following components, although not every time series will necessarily show all, or even any, of them.
The model we will take a closer look at for the prediction of fundraising income is ARIMA, which stands for Auto-Regressive Integrated Moving Average. ARIMA models can be seen as the most generic approach to modelling time series. Whether the ARIMA algorithm can be applied to the historic income data right away can be evaluated by statistical methods such as the Dickey-Fuller test. The null hypothesis of the Dickey-Fuller test assumes that the series is non-stationary. Even if this hypothesis cannot be rejected, meaning that the data under scrutiny has to be treated as non-stationary, there are ways to make the time series usable: differencing and log transformations of the data can be applied in a preparatory step.
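As a hedged illustration, the stationarity check and the preparatory transformations mentioned above might look roughly like this in Python (income is assumed to be a pandas Series of weekly donation income):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def check_stationarity(series: pd.Series, alpha: float = 0.05) -> bool:
    """Augmented Dickey-Fuller test; H0: the series is non-stationary."""
    stat, p_value, *_ = adfuller(series.dropna())
    print(f"ADF statistic: {stat:.3f}, p-value: {p_value:.3f}")
    return p_value < alpha  # True -> H0 rejected -> series looks stationary

# 'income' is a hypothetical weekly income series indexed by date
if not check_stationarity(income):
    # Common preparatory step: log transform plus first-order differencing
    income_prepared = np.log(income).diff().dropna()
    check_stationarity(income_prepared)
```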
The raw data used for the following example is really straightforward. Historic fundraising income from 2012 to 2019 is extracted in a simple structure (donor ID, payment date, amount). In a next step, date-wise aggregation is applied. Having distinct dates in the dataset allows using both a weekly and a monthly perspective on the data. The aggregated series is then decomposed into its components. The resulting charts look as follows:
What is clearly visible in the charts above is a degree of seasonality: it is isolated in the third panel from the top, but already striking in the overall chart (number of donations, topmost panel). The second panel from the top shows an overall upward trend in the data.
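For readers who like to follow along in code, the aggregation and decomposition step could be sketched as follows; the file name and column names are placeholders, and we use a classical seasonal decomposition to produce the trend, seasonal and residual panels described above:

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical extract with one row per payment: donor_id, payment_date, amount
payments = pd.read_csv("payments_2012_2019.csv", parse_dates=["payment_date"])

# Date-wise aggregation, then a weekly view on counts and sums
daily = payments.groupby("payment_date")["amount"].agg(donations="count", income="sum")
weekly = daily.resample("W").sum()

# Decomposition into trend, seasonal and residual components
# (period = 52 weeks to capture yearly seasonality in weekly data)
decomposition = seasonal_decompose(weekly["income"], model="additive", period=52)
decomposition.plot()
```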
For research purposes, we decided to use the data from 2012 to 2018 as the "training set" and use ARIMA to generate a prediction for the already closed year 2019. We did that both for the accumulated donation counts and for the donation sum per week. The week-based prediction for the donations looks like this, with the blue line being the prediction and the red line being the actual data:
The chart above shows that the blue predicted line generally runs close to the red line representing the actual data. The actual data also oscillates within the forecast prediction intervals at the 80% and 95% confidence levels. This is what the actual and predicted data look like for accumulated income over the weeks of 2019.
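A fitting-and-forecasting step along these lines might be sketched as follows; the (seasonal) ARIMA order is purely illustrative and would in practice be chosen via AIC comparison or an automated search, and 'weekly' refers to the hypothetical weekly aggregate from the sketch above:

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Train on 2012-2018, keep 2019 as the hold-out year
train = weekly.loc[:"2018-12-31", "donations"]
actual_2019 = weekly.loc["2019-01-01":, "donations"]

# Illustrative (p, d, q)(P, D, Q, s) order with yearly seasonality on weekly data
model = SARIMAX(train, order=(1, 1, 1), seasonal_order=(1, 1, 1, 52)).fit(disp=False)

forecast = model.get_forecast(steps=len(actual_2019))
predicted = forecast.predicted_mean
intervals_80 = forecast.conf_int(alpha=0.20)  # 80% prediction interval
intervals_95 = forecast.conf_int(alpha=0.05)  # 95% prediction interval

# Share of actual 2019 weeks falling inside the 95% interval
inside = actual_2019.between(
    intervals_95.iloc[:, 0].to_numpy(), intervals_95.iloc[:, 1].to_numpy()
).mean()
print(f"{inside:.0%} of actual weeks lie within the 95% interval")
```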
To what extent can a time series approach inform fundraising planning and decision making in a highly dynamic environment shaped by the Corona pandemic? Well, it is still quite unclear how the global economy and individual countries will develop in the near future. It also remains to be seen how the Corona crisis will affect fundraising markets in the medium and long run. In essence, time series can be an interesting approach for a sophisticated analysis of the extent of the deviation from the normal income level, whether caused by Corona directly (e.g. Face-to-Face fundraising currently being on hold) or indirectly (e.g. rising unemployment) ...
We wish you and your dear ones all the best for these challenging times. I think that it is now more than ever worthwhile trying to turn Sophie Scholl's quote into an attitude: One must have a tough mind - and a soft heart.
The good old days of Web 1.0
A world without the WWW is hard to imagine for most of us - although it is not that far away in the past. I get a little nostalgic when I think of setting up my first email address back in 1999. Back then the world was like this (ok, this is quite some time ago ..).
Some 20 years ago, the overall number of websites around was ridiculously low by today's standards. To get an idea of how a page was performing, web marketers had to rely on things like the notorious visitor counters or on wrangling the log files created by web servers. The latter was definitely a hassle: you needed access to these files in the first place, configuration-file hacking or programming was necessary, and the result was static reports only.
Data-driven Digital Fundraising today
Nowadays the common denominator for most organizations with a digital presence is definitely running a website. In a marketing and sales as well as fundraising context, this page often acts as a kind of hub to which other digital channels and communication activities point. Running a website requires sound management. And as management is in essence a data-driven discipline, an evidence-based approach is also key in a web context...
This is where Google Analytics comes into play. It has the capacity to answer questions like these for online fundraisers and decision makers in general:
Google Analytics is free in its base version, which suits the use cases of many websites. For big data applications, there is the even more powerful Google Analytics 360 as part of the Google Marketing Cloud.
All you need to set up Google Analytics (free version) – apart from a website you own, of course – is a Google Account and the possibility to embed the little JavaScript code snippet that does the magic. This can be done either by yourself or by a web developer. Frameworks like Wordpress or Weebly (which this blog runs on) make life even easier through assistants. For those interested in what is happening under the hood, this infographic might be interesting.
Google Analytics in a nutshell
Google Analytics is embedded in an ecosystem together with tools like Google Tag Manager, Google Adwords etc. This is particularly the case in a more advanced and differentiated online fundraising context.
Google Analytics offers a lot of potentially insightful reports out of the box which can be used from day one. The main areas are:
There is a plethora of resources from Google and other providers to take a deep dive into Google Analytics. These are a few:
Advanced Analytics @ Google Analytics
Google Analytics can moreover be interesting in a broader analytics and data science context. Think of the following use cases:
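To pick one concrete example: report data can be pulled programmatically for further analysis and combined with other data sources. A minimal sketch using the Reporting API v4 (the Universal Analytics API current at the time of writing) might look like this; the view ID and the key file are placeholders for your own setup:

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Assumes a service account JSON key with read access to the Analytics view
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
analytics = build("analyticsreporting", "v4", credentials=credentials)

# Sessions and goal completions per source/medium for the last 30 days
response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "VIEW_ID",
        "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
        "metrics": [{"expression": "ga:sessions"},
                    {"expression": "ga:goalCompletionsAll"}],
        "dimensions": [{"name": "ga:sourceMedium"}],
    }]
}).execute()

for row in response["reports"][0]["data"].get("rows", []):
    print(row["dimensions"], row["metrics"][0]["values"])
```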
To cut a long story short, analysts, data scientists etc. working in a fundraising context should - in addition to the digital fundraising experts taking care of the website and digital channels - care about digital data analyses in general and Google Analytics in particular. As 2020 has just started - let's get cracking! :-)
All the best and keep in touch at askyourdata.co or firstname.lastname@example.org.
When we started this blog some three years ago, data science was widely seen as a mere buzzword. The interest in the concept seems to be here to stay – and grow steadily. Google Trends shows how the interest in the search term Data Science has developed in the last three years:
Data Science is here to stay
We think that Data Science is more than just a fancy term for statistics and agree with the popular blog KDnuggets: data science is about creating value through data and supporting the digital transformation of other processes in a company such as marketing, customer service, production etc. We believe that the positive impact of advanced analytical methods can be generated across industries and is not limited to the corporate sector. Earlier this year, we discussed the adoption of advanced analytics by nonprofits. Given the already existing relevance of data science, we asked ourselves what 2020 might bring - and found some interesting hypotheses on the web.
Take everyone onto the Data Science Journey
According to towardsdatascience, some 100 papers on Machine Learning were published on a single day in 2019. This reflects that Data Science as a whole is here to stay. Hand in hand with this increasing presence comes a certain differentiation. The blog mentioned above sees a trend towards specialization among different roles in data science. On the one hand, there are experts on bringing models into production and providing the necessary infrastructure. On the other hand, there are people involved in investigative work and decision support.
The footprint of Data Science is getting larger as models are becoming an indispensable part of business operations. This implies the ongoing challenge to further increase model performance, the possible need for model retraining or rebuilding as well as continuous levels of support for model stakeholders.
We mentioned before that Data Science is essentially about turning data into value for the respective organization. This value creation is, according to towardsdatascience, not only dependent on the “physical technology” consisting of algorithms and data flows. The “social technology”, i.e. effective lines of communication and decision-making as well as executive awareness (or even better, a basic understanding built up through engaging in-house trainings in Data Science), is at least as important.
People and Tools are needed
Data Science is done by Data Scientists. According to a study by IBM, the demand for Data Scientists will grow by some 28% until 2020 (compared to 2017). Some might go as far as to call Data Scientists the “sexiest job of the 21st century” – like Harvard Business Review did back in 2012. Regardless of any labels, it can be expected that the perceived shortage of expert staff will remain in 2020 both across industries and on a global level. The good news is that further developed self-service tools will gradually improve the ease of data preparation, exploration, visualization and modelling.
Natural Language Processing
Most people think of structured information in rows and columns when they hear the term “data”. In fact, an unbelievably large amount of unstructured data, i.e. texts, speech, sounds and videos, is produced every single day. This also applies to different forms of personalized data and general customer communication. A powerful approach to making the most of unstructured data is so-called Natural Language Processing (NLP). It is essentially about classifying texts into categories, sentiments, similarities etc. What happens under the hood is that characters are translated into numbers and further processed by models such as neural networks. Breakthroughs in Machine Learning and emerging libraries like Tensorflow have drastically increased the possibility to apply NLP models to unstructured data.
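As a small, entirely hypothetical illustration of this "text to numbers to model" pipeline, the following Keras sketch classifies a handful of made-up donor messages by sentiment; it is a toy example, not a production model:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Tiny, made-up corpus of donor messages with sentiment labels (1 = positive)
texts = [
    "thank you for the great work",
    "please stop sending me letters",
    "happy to support your projects",
    "i want to cancel my donation",
]
labels = np.array([1, 0, 1, 0], dtype="float32")

# Step 1: translate words into numbers
tokenizer = Tokenizer(num_words=1000, oov_token="<unk>")
tokenizer.fit_on_texts(texts)
sequences = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=10)

# Step 2: feed the numbers into a (very small) neural network
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(sequences, labels, epochs=20, verbose=0)

# Score a new, unseen message
new_text = pad_sequences(tokenizer.texts_to_sequences(["thanks a lot"]), maxlen=10)
print(model.predict(new_text))
```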
Data Privacy and Security as relevant constraint
There is no data science without data. The “raw material” for analyses and models is often personal data, be it from customers or donors. Particularly in a European context, the public has become more aware and careful regarding the ownership of personal data. The ongoing challenge for any kind of organization involved in data science is to maintain the highest data security and protection standards, stay aligned with best practices and be transparent upon customer request. If organizations stick to that, there is no need to become paranoid about data protection.
What is it that fundraising nonprofits can do or learn about Data Science in 2020?
We think that a classic quote by Mark Twain gives valuable hints into this direction:
The secret of getting ahead is getting started. The secret of getting started is breaking your complex overwhelming tasks into small, manageable tasks and starting on the first one.
No matter how far away you see yourself from applied and sophisticated Data Science, it will definitely pay off to be even more data-driven in 2020 and beyond. As we outlined earlier this year, there still seems to be a competitive edge in the industry for "analytical NPOs" (see our blog post for facts and figures in this regard if you are interested). Do not hesitate to ask experts or organizations you trust for guidance - also Joint Systems will be happy to help throughout 2020. :-)
We wish you merry Christmas holidays and a good start into a happy, healthy and successful 2020.
David Weber and Johannes Spiess
Factfulness: Ten Reasons We're Wrong About the World – and Why Things Are Better Than You Think by Hans Rosling (2018)
Our very first blogpost on askyourdata.co from January 2017 was inspired by Hans Rosling's fact-based worldview and the tons of high-quality material and data the Roslings and their fellows had piled together at gapminder.org. Sadly, Rosling passed away in February 2017. One can go as far as to call Factfulness, the last book Rosling wrote with his son and daughter-in-law, his legacy. Factfulness is in essence a plea against seeing today's world in a "binary", black-or-white, rich-or-poor manner. One reflection of this is the set of four generic income levels Rosling suggests instead of “rich and poor”.
This is what Bill Gates, a personal friend of Rosling and fan of Factfulness says about that: Hans compares this [..] to standing on top of a skyscraper and looking down at a city. All of the other buildings will look short to you whether they're ten stories or 50 stories high. It’s the same with income. Life is significantly better for those on level 2 than level 1, but it’s hard to see that from level 4 unless you know to look for it.
The main part of the book is structured along ten human instincts (e.g. fear instinct, size instinct etc.) that prevent us from fully perceiving the world as well as gradual developments and improvements as they are. Factfulness is not a mere 350-page-stream of undifferentiated optimism and positivity. Rosling names the threats and risks humankind is confronted with (climate change, global economic crisis etc.). A fascinating level of differentiation is maintained from the very first to the very last page.
Thinking Fast and Slow by Daniel Kahneman (2012)
The fact that Daniel Kahneman won a Nobel prize for economics in 2002 does not make this book a good read per se. In Thinking Fast and Slow, Kahneman takes us on a tour de force through human thinking and decision making. The sheer quantity of insights Daniel Kahneman and Amos Tversky, his main research partner, who died in 1996, developed over time is breath-taking. Kahneman shows us that there are essentially two ways of human thinking. The first system is fast thinking, which happens almost automatically and instinctively. This is the predominant way of thinking. The second system is the one of slow thinking. It is capable of rational thought and conscious decision making but takes more of a person's energy and concentration.
Kahneman and Tversky have shaped the research on cognitive biases. For instance, the two conducted the first study on the anchoring effect, which drives the perception of figures and decision making on a subconscious level. This can serve as an interesting baseline for fundraising appeals (e.g. use large numbers in texts to increase average donations). Thinking Fast and Slow condenses the lifetime achievements of two of the most influential contemporary psychologists into one book. The hardcover is a 450+ page read, i.e. not something for a single rainy autumn weekend. I can recommend the audiobook, particularly for a commute. There is an extensive amount of related material on the web.
The Black Swan: The Impact of the Highly Improbable by Nassim Nicholas Taleb (2015)
Nassim Nicholas Taleb's book has been a global bestseller of popular science since it was first published in 2007. Daniel Kahneman, too, refers to Taleb in Thinking Fast and Slow and admits that his thinking at a later stage of his career was strongly influenced by him.
Although the book has also been criticized by renowned reviewers for its partial lack of empirical evidence, its intensive use of anecdotes and its messy structure, we find it a worthwhile read. The key themes of the book are the limitations of predictions and the occurrence of presumably “impossible events” – just like black swans were to Europeans before they were first sighted in Australia.
Taleb was a successful Wall Street trader and quantitative analyst before becoming a successful author. Against this background, he argues that the unexpected is not only key to the understanding of financial markets but of history as a whole. This "holistic" claim by Taleb might be aiming a bit too high. However, The Black Swan is a thought-provoking and entertaining read.
Visual Display of Quantitative Information by Edward Tufte (2001)
The history of data visualization goes back many centuries. In a world of growing "datafication", the visual display of quantitative information is growing more and more important. We have dedicated a whole section of blogposts to this exciting topic.
Edward Tufte's book Visual Display of Quantitative Information is a highly valuable read for everyone who has to visualize data and communicate insights. Tufte advocates a quite radical "less is more" approach when it comes to visualizing data. This means, for example, that for small amounts of data a simple table might be preferable to any chart. If charts are used, not a drop of ink should be wasted, as Tufte continuously proclaims clarity, precision and efficiency in data visualization.
The book is not only insightful for producers of quantitative information but also for recipients, i.e. basically everyone reading a management report, newspapers, blogs etc. Data visualizations sometimes tend to distort the underlying information in order to be more persuasive. This is what Tufte calls the “lie factor”, i.e. the size of the effect shown in the visualization in relation to the effect size in the actual data. Visual Display of Quantitative Information is a highly recommendable addition to every analyst's, data scientist's and information designer's library.
Are there any books, papers, webinars or talks you can recommend for long and dark evenings? Don't hesitate to post comments and / or stay in touch with the team of askyourdata.co.
All the best and read you soon,
David Weber, BA
Data Science & Analytics
“If your only tool is a hammer, then every problem looks like a nail.” – Unknown.
Today’s data science landscape is a great example where we need hammers, but we also need screwdrivers, wrenches and pliers. Even though R and Python are the most used programming languages for data science, it is important to expand the toolset with other great utensils.
Today’s blog post will introduce a tool, which lets you leverage the benefits of data science without being native in coding: KNIME Analytics Platform.
KNIME (Konstanz Information Miner) provides a workflow-based analytics platform that enables you to focus fully on your domain knowledge, such as fundraising processes. The intuitive and automatable environment enables guided analytics without knowing how to code. This blog post provides a hands-on demonstration of the key concepts.
Important KNIME Terms
Before we start with a walkthrough of a relevant example, we need to define some of the most important KNIME terms.
Nodes: A node represents a processing point for a certain action. There are many nodes for different tasks, for example reading a CSV-file. You can find further explanations about different nodes on Node Pit.
Workflow: A workflow within KNIME is a set of nodes or a sequence of actions you take to accomplish your particular task. Workflows can easily be saved and used by other colleagues. Even collapsing workflows into a single meta-node is possible. That makes them reusable in other workflows.
Extensions: KNIME Extensions are an easy and flexible way to extend the platform’s functionalities by providing nodes for certain tasks (connecting to databases, processing data, API requests, etc.).
Sample Dataset and Data Import
For demonstration, we are going to use a telecom dataset for churn prediction. This dataset could easily be replaced with a fundraising dataset containing information about churned donors. Customer or donor churn, also known as customer attrition, is a critical metric for every business, and especially for the non-profit sector (i.e. donors quitting their regular donations). For more information on donor churn, visit our previous blog posts.
The dataset consists of two tables: call data from customers and customer contract data. Using some of the information available, we will try to predict whether or not a customer will quit their subscription. Churn is represented by a binary variable (0 = no churn, 1 = churn). For visualization purposes, we are going to use a decision tree classifier, although there are probably even better classification algorithms available.
First, we are using an Excel Reader and a File Reader to import both files. To make things easier, we use a Joiner node where we join both tables based on a common key. The result is a single table now ready for exploration and further analysis.
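For readers who prefer code over a visual workflow, the equivalent of the two reader nodes and the Joiner node might look roughly like this in Python; the file names and the key column are placeholders:

```python
import pandas as pd

# The two source tables, mirroring the Excel Reader and File Reader nodes
calls = pd.read_excel("call_data.xlsx")
contracts = pd.read_csv("contract_data.csv")

# Equivalent of the Joiner node: inner join on a common customer key
customers = calls.merge(contracts, on="customer_id", how="inner")
print(customers.shape)
```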
Feature Engineering is the process of analyzing your data and deciding which parameters have an impact on the label we want to predict – in this case whether a customer will quit or not. In other words: Making your data insightful.
But before we look at correlations between the label and the features, a general exploration of our data is recommendable. The Data Explorer node is perfect for some basic information. One thing we notice is that we need to convert our churn label to a string in order to make it interpretable for our classifier later. This can be done with the Number to String node. Now it's time for some correlation matrices. We are able to see some correlation between various features and our churn label, whereas others hardly correlate at all. We decide to get rid of those.
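A comparable exploration step in code, continuing the hypothetical customers table from above, could be sketched as follows; the correlation threshold is purely illustrative:

```python
# Correlation of each numeric feature with the 0/1 churn label
correlations = (
    customers.corr(numeric_only=True)["churn"].drop("churn").sort_values()
)
print(correlations)

# Drop features that hardly correlate with the label (illustrative threshold)
weak = correlations[correlations.abs() < 0.05].index
customers = customers.drop(columns=weak)
```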
Now let's start with the training of our model. But before we can do that, we need to partition our data into a training and a test dataset. The training set (usually around 60-80% of our data) is used to train our model. The other part of our data will be used to test our model and to make sure it has predictive power, which we can verify with certain metrics. In this case, we will set the partition percentage to 80%, which seems to be a reasonable choice. This data will be fed into our decision tree learner.
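Translated into a scikit-learn sketch, the Partitioning node and the Decision Tree Learner correspond roughly to the following; the tree depth is an illustrative parameter, and categorical columns would need encoding first:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 80/20 split, mirroring the Partitioning node; stratify keeps the churn ratio stable
X = customers.drop(columns=["churn"])  # assumes remaining features are numeric
y = customers["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=42
)

tree = DecisionTreeClassifier(max_depth=5, random_state=42)
tree.fit(X_train, y_train)
```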
After some computing time, our finished decision tree looks like this:
In order to make the model reusable and available for predictions with new datasets, we can save it with the PMML Writer node for later use. PMML is a format for sharing and reusing built models. If we want to, we can read the model later on with a PMML Reader node to make predictions with a new, unknown dataset. But before we use our model on a regular basis, we need to evaluate it with our test-dataset, which we split earlier.
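In a purely code-based setting, the same "save once, reuse later" idea is often covered by serialising the fitted model, for instance with joblib; this is an analogue to, not a replacement for, the PMML-based exchange described above:

```python
import joblib

# Persist the fitted tree for later reuse (code analogue of the PMML Writer)
joblib.dump(tree, "churn_tree.joblib")

# ... and load it again when new data arrives (analogue of the PMML Reader)
reloaded_tree = joblib.load("churn_tree.joblib")
```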
Model Prediction and Evaluation
Now, testing our new model and evaluating its performance is one of the most important steps. If we cannot be sure that our model predicts correctly to a reasonable extent, deploying it would be fatal. So we feed the Decision Tree Predictor node with our test dataset. This lets us see how the model performed.
We have certain metrics within our KNIME workflow to fully evaluate it. First, we use a Scorer node to get the confusion matrix and some other important statistics. Our confusion matrix gives us a first hint regarding threshold tuning, and the accuracy of 84% already looks pretty good. However, our model predicted 29 cases as 'no churn' although they were actually churning. This number is rather high, so we should consider tuning our model parameters.
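In code, the Scorer node corresponds roughly to the following evaluation on the held-out test set from the sketch above:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_pred = tree.predict(X_test)
print(confusion_matrix(y_test, y_pred))  # rows: actual, columns: predicted
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2%}")
```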
Next up is the ROC (Receiver Operating Characteristic) curve. It plots the True Positive Rate against the False Positive Rate. One of the results is the AUC (Area under the Curve), which amounts to a very good score of 0.914. A score of 0.5 (the diagonal in the chart below) represents a model without any predictive power, i.e. random guessing.
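An equivalent ROC/AUC check in code might look like this, using the predicted churn probabilities from the hypothetical tree above:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# Class probabilities are needed for the ROC curve (column 1 = probability of churn)
y_score = tree.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_score)
print(f"AUC: {roc_auc_score(y_test, y_score):.3f}")

plt.plot(fpr, tpr, label="decision tree")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```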
Additional metrics would be the Lift Chart and Lift Table, but an explanation would be beyond the scope of today’s blog. We think it’s time to summarize and draw a conclusion.
Too good to be true?
KNIME is a powerful platform which provides various possibilities to extract, transform, load and analyze your data. However, simplicity has its limitations. In direct comparison with various R packages, visualizations are not as neat and configurable. And the 'simplistic' approach to data science limits the possibilities in one way or another, while still requiring the user to have a thorough understanding of the data science pipeline and process. Furthermore, most real-life cases are more complicated and need more feature engineering and analysis beforehand – creating the model itself is usually one of the smallest challenges.
Nevertheless, we think that KNIME is an awesome tool for data engineering, exploration and workflow automation (and for building fun stuff with social media and web scraping). But if you are looking for complex models supporting your business decisions, KNIME probably won't be the platform you are searching for.
We hope you liked this month’s blog post and we would love to get in touch if you are interested in achieving advanced insights with your data or just want to dive deeper into the topic. If you want to know what Joint Systems can offer you concerning data analytics, this page will provide you with more information.