Ask your data

Want to (re-)discover your Donor Data? Let’s take a deep dive!

6/21/2024

"If you're not thinking segments, you're not thinking," Theodore Levitt once said. Fundraisers have traditionally put immense effort into developing and tracking segments, so we are definitely not talking about something completely new for fundraisers. However, in times of Predictive AI and Data Science, advanced algorithms enable us to shed new light on the data of our supporters. In this blog post, we explore how advanced algorithms can generate new insights from donor data and improve targeted fundraising strategies. This deep dive into donor data segmentation shows why understanding and leveraging different segmentation techniques matters for optimal results.

Why should we segment Donor Data?

Segmentation is the process of dividing a market of individuals or organizations into subgroups based on common characteristics. This process allows organizations to align their products, services, and communication strategies with the specific needs of different segments. In the case of fundraising, by understanding what drives donor groups and their available choices, organizations can tailor their approaches for more effective engagement.

Key Dimensions for Donor Segmentation

Donor segmentation can be approached from various dimensions, including psychographic, geographical, behavioral, and demographic factors. Each dimension offers unique insights:
  • Behavioral: Payment behavior, loyalty, lifetime value, etc. - this is the "classic" dimension and the common denominator for many organizations.
  • Psychographic: Lifestyle, interests, attitudes, etc.
  • Geographical: Country, region, topography, etc.
  • Demographic: Age, income, gender, occupational group, etc.
These dimensions provide a framework for understanding donors' motivations and behaviors, which is essential for creating targeted and effective fundraising campaigns.


Of course, not all data needed for the above-mentioned dimensions is available right away. Some of it is hard to obtain, only indirectly available, or not obtainable at all. In a stylized way, the possibilities can be summarized as follows:


Behavior-Based Segmentation with RFM

One of the most effective methods for behavior-based segmentation is the RFM model, which evaluates donors based on Recency, Frequency, and Monetary value:
  • Recency: When did a donor last donate?
  • Frequency: How often does a donor donate over a certain period?
  • Monetary Value: How much do donors contribute per donation or a set time period such as a year?
This classic approach is easy to understand and implement, making it a popular choice for many organizations.
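As a sketch of how such RFM values can be derived in practice, the following R snippet computes Recency, Frequency and Monetary Value per donor from a toy donation table (the column names donor_id, date and amount, as well as the cutoff date, are assumptions for illustration):

```r
# Toy donation history; in practice this would come from the CRM database
donations <- data.frame(
  donor_id = c(1, 1, 2, 3, 3, 3),
  date     = as.Date(c("2024-01-10", "2024-05-02", "2023-11-20",
                       "2024-02-14", "2024-03-01", "2024-06-01")),
  amount   = c(50, 75, 20, 10, 10, 15)
)
cutoff <- as.Date("2024-06-30")  # reference date for Recency

# One row per donor: days since last gift (R), number of gifts (F), total amount (M)
rfm <- do.call(rbind, lapply(split(donations, donations$donor_id), function(d) {
  data.frame(donor_id  = d$donor_id[1],
             recency   = as.numeric(cutoff - max(d$date)),
             frequency = nrow(d),
             monetary  = sum(d$amount))
}))
```

From here, donors can be binned into R, F and M score classes (e.g. quintiles) to form the classic RFM segments.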

Unsupervised Learning for Donor Segmentation

As an addition, not necessarily a replacement, unsupervised learning offers a more advanced technique for donor segmentation. Unlike RFM, which relies on predefined categories, unsupervised learning models detect patterns or groups within the data without prior labels. This method is highly flexible and can uncover hidden (sub-)segments that traditional methods might overlook.

A simplified "cooking recipe" for a clustering approach like k-means looks as follows:
  1. Randomly select reference points in the data (e.g. a donor with her/his values for R, F and M).
  2. Assign each data point to the closest reference point.
  3. Move each reference point to the center (mean) of its assigned data.
  4. Reassign data points based on the new positions.
  5. Repeat until the assignments no longer change and each reference point sits at the center of its cluster.
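The recipe above is exactly what R's built-in kmeans() function performs; a minimal sketch on simulated, already standardized R/F/M values (the three donor groups here are invented for illustration):

```r
set.seed(42)
# Simulated, standardized R/F/M values for 90 donors in three loose groups
rfm_z <- rbind(matrix(rnorm(90, mean = 0), ncol = 3),
               matrix(rnorm(90, mean = 3), ncol = 3),
               matrix(rnorm(90, mean = 6), ncol = 3))
colnames(rfm_z) <- c("recency_z", "frequency_z", "monetary_z")

# Steps 1-5 in one call; nstart = 25 repeats the random initialization 25 times
fit <- kmeans(rfm_z, centers = 3, nstart = 25)
table(fit$cluster)  # segment sizes
fit$centers         # final reference points (cluster centroids)
```

Because step 1 is random, running the algorithm with several restarts (nstart) and keeping the best solution is the usual safeguard against a poor initial choice of reference points.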
Comparing RFM vs. Unsupervised Learning

Both RFM and unsupervised learning have their advantages and limitations:
  • RFM: Simple, easy to implement, but may miss nuanced behavioral patterns.
  • Unsupervised Learning: Flexible and data-driven, but can be challenging to interpret without additional effort.
Despite their differences, these approaches can coexist, providing a comprehensive understanding of donor behavior. By leveraging both methods, organizations can gain deeper insights and develop more effective fundraising strategies.

So What? A straightforward conclusion

Unsupervised methods of donor segmentation are designed to incorporate various data types unbiasedly, offering a data-driven approach to understanding donor behavior. While these methods provide valuable insights, they also come with limitations, particularly regarding traceability and group stability over time. Ultimately, a combination of RFM and unsupervised learning techniques can yield the most comprehensive and actionable insights for donor-centered fundraising.

Inspired? Interested? In need of a chat? Or are there experiences you can share? Please go ahead and do so, and do not hesitate to reach out.

All the best and have a great summer!

Diving into donor segments with k-means clustering

10/5/2018


Clustering means grouping similar things together. In this month's post we take a closer look at how the so-called k-means algorithm, as a clustering method, might help to develop an even deeper understanding of established donor segments.

Many fundraising organizations segment their donor bases to enable target-group oriented communication and appeals. Working with segments also allows the development of more differentiated analyses on group migrations and income development. The example organization whose data we are playing around with has an up-and-running segmentation model that evaluates the payment behaviour of donors in terms of Recency (when was the last payment), Frequency (how often payments are made) and Monetary Value (sum of donations). This RFM method is commonly used in database-related direct marketing, both in the profit and non-profit sector.

We take a closer look at the best donor group in the following. The segmentation model already tells us that the group under scrutiny consists of "good donors" in terms of payment behaviour. In this regard, the segment as such can be seen as a homogeneous group. Can a clustering algorithm like k-means generate additional insights regarding the "inner structure" of this group?

K-means is an unsupervised machine learning algorithm. In unsupervised learning, the data comes without pre-defined labels, i.e. no classifications or categorizations are included. The goal of unsupervised learning algorithms is to discover hidden structures in data. The basic idea of k-means is to partition a number of records (n) into a certain number of similar clusters (k). The number of clusters k is either pre-defined or can be jointly defined by the respective analyst and fundraising manager.

The following walkthrough was inspired by Kimberly Coffey's blogpost on k-means Clustering for Customer Segmentation. It is a highly recommended read and can be found here.

Data extract and preprocessing

Our example organization has an established RFM-based segmentation model that yields 4 core groups. We defined the "best" of those groups to be the subject of a k-means clustering attempt. The dataset we extract is straightforward: it contains a unique identifier per donor and the Recency, measured as the number of days between December 31st, 2017 and the day of the last donation. Frequency reflects the number of payments for the respective record in 2017, whereas Monetary Value shows the donation sum.

K-means clustering requires continuous variables and works best with (relatively) normally-distributed, standardized input variables. We therefore apply a logarithm (log) to the variables to reduce positive skew and then standardize them. Standardizing variables means re-scaling them to have a mean of zero and a standard deviation of one, i.e. aligning them to the standard normal distribution. The following code snippet illustrates the loading and transformation of the data.

Code Snippet 1: Load packages and data, then transform data

    
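Since the original snippet is not reproduced here, the following is a minimal sketch of the described transformation, using a small stand-in data frame to stay self-contained (in practice the data would be read from the database or a CSV export; all column names are assumptions):

```r
# Stand-in for the extracted dataset; in practice e.g. donors <- read.csv(...)
donors <- data.frame(id        = 1:5,
                     recency   = c(10, 45, 3, 120, 30),
                     frequency = c(4, 1, 12, 1, 2),
                     monetary  = c(120, 25, 600, 40, 80))

for (v in c("recency", "frequency", "monetary")) {
  # Log-transform to reduce positive skew (offset by 1 to handle zeros)
  donors[[paste0(v, "_log")]] <- log(donors[[v]] + 1)
  # Standardize: mean 0, standard deviation 1
  donors[[paste0(v, "_z")]] <- as.numeric(scale(donors[[paste0(v, "_log")]]))
}
```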
This is how our dataset looks after the aforementioned transformation. Columns 5 to 7 contain the logs of Recency, Frequency and Monetary Value, whereas the standardized (z-)values are in columns 8 to 10.

A quick exploratory view

To initially dive into the data, we plot the log-transformed Monetary Value against the log-transformed Frequency of donations and use the log-transformed Recency for colouring, using the code in Code Snippet 2.
Code Snippet 2: Exploratory plot of RFM variables

    
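A self-contained sketch of such an exploratory plot, drawn on simulated stand-in data since the original snippet is not shown (distributions and palette are assumptions):

```r
set.seed(1)
# Stand-in for the log-transformed columns from the preprocessing step
donors <- data.frame(recency_log   = log(runif(200, 1, 365)),
                     frequency_log = log(rpois(200, 3) + 1),
                     monetary_log  = log(rlnorm(200, meanlog = 4)))

# Map Recency to five shades of blue: darker = donated longer ago
shade <- cut(donors$recency_log, breaks = 5)
cols  <- colorRampPalette(c("lightblue", "darkblue"))(5)[as.integer(shade)]

plot(donors$frequency_log, donors$monetary_log, col = cols, pch = 16,
     xlab = "log(Frequency)", ylab = "log(Monetary Value)",
     main = "Best segment: exploratory RFM view")
```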
What is striking is, on the one hand, the high general density of observations (due to the large amount of data) and, on the other, the different shades of blue that reflect a certain heterogeneity in terms of Recency.
Running the K-means Algorithm

We now turn to running the k-means algorithm. The following code contains a loop that runs the algorithm for j = 1 to 10 clusters. It writes the cluster membership per donor back into the dataset, creates two-dimensional plots (see examples for 3 and 7 clusters below) and collects the model information.
Code Snippet 3: K-means clustering

    
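A possible shape of this loop, sketched on simulated standardized data (the object names and the choice of 10 as the maximum are assumptions; the per-solution plot is indicated as a comment):

```r
set.seed(42)
# Stand-in standardized RFM matrix (see the preprocessing snippet)
rfm_z <- matrix(rnorm(300), ncol = 3,
                dimnames = list(NULL, c("recency_z", "frequency_z", "monetary_z")))

max_k      <- 10
withins    <- numeric(max_k)                                  # model information
membership <- matrix(NA_integer_, nrow = nrow(rfm_z), ncol = max_k)

for (j in 1:max_k) {
  fit <- kmeans(rfm_z, centers = j, nstart = 25, iter.max = 50)
  withins[j]      <- fit$tot.withinss   # total within-cluster sum of squares
  membership[, j] <- fit$cluster        # cluster membership per donor and j
  # Two-dimensional plot per solution, e.g.:
  # plot(rfm_z[, "frequency_z"], rfm_z[, "monetary_z"], col = fit$cluster, pch = 16)
}
```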
These are two output examples of the code using 3 and 7 clusters.
So how do we choose the "optimal" number of clusters? The graphs below aim to detect the number of clusters beyond which adding another cluster contributes only little additional explanatory power. In other words, we look for the bend in the so-called elbow chart. In our example this appears to be at 4 clusters. The same decision could also have been made, or at least influenced, by a business consideration regarding the "feasible" number of clusters. Adding clusters always adds explanatory power; in practice, however, 4 groups are easier to handle (e.g. in the context of a direct marketing test) than 10 or more clusters.
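An elbow chart like the one described can be produced from the collected total within-cluster sums of squares; a self-contained sketch on simulated data:

```r
set.seed(42)
# Stand-in standardized RFM matrix; in practice reuse the data from the loop above
rfm_z <- matrix(rnorm(300), ncol = 3)

# Total within-cluster sum of squares for k = 1..10
withins <- sapply(1:10, function(j) kmeans(rfm_z, centers = j, nstart = 25)$tot.withinss)

plot(1:10, withins, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster SS",
     main = "Elbow chart")
```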
Results and interpretation
Let us now take a closer look at the results. Click on the picture on the left to get to an interactive 3d graph of the 4-cluster solution, for which the R-code can be found below. The 4-cluster solution yields 4 ellipsoids reflecting the areas with high observation densities for the clusters. These ellipsoids should make the graph easier to read; the actual observations are still represented by differently coloured dots, just like in the 2-dimensional plot we used for exploration. The three "upper clusters" in the picture share a comparable level of Monetary Value and Recency. The dark blue ellipsoid stands out of the three as it reflects a higher Frequency. The lower ellipsoid reflects observations that rank relatively low on all three RFM variables (remember: the higher the Recency, the "worse", knowing that we are working with a dataset of good donors). The video below contains a fixed-axis rotation.

Code Snippet 4: 3d graph

    
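The original interactive graph is not reproduced here; the following is a sketch of a comparable 3d view. It assumes the rgl package for the interactive plot and the cluster ellipsoids (an assumption; the plotting part is skipped when rgl is not installed):

```r
set.seed(42)
# Stand-in standardized RFM matrix and the 4-cluster solution
rfm_z <- matrix(rnorm(300), ncol = 3,
                dimnames = list(NULL, c("recency_z", "frequency_z", "monetary_z")))
fit <- kmeans(rfm_z, centers = 4, nstart = 25)

# Interactive 3d scatter with one density ellipsoid per cluster (requires rgl)
if (requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(rfm_z, col = fit$cluster, size = 5)
  for (i in 1:4) {
    member <- rfm_z[fit$cluster == i, , drop = FALSE]
    rgl::plot3d(rgl::ellipse3d(cov(member), centre = colMeans(member)),
                col = i, alpha = 0.3, add = TRUE)
  }
}
```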
Conclusion

So, what can we conclude from our clustering attempt? K-means is a widely-used and straightforward algorithm that can be applied relatively easily in practice. It is, however, worthwhile to dive into the underlying concepts of the algorithm and to consider the related diagnostics (variance explained, within-cluster sums of squares). Due to the derived possibilities in terms of data visualization, results can be communicated directly to fundraising decision makers. These decision makers should be involved in the process at an early stage. Although there are "objective" measures for the number of clusters, application-oriented considerations (e.g. for further analyses or test designs) should not be left out.

We hope you liked this month's post and wish you a nice beginning of autumn. Do not hesitate to share, comment, recommend etc. Read / hear / see you soon!





Copyright © 2018