The year 2017 is coming to a close. For many people, the days between the years are a time to reflect on the opportunities and challenges that the coming year will bring. This also holds for trends in Data Science, so we did some research on Data Science outlooks for 2018.
Machine Learning has been one of the buzzwords of the recent past. We also published an introductory blog post on the topic earlier this year. One might say the term is over-hyped. However, machine learning is applied in academia and across industries, as a very recent survey by KDnuggets shows.
Dataconomy, a Berlin-based Data Science company, elaborates on the promising field of machine learning but also mentions where organizations are starting from. Regardless of Data Science concepts and tools, 77 percent of German companies still rely on “small” data tools like Excel and Access. Many of them might still have plenty of homework to do to transform into what Dataconomy calls data-driven organizations. At the same time, a recent KPMG study (available upon request in German) shows that 60 percent of companies have been able to benefit in different forms (reduction of costs or risks and/or increase of revenues) from Data Science – which of course includes Machine Learning and Artificial Intelligence.
Artificial Intelligence (AI)
Speaking of artificial intelligence: for many, 2018 will be the year when AI breaks through. The prominent research company Gartner names AI as one of the most important strategic technology trends for 2018. They refer to a recent survey showing that 6 out of 10 organizations are in the process of evaluating AI strategies, whereas the remaining 4 have started adopting the respective technologies.
The analytics company absolutedata goes as far as to speak of AI powered marketing and formulates certain predictions regarding what 2018 will bring in this context:
Bill Vorhies, Editorial Director for Data Science Central, is a bit more hesitant in the context of AI. He predicts that – regardless of the hype – the diffusion of techniques and tools from the field of AI and deep learning will be slower and more gradual than many expect. One already visible manifestation of the spread of AI is chat bots, which are increasingly used in web and mobile contexts. Chat bots essentially process natural language and thereby involve customers and prospects in an interaction. The implementation of facial and gesture recognition currently looks like the next big thing, as the possible applications at the point of sale seem vast.
How should nonprofits deal with AI? Steve MacLaughlin, author and Vice President of Data & Analytics at Blackbaud, underlines the vast opportunities for nonprofits but also puts the buzz around AI into perspective. MacLaughlin explains that AI for nonprofits requires the availability of the right data, contextual expertise, and continuous learning. Given these factors, AI can help nonprofits to be impactful, particularly through fundraising.
We also dealt with Big Data – probably the buzziest of all buzzwords from the field – in February this year. We quite liked a recent blog post by KDnuggets that starts as follows:
There's no denying that the term Big Data is no longer what it used to be [..]. We now all just assume and understand that our everyday data is huge. There is, however, still value in treating Big Data as an entity or a concept which needs to be properly managed, an entity which is distinct from much smaller repositories of data in all sorts of ways.
What follows next is the gist of interviews with various experts from the field talking about their expectations for 2018 and beyond – a highly recommended holiday read.
The EU General Data Protection Regulation (GDPR) will be enforced from May 25th, 2018. It is yet to be seen to what extent customers and donors will, for instance, actively insist on the Right to be Forgotten – which might have implications for the availability of donor data for advanced modelling as well.
More and more organizations seem to be developing an interest in Big Data experts, Data Architects, Data Scientists etc. Without any doubt, advanced analytics and Data Science efforts need motivated and skilled women and men to succeed. Florian Teschner recently conducted an analysis of Data Science job offers. Although there is still scope for growth in the absolute number of available positions, there has been a roughly fivefold increase since 2015.
Stay tuned to Data Science in 2018
We think it will be worthwhile to keep one’s eyes open in 2018.
If you are interested in Data Science as well as the conceptual and technological developments in the field and you are not on Twitter yet, following some influential thinkers from the area might be an interesting starting point. Maptive.com provides a list of potential influencers.
If you want to dive a little deeper, some experts’ GitHub accounts might be a place to go if you are looking for papers, code etc. - just follow the overview provided by analyticsvidhya.com.
Last but not least ...
... we wish you and your loved ones a happy and relaxed (data-free) Christmas and a good start to a successful and dynamic 2018. See you next year on this blog!
The Caret package ("Caret" stands for Classification And REgression Training) includes functions to support the model training process for complex regression and classification problems. It is widely used in the context of Machine Learning, which we dealt with earlier this year. See this blog post from spring if you are interested in an overview. Alongside other, even more advanced algorithms, regression approaches represent an established element of machine learning. There are tons of resources on regression if you are interested, so we will not focus on the basics in this post. There is, however, an earlier post if you are looking for a brief introduction.
In the following, the HTML output of a knitr report was pasted into the blog site. We used made-up data with donor lifetime value as the dependent variable and the initial donation, the last donation and donor age as predictors.
In short: The explanatory and predictive power of the model are low (for which the dummy data has to be blamed to a certain extent). However, the code you will find below aims to illustrate important concepts of machine learning in R:
Preparing Training and Test Data
We see that InitialDonation, LastDonation and Lifetimesum are factors, so let's prepare the data.
As we have a decent dataset now, we go ahead and load the prominent machine learning package Caret (Classification And Regression Training).
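The original R code is not reproduced here. As a rough stand-in, the following Python sketch (standard library only) shows the idea behind this step: amounts that arrive as text are parsed into numbers, and the rows are then split into training and test data. The rows, the 80/20 split and the helper names `to_numeric` and `train_test_split` are assumptions made for illustration, not the code behind the original report.

```python
import random

# Made-up donor rows. The amounts arrive as text, comparable to factor columns in R.
raw_rows = [
    {"InitialDonation": "10", "LastDonation": "15", "Lifetimesum": "120", "Age": 34},
    {"InitialDonation": "25", "LastDonation": "20", "Lifetimesum": "300", "Age": 61},
    {"InitialDonation": "15", "LastDonation": "30", "Lifetimesum": "180", "Age": 45},
    {"InitialDonation": "50", "LastDonation": "40", "Lifetimesum": "640", "Age": 52},
    {"InitialDonation": "30", "LastDonation": "25", "Lifetimesum": "260", "Age": 29},
    {"InitialDonation": "20", "LastDonation": "35", "Lifetimesum": "410", "Age": 70},
    {"InitialDonation": "40", "LastDonation": "60", "Lifetimesum": "520", "Age": 48},
    {"InitialDonation": "35", "LastDonation": "20", "Lifetimesum": "330", "Age": 39},
    {"InitialDonation": "20", "LastDonation": "25", "Lifetimesum": "240", "Age": 50},
    {"InitialDonation": "45", "LastDonation": "50", "Lifetimesum": "560", "Age": 58},
]

NUMERIC_COLUMNS = ("InitialDonation", "LastDonation", "Lifetimesum")


def to_numeric(rows, columns):
    """Return copies of the rows with the given columns parsed as floats."""
    return [{**row, **{c: float(row[c]) for c in columns}} for row in rows]


def train_test_split(rows, train_share=0.8, seed=42):
    """Shuffle reproducibly and cut the rows into a training and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_share))
    return shuffled[:cut], shuffled[cut:]


donors = to_numeric(raw_rows, NUMERIC_COLUMNS)
train, test = train_test_split(donors)
print(len(train), "training rows,", len(test), "test rows")
```

The model is later fitted on the training rows only, so that the test rows can serve as data it has not seen before.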
So-called near-zero-variance variables (i.e. variables whose observations are all, or almost all, the same) are removed.
Before we fit the model, let's have a look at the intercorrelations.
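To make the idea more tangible, here is a small Python sketch (standard library only) of a near-zero-variance check. It is loosely modelled on the two criteria we understand Caret to use for this purpose, namely the ratio between the most and the second most frequent value and the share of unique values. The cut-offs, the example columns and the function name `near_zero_variance` are assumptions for illustration.

```python
from collections import Counter


def near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a column that is constant or dominated by a single value.

    freq_cut: maximum tolerated ratio of most common to second most common value.
    unique_cut: minimum tolerated percentage of distinct values.
    Both thresholds are assumptions chosen for this sketch.
    """
    counts = Counter(values).most_common()
    if len(counts) == 1:
        return True  # zero variance: every observation is the same
    freq_ratio = counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(counts) / len(values)
    return freq_ratio > freq_cut and percent_unique < unique_cut


# Made-up columns with 100 observations each.
columns = {
    "Country": ["AT"] * 99 + ["DE"],                 # almost constant
    "Age": [20 + (i * 7) % 60 for i in range(100)],  # varies a lot
    "Newsletter": ["yes"] * 100,                     # constant
}

kept = [name for name, values in columns.items() if not near_zero_variance(values)]
print(kept)
```

Columns that are flagged carry next to no information for the model and can be dropped before fitting.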
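A look at intercorrelations can be sketched in plain Python as follows. The code computes Pearson correlations for every pair of predictors and lists the pairs above a threshold. The values, the threshold of 0.9 and the helper name `pearson` are assumptions for illustration and not taken from the original report.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up predictor values for six donors.
predictors = {
    "InitialDonation": [10, 25, 15, 50, 30, 20],
    "LastDonation": [12, 30, 14, 55, 35, 18],
    "Age": [34, 61, 45, 52, 29, 70],
}

THRESHOLD = 0.9  # assumed cut-off for "highly correlated"

high_pairs = []
for left, right in combinations(predictors, 2):
    r = pearson(predictors[left], predictors[right])
    print(f"{left} vs {right}: r = {r:.2f}")
    if abs(r) > THRESHOLD:
        high_pairs.append((left, right))
```

Highly correlated predictors carry largely the same information, which makes the individual regression coefficients hard to interpret. One of the two is therefore often removed before fitting.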
Fitting the model
Now it is the moment we fit the (regression) model:
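As the R output is not reproduced here, the following Python sketch (standard library only) illustrates the principle: an ordinary least squares regression is fitted on the training rows and then judged by its root mean squared error (RMSE) on the test rows. The numbers are made up and the helpers `solve`, `fit_ols`, `predict` and `rmse` are hypothetical names. The sketch solves the normal equations directly, which is fine for a toy example but not necessarily how a package such as Caret fits the model internally.

```python
from math import sqrt


def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(x_rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in x_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, x_rows):
    """Apply the fitted coefficients to new predictor rows."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in x_rows]


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up data: (initial donation, last donation, age) -> lifetime sum.
train_x = [(10, 15, 34), (25, 20, 61), (15, 30, 45), (50, 40, 52),
           (30, 25, 29), (20, 35, 70), (40, 60, 48), (35, 20, 39)]
train_y = [120, 300, 180, 640, 260, 410, 520, 330]
test_x = [(20, 25, 50), (45, 50, 58)]
test_y = [240, 560]

coefs = fit_ols(train_x, train_y)
test_rmse = rmse(test_y, predict(coefs, test_x))
print("coefficients:", [round(c, 2) for c in coefs])
print("RMSE on test data:", round(test_rmse, 2))
```

The RMSE is expressed in the units of the dependent variable, so it reads as a typical prediction error in currency units per donor.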
The scatterplot illustrates how poor the prediction is. So: Still some work (data collection, modelling) to be done :-)
Artificial intelligence (AI) has been an influential concept not only in the scientific community but also in popular culture. Depending on one's attitude towards AI, one might associate it with characters such as the inhibited but likeable android Data from Star Trek or the neurotic and vicious HAL 9000 from the Space Odyssey movies. The form of AI those two represent is called General AI, i.e. the intelligence of a machine that enables it to perform intellectual tasks as well as or better than a human. General AI is – at least for the moment – still a topic for science fiction. What is interesting for various industries is the notion of Narrow AI (also termed Weak AI). These are technologies that perform specific tasks in an automated manner as well as, or even better than, humans could.
In the context of data, machine learning can be seen as an approach to achieving artificial intelligence. Machine learning is about analyzing data, learning from it and using the insights gained for decision making or predictions about something.
The learning in machine learning can be attempted in two generic forms:
What can fundraisers do? I have to say that I did not come across many practice-sharing posts or articles when I conducted research for this post. I doubt that either the availability of data or the competence portfolio of analysts and data scientists limits the possibilities of fundraising organizations in the context of machine learning techniques. However, there might be a certain level of insecurity regarding where and how to start. I found an inspiring blog post by Stephen W. Lambert in which he explains that all you basically need is a computer, a database with relevant data and your brain to start diving into machine learning techniques. I think Lambert’s text invites fundraising organizations to do their theoretical and conceptual homework, process and prepare their data accordingly and start experimenting with machine learning techniques. So - go ahead and try.