Datacraft Co-Founder Mike Anderson speaks up on modern data trends and their challenges
There are many ways that a childhood interest can take root and influence a person's future, as Mike Anderson, founding partner at Datacraft, proved some three decades ago. Back then, the 8-year-old self-taught computer programmer had just written his first piece of code on an old Atari XL, drawing a sinuous Mandelbrot set. A complicated mathematical object that lives in the wide universe of complex numbers, a Mandelbrot set is drawn by computer by repeating a simple calculation for each point and checking whether the result stays bounded. Despite its scientific sensibility, Mike considers these sets a creative recreation of sorts. "They don't necessarily serve a purpose. It's just a very beautiful mathematical object, you just watch it as an art form."
In another life, Mike might have been an artist, blending creativity with code. Having always been fascinated by the artistic side of numbers, he has dabbled in numerous side projects of the sort. One of these is coding the 'brain' of Twitter chatbot Tweegeemee, a genetic algorithm that generates random pictorial blends based on its database of views and likes. "It's an evolutionary algorithm, which means that Tweegeemee breeds, replicates and crossbreeds patterns based on the audience's likes of past images," he says, gesturing animatedly to the swirl of pop art decorating his screen. "Tweegeemee creates random art. Some of them are rubbish, but some of them are absolutely brilliant. I particularly like the chrome and plasma cloud effect. Thing is, it's just simple code and math that creates such beautiful art pieces."
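To make the "breeds, replicates and crossbreeds" idea concrete, here is a minimal sketch of one generation of a likes-driven evolutionary algorithm. It is not Tweegeemee's actual code, which this article does not show. The genome representation (a list of numbers standing in for an image recipe), the function names and the mutation rate are all assumptions made for illustration.

```python
import random

# Hypothetical genome: a list of numbers that some renderer would turn into an image.
Genome = list[float]


def crossover(mum: Genome, dad: Genome, rng: random.Random) -> Genome:
    """Crossbreed two parents by splicing them at a random cut point."""
    cut = rng.randrange(1, len(mum))
    return mum[:cut] + dad[cut:]


def mutate(genome: Genome, rng: random.Random, rate: float = 0.1) -> Genome:
    """Replace each gene with a fresh random value with probability `rate`."""
    return [rng.random() if rng.random() < rate else gene for gene in genome]


def next_generation(
    population: list[Genome], likes: list[int], rng: random.Random
) -> list[Genome]:
    """Breed a new population, favouring parents whose images earned more likes."""
    # Add 1 so that images with zero likes keep a small chance of being picked.
    weights = [count + 1 for count in likes]
    children = []
    for _ in range(len(population)):
        mum, dad = rng.choices(population, weights=weights, k=2)
        children.append(mutate(crossover(mum, dad, rng), rng))
    return children


rng = random.Random(0)
population = [[rng.random() for _ in range(6)] for _ in range(8)]
likes = [0, 3, 1, 12, 0, 5, 2, 7]  # made-up audience feedback
population = next_generation(population, likes, rng)
```

Run repeatedly, with fresh likes collected after each round, a loop like this drifts toward whatever the audience rewards while mutation keeps throwing up the occasional surprise.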
Despite his admiration for intricately coded art pieces, Mike decided to channel his fascination for data into improving healthcare. "This artistic stuff is nice, but it's not fundamentally changing people's lives. I think it's important to have a balance in life between the things that you do for entertainment, and the things that matter." The result is Mike's heavy involvement with IoT data, sensor data and predictive modelling tools. He competed in hackathons back in his heyday and ended up winning three in a row. The projects behind Mike's winning streak? A predictive model that gauges blood sugar levels for a health hackathon, a machine-learning tool that detects night-time movements for an active ageing hackathon, as well as a detailed heat map of all the telco activity in Singapore for a data visualisation challenge.
The best inspiration, however, is Mike's unvarnished take on his past experiences - be it a success story or not. "Some of these challenges don't necessarily have an objective. Particularly when you work with data insights, your work is chiefly exploratory," he explains. "Data visualisation is a powerful technique, and it's only by exploring this data that you discover insights and spot opportunities." His brutally honest answers point to significant trends and insights. Indeed, Mike's zealous passion for the importance of data insights is constant and, if anything, captivating.
DATA HELPS FOCUS THE RIGHT RESOURCES ON THE RIGHT PEOPLE
I work with a lot of healthcare data, which is very interesting as you can do a lot of analysis with it. For example, we did some work in the UK for diabetes. We took patient history and built a stratified risk model, where each patient was assigned a risk category. These categories then help health professionals decide which health packages each patient needs. Health is very dependent on economics. You can't provide the most advanced service to everyone in the population, because it doesn't make economic sense. By having individually identifiable data in the right categories, you can use that to customise each patient's treatment according to their healthcare situation.
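As a rough illustration of what assigning each patient to a risk category can look like, here is a minimal rule-based sketch. It is not the model Datacraft built for the UK work, which is not described in detail here. The patient fields, the scoring rules and the thresholds are all hypothetical and are not clinical guidance.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All fields are hypothetical stand-ins for "patient history".
    patient_id: str
    hba1c_percent: float   # long-term blood sugar marker
    age: int
    prior_admissions: int  # hospital admissions in the past year


def risk_score(patient: Patient) -> int:
    """Add up simple points; the thresholds are illustrative only."""
    score = 0
    if patient.hba1c_percent >= 8.0:
        score += 2
    elif patient.hba1c_percent >= 6.5:
        score += 1
    if patient.age >= 65:
        score += 1
    score += min(patient.prior_admissions, 2)  # cap the contribution at 2 points
    return score


def risk_category(patient: Patient) -> str:
    """Map the score onto the strata that would decide the care package."""
    score = risk_score(patient)
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"


patients = [
    Patient("p001", hba1c_percent=9.1, age=70, prior_admissions=3),
    Patient("p002", hba1c_percent=7.0, age=66, prior_admissions=0),
    Patient("p003", hba1c_percent=6.0, age=40, prior_admissions=0),
]
strata = {p.patient_id: risk_category(p) for p in patients}
```

A real stratified model would more likely be fitted statistically to historical outcomes, but the output has the same shape: one category per patient, which is what lets limited resources be pointed at the people who need them most.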
DATA CLEANING STILL THE BIGGEST CHALLENGE FOR DATA SCIENTISTS
We actually ran a survey, and found that 60% of all data scientists' work is spent on data cleaning. This is proving to be a huge bottleneck on the opportunities around data, because the challenge is about getting the right data in a clean enough format, and being able to link it up with other sources of data. Datacraft is addressing this challenge through our current collaboration with IMDA, and we may launch a groundbreaking new tool that prepares and cleans up data. It's still in its early stages, but I'm quite excited about this project.
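To show why cleaning has to come before linking, here is a minimal sketch in plain Python. It is not the tool mentioned above. The column names, the list of "missing" markers and the sample rows are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical spellings of "no value" that often appear in raw exports.
MISSING = {"", "n/a", "na", "null", "-"}

Record = dict[str, Optional[str]]


def clean_record(raw: dict[str, str]) -> Record:
    """Normalise column names, trim whitespace and unify missing values."""
    cleaned: Record = {}
    for key, value in raw.items():
        name = key.strip().lower().replace(" ", "_")
        text = value.strip()
        cleaned[name] = None if text.lower() in MISSING else text
    return cleaned


def link(left: list[Record], right: list[Record], key: str) -> list[Record]:
    """Join two cleaned sources on a shared key, keeping only matching rows."""
    index = {row[key]: row for row in right if row.get(key) is not None}
    linked = []
    for row in left:
        match = index.get(row.get(key))
        if match is not None:
            # On a column clash, the value from `left` wins.
            linked.append({**match, **row})
    return linked


# Two made-up sources whose headers and values do not line up until cleaned.
visits = [clean_record({" Patient ID ": " p001 ", "Age": "54"})]
clinics = [clean_record({"patient_id": "p001", "Clinic": "N/A"})]
combined = link(visits, clinics, key="patient_id")
```

Without the cleaning step the two sources would share no key at all (`" Patient ID "` versus `"patient_id"`, `" p001 "` versus `"p001"`), so the join would silently return nothing.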
TOOLS ARE VERY POWERFUL WHEN YOU MAKE THEM EASY TO USE
To me, a great example would be Facebook. Everybody understands it, knows how to share a photo, post a status update etc., because it's very easy to use, and extremely accessible. I would like to do the same for data, but the data tools currently available are really primitive. Microsoft Excel is a terrible, terrible tool. For various reasons, Microsoft succeeded in monopolising the market in office productivity tools, and there has hardly been any innovation in the space for the last 20 years! That's what happens when you get these big monopolies with average products - they're certainly never going to drive the cutting edge of data innovation. Datacraft sees this as an opportunity to create a new tool to make data tabulation simple and easy to use.
A STARTUP CULTURE'S FLEXIBILITY IS VERY ATTRACTIVE
I worked for McKinsey & Company for 7 years as a Senior Consultant, but left because I wanted to do the things I was passionate about. When you're in a corporate environment, you are beholden to what the clients want and your outputs tend to be PowerPoint decks and Excel spreadsheets. Sure, you have discussions with the board at the end of a project but that's often where your involvement ends. Personally, I find it more rewarding to get involved in the hands-on process with data and technology, where I get to build things. I like startup culture, and the flexibility it gives you to pursue interesting projects. Hence, I made a very conscious decision to move into the startup space.
DATA PROTECTIONISM AND FRAGMENTATION STILL POSE HUGE CHALLENGES
I think there's too much protectionism around data. Even though these controls may be rightly set in place - due to personal protection and organisational privacy, for example - they still serve as a huge blockage that stops you from distributing the data, even if you had the best intentions for it. Additionally, data is very fragmented, and most of it is currently sitting in individual silos and databases. There is no easy way to link this data up with other bits of data, and it's a huge technological and organisational challenge, as data is only powerful when you can combine it.