Are you ready for data science?

Copyright Timo Elliott

Now that I spend a fair proportion of my time either meeting with potential clients or advising on data strategy, one question that consistently comes up is how ready a business is for data science.

It’s true that a lot stands in the way of a successful machine learning or predictive analytics project, but there’s also a lot you can do to mitigate the risk of failure. Below are four potential barriers to data science success and my advice on how to overcome them.

Your data isn’t in one place

Data scientists love data. Anything that’s been accomplished in machine learning has been accomplished with a combination of relatively few smart people and a ton of data. Now, this doesn’t mean that you have to be Google to make anything worthwhile with data science. Far from it. But you do have to have something.

As more and more people have become used to living their lives on computers and smartphones, an awful lot of data has ended up distributed across different applications. A single (small) company can use: a purpose-built ERP system, Xero for finance, HubSpot, Google Analytics and HotJar for marketing, JIRA for tracking development work, Mailchimp for sending emails and WordPress for blogging. That’s a lot of data sources.

Not every project is going to use data from every one of these sources, and I’m not saying that a solid data warehouse and expertly designed ETL processes are a must before attempting anything cool with data science, but a good data strategy begins with an awareness of where your data is. And that awareness, in turn, fuels what can be achieved.

The best thing to do before calling someone in to start work on your pie-in-the-sky predictive analytics project is to talk to the people in your business, see what they’re using to get their work done and work out where the gold might be for the experts to mine.

You don’t have buy-in

This is a big one. Statistics are thrown around all the time about the failure rate of data science projects, but I’m willing to bet that lack of buy-in accounts for half or more of all those failures.

Data science is a tricky business, and one with potentially long turnaround times. If you’re brave enough to be attempting anything new (in your business, or in all of business) there are going to be periods of confusion and/or sub-optimal results. That’s the way these things go.

At those moments, buy-in from senior stakeholders is crucial. People who combine the vision and foresight necessary to trust in the results of an experimental project and the gravitas to make sure it’s given enough priority to go through to completion are sadly few in number.

If these kinds of people are lacking where you work, and if you’re an engineer or analyst who likes to dream big, my best advice is to start small. Find a process or system that is a nudge away from using data science and push it over the edge. For example, if a dashboard right now only displays historical order levels, you could fit an ARIMA model to that history and add a forecast.
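As a rough illustration of how small that nudge can be, here is a minimal sketch in plain Python. It is not a full ARIMA implementation: it approximates an ARIMA(1,1,0) model by differencing the series once and fitting a single autoregressive coefficient to the differences by least squares. The function name `forecast_orders` and the sample order figures are hypothetical, made up for this example. For real work you would more likely reach for an established time-series library.

```python
from typing import List


def forecast_orders(history: List[float], steps: int = 3) -> List[float]:
    """Forecast order levels with a simplified ARIMA(1,1,0)-style model.

    Assumption: this is a hand-rolled approximation, not a library ARIMA fit.
    The series is differenced once, an AR(1) model with intercept is fitted
    to the differences by ordinary least squares, and forecasts are built by
    adding predicted differences to the last observed level.
    """
    if len(history) < 4:
        raise ValueError("need at least 4 observations")

    # First differences: the change in orders from one period to the next.
    diffs = [b - a for a, b in zip(history, history[1:])]

    # Regress each difference on the one before it.
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        # Constant differences: fall back to a pure drift model.
        phi = 0.0
    else:
        cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        phi = cov_xy / var_x
    intercept = mean_y - phi * mean_x

    # Roll the model forward, accumulating predicted differences.
    forecasts = []
    level = history[-1]
    diff = diffs[-1]
    for _ in range(steps):
        diff = intercept + phi * diff
        level += diff
        forecasts.append(level)
    return forecasts


# Hypothetical monthly order counts, invented for illustration only.
monthly_orders = [120, 132, 128, 141, 150, 147, 160, 171]
print(forecast_orders(monthly_orders, steps=3))
```

The forecast values can be plotted as a continuation of the existing historical line, so the dashboard keeps its shape and simply gains a few forward-looking points.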

Soon enough small changes like these will get attention and your suggestions for more experimental or longer-term projects will be listened to.

It’s easy to focus on senior buy-in, but something that’s equally detrimental to data science projects is a lack of buy-in from the people who keep the wheels turning in a business.

The inverse of the situation where no one in management is listening to your great ideas is the one in which senior staff are all so hyped about their newest experiments in machine learning that everybody else starts to worry: “Will they automate me out of a job?”, “Is this the way the business is going?”

These fears are natural and, occasionally, very well founded. If machine learning is the best thing to happen to technology since the internet (and maybe even electricity), you’d have to be pretty naive to think that your experiments with it wouldn’t have any impact on your staff.

Anyone considering building out a data science team or refining processes with machine learning should be mindful of communicating these projects in a way that gets everybody in the business excited. Once that happens, your path to politics-free data science projects is nearly clear.

You don’t have the team

The great search for a data science unicorn is an exercise in futility. I’ve seen far too many job ads that describe the perfect candidate as someone with expertise in Natural Language Processing and Computer Vision. Not only are both of these problems AI-complete, but the most cutting-edge techniques to deal with them (Recurrent Neural Networks and Convolutional Neural Networks, respectively) are nearly whole fields of study by themselves.

Add to this the desire for data engineering skills, lots of post-university experience (and maybe even a PhD!) and domain expertise in whatever it is your business does, and you’re fighting an uphill battle.

All of this means that you never end up hiring the right person, and that your experiments in data science never get off the ground.

A way better idea is to hire a curious data analyst, pair them with a developer who’s familiar with your data infrastructure and have them help solve a legitimate business problem or create a new product, under the watchful guidance of a senior staff member.

When they stumble and trip, which they always will at the beginning, urge them to kill the project early, dust themselves off and run another experiment. This is science, after all.

This way, over time, you’ll have built a team of engaged and informed managers, data analysts with product-building skills, and developers who know a little more about maths and statistics than they did before. That’s starting to look like a unicorn data science team.

You don’t understand data science

A poor grasp of what data science actually is lies behind all of the issues above.

Data science is a science. That means failures, unexpected results and, hopefully, breakthroughs. It means new methods, unusual approaches and fastidious research. It can mean either 10x improvements or 10% improvements. It will cost money. It will take time.

Many people bemoan the academic approach to scientific research: driven by personal interests, supported by grants, taking many years and only resulting in a handful of citations. But they forget the power of science when there is a pressing problem to be solved. The reprogrammable computer and the atomic bomb were both invented during World War II, and both were miraculous leaps forward for technology.

Point your resources at an actual problem and get out of their way. Industry is one of the best places for scientists to be, but only if they are allowed to remain scientists. Industry is filled with open problems and opportunities for improvement, and modern companies have more data available to them than most in academia could believe. By all means have managers and progress reports, but make sure the team is allowed to experiment and investigate.

The best thing you can do to prepare for data science is to treat the money you’ll invest more as if it were spent on HR than on IT. Don’t watch for the bottom line to jump up. Just notice that things are ticking along smoothly, and be pleasantly surprised at the innovations that come along and change your company for the better.