Steps to Get Started in Machine Learning: The Top

Getting started is much easier than you think.

In this post I show you the top-down approach for getting started in applied machine learning. You will discover the four steps to this approach. They should feel familiar because it’s probably the same top-down approach that you used to learn how to program. Namely, get the basics, practice a lot and dive into the details later after you’re hooked.

At the end of the post, I link to my mini-course that can shortcut the path and give you step-by-step instructions to follow to start and practice applied machine learning.

Beginners are Different

Beginners have an interest in machine learning but are not sure how to take that first step. They are confused because the material on blogs and in courses is almost always pitched at an intermediate level.

Machine Learning
Photo by Erik Charlton, some rights reserved.

Typical books and university-level courses are bottom-up. They teach (or require) the mathematics first, then grind through a few key algorithms and theories before finishing up. This can be a good approach if you have the time, patience and appropriate background. Not everyone has so much free time or the desire to move through so much low-level material before getting to the meat and potatoes of applied machine learning.

I get a lot of emails from beginners asking for advice on how to get started in machine learning. It’s a tough problem, because there are so many possibilities and so many things I could recommend. I tell them not to dive into the math and not to go straight back to school.

The students and professionals I advise are almost always programmers or have an engineering background, and I tell them that there is a much more efficient path into machine learning for them.

Solution is Top-Down

My advice for beginners in machine learning is to take a top-down approach.

Beginners are Different
Photo by mikebaird, some rights reserved.

I advise beginners to take a faster route to discover what applied machine learning is all about before investing a lot of time in studying the theory. It makes sense and it is familiar: it’s the same way you first got excited about programming, before diving in and making it a focus of study and career.

The top-down approach is to quickly learn the high-level step-by-step process of working through a machine learning problem end-to-end using a software tool. With modern platforms, it is possible to work through small problems in minutes to hours using complex state-of-the-art algorithms and rigorous validation and statistical hypothesis testing, all performed automatically within the tools.
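To give a feel for what the tools automate, here is a minimal Python sketch (not Weka's implementation) of k-fold cross-validation, the evaluation Weka's Explorer uses by default with 10 folds, plus a plain paired t statistic comparing two models on matched folds. Weka's Experimenter also offers a corrected variant of the paired t-test, which this sketch does not implement. The toy dataset and the `fit_majority`/`fit_1nn` helpers are hypothetical, chosen only for illustration.

```python
import math
import random
from statistics import mean, stdev


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Return the accuracy on each held-out fold; every row is tested exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


def paired_t_statistic(scores_a, scores_b):
    """Plain (uncorrected) paired t statistic on matched folds; positive means A scored higher."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)
    if spread == 0:
        return 0.0 if mean(diffs) == 0 else math.copysign(math.inf, mean(diffs))
    return mean(diffs) / (spread / math.sqrt(len(diffs)))


# Hypothetical models used only for this illustration.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)  # most frequent training label


def predict_majority(model, row):
    return model  # ignores the input entirely


def fit_1nn(X_train, y_train):
    return list(zip(X_train, y_train))  # 1-nearest-neighbour simply memorises the data


def predict_1nn(model, row):
    nearest = min(model, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], row)))
    return nearest[1]


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.random()] for _ in range(100)]  # toy data: one feature in [0, 1)
    y = [int(row[0] > 0.5) for row in X]      # label is 1 when the feature exceeds 0.5
    base = cross_validate(X, y, fit_majority, predict_majority)
    knn = cross_validate(X, y, fit_1nn, predict_1nn)
    print(f"majority {mean(base):.3f}  1-NN {mean(knn):.3f}  "
          f"t = {paired_t_statistic(knn, base):.2f}")
```

Because both models are scored on the same folds, their accuracies can be compared fold by fold, which is what makes the paired test appropriate here.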

It is after you are familiar and confident with the process that I advise you start looking deeper into the algorithms and theory side of machine learning. How first, why later.

We can summarize this top-down approach as follows:

  1. Learn the high-level process of applied machine learning.
  2. Learn how to use a tool enough to be able to work through problems.
  3. Practice on datasets, a lot.
  4. Transition into the details and theory of machine learning algorithms.

Applied Machine Learning Process

I have written a lot about the process of applied machine learning. I advocate a 6-step process for classification and regression problems, the problem types at the heart of most applied machine learning. The process is as follows:

  1. Problem Definition: Understand and clearly describe the problem that is being solved.
  2. Analyze Data: Understand the information available that will be used to develop a model.
  3. Prepare Data: Discover and expose the structure in the dataset.
  4. Evaluate Algorithms: Develop a robust test harness and spot-check a range of algorithms on the prepared data.
  5. Improve Results: Leverage the evaluation results to develop more accurate models.
  6. Present Results: Describe the problem and solution so that it can be understood by third parties.

Applied Machine Learning Process Overview

By following this structured process on each problem you work through, you enforce a minimum level of rigour and dramatically increase the likelihood of getting good (or more likely excellent) results.
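As a rough illustration of how the steps fit together on one problem, here is a minimal Python sketch on hypothetical synthetic data. It is not the author's code or a Weka workflow: the data generator, the min-max scaling used to prepare the data, the majority-class baseline (the idea behind Weka's ZeroR) and the k-nearest-neighbours learner are all stand-ins chosen for brevity.

```python
import random
from collections import Counter


def make_toy_data(n=300, seed=7):
    """Hypothetical synthetic data: label is 1 when x0 + x1 / 10 > 1."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random() * 10] for _ in range(n)]  # x1 is on a 10x larger scale
    y = [int(a + b / 10 > 1) for a, b in X]
    return X, y


def prepare(rows, reference):
    """Prepare Data: min-max scale each column using statistics from `reference` rows only."""
    lows = [min(col) for col in zip(*reference)]
    highs = [max(col) for col in zip(*reference)]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def knn(train, row, k):
    """Majority vote among the k training rows nearest to `row` (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], row)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def majority(train, row):
    """Baseline that ignores the input and predicts the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(train, rows, labels, predict):
    return sum(predict(train, row) == label for row, label in zip(rows, labels)) / len(rows)


def run_process(X, y):
    # Problem Definition: state the goal before touching the data.
    report = {"problem": "Predict a binary label from two numeric features."}
    # Analyze Data: look at the class balance.
    report["class_counts"] = dict(Counter(y))
    # Split 60/20/20 into train, validation and test rows (the toy rows are already random).
    a, b = int(0.6 * len(X)), int(0.8 * len(X))
    train = list(zip(prepare(X[:a], X[:a]), y[:a]))
    val_X, test_X = prepare(X[a:b], X[:a]), prepare(X[b:], X[:a])
    # Evaluate algorithms: a do-nothing baseline sets the bar any model must beat.
    report["baseline_accuracy"] = accuracy(train, val_X, y[a:b], majority)
    # Improve Results: tune k on the validation rows only.
    val_scores = {k: accuracy(train, val_X, y[a:b], lambda t, r, k=k: knn(t, r, k))
                  for k in (1, 3, 5, 7)}
    best_k = max(val_scores, key=val_scores.get)
    # Present Results: report once on test rows that played no part in tuning.
    report["best_k"] = best_k
    report["test_accuracy"] = accuracy(train, test_X, y[b:], lambda t, r: knn(t, r, best_k))
    return report


if __name__ == "__main__":
    for key, value in run_process(*make_toy_data()).items():
        print(f"{key}: {value}")
```

Keeping the test rows out of both scaling and tuning is one small example of the "minimum level of rigour" the process is meant to enforce.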

Use the Weka Machine Learning Workbench

The software platform I recommend beginners learn when getting started is the Weka Machine Learning Workbench.

I think the decision to use Weka when getting started is a complete no-brainer because:

  • It provides a simple graphical user interface that encapsulates the process of applied machine learning outlined above.
  • It facilitates algorithm and dataset exploration as well as rigorous experiment design and analysis.
  • It is free and open source, licensed under the GNU GPL.
  • It is cross-platform and runs on Windows, Mac OS X and Linux (requires a Java virtual machine).
  • It contains state-of-the-art algorithms, with an impressive abundance of decision trees, rule-based algorithms and ensemble methods, among others.

Weka Explorer Interface with the Iris dataset loaded

You can see for yourself how easy the platform is to use in the 5-minute Weka tutorials I have written.

Additionally, once you get deeper into Weka, you can run algorithms from the command line and integrate them into your own application via the application programming interface (API). It is an extensible platform: you can quickly and easily implement your own algorithms against its interface and use them in the GUI.
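To show what implementing an algorithm against a common interface means, here is a hypothetical Python analogue. Weka's Java classifiers implement a build step (`buildClassifier`) and a per-instance prediction step (`classifyInstance`); the `Classifier` base class below mirrors that split but is not Weka's API. `ZeroR` and a simplified `OneR` (nominal attributes only, without the discretisation of numeric attributes that Weka's `OneR` performs) show how new algorithms plug in, and the hypothetical `REGISTRY` stands in for the list of algorithms a GUI would offer.

```python
from abc import ABC, abstractmethod
from collections import Counter, defaultdict


class Classifier(ABC):
    """Hypothetical interface loosely modelled on Weka's build/classify split."""

    @abstractmethod
    def build(self, rows, labels):
        """Learn from training rows; return self so calls can be chained."""

    @abstractmethod
    def classify(self, row):
        """Predict a label for a single row."""


class ZeroR(Classifier):
    """Predicts the majority class and ignores all attributes."""

    def build(self, rows, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def classify(self, row):
        return self.label


class OneR(Classifier):
    """Simplified one-attribute rule learner for nominal attributes."""

    def build(self, rows, labels):
        self.default = Counter(labels).most_common(1)[0][0]  # used for unseen values
        best = None
        for col in range(len(rows[0])):
            by_value = defaultdict(Counter)  # attribute value -> label counts
            for row, label in zip(rows, labels):
                by_value[row[col]][label] += 1
            rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
            errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
            if best is None or errors < best[0]:  # keep the attribute with fewest errors
                best = (errors, col, rule)
        _, self.col, self.rule = best
        return self

    def classify(self, row):
        return self.rule.get(row[self.col], self.default)


# A front end could list and instantiate algorithms from here.
REGISTRY = {"ZeroR": ZeroR, "OneR": OneR}


if __name__ == "__main__":
    # Hypothetical nominal toy data: [outlook, temperature] -> label.
    rows = [["sunny", "hot"], ["sunny", "mild"], ["rainy", "mild"],
            ["rainy", "cool"], ["overcast", "hot"]]
    labels = ["no", "no", "yes", "yes", "yes"]
    for name, cls in REGISTRY.items():
        model = cls().build(rows, labels)
        print(name, model.classify(["sunny", "cool"]))
    # Prints "ZeroR yes" (majority label), then "OneR no" (OneR picks the outlook column).
```

Because every algorithm exposes the same two methods, evaluation code such as a cross-validation loop never needs to know which algorithm it is running.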

Practice, Practice, Practice, on Datasets

Once you are up and running with Weka, you need to practice the 6-step process of applied machine learning.

The Weka installation includes a data directory with many standard machine learning datasets, most taken from actual scientific problem domains. There is also a wealth of excellent datasets to trial and learn from on the UCI Machine Learning Repository. These datasets are an excellent place for you to get started learning and practicing.

  • The datasets are small and easily fit into memory.
  • The small size of the datasets also means that algorithms and experiments are quick to run.
  • The problems and data are real, including noise, biases in sampling and data collection that you need to consider.
  • The data is well understood so that you can leverage what is known and openly discuss the data with peers.
  • There are known “good results” for you to compare to and recreate.
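The datasets bundled with Weka are stored in its ARFF text format: a header of `@RELATION` and `@ATTRIBUTE` declarations, then comma-separated rows after `@DATA`, with `?` marking a missing value and `%` starting a comment. If you want to practice on the same files outside Weka, a minimal reader like the hypothetical `load_arff` below is enough for simple dense files. It deliberately ignores quoted values, sparse rows and attribute names containing spaces, so treat it as a sketch rather than a full parser.

```python
SAMPLE = """% A few rows in the layout of Weka's data/iris.arff (illustrative excerpt)
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def convert(value):
    """Turn one ARFF cell into a float, a string, or None for a missing value."""
    if value == "?":
        return None
    try:
        return float(value)
    except ValueError:
        return value  # nominal values stay as strings


def load_arff(text):
    """Return (attribute names, rows) from a simple dense ARFF document."""
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        keyword = line.split()[0].lower()  # ARFF keywords are case-insensitive
        if keyword == "@attribute":
            names.append(line.split()[1])
        elif keyword == "@data":
            in_data = True
        elif in_data:
            rows.append([convert(cell.strip()) for cell in line.split(",")])
    return names, rows


if __name__ == "__main__":
    names, rows = load_arff(SAMPLE)
    print(names)    # ['sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'class']
    print(rows[0])  # [5.1, 3.5, 1.4, 0.2, 'Iris-setosa']
```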

You can choose your own level of detail on each step of the structured process. I recommend spending no more than one hour on each step when getting started. You can do and learn a lot about a problem in one hour with Weka, especially when designing and running experiments. This will keep your motivation and project velocity high.

Lots of Data
Photo attributed to cibomahto, some rights reserved

The structured process encourages you to make observations and record results and findings as you work through a given problem. It is wise to keep these observations and findings together, perhaps in a project directory or GitHub project.
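One lightweight way to keep observations and findings together is an append-only CSV file in the project directory. The sketch below shows one possible layout, not a prescribed one: the `results.csv` file name, the column set and the `log_result`/`read_results` helpers are all hypothetical.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["timestamp", "dataset", "step", "algorithm", "metric", "value", "note"]


def log_result(project_dir, dataset, step, algorithm, metric, value, note=""):
    """Append one finding to <project_dir>/results.csv, writing the header on first use."""
    path = Path(project_dir) / "results.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "dataset": dataset, "step": step, "algorithm": algorithm,
            "metric": metric, "value": value, "note": note,
        })
    return path


def read_results(project_dir):
    """Load every logged row back as a list of dicts (all values come back as strings)."""
    with (Path(project_dir) / "results.csv").open(newline="") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Placeholder entries for illustration only; these are not real results.
    project = tempfile.mkdtemp()
    log_result(project, "my-dataset", "Analyze Data", "-", "rows", 150, "placeholder")
    log_result(project, "my-dataset", "Prepare Data", "-", "columns_scaled", 4)
    for row in read_results(project):
        print(row["step"], row["metric"], row["value"], row["note"])
```

A plain CSV file like this is easy to commit alongside the project and to paste into a blog post about each step.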

I recommend blogging about each of your projects, even each step of a project as you complete it. You can do this on your own blog (if you have one) or as Facebook or Google+ updates (which now support images and text formatting). I like the honesty that publicly blogging about projects encourages. It also signals to your peers and colleagues that you are interested in, serious about and developing some chops in applied machine learning.

Transitioning Deeper

Because the projects are small and the process is structured, you can quickly learn a lot about a problem and move through a number of projects. You can also collect data on problems of your own and use the same process to deliver useful and meaningful results on projects at work or for your own benefit.

The next step is to dive deeper into the algorithms and learn why they work and how to get more out of them. I recommend transitioning deeper into the subject by picking up the book Data Mining: Practical Machine Learning Tools and Techniques. It is written by the original authors of the Weka platform and provides a treatment of how and why the algorithms used in Weka work and other deeper concerns of machine learning.

The deeper knowledge will allow you to get more from the platform on your own custom problems. It will also help you better appreciate the methods in Weka, and you will start to build an intuition for the mapping between problem types and algorithm types.

Summary

In this post you discovered the top-down approach to getting started in machine learning, which advocates learning a structured process, learning a powerful tool that supports this process, and practicing applied machine learning in a series of focused projects.

You learned that this is the exact opposite of the traditional bottom-up approach, which expects you to do the heavy lifting in the field first (before you even know if the field is right for you) and leaves you to figure out how to apply algorithms in practice all by yourself.
