Study Machine Learning Projects

There are many paths into the field of machine learning and most start with theory.

If you are a programmer then you already have the skills to decompose problems into their constituent parts and to prototype small projects in order to learn new technologies, libraries and methods. These are important skills for any professional programmer and these skills can be used to get started in machine learning, today.

You must learn the theory to be effective in machine learning, but you can let your interests and thirst for knowledge motivate you to move from working examples to a mathematical understanding of algorithms.

In this post you will learn four strategies a programmer can follow to get started in machine learning. This is the path of the technician, which is practical and empirical and will require you to perform research and complete experiments in order to build up your own intuitions.

The four strategies are:

  1. Study a Machine Learning Tool
  2. Study a Machine Learning Dataset
  3. Study a Machine Learning Algorithm
  4. Implement a Machine Learning Algorithm

Read through these strategies and select one that you feel suits you the best, then execute with abandon.

1. Study a Machine Learning Tool

Select a tool or library that you like and learn how to use it well.

I recommend you start with an environment that provides tools for data preparation, machine learning algorithms and the presentation of results. Learning an environment like this will allow you to get good at the process of machine learning end-to-end, which is more valuable to you than learning a specific data preparation technique or machine learning algorithm.

Alternatively, perhaps you are interested in a specific technique or family of techniques. You could use this as an opportunity to dive deep into a library or tool that offers these methods, and master the technique by mastering the library that provides it.

Some tactics you could follow for this strategy are:

  • Compare and contrast candidate tools from which you could choose.
  • Summarize the capabilities of your chosen tool.
  • Read and summarize the documentation for the tool.
  • Complete text or video tutorials for the tool and summarize the key learning points for each tutorial you complete.
  • Create tutorials for features or capabilities of the tool. Select things that you don’t know much about and write up a process for getting a result, or record a 5-minute screencast on how to use the feature.

Some environments you should consider include: R, Weka, scikit-learn, Waffles and Orange.

2. Study a Machine Learning Dataset

Select a dataset and understand it intimately and discover which algorithm class or type addresses it the best.

I recommend you select a modest-sized dataset that fits into memory and that has preferably been well studied before. There are excellent libraries of data sources available for you to browse and choose from. Your objective is to understand the underlying problem that the data source represents, the structure in the dataset and the types of solutions that are most suited to the problem.

Use a machine learning or statistical environment to study the dataset. This will allow you to focus on the questions you are seeking to answer about the dataset rather than being distracted with learning about a given technique and learning how to implement it in code.

Some tactics that can help you with your study of an experimental machine learning dataset are:

  • Clearly describe the problem that the dataset represents.
  • Summarize the data using descriptive statistics.
  • Describe the structures you observe in the data and hypothesize about the relationships in the data.
  • Spot test a handful of popular machine learning algorithms on the dataset and discover which general class performs better than the others.
  • Tune well-performing algorithms and discover the algorithm and algorithm configuration that performs well on the problem.
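
To make the first few tactics concrete, here is a minimal sketch in plain Python of summarizing a dataset and spot testing two simple algorithms on it. Everything in it is an assumption made for illustration: the toy rows are invented, and the helper names (summarize, majority_class, nearest_neighbor, leave_one_out_accuracy) are hypothetical rather than taken from any particular tool. In practice an environment such as R, Weka or scikit-learn would do this work for you.

```python
import statistics

# Hypothetical toy dataset: (feature_1, feature_2, class_label).
# The values are invented for illustration only.
ROWS = [
    (1.0, 1.2, "a"), (1.3, 0.9, "a"), (0.8, 1.1, "a"), (1.1, 1.4, "a"),
    (3.0, 3.2, "b"), (3.3, 2.9, "b"), (2.8, 3.1, "b"), (3.1, 3.4, "b"),
]


def summarize(rows):
    """Return min, max, mean and sample standard deviation per numeric column."""
    n_features = len(rows[0]) - 1
    summary = []
    for i in range(n_features):
        column = [row[i] for row in rows]
        summary.append({
            "min": min(column),
            "max": max(column),
            "mean": statistics.mean(column),
            "stdev": statistics.stdev(column),
        })
    return summary


def majority_class(train, _features):
    """Baseline: always predict the most common label in the training rows."""
    labels = [row[-1] for row in train]
    return max(set(labels), key=labels.count)


def nearest_neighbor(train, features):
    """Predict the label of the closest training row (squared Euclidean distance)."""
    def distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[:-1], features))
    return min(train, key=distance)[-1]


def leave_one_out_accuracy(rows, predict):
    """Hold out each row in turn, train on the rest and score the prediction."""
    correct = 0
    for i, row in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        if predict(train, row[:-1]) == row[-1]:
            correct += 1
    return correct / len(rows)


if __name__ == "__main__":
    for i, stats in enumerate(summarize(ROWS)):
        print(f"feature {i}: {stats}")
    for name, algorithm in [("majority class", majority_class),
                            ("1-nearest neighbor", nearest_neighbor)]:
        print(name, leave_one_out_accuracy(ROWS, algorithm))
```

On this toy data the nearest-neighbor rule classifies every held-out row correctly, while the majority-class baseline gets every one wrong: with perfectly balanced classes, the held-out row always belongs to the class that is in the minority among the remaining rows. Small surprises like this are exactly the kind of thing worth writing down when you study a dataset.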

Some repositories of high-quality datasets you may like to consider are: UCI ML Repository, Kaggle and data.gov.

3. Study a Machine Learning Algorithm

Select an algorithm and understand it intimately and discover parameter configurations that are stable across different datasets.

I recommend that you start with an algorithm of modest complexity. Select an algorithm that is well understood, has many open source implementations for you to choose from and has few parameters for you to explore. Your objective is to build up intuitions for how the algorithm performs across a range of problems and parameter configurations.

Use a machine learning environment or library. This will allow you to focus on the behaviors of the algorithm as a “system” as opposed to concerning yourself with canonical mathematical descriptions and reference literature.

Some tactics you can use when studying your chosen machine learning algorithm are:

  • Summarize the parameters of the system and the expected influences they have on the algorithm.
  • Select a range of datasets suited to the algorithm that are likely to elicit varied behaviors.
  • Select algorithm parameter configurations that you believe will elicit varied behaviors from the system and list the behaviors you may expect from the system.
  • Consider the behaviors of an algorithm that could be monitored as the algorithm is run over iterations of the algorithm's update process or some other interval of time.
  • Design small experiments using one or more combinations of datasets, algorithm configurations and behavior measures in order to answer a specific question and report results.
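
A small experiment of this kind might look like the sketch below, which asks one narrow question: how does the accuracy of k-nearest neighbor classification change with k as the classes overlap more? It is only an illustration under stated assumptions. The synthetic two-class data, the function names (make_blobs, knn_predict, holdout_accuracy, run_experiment) and the chosen values of k and spread are all hypothetical, and a machine learning library would normally supply the algorithm for you.

```python
import random


def make_blobs(n_per_class, spread, seed):
    """Generate two 2-D Gaussian classes centred at (0, 0) and (2, 2)."""
    rng = random.Random(seed)
    rows = []
    for label, centre in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(centre, spread), rng.gauss(centre, spread))
            rows.append((point, label))
    rng.shuffle(rows)
    return rows


def knn_predict(train, point, k):
    """Majority vote among the k nearest training points (k should be odd)."""
    def distance(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], point))
    neighbours = sorted(train, key=distance)[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def holdout_accuracy(rows, k, train_fraction=0.7):
    """Train on the first part of the rows and score on the remainder."""
    split = int(len(rows) * train_fraction)
    train, test = rows[:split], rows[split:]
    hits = sum(knn_predict(train, point, k) == label for point, label in test)
    return hits / len(test)


def run_experiment(spreads=(0.5, 1.0, 2.0), ks=(1, 5, 15), seed=1):
    """Measure accuracy for every combination of class overlap and k."""
    results = {}
    for spread in spreads:
        rows = make_blobs(50, spread, seed)
        for k in ks:
            results[(spread, k)] = holdout_accuracy(rows, k)
    return results


if __name__ == "__main__":
    for (spread, k), accuracy in sorted(run_experiment().items()):
        print(f"spread={spread} k={k} accuracy={accuracy:.2f}")
```

The fixed seed keeps each run repeatable, so any change in the reported numbers comes from the configuration you changed and not from the random data. A single hold-out split is a deliberately simple behavior measure; repeating the run over several seeds would give a steadier picture.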

Your studies can be as simple or as complex as you like. At the higher end you can explore so-called heuristics or rules of thumb for applying algorithms and empirically demonstrate whether they have merit and, if so, under what circumstances they correlate with successful outcomes.

Some algorithms you may consider to start with include: least squares linear regression, logistic regression, k-nearest neighbor classification and the perceptron.

4. Implement a Machine Learning Algorithm

Select an algorithm and implement or port an existing implementation to a language of your choice.

Select an algorithm of modest complexity to implement. I recommend performing some detailed research on the algorithm you wish to implement, or selecting an implementation you like and porting it to your chosen target programming language.

Implementing an algorithm by hand from scratch is a great way to learn about the myriad of micro-decisions that have to be made in transforming an algorithm description into a functioning system. By repeating this process with multiple algorithms you will quickly gain an intuition for how to read the mathematical descriptions of algorithms in research papers and books.
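
As an illustration of those micro-decisions, here is a minimal sketch of a perceptron written from scratch in plain Python. It is one possible reading of the algorithm, not a reference implementation: the function names, the zero initialization, the default learning rate and the rule that an activation of exactly zero maps to class 1 are all choices made for this example, and the logical AND data is a toy problem picked because it is linearly separable.

```python
# Toy training data: logical AND, as (features, label) pairs.
AND_ROWS = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, features):
    """Return 1 if the weighted sum plus bias is non-negative, else 0."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    # Micro-decision: an activation of exactly zero is treated as class 1.
    return 1 if activation >= 0.0 else 0


def train_perceptron(rows, learning_rate=1.0, epochs=20):
    """Train on rows of (features, label) where each label is 0 or 1."""
    n_features = len(rows[0][0])
    # Micro-decision: start from zero weights rather than random ones.
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in rows:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                bias += learning_rate * error
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
        # Micro-decision: stop early once a full pass makes no mistakes.
        if mistakes == 0:
            break
    return weights, bias


if __name__ == "__main__":
    weights, bias = train_perceptron(AND_ROWS)
    for features, label in AND_ROWS:
        print(features, "expected", label, "predicted",
              predict(weights, bias, features))
```

None of the commented decisions are spelled out in a typical one-line description of the update rule, which is why writing the code yourself teaches you more than reading the description.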

Five tactics that may help you when implementing machine learning algorithms from scratch are:

  • Start by porting. Porting an open source algorithm implementation from one language to another will teach you how the algorithm is implemented and make it your own. It is the fastest way to get started and is highly recommended.
  • Select one algorithm description to work from and collect other descriptions of the same algorithm to help you resolve ambiguities in the primary reference material.
  • Do not be afraid to reach out to algorithm authors, paper authors or even algorithm implementation authors to ask questions to help you disambiguate your understanding of the algorithm description.
  • Read lots of implementations of your target algorithm. Learn how different programmers interpret the algorithm description and turn it into code.
  • Do not get caught up on advanced methods. Many machine learning algorithms use advanced optimization methods in their core. Do not try to reimplement these methods unless that is the point of your project. Use a library that provides an optimization algorithm or use a simpler optimization algorithm that is easy to implement (like gradient descent) or is available to you in a library.
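
To show how little code a simple optimizer needs, the sketch below fits a straight line by batch gradient descent on mean squared error. It is an illustration under assumptions, not a recommended implementation: the function name fit_line, the learning rate, the number of steps and the toy data generated from y = 2x + 1 are all made up for this example, and the learning rate would need adjusting for data on a different scale.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = slope * x + intercept by batch gradient descent on MSE."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residual of the current line at every training point.
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2.0 / n * sum(residuals)
        # Step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x + 1.0 for x in xs]  # toy data on the line y = 2x + 1
    slope, intercept = fit_line(xs, ys)
    print(f"slope={slope:.4f} intercept={intercept:.4f}")
```

Least squares linear regression also has a closed-form solution, so gradient descent is not the only way to fit this model. It is used here because the same loop carries over to models that have no closed form.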

Small Projects Methodology

The four strategies belong to a methodology I call “small projects”. It is an approach you can use to very quickly build up practical skills in technical fields of study, like machine learning. The general idea is that you design and execute small projects that each target a specific question you want to answer.

Small projects are small in a few dimensions to ensure that they are completed, that you extract the learning benefits and that you move on to the next project. Below are constraints you should consider imposing on your projects:

  • Small in time: A project should not take any longer than 5-15 hours from inception to presentation of results. This will allow you to complete a small project in a week of nights and weekend time away from your 9-5 job.
  • Small in scope: A project should address the narrowest version of the question you are interested in that is still meaningful. For example, rather than addressing the problem “write a program that will tell me if a tweet will be retweeted” in the general case, address the problem just for a specific Twitter account over a given time period.
  • Small in resources: A project should be able to be completed on your desktop or laptop with a connection to the internet. You should not need exotic software, web infrastructure, or third-party data or services. Save the data you need to a file, load it into memory and attack your narrow question using open source tools.

Additional Project Tips

The principle of these strategies is to take action and make use of your programmer skill set. Below are three tips to help you adjust your mindset in order to take action:

  • Write down what you learn. I recommend that you have a tangible work product for every step you take. This could be a note in a journal, a tweet, a blog post or an open source project. Each work product acts as an anchor and a milestone.
  • Do not write code unless that is the purpose of the project. This tip is not obvious but may be the biggest in terms of accelerating your understanding of machine learning.
  • The goal is for you to learn something, not to create a unique resource. It does not matter for now whether anyone reads your studies, tutorials or notes on an algorithm. They are your perspective and your work product, and they demonstrate that you now know something.

Summary

Here are the four strategies again with a clear one-liner for each to help you choose the one that is right for you.

  1. Study a Machine Learning Tool: Select a tool or library that you like and learn how to use it well.
  2. Study a Machine Learning Dataset: Select a dataset and understand it intimately and discover which algorithm class or type addresses it the best.
  3. Study a Machine Learning Algorithm: Select an algorithm and understand it intimately and discover parameter configurations that are stable across different datasets.
  4. Implement a Machine Learning Algorithm: Select an algorithm and implement or port an existing implementation to a language of your choice.

Pick One!

Which strategy would you choose and what will be your first step? Pick one and declare your intentions in a comment below.
