
Programming English: the KNN Algorithm


School of Computer Science

The University of Adelaide

Artificial Intelligence

Assignment 2

Semester 1, 2018

due 11:55pm, Thursday 14th May 2018

Introduction


In this assignment, you will develop several classification models to classify noisy input images into the classes square or circle, as shown in Fig. 1.


Figure 1: Samples of noisy images labelled as square (left) and circle (right).

Your classification models will use the training and testing sets (available with this assignment), which contain many image samples labelled as square or circle. Your task is to write Python code that can be run in a Jupyter Notebook session, which will train and validate the following classification models:

1) K-nearest neighbour (KNN) classifier [35 marks]. For the KNN classifier, you may only use standard Python libraries (e.g., numpy) to implement all aspects of the training and testing algorithms. You will need to implement two functions: a) one to build a K-d tree from the training set (this function takes the training samples and labels as its parameters), and b) another to test the KNN classifier and compute the classification accuracy, where the parameters are K and the test images and labels. Using matplotlib, plot a graph of the evolution of classification accuracy for the training and testing sets as a function of K, for K = 1 to 10. Clearly identify the value of K where generalisation is best.
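The two required functions could be sketched as follows, using only numpy. This is a minimal illustration, not the required implementation: the function names (`build_kdtree`, `knn_search`, `knn_accuracy`), the dict-based node layout, and the brute-force heap list are my own choices, and the noisy images would first need to be flattened into feature vectors.

```python
import numpy as np

def build_kdtree(points, labels, depth=0):
    """Recursively build a K-d tree node as a dict,
    splitting at the median along the cycling axis."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    order = np.argsort(points[:, axis])
    points, labels = points[order], labels[order]
    mid = len(points) // 2
    return {
        "point": points[mid], "label": labels[mid], "axis": axis,
        "left": build_kdtree(points[:mid], labels[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], labels[mid + 1:], depth + 1),
    }

def knn_search(node, query, k, heap=None):
    """Collect the k nearest neighbours of `query` as (squared distance, label)."""
    if heap is None:
        heap = []          # kept sorted, at most k entries
    if node is None:
        return heap
    d = float(np.sum((node["point"] - query) ** 2))
    heap.append((d, node["label"]))
    heap.sort(key=lambda t: t[0])
    del heap[k:]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    knn_search(near, query, k, heap)
    # Visit the far side only if the splitting plane could hide a closer point.
    if len(heap) < k or diff ** 2 < heap[-1][0]:
        knn_search(far, query, k, heap)
    return heap

def knn_accuracy(tree, k, test_x, test_y):
    """Classify each test sample by majority vote and return the accuracy."""
    correct = 0
    for x, y in zip(test_x, test_y):
        votes = [lbl for _, lbl in knn_search(tree, x, k)]
        correct += int(max(set(votes), key=votes.count) == y)
    return correct / len(test_y)
```

Looping `knn_accuracy` over K = 1 to 10 on both sets then yields the two accuracy curves to plot with matplotlib.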

2) Decision tree classifier [35 marks]. For the decision tree classifier, you may only use standard Python libraries (e.g., numpy) to implement all aspects of the training and testing algorithms. Essentially you will need to implement two functions: a) one to train the decision tree using the training samples and labels plus a pre-pruning parameter indicating the minimum information content before splitting stops, and b) another to test the decision tree and compute the classification accuracy (similarly to the KNN classifier, the test function takes the test images and labels as parameters and returns the classification accuracy). Using matplotlib, plot a graph of the evolution of classification accuracy for the training and testing sets as a function of the information content, where information content = 0 to 0.5 bits. Clearly identify the value of information content where generalisation is best.
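One possible shape for entropy-based training with pre-pruning is sketched below, again with numpy only. The names (`build_tree`, `tree_accuracy`), the threshold-per-feature split search, and the dict node representation are illustrative assumptions; `min_info` plays the role of the pre-pruning parameter, halting growth once a node's entropy drops below that many bits.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def build_tree(X, y, min_info=0.1):
    """Grow a binary decision tree on real-valued features.
    Pre-pruning: stop splitting once node entropy < min_info bits."""
    if entropy(y) < min_info or len(set(y)) == 1:
        vals, counts = np.unique(y, return_counts=True)
        return {"leaf": vals[np.argmax(counts)]}      # majority-label leaf
    best = None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            mask = X[:, f] <= t
            if mask.all() or not mask.any():
                continue
            # Weighted child entropy; lower means a more informative split.
            h = mask.mean() * entropy(y[mask]) + (~mask).mean() * entropy(y[~mask])
            if best is None or h < best[0]:
                best = (h, f, t, mask)
    if best is None:                                   # no usable split left
        vals, counts = np.unique(y, return_counts=True)
        return {"leaf": vals[np.argmax(counts)]}
    _, f, t, mask = best
    return {"feature": f, "threshold": t,
            "left": build_tree(X[mask], y[mask], min_info),
            "right": build_tree(X[~mask], y[~mask], min_info)}

def predict(node, x):
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

def tree_accuracy(tree, X, y):
    """Classification accuracy of the tree on a labelled set."""
    return float(np.mean(np.array([predict(tree, x) for x in X]) == y))
```

Sweeping `min_info` from 0 to 0.5 bits and recording `tree_accuracy` on both sets produces the required plot.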

3) Convolutional neural network (CNN) classifier [20 marks]. For the convolutional neural network, you are allowed to use Keras with the TensorFlow backend, similar to the example shown in the code provided. The CNN structure is the LeNet structure used in lectures. Using matplotlib, plot a graph of the evolution of accuracy for the training and testing sets as a function of the number of epochs, where the maximum number of epochs is 200. Clearly identify the number of epochs at which generalisation is best.
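A minimal LeNet-style sketch in Keras might look as follows. The input shape (32×32 grayscale) and two output classes are assumptions to be adjusted to the actual dataset, and `train_x`/`train_y`/`test_x`/`test_y` in the commented call are hypothetical placeholders for the assignment's arrays.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_lenet(input_shape=(32, 32, 1), num_classes=2):
    """LeNet-5-style CNN: two conv/pool stages, then three dense
    layers ending in a softmax over the classes."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(6, 5, activation="tanh", padding="same"),
        layers.AveragePooling2D(2),
        layers.Conv2D(16, 5, activation="tanh"),
        layers.AveragePooling2D(2),
        layers.Flatten(),
        layers.Dense(120, activation="tanh"),
        layers.Dense(84, activation="tanh"),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_lenet()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Hypothetical training call on the assignment's data:
# history = model.fit(train_x, train_y, epochs=200,
#                     validation_data=(test_x, test_y))
# history.history["accuracy"] and ["val_accuracy"] can then be
# plotted against the epoch number with matplotlib.
```

Keras records per-epoch accuracy in the `History` object, so no manual bookkeeping is needed for the epoch plot.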

Sample code that trains and tests a multi-layer perceptron classifier in a Jupyter Notebook session is provided, and the submitted code is expected to run in a Jupyter Notebook session in a similar manner. A held-out test set will be used to test the generalisation of the implemented classification models, but this held-out set will only be available after the assignment deadline. Please note that this held-out set will contain samples drawn from the same distributions used to generate the training and testing sets.

You must write the program yourself in Python, and the code must be a single file that can run in a Jupyter Notebook session (file type .ipynb). You will only receive marks for the parts you implemented yourself. If you use a library package or language function call to train or test a KNN or decision tree classifier, you will be limited to 50% of the available marks (noting that this assignment is a hurdle for the course). If there is evidence you have simply copied code from the web, you will be awarded no marks and referred for plagiarism.

Submission


You must submit, by the due date, two files:


1. An .ipynb file containing your code for the three classifiers and all implementations described above.

2. A PDF file with a short written report detailing your implementation in no more than one page, and the following results:

a) The training and testing accuracies at the best generalisation operating point for each type of classifier, using a table [5 marks]:

                    | Training Accuracy | Testing Accuracy
K=1 NN              |                   |
K=10 NN             |                   |
DT (IC = 0 bits)    |                   |
DT (IC = 0.5 bits)  |                   |
CNN                 |                   |

b) Running times of the training and testing algorithms for each type of classifier, using a table [5 marks]:

                    | Training Time     | Testing Time
K=1 NN              |                   |
K=10 NN             |                   |
DT (IC = 0 bits)    |                   |
DT (IC = 0.5 bits)  |                   |
CNN                 |                   |

c) Bonus question: How can the classification accuracy of the decision tree classifier be improved? Please implement your idea (hint: dimensionality reduction) [10 marks].

Total number of marks: 100 + 10 bonus marks


This assignment is due 11:55pm on Thursday 14th May, 2018. If your submission is late, the maximum mark you can obtain will be reduced by 25% per day (or part thereof) past the due date or any extension you are granted.


This assignment relates to the following ACS CBOK areas: abstraction, design, hardware and software, data and information, HCI and programming.

