Data Analysis Series: Condensed Highlights (Part 2)

阿新 • Published: 2019-01-07


Now that we have the datasets provided by UCI, how do we work with them properly?

First, we download a data file to the local machine, for example the crx.data file.
We will process crx.data using pure Python.
The steps are as follows:
- input: the crx.data file
- output: a 2-D list
- it should look like
```
>>> output
[[data_0], [data_1], [data_2], ...]  
```
- an individual record looks like
```
>>> output[0]
['b', 30.83, 0, 'u', 'g', 'w', 'v', 1.25, 't', 't', '01', 'f', 'g', '00202', 0, '+']
```
- Mind the data types; don't make all of them strings.

My code is as follows, for reference only:

    def load_crx(file_name):
        """Parse crx.data into a 2-D list, converting numeric fields."""
        output = []
        # never use built-in names (list, str, ...) unless you mean to replace them
        with open(file_name, 'r') as data_file:
            for line in data_file:
                line = line.rstrip("\n")
                if not line:
                    continue  # skip blank lines
                data = []
                for substr in line.split(","):
                    if substr.isdigit():
                        if len(substr) > 1 and substr.startswith('0'):
                            # zero-padded values such as '01' or '00202' stay strings
                            data.append(substr)
                        else:
                            data.append(int(substr))
                    else:
                        try:
                            data.append(float(substr))
                        except ValueError:
                            # '?' marks a missing value in this dataset
                            data.append('missing' if substr == '?' else substr)
                output.append(data)
        return output

    # raw string so the backslashes in the Windows path are taken literally
    file_name = r"E:\data\crx.data"
    output = load_crx(file_name)

After the steps above, we can already feel that we are doing real data work, and we can see the importance of data types.
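To see why the types matter, here is a minimal sketch. The value `30.83` comes from the sample record above; `9.5` and all variable names are made up for illustration. Values left as strings are compared character by character and cannot be summed as numbers:

```python
# Two A2-style values kept as raw strings, as they come out of str.split(",").
# "30.83" is from the sample record; "9.5" is an invented value for illustration.
raw_values = ["30.83", "9.5"]

# As strings they compare lexicographically, so "9.5" looks larger than "30.83"
string_max = max(raw_values)

# After converting to float, comparison and arithmetic behave numerically
numeric_values = [float(v) for v in raw_values]
numeric_max = max(numeric_values)
total = sum(numeric_values)

print(string_max)   # 9.5
print(numeric_max)  # 30.83
```

This is the kind of silent error that appears when every field is left as a string.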

OK, back to the point. Before you do anything:

It is important for you to at least have a rough idea of what kind of data you are dealing with. For instance, if you have read through all the files in the data folder and the description on the website, you should at least know that:

- This dataset consists of 690 credit card applicants' personal information and whether or not they were approved for the credit card.
- Each data entry has 15 attributes, and the data type of each attribute is listed on the website:
  - A2, A3, A8, A11, A14 and A15 are continuous (numeric)
  - all others are categorical (choices)
- 37 cases (5%) have one or more missing values.
- This dataset has 2 classes, positive and negative, meaning approved and declined.
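These facts can be checked against the parsed 2-D list. The sketch below assumes rows in the shape shown earlier, with `'missing'` standing in for `?`; the helper name `summarize` and the second sample row are hypothetical, made up so the example runs without the data file:

```python
from collections import Counter

def summarize(rows, missing_marker="missing"):
    """Return basic counts for a parsed 2-D list of records."""
    return {
        "records": len(rows),
        # distinct record lengths; a clean parse should give a single value
        "fields_per_record": sorted({len(row) for row in rows}),
        "records_with_missing": sum(1 for row in rows if missing_marker in row),
        # the class label ('+' or '-') is the last field of each record
        "class_counts": Counter(row[-1] for row in rows),
    }

# Tiny hand-made sample: the first row is the record shown earlier,
# the second row is invented for illustration.
sample = [
    ['b', 30.83, 0, 'u', 'g', 'w', 'v', 1.25, 't', 't', '01', 'f', 'g', '00202', 0, '+'],
    ['a', 'missing', 1.5, 'u', 'g', 'q', 'h', 0.5, 'f', 'f', 0, 'f', 'g', '00100', 0, '-'],
]
summary = summarize(sample)
print(summary)
```

Run on the full parsed file, the counts should be consistent with the description above: 690 records, 16 fields each (15 attributes plus the class), and 37 records with a missing value.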

If you haven't already read through all of this information, go back and try to understand your dataset first.

Here is the link:

https://archive.ics.uci.edu/ml/datasets/Credit+Approval

From the data files and the description on the website, we now understand what this data is actually used for.
We also know which attribute and class each value in a Python-parsed record corresponds to.
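As a sketch of that correspondence, each parsed record can be paired with the attribute names from the dataset description. The names A1 to A15 follow the website; the key `"class"` and the helper `label_record` are labels chosen here for illustration:

```python
# A1..A15 follow the dataset description; "class" is a label chosen for this sketch
ATTRIBUTE_NAMES = [f"A{i}" for i in range(1, 16)] + ["class"]
CONTINUOUS = {"A2", "A3", "A8", "A11", "A14", "A15"}

def label_record(record):
    """Pair each value in one parsed record with its attribute name."""
    if len(record) != len(ATTRIBUTE_NAMES):
        raise ValueError(f"expected {len(ATTRIBUTE_NAMES)} fields, got {len(record)}")
    return dict(zip(ATTRIBUTE_NAMES, record))

# The sample record shown earlier
record = ['b', 30.83, 0, 'u', 'g', 'w', 'v', 1.25, 't', 't', '01', 'f', 'g', '00202', 0, '+']
labeled = label_record(record)

# Keep only the attributes the description lists as continuous
continuous_values = {name: labeled[name] for name in ATTRIBUTE_NAMES if name in CONTINUOUS}
print(labeled["class"])
print(continuous_values)
```

Note that with the parsing rule used above, zero-padded values in the continuous attributes A11 and A14 (here `'01'` and `'00202'`) remain strings, so they would need a further `int()` conversion before any numeric use.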

Now that we know the attributes and classes of this data, look forward to working with it further in the next part...

December 28, 2018
