Python Study Notes, Day 18: Downloading Data and Web APIs

  • Summary of commonly used Python modules

  • Reading and analyzing CSV data files

    • Using the csv module
    import csv
    
    filename = 'sitka_weather_07-2014.csv'
    with open(filename) as f:
        reader = csv.reader(f)
        # The first row of the file holds the column headers
        header_row = next(reader)
    
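    • enumerate(): takes an iterable (such as a list, tuple, or string) and yields (index, item) pairs, so each item comes with its position. It is most often used in a for loop.
      enumerate(iterable, start=0)
      Sample:
        import csv

        filename = 'sitka_weather_07-2014.csv'
        with open(filename) as f:
            reader = csv.reader(f)
            header_row = next(reader)
            # Print each column header together with its index
            for index, column_header in enumerate(header_row):
                print(index, column_header)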
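
      As a quick illustration of the start parameter, the sketch below numbers a header row from 1 instead of 0. The column names are made up for the example and are not taken from the Sitka file.

```python
# Hypothetical header row; the real file's columns may differ
header_row = ['Date', 'Max Temp', 'Mean Temp', 'Min Temp']

# enumerate() yields (index, item) pairs; start=1 shifts the first index to 1
numbered = list(enumerate(header_row, start=1))

for index, column_header in numbered:
    print(index, column_header)
```

      With the default start=0, the same indexes can be used directly as row[index] when reading the data rows.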
      
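    • Looping over the CSV file and extracting data: for + append
      import csv
      from datetime import datetime

      filename = 'sitka_weather_07-2014.csv'
      with open(filename) as f:
          reader = csv.reader(f)
          header_row = next(reader)

          dates, highs, lows = [], [], []
          for row in reader:
              # Column 0 is the date, column 1 the high, column 3 the low
              current_date = datetime.strptime(row[0], "%Y-%m-%d")
              high = int(row[1])
              low = int(row[3])
              dates.append(current_date)
              highs.append(high)
              lows.append(low)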
    • Error handling
      import csv
      from datetime import datetime

      filename = 'sitka_weather_07-2014.csv'
      with open(filename) as f:
          reader = csv.reader(f)
          header_row = next(reader)

          dates, highs, lows = [], [], []
          for row in reader:
              try:
                  current_date = datetime.strptime(row[0], "%Y-%m-%d")
                  high = int(row[1])
                  low = int(row[3])
              except ValueError:
                  # Report the raw date field: current_date may be unset or stale here
                  print(row[0], 'missing data')
              else:
                  dates.append(current_date)
                  highs.append(high)
                  lows.append(low)
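
      The try/except/else pattern above can be tried without the weather file. The following is a minimal sketch that reads a small in-memory CSV; the sample rows and the skipped list are assumptions added for illustration, not part of the original data.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data; the empty fields in the second data row
# simulate missing readings
sample = io.StringIO(
    "Date,Max,Mean,Min\n"
    "2014-02-15,70,60,50\n"
    "2014-02-16,,,\n"
    "2014-02-17,72,61,51\n"
)

reader = csv.reader(sample)
header_row = next(reader)

dates, highs, lows, skipped = [], [], [], []
for row in reader:
    try:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        high = int(row[1])
        low = int(row[3])
    except ValueError:
        # int('') raises ValueError, so the incomplete row ends up here
        skipped.append(row[0])
    else:
        # Only rows that parsed completely are stored
        dates.append(current_date)
        highs.append(high)
        lows.append(low)

print(highs, lows, skipped)
```

      Putting the append calls in the else branch keeps the three lists the same length, because a row is either stored completely or skipped completely.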
  • JSON format

    • pygal.i18n no longer exists (error: No module named 'pygal.i18n'):

      • Use pygal_maps_world.i18n instead:
        • OS X
          $ pip install pygal_maps_world
          
        • Windows
          > python -m pip install pygal_maps_world
          
      • Change 'from pygal.i18n import COUNTRIES' to
        from pygal_maps_world.i18n import COUNTRIES
        
        
    • Error: module 'pygal' has no attribute 'Worldmap'

      • Use pygal_maps_world instead
        import pygal_maps_world.maps
        
        wm = pygal_maps_world.maps.World()
        
  • Web API

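    • A Web API is used to interact with a website and request data, which is returned as JSON or CSV.

    • The requests package lets Python request information from a website and examine the response it returns.

      • Installing the requests package
        • OS X
          $ pip install --user requests
          
        • Windows
          > python -m pip install --user requests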
      
    • Working with the response dictionary

        import requests
        
        # Make the API call and store the response
        url = "https://api.github.com/search/repositories?q=language:python&sort=stars"
        r = requests.get(url)
        print("Status code:", r.status_code)
        
        # Store the API response in a dictionary
        response_dict = r.json()
        print("Total repositories:", response_dict['total_count'])
        
        # Explore information about the repositories
        repo_dicts = response_dict['items']
        print("Repositories returned:", len(repo_dicts))
        
        # Examine the first repository
        repo_dict = repo_dicts[0]
        print("\nKeys:", len(repo_dict))
        for key in repo_dict.keys():
            print(key)
      
    • Looking more closely at the repositories

        # Print selected information about each repository
        # (repo_dicts comes from the previous snippet)
        for repo_dict in repo_dicts:
            print("\nSelected information about repository:")
            print('Name:', repo_dict['name'])
            print('Owner:', repo_dict['owner']['login'])
            print('Stars:', repo_dict['stargazers_count'])
            print('Repository:', repo_dict['html_url'])
            print('Created:', repo_dict['created_at'])
            print('Updated:', repo_dict['updated_at'])
            print('Description:', repo_dict['description'])
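
      Because the loop above needs network access, it can help to try the field lookups on a small hand-written dictionary first. The sketch below uses a hypothetical helper, summarize_repo, and a made-up repository entry that only mimics the keys used above; it is not real API output.

```python
def summarize_repo(repo_dict):
    """Return the selected fields of one repository as (label, value) pairs."""
    return [
        ('Name', repo_dict['name']),
        # 'owner' is itself a dictionary, so the login needs a second lookup
        ('Owner', repo_dict['owner']['login']),
        ('Stars', repo_dict['stargazers_count']),
        ('Repository', repo_dict['html_url']),
        ('Description', repo_dict['description']),
    ]


# Made-up entry for illustration only
sample_repo = {
    'name': 'example-project',
    'owner': {'login': 'example-user'},
    'stargazers_count': 123,
    'html_url': 'https://github.com/example-user/example-project',
    'description': None,
}

summary = summarize_repo(sample_repo)
for label, value in summary:
    print(label + ':', value)
```

      The description is set to None here on purpose: print() handles None without complaint, which is why a missing description only becomes a problem later, when the value is passed to the chart.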
      
    • Error: 'NoneType' object has no attribute 'decode'. Running the following code raises this error:

        import pygal
        from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

        # repo_dicts comes from the GitHub API call shown earlier
        names, plot_dicts = [], []
        for repo_dict in repo_dicts:
            names.append(repo_dict['name'])
            plot_dict = {
                'value': repo_dict['stargazers_count'],
                # description can be None, which triggers the error
                'label': repo_dict['description'],
                }
            plot_dicts.append(plot_dict)
        
        # Visualization
        my_style = LS('#333366', base_style=LCS)
        
        my_config = pygal.Config()
        my_config.x_label_rotation = 45
        my_config.show_legend = False
        my_config.title_font_size = 24
        my_config.label_font_size = 14
        my_config.major_label_font_size = 18
        my_config.truncate_label = 15
        my_config.show_y_guides = False
        my_config.width = 1000
        
        chart = pygal.Bar(my_config, style=my_style)
        chart.title = 'Most-starred Python Projects on GitHub'
        chart.x_labels = names
        
        chart.add('', plot_dicts)
        chart.render_to_file('python_repos.svg')
      

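      The error occurs because some repositories have no description, so repo_dict['description'] is None. Two fixes are referenced for this problem. The first one is to change:

      'label': repo_dict['description'],
      

      to:

      'label': str(repo_dict['description']),
      

      which is simple and convenient.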
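
      One side effect of the str() fix is that a missing description shows up as the literal text 'None' in the tooltip. A possible variation, sketched below, substitutes a placeholder instead. The helper build_plot_dict, the placeholder wording, and the sample repositories are assumptions made for illustration.

```python
def build_plot_dict(repo_dict, placeholder='No description provided.'):
    """Build one chart entry, replacing a missing description with a placeholder."""
    description = repo_dict['description']
    return {
        'value': repo_dict['stargazers_count'],
        # `or` falls back to the placeholder when description is None or empty
        'label': description or placeholder,
    }


# Made-up repositories; the second one has no description
repos = [
    {'name': 'alpha', 'stargazers_count': 10, 'description': 'A demo project'},
    {'name': 'beta', 'stargazers_count': 5, 'description': None},
]

plot_dicts = [build_plot_dict(repo) for repo in repos]
print(plot_dicts)
```

      Either approach guarantees that every label is a string, which is what the chart needs.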

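    • Hacker News API. This example covers three points:

      • Using the list returned by one Web API call to build further API URLs dynamically, then calling the API again to fetch the data;
      • The dict.get() method: when you are not sure whether a key is in a dictionary, dict.get() returns the value for the key if it exists and the second argument otherwise;
      • The itemgetter() function from the operator module, used together with sorted(). Given the key 'comments', itemgetter() pulls the value for 'comments' out of each dictionary in the list, and sorted() orders the list by that value.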
      import requests
      from operator import itemgetter
      
      # Make the API call and store the response
      url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
      r = requests.get(url)
      print('Status code:', r.status_code)
      
      # Process information about each submission
      submission_ids = r.json()
      # Empty list that will hold one dictionary per top story
      submission_dicts = []
      
      # Take the IDs of the first 30 top stories
      for submission_id in submission_ids[:30]:
          # Make a separate API call for each submission,
          # building the URL from the ID stored in submission_ids
          url = ('https://hacker-news.firebaseio.com/v0/item/' +
                 str(submission_id) + '.json')
          submission_r = requests.get(url)
          print(submission_r.status_code)
      
          response_dict = submission_r.json()
      
          # Build a dictionary for the submission being processed
          submission_dict = {
              'title': response_dict['title'],
              'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
              # 'descendants' is absent when there are no comments yet
              'comments': response_dict.get('descendants', 0),
          }
          submission_dicts.append(submission_dict)
      
      # Sort by comment count, most discussed first
      submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                                reverse=True)
      
      for submission_dict in submission_dicts:
          print('\nTitle:', submission_dict['title'])
          print('Discussion link:', submission_dict['link'])
          print('Comments:', submission_dict['comments'])
      

      Data produced by the code above:

      [{"title": "Glitter bomb tricks parcel thieves", 
      "link": "http://news.ycombinator.com/item?id=18706193", 
      "comments": 304}, 
      {"title": "Stop Learning Frameworks", 
      "link": "http://news.ycombinator.com/item?id=18706785", 
      "comments": 175}, 
      {"title": "Reasons Python Sucks", 
      "link": "http://news.ycombinator.com/item?id=18706174", 
      "comments": 175}, 
      {"title": "I need to copy 2000+ DVDs in 3 days. What are my options?", 
      "link": "http://news.ycombinator.com/item?id=18690587", 
      "comments": 167}, 
      {"title": "SpaceX Is Raising $500M at a $30.5B Valuation", 
      "link": "http://news.ycombinator.com/item?id=18706506", 
      "comments": 139}, 
      .........
      ]
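
      The dict.get() default and the itemgetter()/sorted() combination can also be checked offline. The sketch below replaces the API calls with a small hand-written dictionary of item responses; the IDs, titles, and comment counts are made up, and the second item deliberately has no 'descendants' key.

```python
from operator import itemgetter

# Hypothetical item responses keyed by submission ID;
# the second one has no 'descendants' key
responses = {
    101: {'title': 'First story', 'descendants': 12},
    102: {'title': 'Second story'},
    103: {'title': 'Third story', 'descendants': 40},
}

submission_dicts = []
for submission_id, response_dict in responses.items():
    submission_dicts.append({
        'title': response_dict['title'],
        # The link is built from the ID, as in the real script
        'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
        # get() returns 0 instead of raising KeyError when the key is missing
        'comments': response_dict.get('descendants', 0),
    })

# itemgetter('comments') extracts the sort value from each dictionary
submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                          reverse=True)

titles = [d['title'] for d in submission_dicts]
print(titles)
```

      Using response_dict['descendants'] instead of get() would raise a KeyError on the second item and stop the loop.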