A Scientific Guide to Practicing on Codeforces: One Chart and One Table Are All You Need

阿新 • Published: 2018-12-24

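A brief introduction to practicing algorithm problems scientifically in order to improve your problem-solving ability, using a crawler to scrape the Codeforces problem set and analyze the relationship between problem difficulty and algorithm category.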

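Whatever you do, trying a lot, finding the patterns, and then practicing deliberately are all essential. For fans of the Olympiad in Informatics, the key to finding patterns is solving many problems. But the sea of problems is vast: Codeforces alone had 3206 problems as of January 3, 2017. In other words, even someone diligent enough to solve three problems a day would need nearly three years to finish them all, and the problem set keeps growing. Practicing blindly is therefore simply a waste of life. Since the first half of 2016 I had been sorting problems by number of solvers, from high to low, and grinding through the easy ones. Unsurprisingly, grinding easy problems brings no progress: what I knew I still knew, what I did not understand I still did not understand, and the only thing that made me happy was the growing count of solved problems. The essence of scientific practice is to keep challenging new heights: once you have practiced on one plateau long enough to be proficient, move on to the next difficulty plateau.

To make this easier for everyone, I used a crawler to collect the basic information of every Codeforces problem as of January 3, 2017 and stored it in Excel. Going further, this article tries to analyze how often each algorithm appears at each difficulty level, and how the number of accepted solutions for each algorithm is distributed across difficulty levels. Finally, I briefly describe my own view of practice and how to scrape the information from Codeforces.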

Conclusions First

One Chart

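Codeforces_Algorithms_Tag_Frequency.jpg

The chart above shows the frequency distribution of different algorithms (first column) across problem difficulties (first row). From it you can roughly tell which algorithms you should master at which level. Note that I did not distinguish between Div1 and Div2 here; difficulty is inferred purely from the problem index (A, B, C, and so on). For the easy A problems, most test basic programming skills, such as implementation (roughly: do what the statement says), math (arithmetic, modulo, rounding, and so on), and brute force (exhaustive enumeration). As difficulty increases, for example at E problems, the focus shifts to dp (dynamic programming) and data structures. The chart also shows that high-difficulty problems are concentrated in math, geometry (computational geometry), shortest path (graph theory), and games (game theory). As a bonus, here is one more chart, showing how the number of accepted solutions for each algorithm is distributed across problem difficulties.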
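The difficulty proxy described above, the letter in the problem index, can be pulled out of a problem ID with a regular expression. This is a minimal sketch, assuming IDs of the form contest number followed by a letter (such as 750A); the helper name `difficulty_letter` is made up for illustration.

```python
import re

def difficulty_letter(problem_id):
    """Return the first capital letter of a problem ID, used as a rough difficulty label."""
    match = re.search(r'[A-Z]', problem_id)
    # IDs without any capital letter yield None instead of raising
    return match.group(0) if match else None

print(difficulty_letter('750A'))
print(difficulty_letter('750E'))
```

Because Div1 and Div2 share the same letters, this label is only a coarse ordering within a contest, not an absolute rating.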

Codeforces_Algorithms_Tag_Solved.jpg

One Table

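Codeforces-ProblemSet.jpg

Next comes the practice catalogue, namely this table, which collects all algorithm problems on Codeforces as of January 3, 2017. With this table you can practice, first, by number of solvers; second, by problem difficulty; and third, by topic. The download link for the file is at the end of the article.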
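The three ways of using the table can be sketched in a few lines. The rows below are placeholders invented for illustration, not real Codeforces data, and the field names simply mirror the table's columns.

```python
# Placeholder rows (hypothetical IDs and counts, not real Codeforces statistics).
rows = [
    {'id': '9991A', 'tags': ['math'], 'solved': 900},
    {'id': '9991C', 'tags': ['geometry', 'math'], 'solved': 50},
    {'id': '9992B', 'tags': ['dp'], 'solved': 300},
]

# 1. By number of solvers, most-solved first.
by_solved = sorted(rows, key=lambda r: r['solved'], reverse=True)

# 2. By difficulty, using the trailing letter of the ID as the label.
by_difficulty = sorted(rows, key=lambda r: r['id'][-1])

# 3. By topic: keep only the problems carrying a given tag.
math_only = [r for r in rows if 'math' in r['tags']]

print([r['id'] for r in by_solved])
print([r['id'] for r in by_difficulty])
print([r['id'] for r in math_only])
```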

My View of Practice

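If you want to improve your thinking ability, order problems by AC rate or number of solvers and binary-search for the ones that match your current level, then challenge harder problems as appropriate (binary search takes O(log N) time, which at least saves time compared with the O(N) of working through everything from easy to hard).
If you want to consolidate a particular topic, you should naturally practice by tag; but because the method is known before you start solving, this does less for your thinking ability.
If you know nothing at all, practice random problems: you broaden your horizons, and there is plenty of room to improve.
If you want to raise your AC rate or build confidence, practice easy problems.
Mix the strategies above, for example by picking a topic and then using binary search to choose problems within it.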
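The binary-search idea from the first strategy can be sketched as follows. This is only an illustration under stated assumptions: the problems are already ordered from easiest to hardest, and "can I solve it" is monotone (solvable up to some point, unsolvable after), which real practice only approximates. The function name `find_level` and the stand-in difficulty numbers are hypothetical.

```python
def find_level(problems, can_solve):
    """Binary-search the first index whose problem you cannot solve.

    `problems` must be ordered from easiest to hardest, and `can_solve`
    is assumed to be monotone over that ordering.
    """
    lo, hi = 0, len(problems)
    attempts = 0
    while lo < hi:
        mid = (lo + hi) // 2
        attempts += 1
        if can_solve(problems[mid]):
            lo = mid + 1   # too easy: look among harder problems
        else:
            hi = mid       # too hard: look among easier problems
    return lo, attempts

# Stand-in for real attempts: pretend difficulties are 0..3205 and that
# everything below 1200 is within reach (both numbers are hypothetical).
problems = list(range(3206))
level, attempts = find_level(problems, lambda difficulty: difficulty < 1200)
print(level, attempts)
```

With 3206 problems the loop needs at most 12 attempts to locate the boundary, compared with up to 3206 when walking the list from the easy end.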

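One more suggestion: if a problem is too hard for your current ability, try for a reasonable amount of time and then honestly read the editorial. People do differ in talent, but experience, unlike talent, can be built up gradually. In particular, even after getting a problem right, do your best to look for room to improve, and study other people's code to see how to shorten the code and reduce its time and space complexity.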

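tourist.jpg

It is said that top competitors have each solved more than ten thousand problems, which is why in official contests you can see many of them finish a problem in under a minute; their hands are simply that fast. In Competitive Programming, getting the problem right is the basic requirement (unless the problem is too hard); solving it faster is what really separates the top players. If talent truly exists, there is nothing we can do about it; but I hope that, like the old oil seller in the fable, we can one day say: "Nothing special, just a practiced hand."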

How to Collect the Data with a Crawler

Required Libraries

import re
import urllib.request
from bs4 import BeautifulSoup
import os
import csv
import time

Scraping All Codeforces Algorithm Problems

#%% retrieve the problem set
def spider(url):
    """Parse one problem set page and record every problem row in `codeforces`."""
    response = urllib.request.urlopen(url)
    # name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(response.read(), 'html.parser')
    for row in soup.find_all('tr'):
        item = row.find_all('td')
        try:
            # get the problem id
            problem_id = item[0].find('a').string.strip()
            col2 = item[1].find_all('a')
            # get the problem title
            title = col2[0].string.strip()
            # get the problem tags
            tags = [foo.string.strip() for foo in col2[1:]]
            # get the number of AC submissions
            solved = re.findall(r'x(\d+)', str(item[3].find('a')))[0]
            # update the problem info
            codeforces[problem_id] = {'title': title, 'tags': tags, 'solved': solved, 'accepted': 0}
        except (AttributeError, IndexError):
            # header rows and rows without the expected cells are skipped
            continue
    return soup

codeforces = {}
wait = 15 # wait time to avoid the blocking of spider
last_page = 33 # the total page number of problem set page
url = ['http://codeforces.com/problemset/page/%d' % page for page in range(1, last_page+1)]
for foo in url:
    print('Processing URL %s' % foo)
    spider(foo)
    print('Wait %f seconds' % wait)
    time.sleep(wait)
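As an alternative to scraping HTML, Codeforces also offers a public JSON API, whose `problemset.problems` method returns the problem list together with solve counts. This is a different technique from the crawler above, not the author's method. The sketch below only shows how such a response could be folded into the same kind of dictionary. The helper name `from_api` is hypothetical, the sample payload is a hand-written placeholder rather than real data, and the field names (`contestId`, `index`, `name`, `tags`, `solvedCount`) are given as I understand the API documentation, so check them against the official docs before relying on this.

```python
import json

def from_api(payload):
    """Build {problem_id: info} from a decoded problemset.problems response."""
    result = payload['result']
    # solve counts live in a separate list, keyed by (contestId, index)
    solved = {
        (s.get('contestId'), s['index']): s['solvedCount']
        for s in result['problemStatistics']
    }
    table = {}
    for p in result['problems']:
        key = (p.get('contestId'), p['index'])
        problem_id = '%s%s' % key
        table[problem_id] = {
            'title': p['name'],
            'tags': p.get('tags', []),
            'solved': solved.get(key, 0),
            'accepted': 0,
        }
    return table

# Hand-written placeholder payload (hypothetical values, not real data).
sample = json.loads('''
{"status": "OK",
 "result": {
   "problems": [{"contestId": 9991, "index": "A", "name": "Placeholder", "tags": ["math"]}],
   "problemStatistics": [{"contestId": 9991, "index": "A", "solvedCount": 42}]
 }}
''')
print(from_api(sample))
```

In real use the payload would come from a single HTTP request to the API endpoint, which avoids paging through the HTML problem set with a 15-second wait between pages.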

Marking the Problems Already Solved

#%% mark the accepted problems
def accepted(url):
    """Parse one page of a user's submissions and flag the accepted problems."""
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), 'html.parser')
    table = soup.find_all('table', class_='status-frame-datatable')[0]
    for row in table.find_all('tr'):
        try:
            item = row.find_all('td')
            # check whether this problem is solved
            if 'Accepted' in str(row):
                problem_id = item[3].find('a').string.split('-')[0].strip()
                codeforces[problem_id]['accepted'] = 1
        except (AttributeError, IndexError, KeyError):
            # rows without a problem link, or problems missing from the table, are skipped
            continue
    return soup

wait = 15 # wait time to avoid the blocking of spider
last_page = 10 # the total page number of user submission
handle = 'Greenwicher' # please input your handle
url = ['http://codeforces.com/submissions/%s/page/%d' % (handle, page) for page in range(1, last_page+1)]
for foo in url:
    print('Processing URL %s' % foo)
    accepted(foo)
    print('Wait %f seconds' % wait)
    time.sleep(wait)
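The same marking step could also be driven by the Codeforces JSON API instead of the submissions pages: its `user.status` method lists a user's submissions. Again this is a swapped-in technique, not the author's. The sketch assumes, as I understand the API documentation, that each submission carries a `verdict` field equal to `OK` when accepted and a nested `problem` object with `contestId` and `index`; the helper name `accepted_ids` and the sample payload are hypothetical.

```python
def accepted_ids(payload):
    """Collect the IDs of problems with at least one accepted submission."""
    ids = set()
    for submission in payload['result']:
        if submission.get('verdict') == 'OK':
            problem = submission['problem']
            ids.add('%s%s' % (problem.get('contestId'), problem['index']))
    return ids

# Hand-written placeholder payload and problem table (hypothetical values).
sample = {
    'status': 'OK',
    'result': [
        {'verdict': 'OK', 'problem': {'contestId': 9991, 'index': 'A'}},
        {'verdict': 'WRONG_ANSWER', 'problem': {'contestId': 9991, 'index': 'B'}},
    ],
}
codeforces = {'9991A': {'accepted': 0}, '9991B': {'accepted': 0}}

for problem_id in accepted_ids(sample):
    if problem_id in codeforces:       # ignore problems missing from the table
        codeforces[problem_id]['accepted'] = 1

print(codeforces)
```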

Writing the Scraped Data to a CSV File

#%% output the problem set to csv files
root = os.getcwd()
# newline="" lets the csv module control line endings (avoids blank rows on Windows)
with open(os.path.join(root, "CodeForces-ProblemSet.csv"), "w", encoding="utf-8", newline="") as f_out:
    f_csv = csv.writer(f_out)
    f_csv.writerow(['ID', 'Title', 'Tags', 'Solved', 'Accepted'])
    for problem_id in codeforces:
        title = codeforces[problem_id]['title']
        tags = ', '.join(codeforces[problem_id]['tags'])
        solved = codeforces[problem_id]['solved']
        is_accepted = codeforces[problem_id]['accepted']
        f_csv.writerow([problem_id, title, tags, solved, is_accepted])
# no explicit close is needed: the with block closes the file
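Reading the file back is the mirror image of the step above. The sketch below parses rows in the same ID/Title/Tags/Solved/Accepted layout and lists the problems not yet accepted. The helper name `load_rows` is made up, and the in-memory sample stands in for the real file with made-up values.

```python
import csv
import io

def load_rows(f):
    """Read rows written in the ID/Title/Tags/Solved/Accepted layout back into dicts."""
    rows = []
    for row in csv.DictReader(f):
        rows.append({
            'id': row['ID'],
            'title': row['Title'],
            # tags were joined with ', ' on output, so split on the same separator
            'tags': [t for t in row['Tags'].split(', ') if t],
            'solved': int(row['Solved']),
            'accepted': row['Accepted'] == '1',
        })
    return rows

# Placeholder content standing in for the real file (values are made up).
sample = io.StringIO(
    'ID,Title,Tags,Solved,Accepted\n'
    '9991A,Placeholder One,"math, brute force",42,0\n'
    '9991B,Placeholder Two,dp,7,1\n'
)
rows = load_rows(sample)
todo = [r['id'] for r in rows if not r['accepted']]
print(todo)
```

For the real file, pass `open("CodeForces-ProblemSet.csv", encoding="utf-8", newline="")` instead of the `StringIO` object.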

Analyzing the Relationship Between Problem Difficulty and Algorithm Category

#%% analyze the problem set
import operator

# initialize the difficulty and tag lists
difficult_level = {}
tags_level = {}
for problem_id in codeforces:
    # the first capital letter of the id (A, B, C, ...) stands in for difficulty
    difficult = re.findall('([A-Z])', problem_id)[0]
    tags = codeforces[problem_id]['tags']
    difficult_level[difficult] = difficult_level.get(difficult, 0) + 1
    for tag in tags:
        tags_level[tag] = tags_level.get(tag, 0) + 1
# tags sorted by total frequency, most frequent first
tag_level = sorted(tags_level.items(), key=operator.itemgetter(1))[::-1]
tag_list = [foo[0] for foo in tag_level]
# difficulty letters in alphabetical order
difficult_level = sorted(difficult_level.items(), key=operator.itemgetter(0))
difficult_list = [foo[0] for foo in difficult_level]

# initialize the 2D relationship matrices (rows: tags, columns: difficulty levels)
# matrix_solved: the number of AC submissions for each tag in each difficulty level
# matrix_freq: the number of problems carrying each tag in each difficulty level
matrix_solved, matrix_freq = [[[0] * len(difficult_list) for _ in range(len(tag_list))] for _ in range(2)]

# construct the 2D relationship matrices
for problem_id in codeforces:
    difficult = re.findall('([A-Z])', problem_id)[0]
    difficult_id = difficult_list.index(difficult)
    tags = codeforces[problem_id]['tags']
    solved = codeforces[problem_id]['solved']
    for tag in tags:
        tag_id = tag_list.index(tag)
        matrix_solved[tag_id][difficult_id] += int(solved)
        matrix_freq[tag_id][difficult_id] += 1
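The matrices above hold raw counts. One possible way to turn them into the per-difficulty frequencies a chart would show is to divide each column by its total; the article does not say how its charts were normalized, so treat this as an assumption. The helper names `column_shares` and `top_tags` are hypothetical and the small matrix is placeholder data.

```python
def column_shares(matrix):
    """Convert counts to per-column fractions (each non-empty column sums to 1)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    totals = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [
        [matrix[r][c] / totals[c] if totals[c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]

def top_tags(matrix, tag_list, column, k=3):
    """Return the k tags with the largest value in one difficulty column."""
    ranked = sorted(
        zip(tag_list, (row[column] for row in matrix)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [tag for tag, _ in ranked[:k]]

# Placeholder counts (made up): rows are tags, columns are difficulty letters.
tag_list = ['implementation', 'math', 'dp']
difficult_list = ['A', 'E']
matrix_freq = [[6, 1], [3, 1], [1, 2]]

shares = column_shares(matrix_freq)
for column, letter in enumerate(difficult_list):
    print(letter, top_tags(shares, tag_list, column, k=1))
```

Since one problem can carry several tags, the column totals count tag occurrences rather than problems, so the shares describe how tags are mixed at each level, not the fraction of problems using a tag.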
