Efficient Fibonacci Sequence Algorithms and Time Complexity Analysis, Part 9

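In mathematics, the Fibonacci sequence is defined recursively.
...continued from Efficient Fibonacci Sequence Algorithms and Time Complexity Analysis, Part 8
Today home computers and phones all ship with a pile of cores, and we count them by the little boxes in the CPU monitor,
so one speed-up that naturally comes to mind is parallel computing.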
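Take the "GMP built-in fib2 function solution that generates the sequence" from section 14 of Efficient Fibonacci Sequence Algorithms and Time Complexity Analysis, Part 7 as an example.
It generates the Fibonacci sequence from 0 to n by computing one number, putting it into the result list, and then computing the next.
That work can be spread over the CPU's cores so that several numbers are computed at once. With four cores we compute four numbers at the same time: so good, so fast, perfect.
This is a multiprocess parallel algorithm distributed across multiple cores.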
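As a rough illustration of how the indices might be shared out among workers, here is a minimal sketch. It mirrors the striding used in the section 17 listing, where each worker takes every (2 x workers)-th index and obtains two terms per index, because gmpy2.fib2(k) returns F(k) and F(k-1). The names partition_indices and covered are hypothetical helpers introduced only for this example.

```python
def partition_indices(n, workers):
    """Give worker i the indices n - 2*i, n - 2*i - 2*workers, ... (all > 0)."""
    return [range(n - 2 * i, 0, -2 * workers) for i in range(workers)]


def covered(n, workers):
    """Indices produced if every assigned k yields the pair F(k), F(k-1)."""
    seen = set()
    for chunk in partition_indices(n, workers):
        for k in chunk:
            seen.update((k, k - 1))
    return seen


if __name__ == '__main__':
    for chunk in partition_indices(10, 3):
        print(list(chunk))
    # For even n, index 0 is not produced by any worker and must be filled in separately.
    print(sorted(covered(10, 3)))
```

With an odd n, the last pair is F(1), F(0), so index 0 is covered without special handling.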
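Will four cores be 4 times as fast as one? In other words, can the speedup reach 4?
Taking four cores as the example, a speedup of 4 would mean a parallel efficiency of 100%.
Well, that is the ideal case. In practice we fall far short of it.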
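The reason is that, with multiple cores and processes, the main process has to split one large task into several subtasks (child processes) and hand them to the cores, and once the subtasks finish, the results are gathered back into the main process. That brings extra overhead: task decomposition, task scheduling, argument passing, child-process context switches, inter-process data transfer, inter-process locks, and merging the results. These costs are large and pull efficiency down considerably.
For generating the Fibonacci sequence in parallel, the largest cost is probably inter-process data transfer, because the child processes must return a huge volume of data to the main process.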
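To get a feel for how much data that is, here is a back-of-the-envelope sketch. It assumes F(k) needs about k * log2(phi) bits (from Binet's formula) and ignores per-number rounding and any object overhead. The function names are hypothetical and the result is an estimate, not a measurement.

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def estimated_total_bytes(n):
    """Estimate the raw bytes needed for F(0)..F(n), assuming ~k*log2(phi) bits for F(k)."""
    return math.log2(PHI) * n * (n + 1) / 2 / 8


def exact_total_bytes(n):
    """Exact byte count for F(0)..F(n); only practical for small n."""
    a, b, total = 0, 1, 0
    for _ in range(n + 1):
        total += (a.bit_length() + 7) // 8
        a, b = b, a + b
    return total


if __name__ == '__main__':
    print(exact_total_bytes(2000), round(estimated_total_bytes(2000)))
    print(round(estimated_total_bytes(10**6) / 2**30, 1))  # about 40.4 (GiB)
```

Under these assumptions the one-million-term sequence comes to roughly 40 GiB of raw bytes that the child processes have to hand back.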
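On a single machine, it is well known that shared memory is usually the fastest way to do this.
We could borrow the fashionable term "in-memory computing" for it.
So below I implement a prototype shared-memory parallel algorithm with little optimization.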
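Before the full listing, the core idea can be sketched in a few lines: a child process writes a big integer into a shared byte buffer as little-endian bytes, and the parent rebuilds it with int.from_bytes. This is a simplified sketch, not the author's implementation. It uses only the standard library (int.to_bytes instead of gmpy2), and the helpers write_int, read_int and child are hypothetical names.

```python
import ctypes
import multiprocessing as mp


def write_int(buf, offset, value):
    """Copy a non-negative int into buf as little-endian bytes; return the end offset."""
    data = value.to_bytes(max((value.bit_length() + 7) // 8, 1), 'little')
    end = offset + len(data)
    buf[offset:end] = data  # slice assignment copies the bytes into shared memory
    return end


def read_int(buf, start, end):
    """Rebuild the int stored in buf[start:end]."""
    return int.from_bytes(bytes(buf[start:end]), 'little')


def child(buf, end_offset, n):
    """Hypothetical worker: compute F(n) iteratively and leave it in shared memory."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    end_offset.value = write_int(buf, 0, a)


if __name__ == '__main__':
    buf = mp.RawArray(ctypes.c_ubyte, 1024)          # shared byte buffer, no built-in lock
    end_offset = mp.RawValue(ctypes.c_ulonglong, 0)  # where the child stopped writing
    p = mp.Process(target=child, args=(buf, end_offset, 300))
    p.start()
    p.join()  # joining first means no lock is needed in this tiny example
    print(read_int(buf, 0, end_offset.value))
```

Because the parent only reads after join(), this sketch avoids the locking and ring-buffer bookkeeping that a streaming version needs.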
17. Parallel shared-memory solution
import time
import multiprocessing as mp
import gmpy2
import ctypes

def fib2_processes(shared_variables_for_fib_seq_results: 'array ctypes.c_ubyte',
                   element_length_record: 'array ctypes.c_ulonglong',
                   write_protection_pointer: 'ctypes.c_ulonglong',
                   unread_quantity: 'ctypes.c_ulonglong',
                   overflow_of_n: 'ctypes.c_longlong',
                   mutexlock: 'Lock',
                   n_iterable: 'iterable') -> None:
    start_time = time.process_time_ns()
    Fib_seq_array_size = len(shared_variables_for_fib_seq_results)
    write_pointer = 0
    for n in n_iterable:
        # Estimate the bytes that F(n) and F(n-1) may occupy together: n * 2 / 8 -> n / 4
        if (Fib_seq_array_size - write_pointer) < (n // 4):
            write_pointer = 0
            overflow_of_n.value = n
        for element in gmpy2.fib2(n):
            element_bytes = gmpy2.to_binary(element)[2:]
            element_length = len(element_bytes)
            next_write_pointer = write_pointer + element_length
            while (overflow_of_n.value > 0) and (next_write_pointer > write_protection_pointer.value):
                pass  # While the overflow state is not cleared, spin if the write range would pass the write-protection pointer.
            element_length_record[n] = element_length
            with mutexlock:
                shared_variables_for_fib_seq_results[write_pointer: next_write_pointer] = element_bytes
                unread_quantity.value += 1
            write_pointer = next_write_pointer
            n -= 1
    end_time = time.process_time_ns()
    print ('Compute and serialize time: %.1f seconds.' % ((end_time - start_time)/10**9))
    return None

def Fibonacci_sequence_21 (n: int) -> list:  # the argument n asks for the Fibonacci sequence up to term n
    'Multiprocess shared-memory solution that returns a list, using the GMP built-in fib function'
    assert isinstance(n, int), 'n is an error of non-integer type.'
    if n>=0:
        start_time = time.process_time_ns()
        Number_of_processes = max(mp.cpu_count() - 1, 1)
        # Memory used by the shared buffers, in bytes. Note that a very large n needs a larger size.
        size_of_shared_variables_for_fib_processes = 1 * (1024 ** 2) // Number_of_processes
        list_of_shared_variables_for_fib_seq_results = []
        list_of_element_length_record = []
        list_of_write_protection_pointer = []
        list_of_unread_quantity = []
        list_of_overflow_n = []
        list_of_mutexlock = []
        for i in range(Number_of_processes):
            list_of_shared_variables_for_fib_seq_results.append(mp.RawArray(ctypes.c_ubyte, size_of_shared_variables_for_fib_processes))
            list_of_element_length_record.append(mp.RawArray(ctypes.c_ulonglong, n + 1))
            list_of_write_protection_pointer.append(mp.RawValue(ctypes.c_ulonglong, size_of_shared_variables_for_fib_processes - 1))
            list_of_unread_quantity.append(mp.RawValue(ctypes.c_ulonglong, 0))
            list_of_overflow_n.append(mp.RawValue(ctypes.c_longlong, -1))
            list_of_mutexlock.append(mp.Lock())
        list_of_n_iterable_for_fib2_processes = [range(n - i * 2, 0, -(Number_of_processes * 2)) for i in range(Number_of_processes)]
        fib2_process_list = [None] * Number_of_processes
        for i in range(Number_of_processes):
            fib2_process_list[i] = mp.Process(target = fib2_processes,
                                              args = (list_of_shared_variables_for_fib_seq_results[i],
                                                      list_of_element_length_record[i],
                                                      list_of_write_protection_pointer[i],
                                                      list_of_unread_quantity[i],
                                                      list_of_overflow_n[i],
                                                      list_of_mutexlock[i],
                                                      list_of_n_iterable_for_fib2_processes[i]))
            fib2_process_list[i].start()
        fib_list = [None] * (n + 1)
        n_list_for_fib2_processes = []
        for n_iterable in list_of_n_iterable_for_fib2_processes:
            n_list_for_fib2_processes.append(list(n_iterable))
        list_of_pointers_to_n_list = [0] * Number_of_processes
        list_of_number_of_reciprocating = [0] * Number_of_processes
        list_of_read_pointer = [0] * Number_of_processes
        while True:
            if mp.active_children() == []:  # Once no child process is alive, run the next check before leaving the loop
                for unread_quantity in list_of_unread_quantity:
                    if unread_quantity.value != 0:
                        break
                else:
                    # Leave the loop once every process has zero unread results
                    break
            for i in range(Number_of_processes):
                if list_of_unread_quantity[i].value != 0:
                    n_for_fib2_processes = n_list_for_fib2_processes[i][list_of_pointers_to_n_list[i]] - list_of_number_of_reciprocating[i]
                    if n_for_fib2_processes != list_of_overflow_n[i].value:
                        next_read_pointer = list_of_read_pointer[i] + list_of_element_length_record[i][n_for_fib2_processes]
                        fib_list[n_for_fib2_processes] = int.from_bytes(bytes(list_of_shared_variables_for_fib_seq_results[i][list_of_read_pointer[i]: next_read_pointer]), 'little')
                        with list_of_mutexlock[i]:
                            list_of_write_protection_pointer[i].value = next_read_pointer
                            list_of_unread_quantity[i].value -= 1
                        list_of_read_pointer[i] = next_read_pointer
                        list_of_pointers_to_n_list[i] += list_of_number_of_reciprocating[i]
                        list_of_number_of_reciprocating[i] ^= 1
                    else:
                        list_of_read_pointer[i] = 0
                        with list_of_mutexlock[i]:
                            list_of_write_protection_pointer[i].value = 0
                            list_of_overflow_n[i].value = -1
        if n & 1 == 0:
            fib_list[0] = 0
        end_time = time.process_time_ns()
        print ('Main process time: %.1f seconds.' % ((end_time - start_time)/10**9))
        return fib_list
    else:
        return None

if __name__ == '__main__':
    start_time = time.perf_counter()
    Fibonacci_sequence_21(1000000)
    end_time = time.perf_counter()
    print ('Total time: %.1f seconds.' % (end_time - start_time))
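On my quad-core computer with 16GB of physical memory, computing the Fibonacci sequence up to 1,000,000 (the data occupies 40GB of memory in total, so virtual memory is involved) took:
1384 seconds
The single-process solution from section 14 took, for the same 1,000,000-term sequence:
1857 seconds
That cuts the running time by 25.5%.
However, this is parallel execution on 4 cores, so the speedup is 1857 / 1384 ≈ 1.34 and the parallel efficiency is 1.34 / 4 ≈ 33.5%. This is not an efficient parallel algorithm.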
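The figures above follow from the usual definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the number of cores. A small sketch that recomputes them from the two timings reported here (the function names are hypothetical):

```python
def speedup(t_serial, t_parallel):
    """Speedup: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, cores):
    """Parallel efficiency: speedup divided by the number of cores."""
    return speedup(t_serial, t_parallel) / cores


def time_saved(t_serial, t_parallel):
    """Fraction of the serial running time that was saved."""
    return (t_serial - t_parallel) / t_serial


if __name__ == '__main__':
    t1, t4 = 1857.0, 1384.0  # seconds, as reported in the text
    print(f"time saved: {time_saved(t1, t4):.1%}")     # 25.5%
    print(f"speedup:    {speedup(t1, t4):.2f}")        # 1.34
    print(f"efficiency: {efficiency(t1, t4, 4):.1%}")  # 33.5%
```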
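Experts, take the single-process solution from section 14 as your starting point, try writing your own efficient parallel solution, and leave it in the comments below.
To wrap up: hey, likes and tips are welcome.
To be continued...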