
Using a Lock inside map with Python's multiprocessing

from multiprocessing import Pool, Lock
from time import perf_counter
from urllib.request import urlopen

def send_request(data):
    api_url = 'http://api.xxxx.com/?data=%s'
    start_time = perf_counter()
    print(urlopen(api_url % data).read())
    end_time = perf_counter()
    # Serialize writes to the shared log file across worker processes
    with lock:
        with open('request.log', 'a+') as logs:
            logs.write('request %s cost: %s\n' % (data, end_time - start_time))

def init(l):
    # Runs once in every worker: store the inherited lock as a global
    global lock
    lock = l

if __name__ == '__main__':
    data_list = ['data1', 'data2', 'data3']
    lock = Lock()
    pool = Pool(8, initializer=init, initargs=(lock,))
    pool.map(send_request, data_list)
    pool.close()
    pool.join()
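A plain multiprocessing.Lock cannot be passed to pool.map as a task argument (pickling it raises "RuntimeError: Lock objects should only be shared between processes through inheritance"), which is why the example above hands it to each worker through initializer/initargs. A minimal alternative sketch, assuming a manager-backed lock is acceptable: Manager().Lock() returns a picklable proxy, so it can travel with each task. The worker log_cost, the helper run, and the in-memory list standing in for request.log are hypothetical, and no network request is made.

```python
from functools import partial
from multiprocessing import Manager, Pool


def log_cost(lock, shared_log, data):
    # Hypothetical worker: the lock proxy arrives as an ordinary argument
    with lock:
        shared_log.append('request %s done' % data)
    return data


def run(data_list):
    with Manager() as manager:
        lock = manager.Lock()          # picklable proxy, unlike multiprocessing.Lock
        shared_log = manager.list()    # stands in for request.log
        with Pool(4) as pool:
            results = pool.map(partial(log_cost, lock, shared_log), data_list)
        # Copy the log out while the manager process is still running
        return results, list(shared_log[:])


if __name__ == '__main__':
    results, log = run(['data1', 'data2', 'data3'])
    print(results)
    print(sorted(log))
```

The trade-off is that every acquire and release on a manager lock is a round trip to the manager's server process, so the initializer approach above is usually the lighter option when a Pool is involved.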
Reposted from: https://zhuanlan.zhihu.com/p/22223656

1. The problem:

Someone in a chat group posted the following code and asked why the list is empty when it is printed at the end.

from multiprocessing import Process, Manager
import os

manager = Manager()
vip_list = []
#vip_list = manager.list()

# Relies on the "fork" start method (historically the Linux default): each child
# inherits the module-level globals instead of re-running this script.
def testFunc(cc):
    vip_list.append(cc)
    print('process id:', os.getpid())

if __name__ == '__main__':
    threads = []

    for ll in range(10):
        t = Process(target=testFunc, args=(ll,))
        t.daemon = True
        threads.append(t)

    for t in threads:
        t.start()

    for t in threads:
        t.join()

    print("------------------------")
    print('process id:', os.getpid())
    print(vip_list)
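If you understand Python's threading model and the GIL, and how multithreading and multiprocessing work, this question is not hard to answer. If you don't, that's fine: just run the code above and you'll see what the problem is. Each Process has its own memory space, so every child appends to its own copy of vip_list and the parent's list is never modified.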
python aa.py
process id: 632
process id: 635
process id: 637
process id: 633
process id: 636
process id: 634
process id: 639
process id: 638
process id: 641
process id: 640
------------------------
process id: 619
[]
Uncomment line 6 (vip_list = manager.list()) and you get the following:
process id: 32074
process id: 32073
process id: 32072
process id: 32078
process id: 32076
process id: 32071
process id: 32077
process id: 32079
process id: 32075
process id: 32080
------------------------
process id: 32066
[3, 2, 1, 7, 5, 0, 6, 8, 4, 9]

2. Ways to share variables between Python processes:

(1) Shared memory:

Data can be stored in a shared memory map using Value or Array. For example, the following code:
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])
Output:
3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]

(2) Server process:

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies. 
A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array. 
For code, see the manager.list() example at the beginning of this post.
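The opening example only uses a managed list. As a further minimal sketch of the server-process approach, here is a managed dict and a Namespace updated from several processes; worker and demo are hypothetical names chosen for illustration.

```python
from multiprocessing import Manager, Process


def worker(key, shared_dict, ns):
    # Each assignment through a proxy is one request to the manager process
    shared_dict[key] = key * key
    ns.last_key = key   # Namespace attributes live in the manager as well


def demo(n):
    with Manager() as manager:
        shared_dict = manager.dict()
        ns = manager.Namespace()
        ns.last_key = None
        procs = [Process(target=worker, args=(k, shared_dict, ns)) for k in range(n)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # copy() returns an ordinary dict, usable after the manager exits
        return shared_dict.copy(), ns.last_key


if __name__ == '__main__':
    squares, last = demo(5)
    print(squares)
    print(last)
```

Which key ends up in ns.last_key depends on which process happens to write last.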

3. Multiprocessing has far more problems than this: data synchronization

Here is a simple piece of code, a basic counter:

from multiprocessing import Process, Manager
import os

# Relies on the "fork" start method: children inherit these module-level objects.
manager = Manager()
total = manager.Value('i', 0)

def testFunc(cc):
    # Not atomic: reads the value from the manager, then writes it back
    total.value += cc

if __name__ == '__main__':
    threads = []

    for ll in range(100):
        t = Process(target=testFunc, args=(1,))
        t.daemon = True
        threads.append(t)

    for t in threads:
        t.start()

    for t in threads:
        t.join()

    print("------------------------")
    print('process id:', os.getpid())
    print(total.value)
Output:
------------------------
process id: 17378
97
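You may well ask: WTF? This problem already existed with multithreading; multiprocessing simply repeats the same tragedy. The += on the managed value is a read followed by a separate write, each a round trip to the manager process, so two processes can read the same value and one increment is lost. The fix is the same: a Lock!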
from multiprocessing import Process, Manager, Lock
import os

# Relies on the "fork" start method: children inherit these module-level objects.
lock = Lock()
manager = Manager()
total = manager.Value('i', 0)


def testFunc(cc, lock):
    # Hold the lock for the whole read-modify-write so no update is lost
    with lock:
        total.value += cc


if __name__ == '__main__':
    threads = []

    for ll in range(100):
        t = Process(target=testFunc, args=(1, lock))
        t.daemon = True
        threads.append(t)

    for t in threads:
        t.start()

    for t in threads:
        t.join()

    print("------------------------")
    print('process id:', os.getpid())
    print(total.value)

How well does this code perform? Run it and see, or try a larger loop count...
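Every access to manager.Value is a request to the manager's server process. A hedged alternative, assuming a plain integer counter is all you need, is a shared-memory Value, which is created together with its own lock (reachable through get_lock()). This is a sketch rather than a benchmark: add and count are hypothetical names, and no timing claims are made here.

```python
from multiprocessing import Process, Value


def add(counter, n):
    for _ in range(n):
        # get_lock() returns the lock that Value() created alongside the shared int
        with counter.get_lock():
            counter.value += 1


def count(workers, per_worker):
    counter = Value('i', 0)   # shared memory, no manager server process involved
    procs = [Process(target=add, args=(counter, per_worker)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == '__main__':
    print(count(4, 1000))   # 4000: every increment happens under the lock
```

If lock contention is still a concern, each worker could accumulate a local total and take the lock only once at the end.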

4. Final advice:

Note that usually sharing data between processes may not be the best choice, because of all the synchronization issues; an approach involving actors exchanging messages is usually seen as a better choice. See also the Python documentation: As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes. However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so.
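In the spirit of that advice, here is a minimal message-passing sketch: each worker computes a partial sum privately and sends a single message over a multiprocessing.Queue, so no shared counter or lock is needed. The names worker and parallel_sum are hypothetical.

```python
from multiprocessing import Process, Queue


def worker(values, out_queue):
    # Compute privately, then send one message instead of touching shared state
    out_queue.put(sum(values))


def parallel_sum(chunks):
    q = Queue()
    procs = [Process(target=worker, args=(chunk, q)) for chunk in chunks]
    for p in procs:
        p.start()
    # Drain the queue before join() so no child blocks waiting to flush its data
    partials = [q.get() for _ in procs]
    for p in procs:
        p.join()
    return sum(partials)


if __name__ == '__main__':
    chunks = [range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
    print(parallel_sum(chunks))   # 4950
```

Reading the results before joining follows the multiprocessing documentation's guidance: a process that has put items on a queue may not terminate until those items have been consumed.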

5. References: