2018-2019 Preliminary Task (1): Reading Materials & Introduction to Python
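Original source: Dumbcoin - An educational python implementation of a bitcoin-like blockchain [The article explains the technology underlying Bitcoin in detail and implements most of the concepts of a Bitcoin-like blockchain in Python. It is not a real blockchain, but it helps in understanding the technology.]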
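# Requires
# - pycryptodome (pycrypto for 3.6)
# - numpy / scipy / matplotlib
# - pandas
import hashlib  # standard library module with common hash algorithms
import random  # random number generation
import string  # string constants and helpers
import json  # JSON serialization
import binascii  # conversion between binary data and hex text
import numpy as np  # array operations
import pandas as pd  # used here to build data frames
import pylab as pl  # plotting
import logging  # log output
# Render plots inline in the notebook
%matplotlib inline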
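Hash functions and mining
This section shows how a miner mines. For convenience, a single round of the SHA256 hash function is used here; Bitcoin uses two rounds of SHA256.
The function below converts a string of any length into a fixed-length string of 64 hexadecimal characters: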
def sha256(message):
    return hashlib.sha256(message.encode('ascii')).hexdigest()
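To make the difference between one and two rounds concrete, here is a minimal sketch. The names `sha256_once` and `sha256_twice` are hypothetical helpers introduced only for this illustration; the second round hashes the raw 32-byte digest of the first round.

```python
import hashlib

def sha256_once(message):
    """One round of SHA-256, equivalent to the sha256() helper in this post."""
    return hashlib.sha256(message.encode('ascii')).hexdigest()

def sha256_twice(message):
    """Two rounds: the raw digest of the first round is hashed again."""
    first_round = hashlib.sha256(message.encode('ascii')).digest()
    return hashlib.sha256(first_round).hexdigest()

# Both results are 64 hexadecimal characters long, but they differ
print(sha256_once('hello bitcoin'))
print(sha256_twice('hello bitcoin'))
```

The rest of this post keeps the single-round version for simplicity.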
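- digest() and hexdigest():
hashlib deals with secure hashes and message digests. It provides a common interface to several hash algorithms, such as SHA1, SHA224, SHA256, SHA384, SHA512 and MD5.
hash.digest() returns the digest as binary data (a bytes object); hash.hexdigest() returns the same digest as a string of hexadecimal digits.
For example: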
import hashlib
md5 = hashlib.md5()
md5.update("a".encode('utf-8'))
print("Digest returned by digest(): %s" % md5.digest())
print("Digest returned by hexdigest(): %s" % md5.hexdigest())
The result is:
Digest returned by digest(): b'\x0c\xc1u\xb9\xc0\xf1\xb6\xa81\xc3\x99\xe2iw&a'
Digest returned by hexdigest(): 0cc175b9c0f1b6a831c399e269772661
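The two return values describe the same digest in different encodings. The short sketch below (using SHA256 instead of MD5, and variable names chosen only for this illustration) shows that the hex string is simply the hexadecimal rendering of the raw bytes.

```python
import binascii
import hashlib

h = hashlib.sha256('a'.encode('utf-8'))
raw = h.digest()       # bytes object: 32 bytes for SHA-256
text = h.hexdigest()   # str: 64 hexadecimal characters

print(len(raw), len(text))                            # 32 64
print(binascii.hexlify(raw).decode('ascii') == text)  # True
print(raw.hex() == text)                              # True
```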
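Mining can be described as follows: given an arbitrary string x, find a nonce such that hash(x + nonce) starts with a required number of leading '1' characters.
In the code below, we "mine" a nonce such that the hash of the message "hello bitcoin" concatenated with the nonce has at least two leading '1' characters: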
message = 'hello bitcoin'
for nonce in range(1000):
    digest = sha256(message + str(nonce))
    if digest.startswith('11'):
        print('Found nonce = %d' % nonce)
        break
print(sha256(message + str(nonce)))
The result is:
Found nonce = 32
112c38d2fdb6ddaf32f371a390307ccc779cd92443b42c4b5c58fa548f63ed83
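The result shows that a nonce of 32 produces a hash that starts with "11".
The more leading characters you require, the harder it is (on average) to find a matching nonce. In Bitcoin, this is called the mining difficulty. Bitcoin does not require leading characters; instead it requires the hash, read as a number, to be below a certain value. The idea is the same.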
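As a rough illustration of the "hash below a certain value" variant, the sketch below interprets the hex digest as an integer and compares it with a target. The function name `mine_below_target` and the target `2 ** 248` are assumptions made for this example, not Bitcoin's actual parameters.

```python
import hashlib

def sha256(message):
    return hashlib.sha256(message.encode('ascii')).hexdigest()

def mine_below_target(message, target):
    """Try nonces 0, 1, 2, ... until the hash, read as an integer, is below target."""
    i = 0
    while True:
        digest = sha256(message + str(i))
        if int(digest, 16) < target:
            return str(i), i
        i += 1

# A SHA-256 digest is a 256-bit number, so a target of 2 ** 248 accepts
# roughly one hash in 256. A lower target means a higher difficulty.
target = 2 ** 248
nonce, niters = mine_below_target('hello bitcoin', target)
print('Found nonce = %s' % nonce)
print(sha256('hello bitcoin' + nonce))
```

Lowering the target shrinks the set of acceptable hashes smoothly, whereas adding one more required leading character always multiplies the expected work by 16.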
We now define two functions that will be used later: one computes the hash of a string, the other mines a nonce for a given string:
def dumb_hash(message):
    return sha256(message)

def mine(message, difficulty=1):
    assert difficulty >= 1, "Difficulty of 0 is not possible"
    i = 0
    prefix = '1' * difficulty
    while True:
        nonce = str(i)
        digest = dumb_hash(message + nonce)
        if digest.startswith(prefix):
            return nonce, i
        i += 1
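Given an input string, mine returns a nonce such that hash(string + nonce) starts with as many '1' characters as the difficulty specifies.
With this, we can mine nonces at various difficulty levels: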
nonce, niters = mine('42', difficulty=1)
print('Took %d iterations' % niters)
nonce, niters = mine('42', difficulty=3)
print('Took %d iterations' % niters)
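The result is:
Took 23 iterations
Took 2272 iterations
As this shows, difficulty 3 takes far more iterations than difficulty 1. The difficulty therefore controls the average number of attempts.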
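A back-of-the-envelope estimate helps put these numbers in context. Assuming each hexadecimal character of the hash behaves like a uniform random draw from 16 possibilities, a prefix of `difficulty` characters matches with probability 1/16^difficulty, so the expected number of attempts is 16^difficulty. The helper name `expected_attempts` is hypothetical.

```python
def expected_attempts(difficulty, alphabet_size=16):
    """Mean number of attempts, assuming uniformly random hex characters."""
    return alphabet_size ** difficulty

for difficulty in range(1, 5):
    print(difficulty, expected_attempts(difficulty))
# 1 16
# 2 256
# 3 4096
# 4 65536
```

A single run can land well above or below these means, which is why the next section averages over many input strings.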
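Difficulty
For each difficulty level, we mine a matching nonce for a number of input strings (50 in this example) and record the iterations needed, so that the average per difficulty level can be compared.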
def random_string(length=10):
    return ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(length))

strings = [random_string() for i in range(50)]
levels = range(1, 5)
# A table of results with a row for each test string and a column for each difficulty.
# (np.int has been removed from recent NumPy versions, so the built-in int is used.)
results = pd.DataFrame(0, index=strings, columns=levels, dtype=int)
for level in levels:
    for s in strings:
        _, niters = mine(s, difficulty=level)
        # .loc avoids chained assignment, which may not write back to the frame
        results.loc[s, level] = niters
results.iloc[:5]
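This displays the first five rows of the results table. The distribution of iterations per difficulty level can then be drawn as a box plot: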
pl.figure(figsize=(10, 5))
ax = pl.subplot(111)
ax.set_title('Number of iterations to mine a nonce for various difficulty')
results.plot.box(showfliers=False, ax=ax)
ax.set_xlabel('Difficulty')
ax.set_ylabel('Iterations')
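Wallets
In its simplest form, a wallet is a public/private key pair: the public key is used to receive transactions and the private key is used to spend. We sign a transaction with our private key, and anyone else can verify the signature with our public key.
In real Bitcoin, a wallet is a set of multiple public/private key pairs, and an address is not directly the public key. This improves privacy and security, but here we use a single key pair and use the public key as the address.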
import Crypto
import Crypto.Random
from Crypto.Hash import SHA
from Crypto.PublicKey import RSA
from Crypto.Signature import PKCS1_v1_5

class Wallet(object):
    def __init__(self):
        random_gen = Crypto.Random.new().read
        self._private_key = RSA.generate(1024, random_gen)
        self._public_key = self._private_key.publickey()
        self._signer = PKCS1_v1_5.new(self._private_key)

    @property
    def address(self):
        return binascii.hexlify(self._public_key.exportKey(format='DER')).decode('ascii')

    def sign(self, message):
        h = SHA.new(message.encode('utf8'))
        return binascii.hexlify(self._signer.sign(h)).decode('ascii')

def verify_signature(wallet_address, message, signature):
    pubkey = RSA.importKey(binascii.unhexlify(wallet_address))
    verifier = PKCS1_v1_5.new(pubkey)
    h = SHA.new(message.encode('utf8'))
    return verifier.verify(h, binascii.unhexlify(signature))
# Check that the wallet functionality works
w1 = Wallet()
signature = w1.sign('foobar')
assert verify_signature(w1.address, 'foobar', signature)
assert not verify_signature(w1.address, 'rogue message', signature)
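Transactions
Transactions are used to move currency between wallets. A transaction consists of:
- A spender, who signs the transaction and spends the money
- A number of inputs, which are outputs of other transactions whose recipient is the spender's wallet
- A number of outputs, each specifying an amount and a recipient
A transaction can also include a "transaction fee", which is an incentive for miners to include the transaction in a block. The fee is the difference between the total input amount and the total output amount.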
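The fee arithmetic can be sketched on plain numbers before any classes are involved. The function `compute_fee_from_amounts` is a hypothetical helper for this illustration; it takes lists of amounts rather than transaction objects.

```python
def compute_fee_from_amounts(input_amounts, output_amounts):
    """Fee = total input - total output; outputs may never exceed inputs."""
    total_in = sum(input_amounts)
    total_out = sum(output_amounts)
    if total_out > total_in:
        raise ValueError("Invalid transaction with out(%f) > in(%f)" % (total_out, total_in))
    return total_in - total_out

# One input worth 25 coins, spent as 2 coins to the recipient
# and 22 coins back to the spender: the remaining 1 coin is the fee.
fee = compute_fee_from_amounts([25.0], [2.0, 22.0])
print(fee)  # 1.0
```

Note that the change (22 coins here) must be sent back explicitly as an output; anything not assigned to an output becomes the fee.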
class TransactionInput(object):
    """
    An input for a transaction. This points to an output of another transaction
    """
    def __init__(self, transaction, output_index):
        self.transaction = transaction
        self.output_index = output_index
        assert 0 <= self.output_index < len(transaction.outputs)

    def to_dict(self):
        d = {
            'transaction': self.transaction.hash(),
            'output_index': self.output_index
        }
        return d

    @property
    def parent_output(self):
        return self.transaction.outputs[self.output_index]

class TransactionOutput(object):
    """
    An output for a transaction. This specifies an amount and a recipient (wallet)
    """
    def __init__(self, recipient_address, amount):
        self.recipient = recipient_address
        self.amount = amount

    def to_dict(self):
        d = {
            'recipient_address': self.recipient,
            'amount': self.amount
        }
        return d
def compute_fee(inputs, outputs):
    """
    Compute the transaction fee by computing the difference between total input and total output
    """
    total_in = sum(i.transaction.outputs[i.output_index].amount for i in inputs)
    total_out = sum(o.amount for o in outputs)
    assert total_out <= total_in, "Invalid transaction with out(%f) > in(%f)" % (total_out, total_in)
    return total_in - total_out
class Transaction(object):
    def __init__(self, wallet, inputs, outputs):
        """
        Create a transaction spending money from the provided wallet
        """
        self.inputs = inputs
        self.outputs = outputs
        self.fee = compute_fee(inputs, outputs)
        self.signature = wallet.sign(json.dumps(self.to_dict(include_signature=False)))

    def to_dict(self, include_signature=True):
        d = {
            "inputs": list(map(TransactionInput.to_dict, self.inputs)),
            "outputs": list(map(TransactionOutput.to_dict, self.outputs)),
            "fee": self.fee
        }
        if include_signature:
            d["signature"] = self.signature
        return d

    def hash(self):
        return dumb_hash(json.dumps(self.to_dict()))
class GenesisTransaction(Transaction):
    """
    This is the first transaction which is a special transaction
    with no input and 25 bitcoins output
    """
    def __init__(self, recipient_address, amount=25):
        self.inputs = []
        self.outputs = [
            TransactionOutput(recipient_address, amount)
        ]
        self.fee = 0
        self.signature = 'genesis'

    def to_dict(self, include_signature=False):
        # TODO: Instead, should sign genesis transaction with a well-known public key ?
        assert not include_signature, "Cannot include signature of genesis transaction"
        return super().to_dict(include_signature=False)
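With the classes above, we can create a transaction between Alice and Bob.
alice = Wallet()
bob = Wallet()
t1 = GenesisTransaction(alice.address)
t2 = Transaction(
    alice,
    [TransactionInput(t1, 0)],
    [TransactionOutput(bob.address, 2.0), TransactionOutput(alice.address, 22.0)]
)
assert np.abs(t2.fee - 1.0) < 1e-5
In Bitcoin, the amount held by a wallet is never stored. Instead, the balance is computed by walking the whole chain of transactions. The compute_balance function below does this: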
alice = Wallet()
bob = Wallet()
walter = Wallet()

# This gives 25 coins to Alice
t1 = GenesisTransaction(alice.address)

# Of those 25, Alice will spend
# Alice -- 5 --> Bob
#       -- 15 --> Alice
#       -- 5 --> Walter
t2 = Transaction(
    alice,
    [TransactionInput(t1, 0)],
    [TransactionOutput(bob.address, 5.0), TransactionOutput(alice.address, 15.0), TransactionOutput(walter.address, 5.0)]
)

# Walter -- 5 --> Bob
t3 = Transaction(
    walter,
    [TransactionInput(t2, 2)],
    [TransactionOutput(bob.address, 5.0)])

# Bob -- 8 --> Walter
#     -- 1 --> Bob
#        1 fee
t4 = Transaction(
    bob,
    [TransactionInput(t2, 0), TransactionInput(t3, 0)],
    [TransactionOutput(walter.address, 8.0), TransactionOutput(bob.address, 1.0)]
)

transactions = [t1, t2, t3, t4]

def compute_balance(wallet_address, transactions):
    """
    Given an address and a list of transactions, computes the wallet balance of the address
    """
    balance = 0
    for t in transactions:
        # Subtract all the money that the address sent out
        for txin in t.inputs:
            if txin.parent_output.recipient == wallet_address:
                balance -= txin.parent_output.amount
        # Add all the money received by the address
        for txout in t.outputs:
            if txout.recipient == wallet_address:
                balance += txout.amount
    return balance

print("Alice has %.02f dumbcoins" % compute_balance(alice.address, transactions))
print("Bob has %.02f dumbcoins" % compute_balance(bob.address, transactions))
print("Walter has %.02f dumbcoins" % compute_balance(walter.address, transactions))
The result is:
Alice has 15.00 dumbcoins
Bob has 1.00 dumbcoins
Walter has 8.00 dumbcoins
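Another way to read these balances is as the sum of each wallet's unspent outputs. The sketch below mirrors t1 to t4 with plain dictionaries instead of the Wallet and Transaction classes; the data layout and the names `unspent_outputs` and `balance` are assumptions made for this illustration.

```python
# Inputs are (transaction_name, output_index) pairs;
# outputs are (recipient, amount) pairs.
transactions = {
    't1': {'inputs': [], 'outputs': [('alice', 25.0)]},
    't2': {'inputs': [('t1', 0)],
           'outputs': [('bob', 5.0), ('alice', 15.0), ('walter', 5.0)]},
    't3': {'inputs': [('t2', 2)], 'outputs': [('bob', 5.0)]},
    't4': {'inputs': [('t2', 0), ('t3', 0)],
           'outputs': [('walter', 8.0), ('bob', 1.0)]},
}

def unspent_outputs(transactions):
    """Return every output that no transaction uses as an input."""
    spent = {ref for tx in transactions.values() for ref in tx['inputs']}
    unspent = {}
    for name, tx in transactions.items():
        for index, (recipient, amount) in enumerate(tx['outputs']):
            if (name, index) not in spent:
                unspent[(name, index)] = (recipient, amount)
    return unspent

def balance(owner, transactions):
    """Sum the unspent outputs addressed to owner."""
    return sum(amount for recipient, amount in unspent_outputs(transactions).values()
               if recipient == owner)

for owner in ('alice', 'bob', 'walter'):
    print("%s has %.02f dumbcoins" % (owner, balance(owner, transactions)))
```

This gives the same totals as compute_balance (15, 1 and 8), because every spent output is added once when received and subtracted once when used as an input.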
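Beyond this, we also need to verify that a transaction is valid, which means:
- A user can only spend their own money. This means checking that all inputs belong to the wallet that signed the transaction;
- The amount spent must not exceed the amount available. This is checked by the compute_fee function above.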
def verify_transaction(transaction):
    """
    Verify that the transaction is valid.
    We need to verify two things :
    - That all of the inputs of the transaction belong to the same wallet
    - That the transaction is signed by the owner of said wallet
    """
    tx_message = json.dumps(transaction.to_dict(include_signature=False))
    if isinstance(transaction, GenesisTransaction):
        # TODO: We should probably be more careful about validating genesis transactions
        return True

    # Verify input transactions
    for tx in transaction.inputs:
        if not verify_transaction(tx.transaction):
            logging.error("Invalid parent transaction")
            return False

    # Verify a single wallet owns all the inputs
    first_input_address = transaction.inputs[0].parent_output.recipient
    for txin in transaction.inputs[1:]:
        if txin.parent_output.recipient != first_input_address:
            logging.error(
                "Transaction inputs belong to multiple wallets (%s and %s)" %
                (txin.parent_output.recipient, first_input_address)
            )
            return False

    if not verify_signature(first_input_address, tx_message, transaction.signature):
        logging.error("Invalid transaction signature, trying to spend someone else's money ?")
        return False

    # Call compute_fee here to trigger an assert if output sum is greater than input sum. Without this,
    # a miner could include such an invalid transaction.
    compute_fee(transaction.inputs, transaction.outputs)
    return True
t1 = GenesisTransaction(alice.address)

# This is an invalid transaction because bob is trying to spend alice's money
# (alice was the recipient of the input - t1)
t2 = Transaction(
    bob,
    [TransactionInput(t1, 0)],
    [TransactionOutput(walter.address, 10.0)]
)

# This is valid, alice is spending her own money
t3 = Transaction(
    alice,
    [TransactionInput(t1, 0)],
    [TransactionOutput(walter.address, 10.0)]
)
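Blocks
We now have:
- A way to define a wallet (as a public/private key pair)
- A way to create transactions between wallets
- A way to verify transactions (by checking that the signature matches)
What remains is to group transactions into blocks and let miners mine those blocks. Mining consists of two parts:
- Verifying the transactions in the block
- Finding a nonce such that the block's hash starts with the required number of '1' characters (DIFFICULTY of them in the code below)
In addition, mining creates money in the following way: the first transaction in a block is a GenesisTransaction that gives 25 coins to whatever address the miner chooses. In the same way, the miner can redirect the fees of the transactions in the block to any address of their choosing.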
BLOCK_INCENTIVE = 25  # The number of coins miners get for mining a block
DIFFICULTY = 2

def compute_total_fee(transactions):
    """Return the total fee for the set of transactions"""
    return sum(t.fee for t in transactions)

class Block(object):
    def __init__(self, transactions, ancestor, miner_address, skip_verif=False):
        """
        Args:
            transactions: The list of transactions to include in the block
            ancestor: The previous block
            miner_address: The address of the miner's wallet. This is where the block
                           incentive and the transactions fees will be deposited
        """
        reward = compute_total_fee(transactions) + BLOCK_INCENTIVE
        self.transactions = [GenesisTransaction(miner_address, amount=reward)] + transactions
        self.ancestor = ancestor

        if not skip_verif:
            assert all(map(verify_transaction, transactions))

        json_block = json.dumps(self.to_dict(include_hash=False))
        self.nonce, _ = mine(json_block, DIFFICULTY)
        self.hash = dumb_hash(json_block + self.nonce)

    def fee(self):
        """Return transaction fee for this block"""
        return compute_total_fee(self.transactions)

    def to_dict(self, include_hash=True):
        d = {
            "transactions": list(map(Transaction.to_dict, self.transactions)),
            "previous_block": self.ancestor.hash,
        }
        if include_hash:
            d["nonce"] = self.nonce
            d["hash"] = self.hash
        return d
class GenesisBlock(Block):
    """
    The genesis block is the first block in the chain.
    It is the only block with no ancestor
    """
    def __init__(self, miner_address):
        super(GenesisBlock, self).__init__(transactions=[], ancestor=None, miner_address=miner_address)

    def to_dict(self, include_hash=True):
        d = {
            "transactions": [],
            "genesis_block": True,
        }
        if include_hash:
            d["nonce"] = self.nonce
            d["hash"] = self.hash
        return d
Just as we verify transactions, we also need a way to verify blocks:
def verify_block(block, genesis_block, used_outputs=None):
    """
    Verifies that a block is valid :
    - Verifies the hash starts with the required amount of ones
    - Verifies that the same transaction output isn't used twice
    - Verifies all transactions are valid
    - Verifies the first transaction in the block is a genesis transaction with BLOCK_INCENTIVE + total_fee

    Args:
        block: The block to validate
        genesis_block: The genesis block (this needs to be shared by everybody. E.g. hardcoded somewhere)
        used_outputs: list of outputs used in transactions for all blocks above this one
    """
    if used_outputs is None:
        used_outputs = set()

    # Verify hash
    prefix = '1' * DIFFICULTY
    if not block.hash.startswith(prefix):
        logging.error("Block hash (%s) doesn't start with prefix %s" % (block.hash, prefix))
        return False
    if not all(map(verify_transaction, block.transactions)):
        return False

    # Verify that transactions in this block don't use already spent outputs
    #
    # Note that we could move this in verify_transaction, but this would require some passing the used_outputs
    # around more. So we do it here for simplicity
    for transaction in block.transactions:
        for i in transaction.inputs:
            if i.parent_output in used_outputs:
                logging.error("Transaction uses an already spent output : %s" % json.dumps(i.parent_output.to_dict()))
                return False
            used_outputs.add(i.parent_output)

    # Verify ancestors up to the genesis block
    if not (block.hash == genesis_block.hash):
        if not verify_block(block.ancestor, genesis_block, used_outputs):
            logging.error("Failed to validate ancestor block")
            return False

    # Verify the first transaction is the miner's reward
    tx0 = block.transactions[0]
    if not isinstance(tx0, GenesisTransaction):
        logging.error("Transaction 0 is not a GenesisTransaction")
        return False
    if not len(tx0.outputs) == 1:
        logging.error("Transactions 0 doesn't have exactly 1 output")
        return False
    reward = compute_total_fee(block.transactions[1:]) + BLOCK_INCENTIVE
    if not tx0.outputs[0].amount == reward:
        logging.error("Invalid amount in transaction 0 : %d, expected %d" % (tx0.outputs[0].amount, reward))
        return False

    # Only the first transaction shall be a genesis
    for i, tx in enumerate(block.transactions):
        if i == 0:
            if not isinstance(tx, GenesisTransaction):
                logging.error("Non-genesis transaction at index 0")
                return False
        elif isinstance(tx, GenesisTransaction):
            logging.error("GenesisTransaction (hash=%s) at index %d != 0", tx.hash(), i)
            return False
    return True
alice = Wallet()
bob = Wallet()
walter = Wallet()

genesis_block = GenesisBlock(miner_address=alice.address)
print("genesis_block : " + genesis_block.hash + " with fee=" + str(genesis_block.fee()))

t1 = genesis_block.transactions[0]
t2 = Transaction(
    alice,
    [TransactionInput(t1, 0)],
    [TransactionOutput(bob.address, 5.0), TransactionOutput(alice.address, 15.0), TransactionOutput(walter.address, 5.0)]
)
t3 = Transaction(
    walter,
    [TransactionInput(t2, 2)],
    [TransactionOutput(bob.address, 5.0)])
t4 = Transaction(
    bob,
    [TransactionInput(t2, 0), TransactionInput(t3, 0)],
    [TransactionOutput(walter.address, 8.0), TransactionOutput(bob.address, 1.0)]
)

block1 = Block([t2], ancestor=genesis_block, miner_address=walter.address)
print("block1 : " + block1.hash + " with fee=" + str(block1.fee()))

block2 = Block([t3, t4], ancestor=block1, miner_address=walter.address)
print("block2 : " + block2.hash + " with fee=" + str(block2.fee()))
Output:
genesis_block : 1162dce8ffec3acf13ce61109f121922eee8cceeea4784aa9d90dc6ec0e0fa92 with fee=0
block1 : 11af277c02c22a7e3c3a73102282ca5a0e01869b1d852527b6a842f0786ee8e3 with fee=0.0
block2 : 119e461d393b793478c7c7cb9fa6feb54fca35865a398d017e541027a78a2e9a with fee=1.0
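Each hash above covers the block's transactions and the previous block's hash, so changing any content changes the hash and the mined nonce no longer fits. The sketch below shows this on a simplified block; the dictionary layout and the transaction strings are hypothetical stand-ins for the Block class.

```python
import hashlib
import json

def sha256(message):
    return hashlib.sha256(message.encode('ascii')).hexdigest()

def mine(message, difficulty=2):
    """Return the first nonce whose hash starts with `difficulty` ones."""
    prefix = '1' * difficulty
    i = 0
    while True:
        nonce = str(i)
        if sha256(message + nonce).startswith(prefix):
            return nonce
        i += 1

# A simplified block: a list of transactions plus a link to its ancestor
block = {'transactions': ['alice pays bob 5'], 'previous_block': 'genesis'}
nonce = mine(json.dumps(block))
original_hash = sha256(json.dumps(block) + nonce)

# Tamper with the block but keep the old nonce
block['transactions'] = ['alice pays bob 500']
tampered_hash = sha256(json.dumps(block) + nonce)

print(original_hash.startswith('11'))   # True
print(tampered_hash == original_hash)   # False: the hash changed
```

The tampered hash will almost certainly not start with the required prefix, so the block has to be mined again, and because every later block stores its ancestor's hash, those blocks would have to be re-mined as well.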
def collect_transactions(block, genesis_block):
    """Recursively collect transactions in `block` and all of its ancestors"""
    # Important : COPY block.transactions
    transactions = [] + block.transactions
    if block.hash != genesis_block.hash:
        transactions += collect_transactions(block.ancestor, genesis_block)
    return transactions

transactions = collect_transactions(block2, genesis_block)

# Alice mined 25 (from the genesis block) and gave 5 to bob and 5 to walter
print("Alice has %.02f dumbcoins" % compute_balance(alice.address, transactions))
# Bob received 5 from alice and 5 from walter, but then sent 8 back to walter with a transaction fee of 1
print("Bob has %.02f dumbcoins" % compute_balance(bob.address, transactions))
# Walter mined 2 blocks (2 * 25), received 8 from bob and got a transaction fee of 1 on block2
print("Walter has %.02f dumbcoins" % compute_balance(walter.address, transactions))
Output:
Alice has 15.00 dumbcoins
Bob has 1.00 dumbcoins
Walter has 59.00 dumbcoins
Attacks