{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Dumbcoin - An educational python implementation of a bitcoin-like blockchain" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an experimental notebook that implements most of the concepts of a bitcoin-like blockchain in python.\n", "It is **NOT secure** neither a real blockchain and you should **NOT** use this for anything else than educational purposes. \n", "\n", "Regardless of the financial benefits/drawbacks of bitcoin, the underlying technology is very interesting and the original paper is relatively easy to grasp. \n", "\n", "You might want to have a look at the [hacker news](https://news.ycombinator.com/item?id=15945490) discussion about this notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "I used the [original bitcoin](https://bitcoin.org/bitcoin.pdf) paper as reference as well as [Michael Nielsen's bitcoin explanation](http://www.michaelnielsen.org/ddi/how-the-bitcoin-protocol-actually-works/). The [bitcoin wiki](https://en.bitcoin.it/wiki/Main_Page) is a great technical resource as well.\n", "\n", "\n", "## Table of contents\n", "\n", "1. [Hash function and mining](#mining)\n", "2. [Difficulty](#difficulty)\n", "3. [A wallet](#wallet)\n", "4. [Transactions](#transactions)\n", "5. [Blocks](#block)\n", "6. [Attacks](#attacks)\n", "7. [Majority attack](#majority_attack)\n", "8. [Differences with Bitcoin](#differences)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Requires\n", "# - pycryptodome (pycrypto for 3.6)\n", "# - numpy /scipy / matplotlib\n", "# - pandas" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import hashlib\n", "import random\n", "import string\n", "import json\n", "import binascii\n", "import numpy as np\n", "import pandas as pd\n", "import pylab as pl\n", "import logging\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Hash function and mining \n", "\n", "So we'll start a little backward and start with the miner implementation.\n", "For the example here, we'll use the SHA256 hash function because it's readily implemented in python. Note that bitcoin uses [two round of SHA256](https://en.bitcoin.it/wiki/Hashcash) instead of one.\n", "\n", "So our hash function will turn a string of arbitrary length into a fixed-length string of 64 hexadecimal characters. For example :" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def sha256(message):\n", " return hashlib.sha256(message.encode('ascii')).hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, the process of *mining* is : given an arbitrary string $x$, find a nonce such that $hash(x + nonce)$ produces a hash starting with a number of leading ones.\n", "\n", "For example here, we'll \"mine\" a nonce such that the hash of our message (\"hello bitcoin\") when concatenated with our nonce will have at least 2 leading ones." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found nonce = 32\n", "112c38d2fdb6ddaf32f371a390307ccc779cd92443b42c4b5c58fa548f63ed83\n" ] } ], "source": [ "message = 'hello bitcoin'\n", "for nonce in range(1000):\n", " digest = sha256(message + str(nonce))\n", " if digest.startswith('11'):\n", " print('Found nonce = %d' % nonce)\n", " break\n", "print(sha256(message + str(nonce)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The more you increase the number of leading ones you require, the harder it becomes (on average) to find a nonce. In bitcoin, this is called the mining difficulty. Note that bitcoin doesn't require a number of leading digits, but instead requires the hash to be below a certain value. But it's the same idea.\n", "\n", "So let's define two functions that we'll reuse later : one to hash a string and one to mine a nonce for a given string." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def dumb_hash(message):\n", " \"\"\"\n", " Returns an hexadecimal hash\n", " \"\"\"\n", " return sha256(message)\n", "\n", "\n", "def mine(message, difficulty=1):\n", " \"\"\"\n", " Given an input string, will return a nonce such that\n", " hash(string + nonce) starts with `difficulty` ones\n", " \n", " Returns: (nonce, niters)\n", " nonce: The found nonce\n", " niters: The number of iterations required to find the nonce\n", " \"\"\"\n", " assert difficulty >= 1, \"Difficulty of 0 is not possible\"\n", " i = 0\n", " prefix = '1' * difficulty\n", " while True:\n", " nonce = str(i)\n", " digest = dumb_hash(message + nonce)\n", " if digest.startswith(prefix):\n", " return nonce, i\n", " i += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With that in place, we can mine nonce of varied difficulty :" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Took 23 iterations\n", "Took 2272 iterations\n" ] } ], "source": [ "nonce, niters = mine('42', difficulty=1)\n", "print('Took %d iterations' % niters)\n", "\n", "nonce, niters = mine('42', difficulty=3)\n", "print('Took %d iterations' % niters)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see in this example, the number of iterations required for a difficulty of 3 is much larger than for a difficulty of 1. Note though that you could get lucky and have a string where the first nonce (0 in our case) would yield the solution. So the difficulty controls the *average* number of tries. We can do a nice plot of that" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting difficulty " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each difficulty level, we'll mine a nonce for a variety of input strings (we'll use 50 in this example). We'll record the average number of iterations required at each difficulty level." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def random_string(length=10):\n", " return ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(length))\n", "\n", "strings = [random_string() for i in range(50)]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "levels = range(1, 5)\n", "# An array of results with a row for each difficulty and a column for each test string\n", "results = pd.DataFrame(index=strings, columns=levels, dtype=np.int)\n", "results.fillna(value=0)\n", "\n", "#results = np.zeros((N_LEVELS, len(strings)), dtype=np.int)\n", "for level in levels:\n", " for s in strings:\n", " _, niters = mine(s, difficulty=level)\n", " results[level][s] = niters" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | 1 | \n", "2 | \n", "3 | \n", "4 | \n", "
---|---|---|---|---|
Z6AR67BMX7 | \n", "23 | \n", "423 | \n", "879 | \n", "156631 | \n", "
2U6TP42RIM | \n", "4 | \n", "513 | \n", "1621 | \n", "16671 | \n", "
Q4XZ1QL2MI | \n", "0 | \n", "55 | \n", "688 | \n", "332199 | \n", "
PNQTJ3GD8I | \n", "4 | \n", "524 | \n", "3466 | \n", "50681 | \n", "
VIH18FKQ4K | \n", "9 | \n", "161 | \n", "19177 | \n", "44720 | \n", "