Parse Ethereum Mempool Data


Sometimes, for example in MEV applications, you want to parse and decode signed transactions that have not yet been executed and exist only in the mempool. As of today, I have not found any package designed specifically for this, but this blog post suggests using py-evm in combination with eth-utils.

Ethereum Raw Transaction Data

Let’s look at the following transaction. Before it was executed, this transaction sat in the mempool. At that point, the signed transaction had not yet produced any EVM logs that could be observed and analyzed. Instead, the only data available was the raw transaction hex string.


An Ethereum raw transaction hex is a hexadecimal string that encodes all the details of a signed Ethereum transaction: the recipient address, the amount of Ether being transferred, the gas price and gas limit, any associated data, and the signature. The sender address is not stored explicitly; it is recovered from the signature. The hex string is what gets transmitted across the Ethereum network, and it can be decoded to inspect the details of a transaction.
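Before doing any full decoding, the envelope of such a hex string can already be inspected. The sketch below classifies a raw transaction by its first byte, relying on EIP-2718: typed transactions start with a type byte in the range 0x00 to 0x7f (0x01 for EIP-2930, 0x02 for EIP-1559), while a legacy transaction is a bare RLP list whose first byte is 0xc0 or higher. The function name `tx_envelope_type` is hypothetical and not part of any library.

```python
def tx_envelope_type(raw_tx: str) -> str:
    """Classify a raw transaction hex string by its first byte (EIP-2718)."""
    # Drop the optional 0x prefix
    hex_body = raw_tx[2:] if raw_tx.startswith(("0x", "0X")) else raw_tx
    if len(hex_body) < 2:
        raise ValueError("raw transaction is empty")

    first_byte = bytes.fromhex(hex_body[:2])[0]

    # A legacy transaction is an RLP list, so its first byte is >= 0xc0
    if first_byte >= 0xC0:
        return "legacy"

    # Typed transactions are prefixed with a type byte in [0x00, 0x7f]
    if first_byte <= 0x7F:
        known_types = {0x01: "eip-2930", 0x02: "eip-1559"}
        return known_types.get(first_byte, f"typed (0x{first_byte:02x})")

    raise ValueError(f"unexpected first byte 0x{first_byte:02x}")


print(tx_envelope_type("0xf869"))    # prefix of a legacy transaction
print(tx_envelope_type("0x02f8b1"))  # prefix of an EIP-1559 transaction
```

This only looks at the prefix; it does not validate the rest of the payload.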

Let’s define a function that decodes the signed raw transaction:

from eth.vm.forks.arrow_glacier.transactions import ArrowGlacierTransactionBuilder as TransactionBuilder
from eth_utils import encode_hex, to_bytes

def decode_mempool_tx(raw_tx):
    """
    Decodes both EIP-1559 and legacy Ethereum transactions.
    - 1559 tx: dict_keys(['type_id', '_inner'])
    - Legacy tx: dict_keys(['_nonce', '_gas_price', '_gas', '_to', '_value', '_data', '_v', '_r',
    '_s', '_cached_rlp'])

    Args:
        raw_tx (str): A hexadecimal string representing a signed transaction.

    Returns:
        dict: Decoded transaction.

    Mock example of raw_tx: '0xf86901844190ab00825208943 ... 9a0a414587d4b1bbf713e42614d36a3f5b27'
    """
    # Convert the hex string to bytes
    signed_tx_as_bytes = to_bytes(hexstr=raw_tx)

    # Deserialize the transaction using the Arrow Glacier transaction builder:
    decoded_tx = TransactionBuilder().decode(signed_tx_as_bytes)

    # Transform to dict (copy, so the decoded object itself is not mutated below)
    decoded_tx_dict = dict(decoded_tx.__dict__)

    # Check for transaction type and process accordingly
    if 'type_id' in decoded_tx_dict:
        # Typed (e.g. EIP-1559) transaction: the fields live on the wrapped '_inner' object
        decoded_tx_dict = dict(decoded_tx_dict['_inner'].__dict__)
    else:
        # Legacy transaction
        pass  # No special handling needed for legacy transactions

    # Common processing for both types
    if '_data' in decoded_tx_dict:
        decoded_tx_dict['_data'] = encode_hex(decoded_tx_dict['_data'])
    if '_cached_rlp' in decoded_tx_dict:
        decoded_tx_dict['_cached_rlp'] = encode_hex(decoded_tx_dict['_cached_rlp'])
    if '_to' in decoded_tx_dict:
        decoded_tx_dict['_to'] = encode_hex(decoded_tx_dict['_to'])

    return decoded_tx_dict

Now we can decode the raw transaction to the following:

{'_access_list': (),
 '_cached_rlp': '0xf8b10181a88405f5e100850402a4432082b46d94ed5af388653567af2f388e6224dc7c4b3241c544...',
 '_chain_id': 1,
 '_data': '0xa22cb465000000000000000000000000ef887e8b1c06209f59e8ae55d0e625c93734437600000000000000...',
 '_gas': 46189,
 '_max_fee_per_gas': 17224188704,
 '_max_priority_fee_per_gas': 100000000,
 '_nonce': 168,
 '_r': 24038638873168070677323615839725355891951985956712214956528418805691300789895,
 '_s': 10846381322016981702266540511656012158965283465913435801707244187222536387701,
 '_to': '0xed5af388653567af2f388e6224dc7c4b3241c544',
 '_value': 0,
 '_y_parity': 1}
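The keys in this dict keep the leading underscore of the underlying private attributes, and the fee fields are denominated in wei. As a small sketch of how the output could be made easier to read, the hypothetical helper `tidy_decoded_tx` below (my own name, not part of py-evm or eth-utils) strips the underscores and adds gwei versions of the fee fields. The example dict reuses a few values from the output above.

```python
from decimal import Decimal

WEI_PER_GWEI = Decimal(10**9)
FEE_FIELDS = ("gas_price", "max_fee_per_gas", "max_priority_fee_per_gas")


def tidy_decoded_tx(decoded: dict) -> dict:
    """Strip leading underscores from keys and add gwei-denominated fee fields."""
    tidy = {key.lstrip("_"): value for key, value in decoded.items()}
    for field in FEE_FIELDS:
        if field in tidy:
            # Decimal keeps the conversion exact instead of going through floats
            tidy[f"{field}_gwei"] = Decimal(tidy[field]) / WEI_PER_GWEI
    return tidy


# A subset of the decoded transaction shown above
example = {
    "_nonce": 168,
    "_gas": 46189,
    "_max_fee_per_gas": 17224188704,
    "_max_priority_fee_per_gas": 100000000,
    "_value": 0,
}

tidy = tidy_decoded_tx(example)
print(tidy["max_fee_per_gas_gwei"])           # 17.224188704
print(tidy["max_priority_fee_per_gas_gwei"])  # 0.1
```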

Let’s break down each field in the provided raw transaction:

The Data field in the decoded transaction

In the data, the leading 0x indicates that the string is a hexadecimal representation. One byte is represented by two hex characters, so a 32-byte value is represented by 64 hex characters; together with the leading 0x, that yields a string of length 66.
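The length arithmetic is easy to check in Python. This is only an illustration, using 32 zero bytes as a stand-in value:

```python
word = bytes(32)              # a 32-byte value (all zeros, for illustration)
hex_word = "0x" + word.hex()  # two hex characters per byte, plus the 0x prefix

print(len(word.hex()))  # 64
print(len(hex_word))    # 66

# Converting back drops the prefix and halves the character count again
assert bytes.fromhex(hex_word[2:]) == word
```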

The data field stores the hex-encoded call data: the target function and the arguments for that function. The first 4 bytes after 0x (in this case a22cb465) are the function selector, that is, the first 4 bytes of the Keccak-256 hash of the function signature (the function name and its parameter types). The remaining bytes are the ABI-encoded arguments passed to the function, so the length of the data field varies with the function called and its arguments. For static types, every subsequent 32 bytes (64 characters) after the selector is a different input variable; dynamic types such as bytes, string, and arrays are instead encoded as a 32-byte offset that points to their contents later in the data.
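The layout described above can be sketched as a simple split into the first 4 bytes and a list of 32-byte words. The helper name `split_calldata` is hypothetical. The first argument word is taken from the decoded transaction above; the second word is truncated in that output, so the value used here (a 32-byte encoding of 1) is an assumption for illustration only, as is reading the two words as an address and a bool.

```python
def split_calldata(data: str):
    """Split hex call data into its first 4 bytes and a list of 32-byte words."""
    body = data[2:] if data.startswith("0x") else data
    selector, args = body[:8], body[8:]  # 4 bytes == 8 hex characters
    if len(args) % 64 != 0:
        raise ValueError("argument section is not a multiple of 32 bytes")
    words = [args[i:i + 64] for i in range(0, len(args), 64)]
    return "0x" + selector, words


# First word: the 20-byte value from the transaction above, left-padded to 32 bytes.
# Second word: ASSUMED for illustration, the real value is truncated in the output.
first_arg = "ef887e8b1c06209f59e8ae55d0e625c937344376"
data = "0xa22cb465" + first_arg.rjust(64, "0") + "1".rjust(64, "0")

selector, words = split_calldata(data)
print(selector)    # 0xa22cb465
print(len(words))  # 2

# Interpreting the words requires knowing the parameter types (assumed here)
address_arg = "0x" + words[0][-40:]  # an address is the last 20 bytes of its word
bool_arg = int(words[1], 16) != 0    # a bool is encoded as 0 or 1
print(address_arg, bool_arg)
```

This naive split is only meaningful when all parameters are static types; with dynamic types, some words are offsets or lengths rather than values.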

To decode the data field, we need the application binary interface (ABI) of the smart contract being called. The ABI is a JSON array containing the definitions of all functions and events of a given smart contract. With the ABI, we can match the hashed signatures to readable definitions.
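To make the matching step more concrete, the sketch below builds the canonical signature strings (name plus comma-separated parameter types) from an ABI. These strings are what gets hashed to obtain the 4-byte prefix of the data field. The ABI fragment is a hand-written, illustrative example and not the ABI of the contract in the transaction above; to my knowledge, `setApprovalForAll(address,bool)` is the signature commonly associated with a22cb465, but treat that as an assumption. The helper name `function_signatures` is hypothetical.

```python
import json

# Illustrative ABI fragment (assumed, not fetched from any real contract)
ABI_JSON = """
[
  {"type": "function", "name": "setApprovalForAll",
   "inputs": [{"name": "operator", "type": "address"},
              {"name": "approved", "type": "bool"}],
   "outputs": [], "stateMutability": "nonpayable"},
  {"type": "event", "name": "ApprovalForAll", "anonymous": false,
   "inputs": [{"name": "owner", "type": "address", "indexed": true},
              {"name": "operator", "type": "address", "indexed": true},
              {"name": "approved", "type": "bool", "indexed": false}]}
]
"""


def function_signatures(abi: list) -> list:
    """Return canonical signatures such as 'name(type1,type2)' for all functions."""
    signatures = []
    for entry in abi:
        if entry.get("type") != "function":
            continue  # skip events, constructors, fallback, ...
        arg_types = ",".join(inp["type"] for inp in entry.get("inputs", []))
        signatures.append(f"{entry['name']}({arg_types})")
    return signatures


print(function_signatures(json.loads(ABI_JSON)))
# ['setApprovalForAll(address,bool)']
```

Hashing each signature with Keccak-256 and keeping the first 4 bytes gives the value to compare against the data field, for example with `keccak(text=signature)[:4]` from eth-utils. Note that `hashlib.sha3_256` is not the same function as Keccak-256, and that this sketch does not expand tuple (struct) parameters into their component types.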

Flashbots Mempool Dumpster

The Mempool Dumpster by Flashbots is an excellent source of mempool data. As of today, it provides mempool data from 2023-08-08 onwards. The data is collected from several sources (alchemy, apool, blx, infura, “local”), and there are approximately 1-2 million transactions per day.