Parse Ethereum Mempool Data

Introduction

Sometimes, e.g., in MEV applications, you want to parse or decode signed transactions that have not yet been executed and reside only in the mempool. As of today, I have not found any package specifically designed for this, but this blog post suggests using py-evm in combination with eth-utils.

Ethereum Raw Transaction Data

Let’s look at the following transaction. At some point it sat in the mempool before it was executed. At that point, the signed transaction had not yet produced any EVM logs that could be observed and analyzed. Instead, the only data available was the raw transaction hex string:

0x02f8b10181a88405f5e100850402a4432082b46d94ed5af388653567af2f388e6224dc7c4b3241c54480b844a22cb4650
00000000000000000000000ef887e8b1c06209f59e8ae55d0e625c937344376000000000000000000000000000000000000
0000000000000000000000000001c001a03525632fa005bc5f7ea221641bd68d2852c54955535d5323d4c69f3e1da53e87a
017fad5931f8e547dd0659a41b8ae6763b0fe0f268d2ffe09f051d71d48c47c75

An Ethereum raw transaction hex is a hexadecimal string that encodes all the details of a signed Ethereum transaction: the recipient address, the amount of Ether being transferred, the nonce, the gas limit and fee parameters, any associated call data, and the signature. The sender address is not stored explicitly; it is recovered from the signature. The string is used to transmit transactions across the Ethereum network and can be decoded to inspect the details of a transaction.
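As a small illustration of that encoding, the first byte of the payload already reveals the transaction type under EIP-2718: typed transactions start with a type byte (0x01 for EIP-2930, 0x02 for EIP-1559), while legacy transactions start directly with an RLP list prefix (0xc0 or higher). The helper name detect_tx_type below is hypothetical, and the sketch only covers these three cases.

```python
def detect_tx_type(raw_tx: str) -> str:
    """Classify a raw transaction hex string by its first byte (EIP-2718)."""
    # Strip the optional 0x prefix and convert the hex string to bytes
    payload = bytes.fromhex(raw_tx[2:] if raw_tx.startswith("0x") else raw_tx)
    if not payload:
        raise ValueError("empty transaction payload")

    first_byte = payload[0]
    if first_byte >= 0xC0:
        # An RLP list prefix: the transaction has no type byte
        return "legacy"
    if first_byte == 0x01:
        return "EIP-2930"
    if first_byte == 0x02:
        return "EIP-1559"
    return f"unknown type 0x{first_byte:02x}"


# Only the leading bytes of the transaction above are needed for this check
print(detect_tx_type("0x02f8b10181a8"))  # EIP-1559
```

The example transaction starts with 0x02, so it is an EIP-1559 transaction.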

Let’s define a function that decodes the signed raw transaction:

from eth.vm.forks.arrow_glacier.transactions import ArrowGlacierTransactionBuilder as TransactionBuilder
from eth_utils import encode_hex, to_bytes

def decode_mempool_tx(raw_tx):
    '''
    Decodes both EIP-1559 and Legacy Ethereum transactions.
    - 1559 tx: dict_keys(['type_id', '_inner'])
    - Legacy tx: dict_keys(['_nonce', '_gas_price', '_gas', '_to', '_value', '_data', '_v', '_r',
    '_s', '_cached_rlp'])

    Parameters:
    raw_tx (str): A hexadecimal string representing a signed transaction.

    Returns:
    dict: Decoded transaction.

    Mock example of raw_tx: '0xf86901844190ab00825208943 ... 9a0a414587d4b1bbf713e42614d36a3f5b27'
    '''
    # Convert the hex string to bytes
    signed_tx_as_bytes = to_bytes(hexstr=raw_tx)

    # Deserialize the transaction using the Arrow Glacier transaction builder,
    # which handles both legacy and typed (EIP-2930, EIP-1559) transactions
    decoded_tx = TransactionBuilder().decode(signed_tx_as_bytes)

    # Copy the attributes into a new dict so the decoded object is not mutated
    decoded_tx_dict = dict(vars(decoded_tx))

    # Typed transactions (e.g., EIP-1559) wrap their fields in an '_inner' object;
    # legacy transactions expose their fields directly and need no unwrapping
    if 'type_id' in decoded_tx_dict:
        decoded_tx_dict = dict(vars(decoded_tx_dict['_inner']))

    # Common processing for both types
    if '_data' in decoded_tx_dict:
        decoded_tx_dict['_data'] = encode_hex(decoded_tx_dict['_data'])
    if '_cached_rlp' in decoded_tx_dict:
        decoded_tx_dict['_cached_rlp'] = encode_hex(decoded_tx_dict['_cached_rlp'])
    if '_to' in decoded_tx_dict:
        decoded_tx_dict['_to'] = encode_hex(decoded_tx_dict['_to'])

    return decoded_tx_dict

Now we can decode the raw transaction to the following:

{'_access_list': (),
 '_cached_rlp': '0xf8b10181a88405f5e100850402a4432082b46d94ed5af388653567af2f388e6224dc7c4b3241c544
80b844a22cb465000000000000000000000000ef887e8b1c06209f59e8ae55d0e625c937344376000000000000000000000
0000000000000000000000000000000000000000001c001a03525632fa005bc5f7ea221641bd68d2852c54955535d5323d4
c69f3e1da53e87a017fad5931f8e547dd0659a41b8ae6763b0fe0f268d2ffe09f051d71d48c47c75',
 '_chain_id': 1,
 '_data': '0xa22cb465000000000000000000000000ef887e8b1c06209f59e8ae55d0e625c93734437600000000000000
00000000000000000000000000000000000000000000000001',
 '_gas': 46189,
 '_max_fee_per_gas': 17224188704,
 '_max_priority_fee_per_gas': 100000000,
 '_nonce': 168,
 '_r': 24038638873168070677323615839725355891951985956712214956528418805691300789895,
 '_s': 10846381322016981702266540511656012158965283465913435801707244187222536387701,
 '_to': '0xed5af388653567af2f388e6224dc7c4b3241c544',
 '_value': 0,
 '_y_parity': 1}

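Let’s break down each field in the decoded transaction:

_chain_id: the EIP-155 chain ID; 1 is Ethereum mainnet.
_nonce: the number of transactions the sender account had sent before this one (168).
_max_priority_fee_per_gas: the maximum tip per unit of gas, in wei (100000000 wei = 0.1 gwei).
_max_fee_per_gas: the maximum total fee per unit of gas, in wei (about 17.22 gwei).
_gas: the gas limit of the transaction (46189).
_to: the recipient address, here the contract being called.
_value: the amount of Ether transferred, in wei (0).
_data: the call data, covered in the next section.
_access_list: the optional EIP-2930 access list of addresses and storage keys; empty here.
_y_parity, _r, _s: the components of the ECDSA signature, from which the sender address can be recovered.
_cached_rlp: the RLP-encoded transaction payload, i.e., the raw transaction without its leading type byte 0x02.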
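The fee-related fields are denominated in wei, which is hard to read at a glance. The following sketch converts them to gwei and computes an upper bound on the fee, assuming all gas is used and the full max fee per gas is charged. The actual fee depends on the base fee of the block that includes the transaction and on the gas actually consumed, neither of which is known while the transaction is in the mempool.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18

# Values copied from the decoded transaction above
gas_limit = 46189
max_fee_per_gas = 17224188704          # wei
max_priority_fee_per_gas = 100000000   # wei

# Worst case: all gas is used and the full max fee per gas is charged
max_cost_wei = gas_limit * max_fee_per_gas

print(Decimal(max_fee_per_gas) / WEI_PER_GWEI)           # 17.224188704
print(Decimal(max_priority_fee_per_gas) / WEI_PER_GWEI)  # 0.1
print(Decimal(max_cost_wei) / WEI_PER_ETHER)             # upper bound in Ether
```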

The Data field in the decoded transaction

In the data field, the leading 0x indicates that the string is a hexadecimal representation. One byte is represented by 2 hex characters, so a 32-byte value is represented by 64 hex characters plus the leading 0x, yielding a string of length 66.
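The length arithmetic can be checked with a few lines of plain Python. This is only a sanity check on an all-zero 32-byte value, not part of the decoding itself.

```python
word = bytes(32)                 # 32 zero bytes
hex_word = "0x" + word.hex()     # two hex characters per byte, plus the prefix

print(len(word))      # 32
print(len(hex_word))  # 66

# Converting back drops the prefix and halves the character count
restored = bytes.fromhex(hex_word[2:])
print(restored == word)  # True
```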

The data field stores the hex-encoded call data: an identifier of the target function followed by the arguments for that function. The first 4 bytes after 0x (in this case a22cb465) are the function selector, i.e., the first 4 bytes of the Keccak-256 hash of the function signature (the function name and its parameter types). The remaining bytes are the arguments passed to the function, so the length of the data field varies with the function called and its arguments. For static types such as address, bool, and uint256, every subsequent 32 bytes (64 characters) after the selector holds one argument; dynamic types such as bytes, string, and arrays are encoded via offsets and take more than one 32-byte word.
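The layout described above can be made concrete with a minimal sketch that splits the data field of the example transaction into its selector and its 32-byte words. The function name split_call_data is hypothetical, and the sketch assumes the argument section is a whole number of 32-byte words.

```python
def split_call_data(data: str):
    """Split call data into a 4-byte selector and a list of 32-byte words."""
    raw = bytes.fromhex(data[2:] if data.startswith("0x") else data)
    selector, args = raw[:4], raw[4:]
    if len(args) % 32 != 0:
        raise ValueError("argument section is not a multiple of 32 bytes")
    words = [args[i:i + 32] for i in range(0, len(args), 32)]
    return selector.hex(), [w.hex() for w in words]


# The '_data' value of the decoded transaction, written out word by word
data = (
    "0xa22cb465"
    + "0" * 24 + "ef887e8b1c06209f59e8ae55d0e625c937344376"  # first word
    + "0" * 63 + "1"                                          # second word
)
selector, words = split_call_data(data)
print(selector)    # a22cb465
print(len(words))  # 2
```

The first word carries a 20-byte address left-padded with zeros, and the second word carries the value 1.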

To decode the data field, we need the application binary interface (ABI) of the smart contract being called. The ABI is a JSON document containing the definitions of all functions and events of a given smart contract. With the ABI, we can match the 4-byte selector to a readable function definition and decode the arguments according to their declared types.
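A minimal sketch of that matching step is shown below. It does not parse a real ABI file: the lookup table KNOWN_SELECTORS is hand-written and hypothetical, and the entry for a22cb465 assumes the selector corresponds to setApprovalForAll(address,bool) as defined in ERC-721. In practice, the table would be built from the contract ABI by hashing each function signature with Keccak-256. Only the static types address and bool are handled here.

```python
# Hypothetical lookup table; in practice it would be derived from the contract ABI
# by hashing each function signature with Keccak-256 and keeping the first 4 bytes.
KNOWN_SELECTORS = {
    "a22cb465": ("setApprovalForAll", ["address", "bool"]),
}


def decode_static_args(data: str):
    """Decode call data whose arguments are all static 'address' or 'bool' types."""
    raw = bytes.fromhex(data[2:] if data.startswith("0x") else data)
    name, arg_types = KNOWN_SELECTORS[raw[:4].hex()]
    decoded = []
    for index, arg_type in enumerate(arg_types):
        word = raw[4 + 32 * index: 4 + 32 * (index + 1)]
        if arg_type == "address":
            # An address occupies the last 20 bytes of its 32-byte word
            decoded.append("0x" + word[-20:].hex())
        elif arg_type == "bool":
            decoded.append(int.from_bytes(word, "big") != 0)
        else:
            raise NotImplementedError(arg_type)
    return name, decoded


data = (
    "0xa22cb465"
    + "0" * 24 + "ef887e8b1c06209f59e8ae55d0e625c937344376"
    + "0" * 63 + "1"
)
name, args = decode_static_args(data)
print(name)  # setApprovalForAll
print(args)  # ['0xef887e8b1c06209f59e8ae55d0e625c937344376', True]
```

Under that assumption, the example transaction grants the address 0xef887e8b1c06209f59e8ae55d0e625c937344376 operator approval on the called contract.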

Flashbots Mempool Dumpster

The Mempool Dumpster by Flashbots is an excellent source of mempool data. As of today, it provides mempool data from 2023-08-08 onwards. The data is collected from several node operators (alchemy, apool, blx, infura, “local”), and there are approximately 1-2 million transactions per day.