Sometimes, e.g., in MEV applications, you want to parse and decode signed transactions that have not yet been executed and reside only in the mempool. As of today, I have not found any package that is specifically designed for this, but this blog post suggests using py-evm in combination with eth-utils.
Let’s look at the following transaction. This transaction has at some point been in the mempool before it got executed. At that point, the signed transaction did not yet have any logs from the EVM that can be observed and analyzed. Instead, the only data available is the raw transaction HEX string,
0x02f8b10181a88405f5e100850402a4432082b46d94ed5af388653567af2f388e6224dc7c4b3241c54480b844a22cb4650
00000000000000000000000ef887e8b1c06209f59e8ae55d0e625c937344376000000000000000000000000000000000000
0000000000000000000000000001c001a03525632fa005bc5f7ea221641bd68d2852c54955535d5323d4c69f3e1da53e87a
017fad5931f8e547dd0659a41b8ae6763b0fe0f268d2ffe09f051d71d48c47c75
An Ethereum raw transaction hex is a hexadecimal string that encodes all the details of a signed Ethereum transaction: the recipient address, the amount of Ether being transferred, the gas limit and fee parameters, any associated data, and the signature. The sender address is not stored explicitly; it is recovered from the signature. The string is used to transmit transactions across the Ethereum network and can be decoded to understand the details of a transaction.
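Even before a full decode, the first byte of the raw string hints at the kind of transaction you are looking at. The snippet below is a minimal sketch, assuming the EIP-2718 envelope (a type byte of 0x7f or lower in front of the payload) and the fact that legacy transactions start with an RLP list prefix of 0xc0 or higher. The helper name `classify_raw_tx` and the `TX_TYPES` table are made up for this illustration, and only the two typed formats mentioned in this post are listed.

```python
# Assumed mapping of EIP-2718 type bytes, limited to the types discussed here.
TX_TYPES = {0x01: "EIP-2930 (access list)", 0x02: "EIP-1559 (dynamic fee)"}


def classify_raw_tx(raw_tx: str) -> str:
    """Classify a raw transaction by its first byte (hypothetical helper)."""
    hex_body = raw_tx[2:] if raw_tx.startswith("0x") else raw_tx
    first_byte = int(hex_body[:2], 16)
    if first_byte >= 0xC0:
        # An RLP list prefix: the whole string is a legacy transaction.
        return "legacy"
    if first_byte <= 0x7F:
        # A typed transaction: type byte followed by the type-specific payload.
        return TX_TYPES.get(first_byte, f"unknown typed transaction (0x{first_byte:02x})")
    return "invalid"


# Only the first byte matters, so short prefixes are enough here.
print(classify_raw_tx("0x02f8b10181a8"))      # prefix of the transaction above
print(classify_raw_tx("0xf86901844190ab00"))  # prefix of a legacy-style transaction
```

For the transaction above, the leading 02 marks it as an EIP-1559 transaction, which is why the decoded result later contains the fee fields introduced by that EIP.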
Let’s define a function that decodes the signed raw transaction,
from eth.vm.forks.arrow_glacier.transactions import ArrowGlacierTransactionBuilder as TransactionBuilder
from eth_utils import encode_hex, to_bytes


def decode_mempool_tx(raw_tx):
    '''
    Decodes both EIP-1559 and Legacy Ethereum transactions.
    - 1559 tx: dict_keys(['type_id', '_inner'])
    - Legacy tx: dict_keys(['_nonce', '_gas_price', '_gas', '_to', '_value', '_data', '_v', '_r',
      '_s', '_cached_rlp'])

    Parameters:
        raw_tx (str): A hexadecimal string representing a signed transaction.

    Returns:
        dict: Decoded transaction.

    Mock example of raw_tx: '0xf86901844190ab00825208943 ... 9a0a414587d4b1bbf713e42614d36a3f5b27'
    '''
    # Convert the hex string to bytes
    signed_tx_as_bytes = to_bytes(hexstr=raw_tx)

    # Deserialize the transaction using the latest transaction builder:
    decoded_tx = TransactionBuilder().decode(signed_tx_as_bytes)

    # Transform to dict
    decoded_tx_dict = decoded_tx.__dict__

    # Check for transaction type and process accordingly
    if 'type_id' in decoded_tx_dict:
        # EIP-1559 transaction
        decoded_tx_dict = decoded_tx_dict.get('_inner', {}).__dict__
    else:
        # Legacy transaction
        pass  # No special handling needed for legacy transactions

    # Common processing for both types
    if '_data' in decoded_tx_dict:
        decoded_tx_dict['_data'] = encode_hex(decoded_tx_dict['_data'])
    if '_cached_rlp' in decoded_tx_dict:
        decoded_tx_dict['_cached_rlp'] = encode_hex(decoded_tx_dict['_cached_rlp'])
    if '_to' in decoded_tx_dict:
        decoded_tx_dict['_to'] = encode_hex(decoded_tx_dict['_to'])

    return decoded_tx_dict
Now we can decode the raw transaction to the following:
{'_access_list': (),
'_cached_rlp': '0xf8b10181a88405f5e100850402a4432082b46d94ed5af388653567af2f388e6224dc7c4b3241c544
80b844a22cb465000000000000000000000000ef887e8b1c06209f59e8ae55d0e625c937344376000000000000000000000
0000000000000000000000000000000000000000001c001a03525632fa005bc5f7ea221641bd68d2852c54955535d5323d4
c69f3e1da53e87a017fad5931f8e547dd0659a41b8ae6763b0fe0f268d2ffe09f051d71d48c47c75',
'_chain_id': 1,
'_data': '0xa22cb465000000000000000000000000ef887e8b1c06209f59e8ae55d0e625c93734437600000000000000
00000000000000000000000000000000000000000000000001',
'_gas': 46189,
'_max_fee_per_gas': 17224188704,
'_max_priority_fee_per_gas': 100000000,
'_nonce': 168,
'_r': 24038638873168070677323615839725355891951985956712214956528418805691300789895,
'_s': 10846381322016981702266540511656012158965283465913435801707244187222536387701,
'_to': '0xed5af388653567af2f388e6224dc7c4b3241c544',
'_value': 0,
'_y_parity': 1}
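If you are curious how these values sit inside the raw bytes, the following is a minimal pure-Python sketch of RLP decoding applied to the same transaction. It is not how py-evm implements decoding, it does no input validation, and the names `rlp_decode`, `FIELDS` and `to_int` are made up for this illustration. The field order is assumed to follow the EIP-1559 payload layout, and the raw string is written piecewise so that the item boundaries are visible.

```python
def rlp_decode(data: bytes, pos: int = 0):
    """Decode one RLP item starting at pos; return (item, next_pos)."""
    prefix = data[pos]
    if prefix < 0x80:                      # single byte, encoded as itself
        return data[pos:pos + 1], pos + 1
    if prefix < 0xB8:                      # short string (0-55 bytes)
        end = pos + 1 + (prefix - 0x80)
        return data[pos + 1:end], end
    if prefix < 0xC0:                      # long string, length-of-length follows
        start = pos + 1 + (prefix - 0xB7)
        length = int.from_bytes(data[pos + 1:start], "big")
        return data[start:start + length], start + length
    if prefix < 0xF8:                      # short list (payload of 0-55 bytes)
        start, length = pos + 1, prefix - 0xC0
    else:                                  # long list, length-of-length follows
        start = pos + 1 + (prefix - 0xF7)
        length = int.from_bytes(data[pos + 1:start], "big")
    items, cursor, end = [], start, start + length
    while cursor < end:
        item, cursor = rlp_decode(data, cursor)
        items.append(item)
    return items, end


# Call data: selector, then two 32-byte words (zero runs written out explicitly).
CALLDATA = ("a22cb465" + "0" * 24 + "ef887e8b1c06209f59e8ae55d0e625c937344376"
            + "0" * 63 + "1")
RAW_TX = (
    "0x02"            # EIP-2718 type byte
    "f8b1"            # RLP list header, 0xb1 = 177 payload bytes
    "01"              # chain_id
    "81a8"            # nonce
    "8405f5e100"      # max_priority_fee_per_gas
    "850402a44320"    # max_fee_per_gas
    "82b46d"          # gas limit
    "94ed5af388653567af2f388e6224dc7c4b3241c544"  # to
    "80"              # value (empty string, i.e. 0)
    "b844" + CALLDATA +  # data, 0x44 = 68 bytes
    "c0"              # empty access list
    "01"              # y_parity
    "a03525632fa005bc5f7ea221641bd68d2852c54955535d5323d4c69f3e1da53e87"  # r
    "a017fad5931f8e547dd0659a41b8ae6763b0fe0f268d2ffe09f051d71d48c47c75"  # s
)

FIELDS = ["chain_id", "nonce", "max_priority_fee_per_gas", "max_fee_per_gas", "gas",
          "to", "value", "data", "access_list", "y_parity", "r", "s"]


def to_int(raw: bytes) -> int:
    """Interpret an RLP byte string as a big-endian integer (empty means 0)."""
    return int.from_bytes(raw, "big")


tx_bytes = bytes.fromhex(RAW_TX[2:])
items, _ = rlp_decode(tx_bytes, 1)   # skip the leading type byte
tx = dict(zip(FIELDS, items))

print("nonce:", to_int(tx["nonce"]))
print("gas:", to_int(tx["gas"]))
print("to:", "0x" + tx["to"].hex())
```

For anything beyond a quick look, a maintained library remains the safer choice, since this sketch accepts malformed input without complaint.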
Let’s break down each field in the decoded transaction:
_access_list: This is the access list of the transaction. Access lists were introduced in EIP-2930 and contain the addresses and storage keys the transaction intends to access. In this case, it’s empty ().
_cached_rlp: This is the cached Recursive Length Prefix (RLP) encoding of the transaction. RLP is a serialization method used in Ethereum. The string here is the RLP-encoded form of this transaction, i.e. the raw transaction without its leading type byte 02.
_chain_id: This is the identifier of the Ethereum network where this transaction is intended to be processed. 1 denotes the Ethereum main network.
_data: This field holds the data payload of the transaction. It’s often used when interacting with smart contracts, where this field contains the function identifier and parameters.
_gas: This is the gas limit for the transaction, indicating the maximum amount of gas that can be consumed by this transaction. Here, it is set to 46189.
_max_fee_per_gas: This is the maximum total fee per gas (base fee plus priority fee) that the sender is willing to pay, a concept introduced in EIP-1559 for a more efficient fee market. The value 17224188704 is in wei.
_max_priority_fee_per_gas: This is the maximum priority fee per gas the sender is willing to pay to miners or validators. Set to 100000000 wei, this is also known as a tip to incentivize faster inclusion in a block.
_nonce: This is the number of transactions previously sent from the sender’s address. Here, it’s 168, meaning this is the 169th transaction (the count starts from 0) sent from the sender’s address.
_r, _s, _y_parity: These fields are the components of the transaction’s digital signature. _r and _s are the signature values, and _y_parity is the recovery id, which helps in recovering the sender’s address from the signature.
_to: This is the recipient’s address. The transaction is intended to be sent to 0xed5af388653567af2f388e6224dc7c4b3241c544.
_value: This is the amount of Ether (in wei) to be transferred. In this case, it’s 0, indicating that this transaction may be purely for interacting with a contract, without transferring any Ether.
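To get a feeling for the gas and fee fields, here is a small arithmetic sketch using the values decoded above. It assumes the EIP-1559 rule that the tip actually paid is the smaller of the priority fee cap and whatever room is left under the fee cap after the base fee. The base fee of 15 gwei is a hypothetical number chosen for illustration, not something taken from the transaction, and the helper name `effective_gas_price` is made up.

```python
WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18

# Values from the decoded transaction above
gas_limit = 46189
max_fee_per_gas = 17224188704
max_priority_fee_per_gas = 100000000

# Upper bound on what the sender can be charged for gas
max_cost_wei = gas_limit * max_fee_per_gas
print(f"fee cap: {max_fee_per_gas / WEI_PER_GWEI} gwei")
print(f"tip cap: {max_priority_fee_per_gas / WEI_PER_GWEI} gwei")
print(f"maximum gas cost: {max_cost_wei} wei ({max_cost_wei / WEI_PER_ETHER} ETH)")


def effective_gas_price(base_fee_per_gas: int) -> int:
    """Price per gas actually paid for a given block base fee (hypothetical helper)."""
    if base_fee_per_gas > max_fee_per_gas:
        raise ValueError("fee cap is below the base fee; the transaction cannot be included")
    tip = min(max_priority_fee_per_gas, max_fee_per_gas - base_fee_per_gas)
    return base_fee_per_gas + tip


# Hypothetical base fee of 15 gwei
assumed_base_fee = 15 * WEI_PER_GWEI
print(f"effective gas price: {effective_gas_price(assumed_base_fee) / WEI_PER_GWEI} gwei")
```

The maximum gas cost is only an upper bound: the transaction pays for the gas it actually uses, at the effective gas price of the block that includes it.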
In the data, the leading 0x in a string indicates that the string is a hexadecimal representation. One byte is represented by 2 hex characters, so a 32-byte value is represented by 64 hex characters plus the leading 0x, yielding a string of length 66.
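The length arithmetic is easy to check in plain Python. The helper `byte_length` below is a made-up name for this sketch, and it assumes the string carries the 0x prefix.

```python
def byte_length(hex_string: str) -> int:
    """Number of bytes encoded by a 0x-prefixed hex string (hypothetical helper)."""
    return (len(hex_string) - 2) // 2


word = bytes(32)                    # 32 zero bytes
hex_word = "0x" + word.hex()        # 2 hex characters per byte, plus the prefix
print(len(hex_word))                # length of the full string
print(byte_length(hex_word))        # back to the number of bytes

# The same rule applied to the recipient address from the decoded transaction
print(byte_length("0xed5af388653567af2f388e6224dc7c4b3241c544"))
```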
The “data” field stores the hexadecimal encoding of the target function and the arguments for that function. The first 4 bytes after 0x (in this case a22cb465) are the function selector: the first 4 bytes of the Keccak-256 hash of the function signature, i.e. the function name and its parameter types. The rest of the bytes are the ABI-encoded arguments passed to the function. The length of the data field can thus vary depending on the function called and its arguments. For static types such as address, bool and uint256, every subsequent 32 bytes (64 characters) after the selector is a different input variable; dynamic types such as bytes, string and arrays are referenced by a 32-byte offset and stored after the static part.
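The layout can be made visible by cutting the data string into the 4-byte selector and the 32-byte words that follow. This is a minimal sketch that assumes only static argument types; `split_calldata` is a name made up for this illustration, and the data string is the one from the decoded transaction, written with explicit zero runs.

```python
def split_calldata(data_hex: str):
    """Split call data into (selector, list of 32-byte words), both as hex strings."""
    body = data_hex[2:] if data_hex.startswith("0x") else data_hex
    selector, args = body[:8], body[8:]          # 4 bytes = 8 hex characters
    words = [args[i:i + 64] for i in range(0, len(args), 64)]
    return selector, words


# The _data value from the decoded transaction above
DATA = ("0xa22cb465"
        + "0" * 24 + "ef887e8b1c06209f59e8ae55d0e625c937344376"
        + "0" * 63 + "1")

selector, words = split_calldata(DATA)
print("selector:", selector)
for index, word in enumerate(words):
    print(f"word {index}:", word)
```

Here the data splits into the selector and two words. What those words mean (an address, a number, a flag) cannot be read from the bytes alone, which is where the ABI comes in.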
To decode the data field we need to use the application binary interface (ABI) of the smart contract that is used. The ABI is a JSON object containing all the definitions of functions and events for a given smart contract. With the ABI we can match the hashed signatures to readable definitions.
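As a rough illustration of that matching step, the sketch below decodes the data field of the transaction above against a one-entry ABI. Several things here are assumptions: the selector a22cb465 is taken to belong to setApprovalForAll(address,bool), as in the ERC-721 interface; the ABI fragment and its parameter names are written by hand rather than fetched from the actual contract; and the selector table is hardcoded because Keccak-256 is not available in the Python standard library. All helper names are made up.

```python
import json

# Hand-written ABI fragment (assumption: not fetched from the real contract)
ABI_JSON = """
[{"type": "function", "name": "setApprovalForAll",
  "inputs": [{"name": "operator", "type": "address"},
             {"name": "approved", "type": "bool"}]}]
"""

# Precomputed selector table (assumption: normally derived with Keccak-256)
KNOWN_SELECTORS = {"a22cb465": "setApprovalForAll(address,bool)"}


def canonical_signature(entry: dict) -> str:
    """Build 'name(type1,type2,...)' from an ABI function entry."""
    return f'{entry["name"]}({",".join(i["type"] for i in entry["inputs"])})'


def decode_word(abi_type: str, word_hex: str):
    """Decode one 32-byte word for a few static ABI types."""
    if abi_type == "address":
        return "0x" + word_hex[-40:]          # last 20 bytes of the word
    if abi_type == "bool":
        return int(word_hex, 16) != 0
    if abi_type.startswith("uint"):
        return int(word_hex, 16)
    raise NotImplementedError(f"type {abi_type} is not handled in this sketch")


def decode_call(data_hex: str, abi: list):
    """Return (function name, decoded arguments) or None if nothing matches."""
    body = data_hex[2:] if data_hex.startswith("0x") else data_hex
    signature = KNOWN_SELECTORS.get(body[:8])
    for entry in abi:
        if entry.get("type") == "function" and canonical_signature(entry) == signature:
            inputs = entry["inputs"]
            words = [body[8 + 64 * i: 8 + 64 * (i + 1)] for i in range(len(inputs))]
            return entry["name"], {
                inp["name"]: decode_word(inp["type"], word)
                for inp, word in zip(inputs, words)
            }
    return None


DATA = ("0xa22cb465"
        + "0" * 24 + "ef887e8b1c06209f59e8ae55d0e625c937344376"
        + "0" * 63 + "1")
print(decode_call(DATA, json.loads(ABI_JSON)))
```

Under these assumptions, the transaction reads as a call that approves the address in the first argument as an operator. In practice you would let a library such as web3.py do this work against the contract's full ABI instead of a hand-rolled decoder.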
The Mempool Dumpster by Flashbots is an excellent source of mempool data. As of today, it provides mempool data from 2023-08-08 onwards. The data is collected from several node operators (alchemy, apool, blx, infura, “local”), and there are approximately 1-2 million transactions per day.