Bounty - EVM OpCodes and Precompiles in Motoko
Austin Fatheree, February 09 2024
These bounties were funded by the IC community through the Gitcoin Grants Season 19 initiative. Thank you all for your contributions.
We are funding the creation of a set of Motoko EVM opcodes and precompiles. Implementing Ethereum Virtual Machine (EVM) opcodes in Motoko is an educational and potentially foundational exercise. The benefits to the broader ecosystem are, first, allowing Motoko canisters to simulate transactions and, longer term, building a Motoko EVM that can be used to simulate other networks or to bootstrap new, purpose-built EVMs.
Please consult this list of opcodes and this list of precompiles for more information about EVM opcodes and precompiles, including the implementation details of each opcode and precompile.
We do not currently have a fully functioning EVM in Motoko. For this bounty, we will assume the following execution context as necessary to complete the bounties. In the future, a full EVM engine can populate these fields:
```motoko
import Stack "mo:base/Stack";
import Trie "mo:base/Trie";
import Vector "mo:vector"; // https://github.com/research-ag/vector
import Map "mo:map/Map";   // Assumption: mo:map (ZhenyaUsenko/motoko-hash-map); any Blob-keyed map works.

type Address = Blob;
type Byte = Nat8;
type Word = Nat; // 256-bit for EVM compliance. Opcodes must keep results within 256 bits and handle overflow.
type OpCode = (Nat8, ?Blob); // Opcode byte (0x00 to 0xFF) plus optional immediate bytes, e.g. PUSH data.

// A simplified representation of a stack element in the EVM.
type StackElement = Nat; // Must be able to represent 256-bit integers.

// A simplified structure for representing EVM memory.
type Memory = Vector.Vector<Byte>;

// Represents EVM storage, mapping 32-byte keys to 32-byte values.
type Storage = Map.Map<[Nat8], [Nat8]>;

type LogEntry = {
  topics : Vector.Vector<Blob>; // Usually the hashed event signature and indexed parameters.
  data : Blob;                  // Non-indexed event parameters.
};
type Logs = Vector.Vector<LogEntry>;

type StorageSlotChange = {
  key : Blob;              // Storage key, typically a 32-byte array.
  originalValue : ?[Nat8]; // Value before the change; `null` can indicate the slot was empty.
  newValue : ?[Nat8];      // Value after the change; `null` can indicate a deletion.
};

type CodeChange = {
  key : Blob;                // Code hash key.
  originalValue : ?[OpCode]; // Code before the change; `null` can indicate no code existed.
  newValue : ?[OpCode];      // Code after the change; `null` can indicate a deletion.
}; // Code may not be changeable, only deletable.

// The execution context of an EVM call.
type ExecutionContext = {
  origin : Blob;                     // Originator of the transaction.
  code : [OpCode];                   // Opcodes constituting the smart contract code.
  var programCounter : Nat;          // Points to the current instruction in the code.
  stack : Stack.Stack<StackElement>; // Stack used for instruction params and return values.
  memory : Memory;                   // Memory accessible during execution.
  contractStorage : Storage;         // Persistent storage for smart contracts.
  caller : Address;                  // Address of the call initiator.
  callee : Address;                  // Address of the contract being executed.
  var currentGas : Nat;              // Gas available for the current execution.
  gasPrice : Nat;                    // Current gas price.
  incomingEth : Nat;                 // Amount of ETH included with the call.
  // ETH balance changes, committed at the end. Each new context must adjust balances based on this list.
  balanceChanges : Vector.Vector<{ from : Blob; to : Blob; amount : Nat }>;
  storageChanges : Map.Map<Blob, StorageSlotChange>;
  codeAdditions : Map.Map<Blob, CodeChange>; // Code changes during execution, keyed by code hash.
  blockHashes : Vector.Vector<(Nat, Blob)>;  // Up to the last 256 block numbers and hashes.
  codeStore : Map.Map<Blob, [OpCode]>;       // Storage DB for EVM code, keyed by code hash.
  // Contract storage keyed by hash. CALL implementors must track storage changes and revert them if necessary.
  storageStore : Trie.Trie<Blob, Blob>;
  // Intended as a Merkle Patricia trie storing RLP-encoded [nonce, balance, storage_root, code_hash].
  // The accounts bounty hunter will need RLP encoders/decoders for use with the trie:
  // https://github.com/relaxed04/rlp-motoko, https://github.com/f0i/merkle-patricia-trie.mo
  accounts : Trie.Trie<Blob, Blob>;
  logs : Logs;                       // Logs produced during execution.
  var totalGas : Nat;                // Tracks gas used.
  var gasRefund : Nat;               // Tracks gas refunded.
  var returnData : ?Blob;            // The return field (`return` is reserved in Motoko); set by RETURN/REVERT.
  blockInfo : {
    number : Nat;     // Current block number.
    gasLimit : Nat;   // Current block gas limit.
    difficulty : Nat; // Current block difficulty.
    timestamp : Nat;  // Current block timestamp.
    coinbase : Blob;
    chainId : Nat;
  };
  calldata : Blob;                   // Input data for the contract execution.
};
```
Your opcode functions should take the execution context as an input variable and update it as the opcode demands.
This document categorizes the opcodes and lists them in numerical order within each category, providing a structured schema for implementation in the Motoko programming language.
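A minimal sketch of that function shape follows. It assumes a hypothetical trimmed-down `Ctx` record rather than the full ExecutionContext, uses a mo:base `Buffer` as the stack (DUP and SWAP later need indexed access, which mo:base `Stack` does not offer), and signals errors with a `Result`; the names `Ctx`, `opPop`, `opStop`, and `step` are illustrative only.

```motoko
import Buffer "mo:base/Buffer";
import Result "mo:base/Result";

// Hypothetical, trimmed-down context holding only what this sketch needs.
type Ctx = {
  code : [(Nat8, ?Blob)];     // same shape as the OpCode type
  stack : Buffer.Buffer<Nat>; // top of stack = last element (assumption)
  var programCounter : Nat;
  var halted : Bool;          // hypothetical flag set by STOP
};

type OpResult = Result.Result<(), Text>;

// POP (0x50): discard the top stack item.
func opPop(ctx : Ctx) : OpResult {
  switch (ctx.stack.removeLast()) {
    case (?_) #ok(());
    case null #err("stack underflow");
  };
};

// STOP (0x00): halt execution.
func opStop(ctx : Ctx) : OpResult {
  ctx.halted := true;
  #ok(());
};

// One interpreter step: fetch the opcode at programCounter, dispatch it,
// then advance the counter. Jump opcodes would set programCounter themselves.
func step(ctx : Ctx) : OpResult {
  if (ctx.programCounter >= ctx.code.size()) {
    ctx.halted := true; // running off the end of the code behaves like STOP
    return #ok(());
  };
  let (op, _) = ctx.code[ctx.programCounter];
  let result : OpResult = switch (op) {
    case (0x00) opStop(ctx);
    case (0x50) opPop(ctx);
    case (_) #err("unimplemented opcode");
  };
  ctx.programCounter := ctx.programCounter + 1;
  result;
};
```

Each opcode function mutates the context it receives and returns only a success or error signal, matching the "update the context, return nothing directly" rule described in this post.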
01 ADD, 02 MUL, 03 SUB, 04 DIV, 05 SDIV, 06 MOD, 07 SMOD, 08 ADDMOD, 09 MULMOD, 0A EXP, 0B SIGNEXTEND, 10 LT, 11 GT, 12 SLT, 13 SGT, 14 EQ, 15 ISZERO, 16 AND, 17 OR, 18 XOR, 19 NOT, 1A BYTE, 1B SHL, 1C SHR, 1D SAR
The Basic Math and Bitwise Logic section encompasses a collection of EVM opcodes dedicated to performing elementary arithmetic operations, such as addition, subtraction, multiplication, and division, as well as bitwise operations including AND, OR, XOR, and NOT. These opcodes serve as the foundational building blocks for more complex contract logic and computations on the Ethereum Virtual Machine (EVM), and replicating their functionality within the Motoko environment is essential for EVM compatibility.
Implementing these opcodes within the Motoko programming language requires careful attention to the specifics of each operation, including handling overflows, underflows, and division by zero in accordance with the EVM’s behavior. Each opcode function should accept the ExecutionContext as an input variable and modify this context as dictated by the opcode’s semantics, ensuring changes are reflected across the stack, memory, and any intermediate computations.
For arithmetic operations such as ADD, SUB, MUL, and DIV, the opcodes must implement 256-bit unsigned integer arithmetic as the EVM specification defines it: ADD, SUB, and MUL wrap around modulo 2^256 instead of failing on overflow or underflow, and DIV and MOD return 0 when the divisor is zero. Bitwise operations (AND, OR, XOR, NOT) operate directly on the binary representations of these integers, enabling manipulation of data at the bit level.
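The wrap-around rules above can be sketched in Motoko as follows. This is an illustrative sketch, not a required API: it assumes a `Buffer<Nat>` stack whose last element is the top, and the helper names (`binaryOp`, `opAdd`, and so on) are hypothetical. Values stay below 2^256 because every result is reduced modulo 2^256.

```motoko
import Buffer "mo:base/Buffer";
import Result "mo:base/Result";

// EVM words are 256-bit; this sketch stores them as Nat and reduces mod 2^256.
let WORD_MOD : Nat = 2 ** 256;
let MAX_WORD : Nat = WORD_MOD - 1;

type Stack = Buffer.Buffer<Nat>; // top of stack = last element (assumption)
type OpResult = Result.Result<(), Text>;

// Pops a (top) and b (second), pushes f(a, b).
func binaryOp(s : Stack, f : (Nat, Nat) -> Nat) : OpResult {
  if (s.size() < 2) return #err("stack underflow");
  switch (s.removeLast(), s.removeLast()) {
    case (?a, ?b) { s.add(f(a, b)); #ok(()) };
    case _ #err("stack underflow"); // unreachable after the size check
  };
};

// ADD and MUL wrap modulo 2^256 instead of failing on overflow.
func opAdd(s : Stack) : OpResult = binaryOp(s, func(a : Nat, b : Nat) : Nat { (a + b) % WORD_MOD });
func opMul(s : Stack) : OpResult = binaryOp(s, func(a : Nat, b : Nat) : Nat { (a * b) % WORD_MOD });

// SUB computes a - b with wrap-around; adding WORD_MOD first keeps the Nat non-negative.
func opSub(s : Stack) : OpResult = binaryOp(s, func(a : Nat, b : Nat) : Nat { (a + WORD_MOD - b) % WORD_MOD });

// DIV computes a / b (integer division) and yields 0 when b == 0.
func opDiv(s : Stack) : OpResult = binaryOp(s, func(a : Nat, b : Nat) : Nat { if (b == 0) 0 else a / b });

// NOT flips all 256 bits, which for an in-range word equals MAX_WORD - a.
func opNot(s : Stack) : OpResult {
  switch (s.removeLast()) {
    case (?a) { s.add(MAX_WORD - a); #ok(()) };
    case null #err("stack underflow");
  };
};
```

Because Motoko's `Nat` is unbounded, reducing after each operation is enough to emulate fixed-width 256-bit arithmetic; a fixed-width limb representation could be faster but is more code.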
30 ADDRESS, 31 BALANCE, 32 ORIGIN, 33 CALLER, 34 CALLVALUE, 35 CALLDATALOAD, 36 CALLDATASIZE, 37 CALLDATACOPY, 38 CODESIZE, 39 CODECOPY, 3A GASPRICE, 3B EXTCODESIZE, 3C EXTCODECOPY, 3D RETURNDATASIZE, 3E RETURNDATACOPY, 3F EXTCODEHASH, 40 BLOCKHASH, 41 COINBASE, 42 TIMESTAMP, 43 NUMBER, 44 DIFFICULTY, 45 GASLIMIT, 46 CHAINID, 47 SELFBALANCE, 48 BASEFEE
The Environmental Information category of opcodes allows smart contracts to access information about the blockchain environment in which they are executed. These opcodes provide functionalities to retrieve details such as the address of the caller, the contract itself, the balance of an account, input data of a call, and more. Implementing these opcodes in Motoko is crucial for building an EVM-compatible environment, enabling smart contracts to make decisions based on their current execution context.
When implementing these opcodes, keep in mind that each opcode retrieves specific environmental information and interacts with the ExecutionContext data structure accordingly. Structure your opcode functions as follows:
Input Handling: Each opcode function should accept the ExecutionContext as an input. This context contains all necessary information about the current state of execution, such as caller’s address, contract address, call value, and more.
Operation Execution: Based on the opcode being implemented, extract the required information from the appropriate field within the ExecutionContext. For example, CALLER would retrieve the address of the initiator of the current call from the caller field.
Stack Update: After retrieving the required information, push the result onto the stack contained within the ExecutionContext. Ensure that the data type and size are compliant with the EVM standards (e.g., addresses and balances should be represented as 256-bit integers).
Return and Update: The opcode function should not return any value directly. Instead, it updates the ExecutionContext that was passed as an input, reflecting changes on the stack, and any other relevant modifications based on the opcode’s logic.
Balance Queries (BALANCE): When implementing opcodes like BALANCE, ensure you’re querying the correct information source within the ExecutionContext. For instance, you might need to access both the accounts data structure and the balanceChanges vector to accurately calculate the current balance of a queried address, taking into account any in-flight transactions or changes during execution.
Environmental Data (CALLDATALOAD, CALLDATASIZE, CALLDATACOPY): For opcodes that interact with call data, handle data offsets and lengths accurately. In the EVM, bytes read past the end of calldata are treated as zero, so reads from the calldata field must be zero-padded rather than failing with out-of-bounds errors.
Gas and Blockchain Context (GASPRICE, BLOCKHASH, CHAINID): These opcodes read the ExecutionContext's blockchain-context and gas fields. Make sure to read the correct field and to account for updates made during transaction execution; for example, BLOCKHASH returns 0 for the current block and for any block older than the 256 most recent ones kept in blockHashes.
Block information opcodes like TIMESTAMP, NUMBER, and DIFFICULTY are useful for operations that depend on blockchain specifics, like generating randomness or enforcing time-dependent conditions.
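A sketch of CALLER, CALLDATALOAD, and CALLDATASIZE following the input / execute / push structure above. The `EnvCtx` record is a hypothetical slice of the ExecutionContext, and the stack is assumed to be a `Buffer<Nat>` with the top at the end. Addresses become 256-bit words by reading their bytes as a big-endian integer, and CALLDATALOAD zero-pads reads past the end of calldata.

```motoko
import Blob "mo:base/Blob";
import Buffer "mo:base/Buffer";
import Nat8 "mo:base/Nat8";
import Result "mo:base/Result";

// Hypothetical slice of the ExecutionContext used by this sketch.
type EnvCtx = {
  caller : Blob;
  calldata : Blob;
  stack : Buffer.Buffer<Nat>; // top of stack = last element (assumption)
};
type OpResult = Result.Result<(), Text>;

// Interpret bytes as a big-endian unsigned integer.
func bytesToNat(bytes : [Nat8]) : Nat {
  var n : Nat = 0;
  for (b in bytes.vals()) { n := n * 256 + Nat8.toNat(b) };
  n;
};

// CALLER (0x33): push the caller address as a 256-bit word.
func opCaller(ctx : EnvCtx) : OpResult {
  ctx.stack.add(bytesToNat(Blob.toArray(ctx.caller)));
  #ok(());
};

// CALLDATALOAD (0x35): pop an offset, push the 32 bytes of calldata starting
// there. Bytes past the end of calldata read as zero instead of failing.
func opCalldataLoad(ctx : EnvCtx) : OpResult {
  let offset = switch (ctx.stack.removeLast()) {
    case (?o) o;
    case null return #err("stack underflow");
  };
  let data = Blob.toArray(ctx.calldata);
  var word : Nat = 0;
  var k = 0;
  while (k < 32) {
    let idx = offset + k;
    let value : Nat = if (idx < data.size()) Nat8.toNat(data[idx]) else 0;
    word := word * 256 + value;
    k += 1;
  };
  ctx.stack.add(word);
  #ok(());
};

// CALLDATASIZE (0x36): push the calldata length in bytes.
func opCalldataSize(ctx : EnvCtx) : OpResult {
  ctx.stack.add(ctx.calldata.size());
  #ok(());
};
```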
50 POP, 51 MLOAD, 52 MSTORE, 53 MSTORE8, 54 SLOAD, 55 SSTORE, 56 JUMP, 57 JUMPI, 58 PC, 59 MSIZE, 5A GAS, 5B JUMPDEST
The Memory Operations category encompasses a variety of EVM opcodes designed to interact with and manipulate the memory space available during the execution of smart contracts. Memory in the EVM is a volatile data storage area that is erased between external function calls and transactions. The primary purpose of these operations is to enable the reading, writing, and management of data in memory during contract execution, allowing for dynamic data manipulation within the scope of a single transaction or function call.
Implementing memory operation opcodes in Motoko requires understanding and manipulating the Memory data structure within the ExecutionContext. Memory operations include loading data from memory (MLOAD), storing data in memory (MSTORE, MSTORE8), and querying the size of the active memory space (MSIZE). SLOAD and SSTORE, although grouped here, read and write persistent contract storage rather than memory. The category also includes opcodes for jumping within the contract code, either unconditionally (JUMP) or conditionally (JUMPI), with JUMPDEST marking the valid targets, and for reading the program counter (PC).
Memory Access: The Memory type, represented as a vector of bytes (Vec<Byte>), is read and modified by the memory opcodes. For instance, MLOAD reads a 32-byte word starting at a given offset, MSTORE writes a 32-byte word at a given offset, and MSTORE8 writes a single byte.
Data Encoding and Decoding: Memory operations might require encoding data into EVM’s big-endian format before storing and decoding it back into Motoko’s native types upon loading. Careful management of data sizes and alignments according to the specification is crucial.
Dynamic Memory Expansion: Memory must grow to accommodate accesses to previously unallocated areas. It grows in 32-byte words, so MSIZE always reports a multiple of 32, and the expansion must be factored into gas calculations because memory expansion incurs gas costs.
Jump Operations: Implementations of JUMP and JUMPI must validate the destination against a list of valid jump destinations (denoted by JUMPDEST opcodes) within the contract code. This is a critical security mechanism to prevent unauthorized jumps to arbitrary code locations.
Memory Initialization: Initially, memory is empty. The first access (load or store) should allocate memory space dynamically, conforming to the EVM gas costing for memory expansion.
Bounds and Safety Checks: Memory operations should include bounds checking to prevent overflows and underflows. Note that in the EVM, reading beyond the current memory size is not an error: memory is expanded with zeros (and the expansion gas is charged), so the read returns zeros.
Program Counter Management: For jump operations (JUMP, JUMPI), updating the programCounter within the ExecutionContext accurately is critical to ensure proper execution flow. Validation against JUMPDEST instructions ensures that jumps are only made to authorized points in the code.
Gas Calculation: The implementation must calculate the gas costs for memory operations, particularly for memory expansion. This includes updating the currentGas within the ExecutionContext, following the EVM’s gas pricing structure.
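The following sketch shows MSTORE and MLOAD with word-granular expansion and the EVM's memory cost function, 3 gas per word plus words² / 512. Assumptions: memory is a mo:base `Buffer<Nat8>` rather than the research-ag vector, the functions return only the expansion gas (the opcodes' static cost of 3 gas is left to the caller), and the helper names are hypothetical.

```motoko
import Buffer "mo:base/Buffer";
import Nat8 "mo:base/Nat8";

// Total memory cost for `words` 32-byte words: 3 per word plus words^2 / 512.
func memoryCost(words : Nat) : Nat = 3 * words + (words * words) / 512;

// Grow `mem` (zero-filled) so that [offset, offset + size) is addressable,
// rounding up to whole words. Returns the gas to charge for the expansion.
func expand(mem : Buffer.Buffer<Nat8>, offset : Nat, size : Nat) : Nat {
  if (size == 0) return 0;
  let oldWords = mem.size() / 32; // this function keeps size a multiple of 32
  let newWords = (offset + size + 31) / 32;
  if (newWords <= oldWords) return 0;
  while (mem.size() < newWords * 32) { mem.add(0) };
  memoryCost(newWords) - memoryCost(oldWords);
};

// MSTORE: write `value` as a 32-byte big-endian word at `offset`.
func mstore(mem : Buffer.Buffer<Nat8>, offset : Nat, value : Nat) : Nat {
  let gas = expand(mem, offset, 32);
  var v = value;
  var i = 32;
  while (i > 0) {
    i -= 1;
    mem.put(offset + i, Nat8.fromNat(v % 256)); // least significant byte goes last
    v /= 256;
  };
  gas;
};

// MLOAD: read a 32-byte big-endian word at `offset`, expanding memory first,
// so reads past the end return zeros. Returns (word, expansion gas).
func mload(mem : Buffer.Buffer<Nat8>, offset : Nat) : (Nat, Nat) {
  let gas = expand(mem, offset, 32);
  var n : Nat = 0;
  var i = 0;
  while (i < 32) { n := n * 256 + Nat8.toNat(mem.get(offset + i)); i += 1 };
  (n, gas);
};
```

MSIZE in this model is simply `mem.size()`, which stays a multiple of 32 because `expand` only ever grows memory by whole words.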
5F PUSH0, 60 PUSH1, 61 PUSH2, 62 PUSH3, 63 PUSH4, 64 PUSH5, 65 PUSH6, 66 PUSH7, 67 PUSH8, 68 PUSH9, 69 PUSH10, 6A PUSH11, 6B PUSH12, 6C PUSH13, 6D PUSH14, 6E PUSH15, 6F PUSH16, 70 PUSH17, 71 PUSH18, 72 PUSH19, 73 PUSH20, 74 PUSH21, 75 PUSH22, 76 PUSH23, 77 PUSH24, 78 PUSH25, 79 PUSH26, 7A PUSH27, 7B PUSH28, 7C PUSH29, 7D PUSH30, 7E PUSH31, 7F PUSH32, 80 DUP1, 81 DUP2, 82 DUP3, 83 DUP4, 84 DUP5, 85 DUP6, 86 DUP7, 87 DUP8, 88 DUP9, 89 DUP10, 8A DUP11, 8B DUP12, 8C DUP13, 8D DUP14, 8E DUP15, 8F DUP16, 90 SWAP1, 91 SWAP2, 92 SWAP3, 93 SWAP4, 94 SWAP5, 95 SWAP6, 96 SWAP7, 97 SWAP8, 98 SWAP9, 99 SWAP10, 9A SWAP11, 9B SWAP12, 9C SWAP13, 9D SWAP14, 9E SWAP15, 9F SWAP16
PUSH1 (0x60) to PUSH32 (0x7F): These opcodes push 1 to 32 bytes onto the stack, respectively. The number of bytes to push is determined by the opcode; for example, PUSH1 pushes 1 byte and PUSH32 pushes 32 bytes. The bytes are read from the program immediately following the opcode. PUSH0 (0x5F) pushes the constant 0 and reads no immediate bytes.
DUP1 (0x80) to DUP16 (0x8F): These opcodes copy the 1st to 16th stack element, respectively, to the top of the stack. For example, DUP1 duplicates the top stack element, and DUP16 duplicates the 16th stack element from the top.
SWAP1 (0x90) to SWAP16 (0x9F): These opcodes swap the top stack element with the 1st to 16th element below it. For example, SWAP1 swaps the top two elements of the stack, and SWAP16 swaps the top element with the 16th element below it.
Check that enough elements are present on the stack before each DUP operation. Attempting to duplicate an element not present should result in an error.
Deduct the appropriate gas from currentGas in the ExecutionContext for each operation performed.
A0 LOG0, A1 LOG1, A2 LOG2, A3 LOG3, A4 LOG4
Logging operations in the EVM are essential for emitting events that can be consumed by external entities monitoring blockchain activity. These opcodes (LOG0 to LOG4) allow smart contracts to record indexed information and data blobs, which external applications can use to track contract events, state changes, or any notable occurrences dictated by the contract logic.
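Returning to the PUSH, DUP, and SWAP opcodes described above, here is a sketch of their stack semantics. It assumes the decoder has already placed each PUSH's immediate bytes in the `?Blob` half of the OpCode tuple, and it uses a `Buffer<Nat>` stack (top at the end) so that DUPn and SWAPn can index into it. The 1024-item limit is the EVM's maximum stack depth; the function names are hypothetical.

```motoko
import Blob "mo:base/Blob";
import Buffer "mo:base/Buffer";
import Nat8 "mo:base/Nat8";
import Result "mo:base/Result";

type Stack = Buffer.Buffer<Nat>; // top of stack = last element (assumption)
type OpResult = Result.Result<(), Text>;
let STACK_LIMIT = 1024; // EVM maximum stack depth

// PUSH0..PUSH32 (0x5F..0x7F): immediate bytes are assumed to travel in the
// ?Blob half of the OpCode tuple; PUSH0 has none and pushes 0.
func opPush(s : Stack, op : Nat8, data : ?Blob) : OpResult {
  let n = Nat8.toNat(op) - 0x5F; // number of immediate bytes
  let bytes : [Nat8] = switch (data) { case (?b) Blob.toArray(b); case null [] };
  if (bytes.size() != n) return #err("bad PUSH immediate");
  if (s.size() >= STACK_LIMIT) return #err("stack overflow");
  var v : Nat = 0;
  for (b in bytes.vals()) { v := v * 256 + Nat8.toNat(b) }; // big-endian
  s.add(v);
  #ok(());
};

// DUP1..DUP16 (0x80..0x8F): copy the n-th element from the top onto the top.
func opDup(s : Stack, op : Nat8) : OpResult {
  let n = Nat8.toNat(op) - 0x7F; // DUP1 => 1 ... DUP16 => 16
  if (s.size() < n) return #err("stack underflow");
  if (s.size() >= STACK_LIMIT) return #err("stack overflow");
  s.add(s.get(s.size() - n));
  #ok(());
};

// SWAP1..SWAP16 (0x90..0x9F): swap the top with the n-th element below it.
func opSwap(s : Stack, op : Nat8) : OpResult {
  let n = Nat8.toNat(op) - 0x8F; // SWAP1 => 1 ... SWAP16 => 16
  if (s.size() < n + 1) return #err("stack underflow");
  let top = s.size() - 1;
  let other = top - n;
  let tmp = s.get(top);
  s.put(top, s.get(other));
  s.put(other, tmp);
  #ok(());
};
```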
Implementing logging opcodes in Motoko requires interaction with the ExecutionContext to update the logs vector with new log entries. Each LOG opcode differs in the number of topics it allows for indexing, ranging from zero (LOG0) to four (LOG4). The data portion of the log is a binary blob read from memory at the offset and size popped from the stack, so it can contain arbitrary data produced during the contract's execution.
Bounty hunters implementing these opcodes should structure their operations as follows:
Pop the memory offset and data size from the stack, followed by the number of topics required by the LOG opcode being executed.
Create a LogEntry by packaging the extracted topics and data. The structure of a LogEntry includes a list of topics (Vec<Blob>) and the data blob itself (Blob).
Append the LogEntry to the logs vector within the ExecutionContext. This ensures that all emitted logs are captured in the context of the current transaction execution.
Deduct the gas cost of the operation; updating the currentGas field in the ExecutionContext appropriately ensures that gas usage reflects the computational and storage resources consumed by logging.
The number of topics is fixed by the LOG opcode used. Ensure that your implementation respects these limits and properly handles cases where the stack holds too few items for the required topics.
… ExecutionContext.
Logging operations do not produce a direct output on the stack but modify the execution context's state by appending new entries to the logs vector. This indirect output is instrumental for off-chain applications and tools to monitor, index, and interpret contract activity, making these operations crucial for contract transparency and external integration.
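The LOG steps above might look like this in Motoko. This sketch assumes simplified types (arrays and mo:base `Buffer`s in place of Vec), charges the EVM's LOG gas of 375 + 375 per topic + 8 per data byte, and leaves memory-expansion gas out; `opLog`, `wordToBlob`, and `popWord` are hypothetical names.

```motoko
import Array "mo:base/Array";
import Blob "mo:base/Blob";
import Buffer "mo:base/Buffer";
import Debug "mo:base/Debug";
import Nat8 "mo:base/Nat8";
import Result "mo:base/Result";

// Simplified shapes for this sketch: arrays and Buffers instead of Vec.
type LogEntry = { topics : [Blob]; data : Blob };
type Ctx = {
  stack : Buffer.Buffer<Nat>; // top of stack = last element (assumption)
  memory : Buffer.Buffer<Nat8>;
  logs : Buffer.Buffer<LogEntry>;
  var currentGas : Nat;
};

// Encode a 256-bit word as a 32-byte big-endian Blob, the format of a topic.
func wordToBlob(w : Nat) : Blob {
  Blob.fromArray(Array.tabulate<Nat8>(32, func(i : Nat) : Nat8 {
    Nat8.fromNat((w / (256 ** (31 - i))) % 256)
  }));
};

// Callers check the stack depth first, so an empty stack here is a bug.
func popWord(s : Buffer.Buffer<Nat>) : Nat {
  switch (s.removeLast()) { case (?v) v; case null Debug.trap("stack underflow") };
};

// LOGn (opcode 0xA0 + n, n = 0..4): pops offset, size, then n topics.
func opLog(ctx : Ctx, n : Nat) : Result.Result<(), Text> {
  if (n > 4) return #err("invalid LOG opcode");
  if (ctx.stack.size() < 2 + n) return #err("stack underflow");
  let offset = popWord(ctx.stack);
  let size = popWord(ctx.stack);
  let topics = Buffer.Buffer<Blob>(n);
  var k = 0;
  while (k < n) { topics.add(wordToBlob(popWord(ctx.stack))); k += 1 };
  // On error the whole context is reverted, so the pops above need no undo.
  let gas = 375 + 375 * n + 8 * size;
  if (ctx.currentGas < gas) return #err("out of gas");
  ctx.currentGas := ctx.currentGas - gas;
  // Bytes beyond the current memory size read as zero in this sketch.
  let data = Array.tabulate<Nat8>(size, func(i : Nat) : Nat8 {
    if (offset + i < ctx.memory.size()) ctx.memory.get(offset + i) else 0
  });
  ctx.logs.add({ topics = Buffer.toArray(topics); data = Blob.fromArray(data) });
  #ok(());
};
```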
00 STOP, FD REVERT, FE INVALID, FF SELFDESTRUCT, F0 CREATE, F1 CALL, F2 CALLCODE, F3 RETURN, F4 DELEGATECALL, F5 CREATE2, FA STATICCALL, FB TXHASH, FC CHAINID
This category encompasses a range of EVM opcodes designed for controlling contract execution flow, system-level interactions, and the creation and management of contracts. These operations are critical for implementing contract logic that responds to execution conditions, interfaces with other contracts, and dynamically generates new contracts. Understanding the nuances of these opcodes is essential for building compliant and secure smart contracts on an EVM-compatible platform like the Internet Computer (IC) using Motoko.
Execution and system operations, within the context of Motoko and the Internet Computer’s architecture, must meticulously manage the ExecutionContext to accurately reflect changes in state, control flow, and contract interactions. The design of these opcode functions demands careful consideration of the IC’s unique features, such as cycles management and canister interactions, while adhering to EVM specifications. Here are crucial points to consider while implementing these opcodes:
Flow Control: Opcodes like STOP, RETURN, and REVERT are fundamental in managing the execution flow. STOP ends execution without output, RETURN ends execution and returns data to the caller, and REVERT ends execution, returns data, and reverts state changes. Implementations must ensure that these opcodes accurately update the ExecutionContext, particularly setting the return field where appropriate, and manage gas accounting for partial or full executions.
Error and Exception Handling: The INVALID opcode represents an explicit exception in contract execution: it terminates execution, consumes all remaining gas, and reverts all changes. Motoko implementations must correctly signal errors and ensure that state changes are not persisted in such cases, aligning with the atomic transaction model of the EVM.
Contract Creation and Interaction: The CREATE, CREATE2, CALL, and DELEGATECALL opcodes facilitate the dynamic creation of contracts and interaction between contracts. These require intricate handling of the ExecutionContext to simulate nested transactions/calls within the IC’s environment. Implementors must manage the creation of new ExecutionContexts for each call or contract creation, accurately passing gas, value, and data between contexts, and correctly merging state changes upon successful completion.
System Level Information and Operations: Opcodes like CHAINID and TXHASH provide access to blockchain-specific data. Implementations must derive these values from the IC’s environment or simulate equivalent values where direct analogues might not exist.
State Isolation and Commitment: Ensure isolation between execution contexts for CALL and CREATE operations, committing state changes only upon successful execution. Revert state changes on operation failures.
Gas Management: Accurately calculate and deduct gas costs for execution and system operations, updating currentGas in the ExecutionContext. Implement gas forwarding rules for calls and creations (since EIP-150, at most all but one 64th of the remaining gas can be forwarded), and respect the 2300-gas stipend for calls with value transfers.
Return Data Handling: For the RETURN and REVERT operations, properly set the return field in the ExecutionContext to manage returning data to the caller or reverting state with an error message.
Secure Contract Interaction: Validate destination addresses for CALL and DELEGATECALL operations, ensuring they reference valid contracts or precompiles. For CREATE and CREATE2, implement address generation according to EVM specifications and ensure non-collision with existing addresses.
SELFDESTRUCT Implementation: Implement self-destruct functionality with caution, considering the permanence of such an action within the IC’s architecture. This might involve marking contracts as inactive rather than deleting them, reflecting the EVM’s semantics while aligning with the IC’s model.
Security and Compliance: Rigorously test opcode implementations against common security pitfalls and compliance with EVM specifications. This includes handling deep call stacks, stack underflows/overflows, and ensuring atomicity of transactions.
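As an illustration of the gas-forwarding rules, the sketch below computes how much gas a CALL hands to the child context: at most all but one 64th of the caller's remaining gas (EIP-150), plus a 2300-gas stipend when value is transferred. It assumes the CALL's own static and dynamic costs (including the 9000-gas value-transfer charge) have already been deducted; `callGas` is a hypothetical helper.

```motoko
import Nat "mo:base/Nat";

let CALL_STIPEND : Nat = 2300; // extra gas given to the callee when value > 0

// EIP-150: a caller may forward at most all but one 64th of its remaining gas.
func allButOne64th(gas : Nat) : Nat = gas - gas / 64;

// Returns (gas deducted from the caller, gas limit of the child context).
// `available` is the caller's currentGas after the CALL's own costs are charged.
func callGas(available : Nat, requested : Nat, value : Nat) : (Nat, Nat) {
  let forwarded = Nat.min(requested, allButOne64th(available));
  let stipend = if (value > 0) CALL_STIPEND else 0;
  (forwarded, forwarded + stipend);
};
```

When the child context finishes, whatever gas it has left is credited back to the caller's currentGas, and on failure the child's state changes are discarded as described above.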
0001 ECDSA Recovery (Elliptic Curve Digital Signature Algorithm), 0002 SHA-256 Hash Function, 0003 RIPEMD-160 Hash Function, 0004 Identity Function (Data Copy), 0005 Modular Exponentiation, 0006 Elliptic Curve Addition, 0007 Elliptic Curve Scalar Multiplication, 0008 Elliptic Curve Pairing Check, 0009 Blake2 Compression Function F
Precompiled contracts in Ethereum are a set of contracts provided as part of the Ethereum protocol. These contracts are implemented at the protocol level but are presented and interacted with as if they are smart contracts at specific addresses. Precompiles are designed to perform specific, computationally intensive operations such as cryptographic operations or hashing at a lower gas cost than if they were implemented in EVM bytecode in a regular contract. This makes certain operations practical and efficient within the blockchain context, which otherwise would be prohibitively expensive and slow.
When implementing precompile operations in Motoko for the Internet Computer (IC) ecosystem, bounty hunters should structure their opcode calls to closely simulate the behavior of these precompiled contracts. Given that the IC does not natively support these operations as precompiles, developers must create efficient Motoko implementations that mimic their Ethereum counterparts. The precompiles cover a range of operations, from cryptographic functions like ECDSA recovery to various hashing functions, each with a unique precompile address in Ethereum. Various libraries may be imported to support the implementation of these precompiles. Where no suitable library exists, the developer should implement the functionality.
Bounty hunters should intercept any CALL (or related) opcode that targets a precompile address and route the operation through the corresponding precompile function.
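A sketch of that interception: before executing EVM code for a CALL-family target, check whether the address is one of the precompile addresses 0x01 to 0x09 and, if so, run the Motoko implementation instead. Only IDENTITY (0x04), whose gas is 15 + 3 per 32-byte word, is filled in here; the `PrecompileResult` shape and the function names are assumptions for illustration.

```motoko
import Blob "mo:base/Blob";
import Nat8 "mo:base/Nat8";
import Result "mo:base/Result";

// Output of a precompile: return data plus gas consumed (hypothetical shape).
type PrecompileResult = Result.Result<{ output : Blob; gasUsed : Nat }, Text>;

// Returns ?k if `addr` is the 20-byte address 0x00...0k with 1 <= k <= 9.
func precompileId(addr : Blob) : ?Nat {
  let bytes = Blob.toArray(addr);
  if (bytes.size() != 20) return null;
  var i = 0;
  while (i < 19) { if (bytes[i] != 0) return null; i += 1 };
  let k = Nat8.toNat(bytes[19]);
  if (k >= 1 and k <= 9) ?k else null;
};

// 0x04 IDENTITY: returns its input unchanged; gas = 15 + 3 per 32-byte word.
func identity(input : Blob, gasLimit : Nat) : PrecompileResult {
  let words = (input.size() + 31) / 32;
  let cost = 15 + 3 * words;
  if (cost > gasLimit) return #err("out of gas");
  #ok({ output = input; gasUsed = cost });
};

// Called by the CALL-family handlers before running any EVM code.
// null means "not a precompile": execute the target's code as usual.
func runPrecompile(addr : Blob, input : Blob, gasLimit : Nat) : ?PrecompileResult {
  switch (precompileId(addr)) {
    case null null;
    case (?4) ?(identity(input, gasLimit));
    case (?_) ?(#err("precompile not implemented yet"));
  };
};
```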
Efficiency and Accuracy: Implementations must focus on both computational efficiency and accuracy. Since the primary advantage of precompiles lies in their low gas cost for complex operations, your Motoko implementation should aim to be as optimized as possible while producing the correct results.
Execution Context Interaction: Similar to EVM opcodes, precompile function calls must accept and interact with the ExecutionContext. While they may not alter the execution context as extensively as some opcodes do, they must accurately calculate and deduct their gas costs based on the input size and operation complexity, updating the ExecutionContext’s currentGas.
Return Values and Error Handling: Precompile calls either return a result or fail in specific cases, much like regular smart contract functions. Following EVM semantics, a successful precompile's output is delivered as return data (copied to the caller's designated memory region and available through RETURNDATASIZE and RETURNDATACOPY), while the CALL itself pushes 1 for success or 0 for failure onto the stack. Errors or exceptions should revert any changes made during their execution, preserving the atomicity of transactions.
Interface Definition: Define a clear and consistent interface for each precompile, considering the input parameters and expected output. This assists in abstracting the precompile’s implementation and facilitating future optimizations or revisions.
Gas Costing: Implement gas costing according to the predefined rules set by the Ethereum specifications for each precompile. Accurate gas calculation ensures the economic equivalence of precompile operations across Ethereum and the IC Ecosystem.
Testing and Validation: Rigorous testing is essential. Compare your outputs against known outputs generated by Ethereum’s precompiles to validate correctness. Consider edge cases and input extremes to ensure reliability and robustness.
Documentation: Provide comprehensive documentation for each precompile implementation, detailing the operation performed, expected inputs, outputs, and any limitations or deviations from Ethereum’s behavior. Documentation supports maintainability and usability by other developers in the ecosystem.
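For precompiles whose price depends only on input length, the Ethereum gas formulas are simple enough to sketch directly. The values below follow the Yellow Paper and the Istanbul repricing in EIP-1108; MODEXP (0x05) and BLAKE2F (0x09) depend on the input contents and are left out. `precompileGas` is a hypothetical name.

```motoko
// Gas for precompiles whose cost depends only on input length.
// Returns null where the formula needs more than the length.
func precompileGas(id : Nat, inputLen : Nat) : ?Nat {
  let words = (inputLen + 31) / 32; // 32-byte words, rounded up
  switch (id) {
    case (1) ?3000;                          // ECRECOVER: flat
    case (2) ?(60 + 12 * words);             // SHA256
    case (3) ?(600 + 120 * words);           // RIPEMD160
    case (4) ?(15 + 3 * words);              // IDENTITY
    case (6) ?150;                           // BN256 ADD (EIP-1108)
    case (7) ?6000;                          // BN256 MUL (EIP-1108)
    case (8) ?(45000 + 34000 * (inputLen / 192)); // BN256 PAIRING: per 192-byte pair
    case (_) null;                           // MODEXP, BLAKE2F, or not a precompile
  };
};
```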
Testing is a crucial aspect of software development, ensuring that each module of your code behaves as expected under various conditions. For the implementation of EVM opcodes and precompiles in Motoko, we recommend a robust approach to unit testing, covering each opcode’s functionality comprehensively. This section outlines key considerations and recommendations for writing effective unit tests for the opcodes implementation.
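As a minimal illustration of such tests, the sketch below implements JUMPDEST analysis over raw bytecode and asserts one invalid and one valid target. It assumes a byte-level `[Nat8]` code array (the OpCode array in the ExecutionContext already separates immediates), and `jumpDests` is a hypothetical name. The key edge case is a 0x5B byte that is PUSH data rather than an instruction. Motoko's `assert` traps on failure, so running the file with the Motoko interpreter acts as a simple test runner.

```motoko
import Array "mo:base/Array";
import Nat8 "mo:base/Nat8";

// Marks which byte offsets of raw bytecode are valid JUMP targets: a 0x5B
// (JUMPDEST) byte that is not inside the immediate data of PUSH1..PUSH32.
func jumpDests(code : [Nat8]) : [Bool] {
  let valid = Array.init<Bool>(code.size(), false);
  var pc = 0;
  while (pc < code.size()) {
    let op = code[pc];
    if (op == 0x5B) { valid[pc] := true };
    // PUSHn (0x60..0x7F) is followed by n = op - 0x5F immediate bytes.
    if (op >= 0x60 and op <= 0x7F) { pc += Nat8.toNat(op) - 0x5F };
    pc += 1;
  };
  Array.freeze(valid);
};

// Unit tests: one invalid and one valid jump destination.
let code : [Nat8] = [0x60, 0x5B, 0x5B, 0x00]; // PUSH1 0x5B; JUMPDEST; STOP
let dests = jumpDests(code);
assert (not dests[1]); // this 0x5B is PUSH1 data, not an instruction
assert (dests[2]);     // a real JUMPDEST
assert (not dests[0] and not dests[3]);
```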
ADD, MUL, SUB, DIV, etc.: Apart from normal operation, include tests for special cases like division by zero, multiplication resulting in overflow, and subtraction resulting in underflow.
MLOAD, MSTORE, SLOAD, SSTORE: It's crucial to test not only successful reads and writes but also attempts to access invalid memory or storage locations (e.g., out-of-bounds or unallocated memory).
JUMP, JUMPI: Test both valid and invalid jump destinations, ensuring that JUMPDEST validation is correctly implemented.
For JUMPI, include scenarios where the jump should and should not be taken, based on the condition provided.
The bounty associated with the implementation of EVM opcodes and precompiles in Motoko will be assigned to a single individual or team. This approach allows for concentrated effort and a consistent vision throughout the development process. However, we recognize the variability of personal circumstances and the challenges that a project of this depth may present.
Code will be pushed to https://github.com/icdevsorg/evm.mo.
The project is structured to allow for modular completion. Contributors can achieve progress in distinct blocks or stages, aligning with the separate categories or sets of opcodes and precompiles laid out in the specifications. This structure facilitates manageable goals and provides clear checkpoints for progress assessment.
We understand that unforeseen circumstances may prevent a contributor from completing the entire bounty. In such cases, compensation will be awarded proportionally based on the percentage of the project that has been completed and approved by the project’s review team. This ensures that all efforts are recognized and valued, even if the project’s full scope cannot be realized by the initial assignee.
Should the original assignee be unable to complete the bounty, a clear and structured handover process will be implemented. This process involves: