When working with a low-level machine like the EVM, you don’t have access to higher-level features like variables. Instead, you must manage the state of a single stack in order to pass parameters to the opcodes you want to use. This involves the usual operations you’d expect when working with a stack (pushing, etc).
However, even if these operations are simple and easy to learn, using them to manage the stack can be quite intricate.
The EVM is a stack-based virtual machine. This means that “the stack” is the primary data structure that programs work with when executing EVM opcodes.
As it turns out, most hardware machines choose a stack as the way to read, write, and manage opcode arguments. The reason being, those arguments need to live somewhere, and a stack is a simple and effective data structure to house them.
Consequently, virtual machines typically follow hardware machines in this respect, and the EVM is no different.
The main stack operations on the EVM are:
All contracts will need to work with a known value at some point. For example, consider the following Solidity contract:
contract Example3_3a {
uint counter;
function increment() public {
counter = counter + 7;
}
}
Note how the increment
function is increasing the counter
state variable by a specific amount: 7
. Any time you have a hardcoded value like this – whether a number, boolean, address, string, and so on – this will involve a push opcode.
The compiled output of above example will include this bytecode:
6007
which translates to the following assembly:
PUSH1 0x07
This pushes 0x07
as a 32-byte value onto the stack.
Recall that all stack items are always 32 bytes. Even though 0x07
is only one byte, it becomes a full 32 byte value when pushed onto the stack!
This is due to the EVM's 256-bit word size.
To be explicitly clear: 0x07
becomes 0x0000000000000000000000000000000000000000000000000000000000000007
when pushed onto the stack.
The above 6007
example is a rare case where some bytes (0x07
in this case) do not represent opcodes, but instead represent raw data.
To put it another way, 60
is the opcode for PUSH1
. However, even though 07
is the opcode for SMOD
, it is not being treated as such here. Instead, because 07
follows 60
, 07
is treated as the raw data argument for the 60
opcode.
You will often times need to push larger values than one byte.
For example, a full Ethereum address is 20 bytes. To push a known address onto the stack, you need the PUSH20
opcode, which would look something like this:
PUSH20 0x6B175474E89094C44Da98b954EedeAC495271d0F
Again, this would push a 20 bytes of data as a 32-byte value onto the stack.
The EVM has PUSH opcodes ranging from PUSH1
up to PUSH32
. There is no PUSH33
or above, as such values would not be able to fit onto the stack.
The stack’s main purpose is to pass arguments to opcodes.
For instance, take the ADD
opcode. This opcode takes exactly two arguments. This means that when the ADD
opcode runs, it pops two items off the stack to use as its arguments.
When the ADD
opcode is done, it pushes back onto the stack its computed result.
Let’s see an example:
PUSH1 0x03
PUSH1 0x04
PUSH1 0x05
ADD
What is the final state of the stack after executing this bytecode?
The final state is a stack with two items: 0x03
on bottom, and 0x09
(the result from running ADD
) on top.
The max number of items the stack can have is 1024. If you try to push more items than this, then execution will crash due to a stack overflow.
If you attempt to run an opcode that requires more parameters that currently in the stack (e.g. running ADD
with a stack size of zero or one), then execution will crash due to a stack underflow.
Because opcodes always consume the topmost items in the stack, you will sometimes need to swap the top of the stack with other items deeper in the stack in order to operate on the right data.
As an example, let’s say you have a stack with two items:
A
(labeled for the sake of explanation)B
And let’s say you want to add seven to each of these items.
Here’s an approach that does not work:
; Does not work!
PUSH1 0x07
ADD
PUSH1 0x07
ADD
Why doesn’t this approach work?
7
to A
.A + 7
.7
to B
, but instead adding it to A + 7
A + 14
on top, and the original B
value on bottom.Based on the answer, you can likely see that a SWAP
is needed. Here are two approaches that work:
; Works!
PUSH1 0x07
ADD
SWAP1
PUSH1 0x07
ADD
; Also works!
PUSH1 0x07
ADD
PUSH1 0x07
SWAP1
SWAP2
ADD
The SWAP opcodes range from SWAP1
to SWAP16
. This means that you cannot reach back farther than 16 stack items.
This is part of the reason you'll run into Solidity's "stack too deep" compile-time error. Because Solidity stores variables onto the stack, this EVM limitation makes it difficult for Solidity to have 16 (or near 16) variables within a function.
In theory, it's possible for Solidity to compile to opcodes that store stack variables into memory when necessary, but that may result in too much complexity, or too much bytecode size to be worth it.
Recall that when working with a stack-based machine, you don’t have access to variables.
Therefore, when you want to use a value more than once, and avoid redundant computation while doing so, a good option is to duplicate that value using one of the DUP
opcodes.
For example, let’s say there are three addresses on the stack (sourced from storage, parameters, etc.):
STACK (grows downwards visually)
- C
- B
- A
Now let’s say you want to check if one of them is the known address 0x11102220333044405550666077708880
.
You could write a PUSH20
and an EQ
for each address:
PUSH20 0x1110222033304440555066607770888099904444
EQ
SWAP1
PUSH20 0x1110222033304440555066607770888099904444
EQ
SWAP2
PUSH20 0x1110222033304440555066607770888099904444
EQ
OR
OR
EQ
takes two arguments and produces a 1
if the arguments are equal, or a 0
if they are not.A
0
or 1
from EQ
with B
B
0
or 1
from EQ
with C
C
0
s and/or 1
s into a single value that represents “do any of A
, B
, or C
equal to the known address?”This approach works. However, notice how we need 63 bytes of bytecode for the known address pushes! Can we do better?
An alternate approach is to PUSH20
once for the first use, and then DUP
to reuse the value:
PUSH20 0x1110222033304440555066607770888099904444
DUP1
DUP1 ; Dup all upfront
SWAP3 ; Swap with A
EQ
SWAP3 ; Swap with B
EQ
SWAP3 ; Swap with C
EQ
OR
OR
With this new approach, we save 40 bytes in bytecode size by not needing to include a PUSH20
more than once.
Also note the order of DUP
s in this approach. If you want to reuse a value, you must DUP
it before the first use. If you don’t, then the first use will consume that value before you can DUP
it for a second use.
The DUP opcodes range from DUP1
to DUP16
. This means that you cannot reach back farther than 16 stack items.
Working with basic stack operations is simple, but intricate. Because we don’t have access to variables when working with the EVM, we must push, dup, swap, and consume items on the stack in order to run the operations we desire.
In a future chapter (⚙️ Internal Function Calls (coming soon)) we will learn how to use the stack to simulate internal function calls.