Solidity has many useful types: uint256
, bool
, string
, struct
, mapping
, arrays, and so on.
However, the EVM itself does not have any awareness of these types. There are no opcodes that deal with or are aware of most Solidity types. Instead, they deal with much lower level types, as we will see below.
If you're already familiar with binary and bytes, feel free to skip this section.
For those that need a refresher, here's a quick review:
0101 1000
is one byte (the space in between is just to make it more readable).0101 1000
as an unsigned integer, converting each bit to its respective power of 2:0+64+0+16 + 8+0+0+0
= 88
x
.0000 0000
, 0000 0001
, 0000 0010
, 0000 0011
, 0000 0100
, etc.0
, while the maximum is 255
.The "types" that the EVM understands are:
uint256
), supported by ADD
, SUB
, LT
, GT
, GAS
, CALLVALUE
, TIMESTAMP
etc.int256
), supported by SIGNEXTEND
, SDIV
, SLT
, SGT
, etc.AND
, OR
, XOR
, BYTE
, SHL
, SHR
, SHA3
, BLOCKHASH
, etc.ADDRESS
, SENDER
, CALL
, etc.CREATE
, CODECOPY
, EXTCODECOPY
, etc.All other types that you see in Solidity are built on top of the above.
Note that any value in the EVM – whether on the stack, in memory, or storage – is all just binary, and only interpreted as a type the moment an opcode operates on that data.
A word size is something that all machines have – it's the “natural chunk size” of data that they operate in.
But what does that mean?
Fundamentally, your computer has limits. It has a limited amount of memory, a limited amount of disk space, and so on.
The same kind of limit exists at the lowest level of binary. For a machine, stack items must have a limit of some length.
The EVM's stack item size is its word size: 256 bits (32 bytes).
This means every stack item is exactly 256 bits. No more, no less. A single boolean, for example, will take a full 256 bits on the stack, even though booleans only need one bit to be fully represented. That's potentially a lot of unused space.
EVM's large word size has many implications about how you can interact with the system. Let's explore each one.
As hinted, each item on the stack is a full 32 bytes. Any items that an opcode puts on the stack will be exactly this size; there is no way to specify otherwise.
As large as 32 bytes is, not everything can fit in a single word. Complex data structures such as arrays and strings take at least two full words – one for the length, plus one or more for the data.
Because of this, oftentimes you will have to move data to memory instead of the stack. The most common example of this is the RETURN
opcode; in order to return data from a function call, you must put that data into memory first, and then execute the RETURN
opcode with what location in memory that data resides.
You'll learn more about working with memory in 🧱 Working with Memory.
When working with memory, you specify a memory offset – a 32-byte offset, starting from zero, of where your data resides in memory.
When working with storage, you specify a storage key – a 32-byte key, of any value, of where you want to read or write a 32-byte value of data.
Consider how a 32-byte key has 2^256 possible values. That's an incredibly large number:
2^256 = 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
This works out well for storage, since EVM storage is similar to a key-value database; any 32-byte key can be assigned any 32-byte value. It's a benefit to have such a large key space, as it helps avoid conflicts.
But for memory, you will never reach anywhere near the max range. Starting from zero, any memory you use costs additional gas for allocation. For example, using an offset of 0xff
is fine; using an offset of 0xff000000000000000000
is not, as the EVM will try to allocate memory from zero up to that offset.
Contract storage values are also limited to 256 bits. This is one benefit to having the word size so large; you can store and retrieve large values – such as keccak hashes – much easier.
For example, if the EVM were 64-bit instead, you would have to break up an address into three 64-bit storage slots, complicating reads and writes.
When you send a message to call a function, you send function arguments along with it. These arguments are encoded using ABI encoding, which generally encodes data using EVM's word size.
This means, just like the stack example, encoding a bool
for a function call will take a full 32 bytes in the ABI encoded message. That's potentially a lot of unused space.
You'll learn more about ABI encoding in the ABI chapters, specifically in 🔩 The Anatomy of an ABI-Encoded Function Call.
Another benefit for the 256-bit word size is the convenience of working with keccak hashes, which take a full 32 bytes. These hashes can be manipulated directly on the stack.