I recently started working on a Bitcoin client so I could learn more about Bitcoin internals and the P2P protocol. I've written some C++ code that can connect to the Bitcoin P2P network and do basic things like syncing block headers. It's been a lot of fun, and I've learned a lot from the experience.
One of the things I learned from this project is exactly how blocks are mined and organized. Bitcoin uses a proof-of-work system to mine new blocks, implemented by repeatedly computing SHA-256 hashes (actually double SHA-256, i.e. SHA-256 applied twice). The way I always assumed mining worked is that miners repeatedly vary some fields in a block, like the nonce, and then hash the whole block, over and over, until they find a block whose hash is lower than the difficulty target. Since Bitcoin blocks are currently about 1 MB in size, I always assumed that this proof-of-work function requires hashing 1 MB of data. I've since learned that this is dead wrong.
Miners don't hash blocks, they hash block headers. A block header is an 80-byte message that contains metadata about a block (which I covered briefly in a recent post). Since miners are hashing block headers rather than the blocks themselves, the amount of time it takes to do the proof-of-work hash for a block is independent of the number of transactions in the block. Miners are always hashing 80 bytes of data, regardless of the actual size of the underlying block.
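For reference, here's a minimal sketch of the header layout in C++ (the field names are my own, and all multi-byte integers are serialized little-endian on the wire):

```cpp
#include <cstdint>

// Sketch of the 80-byte Bitcoin block header. Mining means repeatedly
// hashing this structure (with double SHA-256), not the full block.
struct BlockHeader {
    uint32_t version;          //  4 bytes: block version number
    uint8_t  prev_block[32];   // 32 bytes: hash of the previous block header
    uint8_t  merkle_root[32];  // 32 bytes: Merkle root of the block's transactions
    uint32_t timestamp;        //  4 bytes: Unix time claimed by the miner
    uint32_t bits;             //  4 bytes: compact encoding of the difficulty target
    uint32_t nonce;            //  4 bytes: free field that miners vary while searching
};

// 4 + 32 + 32 + 4 + 4 + 4 = 80. On typical platforms this struct has no
// padding, so its in-memory size matches the serialized size.
static_assert(sizeof(BlockHeader) == 80, "block header should be 80 bytes");
```

Varying the nonce just means overwriting the last four bytes and hashing the same 80-byte buffer again.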
Technically miners do have to hash the full block data to generate a Merkle root, which is part of the block header. This means that at some point in time, a miner really does need to hash all of the data in a block. But let's examine this more closely. The nonce field in a block header is 32 bits, so there are about four billion possible variations of a given Merkle root and timestamp just via the nonce field. Timestamps in block headers are also allowed some skew: you can be approximately two hours off from true time. If you take full advantage of this, just between the timestamp and the nonce there are trillions of possible headers per Merkle root: roughly 7,200 timestamp values times $2^{32}$ nonces is on the order of 30 trillion. The Bitcoin difficulty is high enough that even trillions of hashes are rarely sufficient to find a block, so miners periodically update the block's coinbase transaction to generate a new Merkle root.

You might think that this operation at least would require rehashing the full block, but that's also not true! A Merkle tree is a binary tree of hashes, where the value of each interior node is the hash of its children. This means that if one of the leaves in an $N$ element Merkle tree is updated, the new Merkle root can be recomputed in $O(\log_2 N)$ hashing operations. Any way you cut it, miners only have to do a full hash over the block data once per block, which on average is only about every ten minutes.
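Here's a rough sketch of that trick in C++. To keep the example self-contained I use a toy `hash_pair` function in place of Bitcoin's double SHA-256; the point is only the tree bookkeeping, which lets a single changed leaf propagate to the root in $O(\log_2 N)$ hash operations:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for Bitcoin's double SHA-256 over a pair of child digests.
// It exists only to keep this sketch self-contained and compilable.
using Digest = std::uint64_t;
Digest hash_pair(Digest left, Digest right) {
    return std::hash<std::string>{}(std::to_string(left) + ":" + std::to_string(right));
}

// Keep every level of the tree in memory so that changing one leaf only
// requires re-hashing the interior nodes on the path from it to the root.
struct MerkleTree {
    std::vector<std::vector<Digest>> levels;  // levels[0] = leaves, last level = root

    explicit MerkleTree(std::vector<Digest> leaves) {
        levels.push_back(std::move(leaves));
        while (levels.back().size() > 1) {
            const std::vector<Digest>& prev = levels.back();
            std::vector<Digest> next;
            for (std::size_t i = 0; i < prev.size(); i += 2) {
                // Bitcoin duplicates the last node when a level has odd length.
                Digest right = (i + 1 < prev.size()) ? prev[i + 1] : prev[i];
                next.push_back(hash_pair(prev[i], right));
            }
            levels.push_back(std::move(next));
        }
    }

    Digest root() const { return levels.back().front(); }

    // Replace one leaf (e.g. the coinbase at index 0) and recompute only the
    // O(log2 N) nodes above it, instead of rebuilding the whole tree.
    void update_leaf(std::size_t index, Digest value) {
        levels[0][index] = value;
        for (std::size_t lvl = 0; lvl + 1 < levels.size(); ++lvl) {
            const std::vector<Digest>& cur = levels[lvl];
            std::size_t parent = index / 2;
            std::size_t left = parent * 2;
            std::size_t right = left + 1;
            Digest r = (right < cur.size()) ? cur[right] : cur[left];
            levels[lvl + 1][parent] = hash_pair(cur[left], r);
            index = parent;
        }
    }
};
```

Once the nonce and timestamp space is exhausted, a miner can do something like `update_leaf(0, new_coinbase_hash)` to refresh the coinbase leaf and get a fresh Merkle root after only a handful of hashes.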
My point is that miners are largely insensitive to block size, because the proof-of-work cost doesn't depend on it. Bitcoin could switch to gigabyte blocks, and it would have almost no impact on miners. On the other hand, block size very much impacts everyone else. Bigger blocks mean a bigger blockchain, and that deters users from running full nodes. The block size even impacts users running pruned nodes, as it affects the amount of data involved in the initial blockchain sync, as well as the amount of data transferred over the P2P network when new blocks are announced. The block size debate is a complex and nuanced issue. I see how increasing the block size would be an effective short-term fix for Bitcoin scaling and transaction fees. But I also see how large blocks might ultimately make running full nodes too onerous for hobbyists, and how that could lead to increased centralization. There's no easy answer here, and I can understand both points of view. But I hope that others evaluating the issue are at least aware of why miners have an incentive to push for bigger blocks: it increases the amount of Bitcoin they collect from transaction fees at almost no cost to them.