See, if you're just in the market for self-reference-based impossibility, the Halting Problem will do: it says, for a model of computation that is a little funny and cooked up, but not so far away from our modern intuitions of computers, that a computer program can't have infinite foresight in predicting what absolutely any other program will do. And if you've had a bit of training in theoretical computer science, this fact becomes kind of second nature to you; some programs are cleverer than others, but no program can outclever them all.
Seen through modern eyes, Gödel's theorem is a strengthening of this fact (although it came out 5 years earlier!). Not only can we dream up some funny model of computation and show that it can't hoist itself by its bootstraps; that self-same powerful model of computation already lives right in our backyard, in the basic arithmetic of addition and multiplication.
And we come to the cute technical number-theoretic trick. It's "Gödel's β-function". It's a way of representing sequences of numbers as a pair of numbers in such a way that asking for the ith element of the sequence is still easy. We say (b,c) represents the sequence (a1, a2, ... an) if
b mod (1 + i * c) = ai
for every i from 1 to n. (The historical β function, I believe, is β(b, c, i) = b mod (1 + (i + 1) * c), but I'm dropping a plus-one because I'm 1-indexing my sequences.) If someone hands us a (b, c) pair, it's certainly easy to do the + and the * and the mod and get out the sequence, but it's not so obvious how to cook a sequence down into a (b, c) pair. Still, it's easy to show that it's always possible. Suppose someone gives you a sequence (a1, a2, ... an). Then let c be the first multiple of n! that's at least as big as max(1, a1, a2, ... an), and abbreviate ui = 1 + i * c. First note that
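The easy direction really is a one-liner. Here's a sketch in Python (`beta` is just my name for it, and the sample (b, c) pair is one I picked by hand):

```python
def beta(b, c, i):
    """The i-th element (1-indexed) of the sequence that (b, c) encodes."""
    return b % (1 + i * c)

# For example, (b, c) = (199, 10) represents the two-element sequence (1, 10):
# 199 mod 11 = 1, and 199 mod 21 = 10.
assert [beta(199, 10, i) for i in (1, 2)] == [1, 10]
```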
(*) c is relatively prime to all the ui.
For if any prime p divided c, then ui would be one more than a multiple of p, and p wouldn't divide ui. Second, we can see that
(**) all of the ui are relatively prime to each other.
For suppose there were some prime p that divided both ui and uj, with i > j, say. Then p would also have to divide their difference,
ui - uj = (1 + i * c) - (1 + j * c) = (i-j) * c
So, we know that p divides (i-j) * c. By (*), p can't divide c (it already divides ui), so, being prime, it's got to divide i-j. But i-j is really small, right? It's positive and less than n, because both i and j are just indexes into a length-n list. So p itself is less than n, too. But every number less than n divides c, because we chose c to be a multiple of n!. So now we see that p divides ui (not to mention uj) and also c, again contradicting (*).
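Both (*) and (**) are easy to spot-check numerically. A quick sketch, using the recipe for c from above (the particular sequence is just an example):

```python
from math import factorial, gcd
from itertools import combinations

seq = [3, 1, 4, 1, 5]
n = len(seq)
f = factorial(n)
# c: the first multiple of n! at least as big as max(1, a1, ..., an)
c = -(-max([1] + seq) // f) * f          # ceiling division, then scale back up
us = [1 + i * c for i in range(1, n + 1)]

assert all(gcd(u, c) == 1 for u in us)                      # (*)
assert all(gcd(u, v) == 1 for u, v in combinations(us, 2))  # (**)
```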
This result (**) hands us a pile of numbers all relatively prime to each other, and they're all at least as big as 1 + c, so they're all bigger than every ai. The Chinese Remainder Theorem tells us that when we have such a pile, we can find a b that has any sequence of remainders that we like. Our favorite sequence of remainders is
b mod u1 = a1
b mod u2 = a2
...
b mod un = an
so this is what we ask for, and it arrives FedEx the very next morning. And the b and the c that we constructed are now exactly the (b,c) pair we sought to represent the sequence (a1, ... an).
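The whole construction fits in a few lines of Python. This is a sketch, with my own helper names; the `crt` here builds the Chinese Remainder Theorem solution one congruence at a time, relying on the moduli being pairwise coprime:

```python
from math import factorial

def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise-coprime moduli."""
    x, m = 0, 1
    for r, mi in zip(residues, moduli):
        # Adjust x by a multiple of m so it also satisfies the next congruence.
        # The modular inverse exists because gcd(m, mi) = 1.
        t = ((r - x) * pow(m, -1, mi)) % mi
        x, m = x + m * t, m * mi
    return x

def encode(seq):
    """Cook a sequence down into a (b, c) pair, following the proof."""
    n, f = len(seq), factorial(len(seq))
    c = -(-max([1] + seq) // f) * f      # first multiple of n! >= every element
    b = crt(seq, [1 + i * c for i in range(1, n + 1)])
    return b, c

def beta(b, c, i):
    return b % (1 + i * c)

seq = [3, 1, 4, 1, 5, 9, 2, 6]
b, c = encode(seq)
assert [beta(b, c, i) for i in range(1, len(seq) + 1)] == seq
```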
This relation between pairs (b,c) and sequences isn't one-to-one or anything, but we don't need it to be. We just need to know that querying the mathematical universe whether there exists a (b,c) that has certain properties is the same as querying whether there exists a sequence with corresponding properties. And since we just showed that every finite sequence has at least some (b,c) that represents it, this is indeed true!
Now we can do things like solve Hofstadter's puzzle about powers of 10!
x is a power of 10 if and only if there exist numbers (b,c) and n ("there exists a sequence of length n") such that
b mod (1 + c) = 1 ("the first element of the sequence is 1")
and, for every k between 1 and (n-1), 10 * (b mod (1 + k * c)) = b mod (1 + (k+1) * c) ("each element of the sequence is 10 times the previous element")
and b mod (1 + n * c) = x ("the last element of the sequence is x")
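We can watch this work: build a witness for x = 10^4 by encoding the sequence 1, 10, 100, ..., x with the proof's construction, and check all three clauses. (A sketch; `crt` and `encode` are my own helper names, implemented straight from the argument above.)

```python
from math import factorial

def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise-coprime moduli."""
    x, m = 0, 1
    for r, mi in zip(residues, moduli):
        t = ((r - x) * pow(m, -1, mi)) % mi   # inverse exists: gcd(m, mi) = 1
        x, m = x + m * t, m * mi
    return x

def encode(seq):
    """A (b, c) pair representing seq, following the proof."""
    n, f = len(seq), factorial(len(seq))
    c = -(-max([1] + seq) // f) * f           # first multiple of n! >= every element
    return crt(seq, [1 + i * c for i in range(1, n + 1)]), c

x, n = 10**4, 5                               # x = 10^(n-1)
b, c = encode([10**k for k in range(n)])      # the sequence 1, 10, ..., x

assert b % (1 + c) == 1                       # first element is 1
assert all(10 * (b % (1 + k*c)) == b % (1 + (k+1)*c)
           for k in range(1, n))              # each element is 10x the previous
assert b % (1 + n*c) == x                     # last element is x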
Or, better yet, we can define the three-place predicate that z is x to the yth power:
z is x to the yth power if and only if there exist numbers (b,c) ("there exists a sequence of length y+1") such that
b mod (1 + c) = 1 ("the first element of the sequence is 1")
and, for every k between 1 and y, x * (b mod (1 + k * c)) = b mod (1 + (k+1) * c) ("each element of the sequence is x times the previous element")
and b mod (1 + (y+1) * c) = z ("the last element of the sequence is z")
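And the same check for exponentiation, reading the predicate as z = x^y with a witness sequence 1, x, x^2, ..., x^y of length y+1 (again a sketch; `crt` and `encode` are my helper names, built from the construction in the proof):

```python
from math import factorial

def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise-coprime moduli."""
    x, m = 0, 1
    for r, mi in zip(residues, moduli):
        t = ((r - x) * pow(m, -1, mi)) % mi   # inverse exists: gcd(m, mi) = 1
        x, m = x + m * t, m * mi
    return x

def encode(seq):
    """A (b, c) pair representing seq, following the proof."""
    n, f = len(seq), factorial(len(seq))
    c = -(-max([1] + seq) // f) * f           # first multiple of n! >= every element
    return crt(seq, [1 + i * c for i in range(1, n + 1)]), c

x, y = 3, 5
z = x ** y                                    # 243
b, c = encode([x**k for k in range(y + 1)])   # the sequence 1, x, ..., x^y

assert b % (1 + c) == 1                       # first element is 1
assert all(x * (b % (1 + k*c)) == b % (1 + (k+1)*c)
           for k in range(1, y + 1))          # each element is x times the previous
assert b % (1 + (y + 1)*c) == z               # last element is z
```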
The real point here is that if we can express the relationship in arithmetic between one legitimate step of a process and the next, (like being able to multiply by a constant) then we can capture what it means to be a valid entire process (like exponentiation).
This ability to iterate things is exactly what gets us, in the jargon, the power of primitive recursion, and the property of valid-proof-ness is eminently primitive recursive --- this is the bulk by page-count of Gödel's proof, just building up the definition of "is a valid proof" syntactically from first principles --- and then you turn the self-reference crank and you get the incompleteness theorems.
So yeah. The very first part of the proof doesn't get as much press as the fancy self-reference bits, and it's still pretty cute.