[String diagram: the Yang-Baxter equation. On each side, three strands labelled S, T, U at the top cross pairwise and end in the order U, T, S at the bottom. The two sides differ only in the order in which the three crossings occur.]
(This is also called the "third Reidemeister move" in knot theory.)
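For reference, here is one way the same equation might be written in symbols. The names of the three swaps and their directions (ST to TS, SU to US, TU to UT) are my assumptions, read off from the strand labels S T U at the top and U T S at the bottom; juxtaposition such as `\lambda^{ST} U` denotes whiskering.

```latex
% Assumed swaps (names are hypothetical):
%   \lambda^{ST} : ST \Rightarrow TS, \quad
%   \lambda^{SU} : SU \Rightarrow US, \quad
%   \lambda^{TU} : TU \Rightarrow UT
% Both sides are 2-cells STU \Rightarrow UTS.
\[
  (\lambda^{TU} S) \circ (T \lambda^{SU}) \circ (\lambda^{ST} U)
  \;=\;
  (U \lambda^{ST}) \circ (\lambda^{SU} T) \circ (S \lambda^{TU})
\]
```

The left side crosses S past T, then S past U, then T past U; the right side does the same three crossings in the opposite order.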
The punchline is that this is all you need for four or more monads composed in a row too! Just individual swaps for each pair and Yang-Baxter equations for each triplet. Crazy.
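To get a feel for how much data that is, here is a small sketch that just enumerates the pairs and triples for four monads. The names S, T, U, V are placeholders, and nothing here checks any actual equation; it only counts them.

```python
from itertools import combinations

# Hypothetical names for four monads composed in a row.
monads = ["S", "T", "U", "V"]

# One swap (distributive law) is needed for each pair ...
swaps = list(combinations(monads, 2))
# ... and one Yang-Baxter equation for each triple.
yang_baxter = list(combinations(monads, 3))

print(len(swaps), "swaps:", swaps)
print(len(yang_baxter), "Yang-Baxter equations:", yang_baxter)
```

For n monads that is n(n-1)/2 swaps and n(n-1)(n-2)/6 Yang-Baxter equations, with nothing new appearing at the level of quadruples.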
I think I've got the idea now. It's quite beautiful. To show associativity of the composite monad, you just need to fudge the lines around in a string diagram. The pentagon axioms of a distributive law let you push a string across a multiplication 2-cell, and Yang-Baxter lets you push a string across a crossing of another pair of monads.
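As a concrete sanity check for the two-monad case, here is a minimal sketch with List and Maybe. The choice of monads, the encoding (Maybe as `None` or a 1-tuple `(x,)`, List as a Python list), and all function names are my own assumptions for illustration. The swap turns a Maybe of a List into a List of Maybes, and the composite multiplication is "swap in the middle, then multiply each monad separately".

```python
# Maybe is modelled as None (Nothing) or a 1-tuple (x,) (Just x).
# List is modelled as a plain Python list.

def swap(m):
    """Assumed distributive law: Maybe(List a) -> List(Maybe a)."""
    if m is None:
        return [None]              # Nothing maps to the singleton list [Nothing]
    (xs,) = m
    return [(x,) for x in xs]      # Just xs maps to the list of Just x

def join_list(xss):
    """Multiplication of List: flatten one level."""
    return [x for xs in xss for x in xs]

def join_maybe(mm):
    """Multiplication of Maybe: Nothing stays Nothing, Just m becomes m."""
    return None if mm is None else mm[0]

def join_composite(xs):
    """List(Maybe(List(Maybe a))) -> List(Maybe a)."""
    swapped = [swap(m) for m in xs]         # List(List(Maybe(Maybe a)))
    flat = join_list(swapped)               # List(Maybe(Maybe a))
    return [join_maybe(mm) for mm in flat]  # List(Maybe a)

def fmap_composite(f, xs):
    """Functor action of the composite List(Maybe -)."""
    return [None if m is None else (f(m[0]),) for m in xs]

# A triply nested value to compare the two ways of multiplying.
x = [([([(1,), None],), None],), None]
outer_first = join_composite(join_composite(x))
inner_first = join_composite(fmap_composite(join_composite, x))
print(outer_first == inner_first)
```

This only tests associativity on one value, so it is evidence rather than a proof; the string-diagram argument is what actually shows it in general.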
All in all it has a very tricategorical feel to it, since the distributive laws give you "just enough" 3-dimensionality to be able to compose monads. Strangely, if you were talking about monads that arise as monoid objects in a braided monoidal category (a twice-degenerate tricategory, remember), you'd have so much wiggle room that you could compose them in a commutative way, since you have both a clockwise and a counterclockwise twist.