Here is one potential architecture for connecting all the WIZes in the rectangular array just shown. Other architectures are possible, but this is the one we will use for now. Each horizontal row above represents a group of up to 256 WIZes. Above each row is the horizontal bus of a "parent" WIZ. This parent WIZ has up to 256 gateway registers, each connecting to a single "child" WIZ. The parent WIZ runs the OS software.
An "A -> B" instruction where A and B are siblings on the same parent bus merely requires that the OS copy from the gateway register of WIZ A to the gateway register of WIZ B. One gateway on each parent WIZ is reserved to connect to the "grandparent" WIZ, shown running down the left side of the above diagram. An "A -> B" instruction where A and B are "cousins" under different parent WIZes but under the same grandparent can be executed in stages. First, the grandparent asks A's parent to copy from the gateway of WIZ A to the gateway that connects that parent to the grandparent. Then the grandparent reverses the process on the way down: it copies from that gateway to its own gateway to the parent of WIZ B, and that parent in turn copies from its gateway to the gateway of WIZ B.
We can duplicate all of the above diagram up to 256 times and connect all 256 grandparent WIZes in the same manner to a great-grandparent WIZ.
A single parent supports up to 256 WIZes. An execution of "A -> B" among these WIZes requires just one cycle of action by their parent. A single grandparent supports up to 256x256 = 65,536 WIZes. An execution of "A -> B" among any of these requires just two cycles of action by their parent and grandparent. A single great-grandparent supports up to 256x256x256 = 16,777,216, or about 16.8 million WIZes. An execution of "A -> B" between these requires three cycles of OS action. And finally, if Moore's law or some other phenomenon permits, we could duplicate this one more time to get 256^4 = 2^32 = about 4.3 billion WIZes on a single chip.
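The capacity and cycle counts above can be checked with a small sketch. It assumes, purely for illustration, that each WIZ is numbered so that its index splits into one base-256 digit per level of the hierarchy, and it ignores the gateway reserved for the uplink. The names `FANOUT`, `capacity`, `address`, and `copy_cycles` are hypothetical, not part of any WIZ specification, and the "cycles" counted are the per-level cycles used in the text, not individual register copies.

```python
FANOUT = 256  # children per parent WIZ, as in the text (assumed uniform)


def capacity(levels: int) -> int:
    """Maximum number of leaf WIZes under a hierarchy `levels` deep."""
    return FANOUT ** levels


def address(index: int, levels: int) -> tuple[int, ...]:
    """Split a leaf index into one base-256 digit per level, top level first."""
    if not 0 <= index < capacity(levels):
        raise ValueError("index out of range for this hierarchy depth")
    return tuple((index >> (8 * shift)) & 0xFF
                 for shift in range(levels - 1, -1, -1))


def copy_cycles(a: int, b: int, levels: int = 4) -> int:
    """Levels climbed to reach the nearest common ancestor of WIZes a and b.

    This is the text's cycle count: 1 for siblings, 2 for cousins, and so on.
    """
    path_a, path_b = address(a, levels), address(b, levels)
    for depth, (digit_a, digit_b) in enumerate(zip(path_a, path_b)):
        if digit_a != digit_b:
            return levels - depth
    return 0  # a and b are the same WIZ


if __name__ == "__main__":
    for levels in range(1, 5):
        print(levels, "level(s):", f"{capacity(levels):,}", "WIZes")
    print(copy_cycles(0, 1))        # siblings under one parent
    print(copy_cycles(0, 256))      # cousins under one grandparent
    print(copy_cycles(0, 256**3))   # opposite corners of a 4-level chip
```

Under this numbering, the nearer two WIZes sit in the index space, the lower the level of their common ancestor and the fewer cycles a copy takes.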
An "A -> B" between any pair of WIZes on the chip would require four cycles of OS action. Thus, the closer two WIZes are, the faster their copy times, in a logarithmic step function. That is, from any WIZ, up to 255 other WIZes can be reached in one cycle; up to about 65 thousand WIZes can be reached in two cycles; up to about 16.8 million WIZes can be reached in three cycles; and up to about 4.3 billion WIZes can be reached in four cycles. The OS always tries to place code in WIZes as close as possible to the other WIZes that use it.
This is the "fabric" which connects all the WIZes on a chip. Note that nothing in this chapter has created any new hardware circuits. The buses which interconnect WIZes are themselves simply parent WIZes in a hierarchy of parent WIZes. We need no specialized "bus controller" circuits, and using WIZes for "bus control" gives an unprecedented level of control options, as will be seen.
Note also that while these drawings are nice and square and regular, I by no means intend them to represent a physical layout that orderly. Obviously WIZes of different sizes will not line up so neatly, and I haven't even shown the backend logic on each register. So the above, and the many drawings like it here, are not meant to represent exact physical placements.
As an aside, Charles Babbage suggested, almost 200 years ago, a circular layout! He was dealing with mechanical parts, but he may have had an idea there worthy of our consideration here. Imagine a portion of our WIZ chip laid out like a round pizza cut into many triangular slices, with a small hole cut out of the center. Each slice is a WIZ, with its gateway register at the narrow end (toward the hole), and running around the edge of that hole is the parent WIZ's bus, in a small circle connecting to all those gateways!
This would produce a very short parent bus with tiny distances between WIZ gateways, with each WIZ's own bus running radially down the center of its slice toward the "crust" end, like a spoke on a wheel. And different-sized WIZes could fill slices of different widths. Interesting idea, eh?