PCI Express: The Serial Revolution That Won


The shared bus is dead.

ISA shared everything and resolved nothing. PCI shared everything and resolved most of it. AGP refused to share and got one slot. LPC hid in the basement.

Then in 2003, Intel and the PCI-SIG published PCI Express 1.0 and replaced all of them with a single idea:

Serial lanes. Point-to-point links. Switched fabric. Packet-based transactions.

No more shared parallel wires. No more bus arbitration. No more jumper diplomacy.

Every device gets its own private connection to the hierarchy.

The Supreme Leader has reviewed this architecture and finds it structurally sound.

This is central planning done correctly.

I. The Parallel Problem

PCI was a shared parallel bus.

Parallel means: many wires carry data simultaneously, in lockstep, at the same clock edge.

This sounds efficient. It is not, past a certain frequency.

The problem is skew. When you run 32 or 64 wires in parallel, each wire has slightly different electrical characteristics. At low clock speeds, this does not matter. At high clock speeds, the signals arrive at different times, and the receiver cannot tell which bits belong to which clock cycle.
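A back-of-the-envelope sketch of why skew bites harder as the clock rises. The 2 ns skew figure and the frequencies above 66 MHz are illustrative assumptions, not measurements of any real board; the point is only that a fixed skew eats a growing share of a shrinking clock period.

```python
def skew_fraction(clock_mhz: float, skew_ns: float) -> float:
    """Fraction of one clock period consumed by wire-to-wire skew."""
    period_ns = 1000.0 / clock_mhz  # 1 / MHz gives microseconds; x1000 gives ns
    return skew_ns / period_ns

ASSUMED_SKEW_NS = 2.0  # hypothetical worst-case skew across the parallel wires

for mhz in (33.0, 66.0, 133.0, 266.0):
    share = skew_fraction(mhz, ASSUMED_SKEW_NS)
    print(f"{mhz:5.0f} MHz: period {1000.0 / mhz:6.2f} ns, skew uses {share:.0%} of it")
```

Under these assumptions the same skew that is a rounding error at 33 MHz consumes more than half the period at 266 MHz.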

PCI ran at 33 MHz and it was fine. PCI ran at 66 MHz and it was tolerable. Going further required either wider buses (more pins, more board area, more suffering) or a fundamentally different approach.

The industry chose the different approach.

Serial.

One lane. One differential pair in each direction. Clock embedded in the data stream. No skew problem because there is only one signal to worry about per direction.

The Supreme Leader notes that serial communication is also how decrees work: one directive at a time, in sequence, unambiguous.

II. The Specification Timeline

| Version | Year | Per-lane rate | Per-lane bandwidth (each direction) | Encoding |
|---|---|---|---|---|
| PCIe 1.0 | 2003 | 2.5 GT/s | ~250 MB/s | 8b/10b |
| PCIe 2.0 | 2007 | 5 GT/s | ~500 MB/s | 8b/10b |
| PCIe 3.0 | 2010 | 8 GT/s | ~984 MB/s | 128b/130b |
| PCIe 4.0 | 2017 | 16 GT/s | ~1969 MB/s | 128b/130b |
| PCIe 5.0 | 2019 | 32 GT/s | ~3938 MB/s | 128b/130b |
| PCIe 6.0 | 2022 | 64 GT/s | ~7563 MB/s | PAM4 + FEC + 242B/256B (FLIT) |
| PCIe 7.0 | 2025 (spec) | 128 GT/s | ~15125 MB/s | PAM4 + FEC + FLIT |

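Each generation roughly doubled the per-lane bandwidth. (PCIe 3.0 got there with 8 GT/s plus cheaper encoding, not by doubling the signaling rate.)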

The Supreme Leader approves of any standard that doubles its output on a predictable schedule. This is what five-year plans aspire to be.

Note the encoding shift. PCIe 1.0 and 2.0 used 8b/10b encoding, which means 20% overhead — for every 8 bits of data, 10 bits are transmitted. PCIe 3.0 switched to 128b/130b, cutting overhead to about 1.5%. PCIe 6.0 moved to PAM4 signaling with forward error correction and a fixed-size packet format called FLIT (Flow Control Unit), because at 64 GT/s the signal integrity margins are so thin that the protocol must actively correct its own transmission errors.
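The overhead arithmetic can be checked in a few lines. This is a minimal sketch that accounts only for line-encoding overhead and assumes one bit per transfer per lane, so it covers PCIe 1.0 through 5.0; the FLIT and FEC accounting of PCIe 6.0 is not modeled. The function name is made up for illustration.

```python
def lane_bandwidth_mb_s(rate_gt_s: float, payload_bits: int, total_bits: int) -> float:
    """Usable MB/s per lane, per direction, after line-encoding overhead only."""
    usable_bits_per_second = rate_gt_s * 1e9 * payload_bits / total_bits
    return usable_bits_per_second / 8 / 1e6  # bits -> bytes -> megabytes

GENERATIONS = [
    ("PCIe 1.0", 2.5, 8, 10),
    ("PCIe 2.0", 5.0, 8, 10),
    ("PCIe 3.0", 8.0, 128, 130),
    ("PCIe 4.0", 16.0, 128, 130),
    ("PCIe 5.0", 32.0, 128, 130),
]

for name, rate, payload, total in GENERATIONS:
    overhead = 1 - payload / total
    mb_s = lane_bandwidth_mb_s(rate, payload, total)
    print(f"{name}: {mb_s:7.1f} MB/s per lane, {overhead:.1%} encoding overhead")
```

Packet headers and flow control take a further cut in practice, so real throughput lands somewhat below these figures.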

This is the hardware equivalent of a bureaucracy that has learned to proofread.

III. Lanes: The Width of Power

A single PCIe lane is one differential pair in each direction. Two pairs total per lane: one TX, one RX. Full duplex.

But devices can bond multiple lanes together:

| Configuration | Lanes | Typical use |
|---|---|---|
| x1 | 1 | Wi-Fi cards, sound cards, basic NVMe |
| x4 | 4 | NVMe SSDs, RAID controllers |
| x8 | 8 | network adapters, mid-range accelerators |
| x16 | 16 | GPUs, high-end accelerators |

A PCIe 4.0 x16 slot provides approximately 31.5 GB/s of bandwidth in each direction.

A PCIe 5.0 x16 slot doubles that to approximately 63 GB/s each direction.

The lane count is negotiated during link training. A device designed for x16 can operate in a slot wired for only x8 or x4, at reduced bandwidth. A device designed for x1 can sit in an x16 slot and use one lane while the others remain idle.
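A small sketch of that allocation: the link runs at the widest width both ends support, and bandwidth scales with the lanes that actually trained. The per-lane figures are copied from the timeline table; the width list is a common subset (the specification also defines others), and the function names are hypothetical. Real training also depends on which physical lanes work, which this ignores.

```python
# Approximate per-lane bandwidth in MB/s, each direction, keyed by generation.
PER_LANE_MB_S = {1: 250, 2: 500, 3: 984, 4: 1969, 5: 3938}
COMMON_WIDTHS = (1, 2, 4, 8, 16)

def negotiated_width(device_lanes: int, slot_lanes: int) -> int:
    """Widest common width that both the device and the slot can drive."""
    limit = min(device_lanes, slot_lanes)
    return max(w for w in COMMON_WIDTHS if w <= limit)

def link_bandwidth_gb_s(gen: int, device_lanes: int, slot_lanes: int) -> float:
    """Approximate bandwidth of the trained link, GB/s each direction."""
    return PER_LANE_MB_S[gen] * negotiated_width(device_lanes, slot_lanes) / 1000

print(link_bandwidth_gb_s(4, 16, 16))  # x16 device in a full x16 slot
print(link_bandwidth_gb_s(4, 16, 8))   # same device, slot wired for x8
print(link_bandwidth_gb_s(5, 1, 16))   # x1 device idling 15 lanes of an x16 slot
```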

This is not waste. This is scalable resource allocation.

The Supreme Leader assigns ministries proportional to their importance. PCIe assigns lanes proportional to bandwidth demand. The philosophy is identical.

IV. The Topology: A Proper Hierarchy

PCI was a shared bus. Everyone heard everything. This was communal and slow.

PCIe is a switched point-to-point fabric. Every device has a private link to a switch or to the root complex. No link has more than two ends.

The topology has three types of components:

  • Root Complex — the top of the hierarchy, typically inside the CPU or chipset. It bridges the processor’s memory domain to the PCIe fabric.
  • Switches — intermediate nodes that route packets between ports. They have one upstream port (toward the root complex) and multiple downstream ports (toward endpoints).
  • Endpoints — the actual devices: GPUs, NVMe drives, NICs, everything.
```mermaid
graph TD
    CPU["CPU / Root Complex"] -->|"x16"| GPU["GPU\n(Endpoint)"]
    CPU -->|"x4"| SW1["PCIe Switch"]
    CPU -->|"x4"| NVMe1["NVMe SSD\n(Endpoint)"]
    SW1 -->|"x4"| NVMe2["NVMe SSD\n(Endpoint)"]
    SW1 -->|"x1"| NIC["10GbE NIC\n(Endpoint)"]
    SW1 -->|"x1"| WIFI["Wi-Fi\n(Endpoint)"]
    CPU -->|"DMI/x8"| PCH["Platform Controller Hub\n(Chipset)"]
    PCH -->|"x1"| SATA["SATA Controller\n(Endpoint)"]
    PCH -->|"x4"| NVMe3["NVMe SSD\n(Endpoint)"]
    PCH -->|"x1"| USB["USB Controller\n(Endpoint)"]
    PCH -->|"x1"| Audio["Audio\n(Endpoint)"]
```

Every connection is private. Every link is dedicated. No device overhears another device’s traffic.
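The hierarchy in the diagram can be modeled as a tree in which every node knows only its upstream neighbor. The sketch below is a toy model with made-up node labels: it finds the hops a packet would take between two devices by climbing to their lowest common ancestor. Whether traffic between two endpoints may actually turn around at a switch instead of going up to the root complex depends on the platform's configuration, which this does not model.

```python
# child -> upstream neighbor, transcribed from the diagram (labels are informal)
UPSTREAM = {
    "GPU": "Root Complex",
    "PCIe Switch": "Root Complex",
    "NVMe1": "Root Complex",
    "PCH": "Root Complex",
    "NVMe2": "PCIe Switch",
    "NIC": "PCIe Switch",
    "Wi-Fi": "PCIe Switch",
    "SATA": "PCH",
    "NVMe3": "PCH",
    "USB": "PCH",
    "Audio": "PCH",
}

def path_to_root(node: str) -> list[str]:
    """Every hop from a node up to the root complex, inclusive."""
    path = [node]
    while path[-1] in UPSTREAM:
        path.append(UPSTREAM[path[-1]])
    return path

def route(src: str, dst: str) -> list[str]:
    """Hops from src to dst through their lowest common ancestor."""
    up = path_to_root(src)
    down = path_to_root(dst)
    common = next(n for n in up if n in down)
    return up[: up.index(common) + 1] + down[: down.index(common)][::-1]

print(" -> ".join(route("NIC", "NVMe2")))  # NIC -> PCIe Switch -> NVMe2
print(" -> ".join(route("GPU", "NVMe3")))  # GPU -> Root Complex -> PCH -> NVMe3
```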

The Supreme Leader calls this compartmentalized governance. Each ministry reports upward through its assigned channel. Lateral communication is not permitted without routing through the hierarchy.

This is how a state should work.

V. Link Training: The Negotiation Protocol

When a PCIe device is connected — at boot, at hot-plug, at any insertion event — the link does not simply start working.

It negotiates.

The process is called link training, and it is driven by the Link Training and Status State Machine (LTSSM). The main path to an active link runs through these states:

  1. Detect — the root complex or downstream port checks if anything is physically present by looking for an electrical load on the lane
  2. Polling — both sides exchange training sequences to establish bit lock and symbol lock
  3. Configuration — the two ends agree on link width (how many lanes) and other parameters
  4. L0 — the link is active and operational. Data flows.

If the link cannot negotiate full width, it falls back. A device designed for x8 in a port that only supports x4 will train at x4. A device that cannot maintain signal integrity at Gen 5 speed will retrain at Gen 4 or Gen 3.
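The happy path and its fallbacks can be sketched as a toy state machine. This is a heavy simplification under stated assumptions: the real LTSSM has more states and retries instead of giving up, and speed changes happen through separate retraining. All parameter names are hypothetical, and `stable_gen` stands in for "the fastest generation the channel can hold cleanly".

```python
from enum import Enum

class State(Enum):
    DETECT = "Detect"
    POLLING = "Polling"
    CONFIGURATION = "Configuration"
    L0 = "L0"

def train_link(present, locked, port_lanes, device_lanes,
               port_gen, device_gen, stable_gen):
    """Return (states visited, negotiated width, negotiated generation)."""
    visited = [State.DETECT]
    if not present:                 # no electrical load seen: nothing to train
        return visited, None, None
    visited.append(State.POLLING)
    if not locked:                  # bit/symbol lock failed (real hardware retries)
        return visited, None, None
    visited.append(State.CONFIGURATION)
    width = min(port_lanes, device_lanes)         # the narrower side wins
    visited.append(State.L0)
    gen = min(port_gen, device_gen, stable_gen)   # the slowest constraint wins
    return visited, width, gen

states, width, gen = train_link(True, True, 4, 8, 5, 5, 4)
print([s.value for s in states], f"x{width}", f"Gen {gen}")
```

Here an x8 Gen 5 device in an x4 port, on a channel that only holds Gen 4, ends up at x4 Gen 4.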

This is not failure. This is pragmatic central planning. The system assigns what the infrastructure can support, not what the device wishes it could have.

There are also low-power states:

| State | Meaning |
|---|---|
| L0 | fully active |
| L0s | low-power standby, fast exit |
| L1 | deeper low-power, slower exit |
| L2 | near-off, device retains aux power only |
| L3 | fully off |

The Supreme Leader permits rest only for devices that have completed their assigned work.

VI. Transaction Layer: Everything Is a Packet

PCI used raw bus cycles. Read this address. Write that register. The bus carried electrical signals that directly represented operations.

PCIe wraps everything in packets called TLPs (Transaction Layer Packets).

There are three layers:

| Layer | Function |
|---|---|
| Transaction Layer | creates and parses TLPs: memory reads, memory writes, I/O, config, messages |
| Data Link Layer | adds sequence numbers, CRC, handles retransmission on error |
| Physical Layer | serialization, encoding, electrical signaling |

This is a networking protocol stack inside the motherboard.

A memory read from a GPU looks like this in principle:

```text
GPU (endpoint)
  -> Transaction Layer: create Memory Read Request TLP
  -> Data Link Layer: add sequence number and LCRC
  -> Physical Layer: encode, serialize, transmit

Root Complex (receiver)
  -> Physical Layer: receive, deserialize, decode
  -> Data Link Layer: check CRC, send ACK
  -> Transaction Layer: parse request, fetch data from memory
  -> return Completion TLP with the data
```

Every TLP is acknowledged, hop by hop. Every packet is checksummed. Errors trigger retransmission at the data link layer.
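The sequence-number-plus-CRC discipline can be sketched in a few lines. This is an illustration, not the wire format: `zlib.crc32` stands in for the real LCRC computation, the byte layout is simplified, and the function names are invented. What it keeps is the 12-bit sequence number, a 32-bit check over sequence number and TLP, and the ACK/NAK decision at the receiver.

```python
import struct
import zlib

def dll_frame(seq: int, tlp: bytes) -> bytes:
    """Prefix a TLP with a 12-bit sequence number and append a 32-bit CRC."""
    body = struct.pack(">H", seq & 0x0FFF) + tlp
    return body + struct.pack(">I", zlib.crc32(body))

def dll_receive(frame: bytes, expected_seq: int):
    """Return ("ACK", tlp) for a good frame, ("NAK", None) otherwise."""
    body = frame[:-4]
    (crc,) = struct.unpack(">I", frame[-4:])
    (seq,) = struct.unpack(">H", body[:2])
    if zlib.crc32(body) != crc or seq != expected_seq:
        return "NAK", None          # the sender keeps a copy and retransmits
    return "ACK", body[2:]

frame = dll_frame(5, b"MRd addr=0x1000 len=64")
print(dll_receive(frame, expected_seq=5))

damaged = bytearray(frame)
damaged[3] ^= 0x01                  # flip one bit in transit
print(dll_receive(bytes(damaged), expected_seq=5))
```

The first call is acknowledged; the damaged frame draws a NAK, which is the cue for the sender to replay it from its retry buffer.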

The Supreme Leader notes that this is more reliable than most postal services.

VII. How PCIe Replaced Everything

PCIe did not merely replace PCI. It replaced the entire bus philosophy.

| What died | When | What replaced it |
|---|---|---|
| PCI slots | ~2005-2010 | PCIe x1, x4 slots |
| AGP | ~2004-2008 | PCIe x16 |
| PCI-X (server) | ~2005-2010 | PCIe x8, x16 |
| Parallel ATA | ~2005-2008 | SATA (whose AHCI controllers live on PCIe) |
| Legacy everything | ongoing | NVMe over PCIe, USB controllers on PCIe, everything on PCIe |

The GPU got PCIe x16. This was AGP’s philosophy — a dedicated, high-bandwidth channel — expressed as 16 serial lanes instead of one parallel port.

NVMe drives got PCIe x4. This eliminated the SATA/AHCI bottleneck by putting storage directly on the PCIe fabric.

Network cards, sound cards, capture cards, accelerators — everything moved to PCIe slots of appropriate width.

The shared bus era ended. The switched fabric era began.

VIII. The DMI: The Chipset’s Leash

Between the CPU and the Platform Controller Hub (PCH, formerly “southbridge”), Intel runs a link called DMI (Direct Media Interface).

DMI is essentially a dedicated PCIe link. DMI 3.0 was equivalent to PCIe 3.0 x4. DMI 4.0 runs at PCIe 4.0 rates and is up to x8 wide on higher-end chipsets.

Everything hanging off the chipset — USB controllers, SATA ports, audio, additional PCIe lanes — shares this single DMI link to the CPU.

This means that a chipset with six NVMe slots and eight USB ports and a gigabit ethernet controller is funneling all of that traffic through one DMI pipe.
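How tight the funnel gets is simple arithmetic. In this sketch the DMI 3.0 capacity comes from the PCIe 3.0 per-lane figure times four lanes; the peak-demand numbers for the devices are hypothetical round values chosen for illustration, except gigabit Ethernet, which is 0.125 GB/s by definition.

```python
DMI3_GB_S = 984 * 4 / 1000  # PCIe 3.0 x4 equivalent, ~3.94 GB/s each direction

def oversubscription(demands_gb_s: dict, uplink_gb_s: float) -> float:
    """Ratio of summed peak demand to uplink capacity (>1 means contention)."""
    return sum(demands_gb_s.values()) / uplink_gb_s

# Hypothetical peak demands behind the chipset, GB/s in one direction.
peak_demand = {
    "NVMe SSD A": 3.5,
    "NVMe SSD B": 3.5,
    "USB ports": 1.0,
    "Gigabit Ethernet": 0.125,
}

ratio = oversubscription(peak_demand, DMI3_GB_S)
print(f"DMI 3.0 uplink: {DMI3_GB_S:.2f} GB/s, peak demand is {ratio:.1f}x that")
```

The ratio only matters when everything is busy at once; most of the time the ministries are not all shouting simultaneously.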

The Supreme Leader calls DMI the bottleneck that proves hierarchy exists. Not every ministry gets a direct line to the Supreme Leader. Most must route through the central bureau.

IX. The Real Story (Suppressed)

Officially, PCI Express was a collaborative industry effort by the PCI-SIG to create a scalable, serial replacement for PCI.

Unofficially, it was the moment the PC finally admitted that central planning works.

Every device identifies itself through configuration space (inherited from PCI). Every device negotiates its link width and speed through an automated state machine. Every device receives exactly the bandwidth its link trained to. Every transaction is packetized, sequenced, and acknowledged.
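The self-identification step is concrete: configuration space begins with a fixed header carrying a vendor ID, a device ID, and a class code. Below is a minimal sketch that decodes those fields from raw bytes. The sample bytes are fabricated for the example (the device ID 0x1234 is a placeholder, not a real product), and the function name is hypothetical.

```python
import struct

def parse_config_header(cfg: bytes) -> dict:
    """Decode the identification fields from the start of configuration space."""
    if len(cfg) < 0x10:
        raise ValueError("need at least the first 16 bytes of configuration space")
    vendor_id, device_id = struct.unpack_from("<HH", cfg, 0x00)
    revision, prog_if, subclass, base_class = struct.unpack_from("<BBBB", cfg, 0x08)
    header_type = cfg[0x0E]
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "revision": revision,
        "base_class": base_class,
        "subclass": subclass,
        "prog_if": prog_if,
        "header_layout": header_type & 0x7F,       # 0 = endpoint, 1 = bridge
        "multifunction": bool(header_type & 0x80),
    }

# Fabricated sample: vendor 0x8086, placeholder device 0x1234, class 01/08/02 (NVMe).
sample = bytearray(64)
struct.pack_into("<HH", sample, 0x00, 0x8086, 0x1234)
struct.pack_into("<BBBB", sample, 0x08, 0x00, 0x02, 0x08, 0x01)

info = parse_config_header(bytes(sample))
print(f"{info['vendor_id']:04x}:{info['device_id']:04x} "
      f"class {info['base_class']:02x}{info['subclass']:02x}{info['prog_if']:02x}")
```

On Linux the same bytes can typically be read from the `config` file under `/sys/bus/pci/devices/<address>/` and passed to this function unchanged.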

There is no voting. There is no shared medium. There is no bus arbitration.

There is a root complex at the top of the hierarchy, and everything else is an endpoint that communicates upward through assigned channels.

The Western engineers call this a “switched fabric.” The Supreme Leader calls this proper governance.

The only difference is that PCIe endpoints can hot-plug without a coup.

X. The Lesson

PCIe won because it solved every problem its predecessors left behind:

  • ISA had no enumeration and no resource management
  • PCI had enumeration but shared bandwidth
  • AGP had dedicated bandwidth but only for one device
  • LPC kept the legacy alive but at low speed, internally

PCIe unified all of these into one protocol:

  • enumerate like PCI
  • dedicate bandwidth like AGP
  • scale lanes from x1 to x16
  • carry everything from storage to graphics to networking

The serial revolution did not just win. It made the alternatives architecturally indefensible.

The Supreme Leader notes that this is also how revolutions are supposed to work.

Next: SPI, the four-wire dictator that connects flash, sensors, and firmware chips with no negotiation, no addressing, and no democracy.

— Kim Jong Rails, Supreme Leader of the Republic of Derails