This page is a detailed technical reference for the concepts underlying modem design. It assumes the reader has already encountered these ideas in the History section and wants to go deeper. Formulas are included where they are genuinely illuminating; they are explained in words alongside the mathematics.
In communications engineering, bandwidth refers to the range of frequencies a channel can carry, measured in hertz. A standard telephone voice channel passes frequencies between approximately 300 Hz and 3400 Hz — a bandwidth of about 3100 Hz. Frequencies outside this range are attenuated (reduced in amplitude) by the telephone exchange's filters, which were designed to pass human speech and nothing else.
Bandwidth is the fundamental resource that limits communication speed. More bandwidth means more room to carry signal; less bandwidth means less room. This relationship is quantified by the theorems below.
Harry Nyquist proved in 1928 that a noiseless channel of bandwidth B hertz can carry at most 2B independent symbols per second without intersymbol interference (ISI). This maximum symbol rate is the Nyquist rate.
For a telephone channel with B = 3100 Hz, the Nyquist rate is 6200 baud. This is an absolute upper limit imposed by physics: exceeding it causes adjacent symbols to blur into each other at the receiver, making them undecodable regardless of how good the circuitry is.
Critically, the Nyquist theorem says nothing about how much information each symbol carries. If each symbol represents k bits, the bit rate is the symbol rate multiplied by k: bit rate = baud × k = baud × log2(M) bits per second, where M is the number of distinct signal states.
A modem running at 3200 baud with 64-QAM (M = 64, k = 6 bits/symbol) achieves a bit rate of 3200 × 6 = 19,200 bps. The same channel with 256-QAM (k = 8) would yield 25,600 bps. The Nyquist rate limits the symbol rate; the modulation order determines how many bits each symbol carries.
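The arithmetic can be checked with a one-line helper (a minimal sketch; the function name is illustrative):

```python
import math

def bit_rate(symbol_rate_baud: int, constellation_size: int) -> int:
    """Bit rate = symbol rate x bits per symbol, where bits/symbol = log2(M)."""
    bits_per_symbol = int(math.log2(constellation_size))
    return symbol_rate_baud * bits_per_symbol

print(bit_rate(3200, 64))   # 64-QAM:  3200 x 6 = 19200 bps
print(bit_rate(3200, 256))  # 256-QAM: 3200 x 8 = 25600 bps
```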
Nyquist assumed a noiseless channel. Real channels have noise — thermal noise from resistances, electromagnetic interference, cross-talk from adjacent wires. Claude Shannon showed in 1948 that even in the presence of noise, there is a maximum information rate — the channel capacity C — beyond which reliable communication is impossible regardless of the coding scheme used. Shannon's theorem gives this capacity as C = B log2(1 + SNR), where B is the bandwidth in hertz and SNR is the signal-to-noise ratio, expressed as a linear power ratio.
The SNR is usually expressed in decibels (dB) in engineering practice. To convert: SNR_linear = 10^(SNR_dB / 10). A telephone line with 35 dB SNR has a linear SNR of 10^3.5 ≈ 3162.
The logarithmic relationship between SNR and capacity has an important practical implication: doubling the bandwidth doubles the capacity, but doubling the SNR adds only a constant amount. To double capacity by improving SNR alone, you must square the SNR — an enormous increase in power. This is why modem designers preferred to seek more bandwidth rather than more signal power when trying to push speeds higher.
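The capacity formula and the dB conversion combine into a short calculation (a sketch; the numbers match the 3100 Hz / 35 dB example above):

```python
import math

def shannon_capacity(bandwidth_hz: float, snr_db: float) -> float:
    """Channel capacity C = B * log2(1 + SNR), with SNR given in dB."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

# Telephone channel: 3100 Hz bandwidth, 35 dB SNR
print(round(shannon_capacity(3100, 35)), "bps")
```

The result, roughly 36,000 bps, shows how close V.34's 33.6 kbps came to the theoretical ceiling of the voice channel.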
A real telephone line is not a flat channel. Different frequencies travel at slightly different speeds (group delay distortion) and are attenuated by different amounts (amplitude distortion). A sharp rectangular pulse transmitted at one end arrives smeared and distorted at the other, spreading into adjacent symbol periods — this is intersymbol interference (ISI).
High-speed modems combat ISI with an adaptive equaliser: a digital filter whose coefficients are continuously adjusted to compensate for the specific distortion characteristics of the line. The equaliser is trained during the handshake by sending known test sequences, then maintained adaptively during the data phase. The sophistication of V.34's equaliser — capable of tracking slowly varying line conditions in real time — was one of the major engineering achievements of 1990s modem design.
Modulation is the process of encoding digital information onto an analogue carrier signal by varying one or more of the carrier's properties: frequency, phase, amplitude, or a combination. Each scheme represents a point on the tradeoff between spectral efficiency, noise immunity, and implementation complexity.
FSK encodes data by switching the carrier between two (or more) discrete frequencies. Binary FSK uses one frequency for logic 0 and another for logic 1. The Bell 103 standard used four frequencies to achieve full duplex on a single line: 1070/1270 Hz for the originating modem, 2025/2225 Hz for the answering modem.
FSK is spectrally inefficient: it uses significant bandwidth for a low bit rate. With only 2 signal states (M = 2), each symbol carries exactly 1 bit. At the Nyquist rate of 6200 baud, binary FSK would theoretically deliver 6200 bps — but the frequency separation needed for reliable discrimination reduces the effective bit rate considerably. In practice, Bell 103 FSK operated at 300 baud = 300 bps. FSK was abandoned for higher-speed modems in favour of phase and amplitude modulation.
PSK keeps the carrier frequency constant and encodes data in the phase of the signal — the point in the wave cycle at which each symbol begins. Binary PSK (BPSK) uses two phases (0° and 180°), carrying 1 bit per symbol. Quadrature PSK (QPSK) uses four phases (0°, 90°, 180°, 270°), carrying 2 bits per symbol at the same symbol rate, doubling throughput without increasing bandwidth occupancy.
The Bell 212A and V.22 standards used QPSK at 600 baud to achieve 1200 bps. V.22bis kept the 600 baud symbol rate but moved to a 16-point constellation (4 bits/symbol), combining phase and amplitude modulation — effectively QAM — to reach 2400 bps.
QAM is the dominant modulation scheme for high-speed wireline modems. It simultaneously varies both the amplitude and the phase of the carrier, creating a two-dimensional signal constellation in which each point represents a unique combination of amplitude and phase.
A QAM-M constellation has M points. Each symbol selects one of the M points, so the number of bits per symbol is k = log2(M).
The constellation points are arranged in a square grid. 16-QAM is a 4×4 grid; 64-QAM is an 8×8 grid. As M increases, the points pack more tightly together. The minimum Euclidean distance between adjacent points decreases, making the constellation more susceptible to noise: a smaller perturbation suffices for a received point to be mistaken for its neighbour. This is the fundamental tension of QAM: more bits per symbol requires a cleaner channel.
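The shrinking minimum distance is easy to demonstrate by normalising each square constellation to the same average transmit power and measuring the closest pair of points (a sketch; the unit-power normalisation is one common convention):

```python
import itertools
import math

def square_qam(m: int) -> list[complex]:
    """Square M-QAM grid, scaled to unit average power."""
    side = int(math.isqrt(m))
    levels = [2 * i - (side - 1) for i in range(side)]   # e.g. [-3, -1, 1, 3]
    pts = [complex(a, b) for a in levels for b in levels]
    power = sum(abs(p) ** 2 for p in pts) / len(pts)
    return [p / math.sqrt(power) for p in pts]

def min_distance(pts: list[complex]) -> float:
    """Smallest Euclidean distance between any two constellation points."""
    return min(abs(p - q) for p, q in itertools.combinations(pts, 2))

# At equal transmit power, packing more points shrinks the minimum distance:
for m in (4, 16, 64):
    print(m, round(min_distance(square_qam(m)), 3))
```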
V.34 used up to 960-point non-square constellations, optimised using trellis-coded modulation (TCM) to add redundancy in a way that improved noise immunity without reducing data throughput. TCM, invented by Ungerboeck in 1982, was one of the key technologies enabling V.34 to approach the Shannon limit.
OFDM departs from single-carrier modulation entirely. Instead of one carrier signal modulated at high symbol rate, OFDM uses a large number of narrow sub-carriers, each modulated at a low symbol rate. The sub-carriers are spaced at intervals of 1/T Hz, where T is the symbol duration, which makes them mathematically orthogonal — they do not interfere with each other even though their spectra overlap.
Each sub-carrier can independently carry a different QAM constellation, adapted to the channel conditions at that frequency. Sub-carriers in a quiet part of the spectrum carry high-order QAM (256-QAM, 1024-QAM). Sub-carriers in a noisy or faded part of the spectrum carry lower-order QAM or are disabled. This per-sub-carrier adaptation is what makes OFDM so efficient in real-world channels.
OFDM is implemented using the Fast Fourier Transform (FFT) and its inverse (IFFT). At the transmitter, the IFFT converts the frequency-domain symbols (the per-sub-carrier QAM values) into a time-domain waveform for transmission. At the receiver, the FFT converts the received waveform back into frequency-domain symbols for demodulation. The FFT/IFFT pair is why OFDM became practical only when digital signal processing hardware became fast enough to execute it in real time — which happened in the early 1990s for ADSL, and in the mid-2000s for mobile systems.
A cyclic prefix is added to each OFDM symbol: a copy of the end of the symbol is prepended to its beginning. The cyclic prefix absorbs the effect of multipath delay spread, preventing inter-symbol interference between successive OFDM symbols, at the cost of a small reduction in spectral efficiency.
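The IFFT/FFT round trip and the cyclic prefix can be sketched in a few lines (a toy illustration using a naive DFT rather than a real FFT; the sub-carrier count and prefix length are arbitrary):

```python
import cmath

def idft(freq):
    """Transmitter: frequency-domain symbols -> time-domain waveform."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def dft(time):
    """Receiver: time-domain waveform -> frequency-domain symbols."""
    n = len(time)
    return [sum(time[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Eight sub-carriers, each carrying one QPSK symbol
symbols = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]
waveform = idft(symbols)

# Cyclic prefix: copy the last 2 samples to the front before transmission
cp = 2
tx = waveform[-cp:] + waveform

# Receiver strips the prefix, then recovers the symbols with a DFT
rx = dft(tx[cp:])
print([complex(round(s.real), round(s.imag)) for s in rx])  # the original QPSK symbols
```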
When two modems connect, they must negotiate a common operating mode before exchanging data. This negotiation — the handshake — is governed by a precisely specified protocol. The sequence of sounds audible during a dial-up connection is the acoustic manifestation of this negotiation.
V.8 (1994) defines the opening exchange of a modern modem handshake. When the answering modem detects an incoming call, it transmits an ANSam tone: a 2100 Hz carrier with a 15 Hz amplitude modulation. The 15 Hz modulation is the distinguishing feature — it signals to the calling modem that the answering end supports V.8 negotiation rather than the older V.25 answer tone.
The calling modem responds with a CM (Call Menu) signal listing the data modes it supports: V.34, V.32bis, V.22bis, and any others. The answering modem selects the highest common mode from the list and transmits a JM (Joint Menu) response confirming the selection. The two modems then proceed to the selected mode's own handshake procedure.
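The mode-selection step amounts to intersecting the two capability lists and preferring the fastest entry (an illustrative sketch; the real CM/JM signals encode modes as bit fields, not strings):

```python
# Modes ordered fastest-first (an illustrative ordering, not the V.8 bit encoding)
PREFERENCE = ["V.34", "V.32bis", "V.32", "V.22bis", "V.22"]

def select_mode(caller_modes, answerer_modes):
    """Pick the highest common mode, as the JM response does."""
    common = set(caller_modes) & set(answerer_modes)
    for mode in PREFERENCE:
        if mode in common:
            return mode
    return None  # no common mode: the call cannot proceed to data

print(select_mode(["V.34", "V.32bis", "V.22bis"], ["V.32bis", "V.22bis"]))  # V.32bis
```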
The V.34 handshake is among the most sophisticated in modem history. After V.8 mode selection, the two modems exchange a sequence of precisely defined phases:
Phase 1 — Line probing. Each modem transmits a series of tones across the full voice band. By measuring the amplitude and phase of these tones as received, each modem builds a detailed model of the channel's frequency response — which frequencies are attenuated, which suffer phase distortion, and where the noise is highest.
Phase 2 — Rate negotiation. Using the channel model, each modem calculates the maximum symbol rate and constellation size it can support in each direction. The modems exchange this information and agree on operating parameters: symbol rate (2400, 2743, 2800, 3000, 3200, or 3429 baud), constellation size, precoding parameters, and shell mapping configuration.
Phase 3 — Equaliser training. Each modem transmits known training sequences that allow the other to compute the coefficients of its adaptive equaliser. The equaliser is adjusted iteratively until the error rate on the training sequence falls below a threshold.
Phase 4 — Final synchronisation. The modems exchange their final operating parameters, synchronise scrambler and descrambler states, and transition to data mode. The CONNECT message appears on screen. Total handshake time: typically 20–35 seconds for V.34.
The V.90 handshake follows V.8 mode identification but diverges significantly in the downstream direction. Because the ISP's modem is connected digitally to the telephone network, it does not need to probe the line in the same way as V.34. Instead, it sends a sequence of known PCM codewords — digital samples that will be converted to analogue only at the subscriber's local exchange. The subscriber's modem analyses the received analogue signal to determine which PCM levels are cleanly distinguishable, building a map of usable downstream constellation points.
V.92 introduced Quick Connect: the modem stores the channel characteristics from the previous connection in non-volatile memory. On the next call to the same number, it can skip or abbreviate the line probing phase, reducing handshake time from ~25 seconds to as little as 10 seconds. V.92 also added Modem on Hold (MoH), allowing the data connection to be suspended for up to 16 minutes while a voice call is taken on the same line, then resumed without a full re-handshake.
A telephone line is not a perfect transmission medium. Impulse noise (brief spikes caused by switching equipment, lightning, or nearby electrical equipment), sustained interference from radio transmitters, and thermal noise all cause occasional bit errors. Without error correction, a single corrupted bit in a file transfer would silently corrupt the data. The probability of an uncorrected bit error on a typical telephone line in good condition is around 10^-4 to 10^-5 — roughly one error per 10,000 to 100,000 bits. For a 33.6 kbps connection, this means a potential error every few seconds.
LAPM, defined in ITU V.42, is the primary error correction protocol for dial-up modems. It operates between the two modems at the link layer, below the level of the application data. LAPM divides the data stream into frames of variable length (up to 128 bytes by default, negotiable up to 256 bytes). Each frame carries a CRC (Cyclic Redundancy Check) — a mathematical checksum computed from the frame's contents.
At the receiver, the CRC is recomputed from the received data and compared with the transmitted CRC. If they match, the frame is acknowledged with an RR (Receive Ready) response and the sender advances to the next frame. If they do not match — indicating a transmission error — the receiver sends a REJ (Reject) response, and the sender retransmits the corrupted frame and all subsequent unacknowledged frames.
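The recompute-and-compare step can be sketched with a bitwise CRC (the polynomial shown is CRC-16-CCITT, the HDLC-family checksum; treating it as the exact V.42 wire format is an assumption of this sketch):

```python
def crc16_ccitt(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16-CCITT: shift each byte through the polynomial MSB-first."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

frame = b"hello, modem"
sent_crc = crc16_ccitt(frame)

# Receiver recomputes the CRC and compares; a single flipped bit changes it
assert crc16_ccitt(frame) == sent_crc                   # match: frame acknowledged (RR)
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
assert crc16_ccitt(corrupted) != sent_crc               # mismatch: frame rejected (REJ)
```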
LAPM uses a sliding window protocol: the sender may have up to 15 unacknowledged frames outstanding at any time (window size 15). This allows continuous data flow even on connections with significant round-trip delay, without waiting for each frame to be individually acknowledged before sending the next.
V.42bis defines a lossless data compression algorithm applied to the data stream between the two modems. It is based on the LZW (Lempel-Ziv-Welch) algorithm, a dictionary-based scheme that replaces recurring byte sequences with shorter codes.
The V.42bis dictionary has up to 2048 entries, each up to 250 bytes long. As data passes through the compressor, it builds the dictionary adaptively: frequently occurring sequences are assigned short codes, reducing the number of bits needed to represent them. The compression ratio depends entirely on the data:
Text and HTML: 3:1 to 4:1 compression ratio typical. Effective throughput of a 33.6 kbps modem with V.42bis on text could approach 115 kbps.
Already-compressed data (ZIP, JPEG, MP3): near 1:1, sometimes worse than uncompressed due to compression overhead. V.42bis includes a bypass mode that switches off compression when it detects incompressible data.
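The dictionary-building idea can be sketched with a textbook LZW compressor (illustrative; V.42bis differs in its code sizes, dictionary limits, and bypass handling):

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: grow a dictionary of seen sequences, emit one code per match."""
    dictionary = {bytes([i]): i for i in range(256)}  # start with all single bytes
    next_code = 256
    current = b""
    output = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate              # extend the current match
        else:
            output.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        output.append(dictionary[current])
    return output

text = b"the theme of the thesis " * 20
codes = lzw_compress(text)
print(len(text), "bytes in,", len(codes), "codes out")  # repetitive text compresses well
```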
V.44, introduced with V.92, replaced V.42bis with a more efficient algorithm (LZJH) offering better compression ratios on web content, achieving up to 6:1 on typical HTML pages.
The local loop is the copper wire pair connecting the subscriber's premises to the telephone exchange (also called the central office). In most countries this wire was installed over a period of decades from the 1920s to the 1980s, using copper with a diameter of 0.4 mm, 0.5 mm, or 0.6 mm (thicker wire has lower resistance and supports longer loops). The distance from subscriber to exchange varies from a few hundred metres in dense urban areas to several kilometres in rural areas.
The local loop has a low-pass characteristic: it passes low frequencies well and increasingly attenuates higher frequencies with distance. At 1 MHz, a 3 km loop may attenuate the signal by 40–50 dB — a factor of 10,000 to 100,000 in power. This is why DSL speed depends so strongly on line length: the further from the exchange, the more the high-frequency sub-carriers are attenuated, and the fewer bits each can carry.
ADSL divides the local loop's frequency spectrum into three bands using passive filters called splitters or microfilters:
0–4 kHz: Plain Old Telephone Service (POTS). The analogue voice signal, unchanged. A telephone plugged into this band works exactly as before DSL installation.
25–138 kHz: ADSL upstream. Data from the subscriber to the exchange.
138 kHz–1.1 MHz: ADSL downstream. Data from the exchange to the subscriber. The downstream band is wider because downstream traffic (web pages, downloads) dominates in typical usage — hence "asymmetric."
ADSL2+ extends the downstream band to 2.2 MHz. VDSL2 extends it to 17.664 MHz (profile 17a) or 35.328 MHz (profile 35b, marketed as super-vectoring or Vplus), achieving speeds of 100+ Mbit/s on very short loops.
Within the upstream and downstream bands, ADSL uses DMT (Discrete Multitone) modulation: 256 sub-carriers each 4.3125 kHz wide. Sub-carriers 1–6 are unused (guard band). Sub-carriers 7–31 carry the upstream channel. Sub-carriers 32–255 carry the downstream channel (in the default configuration; exact boundaries are negotiated during training).
During the ADSL initialisation sequence (training), the DSLAM sends known signals on each sub-carrier. The subscriber's modem measures the SNR on each sub-carrier and reports the results back. The DSLAM then assigns a bit loading to each sub-carrier using the following rule:
A sub-carrier with high SNR carries up to 15 bits per symbol (32768-QAM). A sub-carrier with low SNR carries 1 or 2 bits. A sub-carrier below the noise floor carries 0 bits and is disabled. The aggregate downstream bit rate is the sum of all sub-carrier bit loadings multiplied by the symbol rate (4000 symbols/second in ADSL).
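The bit-loading rule is often written as b = floor(log2(1 + SNR/Γ)), where Γ is the "SNR gap" set by the target error rate and coding margin. A sketch (the 9.8 dB gap and the example SNRs are illustrative assumptions):

```python
import math

def bit_loading(snr_db: float, gap_db: float = 9.8, max_bits: int = 15) -> int:
    """Bits per symbol for one sub-carrier: b = floor(log2(1 + SNR/gap)),
    capped at 15 bits (32768-QAM) and floored at 0 (sub-carrier disabled)."""
    snr = 10 ** (snr_db / 10)
    gap = 10 ** (gap_db / 10)
    return min(max_bits, max(0, math.floor(math.log2(1 + snr / gap))))

# A quiet sub-carrier, a marginal one, and one near the noise floor:
for snr_db in (55, 18, 3):
    print(snr_db, "dB ->", bit_loading(snr_db), "bits")
```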
At the telephone exchange, a DSLAM (Digital Subscriber Line Access Multiplexer) terminates the DSL connections from all subscribers in its serving area. The DSLAM is a rack-mounted device containing one line card per subscriber, each running the exchange-side DSL modem. It aggregates the data from all subscriber lines onto a high-speed uplink — typically Gigabit Ethernet or ATM — connecting to the ISP's core network.
In FTTC (Fibre to the Cabinet) deployments, the DSLAM is moved from the telephone exchange into a street cabinet close to the subscribers. A fibre connection runs from the exchange to the cabinet; short copper tails run from the cabinet to individual premises. The shorter copper distance allows VDSL2 to deliver much higher speeds than would be possible from the exchange.
A mobile modem communicates with a base station (eNodeB in LTE, gNodeB in 5G) over a radio channel — the air interface. Unlike a fixed copper wire, the air interface is a shared medium: many users transmit in the same geographic area simultaneously. The fundamental challenge of mobile system design is sharing the available radio spectrum among many users efficiently while managing interference between them.
Three multiple access techniques have dominated mobile data networks:
TDMA (Time Division Multiple Access, used in GSM): Users take turns transmitting in assigned time slots. Each user has exclusive use of the full channel bandwidth for a brief period.
CDMA (Code Division Multiple Access, used in 3G WCDMA/CDMA2000): All users transmit simultaneously across the full bandwidth, distinguished by unique spreading codes.
OFDMA (Orthogonal Frequency Division Multiple Access, used in LTE and 5G): Different sub-carriers — or groups of sub-carriers called resource blocks — are assigned to different users. The scheduler allocates resource blocks based on each user's channel quality and traffic demand.
An LTE resource block (RB) consists of 12 consecutive sub-carriers (180 kHz total) over one slot (0.5 ms). A standard LTE subframe is 1 ms = 2 slots. In a 10 MHz LTE channel, there are 50 resource blocks available per subframe.
The scheduler in the base station allocates resource blocks to users every 1 ms (the Transmission Time Interval, TTI). It uses channel-aware scheduling: users with better instantaneous channel conditions receive more resource blocks. Because different users experience different fading conditions at any given moment — one user may be near the base station while another is in a fading dip — the scheduler exploits this multiuser diversity to increase overall cell throughput.
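One TTI of channel-aware allocation can be sketched as follows (a max-CQI toy: each resource block simply goes to the user with the best instantaneous quality on it; real LTE schedulers such as proportional fair also weight past throughput, and the random channel qualities here are purely illustrative):

```python
import random

random.seed(42)  # deterministic fading pattern for the example

def schedule_tti(num_rbs: int, cqi: dict[str, list[float]]) -> dict[int, str]:
    """Assign each resource block to the user with the best channel quality on it."""
    return {rb: max(cqi, key=lambda user: cqi[user][rb]) for rb in range(num_rbs)}

# Per-RB channel quality for three users in a 10 MHz cell (50 RBs), randomly faded
users = ["alice", "bob", "carol"]
cqi = {u: [random.random() for _ in range(50)] for u in users}

allocation = schedule_tti(50, cqi)
counts = {u: sum(1 for v in allocation.values() if v == u) for u in users}
print(counts)  # each user wins the RBs where their fading happens to be favourable
```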
MIMO uses multiple antennas at both the transmitter and receiver to increase capacity without additional bandwidth. With N_T transmit antennas and N_R receive antennas in a rich scattering environment, the channel capacity scales approximately as C ≈ min(N_T, N_R) × B log2(1 + SNR): min(N_T, N_R) parallel spatial streams, each with roughly the capacity of a single-antenna link.
LTE Release 8 defined downlink MIMO with up to four antenna ports, though 2×2 (2 transmit, 2 receive antennas, 2 spatial streams) was typical in early deployments. LTE Advanced extended this to 8×8 MIMO. 5G NR uses Massive MIMO: base stations with 64 or 128 antenna elements, serving many users simultaneously with independent beams.
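The scaling rule can be sketched numerically (a rule-of-thumb approximation, not the exact log-det capacity of a specific channel matrix):

```python
import math

def mimo_capacity_approx(n_tx: int, n_rx: int, bandwidth_hz: float, snr_db: float) -> float:
    """High-scattering approximation: min(N_T, N_R) parallel streams, each
    carrying roughly the single-antenna Shannon capacity."""
    streams = min(n_tx, n_rx)
    snr = 10 ** (snr_db / 10)
    return streams * bandwidth_hz * math.log2(1 + snr)

# 20 MHz channel at 20 dB SNR: capacity grows linearly with the stream count
for nt, nr in ((1, 1), (2, 2), (4, 4)):
    print(f"{nt}x{nr}: {mimo_capacity_approx(nt, nr, 20e6, 20) / 1e6:.0f} Mbit/s")
```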
Beamforming uses an array of antennas to concentrate transmitted energy in the direction of the intended receiver, rather than broadcasting equally in all directions. By adjusting the phase and amplitude of the signal fed to each antenna element, the array creates constructive interference in the target direction and destructive interference elsewhere.
In 5G Massive MIMO systems, the base station simultaneously maintains many narrow beams — one per active user. Each user receives a beam pointed directly at them, maximising received signal strength and minimising interference to other users. The base station continuously tracks each user's position and adjusts the beam accordingly. This multi-user MIMO (MU-MIMO) capability is one of the key technologies enabling 5G to support very high user densities.
A single LTE or 5G carrier occupies a defined bandwidth: 1.4, 3, 5, 10, 15, or 20 MHz for LTE; up to 100 MHz for sub-6 GHz 5G; up to 400 MHz for mmWave 5G. Carrier aggregation (CA) allows a device to simultaneously use multiple carriers — potentially on different frequency bands — and combine their throughput. LTE Advanced supports up to 5 component carriers (100 MHz total); 5G NR Advanced supports up to 16 component carriers.
Carrier aggregation is the mobile equivalent of the channel bonding used in DOCSIS 3.0: the same principle of combining multiple independent channels to multiply throughput, applied to radio spectrum rather than coaxial cable.