ISP Column | ReadFlow.Org

The ISP Column

A column on things Internet Other Formats:

A Transport Protocol’s View of Starlink

May 2024

Geoff Huston

Digital communications systems always represent a collection of design trade-offs. Maximising one characteristic of a system may impair others, and various communications services may chose to optimise different performance parameters based on the intersection these design decisions with the physical characteristics of the communications medium. In this article I’ll look at the Starlink service [1], and how TCP, the workhorse transport protocol of the Internet, interacts with the characteristics of the Starlink service.

Figure 1 – Orbiting Bodies from Newton’s Principia Mathematica

To start, its useful to recall a small item of Newtonian physics from 1687 [2]. On the surface of the earth if you fire a projectile horizontally it will fall back to earth due to the combination of the effects of the friction from the earth’s atmosphere and the earth’s gravitational force. However, assuming that the earth has no friction-inducing atmosphere, then if you fire this projectile horizontally fast enough it will not return to the earth, but head into space. If you are high enough to clear various mountains that may be in the way, there is, however, a critical velocity where the projectile will be captured by the earth’s gravity and neither fall to ground nor head out into space (Figure 1).

This orbital velocity at the surface of the earth is some 40,320 km/sec. The orbital velocity decreases with altitude, and at an altitude of 35,786 km above the surface of the earth the orbital velocity of the projectile relative to a point on the surface of the spinning earth is 0 km/sec. This is the altitude of a geosynchronous equatorial orbit, where the orbiting object appears to sit at a fixed location in the sky.

Geosynchronous Services

Geosynchronous satellites were the favoured approach for the first wave of satellite-based communications services. Each satellite could “cover” an entire hemisphere. If the satellite was on the equatorial plane, then it was at a fixed location in the sky with respect to the earth, allowing the use of large antenna. These antennas were able to operate at a low signal to noise ratio, allowing the signal modulation to use an encoding with a high density of discrete phase amplitude points, which lifted the capacity of the service.

All this must be offset against the less favourable aspects of a geosynchronous service. Consideration of crosstalk interference between adjacent satellites in geosynchronous orbits using the same radio frequencies resulted in international agreements that require a 2° spacing for geosynchronous satellites that use the same frequency, so this orbital slot is a limited resource that is limited to just 180 spacecraft if they all use K band (18 – 27Ghz) radio systems. At any point on the earth there is an upper bound to the signal capacity that can be received (and sent) using geosynchronous services.

Depending on whether the observer is on the equator directly beneath the satellite, or further away from this point, a geosynchronous orbit satellite is between 35,760 and 42,664 km away, so a signal round trip time to the geosynchronous satellite and back will between 238ms and 284ms in terms of signal propagation time. In IP terms, that’s a round trip time of between 477 and 569ms and to this needs to be added the signal encoding and decoding times. In addition, there is a delay for the signal to be passed between the terrestrial end points and the satellite earth stations. In practice, a round trip time of around two thirds of a second (660ms) for Internet paths that include a geosynchronous satellite service is a common experience.

This extended latency means that the endpoints need to use large buffers to hold a copy of all the unacknowledged data, as is required by the TCP protocol. TCP is a feedback-governed governed protocol, using ACK pacing. The longer the round time the greater the lag in feedback, and the slower the response from end points to congestion or to available capacity. The congestion considerations lead to the common use of large buffers in the systems driving the satellite circuits which can further exacerbate congestion-induced instability. In geosynchronous service contexts, individual TCP sessions are more prone to instability and experience longer recovery times following low events as compared to their terrestrial counterparts, when such counterparts exist [3].

Low Earth Orbit Systems

A potential response to the drawbacks of geosynchronous satellites is to bring the satellite closer to earth. This approach has several benefits. The earth’s spinning iron core generates a magnetic field, which traps energetic charged solar particles and redirects them through what is called the “Van Allen Belt”, thus deflecting solar radiation. Not only does this allow the earth to retain its atmosphere, but it also protects the electronics of orbiting satellites that use an orbital altitude below 2,000 km or so from the worst effects of solar radiation (such as the recent solar storms). It’s also far cheaper to launch satellites into a low earth orbit, and these days SpaceX is able to do so using re-usable rocket boosters.

The reduced distance between the earth and the orbiting satellite reduces the latency in sending a signal to the satellite and back which can improve the efficiency of the end-to-end packet transport protocols that include such satellite circuits.

This group of orbital altitudes, from some 160km to 2,000km, are collectively termed Low Earth Orbit (or LEOs) [4]. The objective here is to keep the satellite’s orbit high enough to prevent it slowing down by grazing the upper parts of the earth’s ionosphere, but not so high that it loses the radiation protection afforded by the Inner Van Allen belt [5]. At a height of 550km, the minimum signal propagation delay to reach the satellite and back from the surface of the earth is just 3.7ms.

But all of this comes with some different issues. At a height of 550km an orbiting satellite can only be seen by a small part of the earth. If the minimum effective elevation to establish communication is 25 degrees of elevation above the horizon, then the satellite’s footprint is a circle with a radius of 940km, or a circle of area 2M km2. (Lower angles of elevation are possible but the longer the path segment through the earth’s atmosphere decreases the signal to noise ratio, compromising the available signal capacity as well as increasing the total path delay.) To provide continuous service to any point on the earth’s surface (510.1M km2) the minimum number of orbiting satellites is 500. This use of a constellation of satellites implies that a LEO satellite-based service is not a simple case of sending a signal to a fixed point in the sky and having that single satellite mirror that signal down to some other earth location. A continuous LEO satellite service needs to hop across a continual sequence of satellites as they pass overhead and switch the virtual circuit path across to successive satellites as they come into view of both the end user and the user’s designated earth station.

At the altitude of 550 km, an orbiting satellite is moving with a relative speed of 27,000 km/hour relative to a point on the earth’s surface and passes across the sky from horizon to horizon in under 5 minutes. This has some implications for the design of the radio component of the service. If the satellite constellation is large enough, then the satellites are close enough to each other that there is no need to use larger dish antennae that require some mechanised steering arrangement that tracks individual satellites in their path across the sky, but this itself it not without its downside. An individual signal carrier might be initially acquired as a weak signal (in relative terms), increases in strength as the satellite’s radio transponder and the earth antenna move into alignment, and weakens again as the satellite moves on. Starlink’s antennae use a phased array arrangement using a grid of smaller antennae on a planar surface. This allows the antenna to be electronically steered by altering the phase difference between each of the antennae in the grid. Even so, this is a relatively coarse arrangement, so the signal quality is not consistent. This implies a constantly varying Signal to Noise Ratio (SNR) as the phased array antenna tracks each satellite during its overhead path.

It appears that Starlink services use dynamic channel rate control as a response to this constantly varying SNR. The transmitter constantly adjusts the modulation and coding scheme to match the current SNR, as described in the IEEE 802.11ac standard. The modulation of this signal uses adaptive phase amplitude modulation, and as the SNR improves the modulator can use a larger number of discrete code points in this phase amplitude space, thus increasing the effective capacity of the service while using a constant frequency carrier signal. What this implies is that the satellite service is attempting to operate at peak carriage efficiency, and to achieve this the transmitter constantly adapts its signal modulation to take advantage of the instantaneous SNR from the satellite system. To the upper layer protocol drivers, the transmission service appears to have a constantly varying channel capacity and latency.

The Starlink satellite’s Ku-band downlink has a total of 8 channels using frequency division multiplexing. Each channel has an analogue bandwidth of 240Mhz. Each channel is broken into frames, which is subdivided using time division multiplexing into 302 intervals, each of 4.4µs, which together with a frame guard interval makes each frame 1,333µs, or 750 frames per second. Each frame contains a header that contains satellite, channel and modulation information [6]. The implication is that there is a contention delay of up to 1.3ms assuming that each active user is assigned at least one interval per frame.

This leaves us with four major contributory factors for variability of the capacity of the Starlink service, namely:

the variance in signal modulation capability, which is a direct outcome of the varying SNR of the signal,

the variance in the satellite path latency due to the relative motion of the satellite and the earth antennae,

the need to perform satellite switching on a regular basis, and

the variability induced by sharing the common satellite transmission medium with other users, which results in slot contention.

One way to see how these variability factors impacts on the service characteristics is to use a capacity measurement tool to measure the service capacity on a regular basis. The results of such a capacity measurement test in a Starlink service are shown in Figure 2. Here the test is a SpeedTest measurement test [7], performed on a 4-hourly basis for the period August 2023 through March 2024. The service appears to have a median value of around 120Mbps of download capacity, with individual measurements reading as high as 370Mbps and as low as 10Mbps, and 15Mbps of upload capacity, with variance of between 5Mbps to 50Mbps.

Figure 2 – Starlink Performance

In Internet terms ping [8] is a very old tool, but at the same time it’s still a very useful tool, which probably explains its longevity. Figure 3 is a plot of a continuous (flood) ping across a Starlink connection from the customer side terminal to the first IP end point behind the Starlink earth station for a 380 second interval. (A “flooding” ping sends a new ping packet each time a packet is received from the remote end).

Figure 3 – Starlink Ping Profile

The first major characteristic that is visible in this ping data is that the minimum latency changes regularly every 15 seconds. It appears that this change correlates to the Starlink user’s terminal being assigned to a different satellite. That implies that the user equipment “tracks” each satellite for a 15 second interval, which corresponds to a tracking angle of 11 degrees of arc.

The second characteristic is that loss events are seen to occur at times of switchover between satellites, as well as occurring less frequently as a result of either obstruction, signal quality or congestion.

Figure 4 – Starlink Ping Profile showing satellite handover

The third is that there is a major increase in latency at the point when the user is assigned to a different spacecraft. The worst case in this data set is a shift from 30ms to 80ms.

Finally, within each 15s satellite tracking interval the latency variation is relatively high. The average variation of jitter between successive RTT intervals is 6.7ms. The latency spikes at handover impose an additional 30ms to 50ms indicating the presence of deep buffers in the system to accommodate the transient issues associated with satellite handover. To illustrate this link behaviour the ping data set has been filterred to remove the effects of the satellite assignment at second 283 and second 298. (Figure 5).

Figure 5 – Starlink Ping Profile showing latency variance

The overall packet loss rate, when measuring using 1-second paced pings over an extended period is a little over 1% as a long-term average loss rate.

TCP Protocol Performance

TCP [9] is an instance of a sliding window positive acknowledgement protocol. The sender maintains a local copy of all data that has been passed into the local system’s IP layer and will only discard that local copy of sent data when it has received a positive acknowledgement from the receiver.

Variants to TCP are based on the variations in the sender’s control of the rate of passing data into the network and variations in the response to data loss. The classic version of TCP is one that uses a linear inflation of the sending window size while there is no loss, and halves this window in response to packet loss. This is the RENO TCP control algorithm. Its use in today’s Internet has been largely supplanted by the CUBIC TCP control algorithm [10] which uses a varying window inflation rate that attempts to stabilise the sending rate at a level just below the build-up of network queues (which ultimately leads to packet loss).

In general terms there is a small set of common assumptions about the characteristics of the network path for such TCP control algorithms:

There is a stable maximal capacity of the path, where the term stability describes a situation where the available path capacity is relatively constant across a number of RTT intervals.

The amount of jitter (variation in end-to-end delay) is low in proportion to the RTT.