Our general strategy is for the Mosh client to make an echo prediction each time the user hits a key, but not necessarily to display this prediction immediately. The predictions are made in groups known as “epochs,” with the intention that either all of the predictions in an epoch will be correct, or none will. An epoch begins tentatively, making predictions only in the background. If any prediction from a certain epoch is confirmed by the server, the rest of the predictions in that epoch are immediately displayed to the user, along with any future predictions in the same epoch.
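The grouping rule can be illustrated with a small Python model. This is a minimal sketch, not Mosh’s implementation: the names `Prediction`, `EpochPredictor`, `predict`, `confirm`, `begin_epoch`, and `visible` are hypothetical, and each prediction is reduced to the single character it expects the host to echo. It shows only how predictions are tagged with the current epoch and how confirming any one of them makes its whole epoch, including predictions made later in that epoch, visible.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    # Hypothetical record: the epoch a prediction belongs to and the
    # character it expects the host to echo.
    epoch: int
    char: str


@dataclass
class EpochPredictor:
    # Hypothetical, simplified model of epoch-grouped echo predictions.
    epoch: int = 0
    confirmed_epochs: set = field(default_factory=set)
    predictions: list = field(default_factory=list)

    def predict(self, char: str) -> Prediction:
        # Every keystroke yields a prediction in the current epoch,
        # whether or not it is displayed yet.
        p = Prediction(self.epoch, char)
        self.predictions.append(p)
        return p

    def confirm(self, p: Prediction) -> None:
        # A single prediction confirmed by the server vouches for its epoch.
        self.confirmed_epochs.add(p.epoch)

    def begin_epoch(self) -> None:
        # Start a new, tentative epoch: later predictions stay in the
        # background until one of them is confirmed.
        self.epoch += 1

    def visible(self) -> list:
        # Only predictions from confirmed epochs are drawn on the screen.
        return [p for p in self.predictions if p.epoch in self.confirmed_epochs]
```

For example, after `predict("l")` and `predict("s")`, `visible()` is empty; once the server confirms the first prediction, both characters become visible, and a later `predict(" ")` in the same epoch is shown immediately. After `begin_epoch()`, new predictions are again held back.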
Some user keystrokes are likely to switch the host’s echo state from echoing to non-echoing, or are otherwise hard to predict; these include the up- and down-arrow keys and control characters. Such keystrokes cause Mosh to lose confidence and increment the epoch, so that future predictions are again made in the background.
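Which keystrokes end an epoch can be sketched with a small classifier. The escape sequences below are the usual xterm encodings of the up and down arrows in normal and application cursor-key modes; treating every single C0 control character (such as ENTER, sent as a carriage return) as unpredictable is a simplifying assumption of this sketch, and a real client may special-case some keys. The names `loses_confidence` and `assign_epochs` are hypothetical, and here the confidence-losing keystroke itself is not predicted; it only advances the epoch.

```python
# Up- and down-arrow encodings in normal (CSI) and application (SS3)
# cursor-key modes, as sent by xterm-compatible terminals.
UP_DOWN_ARROWS = {"\x1b[A", "\x1b[B", "\x1bOA", "\x1bOB"}


def loses_confidence(key: str) -> bool:
    # Hypothetical rule: up/down arrows and single C0 control characters
    # (e.g. ENTER as "\r") are hard to predict and end the current epoch.
    if key in UP_DOWN_ARROWS:
        return True
    return len(key) == 1 and ord(key) < 0x20


def assign_epochs(keys: list) -> list:
    # Tag each predicted keystroke with the epoch it would belong to.
    epoch = 0
    tagged = []
    for key in keys:
        if loses_confidence(key):
            # Lose confidence: later predictions start a new, tentative epoch.
            # The keystroke itself is not predicted in this sketch.
            epoch += 1
            continue
        tagged.append((key, epoch))
    return tagged
```

For instance, `assign_epochs(["l", "s", "\r", "p", "w"])` returns `[("l", 0), ("s", 0), ("p", 1), ("w", 1)]`: the characters typed after ENTER fall into a fresh tentative epoch and are displayed only if the server echoes one of them, which is the behavior wanted when the command just entered turns echo off.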
In practice, this approach accommodates a wide variety of application behaviors, including multi-mode editors like vi (which sometimes echo conventionally and sometimes don’t), and the possibility that the user might type a command at the prompt (e.g., passwd) that stops server-side echoes after the ENTER key is typed.
Because the decision to perform local echo is made entirely based on the application’s observed behavior, applications need not be rewritten to accommodate local echo. Unlike prior work, Mosh’s local echo works even with full-screen programs (like emacs) that put the terminal driver in “raw” mode and do their own echoing.
In typical use, Mosh can immediately display the effects of almost all “typing,” which constitutes more than two-thirds of user keystrokes in our captures. The remaining keystrokes are principally “navigation” (such as “n” to move to the next e-mail message in a mail reader), which cannot be predicted locally.
Server-side assistance for prediction evaluation
For the above algorithm to work properly, the Mosh client must be able to reliably determine whether its echo predictions are correct. Early versions of Mosh attempted to do this with the client only, by simply examining whether a predicted echo was present on the screen by the time the Mosh server had acknowledged the corresponding keystroke.
Unfortunately, in trials, we found that applications sometimes take tens of milliseconds after input is presented to them before echoing it to the screen. The Mosh server can therefore acknowledge an input keystroke before the echo is present in the screen state, which causes the client to conclude that its prediction was incorrect even though the echo is on the way. This produces annoying flicker as the echo is (mistakenly) removed from the screen, then reinstated when it eventually arrives from the server.
Our initial solution to this problem was a client-side timeout, so that a prediction is not considered incorrect until the corresponding keystroke has been acknowledged by the server and a certain amount of time has elapsed. Unfortunately, because network jitter can delay the eventual echo beyond the timeout, this too produced an annoying number of false negatives and resulting flicker. (Conversely, setting the timeout long enough to accommodate large amounts of jitter causes mistaken predictions to linger on the screen for too long.)
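The two client-only checks described above, the early check at acknowledgment time and the later client-side timeout, can be compared with a toy timing model. The function `client_side_verdict`, its parameters, and the millisecond figures in the example are hypothetical, chosen only to illustrate the failure modes; they are not measurements.

```python
from typing import Optional


def client_side_verdict(ack_ms: float, echo_arrival_ms: Optional[float],
                        timeout_ms: float = 0.0) -> str:
    # Hypothetical model of judging a prediction with client-side data only.
    # ack_ms: when the client learns the server acknowledged the keystroke.
    # echo_arrival_ms: when the real echo reaches the client (None if never).
    # timeout_ms: grace period after the ack (0 models the early, naive check).
    deadline = ack_ms + timeout_ms
    if echo_arrival_ms is not None and echo_arrival_ms <= deadline:
        return "correct"
    # The echo is not on screen by the deadline, so the client gives up,
    # even if the echo is merely late (a false negative).
    return "incorrect"
```

With an acknowledgment at 500 ms and an echo arriving 30 ms later, the naive check (`timeout_ms=0`) returns `"incorrect"` although the echo was on its way; a 100 ms timeout accepts it, but if jitter delays the same echo to 680 ms, the timeout check again returns `"incorrect"`. Raising the timeout further hides such false negatives only by delaying the removal of genuinely wrong predictions.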
Our final solution was to implement a server-side timeout of 50 ms, chosen to cover the vast majority of legitimate application echoes on loaded servers while still being short enough to detect mistaken predictions quickly. The terminal object that is synchronized to the client contains an “echo ack” field, representing the latest keystroke that has been presented to the application for at least 50 ms and whose effects ought therefore to be reflected in the current screen. The client has no timeouts of its own; consequently, network jitter does not adversely affect the client’s ability to evaluate whether a prediction is correct. The cost is increased network traffic, because the server often sends an extra datagram 50 ms after a keystroke to convey the echo ack.
In practice, this has eliminated the flicker caused by false negatives.
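The server-side echo ack and the client’s timer-free check can be sketched as follows. The 50 ms threshold and the meaning of the echo ack come from the text; the per-keystroke sequence numbers, the millisecond clock, and the names `EchoAckTracker`, `echo_ack`, and `judge` are assumptions for illustration. The sketch omits state synchronization and the extra datagram the server would send 50 ms after each keystroke.

```python
from bisect import bisect_right
from typing import Optional

ECHO_ACK_DELAY_MS = 50  # server-side timeout described in the text


class EchoAckTracker:
    # Hypothetical server-side bookkeeping of when keystrokes reached the app.

    def __init__(self) -> None:
        self.seqs = []   # keystroke sequence numbers, in arrival order
        self.times = []  # when each was presented to the application (ms)

    def presented(self, seq: int, now_ms: float) -> None:
        self.seqs.append(seq)
        self.times.append(now_ms)

    def echo_ack(self, now_ms: float) -> int:
        # Latest keystroke presented at least 50 ms ago, or -1 if none.
        i = bisect_right(self.times, now_ms - ECHO_ACK_DELAY_MS)
        return self.seqs[i - 1] if i else -1


def judge(pred_seq: int, predicted_char: str,
          screen_char: Optional[str], echo_ack: int) -> str:
    # Hypothetical client-side check: no timers, only the synced echo ack.
    if echo_ack < pred_seq:
        return "pending"  # the server has not vouched for this keystroke yet
    return "correct" if screen_char == predicted_char else "incorrect"
```

For example, if keystrokes 1 and 2 reach the application at 0 ms and 20 ms, `echo_ack` is -1 at 49 ms, 1 at 50 ms, and 2 at 70 ms. A prediction for keystroke 2 stays `"pending"` until the synchronized echo ack reaches 2, however late that state arrives at the client, so jitter delays the verdict but cannot turn a correct prediction into a false negative.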
4 Results
We evaluated Mosh using traces contributed by six users, covering about 40 hours of real-world usage and including 9,986 total keystrokes. These traces included the timing and contents of all writes from the user to a remote host and vice versa. The users were asked to contribute “typical, real-world sessions.” In practice, the traces include use of popular programs such as the bash and zsh shells, the alpine and mutt e-mail clients, the emacs and vim text editors, the irssi and barnowl chat clients, the links text-mode Web browser, and several programs unique to each user.
To evaluate typical usage of a “mobile” terminal, we replayed the traces over an otherwise unloaded Sprint commercial EV-DO (3G) cellular Internet connection in Cambridge, Mass. A client-side process played the user portion of the traces, and a server-side process waited for the expected user input and then replied (in time) with the prerecorded server output. We sped up long periods with no activity. The average round-trip time on the link was about half a second.
We replayed the traces over two different remote shell applications, SSH and Mosh, and recorded the user interface response latency to each simulated user keystroke, as seen by the user. The Mosh predictive algorithm and