Last update: 20.09.98 (change log)
With Solaris 2.5.1 and patch 103582-15 or above applied, or starting with Solaris 2.6, a new TCP bug/feature is available to the user. This page will show that activating it on web servers speeds up data transfers to clients with a buggy TCP implementation, e.g. Irix 6.2 or Windows. Since Solaris itself recognizes the initial phase of a data transfer, you will not experience the same speedup with a Solaris client.
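The toggle examined throughout this page is the ndd parameter tcp_slow_start_initial. Assuming the patch (or Solaris 2.6) is installed and exposes the parameter, it can be read and set at run time roughly as shown below; the setting is system wide and does not survive a reboot:

server # ndd -get /dev/tcp tcp_slow_start_initial
server # ndd -set /dev/tcp tcp_slow_start_initial 2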
First, the delayed ACK interval of the Irix client (or possibly a Windows client) has to be determined. As the toggles and dials of an Irix client are unknown to me, a modified version of the sock utility described in W. Richard Stevens' books comes in very handy. The modification allows it to pause for sub-second intervals between data transmissions. Looking back from the results to this point, finding the exact delayed ACK interval this way is not possible; you can, however, verify how BSDish your test object is.
It suffices to connect a sock to the Irix discard port and trace the connection. It is not a good idea to connect to the echo port, as the echo server's answer will be ready to send before the delayed ACK interval timeout has expired. You can do the same test with your Windows systems, but vanilla Windows 95 does not supply an inetd, and therefore has no server listening on the discard port. Nevertheless, the procedure is the same.
server $ sock -i -p 0.75 -n 4 server discard
server # tcpdump -ttNl | some_filter
 1: 0.000000 client.30957 > server.discard: S ...:...(0) win 33580 <mss 1460>
 2: 0.000984 server.discard > client.30957: S ...:...(0) ack ... win 61320 <mss 1460>
 3: 0.001057 client.30957 > server.discard: . ack 1 win 33580

 4: 0.001277 client.30957 > server.discard: P 1:1025(1024) ack 1 win 33580
 5: 0.148547 server.discard > client.30957: . ack 1025 win 61320
 6: 0.746173 client.30957 > server.discard: P 1025:2049(1024) ack 1 win 33580
 7: 0.748428 server.discard > client.30957: . ack 2049 win 61320
 8: 1.496185 client.30957 > server.discard: P 2049:3073(1024) ack 1 win 33580
 9: 1.548272 server.discard > client.30957: . ack 3073 win 61320
10: 2.246127 client.30957 > server.discard: P 3073:4097(1024) ack 1 win 33580
11: 2.348140 server.discard > client.30957: . ack 4097 win 61320

12: 2.996492 client.30957 > server.discard: F 4097:4097(0) ack 1 win 33580
13: 2.997147 server.discard > client.30957: . ack 4098 win 61320
14: 2.997974 server.discard > client.30957: F 1:1(0) ack 4098 win 61320
15: 2.998017 client.30957 > server.discard: . ack 2 win 33580

Figure 1: Trying to find out about the delayed ACK interval of a BSDish host.
A slightly modified version of the tcpdump output is shown. First of all, line numbers are introduced and relative timestamps are used. The client and server are abstracted names; please note that the server in this case is the Irix box and the client the Solaris machine. The initial sequence numbers are replaced with three dots, as they contain no relevant information. Finally, the trailing (DF) is cut off, as both hosts are doing path MTU discovery. The connection initiation and teardown are separated from the data transfer phase.
From the output, segment 4 is the client's data transfer and segment 5 the server's (Irix) delayed acknowledgment, about 150 ms later. The second pair does not seem to show any delay, the third pair about 50 ms and the last about 100 ms. This seemingly erratic behaviour of the Irix host can be ascribed to the BSD networking code. The delayed ACK timer of TCP goes off every 200 ms, but it goes off at fixed points in time, that is, every 200 ms relative to the last boot [compare: Stevens, TCP/IP Illustrated Volume 1, section 19.3]. The delay seen by any single segment is therefore just the time remaining until the next 200 ms tick.
This first experiment did not show us the delayed ACK interval. It only verified that Irix is behaving very BSDish.
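As a back-of-the-envelope check of the fixed-tick explanation, the following awk snippet takes the four data segment timestamps from figure 1 and assumes the first delayed ACK (at 0.148547 s) marks a tick; it then computes, for each data segment, the time left until the next 200 ms tick. The resulting 147, 2, 52 and 102 ms line up with the delays visible in the trace. The tick phase is inferred from the trace, not measured independently:

client $ awk 'BEGIN {
    tick = 0.148547                                       # first delayed ACK in figure 1, assumed to be a tick
    n = split("0.001277 0.746173 1.496185 2.246127", t)   # data segments 4, 6, 8 and 10
    for (i = 1; i <= n; i++) {
        d = tick - t[i]
        while (d < 0) d += 0.2                            # advance to the next 200 ms tick
        printf "segment at %.6f s: delayed ACK after about %3.0f ms\n", t[i], d * 1000
    }
}'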
The major performance increase stems from the fact that immediate ACKs are elicited instead of running into the delayed ACK interval timeout. The performance difference is clearly visible, as the following experiments show.
The client (Irix) requests a plain HTML document of about 13 KB, with no server-side includes or anything else special to do; 13 KB is the average document size as seen on web caches. Including the HTTP response headers, the document grows to 13549 bytes at the TCP level. The web server is an Apache/1.2.5 with PHP/FI-2.0b13. The browser is simulated on the command line with the help of the sock utility:
client $ ( echo -ne "GET /test.html HTTP/1.0\r\n\r\n" ; sleep 1 ) | sock server 80
Please note that the shell builtin echo command needs to understand the extended options, and that the CRLF sequences which terminate a request are sent explicitly. If the sleep were missing, the sock program would immediately close both directions of the data transfer, possibly before receiving a reply. Even though it is possible to use the TCP half-close feature of sock, and Apache will do the correct thing, some web servers do not understand or handle half-closes correctly. This is especially true for web caches. Only recent recommendations argue that the two directions of an HTTP transfer should be handled independently of each other. Also, if half-closes had been used, the experiment would not have simulated real browser behaviour.
 1: 0.000000 client.1469 > server.www: S ...:...(0) win 61440 <mss 1460>
 2: 0.000250 server.www > client.1469: S ...:...(0) ack ... win 33580 <mss 1460>
 3: 0.000909 client.1469 > server.www: . ack 1 win 61320
 4: 0.002138 client.1469 > server.www: P 1:28(27) ack 1 win 61320
 5: 0.004606 server.www > client.1469: P 1:1461(1460) ack 28 win 33580
 6: 0.038097 client.1469 > server.www: . ack 1461 win 61320
 7: 0.038186 server.www > client.1469: . 1461:2921(1460) ack 28 win 33580
 8: 0.038253 server.www > client.1469: . 2921:4097(1176) ack 28 win 33580
 9: 0.238699 client.1469 > server.www: . ack 4097 win 61320
10: 0.238731 server.www > client.1469: . 4097:5557(1460) ack 28 win 33580
11: 0.238820 server.www > client.1469: . 5557:7017(1460) ack 28 win 33580
12: 0.238889 server.www > client.1469: . 7017:8449(1432) ack 28 win 33580
13: 0.256021 client.1469 > server.www: . ack 8449 win 60040
14: 0.256104 server.www > client.1469: . 8449:9909(1460) ack 28 win 33580
15: 0.256167 server.www > client.1469: . 9909:11369(1460) ack 28 win 33580
16: 0.256230 server.www > client.1469: . 11369:12829(1460) ack 28 win 33580
17: 0.256285 server.www > client.1469: F 12829:13550(721) ack 28 win 33580
18: 0.261071 client.1469 > server.www: . ack 11369 win 60448
19: 0.261424 client.1469 > server.www: . ack 13551 win 59139
20: 0.262634 server.www > client.1469: . ack 29 win 33580
21: 0.262662 client.1469 > server.www: F 28:28(0) ack 13551 win 61320

Figure 2.1: Solaris server with tcp_slow_start_initial set to 1 and a BSDish client.
The first experiment leaves the server behaviour at the Solaris default before the patch. The server pushes out the start of its reply in segment 5, but then has to wait until the client acknowledges it. As you can see, the next part of the reply follows almost immediately after the ACK comes in. The second client acknowledgment of the server's reply, segment 9, is a delayed ACK, as it is delivered almost exactly 200 ms after the first ACK.
Please note that in figure 2.1 the transfer took over 250 ms, most of it waiting for a delayed ACK.
 1: 0.000000 client.1472 > server.www: S ...:...(0) win 61440 <mss 1460>
 2: 0.000292 server.www > client.1472: S ...:...(0) ack ... win 33580 <mss 1460>
 3: 0.000933 client.1472 > server.www: . ack 1 win 61320
 4: 0.001453 client.1472 > server.www: P 1:28(27) ack 1 win 61320
 5: 0.004541 server.www > client.1472: P 1:1461(1460) ack 28 win 33580
 6: 0.004659 server.www > client.1472: P 1461:2921(1460) ack 28 win 33580
 7: 0.008695 client.1472 > server.www: . ack 2921 win 61320
 8: 0.008744 server.www > client.1472: . 2921:4097(1176) ack 28 win 33580
 9: 0.008846 server.www > client.1472: . 4097:5557(1460) ack 28 win 33580
10: 0.008908 server.www > client.1472: . 5557:7017(1460) ack 28 win 33580
11: 0.013305 client.1472 > server.www: . ack 7017 win 60884
12: 0.013329 server.www > client.1472: . 7017:8449(1432) ack 28 win 33580
13: 0.013377 server.www > client.1472: . 8449:9909(1460) ack 28 win 33580
14: 0.013447 server.www > client.1472: . 9909:11369(1460) ack 28 win 33580
15: 0.013514 server.www > client.1472: . 11369:12829(1460) ack 28 win 33580
16: 0.018901 client.1472 > server.www: . ack 9909 win 61320
17: 0.018961 server.www > client.1472: FP 12829:13550(721) ack 28 win 33580
18: 0.019976 client.1472 > server.www: . ack 12829 win 61320
19: 0.020307 client.1472 > server.www: . ack 13551 win 60599
20: 0.021076 server.www > client.1472: . ack 29 win 33580
21: 0.021100 client.1472 > server.www: F 28:28(0) ack 13551 win 61320

Figure 2.2: Solaris server with tcp_slow_start_initial set to 2 and a BSDish client.
The second experiment sets tcp_slow_start_initial on the server to the new value of 2. The most notable difference is that Solaris now seems to be "miscounting" when sending its reply. The "miscount" only concerns RFC compliance; it is not a mistake, Solaris behaves like that on purpose. As the client now receives two segments instead of one, it has to send an acknowledgment immediately. "Immediately" still means four to five milliseconds of processing time for an Irix Indy.
Please note that in figure 2.2 the transfer is finished after just 21 ms. Even though the Solaris host still seemed to be waiting on the (slow) Indy, the network was not really idle, as no delayed acknowledgments were encountered.
Doing the same with two Solaris machines does not look as good as with a buggy BSDish client. The test machines are a Solaris 2.5.1 server and a Solaris 2.6 client. Both machines have a delayed ACK interval of 200 ms.
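For reference, the interval can be inspected on a Solaris box with ndd; I believe the relevant parameter is tcp_deferred_ack_interval, expressed in milliseconds, but verify the name against your release before relying on it. Lowering it is possible, yet it affects every TCP connection on the machine, which is exactly the kind of global tuning this page tries to avoid:

server # ndd -get /dev/tcp tcp_deferred_ack_interval
server # ndd -set /dev/tcp tcp_deferred_ack_interval 50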
 1:0.0000 client.8381 > server.www: S ...:...(0) win 33580 <mss 1460>
 2:0.0003 server.www > client.8381: S ...:...(0) ack ... win 33580 <mss 1460>
 3:0.0008 client.8381 > server.www: . ack 1 win 33580
 4:0.0011 client.8381 > server.www: P 1:28(27) ack 1 win 33580
 5:0.0045 server.www > client.8381: P 1:1461(1460) ack 28 win 33580
 6:0.0051 client.8381 > server.www: . ack 1461 win 33580
 7:0.0052 server.www > client.8381: . 1461:2921(1460) ack 28 win 33580
 8:0.0053 server.www > client.8381: P 2921:4097(1176) ack 28 win 33580
 9:0.0059 client.8381 > server.www: . ack 2921 win 33580
10:0.0060 server.www > client.8381: . 4097:5557(1460) ack 28 win 33580
11:0.0061 server.www > client.8381: . 5557:7017(1460) ack 28 win 33580
12:0.1980 client.8381 > server.www: . ack 7017 win 33580
13:0.1981 server.www > client.8381: . 7017:8449(1432) ack 28 win 33580
14:0.1982 server.www > client.8381: . 8449:9909(1460) ack 28 win 33580
15:0.1983 server.www > client.8381: . 9909:11369(1460) ack 28 win 33580
16:0.1983 server.www > client.8381: . 11369:12829(1460) ack 28 win 33580
17:0.3980 client.8381 > server.www: . ack 12829 win 33580
18:0.3981 server.www > client.8381: FP 12829:13550(721) ack 28 win 33580
19:0.3986 client.8381 > server.www: . ack 13551 win 33580
20:0.3989 client.8381 > server.www: F 28:28(0) ack 13551 win 33580
21:0.3990 server.www > client.8381: . ack 29 win 33580

Figure 3.1: Server tcp_slow_start_initial 1 and client tcp_slow_start_initial 1.
 1:0.0000 client.8384 > server.www: S ...:...(0) win 33580 <mss 1460>
 2:0.0003 server.www > client.8384: S ...:...(0) ack ... win 33580 <mss 1460>
 3:0.0007 client.8384 > server.www: . ack 1 win 33580
 4:0.0011 client.8384 > server.www: P 1:28(27) ack 1 win 33580
 5:0.0045 server.www > client.8384: P 1:1461(1460) ack 28 win 33580
 6:0.0052 client.8384 > server.www: . ack 1461 win 33580
 7:0.0053 server.www > client.8384: . 1461:2921(1460) ack 28 win 33580
 8:0.0054 server.www > client.8384: P 2921:4097(1176) ack 28 win 33580
 9:0.0060 client.8384 > server.www: . ack 2921 win 33580
10:0.0061 server.www > client.8384: . 4097:5557(1460) ack 28 win 33580
11:0.0062 server.www > client.8384: . 5557:7017(1460) ack 28 win 33580
12:0.2002 client.8384 > server.www: . ack 7017 win 33580
13:0.2003 server.www > client.8384: . 7017:8449(1432) ack 28 win 33580
14:0.2004 server.www > client.8384: . 8449:9909(1460) ack 28 win 33580
15:0.2004 server.www > client.8384: . 9909:11369(1460) ack 28 win 33580
16:0.2005 server.www > client.8384: . 11369:12829(1460) ack 28 win 33580
17:0.4001 client.8384 > server.www: . ack 12829 win 33580
18:0.4002 server.www > client.8384: FP 12829:13550(721) ack 28 win 33580
19:0.4007 client.8384 > server.www: . ack 13551 win 33580
20:0.4010 client.8384 > server.www: F 28:28(0) ack 13551 win 33580
21:0.4011 server.www > client.8384: . ack 29 win 33580

Figure 3.2: Server tcp_slow_start_initial 1 and client tcp_slow_start_initial 2.
 1:0.0000 client.8388 > server.www: S ...:...(0) win 33580 <mss 1460>
 2:0.0004 server.www > client.8388: S ...:...(0) ack ... win 33580 <mss 1460>
 3:0.0009 client.8388 > server.www: . ack 1 win 33580
 4:0.0011 client.8388 > server.www: P 1:28(27) ack 1 win 33580
 5:0.0047 server.www > client.8388: P 1:1461(1460) ack 28 win 33580
 6:0.0048 server.www > client.8388: P 1461:2921(1460) ack 28 win 33580
 7:0.0053 client.8388 > server.www: . ack 1461 win 33580
 8:0.0055 server.www > client.8388: . 2921:4097(1176) ack 28 win 33580
 9:0.0056 server.www > client.8388: . 4097:5557(1460) ack 28 win 33580
10:0.0056 client.8388 > server.www: . ack 2921 win 33580
11:0.0058 server.www > client.8388: . 5557:7017(1460) ack 28 win 33580
12:0.0058 server.www > client.8388: . 7017:8449(1432) ack 28 win 33580
13:0.2009 client.8388 > server.www: . ack 7017 win 33580
14:0.2010 server.www > client.8388: . 8449:9909(1460) ack 28 win 33580
15:0.2011 server.www > client.8388: . 9909:11369(1460) ack 28 win 33580
16:0.2012 server.www > client.8388: . 11369:12829(1460) ack 28 win 33580
17:0.2013 server.www > client.8388: F 12829:13550(721) ack 28 win 33580
18:0.2022 client.8388 > server.www: . ack 12829 win 33580
19:0.2024 client.8388 > server.www: . ack 13551 win 33580
20:0.2027 client.8388 > server.www: F 28:28(0) ack 13551 win 33580
21:0.2029 server.www > client.8388: . ack 29 win 33580

Figure 3.3: Server tcp_slow_start_initial 2 and client tcp_slow_start_initial 1.
 1:0.0000 client.8387 > server.www: S ...:...(0) win 33580 <mss 1460>
 2:0.0003 server.www > client.8387: S ...:...(0) ack ... win 33580 <mss 1460>
 3:0.0008 client.8387 > server.www: . ack 1 win 33580
 4:0.0010 client.8387 > server.www: P 1:28(27) ack 1 win 33580
 5:0.0043 server.www > client.8387: P 1:1461(1460) ack 28 win 33580
 6:0.0045 server.www > client.8387: P 1461:2921(1460) ack 28 win 33580
 7:0.0050 client.8387 > server.www: . ack 1461 win 33580
 8:0.0051 server.www > client.8387: . 2921:4097(1176) ack 28 win 33580
 9:0.0052 server.www > client.8387: . 4097:5557(1460) ack 28 win 33580
10:0.0053 client.8387 > server.www: . ack 2921 win 33580
11:0.0054 server.www > client.8387: . 5557:7017(1460) ack 28 win 33580
12:0.0055 server.www > client.8387: . 7017:8449(1432) ack 28 win 33580
13:0.2019 client.8387 > server.www: . ack 7017 win 33580
14:0.2020 server.www > client.8387: . 8449:9909(1460) ack 28 win 33580
15:0.2021 server.www > client.8387: . 9909:11369(1460) ack 28 win 33580
16:0.2021 server.www > client.8387: . 11369:12829(1460) ack 28 win 33580
17:0.2022 server.www > client.8387: F 12829:13550(721) ack 28 win 33580
18:0.2032 client.8387 > server.www: . ack 12829 win 33580
19:0.2034 client.8387 > server.www: . ack 13551 win 33580
20:0.2037 client.8387 > server.www: F 28:28(0) ack 13551 win 33580
21:0.2037 server.www > client.8387: . ack 29 win 33580

Figure 3.4: Server tcp_slow_start_initial 2 and client tcp_slow_start_initial 2.
Figures 3.1 and 3.2 each display two delayed acknowledgments, in segments 12 and 17. Common to both experiments is the RFC-compliant behaviour of the server. Figures 3.3 and 3.4 show just one delayed acknowledgment, in segment 13. In those experiments the server uses the patched feature.
The experiments with the two Solaris hosts talking to each other show that not every delayed acknowledgment one might deem unnecessary can be avoided. But avoiding just one can halve the transfer time for the average web document.
On the other hand, experiments of your own might turn up different results, and might even seem to indicate worse performance with the patch enabled if both hosts are using it. After all, 200 ms is just barely better than the transaction time of the regular buggy client, and 400 ms is even worse. What we would like to see here is a fast transfer below 50 ms, too, without having to tune the delayed ACK interval.
Indeed, as the load on servers and lines changes, you might even experience an undelayed transfer between two Solaris hosts. It is possible, but perhaps not as common as we would like. I had to repeat the experiment several times in order to get the data for figure 3.5.
 1:0.0000 client.8391 > server.www: S ...:...(0) win 33580 <mss 1460>
 2:0.0003 server.www > client.8391: S ...:...(0) ack ... win 33580 <mss 1460>
 3:0.0005 client.8391 > server.www: . ack 1 win 33580
 4:0.0008 client.8391 > server.www: P 1:28(27) ack 1 win 33580
 5:0.0041 server.www > client.8391: P 1:1461(1460) ack 28 win 33580
 6:0.0042 server.www > client.8391: P 1461:2921(1460) ack 28 win 33580
 7:0.0048 client.8391 > server.www: . ack 1461 win 33580
 8:0.0048 server.www > client.8391: . 2921:4097(1176) ack 28 win 33580
 9:0.0049 server.www > client.8391: . 4097:5557(1460) ack 28 win 33580
10:0.0049 client.8391 > server.www: . ack 2921 win 33580
11:0.0051 server.www > client.8391: . 5557:7017(1460) ack 28 win 33580
12:0.0051 server.www > client.8391: . 7017:8449(1432) ack 28 win 33580
13:0.0058 client.8391 > server.www: . ack 7017 win 33580
14:0.0059 server.www > client.8391: . 8449:9909(1460) ack 28 win 33580
15:0.0059 server.www > client.8391: . 9909:11369(1460) ack 28 win 33580
16:0.0060 server.www > client.8391: . 11369:12829(1460) ack 28 win 33580
17:0.0060 server.www > client.8391: P 12829:13550(721) ack 28 win 33580
18:0.0063 server.www > client.8391: F 13550:13550(0) ack 28 win 33580
19:0.0067 client.8391 > server.www: . ack 12829 win 33580
20:0.0068 client.8391 > server.www: . ack 13551 win 33580
21:0.0074 client.8391 > server.www: F 28:28(0) ack 13551 win 33580
22:0.0075 server.www > client.8391: . ack 29 win 33580

Figure 3.5: Example of an unimpeded transfer between two Solaris hosts.
To have a look at how many bytes were transferred at the Ethernet level (RFC 894 encapsulation), we have to count SYNs, sole FINs and sole ACKs as 64 bytes each (40 or 44 bytes of TCP/IP headers, depending on the MSS option, padded up to the 46 byte minimum Ethernet payload, plus a 14 byte header and a 4 byte trailer). Everything else has 48 bytes added (20 TCP + 20 IP + 18 Ethernet). Thus the transfer in figure 3.5 shipped 14809 bytes at the Ethernet level in 7.5 ms, using up a capacity of at least 15 Mbps on a 100 Mbps channel. As the inter-frame gaps were not accounted for, the real utilization is still a little higher.
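Just to re-derive the two per-frame numbers used above with shell arithmetic (nothing new here, only the accounting rule spelled out):

client $ expr 20 + 20 + 1460 + 14 + 4    # IP + TCP + full MSS payload + Ethernet header + trailer
1518
client $ expr 40 + 6 + 14 + 4            # bare ACK or FIN, padded up to the 64 byte minimum frame
64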
Sun, Sun Microsystems, the Sun Logo and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
Please send your suggestions, bugfixes, comments, and ideas for new items to soltune at sean dot de
Last Modified: Thursday, 22-Sep-2005 16:15:51 MEST