[S.u.S.E. Linux] What does traceroute really do?
I am telnetted into my system at home from a Sun box at work. I run traceroute back to this subnet and get the following:

Lakshmana kde/bin# traceroute cpkwebser3.ncr.disa.mil
traceroute to cpkwebser3.ncr.disa.mil (164.117.138.43), 30 hops max, 40 byte packets
 1 sdn-ts-001dcwash002t.dialsprint.net (206.133.1.3) 206.013 ms 159.635 ms 169.683 ms
 2 206.133.1.2 (206.133.1.2) 169.562 ms 168.944 ms 149.986 ms
 3 sdn-pnc2-dc-1-0.dialsprint.net (207.143.16.69) 169.974 ms 179.892 ms 159.721 ms
 4 sl-bb1-dc-10-0.sprintlink.net (207.143.1.173) 159.494 ms 168.909 ms 159.819 ms
 5 sl-bb5-dc-4-0-0-155M.sprintlink.net (144.232.0.2) 159.900 ms 159.863 ms 159.620 ms
 6 sl-bb10-pen-0-0-155M.sprintlink.net (144.232.8.1) 159.519 ms 159.834 ms 159.302 ms
 7 144.232.5.62 (144.232.5.62) 169.568 ms 189.324 ms 159.787 ms
 8 sprint-nap.disa.mil (192.157.69.45) 200.035 ms 199.747 ms *
 9 137.209.200.205 (137.209.200.205) 229.969 ms 211.317 ms 218.435 ms
10 198.26.132.34 (198.26.132.34) 219.496 ms * 220.008 ms
11 * 33.252.200.210 (33.252.200.210) 209.669 ms 299.953 ms
12 33.252.200.110 (33.252.200.110) 249.717 ms 279.866 ms 199.424 ms
13 209.22.98.5 (209.22.98.5) 279.627 ms 220.101 ms 209.283 ms
14 hqs-gw2.ncr.disa.mil (164.117.144.5) 219.758 ms 249.631 ms 270.177 ms
15 164.117.1.2 (164.117.1.2) 219.182 ms 220.143 ms 279.363 ms
16 cpkwebser.ncr.disa.mil (164.117.138.43) 220.162 ms 290.075 ms 259.434 ms
I am telnetted into my system at home from a Sun box at work. I run traceroute back to this subnet and get the following:
[trace snipped]
These numbers are far better than what I was getting earlier today. But I don't really understand what they mean. I guess I could read the man page again and try to figure it out. I am wondering if anybody has an intuitive explanation of what these numbers represent. I used to think that each line reported the time it took to get from one node to the next. Looking at basically the same path from both directions tells me that this is not the whole story. What gives here?
Yes yes, you should read the man page. :D It's always the logical place to start. Sometimes there are confusing bits floating in them, at which point asking what those confusing bits are about is your best bet.

from 'man traceroute' on my SuSE 5.1 box:

This program attempts to trace the route an IP packet would follow to some internet host by launching UDP probe packets with a small ttl (time to live) then listening for an ICMP "time exceeded" reply from a gateway. We start our probes with a ttl of one and increase by one until we get an ICMP "port unreachable" (which means we got to "host") or hit a max (which defaults to 30 hops & can be changed with the -m flag). Three probes (change with -q flag) are sent at each ttl setting and a line is printed showing the ttl, address of the gateway and round trip time of each probe. If the probe answers come from different gateways, the address of each responding system will be printed. If there is no response within a 5 sec. timeout interval (changed with the -w flag), a "*" is printed for that probe.

Thus the numbers are the round trip times for each probe. If you wish to better understand the networking terms, then your best free resource is the RFCs at ns.internic.net.

In any case, TTL is the Time To Live value, set on all packets to ensure a maximum number of transfers. Every router that forwards a packet decrements it by one. This prevents packets from going in infinite loops and generally causing mischief.

So, when a router receives a packet and decrementing the TTL brings it to 0, it is supposed to drop the packet and send a message back to the originator, the above-mentioned ICMP "time exceeded" reply. So traceroute sends out 3 packets with a TTL of 1, then 2, then 3, etc. By doing so, it is, in a manner, requesting each router along the way to identify itself. The times listed are just how long it takes for you to receive such replies. This could depend on router latencies (which is what you'd like to see), prioritization of dealing with such messages, and some other things I can't think of.
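The probing loop described above can be sketched as a small simulation. This is only a toy: it sends no real packets (a real traceroute needs UDP probes and a raw ICMP socket), and the route, `send_probe` and `trace` are names made up for illustration.

```python
# Toy simulation of the traceroute probing loop; no real packets are sent.
# ROUTE, send_probe and trace are hypothetical names used for illustration.

ROUTE = ["gw1", "gw2", "gw3", "host"]  # invented path; the last entry is the target


def send_probe(ttl, route=ROUTE):
    """Return (responder, icmp_message) for a probe sent with the given ttl (>= 1)."""
    if ttl >= len(route):
        # The probe survives every gateway and reaches the host, whose
        # unused UDP port answers with "port unreachable".
        return route[-1], "port unreachable"
    # Otherwise the ttl runs out at gateway number `ttl`, which reports it.
    return route[ttl - 1], "time exceeded"


def trace(route=ROUTE, max_hops=30):
    """Raise the ttl from 1 until the target answers or max_hops is reached."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, message = send_probe(ttl, route)
        hops.append((ttl, responder, message))
        if message == "port unreachable":
            break
    return hops


for ttl, responder, message in trace():
    print(ttl, responder, message)
```

Each printed line corresponds to one line of traceroute output, minus the timings: the gateway named on line N is simply whichever machine the packet had reached when its ttl ran out.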
Thus the numbers are general indications of performance to that point along your link, but are instantaneous snapshots and unreliable indicators. Mostly, traceroute will only tell you which part of your link is causing problems in extreme situations, such as oversaturation.

For more on TCP/IP, read "Internetworking with TCP/IP" by Douglas E. Comer.
http://www.amazon.com/exec/obidos/ISBN=0132169878/001-3786462-5815564

-josh
--
To get out of this list, please send email to majordomo@suse.com with this text in its body: unsubscribe suse-linux-e
Josh,

Thanks for the reply. I wish I had time to read Comer's books. Perhaps when I am done with school I will have time to learn. ;-) I did read the man pages on this and got basically what you said out of them. I have some confusion about this. You say:
Thus the numbers are the round trip times for each probe. If you wish <<
To me this means the first hop would return a small number, the second would be a larger number, and the third even larger, etc. But I often see numbers that are further downstream come back significantly lower than the ones that are closer. I guess this has to do with both the random nature of the values and the fact (if I am understanding this) that generating the TTL error might take longer than passing the packet. I will acknowledge that the times in the sample traceroute from my last e-mail did, in general, increase with the hop count.

So the way I am understanding this is that traceroute tells me how long between the time the ICMP leaves home and the time it comes back. The reason it may take more time to get back from a closer router is that a closer router may not process its TTL error as quickly as a downstream router. Does this make sense?

Steve

-- "Alles Vergaengliche Ist nur ein Gleichniss" -Goethe, as quoted in Ludwig Boltzmann's Vorlessungen ueber Gastheorie.
I did read the man pages on this and got basically what you said out of them.
Yep, sometimes painful, but always useful.
Thus the numbers are the round trip times for each probe. If you wish <<
To me this means the first hop would return a small number, the second would be a larger number, and the third even larger, etc. I often see numbers that are further downstream come back significantly lower than the ones that are closer. I guess this has to do with both the random nature of the values and the fact (if I am understanding this) generating the TTL error might take longer than passing the packet.
Yes indeed.
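A toy model may make this concrete. It is not a measurement: every number is invented, and it assumes the reply retraces the same links on the way back, which real networks do not guarantee.

```python
# Toy model of what traceroute reports per hop; all values are invented.
LINK_MS = [5.0, 5.0, 5.0]          # assumed one-way latency of each successive link
ICMP_DELAY_MS = [40.0, 1.0, 1.0]   # assumed time each router takes to build its TTL error


def reported_rtt(hop):
    """Round trip time printed for a 1-based hop number in this model."""
    one_way = sum(LINK_MS[:hop])                  # links crossed to reach that router
    return 2 * one_way + ICMP_DELAY_MS[hop - 1]   # out, error generation, back


rtts = [reported_rtt(hop) for hop in (1, 2, 3)]
print(rtts)  # [50.0, 21.0, 31.0]
```

Here the first router is slow to generate its error, so hop 1 reports 50 ms while hop 2, which is farther away, reports only 21 ms. Forwarded packets never pay the first router's 40 ms penalty; only the probe that expires there does.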
traceroute tells me how long between the time the ICMP leaves home and the time it comes back.
Actually when the UDP packet leaves home and when the ICMP-based error comes back, but close enough. The random nature is due to taking a 'snapshot' of the highly dynamic Internet.
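The snapshot effect is visible inside a single line of output, since the three probes for one hop are sent moments apart. Below is a rough sketch that pulls the probe times out of two lines copied from the trace at the top of this thread; `probe_times` is a made-up helper, and the regular expression assumes the "NNN.NNN ms" format shown there.

```python
import re

# Two lines copied from the trace posted at the top of the thread.
LINES = [
    "11 * 33.252.200.210 (33.252.200.210) 209.669 ms 299.953 ms",
    "16 cpkwebser.ncr.disa.mil (164.117.138.43) 220.162 ms 290.075 ms 259.434 ms",
]


def probe_times(line):
    """Return (answered round trip times in ms, number of timed-out probes)."""
    times = [float(t) for t in re.findall(r"(\d+\.\d+) ms", line)]
    timeouts = line.split().count("*")  # each "*" is a probe with no reply
    return times, timeouts


for line in LINES:
    times, timeouts = probe_times(line)
    # max/min would fail on a hop where every probe timed out; not the case here.
    spread = max(times) - min(times)
    print(line.split()[0], times, "timeouts:", timeouts, "spread: %.3f ms" % spread)
```

For hop 11 the two answered probes differ by roughly 90 ms, and for hop 16 the three probes span roughly 70 ms, even though each group travelled the same route within a few seconds.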
a closer router may not process its TTL error as quickly as a downstream router.
This second factor has to do with the nature of routers. Almost any router or routing agent runs on a multitasking machine which has many, many needs to attend to. Sending TTL errors is a rather low priority compared with routing valid packets along, so it is the kind of task that is scheduled at low priority and only completed when the system is 'idle'. In fact, some routers won't send this error at all, and you'll see entries in your traceroute like so:

13 * * *

How frustrating!

Happy hacking,
-josh
participants (2)
- hattons@CPKWEBSER5.ncr.disa.mil
- jrodman@skaro.nightcrawler.com