[Dev] Performance Profiler

Tue, 8 Jan 2002 16:22:05 -0500

> compile your prog with 'gcc -pg ...'.  When you run the prog, a
> gmon.out file will be created, which can be processed by gprof to list
> called routines, and time spent in routines.
>
> $ gcc -g -pg src.c -o exe
> $ ./exe
> $ gprof exe gmon.out

Cool.  I tried it but it busted on a "select".  I will play with this more in 
the future.

I may have worked around the problem.  I have a proprietary wrapper around 
datagrams as they pass through the box.  I added 6 new fields to the wrapper 
as follows:

struct timeval timeStampA;
struct timeval timeStampB;
struct timeval timeStampC;
struct timeval timeStampD;
struct timeval timeStampE;
struct timeval timeStampF;

I loaded the fields with gettimeofday() after the recvfrom() on entry to, and 
before the sendto() on exit from, each of the 3 daemons in my app.
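
For anyone wanting to do the same, here is a minimal sketch of turning two of 
those stamps into an elapsed time.  The struct name "stamps" and the helper 
elapsed_usec() are made up for illustration; the real wrapper is proprietary 
and has other fields that are not shown here.

```c
#include <sys/time.h>

/* Hypothetical stand-in for the wrapper: only the six stamps from the post. */
struct stamps {
    struct timeval timeStampA;
    struct timeval timeStampB;
    struct timeval timeStampC;
    struct timeval timeStampD;
    struct timeval timeStampE;
    struct timeval timeStampF;
};

/* Elapsed microseconds from *start to *end (negative if end is earlier). */
static long long elapsed_usec(const struct timeval *start,
                              const struct timeval *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}
```

Each daemon would call gettimeofday(&w.timeStampD, NULL) at its stamp point, 
and the last hop can then log elapsed_usec(&w.timeStampD, &w.timeStampE).  
Stamps taken on two different boxes are only comparable if the clocks are 
synchronized.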

I found the D to E transfer in one direction was consistently taking about 
1/2 minute.  After substituting code from a "fast" daemon into the "slow" 
daemon and gnashing my teeth in frustration for a while, I noticed that the 
fast daemon had a blocking select (wait-forever).  The "slow" daemon used a 
polling select (no-wait).  I changed the slow daemon's select to wait for 
10000 usecs.  Voila.  Overall transfer time goes down dramatically.  I went 
through the entire code body and changed every polling select to a select 
with a 10000 usec timeout.

The change is dramatic.  Before the change, with LOTS of syslogging, I got 
delays across the 2 boxes of about 30 secs, and with almost no syslogging I 
got 0.9 sec delays.  With the new change I get delays of 0.02 to 0.2 secs 
with LOTS of syslogging.  These numbers are much more reasonable.  (Great 
sigh of relief.)

This result does not seem consistent with how I read Stevens' explanation of 
no-wait select in section 5.6 of Unix Network Programming Vol 1.

I am using a 2.2.14 kernel.

Mike