DO NOT REPLY [Bug 7617] New: - Apache 1.3.x race condition causes gratuitous 3-second CGI delay

From: <bugzilla(at)apache.org>
Date: Fri Mar 29 2002 - 14:58:37 EST


DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7617>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

           Summary: Apache 1.3.x race condition causes gratuitous 3-second
                    CGI delay
           Product: Apache httpd-1.3
           Version: 1.3.24
          Platform: Sun
        OS/Version: Solaris
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: mod_cgi
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: andrew@tellme.com

This is a repost of a bug that I reported to the dev@httpd.apache.org list in 2001. Since that forum is primarily concerned with development of Apache 2, I am opening this as a Bugzilla bug. The bug never made it into the former Apache bug-tracking system, although it resembles some (VERY old) existing bug reports for various architectures.

There is an apparent race condition in Apache 1.3.x CGI handling that occasionally causes an unnecessary 3-second delay. The delay is triggered by a pause between when a CGI child process closes its output pipe and when that process subsequently exits. Under normal circumstances, it appears that only Solaris x86 is significantly affected.

Specifically, the code in mod_cgi.c reads from its child process until the child process breaks the pipe. The cleanup code in alloc.c then calls waitpid() with WNOHANG to check whether the child process has died; if the child's pid is not waiting to be reaped, Apache assumes that the process has hung. It sends a SIGTERM, waits 3 seconds, then sends a SIGKILL. The relevant code is in free_proc_chain() in alloc.c.

That assumption (if the child pid is not waiting to be reaped, the child process must have hung and should be killed) appears to be erroneous on at least some configurations. Specifically, imagine that the CGI child process exits 10ms after the cleanup code in alloc.c is run. In this case, the Apache process sleeps 3 seconds when it did not need to wait at all.
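
A small C sketch can make the race visible outside Apache. It is a stand-in under stated assumptions, not code from the report: the function name child_reaped_at_eof and the linger_ms parameter are hypothetical. The child closes its end of a pipe (like "close STDOUT" in a CGI), lingers briefly, and exits; the parent reads to end-of-file and then immediately does the non-blocking waitpid() check.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Returns 1 if the child was already reapable right after the parent
 * saw EOF on the pipe, 0 if it was not (the case that would be treated
 * as a hung child), and -1 on setup failure.
 */
int child_reaped_at_eof(long linger_ms)
{
    static const char msg[] = "Hello, world.\n";
    int fds[2];
    char buf[256];
    int status;
    int reaped;
    pid_t pid;

    assert(linger_ms >= 0);

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        struct timespec ts;

        ts.tv_sec = linger_ms / 1000;
        ts.tv_nsec = (linger_ms % 1000) * 1000000L;

        close(fds[0]);
        if (write(fds[1], msg, sizeof msg - 1) < 0)
            _exit(1);
        close(fds[1]);        /* parent now sees EOF ... */
        nanosleep(&ts, NULL); /* ... but the child is still alive */
        _exit(0);
    }

    /* Parent: read until the child closes the pipe. */
    close(fds[1]);
    while (read(fds[0], buf, sizeof buf) > 0)
        ;
    close(fds[0]);

    /* The same non-blocking check the cleanup code relies on. */
    reaped = (waitpid(pid, &status, WNOHANG) == pid);
    if (!reaped)
        waitpid(pid, &status, 0); /* avoid leaving a zombie behind */
    return reaped;
}
```

With a clearly non-zero linger (say 200 ms) the check should come back 0 even though the child is perfectly healthy. With a linger of 0 the result depends on scheduling, which matches the sporadic behavior described for Solaris x86.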

The problem is visible to clients only when a client talks to the same Apache child more than once: with HTTP/1.1 keep-alive, with Apache running as a single process, or through bad luck. The user-visible symptom is then a 3-second delay following a CGI request, before the next request is serviced.

To try to reproduce the problem:

  • Build Apache "out of the box" with a straight configure
  • Enable .cgi processing. Here is the diff between the default config file and the one with .cgi processing enabled:
          317c317
          <     Options Indexes FollowSymLinks MultiViews
          ---
          >     Options Indexes FollowSymLinks MultiViews ExecCGI
          784c784
          <     #AddHandler cgi-script .cgi
          ---
          >     AddHandler cgi-script .cgi

  • Put a test CGI under the default DocumentRoot. Here is one that explicitly triggers the bug:
          #!/usr/local/bin/perl
          # break.cgi - triggers the 3-second delay on any system
          print "Content-Type: text/plain\n\n";
          print "Hello, world.\n";
          close STDOUT;
          sleep 1;
    And here is one that should NOT trigger the bug purposely, but still exhibits problems on our Solaris x86 systems:
          #!/usr/local/bin/perl
          # test.cgi - on Solaris x86, sometimes exhibits 3-second delay
          print "Content-Type: text/plain\n\n";
          print "Hello, world.\n";
  • Connect to the HTTP server via telnet, and make a Keep-Alive request. Repeat the request after getting a response. With break.cgi, you should see a 3-second delay after every response. With test.cgi on an affected system, the 3-second delay occurs regularly but sporadically.
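
For the telnet step, the request text might look like what the following C sketch produces. The report does not give the exact request, so the header set here (Host plus Connection: Keep-Alive on an HTTP/1.1 GET) and the function name build_keepalive_request are assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats a minimal HTTP/1.1 keep-alive GET request into buf.
 * Returns the request length, or -1 if the inputs are unusable
 * or the buffer is too small.
 */
int build_keepalive_request(char *buf, size_t size,
                            const char *host, const char *path)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL);

    /* Reject paths that are not absolute and anything that would
     * break the request line or smuggle in extra header lines. */
    if (path[0] != '/' ||
        strpbrk(path, " \r\n") != NULL ||
        strpbrk(host, " \r\n") != NULL)
        return -1;

    n = snprintf(buf, size,
                 "GET %s HTTP/1.1\r\n"
                 "Host: %s\r\n"
                 "Connection: Keep-Alive\r\n"
                 "\r\n",
                 path, host);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Typing the same lines into telnet by hand (ending with a blank line) and repeating them after each response should be enough to observe the delay with break.cgi.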

On Solaris x86 on a dual-processor box, we see this behavior perhaps 10-20% of the time for any particular child (using the test.cgi case above). On most other systems we tested, you have to explicitly try to trigger the bug (for example, using the break.cgi above).

We're not sure why Solaris x86 exhibits the delay even without a forced delay between pipe closing and process exit. Perhaps Solaris is doing some cleanup that Linux is not, or there is some child reaping issue with the multiple processors.

Here are the configurations we tested. Patched Apaches (with mod_perl or mod_ssl capabilities) had the same behaviors as straight out-of-the-box configurations; having DSOs enabled was likewise irrelevant.

  • Solaris x86, dual processor Intel boxes, Apache 1.3.9, 1.3.1[247], 1.3.24
    • On Apache 1.3.14, also the mod_perl, mod_ssl, and non-DSO variants
    • All configurations display sporadic 3-second CGI delays even in a simple Hello, world CGI.
  • Solaris on a single-processor Sparc box, Apache 1.3.12, 1.3.24; Linux, single-processor Intel boxes, Apache 1.3.12, 1.3.14; FreeBSD, dual-processor Intel box, Apache 1.3.12; OpenBSD, single-processor Intel box, Apache 1.3.12
    • Without explicitly closing STDOUT, the bug doesn't appear. But if you close STDOUT and then do anything at all (including just a timing loop), the bug appears.

I will attach my test script to this bug. It is a simple Perl script that opens a socket connection to a web server and makes repeated HTTP/1.1 Keep-Alive requests, timing each trial. It vastly simplifies the last step in the repro case above.

This bug may be the same as PR 6961 (repeated requests for a simple cgi invoke delay of Apache) and is loosely related to PR 6226 (closing STDOUT doesn't end session to allow background processing of code). I also originally sent an email to dev@httpd.apache.org about this, which drew a couple of follow-ups. The URL for that thread in the archive is:

    http://groups.yahoo.com/group/new-httpd/message/19853

There was a very short discussion (apparently this problem has a bit of a history!) but no resolution.



Received on Fri Mar 29 19:58:34 2002

This archive was generated by hypermail 2.1.8 : Wed Aug 23 2006 - 16:43:02 EDT
