Apache - Troubleshooting - Determine why an Apache process hung
Sometimes a web server hangs, crashes hard, or returns malformed content.
Finding the reason for this typically requires reviewing each stack frame that led to the current frame, and selectively dumping server data structures.
Debugging in this manner takes time, and may not be appropriate for sites that require constant availability.
To determine the cause of this issue, view the stack backtrace of the hung process.
- The stack backtrace will contain a list of functions that have been invoked to get to the currently executing function.
- Since the hung condition was most likely triggered by the currently executing function, the function name can be used as a search string in bug databases, and as the starting point in analyzing source code for problems.
NOTE: Stack Backtrace
There are several tools that can be used to get a stack backtrace, including:
- pstack: attaches to the active processes named by the pids on the command line, and prints out an execution stack trace, including a hint at what the function arguments are.
- gcore: generates a core file from a hung web server process.
- gdb: the GNU Project debugger, allows you to see what is going on 'inside' another program while it executes.
The results from these tools can be used to search the Apache bug database to see if the problem is caused by a well-known issue.
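Not every system ships all three tools. As a minimal sketch, the check below picks the first one that is installed. The function name first_available is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Print the first command from the argument list that exists on PATH.
# Returns non-zero if none of them is installed.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer pstack, then fall back to gdb.
tool=$(first_available pstack gdb) || tool=""
if [ -n "$tool" ]; then
    echo "backtrace tool: $tool"
else
    echo "no backtrace tool found; install pstack or gdb" >&2
fi
```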
Use pstack to print a stack backtrace for the process ID
pstack 42367
returns:
42367: bin/httpd -k start
 ff040628 accept (3, 11c560, 11c54c, 1)
 0004c3c4 unixd_accept (ffbff904, 7d490, 11c3a0, 0, 2710, 0) + 10
 0004a3c0 child_main (7d490, 74400, 4e2e, 74000, 0, 74000) + 2ec
 0004a6c8 make_child (4a000, 0, 1, 5, 72c00, 74000) + ec
 0004b0e8 ap_mpm_run (72c00, 74000, 74000, 74000, 74000, 74400) + 934
 000272d8 main (7ef18, 71c00, 73800, 73800, 0, 0) + 710
 00026618 _start (0, 0, 0, 0, 0, 0) + 5c
NOTE: This shows:
- accept: Apache was in the accept() system call when the backtrace was taken.
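To turn the pstack output into search strings, the function names can be pulled out of each frame. This is a sketch that assumes every frame line has the form "ADDRESS function (args) + offset", as in the output above. Some pstack implementations print a different layout, and the helper name pstack_functions is hypothetical.

```shell
#!/bin/sh
# List the function names from pstack output read on standard input,
# innermost frame first. A frame line is recognised by a first field
# made up only of hexadecimal digits; the "PID: command" header line
# does not match because of its trailing colon.
pstack_functions() {
    awk '$1 ~ /^[0-9a-fA-F]+$/ && NF >= 2 { print $2 }'
}

# Example: pstack 42367 | pstack_functions
```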
Some systems do not have the pstack utility. In these cases try using the gdb and gcore utilities to get a stack backtrace from a process.
Attach directly to a process with gdb and retrieve a stack backtrace
The gdb utility can be run with the -p option and a process identifier.
The backtrace command can be run in the gdb shell:
gdb -q -p 3472
returns:
(gdb) backtrace
#0 0x0046e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x0063b681 in accept () from /lib/tls/libpthread.so.0
#2 0x00b14814 in apr_socket_accept (new=0xbff85740, sock=0x9671538, connection_context=0x97115d8) at network_io/unix/sockets.c:187
#3 0x080819ce in unixd_accept (accepted=0xbff85774, lr=0x9671518, ptrans=0x97115d8) at unixd.c:466
#4 0x0807fd2e in child_main (child_num_arg=Variable "child_num_arg" is not available.) at prefork.c:621
#5 0x0807ffc2 in make_child (s=Variable "s" is not available.) at prefork.c:736
#6 0x08080050 in startup_children (number_to_start=5) at prefork.c:754
#7 0x0808089b in ap_mpm_run (_pconf=0x96730a8, plog=0x96a1160, s=0x9674f48) at prefork.c:975
#8 0x08061b08 in main (argc=3, argv=0xbff85a84) at main.c:717
NOTE: This shows:
- accept (): Apache was in the accept() system call when gdb attached to the process.
- accept() was called by the Apache Portable Runtime (APR) function apr_socket_accept().
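On a busy server it can help to take the backtrace without an interactive session, so that the process is stopped only briefly. A minimal sketch using gdb's -batch and -ex options follows. The wrapper name live_backtrace and the GDB override variable are hypothetical.

```shell
#!/bin/sh
# Attach to a running process, print its backtrace, then exit,
# which detaches gdb and lets the process continue.
# GDB may be set in the environment, for example to a full path.
live_backtrace() {
    pid=$1
    "${GDB:-gdb}" -q -batch -ex backtrace -p "$pid"
}

# Example: live_backtrace 3472 > httpd-3472.bt
```

Attaching normally requires root or the same user as the target process. For a threaded MPM, replacing backtrace with 'thread apply all backtrace' prints the stack of every thread.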
Use the gcore utility to force a hung process to dump its core
gcore 4932
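By default the core file is written to the current directory as core.<pid>. The sketch below writes it under a chosen directory instead, assuming a gcore that accepts the -o prefix option (as the one shipped with gdb does). The function name dump_core and the GCORE override variable are hypothetical.

```shell
#!/bin/sh
# Write a core file for a running process under a chosen directory.
# gcore names its output "<prefix>.<pid>", so this produces
# "$dir/core.<pid>" and prints that path on success.
dump_core() {
    pid=$1
    dir=${2:-.}
    "${GCORE:-gcore}" -o "$dir/core" "$pid" && printf '%s\n' "$dir/core.$pid"
}

# Example: dump_core 4932 /var/tmp
```

The process keeps running after gcore finishes. Core files can be large and contain the process memory, so pick a directory with enough space and restricted access.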
Use the gdb utility to retrieve a stack backtrace from the core file
gdb -q /usr/sbin/httpd core.4932
returns:
(gdb) backtrace
#0 0x0046e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x0063b681 in accept () from /lib/tls/libpthread.so.0
#2 0x00b14814 in apr_socket_accept (new=0xbff85740, sock=0x9671538, connection_context=0x97115d8) at network_io/unix/sockets.c:187
#3 0x080819ce in unixd_accept (accepted=0xbff85774, lr=0x9671518, ptrans=0x97115d8) at unixd.c:466
#4 0x0807fd2e in child_main (child_num_arg=Variable "child_num_arg" is not available.) at prefork.c:621
#5 0x0807ffc2 in make_child (s=Variable "s" is not available.) at prefork.c:736
#6 0x08080050 in startup_children (number_to_start=5) at prefork.c:754
#7 0x0808089b in ap_mpm_run (_pconf=0x96730a8, plog=0x96a1160, s=0x9674f48) at prefork.c:975
#8 0x08061b08 in main (argc=3, argv=0xbff85a84) at main.c:717
NOTE: This shows:
- accept (): Apache was in the accept() system call when the core file was generated.
- accept() was called by the Apache Portable Runtime (APR) function apr_socket_accept().
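The function names in a gdb backtrace can be extracted the same way for use as bug database search strings. This sketch assumes the usual gdb frame layout, "#N ADDRESS in function (...)" or "#N function (...)" when no address is printed. The helper name gdb_functions is hypothetical.

```shell
#!/bin/sh
# List the function name of every frame in a gdb backtrace read on
# standard input, innermost frame first. Lines that do not start
# with "#<number>" (such as the "(gdb) backtrace" prompt) are skipped.
gdb_functions() {
    awk '/^#[0-9]+/ { if ($3 == "in") print $4; else print $2 }'
}

# Example:
#   gdb -q -batch -ex backtrace /usr/sbin/httpd core.4932 | gdb_functions
```

The innermost frame is often a low-level stub such as _dl_sysinfo_int80, so the first Apache or APR function in the list (here apr_socket_accept or unixd_accept) is usually the more useful search term.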