nixCraft Linux Forum

Segfault error 4

#1 kasimani, 14-10-2008, 04:05 PM

I have been running pgcluster 1.9rc5 for some months. Recently I started getting alerts in the message log for segfaults with error 4.

What could be the problem, and is there a solution?
Can anyone explain why this error occurs and what it means?


I am running CentOS 5 on 64-bit blade servers, which are partitioned into four parts using VMware,
with HT disabled.

Here are the alerts I am getting in the message log:

/var/log/messages.2:Oct 4 10:18:49 ibn-cluster3 kernel: postgres[13458]: segfault at 00002aaaae097004 rip 0000000000536e10 rsp 00007fff97608930 error 4
/var/log/messages.2:Oct 4 10:18:49 ibn-cluster3 kernel: postgres[13438]: segfault at 00002aaaae097004 rip 0000000000536e10 rsp 00007fff97608930 error 4
/var/log/messages.2:Oct 4 13:59:26 ibn-cluster3 kernel: postgres[25406]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007fff932bcad0 error 4
/var/log/messages.2:Oct 4 19:07:23 ibn-cluster3 kernel: postgres[5698]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007fff08347500 error 4
/var/log/messages.4:Sep 20 14:02:43 ibn-cluster3 kernel: postgres[31633]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007ffff4f76370 error 4
/var/log/messages.4:Sep 20 14:48:01 ibn-cluster3 kernel: postgres[32302]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007fffc9b4aca0 error 4
/var/log/messages.4:Sep 20 14:48:23 ibn-cluster3 kernel: postgres[32330]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007fffc9b4aca0 error 4
/var/log/messages.4:Sep 20 14:48:25 ibn-cluster3 kernel: postgres[32338]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007fffc9b4aca0 error 4
/var/log/messages.4:Sep 20 14:48:28 ibn-cluster3 kernel: postgres[32347]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007fffc9b4aca0 error 4
/var/log/messages.4:Sep 20 15:23:31 ibn-cluster3 kernel: postgres[1474]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007fff34cd0340 error 4
/var/log/messages.4:Sep 20 16:46:46 ibn-cluster3 kernel: postgres[2480]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007ffffc58a6f0 error 4
/var/log/messages.4:Sep 20 16:52:53 ibn-cluster3 kernel: postgres[2984]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007fff5e7e59e0 error 4
/var/log/messages.4:Sep 20 20:00:01 ibn-cluster3 kernel: postgres[6654]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007fffa24b2df0 error 4
/var/log/messages.4:Sep 20 20:00:03 ibn-cluster3 kernel: postgres[6662]: segfault at 0000000000000094 rip 00000000005ad88e rsp 00007fffa24b2df0 error 4


If more details are needed, please let me know.

Regards

#2 nixcraft, 14-10-2008, 04:11 PM

Maybe the suggestions in the article below will help you out:
Why Does The Segmentation Fault Occur on Linux / UNIX Systems?
__________________
Vivek Gite
Linux Evangelist

#3 websissy, 14-10-2008, 08:54 PM

In my experience, Apache segfaults can also be caused by one or more damaged run-time components in Apache or in one of the modules it depends on. This happened to me last weekend, when fixes applied to a corrupted file system apparently ended up damaging some of Apache's components.

Shortly after that, I noticed cascades of segfaults occurring in Apache on my system. To fix it, I used aptitude on my Debian system to carefully build a detailed list of Apache2 and all the components it depended on. Once I had a complete list, I used aptitude to remove all those packages (except one which the kernel depended on), and then used "clean" to remove all traces of them except their config files. Finally, I reinstalled Apache2 and all the components it depends on, and managed to eliminate almost all the segfaults. Whereas before I was getting several of those errors at once (and hundreds or thousands in a 24-hour period), I'm now seeing only 5 or 6 in a 24-hour period.

The point is that segfaults can be a by-product of a damaged file system, as well as of a hardware problem or a poorly written program. Try removing and reinstalling Apache and the components it depends on, and be sure to delete the binary runtimes for all those apps too -- THAT's the trickiest part -- but try to save your config files if possible. In my case this reduced the number of Apache segfaults from hundreds per hour to 4 - 6 per day.

Another thing to bear in mind is that Apache makes heavy use of memory and spawns dozens of dynamic tasks (e.g. PHP and its underlying applications, Python and its apps, Ruby and its apps, etc.), which then issue requests to other apps as well (e.g. MySQL database requests, Perl requests and God KNOWS what else). In short, Apache and the tools it relies on exercise your system's memory very thoroughly. So if you had a bad stick of RAM that was throwing random errors, especially during periods of high system demand, Apache and the apps it calls might hit that bad block of memory quite often. Have you tried running a hardware diagnostic on your system?

Last edited by websissy; 14-10-2008 at 09:08 PM.

#4 manishkochar, 16-07-2009, 05:28 PM

I read the articles at the links suggested by Vivek.
They are good articles, but they don't do much to help diagnose and solve a problem, especially if you are not the software developer.

Most people who use Linux systems use open source software, but that does not mean they can understand all the programming that goes on inside.

Developers write programs, package them, and distribute them to people all over the world. Developers don't know most of their users, and most users wouldn't know how to debug a crash. If you believe in Murphy's Law, a crash occurs when you least expect it. I suppose we need a HowTo that bridges developers and users, for the purpose of eliminating such crashes and flaws from the software.

I hope somebody could write an article like this:
Suppose you are using software that randomly crashes. Open /var/log/messages and grep for segfault. You will notice one or more lines like:

segfault at a594dec8 eip b7cc6283 esp ab78e658 error 4

I really wish I knew what to do next, so that I could write the rest of the article. But I think the article should cover things like:

How and why should all software developers, both open source and closed source, ship the symbol tables of their executables and libraries?

How should end users use the symbol tables to analyse a crash, even when they do not have the original source code?

Not all segfaults are caused by a flaw in the application software. How can one make sure that a crash reported as a segfault is definitely NOT caused by a fault in the software?


Most of the articles on the web ask you to start an application under gdb and then keep using it until a crash occurs. I am amazed that so many people still believe a software user has nothing better to do. Moreover, if replicating the crash were so easy, wouldn't every developer be able to simply ship stable releases and cut out all that alpha and beta crap? On numerous occasions I have even seen segfaults without any core dump. Other forms of crashes, like stack smashing, don't generate a core dump either, and most users wouldn't be able to record them if they occurred in a background application or a daemon. Hopefully this article could cover points like capturing ALL output emitted by an application due to systemic errors such as stack smashing, OOM, etc.
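
On the point about segfaults that leave no core dump: one common cause is that the core file size limit of the crashing process is 0, in which case the kernel writes nothing. A minimal sketch for checking this, assuming a Linux system and a POSIX-style shell; core_dumps_enabled is a made-up helper name, not a standard command:

```shell
#!/bin/sh
# core_dumps_enabled is a hypothetical helper for illustration.
# It succeeds when the soft limit on core file size is not zero.
core_dumps_enabled() (
    limit=$(ulimit -c)      # prints "0", a block count, or "unlimited"
    [ "$limit" != "0" ]
)

if core_dumps_enabled; then
    echo "core dumps allowed (limit: $(ulimit -c))"
else
    echo "core dumps disabled; try 'ulimit -c unlimited' before starting the program"
fi

# On Linux this file controls where core files are written and how they are named.
if [ -r /proc/sys/kernel/core_pattern ]; then
    cat /proc/sys/kernel/core_pattern
fi
```

The limit is inherited from whatever starts the process, so for a daemon it would have to be raised in the init script (or equivalent) rather than in an interactive shell. Once a core file exists, it can be opened after the fact with gdb /path/to/binary /path/to/core, so nobody has to sit at the console waiting for the crash.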

Maybe everything I am wishing for already exists somewhere and I simply have not discovered it, so I thank in advance anyone kind enough to point me to the right links.

Hopefully the admins and users of this forum will be able to get together and put such an article in place, under Vivek's stewardship.

#5 nixcraft, 17-07-2009, 02:48 PM

Yes, segfault errors are a royal pain in the a$$. gdb is the best tool for debugging these problems. There is another good alternative called DTrace, a dynamic tracing framework for troubleshooting kernel and application problems on production systems in real time. However, it only works on Solaris, FreeBSD and Mac OS X, not on Linux.

So as a sysadmin you need to train yourself to use gdb. There are good books out there that teach gdb.

HTH

#6 manishkochar, 17-07-2009, 08:46 PM

I have been on this subject for a while now, and did a bit of surfing around in search of nirvana.

Let me share what I discovered, and let's hope there are people on this forum who are interested in adding more:

A super link that introduces people to the world of post-mortem analysis:
YouTube - Gilad Ben-Yossef on using ldd and nm

Another discussion thread, "Getting stack traces on Unix systems, automatically" on Stack Overflow,
is worth a visit.

The second link requires a lot of recoding of any existing software, whereas the first link encourages analysis from whatever you already have.

After a bit more surfing around, I found that after compiling an application, it is possible to export its symbols into a separate file.
Thus even applications that are put into production after stripping can be analysed with gdb, without necessarily requiring the source code.

For example, one could do:
Code:
# Load all symbols up front (--readnow), then ask gdb to dump them to a file.
gdb --readnow "./${EXECUTABLE_BINARY}" <<_EOF
maint print symbols ${SYMBOLS_FILE_FOR_THE_EXECUTABLE_BINARY}
quit
_EOF
The above would produce, or rather extract, the symbols file for gdb.

gdb allows invocation with an executable plus a separate file that contains the symbols, in case you do not have the source code and are using a stripped executable. But I am not sure whether maint print symbols is the right option; this must be verified before use.
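
On the doubt raised above: as far as I can tell, maint print symbols writes a human-readable dump rather than a file gdb can load back. The approach described in the GDB manual's section on separate debug files uses objcopy and strip from GNU binutils at build time instead. A minimal sketch, assuming the binary was compiled with -g and that binutils is installed; split_debug_symbols is a made-up helper name:

```shell
#!/bin/sh
# split_debug_symbols is a hypothetical helper for illustration.
# Usage: split_debug_symbols /path/to/binary
# Result: the binary without debug info, plus /path/to/binary.debug next to it.
split_debug_symbols() (
    bin=$1
    # 1. Copy only the debug information into a separate file.
    objcopy --only-keep-debug "$bin" "$bin.debug" || exit 1
    # 2. Remove the debug information from the binary that gets shipped.
    strip --strip-debug "$bin" || exit 1
    # 3. Record the name of the debug file inside the binary so gdb can find it.
    objcopy --add-gnu-debuglink="$bin.debug" "$bin"
)
```

With the .debug file in the same directory as the binary, gdb should pick it up through the debug link; it can also be named explicitly, as in gdb -s binary.debug -e binary -c core. Note that this has to be done by whoever builds the software: symbols cannot be recovered from a binary that was stripped without keeping them.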

But I guess that if this works and does not allow reverse engineering, then even developers of closed source software could be encouraged to release the symbols file with their packages.

I discovered another good reference at Tuxology - a Linux embedded, kernel and training blog.
It's by the same gentleman as in the YouTube link.

DTrace, mtrace, strace, ptrace, etc. are good, but the only problem is that they help only if you know the application is about to crash, or if you know how to replicate the crash. All of them leave me miserably occupied at the console, waiting for hours for an application to crash. It is worse when the app is supposed to run as a daemon and we run it in the foreground just for witch-hunting. And if the stupid thing crashes just when you left for a quick cuppa, ..... !!!!

The second link I mentioned above discusses ways of making your application capable of capturing a lot of details when it gets a SIGSEGV. I suppose enough examples just need to be collected so that newbies can learn it too.

By the way, does anybody know how to actually use objdump and nm?
Their documentation only discusses how to invoke them, with nothing much about how to interpret the output and use it to analyse a segfault accurately.

Cheers

#7 manishkochar, 29-07-2009, 03:50 PM

OK, I figured out a bit about objdump!

It is possible to identify the location in the source code that causes problems like:
segfault at XXXXXX eip YYYYYY esp ZZZZZZ error 4
Typically such lines appear in /var/log/messages in the following format:

Code:
Jul 28 20:51:32 ubuntu804 kernel: [ 8146.280653] YOUR_APPLICATION[992]: segfault at 0000004c eip 08094952 esp a7acddc0 error 4
First generate a disassembly of the application "YOUR_APPLICATION" with the following command:

Code:
objdump -DCl "/path/to/YOUR_APPLICATION" > APPLICATION_DEBUG
Then simply locate the eip value, that is YYYYYY, in APPLICATION_DEBUG.

In the above example, 08094952 represents YYYYYY,
so I would typically do this:

Code:
grep -n -A 6 -B 6 "8094952" APPLICATION_DEBUG
Note that instead of "08094952" I trimmed the leading "0" and used "8094952".

The resulting output should give you a fair idea of where the problem lies in the code. grep -n tells you the line number of the relevant information in APPLICATION_DEBUG, and you can use cat or less to view the entire file and look at things more holistically. -A 6 -B 6 simply shows the 6 lines after and before the matching position in APPLICATION_DEBUG.
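
The trimming step can be scripted. A plain grep for "8094952" can also match the same digits elsewhere in the file (for example inside a jump target), so the sketch below anchors the pattern to the address column, assuming the usual objdump layout where each instruction line starts with optional spaces, the address without leading zeros, and a colon. ip_grep_pattern is a made-up helper name:

```shell
#!/bin/sh
# ip_grep_pattern is a hypothetical helper for illustration.
# It turns the eip/rip value from the log into a grep pattern for the
# address column of objdump's disassembly output.
ip_grep_pattern() (
    addr=$(printf '%s' "$1" | sed 's/^0*//')   # 08094952 -> 8094952
    [ -n "$addr" ] || addr=0                   # an all-zero address becomes "0"
    printf '^ *%s:' "$addr"
)

# Build the grep command for the address from the example above.
pattern=$(ip_grep_pattern 08094952)
echo "grep -n -A 6 -B 6 '$pattern' APPLICATION_DEBUG"
```

This only lines up directly when the faulting address lies inside the main executable and the executable is loaded at the addresses objdump shows. If the address falls inside a shared library, the library's load address would have to be subtracted first.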

However, the information in /var/log/messages could look different, like:
Code:
segfault at 00002aaaae097004 rip 0000000000536e10 rsp 00007fff97608930 error 4
I still haven't figured that one out; I will surely post when I do!
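
For reference, the two formats carry the same fields. On 64-bit x86 the registers are named rip (instruction pointer) and rsp (stack pointer) instead of eip and esp, and the addresses are printed with 16 hex digits. The trailing number is the x86 page-fault error code, a bit mask. A minimal sketch that handles both formats and decodes the low three bits, assuming the kernel prints the code in hexadecimal; all function names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they only do text and bit manipulation.

# Print the hex value that follows a keyword (e.g. "rip" or "error") on a log line.
field_after() (
    printf '%s\n' "$2" | sed -n "s/.* $1 \([0-9a-fA-F]*\).*/\1/p"
)

# Print the faulting instruction address, whichever register name was logged.
segfault_ip() (
    ip=$(field_after rip "$1")
    [ -n "$ip" ] || ip=$(field_after eip "$1")
    printf '%s\n' "$ip"
)

# Decode bits 0-2 of the page-fault error code (argument is a hex string).
decode_error() (
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not mapped"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    printf '%s, %s, %s\n' "$mode" "$access" "$cause"
)

line='segfault at 00002aaaae097004 rip 0000000000536e10 rsp 00007fff97608930 error 4'
segfault_ip "$line"
decode_error "$(field_after error "$line")"
```

Read this way, "error 4" means a user-mode read from an address that is not mapped. A fault address as small as 0000000000000094 or 0000004c typically points to a read through a NULL pointer plus a small offset, which suggests a software bug more than bad hardware, though that is an inference and not something the log line proves.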

Happy hunting! If anybody else has notes to add, I guess this thread will be very useful to everybody, so please accept my thanks in advance.