Linux / UNIX Tech Support Forum
Thread: Why did linux boot with main file system read only after kernel install? (Ubuntu / Debian forums, Linux Distribution category)
A few days ago, I tried migrating mailman 2.1.11 to my Debian Etch 4.0r3 server. I backed up my old install of the mailman app and its data from my old Redhat server (where it had been running under python 2.4), moved the archive to my new Debian server, and extracted it into /usr/local/mailman (where it would also be running under python 2.4!).
That approach was unsuccessful, and I'm not sure why. It seemed to me that what I did should have worked fine, but for some reason it did not.

So today I started over. I began by renaming the /usr/local/mailman directory (to protect the contents of its archives (data) directory from accidental deletion). Next, I ran python -v from the shell prompt to verify that python still ran on my Debian server and to confirm its version. Python came right up saying it was v2.4. So far so good...

My third step was to download a fresh copy of mailman 2.1.11 from the gnu server and unzip and untar it into a fresh mailman directory where it would soon be installed. As soon as that was finished, I started working through the setup process using the admin installation guide on gnu's mailman site. However, when I got to the step that told me to run ./configure, configure immediately complained that there was something wrong with the python installation and insisted python should be repaired before continuing.

The interesting part is that python 2.4 had been installed with aptitude but had not been used since, because I hadn't needed it yet. It was installed specifically for the needs of this site and for the mailman application. So I'm not sure how python got "damaged".

Okay... now I was suddenly on a whole new troubleshooting path. What I did next was fire up aptitude and uninstall python 2.4 completely, with the intent of reinstalling it immediately. Of course, the way aptitude works, it simultaneously removed a list of other apps and libraries that were no longer needed. When the uninstall completed, I turned around and reinstalled python and its docs, along with a python runtime speedup tool named psyco, all at once. Halfway through that install, aptitude informed me it was "now installing a new version of the linux kernel". (Oops! I hadn't ASKED for or authorized a kernel upgrade. So where the devil did it come from? I dunno!)

In that 'informational notice', aptitude recommended that I reboot the server immediately after the install finished so the kernel upgrade and configuration process could be completed. Naturally, I followed those instructions to the letter, but when I went back and logged on to the system after the reboot I discovered that:
Thanks!

Last edited by websissy; 11-10-2008 at 08:17 AM.
I suggest booting the system into single-user mode and running fsck (the file system check program) on all file systems to get an idea of the problem. When you boot, watch the screen for error messages.
__________________
Vivek Gite, Linux Evangelist
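Once the machine is up, one way to confirm which file systems are actually mounted read-only is to scan /proc/mounts. The following is a minimal sketch, assuming a Linux /proc file system; the function name list_ro_mounts and its optional file argument are illustrative and do not come from this thread.

```shell
# List mount points whose option field contains the "ro" flag.
# Reads /proc/mounts by default; another file in the same format
# can be passed as the first argument (hypothetical convenience).
list_ro_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Fields: device, mount point, fs type, options (comma-separated), dump, pass
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "$mounts_file"
}

# Usage: list_ro_mounts
```

Comparing each option exactly against "ro" avoids false matches on options such as errors=remount-ro, which merely contain those letters.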
Okay... What you suggested is exactly what I had done while I awaited a reply. I contacted the server center and got KVM remote access. Then I rebooted the server under manual control and selected Grub's "single user mode" option at startup. What I got was this message during boot:
Code:
Checking root file system...fsck 1.40-WIP (14-Nov-2006)
/lib/init/rw/rootdev contains a file system with errors. check forced.
/lib/init/rw/rootdev: Inodes that were part of a corrupted orphan linked list found.
/lib/init/rw/rootdev: UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY.
(i.e., without -a or -p options)
fsck died with exit status 4
failed. (code 4).
* An automatic file system check (fsck) of the root filesystem failed. A manual fsck must be performed, then the system restarted. The fsck should be performed in maintenance mode with the root filesystem mounted in read-only mode.
* The root filesystem is currently mounted in read-only mode. A maintenance shell will now be started. After performing system maintenance, press CONTROL-D to terminate the maintenance shell and restart the system.
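The "exit status 4" in the boot log can be read against the fsck(8) manual page, where the exit status is documented as a sum of condition flags. A small sketch of a decoder follows; the function name describe_fsck_status is hypothetical, and the wording of each message paraphrases the manual page.

```shell
# Decode an fsck exit status into the conditions it encodes.
# Per fsck(8), the status is the sum of the values handled below.
describe_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for bit in 1 2 4 8 16 32 128; do
        # Test whether this flag is set in the status value
        if [ $(( $status & $bit )) -ne 0 ]; then
            case "$bit" in
                1)   echo "file system errors corrected" ;;
                2)   echo "system should be rebooted" ;;
                4)   echo "file system errors left uncorrected" ;;
                8)   echo "operational error" ;;
                16)  echo "usage or syntax error" ;;
                32)  echo "checking canceled by user request" ;;
                128) echo "shared-library error" ;;
            esac
        fi
    done
}

# Usage: describe_fsck_status 4
```

For the status in the log, describe_fsck_status 4 reports that errors were found and left uncorrected, which is consistent with the request to run fsck manually.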
When fsck runs during boot-up, it takes 3 or 4 minutes before it fails (it checks 59% of the drive before doing so). Yet when I run fsck manually from maintenance mode as instructed, using this command:
fsck -sAV -t ext3 /dev/sda1 -r
or this command:
fsck -sAV -t ext3 /lib/init/rw/rootdev -r
it instantly reports:
Code:
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -r /dev/sda1
e2fsck 1.40-WIP (14-Nov-2006)
/dev/sda1: clean, 1054350/59572224 files, 13243893/119136024 blocks (check after next mount)

Yet when I reboot, the errors are still there. So can you tell me what I'm doing wrong here NOW? For the record, BTW, this is a Debian Etch 4.0r3 system.
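One possible reading of the instant "clean" result: when a file system is marked clean and the -f option is not given, e2fsck prints a one-line summary and skips the full check. Whether that fully explains the behavior on this server is an assumption. A small sketch that recognizes such a summary line, with the hypothetical function name check_was_skipped:

```shell
# Return success (0) if a line of e2fsck output looks like the short
# "<device>: clean, ..." summary printed when the full check is skipped.
check_was_skipped() {
    case "$1" in
        *": clean, "*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Usage:
#   if check_was_skipped "$line"; then echo "full check was not run"; fi
```

If that is what happened, running e2fsck with -f on the device (with the file system unmounted or mounted read-only) forces a full check regardless of the clean flag.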
Since the primary "visible result" from my "read only" server is that the OLD version of the mailman directory (from before I started this whole process yesterday) has "reappeared" and the NEW mailman directory seems to have vanished, I'm wondering if, unbeknownst to me, the system might have had that old mailman directory in use when I renamed it?
If that's the case and I'm forced to revert to the old mailman directory to fix this, that's okay. It would be less than an hour's work for me to get back to that point again. But so far, I haven't been able to get fsck to find and report the same problem when I run it manually as it reports when it runs at startup. I've clearly got to get past THAT issue first.
I finally got fsck to run using:
Code:
fsck /lib/init/rw/rootdev -r

fsck ran and found errors which (not knowing what else to do) I told it to fix. It found perhaps 16 inode errors in pass 1, but things got worse from there. Again, not knowing what else to do, I kept telling it to fix the errors it found until it had finished. Until the last phase it gave me no clue what files were involved in the damage. In the last pass it reported 2 or 3 dozen errors in /var/spool/postfix files. At that point I thought, okay, so I've got some damaged email files, no big deal... Again, not knowing what else to do, I told it to clone the multiply-linked directory entries and create separate chains for the files involved.

It finally finished and requested a reboot. I did that. When I check for the /lib/init/rw/rootdev file, it's gone now. It was apparently either removed by the fsck or removed during the last (clean) reboot... When I try another fsck, it instantly reports the file system is "clean" now... However, when I tried to boot in multiuser mode and apache started, it reported segmentation faults...

Relevant facts:

1. I have a dd full drive clone backup of this drive that was made a few nights ago (I believe Tuesday Oct 7). I'm not sure exactly how to tell the precise date the backup was made, but from what I saw, the damaged files fsck reported on the server dated back as far as Sunday, October 5 (6 days ago). So it's possible the backup suffers from the same damage. Sadly, I have NO idea what caused this damage and at this point, I may never know. The backup file system hasn't been mounted since that backup was made, but I did check it a few minutes ago and it appears to be intact. However, when I checked for /backup/lib/init/rw/rootdev, it does not exist there either. Q: Does this particular file system only exist at boot time? Is that the deal?

2. It's important to understand that, except for the swap file, ALL filesystems on this drive are in one partition. There's ONLY 1 partition on the drive because that's the way our server host configured the drive when they laid debian down, and I never saw why we should change that until now. In short, all our sites and all installed apps reside on this partition.

Can anyone suggest what to do next here? Should I

1. try to replace the contents of the damaged file systems with the same files from the backup?
2. something else... and if so, what?

Thanks!
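On the question of dating the backup: an ext2/ext3 superblock records several timestamps, and tune2fs -l (or dumpe2fs -h) prints them. Since a dd clone copies the superblock along with everything else, those fields may give a rough idea of when the clone was taken, though that is an assumption about this particular setup. A sketch that filters the relevant lines from such output; the function name fs_timestamps and the device name in the usage comment are hypothetical.

```shell
# Filter superblock timestamp lines from `tune2fs -l` or `dumpe2fs -h`
# output supplied on standard input.
fs_timestamps() {
    grep -E '^(Filesystem created|Last mount time|Last write time|Last checked):'
}

# Usage (device name is an example only):
#   tune2fs -l /dev/sdb1 | fs_timestamps
```

Because the clone reportedly has not been mounted since it was made, its "Last write time" should not have moved after the copy.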
Good to know you have a backup.
See: How do I test if my Linux server SCSI / SATA hard disk is going bad?

If there is no hard disk or file system problem, just restore the data and move on. If the server stores critical data, you may want to consider RAID to improve protection.
__________________
Vivek Gite, Linux Evangelist
Yeah, well, we had a full-drive-clone backup of the primary hd (via dd) on the secondary hd, and several intermediate backups made this past week were stored on the primary hd as well.

But apparently the primary hd's main file system was glitched in an undocumented server crash and restart last weekend. My guess is the server center lost power sometime Sunday night and their Ops just restarted their servers without doing any sort of fsck recovery. As a result, once the journaling file system had "recovered" yesterday, we lost all our intermediate backups, which had been stored as "tarballs" out in "no-man's-land" on the primary 500gb hd. The "journaled" recovery basically rolled us clear back to last Sunday night, shortly before the system crash occurred and about 24 hours after the drive backup to the secondary was made. (sigh...)

Unfortunately, this is a brand new server. So, although we had backups working, we didn't yet have the overnight FTPs of intermediate backups to a remote B/U drive here in our data-center running. But yeah, we "recovered" in a manner of speaking, if one doesn't consider a week's work lost a big deal. But I'm still getting intermittent segfaults from apache on that server, which is in a datacenter 1500 miles away.

I'll certainly try your suggestions for additional hd testing, but the way Apache has been crashing with random segfaults, I'm betting on a bad stick of memory in the server and not the hd per se. Meanwhile, the ops center at the server host has taken their much-in-demand kvm device off our machine, and that means we no longer have console-op access to the server for a few hours.

Thanks a lot for the insights, tips, thoughts and suggestions. Your advice has been valuable and much appreciated. This is one of "those situations" where there's no one around here with the tech competence to diagnose a problem of this nature, or for me to discuss this with, except my wife and the cat, and neither of them is very helpful in situations like this. Wish us luck. This battle isn't over yet...

Last edited by websissy; 12-10-2008 at 07:46 PM.