Help, my SCSI is cursed!

Forum: LXer Meta Forum
Total Replies: 21
techiem2

Mar 18, 2009
4:48 PM EDT
So I've got this nice Dell PowerEdge 2500SC that I've been working on setting up as a new router/server for the church/school, and recently it started having issues (like taking forever to boot). I figured out today that it's not reading one of the drives. After messing around and rebooting a bunch, I finally figured out that one of the disks is getting the same ID as the backplane instead of the ID it SHOULD be using. Obviously this causes that disk to be unusable. Does anyone have experience with a similar server who could point me in the right direction to fix it? The drives are all in removable carriers that plug into the backplane.

Thanks guys/gals!

Sander_Marechal

Mar 18, 2009
4:58 PM EDT
Eh, hardware glitch? Can you swap the disks around in their slots without destroying the data on them, or will that mess up your RAID setup? If you can get away with it, I'd try that first. If it's a RAID unit you could also try to replace the disk with the wrong ID and rebuild the array. Then simply use the disk you took out somewhere else where its ID doesn't collide.
techiem2

Mar 18, 2009
5:14 PM EDT
OK. I just pulled that drive and moved the rest up a slot, and they all got the correct IDs according to their slots. Once I see if that did anything odd to my install, I'll stick the odd drive in the last slot and see if it gets the correct ID there or if it still tries to use the same ID as the backplane.
Sander_Marechal

Mar 18, 2009
5:20 PM EDT
If it still gets the backplane ID, stick the disk in a different machine with a different backplane and see what ID it gets then. Its own ID? The faulty ID it got before? The ID of the backplane it's connected to now?
techiem2

Mar 18, 2009
5:31 PM EDT
I stuck it in the last slot and it got the appropriate ID. So apparently some firmware glitch locked the wrong ID to that drive in that slot? *shrug*

Now I just gotta get that particular array to rebuild.

So how do I go about telling it to rebuild the array? I get errors and it won't activate the array (sort of understandable as the disks have changed locations and such). On boot dmesg gives me:

md: Autodetecting RAID arrays.
md: Scanned 8 and added 8 devices.
md: autorun ...
md: considering sdf1 ...
md: adding sdf1 ...
md: adding sde1 ...
md: adding sdd1 ...
md: adding sdc1 ...
md: sdb2 has different UUID to sdf1
md: sdb1 has different UUID to sdf1
md: sda2 has different UUID to sdf1
md: sda1 has different UUID to sdf1
md: created md2
md: bind
md: bind
md: bind
md: bind
md: running:
md: kicking non-fresh sdf1 from array!
md: unbind
md: export_rdev(sdf1)
md: kicking non-fresh sde1 from array!
md: unbind
md: export_rdev(sde1)
raid5: device sdd1 operational as raid disk 2
raid5: device sdc1 operational as raid disk 1
raid5: not enough operational devices for md2 (2/4 failed)
RAID5 conf printout:
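
For anyone reading a log like this later: the lines that matter are the two "kicking non-fresh" messages and the final raid5 line. A 4-device RAID5 can run with at most one member missing, so with sde1 and sdf1 both kicked the kernel refuses to start md2. Here is a rough Python sketch that pulls those facts out of the text. The function name `summarize_md_log` and the embedded excerpt are just for illustration (not part of any md tooling), and the patterns only cover the message wording quoted in this post.

```python
import re

# Excerpt of the md/raid5 kernel messages quoted in the post above.
DMESG = """\
md: kicking non-fresh sdf1 from array!
md: kicking non-fresh sde1 from array!
raid5: device sdd1 operational as raid disk 2
raid5: device sdc1 operational as raid disk 1
raid5: not enough operational devices for md2 (2/4 failed)
"""


def summarize_md_log(text):
    """Return (kicked devices, {device: raid slot}, (array, failed, total) or None).

    Hypothetical helper: it searches the whole text rather than line by line,
    so it also works when the log has been pasted as one long line.
    """
    kicked = re.findall(r"kicking non-fresh (\w+) from array", text)
    operational = {
        dev: int(slot)
        for dev, slot in re.findall(
            r"device (\w+) operational as raid disk (\d+)", text
        )
    }
    m = re.search(
        r"not enough operational devices for (\w+) \((\d+)/(\d+) failed\)", text
    )
    failure = (m.group(1), int(m.group(2)), int(m.group(3))) if m else None
    return kicked, operational, failure


kicked, operational, failure = summarize_md_log(DMESG)
print("kicked as non-fresh:", kicked)
print("still operational:", operational)
print("array failure:", failure)
```

This only reports what the kernel already said. It does not touch the array or decide whether the kicked members are safe to add back.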
techiem2

Mar 18, 2009
5:52 PM EDT
Hmm... Do I just have to remake the array since 2 of the disks are in a different order now, one of them being the first disk of the array?
Sander_Marechal

Mar 18, 2009
5:57 PM EDT
I believe it should do that by itself, so I don't know what's going wrong here. After all, that's why they use UUIDs instead of device names these days.

Can you post your mdadm.conf?

I suggest you ask in the IRC channel of your distro (there seems to be no dedicated IRC channel for mdadm).
techiem2

Mar 18, 2009
6:05 PM EDT
That's what I thought. hehe.

Here's my mdadm.conf. md2 is the offending array.

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=8be6ea05:0b6c381a:81b73605:e007ee08
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=14d1f74f:f42d9ee5:7800f9f3:1700b32b
ARRAY /dev/md2 level=raid5 num-devices=4 spares=1 UUID=4ac10eab:238ef132:e230c6e1:39b18b05
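
To make the config above easier to compare against what the kernel reports, here is a small Python sketch that turns the ARRAY entries into dictionaries and works out how many members each array needs before it can start. `parse_arrays` and `min_members` are made-up helper names, and the redundancy rule only covers the two levels that appear here (raid1 and raid5).

```python
import re

# The three ARRAY entries from the mdadm.conf quoted above.
MDADM_CONF = """\
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=8be6ea05:0b6c381a:81b73605:e007ee08
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=14d1f74f:f42d9ee5:7800f9f3:1700b32b
ARRAY /dev/md2 level=raid5 num-devices=4 spares=1 UUID=4ac10eab:238ef132:e230c6e1:39b18b05
"""


def parse_arrays(text):
    """Map each array device to its key=value options (hypothetical helper)."""
    arrays = {}
    # Split on the ARRAY keyword so it does not matter whether every entry
    # sits on its own line or they were pasted as one long line.
    for chunk in re.split(r"\bARRAY\b", text)[1:]:
        fields = chunk.split()
        device, options = fields[0], fields[1:]
        arrays[device] = dict(opt.split("=", 1) for opt in options)
    return arrays


def min_members(level, num_devices):
    """Fewest active members the array can start with (raid1/raid5 only)."""
    if level == "raid1":
        return 1                # any single mirror half is enough
    if level == "raid5":
        return num_devices - 1  # tolerates exactly one missing member
    raise ValueError(f"level not covered by this sketch: {level}")


arrays = parse_arrays(MDADM_CONF)
for device, opts in arrays.items():
    n = int(opts["num-devices"])
    print(device, opts["level"], "needs at least", min_members(opts["level"], n), "of", n)
```

For md2 this gives 3 of 4, which matches the boot messages: only two members were accepted, so the array could not start.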
techiem2

Mar 18, 2009
6:13 PM EDT
Well, I decided to just remake the array. It's just the fileserver portion of the server, so there really wasn't anything on it yet besides a few blank directories for the sharing setup. Not sure why it wouldn't just recover automatically....seems odd to me. *shrug*
Sander_Marechal

Mar 18, 2009
6:18 PM EDT
I do suggest you find out anyway, just in case this happens again when there *is* important data on the drives.
techiem2

Mar 18, 2009
6:20 PM EDT
Yeah...that would be a good idea. :)

Thanks for the help as always. :)
Sander_Marechal

Mar 18, 2009
6:22 PM EDT
You're welcome :-)
NoDough

Mar 18, 2009
8:33 PM EDT
Lemme guess. Has it got a PERC RAID controller?
caitlyn

Mar 18, 2009
8:47 PM EDT
If NoDough is right you need the PERC driver (proprietary) from Dell. I realize that I used the evil proprietary word but it is the only way to make their PERC controllers work consistently correctly IME.
techiem2

Mar 18, 2009
8:48 PM EDT
Adaptec AIC-7899P controllers according to lspci.
techiem2

Mar 18, 2009
8:50 PM EDT
The box was working quite fine until things randomly went screwy. I'm really not sure why or what happened. Maybe one of the power surges confused the SCSI bios or something? *shrug*
jdixon

Mar 19, 2009
5:28 AM EDT
> Maybe one of the power surges confused the SCSI bios or something?

If you're going to use it as a file server, it should probably be on a UPS.
techiem2

Mar 19, 2009
12:22 PM EDT
Yeah, I know. It's actually going to be a router/firewall/mini fileserver for the church/school. I've mentioned several times that the important machines there really should be on UPSs....but....small church...no money...etc. :(
Sander_Marechal

Mar 19, 2009
12:40 PM EDT
You can buy a second-hand UPS for as little as $100. If you ask around at businesses I am sure you can find one that you can have for next-to-nothing (it's amazing how much equipment sits unused at businesses around the world). Surely they have the money for the power bill?
techiem2

Mar 19, 2009
12:49 PM EDT
Maybe I should bug higher up the chain...i.e. dad (the Pastor).... Considering we do PowerPoint from 2 machines (we could make do with one, but things are so much nicer with two), icecast stream from another, recording of the service to another, the library system (fairly new) on another machine....those machines plus the router should really be protected. Hmm. Maybe I should make a list of the machines and what they do and approximate cost of a tolerable UPS for each and give it to him.... *fires up openoffice*
jdixon

Mar 19, 2009
2:02 PM EDT
A 350 VA UPS from APC used to be around $40 from Staples/Walmart. A 500 VA unit was around $60.
techiem2

Mar 19, 2009
2:13 PM EDT
Yeah, I was thinking a 450 or 500 would be fine for most machines. I've got 3 servers and 2 switches and a few other LAN components humming away on a 1350. Granted, it only has about 5 min of battery life, but that's plenty of time to shut them down if it looks like the power's going to be out for a while. :)
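
The list-for-the-pastor idea is simple enough to sketch in a few lines of Python. The machine labels below are placeholders for the boxes mentioned earlier in the thread, and the prices are the "used to be around" figures quoted above, so treat the totals as a ballpark rather than a quote.

```python
# Placeholder labels for the machines mentioned in the thread, with their roles.
MACHINES = [
    ("presentation PC 1", "PowerPoint"),
    ("presentation PC 2", "PowerPoint"),
    ("streaming box", "icecast stream"),
    ("recording box", "recording of the service"),
    ("library PC", "library system"),
    ("router", "router/firewall/mini fileserver"),
]

# Rough per-unit prices in USD by VA rating, as quoted in the thread.
UPS_PRICE_USD = {350: 40, 500: 60}


def ups_budget(machines, va_rating):
    """Return (rows, total) assuming one UPS of the given rating per machine."""
    price = UPS_PRICE_USD[va_rating]
    rows = [(name, role, va_rating, price) for name, role in machines]
    total = sum(row[3] for row in rows)
    return rows, total


rows, total = ups_budget(MACHINES, 500)
for name, role, va, price in rows:
    print(f"{name:<20} {role:<34} {va} VA  ${price}")
print(f"Total for {len(rows)} machines: ${total}")
```

One unit per machine is only an assumption to keep the arithmetic simple. Machines that sit next to each other could share a larger unit instead.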