Kicking Udev into re-evaluating disk dev entries?

Forum: Linux
Total Replies: 15
Author Content
techiem2

Mar 14, 2010
10:21 PM EDT
I have one drive on my vm server that likes to drop offline and reset itself every couple of months or so. It is normally /dev/sdc, but when it drops and comes back it is re-detected as /dev/sde...

Is there a way to kick udev/the kernel into re-evaluating the devices so that the disk will return to sdc and I can resync the array, without having to reboot the machine to reset the dev entries?

I know I need to replace that disk at some point with one that's more reliable, but I'd like a quick fix until I can do that, because shutting down the vms and rebooting the server is rather annoying. :P And when the drive goes offline the array performance drops significantly. Right now I'm copying a file from the fileserver vm to another machine over nfs and it's poking along at 712 KB/s...

gus3

Mar 14, 2010
11:00 PM EDT
My first question is, are you sure it's the drive and not the bus it's connected to?

There's little so irritating as shelling out cash for a fix, only to find out that wasn't where the problem was.
techiem2

Mar 15, 2010
12:17 AM EDT
There are 2 disks on the mobo and 2 disks on the external card. That's the only disk that has any problems, so I'm pretty sure it's that disk.
gus3

Mar 15, 2010
1:29 AM EDT
Do these drive pairs share controllers/cables, or does each have its own dedicated controller?

Also, in case it's corrosion or a thermally affected intermittent, have you re-seated the cable on the problem drive?
techiem2

Mar 15, 2010
1:34 AM EDT
All SATA disks: 2 using onboard motherboard ports, 2 using a Promise controller card.

I don't think I've tried re-seating the cable. The problem drive will drop offline then come right back, at which point the raid array is broken (obviously) and the system re-detects the disk at the next available dev entry. It seems to happen every couple months. All disks are identical models.
Sander_Marechal

Mar 15, 2010
4:32 AM EDT
Instead of worrying about udev, just change your mdadm configuration to use the disk UUID. On most Debian derivatives you can use /dev/disk/by-id/[UUID]. Have a look around /dev if you don't use a Debian derivative. The UUID is always unique and always the same for a disk, no matter what the device name is.

Change your mdadm commands to match. E.g:

mdadm /dev/md0 -A /dev/disk/by-id/[UUID]#partition ...
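
To see which by-id name belongs to which disk, list the symlinks. Each name is built from the interface type, drive model and serial number (so the exact names on your box will differ), and each link points at whatever sdX node the disk currently has:

ls -l /dev/disk/by-id/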


Even better, don't use disk names at all. RAID arrays built with mdadm write a superblock to each device, which mdadm can use to automatically assemble the arrays. This superblock contains a UUID for the entire array. You can use this in your mdadm configuration. Here is my mdadm.conf. Note that there isn't a single device name in it.

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays

# This file was auto-generated on
# Mon, 14 Jan 2008 16:41:49 +0100

# Note: all on one line
ARRAY /dev/md/0 metadata=0.90 UUID=4365dc18:2d73c384:7f610a2b:5d4077cb
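
By the way, you don't have to type those ARRAY lines by hand. mdadm will print them for your existing arrays (run as root), and you can paste the output into mdadm.conf — /etc/mdadm/mdadm.conf on Debian derivatives, adjust the path for your distro:

mdadm --detail --scan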
ComputerBob

Mar 15, 2010
9:16 AM EDT
@ Sander -- your code appears to have broken the forum's layout.
gus3

Mar 15, 2010
9:28 AM EDT
@CB:

In Seamonkey+Firefox menus:

View -> Page Style -> No Page Style

Until Sander gets off his lazy duff and fixes it!
techiem2

Mar 15, 2010
9:48 AM EDT
Well, I added the arrays to mdadm.conf using the UUIDs and rebooted the box. The arrays are still showing the devs and I had to re-add the dropped disk (as usual). I don't know if my server actually looked at mdadm.conf or not, since it is doing raid on boot, and there was nothing in my mdadm.conf before. I guess I'll just wait and see what happens.
Sander_Marechal

Mar 15, 2010
10:02 AM EDT
@techiem2: Of course the disk will still get dropped. But you don't need to manually re-add it. You should be able to just refresh mdadm and it will pick up the superblock and UUID from the new device.

You could automate this away with a udev rule that initiates the refresh when a new device is hooked up. But perhaps it is better to simply have mdadm mail you when a drive disappears so you can take action yourself. That way you can keep an eye on it and see if any of the other advice, e.g. re-seating the cable, helps.
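
A rough sketch of such a rule, untested — the file name is just an example, and --incremental needs a reasonably recent mdadm. It tells udev: whenever a block device carrying an md superblock appears, hand it straight to mdadm:

# /etc/udev/rules.d/65-md-auto-readd.rules (example name)
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"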

@ComputerBob: I'm not seeing anything. What broke?
gus3

Mar 15, 2010
10:46 AM EDT
@Sander:

Long config file lines
Lengthened margins, but clipping
Chops off right-side text.

;-) ;-) ;-)
techiem2

Mar 15, 2010
1:44 PM EDT
Thanks. I'll give it a try and see what happens. How do I go about refreshing mdadm to have it recheck?

Sander_Marechal

Mar 15, 2010
7:19 PM EDT
Try:

mdadm --assemble --scan


or:

mdadm --assemble --scan --auto


When you don't specify any devices, it will look them up in mdadm.conf. The --scan makes it rescan the superblocks of the devices. --auto makes sure that it creates new md devices as they come online. It's the same as the auto=yes option in the CREATE statement in mdadm.conf, so you can usually skip it.
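
To check whether the re-assembled array is actually resyncing afterwards, the usual read-only status commands will show you (substitute your own md device for md1):

cat /proc/mdstat

mdadm --detail /dev/md1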
techiem2

Mar 15, 2010
7:50 PM EDT
aaah cool.
techiem2

Mar 16, 2010
8:38 AM EDT
I don't know if it has any bearing on this odd issue or not, but I just noticed that the onboard SATA controller is apparently running in IDE mode rather than AHCI mode.

Quoting:
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]


I guess I should reboot the server again when I get home and check the BIOS settings... If this board behaves anything like the lab machines here that I've tested while imaging, running the controller in IDE mode instead of AHCI mode, like it should be, could make a fairly significant difference in disk performance.
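
In case anyone else wants to check their own box: the controller mode shows up without a reboot (the exact output format varies between kernels and distros):

lspci | grep -i sata

dmesg | grep -i ahci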

techiem2

Apr 05, 2010
8:25 AM EDT
Ok, so the drive reset again.

Here's the problem: to rebuild the array, the failed disk needs to be removed. But mdadm won't remove the failed disk because the dev entries for it no longer exist (duh...).

Quoting:
legion ~ # mdadm /dev/md1 --manage --remove /dev/sdd2
mdadm: cannot find /dev/sdd2: No such file or directory


dmesg output from trying to run the auto scan (sde is the new entry for the flaky drive):

Quoting:
md: Autodetecting RAID arrays.
md: Scanned 3 and added 3 devices.
md: autorun ...
md: considering sde3 ...
md: adding sde3 ...
md: sde2 has different UUID to sde3
md: sde1 has different UUID to sde3
md: md3 already running, cannot run sde3
md: export_rdev(sde3)
md: considering sde2 ...
md: adding sde2 ...
md: sde1 has different UUID to sde2
md: md1 already running, cannot run sde2
md: export_rdev(sde2)
md: considering sde1 ...
md: adding sde1 ...
md: md0 already running, cannot run sde1
md: export_rdev(sde1)
md: ... autorun DONE.
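
It looks like mdadm has a keyword for exactly this case: recent versions accept "detached" in place of a device name, which removes members whose dev node has disappeared (untested on this box):

mdadm /dev/md1 --remove detached

After that, the re-detected entry (now sde2) should go back in with mdadm /dev/md1 --add /dev/sde2.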
