Getting SSD wear information from behind MegaRAID SAS controllers on Linux

Just sending this information out into the world in case it helps somebody who is Googling for answers.

I’ve been experimenting with using SSDs on one of our slave database servers. The drives are hotplugged as usual, but I can’t access them directly in Linux because they are fronted by an LSI MegaRAID SAS/SATA controller. In theory, the current version of smartmontools from SVN can get S.M.A.R.T. data from behind these controllers, but it doesn’t work on many of them, including my LSI 9260-8i and LSI 8888.

Here is a kernel patch that will expose all of the MegaRAID drives as read-only, raw /dev/sgX devices so that you can query them with utilities such as smartctl from smartmontools. I have no idea if it is dangerous – notice the comment. You probably shouldn’t do this :)

In drivers/scsi/megaraid/megaraid_sas.c#megasas_slave_configure:

/*
 * Don't export physical disk devices to the disk driver.
 * 
 * FIXME: Currently we don't export them to the midlayer at all.
 *        That will be fixed once LSI engineers have audited the
 *        firmware for possible issues.
 */

- if (sdev->channel < MEGASAS_MAX_PD_CHANNELS && sdev->type == TYPE_DISK)
-         return -ENXIO;

+ if (sdev->channel < MEGASAS_MAX_PD_CHANNELS && sdev->type == TYPE_DISK) {
+         sdev->no_uld_attach = 1;
+         sdev->writeable = 0;
+ }

After that, I can run smartctl -a /dev/sg5 (in my case) to get SMART information from my Intel SSDs. This drive isn’t in the smartmontools drive database yet, so I had to look up the unknown attributes. 233 is the “media wearout indicator” and 232 is the available reserved space.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age
  4 Start_Stop_Count        0x0030   100   100   000    Old_age
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age
  9 Power_On_Hours          0x0032   100   100   000    Old_age
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age
225 Load_Cycle_Count        0x0030   200   200   000    Old_age
226 Load-in_Time            0x0032   100   100   000    Old_age
227 Torq-amp_Count          0x0032   100   100   000    Old_age
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age
232 Unknown_Attribute       0x0033   100   100   010    Pre-fail
233 Unknown_Attribute       0x0032   099   099   000    Old_age
184 Unknown_Attribute       0x0033   100   100   099    Pre-fail
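
If you want to pull the two wear attributes out of that output from a script, something along these lines might do. This is only a rough sketch, not part of smartmontools: the names parse_attributes, wear_summary and the two labels are made up for illustration, and it assumes the column layout shown above (ID, name, flag, VALUE, WORST, THRESH, type) and that 232 and 233 mean what they mean on this Intel drive.

```python
import re

# Attribute IDs as identified in the post for this Intel SSD. The labels are
# illustrative names chosen here, not names from the smartmontools database.
WEAR_ATTRIBUTES = {
    232: "available_reserved_space",
    233: "media_wearout_indicator",
}

# One attribute row: ID, name, flag, then the normalized VALUE/WORST/THRESH
# columns and the type (Pre-fail or Old_age). Trailing columns are ignored.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)"
)


def parse_attributes(text):
    """Return {attribute_id: {...}} for every attribute row found in text."""
    attributes = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, blank line, or anything else that is not a row
        attr_id, name, flag, value, worst, thresh, kind = match.groups()
        attributes[int(attr_id)] = {
            "name": name,
            "flag": flag,
            "value": int(value),
            "worst": int(worst),
            "thresh": int(thresh),
            "type": kind,
        }
    return attributes


def wear_summary(text):
    """Return the normalized values of the wear attributes that are present."""
    attributes = parse_attributes(text)
    return {
        label: attributes[attr_id]["value"]
        for attr_id, label in WEAR_ATTRIBUTES.items()
        if attr_id in attributes
    }


if __name__ == "__main__":
    # The two relevant rows from the output above.
    sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH
232 Unknown_Attribute       0x0033   100   100   010    Pre-fail
233 Unknown_Attribute       0x0032   099   099   000    Old_age
"""
    print(wear_summary(sample))
```

In practice you would feed it the text printed by smartctl -a /dev/sg5 instead of the hard-coded sample. It only reads the normalized VALUE column, since that is all the output above shows.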

Comment (1)

  1. Giovanni wrote:

    Any idea why they wouldn’t allow direct access to physical disks?

    On Solaris I can’t easily map a logical device (c0t0d0) to a physical slot, and I was hoping direct access would help, instead of going through a RAID 0 volume.

    Have you tried writeable access? Have you tried without creating RAID volumes on top of your disks?

    Monday, January 4, 2010 at 10:22 am
