FreeNAS Alert
While away on holiday I received the following email:
FreeNAS: Critical Alerts Device: /dev/ada2, 1 Currently unreadable (pending) sectors
Fortunately, having OpenVPN set up, I was able to VPN into my home network to do some investigation.
The FreeNAS web interface was showing the red alert button due to the error but unfortunately did not show much detail. I connected to my FreeNAS server via ssh to look into the issue further.
Running smartctl -a /dev/ada2 showed that there was indeed 1 pending sector on the drive.
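To pull just that number out of the attribute table, something like the following may help. It is a small sketch in POSIX sh: `pending_sectors` is a helper name made up for this example, and it assumes the usual `smartctl -A` layout in which the raw value is the tenth column.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# raw value of the Current_Pending_Sector attribute (SMART ID 197).
# Assumes the standard ten-column attribute table.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Usage (as root):
#   smartctl -A /dev/ada2 | pending_sectors
```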
After some reading I found that this can happen when a drive fails to read a sector: the sector is marked as pending, and it is only reallocated if a later write to it also fails.
In an effort to reduce disk activity while I figured out what to do, I stopped all plugins and jails and made sure that no tasks were scheduled to run.
I also decided to temporarily remove the drive from the pool while working on it, although I now suspect that this was unnecessary. After some googling I identified the disk and removed it from the pool as follows:
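A minimal sketch of this step in POSIX sh, under some assumptions: the pool name `tank` and the `gptid/...` label are placeholders, and `offline_member` is a helper name invented for the example. FreeNAS normally adds disks to a pool by gptid label, so the label has to be mapped back to ada2 first.

```shell
# Hypothetical helper: take one member of a pool offline.
#   $1 = pool name
#   $2 = member label exactly as shown by `zpool status`
offline_member() {
    zpool offline "$1" "$2"
}

# Typical sequence (as root); "tank" and the label are placeholders:
#   glabel status        # maps gptid/... labels to adaXpY partitions
#   zpool status tank    # shows which label is a member of the pool
#   offline_member tank gptid/<label-for-ada2>
```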
This did technically degrade the array, and if I have to repeat this procedure in the future I think it would be best to avoid it, especially on pools with only single-disk redundancy.
I then started a long S.M.A.R.T test to get some more information:
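Starting the test and checking on it later can be sketched as below. The function names are made up for this example; the `smartctl` options themselves (`-t long`, `-l selftest`) are standard.

```shell
# Hypothetical helpers around smartctl; $1 is the device, e.g. /dev/ada2.

# Start an extended (long) self-test. The drive runs it in the
# background, and smartctl prints an estimated completion time.
start_long_test() {
    smartctl -t long "$1"
}

# Show the self-test log, which lists finished tests and their results.
selftest_log() {
    smartctl -l selftest "$1"
}

# Usage (as root):
#   start_long_test /dev/ada2
#   selftest_log /dev/ada2
```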
Smartctl reported that this would take 417 minutes to complete so I decided I would resume looking into the issue the next day.
Once the test completed I ran smartctl -a /dev/ada2 again, but although the “Current Pending Sector” count was still 1, the test had completed successfully and was not showing any errors.
This was rather irritating as I was hoping to find the location of the sector that was causing the issue.
After some reading I ran the following which revealed the sector:
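One place a failing LBA can show up is the drive's own error logs. The sketch below is an assumption about where to look rather than the exact command used here; `show_error_logs` is a made-up name, while `-l error` and `-l xerror` are standard `smartctl` options.

```shell
# Hypothetical helper: print the summary and extended SMART error logs.
# Entries for failed read commands normally include the LBA involved.
#   $1 = device, e.g. /dev/ada2
show_error_logs() {
    smartctl -l error "$1"
    smartctl -l xerror "$1"
}

# Usage (as root):
#   show_error_logs /dev/ada2
```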
Now knowing that the “Logical Block Address” of the erroring sector was 5206288520 I set a kernel option to allow direct access to the disk:
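On FreeBSD the setting usually used for this is `kern.geom.debugflags`, where the value 0x10 (16) lets GEOM accept writes to a provider that is in use. A short sketch, with `allow_raw_writes` as a made-up helper name:

```shell
# Hypothetical helper: allow direct writes to disks that GEOM considers
# in use. This removes a safety net, so it should only be set for as
# long as it is needed.
allow_raw_writes() {
    sysctl kern.geom.debugflags=0x10
}

# Usage (as root):
#   allow_raw_writes
```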
As I knew that the disk has 4K sectors I ran the following to write zeros to this sector:
This resulted in an input/output error, which I found quite confusing. After further reading I found that, for historic reasons, the LBA is counted in 512-byte sectors, even though the disk uses 4K sectors physically.
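The arithmetic behind this can be checked in the shell. The sketch assumes 512-byte logical and 4096-byte physical sectors; the variable names are only for illustration.

```shell
lba=5206288520      # LBA from the post, in 512-byte units
logical=512         # assumed logical sector size in bytes
physical=4096       # assumed physical sector size in bytes
ratio=$((physical / logical))

byte_offset=$((lba * logical))       # where the sector really is
index_4k=$((lba / ratio))            # same place as a 4K sector index
remainder=$((lba % ratio))           # 0 means the LBA is 4K-aligned
misread_offset=$((lba * physical))   # offset if the LBA is used as a 4K index

echo "byte offset:       $byte_offset"      # 2665619722240
echo "4K sector index:   $index_4k"         # 650786065
echo "offset within 4K:  $remainder"        # 0
echo "misread offset:    $misread_offset"   # 21324957777920
```

Treating the LBA as a 4K sector index points about 21 TB into the disk, roughly eight times further than the real location at about 2.7 TB. If the disk is smaller than that, the write lands past its end, which would be consistent with the input/output error.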
I adjusted my command as follows:
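A sketch of what a corrected command can look like, addressing the disk in 512-byte units. `zero_sectors` is a made-up helper, and writing 8 sectors is an assumption: it covers one whole 4K physical sector, which works here because 5206288520 is a multiple of 8. Overwriting the wrong location destroys data, so the numbers are worth checking twice.

```shell
# Hypothetical helper: overwrite a run of 512-byte sectors with zeros.
#   $1 = target device (or file), e.g. /dev/ada2
#   $2 = first sector, as an LBA in 512-byte units
#   $3 = number of 512-byte sectors to write
zero_sectors() {
    dd if=/dev/zero of="$1" bs=512 seek="$2" count="$3"
}

# Usage (as root, with direct writes allowed):
#   zero_sectors /dev/ada2 5206288520 8
```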
As I no longer needed direct access to the disk I reset the kernel flag and started a short smart test:
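These two steps might look like the following. The helper name is invented, and resetting the flag to 0 assumes that 0 was its value beforehand, which is the FreeBSD default.

```shell
# Hypothetical helper: turn the GEOM write protection back on and start
# a short self-test on the given device.
#   $1 = device, e.g. /dev/ada2
restore_and_retest() {
    sysctl kern.geom.debugflags=0
    smartctl -t short "$1"
}

# Usage (as root):
#   restore_and_retest /dev/ada2
```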
This completed in 3 minutes, and running smartctl -a /dev/ada2 showed a pending sector count of 0. Strangely, in my case the reallocated sector count did not increase, so I suspect that the drive was able to recover the sector.
I then added the drive back to the pool and started a scrub as follows:
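A minimal sketch of bringing the disk back and starting the scrub, again with a placeholder pool name and label and a made-up helper name:

```shell
# Hypothetical helper: bring a pool member back online, then scrub.
#   $1 = pool name
#   $2 = member label exactly as shown by `zpool status`
# Note: bringing a disk online starts a resilver. If that is still
# running, the scrub may have to be started again once it finishes.
online_and_scrub() {
    zpool online "$1" "$2" && zpool scrub "$1"
}

# Usage (as root); "tank" and the label are placeholders:
#   online_and_scrub tank gptid/<label-for-ada2>
```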
After waiting approximately 9 hours the scrub completed and was able to repair the zeroed sectors:
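The result of a scrub can be read from the pool status. In the sketch below the pool name is a placeholder and `pool_report` is a made-up name; the `scan:` line of the output summarises how much data the scrub repaired, and `-v` lists any files with unrecoverable errors.

```shell
# Hypothetical helper: print detailed status for a pool, including the
# scrub summary and per-device read, write and checksum error counts.
#   $1 = pool name
pool_report() {
    zpool status -v "$1"
}

# Usage; "tank" is a placeholder:
#   pool_report tank
```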
As the URE was caused by my overwriting of a sector, I cleared the error with zpool clear.
The Alert light went back to green in the webUI and I am fairly confident that this has resolved the issue.
Many thanks to Dan Smith whose blog post was of great help. Also see the FreeBSD Diary for further information.