A couple of days ago my openSUSE 11.3/KDE system froze completely (3 times). The mouse and keyboard didn't respond to anything and I couldn't log in via ssh. Pushing the button was the only solution. In the logs I noticed that netatalk and smartd were spitting out some errors. Although netatalk was running fine, I decided to remove it for now to make sure it's not related.
The errors I'm worried about are the ones smartd spits out every half hour: pastebin.com/NNZ79usx
Googling these errors, I *think* I can safely ignore them because /dev/sda is an OCZ Vertex Turbo SSD and I have read that SMART was not supported on those (possibly because of firmware 1.0). However, after a firmware upgrade from 1.0 to 1.5 and a fresh install of openSUSE I am still getting these errors:
Code:
# grep uncorrectable /var/log/messages
Aug 17 23:47:20 karin smartd[2544]: Device: /dev/sda [SAT], 19419506 Offline uncorrectable sectors (changed +650105)
Aug 18 00:12:58 karin smartd[2519]: Device: /dev/sda [SAT], 20401475 Offline uncorrectable sectors (changed +981969)
Aug 18 00:42:58 karin smartd[2631]: Device: /dev/sda [SAT], 20879899 Offline uncorrectable sectors (changed +478424)
Aug 18 01:12:58 karin smartd[2631]: Device: /dev/sda [SAT], 20879899 Offline uncorrectable sectors
Aug 18 01:42:58 karin smartd[2631]: Device: /dev/sda [SAT], 20879899 Offline uncorrectable sectors
Aug 18 02:12:58 karin smartd[2631]: Device: /dev/sda [SAT], 20880267 Offline uncorrectable sectors (changed +368)
Aug 18 02:42:58 karin smartd[2631]: Device: /dev/sda [SAT], 20886699 Offline uncorrectable sectors (changed +6432)
Aug 18 03:12:58 karin smartd[2631]: Device: /dev/sda [SAT], 20903139 Offline uncorrectable sectors (changed +16440)
Aug 18 03:42:58 karin smartd[2631]: Device: /dev/sda [SAT], 20913379 Offline uncorrectable sectors (changed +10240)
Aug 18 04:12:58 karin smartd[2631]: Device: /dev/sda [SAT], 20920771 Offline uncorrectable sectors (changed +7392)
Aug 18 04:42:58 karin smartd[2631]: Device: /dev/sda [SAT], 20927563 Offline uncorrectable sectors (changed +6792)
Aug 18 05:12:58 karin smartd[2631]: Device: /dev/sda [SAT], 20941667 Offline uncorrectable sectors (changed +14104)
Aug 18 05:42:58 karin smartd[2631]: Device: /dev/sda [SAT], 20949867 Offline uncorrectable sectors (changed +8200)
Aug 18 06:12:58 karin smartd[2631]: Device: /dev/sda [SAT], 20949867 Offline uncorrectable sectors
Aug 18 06:42:58 karin smartd[2631]: Device: /dev/sda [SAT], 20949867 Offline uncorrectable sectors
Aug 18 07:12:58 karin smartd[2631]: Device: /dev/sda [SAT], 20949875 Offline uncorrectable sectors (changed +8)
Aug 18 07:42:58 karin smartd[2631]: Device: /dev/sda [SAT], 20949875 Offline uncorrectable sectors
Aug 18 08:12:58 karin smartd[2631]: Device: /dev/sda [SAT], 20949875 Offline uncorrectable sectors
Aug 18 08:42:58 karin smartd[2631]: Device: /dev/sda [SAT], 21113315 Offline uncorrectable sectors (changed +163440)
Aug 18 09:12:58 karin smartd[2631]: Device: /dev/sda [SAT], 21286611 Offline uncorrectable sectors (changed +173296)
Aug 18 09:42:58 karin smartd[2631]: Device: /dev/sda [SAT], 21286619 Offline uncorrectable sectors (changed +8)
Aug 18 10:12:58 karin smartd[2631]: Device: /dev/sda [SAT], 21571339 Offline uncorrectable sectors (changed +284720)
Aug 18 10:42:58 karin smartd[2631]: Device: /dev/sda [SAT], 21572683 Offline uncorrectable sectors (changed +1344)
Aug 18 11:12:58 karin smartd[2631]: Device: /dev/sda [SAT], 21573963 Offline uncorrectable sectors (changed +1280)

Except for the 3 times the PC froze, it is running smooth and fast. No corrupted files that I have noticed. I rebooted into a live Parted Magic USB stick and ran fsck, but it found nothing strange.
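To see how fast the counter in the smartd messages grows, and whether each "changed +N" really matches the difference to the previous count, the log can be summarised with a small script. This is only a minimal sketch: the line format is assumed from the messages in this post, `parse_smartd` and `check_deltas` are hypothetical helper names, and the sample holds three lines copied from the log.

```python
import re

# Pattern for the smartd lines in this post (format assumed from this log only).
LINE_RE = re.compile(
    r"(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ smartd\[\d+\]: "
    r"Device: (?P<dev>\S+) \[SAT\], (?P<count>\d+) Offline uncorrectable sectors"
    r"(?: \(changed \+(?P<delta>\d+)\))?"
)


def parse_smartd(text):
    """Return (timestamp, count, delta) for every smartd line found in text."""
    rows = []
    for m in LINE_RE.finditer(text):
        # Lines without "(changed +N)" mean the count did not move.
        delta = int(m.group("delta")) if m.group("delta") else 0
        rows.append((m.group("stamp"), int(m.group("count")), delta))
    return rows


def check_deltas(rows):
    """True if every reported delta equals the difference to the previous count."""
    return all(cur[1] - prev[1] == cur[2] for prev, cur in zip(rows, rows[1:]))


# Three lines copied from the log in this post.
sample = (
    "Aug 18 00:42:58 karin smartd[2631]: Device: /dev/sda [SAT], "
    "20879899 Offline uncorrectable sectors (changed +478424)\n"
    "Aug 18 01:12:58 karin smartd[2631]: Device: /dev/sda [SAT], "
    "20879899 Offline uncorrectable sectors\n"
    "Aug 18 02:12:58 karin smartd[2631]: Device: /dev/sda [SAT], "
    "20880267 Offline uncorrectable sectors (changed +368)\n"
)

rows = parse_smartd(sample)
growth = rows[-1][1] - rows[0][1]  # total increase over the sample
print(check_deltas(rows), growth)
```

Feeding it the whole of /var/log/messages instead of the sample should work the same way, since lines that don't match the pattern are skipped.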
I don't know if the freezes have anything to do with these errors, and I don't know if these errors are even accurate, given the state of smartd's support for SSDs. For now I guess I just have to wait and see if the system freezes again.
Some questions:
- What should I do with the smartd messages, and what do they mean?
- Can I safely ignore them?
- Why is SMART even spitting out these errors if it doesn't support SSDs?
- Is my SSD dying?
- Is there a way to monitor and 'capture' dmesg at the moment it freezes, without being able to log in?
Some extra info:
AMD Phenom II X4 955
RAM 8 GB
ATI HD 3300 (Motherboard Integrated)
Thank you for your time.

Code:
# smartctl -a /dev/sda
smartctl 5.39.1 2010-01-28 r3054 [x86_64-unknown-linux-gnu] (openSUSE RPM)

=== START OF INFORMATION SECTION ===
Model Family:     OCZ Vertex SSD
Device Model:     OCZ VERTEX-TURBO
Serial Number:    6487P9QPAUMNRJAB7042
Firmware Version: 1.5
User Capacity:    32,017,047,552 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Aug 18 11:49:11 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x1d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   0) minutes.
Extended self-test routine
recommended polling time:        (   0) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   012   000   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   ---   ---   ---    Old_age   Offline      -       18
 12 Power_Cycle_Count       0x0000   ---   ---   ---    Old_age   Offline      -       2
184 Initial_Bad_Block_Count 0x0000   ---   ---   ---    Old_age   Offline      -       2
195 Program_Failure_Blk_Ct  0x0000   ---   ---   ---    Old_age   Offline      -       0
196 Erase_Failure_Blk_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       0
197 Read_Failure_Blk_Ct     0x0000   ---   ---   ---    Old_age   Offline      -       0
198 Read_Sectors_Tot_Ct     0x0000   ---   ---   ---    Old_age   Offline      -       21577123
199 Write_Sectors_Tot_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       35413337
200 Read_Commands_Tot_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       245801
201 Write_Commands_Tot_Ct   0x0000   ---   ---   ---    Old_age   Offline      -       188367
202 Error_Bits_Flash_Tot_Ct 0x0000   ---   ---   ---    Old_age   Offline      -       202
203 Corr_Read_Errors_Tot_Ct 0x0000   ---   ---   ---    Old_age   Offline      -       202
204 Bad_Block_Full_Flag     0x0000   ---   ---   ---    Old_age   Offline      -       0
205 Max_PE_Count_Spec       0x0000   ---   ---   ---    Old_age   Offline      -       10000
206 Min_Erase_Count         0x0000   ---   ---   ---    Old_age   Offline      -       1
207 Max_Erase_Count         0x0000   ---   ---   ---    Old_age   Offline      -       263
208 Average_Erase_Count     0x0000   ---   ---   ---    Old_age   Offline      -       7
209 Remaining_Lifetime_Perc 0x0000   ---   ---   ---    Old_age   Offline      -       100
211 Unknown_Attribute       0x0000   000   000   000    Old_age   Offline      -       0
212 Unknown_Attribute       0x0000   000   000   000    Old_age   Offline      -       0
213 Unknown_Attribute       0x0000   000   000   000    Old_age   Offline      -       0

Warning: device does not support Error Logging
Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Device does not support Selective Self Tests/Logging
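One thing that stands out when comparing the two outputs: the last count smartd logged (21573963 at 11:12:58) is close to the raw value of attribute 198 in the smartctl output taken at 11:49:11 (21577123), which this drive labels Read_Sectors_Tot_Ct. As far as I know smartd takes its "Offline uncorrectable" count from attribute 198 by default, so it may simply be reporting the drive's read counter, but that is an assumption on my part and not something I have confirmed. Below is a minimal sketch for pulling raw values out of the attribute table and comparing them; `raw_values` is a hypothetical helper and the sample rows are copied from the smartctl output in this post.

```python
def raw_values(smartctl_output):
    """Map attribute ID -> (name, raw value) from a smartctl attribute table."""
    attrs = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) == 10 and fields[0].isdigit() and fields[-1].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[-1]))
    return attrs


# Header plus three rows copied from the smartctl output in this post.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
198 Read_Sectors_Tot_Ct     0x0000   ---   ---   ---    Old_age   Offline      -       21577123
205 Max_PE_Count_Spec       0x0000   ---   ---   ---    Old_age   Offline      -       10000
208 Average_Erase_Count     0x0000   ---   ---   ---    Old_age   Offline      -       7
"""

attrs = raw_values(sample)

# Last value smartd logged (11:12:58), copied from /var/log/messages.
last_smartd_count = 21573963
gap = attrs[198][1] - last_smartd_count

# Rough wear estimate: average erase count relative to the rated P/E cycles.
wear = attrs[208][1] / attrs[205][1]

print(attrs[198][0], gap, wear)
```

If the gap stays small and the count keeps tracking attribute 198 over time, that would point towards a mislabelled counter and not real bad sectors, though it would not explain the freezes.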
P.S.
I'm not sure if this 'problem' has anything to do with openSUSE, but since I run openSUSE I thought I'd give it a try here. Please feel free to move it to the right section.
