
Subject: errors using new Samsung 870 EVO SSD
From: Winston
Newsgroups: comp.unix.bsd.freebsd.misc
Organization: A noiseless patient Spider
Date: Wed, 21 Aug 2024 08:47 UTC

This is the first time I've used a solid state drive. It looks like
there's some kind of compatibility or interface problem. The output
from smartctl -x also points to some kind of interface problem.

ZFS counted about 180 write errors while resilvering ~80GB. Most seemed
to be retryable and succeeded on the second try (see logs below).

The system is using AMD-AHCI, not IDE.
The SATA interface is running at 3.0Gb/s, half the SSD's 6.0Gb/s speed.
Temperature is fine (~29C).

So far, the errors only occur during heavy activity: write errors during
resilvering, and 2 read errors later during a brief burst of read
activity.

[Note: I swapped the SATA cables: that's why ada1 during resilvering
became ada0 later. Since I swapped cables at the drive end, not the
motherboard end, I think it unlikely to be a cable/bad connection problem.]

Any ideas what the problem might be? Thanks,
-WBE
----------
[read error log entries:] [mildly edited]

Aug 21 03:01:24: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 ff 64 40 13 00 00 00 00 00
Aug 21 03:01:24: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 03:01:24: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Aug 21 03:01:25: ahcich0: Timeout on slot 9 port 0
Aug 21 03:01:25: ahcich0: is 04000000 cs 00000200 ss 00000000 rs 00000200 tfd 451 serr 00400000 cmd 0000e917
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 58 00 65 40 13 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 c0 36 65 40 13 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): ATA status: 00 ()
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 c8 ff 64 40 13 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): ATA status: 00 ()
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:25 crystal ZFS[1332]: vdev I/O failure, zpool=zp path=/dev/ada0p3 offset=149417648128 size=4096 error=5
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 38 88 e4 80 40 13 00 00 00 00 00
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 10 48 29 40 05 00 00 00 00 00
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 c0 e4 80 40 13 00 00 00 00 00
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain.

[end of read error log entries]
----------
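For anyone trying to read the "ahcich0: is ... cs ... serr ..." line above, here is a minimal Python sketch that splits it into registers and names the set bits. The bit names in SERR_DIAG_BITS are an assumption based on my reading of the AHCI port SError register layout, and the helper names (set_bits, parse_ahci_status, describe) are made up for this illustration; none of it comes from the FreeBSD sources.

```python
import re

# Assumed names for the diagnostic half (bits 16-26) of the AHCI port
# SError register. Check them against the AHCI spec before relying on them.
SERR_DIAG_BITS = {
    16: "PhyRdy change",
    17: "Phy internal error",
    18: "comm wake",
    19: "10b-to-8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unknown FIS type",
    26: "exchanged",
}

def set_bits(value):
    """Return the positions of all bits set in a 32-bit register value."""
    return [bit for bit in range(32) if (value >> bit) & 1]

def parse_ahci_status(line):
    """Extract the hex register values from an ahcich status log line."""
    pairs = re.findall(r"\b(is|cs|ss|rs|tfd|serr|cmd) ([0-9a-f]+)", line)
    return {name: int(value, 16) for name, value in pairs}

def describe(line):
    """Summarize which command slots were outstanding and which SERR bits are set."""
    regs = parse_ahci_status(line)
    return {
        "busy_slots": set_bits(regs["cs"]),
        "serr": [SERR_DIAG_BITS.get(b, f"bit {b}") for b in set_bits(regs["serr"])],
    }

line = ("Aug 21 03:01:25: ahcich0: is 04000000 cs 00000200 ss 00000000 "
        "rs 00000200 tfd 451 serr 00400000 cmd 0000e917")
print(describe(line))
```

With the line from the log, cs 00000200 has only bit 9 set, which agrees with "Timeout on slot 9". If the assumed bit names are right, serr 00400000 is a handshake error, i.e. a link-level problem rather than a media problem.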
[typical write errors during resilvering:] [mildly edited]

Aug 21 00:33:01: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 e0 b2 e7 40 02 00 00 00 00 00
Aug 21 00:33:01: (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 00:33:01: (ada1:ahcich1:0:0:0): Error 5, Unretryable error
Aug 21 00:33:02: ahcich1: Timeout on slot 19 port 0
Aug 21 00:33:02: ahcich1: is 04000000 cs 00080000 ss 00000000 rs 00080000 tfd 451 serr 00400000 cmd 0000f317
Aug 21 00:33:02: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 48 f0 b2 e7 40 02 00 00 00 00 00
Aug 21 00:33:02: (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 00:33:02: (ada1:ahcich1:0:0:0): Error 5, Unretryable error
Aug 21 00:33:02 crystal ZFS[1322]: vdev I/O failure, zpool=zp path=/dev/ada1p3 offset=7774244864 size=36864 error=5
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 78 2e f4 40 02 00 00 00 00 00
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): Retrying command, 3 more tries remain
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 58 2e f4 40 02 00 00 00 00 00
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): Retrying command, 3 more tries remain

[end of write error log entries]
----------
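The ACB bytes in these log lines can be decoded to see which sectors each command touched. The following is a minimal sketch, assuming the 12 bytes are printed in the order command, features, LBA low/mid/high, device, LBA low/mid/high (extended), features (extended), sector count, sector count (extended), and that for FPDMA QUEUED commands the transfer length sits in the features bytes. It also assumes 512-byte logical sectors. The function name decode_fpdma_acb is hypothetical.

```python
def decode_fpdma_acb(acb, sector_size=512):
    """Decode a READ/WRITE FPDMA QUEUED ACB dump (assumed byte order, see text)."""
    b = [int(tok, 16) for tok in acb.split()]
    if len(b) != 12:
        raise ValueError("expected 12 ACB bytes")
    if b[0] not in (0x60, 0x61):
        raise ValueError("not a READ/WRITE FPDMA QUEUED command")
    # 48-bit LBA: three low bytes, then three "extended" bytes after the device byte.
    lba = b[2] | b[3] << 8 | b[4] << 16 | b[6] << 24 | b[7] << 32 | b[8] << 40
    # For NCQ commands the sector count is carried in the features fields;
    # a count of 0 conventionally means 65536 sectors.
    sectors = (b[1] | b[9] << 8) or 65536
    return {
        "op": "read" if b[0] == 0x60 else "write",
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }

write = decode_fpdma_acb("61 48 f0 b2 e7 40 02 00 00 00 00 00")
print(write)
```

As a sanity check on the assumed layout: the write with features byte 48 decodes to 0x48 = 72 sectors, or 36864 bytes, which is the size ZFS reported for the failed vdev I/O in the same second.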

Here's smartctl -x output, keeping only what looked "interesting"/relevant:

SATA Version is: SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)

199 CRC_Error_Count -OSRCK 099 099 000 - 64
235 POR_Recovery_Count -O--C- 099 099 000 - 7
241 Total_LBAs_Written -O--CK 099 099 000 - 213396105

0x06 0x018 4 64 --- Number of Interface CRC Errors

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 2 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 65535+ R_ERR response for non-data FIS
0x0006 2 65535+ R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 5 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 5 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 65535+ Non-CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0010 2 0 R_ERR response for host-to-device data FIS, non-CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x0013 2 65535+ R_ERR response for host-to-device non-data FIS, non-CRC

SCT Error Recovery Control:
Read: Disabled
Write: Disabled
----------
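To pick out the counters that matter from the Phy event table above, a short Python sketch like the one below may help. It assumes the whitespace-separated layout shown here (ID, size in bytes, value, description) and treats a trailing "+" on the value as "counter has hit its maximum". The names PhyCounter and parse_phy_counters are invented for this example, and the sample is a subset of the rows above.

```python
import re
from typing import NamedTuple

class PhyCounter(NamedTuple):
    counter_id: int
    size: int          # counter width in bytes
    value: int
    saturated: bool    # True when smartctl printed a trailing "+"
    description: str

ROW = re.compile(r"^(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)(\+?)\s+(.+)$")

def parse_phy_counters(text):
    """Parse the rows of a 'SATA Phy Event Counters' table; the header is skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append(PhyCounter(int(m.group(1), 16), int(m.group(2)),
                                   int(m.group(3)), m.group(4) == "+", m.group(5)))
    return rows

SAMPLE = """\
ID Size Value Description
0x0001 2 2 Command failed due to ICRC error
0x0005 2 65535+ R_ERR response for non-data FIS
0x0009 2 5 Transition from drive PhyRdy to drive PhyNRdy
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 65535+ Non-CRC errors within host-to-device FIS
"""

counters = parse_phy_counters(SAMPLE)
for c in counters:
    if c.saturated:
        print(f"0x{c.counter_id:04x} saturated: {c.description}")
```

In this output the saturated counters are all non-data FIS and non-CRC ones, while the CRC counter for host-to-device FISes is 0, which fits an interface or protocol problem better than a bad cable.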
[END] [Thanks for reading.]

Subject: Re: errors using new Samsung 870 EVO SSD
From: Marco Moock
Newsgroups: comp.unix.bsd.freebsd.misc
Organization: A noiseless patient Spider
Date: Thu, 22 Aug 2024 18:46 UTC
References: 1

On 21.08.2024 at 04:47, Winston wrote:

> ZFS counted about 180 write errors while resilvering ~80GB. Most
> seemed to be retryable and succeeded on the second try (see logs
> below).

That indicates a serious fault. I assume the SSD you have is broken.
You could test it with badblocks or similar, but be aware that SSDs
remap blocks extensively.

TLDR: Get another SSD.

--
kind regards
Marco

Send spam to 1724208453muell@cartoonies.org

Subject: Re: errors using new Samsung 870 EVO SSD
From: Winston
Newsgroups: comp.unix.bsd.freebsd.misc
Organization: A noiseless patient Spider
Date: Sun, 15 Sep 2024 06:40 UTC
References: 1

I previously wrote (in part):
> It looks like there's some kind of compatibility or interface problem.

> Aug 21 03:01:24: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
> Aug 21 03:01:24: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
...
> Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
...
> Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
> ----------
> Here's smartctl -x output, keeping only what looked "interesting"/relevant:
...
> 0x06 0x018 4 64 --- Number of Interface CRC Errors

Careful reading of the error messages and check of smartctl info
indicated to me some kind of data/interface/compatibility error, not a
read/write problem in the SSD.

Google search turned up various Samsung 860 and 870 EVO articles.

What I've determined:

* It's not a bad data cable:

Some of the articles suggested the problem might be a bad cable.
I ordered two new ones (SATA III).
Result: no improvement.
That's as I expected, since I was pretty sure my cables were good,
but it was cheap to try.

* Fix 1:

Based on the articles I read, this problem with Samsung 870 EVO and
860 EVO SSDs appears to affect not just FreeBSD, but *BSD and at
least some Linuxes.

The solution/workaround (at least for FreeBSD) is to disable command
queueing ("camcontrol negotiate $theSSD -T disable").

* Fix 2:

Connect the SSD with a USB-to-SATA adapter cable.
I found such a cable in stock at Best Buy for $12.
Perhaps this works because there's no command queueing over USB.

As a side note, instead of turning off tagged queueing, I also tried
reducing the number of tags from 32 to 2. Didn't help: the errors
continued to happen.
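For reference, the two things tried above can be written down as command lines. This is only a sketch in Python that builds the argument lists without running anything. The "negotiate ... -T disable" form is the one quoted above; the "tags ... -N" form is my assumption about how the tag count would be reduced with camcontrol, and the device name ada0 and the helper names are placeholders.

```python
import shlex

def disable_tagged_queueing_cmd(device):
    """Command line for the workaround quoted above (tagged queueing off)."""
    return ["camcontrol", "negotiate", device, "-T", "disable"]

def set_tag_count_cmd(device, tags):
    """Assumed camcontrol form for limiting the number of outstanding tags."""
    if tags < 1:
        raise ValueError("tag count must be at least 1")
    return ["camcontrol", "tags", device, "-N", str(tags)]

# ada0 is a placeholder; substitute the SSD's actual device name.
for argv in (disable_tagged_queueing_cmd("ada0"), set_tag_count_cmd("ada0", 2)):
    print(shlex.join(argv))
```

The commands are only printed so they can be reviewed first; actually running them needs root on the FreeBSD machine, for example via subprocess.run(argv, check=True).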

Maybe some day Samsung will come out with new firmware that fixes this
problem.

Is this something I should post to bugzilla (it's not a FreeBSD bug,
though) or to some FreeBSD forum (which one)? Google no longer gets
USENET, so I don't expect this article would be found by Google search.

HTH,
-WBE

Subject: Re: errors using new Samsung 870 EVO SSD
From: Steven G. Kargl
Newsgroups: comp.unix.bsd.freebsd.misc
Organization: A noiseless patient Spider
Date: Sun, 15 Sep 2024 15:03 UTC
References: 1 2

On Sun, 15 Sep 2024 02:40:44 -0400, Winston wrote:

> I previously wrote (in part):
>> It looks like there's some kind of compatibility or interface problem.
>
>> Aug 21 03:01:24: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
>> Aug 21 03:01:24: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
> ...
>> Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
> ...
>> Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
>> ----------
>> Here's smartctl -x output, keeping only what looked "interesting"/relevant:
> ...
>> 0x06 0x018 4 64 --- Number of Interface CRC Errors
>
> Careful reading of the error messages and check of smartctl info
> indicated to me some kind of data/interface/compatibility error, not a
> read/write problem in the SSD.
>
> Google search turned up various Samsung 860 and 870 EVO articles.
>
> What I've determined:
>
> * It's not a bad data cable:
>
> Some of the articles suggested the problem might be a bad cable.
> I ordered two new ones (SATA III).
> Result: no improvement.
> That's as I expected, since I was pretty sure my cables were good,
> but it was cheap to try.
>
> * Fix 1:
>
> Based on the articles I read, this problem with Samsung 870 EVO and
> 860 EVO SSDs appears to affect not just FreeBSD, but *BSD and at
> least some Linuxes.
>
> The solution/workaround (at least for FreeBSD) is to disable command
> queueing ("camcontrol negotiate $theSSD -T disable").
>
> * Fix 2:
>
> Connect the SSD with a USB-to-SATA adapter cable.
> I found such a cable in stock at Best Buy for $12.
> Perhaps this works because there's no command queueing over USB.
>
> As a side note, instead of turning off tagged queueing, I also tried
> reducing the number of tags from 32 to 2. Didn't help: the errors
> continued to happen.
>
> Maybe some day Samsung will come out with new firmware that fixes this
> problem.
>
> Is this something I should post to bugzilla (it's not a FreeBSD bug,
> though) or to some FreeBSD forum (which one)? Google no longer gets
> USENET, so I don't expect this article would be found by Google search.

Report your findings in bugzilla. A developer may add a quirk to
the scsi subsystem to automatically detect the drive and "do the
right thing". I would use something like "SCSI tag-queue error with
Samsung 870 EVO SSD" as the title.

--
steve

Subject: Re: errors using new Samsung 870 EVO SSD
From: Winston
Newsgroups: comp.unix.bsd.freebsd.misc
Organization: A noiseless patient Spider
Date: Sun, 15 Sep 2024 23:54 UTC
References: 1 2 3

I asked:
>> Is this something I should post to bugzilla (it's not a FreeBSD bug,
>> though) or to some FreeBSD forum (which one)? Google no longer gets
>> USENET, so I don't expect this article would be found by Google
>> search.

to which "Steven G. Kargl" <sgk@REMOVEtroutmask.apl.washington.edu>
kindly replied:
> Report your findings in bugzilla. A developer may add a quirk to
> the scsi subsystem to automatically detect the drive and "do the
> right thing". I would use something like "SCSI tag-queue error with
> Samsung 870 EVO SSD" as the title.

I'll do that now. Thanks,
-WBE

Subject: Re: errors using new Samsung 870 EVO SSD
From: Winston
Newsgroups: comp.unix.bsd.freebsd.misc
Organization: A noiseless patient Spider
Date: Mon, 16 Sep 2024 14:52 UTC
References: 1 2 3 4

"Steven G. Kargl" <sgk@REMOVEtroutmask.apl.washington.edu> said:
>> Report your findings in bugzilla. A developer may add a quirk to
>> the scsi subsystem to automatically detect the drive and "do the
>> right thing". I would use something like "SCSI tag-queue error with
>> Samsung 870 EVO SSD" as the title.

> I'll do that now. Thanks,

Done: Bug #281528.
-WBE

Subject: Re: errors using new Samsung 870 EVO SSD
From: Matthias Meyser
Newsgroups: comp.unix.bsd.freebsd.misc
Organization: XeNET GmbH, 38678 Clausthal-Zellerfeld
Date: Wed, 18 Sep 2024 11:16 UTC
References: 1

If not already done, install and enable the "cpu-microcode" pkg.

Just a try.

With best regards
Matthias Meyser

On 21.08.2024 at 10:47, Winston wrote:
> This is the first time I've used a solid state drive. It looks like
> there's some kind of compatibility or interface problem. The output
> from smartctl -x also points to some kind of interface problem.
> [...]

Subject: Re: errors using new Samsung 870 EVO SSD
From: Detlef Sax
Newsgroups: comp.unix.bsd.freebsd.misc
Date: Wed, 18 Sep 2024 16:45 UTC
References: 1 2

On Wed, 18 Sep 2024 13:16:15 +0200, Matthias Meyser wrote:
> If not already done, install and enable the "cpu-microcode" pkg.
>
> Just a try.
>
> With best regards
> Matthias Meyser
[...]

Sorry for jumping in. I only want to say thanks.

I have been using FreeBSD for a long time (> 20 years), both privately
and in a small business, and I had never heard or read about this package.

Until now I have booted Windows to update the BIOS, because my
Fujitsu cannot update from within the BIOS the way some HPs and others can.
That job is the only reason I keep an old hard disk with MS Windows
installed. (Usually its power cord is not connected.-)

Detlef

--
https://www.12schrittefrei.de/
https://www.noart.de/
