
Yeah, with a 9+ year old server, I wouldn't trust those disks, plus the amount of power...


I buy old SAS/SATA drives that have lots of hours on them. They tend to look like this:

https://pastebin.com/raw/E39HuK59

Run SMART, check the stats, make sure the grown defect list is 0 and it's never been over-temp, run a SMART long test, run badblocks, then run another SMART long test. If it passes all of those, I'd be fine continuing to use it.
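
If it helps, here's that checklist as a very rough Python wrapper. It's only a sketch: it assumes smartmontools (smartctl) and badblocks are installed, /dev/sdX is a placeholder for the drive under test, and you still read the SMART output yourself.

    #!/usr/bin/env python3
    # Rough wrapper around the used-drive vetting steps above.
    # Assumes smartctl (smartmontools) and badblocks are installed.
    # SMART long tests run in the background, so invoke one step at a
    # time and wait for each to finish before moving on.
    import subprocess
    import sys

    STEPS = {
        # Dump all SMART info: eyeball grown defect list, max temperature,
        # reallocated/pending sectors.
        "stats": ["smartctl", "-x"],
        # Start a SMART long (extended) self-test in the background.
        "longtest": ["smartctl", "-t", "long"],
        # Poll self-test progress / results.
        "progress": ["smartctl", "-a"],
        # Destructive read-write surface scan -- only for drives with no
        # data you care about.
        "badblocks": ["badblocks", "-wsv"],
    }

    def run(step, dev):
        cmd = STEPS[step] + [dev]
        print("+", " ".join(cmd))
        print(subprocess.run(cmd, capture_output=True, text=True).stdout)

    if __name__ == "__main__":
        run(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "/dev/sdX")

The order would be stats, longtest (wait for it), badblocks, longtest again (wait), then compare the stats output from before and after.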

The HUH728080ALE600 drives in that server[1] idle at 5.1 watts[2], so that's about 184 watts just for the drives. I'm guessing the whole server idles around 300 watts, which is not great, but not terrible. I pay ~$1/watt/year in NC, so running that server 24x7 would cost me about $300 annually (rough math sketched below).

If it were me I'd disconnect at least half the drives to keep as spares.

1. https://imgur.com/a/zb6Mqty

2. https://documents.westerndigital.com/content/dam/doc-library...
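
For what it's worth, $1/watt/year works out to roughly $0.11/kWh, and the numbers check out. A quick back-of-envelope sketch (the 5.1 W figure is from the spec sheet; the 300 W whole-server idle is just my guess):

    # Back-of-envelope power cost for the server above.
    # 5.1 W idle per drive is from the HUH728080ALE600 data sheet;
    # 300 W whole-server idle (drives included) is a guess.
    drives = 36
    idle_w_per_drive = 5.1
    server_idle_w = 300
    price_per_kwh = 0.114        # ~$1 per watt-year

    drive_watts = drives * idle_w_per_drive          # ~184 W
    annual_kwh = server_idle_w * 24 * 365 / 1000     # ~2628 kWh
    annual_cost = annual_kwh * price_per_kwh         # ~$300
    print(f"{drive_watts:.0f} W drives, {annual_kwh:.0f} kWh/yr, ${annual_cost:.0f}/yr")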


Cold spares can sometimes disappoint. Lots of techs preferred hot spares, not just for immediate availability during a rebuild, but because the spin-up itself was about as likely to kill an aging drive as anything else. With hot spares, you could also rotate the drives by migrating data around, like old farmers partitioning their fields to let sections rest.


I use HGST / old Hitachi drives in most of my servers, typically either 2 TB or 3 TB, though I do have some older 1 TB drives still kicking (over 10 years, running 24x7). In all those years, I've only had one fail. I just checked, and one of them has over 91,000 hours!


I would love to have this. Old disks are no problem. They will most likely continue to run fine until they're no longer worth it, either because of insane electricity prices or because 50TB hard drives become available for a bargain.

My backup server is still running 500GB hard drives that are 17-18 years old. They will be retired soon though, as they consume too much power and lack spindown when not in use.


I think there's a difference between the wear and tear in a personal backup server and in a Netflix cache server. We manage several storage servers which see intensive usage, and usually at around 5 years we start to see disk failures.


It's fair to assume Netflix is mostly doing reads rather than writes, and fairly sequential ones at that.

That's very good for the longevity of the disks.


True - but presumably they've been spinning and doing seeks pretty much 24/7. It's not like backblaze/coldline where the drives are practically powered down most of the time.

And of course, 36x the drives means 36x the drive failures - and even if you avoid losing data, you've still got the chore of swapping each failed drive.


Constantly spinning is better for the motor than starting and stopping.


> And of course, 36x the drives means 36x the drive failures

I think you must mean 36x the chance of drive failure. Annual drive failure rates are published at around 1%, though in practice they can be as high as 10%. With 36 drives, that means somewhere between a 30.4% (1 - 0.99^36) and 97.7% (1 - 0.9^36) chance per year of at least one drive failing.
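
If anyone wants to poke at those numbers, a minimal sketch (1% and 10% are just the rough AFR bounds quoted above, and it assumes failures are independent, which drives sharing a chassis really aren't):

    # Chance of at least one of 36 drives failing within a year,
    # assuming independent failures at a given annual failure rate (AFR).
    def p_at_least_one(n_drives, afr):
        return 1 - (1 - afr) ** n_drives

    for afr in (0.01, 0.10):
        print(f"AFR {afr:.0%}: {p_at_least_one(36, afr):.1%} chance of >=1 failure/year")
    # AFR 1%: ~30.4%    AFR 10%: ~97.7%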


> and also fairly sequential

... No.

When you hit hundreds of concurrent operations there is no such thing as sequential. There's also no need for sequential reads, because a stripe of 7-15 disks gives you far more throughput than streaming bitrates need.
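
Rough numbers on why the bitrate point holds; every figure here is an assumption, not anything from Netflix (guessing ~20 MB/s effective per disk under heavy seeking, versus ~25 Mbit/s for a 4K stream):

    # Crude illustration: even badly degraded per-disk throughput across a
    # 7-15 disk stripe dwarfs what a single stream needs. All assumptions.
    MB = 10**6
    random_per_disk = 20 * MB        # guessed effective rate under heavy seeking
    stream_rate = 25 * 10**6 / 8     # ~25 Mbit/s 4K stream -> ~3.1 MB/s

    for disks in (7, 15):
        agg = disks * random_per_disk
        print(f"{disks} disks: ~{agg / MB:.0f} MB/s, ~{agg / stream_rate:.0f} concurrent streams")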


>We manage several storage servers which see intensive usage, and usually at around 5 years we start to see disk failures.

Which raises the question of how old the drives actually are. If intensive usage results in lots of disk failures after 5 years, then a 9-year-old server would surely have a bunch of new(er) disks inside it? Or do they just reduce the amount of cache space they have with every failure?


The latter. The case design clearly doesn’t lend itself to replacing drives.

https://old.reddit.com/r/homelab/comments/ydollm/so_i_got_a_...


Of course, the timing depends on the usage pattern. In our case we write a lot to the disks, which means they fail faster. Time to failure also depends on manufacturer, batch, server conditions... But in general I wouldn't put too much trust in drives that have been running continuously under significant load for so many years (I don't think a Netflix cache server had a lot of idle time).


I think that is a good point; however, MTBF is usually a lot higher on enterprise-grade hard drives. My hard drives are consumer grade.


> […] and lack spindown when not in use.

I was under the impression that constant on-off was what put the most wear on a drive, and that if it is constantly on it would last longer.


Yep. In a previous iteration of my file server I killed a WD Green drive in about two years because I unwittingly left the head-park feature on. It was parking after 8 seconds of inactivity, and in two years it had accumulated something like 2.5 million parks.


Not really. With a minimum rating of 50,000 cycles, you can spin up and down about 27 times every day for 5 years. That may not sound like a lot, but in reality it is. I have my NAS set to spin down after 15 minutes of idle time, and it averages around 1000 cycles a year.
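
Rough arithmetic, taking a 50,000 start/stop cycle rating at face value:

    # How far a 50,000 start/stop cycle rating stretches.
    rated_cycles = 50_000
    per_day_over_5_years = rated_cycles / (5 * 365)   # ~27 spin-downs/day
    years_at_1000_per_year = rated_cycles / 1000      # ~50 years at my NAS's rate
    print(f"{per_day_over_5_years:.0f}/day for 5 years, or {years_at_1000_per_year:.0f} years at 1000/year")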


>or 50TB hard drives become available for a bargain.

Maybe my old age is showing, but this just frightens me to no end. One 50TB drive fails and you've lost 50TB. Build an array of smaller drives to get to 50TB, and it's much less of a catastrophe if one drive dies.


Maintenance costs money too. What if you have to replace drives three times as often because you're using smaller ones?


If it costs some money to be able to recover, that's still better than not being able to recover at all because the single point of failure failed.


For some reason the make and model of those disks is absent from your comment; must've been lost in the wires.


They are a mix of Samsung, WD and Seagate. Two of them are PATA, while the rest are SATA.


With enough redundancy anything can be made reliable. ZFS can do some crazy RAID levels that will pretty much make anything reliable given enough disks and CPU resources. Whether it's a good idea (especially in terms of power usage) is a good question though.
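
To make the "enough redundancy" point concrete, here's a crude sketch: the chance a single vdev survives a year at different parity levels, assuming an independent 5% annual failure rate per disk and ignoring rebuild windows and correlated failures (both of which hurt the real numbers):

    # Crude illustration of "enough redundancy makes anything reliable":
    # P(at most `parity` of `n` disks fail in a year), independent 5% AFR,
    # rebuild time and correlated failures ignored.
    from math import comb

    def survival(n, parity, afr=0.05):
        return sum(comb(n, k) * afr**k * (1 - afr)**(n - k)
                   for k in range(parity + 1))

    for parity in (1, 2, 3):   # roughly raidz1 / raidz2 / raidz3
        print(f"12-disk vdev, parity {parity}: {survival(12, parity):.2%} survive the year")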


You'd think so, but failing disks in raidz arrays can cause extreme performance problems, and failing disk electronics can cause lockups on SAS/SATA controllers.

Old disks are always risky. Avoid.


Also: noise. Those things were loud.


According to this article, it consumes about 500W of power: https://www.phoronix.com/news/MTExNDM#:~:text=500w

Though I assume that's at typical Netflix workload levels.


You should never trust any disk, that's why we have backups and redundancy in the first place.


Only about 180W average draw from the disks, depending on workload.


Would the hard disks be unreliable too?




