dRAID vdevs can resilver quickly onto reserved spare capacity instead of dedicated spare disks.
On Friday afternoon, the OpenZFS project released version 2.1.0 of our favorite "always complicated, but worth it" filesystem. The new release is compatible with FreeBSD 12.2-RELEASE and up, and with Linux kernels 3.10 through 5.13. It introduces several general performance improvements, as well as a few entirely new features, most of them aimed at enterprise and other high-end use cases.
Today, we're going to focus on arguably the headline feature of OpenZFS 2.1.0: the dRAID vdev topology. dRAID has been under active development since at least 2015, and it reached beta status when it was merged into OpenZFS master in November 2020. Since then, it has been tested extensively in several major OpenZFS development shops, which means today's release is "new" to production status, not "new" as in untested.
Distributed RAID (dRAID) Overview
If you already thought ZFS topology was a complex subject, get ready to have your mind blown. Distributed RAID (dRAID) is an entirely new vdev topology that we first encountered in a presentation at the 2016 OpenZFS Developer Summit.
When creating a dRAID vdev, the administrator specifies the number of data, parity, and spare sectors per stripe. These numbers are independent of the number of physical disks in the vdev. We can see this in action in the following example, lifted from the dRAID Basic Concepts documentation:

    root@box:~# zpool create mypool draid2:4d:1s:11c wwn-0 wwn-1 wwn-2 ... wwn-A
    root@box:~# zpool status mypool

      pool: mypool
     state: ONLINE
    config:

            NAME                  STATE     READ WRITE CKSUM
            mypool                ONLINE       0     0     0
              draid2:4d:11c:1s-0  ONLINE       0     0     0
                wwn-0             ONLINE       0     0     0
                wwn-1             ONLINE       0     0     0
                wwn-2             ONLINE       0     0     0
                wwn-3             ONLINE       0     0     0
                wwn-4             ONLINE       0     0     0
                wwn-5             ONLINE       0     0     0
                wwn-6             ONLINE       0     0     0
                wwn-7             ONLINE       0     0     0
                wwn-8             ONLINE       0     0     0
                wwn-9             ONLINE       0     0     0
                wwn-A             ONLINE       0     0     0
            spares
              draid2-0-0          AVAIL
In the example above, we have eleven disks: wwn-0 through wwn-A. We created a single dRAID vdev with 2 parity devices, 4 data devices, and 1 spare device per stripe, or, in condensed jargon, a draid2:4:1.
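The condensed spec format can be unpacked mechanically. The following is a hypothetical helper, not part of any OpenZFS tooling, that parses a spec like `draid2:4d:1s:11c` into its geometry; the defaults (parity 1, data 8 when omitted) reflect my understanding of the `zpool create` syntax and should be checked against the man page.

```python
import re

def parse_draid(spec):
    """Parse 'draid<p>[:<d>d][:<c>c][:<s>s]' (options in any order) into a dict."""
    head, *opts = spec.split(":")
    m = re.fullmatch(r"draid(\d*)", head)
    if m is None:
        raise ValueError(f"not a dRAID spec: {spec!r}")
    # assumed defaults: parity defaults to 1, data to 8, spares to 0
    geo = {"parity": int(m.group(1) or 1), "data": 8, "children": None, "spares": 0}
    keys = {"d": "data", "c": "children", "s": "spares"}
    for opt in opts:
        m = re.fullmatch(r"(\d+)([dcs])", opt)
        if m is None:
            raise ValueError(f"bad dRAID option: {opt!r}")
        geo[keys[m.group(2)]] = int(m.group(1))
    return geo

geo = parse_draid("draid2:4d:1s:11c")
print(geo)  # {'parity': 2, 'data': 4, 'children': 11, 'spares': 1}
print("stripe width:", geo["parity"] + geo["data"])  # stripe width: 6
```

Note that the stripe width (6) need not divide evenly into the non-spare disk count (10); dRAID handles that internally.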
Even though we have eleven disks total in the draid2:4:1, only six are used in each data stripe, and one in each physical stripe. In a world of perfect vacuums, frictionless surfaces, and spherical chickens, the on-disk layout of a draid2:4:1 would look something like this:

    0  1  2  3  4  5  6  7  8  9  A
    s  P  P  D  D  D  D  P  P  D  D
    s  D  P  P  D  D  D  D  P  P  D
    s  D  D  P  P  D  D  D  D  P  P
    s  D  D  D  P  P  D  D  D  D  P
    s  .  .  .  .  .  .  .  .  .  .
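The rotating pattern in the simplified diagram above can be generated with a few lines of code. To be clear, this is a toy illustration only: real dRAID uses precomputed random permutations, groups, and rows rather than simple RAID5-style rotation, and spare capacity is distributed rather than pinned to one column.

```python
# Toy illustration: rotate a P/D pattern one column per row, RAID5-style,
# with the distributed spare drawn as a fixed first column for readability.
def toy_layout(parity, data, spares, children, rows):
    width = parity + data  # sectors per stripe
    grid = []
    for r in range(rows):
        row = ["s"] * spares
        for i in range(children - spares):
            row.append("P" if (i - r) % width < parity else "D")
        grid.append(" ".join(row))
    return grid

for line in toy_layout(parity=2, data=4, spares=1, children=11, rows=4):
    print(line)
```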
Effectively, dRAID takes the RAID concept of "diagonal parity" one step further. The first parity RAID architecture wasn't RAID5; it was RAID3, in which parity lived on a fixed drive rather than being distributed throughout the array. RAID5 did away with the fixed parity drive and distributed parity across all of the array's disks instead, which made random writes significantly faster than RAID3, since writes no longer bottlenecked on a fixed parity disk. dRAID extends the same idea to spare capacity: it distributes spare sectors across all disks, rather than concentrating them on one or two dedicated spare drives. If a disk fails in a dRAID vdev, the parity and data sectors that lived on the dead disk are copied to the reserved spare sector(s) in each affected stripe.
Let's take the simplified diagram above and examine what happens if we fail a disk out of the array. The initial failure leaves holes in most of the stripes (in this simplified, hand-wavy diagram, disk 3 has died):

    0  1  2  4  5  6  7  8  9  A
    s  P  P  D  D  D  P  P  D  D
    s  D  P  D  D  D  D  P  P  D
    s  D  D  P  D  D  D  D  P  P
    s  D  D  P  P  D  D  D  D  P
    s  .  .  .  .  .  .  .  .  .
But when we resilver, we do so onto the previously reserved spare capacity:

    0  1  2  4  5  6  7  8  9  A
    D  P  P  D  D  D  P  P  D  D
    P  D  P  D  D  D  D  P  P  D
    P  D  D  P  D  D  D  D  P  P
    D  D  D  P  P  D  D  D  D  P
    s  .  .  .  .  .  .  .  .  .
Please note that these diagrams are simplified. The full picture involves groups, slices, and rows, which we won't attempt to get into here. The logical layout is also randomly permuted so that things are distributed more evenly over the drives based on offset. Those interested in the hairiest details are encouraged to read the detailed comment in the original code commit. Also notice that dRAID requires fixed stripe widths, not the dynamic widths that traditional RAIDz1 and RAIDz2 vdevs support. If we're using 4kn disks, a draid2:4:1 vdev like the one shown above requires 24KiB on disk for each metadata block, while a traditional six-wide RAIDz2 vdev needs only 12KiB. This discrepancy gets worse the higher the values of d+p get: a draid2:8:1 would require a whopping 40KiB for the same metadata block!
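A back-of-the-envelope calculation makes those numbers concrete. The sketch below assumes 4KiB sectors (4kn disks) and models only the effect described above: a dRAID allocation is rounded up to whole fixed-width stripes, while RAIDz only adds p parity sectors to the block's data sectors.

```python
SECTOR_KIB = 4  # assumption: 4kn disks, 4KiB sectors

def draid_block_cost(data, parity, block_sectors=1):
    # fixed-width stripes: round the allocation up to whole d+p stripes
    stripes = -(-block_sectors // data)  # ceiling division
    return stripes * (data + parity) * SECTOR_KIB

def raidz_block_cost(parity, block_sectors=1):
    # dynamic stripe width: the block's data sectors plus parity
    return (block_sectors + parity) * SECTOR_KIB

print(draid_block_cost(data=4, parity=2))  # 24 KiB for a 4KiB block on draid2:4:1
print(draid_block_cost(data=8, parity=2))  # 40 KiB on draid2:8:1
print(raidz_block_cost(parity=2))          # 12 KiB on a six-wide RAIDz2
```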
For this reason, the special allocation vdev is very useful in pools with dRAID vdevs: when a pool with a draid2:8:1 and a triple-mirror special needs to store a 4KiB metadata block, it does so in only 12KiB on the special, instead of 40KiB on the draid2:8:1.
dRAID Topology, Fault Tolerance, and Resilvering
This chart shows resilvering times for a 90-disk pool. The dark blue line at the top is the time to resilver onto a fixed hot spare; the colorful lines below are times to resilver onto distributed spare capacity. OpenZFS docs
For the most part, a dRAID vdev performs similarly to an equivalent group of traditional vdevs; for example, a draid1:2:0 on nine disks performs roughly like a pool of three 3-wide RAIDz1 vdevs. Fault tolerance is similar as well: you're guaranteed to survive a single failure with p=1, just as you are with RAIDz1 vdevs.
Notice that we said fault tolerance is similar, not identical. A traditional pool of three 3-wide RAIDz1 vdevs is only guaranteed to survive a single disk failure, but it will probably survive a second; as long as the second disk to fail isn't part of the same vdev as the first, everything's fine.
A nine-disk draid1:2, by contrast, will almost certainly be killed by a second disk failure (taking the pool built on it down as well), if that failure occurs before resilvering completes. Since there are no fixed groups for individual stripes, a second disk failure is very likely to knock out additional sectors in already-degraded stripes, no matter which disk fails second.
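A quick combinatorial sketch illustrates the difference, under the simplifying assumption that the second failure is equally likely to hit any surviving disk:

```python
from fractions import Fraction

# Pool of three 3-wide RAIDz1 vdevs: 9 disks, one has already failed.
surviving = 8
same_vdev = 2  # the degraded vdev has 2 disks left; losing either kills the pool
p_pool_loss = Fraction(same_vdev, surviving)
print(p_pool_loss)  # 1/4, i.e. a 75% chance of surviving a second failure

# In a nine-disk draid1:2, stripes span pseudo-randomly chosen disks, so a
# second failure before the resilver finishes almost certainly intersects
# some already-degraded stripe, and the vdev (and pool) is lost.
```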
That somewhat-decreased fault tolerance is compensated for by drastically faster resilver times. In the chart above, we can see that in a pool of ninety 16TB disks, resilvering onto a traditional, fixed spare takes about 30 hours no matter how we've configured the dRAID vdev, but resilvering onto distributed spare capacity can take as little as one hour.
This is largely because resilvering onto a distributed spare splits the write load among all surviving disks. When resilvering onto a conventional spare, the spare disk itself is the bottleneck: reads come from all the disks in the vdev, but all the writes must be completed by the spare. But when resilvering onto distributed spare capacity, both the read and the write workloads are split among all surviving disks.
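A toy throughput model shows why spreading the writes matters so much. The per-disk throughput figure is an assumption chosen for illustration, not an OpenZFS measurement, but it lands close to the chart's roughly 30-hour versus sub-hour split:

```python
# Toy model: time to rewrite one failed 16TB disk's worth of data.
FAILED_TB = 16
PER_DISK_MBPS = 150  # assumption: ~150 MB/s sustained writes per disk

def hours(tb, writers):
    seconds = tb * 1e6 / (PER_DISK_MBPS * writers)  # TB -> MB
    return seconds / 3600

print(f"fixed spare (1 writer):         {hours(FAILED_TB, 1):.1f} h")   # ~29.6 h
print(f"distributed spare (89 writers): {hours(FAILED_TB, 89):.1f} h")  # ~0.3 h
```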
The distributed resilver can also be a sequential resilver rather than a healing resilver, meaning ZFS can simply copy over all affected sectors without worrying about which blocks those sectors belong to. Healing resilvers, by contrast, must scan the entire block tree, resulting in a random-read workload rather than a sequential-read one.
When a physical replacement for the failed disk is added to the pool, that resilver operation will be healing, not sequential, and it will bottleneck on the write performance of the single replacement disk rather than that of the entire vdev. But the criticality of that operation is much lower, since the vdev isn't in a degraded state to begin with.
Distributed RAID vdevs are mostly intended for large storage servers; OpenZFS dRAID design and testing revolved largely around 90-disk systems. At smaller scale, traditional vdevs and spares remain as useful as they ever were.
We particularly caution storage newbies to be careful with dRAID; it's a significantly more complex layout than a pool with traditional vdevs. The rapid resilvering is fantastic, but dRAID takes a hit in both compression levels and some performance scenarios due to its necessarily fixed-length stripes.
As conventional disks keep getting larger without significantly improving performance, dRAID and its rapid resilvering may become desirable even on smaller systems, but it will take a while to figure out exactly where the sweet spot begins. In the meantime, please remember that RAID is not a backup, and that includes dRAID!