Originally, ZFS was an acronym for "Zettabyte File System." The largest SI prefix we liked
was 'zetta' ('yotta' was out of the question). Since ZFS is a 128-bit
file system, the name was a reference to the fact that ZFS can store 256 quadrillion
zettabytes (where each ZB is 2^70 bytes). Over time, ZFS gained a lot
more features besides 128-bit capacity, such as rock-solid data integrity, easy
administration, and a simplified model for managing your data.
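For reference, here is the arithmetic behind that figure: a 128-bit address space covers 2^128 bytes, and treating a "quadrillion" loosely as 2^50,
2^128 bytes / (2^70 bytes per ZB) = 2^58 ZB = 256 x 2^50 ZB, or roughly 256 quadrillion zettabytes.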
Filesystems have proven to have a much longer lifetime than most traditional
pieces of software, due in part to the fact that the on-disk format is extremely
difficult to change. Given the fact that UFS has lasted in its current form
(mostly) for nearly 20 years, it's not unreasonable to expect ZFS to last at
least 30 years into the future. At that point, Moore's Law starts to kick in
for storage, and we predict that a single filesystem will need to hold more
data than 64 bits can address. For a more thorough description of this topic, and
why 128 bits is enough, see Jeff's
blog entry.
The limitations of ZFS are designed to be so large that they will never be
encountered in any practical operation. ZFS can store 16 Exabytes in each
storage pool, file system, file, or file attribute. ZFS can store billions of
names: files or directories in a directory, file systems in a file system, or
snapshots of a file system. ZFS can store trillions of items: files in a file
system, file systems, volumes, or snapshots in a pool.
There are two basic reasons to have an fsck(1M)-like utility.
Verify filesystem integrity - Many times, administrators simply
want to make sure that there is no on-disk corruption within their filesystems.
With most filesystems, this involves running fsck(1M) while the filesystem is
offline. This can be time consuming and expensive. Instead, ZFS provides the
ability to 'scrub' all data within a pool while the system is live, finding and
repairing any bad data in the process. There are future plans to enhance this
to enable background scrubbing, as well as to keep track of exactly which files
contained uncorrectable errors.
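For example, a live pool can be scrubbed and its status checked with the following commands (the pool name 'tank' is just a placeholder):
# zpool scrub tank
# zpool status tank
The status output reports scrub progress and any checksum errors that were found and repaired.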
Repair on-disk state - If a machine crashes, the on-disk state of
some filesystems will be inconsistent. The addition of journalling has solved
some of these problems, but failure to roll the log may still result in a
filesystem which needs to be repaired. In this case, there are well known
pathologies of errors (such as creating a directory entry before updating the
parent link) which can be reliably repaired. ZFS does not suffer from this
problem because data is always consistent on disk.
A more insidious problem occurs with faulty hardware or software. Even those
filesystems or volume managers which have per-block checksums are vulnerable to
a variety of other pathologies that result in valid but corrupt data. In this
case, the failure mode is essentially random, and most filesystems will panic
(if the corruption hits metadata) or silently return bad data to the application.
In either case, an fsck(1M) utility will be of little benefit. Since the corruption
matches no known pathology, it will likely be unrepairable. With ZFS, these errors
will be (statistically) nonexistent in a redundant configuration. In a non-redundant
configuration, these errors will be correctly detected, but will result in an I/O error
when trying to read the block. It is theoretically possible to write a tool to
repair such corruption, though any such attempt would likely be a one-off
special tool. Of course, ZFS is equally vulnerable to software bugs, but the
bugs would have to result in a consistent pattern of corruption to be repaired
by a generic tool. During the 5 years of ZFS development no such pattern has
been seen.
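When such errors are detected, 'zpool status' reports them; the -v option prints more detail about the data errors (the pool name is a placeholder):
# zpool status -v tank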
On UFS, du(1) reports the size of the data blocks within the file. On ZFS,
du(1) reports the actual size of the file as stored on disk. This includes
metadata overhead as well as the effects of compression. This really helps answer the
question of "how much more space will I get if I remove this file?" So even when compression is
off, you will still see different results between ZFS and UFS.
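As a rough illustration (the file name is hypothetical):
# ls -l /tank/data/logfile
# du -k /tank/data/logfile
ls -l reports the logical length of the file, while du -k reports the space it actually consumes on disk, which may be smaller (compression) or larger (metadata overhead) than the logical length.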
ZFS is designed to survive arbitrary hardware failures through the use of
redundancy (mirroring or RAID-Z). Unfortunately, certain failures in
non-replicated configurations can cause ZFS to panic when trying to load the
pool. This is a bug, and will be fixed in the near future (along with several
other nifty features, such as background scrubbing). In the meantime, if you find yourself in the situation where
you cannot boot due to a corrupt pool, do the following:
boot using '-m milestone=none'
# mount -o remount /
# rm /etc/zfs/zpool.cache
# reboot
This will remove all knowledge of pools from your system. You will have to
recreate your pool and restore from backup.
Yes, the ZFS hot spares feature is available in the Solaris Express Community Release, build 42, the Solaris Express July 2006 release, and the Solaris 10 11/06 release. For more information about hot spares, see the ZFS Administration Guide.
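For example, a hot spare can be designated when a pool is created or added afterward (device names are placeholders):
# zpool create tank mirror c1t0d0 c1t1d0 spare c2t0d0
# zpool add tank spare c2t1d0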
You can remove a device from a mirrored ZFS configuration by using the zpool detach command. Removal of a top-level vdev, such as an entire RAID-Z group or a disk in an unmirrored configuration, is not currently supported. This feature is planned for a future release.
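For example, to detach one side of a mirror (pool and device names are placeholders):
# zpool detach tank c1t1d0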
Currently, ZFS file systems cannot be used as a root file system on Solaris 10 releases. However, a small subset of ZFS root and boot support is available in the SX community release for x86 systems. For more information,
see ZFS Boot.
Please stay tuned for the Solaris 10 ZFS boot schedule.
ZFS can be used as a zone root path in the Solaris Express release, but it cannot be patched or upgraded until those tools recognize ZFS file systems. Zone root paths on ZFS are not supported in the Solaris 10 release.
For more information, see the Zones FAQ.
In addition, you cannot create a cachefs cache on a ZFS file system.
SunCluster 3.2 supports a local ZFS file system as highly available (HA) in the Solaris 10 11/06 release. This support allows for live failover between systems, with automatic import of pools between systems.
If you use SunCluster 3.2 to configure a local ZFS file system as highly available, review the following caution:
Do not add a configured quorum device to a ZFS storage pool. When a configured quorum device is added to a storage pool, the disk is relabeled and the quorum configuration information is lost. This means the disk no longer provides a quorum vote to the cluster. After a disk is added to a storage pool, you can configure that disk as a quorum device. Or, you can unconfigure the disk, add it to the storage pool, then reconfigure the disk as a quorum device.
Using SunCluster 3.2 with HA-ZFS in the Nevada release is not recommended.
ZFS is not a native cluster, distributed, or parallel file system and
cannot provide concurrent access from multiple hosts. ZFS works great when shared in a distributed NFS environment.
In the long term, we plan on investigating ZFS as a native cluster file system to allow concurrent access. This work has not yet been scoped.
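As noted above, sharing a ZFS file system over NFS is straightforward; for example (the dataset name is a placeholder):
# zfs set sharenfs=on tank/home
# zfs get sharenfs tank/home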
EMC Networker 7.3.2 backs up and restores ZFS file systems, including ZFS ACLs.
Veritas Netbackup 6.5 backs up and restores ZFS file systems, including ZFS ACLs.
IBM Tivoli Storage Manager client software (5.4.1.2) backs up and restores ZFS file systems with both the CLI and the GUI. ZFS ACLs are also preserved.
Computer Associates' BrightStor ARCserve product backs up and restores ZFS file systems, but ZFS ACLs are not preserved.
Yes, ZFS works with either direct-attached devices or SAN-attached devices.
However, if your storage pool contains no mirror or RAID-Z top-level devices,
ZFS can only report checksum errors but cannot correct them. If your storage
pool consists of mirror or RAID-Z devices built using storage from SAN-attached
devices, ZFS can report and correct checksum errors.
For example, consider a SAN-attached hardware-RAID array, set up to present
LUNs to the SAN fabric that are based on its internally mirrored disks. If
you use a single LUN from this array to build a single-disk pool, the pool
contains no duplicate data that ZFS can use to correct detected errors.
In this case, ZFS could not correct an error introduced by the array.
If you use two LUNs from this array to construct a mirrored storage pool, or
three LUNs to create a RAID-Z storage pool, ZFS then would have duplicate data
available to correct detected errors. In this case, ZFS could typically correct
errors introduced by the array.
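As a sketch of the configurations described above (the pool name and LUN device names are placeholders):
# zpool create sanpool c5t0d0
# zpool create sanpool mirror c5t0d0 c6t0d0
# zpool create sanpool raidz c5t0d0 c6t0d0 c7t0d0
The first command builds a single-LUN pool that can only detect checksum errors; the second and third are alternatives that build mirrored and RAID-Z pools, which can both detect and repair them.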
In all cases where ZFS storage pools lack mirror or RAID-Z top-level virtual
devices, pool viability depends entirely on the reliability of the underlying
storage devices.
If your ZFS storage pool only contains a single device, whether from
SAN-attached or direct-attached storage, you cannot take advantage of
features such as RAID-Z, dynamic striping, I/O load balancing, and so on.
ZFS always detects silent data corruption. Some storage arrays can detect
checksum errors, but might not be able to detect the following classes of
errors:
Accidental overwrites or phantom writes
Mis-directed reads and writes
Data path errors
Overall, ZFS functions as designed with SAN-attached devices, but if you
expose simpler devices to ZFS, you can better leverage all available features.
In summary, if you use ZFS with SAN-attached devices, you can take advantage of the self-healing features of ZFS by configuring redundancy in your ZFS storage pools, even though redundancy is also available at a lower hardware level.
ZFS file systems can be used as logical administrative control points, which allow you to view usage, manage properties, perform backups, take snapshots, and so on. For home directory servers, the ZFS model enables you to easily set up one file system per user. ZFS quotas are intentionally not associated with a particular user because file systems are points of administrative control.
ZFS quotas can be set on file systems that could represent users, projects, groups, and so on, as well as on entire portions of a file system hierarchy. This allows quotas to be combined in ways that traditional per-user quotas cannot. Per-user quotas were introduced because multiple users had to share the same file system.
ZFS file system quotas are flexible and easy to set up. A quota can be applied when the file system is created. For example:
# zfs create -o quota=20g tank/home/users
User file systems created in this file system automatically inherit the 20-Gbyte quota set on the parent file system. For example:
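(A minimal sketch; the user names are placeholders.)
# zfs create tank/home/users/user1
# zfs create tank/home/users/user2
# zfs list -r tank/home/users
Space used by these file systems counts against the 20-Gbyte quota set on tank/home/users.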
ZFS quotas can be increased while the file systems remain active, without any down time, as disk space is added to the ZFS storage pool.
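For example (the new value is illustrative):
# zfs set quota=30g tank/home/users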
Rather than attempt to make user-based quotas fit an administration model that is based on file systems as points of control, the ZFS team is working to improve multiple file system management.
An alternative to user-based quotas for containing disk space used for mail, is using mail server software that includes a quota feature, such as the Sun Java System Messaging Server. This software provides user mail quotas, quota warning messages, and expiration and purge features.
Currently, ZFS does not support the ability to split a mirrored configuration for cloning or backup purposes. The best method for cloning and backups is to use the ZFS clone and snapshot features. For information about using ZFS clone and snapshot features, see the ZFS Admin Guide. See RFE 6421958, which requests the ability to recursively send snapshots and will improve the replication process across systems.
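For example, a point-in-time copy can be taken, cloned, and replicated to another system with commands like the following (dataset, snapshot, and host names are placeholders):
# zfs snapshot tank/home@backup1
# zfs clone tank/home@backup1 tank/home_clone
# zfs send tank/home@backup1 | ssh otherhost zfs receive pool2/home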
In addition to ZFS clone and snapshot features, remote replication of
ZFS file systems is provided by the Sun StorageTek Availability Suite
product. AVS/ZFS demonstrations are available here.
Keep the following cautions in mind if you attempt to split a mirrored
ZFS configuration for cloning or backup purposes:
Splitting a mirrored ZFS configuration is not supported by ZFS.
RFE 6421958 is filed to provide this feature.
You cannot remove a disk from a mirrored ZFS configuration, back up
the data on the disk, and then use this data to create a cloned pool.
If you want to use a hardware-level backup or snapshot feature instead of the ZFS snapshot feature, then you will need to do the following steps:
zpool export pool-name
Hardware-level snapshot steps
zpool import pool-name
Any attempt to split a mirrored ZFS storage pool by removing disks or changing the hardware that is part of a live pool could cause data corruption.