Summary: When testing ZFS read performance with fio, compression settings on the file system may cause you to test cache performance instead of physical disk performance.
Background: Testing was done on a FreeBSD 8.3-STABLE system, with an eleven-disk, non-root zpool:
$ zpool status
  pool: tank01
 state: ONLINE
  scan: scrub repaired 0 in 0h11m with 0 errors on Fri Jun 14 17:20:12 2013
config:

        NAME        STATE     READ WRITE CKSUM
        tank01      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da8     ONLINE       0     0     0
          mirror-4  ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da10    ONLINE       0     0     0
        logs
          da11      ONLINE       0     0     0

errors: No known data errors
ZFS and zpool versions were 5 and 28, respectively. Recordsize was 128K. Compression was set to “on”.
Data disks were Toshiba model MK1001TRKB; the separate log was an STEC ZeusRAM C018. Drives were connected to a single LSI SAS2008 HBA.
The system had a single 2.13GHz Intel Xeon E5606 processor and 12GB of RAM; ARC size was limited to 6GB via vfs.zfs.arc_max in /boot/loader.conf.
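For reference, an ARC cap of this kind is a single tunable in the loader configuration. The fragment below is a sketch, not the author's actual file: the post only says the limit was 6GB, so the exact value syntax is an assumption.

```
# /boot/loader.conf (read at boot; a reboot is needed for changes to apply)
# Assumed value: the post states a 6GB limit but not how it was written.
# The equivalent byte count would be 6442450944.
vfs.zfs.arc_max="6G"
```

After boot, `sysctl vfs.zfs.arc_max` shows the value in effect.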
Tests run:
For $test types of “read” and “randread”, fio was run as follows, five times for each test:
fio --directory=. \
    --name=$filename \
    --rw=$test \
    --bs=128k \
    --size=36G \
    --numjobs=1 \
    --time_based \
    --runtime=60 \
    --group_reporting
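The five-runs-per-test procedure can be scripted. The sketch below only prints the commands by default, so it is safe to run anywhere; the job name `iofile` and the `RUNNER` switch are hypothetical and not taken from the original test setup.

```shell
#!/bin/sh
# Print (or run) the read-test matrix: two access patterns, five runs each.
# RUNNER defaults to "echo", so the commands are only printed.
# Setting RUNNER to an empty string executes fio for real.
RUNNER=${RUNNER-echo}
filename=iofile   # hypothetical job name; the post does not give one

run_matrix() {
    for test in read randread; do
        for run in 1 2 3 4 5; do
            $RUNNER fio --directory=. --name="$filename" --rw="$test" \
                --bs=128k --size=36G --numjobs=1 \
                --time_based --runtime=60 --group_reporting
        done
    done
}

run_matrix
```

Printing first makes it easy to confirm the matrix (ten commands in total) before committing to ten one-minute runs.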
System activity was monitored during each run using a combination of sysctl, iostat, vmstat and top. Fio “IO file” size was measured using “du -h”.
IO files were then written to using the following command:
fio --directory=. \
    --name=$filename \
    --rw=write \
    --bs=128k \
    --size=36G \
    --numjobs=1 \
    --group_reporting
After writing to the IO files, tests were re-run, again five times per test.
Results:
Fio appears to create new IO files in a highly compressible format. With compression on, files that have not been written to occupy only 512 bytes on disk; without compression, they occupy the full size specified on the fio command line (36GB in this case). Once the files had been written to on a file system with compression turned on, they grew to about 14GB on disk (i.e. a compression ratio of roughly 2.6:1).
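The compression ratio follows directly from the two sizes. A minimal sketch of the arithmetic, using the 36GB requested size and the approximate 14GB on-disk size reported above; the helper name `compress_ratio` is illustrative and not part of fio or ZFS.

```shell
#!/bin/sh
# Ratio of logical (apparent) size to allocated (on-disk) size.
# Both arguments must be in the same unit.
compress_ratio() {
    awk -v logical="$1" -v allocated="$2" 'BEGIN {
        if (allocated <= 0) { print "n/a"; exit }
        printf "%.2f\n", logical / allocated
    }'
}

# 36GB requested with --size, about 14GB allocated after the write pass.
compress_ratio 36 14
```

This prints 2.57. On a live system, `ls -l` reports the apparent size and `du -h` the allocated size, while `zfs get compressratio` reports the ratio for the whole dataset, which will differ if the dataset holds other data.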
Performance was substantially better on files that had not been written to as opposed to those that had been written to; run-to-run variation, as measured by the standard deviation, was larger on the written files:
Test | mean MB/s unwritten file | stdev MB/s unwritten file | mean IOPs unwritten file | stdev IOPs unwritten file | mean MB/s written file | stdev MB/s written file | mean IOPs written file | stdev IOPs written file
read | 2458.4 | 11.7 | 19666 | 93 | 897.4 | 33.4 | 7178 | 267
randread | 2337.8 | 2.3 | 18703 | 20 | 40.6 | 14.2 | 324 | 114
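Two quick checks on these figures, as a sketch: throughput should equal IOPS times the 128K block size, and the standard deviation expressed as a percentage of the mean makes the run-to-run variation comparable across columns. The function names are illustrative, and the arithmetic assumes that fio's MB here means 2^20 bytes, which the figures are consistent with.

```shell
#!/bin/sh
# Throughput in MB/s (2^20 bytes) implied by an IOPS figure and a block size in KB.
mb_per_s() {
    awk -v iops="$1" -v bs_kb="$2" 'BEGIN { printf "%.3f\n", iops * bs_kb / 1024 }'
}

# Standard deviation as a percentage of the mean.
rel_stdev() {
    awk -v sd="$1" -v mean="$2" 'BEGIN {
        if (mean == 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * sd / mean
    }'
}

mb_per_s 19666 128     # read, unwritten file
mb_per_s 324 128       # randread, written file
rel_stdev 33.4 897.4   # read, written file
rel_stdev 14.2 40.6    # randread, written file
```

The implied throughputs (2458.250 and 40.500) closely match the reported means of 2458.4 and 40.6, and the relative spread rises from about 3.7% for sequential reads to about 35% for random reads on written files.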
The vmstat, iostat and top values suggest that benchmark performance was bounded by the CPU on unwritten files, and by the zpool disks on the written files.
sysctl counters and iostat indicated that effectively no reads for the unwritten files were served from disk; they came instead from (prefetch) cache. Written files exercised the disks, but when data was read from cache for the written files, it came predominantly from the ARC.
The randread results on written files have, in aggregate, far greater variation as a proportion of the achieved performance. Looking at individual runs, an interesting pattern emerges:
Test number | MB/s | IOPs | ARC cache hit ratio | MRU hits | MFU hits
1 | 20.1 | 160 | 0.15 | 856 | 646
2 | 36.2 | 289 | 0.51 | 6060 | 2790
3 | 40.4 | 323 | 0.56 | 5504 | 5289
4 | 47.8 | 382 | 0.63 | 1348 | 12992
5 | 58.3 | 466 | 0.69 | 1761 | 17633
Specifically, performance got better with each run, apparently as a result of ARC caching. Initially, cache hits seem to be drawn from the MRU, but by the fourth and fifth tests, the MFU is more heavily used. The ARC caches uncompressed data, but even at 36GB of file data with a 6GB ARC, it is reasonable that some proportion of “random” read data will be served from the ARC. The possible implication that the ARC is successfully able to adapt to fio’s random read workload would be interesting to look at in greater depth.
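The shift from MRU to MFU can be put in numbers using the per-run hit counts from the table. A minimal sketch: the function name is illustrative, and the sysctl names in the closing comment are the FreeBSD arcstats counters one would presumably sample before and after each run (the post does not say exactly which counters were read).

```shell
#!/bin/sh
# Fraction of (MRU + MFU) hits that were served from the MFU list.
mfu_share() {
    awk -v mru="$1" -v mfu="$2" 'BEGIN {
        total = mru + mfu
        if (total == 0) { print "n/a"; exit }
        printf "%.2f\n", mfu / total
    }'
}

# Per-run counts copied from the table: run number, MRU hits, MFU hits.
while read -r run mru mfu; do
    echo "run $run: MFU share $(mfu_share "$mru" "$mfu")"
done <<EOF
1 856 646
2 6060 2790
3 5504 5289
4 1348 12992
5 1761 17633
EOF

# On a live FreeBSD system the cumulative counters could be sampled with e.g.:
#   sysctl -n kstat.zfs.misc.arcstats.mru_hits kstat.zfs.misc.arcstats.mfu_hits
```

The MFU share stays below one half for the first three runs and jumps to roughly 0.91 for runs four and five, which is the change described above.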
Conclusion: Although this is a limited set of data, two results can reasonably be drawn from it:
- The combination of cache and compression in ZFS can have impressive performance benefits; and
- ARC and prefetch cache are each relevant in different performance domains.
Acknowledgements: I am indebted to my former employer for allowing me free usage of the system tested in this blog post.