Oracle on Flash: The Case of the 4K Redo Log Block Size
Introduction: Redo Block Size Myths
Recently I presented a webinar about Oracle on flash, and demonstrated that many of the traditional storage considerations and compromises facing DBAs and System Administrators are irrelevant on a Pure Storage FlashArray. In particular, you no longer need to worry about RAID levels, stripe sizes, block sizes and so forth. Nor do you need to make any fundamental changes to your current database configuration when you migrate to a Pure Storage array.
In this post we’ll examine the impact of redo log block size on performance on our array. You may have come across blogs recommending a 4K block size for redo logs on flash. This option is new in Oracle 11gR2 and is designed to take advantage of Advanced Format drives, which use a 4K sector size instead of the standard 512-byte sector size. The cited advantage of the 4K redo block size is that it minimizes block misalignment problems, and hence improves performance. There is no question that redo log block size can have a significant impact on performance on certain types of SSDs. Guy Harrison, for example, observed a redo write time improvement of over 3x using 4K redo logs. Note that the 4K block size significantly increases redo wastage (the unused space in redo blocks that are written to disk before they are full), but usually this is not a big performance concern.
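If you want to see how much redo wastage your own system generates, one option is to read the cumulative counters in V$SYSSTAT. This is a minimal sketch, not the method used for the tests in this post. The values accumulate from instance startup, so taking a snapshot before and after a test run and subtracting is our suggestion for isolating a single run.

```sql
-- Cumulative redo statistics since instance startup.
-- 'redo wastage' and 'redo size' are reported in bytes;
-- 'redo blocks written' is a count of redo blocks.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('redo size', 'redo wastage', 'redo blocks written');
```

Dividing the change in redo wastage by the change in redo size gives a rough wastage ratio that you can compare between block sizes.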
How to Change Redo Block Size
To create redo logs with a non-default block size (the default is 512 bytes on most Linux platforms), you must specify the “blocksize” setting when you create the logfile group. Your choices are 512, 1024, and 4096. For example:
13:28:59 firstname.lastname@example.org SQL> alter database add logfile group 5 blocksize 4096
13:29:09 2 /
If you see an error such as:
alter database add logfile group 5 size 2g blocksize 4096
*
ERROR at line 1:
ORA-01378: The logical block size (4096) of file +ORARECO is not compatible with the disk sector size (media sector size is 512 and host sector size is 512)
you need to set the _disk_sector_size_override parameter to TRUE:
13:17:21 email@example.com SQL> alter system set "_disk_sector_size_override"=TRUE scope=both;
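After adding the new groups, it is worth confirming that they were created with the block size you intended. A minimal sketch, assuming an 11gR2 database where V$LOG exposes the BLOCKSIZE column (the size_mb alias is ours, for readability):

```sql
-- List each redo log group with its size and redo block size.
SELECT group#,
       thread#,
       bytes / 1024 / 1024 AS size_mb,
       blocksize,
       status
FROM   v$log
ORDER  BY group#;
```

Groups created with the 4K setting should report 4096 in the BLOCKSIZE column, and the others should report 512.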
Pure Storage Lab Testing
For our load test, we ran the hammerora TPC-C workload. We used 20 logfile groups sized at 2 gigabytes each, first using a 512-byte block size, and then a 4K block size. The redo logs rolled roughly once a minute for all tests (i.e. we generated about 2 gigabytes of redo per minute). We ran the comparisons in three environments:
- ASM in a virtual machine
- EXT4 file system on a physical machine
- ASM on a physical machine
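To check how often the redo logs roll on your own system, you can count log switches per hour from V$LOG_HISTORY. This is a sketch rather than the exact query we used; the one-day window and the column aliases are arbitrary choices.

```sql
-- Count redo log switches per hour over the last 24 hours.
SELECT TRUNC(first_time, 'HH24') AS switch_hour,
       COUNT(*)                  AS log_switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 1
GROUP  BY TRUNC(first_time, 'HH24')
ORDER  BY switch_hour;
```

With 2 gigabyte logs rolling about once a minute, as in our tests, you would expect roughly 60 switches in each hour of the run.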
Although the performance characteristics varied from one test bed to another, there was very little difference within any single environment. The VMware results were typical: test durations differ by about 2%, and redo wastage is nearly 10x greater with the larger block size:
Test Duration and redo wastage in VMware environment:
At a macro level (Top Activity in Enterprise Manager), the load profiles are nearly identical as well:
Enterprise Manager Top Activity Graph: 512 Byte Redo Logs:
Enterprise Manager Top Activity Graph: 4K Block Redo Logs:
The transaction rate was also essentially identical in both configurations:
512 byte redo block size
4K redo block size:
In Test 2, we mounted the EXT4 file system with the “noatime,discard” options. The discard option is specific to SSD devices and thinly provisioned LUNs; it makes the file system issue TRIM commands to the block device when blocks are freed. As with the VMware environment, we see virtually no difference in performance with the different redo log block sizes:
Test Duration and redo wastage on physical machine with EXT4 file system:
Perhaps most significantly, redo write time for these tests was also virtually identical. This chart illustrates the metric for a test on a physical machine running ASM:
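If you would like a comparable number from your own database, average redo write time can be approximated from two V$SYSSTAT counters. This is a rough sketch and not how the chart was produced. It assumes the 'redo write time' statistic is reported in units of 10 milliseconds, as the Oracle reference documents it, and like all V$SYSSTAT values it is cumulative since instance startup.

```sql
-- Approximate average redo write time in milliseconds.
-- 'redo write time' is in units of 10 ms, hence the multiplication by 10.
-- NULLIF guards against division by zero on an idle instance.
SELECT ROUND(10 * t.value / NULLIF(w.value, 0), 2) AS avg_redo_write_ms
FROM   v$sysstat t,
       v$sysstat w
WHERE  t.name = 'redo write time'
AND    w.name = 'redo writes';
```

Running it once with 512-byte logs and once with 4K logs, ideally after an instance restart or using before and after snapshots, gives a like-for-like comparison.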
It’s true that some SSD products do indeed benefit from a 4K redo log block size. That is because they are architected with a fixed-size RAID geometry. The notion of a sector (traditionally a pie slice of a spinning disk) really has no meaning or context in a Pure Storage FlashArray; why would it?
You can think of our Purity Operating Environment as utilizing a variable sector size, with the smallest being 512 bytes. Thus we have neither block misalignment issues nor performance compromises. The Purity Operating Environment can certainly understand and process i/o requests that are presented in the context of sectors, but by the time that data makes it to the flash, sectors have been abstracted away. Before we actually write data to the array, we write it to NVRAM where we perform deduplication and compression. The actual bits that are written to the underlying array represent the original i/o, but do not necessarily resemble the original i/o. In addition, since we also perform inline RAID, these bits are not necessarily written to a single physical SSD.
Obviously the Pure Storage array has nothing in common with spinning disks. But it might not be so obvious that it has little in common with other flash arrays out there either. We strive to make your life simple by leveraging flash’s unique capabilities as opposed to hindering them by mimicking the idiosyncrasies of disks. While there are Oracle and o/s settings that take advantage of flash, your existing configuration will work as-is. Besides redo log block size, things like database block size, ASM vs. file system, and LUN “spindle” count have no bearing on the performance of a Pure Storage array, which means you can deploy your database on Pure Storage without modifications, and you can continue to adhere to whatever operational policies you may have in place.