Understanding Storage Technology – Part 3: Sizing Storage beyond the Terabytes

February 11, 2014

How do you get the right size storage device?  You add up all the gigabytes and terabytes of data it will store, head on down to SANs-R-Us, and pick up a shiny new appliance in your size.  Right?  Not exactly.  As we will see, much more goes into the proper sizing of a storage solution than just focusing on the raw storage capacity.

A storage device can represent a significant investment in an I.T. budget.  The storage device is also a key component in ensuring optimal performance for any associated devices or user endpoints.  This makes selecting the right device, based on empirical data, all the more important.  It isn’t enough to just add up the terabytes.  Not all capacity is the same, and capacity is just one of the factors to consider when sizing a storage solution.  We will also take a look at an equally critical factor of storage: IOPS.

Capacity

While a storage device should never be selected based upon capacity alone, capacity is still a critical component in sizing a storage solution.  It would be natural to assume that the capacity of a given storage appliance is simple to determine.  After all, storage vendors should clearly indicate how much data can fit on their devices.  Unfortunately, this is seldom the case.

Some manufacturers advertise the raw storage capacity of their device.  Since almost all storage devices are made up of one or more arrays of individual disks, the total raw capacity of the appliance can be found by multiplying the capacity of one disk by the number of disks.  Alas, this measurement is not very useful.  When disks become part of an array, the usable capacity of the array is almost always less than the sum of the individual disks.  This is because some disks are used for parity data, as hot spares, or as system disks for the operation of the storage appliance.  Some storage systems take an additional chunk out of the usable space on any given disk by using only the outermost part of the disk platters, since data is transferred to and from this part of the disk more quickly than the inner part of the platters.  In the end, usable space may only be 50% to 75% of the raw capacity of the total of all drives in the appliance.
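To make the gap between raw and usable capacity concrete, here is a minimal Python sketch of the arithmetic.  The disk counts, the 2 TB disk size, and the function and parameter names are assumptions chosen for illustration, not figures from any particular appliance.

```python
def usable_capacity_tb(disk_tb, total_disks, parity_disks=0, hot_spares=0,
                       system_disks=0, platter_fraction_used=1.0):
    """Estimate usable capacity after subtracting disks reserved for parity,
    hot spares, and system use. platter_fraction_used models appliances that
    only use the outer part of each platter (1.0 means the whole disk)."""
    data_disks = total_disks - parity_disks - hot_spares - system_disks
    if data_disks < 0:
        raise ValueError("reserved disks exceed total disks")
    return data_disks * disk_tb * platter_fraction_used


# Hypothetical appliance: 24 disks of 2 TB each.
raw_tb = 24 * 2.0
usable_tb = usable_capacity_tb(2.0, 24, parity_disks=4, hot_spares=2,
                               system_disks=2)
print(f"raw: {raw_tb} TB, usable: {usable_tb} TB "
      f"({usable_tb / raw_tb:.0%} of raw)")
```

With these made-up numbers, 48 TB of raw capacity yields 32 TB of usable space, which is about two thirds of raw and falls inside the 50% to 75% range mentioned above.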

The good news is, some storage manufacturers use techniques to squeeze more data into the usable capacity.  By employing deduplication and compression routines, the effective space of a storage appliance may be much higher than the usable capacity.  However, be careful before relying on this figure too heavily.  Different types of data are better suited to deduplication and compression than others.  Too often manufacturers will tout an aggressive effective storage capacity for an appliance that can only be practically realized under very ideal conditions. 
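A rough way to sanity-check a vendor's effective capacity figure is to weight the data reduction ratio by the kind of data you actually store.  The sketch below does this in Python; the data mix and the reduction ratios are hypothetical placeholders, so substitute measurements from your own environment.

```python
def effective_capacity_tb(usable_tb, data_mix):
    """Estimate how much logical data fits in usable_tb of physical space.

    data_mix is a list of (share_of_logical_data, reduction_ratio) pairs,
    where a reduction_ratio of 4.0 means 4 TB of data is stored in 1 TB.
    """
    total_share = sum(share for share, _ in data_mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares of the data mix must add up to 1.0")
    # Physical space consumed by each logical terabyte, averaged over the mix.
    physical_per_logical = sum(share / ratio for share, ratio in data_mix)
    return usable_tb / physical_per_logical


# Best case: every byte reduces 4:1 (an assumed marketing-style figure).
best_case_tb = effective_capacity_tb(32.0, [(1.0, 4.0)])

# More realistic mix (assumed): half the data reduces 4:1, half not at all,
# for example media files that are already compressed.
realistic_tb = effective_capacity_tb(32.0, [(0.5, 4.0), (0.5, 1.0)])

print(f"best case: {best_case_tb:.1f} TB, realistic: {realistic_tb:.1f} TB")
```

In this example the same 32 TB of usable space holds 128 TB under ideal conditions but only about 51 TB once half of the data turns out not to compress, which is why the advertised effective capacity deserves a second look.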

Finally, the total capacity of the storage device should be considered while keeping in mind that additional features of the appliance may consume a significant portion of the usable storage space.   Many storage devices offer internal backup features, which consume storage space.  Many also offer the ability to replicate data from one appliance to another.  While this feature offers an excellent method of performing off-site backups and lends some exciting disaster recovery potential, it comes at the expense of consuming capacity from the appliances.  We will look at some of these features in our next blog entry.

IOPS – Input/Output Operations Per Second

Imagine a backyard swimming pool.  Now imagine being limited to filling and draining the pool with a drinking straw.  How much more quickly could you get water in and out of that pool if you upgraded to a garden hose, a fire hose, or even a huge 6-inch water pipe?  You could have an Olympic-sized backyard pool, but if you can only fill and drain it with a garden hose, the pool may not be very enjoyable.  Our garden hose may only be able to move about a quart (¼ gallon) of water per second, whereas the 6-inch pipe may be able to move 20 gallons per second.  These flow rates are the “IOPS” for our swimming pools: they determine how quickly water can get in or out, regardless of how much water the pool holds.

In a storage device, IOPS starts as a function of the individual physical disks.  The average SATA disk, spinning at 7,200 RPM, can provide about 80 IOPS.  A 10,000 RPM SAS disk can provide about 140 IOPS.  A 15,000 RPM SAS disk can provide 190 IOPS.  For even greater IOPS, a solid state disk (SSD) with no spinning parts can provide well over 5,000 IOPS.  The number of IOPS the storage device can provide can roughly be figured as the sum of the IOPS of the individual disks. 
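The "sum of the disks" rule of thumb is easy to express in a few lines of Python.  The per-disk figures below are the approximate values quoted above (using 5,000 as a conservative floor for SSD); the dictionary keys and the function name are made up for this sketch, and the result ignores RAID write penalties, controller limits, and caching.

```python
# Approximate per-disk IOPS figures from the paragraph above.
DISK_IOPS = {
    "sata_7200": 80,
    "sas_10k": 140,
    "sas_15k": 190,
    "ssd": 5000,  # "well over 5,000", so treat this as a floor
}


def raw_array_iops(disk_counts):
    """Rough IOPS estimate: the sum of the IOPS of the individual disks."""
    return sum(DISK_IOPS[disk_type] * count
               for disk_type, count in disk_counts.items())


sata_shelf = raw_array_iops({"sata_7200": 12})
mixed_shelf = raw_array_iops({"sas_15k": 12, "ssd": 2})
print(f"12 SATA disks: about {sata_shelf} IOPS")
print(f"12 SAS 15k disks plus 2 SSDs: about {mixed_shelf} IOPS")
```

Twelve SATA disks come out at roughly 960 IOPS, while twelve 15,000 RPM SAS disks plus two SSDs come out at roughly 12,280 IOPS, most of it from the two solid state disks.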

Certain workloads demand high IOPS from their connected storage appliances, and failing to factor in adequate IOPS will result in degraded performance.  But raw IOPS come at a price.  Increasing the number of spinning disks in a storage appliance can get expensive, and solid state technology can be very expensive to use as primary storage.

Luckily, appliance manufacturers have come up with ways to increase the IOPS delivered by their storage appliances without the high cost of adding large numbers of spinning disks.  The two most prominent ways of increasing the IOPS of a storage device are caching and tiering.

Caching is the use of higher speed storage or memory (RAM) as a buffer for read and/or write operations.  In the case of read operations, when data is read from a spinning disk, it is also loaded into RAM or an SSD array.  If that data is needed again within a predetermined time, the data is served from the cache instead of utilizing spinning disk IOPS.  For write operations, a write cache accepts and acknowledges all write requests to the storage device.  When the spinning disks are free, the cached data is written to disk.  Different storage arrays may offer one or both of these cache methods.  However, both read and write cache are dependent on and consume some of the IOPS of the underlying physical disks.
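The read cache idea can be illustrated with a toy Python model.  This is only a sketch under simplifying assumptions: the class and function names are invented, time is a simple counter passed in by the caller, and the cache has no size limit, whereas a real appliance cache is bounded and far more sophisticated.

```python
class ReadCache:
    """Toy read cache: serve a block from cache if it was loaded recently."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.entries = {}    # block_id -> (data, time_loaded)
        self.cache_hits = 0
        self.disk_reads = 0  # each one costs spinning-disk IOPS

    def read(self, block_id, now, read_from_disk):
        entry = self.entries.get(block_id)
        if entry is not None and now - entry[1] <= self.ttl_seconds:
            self.cache_hits += 1
            return entry[0]
        # Miss or expired entry: go to disk and keep a copy in the cache.
        data = read_from_disk(block_id)
        self.disk_reads += 1
        self.entries[block_id] = (data, now)
        return data


def fake_disk_read(block_id):
    """Stand-in for a real disk read (hypothetical)."""
    return f"data-for-block-{block_id}"


def replay(ttl_seconds, requests):
    """Replay one read request per second; return (cache_hits, disk_reads)."""
    cache = ReadCache(ttl_seconds)
    for now, block_id in enumerate(requests):
        cache.read(block_id, now, fake_disk_read)
    return cache.cache_hits, cache.disk_reads


requests = [1, 2, 1, 1, 3, 2]
print("10 second window:", replay(10, requests))
print(" 1 second window:", replay(1, requests))
```

With a 10 second window, three of the six requests are served from cache and only three reach the disks.  With a 1 second window, five of the six still hit the disks, which shows how much the benefit depends on how often the same data is re-read while it is still cached.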

Tiering is the ability of the storage device, either automatically or manually, to provide different classes of storage.  Tier 1 storage might be a low-capacity array of SSD devices.  While this wouldn’t provide a lot of capacity, it would provide a very high number of IOPS.  Tier 2 storage might contain an array of SAS disks, or SAS disks with some type of caching.  This array would usually have a moderate capacity for storage and be able to serve a moderate number of IOPS.  Tier 3 storage might contain an array of SATA disks, typically with no caching.  This array could provide a massive amount of capacity, but lower IOPS than the other tiers.  By dividing up the data into the appropriate tier, IOPS and capacity can be served at the appropriate levels.
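As a rough illustration of dividing data among tiers, the Python sketch below places each workload on the slowest tier that can still satisfy its capacity and IOPS needs, so that the fast tiers are kept for the data that really requires them.  The greedy placement rule, the tier sizes, and the workloads are all assumptions made up for this example; real appliances use their own, often automatic, tiering policies.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity_tb: float
    iops: int


@dataclass
class Workload:
    name: str
    capacity_tb: float
    iops: int


def place_workloads(workloads, tiers):
    """Greedy placement. tiers must be ordered fastest to slowest.

    Returns {workload name: tier name, or None if nothing fits}.
    """
    remaining = {t.name: [t.capacity_tb, t.iops] for t in tiers}
    placement = {}
    # Handle the most IOPS-hungry workloads first.
    for w in sorted(workloads, key=lambda w: w.iops, reverse=True):
        placement[w.name] = None
        for t in reversed(tiers):  # try the slowest tier first
            free_tb, free_iops = remaining[t.name]
            if w.capacity_tb <= free_tb and w.iops <= free_iops:
                remaining[t.name] = [free_tb - w.capacity_tb,
                                     free_iops - w.iops]
                placement[w.name] = t.name
                break
    return placement


# Hypothetical tiers: IOPS derived from 4 SSDs, 12 SAS 15k, and 12 SATA disks.
tiers = [
    Tier("Tier 1 (SSD)", capacity_tb=2, iops=20000),
    Tier("Tier 2 (SAS)", capacity_tb=10, iops=2280),
    Tier("Tier 3 (SATA)", capacity_tb=40, iops=960),
]
workloads = [
    Workload("database", capacity_tb=1, iops=8000),
    Workload("mail", capacity_tb=4, iops=1500),
    Workload("file archive", capacity_tb=30, iops=200),
]
placement = place_workloads(workloads, tiers)
for name, tier_name in placement.items():
    print(f"{name}: {tier_name}")
```

Here the small but busy database lands on the SSD tier, mail lands on SAS, and the large, rarely touched archive lands on SATA.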

As we have seen, storage sizing goes well beyond the terabytes.  A successful storage implementation starts with a keen understanding of the requirements needed both from a storage standpoint and an IOPS standpoint.  A complete sizing of the data and systems destined for the storage appliance will help to provide the information needed to properly size, select, and design a storage solution.
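To see why both requirements matter, consider a simplified Python calculation of how many disks a workload needs.  The disk sizes are hypothetical, the per-disk IOPS figures are the approximate values used earlier in this post, and the sketch deliberately ignores parity, hot spares, caching, and tiering.

```python
import math


def disks_needed(capacity_tb, iops, disk_tb, disk_iops):
    """Return (disks required, disks for capacity alone, disks for IOPS alone)."""
    for_capacity = math.ceil(capacity_tb / disk_tb)
    for_iops = math.ceil(iops / disk_iops)
    return max(for_capacity, for_iops), for_capacity, for_iops


# Hypothetical requirement: 20 TB of data and 5,000 IOPS.
sas_result = disks_needed(20, 5000, disk_tb=0.6, disk_iops=190)
sata_result = disks_needed(20, 5000, disk_tb=4.0, disk_iops=80)
print("15k SAS (0.6 TB each):", sas_result)
print("SATA (4 TB each):", sata_result)
```

With small, fast SAS disks the capacity requirement decides the disk count (34 disks, although 27 would satisfy the IOPS).  With large SATA disks, 5 disks hold all of the data, but 63 are needed to deliver the IOPS.  Sizing on terabytes alone would leave the SATA design badly short on performance.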

In our next blog entry, we will look at some of the add-on technologies that storage vendors have created to effectively store, protect, and optimize data.  We will see how a storage device can be the heart of an effective backup and disaster recovery solution.  Finally, we’ll take a look at ways to make the most efficient use of the available space on a storage system.

In the meantime, if you are considering a storage project or if you are making any updates or changes to your datacenter, it is a great time to think about modern storage technologies.  RSM has information technology consultants that specialize in storage evaluation, sizing, design, implementation, administration, and optimization.  To check out our offerings please visit our Infrastructure area on our website. To talk with one of our storage consultants, call 800-274-3978, or send us an email.  We look forward to talking with you!
