Solid State Drive Primer
I have reviewed a lot of products over the last few years. Graphics cards, Central Processing Units, Power supplies, after market CPU and graphics card coolers. I don’t claim to be a master of reviewing computer hardware, but I have done my fair share of testing and know a thing or two (well more than this) about computer hardware and how not to test them. The emphasis is on “not”. Because if you can avoid the “not” bit everything else is pretty easy. At times I have had to compromise on testing mainly due to lack of technical facilities. But I have always avoided the absolute “not”.
I have also tested a couple of optical disk drives. I am not usually very keen on reviewing optical storage devices, and I discovered the reason while testing these drives. What I figured out was that it was not how the drive was made (specifically the controller on the drive), but media selection that made or broke the drive. If I could find the right media for a given drive it would happily write to it all its life. If I couldn’t, I’d just burn coaster after coaster. Ditto for reliability. Sure, there is always some difference in performance characteristics, but nothing that drastically separates one drive from the other (apart from using the drive as a scanner, but that is not the main reason why an average consumer would buy an optical drive).
The point of all this “catharsis”? I have always had some technical know-how about a product that I had to test and review. Whatever gaps were present were easily filled by skimming the web. Recently I was given a Solid State Drive (SSD) to test. All I knew about SSDs was that they were really fast initially, gradually slowed down, and that there was this TRIM command that would magically restore them to their former glory. I needed to go back to school…
It is here that I found the excellent “Anthology of an SSD” by Anand Lal Shimpi of Anandtech. The article, if nothing else, got me a high school diploma in understanding the basics of an SSD. The article has been referred to by Intel, Wikipedia and Linus Torvalds. Anand’s work on SSDs forced at least one major SSD manufacturer (OCZ) to rethink its strategy when it came to SSD performance. You can imagine the impact of the author’s work when he can force a change in the way an entire company thinks about its SSDs.
So before I delve into the review proper I am going to digress and talk about SSDs in general: what makes them, how they tick and their limitations.
NOTE: The article is meant for novices and at times relies on oversimplification of technical concepts. For those looking for a more in-depth look, please refer to the Anandtech article linked here.
IT’S STILL ABOUT SIZE & MONEY
Almost all SSDs available today are tiny (in physical size as well as capacity) compared to their magnetic cousins, the hard disk drives. Where 1.5TB hard disk drives (HDDs) are ubiquitous and 3.0TB hard disks have started to ship this year, most SSDs are about a quarter the size of their 1.0TB hard disk counterparts. Those that do match hard disks for capacity do so at an insane price (about 3500 dollars for a 1.0TB SSD; a 1.0TB HDD costs about 100 dollars). Even for an average consumer grade drive the cost per gigabyte (GB) for an SSD is much higher than for an HDD. A 160GB SSD ships for about 400 dollars (2.5 dollars per GB). A 1.0TB HDD ships for about 100 dollars (roughly 0.10 dollars per GB). The table below summarizes these facts.
Prices change all the time. Rather than just looking at price per GB it is a better idea to look at the ratio between price per GB for SSDs and HDDs.
After this article went live, 2TB HDDs were selling for about US$119. Similarly a 128GB SSD could be found for about US$235. The price per GB for these drives comes out to be US$ 1.84 and US$ .06. The ratio between the two now stands at 31, which is around the figure given above.
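The ratio described above is simple arithmetic, and a short sketch makes it easy to redo whenever prices move. The function names are made up for illustration, and the 2TB drive is assumed to hold 2000GB.

```python
def price_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Price of one gigabyte of storage in US dollars."""
    return price_usd / capacity_gb


def ssd_to_hdd_ratio(ssd_price: float, ssd_gb: float,
                     hdd_price: float, hdd_gb: float) -> float:
    """How many times more expensive one GB of SSD is than one GB of HDD."""
    return price_per_gb(ssd_price, ssd_gb) / price_per_gb(hdd_price, hdd_gb)


# Prices quoted in the text: a 128GB SSD at US$235 and a 2TB HDD at US$119.
ssd = price_per_gb(235, 128)
hdd = price_per_gb(119, 2000)  # assumption: 2TB is treated as 2000GB
ratio = ssd_to_hdd_ratio(235, 128, 119, 2000)
print(f"SSD: ${ssd:.2f}/GB  HDD: ${hdd:.2f}/GB  ratio: {ratio:.0f}")
```

Plugging in newer prices is all it takes to see whether the gap between the two technologies is closing.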
I am sure everyone understands the price aspect, but why am I going on about the size part?
If your SSD is going to be about, say, 60GB, you are not going to put the entire GTA IV and its DLC on it. Neither are you going to use it to stream HD movies. In either case you’re going to run out of space real fast. The most likely scenario, today, for an SSD is to act as an Operating System (OS) drive. You might want to put your productivity applications on it as well, but once you’re done with an initial installation you are not going to be doing large scale writes (files with size in megabytes or gigabytes). What you will end up doing is writing small files at random locations. These, perhaps, will pertain to OS updates, browser cache, your own documents and antivirus updates.
Thus rather than relying on a consistently fast sequential write speed (see below for definition), you want a drive that can do relatively fast random writes for smaller files. This is the quantum shift that the author I mentioned above forced one SSD maker to adopt and consequently the industry basically followed suit.
Summarizing: you are not going to get a terabyte-sized SSD unless you’re willing to pay an insanely large amount of money for it. Once you do get your GB-sized drive, you’ll probably end up using it as an operating system drive with some productivity apps. For this, random write speed for small files (in the order of kilobytes) is more important than sequential write speed for large files (in the order of gigabytes). This is the most important performance criterion for an SSD as things stand today.
There are 5 courses in this class. Every course deals with one aspect of a SSD.
- Course #1: Terminology
- Course #2: Structure of a SSD
- Course #3: Functioning of a SSD
- Course #4: Achilles heels of a SSD
- Course #5: Hiding the heels
Course 1 provides definitions of terms that will be used throughout the course. Courses 2 and 3 explain what a SSD is made up of and how it works. Finally the last two courses deal with unique problems associated with SSD function and how they are worked around.
EARNING YOUR DIPLOMA COURSE #101: TERMINOLOGY
You won’t get very far unless you understand basic SSD terminology. So the first step towards your goal is familiarizing yourself with terms that will be used throughout this article.
1. SEQUENTIAL ACCESS (WRITE & READ)
This term refers to writing a file to a disk or reading a file from a disk (be it a HDD or a SSD) contiguously. If you have a large file that needs to be copied to a HDD or a SSD, say for example a High Definition (HD) movie, it will be written sequentially to the disk, i.e. as one large chunk. Sequential read would thus mean that it would be read in sequence as one large chunk. Sequential access is important for large files. In almost all circumstances sequential writes (and reads) occur for large amounts of data.
2. RANDOM ACCESS (WRITE & READ)
This term refers to writing a file to a disk or reading a file from a disk (be it a HDD or a SSD) at random positions. This happens for data that is in the order of Kilobytes. Random writes do not need lots of contiguous space as the data that needs to be written is small. As most writes to a disk, especially a SSD, fall into this category (see “It’s still about size & money” above for why), random access latency and bandwidth (see below) are more important than sequential access bandwidth.
3. LATENCY
This is the time taken by your drive to complete a request. If the amount of data you are dealing with is small, latency takes precedence over bandwidth. Latency is measured in milliseconds (ms).
4. BANDWIDTH
This refers to the amount of data that is transferred over a unit of time. This is measured in Megabytes per second (MB/sec). If you are dealing with a lot of data (in the order of gigabytes) bandwidth takes center stage.
To elaborate on the importance of the two I’ll take the example of a paper boy. A paper boy needs to deliver his newspapers on time. The amount of data (i.e. papers) he has is limited, maybe enough to cover his neighborhood. To him the important factor is completing the request (of delivering papers) on time. The data he is carrying is not huge (it is after all only a roll of paper).
Now consider a packing and moving company. They have to make sure they get all the stuff out of premises A and get it to premises B. They need to have a large enough van to get all the stuff in. To them the amount of “data” they can carry is more important (than speed). If they don’t have enough “bandwidth” (space) in their van they can only carry a limited amount of “data” (items) from A to B. To complete the job they will have to make multiple trips. Thus when dealing with a large amount of data bandwidth is more important than latency.
For a SSD latency is, as things stand today, key. The reason, as has been mentioned before, is that an SSD is not going to be dealing with huge chunks of data.
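The paper boy and the moving van can be put into numbers with a rough model: the time for one request is the latency plus the size of the data divided by the bandwidth. This is a minimal sketch; the 0.5 ms latency and 100 MB/sec bandwidth are made-up illustrative values, not measurements of any drive.

```python
def transfer_time_ms(size_bytes: int, latency_ms: float,
                     bandwidth_mb_per_s: float) -> float:
    """Fixed latency plus the time needed to move the data, in milliseconds."""
    streaming_ms = size_bytes / (bandwidth_mb_per_s * 1_000_000) * 1000
    return latency_ms + streaming_ms


def latency_share(size_bytes: int, latency_ms: float,
                  bandwidth_mb_per_s: float) -> float:
    """Fraction of the total request time that is spent on latency."""
    total = transfer_time_ms(size_bytes, latency_ms, bandwidth_mb_per_s)
    return latency_ms / total


# Hypothetical drive: 0.5 ms latency, 100 MB/sec bandwidth.
small = latency_share(4 * 1024, latency_ms=0.5, bandwidth_mb_per_s=100)
large = latency_share(1_000_000_000, latency_ms=0.5, bandwidth_mb_per_s=100)
print(f"4KB request: {small:.1%} of the time is latency")
print(f"1GB request: {large:.4%} of the time is latency")
```

For the small request almost all of the time goes to latency (the paper boy), while for the large one latency all but disappears and bandwidth decides how long the job takes (the moving van).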
5. LOGICAL BLOCK ADDRESSING (LBA)
This is a technique used by an operating system like Windows to locate data on your HDD or SSD. It is simply a set of numbers that point to specific locations on your disk drive. The operating system knows where a particular file is located. When you want to access that file, say a Word document, the operating system tells the drive (or specifically its controller) to access the particular block (or blocks) that the file occupies. If the Word document occupied block 10, the OS will tell the drive to access block 10. It is the job of the drive’s controller to translate this command into something that the drive can understand. Hard disks store information on round magnetic disks called platters. These platters are divided into tracks, just like tracks on the old vinyl records. Tracks are subdivided into sectors. The hard disk head must be positioned over a particular track, wait for the sector on which the Word document is located to come under it (the disks rotate at a fixed speed, which depends on the hard disk type and is usually 7200 rpm) and finally read it. The controller on the drive does the job of translating “block 10” to commands that move the head over the right track and tell the head to read the information once the correct sector has rotated underneath it.
In an SSD the controller is responsible for mapping “block 10” to a location on the drive.
6. HARD DISK DRIVE
For those who don’t know what a hard disk drive (HDD) is, the definition is given here. I am sure this is not required, but for the sake of completeness, please bear with me. A hard disk drive is a form of storage that comprises magnetic disks called platters mounted on a rotating axis (spindle), to which information is written and from which it is read by means of a head. The entire drive is encased in a hard metallic shell. Thus the name “hard” disk drive.
7. SOLID STATE DRIVE
As the name suggests, this form of storage relies on solid state memory to store data and has no moving parts.
8. BIT, BYTE, KILOBYTE, GIGABYTE AND BEYOND
A bit is the smallest unit of storage. 8 bits make up one byte. Beyond this, things were simpler once. 1024 bytes made up a Kilobyte (KB). 1024KB made up a Megabyte (MB). 1024MB made up a Gigabyte (GB). 1024GB made up a Terabyte (TB). Computer memory followed this convention. Computer storage, however, calculated it differently. Hard drive makers took 1000 bytes to make a Kilobyte, 1000KB to make a Megabyte, 1000MB to make a Gigabyte and so on.
The hard drive makers finally got their way (only joking; it was a standards body, the International Electrotechnical Commission (IEC), that changed the system). The old prefixes Kilo-, Mega-, Giga- and Tera- were fixed at multiples of 1000. New units were introduced to take the place of the original units, which took 1024 of a quantity to make the next unit.
The original units are now termed Kibibyte (KiB; 1024 bytes), Mebibyte (MiB; 1024KiB) and Gibibyte (GiB; 1024MiB). I’ll call the “bi” units the “re-designated” system and the plain Kilo-, Mega- and Giga- units their SI equivalents.
Consider a 60 GB SSD. How many Gibibytes does it have? Remember that the Gibibyte is the bigger unit, so the number gets smaller. Simply multiply the value in Gigabytes by about 0.93. For our drive this comes out to roughly 55.88 Gibibytes.
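The conversion factor of roughly 0.93 comes from dividing 1000 x 1000 x 1000 bytes by 1024 x 1024 x 1024 bytes. A minimal sketch of the conversion (the function name is made up for illustration):

```python
BYTES_PER_GB = 1000 ** 3   # SI gigabyte
BYTES_PER_GIB = 1024 ** 3  # re-designated gibibyte


def gb_to_gib(gigabytes: float) -> float:
    """Convert SI gigabytes (1000^3 bytes) to gibibytes (1024^3 bytes)."""
    return gigabytes * BYTES_PER_GB / BYTES_PER_GIB


print(f"conversion factor: {gb_to_gib(1):.4f}")
print(f"60 GB = {gb_to_gib(60):.2f} GiB")
```

This is also why an operating system that counts in units of 1024 reports a smaller number than the one printed on the box.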
9. INVALID PAGES
An invalid page on an SSD points to a location that the operating system considers deleted. The stress is on considers. When a delete command (which is the same as an erase command; the words erase and delete are used interchangeably) is issued by an OS to either an SSD or an HDD it does nothing. The controller on the drive tells the OS it has deleted the data but actually does nothing. You’ll see how this is both a blessing and a curse for SSDs.
10. PARTITIONED AND UNPARTITIONED SPACE
Before a disk can be used by an operating system like Windows, it must be prepared. This act of preparing a drive (including an SSD) includes a step called “partitioning”. This implies the creation of partitions (one or more) on the drive. On a 60 GB hard disk you can create as many partitions as you want (there are different types of partitions; you can read up on them online). Whatever space is left behind is called un-partitioned space. Some SSD controllers can use this un-partitioned space to their advantage.
COURSE #102: THE STRUCTURE OF A SSD –FROM GROUND UPWARDS
A SSD comprises three major components:
- 1. The memory used to store data
- 2. The controller
- 3. The interface
A. SSD MEMORY: UNDER THE MICROSCOPE
An SSD is, as the name suggests, a solid state drive, i.e. it has no moving parts and is based on solid state memory. The basic building block of a SSD is flash memory. Flash memory is a special type of reprogrammable memory that can retain information even when not powered, i.e. it is non-volatile; the information the memory contains does not “evaporate” once the power is taken away. The type of flash memory used in a SSD is called NAND memory (Not AND memory).
NAND refers to the electronic gate used to construct this type of memory. Gates are building blocks of electrical circuits. They allow signals to travel through them if certain input conditions are met. All digital circuits speak binary language. Their inputs and outputs are either “0” for OFF or “1” for ON. A NAND gate outputs “1” unless all of the inputs arriving at its terminals are in the “ON” (or 1) state. To illustrate what a particular gate does, a “truth table” is constructed to show how it will operate depending on its inputs.
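For a two-input NAND gate the truth table can be generated with a few lines of code. This is only an illustration of the logic, not of how flash cells are wired.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """Output is 0 only when both inputs are 1; otherwise it is 1."""
    return 0 if (a and b) else 1


def truth_table():
    """All input combinations of a two-input NAND gate with their output."""
    return [(a, b, nand(a, b)) for a, b in product((0, 1), repeat=2)]


print("A B | OUT")
for a, b, out in truth_table():
    print(f"{a} {b} |  {out}")
```

The only row with an output of 0 is the one where both inputs are 1, which is exactly the opposite of an AND gate.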
One NAND transistor is required to create one memory cell. Two types of memory cells are commonly used to make SSDs
1. Single Level Cell (SLC)
A single level cell stores only one bit of information. As has already been mentioned, computers talk in binary language: 0 or 1. Thus to store a bit of information, a SLC must store either a “0” or a “1”. Put another way, a SLC stores one of two states: “0” or “1”.
2. Multi Level Cell (MLC)
A multi level cell stores more than one bit of information. Current MLCs store two bits of information. This implies that a MLC must distinguish 4 states for the two bits of information that it stores.
Cells operate by responding to certain levels of voltage. To read information from a cell, a certain level of voltage (say V0) is applied by the controller to the cell. The controller then sees how the cell responds to this voltage. Once it gets a result it simply moves on to a new voltage level (V1) and sees how the cell responds. For an SLC only two voltage levels are required (for reading a 0 or a 1). For a MLC four voltage levels are required (one for each of the four combinations of the two bits stored).
This implies that it is quicker to read from a SLC than from a MLC. It also implies it is faster to write to a SLC than a MLC. As an SLC only deals with 2 voltage levels, it will also require less power to operate. The only downside (and a big one) is that an SLC is more expensive as it stores half as much information as a MLC.
SSDs are made of a special type of non-volatile (flash) memory called NAND memory. One NAND transistor is required to create one cell of memory. A cell can either be single level (store one bit of information) or Multi level (store more than one bit of information). A SLC is more expensive but faster as compared to a MLC.
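The relation between bits per cell and the number of states a cell must tell apart is a power of two. A small sketch, with function names made up for illustration:

```python
import math


def states_per_cell(bits_per_cell: int) -> int:
    """Number of distinct states a cell must hold to store that many bits."""
    return 2 ** bits_per_cell


def cells_needed(total_bits: int, bits_per_cell: int) -> int:
    """Number of cells required to store a given number of bits."""
    return math.ceil(total_bits / bits_per_cell)


for name, bits in (("SLC", 1), ("MLC", 2)):
    print(f"{name}: {states_per_cell(bits)} states per cell, "
          f"{cells_needed(8, bits)} cells for one byte")
```

Storing the same byte takes half as many MLC cells as SLC cells, which is where the cost advantage of MLC comes from.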
B. SSD MEMORY: THE BIGGER PICTURE
So now that we know about the basic building block of a SSD we can start making one.
The smallest unit of SSD storage is the cell. As most SSDs for consumers use MLC, each cell stores two bits of information. To store one byte of information we need 4 cells (remember 1 byte = 8 bits: 4 cells x 2 bits per cell = 8 bits = 1 byte). To store 4 Kilobytes (4096 bytes) we need 4096 x 4 or 16,384 cells. This is called a page. A page is the smallest unit on a SSD that you can work on, i.e. write data to or read data from. Think of this as the smallest functional unit of the SSD and the cell as the smallest structural unit of a SSD.
Pages are grouped together to form blocks. If a block is 1024KB in size it needs 256 pages (4 KB per page x 256 pages = 1024KB = 1 block). The block is the smallest unit that can be erased. Typical block size is 512KB. You can read or write a page at a time but you can only erase a block at a time! To paraphrase: in order to erase a 4KB page a SSD needs to wipe an entire 512KB block!
Most of the “problems” associated with a SSD originate right here.
Blocks are arranged as planes. For a 256MB plane you’ll need 256 blocks that each store 1024KB (1024KB per block x 256 blocks = 256MB = 1 plane).
The page, block and plane sizes vary between flash drives. But typically each page is about 4KB, each block about 512KB.
The smallest unit on a SSD is a cell. Cells are grouped to form pages, which is the smallest unit that can be read from or written to. Pages group together to form a block. A block is the smallest unit that can be erased. Blocks are organized into planes which are then organized as integrated circuits (chips) on a flash die.
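The page, block and plane arithmetic used in this section can be checked with a short sketch. The sizes are the example figures from the text; real drives vary, and the function names are made up for illustration.

```python
def pages_per_block(block_kb: int, page_kb: int = 4) -> int:
    """How many pages fit into one block."""
    return block_kb // page_kb


def blocks_per_plane(plane_mb: int, block_kb: int) -> int:
    """How many blocks fit into one plane (1 MB taken as 1024 KB)."""
    return (plane_mb * 1024) // block_kb


print("pages in a 1024KB block:", pages_per_block(1024))
print("pages in a 512KB block:", pages_per_block(512))
print("1024KB blocks in a 256MB plane:", blocks_per_plane(256, 1024))
```

The pages-per-block figure is also the number of pages that get wiped together whenever a single page has to be erased.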
C. THE SOLID STATE DRIVE CONTROLLER
The SSD controller is an intelligent piece of electronic circuitry. It has many functions, the foremost being translating OS requests into something the SSD can understand.
However this is just the icing on the cake. The controller is also responsible for making up for many of the shortcomings of a SSD. In fact the controller is probably the single most important piece of silicon on the SSD. A bad controller paired to the best flash will be as good as putting a Vitz (Yaris) engine into a Ferrari. You have the best crafted sports car with a mediocre engine. You’ll never enjoy the performance you would expect from a Ferrari.
There are several controller manufacturers. These include Intel, Indilinx, JMicron, Samsung and Sandforce. The last player is also the most recent entrant and has taken the entire industry by storm. Just about every manufacturer (apart from Intel) has jumped on its boat. What makes this company so special? You’ll get to find out a little later. Suffice to say Sandforce took the best of what was on offer and came up with a product that competed with the market leader (in terms of raw performance; Intel).
Another important piece of silicon associated with specific controller types is the cache. This is in the form of dynamic RAM and behaves just like caches behave on a HDD. Not all SSDs ship with cache. Sandforce based SSDs are an example of drives that have no onboard cache.
Controllers might also perform some extra function like error correction. This is especially true of Sandforce controllers.
D. THE INTERFACE -PUTTING IT ALL TOGETHER
The final product looks something like this
You obviously have to have some memory (blue outline), which is in the form of NAND flash memory integrated circuits (ICs). The more ICs there are, the better the performance of an SSD can be.
Next you have the controller (red outline). This is what helps the operating system talk to the memory and then some. It also helps overcome SSD associated problems (honestly, more on this is coming up).
Finally you need to have an interface for data transfer (purple) and power (green). This is in the form of regular S-ATA connectors, both for power and data.
COURSE #103: THE FUNCTIONING OF A SSD
So far we have covered what makes an SSD both at a microscopic and gross level. Now we’ll delve into how it works.
A. NITTY GRITTY: THEORY OF OPERATION
To an operating system (OS) any form of storage is just that: storage. The OS, in large part, really doesn’t care if it is a flash drive or a hard disk. It only needs to know that it can fill it up with data. The operating system tells the drive to read, write, delete and over-write data. The controller on the drive helps “translate” these instructions for the drive. The controller acts as a translator between the drive and the operating system.
For most operations both HDDs and SSDs work the same way. Both use Logical Block Addressing (LBA). The only difference is that the building blocks of the two are different thus information is written to different structures on a HDD and a SSD.
Remember that a hard disk stores information in units called sectors. This is the smallest unit that can be read, written, erased and over-written. For HDDs all operations can be performed on this unit.
As you can see, for all but one operation both pretty much behave in the same way. It is only when it comes to writing that the two differ, and that only because of the peculiarity of the SSD, which must erase in blocks and write (or read) in pages.
Also note that erasing does nothing. The OS marks this area as usable again (if need be), but does nothing to the data on the disk until it is over-written. This is the second source of issues with SSDs.
B. SSD OPERATION –THE NEED FOR SPEED
SSDs offer higher bandwidth as well as lower latency than their magnetic counterparts. We have already talked about these two terms. We have also established that, as SSDs are usually smaller in capacity and are usually used as OS drives, latency is more important than bandwidth. A lot is dependent on the “translator” (i.e. the drive controller) that sits between the drive and OS. Controllers here do more than just literally translate. They can change the way the drive receives information in order to improve performance and reduce wear and tear.
The other, more important, factor that is responsible for making SSDs blazingly fast is the way they write data to and read data from the flash chips on the drive. Each SSD is made up of a number of flash chips. Theoretically there are two ways a controller can interact with these flash chips:
- 1. One chip at a time
If the OS gives the drive an order to write some data, it can easily take the data and write it on to a single flash chip until it fills it up. The controller would then start using another flash chip and write data to it. This way the controller interacts with only one flash chip at a time. If the speed at which flash can be written to is “X” MB/sec, that becomes the maximum possible speed for the drive. The same process holds true for a read operation.
- 2. Multiple chips at a time
This time the controller can interact with multiple flash chips on the drive. The controller has separate channels that connect it to multiple chips. The number of channels is dependent on the drive and controller architecture. Say the controller has the ability to interact with 3 chips at a time. So instead of simply writing to one chip, the controller is writing to 3. This increases the speed to 3X (where “X” as stated above is the write speed to one flash chip). The read speed would improve by a factor of 3 as well. If “X” was 10 MB/sec, now it becomes 30 MB/sec.
SSD controllers use method #2 to interact with flash chips in a drive. It is this ability of an SSD to talk to multiple chips that gives it its speed characteristics primarily.
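The difference between the two methods can be sketched with an idealized model in which every channel talks to one chip at full speed and nothing else gets in the way. Real drives have other bottlenecks, so treat this as an upper bound. The function and parameter names are made up for illustration.

```python
def drive_throughput(per_chip_mb_s: float, chips: int, channels: int) -> float:
    """Idealized speed: one chip per channel, all channels busy at once."""
    active_chips = min(chips, channels)  # cannot use more chips than channels
    return per_chip_mb_s * active_chips


# Figures from the text: "X" is 10 MB/sec per flash chip.
one_at_a_time = drive_throughput(10, chips=8, channels=1)
three_at_a_time = drive_throughput(10, chips=8, channels=3)
print("method #1:", one_at_a_time, "MB/sec")
print("method #2:", three_at_a_time, "MB/sec")
```

Adding chips beyond the number of channels does not help in this model, which is why the number of channels on the controller matters as much as the number of chips.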
This is in contrast to a HDD, which likes all its data to be together. An HDD writes data a sector at a time. That write speed is dependent on a) how fast the writing head moves over the appropriate track and b) how fast the platter spins to position the appropriate sector under the head.
HDDs rely on fast spinning platters (disks) to achieve faster data transfer numbers. They also use some memory as cache to store commonly used data. The fastest HDDs however are slower than the fastest SSDs under ideal circumstances. Even with SSD performance degradation, the newer SSDs are still faster than the fastest HDDs.
To give you an example, the fastest hard disks read sequential data at about 120MB/sec, while even the slowest SSD reads sequential data at about 134MB/sec.
For the small data transfers that SSDs will usually handle, latency is more important. Random read times (the type of access more important for an SSD) for an SSD are in the range of about 0.2 to 0.7 milliseconds (ms). For the fastest hard disk this is still more than 1 millisecond, and usually much, much more. SSDs achieve lower latencies as there are basically no moving parts. They also achieve this as they can interact with multiple flash chips at one time.
SSDs are fast as the controller interacts with multiple flash chips to improve read and write speeds. It also has some intelligence that can speed up these operations beyond simply talking to multiple chips. SSDs also have lower latency times as compared to HDDs. The small data transfers for which we’ll use our SSDs for now are usually more sensitive to latency than to data transfer speeds.
C. SSD READ CYCLE
Remember the picture with the LBA definition? That is basically all there is to a SSD read cycle. Let’s look at it again
The operating system talks in terms of blocks which contain information. The OS knows which block contains which information. When the user or a program requests information off the disk, the OS asks the disk controller for that block. The controller knows how to “talk” to the disk and get the requisite information from it. And that is pretty much it. This process is called “mapping”. The controller maps a request made by the OS to a location on the drive.
Read performance is dependent on how “disparately” the information is written. Remember that each IC can have its own channel to the controller.
If the information is spread over 3 ICs (dies) and each IC has its own channel (with each channel transferring at 20MB/sec), the SSD can actually read at 60 MB/sec (20x3 MB/sec).
There are other variables that also affect transfer rate, including the type of memory cell used. A SLC has better performance as compared to a MLC. Almost all consumer grade drives use MLCs as they are cheaper as compared to SLCs.
D. SSD WRITE CYCLE
The write cycles use the same LBA technique. However, things get tricky when we talk about SSD write cycles. Before we move on we need to know how many types of write operations can be performed:
- 1. Simple write operation: This is as simple as things get. The controller gets some data from the OS. It won’t write this data to the memory chips until it has combined enough to be able to write effectively to multiple chips. Once it has this amount of data it simply writes to empty cells on the flash memory chips.
- 2. Over-write operation: An SSD will only perform an over-write when no free space is available to it. It won’t over-write in place even if the OS tells it to. This is to prevent flash memory from dying too quickly. An HDD on the other hand will gladly over-write data without giving it a second thought. Remember that flash memory is good for about 10,000 erase/over-write cycles.
- 3. Erase operation: An erase operation is performed as a part of a write operation to free up space. An erase operation can also be performed if the SSD supports a garbage collection feature. Erase operations are just as bad as over-write operations: they reduce the life of flash chips. Remember that an SSD will NOT perform an erase operation just because the OS tells it to. This is true for HDDs as well.
You’ll get to learn more about over-write and erase operations in the next course.
As SSD performance increases when data is spread over as many flash chips as possible (remember each chip can have its own channel to the controller), the controller must try to write to as many chips as possible (this comes with its own drawback, which we’ll touch on in the next section).
To complete a write cycle a controller:
- 1. Takes the data from the OS
- 2. Finds appropriately disparate pages
- 3. Writes to those disparate pages
- 4. Tells the OS that it has done the job.
Can you foresee a problem with this method? Think blocks! A SSD can write to pages but only erase blocks. Assume that, to improve performance, the data is spread over say 10 pages on 10 ICs rather than the one that was required to hold it. When it comes to over-writing this information the controller will have to erase 10 blocks! As erasing wears out the flash, the more over-writes there are, the shorter the life of an SSD. This gets worse if the same blocks get erased again and again. The controller must have some “logic” to reduce this unquestionably unwanted side effect. This is coming up in the next section.
Again the sharp amongst you might have noticed another problem. This time think pages! The smallest unit that can be written to is a page, which is 4KB in size. As most SSD requests are going to be about this size (read the discussion in “The Need For Speed” section), how is it possible for an SSD to write to multiple pages when all it needs to do is to write to one? The answer is simple: write combining. The controller simply takes 4KB requests, combines them into one large request and writes them to all the ICs it can via the channels in the memory controller.
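Write combining can be pictured as collecting page-sized requests and handing them out in batches, one page per channel. The sketch below is a toy model under that assumption; how a real controller buffers and schedules writes is not described here, and the names are made up for illustration.

```python
from typing import List


def combine_writes(requests: List[bytes], channels: int) -> List[List[bytes]]:
    """Group page-sized requests into batches of up to `channels` pages.

    Each batch stands for one combined write in which every page goes
    to a different flash chip through its own channel.
    """
    batches = []
    for start in range(0, len(requests), channels):
        batches.append(requests[start:start + channels])
    return batches


# Seven hypothetical 4KB requests arriving from the OS, three channels.
pending = [bytes(4096) for _ in range(7)]
for number, batch in enumerate(combine_writes(pending, channels=3), start=1):
    print(f"combined write {number}: {len(batch)} pages in parallel")
```

Seven small requests end up as three combined writes instead of seven separate ones, which is the whole point of the technique.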
Let’s look at the write cycle from the perspective of how the OS and the SSD each store data. Recall that the OS stores data in the form of logical addresses, which are simple numbers. The SSD stores data on pages. The easiest way is to have a one-to-one correspondence between logical addresses and physical pages. If you have a SSD with 60GB of available capacity, it is bound to contain about 15 million 4KB pages! You’ll need one really powerful controller to be able to keep track of all those pages.
So how do you overcome this problem? The simplest way is to allocate a bigger unit to each logical address. Rather than using a page, use a block. As each block is 512KB in size and is made up of 128 pages, the amount of information that a controller has to deal with goes down by a factor of 128. So in our case, instead of 15 million pages, our controller is dealing with about 117,000 blocks.
But there is always a catch. No such thing as a free lunch! You see, if you use a large unit as the physical address, your small file write speed suffers. And this is exactly what made controllers from one manufacturer really crappy. This is because even if you have to write 4KB, you end up writing an entire block as your controller can’t deal with a smaller unit. To improve small file write performance (which as you will recall is the primary source of data for our SSD), the physical address must be smaller than a block. Intel and Sandforce controllers use page size mapping. Their controllers are powerful and, more importantly, intelligent enough to do page level mapping. This affects large file write speed (to an extent), but we have already decided we are not going to be doing much of that on a 60GB SSD!
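The page and block counts quoted above can be reproduced with a quick calculation. The sketch assumes decimal units throughout (1 GB = 1,000,000 KB), which is what the round figures in the text imply; the function name is made up for illustration.

```python
def mapping_entries(capacity_gb: int, unit_kb: int) -> int:
    """Number of units the controller must track for a given capacity.

    Assumption: decimal units, i.e. 1 GB = 1,000,000 KB.
    """
    return capacity_gb * 1_000_000 // unit_kb


pages = mapping_entries(60, unit_kb=4)     # page-level mapping
blocks = mapping_entries(60, unit_kb=512)  # block-level mapping
print(f"page-level entries:  {pages:,}")
print(f"block-level entries: {blocks:,}")
print(f"reduction factor:    {pages // blocks}")
```

The bookkeeping shrinks by the number of pages in a block, which is the saving that tempts a controller designer into block-level mapping despite the cost to small writes.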
Three distinct types of write operations can be performed on an SSD. Write, over-write and erase. For effectively writing to an SSD a controller uses intelligence to write combine data so that it can write to multiple flash chips. A SSD will only over-write or erase when it runs out of space. In order to prevent certain flash chips from being reused too often the controller also has some intelligence.
COURSE #104: THE ACHILLES HEELS OF SSDs
A. THE ACHILLES HEELS
In our discussion above we highlighted two issues that plague SSD performance and endurance.
- 1. SSDs only erase in blocks
- 2. Do nothing when the command to erase is given.
Both these factors are responsible for almost all of the weaknesses of SSDs.
1. ERASING (DELETING) IN BLOCKS
SSDs can write to, or read from, a page, but can only erase blocks. The good part is that an SSD only issues erase commands when it runs out of space to write files (or during garbage collection operations; more on this a little later). It does not over-write areas marked as empty by the OS. Even then, eventually the SSD will run out of space to write data. The drive then has no option but to issue an erase command (in order to write data). This has three repercussions.
- a. The number of erase cycles decreases the life of an SSD. Standard MLC flash goes bad after about 10,000 erase cycles. As the erase command can only be carried out on an entire block at the least, the SSD is not doing itself any favors when it comes to preserving itself!
- b. Erasing in blocks slows down the SSD
- c. Write amplification
For #a, recall that to improve speed an SSD controller likes to stagger data over multiple ICs. When the time comes to over-write this data, entire blocks need to be erased in all the ICs where the data was stored. This could be avoided by simply writing all the data together, but that would cut down the speed advantage of an SSD.
To understand #b, you need to remember that the only time a block needs to be erased is during an over-write operation. Deleting a file in the OS does nothing on the drive by itself. Only the OS knows that that particular page is free and can be reused whenever needed. Think of these as “invalid” pages on your SSD, as opposed to valid pages which, well, contain valid data.
SSD controllers are usually intelligent enough to avoid an erase operation simply by writing elsewhere on the drive, even if the OS commands the drive to over-write an existing page. But eventually there will come a time when the drive runs out of space and the controller is forced to do an erase operation.
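The trick of writing elsewhere instead of erasing can be sketched with a toy controller. Everything here (the class name `ToyController`, the four-page drive, the first-free-page policy) is an assumption made for illustration and not how any particular controller is built.

```python
# Toy page-level mapping: an over-write goes to a fresh page and the old
# page is only marked invalid, so no erase is needed yet.
class ToyController:
    def __init__(self, total_pages):
        self.free_pages = list(range(total_pages))  # physical pages never written
        self.mapping = {}      # logical address -> physical page
        self.invalid = set()   # physical pages holding stale data

    def write(self, logical_addr):
        """Write (or over-write) a logical address; return the physical page used."""
        if not self.free_pages:
            # This is the point where a real drive is forced into an erase cycle.
            raise RuntimeError("out of free pages: erase cycle required")
        new_page = self.free_pages.pop(0)
        old_page = self.mapping.get(logical_addr)
        if old_page is not None:
            self.invalid.add(old_page)   # stale copy stays on flash for now
        self.mapping[logical_addr] = new_page
        return new_page

drive = ToyController(total_pages=4)
first = drive.write(logical_addr=7)
second = drive.write(logical_addr=7)   # over-write lands on a different page
print(first, second, sorted(drive.invalid))
```

Keep calling `write` and the toy drive eventually raises an error, which is the moment a real drive can no longer dodge the erase.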
This is what happens during an erase cycle:
- 1. The block that is marked for erasing is read into the memory
- 2. The invalid page(s) are erased
- 3. New data is placed in the pages that were erased in step #2
- 4. The entire block is erased
- 5. The new block from memory is written to the drive
Steps 2-3 take place in memory outside the SSD storage area. This can either be inside your computer’s memory OR on cache located on the SSD. Steps 4 and 5 take place in the SSD.
As you can see, an erase operation requires not only write operations but also read operations, thus slowing down the drive. Every time an SSD has to issue an erase command it will be bogged down compared to a HDD, which will simply over-write the sector.
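The five steps of the erase cycle can be sketched as a small simulation. It is a toy model under stated assumptions: a block holds just four pages, an invalid page is represented by the made-up marker `INVALID`, and the function name `erase_cycle` is hypothetical.

```python
# Toy model of the five-step erase cycle described above.
# A block is a list of pages; each page is data, None (empty), or INVALID.
INVALID = "<invalid>"

def erase_cycle(flash, block_no, new_pages):
    """Replace the invalid pages of one block with new data.

    Returns (page_reads, page_writes) so the extra work is visible.
    """
    # Step 1: read the whole block into memory (outside the flash area).
    buffer = list(flash[block_no])
    reads = len(buffer)

    # Steps 2 and 3: drop invalid pages in memory and put new data there.
    pending = list(new_pages)
    for i, page in enumerate(buffer):
        if page == INVALID:
            buffer[i] = pending.pop(0) if pending else None

    # Step 4: erase the entire block on the flash.
    flash[block_no] = [None] * len(buffer)

    # Step 5: write the rebuilt block back from memory.
    flash[block_no] = buffer
    writes = sum(1 for page in buffer if page is not None)
    return reads, writes

flash = {0: ["A", INVALID, "B", INVALID]}
reads, writes = erase_cycle(flash, 0, ["C", "D"])
print(flash[0], reads, writes)
```

Only two new pages were requested, yet the toy drive read four pages and wrote four pages to get them in place.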
Erasing in a copy, modify, erase, write cycle brings with it another problem called write amplification. This is defined as the ratio of the amount of data that actually got written to the amount of data you intended to write. In a hard disk drive, there is no write amplification. The controller simply writes whatever the OS tells it to write onto a sector. As we have talked about before, the smallest unit for read, write and erase on a HDD is a sector. On an SSD the smallest erasable unit is a block, which is 128 times the size of a page! Look at the figure above. To write two blocks of data our SSD had to:
- 1. Read block into memory
- 2. Erase invalid page in memory
- 3. Copy new data to block in memory
- 4. Erase block (write step one)
- 5. Write data back from memory to SSD (Extra write step)
So essentially for what the user considers to be one write step, the drive is doing two. Thus the drive is doing twice as many writes as the user might expect it to.
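Put as a number, write amplification is the data physically written divided by the data the user asked to write, so bigger means worse. The sketch below uses that reading; the function name and the second, worst-case scenario are assumptions added for illustration.

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Ratio of data physically written to data the host asked to write."""
    return flash_bytes_written / host_bytes_written

PAGE = 4 * 1024          # 4KB page, as in the text
BLOCK = 128 * PAGE       # 512KB block, as in the text

# Case from the text: the drive writes twice what the user asked for.
doubled = write_amplification(2 * BLOCK, BLOCK)

# Assumed worst case: one 4KB page changes but a whole 512KB block is rewritten.
worst = write_amplification(BLOCK, PAGE)

print(doubled, worst)
```

The second figure is why small writes hurt so much on a drive that can only deal in whole blocks.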
Summarizing what we have learnt:
- 1. An SSD’s life is reduced because of erase cycles
- 2. An SSD is slowed down as it needs to copy, modify, erase and write data during an erase cycle. This is because rather than simply over-writing/erasing data it needs to perform 4 steps (copy to memory, modify in memory, erase from SSD, write to SSD).
- 3. An SSD has to write more data than the user might perceive, again this is mainly due to the multi-staged erase cycle.
2. DOING NOTHING WHEN AN ERASE COMMAND IS GIVEN
The other problem with an SSD is that an erase command is ignored by the controller. This is the same for an HDD as for an SSD. For HDDs this really does not matter as they can simply over-write the sector marked as “invalid” (remember, sectors that the OS considers erased are invalid). For an SSD this is a 4 step process (as mentioned above). Every time an erase command is ignored the SSD is simply collecting invalid pages (remember SSDs talk in terms of pages and blocks, not sectors). When the time comes to over-write, an SSD might have to erase several blocks in order to free up enough space to write new data. This is responsible for slowing down the SSD over a longer period of time.
COURSE #105: HIDING THE HEELS IN REALLY BIG SHOES!
SSDs are very desirable products. They have the potential to offer lightning fast performance without the hassle of moving parts. But as has been mentioned they do have a problem or two. Over the course of the evolution of SSDs, workarounds (read compromises) have been designed to overcome these “issues”. These are mostly controller based improvements. Different controllers handle these issues differently, but they all have the same aim: make SSDs fast and reliable.
A. REDUCING ERASE CYCLES & IMPROVING SSD’s LIFE
The bane of the SSD is its erase cycle. Not only does it make an SSD slower, it also reduces its life. We need to come up with a way to reduce erase cycles one way or another.
1. THERE’S (MORE) GOLD IN THEM THAR HILLS: OVER PROVISIONING
One way to overcome erase cycles is to simply not erase at all. How is this possible, you say? One way is by hiding some amount of space on an SSD from the user. Imagine you have a 60GB SSD and now you have run out of space and need to erase some stuff in order to re-write data. Let’s go back to our picture:
Rather than erasing a block to copy data, why not simply copy the data to the space “hidden” from the user? This way the erase cycle can be postponed. It will eventually be needed, but for now it can be delayed. How much space is hidden from the user? About 7.5% of the disk capacity. On a 60GB drive this is about 4.5GB. This is also referred to as “spare area”. Intel controllers have a feature by which they can dynamically use un-partitioned space as “hidden space”, until no free un-partitioned space remains. Then they rely on the 7.5% of hidden space to prevent erase cycles.
Sandforce controllers employ 20% over provisioning for their enterprise level drives (server class). For consumer drives this is about 13%.
If an SSD has a Sandforce controller and has an advertised capacity of 60GB, what is the actual capacity of the drive? 13% of 60GB is about 7.8GB. The actual usable capacity of a 60GB drive is 55.8GiB (remember GiB is gibibytes). Thus the drive gets about 4.2GB of spare area this way. This leaves us with about 3.6GB of spare area still to create on the drive. The simplest way (and the one employed by Sandforce) is to add more flash memory. Thus a drive with a listed capacity of 60GB will actually have a total capacity of about 64GB.
This solution is called over provisioning and is illustrated in the following diagram
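The spare-area arithmetic above can be retraced in a few lines. This sketch follows the article's own loose bookkeeping (it subtracts a GiB figure from a GB figure, just as the text does), and the variable names are made up for illustration. Because the text rounds 55.88 down to 55.8, its intermediate numbers are a shade different from what the code prints.

```python
GB = 10**9     # decimal gigabyte, the unit printed on the box
GIB = 2**30    # binary gibibyte

advertised_gb = 60

# 7.5% spare area quoted in the text for a 60GB drive.
spare_small = 0.075 * advertised_gb

# 13% consumer over provisioning quoted for Sandforce.
spare_target = 0.13 * advertised_gb

# 60GB expressed in GiB, and the gap the article counts as "free" spare area.
usable_gib = advertised_gb * GB / GIB
gap = advertised_gb - usable_gib

# Spare area still to be found by adding flash, and the resulting total.
extra_flash = spare_target - gap
total = advertised_gb + extra_flash

print(f"7.5% spare area: {spare_small:.1f}")
print(f"13% spare area:  {spare_target:.1f}")
print(f"60GB in GiB:     {usable_gib:.1f}")
print(f"gap:             {gap:.1f}")
print(f"extra flash:     {extra_flash:.1f}")
print(f"total:           {total:.1f}")
```

Rounded to the nearest whole number, the total lands on the 64 quoted above.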
2. STUFFING IT UP! –OVERCOME WRITE AMPLIFICATION
The SSD controller has two options to reduce write amplification. One is mentioned above (over provisioning). By reducing erase cycles, write amplification is automatically reduced. Intel controllers rely on over provisioning, as do many other controllers. Intel says their write amplification factor is about 1.1x.
The other way is data compression. The Sandforce controllers, which are all the rage these days, have the ability to do this. They reckon that with compression, combined with some other nifty tricks, they can actually turn write amplification into write reduction, i.e. their write amplification factor is about 0.5x. They can do this for all forms of data except data that has already been compressed maximally, like videos, songs etc. Remember we took an oath to use our SSD as an OS drive only? How many movies or songs is one going to actually copy on to an OS drive? Thus the logic behind the 0.5x factor.
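Why compression helps with some data and not with other data is easy to see with a general purpose compressor. The sketch below uses Python's `zlib` purely as a stand-in; the text does not say which algorithm Sandforce uses. The repetitive sample and the random bytes (standing in for already compressed movies and songs) are made-up test data.

```python
import random
import zlib

def compressed_ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)

# Repetitive data, standing in for typical OS and application files.
text_like = b"system log entry: service started OK\n" * 2000

# Random bytes, standing in for already compressed movies and songs.
rng = random.Random(0)
media_like = bytes(rng.getrandbits(8) for _ in range(len(text_like)))

print(f"text-like:  {compressed_ratio(text_like):.3f}")
print(f"media-like: {compressed_ratio(media_like):.3f}")
```

The repetitive sample shrinks to a small fraction of its size, while the random bytes do not shrink at all, which is the same reason a drive that compresses gains nothing on your movie collection.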
3. IMPROVING THE LIFE OF A SSD –WEAR LEVELLING
If an SSD keeps on erasing the same block over and over again to write data, this block will die off quicker than the rest of the drive. To make sure all blocks are treated equally (and face the same amount of torture!), the controller makes sure it keeps its erase cycles balanced. At least democracy works somewhere!
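One simple way to keep erase cycles balanced is to always pick the block that has been erased the fewest times. The sketch below does exactly that; the policy and the function names are assumptions for illustration, since the text does not spell out how any given controller levels wear.

```python
def pick_block(erase_counts):
    """Return the index of the block that has been erased the fewest times."""
    return min(range(len(erase_counts)), key=lambda i: erase_counts[i])

def erase_with_leveling(erase_counts, times):
    """Perform `times` erases, always on the least-worn block."""
    for _ in range(times):
        erase_counts[pick_block(erase_counts)] += 1
    return erase_counts

counts = erase_with_leveling([0, 0, 0, 0], times=10)
print(counts)
```

After ten erases no block is more than one erase ahead of any other, instead of one unlucky block taking all ten.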
B. MAKING THE ERASE COMMAND DO SOMETHING –TRIM CONTROL!
I have been saying that an SSD only erases when there is no space left for it to write the data it is asked to by the OS. I’ll make a confession: I wasn’t telling you the entire truth. There is a command that makes an SSD do an erase operation. But this is not something that the user might request (as in deleting a file). It is an autonomous command issued by the OS and thus is very OS dependent. Windows XP doesn’t have it, Windows 7 does.
Remember that both HDDs and SSDs (sort of) do nothing when an OS tells them to delete a page. The pages are simply marked as invalid and left to themselves. This is beneficial to an HDD as it prevents excessive write commands and potentially increases the life of the drive. For an SSD they are the source of another problem: degradation in performance over time. This problem does not show up in a new SSD, simply because it has lots of free space and erase requests are not made. When erase requests become inevitable the performance of an SSD degrades. This happens when your SSD has filled up to its capacity and must initiate the long copy, modify, erase, write cycles. Rather than simply letting the invalid pages pile up as garbage, an SSD should have a recycler system to prevent the garbage from piling up in the first place.
To overcome this unwanted idleness of an SSD and the pile-up of garbage, SSD controllers rely on an OS command known as TRIM. In a nutshell TRIM is like your household waste collection and disposal mechanism. You place all your waste in bins outside your house and a collection vehicle comes in the early morning to take it all away. When you wake up the waste is (usually) all gone. The TRIM command does just that. With it, the OS tells the controller which pages are invalid, and the controller trims them from the SSD when it is idle. So when an actual request comes in to write data, the copy, modify… step can be avoided, which prevents the SSD slowdowns that occur as it fills up.
In the example above you write some data to your SSD on day one and decide to delete some as well. In the end this block on your SSD has one page marked for deletion by the OS (the invalid page), one page that is empty and two others that have some data (in the form of a movie clip). On day two you decide to delete the movie clip and thus end up with another two invalid pages. The TRIM command realizes that the block has accumulated enough invalid pages (garbage) to be taken out. Transparent to the user (i.e. the user doesn’t know that invalid pages are being trashed), the TRIM command erases the pages marked as invalid. This makes sure that if a write command comes in the SSD will not have to go through the read, modify, erase, and write cycle. It can simply write to the empty pages.
Which blocks are targeted by the TRIM command? The ones with the most invalid pages!
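Picking the block with the most garbage and cleaning it can be sketched as follows. This is a toy model: blocks hold four pages, `INVALID` is a made-up marker for pages the OS has flagged as deleted, and both function names are hypothetical.

```python
INVALID = "<invalid>"

def most_garbage(blocks):
    """Index of the block holding the most invalid pages."""
    return max(range(len(blocks)), key=lambda i: blocks[i].count(INVALID))

def clean_block(blocks, index):
    """Keep valid pages, drop invalid ones, leaving the freed pages empty (None)."""
    valid = [p for p in blocks[index] if p not in (INVALID, None)]
    blocks[index] = valid + [None] * (len(blocks[index]) - len(valid))
    return blocks[index]

blocks = [
    ["A", INVALID, None, "B"],
    [INVALID, INVALID, INVALID, None],   # the day-two block from the example
    ["C", "D", "E", "F"],
]
victim = most_garbage(blocks)
cleaned = clean_block(blocks, victim)
print(victim, cleaned)
```

Once the day-two block has been cleaned, every one of its pages is empty again and the next write can go straight in.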
Three mechanisms help overcome the most glaring shortcomings of an SSD:
- 1. Over provisioning –To prevent recurring read, modify, erase, and write cycles every time the SSD runs out of space to write data and also to overcome write amplification.
- 2. Compression –To overcome write amplification
- 3. TRIM Command –To transparently (from the user) clear invalid pages and add them to the usable free page pool.
Now that you have come up to speed on what makes an SSD and how it ticks, it’s time to get your degree.
In the last 4 years (well 30 minutes), you have learnt the structure and the functioning of an SSD. You have learnt the potential issues and their workarounds.
So congratulations! Bask in your new found knowledge. Now go read this Anandtech article and the subsequent articles it links out to. That’ll only add to what you have grasped so far.