Vista and External Memory Devices

Update – read the first two comments. I'm leaving the rest of the article as it is in order to avoid revisionism. The solution is in the first two comments though.

According to the Windows Vista feature page, Vista will be able to use external memory devices (USB flash drives and the like, to you and me) as extra memory, to save having to go to the hard disk. I've heard this mentioned in a few places, and it's always asserted that EMDs are slower than memory but "much, much faster" than disks. This is stated as a fact that everyone is expected to go along with. I've been a bit skeptical myself, so I thought I'd write a couple of very simple benchmarks. I emphasise that they're very simple because it could well be that I'm missing something very important.

Here are three classes. Writer writes however many 1MB blocks of random data you ask for, to whichever file you name. Reader simply reads a whole file in 1MB chunks. RandomReader reads however many 1MB chunks you ask for, seeking to a random position within the file before each read.

Writer

using System;
using System.IO;

public class Writer
{
    static void Main(string[] args)
    {
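        Random rng = new Random();
        
        // 1MB buffer, refilled with random data before each write
        byte[] buffer = new byte[1024*1024];
        // Parse the block count once rather than on every loop iteration
        int blocks = int.Parse(args[1]);
        
        DateTime start = DateTime.Now;
        using (FileStream stream = new FileStream (args[0], FileMode.Create))
        {
            for (int i=0; i < blocks; i++)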
            {
                rng.NextBytes(buffer);
                Console.Write(".");
                stream.Write(buffer, 0, buffer.Length);
            }
        }
        DateTime end = DateTime.Now;
        Console.WriteLine();
        Console.WriteLine (end-start);
    }
}

Reader

using System;
using System.IO;

public class Reader
{
    static void Main(string[] args)
    {
        byte[] buffer = new byte[1024*1024];
        
        DateTime start = DateTime.Now;
        // long rather than int, so files of 2GB or more don't overflow the count
        long total=0;
        using (FileStream stream = new FileStream (args[0], FileMode.Open))
        {
            int read;
            while ( (read=stream.Read (buffer, 0, buffer.Length)) > 0)
            {
                total += read;
                Console.Write(".");
            }
        }
        DateTime end = DateTime.Now;
        Console.WriteLine();
        Console.WriteLine (end-start);
        Console.WriteLine (total);
    }
}

RandomReader

using System;
using System.IO;

public class RandomReader
{
    static void Main(string[] args)
    {
        byte[] buffer = new byte[1024*1024];
        
        Random rng = new Random();
        DateTime start = DateTime.Now;
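        // long rather than int, so more than 2047 reads of 1MB don't overflow
        long total=0;
        using (FileStream stream = new FileStream (args[0], FileMode.Open))
        {
            // Highest starting offset that still leaves room for a full buffer;
            // kept as a long so files of 2GB or more are handled correctly
            long maxOffset = Math.Max(0, stream.Length - buffer.Length);
            int reads = int.Parse(args[1]);
            for (int i=0; i < reads; i++)
            {
                stream.Position = (long) (rng.NextDouble() * maxOffset);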
                total += stream.Read (buffer, 0, buffer.Length);
                Console.Write(".");
            }
        }
        DateTime end = DateTime.Now;
        Console.WriteLine();
        Console.WriteLine (end-start);
        Console.WriteLine (total);
    }
}

I have five devices I can test: a 128MB Creative Muvo (USB), a 1GB PNY USB flash drive, a Viking 512MB SD card, my laptop hard disk (fairly standard 60GB Hitachi drive) and a LaCie 150GB USB hard disk. (All USB devices are USB 2.0.) The results are below. This is pretty rough and ready – I was more interested in the orders of magnitude than exact figures, hence the low precision given. All figures are in MB/s.

Drive            Write (MB/s)  Streaming read (MB/s)  Random read (MB/s)
Internal HDD     17.8          24                     22
External HDD     14            20                     22
SD card          2.3           7                      8.3
1GB USB stick    3.3           10                     10
128MB USB stick  1.9           2.9                    3.5
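
For anyone wanting to turn the elapsed times the programs print into MB/s figures like these, here is a minimal sketch of the arithmetic. The class and method names are hypothetical, and it assumes 1MB means 1024*1024 bytes to match the buffers used in the tests; it is not necessarily how the figures above were worked out.

```csharp
using System;

public static class Throughput
{
    // Converts a byte count and an elapsed time into megabytes per second.
    // Assumption: 1MB = 1024*1024 bytes, matching the test buffers.
    public static double MegabytesPerSecond(long bytes, TimeSpan elapsed)
    {
        if (elapsed <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException("elapsed", "Elapsed time must be positive.");
        }
        return (bytes / (1024.0 * 1024.0)) / elapsed.TotalSeconds;
    }

    public static void Main()
    {
        // Hypothetical figures: 100MB transferred in 10 seconds is 10 MB/s
        double rate = MegabytesPerSecond(100L * 1024 * 1024, TimeSpan.FromSeconds(10));
        Console.WriteLine(rate);
    }
}
```

For Reader and RandomReader the byte count is the total they print; for Writer it is the number of blocks multiplied by 1024*1024.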

Where possible, I tried to reduce the effects of caching by mixing the tests up, so I never ran two tests on the same location in succession. Some of the random reads will almost certainly have overlapped each other within a test, which I assume is the reason for some of the tests showing faster seek+read than streaming reads.

So, what's wrong with this picture? Why does MS claim that flash memory is much faster than hard disks, when my flash drives appear to be much slower than my laptop and external drives? (Note that laptop disks aren't noted for their speed, and I don't have a particularly fancy one.) It doesn't appear to be the USB bus – the external hard disk is fine. The 1GB stick and the SD card are both pretty new, although admittedly cheap. I doubt that either of them is worse in quality than the majority of flash drives in the hands of the general public now, and I don't expect the average speed of what people actually own to increase radically between now and the Vista launch.

I know my tests don't accurately mimic how data will be accessed by Vista – but how is it so far out? I don't believe MS would have invested what must have been a substantial amount of resource into this feature without conducting rather more accurate benchmarks than my crude ones. I'm sure I'm missing something big, but what is it? And if flash can genuinely work so much faster than hard disks, why do flash cards perform so badly in simple file copying etc?

8 thoughts on “Vista and External Memory Devices”

  1. I am not an expert on this by any means, but I heard that the difference was in random seek time, where flash is much better than a hard disk (while they are about the same for sequential reads).

    I would also guess that the chunks you are testing with are too big.
    Try reading the same amount of data, but in 4KB, 16KB, 32KB and 256KB chunks, which is what I believe the paging system uses. That may be where the sweet spot is.

  2. Ah, that was it. I realised that seeking might have been important, but stupidly didn’t spot that when reading a large chunk of data, the seek could easily be the cheapest part, making it less significant as a differentiator.

    Changing the RandomReader to read 4K blocks made it about 15 times faster on the USB flash device than with a hard disk.

    However, the data still needs to get onto the flash device in the first place – and that’s going to take a while, by the looks of it. I guess that can be done in the background where the transfer time isn’t important, however.

    Good to see my blog ending up as a way of broadcasting my stupidity ;)

  3. Well, I also ran the programs on the RAID 0 SCSI disks in my Dell box, and on a Kingston 512MB USB 2.0 stick. I can tell you, nothing beat the SCSI rig. If I execute the tests 10 times (specifying 10 as the second parameter), the reads and random reads aren’t even noticeable (00:00:00 :P) on the SCSI hard disks, though they are on the USB stick’s seeks. The same holds with 4KB blocks.

    So it might work on laptops, but it’s not going to be as simple on desktop boxes with fast hard disks.

  4. So just to check Frans – the test I did which showed flash being fast was changing RandomReader to have only a 4K buffer instead of 1MB, then running it with 200 reads. Note that you’ve got to do this when the data isn’t cached, otherwise it really will be instantaneous.

    I’ll try it on my desktop at work…

  5. The easiest way to flush the disk cache is to just use Writer again to write another 1GB file – then use RandomReader on the first file you wrote.

    I tried the test on my desktop at work, and the seeking was much faster than at home – I think it was about the same speed as the flash drives. My guess is that this will be a feature for laptops much more than desktops. That makes sense in another way too – desktops tend to be relatively easy to upgrade in terms of memory, as there are usually more slots available.

  6. Also, to perform writes directly to the disk you can use FileOptions.WriteThrough.

    About seeking: IMO, since flash drives have no mechanical parts, seeking there must be relatively fast.

    Also, memory devices such as flash drives are slow at writing, but can be fast at reading.
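
The small-block change discussed in the comments above could be sketched roughly as follows: RandomReader with the buffer size taken as a parameter instead of being fixed at 1MB. This is not the exact code anyone in the thread ran. The class name, the Measure method and the optional third argument are hypothetical, and Stopwatch is swapped in for DateTime.Now purely as a timing convenience.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public class BlockRandomReader
{
    // Reads 'count' blocks of 'blockSize' bytes from random offsets in the
    // file, returning the elapsed time and reporting the bytes actually read.
    public static TimeSpan Measure(string path, int blockSize, int count, out long total)
    {
        byte[] buffer = new byte[blockSize];
        Random rng = new Random();
        total = 0;
        Stopwatch watch = Stopwatch.StartNew();
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Highest starting offset that still leaves room for a full block
            long maxOffset = Math.Max(0, stream.Length - blockSize);
            for (int i = 0; i < count; i++)
            {
                stream.Position = (long) (rng.NextDouble() * maxOffset);
                total += stream.Read(buffer, 0, buffer.Length);
            }
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Usage: BlockRandomReader <file> <reads> [block size in bytes]
    static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("Usage: BlockRandomReader <file> <reads> [block size in bytes]");
            return;
        }
        int reads = int.Parse(args[1]);
        // Assumption: default to 4K, the block size suggested in the comments
        int blockSize = args.Length > 2 ? int.Parse(args[2]) : 4096;
        long total;
        TimeSpan elapsed = Measure(args[0], blockSize, reads, out total);
        Console.WriteLine(elapsed);
        Console.WriteLine(total);
    }
}
```

As noted in the comments, the file must not already be in the disk cache when this runs, otherwise the reads will appear instantaneous whatever the device.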
