29 July 2021

The Part Of Computing That Nobody Ever Talks About

Whenever people talk about computers, whether it's CPUs, GPUs, RAM speeds, HDDs, SSDs, or networking, they will always put each of those components through its paces with some kind of test, benchmark, or suite of benchmarks.

At the component level, that's usually useful and good enough: it tells people how one part compares to another so that you can make informed decisions about your current and/or future purchases, and that's fine and dandy and all.

Today, I'm going to be talking about something in the computing and IT industry/world (especially the part aimed at the general public) that is pretty much NEVER talked about, as far as I can tell.
 
In the computing world, people will talk about how much data a CPU or a GPU is able to process, whether it's virtual machines, hyperscalers, databases, HPC/CAE applications, machine learning, etc.
 
In the storage subsystem world of HDDs and SSDs (or PMEM), they'll talk about a drive's performance in terms of sustained transfer rates (STRs) or the number of input/output operations per second (IOps) that a drive can deliver.

In the networking world, they'll talk in terms of millions of messages sent/received per second, latency, or raw bandwidth.

But notice that NONE of these metrics ever tells you what it really means when you put it all together.

Take HPC/CAE for example. One simulation can generate terabytes (TBs) of data, if not more. A LOT more. As far as I can tell, NOBODY in the entire IT/computing industry talks about what you do with all of that data and/or how to manage that volume of data.

Moving the data around, organising it, making sure that it's consistent, especially when you have, say, upwards of 10 million files (I'm currently just shy of 7.5 million files; 7,499,865 to be exact as of the last scan, as of this writing) - nobody ever talks about, nor benchmarks, what it's like to manage that much data.
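
To give a concrete idea of what a benchmark for this might even look like, here's a minimal sketch that just walks a file system and times how long it takes to enumerate and stat every file. The root path is a made-up placeholder, not one of my actual mount points.

```python
#!/usr/bin/env python3
# Minimal sketch of a "metadata management" benchmark: walk the tree and time
# how long it takes just to enumerate and stat every file.
# ROOT is an illustrative placeholder; point it at the file system you want to measure.
import os
import time

ROOT = "/export/data"  # hypothetical mount point

start = time.monotonic()
file_count = 0
total_bytes = 0

for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        file_count += 1
        try:
            total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
        except OSError:
            pass  # file vanished or isn't readable; skip it

elapsed = time.monotonic() - start
print(f"{file_count:,} files, {total_bytes / 1e12:.2f} TB")
print(f"scan took {elapsed / 60:.1f} minutes "
      f"({file_count / max(elapsed, 1e-9):,.0f} files/second)")
```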

For example, let's say that I want to update the user and the group that owns all of the files on a given system. That process, on my systems, one of which is hosting 2.8 million files and the other close to 4 million files, can take anywhere from 40 minutes to an hour (each). And sometimes/usually, I will run those tasks overnight so that there are no other changes happening on the file system/server, and even then it still takes between 40 minutes and an hour (each) just to do that. So up to 2 hours, JUST to update the user and the group that owns the files. Nothing else.
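
Working backwards from those numbers, 4 million files in 40 minutes to an hour is only somewhere around 1,100 to 1,700 ownership changes per second. And conceptually, the task is about as simple as it gets - one ownership change per file. Here's a rough sketch of what it boils down to (the UID/GID and the root path are illustrative placeholders, not my actual setup):

```python
#!/usr/bin/env python3
# Rough sketch of the ownership update: one lchown() per directory entry.
# ROOT, NEW_UID, and NEW_GID are illustrative placeholders.
import os
import time

ROOT = "/export/data"          # hypothetical file system root
NEW_UID, NEW_GID = 1001, 1001  # hypothetical new owner/group

start = time.monotonic()
changed = 0

for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in dirnames + filenames:
        try:
            os.lchown(os.path.join(dirpath, name), NEW_UID, NEW_GID)
            changed += 1
        except OSError:
            pass  # permission denied or entry disappeared; skip it

elapsed = time.monotonic() - start
print(f"changed ownership on {changed:,} entries in {elapsed / 60:.1f} minutes")
print(f"roughly {changed / max(elapsed, 1e-9):,.0f} metadata operations per second")
```

Every one of those calls is a tiny metadata operation, which is exactly why none of the headline numbers (STR, IOps, bandwidth) tells you how long this will actually take on a given system.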

Why aren't people talking about the data processing speed of this kind of basic, simple task?

When people benchmark CPUs or drives, nobody bothers to test something like this.

Now you might argue that this isn't done very often, but that isn't the point.

The point is that we've made tremendous improvements to CPUs, GPUs, HDDs/SSDs, and networking, but because nobody tests them as a collective system, nobody is making the improvements that would actually make these operations run any faster.

And more broadly speaking, people will spend a LOT of time talking about, for example, how fast a CPU or GPU is, or how fast a drive is, or how fast networking is.

Nobody talks about how fast it is when you put it all together and you have to manage a (relatively large) volume of data.