Category: Performance

On cache problems, and what they mean for the future

This is a disk utilization graph on a heavily loaded Graphite box. In this case, a Dell with a MegaRAID, but that actually does not matter too much.

Go-carbon was lagging and buffering on the box because the SSD was running at its IOPS limit. At 18:10, the write-back cache and the “intelligent read-ahead” were disabled; that is, the MegaRAID was forcibly dumbed down to a regular non-smart controller. The effect is stunning.

/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp NORA -l0 -aALL
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp WT -l0 -aALL

and also, on top of that,

#Direct IO instead of cached
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp DIRECT -l0 -aALL
#Force SSD disk write cache (our SSD has super-capacitors, so it is safe to enable)
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp -EnDskCache -l0 -aALL

What we observe here is part of an ongoing pattern. We will see more of it, and at more layers of the persistence stack in our systems.

“Usage Patterns and the Economics of the Public Cloud”

The paper (PDF) is, in the words of Sascha Konietzko, “eine ausgesprochene Verbindung von Schlau und Dumm” (“a very special combination of smart and stupid”).

The site mcafee.cc is not related to the corporation of the same name; it is the site of one of the authors, R. Preston McAfee.

The paper looks at the utilization data from a number of public clouds, and tries to apply some dynamic price finding logic to it. The authors are surprised by the level of stability in cloud purchases and actual usage, and try to hypothesize why this is the case. They claim that a more dynamic price finding model might help to improve yield and utilization at the same time (but in the conclusion discover why in reality that has not happened).

The attack of the killer microseconds

In the Optane article I wrote about how persistent byte-addressable memory will change things, and how network latencies may be becoming a problem.

The ACM article Attack of the Killer Microseconds has another, more general take on the problem. It highlights how our machines are prepared to deal with very short delays in the nanosecond range, and how they are also prepared to deal with very long delays in the millisecond range. It’s the waits in between, the network latencies, sleep state wakeups and SSD access waits, that are too short to do something else and too long to busy-wait in a spinlock.
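To put rough numbers on that gap, here is a small sketch that compares the cycles burned by busy-waiting with the relative cost of switching away and back. The 3 GHz clock, the 5 µs context-switch cost and the sample wait durations are illustrative assumptions of mine, not figures taken from the ACM article.

```python
# Rough arithmetic for the "killer microseconds" gap.
# All constants are illustrative assumptions, not measurements.

CLOCK_HZ = 3_000_000_000        # assumed 3 GHz core
CONTEXT_SWITCH_NS = 5_000       # assumed cost of switching away and back

# Hypothetical wait durations in nanoseconds.
WAITS_NS = {
    "DRAM access": 100,
    "fast SSD read": 10_000,
    "datacenter round trip": 50_000,
    "disk seek": 10_000_000,
}


def cycles_spinning(wait_ns: int) -> int:
    """Cycles a core burns if it busy-waits for the whole duration."""
    return wait_ns * CLOCK_HZ // 1_000_000_000


def switch_overhead(wait_ns: int) -> float:
    """Assumed context-switch cost as a fraction of the wait itself."""
    return CONTEXT_SWITCH_NS / wait_ns


def report() -> None:
    """Print both costs for every sample wait."""
    for name, wait_ns in WAITS_NS.items():
        print(f"{name:22s} spin={cycles_spinning(wait_ns):>11,d} cycles  "
              f"switch overhead={switch_overhead(wait_ns):8.2%}")


if __name__ == "__main__":
    report()
```

Under these assumptions, spinning through a 100 ns wait costs 300 cycles and is clearly cheaper than switching, while a 10 ms wait makes the switch cost negligible. The 10 µs case is the awkward one: spinning burns 30,000 cycles, and switching adds 50% on top of the wait.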

Gaming Laptops – your recommendations?

The current vacation is hard on me, because I hardly get to use my own computer: the best wife of all and the Schnuppel both compete for time on my machine in order to play Transport Fever and Cities: Skylines. That’s an annoyance not only because I can’t get the keyboard, but also because a MacBook Pro apparently sucks as a gaming machine.

So this website lists a bunch of relatively recent laptops with proper graphics cards, and household peace seems to require a premade machine and a transportable device (not a desktop device).

What would be your recommendation (see above, and maybe Elite Dangerous and No Man’s Sky), and why?

BFQ is coming…

LWN reports that the 4.11 merge window has opened. Among other things, Maik Zumstrull reminds us, we get:

The multiqueue block layer finally has support for I/O scheduling. That is useful in its own right, but the real news is that it enables the merging of the long-awaited BFQ I/O scheduler. That, says block maintainer Jens Axboe, “should be ready for 4.12”.

Of course, if you are on an LTS release of a Linux kernel, it’s unlikely that you will profit from this any time soon.

OMG, our cybervaccines are failing

Dark Reading is scared: all new malware is “zero-day”, for an interesting and wrong definition of zero-day, because that makes the article sound much more impressive.

The actual definition of a zero-day is a previously unknown exploit that is being used by some party to compromise a machine. In the article, the term is used differently, meaning a file that is known malware, but has changed itself so that its checksum is not in the currently distributed signature catalogs of known malware.

That is of course neither correct, nor new.
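To make the checksum point concrete, here is a minimal sketch of a purely digest-based catalog lookup. The sample payload, the catalog and the `mutate` helper are all hypothetical stand-ins; real scanners use more than plain file hashes.

```python
import hashlib

# Hypothetical stand-in for a malware sample; real payloads are binaries.
KNOWN_SAMPLE = b"pretend this is a known malicious payload"

# Hypothetical signature catalog: a set of SHA-256 digests of known samples.
SIGNATURE_CATALOG = {hashlib.sha256(KNOWN_SAMPLE).hexdigest()}


def is_flagged(payload: bytes) -> bool:
    """Return True if the payload's digest is in the catalog."""
    return hashlib.sha256(payload).hexdigest() in SIGNATURE_CATALOG


def mutate(payload: bytes) -> bytes:
    """Append one padding byte, standing in for any self-modification
    that changes the file contents."""
    return payload + b"\x00"


if __name__ == "__main__":
    print(is_flagged(KNOWN_SAMPLE))          # True
    print(is_flagged(mutate(KNOWN_SAMPLE)))  # False
```

A single changed byte yields a different digest, so the lookup misses the file even though nothing about it is previously unknown.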

Load, Load Testing and Benchmarking

(In order to be able to give up the test blog at blogspot.nl, I am moving content over)

So you have a new system and want to know what the load limits are. For that you want to run a benchmark.

Basic Benchmarking

The main plan looks like this:

The basic idea: Find a box, offer load, see what happens, learn.

You grab a box and find a method to generate load. Eventually the box will be fully loaded and you will notice this somehow.
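As a toy illustration of "offer load, see what happens", the sketch below ramps offered load against a simulated box and reports the last load level it kept up with. The `simulated_box` model, its capacity of 1000 requests per second, the step size and the 95% tolerance are all assumptions made up for this example; a real benchmark would replace the simulation with an actual load generator and measurement.

```python
# Toy load ramp against a simulated box. The capacity, step size and the
# saturation rule are assumptions for illustration only.

def simulated_box(offered_rps: int, capacity_rps: int = 1000) -> int:
    """Hypothetical system under test: serves at most capacity_rps."""
    return min(offered_rps, capacity_rps)


def find_saturation(box, start=100, step=100, max_rps=5000, tolerance=0.95):
    """Raise the offered load stepwise and stop once the box serves less
    than tolerance * offered load. Returns the last load it kept up with,
    or None if even the first step was too much."""
    last_ok = None
    for offered in range(start, max_rps + 1, step):
        served = box(offered)
        if served < tolerance * offered:
            return last_ok
        last_ok = offered
    return last_ok


if __name__ == "__main__":
    print(find_saturation(simulated_box))  # 1000
```

The "notice this somehow" part is the tolerance check: once served load falls visibly behind offered load, the ramp stops.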

Hipsterdoom with Mongobingo

Felix Gessert does a postmortem of the failed Parse startup and product: “The AWS and MongoDB Infrastructure of Parse: Lessons Learned”.

Technical problem II: the real problem and bottleneck was not the API servers but almost always the shared MongoDB database cluster.

And that was with MongoRocks (Mongo on RocksDB) and replacing the initial app in Ruby with a Go implementation of said thing, with WriteConcern = 1, and other horrible presets. All in all, this is like the perfect nightmare of startup architecture decisions.

Felix closes pointing at his current project:

If this idea sounds interesting to you, have a look at Baqend. It is a high-performance BaaS that focuses on web performance through transparent caching and scalability through auto-sharding and polyglot persistence.

Bingo. Also, found the Hipster.
