
Category: Computer Science

Google Next 2017, Amsterdam Edition

On June 21, the 2017 edition of the “Google NEXT” conference took place in the Kromhouthal in Amsterdam. Google had a dedicated ferry running to ship people over to the north side of the IJ, delivering them directly at the Kromhouthal.

The event was well booked: about 1400 people showed up (3500 invites had been sent). That is somewhat over the capacity of the Kromhouthal, actually, and it showed in the execution in several places (toilets, catering, and room capacity during the keynotes).

The keynotes were the expected self-celebration, but if you subtract that, they contained mostly useful content about the future of K8s, about Google's Big Data offerings, and about ML applications and how they work together with Big Data.

For the two talk slots before lunch, I attended K8s talks. After lunch, I switched to the Big Data track. I did not attend any ML stuff, and I missed the last talk about Spanner because I got sucked into a longer private conversation.

On cache problems, and what they mean for the future

This is a disk utilization graph on a heavily loaded Graphite box. In this case, a Dell with a MegaRAID, but that actually does not matter too much.

Go-carbon was lagging and buffering on the box because the SSD was running at its IOPS limit. At 18:10, the write-back cache and the “intelligent read-ahead” were disabled; that is, the MegaRAID was force-dumbed down to a regular, non-smart controller. The effect is stunning:

# Disable the “intelligent” read-ahead (NORA = No Read Ahead)
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp NORA -l0 -aALL
# Switch from write-back to write-through (WT), disabling the write-back cache
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp WT -l0 -aALL

and also, on top of that,

# Direct IO instead of cached IO
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp DIRECT -l0 -aALL
# Force-enable the SSD's own write cache (our SSD has super-capacitors, so it is safe to enable)
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp -EnDskCache -l0 -aALL
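
To check what the controller is actually doing after these changes, the corresponding read commands should work (same MegaCli64 binary; exact property names can vary by MegaCLI version, so consult its documentation):

# Show the current cache policy of logical drive 0
/opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp Cache -l0 -aALL
# Show the physical disk write cache setting
/opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp DskCache -l0 -aALL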

What we observe here is part of an ongoing pattern: flash has become fast enough that the “smart” caching layers in front of it add latency instead of hiding it. We will see more of this, and at more layers of the persistence stack in our systems.


Scaleway now with 2FA

Cloud provider Scaleway now has ARM64-based bare metal servers in Amsterdam. They are also now offering 2FA authentication based on Google Authenticator (or other compatible 2FA apps).

No U2F token support yet, though (but still a better solution than Steam's).

This blog is hosted on a Scaleway instance.


“Usage Patterns and the Economics of the Public Cloud”

The paper (PDF) is, to put it in the words of Sascha Konietzko, “eine ausgesprochene Verbindung von Schlau und Dumm” (“a very special combination of smart and stupid”).

The site mcafee.cc is not related to the corporation of the same name, but the site of one of the authors, R. Preston McAfee.

The paper looks at utilization data from a number of public clouds and tries to apply some dynamic price finding logic to it. The authors are surprised by the level of stability in cloud purchases and actual usage, and hypothesize why this is the case. They claim that a more dynamic price finding model might improve yield and utilization at the same time (but in the conclusion discover why, in reality, that has not happened).


Google: “Federated learning”, Apple: “Differential privacy”

Google is using a strategy called “Federated Learning” to keep privacy-sensitive data that is used for ML purposes private. They basically download a preliminary model to the phone, improve it locally with the behavior observed on the device, and upload only the resulting model diffs back to Google Cloud, where they are merged into the existing model.

Apple uses “Differential Privacy”, where noise is added to the data on the device, so that a privacy-sensitive value observed in the cloud for any single user may or may not actually be true, but the individual noise contributions even out statistically over the whole data set.
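
The classic textbook illustration of this idea is randomized response; here is a minimal bash sketch (my illustration, not Apple's actual mechanism):

# Randomized response: answer a yes/no question (truth: 0 or 1) so that
# any single answer is deniable, while the aggregate still estimates
# the true rate, because the coin flips even out statistically.
truth=1
if (( RANDOM % 2 )); then
  report=$truth                 # heads: report the truth
else
  report=$(( RANDOM % 2 ))      # tails: report a random answer
fi
echo "$report"

If p is the observed fraction of “1” reports over many users, the true rate can be estimated as 2p − 0.5, without any individual report being trustworthy on its own.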

Meanwhile, #neuland talks about Datenkraken (“data octopuses”) and does… nothing?


Shit found on github: crashos

Infinite Fun in Infinite Combinations:

CrashOS is a tool dedicated to the research of vulnerabilities in hypervisors by creating unusual system configurations. CrashOS is a minimalist operating system which aims to lead to hypervisor crashes, hence its name. You can launch existing tests or implement your own and observe the hypervisor's behaviour towards this unusual kernel.

I think you might want to talk to your hoster first.


Shit found on github: bork

I> Good Morning!

M> And to you, too!

I> I hope you have a nice and technically reliable start into a new week full of hope!
Kris> Here, catch. https://github.com/mattly/bork

M> »A bash DSL for config management« And it’s not even Christmas, yet.

I> “the Swedish Chef Puppet of Config Management. […] Bork is written against Bash 3.2 and common unix utilities such as sed, awk and grep. It is designed to work on any UNIX-based system and maintain awareness of platform differences between BSD and GPL versions of unix utilities.”
Kris> Reinventing configure.in with root privilege!

M> »When you’re happy with your config script, you can compile it to a standalone script which does not require bork to run. The compiled script can be passed around«
M> Like Scabies.


You are not Google (use UNPHAT)

You are not Google: Ozan Onay suggests applying UNPHAT:

Next time you find yourself Googling some cool new technology to (re)build your architecture around, I urge you to stop and follow UNPHAT instead:

  1. Don’t even start considering solutions until you Understand the problem. Your goal should be to “solve” the problem mostly within the problem domain, not the solution domain.

  2. eNumerate multiple candidate solutions. Don’t just start prodding at your favorite!

  3. Consider a candidate solution, then read the Paper if there is one.

  4. Determine the Historical context in which the candidate solution was designed or developed.

  5. Weigh Advantages against disadvantages. Determine what was de-prioritized to achieve what was prioritized.

  6. Think! Soberly and humbly ponder how well this solution fits your problem. What fact would need to be different for you to change your mind? For instance, how much smaller would the data need to be before you’d elect not to use Hadoop?

 


A case for IP v6

So when companies talk about IP V6, it is very often at the scope of “terminating V6 at the border firewall/load balancer and then leading it as V4 into the internal network”. The problems that arise there are most often tracking problems (»Our internal statistics can’t handle V6 addresses in Via: headers from the proxy«).

But when you do containers, the need for V6 is much more urgent and internal. It turns out that Docker port twiddling is exactly the nuisance that it looks like, and networkers strongly urge you to surgically remove all traces of native Docker networking bullshit and go all-in on IP-per-container. Mostly because that’s what IPs are for: routing packets, determining their destination and stuff. Networkers have ASICs and protocols that are purpose-built for this.
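
For illustration, the port twiddling in question versus what IP-per-container buys you (the container IP below is a made-up example):

# Port twiddling: the container hides behind the host's IP, and its
# port 80 has to be remapped to some free host port, here 8080.
docker run -d -p 8080:80 nginx
curl http://$HOST_IP:8080/

# IP-per-container: the container gets a routable address of its own,
# and port 80 is simply port 80.
curl http://10.32.4.17:80/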

Now, let us assume you have a modern 40- or 56-core machine that you are running stuff on in your Kubernetes cluster. That means you will easily run at least 30 and up to 100 pods per machine. In a moderately sized cluster with some 100 nodes, you get to use 100 × 100 = 10,000 IPs to handle that. And because IP space is not handed out in sets of one, but in the form of subnets per node, you will need more than 10k addresses. Expect to consume a /17 or /16 to handle this.
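
The arithmetic, spelled out (assuming the common Kubernetes default of a /24 subnet per node):

nodes=100
per_node=256                    # a /24 per node = 256 addresses
echo $(( nodes * per_node ))    # 25600 addresses in total
# The smallest power-of-two block covering 25600 addresses is a /17
# (32768 addresses); at around 200 nodes you are in /16 territory.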

Even if you are digging into 10/8 for internal addressing here, this is going to be a problem: it is unlikely that you will be able to use all of 10/8, because non-cluster things exist in your environment, too, and you will likely have more than one cluster.

With V6, all of this becomes a complete non-issue: a single /64 already holds more addresses than the entire V4 internet. What remains is the minor issue of getting V6 running on the inside of your organisation.


Good Riddance to the Query Cache

MySQL 8.0 will be retiring support for the Query Cache.

The MySQL query cache is a result cache: the server records every result set that is small enough to keep in the cache, together with a hash of the query string that produced it. If a query meets certain requirements and the hash of the same query string is ever seen again, the query is not actually parsed and executed; instead, the stored result set is replayed.

There are mechanisms in place that prevent uncacheable queries from being cached in the first place, and that prune outdated data from the query cache.
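
On a pre-8.0 server you can still watch this machinery at work (these variables and counters are gone in 8.0):

# Query cache configuration: size, type, per-result limit
mysql -e "SHOW GLOBAL VARIABLES LIKE 'query_cache%'"
# Effectiveness: hits, inserts, and prunes of outdated entries
mysql -e "SHOW GLOBAL STATUS LIKE 'Qcache%'"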

The query cache exists in the first place, because it was easier to create than to teach every PHP CMS developer in the world about sessions. So instead of retrieving the current background color of the current theme over and over from the database, the query cache recognizes the current theme color query again and just replays “green” over and over.

But that was then.
