
Category: Computer Science

Unlearning Descriptive Statistics

Anscombe’s Quartet by Schutz

Unlearning Descriptive Statistics explains many things you should know about working with numbers that your university statistics class probably did not explain properly.

If it had, maybe Graphite would not hurt so much, with all the averaging going on where it shouldn’t, and maybe Gil Tene would not have had to give talks like How NOT to Measure Latency (which is awesome, by the way; if you haven’t seen this talk, watch it right now).

From the Intro of Unlearning:

If you’ve ever used an arithmetic mean, a Pearson correlation or a standard deviation to describe a dataset, I’m writing this for you. Better numbers exist to summarize location, association and spread: numbers that are easier to interpret and that don’t act up with wonky data and outliers.

Statistics professors tend to gloss over basic descriptive statistics because they want to spend as much time as possible on margins of error and t-tests and regression. Fair enough, but the result is that it’s easier to find a machine learning expert than someone who can talk about numbers. Forget what you think you know about descriptives and let me give you a whirlwind tour of the real stuff.
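To make the point about outliers concrete, here is a minimal sketch that is not taken from the article. It compares the mean and standard deviation with two robust alternatives, the median and the median absolute deviation (MAD), on a made-up list of latencies. The data and the helper names `mad` and `summarize` are assumptions for illustration only.

```python
import statistics


def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    values = list(values)
    center = statistics.median(values)
    return statistics.median(abs(x - center) for x in values)


def summarize(values):
    """Return classic and robust summaries of location and spread."""
    values = list(values)
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "mad": mad(values),
    }


# Hypothetical request latencies in milliseconds, with one extreme outlier.
latencies_ms = [10, 11, 12, 12, 13, 14, 15, 2000]
summary = summarize(latencies_ms)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name:>6}: {value:.3f}")
```

With this data the single 2000 ms request drags the mean to 260.875 ms, a value that none of the requests came close to, while the median stays at 12.5 ms and the MAD at 1.5 ms.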

Go, read the rest.


Mandatory Widevine (Browser Video DRM) in Chrome

Changes are coming to Chrome. Not all of them are good.

For example, the ability to view the details of a TLS certificate in Chrome has been moved far away into a hard-to-reach Developer menu.

Most Chrome plugins have been disabled and removed, and the chrome://plugins page will go away very soon (Chrome 57 and later). The remaining plugins can no longer be disabled (Bug report). This will also silently re-enable disabled plugins.

One of them is the Widevine video DRM plugin, and that is widely seen as very problematic, for security and legal reasons.


git Improvements for Monorepos

Microsoft has been doing things to git, they report.

[W]e […] have a handful of teams with repos of unusual size! For example, the Windows codebase has over 3.5 million files and is over 270 GB in size. The Git client was never designed to work with repos with that many files or that much content. You can see that in action when you run “git checkout” and it takes up to 3 hours, or even a simple “git status” takes almost 10 minutes to run. That’s assuming you can get past the “git clone”, which takes 12+ hours.

What Microsoft is doing here is called a monorepo approach. It is not insane, has many advantages, is discussed at length by Dan Luu, and is in use at Facebook, Google and many other places. But git runs into problems handling very large monorepos, as discussed in an article at Atlassian.

What Microsoft GVFS does, according to their paper, is address the issues git has instead of working around them. And that is an awesome thing.


It’s not an APT, it’s just you sucking at basic IT

Dr. Ian Levy

So El Reg has spoken to Dr. Ian Levy, the chief technical director of GCHQ. And Levy goes:

“If you call it an advanced persistent threat, you end up with a narrative that basically says ‘you lot are too stupid to understand this and only I can possibly help you – buy my magic amulet and you’ll be fine.’ It’s medieval witchcraft, it’s genuinely medieval witchcraft.”

and continues

He pointed out that a UK telco had recently been taken offline using a SQL injection flaw that was older than the hacker alleged to have used it. That’s not advanced by any stretch of the imagination, he said.

So there you have it. It’s not an APT. It’s you sucking at running an IT organisation.


Damian Gryski on Go Slices and CPU Caches

Booking.com’s Damian Gryski on Go slices and CPU caches (17 minutes, in English)

The dot Post: »Modern computers have multiple layers of caches between the processor and main memory. Algorithms which effectively use these caches can be orders of magnitude faster than those that don’t. Damian looks at how using slices can make your inner loops more cache friendly.«
