Category Archives: Random stuff

Python Machine Learning – a really practical machine learning book

I am currently reading Python Machine Learning as I wanted to know more about scikit-learn, amongst other things. It’s a very practical guide with just enough theory to make sense of it all. A lot of machine learning books dive pretty deep into the theory, which is great if that’s what you want. On the other hand, if the idea is to get something working fast, this book seems like a great place to start. It’s always easier to delve into theory when its relevance is clear, and there’s nothing like actually writing and running code to get a feel for relevance.

One core == one thread

Back in the dark ages, when CPUs only had one, two or maybe four cores, the idea of dedicating an entire core to a single thread was ridiculous. Then it became apparent that the only way to scale CPU performance was to integrate more cores onto a single CPU chip. Now there are monsters like the Xeon E5-2699 v4 with 22 physical cores – not to mention Knights Landing with 72! People started wondering how to use all these cores in a meaningful way without getting bogged down in delays from cache coherency, locks and other synchronization issues.

Turns out the answer may well be to hard-allocate threads to cores – just one thread locked into each core. This means that almost all of an application can be free of kernel interaction. This is how DPDK gets its speed for example. It uses user space polling to minimize latency and maximize performance.
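As a rough illustration of locking one thread into each core, here is a minimal Python sketch. It assumes Linux, where `os.sched_setaffinity` with a pid of 0 applies to the calling thread. The names `pinned_worker` and `run_one_thread_per_core` are made up for this example, and it only shows the affinity step: DPDK itself does this in C, and Python threads share the interpreter lock, so this is not a route to DPDK-style performance.

```python
import os
import threading


def pinned_worker(core, results):
    # Assumption: Linux. With pid 0 the affinity applies to the calling thread.
    os.sched_setaffinity(0, {core})
    # A real poll-mode worker would now spin on its input queue forever.
    # This sketch only records where the thread is allowed to run.
    results[core] = os.sched_getaffinity(0)


def run_one_thread_per_core():
    # Only use the cores this process is actually allowed to run on.
    cores = sorted(os.sched_getaffinity(0))
    results = {}
    threads = [
        threading.Thread(target=pinned_worker, args=(core, results))
        for core in cores
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for core, mask in sorted(run_one_thread_per_core().items()):
        print(f"thread for core {core} may run on {sorted(mask)}")
```

Each worker ends up with an affinity mask containing exactly one core, which is the "one core == one thread" arrangement described above.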

I have been running some tests using one thread per core with DPDK and lock-free shared memory links. So far, on my old i7-2700K dev machine (with another machine generating test data over a 40Gbps link), I have been seeing over 16Gbps of throughput through DPDK into the shared memory link using a single core without even trying to optimize the code. It’s kind of weird seeing certain cores holding at 100% continuously, even if they are doing nothing, but this is the new reality.
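The code behind those tests isn't shown here, but the general idea of a lock-free link between one producer thread and one consumer thread can be sketched as a single-producer/single-consumer ring buffer. The class below (`SpscRing`, a name invented for this example) is only an illustration of the indexing logic: the producer is the only writer of `_head` and the consumer is the only writer of `_tail`, so neither side needs a lock. A C version over real shared memory would also need atomic loads and stores with the right memory ordering, which Python hides.

```python
class SpscRing:
    """Single-producer/single-consumer ring buffer (illustrative sketch).

    One slot is always left empty so that "full" and "empty" can be
    told apart without a shared counter.
    """

    def __init__(self, capacity):
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to write; only the producer changes it
        self._tail = 0  # next slot to read; only the consumer changes it

    def try_push(self, item):
        """Producer side: return False instead of blocking when full."""
        next_head = (self._head + 1) % len(self._slots)
        if next_head == self._tail:
            return False
        self._slots[self._head] = item
        self._head = next_head  # publish only after the slot is written
        return True

    def try_pop(self):
        """Consumer side: return (False, None) instead of blocking when empty."""
        if self._tail == self._head:
            return (False, None)
        item = self._slots[self._tail]
        self._slots[self._tail] = None
        self._tail = (self._tail + 1) % len(self._slots)
        return (True, item)
```

Because neither call ever blocks, each side just polls in a loop, which is why a core can sit at 100% even when there is nothing to do.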

An interesting question comes out of this though: is this the end of Java/Scala and the JVM for high performance applications? Can the JVM support this model of operation fully? I don’t know. Fashions change of course, even when it comes to programming styles. In the old days, every CPU cycle mattered. More recently, people were happy to waste cycles to get things like automatic garbage collection. Maybe now the tide is turning back to every CPU cycle mattering again. And they are some really powerful cycles now!


A new mini-server is born

Just built this sweet, small and very quiet water-cooled i7-6700K system with 16GB DDR4 and a 256GB NVMe SSD. Even though the case is a pretty small mini-ITX style, there’s plenty of room. There’d be even more if I had got a modular power supply to avoid the cable mass, most of which is unused and sitting on top of the radiator. It will eventually be used for GPU work and the case will take a decent size card when I get to that.

The rotating magnetic disk – about as useful as an RS-232 cable

OK, that’s a bit unfair, but I have resolved that I have purchased my last conventional hard disk. I was doing some clearing up and came across this disk graveyard. Old hard disks are a pain because, unless the internals are destroyed, you can never be too sure what’s on the disk and whether someone can access your super-sensitive personal information. Given enough time and energy it is possible to use adaptors to connect them to PCs and securely erase them, if they still work to that extent. In the end, though, it is much easier to throw them in a box and forget about them…until you accidentally find them when clearing up, of course.

SequenceSafe – making personalized medicine more personal

OK, that’s not a real device – just a mock-up of something that I think might be useful. SequenceSafe is intended to solve three problems:

  • Giving everyone access to their sequenced genome in a way that brings positive benefits to their healthcare.
  • Solving one of the biggest challenges of mass whole genome sequencing – where to put the data.
  • Ensuring that a person’s sequenced genome stays confidential.

The idea is pretty simple. An individual sends a sample for sequencing and the result is a SequenceSafe that holds the sequence data. The owner keeps this in a safe place and takes it along to doctor or pharmacy visits. The SequenceSafe never releases the whole sequence. Instead, the device acts as an oracle – health care professionals can ask it questions about aspects of the sequence and get responses. They connect to the device via Bluetooth or USB, and SequenceSafe only operates once the owner has authorized its use with a fingerprint.
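To make the oracle idea a little more concrete, here is a toy Python sketch. Everything in it is hypothetical: the class name `SequenceOracle`, the `has_base_at` query, and the byte-string token that stands in for the fingerprint check are all invented for illustration, since SequenceSafe is only a mock-up. The point is just the shape of the interface: callers can ask narrow questions, but there is no method that hands back the whole sequence.

```python
import hmac


class SequenceOracle:
    """Toy model of a device that answers questions about a sequence
    without ever returning the sequence itself (hypothetical design)."""

    def __init__(self, sequence, owner_token):
        self._sequence = sequence        # stays inside the device
        self._owner_token = owner_token  # stand-in for a fingerprint match
        self._unlocked = False

    def authorize(self, token):
        # Constant-time comparison; a real device would check a fingerprint.
        self._unlocked = hmac.compare_digest(token, self._owner_token)
        return self._unlocked

    def lock(self):
        self._unlocked = False

    def has_base_at(self, position, base):
        """Answer one yes/no question; refuse unless the owner has authorized."""
        if not self._unlocked:
            raise PermissionError("owner has not authorized this session")
        return self._sequence[position] == base


if __name__ == "__main__":
    oracle = SequenceOracle("ACGTAC", b"owner-fingerprint")
    oracle.authorize(b"owner-fingerprint")
    print(oracle.has_base_at(2, "G"))
```

Even a sketch this small hints at a design question: enough narrow yes/no queries could reconstruct the sequence, so a real device would presumably also need to restrict or rate-limit what can be asked.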
