20170521

rPi 3 vs Parallella benchmark

Been knee deep in C# lately, so updates have been rather sparse. Well, nonexistent really. Anyhow - got the desire to have a little fun with my Pi 3 and another favourite of mine, Go.

Basic premise: see how much faster a few goroutines would make a simple program. Copy-paste from the reddit thread I made for this project:

---

First, I do not have a Parallella - so I'm using the numbers from this talk:

Parallella Demonstration

The gist of it is, it has 2 ARM cores and 16 custom RISC cores. To test performance, they run a program written in C, first in serial (on one ARM CPU core) and then in parallel (on all 16 RISC cores), both times finding the prime numbers between 0 and 16 million.
Results are:
  • Serial: ~4min
  • Parallel: ~18sec
So I decided to do the same on the Pi, though using Go rather than C, as it is easier to spin up multiple threads (goroutines) in that language. Go is not quite as efficient as C, so some performance is 'lost in translation' as it were. But the difference should be minimal.
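To give a rough idea of the approach, here is a minimal sketch of counting primes serially and then across several goroutines. It is not the actual source from the github repo: the trial-division test, the function names (isPrime, countSerial, countParallel) and the way the range is split into contiguous chunks are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// isPrime reports whether n is prime, using plain trial division.
// (Assumption: the original program may use a different test.)
func isPrime(n int) bool {
	if n < 2 {
		return false
	}
	if n%2 == 0 {
		return n == 2
	}
	for d := 3; d*d <= n; d += 2 {
		if n%d == 0 {
			return false
		}
	}
	return true
}

// countSerial counts the primes in [0, limit) on a single goroutine.
func countSerial(limit int) int {
	count := 0
	for n := 0; n < limit; n++ {
		if isPrime(n) {
			count++
		}
	}
	return count
}

// countParallel splits [0, limit) into one contiguous chunk per worker
// and counts the primes in each chunk on its own goroutine.
func countParallel(limit, workers int) int {
	counts := make([]int, workers) // one slot per worker, so no locking is needed
	chunk := (limit + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			lo, hi := w*chunk, (w+1)*chunk
			if hi > limit {
				hi = limit
			}
			local := 0
			for n := lo; n < hi; n++ {
				if isPrime(n) {
					local++
				}
			}
			counts[w] = local
		}(w)
	}
	wg.Wait()
	total := 0
	for _, c := range counts {
		total += c
	}
	return total
}

func main() {
	// The benchmark in the post uses 16 million; a smaller limit keeps this quick.
	const limit = 1000000
	fmt.Println("serial:  ", countSerial(limit))
	fmt.Println("parallel:", countParallel(limit, runtime.NumCPU()))
}
```

With contiguous chunks the last worker gets the largest numbers and therefore the most work, so the chunks do not finish at the same time. Using more chunks than cores is one way to even that out.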

Also a shoutout to /u/siritinga, who helped me get it working as intended and as efficiently as possible.

Full source (and a native ARM rPi binary if you don't want to compile it yourself) on github

Final result on the Raspberry Pi 3 was:
  • Serial: ~2min19sec
  • Parallel: ~58sec
Not bad! The Pi 3 is much faster in single thread (roughly 1.7 times) but, not surprisingly, slower in parallel, taking about 3 times as long with 1/4 of the cores (4 vs 16) compared to the Parallella :)
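The ratios behind that comparison can be worked out with a few lines of Go. This is just arithmetic on the approximate timings quoted above, converted to seconds (~4min as 240, ~2min19sec as 139); the speedup helper is a made-up name for illustration.

```go
package main

import "fmt"

// speedup returns how many times longer slowSec is than fastSec.
func speedup(slowSec, fastSec float64) float64 {
	return slowSec / fastSec
}

func main() {
	// Approximate timings from the post, in seconds.
	const (
		parallellaSerial   = 240.0 // ~4min
		parallellaParallel = 18.0
		piSerial           = 139.0 // ~2min19sec
		piParallel         = 58.0
	)

	fmt.Printf("Parallella, serial -> 16 cores: %.1fx\n", speedup(parallellaSerial, parallellaParallel))
	fmt.Printf("Pi 3, serial -> 4 cores:        %.1fx\n", speedup(piSerial, piParallel))
	fmt.Printf("Pi 3 vs Parallella, serial:     %.1fx faster\n", speedup(parallellaSerial, piSerial))
	fmt.Printf("Pi 3 vs Parallella, parallel:   %.1fx slower\n", speedup(piParallel, parallellaParallel))
}
```

So the Parallella gains roughly 13x from its 16 cores, while the Pi 3 gains roughly 2.4x from its 4.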

edit: Upping the number of goroutines to 16 brought the result down to 53 seconds - shaving 5 seconds, or about 9%, off the time with only 4 goroutines.

Will look into actual optimizations rather than just throwing more threads at the problem...