May 11, 2015

Lambdas in Spark: now you're talking!

In a previous article, I tried to figure out what the hell lambdas could be good for in Java. Short version: nice, but not compelling (mostly because of my silly example).

But lo and behold, ye of little faith. The Good (?) Lord of Functional Programming has inspired another much better example in me.

Take a look at this simple Spark program. Nothing fancy, just building key/value pairs from a text file:

In the pre-lambda world, you would build key/value pairs like this:
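The original listing isn't reproduced here, so what follows is only a rough sketch of the shape such code takes, not the actual program. To keep it runnable without Spark on the classpath, the nested `PairFunction` interface and the `mapToPair` helper are hypothetical stand-ins for Spark's `org.apache.spark.api.java.function.PairFunction` and `JavaRDD.mapToPair`, `Map.Entry` stands in for `scala.Tuple2`, and the "key,number" line format is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class PreLambdaPairs {
    // Hypothetical stand-in for Spark's PairFunction<T, K, V> (assumption, not the real type).
    interface PairFunction<T, K, V> {
        Map.Entry<K, V> call(T input);
    }

    // Hypothetical stand-in for JavaRDD.mapToPair: applies the function to every line.
    static <K, V> List<Map.Entry<K, V>> mapToPair(List<String> lines,
                                                  PairFunction<String, K, V> f) {
        List<Map.Entry<K, V>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(f.call(line));
        }
        return pairs;
    }

    static List<Map.Entry<String, Integer>> buildPairs(List<String> lines) {
        // Pre-Java 8 style: a whole anonymous inner class for a two-line transformation.
        return mapToPair(lines, new PairFunction<String, String, Integer>() {
            @Override
            public Map.Entry<String, Integer> call(String line) {
                // Assumed input format: "key,number"
                String[] fields = line.split(",");
                return new SimpleEntry<>(fields[0], Integer.parseInt(fields[1].trim()));
            }
        });
    }

    public static void main(String[] args) {
        // prints [apple=3, pear=5]
        System.out.println(buildPairs(Arrays.asList("apple,3", "pear,5")));
    }
}
```

Two lines of real work, wrapped in a class declaration, three type parameters and an `@Override`.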
"WTF?!" I hear you say, and yes, I'd have to agree. This is pretty hard to read and definitely not Java at its finest.

Here's the lambda version:
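Again, the original listing isn't reproduced here, so this is just a minimal sketch of what a lambda-based version could look like. The nested `PairFunction` interface and the `mapToPair` helper are hypothetical stand-ins for Spark's own `PairFunction` and `JavaRDD.mapToPair` so that the code runs without Spark, `Map.Entry` stands in for `scala.Tuple2`, and the "key,number" line format is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class LambdaPairs {
    // Hypothetical stand-in for Spark's PairFunction<T, K, V>.
    // A single abstract method is what lets a lambda be passed in its place.
    @FunctionalInterface
    interface PairFunction<T, K, V> {
        Map.Entry<K, V> call(T input);
    }

    // Hypothetical stand-in for JavaRDD.mapToPair: applies the function to every line.
    static <K, V> List<Map.Entry<K, V>> mapToPair(List<String> lines,
                                                  PairFunction<String, K, V> f) {
        List<Map.Entry<K, V>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(f.call(line));
        }
        return pairs;
    }

    static List<Map.Entry<String, Integer>> buildPairs(List<String> lines) {
        // Java 8 style: the transformation itself, without the class ceremony.
        return mapToPair(lines, line -> {
            // Assumed input format: "key,number"
            String[] fields = line.split(",");
            return new SimpleEntry<>(fields[0], Integer.parseInt(fields[1].trim()));
        });
    }

    public static void main(String[] args) {
        // prints [apple=3, pear=5]
        System.out.println(buildPairs(Arrays.asList("apple,3", "pear,5")));
    }
}
```

The compiler infers the key and value types from the method's return type, so the three type parameters, the `call` signature and the `@Override` all disappear from the call site.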

Do you see the light now? Oh c'mon, just a little bit? ;)

Till next time, keep codin'.

Reading list (May 2015)

Tech books: one of the great loves of my life. There isn't much I'd rather do than grab a book, lie on the couch and immerse myself in complex systems. Feeding my insatiable curiosity. Learning new skills. Grinding my mind on the whetstone once more. Me against me, my favorite battle :)

And so here's my current reading list, with some early impressions.

AWS System Administration, by Mike Ryan (O'Reilly, 2015, early release version). Possibly the first book of its kind. Yes, the AWS online documentation is very good, but the humongous amount of information is sometimes a little overwhelming. This book is a nice introduction to most AWS building blocks, with lots of real-life advice and tons of examples. A useful compass to navigate the AWS ocean.

Designing Data-Intensive Applications, by Martin Kleppmann (O'Reilly, 2015, early release version). Subtitled "the big ideas behind reliable, scalable and maintainable systems", this book covers all major concepts and techniques used to build data stores, both for OLTP and analytics: data models, storage and retrieval (yes, you will understand B-trees at last), encoding, replication, etc. Lots of illustrations, lots of examples from current technologies, lots of complex stuff explained in plain English. I like it very much so far.

Learning Spark, Matei Zaharia et al. (O'Reilly, 2015). A beginner book written by the creator of Spark (O'Reilly has another Spark book for advanced readers). This one delivers exactly what the title says and is another fine example of why O'Reilly books are the best: straight to the point and lots of examples (Python, Java, Scala). You'll be coding Spark jobs in no time. Some advanced topics are covered at the end of the book, including machine learning with MLlib.

Next on the pile:

User Story Mapping, Jeff Patton (O'Reilly, 2014) - Key Agile concept! Short version here.

Data Science From Scratch, Joel Grus (O'Reilly, 2015): "Anyone who has some amount of mathematical aptitude and some amount of programming skill has the necessary raw materials to do data science". Sounds pragmatic and bullshit-free :)

PS: anyone from O'Reilly reading this? If you feel so inclined, I'll gladly accept a t-shirt or something. Thank you.