blog.cleverTitle = ???

Fork me on GitHub

Coursera: "Machine Learning" Review

Machine Learning is a Coursera class that aims to give students a basic understanding of the field.
What's interesting about it is that it's one of the three courses that started the whole MOOC movement.

You'll have the opportunity to learn and play around with a wide array of ML techniques that are still used in the real world today.

On the course page programming is listed as a prerequisite, but I'd say that's a bit off.
Sure, there are programming assignments, but you can get away with very basic knowledge: variable assignments, loops and if statements.

Now, what's not stated on the website are the math & statistics requirements, which actually threw me off quite a bit:
at the time of taking this course I had maybe a pre-calculus level of math, and a lot of concepts just whooshed past me.

I'm now in the middle of a Calculus One course and constantly get those "Oh, that's why X is like that" moments.
So I would say that in order to get the most out of this course, you need at least some calculus + basic statistics knowledge.

I took the October 2013 version, which might differ slightly from future offerings, but I doubt there will be any drastic changes.

Motivation

Well, Machine Learning is a field that I was passionate about even before learning how to program.

Something about the idea of creating an artificial mind that can, with minimal supervision, learn about the world on its own and potentially
even surpass its creators in intelligence some day gives me chills of excitement.

I wonder if that's how Skynet will be born?...

Passing this specific course was also a personal goal of mine: I actually took it back when it all began, but couldn't keep up with the
material and dropped out. So this class was a sort of benchmark for me: a way to know that my general knowledge and ability to manage time
had improved enough that I can do it now.

Goals

What I expected to gain from this class was first and foremost demystification of common terms of the field like NN, KNN, SVM and so on.
Ideally, I would like to gain enough knowledge to be able to apply it in my pet projects or maybe even everyday work.

Overview

This course has a very weird pace to it.

It starts off rather slow, with no programming assignment in the first week and a reasonably hard one in the second.
Then it gets really steep really fast, so by the end of week 4 I was ready to give up once again. But then suddenly after that
the difficulty drops significantly and never really goes back up.

So let me reiterate this for any future students: you just have to push hard through the first 3-4 weeks and then you're golden.

Homework was based on the Octave programming language, which I guess is to MATLAB what R is to S-Plus - a powerful open-source alternative.
But as I mentioned in the intro, it's not really so much programming as understanding the math & then translating it into code.

And actually this was kind of my biggest "Aha!" moment of the course - I realized that ML is not about computers, programming or even AI,
it's actually a framework for extracting a matrix of data by issuing a bunch of matrix-to-matrix operations.
I'm obviously greatly oversimplifying the complexity of the field, but this realisation really helped me understand how to think about it.

My favorite moment was at assignment #3, where the task was to implement an NN that recognized hand-written digits.
Even though my hand was held the entire time, seeing a program that I personally coded correctly identify numbers was probably one of the coolest moments in my programming journey.

What I liked

The content: it was easy enough to follow without any background, even though that came at the cost of missing out on in-depth understanding.
Oh, and Andrew Ng always showed real-life applications for every topic he covered. For me this eventually turned into a game:
guessing what techniques a major tech company might be applying here and there.

What I disliked

I think the difficulty curve could use some tweaking. The first few weeks were too hard, while the later ones were, if anything, too easy.

But a bigger problem I see is that the content is starting to show its age.
The course was recorded in 2009-ish, so newer trends in ML like deep learning aren't even mentioned.

Effort required

Here's the breakdown of effort I had to put into this course.*

Lectures**: 19 hours
Review: 5 hours
Homework: 20 hours
Total: 44 hours

Homework average was ~2.5 hours.
The longest was assignment #1, which took 4 hours, a full 20% of all homework time! The last few homeworks, by contrast, only took around 1.5 hours each.

*Listed are the actual hours I put into working through the course, so 1 hour literally means 60 minutes of work, not 20 min homework, 20 min Twitter, 20 min homework.

**To save time, I usually watched videos at 1.2-1.5x speed. This may or may not work for you; for me it was a necessity, and thankfully it worked.

Final verdict

Overall I think this course is a solid option for anybody looking to broaden their CS knowledge.
I think it was completely worth my time and the goals I set up were achieved.

I wish the content were updated with the latest trends in the industry, but that's not a show stopper by any means.

Final Score*: 7/10

*based on an arbitrary set of rules decided upon by running 1,000,000 Monte-Carlo simulations of rolling an uneven die.

Coursera: "FP Principles In Scala" Review

Functional Programming Principles in Scala is a Coursera course about... well... FP and Scala.
What's interesting is that it's taught by the creator of Scala himself, Martin Odersky.

It is aimed at programmers who are confident in at least one programming language and want to expand their views into the world of FP.
The website states that experience with Java/C# is preferred, but I successfully completed it with an almost purely PHP background.

Oh, and I actually took this back in September, but figured that a review would be no less relevant to people thinking about taking a future offering.

Motivation

I've actually been intrigued by both Functional Programming and Scala for a while now, but always found excuses not to look into them.
And what better way to find time (read: stop looking at red panda pictures) than to combine these goals and learn it from the guy who knows it best?

So after watching the intro video I quickly jumped into week 1 content and never looked back.

Goals

I think the way to find out whether the course was a success for me is to set measurable and realistic goals for the end result.
So what I expected to gain here was a basic understanding of what FP is about and the ability to write proper Scala code on my own.

Overview

Right off the bat, the course had a rough start for me - I was thrown right into a foreign language, foreign paradigm and foreign type system.
It felt like I had a good grasp of the videos, but the homework left me completely lost.

Especially the week 2 homework: it felt like bashing my head against the wall for longer than I care to admit.
It was probably the hardest of the whole course, but the satisfaction when it finally clicked was just as great.
What's funny is that looking at the task now, it seems obvious and something I could've handled in less than an hour, but alas.

After that period of acclimatization, the pace was very much steady and I had no further issues when following the content.
The lecture videos worked without any lag, homework grading server was up 24/7 and TAs were very active in the forums.

If I had to pick a favorite week, it would be #6, where we were taught elegant ways to "query" over a data set, much like LINQ does in C#.

Oh, and the wrap-up homework was very epic.
As a PHP guy, I had not been exposed to the concept of streams before, and that homework perfectly showcased their power.

What I liked

Aside from the content itself, I want to point out that I really enjoyed Martin's way of presenting it.

He left me with the impression of somebody who's very passionate about his project. That might not seem like much to Java/C#/C++ guys,
but for me, seeing the creator of the project still very much "in love" with it gives me reassurance that Scala has a very bright future ahead of it.

What I disliked

There's basically only one big quarrel I have with this course and that's week 1's homework.
It was hard, it felt kind of irrelevant to what was taught in the lectures and it would've made me struggle even in PHP.

And to be perfectly honest, this course had more "Scala" and less "FP" in it.
I wish Martin had explained some of the theory behind various FP features, or maybe even compared how they're implemented across languages.
Then again, that would require even more time investment in an already comprehensive course, so it's understandable that it's not there.

Effort required

Here's the breakdown of effort I had to put into this course.*

Lectures**: 16 hours
Review: 6 hours
Homework: 35 hours
Total: 57 hours

Homework average was 5 hours.
The longest was the week #4 one, which took 9 hours - roughly 25% of all homework time! And the last homework took only 4 hours, despite being quite epic.

*Listed are the actual hours I put into working through the course, so 1 hour literally means 60 minutes of work, not 20 min homework, 20 min Twitter, 20 min homework.

**To save time, I usually watched videos at 1.2-1.5x speed. This may or may not work for you; for me it was a necessity, and thankfully it worked.

Final verdict

Overall I think this course is one of the best out there in terms of both quality and content. I think it was completely worth my time and the goals I set going into it were most definitely achieved.

I haven't had a chance to really dive into Scala coding yet, but here's a glimpse of what you could expect to be able to do as "alumni".

Final Score*: 8/10

*based on an arbitrary set of rules decided upon by running 1,000,000 Monte-Carlo simulations of rolling an uneven die.

Rethinking developers blog setup

The idea

Picture this: you're reading an awesome blog post, but the author has made a small typo or factual error. What do you do?

Allow me to answer that. You either:

  • (Most likely) Ignore it, because it doesn't seem important enough to warrant a comment
  • Write a weird comment that goes something like this: "there's a typo in the 5th word of the 3rd sentence of the 4th paragraph"

How many times have you thought that you might have contributed useful information, but it was waaay too much of a hassle?
How many times have you scrolled through a popular post's comments section full of "Hey, you have a typo here and this is all wrong"?

For me, the count is over nin... a lot.
And it always baffled me how we can be so great at collaborating on our code and yet so ineffective when it comes to sharing knowledge.

At some point I caught myself thinking "Meh, wish I could just open a PR against this blog post" and then it hit me: why not?

And then the more I thought about it the more it made sense.
We're already using Git & GitHub to manage our code, so why not use them to manage & collaborate on our knowledge?

The concept

And thus the concept for my new blog was born:

  • It should be fast
  • It should use only the most necessary code
  • It should be possible to store content on something like GitHub
  • It should be Markdown & GFM powered

Additionally, I decided that it should be PHP-based (funnily enough, not because I'm a PHP dev; I actually started coding it in Ruby).
The reason is the same as why WordPress & Joomla are still so popular - it's dead simple to get it up & running.
And I'm sure that's very arguable, but hey - I'm developing a blog system for myself here, so get off my back, imaginary reader!

Note: Jekyll kinda does this already, but not quite. My biggest quarrel with it is that it adds a lot of clutter to the content repo.
To the point where the repo feels like a config dump for the website, rather than an invitation for contributions.

The proof of concept

And so I present to you the PoC for my idea in the form of this blog:

  • It is very simple (~200 LOC)
  • It is powered by Silex + Ciconia
  • It relies heavily on HTTP cache
  • It fully embraces Markdown + GFM
  • All of its content is up on GitHub
  • Its content repo is compact and easy to navigate

But wait, there's more! Only after implementing it did I realize that I got 3 additional features for free:

  • Content is versionable out of the box
  • I get a very well-polished Markdown editor on GitHub
  • Users can consume the content directly on GitHub if they wish

I'm actually quite happy with how it worked out. However, it's not polished enough to be open sourced yet.
And the reason is simple: I want to get some feedback first before I sink too much time into it.

So what do you guys think? Do you like the idea? Would you use a polished implementation? Do you have any suggestions?

PHP 5.3 Zend Certified Engineer

PHP Zend Certification (ZCE) is something that I've wanted to achieve at various points in my career, for various reasons.
And yesterday I finally did it, even though it's probably just a small achievement now, rather than a major career turning point.

Why did I go for it? How did I prepare for it? Was it hard? Am I a PHP Guru now?

The why:

As a junior developer, I viewed ZCE as a benchmark. I believed that by passing it I could be considered a top-tier PHP developer.
Right now I realize that simply knowing one programming language well is not even remotely close to what being a very good developer or problem solver really is.

You must know a lot more than just function argument order and language syntax specifics.
That said, ZCE is still the best way for a PHP developer to see how they compare to others.
So while being a ZCE doesn't (and shouldn't) show that you're a PHP guru, not being able to pass it should raise some flags.

The how:

No idea why, but Zend decided not to offer practice tests for PHP 5.3 and I didn't have $1000 to spare for training, so I had to resort to what I could find on the internet.

For studying:

A lot of people seem to suggest the PHP Manual, but I found it detrimental. PHP has a lot of weird and situational functions, and while knowing them is fun, they only get in the way when you're trying to memorize as much as possible.
What helped me a lot were blog posts by other examinees.

For testing yourself:

  • phpriot quizzes are really good, although they decided to stop the series halfway for some reason.
  • php|architect's Zend PHP 5 Certification Study Guide - while it's meant for the PHP 5 version of the exam, a lot of topics still overlap
    and all the questions are explained in detail. In fact, they overlap so much that one question at the exam looked exactly like one in the book.
  • Zend Study Guide - it covers all the topics you will see at the exam itself. Read it thoroughly.
  • Zend Certification Demo - this is a presentation with a lot of good demo questions, very similar to the ones you'll see in the actual exam.

The afterthought:

The exam was hard, but reasonable. Most questions actually tested how well you understand a given topic, although there were a couple that literally only tested how much attention you pay.
The time given was OK, maybe even too generous. I had enough time to answer all questions, review the flagged ones, then review all 70 once more and still had 10 minutes to spare.

What I really hope fades away in future versions is questions that test how well you remember function argument order.
There's absolutely no reason for a developer to keep that in memory when we have very good IDEs for it.

So, am I a PHP Guru now?

Hell no.
I'd say I'm at 20-25% of where some of the really talented developers are.

Still, passing ZCE was a fun experience and I learned quite a few things about PHP along the way, so overall it was worth it.

Automatically gzipping assets with Capifony

One of the many easy ways to speed up your website, and one that surprisingly few people use, is serving gzipped versions of your CSS/JS assets.
In fact, you can usually decrease their size by about 80%!
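
If you want to check that figure against your own assets before touching the deployment, a quick measurement is enough. This is only a minimal sketch, assuming a POSIX shell with gzip installed; the function name gzip_savings and the example path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Capifony): print how many percent
# smaller a file becomes after gzip. The file itself is not modified.
gzip_savings() {
  # The arithmetic expansion strips any padding that wc may print.
  orig=$(( $(wc -c < "$1") ))
  packed=$(( $(gzip -c "$1" | wc -c) ))
  # Guard against empty files to avoid dividing by zero.
  if [ "$orig" -eq 0 ]; then
    echo 0
    return
  fi
  echo $(( ($orig - $packed) * 100 / $orig ))
}

# Example (hypothetical path): gzip_savings web/css/main.css
```

Highly repetitive CSS will score better than already-minified JS, so expect the number to vary from file to file.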

One problem, though, is that it's tedious to run a gzipper each time you deploy.
Fortunately for us, it's easy to automate with our friendly deployment tool - Capifony.

Ensure that you have the asset dump option enabled in Capifony:

# app/config/deploy.rb

set  :dump_assetic_assets, true

Add a custom deployment recipe:

# app/config/deploy.rb

namespace :deploy do
  desc "gzip all assets"
  task :gzip_assets do
    pretty_print "--> Compressing assets with gzip"
    run "cd #{release_path} && for file in web/js/*.js; do gzip -c \"$file\" > \"$file.gz\"; done"
    run "cd #{release_path} && for file in web/css/*.css; do gzip -c \"$file\" > \"$file.gz\"; done"
    puts_ok
  end
end

Add an after hook:

# app/config/deploy.rb

after  "symfony:assetic:dump", "deploy:gzip_assets"

And you're done! If all is well then your deployment process should look like this:

....
--> Dumping all assets to the filesystem...............v
--> Compressing assets with gzip.......................v
--> Successfully deployed!

P.S. I'm actually not sure if the above is the optimal way to gzip asset files, so if you know a better one - please do share it with me.
Keep in mind that you need to gzip files individually and keep the original copy intact.
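
One possible answer to that question, offered as a sketch rather than a verified best practice: let find collect the files and gzip each one next to its original. Unlike the two loops in the recipe, this also descends into subdirectories, which may or may not be what you want. The function name gzip_assets is made up for illustration; the directories are whatever you pass in.

```shell
#!/bin/sh
# Hypothetical helper: gzip every .js and .css file under the given
# directories, writing <file>.gz next to each original and leaving the
# original untouched (gzip -c writes to stdout instead of replacing it).
gzip_assets() {
  find "$@" -type f \( -name '*.js' -o -name '*.css' \) \
    -exec sh -c 'for f in "$@"; do gzip -9 -c "$f" > "$f.gz"; done' sh {} +
}

# Example, run from the release directory:
# gzip_assets web/js web/css
```

Because find only hands over files that actually exist, this also avoids the case where a glob matches nothing and the loop runs once on the literal pattern.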

Two-liner for backing up & emailing database

This is a really nifty and simple script I've written for myself, and I figured I'd share it.

#!/bin/bash

mysqldump <db_name> | gzip > /path/to/backups/<db_name>_`date +'%H-%M_%d-%m-%Y'`.sql.gz

echo "<Project name> database backup" | mutt -s "[Backup] `date +'%H:%M %d-%m-%Y'`" <email@email.com> -a /path/to/backups/$(ls /path/to/backups/ -tr | tail -n 1)

Let's break it down:

mysqldump <db_name> | gzip > /path/to/backups/<db_name>_`date +'%H-%M_%d-%m-%Y'`.sql.gz

This part is pretty straightforward, I think.
Dump database, gzip the results, save it with timestamp.

echo "<Project name> database backup" | mutt -s "[Backup] `date +'%H:%M %d-%m-%Y'`" <email@email.com>

This is the email part. I'm using mutt instead of mail so that I can attach a file (see the next part).
I've prefixed the subject with [Backup] so that I can automatically filter these emails and mark them as read, to avoid getting spammed.

$(ls /path/to/backups/ -tr | tail -n 1)

This is a little trick to make sure that we always email the latest file: ls -tr lists files sorted by modification time with the oldest first, so tail -n 1 returns the newest one.
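
If relying on ls to rediscover the file feels fragile, one alternative is to build the file name once and reuse it for both steps. The following is only a sketch under assumptions: the function names, their arguments and the example values are placeholders, not part of the script above. Recent mutt versions document attachments as -a file -- recipient, so that ordering is used here; check man mutt for the version you run.

```shell
#!/bin/bash
# Sketch only: backup_path and backup_and_mail are hypothetical names.
# pipefail makes a failed mysqldump fail the whole dump pipeline,
# instead of being masked by gzip's exit status.
set -o pipefail

# Build the timestamped backup path once, so the dump and the email
# are guaranteed to refer to the same file.
backup_path() {
  local db_name=$1 backup_dir=$2
  printf '%s/%s_%s.sql.gz\n' "$backup_dir" "$db_name" "$(date +'%H-%M_%d-%m-%Y')"
}

# Dump and compress the database, then email exactly that file.
backup_and_mail() {
  local db_name=$1 backup_dir=$2 recipient=$3 file
  file=$(backup_path "$db_name" "$backup_dir")
  mysqldump "$db_name" | gzip > "$file" || return 1
  echo "$db_name database backup" |
    mutt -s "[Backup] $(date +'%H:%M %d-%m-%Y')" -a "$file" -- "$recipient"
}

# Example call (commented out so the sketch has no side effects):
# backup_and_mail my_db /path/to/backups me@example.com
```

The trade-off is a few more lines in exchange for not depending on directory listing order, and for skipping the email when the dump itself fails.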
