My Experience With The Johns Hopkins Data Science Specialization

  • Posted by bsimms
  • On May 19, 2016

No doubt Alberta’s economy had seen better days, but when my contract with an oil service company was not renewed last year, it was something I couldn’t help but take personally. In reality there was nothing I could have done to change the outcome, and I was an easy cull, having had only a single year of experience in a revenue-sucking portion of the business. Comically, I should have realized my days were numbered when my cube-mates began referring to me as “Bad News Brett”!

So there I was, unemployed for the first time in my life at 35 years old. Given my formal training, the easiest solution would have been to immediately return to the lab bench, but that would have meant sacrificing my long-term goal of leaving academia and bringing evidence-based practice with me. What to do? I felt I needed to add to my strength in data analysis.

Enter The Data Science Specialization by Coursera & The Johns Hopkins Bloomberg School of Public Health. Data science was a passing interest of mine, and I had often wondered if there was any substance to the buzzwords “business intelligence”, “big data”, “machine learning”, or “predictive analytics”. My pessimism about these words was grounded in nearly a decade of experience working with “small data” as a scientist, and I was curious as to whether there were some magic beans I had yet to taste.

Below I will describe my thoughts on the specialization, but before that I should mention that this entire specialization is based on writing code in R (an open source programming language for statistical computing). I had no prior coding experience with any computer language, so this specialization does come with a steep learning curve for those in similar shoes.

After ten months invested completing the specialization, here are my thoughts:

What I liked most about The Data Science Specialization

  • Learning a skill that is immediately applicable:
    • ‘R’ is a very useful language: it’s like writing macros for Excel but with broader application
    • The decisiveness of code: it either ‘works’ or ‘doesn’t work’, a pleasant contrast to the uncertainty of most data analyses, which often leave you wondering if something ‘worked’
    • Imagining the exciting applications of R in your everyday life, or in your area of expertise
  • The generosity of the people who write open source code:
    • Stack Overflow is full of advice on how to get your code to work in R — without it I would have been screwed
    • When open source code writers aren’t on Stack Overflow helping newbies like me, they are updating their programs (called packages) to work better next time I use them
    • In a profit-based world I can’t help but be impressed by open source communities; my website was also made possible by another open source community, WordPress
  • Learning to work with non-structured data:
    • Not all data comes neatly packaged in a spreadsheet. Take, for example, a million lines of text from Twitter & The New York Times, the basis for this Word Prediction App

What I didn’t like about The Data Science Specialization

  • The moments of extreme frustration that come from tackling a new field completely alone
    • Yes, there was Stack Overflow and the course forum, but that’s it… it’s not like I could actually talk to my instructors
  • Statistics, especially Bayes’ Theorem
    • A necessary evil of data science

Other remarks

  • Massive Open Online Courses (MOOCs) are here to stay:
    • I predict that for some subjects, like programming, MOOCs will eventually replace traditional post-secondary education
    • The whole Data Science Specialization (10 courses) cost about $700 CAD. Dare I ask what the equivalent cost would be at a college or university?
  • This specialization is a significant time commitment:
    • The recommended 4-8 hours per week applies if you have prior coding experience; double or triple that commitment if you do not have an IT background

 

To conclude, I am very happy with my decision to commit ten months to this specialization.

It successfully opened my eyes to new possibilities and helped paint a clearer picture of some of the common data science buzzwords. This specialization also reinforced my prior experience with “small data”: there is no magical machine learning algorithm that can substitute for clean data, and no fancy code sequence that can make up for personal experience or expertise. In other words, data analysis is still challenging, statistics are still a necessary evil, and translation of data into actionable information is still a major hurdle in our society.

I think the instructors at the Johns Hopkins Bloomberg School of Public Health have put together a solid set of courses, and that the online infrastructure provided by Coursera is excellent. I would recommend this adventure to anyone who finds themselves analyzing data regularly and would like to add to their analytical toolkit. This specialization will open your eyes to novel applications at work or school and, with continued learning, perhaps even set a foundation to develop your own data science business.

 
