10 essential skills: (2) learn a programming language
- Pamela Kinga Gill
- Jan 13, 2019
- 6 min read
Updated: Jan 13, 2019
Following yesterday's post "10 essential skills: (1) applied statistics, matrix theory, and mathematical reasoning," I'll borrow some descriptive statistics for today's topic: programming.
First, why is programming an essential skill for big data? And which programming language(s) best suit your big data career goals? Let's begin at a high level.

Part 1: What is programming? A simplified description.
Programming can be used to do almost anything, which makes it an incredibly valuable skill to have. "Programs" are essentially code: a series of written instructions that tell a computer exactly how to perform the tasks the programmer wants executed. At first, I imagined the relationship between the programmer, code, and computer as a mega-calculator solving complex mathematical equations. That picture is still essentially accurate, but it overlooks the incredible applications of today's programs, which also control robots and create virtual reality! Code can look like many lines of letters, characters, and numbers that usually have a certain "nature" to them - this depends on the programming language being used.
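To make "a series of written instructions" a little more concrete, here is a minimal sketch in Python that borrows the descriptive statistics mentioned at the top of this post. The data and the function name summarize are made up for illustration; they are not from any real dataset or library.

```python
from statistics import mean, median

# Hypothetical data: minutes spent practicing code each day for a week.
minutes_practiced = [30, 45, 0, 60, 20, 90, 35]


def summarize(values):
    """Return the count, mean, and median of a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
    }


# Each line above is an instruction; this line asks the computer to run them.
summary = summarize(minutes_practiced)
print(summary)
```

Every line is an instruction the computer follows in order: load some tools, store some numbers, define how to summarize them, then do it and print the result.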
To demonstrate how the act of programming can translate into a profession in the real world (I know how abstract "programming" must seem to a person who hasn't coded much of anything before), here is a great YouTube video by Physics Girl: "What do programmers actually do?" It's 10 minutes of entertaining interviews with software engineers at YouTube describing their definitions of programming. And if you don't like YouTube videos, I'll summarize the general consensus in one line:
programming is a creative process that is best described as (fun) problem solving.
* The second-best part of this video is Carrot Slice's comment on "What do programmers actually do?", which reads: "Actually we just angrily browse stackoverflow." (Just saying, it had the most up-votes...)
Nevertheless, if you're still interested in learning to code and incorporating programming into your professional development, here is what you'll need:
The programmer (you - yay, we still need humans!);
The programming language (a defined style/syntax/convention/terms being used to code instructions and execute tasks);
The computer (physical hardware that processes the programmer's commands locally, remotely, or in the cloud);
The proper environment (an application or interface that reads your language and tells the computer how and when to run your specified commands).
Part 2: Programming Languages
My story: I speak/write/read three languages but I dabble across many for programming. Just as I didn't exactly choose which languages to learn growing up - those decisions were determined by my environment - what I learned in programming was also mostly determined by my environment: my studies and professional obligations.
A short history: In university, I benefited from learning STATA, R, and GAUSS, and I also knew of other students working in MATLAB and SPSS at the time. We were mostly using these languages for statistical analysis and data visualization. I don't think any of us came out as programming experts, but it was certainly foundational in building the logic and framework behind programming.
After graduation, I learned to code in different software because it was part of the standard enterprise-backed toolset. Significantly, I was now working directly with databases rather than simple datasets. These programs/languages were SAS and SQL, both still highly relevant and very powerful today.
But that is already too many languages for me to develop deep proficiency in any single one. And believe it or not, depending on your work function, there are many more languages you could feasibly be expected to learn in your career.
While I promise I'm not trying to explode your brain, for the sake of visualization, here is a simple word cloud (that I did not create) with the names of a few of those languages:

P.S. I hope you noticed how terrible a visualization this is (for numerous reasons). But it is pretty, and it serves its purpose.
Recap: Learning a programming language is a powerful skill that allows you to tell a computer, with powerful processing capacity, exactly what to do and how to do it. The sky is the limit (literally?) in terms of what can be achieved with programs coded in a variety of languages.
Part 3: You made it this far, which programming language(s) should you learn?
That's a lot of pressure on me if you're actually expecting me to tell you, so please take all of this as words of guidance and curated insight. On the popularity of programming languages and the growth in demand for them, here is a projection for five languages. The following chart appears to be modelled on web traffic to Stack Overflow questions about each of these languages, across high-income nations. That traffic could serve as an indicator of popularity, and perhaps even as a proxy for the demand behind each language's use. The model is described by the author in this Stack Overflow article.

As you can see, and maybe even suspected, Python's ranking, as measured in this way, has exploded, and in such a short period of time! I'd treat this chart with some suspicion, because "hype" may well play a role in a model built on web traffic. That said, it does follow a trend similar to other analyses I've looked into in the past.
So here's another look at quantifying the demand for programming languages. This is a great article that shows the demand for programming languages as evidenced by job openings on Wall Street, New York. If you're motivated to pursue a language that will pay the big bucks, you'll like to see this:
I am not actually trying to prompt the reader to begin learning Python immediately. (Disclaimer: I am learning Python, and immediately.) Rather, I am giving you some insight into how programming languages are preferred relative to one another.
I'd prefer to discuss these languages in context:
Programming for Data Science
I don't have the data to support this (terrible of me, I know), but I can tell you that Python is part of a cluster of languages widely used in data science, which includes SQL, R, Unix Shell/AWK/Gawk, and, to a more limited extent, Java.
For example, if you look at any data science certification, online course offering, or university curriculum in data science, you'll likely observe that the material is taught in R or Python, or both. The popular data science website KDNuggets advocates that the data science practitioner be adept at coding in both languages. They make a case in point with Jupyter Notebook:
The name «Ju-Pyt-er» is derived from Julia, Python, and R which immediately tells you that these three languages are the focus, though today these online notebooks support something like 40 different languages - Source
Programming for Big Data
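In a later post I'll discuss big data tools. Big data tools include database ecosystems/frameworks for processing large amounts of data. An example of this is Apache Hadoop, an open-source framework maintained by the Apache Software Foundation (vendors such as IBM offer products built on it).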
As a big data professional, you'll want to know which language is going to enable you to operate the tools you'll need to do your best work. While Hadoop itself is written in Java, you do not have to write your Hadoop programs in Java. A data architect can use C++ or Python instead. This article is one perspective on the compatibility between Python and Hadoop.
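One common route for using Python with Hadoop is Hadoop Streaming, which lets any program that reads standard input and writes standard output act as the map or reduce step of a job. Below is a rough sketch of the classic word-count logic, run locally on two made-up lines of text rather than on a cluster. The function names map_lines and reduce_pairs are hypothetical, and in a real streaming job the mapper and reducer would typically be two separate scripts reading from sys.stdin.

```python
from itertools import groupby


def map_lines(lines):
    """Map step: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reduce_pairs(pairs):
    """Reduce step: add up the counts for each word.

    On a cluster, Hadoop sorts the mapper output by key before the
    reduce step; sorted() stands in for that here.
    """
    for word, group in groupby(sorted(pairs), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


# Hypothetical input, standing in for lines of a very large file.
sample_lines = ["big data needs big tools", "python reads data"]

counts = dict(reduce_pairs(map_lines(sample_lines)))
print(counts)
```

The appeal of this pattern is that the same two small functions describe the work whether the input is two lines or two billion; the framework takes care of splitting the data and sorting between the steps.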
Another consideration is in collecting data to load into your big data ecosystem. This can be done through REST APIs:
(If you want to see an example of how cool a REST API is, and have some explanation, here is a fantastic YouTube video.)
Fortunately, working with REST APIs in Python is very easy. Most REST API output is JSON (a text-based data format rather than a programming language), and a really cool thing about Python is that you can import its built-in json module to parse that output and use the data.
This doesn't mean you can't find other means to do what you need, but there are real constraints to what some programming languages and platforms will allow you to do.
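To show what parsing JSON in Python looks like, here is a minimal sketch using the standard json module. The response body is a hypothetical stand-in for what a REST API might return; the station code, field names, and values are invented for illustration, and no real API is being called.

```python
import json

# Hypothetical response body, standing in for text returned by a REST API.
response_body = """
{
  "station": "YYZ",
  "readings": [
    {"time": "2019-01-13T09:00", "temp_c": -4.5},
    {"time": "2019-01-13T10:00", "temp_c": -3.0},
    {"time": "2019-01-13T11:00", "temp_c": -1.5}
  ]
}
"""

# json.loads turns the JSON text into ordinary Python dicts and lists.
data = json.loads(response_body)

# From here on it is plain Python: pull out the values and use them.
temps = [reading["temp_c"] for reading in data["readings"]]
average_temp = sum(temps) / len(temps)

print(data["station"], average_temp)
```

Once the text is parsed, the data behaves like any other Python dictionary or list, which is what makes it straightforward to clean it up and load it into whatever big data tool you are using.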
In the context of big data and data science, Python is going to provide you with the means to get a lot of valuable work done. More generally, having a programming language in your tool kit is like speaking a language that most people can understand: it's going to be incredibly helpful and important in your career development. In this example, you're able to speak with computers!
Furthermore, as we engage in the era of the Internet of Things, big data is sure to permeate every industry in some form or other. Having the technical skill set (i.e. programming language) and ability to use the necessary tools to translate data into information/knowledge/insight/advantage will prove invaluable. In time, it may even become the norm.
The point is, you'll definitely need a programming language as a skill to be gainfully employed in the big data landscape, and this is why it is listed as #2 of the essential skills for the big data professional!
If you're looking to develop your programming skillset, there are great courses online, at universities, colleges, and even immersive bootcamps!
Happy coding!