Often I get emails from people who are curious about my work. Sometimes they are journalists who want to do more data analysis, sometimes they are data geeks who want to do more journalism. Here are some common things I get asked.


How did you learn how to code?

I studied journalism for my undergrad and masters, so obviously I have never been formally trained in code. I did study programming in high school, and that stuck. When I landed at Columbia University in NYC for a degree in journalism, I decided to take a few courses in data analysis and web development. In one class I got to learn R, in another I was introduced to HTML/CSS and JavaScript.

It was there that I was given the building blocks for both computer science and journalistic storytelling. To demonstrate how little code I knew, here is a project I made at journalism school.

After that came almost a year of very little sleep and a lot of projects, as I taught myself data and code while interning at news outlets. At one internship I made a very simple interactive showing diversity in American newsrooms; at another I developed and designed the newspaper's homicides tracker, which it continued to update for several years after my stint. After a full year of dabbling with projects here and there, I published my first D3.js project.


How should a newbie start doing data journalism?

Find a project and teach yourself everything you need in order to reach that end goal. Take small steps.

For reporters, I suggest starting with Excel. You don’t have to visualise straightaway. Even if Excel only lets you add a new insight, a sentence or a paragraph that would otherwise not have been in the story, that is a good start!

After you hit the wall with point-and-click tools, start learning how to code. Again, same logic: pick a project and learn the small bits you need for that project. I find this project-driven approach much more effective than continuously doing tutorials without applying the knowledge in a field relevant to you.


Is it important to learn to code?

Absolutely not. Lots of amazing data journalists and graphics journalists I know do not code at all.

Code literacy definitely helps, though, because if you know what is possible, you can get help from other people who have those skills. There is a LOT you can do with Excel when it comes to analysis. Similarly, there is a lot you can do within Illustrator and GUI tools for visualisation.


What tools do you use?

The list is huge, and it is mostly a matter of convenience and personal preference rather than the efficacy of any one tool. I can’t tell you which tool you should use, but here is a list of the ones I use often.

A lot of JavaScript (particularly node.js), across all categories! But in addition to that…

  • Scraping: node.js, GitHub Actions
  • Mapping/Geospatial Analysis: QGIS, D3.js, GDAL, mapshaper, turf.js
  • Data cleaning: OpenRefine, node.js, R
  • Data analysis: node.js, R, Excel, pandas
  • Data visualisation: Datawrapper, HTML/CSS, Canvas, D3.js, ggplot2, Adobe Illustrator, ai2html
  • Video: Adobe After Effects, Adobe Premiere Pro


Should I learn Python or JavaScript?

Simple answer
If your goal is analysis-centric: Python. If your goal is interactive visualisation for the web: JavaScript (not Java! Nerd joke: ham is to hamster as Java is to JavaScript).

Detailed answer
Most commonly, Python is a back-end language. Back-end, or server-side, tasks are those that you don’t see on a webpage. If I have a bunch of code that checks the COVID numbers from the CDC and then tweets out where things are going up or down, that is a back-end task. If all you want to do is some heavy data analysis with some static charts for exploring the data, Python is a better fit than JavaScript. It will also help if you want to do work around language and automation. The buzzwords that you often hear, machine learning and artificial intelligence, fall in this domain too. For most applications in journalistic reporting, these terms don’t retain their true value and what sits behind them is essentially a long string of if-else statements. I don’t use Python a lot personally.
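
To make the "going up or down" back-end task concrete, here is a minimal Python sketch of the if-else logic such a script might contain. It is an illustration only, not the code behind any real project: the function names, the 5% tolerance and the example numbers are all assumptions, and fetching the figures and posting the tweet are left out.

```python
def describe_trend(previous, current, tolerance=0.05):
    """Label the change between two counts as 'up', 'down' or 'flat'.

    The 5% tolerance is an arbitrary, assumed threshold.
    """
    if previous == 0:
        # Avoid dividing by zero: any cases at all count as a rise.
        return "up" if current > 0 else "flat"
    change = (current - previous) / previous
    if change > tolerance:
        return "up"
    elif change < -tolerance:
        return "down"
    else:
        return "flat"


def summarise(counts_by_place):
    """Turn {place: (previous, current)} into one sentence per place."""
    lines = []
    for place, (previous, current) in sorted(counts_by_place.items()):
        lines.append(f"{place}: cases are {describe_trend(previous, current)}")
    return lines


# Placeholder numbers for illustration, not real data.
example = {"Place A": (100, 130), "Place B": (80, 60), "Place C": (50, 51)}
for line in summarise(example):
    print(line)
```

In a real script, the dictionary would be filled from a published data source and each sentence would be handed to whatever posts the update.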

JavaScript, or js for short, is both a front-end and a back-end programming language. If you want to make a customised chart where things fly and transform as you scroll, or a globe where things rotate and update on certain interactions, you need js. Interactivity on the web depends on different applications of js. Those are the front-end applications. Since I am not good at Python, I use js for back-end tasks like scraping and analysis as well.


I find D3.js overwhelming, is there some place else that one can start?

Yes! In case you don’t know what the term means, D3.js is a JavaScript library. A library is like adding superpowers to existing powers. JavaScript has powers, and adding D3.js to JavaScript gives it superpowers that help us create data visualisations. To use D3, you first need to know some JavaScript and understand some use cases for journalism.

When I started, I first built some interactives purely with underscore (helps in manipulating data on the page) and jQuery (helps in manipulating design elements on the page). When I got super comfortable with that, I started making things in D3. If you find D3 overwhelming, don’t be disappointed. It does have a steep learning curve.

Examples of things I have built with just jQuery and underscore:


Where do you find data stories?

In your curiosity! So much of what I do is driven by questions. I was watching Anupama Chopra’s interview with some female singers, and there was this anecdote about how every other song is now sung by Arijit Singh, while that was not the case earlier. In the past, Lata Mangeshkar and Asha Bhosle dominated these albums. My curiosity drove me to see if this could be proven quantitatively. And that is what drove me to do this piece.

So many stories are like that. You read something, you watch something, you live something in your daily life, and you are faced with a question. Data stories are just like regular stories. You see a question and try to answer it with data. Sometimes the data to answer that question exists, sometimes it doesn’t.

If there is something you’d like to ask, feel free to get in touch at gurmanbh@gmail.com or @gurmanbhatia!