Picture yourself sitting at a desk, faced with a looming deadline and data that needs to be put into nice, pretty charts to explain some trend that your boss thinks is critical. The data you have are in an odd format from some client, and you need to get that information into a spreadsheet. Or how about keeping track of your research results? Labs produce lots of data: would you keep it in a spreadsheet and hope that the same file format is still supported in the future, or would you build a database of some sort? I see these questions far too often in the workplace. I’m continually amazed when I see people waste entire days cutting and pasting information from one format to another, a task that could be done in minutes with a scripting language. Many labs keep their information in formats that don’t survive. Continuity is lost, or, even worse, companies spend thousands of dollars paying a contractor to migrate their data; somebody with minimal coding experience could have handled many of these cases.

So where do you go to learn basic coding for “data-wrangling” or “data-manipulation”? There are quite a few resources; I’ll Google them for you:

Sure, you can pick up a book, and there are tons of them. Any of the Camel books are usually decent and work well for self-study. I recommend this path to anybody who has to do data entry with spreadsheets or databases. Work smarter, not harder: use the computer to multiply your efforts.
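As an illustration of the kind of cut-and-paste job a short script can replace, here is a minimal Python sketch. It assumes a hypothetical client export that separates fields with semicolons and uses a comma as the decimal mark; the column names and sample rows are invented for the example. It uses only the standard `csv` module to rewrite the data as ordinary comma-separated values that a spreadsheet can open.

```python
import csv
import io

# Hypothetical client export: semicolon-separated fields, comma as the decimal mark.
CLIENT_DATA = """date;sales;region
2023-01-05;12,5;North
2023-01-06;7,25;South
"""


def convert(text: str) -> str:
    """Convert semicolon-delimited, comma-decimal text into standard CSV."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header row unchanged
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        date, sales, region = row
        # Swap the decimal comma for a period so spreadsheets parse it as a number.
        writer.writerow([date, float(sales.replace(",", ".")), region])
    return out.getvalue()


if __name__ == "__main__":
    print(convert(CLIENT_DATA), end="")
```

For a real file you would read the input with `open(path, newline="")` and write the result to a new `.csv` file; the conversion logic stays the same.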

Even simpler than coding is using the command line. If you have a Mac, you already have the Terminal available, and there are hundreds of commands that could make your life simpler. Combine them with some basic scripting and there is almost no limit to the ways you can rearrange data.

Some useful commands:

  • head: output the first few lines or bytes of a file
  • tail: output the last few lines or bytes of a file
  • sed: stream editor that modifies text in a file or stream according to patterns
  • tr: translate or delete characters in a stream
  • pbpaste/pbcopy: paste the clipboard to standard output, or copy standard input to the clipboard, so clipboard contents can be piped through other processes like sed (OS X only, although there are Linux equivalents)
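Inside a script, the same operations take only a few lines of Python. The helpers below are hypothetical, rough analogues of `head`, `tail`, `tr`, `tr -d`, and `sed 's/.../.../g'` built on the standard library. The real commands have many more options, and Python’s regular-expression syntax differs from sed’s.

```python
import re
from collections import deque
from itertools import islice
from typing import Iterable, List


def head(lines: Iterable[str], n: int = 10) -> List[str]:
    """Roughly `head -n N`: keep the first n lines."""
    return list(islice(lines, n))


def tail(lines: Iterable[str], n: int = 10) -> List[str]:
    """Roughly `tail -n N`: keep the last n lines (a bounded deque drops older ones)."""
    return list(deque(lines, maxlen=n))


def tr(text: str, src: str, dst: str) -> str:
    """Roughly `tr SRC DST`: map each character of src to the one at the same position in dst.

    Unlike tr, str.maketrans requires src and dst to be the same length.
    """
    return text.translate(str.maketrans(src, dst))


def tr_delete(text: str, chars: str) -> str:
    """Roughly `tr -d CHARS`: delete every character listed in chars."""
    return text.translate(str.maketrans("", "", chars))


def sed_sub(lines: Iterable[str], pattern: str, repl: str) -> List[str]:
    """Roughly `sed 's/PATTERN/REPL/g'`, using Python regex syntax rather than sed's."""
    regex = re.compile(pattern)
    return [regex.sub(repl, line) for line in lines]


if __name__ == "__main__":
    data = ["alpha 1", "beta 2", "gamma 3", "delta 4"]
    print(head(data, 2))                         # ['alpha 1', 'beta 2']
    print(tail(data, 1))                         # ['delta 4']
    print(tr_delete("(555) 123-4567", "()- "))   # 5551234567
    print(sed_sub(data, r"\d", "#"))             # ['alpha #', 'beta #', 'gamma #', 'delta #']
```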

Last but not least, for OS X and Linux users, pipes (the | character, which feeds the output of one command into the input of the next) are critical. Rather than write a tutorial myself, I’ll point to this one, which seems fairly comprehensive: link.
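A pipe sends the standard output of one command to the standard input of the next, so a command such as `grep ERROR log.txt | head -n 3` processes lines as they stream through instead of building intermediate files. The same streaming idea can be sketched inside a Python script with generators. This is an analogy rather than an operating-system pipe, and the stage functions and sample log lines are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def cat_lines(lines: Iterable[str]) -> Iterator[str]:
    """Source stage, like `cat`: yield each line without its trailing newline."""
    for line in lines:
        yield line.rstrip("\n")


def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Filter stage, like `grep PATTERN` but with a plain substring match, not a regex."""
    for line in lines:
        if pattern in line:
            yield line


def head_lines(n: int, lines: Iterable[str]) -> Iterator[str]:
    """Truncating stage, like `head -n N`: stops pulling from upstream after n lines."""
    return islice(lines, n)


if __name__ == "__main__":
    # Hypothetical log contents.
    log = ["INFO start\n", "ERROR disk full\n", "INFO retry\n", "ERROR timeout\n", "ERROR crash\n"]
    # Similar in spirit to: cat log.txt | grep ERROR | head -n 2
    for line in head_lines(2, grep_lines("ERROR", cat_lines(log))):
        print(line)
```

Because each stage pulls one line at a time, `head_lines` stops the whole chain once it has enough lines. Iterating over an open file also yields its lines, so a file object from `open("log.txt")` (a placeholder name) can be passed to `cat_lines` directly.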

Is coding the new keyboarding, home economics, or wood shop? Not that any of these classes should be replaced; however, coding is a new type of literacy, just like writing essays. Coding is writing with a purpose, in a foreign language. Computer code must be concise, expressive, and readable. Coding gives people the ability to interact with the wider world. Anybody can learn to code; no computer science (CS) background is necessary for entry (and no, CS is not coding). Sure, you might not write the most performant code knowing only Perl or Python, but in many cases you won’t notice, because it is still much faster than the cut and paste it replaces (and modern processors are far faster than most people really need). Unfortunately, like everything else in life, the time when you could make huge sums of money with basic coding skills has passed (was there ever such a time?). Simply teaching coding will not automatically lift swaths of our economy out of poverty. But perhaps it will lower the barrier to entry for careers that require interaction with technology (don’t they all these days?).

Basic coding is critical for living in our modern world, and the skills necessary to succeed are all around us. Computers can be purchased for $25 (e.g., the Raspberry Pi), and second-hand monitors are equally cheap on eBay. There is no reason not to have a small, cheap computer for children to learn on, and the same is true for college students. Sure, it might not be the flashiest computer, but it will let people who otherwise could not do so step into the digital age. The instructional materials needed to learn coding and data-wrangling are just as easy to find. All we have to do is find the time and patience to learn (and perhaps the $50 or so to set up a computer at home). From my perspective, it’s time well spent.