Showing headlines posted by Bob_Mesibov

Character equivalence classes 2: the nature of equivalence

A POSIX equivalence class, written [=c=] inside a bracket expression, is a group of characters that your computer's locale treats as the same character when matching text. But what does "same" actually mean?

How to bookmark directories in the shell

I do shell work on files in various directories scattered around my file system. After googling around for solutions, I decided it would be easier to build my own bookmarker, one that was portable between computers and terminal programs. The result is described in this post, and I'm again very grateful to BASH wizard Chris Dunlop of onthe.net.au for his advice on how to improve the code.
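The post's own bookmarker isn't reproduced here. To give a rough idea of how a portable, file-based bookmarker can work, here is a minimal sketch. The function names bm, bml and bmcd and the bookmark file ~/.bookmarks are all hypothetical, and it assumes directory names don't contain newlines:

```bash
# Hypothetical bookmark file; set BOOKMARKS before use to put it elsewhere.
BOOKMARKS=${BOOKMARKS:-$HOME/.bookmarks}

# bm: add the current directory to the bookmark file, keeping the file sorted
# and free of duplicates (sort -o may safely name its own input file).
bm() { pwd >> "$BOOKMARKS" && sort -u -o "$BOOKMARKS" "$BOOKMARKS"; }

# bml: list the bookmarks with line numbers.
bml() { nl -w2 -s' ' "$BOOKMARKS"; }

# bmcd N: change to the directory stored on line N of the bookmark file.
bmcd() {
  local dir
  dir=$(sed -n "${1}p" "$BOOKMARKS")
  [ -n "$dir" ] || { echo "no bookmark $1" >&2; return 1; }
  cd "$dir"
}
```

Because the bookmarks live in a plain text file, the same functions work in any terminal program that runs BASH, and the file can be copied between computers.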

Join consecutive lines if condition applies

How to use AWK, or the paste command + sed, to merge lines in a text file based on a condition, such as starting or not starting with a number. It's a handy fix for tables exported to text with embedded newlines.
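A minimal AWK sketch of the idea, assuming (hypothetically) that every true record starts with a digit and that any other line is a continuation of the record above it. The helper name join_continuations is made up, and this is not necessarily the command used in the post:

```bash
# join_continuations: hypothetical helper. A line starting with a digit begins a
# new record; any other line is appended, after a space, to the current record.
join_continuations() {
  awk '
    NR > 1 && /^[0-9]/ { print buf; buf = $0; next }   # new record: flush the previous one
    { buf = (NR == 1) ? $0 : buf " " $0 }             # first line, or a continuation line
    END { if (NR > 0) print buf }                      # flush the last record
  ' "$@"
}

# prints "1 apple red fruit" and "2 pear" on separate lines
printf '1 apple\nred fruit\n2 pear\n' | join_continuations
```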

Printing repeats within repeats, and splitting a list into columns

BASH's printf is a complex piece of machinery. The man page gives its syntax as printf FORMAT [ARGUMENT]..., which suggests that the "argument" is the thing to be printed and the "format" describes how to print it. But it's not that simple, and some ingenious tricks are possible with "format".
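Two format tricks follow from printf's documented habit of reusing the format until all the arguments are used up. This is a sketch only; the helper names three_columns and rule are hypothetical:

```bash
# three_columns: the format holds three %s conversions, and printf reuses it
# until every argument is consumed, so a list comes out three items per line.
three_columns() { printf '%s\t%s\t%s\n' "$@"; }

# rule N: %.0s prints zero characters of each argument, so the literal "="
# is printed once per argument. Assumes N >= 1: with no arguments, printf
# still applies the format once and prints a single "=".
rule() { printf '=%.0s' $(seq "$1"); printf '\n'; }

three_columns Mon Tue Wed Thu Fri Sat   # two lines of three tab-separated days
rule 20                                 # a line of 20 "=" characters
```

If the number of arguments isn't a multiple of three, printf fills the missing conversions on the last line with empty strings.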

Add an issues field to a data table

Some of the data tables I audit have an "Issues" field that flags problems with individual records. It's a kind of self-reporting "Look! I've got an error!" feature. I wondered if I could do something similar with my favourite data-processing language, AWK, and this post documents my tinkerings.
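As a sketch only, here is one way to bolt an Issues field onto a table with AWK. The tab-separated layout, the header line, the helper name add_issues and the two checks are all hypothetical, and the post's own approach may differ:

```bash
# add_issues: hypothetical helper. Reads a tab-separated table with a header
# line and appends an "Issues" field listing problems found in each record.
# The two checks (blank field 2, non-numeric field 3) are illustrative only.
add_issues() {
  awk 'BEGIN { FS = OFS = "\t" }
    NR == 1 { print $0, "Issues"; next }
    {
      issues = ""
      if ($2 == "") issues = issues "blank field 2; "
      if ($3 !~ /^[0-9]+$/) issues = issues "non-numeric field 3; "
      sub(/; $/, "", issues)     # trim the trailing separator
      print $0, issues
    }' "$@"
}

printf 'id\tname\tcount\n1\tfoo\t3\n2\t\tx\n' | add_issues
```

Records with no problems get an empty Issues field, so the problem records can be pulled out later by testing whether the last field is non-empty.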

How to move selected lines within a file

If you're working with a GUI text editor and want to move a particular line from one place to another, you might use cut and paste. If you're working with a text file on the command line (not in an editor like emacs or vim) and don't want to use a clipboard, line-moving can be done with a single command.
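Here is a single-pass AWK sketch that moves one line to a later position. The helper name move_line and its arguments are hypothetical. It assumes both line numbers exist and that the destination line comes after the source line:

```bash
# move_line FROM TO [FILE...]: hypothetical helper. Moves line FROM so that it
# follows line TO (TO > FROM), printing the result to standard output.
move_line() {
  local from=$1 to=$2
  shift 2
  awk -v from="$from" -v to="$to" '
    NR == from { held = $0; next }   # hold the line to be moved, do not print it yet
    { print }
    NR == to { print held }          # re-insert it after the target line
  ' "$@"
}

# prints b, c, a, d on separate lines
printf 'a\nb\nc\nd\n' | move_line 1 3
```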

Spellchecking scientific names on the command line

In a table with thousands of scientific names like "Stropharia rugosoannulata var. rugosoannulata Farl. ex Murrill, 1922", are all the Latin bits correctly spelled? This post explains how to build and use a custom dictionary of scientific names, entirely on the command line.

Dealing with an all-CAPS/first-CAP jumble

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Apr 29, 2020 2:31 PM EDT)
  • Story Type: Tutorial
I sometimes need to tally lists in which the same word might be capitalised (Word) or all in capital letters (WORD). This post explains how to normalise the words and tally them using sed or AWK.
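A minimal AWK sketch of the normalise-then-tally idea. It assumes one word per line and plain ASCII letters, since toupper and tolower may not handle other characters in every AWK. The helper name tally_words is hypothetical:

```bash
# tally_words: hypothetical helper. Rewrites each word in first-capital form,
# so "WORD" and "Word" both become "Word", then prints count and word,
# sorted by word.
tally_words() {
  awk '{ w = toupper(substr($0, 1, 1)) tolower(substr($0, 2)); n[w]++ }
       END { for (w in n) print n[w], w }' "$@" | sort -k2
}

# prints "2 Apple" and "1 Pear"
printf 'APPLE\nApple\nPear\n' | tally_words
```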

Brace expansion with variables and arrays: eval to the rescue

In a 2019 blog post I tinkered with two alternatives to BASH brace expansion. Alternatives might be needed because strings with spaces cause problems unless separately quoted, and (I thought) you can't put shell variables inside the braces because BASH does brace expansion first when executing a command. But with the help of eval, a BASH built-in, you can brace-expand variables with ease, and a simple conversion lets you brace-expand BASH arrays.
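The effect of eval can be seen in a few lines. The array conversion here, which joins the elements with commas, is an assumption about what a "simple conversion" might look like, and the helper names range and brace_array are hypothetical. Because eval re-parses its input as shell code, use it only on strings you trust:

```bash
start=1; end=5
echo {$start..$end}        # brace expansion runs before variable expansion: prints {1..5}
eval echo {$start..$end}   # eval re-parses "echo {1..5}": prints 1 2 3 4 5

# range FROM TO: hypothetical helper wrapping the same trick.
range() { eval "echo {$1..$2}"; }

# brace_array PREFIX SUFFIX ITEM...: joins the items with commas, then lets eval
# brace-expand PREFIX{item1,item2,...}SUFFIX. Needs at least two items, and the
# items must not contain spaces, commas, braces or quotes.
brace_array() {
  local pre=$1 suf=$2
  shift 2
  local IFS=,
  eval "echo ${pre}{$*}${suf}"
}

colours=(red green blue)
brace_array file_ .txt "${colours[@]}"   # prints file_red.txt file_green.txt file_blue.txt
```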

Checking date components across fields

A date written as 23-Sep-05 should break up into the components 23 (day), 9 (month) and 2005 (year). This post demonstrates a command that finds incorrect break-ups in tables with tens of thousands of date components.
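A minimal AWK sketch of such a check. The layout is hypothetical: a tab-separated table with a verbatim date like 23-Sep-05 in field 1 and its day, month and year in fields 2 to 4, with every two-digit year assumed to mean 20yy. The helper name check_dates is also made up:

```bash
# check_dates: hypothetical helper. Prints the line number and contents of any
# record whose day, month or year field disagrees with the verbatim date.
check_dates() {
  awk 'BEGIN {
         FS = "\t"
         split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
         for (i = 1; i <= 12; i++) mon[m[i]] = i    # month name -> month number
       }
       {
         split($1, d, "-")                          # d[1] day, d[2] month, d[3] year
         if (d[1] + 0 != $2 + 0 || mon[d[2]] != $3 + 0 || 2000 + d[3] != $4 + 0)
           print NR ": " $0
       }' "$@"
}

# flags line 2 only, where the month was broken out as 8 instead of 9
printf '23-Sep-05\t23\t9\t2005\n23-Sep-05\t23\t8\t2005\n' | check_dates
```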

Targeted string replacements with sed and AWK

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Apr 10, 2020 12:05 AM EDT)
  • Story Type: Tutorial
Global replacement of A with B might be a mistake unless you're 100% sure that you really, truly want to replace every instance of A with B in the data file. It's much safer to do replacements targeted at particular instances of A, and this post reviews a few strategies for using sed and AWK for targeted string replacements.
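Here are two small examples of targeting, using hypothetical data. The AWK command changes "NA" only where it makes up the whole of field 3. The sed command makes its substitution only on lines that begin with the record key ID42. The helper names are made up, and neither command is necessarily one of the strategies in the post:

```bash
# replace_in_field3: in a tab-separated table, blank out field 3 when it is
# exactly "NA"; any "NA" elsewhere in the record is left alone.
replace_in_field3() { awk 'BEGIN { FS = OFS = "\t" } $3 == "NA" { $3 = "" } { print }' "$@"; }

# fix_line: the /^ID42/ address restricts the s command to matching lines.
fix_line() { sed '/^ID42/s/Hobart/Launceston/' "$@"; }

printf 'NA\tx\tNA\n' | replace_in_field3              # only the second NA is removed
printf 'ID42 Hobart\nID43 Hobart\n' | fix_line        # only the ID42 line changes
```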

More mojibake fun

How to correctly interpret 😉, <0#FF:> and a few other mysteries.

Second Tuesday of each month and a BASHing data century

The 100th "BASHing data" blog post demonstrates a handy way to use the ncal calendar utility.

A curious pair of data ops

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Mar 17, 2020 10:01 PM EDT)
  • Story Type: Tutorial
A command-line method for doing multiple pivots on a table of market data, and how to organise Chinese characters when you don't understand them.

Life tables

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Mar 11, 2020 9:49 AM EDT)
  • Story Type: Tutorial
How many years do you have left to live? What are the chances you'll die before your next birthday? Life tables and gnuplot graphs offer some non-definitive answers.

Moving averages with AWK

How to add moving averages to a table using an ingenious AWK command.
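The post's own command isn't reproduced here. A common AWK approach keeps the last N values in a circular buffer along with a running sum, so each new line costs one addition and one subtraction. This sketch assumes a whitespace-separated table with the values in column 2, and the helper name moving_avg is hypothetical:

```bash
# moving_avg N [FILE...]: hypothetical helper. Appends an N-point trailing
# moving average of column 2 to each line; lines before the window fills get "NA".
moving_avg() {
  local n=$1
  shift
  awk -v n="$n" '{
    slot = NR % n
    if (NR > n) sum -= buf[slot]   # drop the value leaving the window
    buf[slot] = $2                 # store the newest value in its place
    sum += $2
    print $0, (NR >= n ? sum / n : "NA")
  }' "$@"
}

# prints "1 2 NA", "2 4 3", "3 6 5" and "4 8 7"
printf '1 2\n2 4\n3 6\n4 8\n' | moving_avg 2
```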

The easy-going syntax of AWK commands

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Feb 26, 2020 1:52 PM EDT)
  • Story Type: Tutorial
Unlike some programming languages, AWK is flexible and tolerant. You can write AWK commands in several different ways and not worry too much about spaces, indentation and line breaks.
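For example, the two hypothetical helpers below run the same AWK program, which counts the records whose second field exceeds 5. In the first the program is crammed onto one line, and in the second it is spread out with spaces, indentation and comments. There is one limit on AWK's tolerance: a pattern and the opening brace of its action must be on the same line.

```bash
# Compact form.
count_compact() { awk '$2>5{n++}END{print n+0}' "$@"; }

# Spread-out form of the same program.
count_spaced() {
  awk '
    $2 > 5 { n++ }          # count records whose second field exceeds 5
    END    { print n + 0 }  # n + 0 prints 0 rather than an empty string if nothing matched
  ' "$@"
}

printf 'a 3\nb 7\nc 9\n' | count_compact   # prints 2
printf 'a 3\nb 7\nc 9\n' | count_spaced    # prints 2
```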

Changing TTY prompt, font and colors

Bored with a plain, white-on-black virtual terminal? You can make it more to your liking, but be prepared for some difficulties.

How to be uncertain with dates

ISO 8601 has been extended to allow for uncertainty in writing dates and times. This post explains the key extensions for dates.

Data quality in iNaturalist downloads

In contrast to biodiversity records from some museums and herbaria, data from the citizen-science project iNaturalist is relatively free of character, structure, format and content errors.
