banner

For a full list of BASHing data blog posts, see the index page.     RSS


Spreadsheet annoyance no. 2

In a Linux Rain post in 2014 I complained about spreadsheets (Gnumeric, LibreOffice Calc, Excel) converting non-dates into dates. One way to avoid the unwanted conversion is to type a single quote (apostrophe) before the non-date. The single quote tells the spreadsheet that your non-date is to be treated as plain text.

I like to think that spreadsheet developers know how angry people get when a spreadsheet converts a non-date to a date. They're aware that right around the world, at any hour and in many languages, users are shouting IT'S NOT A DATE, YOU STUPID SPREADSHEET! They're also aware that users would like the option to turn off this annoying behaviour. But they don't code this option, and would rather you figure out a workaround. As Microsoft says:

Microsoft Excel is preprogrammed to make it easier to enter dates. For example, 12/2 changes to 2-Dec. This is very frustrating when you enter something that you don't want changed to a date. Unfortunately there is no way to turn this off. But there are ways to get around it.

I recently found another unwanted conversion. I was entering museum registration codes into a Gnumeric spreadsheet, and the codes had the form YYYY:NN:NNNN. For example:

2019:23:2000
2019:23:0200
2019:23:0020
2019:23:0002

In Gnumeric, the first two codes in the example went in without problems, but the second two lost the leading zeroes on the second number:

ssht1

In the screenshot above, I've added the "type" function to the second column. It tells me that the entries in the first two rows are text (2) and the entries in the second two rows are numbers (1). In the third column the "n" function reveals the number-equivalents I supposedly entered.

LibreOffice Calc wouldn't even let me enter those first two museum codes:

ssht2

I was baffled. Then I remembered that Gnumeric spreadsheets are just gzipped XML files. This meant I could read the Gnumeric sheet in a terminal by using the zcat command, which peers into zipped files and uncompresses the contents:

ssht3
 
ssht4

So Gnumeric interpreted my second two entries as [h]:mm:ss, which is number of hours + number of minutes + number of seconds. That was being converted by Gnumeric into number of days:

ssht5

And LibreOffice Calc was doing this to all the entries.

Well, my workaround (again) was to type a single quote before a museum registration code. I wonder how many other unwanted conversions are done by spreadsheets?

In A Data Cleaner's Cookbook I explain how to make a TSV by copy/pasting from a spreadsheet. Suppose the spreadsheet is full of initial single quotes that you used to force the spreadsheet to treat data items as plain text. When you copy/paste, will the single quotes also be copy/pasted? No. They stay behind in the spreadsheet.


Last update: 2020-01-13
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License