Showing headlines posted by Bob_Mesibov
Long, narrow tables vs short, wide ones
To satisfy my curiosity about table shape, I carried out three processing tests on a table arranged "long" with 10000 rows and 100 columns, and arranged "short" with 100 rows and 10000 columns.
The lat/lon floating point delusion
The city of Sydney says the Redfern Community Centre is at latitude/longitude -33.8903169365705 151.198409720645. And this morning for breakfast I had 189.41576911 ml of fruit juice, followed by 75.24902503 g of muesli topped with 36.55668786 ml of milk and 15.44171338 g of yogurt.
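The arithmetic behind the joke can be sketched in a few lines. This is an illustration only, not taken from the post: the function name `last_place_metres` is hypothetical, and the figure of 111,000 m per degree of latitude is a round approximation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough ground distance, in metres, represented by the
# last decimal place of a latitude written with $1 decimal places.
# Assumes about 111,000 m per degree of latitude (an approximation).
last_place_metres() {
    awk -v places="$1" 'BEGIN {printf "%.2e\n", 111000 / 10^places}'
}

last_place_metres 13   # the Redfern latitude above has 13 decimal places
last_place_metres 5    # five places is already down to about a metre
```

On those assumptions the thirteenth decimal place corresponds to roughly a hundred-millionth of a metre, far finer than any building can be located.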
A bulk replacement GUI with YAD
I sometimes need to tidy up data tables by replacing all nearly-the-same versions of a data item with a single "normalised" version. This post describes a script for doing this job neatly with a GUI dialog while also making progressive back-ups of the table.
Renumber a list after inserting a line
Murphy's Law says that after you carefully number all the items in a long list, you'll notice that you need to insert a new item in the middle. This post describes a function for inserting and renumbering.
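One way such a function might look is sketched below. It is not the function from the post: the name `insert_and_renumber` is hypothetical, and it assumes every line of the list starts with a number, a full stop and a space, as in "3. item".

```bash
#!/usr/bin/env bash
# Hypothetical helper: insert_and_renumber FILE POS "text"
# Makes "text" the new item number POS and renumbers the whole list.
# Assumes every line looks like "N. item text".
# If POS is larger than the number of lines, nothing is inserted.
insert_and_renumber() {
    awk -v pos="$2" -v text="$3" '
        NR == pos {print text}            # new item goes in before the old item POS
        {sub(/^[0-9]+\. /, ""); print}    # strip the old number from every line
    ' "$1" | awk '{print NR ". " $0}'     # number the lines afresh
}

list=$(mktemp)
printf '1. apples\n2. bananas\n3. cherries\n' > "$list"
insert_and_renumber "$list" 2 "apricots"
rm "$list"
```

The sketch writes the renumbered list to standard output and leaves the original file untouched.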
Finding malformed markup
HTML tags in non-HTML documents sometimes get messed up. This post describes a two-step procedure for locating most of the usual errors.
A muggle's guide to AWK arrays: 2
Part 1 of this series looked at a couple of AWK array basics, namely index strings and value strings. This post deals with a very common use of AWK arrays, namely filtering or modifying a file based on information in another file.
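A minimal sketch of the two-file pattern follows. The function name `filter_by_list` and the sample data are made up for illustration, and the table is assumed to be tab-separated with the lookup key in field 1.

```bash
#!/usr/bin/env bash
# Hypothetical helper: filter_by_list LIST TABLE
# Prints the rows of the tab-separated TABLE whose first field appears in LIST.
# While AWK reads the first file, FNR equals NR, so each key is stored as an
# array index. On the second file only rows with a stored key are printed.
# Caveat: if LIST is empty, FNR==NR is also true for TABLE and nothing prints.
filter_by_list() {
    awk -F'\t' 'FNR == NR {keep[$1]; next} $1 in keep' "$1" "$2"
}

tmp=$(mktemp -d)
printf 'a02\na04\n' > "$tmp/ids.txt"
printf 'a01\tapple\na02\tbanana\na03\tcherry\na04\tdate\n' > "$tmp/table.tsv"
filter_by_list "$tmp/ids.txt" "$tmp/table.tsv"   # prints the a02 and a04 rows
rm -r "$tmp"
```

Only the index strings of `keep` are used here. Its values are left empty because the test `$1 in keep` checks only whether the index exists.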
Return of the mojibake detective
The case of the incorrect identifica???? and two other mysteries of character corruption.
Leading and trailing whitespace
Googling the phrase "trailing whitespace" is like googling "coffee stains": you mainly get "how to remove" recipes. There are procedures for deleting trailing whitespace in C, Python, Vim, PHP, Java, Visual Studio, R, C++, JavaScript, etc etc. Nobody wants trailing whitespace in their code. Trailing (and leading) whitespace, though, isn't a big deal in data tables. This post explains how to find and delete it (if necessary) when it occurs inside fields.
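As a rough sketch of the find-then-delete idea (not the commands from the post), the two hypothetical functions below assume a tab-separated table in which the unwanted whitespace is ordinary spaces.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report line number and field number (tab-separated)
# for every field that begins or ends with a space.
find_padded_fields() {
    awk -F'\t' '{for (i = 1; i <= NF; i++) if ($i ~ /^ | $/) print NR "\t" i}' "$@"
}

# Hypothetical helper: delete leading and trailing spaces inside every field.
# Spaces in the middle of a field are left alone.
trim_fields() {
    awk 'BEGIN {FS = OFS = "\t"}
         {for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i); print}' "$@"
}

printf ' apple\tred \t sweet and crisp \n' | find_padded_fields
printf ' apple\tred \t sweet and crisp \n' | trim_fields
```

Both functions read the files named as arguments, or standard input if none are given, so the finder can be run first to decide whether trimming is needed at all.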
Plotting data in the terminal with gnuplot
Plotting in a terminal has some serious limitations compared to building a gnuplot image in a new window.
Working around the BASH brace expansion rule
Brace expansion in BASH is a neat way to build a Cartesian product, like all the combinations of a set of first names and a set of last names. Unfortunately, you can't do this with strings unless they're in comma-separated lists. This post explains two alternatives.
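The limitation arises because BASH performs brace expansion before it expands variables, so a list stored in a variable is never seen as a brace list. The sketch below shows the literal form and one possible workaround, a nested loop. The loop is an assumption for illustration and is not necessarily either of the two alternatives in the post. The function name `cartesian` is hypothetical.

```bash
#!/usr/bin/env bash
# Brace expansion works on literal comma-separated lists:
echo {Ann,Bob}_{Smith,Jones}
# prints: Ann_Smith Ann_Jones Bob_Smith Bob_Jones

# Hypothetical helper: Cartesian product of two space-separated strings,
# one combination per line. The arguments are deliberately left unquoted
# inside the loops so that the shell splits them into words. This assumes
# the words contain no glob characters such as * or ?.
cartesian() {
    local first last
    for first in $1; do
        for last in $2; do
            printf '%s_%s\n' "$first" "$last"
        done
    done
}

firsts="Ann Bob"
lasts="Smith Jones"
cartesian "$firsts" "$lasts"
```

The function produces the same four combinations as the brace expansion, but from strings held in variables.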
A muggle's guide to AWK arrays: 1
First in a series of posts for AWK users about the ins and outs of arrays. Array naming, index strings and value strings, with examples.
Growing the Cookbook's
A Data Cleaner's Cookbook has a couple of pages on "broken" records and recommends a function that checks for "brokenness". This post describes a short script that does the same job more informatively.
Transpose, pivot and bin with GNU Datamash 1.4
GNU Datamash is a command-line data wrangler with a lot of useful capabilities. In this post I test-drive three Datamash operations, in two cases comparing them with other command-line tools.
The magic of BASH string expansion
If you've struggled with workarounds for certain character formats on the command line, this BASH trick will make you very happy. It allows AWK and sed to use BASH as an interpreter.
How to delete, insert and replace whole lines
The trick to successful operations with whole lines in data tables is the correct use of line addresses. You need to be sure not only that the operation is done where you want it, but also that it isn't done where you don't want it.
How to delete, insert and replace whole fields
Much as I like using AWK, there are some processing jobs with data tables that are easier to do with cut and paste on the command line.
Two ugly CSVs
A command-line exploration of publicly available but surprisingly messy data from the Australian Electoral Commission and Companies House (UK).
Spreadsheet annoyance no. 2
Spreadsheets (LibreOffice Calc, Gnumeric, Excel) make dates out of entries that aren't dates. They do it to be helpful, but right around the world, at any hour and in many languages, users are shouting "IT'S NOT A DATE, YOU STUPID SPREADSHEET!" Non-date to date isn't the only unwanted conversion that spreadsheets are guilty of, as I recently learned.
Making pictures with data
A little-appreciated feature of the command-line program ImageMagick is that IM can display data bytes as image bytes.
Quotes as characters
BASH users know the difference between single and double quotes, but there are seven other quote characters to think about when processing text data.

