Showing headlines posted by Bob_Mesibov


How to find distances between lat/lons for geochecking

Finding incorrect latitude/longitude figures in a big data table can be difficult, but checking can be made easier with an AWK calculation.
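
The summary doesn't say which calculation the post uses, so the following is only a sketch of one common approach: the haversine great-circle distance, done in AWK. The function name latlon_km, the spherical-Earth model and the 6371 km radius are assumptions for illustration.

```shell
# Hypothetical helper: great-circle distance in km between two points
# given as decimal degrees (lat1 lon1 lat2 lon2), using the haversine
# formula on a spherical Earth. Not necessarily the method in the post.
latlon_km() {
  awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
    pi  = atan2(0, -1)
    rad = pi / 180                 # degrees-to-radians factor
    R   = 6371                     # assumed mean Earth radius in km
    dlat = (lat2 - lat1) * rad
    dlon = (lon2 - lon1) * rad
    a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    printf "%.1f\n", R * c
  }'
}
```

For example, latlon_km 0 0 0 1 prints 111.2, roughly the length of one degree of longitude at the equator. For geochecking, a record whose coordinates lie implausibly far from a reference point for its named locality is a candidate for a closer look.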

Mapping with gnuplot

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Oct 30, 2018 10:33 PM EDT)
  • Story Type: Tutorial
How to use gnuplot to put data points on a basemap.

Repair job: separate the tandem repeats

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Oct 26, 2018 10:15 AM EDT)
  • Story Type: Tutorial
By mistake, I put two identical data items in the same field as a tandem repeat. Here's how either sed or AWK could be used to split the repeat and put its parts into two different fields.
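
A minimal AWK sketch of the idea, assuming a tab-separated table and a repeat with no separator between its two halves (e.g. "abcabc"). The function name split_tandem and the column argument are hypothetical, and the post's own commands may differ.

```shell
# Hypothetical helper: read tab-separated records on stdin and split an
# exact tandem repeat in column $1 (default 1) into two fields.
# Records whose field is not an exact repeat pass through unchanged.
split_tandem() {
  awk -F'\t' -v OFS='\t' -v col="${1:-1}" '
  {
    n = length($col)
    half = n / 2
    # A tandem repeat has even length and two identical halves
    if (n > 0 && n % 2 == 0 && substr($col, 1, half) == substr($col, half + 1))
      $col = substr($col, 1, half) OFS substr($col, half + 1)
    print
  }'
}
```

Usage: split_tandem 2 < table.tsv, where table.tsv is a placeholder file name.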

Bird watching with AWK and grep

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Oct 23, 2018 11:06 PM EDT)
  • Story Type: Tutorial
A "fileA-in-fileB" search is a search of fileB for lines that match any of the lines in fileA. I thought I'd post what happens when you systematically vary a fileA-in-fileB search. TL;DR: AWK wins.
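
As a rough illustration, here are two ways such a search might be written, assuming whole-line matches are wanted and fileA is not empty. The function names are hypothetical, and the variations tested in the post are not reproduced here.

```shell
# Hypothetical helpers: print the lines of fileB ($2) that are identical
# to some line of fileA ($1).

# AWK: store every fileA line as an array key, then test each fileB line.
# (NR == FNR is true only while the first file is being read.)
a_in_b_awk() {
  awk 'NR == FNR { seen[$0]; next } $0 in seen' "$1" "$2"
}

# grep: -F fixed strings, -x whole-line match, -f read patterns from fileA
a_in_b_grep() {
  grep -Fxf "$1" "$2"
}
```

Both print matching lines in fileB order. Dropping -x from the grep version would turn it into a substring search, which is a different question.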

How to enter nothing in a database

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Oct 18, 2018 8:02 AM EDT)
  • Story Type: Tutorial
I call them NITS, which is short for Nothing Interesting To Say. They're the filler items that appear in spreadsheets and databases when the person entering the data has no information for a particular field.

How to validate ISO 8601 dates without regex

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Oct 4, 2018 8:36 PM EDT)
  • Story Type: Tutorial
How to check for format and content errors in YYYY-MM-DD fields with AWK.

Fightin' fields

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Sep 30, 2018 1:34 AM EDT)
  • Story Type: Tutorial
Disagreements between fields in a database can be tricky to diagnose and even harder to detect.

Fuzzy matching in practice

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Sep 22, 2018 4:42 PM EDT)
  • Story Type: Tutorial
Approximate or "fuzzy" matching on the command line is easily done with tre-agrep. Here's a practical example.

Data on clay

If you were wandering the streets of a busy city in the Fertile Crescent a few thousand years ago, you might have run across someone jotting down a few notes on a small clay tablet. The jotting-down was done with the cut stem of a reed, and the result is today called cuneiform writing, after the wedge shape of some of the written elements (Latin cuneus, "wedge"). Cuneiform writing on clay was around for at least 3000 years and was adopted for use in a range of languages.

iconv and illegal input sequences

Character encoding is a big topic and this post isn't going to cover it all. In fact, I'm only going to deal with one particular problem: what to do when you're converting a file with the iconv utility and you get an error message. How to get around the "illegal input sequence" roadblock.

Displaying data from table fragments

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Sep 6, 2018 12:26 AM EDT)
  • Story Type: Tutorial
This post was inspired by a presentation I watched at a recent conference. The presenter had collected a large number of data tables, each of which had a selection of fields from a large pool of fields... How to neatly display which fields were in which tables?

A record pager built with YAD

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Aug 17, 2018 8:11 PM EDT)
  • Story Type: Tutorial
How to turn a YAD dialog into a GUI viewer/pager for records in a data table.

48 sea levels and a trope for your terminal

A bulk string replacement with AWK, and that ACCESS DENIED thing. The messiest dataset I've ever audited included a field that illustrates a key rule of Fussy Database Management: Never let users enter free-text data in a field unless absolutely necessary.

Mojibake detective work

A close look at some character encoding problems.

BASHing data

It's easy to find records in a data table that are entirely blank or only contain tabs or whitespace. Just ask AWK to print the line numbers of any records that have no fields.
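
One way this might look, assuming AWK's default field splitting (runs of spaces and tabs), under which a record that is empty or all whitespace has NF equal to 0. The wrapper name blank_records is hypothetical.

```shell
# Hypothetical wrapper: print the line numbers of records with no fields.
# Reads the named files, or stdin if none are given.
blank_records() {
  awk '!NF { print NR }' "$@"
}
```

Usage: blank_records table.tsv, where table.tsv is a placeholder file name.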

GUI ways to view and edit big text files

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Jul 30, 2018 9:12 PM EDT)
  • Story Type: Reviews
glogg, gvim, Geany and csvpad, but not spreadsheets.

Question marks that aren't really question marks

Some question marks are signs that a program is baffled by a character's encoding.

Curse of the CSV monster

  • BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Jul 17, 2018 6:13 PM EDT)
  • Story Type: Tutorial
How to convert a CSV to a TSV and complain about CSVs at the same time.

Truncated data items

Some command-line tricks for detecting truncations, such as a 100-character string clipped to 50 characters in a database.

Combo characters

How to deal with Unicode's combining characters on the command line.
