For a full list of BASHing data blog posts see the index page.

A GUI to re-order fields in a table

In a previous BASHing data post I showed how to manipulate whole fields in a data table with cut and paste.

Those were simple manipulations. Re-ordering fields in a table with lots of fields isn't simple on the command line (or in a spreadsheet!), and although the best way is with AWK, the AWK command for re-ordering can be pretty tedious. For example, to shift field 17 in a tab-separated 22-field table to between fields 4 and 5, and field 21 to between fields 9 and 10:

awk 'BEGIN {FS=OFS="\t"} {print
$1,$2,$3,$4,$17,$5,$6,$7,$8,$9,$21,$10,$11,$12,$13,$14,$15,$16,$18,$19,$20,$22}'
table > new_table

I wrote a shell script to do re-ordering more readably in a GUI and in this post I demonstrate and explain two versions of the script: a basic one for tables with just a few fields, and a polished, final one for wider tables. Both scripts assume the table's fields are tab-separated.

The basic script. To demonstrate this one I'll use the simple 6-field table "table1":

fld 1	fld 2	fld 3	fld 4	fld 5	fld 6
AB	CD	EF	GH	IJ	KL
AB	CD	EF	GH	IJ	KL
AB	CD	EF	GH	IJ	KL
AB	CD	EF	GH	IJ	KL
AB	CD	EF	GH	IJ	KL

Suppose I want to delete fields 1 and 6 and reverse the orders of fields 2 and 3, and 4 and 5. In other words, I want a new table with fld 3 [tab] fld 2 [tab] fld 5 [tab] fld 4. To do this I enter the script name "reorder" and the filename. A YAD window appears in the centre of my desktop and I enter the new field order in the boxes next to the field names:

To save space, in this screenshot I've cut off the bottom of the tall YAD window with its "Cancel" and "OK" buttons.

I click "OK" or press Enter, and the fields are deleted and re-ordered as I wanted:

The basic script is shown and explained below. The final version of this script (see below) sends the output to a new table, not to stdout.

#!/bin/bash

mapfile -t < <(head -n1 "$1" | tr '\t' '\n' | nl -ba -w1 -s"/" | sed 's/^/--field=/')
# This command loads a BASH array with a numbered list of
# field names from the table. Each number is followed by a forward slash,
# and each list item is preceded by "--field=". The array contains
# the field labelling instructions for YAD (see YAD command below).
# The "-ba" option for nl ensures that all fields are
# numbered in the YAD window, even those with no field label
# in the header line.

oldflds=$(seq $(awk -F"\t" 'NR==1 {print NF}' "$1"))
# The variable "oldflds" is a simple list of field numbers in the
# original table. The length of the list (the number of fields)
# is found with AWK.

yadout=$(yad --center --form --width="300" --height="800" --separator="\n" "${MAPFILE[@]}")
# The variable "yadout" contains the output from YAD, namely
# a newline-separated list of new field order numbers.

case $? in
0) pasted=$(paste <(echo "$yadout") <(echo "$oldflds"));;
1) exit 0;;
252) exit 0;;
esac
# This "case" construction allows the script to exit if the YAD
# dialog was closed with Cancel, Esc or the "X" at top right of the
# YAD window. If not, then the YAD output and the list of
# original field numbers are pasted together. Because every
# field in the original table is represented in the YAD output,
# either by a new order number or by an empty string, the list
# is the same length as the list in "oldflds". The pasted lists
# are stored in the variable "pasted".

new_order=$(echo "$pasted" | awk -F"\t" '$1 != ""' | sort -n | cut -f2 \
| paste -s | sed 's/^/$/;s/\t/,$/g')
# Here the pasted lists are passed to AWK, which finds the
# lines where the YAD output list (field 1) isn't an empty string.
# These lines are sorted, which gives a new sorted order for the fields.
# The old field numbers are cut out from field 2 of the paste,
# then modified with sed into AWK-readable format, e.g. "$3,$2,$5,$4" here.
# This formatted string is stored in the variable "new_order".

awk 'BEGIN {FS=OFS="\t"} {print '"$new_order"'}' "$1"
# The final AWK command prints the reordered table,
# following the instructions in "new_order".

exit 0

The full script. In a previous BASHing data post I described a script for neatly displaying with a YAD form the contents of a single record from a multi-field table. The number of columns in the YAD form is decided before the script is run, so that a table with lots of fields doesn't push the bottom of the YAD window off the screen.

A tweak to the "reorder" script does this automatically on my system. With my desktop and system settings, the YAD "reorder" form, with its height set at 800 pixels, can hold about 20 fields vertically before another column is needed. Roughly, then, the total number of columns will be 1 plus the integer value of the number of fields in the original table, divided by 20. This number is easily calculated with BASH integer arithmetic, stored as a variable ("cols" in the script) and added to the YAD command (columns="$cols"), which no longer has a "width" specified. In practice, YAD does some figuring of its own in distributing fields among columns, but the result is OK up to 70-80 fields:

And here's the final "reorder" script, with the output sent to the new table "[filename]_reordered":

#!/bin/bash

mapfile -t < <(head -n1 "$1" | tr '\t' '\n' | nl -ba -w1 -s"/" | sed 's/^/--field=/')
oldflds=$(seq $(awk -F"\t" 'NR==1 {print NF}' "$1"))
cols=$((1 + $(awk -F"\t" 'NR==1 {print NF}' "$1")/20))

yadout=$(yad --center --form --columns="$cols" \
--height="800" --separator="\n" "${MAPFILE[@]}")

case $? in
0) pasted=$(paste <(echo "$yadout") <(echo "$oldflds"));;
1) exit 0;;
252) exit 0;;
esac

new_order=$(echo "$pasted" | awk -F"\t" '$1 != ""' | sort -n | cut -f2 \
| paste -s | sed 's/^/$/;s/\t/,$/g')

awk 'BEGIN {FS=OFS="\t"} {print '"$new_order"'}' "$1" > "$1"_reordered

exit 0

Last update: 2019-08-30
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License