Linux News
The world is talking about GNU/Linux and Free/Open Source Software

Showing headlines posted by Bob_Mesibov

How to watermark a UTF-8 plain text file

BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Nov 24, 2021 6:10 AM EDT)
Story Type: Tutorial

Watermarking plain text isn't easy. Plain text files don't have headers (or magic numbers), and although you can insert invisible control characters, those characters may get revealed by text editors and word processors. There's a sneaky way to do it, though.

What's wrong with my footprintWKT?

BASHing data; By B (Posted by Bob_Mesibov on Nov 17, 2021 11:17 AM EDT)
Story Type: News Story

WKT is a simple text format for describing some geometric shapes, like those used in GIS. The format is so simple that it's hard to get it wrong, yet the Global Biodiversity Information Facility has lately been declaring "invalid" thousands of apparently valid WKT strings. Why?

A quick cross-file comparison with AWK

BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Nov 10, 2021 9:36 AM EDT)
Story Type: Tutorial

I really like AWK. It allows me to do simple, effective, ad hoc processing of data files, as this post will demonstrate. If AWK was a football club I'd be an ardent supporter: "Carn the mighty AWK!"

On visual contrast and QR codes

BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Nov 3, 2021 6:27 AM EDT)
Story Type: Tutorial

Webpage text is easier to read if there's good contrast, and boosting contrast can also make blurry QR codes readable.

Duplicate records differing only in unique identifiers

BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Oct 27, 2021 11:47 AM EDT)
Story Type: Tutorial

There's a big data table with lots of fields and lots of records. Each record has one or more unique identifier field entries. How to check for records that are exactly the same, apart from those unique identifiers? I've been tinkering with this problem for years, and I described a fairly clumsy method in a 2020 blog post. Here I present a much-improved command-line solution.
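The summary doesn't show the post's command-line solution, so the following is only a Python sketch of the general idea: build a comparison key from every field except the unique identifiers, then report keys that occur more than once. The function name `find_near_duplicates`, the `id_columns` parameter and the sample records are all hypothetical.

```python
from collections import defaultdict

def find_near_duplicates(rows, id_columns):
    """Group rows that are identical once the identifier columns are ignored.

    rows: list of records, each a list of field values.
    id_columns: set of 0-based column indexes holding unique identifiers.
    """
    groups = defaultdict(list)
    for row in rows:
        # The key is the record with its identifier fields left out.
        key = tuple(value for i, value in enumerate(row) if i not in id_columns)
        groups[key].append(row)
    # Only keys shared by two or more records are of interest.
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical sample data: column 0 is the unique identifier.
records = [
    ["id1", "Smith", "1999", "Hobart"],
    ["id2", "Jones", "2001", "Perth"],
    ["id3", "Smith", "1999", "Hobart"],
]
duplicates = find_near_duplicates(records, id_columns={0})
```

Here `duplicates` holds one group containing the `id1` and `id3` records, which differ only in their identifier.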

Some regex tests with grep, sed and AWK

BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Oct 20, 2021 2:09 AM EDT)
Story Type: Tutorial

In my data work I regularly do searching and filtering with GNU grep (version 3.3), GNU sed (4.7) and GNU AWK (4.2.1). I don't know if they all use the same regex engine, but I've noticed differences in regex speed between these three programs. This post documents some of the differences.

TSV to CSV on the CLI (if you really have to)

BASHing data (Posted by Bob_Mesibov on Oct 13, 2021 3:50 AM EDT)
Story Type: Tutorial; Groups: Linux

Regular visitors to this blog will know that I don't like the CSV format. It's awful. In my humble opinion, data workers should aim to use invisible tabs (TSV) or visible pipes (PSV) as field separators in delimited text tables. Sometimes, though, data workers are required to convert a perfectly good TSV or PSV to a CSV. What to do?
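For readers outside the shell, one way to do the conversion is with Python's standard `csv` module, which takes care of quoting fields that contain commas or double quotes. This is a sketch rather than the method from the post, and it assumes the TSV has no quoting conventions of its own (every double quote is literal data). The function name `tsv_to_csv` is made up for illustration.

```python
import csv
import io

def tsv_to_csv(tsv_text):
    """Convert tab-separated text to CSV text, quoting fields only where needed."""
    # QUOTE_NONE makes the reader treat double quotes in the TSV as ordinary data.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # The default writer quotes a field only if it contains a comma, quote or newline.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

sample = 'name\tnote\nAnn\tsaid "hi", then left\n'
converted = tsv_to_csv(sample)
```

The second record's note comes out as `"said ""hi"", then left"`, with the embedded quotes doubled and the whole field wrapped in quotes.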

How to do replacements based on multiple field values

BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Oct 6, 2021 6:11 AM EDT)
Story Type: Tutorial

In a previous post I explained how to normalise entries in a field based on the entry in another field. The same command-line method can be used to repair entries based on entries in several other fields in the same record. An example will make it a lot easier to see what this is all about and why this method is so useful.

How to find mixed Latin+Cyrillic words

BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Sep 29, 2021 11:59 AM EDT)
Story Type: Tutorial

Latin, Greek and Cyrillic "A" all look alike but are encoded differently, and mixed-script words can cause data processing problems. This post demonstrates a function that finds these mixed words in a data table and colors the letters according to the script they come from.
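The post's own function isn't reproduced in this summary. As a rough Python illustration of the detection step only (no coloring), the script of each letter can be read from the start of its Unicode character name. The helper names `scripts_in` and `mixed_script_words` are hypothetical, and only the three scripts mentioned above are checked.

```python
import re
import unicodedata

SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")

def scripts_in(word):
    """Return the set of scripts (from SCRIPTS) used by the letters in word."""
    found = set()
    for ch in word:
        if ch.isalpha():
            # Names look like "CYRILLIC SMALL LETTER A"; the first word is the script.
            name = unicodedata.name(ch, "")
            for script in SCRIPTS:
                if name.startswith(script):
                    found.add(script)
    return found

def mixed_script_words(text):
    """Return the words in text whose letters come from more than one script."""
    return [w for w in re.findall(r"\w+", text) if len(scripts_in(w)) > 1]

# The second word contains a Cyrillic "a" (U+0430) in place of the Latin one.
suspects = mixed_script_words("Tasmania T\u0430smania")
```

Only the second word is reported, although both look the same on screen.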

An AWK histogram with scaling

BASHing data (Posted by Bob_Mesibov on Sep 22, 2021 5:03 AM EDT)
Story Type: Tutorial; Groups: Linux

It's not hard to build a frequency-of-occurrence histogram with AWK, but scaling the histogram bars is a little bit trickier. By "scaling" I mean setting the longest bar to a defined character length in the terminal, and adjusting the lengths of all the shorter bars proportionally.
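The scaling described here is a simple proportion: each bar's length is its count times the chosen maximum width, divided by the largest count. The sketch below shows that arithmetic in Python rather than AWK; the function name, the default width of 40 characters and the `#` bar character are assumptions for illustration.

```python
from collections import Counter

def scaled_histogram(values, max_width=40, bar_char="#"):
    """Return histogram lines in which the longest bar is max_width characters."""
    counts = Counter(values)
    if not counts:
        return []
    longest = max(counts.values())
    lines = []
    for value, count in sorted(counts.items()):
        # Scale proportionally so the most frequent value gets exactly max_width.
        width = round(count * max_width / longest)
        lines.append(f"{value}\t{count}\t{bar_char * width}")
    return lines

data = ["a"] * 4 + ["b"] * 2 + ["c"]
bars = scaled_histogram(data, max_width=8)
```

With a maximum width of 8, the counts 4, 2 and 1 give bars of 8, 4 and 2 characters.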

Show Unicode code points for UTF-8 characters

BASHing data (Posted by Bob_Mesibov on Sep 15, 2021 1:52 AM EDT)
Story Type: Tutorial; Groups: Linux

Like the title says, I wanted to show the Unicode code points (formatted "uxxxx") for a set of UTF-8 characters. There are programs that do just that in a number of programming languages, but here's how to do it with shell tools.
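For comparison with the shell approach, here is roughly what such a program looks like in Python. It is a sketch only: the function name `code_points` is made up, and the output format follows the "uxxxx" pattern quoted above, with lowercase hexadecimal padded to at least four digits as an assumption.

```python
def code_points(text):
    """Return the code point of each character, formatted as 'u' plus hex digits."""
    # ord() gives the code point; 04x pads to four lowercase hex digits.
    return [f"u{ord(ch):04x}" for ch in text]

# "e" with an acute accent, then the euro sign.
points = code_points("A\u00e9\u20ac")
```

Characters outside the Basic Multilingual Plane simply get five or six hex digits instead of four.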

Put an editable command at the next prompt

BASHing data (Posted by Bob_Mesibov on Sep 8, 2021 6:14 AM EDT)
Story Type: Tutorial; Groups: Linux

This post explains two simple ways to send an editable command to a prompt without executing the command.

Yet another gremlin: the zero-width space

BASHing data (Posted by Bob_Mesibov on Sep 1, 2021 5:34 AM EDT)
Story Type: Tutorial; Groups: Linux

A zero-width space (ZWSP) is a formatting aid for the text to be displayed by browsers and word processors. In other processing contexts it's a damned nuisance, but ZWSPs can be located and zapped with command-line tools.
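The post uses command-line tools for this. An equivalent Python sketch, with hypothetical function names, searches for the ZWSP code point U+200B to report where the characters sit, then deletes them.

```python
ZWSP = "\u200b"  # zero-width space

def find_zwsp(lines):
    """Return (line number, column) pairs, both 1-based, for every ZWSP found."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ch == ZWSP:
                hits.append((line_no, col))
    return hits

def strip_zwsp(text):
    """Return text with every zero-width space removed."""
    return text.replace(ZWSP, "")

sample_lines = ["clean line", "hidden\u200bgremlin"]
locations = find_zwsp(sample_lines)
cleaned = [strip_zwsp(line) for line in sample_lines]
```

Reporting locations before deleting makes it possible to check whether a ZWSP was standing in for a real space that should be kept.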

zbarimg and blurry QR codes

BASHing data (Posted by Bob_Mesibov on Aug 25, 2021 5:34 AM EDT)
Story Type: Tutorial; Groups: Linux

This is a tinkering post about zbarimg and its ability to read QR code images. Because there are so many possible ways to produce an image of a QR code, I decided to start with a readable image and progressively degrade it, to see what happened.

CSV viewers for CSV haters

BASHing data (Posted by Bob_Mesibov on Aug 17, 2021 11:33 PM EDT)
Story Type: Tutorial; Groups: Linux

There are quite a few programs that let us CLI people view a CSV file as a simple table. I doubt there are any CSV parsers that work perfectly with all the variations found in the wild of the horrible, awful CSV format, but the two parsers shown here are usually reliable.

Two data formatting tweaks

BASHing data (Posted by Bob_Mesibov on Aug 10, 2021 11:01 PM EDT)
Story Type: Tutorial; Groups: Linux

Nearly all the data files I audit are tab-separated plain text (TSV). While tabs are wonderful field separators, they're invisible, and sometimes it helps to see where one field ends and another begins. This post describes a couple of the methods I use to make TSVs more eye-friendly.

Visualising data as a PGM image

BASHing data (Posted by Bob_Mesibov on Aug 3, 2021 5:23 PM EDT)
Story Type: Tutorial; Groups: Linux

An ASCII PGM file ("Portable Gray Map") is a simple text file that encodes a grayscale image. It's so simple that I was tempted to build a PGM from a 3-parameter dataset, with "x" for image width, "y" for image height and various gray colors for "z" categories. The resulting image wasn't impressive, but the method might be useful for other datasets.
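To show how little there is to the format: an ASCII PGM starts with the magic number `P2`, then the width and height, then the maximum gray value, followed by one gray value per pixel. The Python sketch below writes that layout from a grid of values. The function name and the tiny 2 x 2 grid are illustrative, not taken from the post.

```python
def to_ascii_pgm(grid, max_gray=255):
    """Return ASCII PGM ("P2") text for a rectangular grid of gray values."""
    height = len(grid)
    width = len(grid[0])
    for row in grid:
        if len(row) != width:
            raise ValueError("all rows must have the same length")
        if any(not 0 <= v <= max_gray for v in row):
            raise ValueError("gray values must lie between 0 and max_gray")
    # Header: magic number, then "width height", then the maximum gray value.
    lines = ["P2", f"{width} {height}", str(max_gray)]
    # Body: one image row per text line, values separated by spaces.
    lines.extend(" ".join(str(v) for v in row) for row in grid)
    return "\n".join(lines) + "\n"

# Hypothetical 2 x 2 image: black and mid-gray on top, white and dark gray below.
pgm_text = to_ascii_pgm([[0, 128], [255, 64]])
```

Saving `pgm_text` to a file with a `.pgm` extension should give an image that common viewers can open.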

Revisiting a command-line translator

BASHing data; By Bob Mesibov (Posted by Bob_Mesibov on Jul 27, 2021 10:32 PM EDT)
Story Type: Tutorial

The translate-shell program works on the command line and translates any common language into any other, with the help of online translation engines. This post explains how to embed translate-shell in a handy GUI, and also how to colorize the translation.

The data worker's guide to psiphiorrhea

BASHing data (Posted by Bob_Mesibov on Jul 21, 2021 10:47 AM EDT)
Story Type: Humor; Groups: Linux

In the same way that someone who talks far too much is exhibiting logorrhea, or excessive word-iness, someone who uses far too many digits in their numbers is exhibiting psiphiorrhea, or excessive digit-iness. No amount of explaining about significant figures or measurement error will convince the psiphiorrhea sufferer that their numbers are absurd.

What is +ACY- doing in the data?

BASHing data (Posted by Bob_Mesibov on Jul 13, 2021 10:32 PM EDT)
Story Type: Tutorial; Groups: Linux

Almost all the data files I audit are in UTF-8, but the files have often started out in other encodings. This can lead to some hilarious mojibake and loads of fun for me as I try to reverse the encoding conversion failures. Last week a file appeared with mojibake I'd never seen before.
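The summary doesn't say what the post concluded, but one thing worth knowing is that `+ACY-` is how an ampersand is written in the UTF-7 encoding. Python ships a `utf-7` codec, so a suspect string can be tested quickly. The sample string below is made up for illustration.

```python
def decode_utf7(text):
    """Decode ASCII text that contains UTF-7 escape sequences such as +ACY-."""
    # UTF-7 is carried entirely in ASCII bytes, so encode to ASCII first.
    return text.encode("ascii").decode("utf-7")

# Hypothetical data item containing the sequence from the post's title.
repaired = decode_utf7("Smith +ACY- Jones")
```

If the decoded text reads sensibly, as it does here with "Smith & Jones", a UTF-7 conversion somewhere upstream is a plausible cause of the mojibake.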