csvsed

csvsed is a command to do find-and-replace on a per-column basis.

For example, given a data.csv like this:

id,name
1,Mrs. Adams
2,Miss Miller
3,Mrs Smith

Replace by patternTKTK:

$ csvsed '(Mrs|Miss|Ms)\.?' 'Ms.' data.csv
id,name
1,Ms. Adams
2,Ms. Miller
3,Ms. Smith

Usage reference

-c, --columns <columns_list>

A comma separated list of column indices, names or ranges to be affected, e.g. “1,id,3-5”. Defaults to all columns.

-m, --match-literal

By default, [PATTERN] is assumed to be a regular expression. Set this flag to do a literal match and replacement.

-F, --filter

Only return rows that matched [PATTERN]. This has the same effect as operating on data filtered and piped from csvgrep -r (or -m) [PATTERN]

The main reason to use this is for terseness.

Given data.csv like this:

id,name,val
1,2,3x
3,4,5
6,7y,8z

And TK:

$ csvsed -F '[a-z]' '%' data.csv
id,name,val
1,2,3%
6,7%,8%
$ csvsed -F -c 1-20  'pattern' 'replace' data.csv

Versus TKTK:

$ csvgrep --any-match -c 1-20 -r pattern data.csv | csvsed -c 1-20 'pattern' 'replace'

High level description

Like sed, but on a per-column basis

Example:

$ csvsed "Ab[bi].+" "Abby" -E "(B|R)ob.*" "\1ob" -E "(?:Jack|John).*" "John"  examples/aliases.csv


id,to,from
1,Abby,Bob
2,Bob,John
3,Abby,John
4,John,Abner
5,Rob,John
6,Jon,Abby
7,Rob,Abby

Real-world use cases

Using csvsed to clean up the SSA babynames data

Continued from: Adding a header to the Social Security babynames data

Given file babynames-yob.csv

year,name,sex,count
yob1880.txt,Mary,F,7065
yob1880.txt,Anna,F,2604
yob1880.txt,Emma,F,2003
...
yob2018.txt,Zyrie,M,5
yob2018.txt,Zyron,M,5
yob2018.txt,Zzyzx,M,5

TK you can do:

TK make test

$ csvsed 'yob(\d{4}).txt' '\1' babynames-yob.csv
  year,name,sex,count
  1880,Mary,F,7065
  1880,James,F,22
  1880,Leslie,F,8
  1880,James,M,5927