BASH script and regexp of the day

Here’s my BASH script and regexp of the day.
I’m not particularly proficient in BASH scripting, so apologies if there are obvious improvements to be made. I hope you find the script useful anyway.

The purpose of the script is to get the values of a CSV column by column name. The CSV file must be comma-separated (not semicolon-separated, Excel style) and may contain quoted values with commas, as well as quotes escaped by doubling them (e.g. "this is a value","this is another value with "" (quote) symbol").
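To make the quoting rules concrete, here is a minimal sketch that applies the same two grep patterns to a single line instead of a file. The helper name csv_field is hypothetical and not part of the script. Like the script, it assumes GNU grep, counts columns from zero, and keeps the leading comma for every column except the first.

```shell
#!/bin/bash
# Hypothetical helper (illustration only): print field number $2
# (counted from zero) of the single CSV line passed as $1.
csv_field() {
    local line=$1 n=$2
    # Stage 1: keep the line from its start up to the end of field n.
    # A field is either a run of quoted chunks ("..."), which also covers
    # doubled quotes, or a run of characters without commas and quotes.
    # Stage 2: keep only the last field of that prefix.
    printf '%s\n' "$line" |
        grep -o -E "^((\"[^\"]*\")*,|[^,\"]*,){$n}((\"[^\"]*\")*|[^,\"]*)" |
        grep -o -E '(^|,)(("[^"]*")*|[^,"]*)$'
}

# prints: ,"this is another value with "" (quote) symbol"
csv_field '"this is a value","this is another value with "" (quote) symbol"' 1
```

The comma inside a quoted value is skipped because the text after it never forms a complete field that runs to the end of the line.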

Parameters are: <file name> <column name>. An additional parameter, -n or -u, can be used to either output line numbers or output only unique values. Note: you can’t use both at the same time; -u takes precedence.

The script also prints the number of the column (counted from zero).

The script was written and tested against GNU grep (version 2.12 in particular). It doesn’t work correctly for column #1 on OS X because of the odd way BSD grep treats the condition “^x{1}”. Mac users are advised to install GNU grep from a source such as Rudix or Homebrew.
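If you want to be warned before running the script against a different grep, something like the following could be used. This is only a sketch: the helper name is_gnu_grep_banner is made up, and it assumes that GNU grep identifies itself with the words "GNU grep" in the first line of its --version output (for example "grep (GNU grep) 2.12").

```shell
#!/bin/bash
# Hypothetical helper: succeeds if a "grep --version" banner names GNU grep.
# The banner format is an assumption, e.g. "grep (GNU grep) 2.12".
is_gnu_grep_banner() {
    case $1 in
        *"GNU grep"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Take the first line of the version output; empty if grep has no --version.
banner=$(grep --version 2>/dev/null | head -n 1)
if is_gnu_grep_banner "$banner"
then
    echo "GNU grep found: $banner"
else
    echo "Warning: grep does not look like GNU grep, results may be wrong." >&2
fi
```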

You’re welcome (-:

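#!/bin/bash
# Usage: getcsv <file name> <column name> [-n | -u]

unique=
addp=

# Look for the optional flags anywhere in the argument list.
for p in "$@"
do
     if [ "$p" == "-u" ]
     then
          unique="true"
     fi
     if [ "$p" == "-n" ]
     then
          addp=-n
     fi
done

# Match the header line from its start up to the end of the requested column name.
colMatch=$(head -n 1 "$1" | grep -E "(,|^).*$2[^,]*" -o)

if [ "$colMatch" == "" ]
then
	echo "Column not found."
else
	# The number of commas in the match is the column number, counted from zero.
	# The last grep strips the padding some wc implementations add.
	colNumber=$(head -n 1 "$1" | grep -E "(,|^).*$2[^,]*" -o | grep "," -o | wc -l | grep -E "[0-9]+" -o)
	echo "Column number: $colNumber. Column name: $(head -n 1 "$1" | grep -o -E "^((\"[^\"]*\")*,|[^,\"]*,){$colNumber}((\"[^\"]*\")*|[^,\"]*)" | grep -o -E '(^|,)((\"[^\"]*\")*|[^,\"]*)$')"

	# For every data line: keep the first colNumber+1 fields, then only the last of them.
	if [ "$unique" == "true" ]
	then
		tail -n "+2" "$1" | grep -o -E "^((\"[^\"]*\")*,|[^,\"]*,){$colNumber}((\"[^\"]*\")*|[^,\"]*)" | grep -o -E '(^|,)((\"[^\"]*\")*|[^,\"]*)$' | sort | uniq
	else
		# $addp is left unquoted on purpose: when empty it must expand to nothing.
		tail -n "+2" "$1" | grep -o -E "^((\"[^\"]*\")*,|[^,\"]*,){$colNumber}((\"[^\"]*\")*|[^,\"]*)" | grep $addp -o -E '(^|,)((\"[^\"]*\")*|[^,\"]*)$'
	fi
fi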
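The column lookup is the least obvious part, so here is a small sketch of it in isolation. The function name col_number is hypothetical; it takes the header line as a string rather than reading a file, and otherwise uses the same pipeline as the script: match the header up to the column name, then count the commas in the match.

```shell
#!/bin/bash
# Hypothetical helper (illustration only): print the zero-based number of
# column $2 in the header line $1, or return 1 if the name is not found.
col_number() {
    local header=$1 name=$2 match
    # Everything from the start of the header up to the end of the column name.
    match=$(printf '%s\n' "$header" | grep -o -E "(,|^).*$name[^,]*") || return 1
    # Count the commas; the last grep strips any padding wc adds.
    printf '%s\n' "$match" | grep -o "," | wc -l | grep -o -E "[0-9]+"
}

col_number "One,Two,Three" One     # prints 0
col_number "One,Two,Three" Three   # prints 2
col_number "Id,UserId" Id          # prints 1, not 0
```

The last call shows a limitation of this approach: the name is matched as a substring and the match is greedy, so a name that also occurs inside a later column name resolves to that later column. The name is also interpreted as a regular expression, so special characters in it would need escaping.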

Example usage with results (tested on Ubuntu Linux):

$ cat example.csv 
One,Two,Three
x,,
,y,
,,z
11,12,13
21,22,23
31,32,33
"1,1","1,2","1,3"
"2,1","2,2","3,3"
"3,1","3,2","3,3"
"1""1","1""2","1""3"
"2""1","2""2","3""3"
"3""1","3""2","3""3"

$ getcsv example.csv One
Column number: 0. Column name: One
x
11
21
31
"1,1"
"2,1"
"3,1"
"1""1"
"2""1"
"3""1"

$ getcsv example.csv Two
Column number: 1. Column name: ,Two
,
,y
,
,12
,22
,32
,"1,2"
,"2,2"
,"3,2"
,"1""2"
,"2""2"
,"3""2"

$ getcsv example.csv Three
Column number: 2. Column name: ,Three
,
,
,z
,13
,23
,33
,"1,3"
,"3,3"
,"3,3"
,"1""3"
,"3""3"
,"3""3"
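As the output above shows, values from every column except the first come with a leading comma. If that gets in the way, one option is to strip it in a final pipeline step. The filter below is a sketch and not part of the script; the name strip_leading_comma is made up, and it only handles output produced without -n (with -n each line starts with a line number, not a comma).

```shell
#!/bin/bash
# Hypothetical filter (illustration only): remove one leading comma
# from each line read on standard input.
strip_leading_comma() {
    sed 's/^,//'
}

# Sample lines in the shape the script prints for the second column.
printf '%s\n' ',12' ',"1,2"' ',"1""2"' | strip_leading_comma
```

With the script installed as getcsv, this could be used as: getcsv example.csv Two | strip_leading_comma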