BASH script and regexp of the day

Here’s my BASH script and regexp of the day.
I’m not particularly proficient in BASH scripting (in fact, not proficient at all), so apologies if there are some obvious improvements to be made. Still, I hope you find the script useful.

The purpose of the script is to print the values of a CSV file’s column, looked up by column name. The file must be comma-separated (not semicolon-separated Excel style, FYI). It may contain quoted values with commas in them, and quotes escaped by doubling them, e.g. "this is a value","this is another value with "" (quote) symbol".
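To show how the regexps cope with that format, here is a small sketch that runs the script’s two-stage pattern over the example line above. The patterns are the same as in the script, only split into variables; the names line, sep_field, field and nth_field are my own, and the final sed (which drops the leading comma) is an addition that the script itself does not do.

```bash
#!/bin/bash
# Sample line in the format described above: quoted values may contain
# commas, and a quote inside a quoted value is written as "".
line='"this is a value","this is another value with "" (quote) symbol",plain'

# The script's building blocks (same patterns, only split into variables):
sep_field='(("[^"]*")*,|[^,"]*,)'  # one field followed by its comma
field='(("[^"]*")*|[^,"]*)'        # one field: quoted chunk(s) or unquoted text

# Print field number $1 (zero-based) of $line the way the script extracts it:
# stage 1 keeps everything up to the end of that field, stage 2 keeps only
# the last field of that prefix; sed then drops the leading comma.
nth_field() {
	printf '%s\n' "$line" \
		| grep -o -E "^${sep_field}{$1}${field}" \
		| grep -o -E "(^|,)${field}\$" \
		| sed 's/^,//'
}

nth_field 0   # "this is a value"
nth_field 1   # "this is another value with "" (quote) symbol"
nth_field 2   # plain
```

Note that the quotes and the doubled "" are kept as they are in the file; the regexps only find the field boundaries.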

Parameters are: <file name> <column name>, optionally followed by -n or -u. With -n, each value is prefixed with its line number (line 1 being the first row after the header); with -u, only unique values are printed, sorted. Note: you can’t use both at the same time; -u takes precedence.
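Under the hood, -n is just grep -n on the last stage of the pipeline and -u is sort | uniq. A minimal sketch on a made-up three-row file (the sample data and the names csv, prefix, last and column_prefixes are hypothetical) shows what each flag produces for column 1:

```bash
#!/bin/bash
# Hypothetical sample CSV: a header row plus three data rows.
csv='name,city
Alice,Kyiv
Bob,Lviv
Carol,Kyiv'

# The script's two patterns for column 1 (zero-based), i.e. "city".
prefix='^(("[^"]*")*,|[^,"]*,){1}(("[^"]*")*|[^,"]*)'
last='(^|,)(("[^"]*")*|[^,"]*)$'

# Drop the header, then cut each row down to "everything up to column 1".
column_prefixes() {
	printf '%s\n' "$csv" | tail -n +2 | grep -o -E "$prefix"
}

# -n: grep -n numbers the rows it receives, so 1 is the first data row.
column_prefixes | grep -n -o -E "$last"

# -u: sorted, de-duplicated values.
column_prefixes | grep -o -E "$last" | sort | uniq
```

Since the numbers come from grep -n on the intermediate output, a row that stage 1 cannot match (for example, one with fewer fields than the requested column) is skipped, and the numbers after it no longer line up with the file’s rows.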

The script also prints the number of the column (counted from zero, so the first column is number 0).
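To make the numbering concrete, here is the column lookup pulled out into a function (the name column_number is my own). It uses the same pattern as the script, so it shares its limitations: the column name is treated as a regular expression and matched as a substring, if several header names contain it the last one wins, and commas inside quoted header names are counted too.

```bash
#!/bin/bash
# Zero-based column index, computed the same way as the script:
# match from the start of the header up to the column name, then count commas.
column_number() {
	local header=$1 column=$2 match
	match=$(printf '%s\n' "$header" | grep -o -E "(,|^).*$column[^,]*")
	[ -n "$match" ] || return 1   # column not found
	# tr removes the padding some wc implementations add.
	printf '%s\n' "$match" | grep -o ',' | wc -l | tr -d ' '
}

column_number 'id,name,city' 'id'    # prints 0
column_number 'id,name,city' 'city'  # prints 2
```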

The script was written and tested against GNU grep (version 2.12 in particular). It doesn’t work correctly for column #1 on OS X because of the odd way BSD grep treats the pattern "^x{1}". Mac users are advised to install GNU grep, e.g. from Rudix or Homebrew.
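If the script has to run on both systems, one option is to pick GNU grep explicitly. The sketch below assumes Homebrew’s convention of installing GNU grep under the name ggrep; the variable name GREP is my own, and the script’s grep calls would have to be changed to "$GREP" for this to have any effect.

```bash
#!/bin/bash
# Prefer GNU grep. On macOS, Homebrew's "grep" formula installs GNU grep
# as "ggrep", so try that name first and fall back to plain "grep".
if command -v ggrep >/dev/null 2>&1; then
	GREP=ggrep
else
	GREP=grep
fi

# GNU grep prints "grep (GNU grep) <version>" as the first line of --version;
# newer BSD grep prints "(BSD grep, GNU compatible)", hence the exact match.
if "$GREP" --version 2>/dev/null | head -n 1 | grep -qF '(GNU grep)'; then
	echo "Using GNU grep: $GREP"
else
	echo "Warning: $GREP does not look like GNU grep; column #1 may come out wrong." >&2
fi
```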

You’re welcome (-:

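#!/bin/bash
# Usage: <script> <file name> <column name> [-n|-u]
#   -n  prefix values with line numbers (1 = first row after the header)
#   -u  print only unique values, sorted; takes precedence over -n

if [ $# -lt 2 ]
then
	echo "Usage: $0 <file name> <column name> [-n|-u]" >&2
	exit 1
fi

file=$1
column=$2
unique=
addp=

# Flags are recognised in any position, but file and column name must come first.
for p in "$@"
do
	if [ "$p" == "-u" ]
	then
		unique="true"
	fi
	if [ "$p" == "-n" ]
	then
		addp=-n
	fi
done

# Header text from the start of the line up to the end of the matching column name.
colMatch=$(head -n 1 "$file" | grep -o -E "(,|^).*$column[^,]*")

if [ "$colMatch" == "" ]
then
	echo "Column not found." >&2
	exit 1
fi

# Zero-based column number: the commas before the column name (last grep strips wc padding).
colNumber=$(printf '%s\n' "$colMatch" | grep -o "," | wc -l | grep -o -E "[0-9]+")

# Stage 1: everything up to the end of field $colNumber (quoted fields may contain commas and "").
prefixRe="^((\"[^\"]*\")*,|[^,\"]*,){$colNumber}((\"[^\"]*\")*|[^,\"]*)"
# Stage 2: the last field of that prefix (keeps its leading comma unless it is column 0).
lastRe='(^|,)(("[^"]*")*|[^,"]*)$'

echo "Column number: $colNumber. Column name: $(head -n 1 "$file" | grep -o -E "$prefixRe" | grep -o -E "$lastRe")"

if [ "$unique" == "true" ]
then
	tail -n +2 "$file" | grep -o -E "$prefixRe" | grep -o -E "$lastRe" | sort | uniq
else
	# $addp is deliberately unquoted so that an empty value adds no argument.
	tail -n +2 "$file" | grep -o -E "$prefixRe" | grep $addp -o -E "$lastRe"
fi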
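Note that the values come out exactly as they appear in the file: every column except the first keeps its leading comma, and quoted values keep their quotes and "" escapes. If you need plain text, a small sed filter can clean them up. This is a sketch of my own, not part of the script; the name clean_values is hypothetical, and it assumes the output was produced without -n (line-number prefixes are not handled).

```bash
#!/bin/bash
# Turn raw values as printed by the script into plain text:
#  1. drop the leading comma kept for every column except the first,
#  2. strip the surrounding quotes of a quoted value,
#  3. turn each escaped "" back into a single ".
clean_values() {
	sed -e 's/^,//' -e 's/^"\(.*\)"$/\1/' -e 's/""/"/g'
}

printf '%s\n' ',"this is another value with "" (quote) symbol"' ',plain' | clean_values
```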

Example of usage with results (tested on Ubuntu Linux):


A blog in the Ukrainian language

Hi everyone.

I’ve started a parallel Ukrainian-language blog at this address: mvmnua.wordpress.com

For now I’m trying to fill it with socially relevant content at the intersection with information technology (namely, home-made statistics (-: ).

All Ukrainian speakers are welcome.