Using awk to get values from XML
I recently had occasion to write a shell script which carries out several tasks relating to learning analytics. One such task is to find out the current academic week. Running MySQL/PHP operations from inside a shell script is not that easy.
As luck would have it, we have an in-house web service which provides this information. The output, however, is either JSON or XML. The XML version looks like this:
<?xml version='1.0' encoding='iso-8859-1' ?>
<weekno>32</weekno>
The Linux shell script stores this XML in a text file and first needs to get the contents of the weekno tag into a variable called $current.
current=$(awk -F '[<>]' '/weekno/{print $3}' weekno.txt)
Breakdown of the command
The current=$() part is command substitution: the output of the command inside the parentheses is stored in the variable ‘current’.
The -F switch specifies the field separator. Here it is the bracket expression [<>], so every < or > character on a line starts a new field.
The second part of the command says “only on lines which contain the text weekno, print the third field”.
So for the text <weekno>32</weekno>, field 1 is the empty string before the first <, field 2 is the tag name weekno, and field 3 is the value we are looking for, 32.
weekno.txt is the input for the awk command.
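If the field numbering is hard to picture, it can help to get awk to print every field it sees. The following is just a quick sketch for experimenting; the variable name fields is my own and is not part of the original script.

```shell
# Print each field with its number so the empty first field is visible.
# The square brackets in the output only mark where each field starts and ends.
fields=$(echo "<weekno>32</weekno>" |
  awk -F '[<>]' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }')

echo "$fields"
```

This should list five fields: an empty one, weekno, 32, /weekno and a final empty one, which is why the value sits in field 3.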
You can try this out by echoing some XML and piping it into awk. This has exactly the same effect as specifying a file as an input:
echo "<weekno>32</weekno>" | awk -F '[<>]' '/weekno/{print $3}'
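One thing to watch: counting fields only works when the weekno tag starts its own line. If the service were to return the XML declaration and the tag on a single line, field 3 would no longer be the week number. A possible variation, assuming that single-line layout (the sample string below is made up for the test), is to split on the weekno tags themselves so the value is always field 2:

```shell
# Hypothetical single-line response: declaration and tag on the same line.
xml="<?xml version='1.0' encoding='iso-8859-1' ?> <weekno>32</weekno>"

# Use the opening or closing weekno tag as the field separator.
# NF > 2 skips any line that does not contain both tags.
current=$(echo "$xml" | awk -F '</?weekno>' 'NF > 2 { print $2; exit }')

echo "$current"
```

This version also works when the tag is on its own line, since the declaration line has no weekno tags and is skipped.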