Server Side

Server related posts

Using the sort join and grep commands in Linux

Written by: Peter Fisher on February 25, 2012
Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedIn
Linux penguin Tux

The Linux penguin is called Tux. He is the official mascot of the Linux kernel. Tux was originally created by Larry Ewing in 1996

Imagine  you have two files, the first is a list of customers (customers.csv) and the second is a list of products (order_items.csv) that the customers have purchased.

You’ve been given the challenge to list all products  that a specific customer has purchased and order the output by the products price.

This article will demonstrate the Linux grep, sort and join commands as well as streams, pipes and redirection.

This is the customer.csv file:
[crayon lang=”bash”]
$ less customers.csv

1,Mr,Peter,Fisher
2,Mr,John,Smith
3,Mrs,Lisa,Mathews
[/crayon]

The first field is the unique customer id. This could be a primary key from a database. The second field is the customers title.  The third and forth fields are the customers first and last names

This is the order_items.csv file:
[crayon lang=”bash”]
$ less order_items.csv

1,1,Tyres,22.50,2
2,2,Cateye lights,39.50,1
3,1,Bike,550.95,1
4,3,Jacket,49.50,1
5,1,Gloves,25.60,1
[/crayon]

The first field is the order id.  This could be a primary key like the customer id. The second is a reference to the customer who purchased the product. This could be a foreign key which relates to the customer record. The third field is the products description.  The forth field is the products price and the last field is the products quantity.

Sorting the fields

If we jumped right in and joined both of these files together we would get the following error:

[crayon lang=”bash”]
$ join -t ‘,’ -1 1 -2 2 customers.csv order_items.csv

1,Mr,Peter,Fisher,1,Tyres,22.50,2
2,Mr,John,Smith,2,Cateye lights,39.50,1
join: file 2 is not in sorted order
3,Mrs,Lisa,Mathews,4,Jacket,49.50,1
[/crayon]

I will discuss the join command in more detail below. The reason the error  join: file 2 is not in sorted order is being displayed is due to the second field on the order_items.csv (customer_id) not being in  numerical order.

So before we can continue we must reorder this file by doing the following:

[crayon lang=”bash”]
$ sort -t ‘,’ -k 2 order_items.csv > order_items_sorted.csv
[/crayon]

The sort command has lots of options but the ones used here are -t and -k.

-t is the field separator. As this is a csv file the delimiter which splits the fields is a comma.

-k is the key in which to sort the file.  In this case we want to sort by the customer_id which is the second key.

The stdout stream is then redirected to the new order_items_sorted.csv file. As we are using > and not >> redirection the file will be overwritten if it already exists.

More details on the sort command can be found in the man pages

This new file contains the following:

[crayon lang=”bash”]
$ less order_items_sorted.csv

3,1,Bike,550.95,1
5,1,Gloves,25.60,1
1,1,Tyres,22.50,2
2,2,Cateye lights,39.50,1
4,3,Jacket,49.50,1
[/crayon]

As you can see its now ordered by the second field (customer id) and not the first.

Joining the files

Now the order items are sorted correctly we can join both of these files together to show who has bought what product.

The following command will join these file
[crayon lang=”bash”]
$ join -t ‘,’ -1 1 -2 2 customers.csv order_items_sorted.csv
[/crayon]

The join command is one of the few commands that require two input files.  Options can be used to tailor the output.

-t is the delimiter used to separate the fields .  As these files are in a csv format the delimiter is a comma

-1 represents the first file.  The value of 1 means that the first field is used in the join.  This means that the first file (in this case customer.csv will be joined on the first field (id)

-2 represents the second file.  The value of 2 means that the second field (customer_id) is used in the join. So the join command will join the first file (customer.csv) using the first field (id) against the second file (order_items.csv) using the second field (customer_id)

This is the stdout  of the join command.

[crayon lang=”bash”]
1,Mr,Peter,Fisher,3,Bike,550.95,1
1,Mr,Peter,Fisher,5,Gloves,25.60,1
1,Mr,Peter,Fisher,1,Tyres,22.50,2
2,Mr,John,Smith,2,Cateye lights,39.50,1
3,Mrs,Lisa,Mathews,4,Jacket,49.50,1
[/crayon]

So now we have joined the files together based on the customer id. We now need to sort the stdout so that its ordered by the most expensive price.

The following will take the stdout stream of the join command and pipe it as stdin to the sort command.

[crayon lang=”bash”]
$ join -t ‘,’ -1 1 -2 2 customers.csv order_items_sorted.csv | sort -t ‘,’ -k 7 -r
[/crayon]

The sort command reorders the stdin based on the seventh field and its search is reserved to get the most expensive product first.

[crayon lang=”bash”]
1,Mr,Peter,Fisher,3,Bike,550.95,1
3,Mrs,Lisa,Mathews,4,Jacket,49.50,1
2,Mr,John,Smith,2,Cateye lights,39.50,1
1,Mr,Peter,Fisher,5,Gloves,25.60,1
1,Mr,Peter,Fisher,1,Tyres,22.50,2
[/crayon]

Only show orders from a specific customer

Finally the above stdout needs to be narrowed down to show orders for a specific customer.  To do this we will pass the stdout as stdin to the grep command.  Grep is used to search a stdout stream for a specific value or pattern and return all the lines that match.

[crayon lang=”bash”]
$ join -t ‘,’ -1 1 -2 2 customers.csv order_items_sorted.csv | sort -t ‘,’ -k 7 -r | grep ‘Peter,Fisher’
[/crayon]

The above pipes the stdout of the sort command to the grep command which searches and returns all the lines that match ‘Peter,Fisher’. We now have the following output which solves this challenge.

[crayon lang=”bash”]
1,Mr,Peter,Fisher,3,Bike,550.95,1
1,Mr,Peter,Fisher,5,Gloves,25.60,1
1,Mr,Peter,Fisher,1,Tyres,22.50,2
[/crayon]

More details on the join, sort and grep commands can be found in their associated man pages

Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedIn

Peter Fisher is a web developer working in Gloucester UK. Founder of the digital agency Websomatic, author of this blog and the HowToCodeWell Youtube channel. Peter has over ten years of web development experience under his belt

Read all about Peter Fisher

Leave a Reply

Your email address will not be published. Required fields are marked *