SQL is a very handy language to explore large datasets across multiple tables and also very easy to interpret because it is just like basic English.
Contributed by: Aadish Jain
Sometimes, we can get confused when there are SQL Complex Queries, we end up entangled in a large number of where or join conditions. Apart from creating them once, we face a lot of challenges when we need to debug them or re-use them in the same or in different form.
Below are few scenarios when SQL queries becomes quite complex to write:
- When there are 3-4 tables to join on different attributes.
- When there is an uncertainty of which join conditions (Inner Join, Outer Join, Right Join, Left Join) we should use in which case.
- When the column required is not directly present in the database but needs to be fetched from 2-3 columns or by mathematical functions like Avg, maximum, minimum, etc.
- When there is a uncertainty over usage of some uncommon clause conditions like groupby, having, order by.
In this article, I’ll try to cover the “magic” behind writing complex SELECT statements.
Before you start to write (complex) queries you should understand what is where – which tables stored what data. Also, you should understand the nature of relations between these tables. We should always have schema of database in our mind and have the understanding of all tables and fields from a business point of view.
Having an overall picture of the database will spare a lot of time in the long run. Always invest some time in getting a complete understanding of your database before jumping on query making. After getting the complete understanding of the database schema and relationships between multiple tables, try to understand the business task of the problem statement and prepare a blueprint on a paper. Split the problem into smaller segments and then start working on them.
We should use proper indentation because only writing and running is not important. It should be easily legible by another person. Put each field to be selected on its own line, and put each new table joined onto on its own line, and each element of a ‘WHERE’ clause on its own line. Use meaningful aliases for tables to further aid readability. You are likely to need to refer to your tables multiple times within your query, and if you are working within someone else’s database with their naming convention, the table names can contain redundant information. For example, ‘Employee Details ’ — so you call a table like that ‘ED’, table name ‘Order header‘ can be aliased as ‘OH’ etc.
Figure 1: Schema for orders dataset
Let’s see how to create a complex SQL query with an example:
Problem statement is that we need to:
Write a query to display the order_id, customer id and customer full name of customers along with (product_quantity) as total quantity of products shipped for order ids > 10060 for the customers who bought more than 15 products per shipped order.
Here, we can understand that tables required are: online_customer, order_header, order_items which can be aliased as C,OH and OI respectively.
Let’s start:
Select a query by mentioning columns which are required from which tables.
So, orderID can be fetched from order header or order_id table. Customer_id and customer name can be fetched from the customer table.
Product Quantity for each order is available in the Order Items table. So, the query would be:
SELECT OI.ORDER_ID,C.CUSTOMER_ID,CONCAT(C.CUSTOMER_FNAME," ",C.CUSTOMER_LNAME) AS 'CUSTOMER_FULL_NAME', SUM(OI.PRODUCT_QUANTITY) AS 'TOTAL QUANTITY'
Now, we can see that we have to find out common orders from all three tables, so we should use INNER JOIN here.
Order Items and Order Header have Order_ID common in between them and Online Customers and Order Header have CustomerID as common field key. So we will use join conditions accordingly.
SELECT OI.ORDER_ID,C.CUSTOMER_ID,CONCAT(C.CUSTOMER_FNAME," ",C.CUSTOMER_LNAME) AS 'CUSTOMER_FULL_NAME',
SUM(OI.PRODUCT_QUANTITY) AS 'TOTAL QUANTITY' FROM online_customer C INNER JOIN ORDER_HEADER OH ON
C.CUSTOMER_ID = OH.CUSTOMER_ID INNER JOIN order_items OI ON OH.ORDER_ID = OI.ORDER_ID
Now the task is to fetch all these details for order IDs > 10060,so we can understand we need to put where clause to filter out the orders having order id <= 10006.Also the order status should be Shipped in order header table.
So the updated query would be:
SELECT OI.ORDER_ID,C.CUSTOMER_ID,CONCAT(C.CUSTOMER_FNAME," ",C.CUSTOMER_LNAME) AS 'CUSTOMER_FULL_NAME',
SUM(OI.PRODUCT_QUANTITY) AS 'TOTAL QUANTITY' FROM online_customer C INNER JOIN ORDER_HEADER OH ON
C.CUSTOMER_ID = OH.CUSTOMER_ID INNER JOIN order_items OI ON OH.ORDER_ID = OI.ORDER_ID
WHERE OH.ORDER_ID > 10060 AND OH.ORDER_STATUS = 'Shipped'
We need to take care of things whenever we have to use some data aggregation like sum, avg, max, min, etc., then we need to specify on what basis that aggregation would be performed and that condition is specified in GROUPBY clause.
So here in our example, we have to calculate the sum of product quantity for all the products ordered in a particular OrderID, so we will use GROUPBY ORDER_ID here.
So updated query would be:
SELECT OI.ORDER_ID,C.CUSTOMER_ID,CONCAT(C.CUSTOMER_FNAME," ",C.CUSTOMER_LNAME) AS 'CUSTOMER_FULL_NAME',
SUM(OI.PRODUCT_QUANTITY) AS 'TOTAL QUANTITY' FROM online_customer C INNER JOIN ORDER_HEADER OH ON
C.CUSTOMER_ID = OH.CUSTOMER_ID INNER JOIN order_items OI ON OH.ORDER_ID = OI.ORDER_ID
WHERE OH.ORDER_ID > 10060 AND OH.ORDER_STATUS = 'Shipped' GROUP BY OI.ORDER_ID
If we have to put conditions on aggregated data then it cannot be handled by WHERE clause and we have to use HAVING clause for the same.
Like in our example, we need to filter out the orders for which the total quantity of products in an order was less than 15.
So our updated final query would be:
SELECT OI.ORDER_ID,C.CUSTOMER_ID,CONCAT(C.CUSTOMER_FNAME," ",C.CUSTOMER_LNAME) AS 'CUSTOMER_FULL_NAME',
SUM(OI.PRODUCT_QUANTITY) AS 'TOTAL_QUANTITY' FROM online_customer C INNER JOIN ORDER_HEADER OH ON
C.CUSTOMER_ID = OH.CUSTOMER_ID INNER JOIN order_items OI ON OH.ORDER_ID = OI.ORDER_ID
WHERE OH.ORDER_ID > 10060 AND OH.ORDER_STATUS = 'Shipped' GROUP BY OI.ORDER_ID HAVING TOTAL_QUANTITY > 15 ;
So this is how we can easily create complex SQL queries by understanding data and breaking up the query in smaller chunks. If you wish to learn more such concepts, head over to Great Learning Academy and learn through Free Online Courses.