Showing posts with label Relational Algebra. Show all posts
Showing posts with label Relational Algebra. Show all posts

Saturday, June 17, 2017

Relational Algebra - Joins - Theta Join, Equijoin, Natural Join, Outer Join, Semijoin

I wrote a post on Relational Algebra that discusses most of operations related to it. This is the continuation of it and this explains Join Operations related to Relational Algebra. You may find that it is different with Joins we do now but it is the foundation for all type of joins we do with our relational databases.

Join Operations

Join is one of the main operations we perform against relations (tables) for retrieving data. It is done over the Cartesian Product of the two operand relations, using a SELECT statement with a Predicate. We are familiar with Joins like INNER JOIN, OUTER JOIN and CROSS JOIN but initially there were introduced with types like Theta Join, Equijoin, Natural Join, Outer Join and Semijoin. Modern DBMSs have enhanced these and have different implementations and that is why we do not see these types with mentioned names. But let's try to understand each of these types and how they are represented with Relational Algebra. Note that I have used different SQL Statement but it can be implemented using many techniques. Remember, almost all Joins are based on Cartesian Products.

Theta Join

This is based on a Predicate added to a Cartesian Product. In simple term, if you have joined two tables using CROSS JOIN, then you can add a filter to the result using one of the comparison operators. See the example given. Note that it can be implemented using SELECTION over a Cartesian Product as well.


Equijoin

This is same as Theta Join but the comparison operator is equal. Generally, if the operator of the Theta Join is equal operator (=), then the join is called as Equijoin instead of Theta Join, Here are two examples;



Natural Join

Natural Join is an Equijoin of two relations over all common attributes. In other words, when joining two tables, join is done using all common columns. Therefore, explicit Predicate is not required. See the sample given. I have used NATURAL JOIN which is not available with some DBMSs. Note that Common Attributes are not duplicated.


Outer Join

This join type includes both matching and no matching values from one relation and matching values from the other relation when two relations are joined. The relation that returns all tuples is determined using the Symbol used for the operation. If the Symbol is opened for the Left Relation, it is considered as the relation that returns all tuples. This is implemented using either LEFT or RIGHT in SQL.


Semijoin

Here is the last one. This join performs a join operations over two relations and projects over the attributes of first operand (or the relation). With this join, tuples can be limited for the join operation by adding a predicate, increasing the performance of the join operation.

 

Monday, May 29, 2017

Understanding Relational Algebra

When the relational model was introduced, in order to work with the model or in order to retrieve or update data in the model, languages were introduced. They are called as relational languages. Initially, two languages: Relational Algebra and Relational Calculus were introduced by Codd during 1971 as basis of relational languages.

When you start studying databases and its related languages, these are the two main languages you learn first, and of course, they are not much user-friendly. If you have worked with modern Database Management Systems such as Microsoft SQL Server or Oracle, then you know that how TSQL or PLSQL is powerful and richer when you compare to Relational Algebra and Relational Calculus. However, if you are to study database and related languages, it is always better to start with basis as it explains the fundamentals.

The Relational Algebra

It is a high-level procedural language. It describes how an operation is performed between two relations (tables: that is the word familiar to us that results another relation. The operation is described with expressions and it can be nested, means, an output of one relation can be used to performed another operation.


This language is considered as theoretical language as well as relation-at-a-time language. It manipulates all tuples (records) using one statement without looping them. There are many different types of operations in Relational Algebra but Codd originally proposed eight operations and they are categoried into two: Unary and Binary.

Unary Operations

There are two operations defined under this: Selection and Projection. They are called as unary operations since they operate on only one relation.

Binary Operations

There are six operations under this category: Cartesian Product, Union, Set Difference, Join, Intersection and Division. Unlike unary operations, they work on pairs of relations.

Here are some sample images I used in my classes. Note that SQLs can be written in different ways for getting the output. The one I have written is not the only way for getting the result.

Symbols

There are set of symbols that are used with operations, here are the standard symbols used with Relational Algebra.


Selection

This operation returns tuples from a single relation based on the specified predicate (condition). Multiple predicates can be added using AND, OR and NOT.


Projections

This operation returns a relation that contains a vertical subset of used relation. In other words, it returns set of tuples only with specified attributes.


Cartesian Product

This operation creates a relation combining two relations, concatenating every tuple in one relation with every tuple in other relation. In simple term, if the first table has 100 records with 5 attributes and other table has 50 records with 10 attributes, this creates an output with 5000 records (100 * 50) and 15 attributes (5 + 10).


Union

This operation makes a relation containing all tuples from first relation or second relation or both first and second relations. This eliminates duplicates. Remember, both relations used must be union-compatible.


Set Difference

This operation creates a relation containing tuples in first relation that are not in second relation. Just like the previous one, both relations must be union-compatible.


Intersection

This operation creates a relation containing tuples in both first and second relations. Both relations must be union-compatible.


Division

This operation creates a relation containing selected attributes in first relation, matching with every tuple in second relation. See the image; It tries to answer Which customers are registered from ALL the countries ranked as 2.


I have made a separate post on Joins. Here is the link for it: http://dinesql.blogspot.com/2017/06/relational-algebra-joins-theta-join-equijoin-natural-join-outer-join-semijoin.html