Monday, May 29, 2017

Understanding Relational Algebra

When the relational model was introduced, in order to work with the model or in order to retrieve or update data in the model, languages were introduced. They are called as relational languages. Initially, two languages: Relational Algebra and Relational Calculus were introduced by Codd during 1971 as basis of relational languages.

When you start studying databases and its related languages, these are the two main languages you learn first, and of course, they are not much user-friendly. If you have worked with modern Database Management Systems such as Microsoft SQL Server or Oracle, then you know that how TSQL or PLSQL is powerful and richer when you compare to Relational Algebra and Relational Calculus. However, if you are to study database and related languages, it is always better to start with basis as it explains the fundamentals.

The Relational Algebra

It is a high-level procedural language. It describes how an operation is performed between two relations (tables: that is the word familiar to us that results another relation. The operation is described with expressions and it can be nested, means, an output of one relation can be used to performed another operation.


This language is considered as theoretical language as well as relation-at-a-time language. It manipulates all tuples (records) using one statement without looping them. There are many different types of operations in Relational Algebra but Codd originally proposed eight operations and they are categoried into two: Unary and Binary.

Unary Operations

There are two operations defined under this: Selection and Projection. They are called as unary operations since they operate on only one relation.

Binary Operations

There are six operations under this category: Cartesian Product, Union, Set Difference, Join, Intersection and Division. Unlike unary operations, they work on pairs of relations.

Here are some sample images I used in my classes. Note that SQLs can be written in different ways for getting the output. The one I have written is not the only way for getting the result.

Symbols

There are set of symbols that are used with operations, here are the standard symbols used with Relational Algebra.


Selection

This operation returns tuples from a single relation based on the specified predicate (condition). Multiple predicates can be added using AND, OR and NOT.


Projections

This operation returns a relation that contains a vertical subset of used relation. In other words, it returns set of tuples only with specified attributes.


Cartesian Product

This operation creates a relation combining two relations, concatenating every tuple in one relation with every tuple in other relation. In simple term, if the first table has 100 records with 5 attributes and other table has 50 records with 10 attributes, this creates an output with 5000 records (100 * 50) and 15 attributes (5 + 10).


Union

This operation makes a relation containing all tuples from first relation or second relation or both first and second relations. This eliminates duplicates. Remember, both relations used must be union-compatible.


Set Difference

This operation creates a relation containing tuples in first relation that are not in second relation. Just like the previous one, both relations must be union-compatible.


Intersection

This operation creates a relation containing tuples in both first and second relations. Both relations must be union-compatible.


Division

This operation creates a relation containing selected attributes in first relation, matching with every tuple in second relation. See the image; It tries to answer Which customers are registered from ALL the countries ranked as 2.


I have made a separate post on Joins. Here is the link for it: http://dinesql.blogspot.com/2017/06/relational-algebra-joins-theta-join-equijoin-natural-join-outer-join-semijoin.html

No comments: