Dinesh's Blog :::: Being Compiled ::::: NoSQL


Thursday, August 18, 2016

NoSQL, Not only SQL, NewSQL: Are they the same?

Last year, I wrote a post on NoSQL document databases, What is NoSQL Document Database? What is Azure DocumentDB?, which covers document databases and how they store data. Although that post explains NoSQL as well, it does not cover related terms such as Not Only SQL or NewSQL, so I thought I would write a post on them.

NoSQL in simple terms

We talk about NoSQL when we talk about different types of databases (or storage). Most of our projects have been built on relational databases, but certain scenarios need a different structure to hold the data, specifically when the data is semi-structured or unstructured. As I mentioned in the previous post, NoSQL is not a new database type; it is as old as relational databases. In simple terms, a NoSQL database is a non-relational database that allows us to store mainly large amounts of semi-structured data.

NoSQL has become prevalent in recent years for two reasons:

Previously ignored Big Data has become important.
Generally, applications ignore some data not because it is unimportant but because it is difficult to process with traditional DBMSs. Large streams of data and semi-structured or unstructured data coming from various equipment are some examples. With modern technologies and techniques, it has become easy to process such data, hence previously ignored data now needs to be captured and stored. This is one of the reasons NoSQL databases have become popular.
Less expensive scaled-out solutions
Processing large datasets, specifically data without a proper structure, needs a good infrastructure solution. Initially, the required infrastructure was very expensive, but now, with powerful commodity hardware, virtualization technologies, and cloud services, the cost has come down relatively. Not only that, since some services can be taken on demand and paid for as you go, using them has become easy and affordable. This has definitely become another factor in the higher usage of NoSQL.

Not Only SQL

Although NoSQL databases are non-relational, some of them now support SQL as well, hence they are sometimes referred to as Not Only SQL.

NewSQL

NewSQL refers to relational databases that offer NoSQL capabilities. They offer similar scalability and allow us to work with semi-structured and unstructured data as we do with NoSQL databases.

You can see the types of NoSQL databases available at this link: http://nosql-database.org/

Wednesday, October 28, 2015

Loading data from Microsoft SQL Server to DocumentDB in Azure

In my previous post (How to create a DocumentDB in Azure), I mentioned that there are multiple ways of loading documents into DocumentDB in Azure. This post continues from there; let's see how we can load data from SQL Server into DocumentDB using the Data Migration Tool. The Data Migration Tool is a free, open-source tool that can be used for loading data into DocumentDB from many different sources. It converts loaded data into hierarchical JSON documents and supports loading data from Microsoft SQL Server, MongoDB, JSON, CSV, Azure Table Storage and many more.

The Data Migration Tool can be downloaded from the Microsoft Download Center. However, there is an easier way: a link to the page is available on the DocumentDB account blade.

Go to the page and download the compressed file. Once the files are extracted, you will see two executable files: dt.exe and dtui.exe.

Click dtui.exe to open the tool. As usual, a welcome screen appears, followed by the screen for selecting the source. Since we are loading data from SQL Server, select SQL under Import from. You need to add the connection string manually (you can get it generated by creating a UDL file); make sure it is added without the provider.
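A connection string generated through a UDL file usually starts with a Provider entry, which has to be dropped before pasting the string into the tool. Here is a minimal Python sketch of that clean-up. The function name strip_provider and the sample string (server, security settings) are assumptions for illustration only, and the split on semicolons is naive: it does not handle quoted values that contain semicolons.

```python
def strip_provider(connection_string: str) -> str:
    """Return the connection string without its Provider entry."""
    entries = [e.strip() for e in connection_string.split(";") if e.strip()]
    kept = [
        e for e in entries
        if e.split("=", 1)[0].strip().lower() != "provider"
    ]
    return ";".join(kept) + ";"


# Sample UDL-style string; every value here is a placeholder.
udl_string = (
    "Provider=SQLOLEDB.1;Integrated Security=SSPI;"
    "Initial Catalog=AdventureWorksDW2014;Data Source=localhost"
)

cleaned = strip_provider(udl_string)
print(cleaned)
```

The remaining entries are kept in their original order, so the result can be pasted into the tool as it is.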

Let's try to load data from the AdventureWorksDW2014 database. This is the query for loading customers with their locations.

USE AdventureWorksDW2014;
GO

SELECT
 c.CustomerKey AS [key]
 , c.FirstName + ' ' + c.LastName AS [Name]
 , c.AddressLine1 AS [Location.Street]
 , g.City AS [Location.City]
 , g.StateProvinceName AS [Location.Province]

FROM dbo.DimCustomer c
 INNER JOIN dbo.DimGeography g
  ON c.GeographyKey = g.GeographyKey;

Examine the query. Note how the aliases have been set, specifically for the columns related to the address. If you need a hierarchy inside the document, this is the way of instructing the tool to create it. As per this query, a document will contain the properties key, Name, and Location, and Location will have three more properties: Street, City and Province.

Once the query is placed, enter a dot (".") for Nesting Separator to instruct the tool that the dot separates nested properties.
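To make the effect of the nesting separator concrete, here is a small Python sketch of how a flat row with dotted column aliases could be folded into a hierarchical document. It only illustrates the idea and is not the tool's actual implementation; the function name nest and the sample row values are assumptions.

```python
import json


def nest(flat_row: dict, separator: str = ".") -> dict:
    """Fold keys such as 'Location.City' into nested dictionaries."""
    document = {}
    for column, value in flat_row.items():
        *parents, leaf = column.split(separator)
        node = document
        for name in parents:
            # Create the intermediate object the first time it is seen.
            node = node.setdefault(name, {})
        node[leaf] = value
    return document


# One row shaped like the result of the query above (sample values).
row = {
    "key": 11000,
    "Name": "Jon Yang",
    "Location.Street": "3761 N. 14th St",
    "Location.City": "Rockhampton",
    "Location.Province": "Queensland",
}

document = nest(row)
print(json.dumps(document, indent=2))
```

The three Location.* columns end up as one Location object with Street, City and Province properties, while key and Name stay at the top level.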

Export to
The next screen is for setting the target. There are multiple options for Export to. Since this loads only 18,484 documents, we can use DocumentDB - Bulk Import rather than the other items such as DocumentDB - Sequential record import and JSON file. Bulk Import creates a stored procedure in the collection and uses it for loading documents. It is recommended to use the second option, DocumentDB - Sequential record import, for large sets of documents because it is faster than bulk load. The third option, JSON file, does not load documents into DocumentDB but creates JSON files.

Connection string
Setting the connection string is easy. All we have to do is copy the primary connection string from Keys in the DocumentDB account blade and add the database name to it. Note that the database name has to be added manually.
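As a rough sketch of what "adding the database name" looks like, the Python snippet below appends a Database entry to a primary connection string. It assumes the tool expects the form AccountEndpoint=...;AccountKey=...;Database=...; and the account name and key shown are placeholders, not real values. The helper name with_database is hypothetical.

```python
def with_database(primary_connection_string: str, database: str) -> str:
    """Append a Database entry to a DocumentDB primary connection string."""
    base = primary_connection_string.strip()
    if not base.endswith(";"):
        base += ";"
    return f"{base}Database={database};"


# Placeholder endpoint and key; copy the real string from the Keys blade.
primary = (
    "AccountEndpoint=https://example-account.documents.azure.com:443/;"
    "AccountKey=<your-key>"
)

full = with_database(primary, "db01")
print(full)
```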

Collection
We can load documents into an existing collection or create a new one. If the collection entered on this screen does not exist, the tool will create one for us with the selected performance tier.

Id field
If you leave the Id input box blank, the tool will create an Id for each document; otherwise, we can instruct it to use one of our columns. As you see, I have instructed it to use the key column as the Id of the documents.
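The two behaviours described here can be pictured with a short Python sketch: take the Id from a named column when one is given, otherwise generate one. This is an illustration under assumptions, not the tool's code; the function name assign_id is hypothetical, the generated value is a random UUID chosen for the example, and whether the tool keeps the source column alongside the Id is not covered here.

```python
import uuid
from typing import Optional


def assign_id(document: dict, id_field: Optional[str] = None) -> dict:
    """Return a copy of the document that carries an 'id' property."""
    result = dict(document)
    if id_field is not None:
        # Document ids are strings, so the column value is converted.
        result["id"] = str(result[id_field])
    elif "id" not in result:
        result["id"] = str(uuid.uuid4())
    return result


customer = {"key": 11000, "Name": "Jon Yang"}

with_key_as_id = assign_id(customer, id_field="key")
with_generated_id = assign_id(customer)

print(with_key_as_id["id"])
print(with_generated_id["id"])
```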

Indexing policy
We can add a policy that describes how we search documents and what type of index we prefer. We do not need to write it by hand; right-click the input box to get the menu and select the required index. There are three options:

Default - Efficient indexing for numbers, supporting equality checks, range checks and sorting. Strings are indexed for equality only; range searches on strings cause a scan and sorting on strings is not supported.
Hash - Supports only equality checks for both numbers and strings; no support for range checks or sorting.
Range - Efficient indexing for both numbers and strings, including sorting, but may use extra space and resources.

Now it is ready for loading; go through the summary and complete it.

Once the documents are loaded, we can check in the collection whether they were loaded as we instructed.

Tuesday, October 27, 2015

How to create a DocumentDB in Azure

DocumentDB, Microsoft's implementation of a NoSQL database, is a cloud-based, fully managed, massively scalable service for managing semi-structured data in Azure. Just like other NoSQL products, it supports schema-free JSON documents and, unlike other NoSQL platforms, it automatically indexes all properties in a document as soon as the document is added. Properties in added documents are easily queryable using SQL (not exactly the SQL used for relational databases) and any property in the hierarchy is searchable.

This post covers creating a DocumentDB in Azure with a simple set of steps. If you are not familiar with NoSQL, read this post: What is NoSQL Document Database? What is Azure DocumentDB?

Let's create a DocumentDB. Log in to the new Azure portal (https://portal.azure.com/) using your Azure account. Then browse for DocumentDB Accounts and click on it to open the blade for DocumentDB creation.

Once it is opened, the first thing you need to do is create an account. Click on Add and enter a globally unique name for the DocumentDB Account.

The DocumentDB Account is created and it is available as .documents.azure.com. It still does not have a database (see image). To create a database under the account, click on the Add Database button and give your database a name. As you see, I have named it db01.

Once the database is created, you should see it in the Databases grid in the DocumentDB Account blade. Every component in DocumentDB has a ResourceID and every DocumentDB database needs at least one Collection.

What is a Collection?

A Collection is 10GB of storage that can be created inside the database. It is not only for handling storage; it also determines how much you pay Microsoft. DocumentDB pricing is based on Collections, and the price changes based on the Pricing Tier assigned to the collection. Billing is hourly, and scaling up and down is done by adding or removing collections.

You can open the database blade by clicking the name of the database created and add a Collection by clicking the Add Collection button. The Add Collection blade allows you to enter the name of the collection, select the Pricing Tier, and select the Indexing Policy.

What is Request Units per Second?

As you see, a Collection can be assigned to one of three Pricing Tiers: S1, S2 or S3. The storage offered with each tier is the same but the RU differs. RU, Request Units per Second, simply means how many requests can be performed against the Collection per second. They can be read requests or write requests, but the count is not exactly the number of requests made; it is the throughput needed for the request considering its usage of resources such as CPU, memory and IO operations. As I have read, a read request for a document smaller than 1KB generally requires one RU, but a write request for a document of the same size requires more than one RU because writing costs more than reading. A complex request that needs to update documents with many indexes may need more RUs. How do you determine the required RU? You can do so by looking at the dashboards given with the blade, which show all requests made and the RUs used for them.
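For a rough feel of the arithmetic, here is a small Python sketch that estimates the RUs a workload needs per second and picks the smallest tier that covers it. Treat every number as an assumption: the tier capacities (250, 1000 and 2500 RU per second for S1, S2 and S3) and the cost of 5 RUs per write are placeholders to be checked against the current pricing page and the dashboards, and only the figure of one RU per small read comes from the text above. The function names are hypothetical.

```python
# Assumed RU-per-second capacity of each tier; verify before relying on it.
TIER_RU_PER_SECOND = {"S1": 250, "S2": 1000, "S3": 2500}


def required_ru(reads_per_second: float, writes_per_second: float,
                ru_per_read: float = 1.0, ru_per_write: float = 5.0) -> float:
    """Estimate the RUs needed per second for a simple read/write mix."""
    return reads_per_second * ru_per_read + writes_per_second * ru_per_write


def smallest_tier(ru_needed: float):
    """Return the first tier whose capacity covers ru_needed, else None."""
    for tier, capacity in TIER_RU_PER_SECOND.items():
        if ru_needed <= capacity:
            return tier
    return None


needed = required_ru(reads_per_second=150, writes_per_second=30)
print(needed, smallest_tier(needed))
```

With these assumed costs, 150 reads and 30 writes per second need 300 RUs, which no longer fits in the assumed S1 capacity. Real request charges depend on document size and indexing, so the dashboards remain the reliable source.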

Here is an image that explains the model:

The image was taken from DocumentDB hierarchical resource model and concepts. Read it for more info.

You can start with whichever tier you want for the Collection. Once the tier is selected, the Indexing Policy has to be selected; there are two policies: Default and Range. Default is good for properties with numbers, and Range is best for both strings and numbers but its storage cost is higher. For this, let's create a Collection with S1 and the Range policy.

Now we have a database with a collection. All Collections appear in the database blade with their ResourceID and can be opened by clicking on them. There are many ways of connecting to the database, adding documents and querying documents:

Using the portal itself - Manually add documents one by one or upload up to 100 documents at a time.
Using the DocumentDB migration tool
Programmatically, using the REST API (difficult and complex) or an SDK (such as .NET; flexible and can do almost everything)
Using Stored Procedures and Triggers added to the collection.

Let's use the given interface for adding and querying documents for testing. Click on the Create Document button in the Collection blade (in my case it is Customer). This opens another blade that can be used for writing a document. As you see, it automatically adds the Id, which is required. If you do not add an Id to your document, one will be created for it. Here is my document:

Once it is added, it can be viewed using Document Explorer in the database blade, as below.

Let's see how this can be done in other ways in the next post related to DocumentDB.

Thursday, October 15, 2015

What is NoSQL Document Database? What is Azure DocumentDB?

NoSQL databases are not something recently introduced. NoSQL is another type of model, one that was discussed alongside relational database management systems (introduced in the 1970s) whenever the disadvantages of relational concepts hit traditional database management systems hard. It is as old as other database concepts like file-based models and relational models and has existed since the late 1960s.

While the relational database model is considered the Second-Generation database model, the NoSQL database model is considered the Third-Generation database model. Initially, the NoSQL model was introduced for addressing Object-Oriented concepts within databases, under the name Object-Oriented Database. It was not required by many solutions at the beginning and was used by few applications, but it has started to be seen as another required type of backend product in modern projects related to big data, real-time applications, and databases that require heavy scale-out implementations.

What exactly does it do?

There are different types of NoSQL databases: Column databases (Cassandra), Document databases (MongoDB, DocumentDB), Key-value databases (Oracle NoSQL Database), and Graph databases (InfiniteGraph). NoSQL does not mean No Structured Query Language; it means no relational structure, hence No-Relational would be the best name for it. The type we talk about in this post is the Document database. The name does not mean it stores documents; it stores objects, hence object database would be the best name for it.

Document databases store objects with no predefined, static schema. They are flexible and designed for semi-structured documents, typically schema-free JSON. One key advantage of NoSQL is scaling out. Traditionally, relational database management systems support scaling up by adding more resources to the box, but this is limited by various factors like the OS and the motherboard used. Scaling out is also possible with relational databases but is heavily limited. NoSQL databases address this completely: they are designed for scaling out, and the only limit is the availability of resources.

How does it store data?

If you understand how it maintains data, then you understand the NoSQL document database. We are all familiar with the relational structure: we maintain relations (as tables), entities (as rows), and attributes (as columns) in our relational databases. A NoSQL database maintains them differently: a row becomes a document and a column becomes a property.

Here is an example. The following image shows how a table is managed for Customers in a relational database.

And this is how they are managed in a NoSQL database.

As you see, though all three documents have common properties, the addresses of customers are maintained in different ways; this is because NoSQL is schema-free.
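The same idea can be sketched in Python with three made-up customer documents. The names, addresses and property names below are invented for illustration and are not the ones from the images; the point is only that every document shares the same top-level properties while the address takes a different shape in each.

```python
import json

# Three hypothetical customer documents in one collection.
customers = [
    {"id": "1", "name": "Customer A", "address": "12 Main Street, Colombo"},
    {"id": "2", "name": "Customer B",
     "address": {"street": "34 Lake Road", "city": "Kandy"}},
    {"id": "3", "name": "Customer C",
     "address": [
         {"type": "home", "city": "Galle"},
         {"type": "office", "city": "Colombo"},
     ]},
]

# Properties that every document has in common.
common_properties = set.intersection(*(set(doc) for doc in customers))

# The shape of the address differs from document to document.
address_shapes = [type(doc["address"]).__name__ for doc in customers]

print(sorted(common_properties))
print(address_shapes)
for doc in customers:
    print(json.dumps(doc))
```

A relational table would force one address layout on all three rows; here a plain string, a nested object and a list of objects sit side by side in the same collection.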

How does Microsoft support it?

Microsoft's implementation of a Document database is called Azure DocumentDB. If you have an Azure account, you can create a DocumentDB and start using it for your NoSQL requirements. Since it is completely cloud-based, you do not need to worry about the infrastructure required for it, or about setting it up and maintaining it. Let's see how to create and use it in the next NoSQL post.