Friday, July 3, 2015

What are Schema-on-Write and Schema-on-Read?

In discussions of database management systems, the term Schema-on-Write is not widely used, even though it describes the method we have always used for writing and holding data. This method forces each record to match a defined schema before it is written to storage. For example, to hold Customer data, we create a table in a traditional database with appropriate columns, each with a fixed data type. When a record is inserted, it has to align with the defined columns (the defined structure), and it is validated against the column definitions, their types, and their constraints before it is inserted into the table. This slows down the insert operation, but the consistency of the record is guaranteed. This method is called Schema-on-Write.
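To make the idea concrete, here is a minimal sketch of Schema-on-Write using Python's built-in sqlite3 module. The customer table, its columns, and the try_insert helper are hypothetical names chosen for illustration. SQLite is fairly relaxed about column types by default, so the sketch assumes a CHECK constraint is used to enforce the type of the age column.

```python
import sqlite3

# In-memory database; the table and its columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT    NOT NULL,
        age  INTEGER NOT NULL
             CHECK (typeof(age) = 'integer' AND age >= 0)
    )
""")

def try_insert(record):
    """Attempt the write; the schema is checked before the row is stored."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (id, name, age) VALUES (?, ?, ?)",
                record,
            )
        return True
    except sqlite3.IntegrityError:
        # The record did not match the defined structure, so it is rejected.
        return False

accepted = try_insert((1, "Alice", 34))        # matches the schema
rejected = try_insert((2, "Bob", "unknown"))   # age is not an integer
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Only the record that matches the schema reaches the table; the other one is turned away at write time, which is the cost and the benefit described above.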

Schema-on-Read is a method that matches data to a schema as it is read from storage. Traditional database management systems do not employ this, but newer platforms such as Hadoop use methods like this for processing data, specifically semi-structured and unstructured data. For example, if a dataset holds log records in a semi-structured or unstructured format, its content can be read with a schema defined for the requirement at hand. The requirement may need only a few elements of the content, and the schema defined for reading addresses only those elements. If another requirement needs to read the same content differently, a new schema is applied at read time. This slows down the read operation, because the data has to be checked against the defined schema as it is read.
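The following is a minimal sketch of Schema-on-Read in plain Python, not how Hadoop itself implements it. The log lines, the two regular-expression "schemas", and the read_with_schema helper are all made up for illustration. The raw lines are stored untouched, and each requirement applies its own schema only when reading.

```python
import re

# Raw data is stored as-is; nothing was validated when it was written.
RAW_LOGS = [
    "2015-07-03 10:15:02 ERROR user=42 msg=disk full",
    "2015-07-03 10:15:09 INFO user=7 msg=login ok",
    "malformed line",
]

# Two hypothetical read-time schemas over the same content.
LEVEL_SCHEMA = re.compile(r"^(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) ")
USER_SCHEMA = re.compile(r"user=(?P<user>\d+)")

def read_with_schema(lines, schema, casts=None):
    """Apply a schema while reading; lines that do not fit are skipped."""
    casts = casts or {}
    for line in lines:
        match = schema.search(line)
        if match is None:
            continue  # this line does not fit the schema being applied
        # Extract only the fields the schema names, casting where requested.
        yield {
            field: casts.get(field, str)(value)
            for field, value in match.groupdict().items()
        }

# Requirement 1: timestamp and severity of each entry.
levels = list(read_with_schema(RAW_LOGS, LEVEL_SCHEMA))

# Requirement 2: only the user id, read as an integer.
users = list(read_with_schema(RAW_LOGS, USER_SCHEMA, {"user": int}))
```

Both reads go over the same stored lines, and the matching work happens on every read, which is where the extra read cost comes from.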
