From SQL to NoSQL/MongoDB: Rethinking your developer life

We live in a massively relational world, storing our data in relational databases, using SQL to find, create, update and delete data, and the thought of storing data was invariably in this paradigm. But the Internet has grown exponentially, different needs has come and relational databases began to not suit the needs of many projects that can not be thought of for years and remain as they are for so many years ahead.

Then, new types of storage approach emerged and stirred relational supremacy. And with the life of developers. In this article I discuss a little my history and what leaded me to use MongoDB as base storage for my projects.

I like a lot agile methodologies, though I do not strictly follow none. I always suggest to my students: learn about SCRUM and read Get In Real (37 Signals), because you must think differently from the traditional to meet demands that are no longer traditional.

Some time ago, something like 7, 8 years, I started to think that programming could not be "monkey work". So I looked for ways to be faster PHP programming and met frameworks such as Miolo, worked in the project and that inspired me to create my own framework. But then I stopped trying to reinvent the wheel, and choosed symfony (but there were other great options).

The natural companion of PHP - the good and old friend MySQL - was the beginning. Actually, before that I have worked extensively with MS Access and a little with MS SQL Server (ok, "everybody's got something to hide except for me and my monkey"), so I did not start from scratch. The wonderful world of SQL got me.

Then came the need to work with other databases (Firebird, Postgres, Oracle, MS SQL Server) and SQL generic was really important. So the world of free software began to put me some dilemmas.

Initially, the dilemma of whether to use specific features (and often with high gain performance and data integrity) like triggers and stored procedures. Advantages: of course performance, integrity, blah, blah. Disadvantages: you will only use that database.

Obviously, on a production environment with high load, this decision is quite simple, because you can not switch databases each weekend. But when working with free software, you must provide freedom of choice for the database - at least between MySQL and Postgres. Then work with SQL, though very beautiful, wonderful, begins to fall in the subtleties of implementation of each bank.

A "facility" of MySQL - the autonumeric primary key- does not exist in others databases, which have different approach, sometimes better, sometimes worse, sometimes just different.

Well, my first step was to use a database abstraction layer well known at the time (and still used today), the ADODB. This made things easier, because keeping SQLs reasonably portable I could work with several databases on the same project. It was just having a SQL schema for each database we want support, and one or another fine-tuned SQL sentence.

PHP included as a standard library the PDO, which really improved our poor PHP developer life ... mysql_connect not more. Not even his OO version, MySQLi. Everyone under the arms of the mother PDO.

Even so - and yet - to be multi-database is part of the problem. It is necessary to validate data (business rules), create forms, edit, insert, retrieve, and delete records ... well, do CRUD. Many small libraries were doing this work, a huge puzzle.

So, I have entered into the world of symfony, and met PROPEL, a wonderful ORM (english, português) that changed my relation with relational databases (ok, somewhat redundant). When Ia started to look at the database in a object-oriented way, my programming life changed radically.

Of course, the paradigm is very different, and at some points fights with DBAs become inevitable (because using some specific features of a database is not easy or possible with an ORM), and there is a huge temptation to return to "control everything again," - I said, put ALL business rule in the application, leaving only referential integrity to the database - is almost inevitable. But not everything is perfect.

To let us distance of SQL, PROPEL has its own query language. Oops ... that is already a problem. Think isolately this is unpleasant, but - worse- the query language of PROPEL is very strange. Difficult to adapt. But the gains outweigh the losses ...

... until I'd found Doctrine, another ORM with many, many features that PROPEL don't have. Two I really like: migrations (I don't know how I lived managed without it until then ...) and a query language a looot more simple, the DQL, at least similar to SQL (which is good for our brain under transition) but with OO facilities. Doctrine allows think the database in OO paradigm, and it does the hard work of converting it into the relational database. Simulates inheritance! Ok, if the developer forgets that everything runs in a relational database, you can do something with sluggish performance. Well, I lived very well in this world for a long time.

But I still had my eternal dilemma ... no freedom to manipulate the database "on-the-fly", and many customers with the same application can change fields without hard updates on the database structure, so ... an object "Product" with multiple configurations with no need to handle a table with many fields or an EAV standard, which works but is very complex. Or simply allow my customer to create a new field in a "People" register.

All this can be solved in a relational database, but the maintenance (and time spent thinking about maintenance) is hard.

I was already looking a long way to start some projects with these assumptions, when I started to hear on the horizon: NoSQL, NoSQL, NoSQL... Well, I researched and discovered that are really BIG players use and invest in NoSQL. Studying the alternatives, I met MongoDB.

Simple, fast, strong community, good documentation and constantly evolving. And fits pretty well with agile projects because it allows easily extend an application with new fields, "small" documents living side by side with "huge" documents in the same collection. The dilemma "product table with different fields (attributes)" is not problem - just save attributes that are needed.

This has a name: schemaless. It is not necessary to define how data will be saved ("CREATE TABLE ..."). Just save it. More than that, no need to create the table. No need to even create the database (CREATE DATABASE ...")! Just say you want to record this data as a documnt in the "X" collection in the "Y" database. And MongoDB creates all that for you. Need not include a field ("UPDATE TABLE ADD COLUMN ..."), just save it. No need to delete a field ("UPDATE TABLE DROP COLUMN ..."), just delete it from the documents (an" unset ").

Plus, you can search in this "mess", similar (but very different) to how is done in SQL. You can search with REGEX! Aggregation is possible through MapReduce, which is also a powerful analysis tool.

There are strange things that the this "freedom" brings. If a document doesn't have an attribute, how can I search it? Well, I can even search documents that have an attribute or not...

And relationships? Get used to the idea that you will use much less relationships. And it will must handle redundancies, and that will have to maintain them in code. On the other hand, uou will retrieve data without complex joins. This is strange, difficult at first, but it doesn't mean that the DBA will be retired, he will only have to think differently. And he will be also responsible for thinking what the best way to store, with more performance, and management functions, backup, replication, etc.. No, you will not fire him, just shake it.

And the programming?

The MongoDB support in PHP is very good, using a native extension. There are several supporting ORMs MongoDB, perhaps with more resources is. .. Doctrine (version 2). Particularly, I use the Mondongo - today Mandango. But both are not schemaless, and this may be a bit frustrating. Therefore, I made a fork in Mondongo for use in personal projects, bypassing this "limitation".

If you want to go this way, keep in mind that the developer's responsibility increases significantly because there is no referential integrity, if you save an integer as string the data is written as is, so you must know how to deal with this freedom. It's easy to make a big mistake. But it is also easy to maintain, no need for migration, database updates. It is easy to scale, it's easy to create redundancy.

It's a different paradigm for building systems.

At the beginning of the text I commented about agile methodologies. It's because I concluded that NoSQL, in the case MongoDB, made me feel really AGILE.

In time, my choice may be good for me but not for you. You could consider my choice, but study and get your own conclusions, which will certainly be the path you must follow.