After a long search, I did for myself a tool to migrate data to MongoDB.
It’s a recurring need, a lot of people look for it: a developer decides to migrate a MySQL project to MongoDB . But how to do it?
Actually it’s not so hard as it seems. Exports in CSV from MySQL and imports in MongoDB in the same way. But there is a big trap.
First of all, the datatype issue. If you import with CSV, all data will be imported into MongoDB as string, even if is integer, float or date. That’s bad, really bad.
It’s important to remember the paradigm change. So tables with classic relationships as below:
INVOICE < INVOICE_ITEMS > PRODUCT
has no meaning under MongoDB. The DE-normalization, redundance in application level are others features that make hard directly migrations. The database must be remodeled.
Considering this questions, I’ve created this tool to help in those situations (but I know it’s impossible to handle all situations).
- Runs migration through a config file using YAML format (very simple to edit)
- To start the config, has a php-cli dumper that reads the de RDBM and create a config file to begin
- Support to datatype mapping (really imports date, boolean, integer, float and string BSON types)
- Support to simple foreign keys relations
- Support to many to many foreing keys relations (not automatic dumped, but easyl configured by hand if you want)
- Foreing keys converted to MongoDB DBRef or not (see config file defatul->use_dbref)
- Easy ignore tables to import
Let’s walk throught this features in details…
Runs migration through a config file using YAML format (very simple to edit)
The YAML format is a “relaxed XML” using a indented file that can be easyly transformed into an PHP array. It’s so much easier to edit a YAML file than a XML file or even a large PHP array.
To start the config, has a php-cli dumper that reads the de RDBM and create a config file to begin
I’t would be real hard to make de config file all by hand. So, the runDumper.php script (running in shell) creates a starting config file for you. It can be suficient or then you can adjust it. It asks you the source database connection info and dumps the YAML file, and than you can customize.
To make the things real easier, it’s possible to ignore tables or fields, or just delete its lines in the config file. Simple relations (foreign keys) are automatic maped IF YOU HAS THE CONSTRAINTS correctly configure in the source database.
Another feature is the possibility of rename target field names (target_name) and, eventualy, even the datatype, if does not fits to you (target_type).
Support to datatype mapping (really imports date, boolean, integer, float and string BSON types)
The PHPMongrationDriverMysql (wich extends the basic PHPMongrationBaseDriver class) maps the basic MySQL datatypes to MongoDB. So, in the YAML config file you can choose which type will be used to save the record into MongoDB.
In this way, date fields will be saved as MongoDate, and so on.
Support to simple foreign keys relations
This mapping is automatic if the table has well configured CONSTRAINTS. You can also parametrize if the relationship will use or not MongodDB dbrefs (see the “default” session on config file).
To be possible doing such thing, all data is imported, even the original foreing key, and only after the data is passed again to make references to MongoID.
Support to many to many foreing keys relations (not automatic dumped, but easyl configured by hand if you want)
As each developer can use a diferent pattern to name his tables and relationships, it would be hard to guess all situations. Besides, this are the most common case of non-conversion – to accommodate in the new moddeling pattern.
So, the way used was simpler: manual config when is desired. In MongoDB a field could be an array, so it’s really easy to reference various documents as a “member” of a “master” document.
Starting in that point, it’s possible to choose in which collection to put the relationship, or for redundance to put in both collections.
It could be configured in “user”:
(...) tables: user: many_references: groups: reference_table: user_group in_id: user_id out_id: group_id out_table: group out_table_id: id
And it’s possible to put in “group”, or both.
Important: soon I want to implement “embedded” import, so the reference table could have more info (e.g. INVOICE_ITEMS would have quantity, subtotal, and so on).
oreing keys converted to MongoDB DBRef or not (see config file defatul->use_dbref)
As explained before.
Easy ignore tables to import
The config field permits to list tables and fields to be ignored. See options default->ignore_tables and tables->??->ignore_columns
I believe that this overview gives you a good picture of the tool. But there’s a lot to improve…