Wednesday, October 12, 2005

Version Control of Database Data
Applying the concepts of version control to databases

Background:
I've found that there is very little information overall on this subject. Moreover, I understand (correct me from wrong) that there are no complete database versioning systems out there.

I have put some thought into this semo-automatic way of accomplishing the task. Feel free to use and adapt the concepts however you like. Note: This is not a tutorial or a beginner's-guide, but more a road-map of a viable concept of database version

A way of accomplishig version control of database data:

1. Create a dbdata-directory under your trunk:

trunk
|-proc
|-www
|-conf
|-dbdata

2. Then, write a script that will perform the following operations, for each database:
a. Extract "show table status from dbname" to trunk/dbdata/dbname.status.sql
b. Use this to determine:
- Which table-schemas has been modified?
- Which tables' data has been modified?
(Of course, if it exists in db but not in dbdata/ or vice versa, both should be seen as modified)
c. Use the information from b. to extract the approriate table's schemas into trunk/dbdata/dbname.tablename.schema.sql (with CREATE TABLE, important), and the approriate table's data into trunk/dbdata/dbname.tablename.data.sql, overwriting the previous one's, if any. (Use complete inserts, and remember to prior to this make sure all tables have a primary key!)
d. Run the approriate VCS-command to delete those .sql-files belonging to tables that do not exist in the database anymore.
e. Run the approriate VCS-command to add all un-versioned .sql-files in trunk/dbdata/

3. Hook this script up on a pre-commit hook. A VCS status-command after this script's execution will have the VCS show exactly which table's schemas and data has been modified, and a commit will, because of the .sql-files' text-format, only save the diffed changes in the repository, saving lots of space compared to putting the table-binaries under version control (which in itself is not recommended).

Viola! You have automatic commits of your database schema and data, in a format that is suitable for the VCS.

4. Now, write yet another shell-script that will perform the following actions:
a. Check which table's schemas and which table's data has been modified by comparing the checked out files with the files belonging to the revision that the working-copy is updating from.

In case of conflict, see the table as not modified, in order to prevent application of conflicted files to the database. Resolve manually (see below) and commit once and then update in order to get out of the conflicted state.

b. For each modified table: Apply DROP, then the ...schema.sql, then the ...data.sql

Note: b. will handle deletion of tables completely by having no .sql-files to execute after the DROP. The creation is handled by the CREATE TABLE in ...schema.sql.

5. Hook this script up to post-update.

Voila again! You have automatic update of the database data integrated with your VCS.

Resolving conflicts / Merging database schemas and data:
Use a tool like SQLYog Enterprise's Database Schema and Data synchronization wizards to merge the the two ...schema.sql.r##-files.

This is more of a mind-game than actual practise, as I have not myself used this method yet. Feel free to comment on additions, adjustments and shortages with this method!

Other resources:
- Another blog about the subject, for a more comprehensive and complex solution
- Thread at Subversion's mailing list discussing Database Version Control
- Some guy proposing making an OOS-project of DVC

Forum posts that eventually lead up to this blog-entry:
- .sql patch-file generator - Synchronize two DBs (July 16th, 2005, 01:30 AM)
- Putting a database under version control (July 16th, 2005 10:345 AM) [No replies]
Using Version Control in Web-Development
The use of a Version Control System (VCS) in web-development

Reasons:
Offline-development, more than one developer, testing (staging)-area to try out code on server before production-release, always have good conscience when deleting old dev-files and old thoughts / ideas in todo-lists (all versions of every file is saved in the system!)

My experience:
It took me a long time to even get all the pieces and facts together to actually understand what a VCS does! All guides were all about how to use them. Why should I learn that when I don't know what I was to use it for, and in what context? What happened was I dropped my interest in VCS for some months. First when I really needed a smart solution for offline-dev and staging-area I took back my interest and searched a LOT about it and posted pleas of clarification on desvshed.

(If someone presented me with the reasons above a long time ago, I probably would have started using it a long time ago and saved huge amounts of developing time! VCS correctly used is really a saver!)

But even then, it took me a long time to understand which VCS to use (CVS is the most popular historically, first I thought it was the only one!).

Now I know.

Choosing a Version Control System:
CVS has many shortages, for instance it cannot version control directory structures... Subversion can do this. And this is what I use today, together with the great windows GUI-tool TortoiseSVN.

(SVK is yet a Subversion-like VCS that even claims to have all that Subversion-capabilities plus offline repository-mirroring an such, but without the existance of TortoiseSVK, I will not use it)

Prerequisites:
First of all, my setup is a Windows XP Pro laptop for development and a Linux server with shell-access for production purposes. (I will eventually post my experiences of how to cope with the situation when there is no shell access available, as i am to face that problem in a project coming up soon).

(Do not worry if you do not understand some of the terms I am talking about. Either you read the Subversion Book now or after this. All will come clear after the initial chapters in that book!)

Note! Common knowledge about linux adminstration, user permissions etc are recommended. I had to learn it from scratch when I started this out. Be prepared rather than learning as you come, it will save you head-aches. Expecially: Understand that permissions are grouped into user, group, and all - levels.

Use a google search to find a nice Linux adminstration tutorial, and use LAME for reference.

My way of using VCS with web-development:
Using shell access to my production-server I have set it up using the ssh+svn:// method (using putty-tunneling for a secure simulated direct connection). I guess WebDAV is faster, but I haven't come araound changing it to that. It was simply easier to get ssh-tunneling up and running, and furthermore it is easier to set up permissions, as it uses the server's linux-permissions for this. (All activity through WebDAV goes through the webserver-user)

Preparations:
I...
1. Checked-out a working-copy to my laptop and got a server up and running on the checked-out code.

Tip: Have the server function as much alike as possible as the production server. This will save you lots of headaches! Especially important to be the same: mysql, apache and php versions, php safe mode activated or not, activated modules.

2. Checked out a working-copy to productions folders on the server, but called them www-staging etc.

When doing step 1 and 2, be prepared for a bunch of permission issues to take care of to make the checked out copies work with the webserver (since php in safe mode on linux is much more sensitive than on windows, where everything commonly is run by the same user and with adminsitrator rights). The most common solution here is to allow full read and write access globally* to folders for file-uploads, image-uploads etc and wherever the webserver-userneeds to alter stuff in the filesystem.
Also, some svn-ignores is good to put in wherever you have temporary files lying around. (Remember to ignore the compile-directory if you use S

* Globally because the user that created the files (you) most often is not belonging to the same permissions-group as the webserver-user! If your linux supports Access Control Lists, you might want to use that instead.

3. Made a common include that together with virtual host-directives adjusts settings and paths depending on the location of the code - so that they all can work on the different locations with the same identical code-base. I use the following scheme:
http://local.site.com - Local on laptop (dns through hosts-file only)
http://staging.site.com - Staging code and copy of the production database
http://preview.site.com - Staging code and uses the production database
http://www.site.com - Production code, live
4. Deleted the production folders and checked out working copies for the production code instead, once I was certain it was to work.

(Use
# Disallow browsing of Subversion working copy administrative dirs.

Order deny,allow
Deny from all

in httpd.conf for safety)

I you managed to come this far, you are ready to fully use VCS in your web development!

Basic work-cycle, an example:
(One developer, offline development, staging-area)
1. Develop on laptop and the local server. When considered done, commit.
2. Update the staging folder, try it out and fix bugs. Commit changes when done.
3. When moving to production, tag this bugfixed code-base, lying in the trunk, as /tags/site_v1.#.RevisionNumber
4. I then switch the production-codes code-target to this tag-folder. And voila changes are made live, already bugfixed, almost instantly.

This is very summarized. I have more advanced routines in use, taking advantage of branching for post-bugfixing and patching already staging-released code while developing further on laptop etc, as well as coping with routines for two developer's development.

If you plan to use Subversion, read the Subversion Book, it explains everything!

Database Management for Version Control Systems:
As for today, there are no VCS-systems for Databases. I have solved my needs of at least being able to syncronize my development and staging-databases from time to time by using the great SQLYog Enterprise's Database synchronization wizard. It is totally worth the money (99$ for commercial use. 49$ for private).

I have tought of a semo-automatic model of using VCS with a database, and I figure I eventually will use it as well. As for today with my needs, it would be too much overhead though. My thoughts about how to do it, however, are posted here: Database Management for Version Control Systems (Coming).

Forum posts that eventually lead up to this blog-entry:
- Set up a CVS server to which it it should be possible to log on with ie NuSpherePHPED (December 22nd, 2004)
- Developing Web Applications in Three Levels for offline work (July 9th, 2005)
Becoming a blogger

Well, then. Even I eventually got caught up in this blogging business...

I want an area to post some things that have taken me a long time to learn / find out, in order to prevent other's going through the time-consuming process of trial and error that I've (haven't we all?) done so many times. My vision is to eventually dig up all or at least most of my computer-related trial and error-lessons and eventually present them here.

Important: I do not expect this blog to be frequently visited! But... I do hope that it will show up properly in the search engines so that people searching for solutions to the problems I've had will find this site somewhere in the results. Followingly, I make no active promotion for this blog. Furthermore, I do not expect myself to post here more than once each third months or such! This is a long-range project that will only bring up the essentials of avoiding trial-and-error.

Short about me:
Swede. At the moment 19 going on 20 (27-dec-05), having dealt with computers from themoment I got a 1983-model "laptop" (tabletop...) as a 3-year old. Changed my dad's laptop's password as a 4-year old (accidentally ruining his day at work since he had no access to his computer).

Started coding webpages in 7th grade. Scripting (ASP) in 9th. Gone over to php in 10th grade when I started developing my first website worth mentioning: http://www.demomusic.nu.

Since then developing in many hours, mostly at night, having time for important stuff on the evenings. Started a web-publishing company in Jan 2004, and just recently (June-05) gone over to professional php-development on the site http://www.prylbanken.se. Also works as IT-Administrator at a company as well as consulting with computer support in private homes.

Interests in the computer world: Web-development, Backup! (too many crashes... too many dead days reinstalling systems), Open-source (although not yet a contributor, maybe later), Visions associated with the use of Clustering, Trying out the latest techniques and software, Computer mobility.

Outside: Friends, Girlfriend, Piano, Saxophone, Guitar, Disc golf, Stand-up (watching :) ), Exercising, Discussions about latest scientific progress in physics (hmm...).

Studying: Industrial Engineering and Economics at the Royal Institute of Technology in Stockholm, Sweden.

If I would not to study for the sake of what I feel would give me the greatest job-oppurtunities, I would study to become a Physician's expert or a Psychologist.

View of people and emotions: Every thought and action has it's grounds in our most basic instincts of survival and reproduction. See through this, understand the basic principles behind the human mind, and you will understand people's thoughts and actions! Remember: Every person has a different genome, a different grow-up environment as well as different social advantages from the start!

Well... That was I guess more than enough about me. Hope you enjoy my other posts that are to come!

Last inital words:
Feel free to comment on this post if you have any ideas for common trial-and-error situations that could be brought up here!