Software Secret Weapons™


 
Lessons learned while moving from JSPWiki to WordPress
by Pavel Simakov on 2007-06-20 16:39:55 under Code Generation
 


The History


The Software Secret Weapons web site has been running for over two years on the open-source Java wiki engine called JSPWiki. It was not used exactly as a wiki is supposed to be used, since I was the only writer. But the ability to edit pages remotely through a web interface and to keep the revision history of each post was very important to me. This is exactly what wikis do best.

JSPWiki version 2.0.52 proved to be a great open-source Java product. It worked well out of the box, was stable, had no bugs (none that I hit, at least), and was easy to compile and debug. I was also able to make many changes and patches to tune its behavior. Most importantly, it is very well designed, allowing even a junior Java programmer to make changes after a very short learning curve. Great job, Janne Jalkanen.

The Problem

As the web site traffic grew, I continued to host it from my basement on a seven-year-old Pentium 486. Using an uninterruptible power supply, a patched Linksys router, and a custom residential ISP with a static IP address, I was able to get down to only about 5-15 minutes of total downtime a week. This is not bad, but the constant worry about uptime finally got to me and I decided to move the site to a "cheap" hosting center.

Finding Java-Enabled Shared Hosting Center

To cut this part of the story short, let me tell you that Java shared hosting is a nightmare. On top of the typical shared hosting limitations, Java hosting has issues of its own:

  • the Tomcat is restarted only once per day
  • the custom jars for your app are reloaded once per day
  • you have very limited access to a file system or none at all
  • Apache web server and MOD_JK are both installed, typically on Linux; all your WEB-INF folders can be accessed from the Apache side, bypassing MOD_JK, if you are not careful with .htaccess or have no rights to properly edit this file at all
  • the hosting is hard to find, most shared hosting companies don't like Java
  • the hosting prices are 3-4 times those of typical LAMP hosting

The once-a-day reload of custom jars makes Java shared hosting virtually useless for agile web development. The bottom line, once again, is that Java hosting is a rare, expensive and painful animal.

LAMP to the Rescue

Earlier this year I had to learn PHP/LAMP for a variety of reasons. The LAMP world dramatically changed my perspective on the Java world. LAMP shared hosting, for example, is the complete opposite of Java shared hosting. It's full of features, cheap, reliable, widely available, simple, and offers instant gratification: any change to PHP sources has an immediate effect. The capabilities and expressiveness of .htaccess alone beat WEB-INF/web.xml hands down. But ease of updates and deployment is the key.

Be careful here, as I am not advocating that all Java developers move to the LAMP camp overnight. Now having substantial experience with both LAMP and Java, I am still a solid supporter of the Java core technologies, especially for the development of enterprise-level, distributed or mission-critical applications. But blogs, wikis and many other "simple" web-facing applications are done better, easier and faster in LAMP.

The Solution

Last weekend I decided to move the Software Secret Weapons web site from Java onto LAMP! It was a complete success that I want to share with you. The page you are reading now is served by PHP! In the following sections I discuss the various issues I faced during the migration and the solutions I found to complete the task.

The Design Rules

For any wide-open project like a Java to PHP migration, I always define the design rules. These are similar to vision, values and mission in the corporate world, and they help to set the mood. These are the design rules for the JSPWiki to LAMP migration:

  • no manual work (the site has over 200 pages)
  • fully automated repeatable migration with the ability to test and to review the final result before going live (my readers wouldn't like seeing the broken links and screwed up HTML pages)
  • all URLs to images and pages of the old site must be honored after the move (to preserve the page rank and backlinks)
  • all abstractions of the old site must be preserved, including page embedding (to preserve the metamodel)
  • look and feel must be preserved with 100% uptime during the move (so no one knows we are moving)

The JSPWiki Metamodel

In this migration I had no intention of rewriting JSPWiki in PHP. The goal was to recreate the same site, but on LAMP. There are several general categories of open-source products that could serve as a JSPWiki replacement. Here is the analysis of these products:

Concern/Desired Feature | JSPWiki | Wiki engines | Blog engines | Bulletin board engines
The core artifact is | wiki page | wiki page | blog post | forum topic
Engine can maintain the revision history of the artifact | YES | YES | NO | NO
Images and files can be attached to the artifact | YES | YES | YES | NO
Tiered administration (admin, editor, reader) is supported | NO | NO | YES | YES
Several authors can edit artifacts | YES (as a small customization) | YES | YES | YES
Readers can add comments to artifacts, but not edit artifacts | NO | NO | YES | YES (as topic replies)
Free-form server-side script can be embedded into the artifact | NO | YES | YES | YES (as a small customization)
Artifacts can be freely composed | YES | NO | NO | NO
Custom skins and themes can be developed | YES | YES | YES | YES
Server-side dynamic pages that compute stats, etc. can be created | YES | YES | YES | YES

After considering several different wikis, blog engines and bulletin boards, I chose WordPress. Having no prior experience with WordPress and not knowing its real limitations, I decided to simply move all pages from JSPWiki to WordPress, creating one new WordPress post for each original JSPWiki page.

The ability to compose pages by embedding them into each other is a very valuable feature and no PHP product really offers it as a standard feature. Let me explain. In JSPWiki there is a standard plugin that allows embedding of one page into another, like this:


[{INSERT com.ecyrd.jspwiki.plugin.InsertPagePlugin pageToInsert=ProblemSolvingTopBar}]

I found this feature very useful for adding a common header/footer to a set of posts. An example of such a header/footer can be seen at the top of the Linguine Map's page. The header is a separate page that is embedded into many other pages dynamically. Without page embedding one would have to duplicate the header in every post, which is evil.

Originally a gut feeling, WordPress would later prove to be a very good choice because of its good internal design and its extensibility through actions, hooks, and filters. Page embedding and the other features missing from WordPress would be tackled after the pages were moved. And I officially and sadly gave up on having the revision history for each post...

The Export2Wp.java

The Java source for the migration tool, Export2Wp.java, is attached. In about 250 lines of Java code the following is accomplished:

  • JSPWiki engine is configured and started outside of Tomcat or any other web container
  • all pages are retrieved
  • the most recent page version is retrieved
  • wiki text is converted into HTML
  • embed page plugin calls are replaced with sswEmbedPage() PHP calls in the hope of completing page embedding later
  • WordPress database connection is established directly into the hosting center
  • SQL statement for each post is prepared and executed
  • original JSPWiki page name is used as a WordPress post slug to simplify later URL rewriting
  • all attachments are retrieved
  • the most recent version of attachment is retrieved
  • attachment binary data is written to a properly named file on the local file system
  • all attachment files are FTP'ed from the local machine to the hosting center

There are several options in Export2Wp.java that might need to be tweaked for your own purposes. It's quite easy, for example, to create either WordPress "posts" or "pages", or to change posts to be "private" or "draft". The specific user_id for the author of the posts can be set as well. But the good news is that you just insert posts into one WordPress table and you are all done. No foreign keys or other things of that nature to worry about.

The page encoding did prove to be a bit of a problem. Make sure that the MySQL database is UTF-8 or whatever you need. JSPWiki must be configured with the proper encoding as well (Windows-1252 in my case).

It takes about 30 seconds to move 200 posts over. Now all the pages are in WordPress! But we are far from done, as some of the WordPress customizations took some time to complete.
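The embed-call replacement step from the list above can be sketched in a few lines of Java. This is a minimal illustration and not the code from Export2Wp.java: the class name EmbedCallRewriter is hypothetical, and it assumes that page names contain only letters, digits, underscores, dots and hyphens, and that the PHP argument is the lowercased page name, as in the sswEmbedPage() example shown later in this post.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites JSPWiki InsertPagePlugin calls into PHP calls.
class EmbedCallRewriter {

    // Matches [{INSERT com.ecyrd.jspwiki.plugin.InsertPagePlugin pageToInsert=Name}]
    // and captures the page name. Assumption: names use only [A-Za-z0-9_.-].
    private static final Pattern INSERT_PAGE = Pattern.compile(
        "\\[\\{INSERT\\s+com\\.ecyrd\\.jspwiki\\.plugin\\.InsertPagePlugin"
        + "\\s+pageToInsert=([\\w.-]+)\\s*\\}\\]");

    static String rewrite(String text) {
        Matcher m = INSERT_PAGE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Assumption: the embedded page is addressed by its lowercased name.
            String slug = m.group(1).toLowerCase(Locale.ROOT);
            String php = "<?php sswEmbedPage('" + slug + "'); ?>";
            // quoteReplacement keeps '$' and '\' in the PHP text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(php));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite(
            "[{INSERT com.ecyrd.jspwiki.plugin.InsertPagePlugin pageToInsert=ProblemSolvingTopBar}]"));
    }
}
```

Restricting the captured name to a small character set also keeps quotes out of the generated PHP string literal, which matters because the output is executed on the server.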

WordPress Customization: Custom Look & Feel

Believe it or not, this is what took the most time! I assumed that with so many WordPress themes around, customizing the look and feel would be easy. I was wrong. Let me explain.

As was mentioned earlier, WordPress has a good internal design and is extensible with actions, hooks, and filters. The "default" and "classic" themes shipped with WordPress 2.2 are great. They have a reasonable number of comments and are more or less easy to read. There are not that many examples of advanced features in the distribution, however.

Advanced features can be found in the support forum and in other themes, many of which are actually written by people who have no idea how WordPress works. Many users keep reusing each other's code. They add a line or two, add an if() here and there, change the text, replace the images, and a new theme is born. In the end all you have is a big mess without a real author behind the theme: no clarity, no comments, no working code.

Implementing the custom theme to match the old site design and doing the page embedding were both very advanced features. In order to complete the migration I had to learn a lot about WordPress internals. I am now able to use many advanced features like nested WP_Query objects, custom SQL statements, custom templates, etc. But this was not easy. Allocate a lot of time in your schedule for learning WordPress. Remember that remote debugging is not easy (in fact, not available at all) on shared hosting...

WordPress Customization: Page Embedding

This part was easy only after I learned WordPress in depth.

Download the runPHP WordPress plugin. Copy all the files into the WordPress plugins directory and you are all done. The Export2Wp.java migration tool already enables runPHP for each post by adding the corresponding row to the wp_postmeta table. As was mentioned earlier, each reference to the JSPWiki plugin was replaced with the corresponding PHP function call, from


[{INSERT com.ecyrd.jspwiki.plugin.InsertPagePlugin pageToInsert=ProblemSolvingTopBar}]

to

<?php sswEmbedPage('problemsolvingtopbar'); ?>

This approach to page embedding worked fine, but was later abandoned, because allowing the site editors to run PHP is a very serious security risk that I was not willing to take. Ultimately the runPHP plugin was disabled. Instead, new custom template HTML tags were added that serve the same purpose. Much safer, and not limiting in any way. The administrator can still run PHP if desired!

The Magic of .htaccess

The .htaccess file can help you preserve the old URLs to images and pages. JSPWiki has these standard URL patterns:

  • view: /jspwiki/Wiki.jsp?page=MyPageNameHere
  • edit: /jspwiki/edit.jsp?page=MyPageNameHere
  • diff: /jspwiki/Diff.jsp?page=MyPageNameHere
  • attachment: /jspwiki/attach?page=MyPageNameHere

Some other URLs might need to be added here (the RSS feed URL in my case).

The requests to all these URL patterns were directed to a new PHP page, which issued 301 redirects to the new page location. The new location URL was constructed on the assumption that the new page name (the post slug) is the same as the old "page" parameter, as was enforced during the migration above.
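The mapping behind those redirects is simple enough to sketch. The real redirect page was written in PHP; the Java below only illustrates the logic, using the same language as the migration tool. The class name LegacyUrlMapper and the "/slug/" shape of the new URL are assumptions (the actual permalink structure depends on the WordPress settings), and attachments are left out because their target location is not described here. It needs Java 10 or later for the URLDecoder overload that takes a Charset.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: maps an old JSPWiki URL to the target of a 301 redirect.
class LegacyUrlMapper {

    // The view, edit and diff endpoints, compared case-insensitively.
    private static final String[] LEGACY_PATHS = {
        "/jspwiki/wiki.jsp", "/jspwiki/edit.jsp", "/jspwiki/diff.jsp"
    };

    // Returns the new location, or empty when the request is not a legacy page URL.
    static Optional<String> newLocation(String path, String query) {
        if (path == null || query == null) {
            return Optional.empty();
        }
        String lowerPath = path.toLowerCase(Locale.ROOT);
        boolean legacy = false;
        for (String candidate : LEGACY_PATHS) {
            if (candidate.equals(lowerPath)) {
                legacy = true;
            }
        }
        if (!legacy) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("page")) {
                String page = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                // Reject anything that is not a plain page name, so the redirect
                // can never point at another host (e.g. "//evil.example").
                if (!page.matches("[A-Za-z0-9_.-]+")) {
                    return Optional.empty();
                }
                // Assumption: the post slug is the lowercased page name.
                return Optional.of("/" + page.toLowerCase(Locale.ROOT) + "/");
            }
        }
        return Optional.empty();
    }
}
```

With these assumptions, a request for /jspwiki/Wiki.jsp?page=MyPageNameHere maps to /mypagenamehere/, and a request that does not match produces no redirect at all.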

The Testing

I hoped to test my new PHP site end-to-end before making it live, but was not able to do so easily. Usually you can simply change the C:\WINDOWS\system32\drivers\etc\hosts file and map the desired domain name to the IP address of a test server. This approach, however, did not work for me in the shared hosting situation. The shared hosting server hosts several different web sites from one IP address, so it needs a hint (the proper domain name in the request) to properly handle the request. Just the IP address is not sufficient.

Testing can still be accomplished by using a proxy server. This is a bit more work than the hosts file, but ultimately works well. It took several iterations of export-import-test-fix before the site was all done and the domain name got moved.
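The "hint" mentioned above is the Host header of an HTTP/1.1 request. As a rough sketch, and not the proxy setup I actually used, a few lines of Java can connect to the server by IP address while naming the site explicitly. The class name VirtualHostRequest is hypothetical, and the IP address and domain name in the usage note are placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: talks to a shared host by IP but names the site in Host.
class VirtualHostRequest {

    // Builds a minimal HTTP/1.1 GET request carrying the virtual host name.
    static String build(String hostName, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + hostName + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Sends the request to serverIp:80 and returns the first response line,
    // for example "HTTP/1.1 200 OK".
    static String statusLine(String serverIp, String hostName, String path) throws IOException {
        try (Socket socket = new Socket(serverIp, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(build(hostName, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine();
        }
    }
}
```

A call such as VirtualHostRequest.statusLine("203.0.113.10", "www.example.com", "/") would check whether the shared server answers for that site name. It is only good for spot checks of status codes; clicking through the whole site still needs a browser and a proxy.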

Unresolved

Several issues that I had missed came up during the move. They are not that difficult in retrospect, and I will mention them to complete the picture.

WordPress has the concept of a <!-- more --> tag. It's a hint used to truncate the post for the RSS feeds or any other place where the post is shown in compact form. I had the exact same customization in JSPWiki, but with two separate tags: <!-- more_begin --> and <!-- more_end -->. Given that WordPress has only one marker, I have yet to invent the best way to restore this behavior.

Another custom feature in JSPWiki allowed me to mark posts for inclusion in or exclusion from the RSS feed using the <!-- allow rss --> tag. So far I have not seen how to exclude a post from the RSS feed except by assigning it to some weird category or making it a draft. This was discovered only after my Bloglines subscription reported that 200 new posts had just been made in my blog. Why don't we extend RSS to add a version attribute to a post...

The Final Word

I love Java. I now love PHP. PHP pisses me off on occasion, especially with the "global" keyword and no exception handling, but it is productive for building simple web apps. If only someone would tell me how to store PHP user session data in memory, not in the database, not in MemCached - in memory, as an object reference. If only...

Comments (2)

  • Comment by Igor Katkov — January 4, 2008 @ 2:55 pm

    What was the point of remote debugging and testing on some hosting site? Would it not have been easier to run the whole solution locally, make sure it works and then deploy it?
    Use PC virtualization if you like.

  • Comment by Fred Borry — January 25, 2008 @ 11:10 am

    I have customized the Export2Wp.java class to export the content of JSPWiki into an XWiki compliant archive.
    If anyone is interested, feel free to contact me : fborry AT free DOT fr
    Fred.



Copyright © 2004-2010 by Pavel Simakov
any conclusions, recommendations, ideas, thoughts or the source code presented on this site are my own and do not reflect an official opinion of my current or past employers, partners or clients