Pavel (Pasha) Simakov - L10N and I18N of Complex Learning Objects

by Pavel Simakov 2014-07-11

The Context

About 10 years ago I started a website for online adaptive learning: www.itestyou.com. It comprises a complete suite of tools and systems for creating and delivering complex learning objects and for monitoring how students interact with them.

One of the key design goals was to support efficient localization (L10N) and internationalization (I18N) of learning objects. In this article I explain how this is accomplished. I also hope that authors of modern online learning systems who are struggling with L10N and I18N can use these notes to significantly improve their products.

The Requirements

The requirements are simple: any learning object or piece of content must be available in multiple languages. The underlying asset management system must let an author create a learning object, a translator translate it, and a student view it in whatever language he chooses. Easy enough!

Look at the math problems below in English and Russian:

Both pictures show the exact same problem, but in different languages. So how hard can this be?

Behind The Scenes

Now look closely. Notice that the actual numbers of hours and dollar amounts are slightly different in the two pictures. This is exactly where "complex learning objects" come in. These problems are not just pieces of text. They are in fact mini computer programs.

Different systems can use different programming or markup languages to represent this math problem. Here is how ITestYou does it:


<?xml version="1.0" encoding="UTF-8"?>
<unit
    xmlns="http://www.testvisor.com/schemas/1.0/unit"
    bindings="{a,b,c}" defaults="{32,4,7}"
>
    <question>
        Joan needs $${b*c+a} for a class trip. She has $${a}.
        She can earn $${b} an hour mowing lawns.
        If the equation shows this relationship,
        how many hours must Joan work to have the money she needs?
        <br/>
        <latex>${b}h+${a}=${b*c+a}</latex>
    </question>
    <choices>
        <answer>
            ${c} hours
        </answer>
        <decoy>
            ${c+10} hours
        </decoy>
        <decoy>
            ${b*c} hours
        </decoy>
        <decoy>
            ${b*c-b} hours
        </decoy>
    </choices>
</unit>

The problem is encoded in XML. The question tag defines the text of the problem, and the choices tag holds the correct answer and the decoys. HTML, LaTeX and expressions enclosed in ${...} are allowed anywhere (in $${a} the first $ is simply the dollar sign). The variables a, b and c are declared in the bindings attribute, with default values given in defaults. The expressions are computed before the problem is shown to the student. That is how ITestYou does it. Other systems work similarly; the Khan exercises framework is one example.
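
The article does not show ITestYou's expression engine, so the following Python sketch is only an illustration under stated assumptions: variables come from the bindings attribute with the values in defaults, expressions use only + - * / on integers, and a literal $ may sit right before ${...}. It evaluates expressions through a small whitelist over Python's ast module instead of calling eval(), so problem text cannot run arbitrary code. The names parse_bindings and evaluate_expressions are hypothetical.

```python
import ast
import operator
import re

# Only ${...} is an expression; a "$" right before it stays as literal text.
EXPRESSION = re.compile(r"\$\{([^}]*)\}")

# Arithmetic operators this sketch allows; anything else is rejected.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def parse_bindings(bindings, defaults):
    """Turn bindings="{a,b,c}" and defaults="{32,4,7}" into {"a": 32, "b": 4, "c": 7}."""
    names = [name.strip() for name in bindings.strip("{}").split(",")]
    values = [int(value) for value in defaults.strip("{}").split(",")]
    return dict(zip(names, values))


def _evaluate(node, env):
    """Evaluate a whitelisted subset of Python's AST instead of calling eval()."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, env)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        return OPERATORS[type(node.op)](_evaluate(node.left, env), _evaluate(node.right, env))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand, env)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def evaluate_expressions(text, env):
    """Replace every ${...} in text with its computed value."""
    return EXPRESSION.sub(
        lambda m: str(_evaluate(ast.parse(m.group(1).strip(), mode="eval"), env)), text
    )


env = parse_bindings("{a,b,c}", "{32,4,7}")
print(evaluate_expressions("Joan needs $${b*c+a}. She has $${a}.", env))
# Joan needs $60. She has $32.
print(evaluate_expressions("${b}h+${a}=${b*c+a}", env))
# 4h+32=60
```

With the defaults, the correct answer ${c} renders as 7 hours and the decoys as 17, 28 and 24 hours. How non-default values are chosen (the English and Russian pictures use different numbers) is not described in the article, so the sketch only uses defaults.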

The Problem

The key challenge in translating this problem is that the translator must be told what he may and may not translate. For this problem his instructions are: do not touch any of the XML tags, the expressions inside ${...} or the LaTeX; translate the rest.

If the translator follows these instructions 100% of the time and we trust him not to modify the XML, all is fine. Translating this XML will take him somewhat longer than translating a plain text document, but he will get the job done.

But people do make mistakes. One stray < or a deleted > and the whole thing is ruined. The error the translator sees will read something like "Error parsing XML, line X pos Y.", which is a mystery to him: he is a translator, not a software engineer.
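
A quick way to see this failure mode is Python's standard xml.etree.ElementTree parser. In this sketch the hypothetical helper parse_error returns None for well-formed text, and otherwise the parser's message with a (line, column) position. The exact wording depends on the parser, but it talks about tokens and positions, not about the sentence the translator was editing.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return None if xml_text is well-formed XML, else (message, (line, column))."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err), err.position
    return None


original = "<question>Joan needs $${b*c+a} for a class trip.</question>"
# The same text after a translator accidentally typed one extra "<".
damaged = "<question>Joan needs < $${b*c+a} for a class trip.</question>"

print(parse_error(original))  # None
print(parse_error(damaged))   # a parser message plus a (line, column) pair
```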

It is even worse when translations are crowdsourced and you do not know whether a translator has malicious intent. What if he inserts JavaScript into the question text area? That text is rendered as HTML and presented to the student, so the script runs in the student's browser and can do all kinds of bad things.

So the key problem is how to control, automatically, what the translator may and may not translate. What kind of user interface makes the translation process simple and safe?

The Solution

Look at the translation console we use in ITestYou:

This looks nothing like an XML editor! The original English version is on the left, and all of it is read-only. The translator works on the right, where he has one text area for the question text and four text areas for the answer choices. Notice how formulas and HTML tags are hidden behind placeholders such as <#1/> or <$1/>, while the text to be translated is clearly visible. Efficient and safe!
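
The masking behind such a console can be approximated with regular expressions in Python. This is an assumption-laden illustration, not the ITestYou implementation (the article's own approach, described later, works on an AST): it treats whole <latex>...</latex> formulas and HTML tags as markup shown as <#n/>, and ${...} expressions as <$n/>. The article shows both placeholder forms but does not say which construct maps to which, so that split, like the names mask and unmask, is hypothetical.

```python
import re

# Assumed mapping: <#n/> hides markup (LaTeX, HTML tags), <$n/> hides ${...}.
PROTECTED = re.compile(
    r"(?P<latex><latex>.*?</latex>)"   # a whole LaTeX formula
    r"|(?P<expr>\$\{[^}]*\})"          # an inline ${...} expression
    r"|(?P<tag></?[A-Za-z][^>]*>)",    # any HTML tag such as <br/> or <b>
    re.DOTALL,
)


def mask(text):
    """Hide non-translatable spans behind numbered placeholders.

    Returns the masked text and a dict mapping each placeholder to the span it hides.
    """
    hidden = {}

    def replace(match):
        kind = "$" if match.lastgroup == "expr" else "#"
        placeholder = f"<{kind}{len(hidden) + 1}/>"
        hidden[placeholder] = match.group(0)
        return placeholder

    return PROTECTED.sub(replace, text), hidden


def unmask(text, hidden):
    """Put the hidden spans back in place of their placeholders."""
    for placeholder, original in hidden.items():
        text = text.replace(placeholder, original)
    return text


question = ("Joan needs $${b*c+a} for a class trip. She has $${a}. "
            "<br/> <latex>${b}h+${a}=${b*c+a}</latex>")
masked, hidden = mask(question)
print(masked)
# Joan needs $<$1/> for a class trip. She has $<$2/>. <#3/> <#4/>
```

Because unmask restores the exact original spans, the translator never sees or retypes a formula. A regular expression cannot cope with nested or malformed markup the way a real parser can, which is one argument for the AST approach.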

So the solution is: give the translator a simplified editor that hides all of the markup. Easy enough!
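
Hiding markup does not stop a crowdsourced translator from typing anything he likes into a text area, so a submitted translation also needs a check. The Python sketch below shows one possible approach, which the article does not describe: reject a translation whose placeholders were dropped, duplicated or invented, and HTML-escape everything else the translator typed so that a <script> tag reaches the student as harmless text. The function name accept_translation and the placeholder pattern <#n/> / <$n/> are assumptions.

```python
import html
import re

# Placeholder syntax assumed to match the console: <#n/> or <$n/>.
PLACEHOLDER = re.compile(r"<[#$]\d+/>")


def accept_translation(translated, hidden):
    """Validate a masked translation and return it with the hidden markup restored.

    hidden maps each placeholder (e.g. "<$1/>") to the original span it stands for.
    Raises ValueError if any placeholder was dropped, duplicated or invented.
    """
    found = PLACEHOLDER.findall(translated)
    if sorted(found) != sorted(hidden):
        raise ValueError(f"placeholders changed: expected {sorted(hidden)}, got {sorted(found)}")
    # Text between placeholders is the translator's own; escape it so any
    # markup typed there (e.g. <script>) is displayed, never executed.
    pieces = PLACEHOLDER.split(translated)
    parts = [html.escape(pieces[0], quote=False)]
    for placeholder, piece in zip(found, pieces[1:]):
        parts.append(hidden[placeholder])
        parts.append(html.escape(piece, quote=False))
    return "".join(parts)


hidden = {"<$1/>": "${a}", "<#2/>": "<br/>"}
print(accept_translation("У неё есть $<$1/>.<#2/>", hidden))
# У неё есть $${a}.<br/>
print(accept_translation("She has $<$1/>.<#2/><script>alert(1)</script>", hidden))
# She has $${a}.<br/>&lt;script&gt;alert(1)&lt;/script&gt;
```

Only the spans stored in hidden, which come from the trusted original, are inserted unescaped.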

The Implementation

If multiple-choice questions with four choices were all we had to worry about, life would be very easy. But learning systems typically have dozens of different kinds of learning objects. How does one build a simplified editor for each kind of object? By hand? Nope...

In ITestYou we do it automatically using abstract syntax tree (AST) transformations. Here is what we do:

1. Parse the XML definition of the problem into an AST. Our parser spans languages: it parses XML, HTML, LaTeX and inline expressions into a uniform AST.
2. Walk the AST and assign every node a unique ID of your choice.
3. Make a clone of the AST.
4. Walk the clone and replace each node type with one of two types: translatable or non-translatable. Preserve the IDs.
5. Convert the clone into an HTML page where each translatable element becomes a text area and each non-translatable element is shown as a non-editable icon or as text such as <#1/> or <$1/>.
6. Let the translator provide translations and click the Submit button.
7. You now have a cloned AST containing the translations. Save it in the database.
8. To create a translated problem, walk the clone containing the translations and, using the node IDs, replace the content of the translatable nodes in the original AST.
9. Render the resulting AST to HTML so the student can interact with it.
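
The steps above can be sketched in Python on a toy tree. Everything here is an assumption made for illustration: the Node class, its four kinds (element, text, expr, latex), numbering by pre-order position, and all function names. Step 1 (the multi-language parser) is skipped by building the tree by hand, and step 5's HTML form is reduced to a dict mapping node IDs to the text that would fill each text area.

```python
import copy
import html
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


# Hypothetical node model; the article does not describe ITestYou's actual AST.
@dataclass
class Node:
    kind: str                      # "element", "text", "expr" or "latex"
    value: str = ""                # tag name, text, expression source or LaTeX source
    children: List["Node"] = field(default_factory=list)
    node_id: Optional[int] = None


def walk(node: Node) -> Iterator[Node]:
    """Pre-order traversal of the tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def number(root: Node) -> None:
    """Step 2: give every node a unique ID (here simply its pre-order position)."""
    for i, node in enumerate(walk(root), start=1):
        node.node_id = i


def to_translation_tree(root: Node) -> Node:
    """Steps 3-4: clone the AST, keeping IDs, and reduce each node to two kinds."""
    clone = copy.deepcopy(root)
    for node in walk(clone):
        node.kind = "translatable" if node.kind == "text" else "fixed"
    return clone


def editor_fields(clone: Node) -> Dict[int, str]:
    """Step 5: one text area per translatable node, keyed by node ID."""
    return {n.node_id: n.value for n in walk(clone) if n.kind == "translatable"}


def fill_in(clone: Node, submitted: Dict[int, str]) -> None:
    """Steps 6-7: store the translator's text; fixed nodes can never be edited."""
    for node in walk(clone):
        if node.kind == "translatable" and node.node_id in submitted:
            node.value = submitted[node.node_id]


def merge(original: Node, clone: Node) -> Node:
    """Step 8: copy translated text into a copy of the original AST by node ID."""
    texts = {n.node_id: n.value for n in walk(clone) if n.kind == "translatable"}
    result = copy.deepcopy(original)
    for node in walk(result):
        if node.kind == "text" and node.node_id in texts:
            node.value = texts[node.node_id]
    return result


def render(node: Node) -> str:
    """Step 9: serialize the AST back to markup; ${...} is evaluated later."""
    inner = "".join(render(child) for child in node.children)
    if node.kind == "element":
        return f"<{node.value}>{inner}</{node.value}>" if node.children else f"<{node.value}/>"
    if node.kind == "expr":
        return "${" + node.value + "}"
    if node.kind == "latex":
        return f"<latex>{node.value}</latex>"
    return html.escape(node.value, quote=False)  # plain text is always escaped


question = Node("element", "question", [
    Node("text", "She has $"),
    Node("expr", "a"),
    Node("text", "."),
    Node("element", "br"),
])
number(question)
clone = to_translation_tree(question)
print(editor_fields(clone))  # {2: 'She has $', 4: '.'}
fill_in(clone, {2: "У неё есть $", 3: "<script>"})  # ID 3 is fixed, so it is ignored
print(render(merge(question, clone)))  # <question>У неё есть $${a}.<br/></question>
```

Keying translations by node ID, rather than by position or by the English text, is what lets the translated clone be saved on its own and merged back later; non-translatable nodes are simply never writable.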

The approach is quite complex and may be difficult or impossible to implement in your specific system, but the benefits are enormous and apply generically to all object types.

Final Word

Formal modeling and high levels of abstraction remain my top design goals in any project. This example shows the enormous benefits of formal modeling: it dramatically reduces the cost of producing and maintaining L10N and I18N for complex learning objects.