2010/04/11

Parsing XML in J2ME


J2ME and XML currently meet in a handful of open source parsers. In this article you'll learn how to parse XML in a MIDP client application. I'll begin by talking about system architecture and the motivation for using XML as a data transport. Then I'll describe the available XML parsers, discuss the challenges of developing in a small environment, and present some sample code.

Multi-tier System Architecture

To understand why you might want to parse XML on a J2ME device, let's first examine the architecture of a typical multi-tier application. Multi-tier is one of those ubiquitous terms that means something different to just about everyone. I'll nail it down with a fairly specific architecture, as shown in the following figure:

[Figure: Typical multi-tier architecture]

The current world is web-centric, so systems are often designed with HTML browsers as the clients. The client performs very little of the application processing and functions as a fancy kind of terminal. The bulk of the application runs on a server, which uses a database for persistent storage.

As the wireless world began expanding, server vendors found that they could conveniently support wireless devices by adding support for WAP browsers. The underlying paradigm of the browser as front end to the application remains unchanged; the server is just serving WML over WAP in addition to HTML over HTTP.

The diagram also shows a standalone client, which could communicate with the application on the server in several different ways. The client could make HTTP connections, use RMI to manipulate remote objects, or implement a customized protocol. The chief advantage of having a standalone client in place of a browser is the chance to provide a richer user interface. The main disadvantage is the difficulty of client installation and maintenance.

Where do MIDP clients fit in this picture? Keep in mind that with MIDP devices, everything is small, and this affects their utility as application clients.

  • Network connection setup is slow.
  • Data rates are slow.
  • The processor is slow.
  • Memory is scarce.

Because of these constraints, MIDP client applications should be designed to be as small as possible. At the same time, they can feature a smooth and capable user interface that goes far beyond the user experience offered by a WAP browser.

MIDP clients have one other important characteristic as compared to WAP applications: the application can run offline and make updates to the server periodically. This is especially important with wireless networks, which are slower and less reliable than the desktop network you may be accustomed to. WAP applications, by contrast, require a network connection as the user moves from screen to screen.

The following figure shows one possible implementation of a multi-tier system that supports HTML browsers, WAP browsers, standalone clients, and MIDP clients.

[Figure: A multi-tier architecture with XML]

The figure shows one way of supporting multiple client types. Instead of creating custom server-side code for each client type, you write generic code that returns data as XML documents. Then you create transformations (using XSLT) that change the basic XML documents into whatever is required for the client device. One set of transformations produces HTML for desktop browsers, while another set might produce the WML to support WAP browsers.

But what kind of data do you send to the MIDP client? You send whatever you want, of course, anything from plain text to binary data. If you're using XML on the server side, however, you may consider XML itself as a data exchange format. It makes your life pretty easy on the server side, for one thing. You might be able to send your basic XML documents unchanged, or you could create some simple transformations to send a more terse XML format to the MIDP device.
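As a rough illustration of such a "terse" transformation, here is a minimal server-side sketch using the standard javax.xml.transform API from J2SE (not MIDP). The element names news, story, headline, n, and t are hypothetical, invented only for this example; your own documents will look different.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Server-side sketch (J2SE, not MIDP): trims a verbose document down to
// the one field a small client needs. All element names are hypothetical.
class TerseTransform {
    // Keeps only each story's headline and shortens the tag names.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/news'>"
      + "<n><xsl:for-each select='story'>"
      + "<t><xsl:value-of select='headline'/></t>"
      + "</xsl:for-each></n>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an in-memory document and returns the result.
    static String toTerse(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Malformed input or stylesheet; surfaced unchecked to keep the sketch short.
            throw new IllegalArgumentException(e.getMessage());
        }
    }
}
```

Everything the stylesheet does not select (bylines, story bodies, and so on) never reaches the wireless network, which is the point of the exercise.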

Exchanging XML between client and server offers XML's usual advantages: the data is self-describing and offers the opportunity to loosely couple the client and server.

Sending XML to the client has another advantage. During the development cycle, you can use validating XML parsers in emulated clients to ensure that the documents the server generates are clean. By the time you try running the application on a real MIDP device, you'll be pretty sure that the data it's getting is good.

The downside of XML is that it's not a very efficient way of expressing data. On slow wireless networks, every byte counts. If you are considering XML as a data exchange format, do some testing with real devices first to familiarize yourself with the delays involved. On today's wireless networks, latency is usually more of an issue than data transfer rate, so you may not notice the larger message size of XML versus a binary format.
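To get a feel for the overhead, the following sketch encodes one headline record two ways: as a small XML fragment and as two length-prefixed strings written with DataOutputStream.writeUTF(). The record layout and class name are assumptions made for illustration, and the code runs on desktop Java; the titles are assumed to contain no characters that need XML escaping.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Compares the encoded size of one (title, link) record in two formats.
class EncodingSize {
    // XML form: markup around each field. No escaping is done, so the
    // inputs must not contain '<' or '&' (an assumption of this sketch).
    static byte[] asXml(String title, String link) {
        String xml = "<item><title>" + title + "</title><link>" + link
                   + "</link></item>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Binary form: each string is preceded by a two-byte length.
    static byte[] asBinary(String title, String link) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            out.writeUTF(title);
            out.writeUTF(link);
            out.flush();
        } catch (IOException e) {
            // Not expected when writing to an in-memory stream.
            throw new IllegalStateException(e.getMessage());
        }
        return bytes.toByteArray();
    }
}
```

For plain ASCII text, the XML form carries 41 bytes of markup per record where the binary form carries 4 bytes of length prefixes. Whether that difference matters depends on how many records you send and on how much of the total wait is connection latency.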

For another discussion of the use of XML in enterprise applications, see the Designing Wireless Enterprise Applications Using Java(TM) Technology white paper.

Parser Roundup

If there's a slogan for XML parsers in the MIDP world, it might be "Don't Supersize Me." Parsers are traditionally bulky, featuring lots of code and hefty runtime memory requirements. In MIDP devices, the memory available for code is usually small and individual applications may have a maximum code size. The Motorola iDEN phones, for example, have an upper limit of 50 kB on the size of a MIDlet suite JAR file. Aside from code size, the amount of memory available at runtime is also small. What you need is a parser that's designed to be small and light.

Open source parsers are attractive because they give you lots of control. You can customize a parser if you need additional features, and you can fix the parser if it has bugs.

There are three fundamental parser types. Which type you choose depends on how you want your application to behave and what types of documents you're expecting to parse.

  1. A model parser reads an entire document and creates a representation of the document in memory. Model parsers use significantly more memory than other types of parsers.
  2. A push parser reads through an entire document. As it encounters various parts of the document, it notifies a listener object. (This is how the popular SAX API operates.)
  3. A pull parser reads a little bit of a document at once. The application drives the parser through the document by repeatedly requesting the next piece.
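The difference between the three styles is easiest to see in code. The sketch below is a toy tokenizer for a tiny XML subset (no attributes, comments, or entities, and it assumes well-formed input). It is not kXML or any of the parsers discussed in this article; every name in it is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Pull style: the caller asks for one event at a time.
class ToyPullParser {
    static final int START = 0, END = 1, TEXT = 2, EOF = 3;

    private final String doc;
    private int pos = 0;
    String value;   // tag name or text of the most recent event

    ToyPullParser(String doc) { this.doc = doc; }

    int next() {
        if (pos >= doc.length()) return EOF;
        if (doc.charAt(pos) == '<') {
            int close = doc.indexOf('>', pos);
            boolean end = doc.charAt(pos + 1) == '/';
            value = doc.substring(pos + (end ? 2 : 1), close);
            pos = close + 1;
            return end ? END : START;
        }
        int lt = doc.indexOf('<', pos);
        if (lt < 0) lt = doc.length();
        value = doc.substring(pos, lt);
        pos = lt;
        return TEXT;
    }
}

// Push style: the parser owns the loop and calls back into a listener.
interface ToyHandler {
    void startTag(String name);
    void endTag(String name);
    void text(String text);
}

class ToyPushParser {
    static void parse(String doc, ToyHandler handler) {
        ToyPullParser p = new ToyPullParser(doc);
        for (int type = p.next(); type != ToyPullParser.EOF; type = p.next()) {
            if (type == ToyPullParser.START) handler.startTag(p.value);
            else if (type == ToyPullParser.END) handler.endTag(p.value);
            else handler.text(p.value);
        }
    }
}

// Model style: the whole document is in memory before the caller sees any
// of it. A real model parser builds a tree; a flat list keeps this short.
class ToyModelParser {
    static List<String> parse(String doc) {
        final List<String> events = new ArrayList<String>();
        ToyPushParser.parse(doc, new ToyHandler() {
            public void startTag(String name) { events.add("<" + name + ">"); }
            public void endTag(String name) { events.add("</" + name + ">"); }
            public void text(String text) { events.add(text); }
        });
        return events;
    }
}
```

Notice where the loop lives. With the pull parser the application decides when to call next(), so it can stop early or interleave other work. With the push parser the loop is inside parse(), and with the model parser nothing is returned until the loop has finished.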

The following table summarizes the current offering of small XML parsers that are appropriate for MIDP.

Name             License        Size    MIDP    Type
ASXMLP 020308    Modified BSD   6 kB    yes     push, model
kXML 2.0 alpha   EPL            9 kB    yes     pull
kXML 1.2         EPL            16 kB   yes     pull
MinML 1.7        BSD            14 kB   no      push
NanoXML 1.6.4    zlib/libpng    10 kB   patch   model
TinyXML 0.7      GPL            12 kB   no      model
Xparse-J 1.1     GPL            6 kB    yes     model

The Name and License columns contain links to the corresponding web pages and licenses. The Size column indicates the size of the class files for the parser as contained in a JAR, which approximates how much the parser will add to the size of your MIDlet suite JAR. The MIDP column indicates whether the parser will compile without modifications in a MIDP environment. Finally, the Type column indicates the type of the parser, as discussed above.

Two parsers that did not make the list are NanoXML 2.2 Lite and XMLtp 1.7. Although both of these parsers are small, they rely heavily on J2SE APIs and would require significant effort to port to MIDP. The three parsers in the table that do not compile in a MIDP environment can be modified to do so with moderate effort.

It's fairly simple to incorporate a parser into your MIDlet suite using the J2ME Wireless Toolkit. If the parser is distributed as source code .java files, you can place these files into the src directory of your J2MEWTK project. If the parser is distributed as a .jar or .zip archive of .class files, you can place the archive in the lib directory of the J2MEWTK project. (For an introduction to the J2MEWTK and the project directory structure, see Wireless Development Tutorial Part I.)

The parsers shown in the table represent the current offerings in the MIDP 1.0 world. Standardization efforts are underway and the landscape is shifting rapidly. Keep your eye on both JSR 118, MIDP Next Generation and JSR 172, J2ME Web Services Specification.

Performance Considerations

In this section I'll describe some optimizations you can use to make your MIDlet code run well in a constrained environment. The techniques described here apply to any MIDP development, not just XML parsing. I describe them here because the use of an XML parser is likely to make your code significantly bigger and slower; you will probably want to optimize your application before delivering it to users.

The optimizations presented here fall into three categories:

  1. Runtime performance
  2. User perception
  3. Deployment code size

Achieving good runtime performance is related to your XML document design. On the one hand, it takes a long time to set up a network connection. This means you should make each document contain as much useful data as possible. You might even want to aggregate documents on the server side and send one larger document rather than several smaller ones. On the other hand, the data transfer rate is slow. If you make your documents too large, the user will be left waiting a long time for each document to be loaded. In the end, you will need to find a balance between avoiding connection setup times and minimizing download wait times. One thing is for sure: XML documents that are sent to a MIDlet should not contain extra information. You don't want to waste precious wireless bandwidth transferring data you will only throw away.
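A little arithmetic makes the tradeoff concrete. The sketch below estimates total fetch time as one connection setup per document plus the raw transfer time. The model and the numbers in the note that follows are illustrative assumptions, not measurements; substitute figures from your own tests on real devices.

```java
// Back-of-the-envelope estimate of how long a set of documents takes to fetch.
class TransferEstimate {
    // documents      - how many separate requests are made
    // totalBytes     - combined size of all the documents
    // setupMillis    - assumed cost of setting up one connection
    // bytesPerSecond - assumed sustained transfer rate
    static long estimateMillis(int documents, long totalBytes,
                               long setupMillis, long bytesPerSecond) {
        long setup = documents * setupMillis;
        long transfer = (totalBytes * 1000) / bytesPerSecond;
        return setup + transfer;
    }
}
```

Assuming a 2-second connection setup and 1,200 bytes per second, five separate 2,000-byte documents cost about 18.3 seconds in total, against about 10.3 seconds for one aggregated 10,000-byte document. On the other hand, the first of the small documents is complete after about 3.7 seconds, which is why the right balance depends on how soon the user needs to see something.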

Another way you can improve your application is to improve the user experience. This is not really an optimization--you're not making anything run faster or leaner--but it makes the application look a lot better to a user. The basic technique is simple: parsing, like network activity, should go in its own thread. (For several strategies for network threading, see Networking, User Experience, and Threads.) You don't want to lock up the user interface while the MIDlet is parsing an XML document or reading the document from the network. Ideally, you can allow the user to perform other offline tasks at the same time that network activity and parsing are occurring. If that is not possible, you should at least try to show parsed data as soon as it is available. Note that you will need a push or pull parser to accomplish this; a model parser won't give you any data until the entire document is parsed.
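The shape of that technique, stripped of any real parsing, might look like the sketch below. The TitleSink interface and BackgroundTitles class are hypothetical names, and trimming a string stands in for the work of parsing one item; the point is only that start() returns immediately and each result is handed over as soon as it exists.

```java
// Hypothetical callback interface, standing in for the UI side of a MIDlet.
interface TitleSink {
    void titleReady(String title);
}

class BackgroundTitles {
    // Returns immediately; the "parsing" runs on the worker thread.
    static Thread start(final String[] rawTitles, final TitleSink sink) {
        Thread worker = new Thread() {
            public void run() {
                for (int i = 0; i < rawTitles.length; i++) {
                    // Stand-in for parsing one item out of the stream.
                    String title = rawTitles[i].trim();
                    // Hand over each result right away instead of batching.
                    sink.titleReady(title);
                }
            }
        };
        worker.start();
        return worker;
    }
}
```

Because titleReady() is called on the worker thread, whatever the sink touches must be safe to use from a thread other than the one that created it.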

Finally, you may be concerned about the size of your MIDlet suite JAR. There are two reasons this might be a problem. First, as I mentioned, there's not much space on MIDP devices, and carriers or manufacturers may impose limits on your code size. Second, users may download your application over the wireless network itself, which is slow. Making the MIDlet JAR small will minimize the pain of downloading and installing your software.

What's in the MIDlet suite JAR, and how can you reduce its size? The MIDlet suite JAR contains classfiles, images, icons, and whatever other resource files you may have included. Assuming you've removed all the resources you don't need, you are now ready to use something called an obfuscator to cut down on the classfiles.

Not all obfuscators are equal, but an obfuscator usually includes some of the following features:

  1. Removes unused classes
  2. Removes unused methods and variables
  3. Renames classes, packages, methods, and variables
  4. Adds illegal stuff to classfiles to confuse decompilers

Features 1, 2, and 3 are fine and will reduce the size of your MIDlet suite JAR, sometimes dramatically. If you have incorporated an XML parser in your MIDlet project, there may be parts of the parser that your application never uses. An obfuscator is good for pruning out the stuff you don't need.

Watch out for feature 4. Obfuscators were originally designed to make it hard for other people to decompile your classfiles. Some obfuscators do nasty things to the classfiles in order to confound decompilers. This may mess up either the class preverifier or the MIDP device's classloader, so I suggest avoiding this feature if possible.

Two freely available obfuscators are JAX and Retroguard. Consult the documentation for features and usage.

An Example: Parsing RSS

Enough talk--let's look at some code. The example presented here is a MIDlet that parses an RSS file.

RSS (Rich Site Summary) is a simple XML format that summarizes headlines and story descriptions for a news site. Many technology news web sites have RSS files (called feeds) available. Other web sites aggregate RSS feeds from various places to present you with a customized view of the news. Meerkat is one such aggregator. The interesting thing about Meerkat is that it can provide different flavors. You can see the news as HTML or in a variety of different formats, including RSS. Think of Meerkat as a big funnel. You pour RSS feeds in the top and out the bottom comes a single aggregated RSS feed. For more information on Meerkat and its features, see Meerkat: An Open Service API.

An example of the output from Meerkat is shown below.

<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC
  "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd"
>

<rss version="0.91">

  <channel>
    <title>Meerkat: An Open Wire Service</title>
    <link>http://meerkat.oreillynet.com</link>
    <description>
      Meerkat is a Web-based syndicated content reader
      providing a simple interface to RSS stories. While
      maintaining the original association of a story with
      a channel, Meerkat's focus is on chronological
      order -- the latest stories float to the top,
      regardless of their source.
    </description>
    <language>en-us</language>

    <image>
      <title>Meerkat Powered!</title>
      <url>http://meerkat.oreillynet.com/icons/meerkat-powered.jpg</url>
      <link>http://meerkat.oreillynet.com</link>
      <width>88</width>
      <height>31</height>
      <description>Visit Meerkat in full splendor at
        meerkat.oreillynet.com...</description>
    </image>

    <item>
      <title>MmO2 cuts jobs, to take GBP110m charge</title>
      <link>http://c.moreover.com/click/here.pl?r31561327</link>
      <description>FTMarketWatch Feb 5 2002 5:05AM ET...</description>
    </item>

    <item>
      <title>S.E.C. Says Motorola Cant Exclude Audit Proposal</title>
      <link>http://c.moreover.com/click/here.pl?r31562096</link>
      <description>New York Times Feb 5 2002 5:17AM ET...</description>
    </item>

    <item>
      <title>1,900 jobs to go at mmO2</title>
      <link>http://c.moreover.com/click/here.pl?r31562134</link>
      <description>ZDNet Feb 5 2002 5:18AM ET..</description>
    </item>

    <item>
      <title>Mobile firm cutting 1,900 jobs</title>
      <link>http://c.moreover.com/click/here.pl?r31558750</link>
      <description>CNN Europe Feb 5 2002 4:30AM ET...</description>
    </item>

    <item>
      <title>The axe falls at mmO2</title>
      <link>http://c.moreover.com/click/here.pl?r31558856</link>
      <description>The Register Feb 5 2002 4:32AM ET...</description>
    </item>

    <item>
      <title>mmO2 plans to axe 1,900 jobs</title>
      <link>http://c.moreover.com/click/here.pl?r31559617</link>
      <description>Evening Standard Feb 5 2002 4:42AM ET...</description>
    </item>

    <item>
      <title>UPDATE 2-Sohu Q4 revenues up 15 pct on wireless services</title>
      <link>http://c.moreover.com/click/here.pl?r31557811</link>
      <description>CNET Feb 5 2002 4:08AM ET...</description>
    </item>

  </channel>

</rss>

The root element of this document is rss, with a contained channel element. The information that interests us is in the item elements, which have title, link, and description sub-elements. The example MIDlet parses an RSS document and displays all the titles for the items it finds.

For this example, I chose to use the kXML 1.2 parser. Although it's not the smallest parser available, it has several compelling advantages:

  1. It is designed for MIDP; no porting is necessary.
  2. It is stable and relatively mature.
  3. It is a pull parser, which means our application can process and display information as it is parsed, as it is being downloaded from the server. A push parser would also provide this behavior, but not a model parser.

The kXML 1.2 parser is simple to use. Just create an instance of org.kxml.parser.XmlParser and use the skip() and read() methods to move through the document. One version of the read() method returns a ParseEvent, which contains information like the name of the element or the text content of an element.

In this example, parsing is entirely contained in its own class, RSSParser, shown below.

import java.io.*;

import javax.microedition.io.*;

import org.kxml.*;
import org.kxml.parser.*;

public class RSSParser {
  protected RSSListener mRSSListener;

  public void setRSSListener(RSSListener listener) {
    mRSSListener = listener;
  }

  // Non-blocking: opens the connection and parses on a new thread.
  public void parse(final String url) {
    Thread t = new Thread() {
      public void run() {
        // set up the network connection
        HttpConnection hc = null;

        try {
          hc = (HttpConnection)Connector.open(url);
          parse(hc.openInputStream());
        }
        catch (IOException ioe) {
          mRSSListener.exception(ioe);
        }
        finally {
          try { if (hc != null) hc.close(); }
          catch (IOException ignored) {}
        }
      }
    };
    t.start();
  }

  // Blocking: reads the whole stream, reporting each item as it is parsed.
  public void parse(InputStream in) throws IOException {
    Reader reader = new InputStreamReader(in);
    XmlParser parser = new XmlParser(reader);
    ParseEvent pe = null;

    // Move past the prolog to the opening rss and channel tags.
    parser.skip();
    parser.read(Xml.START_TAG, null, "rss");
    parser.skip();
    parser.read(Xml.START_TAG, null, "channel");

    boolean trucking = true;
    while (trucking) {
      pe = parser.read();
      if (pe.getType() == Xml.START_TAG) {
        String name = pe.getName();
        if (name.equals("item")) {
          String title, link, description;
          title = link = description = null;
          // Read events until the matching </item>.
          while ((pe.getType() != Xml.END_TAG) ||
              (pe.getName().equals(name) == false)) {
            pe = parser.read();
            if (pe.getType() == Xml.START_TAG &&
                pe.getName().equals("title")) {
              pe = parser.read();
              title = pe.getText();
            }
            else if (pe.getType() == Xml.START_TAG &&
                pe.getName().equals("link")) {
              pe = parser.read();
              link = pe.getText();
            }
            else if (pe.getType() == Xml.START_TAG &&
                pe.getName().equals("description")) {
              pe = parser.read();
              description = pe.getText();
            }
          }
          mRSSListener.itemParsed(title, link, description);
        }
        else {
          // Not an item: skip everything up to the matching end tag.
          while ((pe.getType() != Xml.END_TAG) ||
              (pe.getName().equals(name) == false))
            pe = parser.read();
        }
      }
      // The closing rss tag ends the document.
      if (pe.getType() == Xml.END_TAG &&
          pe.getName().equals("rss"))
        trucking = false;
    }
  }
}

RSSParser has two parse() methods. The first accepts a URL string as a parameter and sets up a separate thread for network access. It then calls the other parse() method; this method accepts an InputStream as a parameter and does the actual work.

RSSParser uses kXML 1.2 to work its way through an RSS document. As you can see, the structure of the code roughly mirrors the structure of the document, which is a hallmark of a pull parser. After finding the opening rss and channel tags, RSSParser works its way through the document. For every item tag it finds, it attempts to parse the contained title, link, and description tags. When it comes to the end of an item, it sends the information it has parsed to a listener object of type RSSListener. Every time an item is parsed, the listener's itemParsed() is called. If an exception occurs, the exception() method of the listener will be called. The RSSListener interface consists of just those two methods:

public interface RSSListener {
  public void itemParsed(String title, String link,
      String description);
  public void exception(java.io.IOException ioe);
}
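A listener does not have to be a MIDlet. As a sketch of the contract, here is a hypothetical CollectingListener that simply records what it is told, which can be handy when exercising the parser from a test harness. The interface is repeated (without the public modifiers) only so that the sketch compiles on its own; the class name and the "(untitled)" placeholder are my own assumptions.

```java
import java.io.IOException;
import java.util.Vector;

// Repeated from the article so that this sketch is self-contained.
interface RSSListener {
    void itemParsed(String title, String link, String description);
    void exception(IOException ioe);
}

// Hypothetical helper: records what the parser reports so that a test
// harness can inspect it afterwards.
class CollectingListener implements RSSListener {
    // Vector is synchronized, which matters because the parser calls
    // back from its own thread.
    final Vector<String> titles = new Vector<String>();
    volatile IOException failure;

    public void itemParsed(String title, String link, String description) {
        // An item with no title element arrives as null; store a placeholder.
        titles.addElement(title != null ? title : "(untitled)");
    }

    public void exception(IOException ioe) {
        failure = ioe;
    }
}
```

Both callbacks arrive on the parser's thread when the non-blocking parse() is used, so a listener should do as little work as it can before returning.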

Since most of the hard work is done in RSSParser, writing a MIDlet that uses RSSParser is relatively easy. Let's look at a MIDlet, RSSMIDlet, that connects to a Meerkat feed and displays headlines. It also allows the user to view the full description for each story. The screen shots below show RSSMIDlet in action.

[Figure: Title list and item detail. Screen shots from RSSMIDlet]

RSSMIDlet registers itself as the listener for events from RSSParser. Every time a new item is received, RSSMIDlet adds it to a List that is displayed for the user. Controls are also provided for seeing the full description for a particular story. RSSMIDlet connects to a URL that is specified in an application property named RSSMIDlet.URL, which it reads with getAppProperty(). The default value for this URL is:

http://www.oreillynet.com/meerkat/?_fl=rss&p=9

This URL simply requests Meerkat to return items from profile 9 (Wireless) and to return the flavor RSS.

The full source code for RSSMIDlet is shown below. Mostly it deals with the standard details of creating a user interface. Take particular note of startApp(), where the parser object is created and started, and itemParsed(), where items parsed by the RSSParser are delivered to the MIDlet.

import java.util.Vector;

import javax.microedition.lcdui.*;
import javax.microedition.midlet.*;

public class RSSMIDlet
    extends MIDlet
    implements CommandListener, RSSListener {
  private Display mDisplay;
  private List mTitleList;
  private Command mExitCommand, mDetailsCommand;

  private boolean mInitialized;
  private Vector mTitles, mDescriptions;

  public RSSMIDlet() {
    mInitialized = false;
    mTitles = new Vector();
    mDescriptions = new Vector();
  }

  public void startApp() {
    if (mDisplay == null)
      mDisplay = Display.getDisplay(this);

    if (mInitialized == false) {
      // Put up the waiting screen.
      Screen waitScreen = new Form("Connecting...");
      mDisplay.setCurrent(waitScreen);
      // Create the title list.
      mTitleList = new List("Headlines", List.IMPLICIT);
      mExitCommand = new Command("Exit", Command.EXIT, 0);
      mDetailsCommand = new Command("Details", Command.SCREEN, 0);
      mTitleList.addCommand(mExitCommand);
      mTitleList.addCommand(mDetailsCommand);
      mTitleList.setCommandListener(this);
      // Start parsing.
      String url = getAppProperty("RSSMIDlet.URL");
      RSSParser parser = new RSSParser();
      parser.setRSSListener(this);
      parser.parse(url);
      mInitialized = true;
    }
    else
      mDisplay.setCurrent(mTitleList);
  }

  public void pauseApp() {}

  public void destroyApp(boolean unconditional) {}

  public void commandAction(Command c, Displayable s) {
    if (c == mExitCommand)
      notifyDestroyed();
    else if (c == mDetailsCommand ||
        c == List.SELECT_COMMAND) {
      int selection = mTitleList.getSelectedIndex();
      if (selection == -1) return;
      String title = (String)mTitles.elementAt(selection);
      String description =
          (String)mDescriptions.elementAt(selection);
      Alert a = new Alert(title, description, null, null);
      a.setTimeout(Alert.FOREVER);
      mDisplay.setCurrent(a, mTitleList);
    }
  }

  public void itemParsed(String title, String link,
      String description) {
    mTitles.addElement(title);
    mDescriptions.addElement(description);

    mDisplay.setCurrent(mTitleList);
    mTitleList.append(title, null);
  }

  public void exception(java.io.IOException ioe) {
    Alert a = new Alert("Exception", ioe.toString(),
        null, null);
    a.setTimeout(Alert.FOREVER);
    mDisplay.setCurrent(a, mTitleList);
  }
}

Summary

XML is a viable choice for data transport to J2ME devices, although it suits some applications better than others. If you need the decoupling that XML provides, or if you want to connect to an existing service using XML, or if the data exchange between server and client must be highly structured, then XML is an excellent choice, assuming you can pay the price of including an XML parser in your MIDlet. Small parsers are readily available and performance is acceptable, assuming you are careful about the design of your documents. If necessary, you can reduce the size of your MIDlet suite JAR by using an obfuscator.


source : sun.com