Generating EMF Models from XML Schema Definitions (XSDs)

In this blog post I will show how to generate models for the Eclipse Modeling Framework (EMF) out of an XML schema definition. EMF is a powerful framework which allows you to create Java classes corresponding to the XML schema types, code to load XML documents to Java models and code to serialize Java models back to XML again. As an example, I will use MusicXML schema definitions.

Installing Required Features

To work with EMF and XSD schemas, you need to install the following features in your Eclipse development environment:

  • EMF – Eclipse Modeling Framework SDK
  • XSD – XML Schema Definition SDK

You can check for already installed in the About dialog of your Eclipse installation. In case the features are not installed yet, go to Help -> Install New Software… and choose the update site corresponding to your Eclipse version. For example, for Eclipse 2019-09 the update site is http://download.eclipse.org/releases/2019-09. Search for the two features and install them.

Creating an EMF Project and Importing the Model

If you don’t have a project already, create one using New -> Other… -> Eclipse Modeling Framework -> EMF Project. Specify a project name and click Next. Several model importers should be proposed.

Select XML schema in the list (it should have been installed with the XML Schema Definition SDK) and Click Next. The following page appears:

Click Browse File System… and select the XSD you would like to import. I recommend not to select the option Create XML Schema to Ecore Map. The Generator model file should have an appropriate name already, otherwise you can change it here. I changed the capitalization slightly. Click Next.

On the next page you can specify the file name of the generated ECore model file. It should align with the generator model file name you just specified. Click Finish.

You should end up with a new project containing the folder model. It in turn contains the imported data model in an ecore model file (MusicXML.ecore). It also contains a generator model file named MusicXML.genmodel. If you want to make any adjustments to the data model (classes and attributes/references), this can be done in the ECore model. However, since this an imported model this should not necessary in our case. Below is a screenshot of the imported model:

Adjusting the ECore Model

The only adjustments we need to do for now in the ECore model are:

  1. Right-click the Musicxml package below the root element and choose Show Properties View
  2. Change the Name and the NS Prefix to musicxml (note the lower case m). This is important because this will become part of the java package we will generate.
  3. Set the NS URI to http://www.musicxml.org/xsd/MusicXML

Adjusting the Generator Model

The generator model gives us control about how and where the Java classes for our model will be generated. Select the package below the root element and open the Properties view. Adjust the following settings:

  • Base package: enter the common Java package name prefix which should be put in front of all classes/interfaces/enums to be generated, e.g. org.myapp. Note that the ECore package name will be appended to this prefix automatically. For example, if you use the base package org.myapp and your ECore Package name is musicxml, the code will be generated in the Java package org.myapp.musicxml.
  • Prefix: this is the class name prefix used for EMF-specific classes such as factories and utility classes. I propose to change this to a CamelCase identifier you would put at the beginning of a Java class name. Example: the prefix MusicXML will generate class names such as MusicXMLFactory, MusicXMLPackage, MusicXMLResourceImpl.

Generating the Java Code

Now it’s time to generate the java classes. In order to do that, right-click on the MusicXML package and choose Generate Model Code.

After the operation finishes, you will see lots of interfaces/classes/enums generated in the src folder of your project:

If you want the source code to be generated in another source folder such as src/main/java, this can be adjusted when editing the properties of the root object of the generator model.

Loading an XML File

Now that we have our Java code, we can use the EMF infrastructure to load an XML file to a java model. Loading a file in EMF typically involves:

  1. Creating a ResourceSet
  2. Registering appropriate resource factories in the resource set
  3. Loading a resource by specifying an URI

The following code assumes that a file is loaded from disk, but you could also specify internet URIs instead of the file URI:

1
2
3
4
5
6
7
8
9
10
11
12
13
public static Resource loadMusicXMLFile(File musicXMLFile)
{
    ResourceSetImpl resourceSet = new ResourceSetImpl();
    resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap().put("xml", new MusicXMLResourceFactoryImpl());
    resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap().put("musicxml", new MusicXMLResourceFactoryImpl());
 
    // disable DTD resolution since it fails for MusicXML files
    Map<String, Boolean> parserFeatures = new HashMap<>();
    parserFeatures.put("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    resourceSet.getLoadOptions().put(XMLResource.OPTION_PARSER_FEATURES, parserFeatures);
 
    return resourceSet.getResource(URI.createFileURI(musicXMLFile.getAbsolutePath()), true);
}

For most scenarios, the first three lines and the last line would be sufficient to load an XML file to a Java model. For MusicXML, I had to tweak the XML parser configuration a bit, because it tried to load MusicXML DTDs from a server which failed. Since we don’t need DTD validation anyway, I disabled the parser feature to load external DTDs. The map with the parser feature settings in turn has to be put as value for the load option key XMLResource.OPTION_PARSER_FEATURES, and EMF will take care of forwarding the parameters to the XML parser. Call getContents() of the returned resource to access the java model representation of the loaded XML file. Here is an example how score parts are accessed in MusicXML files:

1
2
3
4
5
6
EObject eObject = resource.getContents().get(0);
if (eObject instanceof ScorePartwiseType)
{
    ScorePartwiseType scorePartwise = (ScorePartwiseType)eObject;
    processParts(scorePartwise.getPart());
}

If you want to save a MusicXML java model to an XML file, basically use the same code as above, but save your model into the contents of a resource and then call resource.save().

That’s it 🙂 I hope this blog post illustrated how easy and powerful XML to Java object mapping can be when using EMF. Of course this can be done with any correctly structured XSD, not just with MusicXML.

7 thoughts on “Generating EMF Models from XML Schema Definitions (XSDs)

  1. There is one drawback with EMF / XSD compatibility though: EMF does not support recursive group references. If you are starting with such a schema that you cannot change, you are screwed. I had to escape to using JAXB in one of my EMF projects.

  2. Thanks for the hint! Since Java 9, I had problems with JAXB because classes were removed from JREs and moved to another package. Especially in OSGi environments this becomes a serious problem. It took me a while to implement something that runs both on Java <= 8 and Java 9. But of course if EMF does not support recursion JAXB is the way to go.

    • Iterating the resource contents is typically accomplished by accessing the first content member with resource.getContents().get(0). Then you should check if this EObject is an instance of your model root object. In the case of MusicXML, this is checked with eObject instanceof ScorePartwiseType or eObject instanceof DocumentRoot. Afterwards, you can cast to the respective type and then access the model directly via its Java interface.

Leave a Reply

Your email address will not be published. Required fields are marked *