Format Definition XML
  • 26 Aug 2024
  • 7 minute read
  • Dark
    Light
  • PDF

Format Definition XML

  • Dark
    Light
  • PDF

Article summary

When creating or editing a source format, the XML setting on the Format Definition tab tells Slate how it should parse incoming data. Each setting is listed here, followed by examples for common source data formats.

Tip

We recommend finding one of our example format definitions for flat files or web services, copying the XML, and modifying as needed.

Format definition XML always begins with a <layout> node. This node can have several attributes:

  • The b attribute indicates the separator. For example, use b="," for a comma-delimited file.

  • The h attribute indicates whether the data contains a header row. Use h="1" if your data has a header row. If there's no header row, this attribute can be set to 0 or simply excluded.

  • The t attribute indicates the text qualifiers. For example, use t="&quot;" for files where each data element is contained in double quotes.

  • The type attribute is used when the incoming data is not a text file, and can be set to one of the following values:

    • type="convert" is used for Excel files.

    • type="xml" and type="json" are used for XML and JSON files, respectively. 

  • The width attribute is used in fixed-width files to define how many characters each row of the file has.

When the incoming data does not have a header row, <f> nodes contained within the <layout> node are required. Each <f> node represents a single field to be mapped in the source format, and can have the following attributes:

  • Required: The id attribute is what the ID of the field should be. It should be unique among the other fields in the Source Format. This is also what you will see as the "Source" field when you are mapping the values on the Remap page. Note: this ID should be no more than 64 characters in length.

  • Required: The s attribute defines where to find the field in the source data. The examples below demonstrate how to use this attribute for various format types. 

  • The w attribute is used in fixed-width files to define the character length of the field.

Defining Multiple Namespaces in an XML file: 

Multiple namespaces can be declared in the Format Definition for an XML file. This is useful when an XML file contains more than one namespace URI. To avoid conflicts between XML elements with the same name, a unique prefix attribute can be added to each namespace node, which is then associated with the specific namespace URI.  

The following format definition examples are for defining multiple namespaces in an XML file:

<layout type="xml" node="/ns1:UCRecords/UCRecord">
  <namespace prefix="ns1" uri="http://www.ucop.edu/schema/UCRecord" />
  <namespace prefix="AdmApp" uri="http://www.pesc.org/message/AdmissionsApplication" />
</layout>
<layout type="xml" node="/AcRecBat:AcademicRecordBatch/HSTrn:HighSchoolTranscript">
  <namespace prefix="HSTrn" uri="urn:org:pesc:message:HighSchoolTranscript:v1.3.0" />
  <namespace prefix="AcRecBat" uri="urn:org:pesc:message:AcademicRecordBatch:v1.0.0" />
</layout>

Flat File Examples

The following format definition examples are for flat files, like CSV, Excel, and fixed-width files.The following format definition examples are for flat files, like CSV, Excel, and fixed-width files.

CSV file with a header row and double quotes around each element:

<layout b="," h="1" t="&quot;" />

This format definition works if the source data looks like this:

"Joe","Brown","1745 E. New St., Apt. 22","Chicago","Illinois","46702"

Double-quotes are wrapped around each field, while a comma separates each field. By including the quotes, Slate interprets "1745 E. New St., Apt. 22" as one field rather than two.  

Tab-separated file with a header row and no text qualifiers:

<layout b="&#x9;"h="1"/>

Pipe-separated file with a header row and no text qualifiers:

<layout b="|" h="1" />

Excel file with a header row:

<layout type="convert" h="1" />

CSV file without a header row:

<layout b="," h="0" t="&quot;">
  <f s="1" id="First Field Name" />
  <f s="2" id="Second Field Name" /> 
</layout>

The h="0" indicates that the file does not contain a header row. Since files without a header row need to have the fields defined, <f> nodes are used with the following attributes:

  • The s attribute defines which column of data corresponds to each field. For example, s="1" is the first field in the file, up until the first delimiter (in this case, a comma).

  • The id attribute is the source field name that's shown in the Field Mappings stage of Upload Dataset. It should be a user-friendly name that's unique and no longer than 64 characters.

A fixed-width file with no delimiters, where the beginning and end of each field is defined based on the specific location in the row:

<layout width="43">
    <f s="1" w="25" id="Student Last Name" />
    <f s="26" w="16" id="Student First Name" />
    <f s="42" w="1" id="Student Middle Initial" />
</layout>

The <layout> node's width attribute defines how many characters each row of the file has. In the above example, each row would be 43 characters long.

Fixed-width files require <f> nodes so that Slate can determine where each field begins and ends, using the following attributes:

  • The s attribute defines which character the field starts at. For example, s="1" means that this field starts with the first character in the file.

  • The w attribute defines the width of the field. For example, w="25" means that the field is 25 characters long.

  • The id attribute is the source field name that's shown in the Field Mappings stage of Upload Dataset. It should be a user-friendly name that's unique and no longer than 64 characters.

Web Services Examples

The following examples are for XML and JSON formats, which are commonly imported via web services.

XML:

<layout type="xml" node="/persons/person">
  <f s="first" id="First Name" />
  <f s="last" id="Last Name" />
  <f s="email" id="Email" />
  <f s="birthdate" id="Birthdate" />
  <f s="schools/school[1]/ceeb" id="School 1 Code"/>
  <f s="schools/school[2]/ceeb" id="School 2 Code" />
</layout>

This format definition works if your incoming data is structured like this:

<persons>
  <person>
    <first>Joe</first>
    <last>Brown</last>
    <email>[email protected]</email>
    <birthdate>2000-01-01</birthdate>
    <schools>
      <school>
        <ceeb>380880</ceeb>
      </school>
      ...
    </schools>
  </person>
  ...
</persons>

In the <layout> node, the type attribute tells Slate which file type to expect - in this case, XML. The node attribute tells Slate the path to each "row" of the file; in this case, there is a <person> node for each row contained within an overall <persons> node.

Each <f> node in the format definition represents a field you will map.

  • The s attribute uses XPath syntax to define where the data exists in the source. For example, s="email" means Slate will use the <email> node in the source data.

  • The id attribute is the source field name that's shown in the Field Mappings stage of Upload Dataset. It should be a user-friendly name that's unique and no longer than 64 characters.

The last two <f> nodes demonstrate how to handle multirelational data - in this case, Schools:

  • The School 1 Code has a path of schools/school[1]/ceeb. Unlike the previous paths, this path contains square brackets with a number, [1], which indicates that we should pull the <ceeb> node from the first <school> node that we encounter under the <schools> node.

  • We get to the second <school> code by incrementing the number contained in the square brackets. You will need to anticipate how many nodes might appear and map them all separately, incrementing the bracketed number each time.

JSON:

<layout type="json" node="/students">
  <f s="first" id="First Name" />
  <f s="last" id="Last Name" />
  <f s="email" id="Email" />
  <f s="birthdate" id="Birthdate" />
  <f s="schools[1]/ceeb" id="School 1 Code"/>
  <f s="schools[2]/ceeb" id="School 2 Code" />
</layout>

This format definition works if your incoming data is structured like this:

{
  students: [
    {
      first: "Joe",
      last: "Brown",
      email: "[email protected]",
      birthdate: "2000-01-01",
      schools: [
        {
          ceeb: "380880"
        },
        ...
      ]
    },
    ...
  ]
}

🔔 Important

A named root node is required for JSON formats. This is because JSON is internally converted to XML during processing. JSON data that begins with an array (i.e. begins with a square bracket) cannot be imported.

In this format, the type attribute is set to JSON, and the node attribute tells Slate the name of the root node. If your data begins on a lower level of nesting, you can use slashes to indicate the path; for example: students/student.

Each <f> node in the format definition represents a field you will map.

  • The s attribute uses XPath syntax to define where the data exists once the JSON has been converted to XML. For example, s="email" means Slate will use the email property in the source data.

  • The id attribute is the source field name that's shown in the Field Mappings stage of Upload Dataset. It should be a user-friendly name that's unique and no longer than 64 characters.

The last two <f> nodes demonstrate how to handle JSON arrays:

  • The School 1 Code has a path of schools[1]/ceeb. Unlike the previous paths, this path contains square brackets with a number, [1], which indicates that we should pull the ceeb property from the first object in the schools array.

  • We get to the second CEEB code by incrementing the number contained in the square brackets. You will need to anticipate how many items might be in the array and map them all separately, incrementing the bracketed number each time.


Was this article helpful?