BizInt Smart Charts Preferred HTML Format - Table Version

Initial version 12 January 2001

Purpose

The purpose of this format specification is to allow you to provide records in a format that will simplify integration with BizInt Smart Charts. The format is intended to be human-readable and uses a simple subset of HTML that should be rendered properly using any HTML browser.

This format uses HTML tables to present data, and is offered as an alternative to the record-based format documented previously. This format can only support records from a single database. Result sets containing records from more than one source in a file should use the record-based format.

Note: This specification describes a preferred format for data that is to be imported into BizInt Smart Charts products. This format will simplify the support of new databases (or new platforms for existing databases) in BizInt Smart Charts. Data in this format will not automatically be recognized by BizInt Smart Charts. BizInt Solutions will need to provide a definition file to clients wishing to chart data in this format.

Note: This format will be supported by version 3.0 of the BizInt Smart Charts products. Some features are supported in earlier releases. Please contact us for additional information.

Contact

If you have questions, would like clarification, or would like to request enhancements to this format, please contact John Willmore at BizInt Solutions (jaws@bizcharts.com).

Format in a Nutshell

The preferred data format is HTML with a very simple structure.

File Structure

The file should consist of valid HTML, although the HTML, HEAD, and BODY tags are not essential. Tags may be written in upper-case or lower-case. End tags for elements are optional unless explicitly mentioned in the description below.

A valid file will have the following structure:

  ...

  <!--BIZINT IDENT FORMAT=HTMLTable 
      SYSTEM="[system name]" DATABASE="[database name]"-->

  ...

  <table>
    <TR>
      <TH>[Column Label]</TH>
      ... repeated for each database field that is exported ...
    </TR>
    <TR>
      <TD>[Field Value]</TD>
      ... repeated for each database field that is exported ...
    </TR>
  </table>

Additional text may be present in the file beyond the required headers and the records. For example, a TITLE element is usually present in the header. The body often contains text with identifying information (such as the name of the database) and copyright notices (either one for the entire file or one per record). Any additional HTML increases the amount of data to be transferred and slows down the charting process, so markup and text should not be added indiscriminately.

Additional text should appear outside the extent of the TABLE. If another table appears in the file before the TABLE containing data, you must add the word FIND_START to the BIZINT IDENT comment at the start of the file, and then place a BIZINT START_DATA comment immediately before the TABLE containing the data, as shown here:

  ...

  <!--BIZINT IDENT FORMAT=HTMLTable 
      SYSTEM="[system name]" DATABASE="[database name]"
      FIND_START-->

  ... other html including a table ...

  <!--BIZINT START_DATA-->
  <table>
  ... the table containing the data ...

Line feeds and other whitespace may be inserted to aid human readability of the HTML source.

Header Information

Rather than requiring BizInt Smart Charts to infer the identity of the records in a file, header information is written to identify the database. The header information is written in a special comment which begins with the keyword BIZINT.

The database identification appears as follows:

  <!--BIZINT IDENT FORMAT=HTMLTable
      SYSTEM="[system name]" DATABASE="[database name]" REV=[revision date]-->

for example

  <!--BIZINT IDENT FORMAT=HTMLTable
      SYSTEM="IMS Lifecycle Web Version" DATABASE="R&D Focus" REV=2001-01-12-->

indicates that this file contains records from the "IMS Lifecycle Web Version" of "R&D Focus", in a format last changed January 12, 2001.

The [system name] is the name of the server or product that is generating the output. This information is used together with the database name to uniquely identify the source of the data. We recommend that you put at least your company name and database system name in this field.

[database name] is the name of the database (such as "TDR IPD" or "R&D Focus"). The database name should be unique on the [system name]. BizInt Solutions will need to know the database name and system name that you have chosen in order to add support to the BizInt Smart Charts product.

The [revision date] is assigned by the publisher to indicate changes to the structure of the database. This is a date in CCYY-MM-DD format. Any time the database structure is changed in a way that affects the HTML file, the revision date should be changed to a new, later date. The software only uses this date to identify variations in the data format; the date itself is not otherwise interpreted. The revision date is optional, but if structural changes are made to the database format (e.g. changing a List value to a Table value), a revision date should be added in subsequent releases.

As described above, the FIND_START keyword can be added to the IDENT comment. If this keyword is found, BizInt Smart Charts will not interpret the contents of the file until a BIZINT START_DATA comment is found.

TABLE Structure

In this format, all records are presented in a single HTML TABLE.

COLSPAN and ROWSPAN are not supported and should not be used to fill empty cells.

Each column corresponds to a field in the database. The order of the columns does not matter. We suggest using the order that the fields might appear in a record, but any order is permitted.

The first row of the table should contain field labels for each column. Field labels appear within TH elements.

  ...
  <TR>
    <TH>Field Label 1</TH>
    <TH>Field Label 2</TH>
    ...
  </TR>
  ...

Each field label should be unique within the database. Field labels are not case-sensitive, so "Descriptors" and "DESCRIPTORS" would be considered the same field name.

Rows after the first row contain record data. Field values appear within TD elements.

  ...
  <TR>
    <TD>Field Value 1</TD>
    <TD>Field Value 2</TD>
    ...
  </TR>
  ...

Fields must appear in the same order as the corresponding labels.

Empty cells at the end of a row may be skipped. Empty cells within a row must be written explicitly as empty cells. The COLSPAN option to TD must not be used.

  ...
  <TR>
    ...   Two ways to write empty cells:
    <TD>&nbsp;</TD>
    <TD></TD>
    ...
  </TR>
  ...

The end tags for TR, TH, and TD are optional, but recommended.

No other markup or text should appear within the TABLE structure. Database identification, navigation aids (such as links to the top of the page), and copyright notices should all appear before the <TABLE> or after the </TABLE>.
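
Putting the rules of this section together, a complete file might look like the sketch below. It is an illustration only, not a definitive template: the system name, database name, field labels, and field values are all hypothetical. The second record shows an empty cell within a row written explicitly, and an empty cell at the end of the row skipped.

```html
<!-- Illustrative sketch only: system, database, labels, and values are hypothetical -->
<HTML>
<HEAD><TITLE>Example Drug Database export</TITLE></HEAD>
<BODY>
<!--BIZINT IDENT FORMAT=HTMLTable
    SYSTEM="Example Company Web Server" DATABASE="Example Drug Database" REV=2001-01-12-->
<P>Example Drug Database</P>
<TABLE>
  <TR>
    <TH>Drug Name</TH>
    <TH>Company</TH>
    <TH>Phase</TH>
    <TH>Notes</TH>
  </TR>
  <TR>
    <TD>Example drug A</TD>
    <TD>Example Pharma</TD>
    <TD>Phase II</TD>
    <TD>First hypothetical record</TD>
  </TR>
  <TR>
    <TD>Example drug B</TD>
    <TD></TD>
    <TD>Phase I</TD>
  </TR>
</TABLE>
<P>Copyright notice for the hypothetical publisher</P>
</BODY>
</HTML>
```

The identifying text and the copyright notice sit outside the TABLE, so the table itself contains only TR, TH, and TD elements.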

Field Values

The value of a field is presented in a TD element. All punctuation within a TD element will be considered part of the value.

Scalar values: Scalar values (a single value for a field) are presented as plain text. A scalar value may have structure (such as a code and a description), or several related values (such as an originating company, a nationality, and a parent company). The scalar value is the same as a list value with one entry. However, if a table value has only one data row, the data should still be presented as a table.
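
As a small illustration, a scalar value that carries a code and a description could be written as a single cell of plain text. The value is borrowed from the list example that follows; treating it as a one-entry field here is an assumption made only for illustration.

```html
<!-- Illustrative sketch only: a single-valued field holding a description and a code -->
<TD>Growth Hormones (H4C)</TD>
```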

List values: If a field has a set of values, the items should be separated by a BR (break) element. For example, if a drug has three therapeutic activity codes, the corresponding field in the record may look something like this:

  <TD>Growth Hormones (H4C)<BR>
    Systemic Anabolic Hormones (A14A)<BR>
    All Other Urological Products (G4B9)
  </TD>

Recall that the whitespace in the file, including the line feed after each BR element, is only there to help a human reader follow the HTML source.

Table values: When a field contains many sets of values, and would normally be presented in a table, an HTML TABLE should be used. Each row should be contained within a TR element. Each column heading cell should appear in a TH (table heading) element. Each data cell should appear in a TD (table data) element. End tags are optional for TR, TH and TD elements.

Empty cells at the end of a row may be skipped. Empty cells within a row should be included in the file, although no text is required in the cell. The COLSPAN argument to TH and TD should only be used if the data in a cell truly spans columns (a VERY rare occurrence). Do not use COLSPAN to fill empty cells.
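
A table value might be written as in the sketch below, with the nested TABLE placed inside the TD of the enclosing record row. The column headings and values are hypothetical. The second data row includes an explicit empty cell within the row, and the third row skips its empty final cell.

```html
<!-- Illustrative sketch only: the column headings and values are hypothetical -->
<TD>
  <TABLE>
    <TR>
      <TH>Country</TH>
      <TH>Status</TH>
      <TH>Date</TH>
    </TR>
    <TR>
      <TD>USA</TD>
      <TD>Launched</TD>
      <TD>2000-06-15</TD>
    </TR>
    <TR>
      <TD>Japan</TD>
      <TD></TD>
      <TD>1999-11-01</TD>
    </TR>
    <TR>
      <TD>Germany</TD>
      <TD>Phase III</TD>
    </TR>
  </TABLE>
</TD>
```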

Text values: Paragraphs of text, including headings and tables, can appear as the value of a field. Within a text value, BR or P (paragraph) tags should be used to separate paragraphs. Word wrap within a paragraph should not be forced with a BR.

Section headings, such as the "Field Values" heading of this section, are presented by making the entire paragraph text bold, as in:

  ... previous text ...
  <P><B>Field Values</B>
  <P>...

Run-in headings (such as the Text values: heading of this sub-section) may be set in bold, italic, or bold-italic type. An example of an italic heading is given below:

  ... previous text ...
  <P><I>Text values:</I> Paragraphs of ...

Tables within text do not need to be separated by a P or BR, but these tags are allowed.
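
The text value rules can be combined as in the following sketch, which shows a section heading, a paragraph wrapped across source lines without BR tags, a run-in heading, and a table inside the text. All of the content is hypothetical.

```html
<!-- Illustrative sketch only: the headings, text, and table content are hypothetical -->
<TD>
  <P><B>Summary</B>
  <P>This first paragraph may be wrapped across several source lines
  for readability; no BR is needed at the wrap points.
  <P><I>Status:</I> A run-in heading begins this second paragraph.
  <TABLE>
    <TR><TH>Year</TH><TH>Event</TH></TR>
    <TR><TD>2000</TD><TD>Hypothetical event</TD></TR>
  </TABLE>
</TD>
```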

Image values: If a field contains an image, it should not contain any text. The image should be embedded using the IMG tag. The preferred format for the source image is JPEG, but Windows Bitmap, PNG, and TIFF are supported. GIF format is not supported. If the file is being created by a Windows application (such as a CD-ROM product), the image SRC path may be relative. If the export file is delivered over the web, a fully qualified path should be given so that BizInt Smart Charts can retrieve the image.

  <TD><img src="img00001.jpg"></TD>
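
For a file delivered over the web, the same cell might carry a fully qualified path, as sketched here. The host name and directory are hypothetical placeholders.

```html
<!-- Illustrative sketch only: www.example.com and the /images/ path are hypothetical -->
<TD><img src="http://www.example.com/images/img00001.jpg"></TD>
```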

ASCII-formatted text values: Finally, some values in existing systems are stored in ASCII format where line feeds and white space are significant (such as a table of values as presented by on-line hosts such as DataStar). The entire value should be stored within a PRE element (between <PRE> and </PRE> tags).

ASCII-formatted values should only be used as a presentation of last resort. The appropriate structured format should be used when possible.
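
When no structured form is available, an ASCII-formatted value might look like the sketch below. The column layout and values are hypothetical; the point is that the spacing and line feeds inside the PRE element are preserved as part of the value.

```html
<!-- Illustrative sketch only: the fixed-width content is hypothetical -->
<TD><PRE>
Country    Status      Year
USA        Launched    2000
Japan      Phase III   1999
</PRE></TD>
```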

Data Format

Any special characters should be presented with ISO named entities instead of with integer codes or font changes. For example, the ampersand character (&) should be presented as &amp;, not the equivalent &#038;.
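
A few cells using named entities are sketched below. Apart from "R&D Focus", which appears earlier in this document, the values are hypothetical.

```html
<!-- Illustrative sketch only: values other than "R&amp;D Focus" are hypothetical -->
<TD>R&amp;D Focus</TD>
<TD>Soci&eacute;t&eacute; Exemple &copy; 2001</TD>
<TD>Dose &lt; 5 mg</TD>
```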

Font and typeface changes are generally ignored, with the exception of the specific uses of bold and italic type identified in the Text values description above.

Dates that are to be interpreted by the software should appear in CCYY-MM-DD format. In general, the software only interprets the modification date in the START comment for each record. However, the processing for some databases may include processing of other dates (such as priority dates for a patent).

© 2002 BizInt Solutions