BizInt Smart Charts Preferred HTML Format - Record Version

Revised 12 January 2001 to reflect new BIZINT comment structure.

Purpose

The purpose of this format specification is to allow you to provide records in a format that will simplify integration with BizInt Smart Charts. The format is intended to be human-readable and uses a simple subset of HTML that should be rendered properly using any HTML browser.

If your system is only exporting records from a single database at a time, the simpler table-based format is another option for exporting.

Note: This specification describes a preferred format for data that is to be imported into BizInt Smart Charts products. This format will simplify the support of new databases (or new platforms for existing databases) in BizInt Smart Charts. Data in this format will not automatically be recognized by BizInt Smart Charts. BizInt Solutions will need to provide a definition file to clients wishing to chart data in this format.

Note: This format will be supported by version 3.0 of the BizInt Smart Charts products. Some features are supported in earlier releases. Please contact us for additional information.

Contact

If you have questions, would like clarification, or would like to request enhancements to this format, please contact John Willmore at BizInt Solutions (jaws@bizcharts.com).

Format in a Nutshell

The preferred data format is a very simple subset of HTML.

File Structure

The file should consist of valid HTML. Tags may be written in upper-case or lower-case. End tags for elements are optional unless explicitly mentioned in the description below.

A valid file will have the following structure:

  <html>
    <head>
      [required header information]
    </head>
    <body>
      [record]
      <hr>
      [record]
      ...
    </body>
  </html>

Additional text may be present in the file beyond the required headers and the records. For example, a TITLE element is usually present in the header. The body often contains text with identifying information (such as the name of the database), sequence information (such as "Record 1 of 32"), and copyright notices (either one for the entire file or one per record).

Any additional HTML increases the amount of data to be transferred and slows down the charting process, so markup and text should not be added indiscriminately.

Line feeds and other whitespace may be inserted to aid human readability of the HTML source.

Header Information

Changed 2001-01-12. Rather than requiring BizInt Smart Charts to infer the identity of databases in a file, header information is written to identify the databases. The header information is written in a special comment which begins with the keyword BIZINT.

The database identification appears as follows:

  <!--BIZINT IDENT FORMAT=HTMLRecord OPT=DL
      SYSTEM="[system name]" DATABASE="[database name]" REV=[revision date]-->

for example

  <!--BIZINT IDENT FORMAT=HTMLRecord OPT=DL
      SYSTEM="IMS Lifecycle Web Version" DATABASE="R&D Focus" REV=2001-01-12-->

indicates that this file contains records from the "IMS Lifecycle Web Version" of "R&D Focus", in a format last changed January 12, 2001.

The [system name] is the name of the server or product that is generating the output. This information is used together with the database name to uniquely identify the source of the data. We recommend that you put at least your company name and database system name in this field.

[database name] is the name of the database (such as "TDR IPD" or "R&D Focus"). The database name should be unique on the [system name]. BizInt Solutions will need to know the database name and system name that you have chosen in order to add support to the BizInt Smart Charts product.

The [revision date] is assigned by the publisher to indicate changes to the structure of the database. This is a date in CCYY-MM-DD format. Any time the database structure is changed in a way that affects the HTML file, the revision date should be changed to a new, later date. The software only uses this date to identify variations in the data format - the date itself is not interpreted. The revision date is optional, but if structural changes are made to the database format (e.g. changing a List value to a Table value), a revision date should be added in subsequent releases.

If a file contains a mix of records from several databases, do not include the database name in the header. Instead, identify the database in the RECORD comment at the start of each record (see Record Structure, below).

Record Structure

Changed comment format 2001-01-12. Records are presented using DL (definition list) structures. Each record appears within a pair of <DL> </DL> tags. The structure of a record is:

  <DL>
    <!--BIZINT RECORD DATABASE="[database name]" AN=[accession number] ED=[modification date] -->
    <DT>[field label]</DT>
    <DD>[field value]</DD>
    ...
  </DL>

The first line within the DL is a comment that identifies the record. [database name] is a database name, as described in the header information section above. If there are only records from one database in a file, the database name is optional in the RECORD comment. [accession number] is the publisher's accession number for this record, and [modification date] is the date of the last significant content change. The publisher may define the modification date however desired. Keep in mind that the software uses the modification date to decide when to replace a record in a chart with new data.

The accession number and modification date are both optional. If the file only contains data from one database, and that database is listed in the IDENT comment, the RECORD comments may be omitted.
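As an illustration of the mixed-database case, the fragment below sketches two records from different databases in one file. The system name, accession numbers, dates, and field contents are hypothetical, and pairing "R&D Focus" with "TDR IPD" on one system is assumed only for the example. Writing the IDENT comment without a DATABASE entry is one reading of the rule in the Header Information section; the exact form should be confirmed with BizInt Solutions.

```html
<!-- In the HEAD: IDENT comment with no DATABASE entry (assumed form) -->
<!--BIZINT IDENT FORMAT=HTMLRecord OPT=DL
    SYSTEM="Example Publisher Search Service" REV=2001-01-12-->

<!-- In the BODY: each record names its own database -->
<DL>
  <!--BIZINT RECORD DATABASE="R&D Focus" AN=100001 ED=2001-01-05 -->
  <DT><B>Drug Name</B></DT>
  <DD>exampledrug</DD>
</DL>
<hr>
<DL>
  <!--BIZINT RECORD DATABASE="TDR IPD" AN=200002 ED=2000-12-18 -->
  <DT><B>Drug Name</B></DT>
  <DD>exampledrug</DD>
</DL>
```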

Each field should appear in a DT/DD pairing. The field label should appear in the DT (definition term) element, and the field value should appear in the DD (definition description) element.

Empty fields should be omitted.

Fields may appear in any order. We recommend that the order of the fields remain consistent within a file.

Field names should be unique in a database. Case is ignored, so "Descriptors" and "DESCRIPTORS" would be considered the same field name. It is preferable that a field label appear only once within a record. If there are multiple values associated with a field label, those values should be presented as a list within the value. However, in some cases a field has a complicated structure that may repeat (for example, the abstracts of equivalent patents within a family).

The label of the field (in the DT) may be marked up in bold-face (B tags) to enhance the visual presentation of the data. All other markup is ignored. The B tags should lie entirely within the DT, and should encompass the entire DT value, as shown below:

    <DT><B>The Field Label</B></DT>

The value of a particular field should appear in the DD tag corresponding to the DT tag containing the field name. The format of field values is described in the section on Field Values, below.

The end tags for DT and DD are optional, but recommended.

No other markup or text should appear within the DL structure. Database identification, navigation aids (such as links to the next record or top of page), and copyright notices, should all appear before the <DL> or after the </DL>.
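Putting the file, header, and record rules together, a complete single-database export might look like the sketch below. The system and database names are taken from the header example above; the TITLE text, field labels, accession numbers, dates, and copyright line are hypothetical placeholders. Because the IDENT comment names the database, the RECORD comments omit DATABASE. The second record simply leaves out a field that has no value.

```html
<html>
  <head>
    <title>R&amp;D Focus export</title>
    <!--BIZINT IDENT FORMAT=HTMLRecord OPT=DL
        SYSTEM="IMS Lifecycle Web Version" DATABASE="R&D Focus" REV=2001-01-12-->
  </head>
  <body>
    <!-- Sequence text stays outside the DL -->
    <p>Record 1 of 2</p>
    <DL>
      <!--BIZINT RECORD AN=100001 ED=2001-01-05 -->
      <DT><B>Drug Name</B></DT>
      <DD>exampledrug-1</DD>
      <DT><B>Therapeutic Activity</B></DT>
      <DD>Growth Hormones (H4C)<BR>
        Systemic Anabolic Hormones (A14A)</DD>
    </DL>
    <hr>
    <p>Record 2 of 2</p>
    <DL>
      <!--BIZINT RECORD AN=100002 ED=2000-12-18 -->
      <DT><B>Drug Name</B></DT>
      <DD>exampledrug-2</DD>
    </DL>
    <!-- Copyright notice appears after the last </DL> -->
    <p>&copy; Example Publisher</p>
  </body>
</html>
```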

Field Values

The value of a field is presented in a DD element. All punctuation within a DD element will be considered part of the value.

Scalar values: Scalar values (a single value for a field) are presented as plain text. A scalar value may have structure (such as a code and a description), or several related values (such as an originating company, a nationality, and a parent company). The scalar value is the same as a list value with one entry. However, if a table value has only one data row, the data should still be presented as a table.

List values: If a field has a set of values, the items should be separated by a BR (break) element. For example, if a drug has three therapeutic activity codes, the corresponding field in the record may look something like this:

  <DT><b>Therapeutic Activity</b></DT>
  <DD>Growth Hormones (H4C)<BR>
    Systemic Anabolic Hormones (A14A)<BR>
    All Other Urological Products (G4B9)
  </DD>

Recall that the whitespace in the file, including the line feeds after each BR element, is only there to help the human read the HTML file.

Table values: When a field contains many sets of values, and would normally be presented in a table, an HTML TABLE should be used. Each row should be contained within a TR element. Each column heading cell should appear in a TH (table heading) element. Each data cell should appear in a TD (table data) element. End tags are optional for TR, TH and TD elements.

Empty cells at the end of a row may be skipped. Empty cells within a row should be included in the file, although no text is required in the cell. The COLSPAN attribute of TH and TD should only be used if the data in a cell truly spans columns (a VERY rare occurrence). Do not use COLSPAN to fill empty cells.
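A table value following these rules might be sketched as shown here. The field label, column headings, and cell contents are invented for illustration. The HTML comments are annotations for the reader and would be left out of a real export.

```html
<DT><B>Development Status</B></DT>
<DD>
  <TABLE>
    <TR><TH>Country</TH><TH>Phase</TH><TH>Date</TH><TH>Notes</TH></TR>
    <!-- An empty cell inside the row is kept, with no text in it -->
    <TR><TD>USA</TD><TD></TD><TD>2000-06-01</TD><TD>Filing planned</TD></TR>
    <!-- Empty cells at the end of the row (Date, Notes) are left off -->
    <TR><TD>Japan</TD><TD>Phase II</TD></TR>
  </TABLE>
</DD>
```

No COLSPAN is used to pad the short row; the trailing cells are omitted instead.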

Text values: Paragraphs of text, including headings and tables, can appear as the value of a field. Within a text value, BR or P (paragraph) tags should be used to separate paragraphs. Word wrap within a paragraph should not be forced with a BR.

Section headings, such as the "Field Values" heading of this section, are presented by making the entire paragraph text bold, as in:

  ... previous text ...
  <P><B>Field Values</B>
  <P>...

Run-in headings (such as the Text values: heading of this sub-section) may be set in bold, italic, or bold-italic type. An example of an italic heading is given below:

  ... previous text ...
  <P><I>Text values:</I> Paragraphs of ...

Tables within text do not need to be separated by a P or BR, but these tags are allowed.
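A complete text value combining these conventions might look like the following sketch. The field label and the wording of the paragraphs are hypothetical.

```html
<DT><B>Abstract</B></DT>
<DD>
  <P><B>Background</B>
  <P>This first paragraph is long enough to wrap in a browser window, but
  the line break in this source is ordinary whitespace, not a BR.
  <P><I>Results:</I> A second paragraph that opens with a run-in heading.
</DD>
```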

Image values: If a field contains an image, it should not contain any text. The image should be embedded using the IMG tag. The preferred format for the source image is JPEG, but Windows Bitmap, PNG, and TIFF are supported. GIF format is not supported. If the file is being created by a Windows application (such as a CD-ROM), the image SRC path may be relative. If the export file is delivered over the web, a fully qualified path should be given so that BizInt Smart Charts can retrieve the image.

  <DT>Structure</DT>
  <DD><img src="img00001.jpg"></DD>
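For an export delivered over the web, the same field would carry a fully qualified path instead. The host and directory in this sketch are placeholders, not a real image location.

```html
<DT>Structure</DT>
<DD><img src="http://www.example.com/images/img00001.jpg"></DD>
```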

ASCII-formatted text values: Finally, some values in existing systems are stored in ASCII format, where line feeds and white space are significant (such as a table of values as presented by on-line hosts such as DataStar). The entire value should be placed within a PRE element (between <PRE> and </PRE> tags).

ASCII-formatted values should only be used as a presentation of last resort. The appropriate structured format should be used when possible.
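As a rough illustration, a value whose column alignment depends on spacing might be exported as shown here. The field label and figures are invented placeholders.

```html
<DT><B>Sales Data</B></DT>
<DD><PRE>
Year    Sales (USD m)
1998            12.5
1999            14.1
</PRE></DD>
```

If the same data were available as separate year and sales values, a TABLE value would be the better choice.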

Data Format

Any special characters should be presented with ISO named entities instead of with integer codes or font changes. For example, the ampersand character (&) should be presented as &amp;, not the equivalent &#038;.
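A short sketch of field values written with named entities follows. The field labels and contents are hypothetical; the entities used (&amp;amp;, &amp;eacute;, &amp;lt;, &amp;micro;) are standard HTML named entities.

```html
<DT><B>Originator</B></DT>
<DD>Example &amp; Partners Soci&eacute;t&eacute; Anonyme</DD>
<DT><B>Dosage</B></DT>
<DD>&lt; 50 &micro;g per day</DD>
```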

Font and type face changes are generally ignored, with the exception of the specific uses of bold and italic type face identified in the Text Value description above.

Dates that are to be interpreted by the software should appear in CCYY-MM-DD format. In general, the software only interprets the modification date in the RECORD comment for each record. However, the processing for some databases may include processing of other dates (such as priority dates for a patent).

© 2002 BizInt Solutions