BizInt Smart Charts Preferred HTML Format - Record Version
Revised 12 January 2001 to reflect new BIZINT comment structure.
Purpose
The purpose of this format specification is to allow you to provide records in a format that will simplify integration with BizInt Smart Charts. The format is intended to be human-readable and uses a simple subset of HTML that should be rendered properly using any HTML browser.
If your system is only exporting records from a single database at a time, the simpler table-based format is another option for exporting.
Note: This specification describes a preferred format for data that is to be imported into BizInt Smart Charts products. This format will simplify the support of new databases (or new platforms for existing databases) in BizInt Smart Charts. Data in this format will not automatically be recognized by BizInt Smart Charts. BizInt Solutions will need to provide a definition file to clients wishing to chart data in this format.
Note: This format will be supported by version 3.0 of the BizInt Smart Charts products. Some features are supported in earlier releases. Please contact us for additional information.
Contact
If you have questions, would like clarification, or would like to request enhancements to this format, please contact John Willmore at BizInt Solutions (jaws@bizcharts.com).
Format in a Nutshell
The preferred data format is HTML with a very simple format.
- Information that identifies the database(s) and the optional features of the format is included in a BIZINT comment within the HEAD element.
- The body contains records within DL (definition list) elements.
- Records are separated by an HR (horizontal rule) element.
- Immediately after the start of a DL element a comment may be inserted indicating the accession number and modification date of the record. If records from more than one database are present in the file, the database must also be identified in this comment.
- Within a record, fields may appear in any order.
- Field labels appear in the DT tag, and the value for a field appears in the corresponding DD tag.
- Field labels and values should be proper HTML, with at least the &, <, and > characters represented by the proper ISO character entities.
- Lists of items within a field should be separated by BR (break) elements.
- Tabular data in a field should be presented in a TABLE, using TH (Table Head) elements for all header cells and TD (Table Data) elements for all data cells.
- Paragraphs of text should only have a BR or P (paragraph) element at the end of a paragraph. Heading paragraphs within a text value should be in bold (B), although both bold and italic run-in headings are supported.
- Fields containing image data should contain only image data using an IMG tag.
File Structure
The file should consist of valid HTML. Tags may be written in upper-case or lower-case. End tags for elements are optional unless explicitly mentioned in the description below.
A valid file will have the following structure:
<html>
<head>
[required header information]
</head>
<body>
[record]
<hr>
[record]
...
</body>
</html>
Additional text may be present in the file beyond the required headers and the records. For example, a TITLE element is usually present in the header. The body often contains text with identifying information (such as the name of the database), sequence information (such as "Record 1 of 32"), and copyright notices (either one for the entire file or one per record).
Any additional HTML increases the amount of data to be transferred and slows down the charting process, so markup and text should not be added indiscriminately.
Line feeds and other whitespace may be inserted to aid human readability of the HTML source.
Header Information
Changed 2001-01-12. Rather than requiring BizInt Smart Charts to infer the identity of databases in a file, header information is written to identify the databases. The header information is written in a special comment which begins with the keyword BIZINT.
The database identification appears as follows:
<!--BIZINT IDENT FORMAT=HTMLRecord OPT=DL
SYSTEM="[system name]" DATABASE="[database name]" REV=[revision date]-->
for example
<!--BIZINT IDENT FORMAT=HTMLRecord OPT=DL
SYSTEM="IMS Lifecycle Web Version" DATABASE="R&D Focus" REV=2001-01-12-->
indicates that this file contains records from the "IMS Lifecycle Web Version" of "R&D Focus", in a format last changed January 12, 2001.
The [system name] is the name of the server or product that is generating the output. This information is used together with the database name to uniquely identify the source of the data. We recommend that you put at least your company name and database system name in this field.
[database name] is the name of the database (such as "TDR IPD" or "R&D Focus"). The database name should be unique on the [system name]. BizInt Solutions will need to know the database name and system name that you have chosen in order to add support to the BizInt Smart Charts product.
The [revision date] is assigned by the publisher to indicate changes to the structure of the database. This is a date in CCYY-MM-DD format. Any time the database structure is changed in a way that affects the HTML file, the revision date should be changed to a new, later date. The software only uses this date to identify variations in the data format; the date itself is not otherwise interpreted. The revision date is optional, but if structural changes are made to the database format (e.g. changing a List value to a Table value), a revision date should be added in subsequent releases.
If a file contains a mix of records from several databases, do not include the database name in the header. Each record should be identified at the start of the record.
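As an illustration of the mixed-database case, the sketch below leaves DATABASE out of the IDENT comment and names the database in each RECORD comment instead. The system name, database names, accession numbers, dates, and field contents are invented placeholders. The exact set of attributes expected in the IDENT comment of a mixed file is an assumption here and should be confirmed with BizInt Solutions.

```html
<html>
<head>
<!--BIZINT IDENT FORMAT=HTMLRecord OPT=DL
SYSTEM="Example Publisher Web" REV=2001-01-12-->
</head>
<body>
<DL>
<!--BIZINT RECORD DATABASE="Example Database A" AN=0001234 ED=2001-01-05 -->
<DT>Drug Name</DT>
<DD>exampletrope</DD>
</DL>
<hr>
<DL>
<!--BIZINT RECORD DATABASE="Example Database B" AN=EB98765 ED=2000-12-20 -->
<DT>Drug Name</DT>
<DD>samplemab</DD>
</DL>
</body>
</html>
```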
Record Structure
Changed comment format 2001-01-12. Records are presented using DL (definition list) structures. Each record appears within a pair of <DL> </DL> tags. The structure of a record is:
<DL>
<!--BIZINT RECORD DATABASE="[database name]" AN=[accession number] ED=[modification date] -->
<DT>[field label]</DT>
<DD>[field value]</DD>
...
</DL>
The first line within the DL is a comment that identifies the record. [database name] is a database name, as described in the header information section above. If there are only records from one database in a file, the database name is optional in the RECORD comment. [accession number] is the publisher's accession number for this record, and [modification date] is the date of the last significant content change. The publisher may define the modification date however desired. Keep in mind that the software uses the modification date to decide when to replace a record in a chart with new data.
The accession number and modification date are both optional. If the file only contains data from one database, and that database is listed in the IDENT comment, the RECORD comments may be omitted.
Each field should appear in a DT/DD pairing. The field label should appear in the DT (definition term) element, and the field value should appear in the DD (definition description) element.
Empty fields should be omitted.
Fields may appear in any order. We recommend that the order of the fields remain consistent within a file.
Field names should be unique in a database. Case is ignored, so "Descriptors" and "DESCRIPTORS" would be considered the same field name. It is preferable that a field label appear only once within a record. If there are multiple values associated with a field label, those values should be presented as a list within the value. However, in some cases a field has a complicated structure that may repeat (for example, the abstracts of equivalent patents within a family); in such cases the field label may be repeated within the record.
The label of the field (in the DT) may be marked up in bold-face (B tags) to enhance the visual presentation of the data. All other markup is ignored. The B tags should lie entirely within the DT, and should encompass the entire DT value, as shown below:
<DT><B>The Field Label</B></DT>
The value of a particular field should appear in the DD tag corresponding to the DT tag containing the field name. The format of field values is described in the section on Field Values, below.
The end tags for DT and DD are optional, but recommended.
No other markup or text should appear within the DL structure. Database identification, navigation aids (such as links to the next record or top of page), and copyright notices, should all appear before the <DL> or after the </DL>.
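Putting the file, header, and record rules together, a small single-database file might look like the sketch below. The system name, database name, accession numbers, dates, field labels, and values are all invented for illustration. Sequence text and the copyright notice sit outside the DL elements, as required above.

```html
<html>
<head>
<title>Example export</title>
<!--BIZINT IDENT FORMAT=HTMLRecord OPT=DL
SYSTEM="Example Publisher Web" DATABASE="Example Database" REV=2001-01-12-->
</head>
<body>
<p>Record 1 of 2</p>
<DL>
<!--BIZINT RECORD AN=0001234 ED=2001-01-05 -->
<DT><B>Drug Name</B></DT>
<DD>exampletrope</DD>
<DT><B>Originator</B></DT>
<DD>Example Pharma &amp; Co. (US)</DD>
</DL>
<hr>
<p>Record 2 of 2</p>
<DL>
<!--BIZINT RECORD AN=0001235 ED=2000-11-30 -->
<DT><B>Drug Name</B></DT>
<DD>samplemab</DD>
</DL>
<p>Copyright notice of the example publisher</p>
</body>
</html>
```

Because the database is named in the IDENT comment, the RECORD comments here omit DATABASE and carry only the accession number and modification date.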
Field Values
The value of a field is presented in a DD element. All punctuation within a DD element will be considered part of the value.
Scalar values: Scalar values (a single value for a field) are presented as plain text. A scalar value may have structure (such as a code and a description), or several related values (such as an originating company, a nationality, and a parent company). The scalar value is the same as a list value with one entry. However, if a table value has only one data row, the data should still be presented as a table.
List values: If a field has a set of values, the items should be separated by a BR (break) element. For example, if a drug has three therapeutic activity codes, the corresponding field in the record may look something like this:
<DT><b>Therapeutic Activity</b></DT>
<DD>Growth Hormones (H4C)<BR>
Systemic Anabolic Hormones (A14A)<BR>
All Other Urological Products (G4B9)
</DD>
Recall that the whitespace in the file, including the line feeds after each BR element, is only there to help a human read the HTML file.
Table values: When a field contains many sets of values, and would normally be presented in a table, an HTML TABLE should be used. Each row should be contained within a TR element. Each column heading cell should appear in a TH (table heading) element. Each data cell should appear in a TD (table data) element. End tags are optional for TR, TH and TD elements.
Empty cells at the end of a row may be skipped. Empty cells within a row should be included in the file, although no text is required in the cell. The COLSPAN attribute of TH and TD should only be used if the data in a cell truly spans columns (a VERY rare occurrence). Do not use COLSPAN to fill empty cells.
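A table value following these rules might be written as in the sketch below. The field label, column headings, and data are hypothetical. The second data row keeps its empty Indication cell because it falls within the row, while the third data row simply stops after its last non-empty cell.

```html
<DT><B>Development Status</B></DT>
<DD>
<TABLE>
<TR><TH>Country</TH><TH>Phase</TH><TH>Indication</TH><TH>Date</TH></TR>
<TR><TD>US</TD><TD>Phase II</TD><TD>Growth disorders</TD><TD>2000-06-15</TD></TR>
<TR><TD>Japan</TD><TD>Phase I</TD><TD></TD><TD>2000-09-01</TD></TR>
<TR><TD>Germany</TD><TD>Preclinical</TD></TR>
</TABLE>
</DD>
```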
Text values: Paragraphs of text, including headings and tables, can appear as the value of a field. Within a text value, BR or P (paragraph) tags should be used to separate paragraphs. Word wrap within a paragraph should not be forced with a BR.
Section headings, such as the "Field Values" heading of this section, are presented by making the entire paragraph text bold, as in:
... previous text ... <P><B>Field Values</B> <P>...
Run-in headings (such as the Text values: heading of this sub-section) may be included with either bold, italic, or bold-italic type. An example of an italic heading is given below:
... previous text ... <P><I>Text values:</I> Paragraphs of ...
Tables within text do not need to be separated by a P or BR, but these tags are allowed.
Image values: If a field contains an image, it should not contain any text. The image should be embedded using the IMG tag. The preferred format for the source image is JPEG, but Windows Bitmap, PNG, and TIFF are supported. GIF format is not supported. If the file is being created by a Windows application (such as a CD-ROM), the image SRC path may be relative. If the export file is delivered over the web, a fully qualified path should be given so that BizInt Smart Charts can retrieve the image.
<DT>Structure</DT> <DD><img src="img00001.jpg"></DD>
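For a file delivered over the web, the same field would carry a fully qualified URL. The host name and path below are placeholders, not a real location.

```html
<DT>Structure</DT> <DD><img src="http://www.example.com/images/img00001.jpg"></DD>
```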
ASCII-formatted text values: Finally, some values stored in existing systems are stored in ASCII format where line feeds and white space are significant (such as a table of values as presented by on-line hosts such as DataStar). The entire value should be stored within a PRE element (between <PRE> and </PRE> tags).
ASCII-formatted values should only be used as a presentation of last resort. The appropriate structured format should be used when possible.
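A value of this kind might be written as shown below. The field label and figures are invented, and the column alignment depends entirely on the spaces inside the PRE element.

```html
<DT><B>Sales Data</B></DT>
<DD><PRE>
Year    Region    Sales
1999    Europe      120
2000    Europe      145
</PRE></DD>
```

Data like this would normally be better expressed as a TABLE value, which is why the PRE form is reserved for cases where the source system cannot supply the structure.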
Data Format
Any special characters should be presented with ISO named entities instead of with integer codes or font changes. For example, the ampersand character (&) should be presented as &amp;, not the equivalent &#38;.
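As a short illustration, the field below uses named entities for an accented letter, the ampersand, the less-than sign, the plus-minus sign, and the micro sign. The field label and value are made up, while the entity names themselves are standard HTML named entities.

```html
<DT><B>Formulation Notes</B></DT>
<DD>Soci&eacute;t&eacute; Exemple &amp; Cie: impurity &lt; 0.1%, dose 50 &plusmn; 5 &micro;g</DD>
```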
Font and type face changes are generally ignored, with the exception of the specific uses of bold and italic type face identified in the Text Value description above.
Dates that are to be interpreted by the software should appear in CCYY-MM-DD format. In general, the software only interprets the modification date in the RECORD comment for each record. However, the processing for some databases may include processing of other dates (such as priority dates for a patent).
© 2002 BizInt Solutions

