BizInt Smart Charts Preferred HTML Format - Table Version
Initial version 12 January 2001
Purpose
The purpose of this format specification is to allow you to provide records in a format that simplifies integration with BizInt Smart Charts. The format is intended to be human-readable and uses a simple subset of HTML that should render properly in any HTML browser.
This format uses HTML tables to present data, and is offered as an alternative to the record-based format documented previously. This format can only support records from a single database. A file containing records from more than one database should use the record-based format.
Note: This specification describes a preferred format for data that is to be imported into BizInt Smart Charts products. This format will simplify the support of new databases (or new platforms for existing databases) in BizInt Smart Charts. Data in this format will not automatically be recognized by BizInt Smart Charts. BizInt Solutions will need to provide a definition file to clients wishing to chart data in this format.
Note: This format will be supported by version 3.0 of the BizInt Smart Charts products. Some features are supported in earlier releases. Please contact us for additional information.
Contact
If you have questions, would like clarification, or would like to request enhancements to this format, please contact John Willmore at BizInt Solutions (jaws@bizcharts.com).
Format in a Nutshell
The preferred data format is HTML with a very simple structure.
- Information that identifies the database, and any optional features of the format, is included in a BIZINT comment within the HEAD element.
- The data is presented in an HTML TABLE, in which each row corresponds to a single record, and each column corresponds to a database field.
- If there is no data present in any of the exported records for a particular field, the corresponding column may be omitted or a column of empty cells can be included.
- Field labels appear in the first row of the table. Header cell elements (TH) are used for these cells.
- Fields of each record are stored in cell elements (TD).
- Within a row, fields must appear in the order of the header cells. If a record does not have a value for a field, an empty cell should appear. The empty cell may also contain a non-breaking space (&nbsp;) so that the cell borders display properly.
- Field labels and values should be proper HTML, with at least the &, <, and > characters represented by the corresponding character entities (&amp;, &lt;, and &gt;).
- Lists of items within a field should be separated by BR (break) elements.
- Tabular data in a field should be presented in a TABLE, using TH (Table Head) elements for all header cells and TD (Table Data) elements for all data cells.
- Within text, a BR or P (paragraph) element should appear only at the end of each paragraph. Heading paragraphs within a text value should be in bold (B), although both bold and italic run-in headings are supported.
- Cells containing image data should not contain any text. Images should be included using the IMG tag. GIF format is not supported.
File Structure
The file should consist of valid HTML, although the HTML, HEAD, and BODY tags are not essential. Tags may be written in upper-case or lower-case. End tags for elements are optional unless explicitly mentioned in the description below.
A valid file will have the following structure:
...
<!--BIZINT IDENT FORMAT=HTMLTable
SYSTEM="[system name]" DATABASE="[database name]"-->
...
<table>
<TR>
<TH>[Column Label]</TH>
... repeated for each database field that is exported ...
</TR>
<TR>
<TD>[Field Value]</TD>
... repeated for each database field that is exported ...
</TR>
</table>
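As an illustration only, the structure above could be produced by a small exporter. The Python sketch below assumes each record is a dict keyed by field label and that every value is plain text; write_bizint_table is a hypothetical helper, not part of any BizInt product, and it omits the optional HTML, HEAD, and BODY tags.

```python
import html


def write_bizint_table(system, database, labels, records, rev=None):
    """Return file text with an IDENT comment and one data TABLE.

    records: list of dicts mapping field label -> plain-text value (assumed).
    """
    ident = f'<!--BIZINT IDENT FORMAT=HTMLTable\nSYSTEM="{system}" DATABASE="{database}"'
    if rev is not None:
        ident += f" REV={rev}"
    lines = [ident + "-->", "<table>", "<TR>"]
    # Header row: one TH per exported field.
    lines += [f"<TH>{html.escape(label, quote=False)}</TH>" for label in labels]
    lines.append("</TR>")
    for record in records:
        lines.append("<TR>")
        # Same column order as the header; a missing field becomes an empty cell.
        lines += [f"<TD>{html.escape(record.get(label, ''), quote=False)}</TD>" for label in labels]
        lines.append("</TR>")
    lines.append("</table>")
    return "\n".join(lines)
```

Writing a cell for every label is always valid; skipping trailing empty cells would only be a size optimization.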
Additional text may be present in the file beyond the required headers and the records. For example, a TITLE element is usually present in the header. The body often contains text with identifying information (such as the name of the database), and copyright notices (either one for the entire file or one per record). Any additional HTML increases the amount of data to be transferred and slows down the charting process, so markup and text should not be added indiscriminately.
Additional text should appear outside the extent of the TABLE. If another table appears in the file before the TABLE containing data, you must add the word FIND_START to the BIZINT IDENT comment at the start of the file, and then place a BIZINT START_DATA comment immediately before the TABLE containing the data, as shown here:
...
<!--BIZINT IDENT FORMAT=HTMLTable
SYSTEM="[system name]" DATABASE="[database name]"
FIND_START-->
... other html including a table ...
<!--BIZINT START_DATA-->
<table>
... the table containing the data ...
Line feeds and other whitespace may be inserted to aid human readability of the HTML source.
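A consumer of the file has to decide which TABLE holds the data. The following Python sketch shows one way to apply the FIND_START rule; find_data_table is a hypothetical name, and the regular-expression approach assumes the comments are written exactly as shown above. It is not how BizInt Smart Charts itself reads files.

```python
import re


def find_data_table(text):
    """Return the offset of the <table> tag that holds the record data.

    Without FIND_START the first table is used; with FIND_START the search
    begins at the <!--BIZINT START_DATA--> comment.
    """
    start = 0
    ident = re.search(r"<!--BIZINT IDENT(.*?)-->", text, re.S)
    if ident and re.search(r"\bFIND_START\b", ident.group(1)):
        marker = text.find("<!--BIZINT START_DATA-->", ident.end())
        if marker < 0:
            raise ValueError("FIND_START given but no START_DATA comment found")
        start = marker
    table = re.compile(r"<table\b", re.I).search(text, start)  # tags may be any case
    if table is None:
        raise ValueError("no data table found")
    return table.start()
```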
Header Information
Rather than requiring BizInt Smart Charts to infer the identity of the records in a file, header information is written to identify the database. The header information is written in a special comment which begins with the keyword BIZINT.
The database identification appears as follows:
<!--BIZINT IDENT FORMAT=HTMLTable
SYSTEM="[system name]" DATABASE="[database name]" REV=[revision date]-->
for example
<!--BIZINT IDENT FORMAT=HTMLTable
SYSTEM="IMS Lifecycle Web Version" DATABASE="R&D Focus" REV=2001-01-12-->
indicates that this file contains records from the "IMS Lifecycle Web Version" of "R&D Focus", in a format last changed January 12, 2001.
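The IDENT comment consists of KEY=value pairs plus bare keywords such as FIND_START. A minimal Python sketch of extracting them, assuming quoted values never contain a double quote; parse_ident is a hypothetical helper.

```python
import re

IDENT_RE = re.compile(r"<!--BIZINT IDENT(.*?)-->", re.S)
# KEY="quoted value", KEY=bare-value, or a bare keyword such as FIND_START.
ATTR_RE = re.compile(r'(\w+)(?:=("[^"]*"|\S+))?')


def parse_ident(text):
    """Return the IDENT attributes as a dict, or None if there is no IDENT comment."""
    m = IDENT_RE.search(text)
    if m is None:
        return None
    attrs = {}
    for key, value in ATTR_RE.findall(m.group(1)):
        # Bare keywords have no value; record them as True.
        attrs[key.upper()] = value.strip('"') if value else True
    return attrs
```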
The [system name] is the name of the server or product that is generating the output. This information is used together with the database name to uniquely identify the source of the data. We recommend that you put at least your company name and database system name in this field.
[database name] is the name of the database (such as "TDR IPD" or "R&D Focus"). The database name should be unique on the [system name]. BizInt Solutions will need to know the database name and system name that you have chosen in order to add support to the BizInt Smart Charts product.
The [revision date] is assigned by the publisher to indicate changes to the structure of the database. It is a date in CCYY-MM-DD format. Any time the database structure is changed in a way that affects the HTML file, the revision date should be changed to a new, later date. The software uses this date only to identify variations in the data format; the value itself is not interpreted. The revision date is optional, but if structural changes are made to the database format (e.g. changing a List value to a Table value), a revision date should be added in subsequent releases.
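Because the revision date must be in CCYY-MM-DD format and should move forward whenever the structure changes, a publisher might check it before a release. This Python sketch (parse_rev and is_newer_revision are hypothetical names) rejects dates that are not zero-padded. Since the software does not interpret the date, the ordering check is purely a publisher-side convention.

```python
import re
from datetime import date

REV_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_rev(rev):
    """Parse a CCYY-MM-DD revision date, raising ValueError if malformed."""
    m = REV_RE.fullmatch(rev)
    if m is None:
        raise ValueError(f"revision date not in CCYY-MM-DD format: {rev!r}")
    # date() also rejects impossible values such as 2001-02-30.
    return date(*map(int, m.groups()))


def is_newer_revision(old, new):
    """True if `new` is a later revision date than `old`."""
    return parse_rev(new) > parse_rev(old)
```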
As described above, the FIND_START keyword can be added to the IDENT comment. If this keyword is found, BizInt Smart Charts will not interpret the contents of the file until a BIZINT START_DATA comment is found.
TABLE Structure
In this format, all records are presented in a single HTML TABLE.
COLSPAN and ROWSPAN are not supported and should not be used to fill empty cells.
Each column corresponds to a field in the database. The order of the columns does not matter. We suggest using the order that the fields might appear in a record, but any order is permitted.
The first row of the table should contain field labels for each column. Field labels appear within TH elements.
...
<TR>
<TH>Field Label 1</TH>
<TH>Field Label 2</TH>
...
</TR>
...
Each field label should be unique within the database. Field labels are not case-sensitive, so "Descriptors" and "DESCRIPTORS" would be considered the same field name.
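Since labels are compared without regard to case, an exporter can catch accidental duplicates early. A minimal Python sketch, assuming str.casefold approximates the case-insensitive comparison the software performs; label_index is a hypothetical helper.

```python
def label_index(labels):
    """Map each case-folded field label to its column index.

    Raises ValueError if two labels differ only in case, since the format
    treats them as the same field.
    """
    index = {}
    for col, label in enumerate(labels):
        key = label.casefold()
        if key in index:
            raise ValueError(f"duplicate field label {label!r} (column {col})")
        index[key] = col
    return index
```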
Rows after the first row contain record data. Field values appear within TD elements.
...
<TR>
<TD>Field Value 1</TD>
<TD>Field Value 2</TD>
...
</TR>
...
Fields must appear in the same order as the corresponding labels.
Empty cells at the end of a row may be skipped. Empty cells within a row must be written out explicitly. The COLSPAN attribute of TD must not be used.
...
<TR>
... Two ways to write empty cells:
<TD>&nbsp;</TD>
<TD></TD>
...
</TR>
...
The end tags for TR, TH, and TD are optional, but recommended.
No other markup or text should appear within the TABLE structure. Database identification, navigation aids (such as links to the top of the page), and copyright notices should all appear before the <TABLE> or after the </TABLE>.
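The row rules above (end tags optional, trailing empty cells skippable, interior empty cells required) can be exercised with a small reader. The Python sketch below uses the standard library html.parser module; TableReader and read_records are hypothetical names, and the sketch handles only plain-text cells in a single flat table: it ignores nested tables and BR-separated lists.

```python
from html.parser import HTMLParser


class TableReader(HTMLParser):
    """Collect the rows of one flat data table as lists of cell text.

    Rows and cells are started by TR, TH, and TD start tags, so missing end
    tags (optional in this format) are tolerated. Nested tables, BR lists,
    and other markup inside cells are not handled.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; -> &, &nbsp; -> U+00A0
        self.rows = []
        self.cell = None  # text fragments of the currently open cell

    def close_cell(self):
        if self.cell is not None:
            # str.strip() also removes U+00A0, so &nbsp; cells read as empty.
            self.rows[-1].append("".join(self.cell).strip())
            self.cell = None

    def handle_starttag(self, tag, attrs):  # tag names arrive lower-cased
        if tag == "tr":
            self.close_cell()
            self.rows.append([])
        elif tag in ("td", "th"):
            self.close_cell()
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self.close_cell()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def read_records(table_html):
    """Return one dict per data row, keyed by the labels in the first row."""
    reader = TableReader()
    reader.feed(table_html)
    reader.close()
    reader.close_cell()  # in case the final cell had no end tag
    labels, *rows = reader.rows
    # Trailing empty cells may be skipped, so pad short rows with "".
    return [dict(zip(labels, row + [""] * (len(labels) - len(row)))) for row in rows]
```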
Field Values
The value of a field is presented in a TD element. All punctuation within a TD element will be considered part of the value.
Scalar values: Scalar values (a single value for a field) are presented as plain text. A scalar value may have structure (such as a code and a description), or several related values (such as an originating company, a nationality, and a parent company). A scalar value is equivalent to a list value with one entry. However, if a table value has only one data row, the data should still be presented as a table.
List values: If a field has a set of values, the items should be separated by a BR (break) element. For example, if a drug has three therapeutic activity codes, the corresponding field in the record may look something like this:
<TD>Growth Hormones (H4C)<BR>
Systemic Anabolic Hormones (A14A)<BR>
All Other Urological Products (G4B9)
</TD>
Recall that the whitespace in the file, including the line feeds after each BR element, is there only to help humans read the HTML file.
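List values can be generated and read back with a few lines of Python. list_cell and split_list are hypothetical helpers; the sketch assumes each item is plain text with no markup of its own.

```python
import html
import re


def list_cell(items):
    """Render a list value: escaped items separated by BR elements."""
    # The line feed after each BR is only for human readability.
    return "<TD>" + "<BR>\n".join(html.escape(item, quote=False) for item in items) + "</TD>"


def split_list(cell_html):
    """Split the inner HTML of a list cell back into plain-text items."""
    parts = re.split(r"<br\s*/?>", cell_html, flags=re.I)
    return [html.unescape(p.strip()) for p in parts if p.strip()]
```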
Table values: When a field contains many sets of values, and would normally be presented in a table, an HTML TABLE should be used. Each row should be contained within a TR element. Each column heading cell should appear in a TH (table heading) element. Each data cell should appear in a TD (table data) element. End tags are optional for TR, TH and TD elements.
Empty cells at the end of a row may be skipped. Empty cells within a row should be included in the file, although no text is required in the cell. The COLSPAN attribute of TH and TD should only be used if the data in a cell truly spans columns (a VERY rare occurrence). Do not use COLSPAN to fill empty cells.
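A minimal Python sketch of rendering a table value under these rules, assuming the headers and rows are lists of plain-text strings; table_cell is a hypothetical helper. It drops trailing empty cells, keeps interior ones, and never emits COLSPAN.

```python
import html


def table_cell(headers, rows):
    """Render a table value as a nested TABLE inside a TD cell.

    Trailing empty cells are dropped (allowed); interior empty cells are kept
    so later columns stay aligned. COLSPAN is never emitted.
    """
    def esc(s):
        return html.escape(s, quote=False)

    out = ["<TD><TABLE>", "<TR>" + "".join(f"<TH>{esc(h)}</TH>" for h in headers) + "</TR>"]
    for row in rows:
        row = list(row)
        while row and row[-1] == "":
            row.pop()  # skip empty cells at the end of the row
        out.append("<TR>" + "".join(f"<TD>{esc(v)}</TD>" for v in row) + "</TR>")
    out.append("</TABLE></TD>")
    return "\n".join(out)
```

A value with a single data row goes through the same TABLE markup, in line with the note under Scalar values.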
Text values: Paragraphs of text, including headings and tables, can appear as the value of a field. Within a text value, BR or P (paragraph) tags should be used to separate paragraphs. Word wrap within a paragraph should not be forced with a BR.
Section headings, such as the "Field Values" heading of this section, are presented by making the entire paragraph text bold, as in:
... previous text ... <P><B>Field Values</B> <P>...
Run-in headings (such as the "Text values:" heading of this sub-section) may be set in bold, italic, or bold-italic type. An example of an italic heading is given below:
... previous text ... <P><I>Text values:</I> Paragraphs of ...
Tables within text do not need to be separated by a P or BR, but these tags are allowed.
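As a rough sketch of the paragraph and heading conventions, the Python function below renders a text value from a list of paragraphs. The (style, heading, body) tuple layout and the name text_value are assumptions made for illustration; headings and bodies are treated as plain text.

```python
import html


def text_value(paragraphs):
    """Render a text value from (style, heading, body) tuples.

    style "section": bold heading paragraph, e.g. <B>Field Values</B>
    style "runin":   italic run-in heading, e.g. <I>Text values:</I> Paragraphs ...
    style None:      plain paragraph (heading ignored)
    Paragraphs are separated by P elements; no BR is used to force word wrap.
    """
    def esc(s):
        return html.escape(s, quote=False)

    rendered = []
    for style, heading, body in paragraphs:
        if style == "section":
            rendered.append(f"<B>{esc(heading)}</B>")
        elif style == "runin":
            rendered.append(f"<I>{esc(heading)}</I> {esc(body)}")
        else:
            rendered.append(esc(body))
    return "<TD>" + "<P>".join(rendered) + "</TD>"
```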
Image values: If a field contains an image, it should not contain any text. The image should be embedded using the IMG tag. The preferred format for the source image is JPEG, but Windows Bitmap, PNG, and TIFF are also supported. GIF format is not supported. If the file is created by a Windows application (such as a CD-ROM product), the image SRC path may be relative. If the export file is delivered over the web, a fully qualified URL should be given so that BizInt Smart Charts can retrieve the image.
<TD><img src="img00001.jpg"></TD>
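A Python sketch of building an image cell, assuming the image type can be inferred from the file extension; image_cell, its base_url parameter, and the extension list are hypothetical. It rejects GIF files and, when a base URL is supplied for web delivery, makes a relative SRC fully qualified with urllib.parse.urljoin.

```python
import html
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

# Assumed mapping from file extension to the supported formats (JPEG, BMP, PNG, TIFF).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".bmp", ".png", ".tif", ".tiff"}


def image_cell(src, base_url=None):
    """Return an image-only TD cell; GIF and unknown types are rejected."""
    suffix = PurePosixPath(urlparse(src).path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported image type {suffix!r}")
    if base_url is not None:
        # Delivered over the web: make the SRC fully qualified.
        src = urljoin(base_url, src)
    return f'<TD><img src="{html.escape(src)}"></TD>'
```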
ASCII-formatted text values: Finally, some values in existing systems are stored in ASCII format, where line feeds and white space are significant (such as a table of values as presented by online hosts such as DataStar). The entire value should be stored within a PRE element (between <PRE> and </PRE> tags).
ASCII-formatted values should only be used as a presentation of last resort. The appropriate structured format should be used when possible.
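Wrapping a preformatted value is straightforward; the only care needed is escaping &, <, and > without touching the whitespace. A minimal Python sketch (pre_cell is a hypothetical name):

```python
import html


def pre_cell(ascii_text):
    """Wrap a preformatted value in a PRE element inside a TD cell.

    Only &, <, and > are escaped; line feeds and spacing are preserved as-is.
    """
    return "<TD><PRE>" + html.escape(ascii_text, quote=False) + "</PRE></TD>"
```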
Data Format
Any special characters should be presented with ISO named entities instead of with integer codes or font changes. For example, the ampersand character (&) should be presented as &amp;, not the equivalent &#38;.
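One way to meet this requirement in Python is to look characters up in html.entities.codepoint2name, the standard library's table of HTML 4 named entities. The sketch below (to_named_entities is a hypothetical helper) escapes &, <, and > plus any non-ASCII character that has a named entity, and leaves other characters unchanged; a real exporter would need a policy for those.

```python
from html.entities import codepoint2name


def to_named_entities(text):
    """Replace &, <, > and named non-ASCII characters with named entities."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        # Leave ASCII characters such as the double quote alone.
        if name and (ch in "&<>" or ord(ch) > 127):
            out.append(f"&{name};")
        else:
            out.append(ch)
    return "".join(out)
```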
Font and type face changes are generally ignored, with the exception of the specific uses of bold and italic type face identified in the Text Value description above.
Dates that are to be interpreted by the software should appear in CCYY-MM-DD format. In general, the software only interprets the modification date in the START comment for each record. However, the processing for some databases may include processing of other dates (such as priority dates for a patent).
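Source systems often store dates in other layouts, so an exporter may need to normalize them. A minimal Python sketch, assuming the source layout is known and can be expressed as a strptime pattern; to_ccyymmdd is a hypothetical helper. date.isoformat() always produces the zero-padded CCYY-MM-DD form.

```python
from datetime import datetime


def to_ccyymmdd(text, source_format):
    """Convert a date string in `source_format` (a strptime pattern) to CCYY-MM-DD."""
    return datetime.strptime(text, source_format).date().isoformat()
```

Patterns that use month names (%B, %b) depend on the current locale, so numeric source layouts are the safer case.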
© 2002 BizInt Solutions

