SGMLS(1)                                                           SGMLS(1)




 NAME
      sgmls - a validating SGML parser

      An SGML System Conforming to
      International Standard ISO 8879 -
      Standard Generalized Markup Language

 SYNOPSIS
      sgmls [ -deglprsuv ] [ -cfile ] [ -iname ] [ filename... ]

 DESCRIPTION
      Sgmls parses and validates the SGML document entity in filename... and
      prints on the standard output a simple ASCII representation of its
      Element Structure Information Set.  (This is the information set which
      a structure-controlled conforming SGML application should act upon.)
      Note that the document entity may be spread amongst several files; for
      example, the SGML declaration, document type declaration and document
      instance set could each be in a separate file.  If no filenames are
      specified, then sgmls will read the document entity from the standard
      input.  A filename of - can also be used to refer to the standard
      input.  The following options are available:

      -cfile
           Write a report of capacity usage to file.  The report is in the
           format of a RACT result.  RACT is the Reference Application for
           Capacity Testing defined in the Proposed American National
           Standard Conformance Testing for Standard Generalized Markup
           Language (SGML) Systems (X3.190-199X), Draft July 1991.

      -d   Warn about duplicate entity declarations.

      -e   Describe open entities in error messages.  Error messages always
           include the position of the most recently opened external entity.

      -g   Show the GIs of open elements in error messages.

      -iname
           Pretend that

                <!ENTITY % name "INCLUDE">

           occurs at the start of the document type declaration subset in
           the SGML document entity.  Since repeated definitions of an
           entity are ignored, this definition will take precedence over
           any other definitions of this entity in the document type
           declaration.  Multiple -i options are allowed.  If the SGML
           declaration replaces the reserved name INCLUDE then the new
           reserved name will be the replacement text of the entity.
           Typically the document type declaration will contain

                <!ENTITY % name "IGNORE">

           and will use %name; in the status keyword specification of a
           marked section declaration.  In this case the effect of the
           option will be to cause the marked section not to be ignored.

      -l   Output L commands giving the current line number and filename.

      -p   Parse only the prolog.  Sgmls will exit after parsing the
           document type declaration.  Implies -s.

      -r   Warn about defaulted references.

      -s   Suppress output.  Error messages will still be printed.

      -u   Warn about undefined elements: elements used in the DTD but not
           defined.  Also warn about undefined short reference maps.

      -v   Print the version number.
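
      As an illustration of how the options above combine on a command
      line, the following Python sketch assembles an argument list for
      sgmls.  It is not part of sgmls: the helper names build_sgmls_argv
      and run_sgmls and their parameters are hypothetical, and the sketch
      assumes only the option syntax shown in the SYNOPSIS (single-letter
      flags, and -c and -i with their argument attached).

```python
import subprocess
from typing import List, Optional, Sequence

# Single-letter flags listed in the SYNOPSIS.
SIMPLE_FLAGS = set("deglprsuv")


def build_sgmls_argv(
    filenames: Sequence[str],
    flags: str = "",
    include: Sequence[str] = (),
    capacity_report: Optional[str] = None,
    program: str = "sgmls",  # assumed to be on PATH
) -> List[str]:
    """Build an argument list; hypothetical helper, not part of sgmls."""
    unknown = set(flags) - SIMPLE_FLAGS
    if unknown:
        raise ValueError("unknown flags: " + "".join(sorted(unknown)))
    argv = [program]
    # Emit each flag separately (-e -g rather than -eg) to stay conservative.
    argv.extend("-" + flag for flag in flags)
    if capacity_report is not None:
        argv.append("-c" + capacity_report)  # -cfile
    for name in include:
        argv.append("-i" + name)             # -iname, may be repeated
    argv.extend(filenames)                   # "-" refers to standard input
    return argv


def run_sgmls(argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the command and capture its output as text."""
    return subprocess.run(list(argv), capture_output=True, text=True)
```

      For example, build_sgmls_argv(["doc.sgml"], flags="s") describes a
      validation-only run in which output is suppressed; doc.sgml is a
      placeholder filename.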

    Entity Manager
      An external entity resides in one or more files.  The entity manager
      component of sgmls maps a sequence of files into an entity in three
      sequential stages:

      1.   each carriage return character is turned into a non-SGML
           character;

      2.   each newline character is turned into a record end character, and
           at the same time a record start character is inserted at the
           beginning of each line;

      3.   the files are concatenated.

      A system identifier is interpreted as a list of filenames separated by
      colons.  A filename of - can be used to refer to the standard input.
      If no system identifier is supplied, then the entity manager will
      attempt to generate a filename using the public identifier (if there
      is one) and other information available to it.  Notation identifiers
      are not subject to this treatment.

      This process is controlled by the environment variable SGML_PATH; this
      contains a colon-separated list of filename templates.  A filename
      template is a filename that may contain substitution fields; a
      substitution field is a % character followed by a single letter that
      indicates the value of the substitution.  If SGML_PATH uses the %S
      field (the value of which is the system identifier), then the entity
      manager will also use SGML_PATH to generate a filename when a system
      identifier that does not contain any colons is supplied.

      The value of a substitution can either be a string or it can be null.
      The entity manager transforms the list of filename templates into a
      list of filenames by substituting for each substitution field and
      discarding any template that contained a substitution field whose
      value was null.  It then uses the first resulting filename that exists
      and is readable.  Substitution values are transformed before being
      used for substitution: firstly, any names that were subject to upper
      case substitution are folded to lower case; secondly, space characters
      are mapped to underscores and slashes are mapped to percents.  The
      value of the %S field is not transformed.

      The values of substitution fields are as follows:

      %%   A single %.

      %D   The entity's data content notation.  This substitution will
           succeed only for external data entities.

      %N   The entity, notation or document type name.

      %P   The public identifier if there was a public identifier, otherwise
           null.

      %S   The system identifier if there was a system identifier otherwise
           null.

      %X   (This is provided mainly for compatibility with ARCSGML.) A
           three-letter string chosen as follows:

                                      |            | With public identifier
                                      |            |________________________
                                      | No public  |   Device    |  Device
                                      | identifier | independent | dependent
           ___________________________|____________|_____________|__________
           Data or subdocument entity | nsd        | pns         | vns
           General SGML text entity   | gml        | pge         | vge
           Parameter entity           | spe        | ppe         | vpe
           Document type definition   | dtd        | pdt         | vdt
           Link process definition    | lpd        | plp         | vlp

           The device dependent version is selected if the public text class
           allows a public text display version but no public text display
           version was specified.

      %Y   The type of thing for which the filename is being generated:

                SGML subdocument entity    sgml
                Data entity                data
                General text entity        text
                Parameter entity           parm
                Document type definition   dtd
                Link process definition    lpd

      The value of the following substitution fields will be null unless a
      valid formal public identifier was supplied.

      %A   Null if the text identifier in the formal public identifier
           contains an unavailable text indicator, otherwise the empty
           string.

      %C   The public text class, mapped to lower case.

      %E   The public text designating sequence (escape sequence) if the
           public text class is CHARSET, otherwise null.

      %I   The empty string if the owner identifier in the formal public
           identifier is an ISO owner identifier, otherwise null.

      %L   The public text language, mapped to lower case, unless the public
           text class is CHARSET, in which case null.

      %O   The owner identifier (with the +// or -// prefix stripped).

      %R   The empty string if the owner identifier in the formal public
           identifier is a registered owner identifier, otherwise null.

      %T   The public text description.

      %U   The empty string if the owner identifier in the formal public
           identifier is an unregistered owner identifier, otherwise null.

      %V   The public text display version.  This substitution will be null
           if the public text class does not allow a display version or if
           no version was specified.  If an empty version was specified, a
           value of default will be used.
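
      The template expansion described in this section can be sketched in
      Python as follows.  This is an illustration under stated assumptions,
      not the code used by sgmls: the function names are hypothetical, the
      caller is assumed to have already folded to lower case any names that
      were subject to upper case substitution, and a field letter missing
      from the supplied values is treated as null.

```python
import os
from typing import Dict, List, Optional


def transform(value: str) -> str:
    """Map spaces to underscores and slashes to percents."""
    return value.replace(" ", "_").replace("/", "%")


def expand_template(template: str,
                    fields: Dict[str, Optional[str]]) -> Optional[str]:
    """Expand one filename template; return None if any field is null."""
    out: List[str] = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            letter = template[i + 1]
            i += 2
            if letter == "%":
                out.append("%")          # %% stands for a single %
                continue
            value = fields.get(letter)   # missing letter counts as null
            if value is None:
                return None              # the whole template is discarded
            # The value of %S is used as is; other values are transformed.
            out.append(value if letter == "S" else transform(value))
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def candidate_filenames(sgml_path: str,
                        fields: Dict[str, Optional[str]]) -> List[str]:
    """Expand every template in a colon-separated SGML_PATH value."""
    names = []
    for template in sgml_path.split(":"):
        name = expand_template(template, fields)
        if name is not None:
            names.append(name)
    return names


def first_readable(names: List[str]) -> Optional[str]:
    """Return the first filename that exists and is readable, if any."""
    for name in names:
        if os.path.isfile(name) and os.access(name, os.R_OK):
            return name
    return None
```

      Note that an empty string is a valid, non-null value, so fields such
      as %A, %I, %R and %U act as filters: they contribute no text, but a
      template containing one of them is discarded when its value is null.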

    System declaration
      The system declaration for sgmls is as follows:

                         SYSTEM "ISO 8879:1986"
                                 CHARSET
      BASESET  "ISO 646-1983//CHARSET
                International Reference Version (IRV)//ESC 2/5 4/0"
      DESCSET  0 128 0
      CAPACITY PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
                                FEATURES
      MINIMIZE DATATAG NO  OMITTAG  YES   RANK     NO  SHORTTAG YES
      LINK     SIMPLE  NO  IMPLICIT NO    EXPLICIT NO
      OTHER    CONCUR  NO  SUBDOC   YES 1 FORMAL   YES
      SCOPE    DOCUMENT
      SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
      SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Core//EN"
                                VALIDATE
               GENERAL YES MODEL    YES   EXCLUDE  YES CAPACITY YES
               NONSGML YES SGML     YES   FORMAL   YES
                                  SDIF
               PACK    NO  UNPACK   NO

      The memory usage of sgmls is not a function of the capacity points
      used by a document; however, sgmls can handle capacities significantly
      greater than the reference capacity set.  In some environments, higher
      values may be supported for the SUBDOC parameter.  Documents that do
      not use optional features are also supported.  For example, if
      FORMAL NO is specified in the SGML declaration, public identifiers
      will not be required to be valid formal public identifiers.

      Certain parts of the concrete syntax may be changed:

           The shunned character numbers can be changed.

           Eight bit characters can be assigned to LCNMSTRT, UCNMSTRT,
           LCNMCHAR and UCNMCHAR.  Declaring this requires that the syntax
           reference character set be declared like this:

                BASESET   "ISO Registration Number 100//CHARSET
                           ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
                DESCSET   0 256 0


           Uppercase substitution can be performed or not performed both for
           entity names and for other names.

           Either short reference delimiters assigned by the reference
           delimiter set or no short reference delimiters are supported.

           The reserved names can be changed.

           The quantity set can be increased within certain limits subject
           to there being sufficient memory available.  The upper limit on
           NAMELEN is 239.  The upper limits on ATTCNT, ATTSPLEN, BSEQLEN,
           ENTLVL, LITLEN, PILEN, TAGLEN, and TAGLVL are more than thirty
           times greater than the reference limits.  The upper limit on
           GRPCNT, GRPGTCNT, and GRPLVL is 253.  NORMSEP cannot be changed.
           DTAGLEN and DTEMPLEN are irrelevant since sgmls does not support
           the DATATAG feature.

    SGML declaration
      The SGML declaration may be omitted, in which case the following
      declaration will be implied:

                            <!SGML "ISO 8879:1986"
                                    CHARSET
      BASESET  "ISO 646-1983//CHARSET
                International Reference Version (IRV)//ESC 2/5 4/0"
      DESCSET    0  9 UNUSED
                 9  2  9
                11  2 UNUSED
                13  1 13
                14 18 UNUSED
                32 95 32
               127  1 UNUSED
      CAPACITY PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
      SCOPE    DOCUMENT
      SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
                                   FEATURES
      MINIMIZE DATATAG NO OMITTAG  YES          RANK     NO  SHORTTAG YES
      LINK     SIMPLE  NO IMPLICIT NO           EXPLICIT NO
      OTHER    CONCUR  NO SUBDOC   YES 99999999 FORMAL   YES
                                 APPINFO NONE>

      with the exception that characters 128 through 254 will be assigned to
      DATACHAR.  When exporting documents that use characters in this range,
      an accurate description of the upper half of the document character
      set should be added to this declaration.  For ISO Latin-1, an
      appropriate description would be:

      BASESET   "ISO Registration Number 100//CHARSET
                 ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
      DESCSET   128 32 UNUSED
                160 95 32
                255  1 UNUSED
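
      To make the DESCSET entries above easier to check, the following
      Python sketch expands them into an explicit table.  It assumes the
      usual reading of each entry as three values (first described
      character number, number of characters, and either the first base
      set character number or UNUSED).  The names expand_descset,
      IMPLIED_LOWER and LATIN1_UPPER are hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

# (first described character, number of characters, base number or "UNUSED")
Triple = Tuple[int, int, Union[int, str]]

# Lower half, copied from the implied SGML declaration.
IMPLIED_LOWER: List[Triple] = [
    (0, 9, "UNUSED"), (9, 2, 9), (11, 2, "UNUSED"), (13, 1, 13),
    (14, 18, "UNUSED"), (32, 95, 32), (127, 1, "UNUSED"),
]

# Upper half, copied from the ISO Latin-1 description.
LATIN1_UPPER: List[Triple] = [
    (128, 32, "UNUSED"), (160, 95, 32), (255, 1, "UNUSED"),
]


def expand_descset(triples: List[Triple]) -> Dict[int, Optional[int]]:
    """Map each described character number to a base set number or None."""
    mapping: Dict[int, Optional[int]] = {}
    for start, count, base in triples:
        for offset in range(count):
            number = start + offset
            if number in mapping:
                raise ValueError(f"character {number} described twice")
            # UNUSED entries have no corresponding base set character.
            mapping[number] = base + offset if isinstance(base, int) else None
    return mapping
```

      Expanding both lists shows that the two descriptions together cover
      character numbers 0 through 255 without gaps or overlaps.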


    Output format
      The output is a series of lines.  Lines can be arbitrarily long.  Each
      line consists of an initial command character and one or more
      arguments.  Arguments are separated by a single space, but when a
      command takes a fixed number of arguments the last argument can
      contain spaces.  There is no space between the command character and
      the first argument.  Arguments can contain the following escape
      sequences.

      \\   A \.

      \n   A record end character.

      \|   Internal SDATA entities are bracketed by these.

      \nnn The character whose code is nnn octal.

      A record start character will be represented by \012.  Most
      applications will need to ignore \012 and translate \n into newline.

      The possible command characters and arguments are as follows:

      (gi  The start of an element whose generic identifier is gi.  Any
           attributes for this element will have been specified with A
           commands.

      )gi  The end of an element whose generic identifier is gi.

      -data
           Data.

      &name
           A reference to an external data entity name; name will have been
           defined using an E command.

      ?pi  A processing instruction with data pi.

      Aname val
           The next element to start has an attribute name with value val
           which takes one of the following forms:

           IMPLIED
                The value of the attribute is implied.

           CDATA data
                The attribute is character data.  This is used for
                attributes whose declared value is CDATA.

           NOTATION nname
                The attribute is a notation name; nname will have been
                defined using an N command.  This is used for attributes
                whose declared value is NOTATION.

           ENTITY name...
                The attribute is a list of general entity names.  Each
                entity name will have been defined using an I, E or S
                command.  This is used for attributes whose declared value
                is ENTITY or ENTITIES.

           TOKEN token...
                The attribute is a list of tokens.  This is used for
                attributes whose declared value is anything else.

      Dename name val
           This is the same as the A command, except that it specifies a
           data attribute for an external entity named ename.  Any D
           commands will come after the E command that defines the entity to
           which they apply, but before any & or A commands that reference
           the entity.

      Nnname
           Define a notation nname.  This command will be preceded by a p
           command if the notation was declared with a public identifier,
           and by an s command if the notation was declared with a system
           identifier.  A notation will only be defined if it is to be
           referenced in an E command or in an A command for an attribute
           with a declared value of NOTATION.

      Eename typ nname
           Define an external data entity named ename with type typ (CDATA,
           NDATA or SDATA) and notation nname.  This command will be
           preceded by one or more f commands giving the filenames generated
           by the entity manager from the system and public identifiers, by
           a p command if a public identifier was declared for the entity,
           and by an s command if a system identifier was declared for the
           entity.  nname will have been defined using an N command.  Data
           attributes may be specified for the entity using D commands.  An
           external data entity will only be defined if it is to be
           referenced in a & command or in an A command for an attribute
           whose declared value is ENTITY or ENTITIES.

      Iename typ text
           Define an internal data entity named ename with type typ (CDATA
           or SDATA) and entity text text.  An internal data entity will
           only be defined if it is referenced in an A command for an
           attribute whose declared value is ENTITY or ENTITIES.

      Sename
           Define a subdocument entity named ename.  This command will be
           preceded by one or more f commands giving the filenames generated
           by the entity manager from the system and public identifiers, by
           a p command if a public identifier was declared for the entity,
           and by an s command if a system identifier was declared for the
           entity.  A subdocument entity will only be defined if it is
           referenced in a { command or in an A command for an attribute
           whose declared value is ENTITY or ENTITIES.

      ssysid
           This command applies to the next E, S or N command and specifies
           the associated system identifier.

      ppubid
           This command applies to the next E, S or N command and specifies
           the associated public identifier.

      ffilename
           This command applies to the next E or S command and specifies an
           associated filename.  There will be more than one f command for a
           single E or S command if the system identifier used a colon.

      {ename
           The start of the SGML subdocument entity ename; ename will have
           been defined using an S command.

      }ename
           The end of the SGML subdocument entity ename.

      Llineno file
      Llineno
           Set the current line number and filename.  The filename argument
           will be omitted if only the line number has changed.  This will
           be output only if the -l option has been given.

      #text
           An APPINFO parameter of text was specified in the SGML
           declaration.  This is not strictly part of the ESIS, but a
           structure-controlled application is permitted to act on it.  No #
           command will be output if APPINFO NONE was specified.  A #
           command will occur at most once, and may be preceded only by a
           single L command.

      C    This command indicates that the document was a conforming SGML
           document.  If this command is output, it will be the last
           command.  An SGML document is not conforming if it references a
           subdocument entity that is not conforming.
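
      As a rough guide to consuming this output, the following Python
      sketch decodes the escape sequences and turns a few of the commands
      into events.  It is an illustration, not a complete reader: the names
      decode and parse_esis and the event tuples are hypothetical, only the
      (, ), -, ?, A and C commands are interpreted, and every other command
      is passed through unparsed.  The sketch drops \012, translates \n
      into newline, and leaves the \| markers in place for the caller.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

Attrs = Dict[str, Tuple[str, str]]   # attribute name -> (form, value)
Event = Tuple[str, str, Attrs]       # (event kind, text, attributes)

# A backslash followed by \, n, | or exactly three octal digits.
_ESCAPE = re.compile(r"\\(\\|n|\||[0-7]{3})")


def decode(arg: str) -> str:
    """Decode the escape sequences that may appear in an argument."""
    def replace(match: "re.Match[str]") -> str:
        body = match.group(1)
        if body == "\\":
            return "\\"
        if body == "n":
            return "\n"              # record end becomes newline
        if body == "|":
            return match.group(0)    # keep SDATA brackets untouched
        if body == "012":
            return ""                # record start is ignored
        return chr(int(body, 8))
    return _ESCAPE.sub(replace, arg)


def parse_esis(lines: Iterable[str]) -> Iterator[Event]:
    """Yield simple events from lines of sgmls output."""
    pending: Attrs = {}              # attributes for the next element
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        command, rest = line[0], line[1:]
        if command == "A":
            name, _, value = rest.partition(" ")
            form, _, data = value.partition(" ")
            pending[name] = (form, decode(data))
        elif command == "(":
            yield ("start", rest, pending)
            pending = {}
        elif command == ")":
            yield ("end", rest, {})
        elif command == "-":
            yield ("data", decode(rest), {})
        elif command == "?":
            yield ("pi", decode(rest), {})
        elif command == "C":
            yield ("conforming", "", {})
        else:
            yield ("other", line, {})  # D, N, E, I, S, s, p, f, {, }, L, #
```

      Because attributes are reported before the element they belong to,
      the sketch collects A commands and attaches them to the next start
      event.  A caller can treat the document as conforming only if the
      last event is the one produced by the C command.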

 BUGS
      Some non-SGML characters in literals are counted as two characters for
      the purposes of quantity and capacity calculations.

 SEE ALSO
      The SGML Handbook, Charles F. Goldfarb
      ISO 8879 (Standard Generalized Markup Language), International
      Organization for Standardization

 ORIGIN
      ARCSGML was written by Charles F. Goldfarb.  Sgmls was derived from
      ARCSGML by James Clark (jjc@jclark.com), to whom bugs should be
      reported.