Document policy files are ordinary text files that list the "policies" AscToHTM should apply when converting your document. The file can contain comment lines (starting with a "!" or "#" character) and headings for clarity. In most cases the recognised policy lines are identical to those listed in the generated policy file (see 4.1), which is usually a good place to start when writing your own policy file.
Only those lines that are recognised policies will be acted upon.
To use a policy file, simply list it on the command line after the name of the file being converted (see 4.2.2.1).
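As an illustration of the file layout described above, the following Python sketch reads policy lines of the "Name : Value" form used in the sample policies in this section. It is not AscToHTM's own parser: the function name parse_policy_file, case-insensitive matching of names, and treating a line in square brackets as a heading are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set


def parse_policy_file(lines: Iterable[str], recognised: Set[str]) -> Dict[str, List[str]]:
    """Hypothetical sketch: collect recognised "Name : Value" policy lines."""
    policies: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line[0] in "!#":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            continue  # heading such as [Bullets]; for readability only
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name : Value" line
        key = name.strip().lower()  # assumption: names are case-insensitive
        if key in recognised:  # unrecognised policies are silently ignored
            # a list, because some policies (e.g. Bullet Char) may repeat
            policies.setdefault(key, []).append(value.strip())
    return policies
```

Unrecognised lines such as "Frobnicate : Yes" are simply dropped, mirroring the rule that only recognised policies are acted upon.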
Document policies have two main uses :
- To correct any failure of analysis that AscToHTM makes. Hopefully this will be needed less and less as the core analysis engine improves.
Examples include page width, whether or not underlined section headings are expected etc.
- To tell the program how to produce better HTML end product in ways that couldn't possibly be inferred from the original text.
Examples include adding colour and titles to the page, as well as requesting that a large document be split into several pages and that a contents list be created.
This documentation has itself been converted using AscToHTM. The files used were:
- A2HDOCO.TXT. This is the text version of the documentation. The text version is kept as the master copy and updated as required. It's then converted to HTML.
- IA2HDOCO.POL. This is the policy file used to create the HTML version of this document. Only those policies that differ from the defaults have been added.
This policy file "includes" the link dictionary A2HLINKS.TXT.
These files are included in the distribution kit as an example set of documentation.
- A2HDOCOH.TXT. This is the header HTML added at the top of each generated HTML page.
- A2HDOCOS.TXT. This is the JavaScript HTML added into the <HEAD>... </HEAD> portion of the generated HTML page. This particular example toggles the logo when the mouse is over it (only if you use Netscape V3.0 or above though).
- A2HLINKS.TXT. This is the link dictionary used for this document and is used to add hyperlinks to the main text file.
- A2HDOCOF.TXT. This is the footer HTML added at the bottom of each generated HTML page.
You can, of course, use AscToHTM to convert this doco into whatever format, colour etc that you wish.
These policies are generated during the analysis of the source document. They should normally only need changing when AscToHTM's analysis goes wrong in some way.
Example:
    [General Analysis]
    Indent position(s) : (none)
    Definition Char : (none)
    Hanging paragraph position(s) : (none)
    Page width : 76
    Text Justification : Left
    Expect blank lines between paras : Yes
    New Paragraph Offset : (none)
    Keep it simple : No
    Min chapter size : 8
    Short line length : (none)
    Expect code samples : No
These are the positions of the major indent levels in the document.
See the discussion in 5.6.1.
These refer to the indentation levels at which definition paragraphs are expected (see 5.6.2).
This value can be used to influence short line and centred text detection. It also helps to determine if the definition characters ':' and '-' (see 5.6.1) are to be regarded as "strong" or "weak".
This policy is important in detecting pre-formatted text. The possible values are "left", "center" (i.e. left and right), "right" and "none". If text is centered, padding spaces may have been added, and these have to be ignored when detecting pre-formatted text.
Paragraphs are normally expected to have blank lines before them. Where this isn't true (e.g. in a text file dumped from Word), different algorithms can be applied more rigorously.
This policy refers to any hanging indent. Again, this is a Word for Windows favourite.
This policy tells AscToHTM to suppress much of its search for global structure. It should be used when converting documents that don't really have numbered sections, but which might look to AscToHTM as though they do (e.g. because they contain addresses, lists or tables of consecutive numbers). An example might be an email digest consisting of a series of small documents collected together. This is quite likely to confuse AscToHTM because it violates the one assumption (see 3.1) that AscToHTM makes.
This policy allows you to specify the minimum chapter size expected in the document (in lines). AscToHTM will ignore any apparent chapter headings that appear too close together.
This policy determines what constitutes a short line. AscToHTM may add a <BR> to any line it deems to be short. If omitted, a "short" line is determined as some fraction of the calculated page width. The fraction varies from 50-75% according to the conversion type being carried out.
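A minimal Python sketch of the short-line test, assuming a single fixed fraction of 0.6 of the page width when no "Short line length" is given. AscToHTM itself varies the fraction between 50% and 75%, and how it chooses is not described here; the function is_short_line is hypothetical.

```python
from typing import Optional


def is_short_line(line: str, page_width: int,
                  short_line_length: Optional[int] = None,
                  fraction: float = 0.6) -> bool:
    """Hypothetical sketch of the "Short line length" rule."""
    text = line.rstrip()
    if not text:
        return False  # a blank line is a paragraph break, not a short line
    if short_line_length is not None:
        threshold = short_line_length  # an explicit policy value wins
    else:
        # assumption: one fixed fraction; AscToHTM varies it from 50-75%
        threshold = int(page_width * fraction)
    return len(text) < threshold
```

With the sample page width of 76, the assumed threshold is 45 characters, so a line such as "Yours sincerely," counts as short.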
This policy indicates that the document is liable to contain sections of programming code. AscToHTM will attempt to detect such code fragments and preserve their layout so that the code remains comprehensible. At present only "C"-like code is handled, and only in a limited form, usually by inserting extra <BR>'s or by marking the code up as a pre-formatted section. This is expected to improve in future versions.
AscToHTM has the following bullet point policies that will normally be correctly calculated on the analysis pass :-
    [Bullets]
    Expect Numbered bullets : Yes
    Expect alphabetic bullets : No
    Expect Roman Numeral bullets : No
    Bullet Char : '-'
    Bullet Char : 'o'
    Bullet Char : '*'
AscToHTM tries hard not to get confused by the "1", "a" and "I" that happen to end up at the start of lines at random. These could get mistaken for bullet points.
This indicates that numerical bullets are expected (but you probably guessed that).
This does likewise for alphabetic bullet points. AscToHTM recognises (and distinguishes between) upper and lower case variants.
This does likewise for roman numerals. Again upper and lower case variants are recognised.
These policy lines indicate character(s) that can occur at the start of a line to represent a bullet point. Special attention is paid to '-' and 'o' characters, but any character will do. Use one line per bullet char.
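The bullet policies above can be pictured as a simple line classifier. The Python sketch below is a simplification, not AscToHTM's algorithm: it ignores the extra checks AscToHTM makes against stray "1", "a" and "I" characters, and the function bullet_kind and its patterns are hypothetical. Its defaults mirror the sample [Bullets] policy.

```python
import re
from typing import Optional, Sequence


def bullet_kind(line: str, bullet_chars: Sequence[str] = ("-", "o", "*"),
                numbered: bool = True, alphabetic: bool = False,
                roman: bool = False) -> Optional[str]:
    """Hypothetical sketch: classify the bullet (if any) that starts a line."""
    text = line.lstrip()
    for ch in bullet_chars:
        if text.startswith(ch + " "):  # bullet char followed by a space
            return "char"
    if numbered and re.match(r"\d+[.)]\s", text):
        return "numbered"
    # Roman numerals are tested before letters, since "i." could be either
    if roman and re.match(r"[ivxlcdm]+[.)]\s|[IVXLCDM]+[.)]\s", text):
        return "roman"
    if alphabetic and re.match(r"[A-Za-z][.)]\s", text):
        return "alphabetic"
    return None
```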
AscToHTM has the following section heading policies that will normally be correctly calculated on the analysis pass :-
    [Headings]
    First Section Number : 1
    Expect Numbered Headings : No
    Expect Underlined Headings : Yes
    Expect Capitalised Headings : No
    Expect Second Word Headings : No
    We have 0 recognised headings
    Smallest possible section number :
    Largest possible section number :
Section headers are far and away the most complex things the analysis pass has to detect, and the most likely area for errors to occur.
This policy indicates what the first section number is. Normally this is 1, but if it is higher and this policy is not set, AscToHTM may reject headers as being out of sequence, and fail to detect the presence or absence of contents lists correctly.
This indicates whether or not numbered sections are to be expected.
This indicates whether or not underlined headers are to be expected. AscToHTM normally promotes any underlined lines to section headers. This policy can be used to switch that behaviour off.
This indicates whether or not a line that is wholly capitalised should be regarded as a section heading.
*** not fully supported in this version ***
*** not fully supported in this version ***
*** not fully supported in this version ***
*** not fully supported in this version ***
6.2.4.1 "Allow definitions inside PRE"
This policy specifies whether or not AscToHTM should detect definition terms inside a pre-formatted section of text. Only really relevant if the "Highlight definition text" policy is selected (6.3.6.2).
These policies allow you to fine tune the conversion, and are used during the output to HTML.
AscToHTM has the following HTML policies that will only ever take effect if supplied in a user policy file :-
    [Added HTML]
    Document title : AscToHTM user documentation
    Document keywords : text, html, conversion
    Document description : This is part of the AscToHTM user documentation
    HTML Script file : A2HDOCOS.TXT
    HTML header file : A2HDOCOH.TXT
    HTML footer file : A2HDOCOF.TXT
    Background Colour : E0D0E0
    Background Image : (none)
    Text Colour : Red
    Unvisited Link Colour : (none)
    Visited Link Colour : (none)
    Active Link Colour : (none)
These "policies" allow you to start "adding value" to the generated HTML. That is, they allow you to specify things that cannot be inferred from the original text. You can also add HTML to your files by using the HTML preprocessor command (see 7.4).
This identifies the text to be placed in the <TITLE> ... </TITLE> markup in the document header. If omitted, the default title will be "Converted from <filename>". We did consider defaulting to the first line of text, but that rarely works.
The title can also be specified via a preprocessor command (see 7.5) placed in the source document, which will override this policy when present.
New in V2.1. This policy allows you to specify keywords that are added to a META tag inserted into the <HEAD> section of the output page(s) as follows :-
<META NAME="keywords" VALUE="your list of keywords">
This tag is often used by search engines when indexing your HTML page. Add here any relevant keywords that may not appear in the text itself.
The presence of a KEYWORDS pre-processor command will override this policy (see 7.8).
New in V2.1. This policy allows you to specify a description of your document that is added to a META tag inserted into the <HEAD> section of the output page(s) as follows :-
<META NAME="description" VALUE="your description">
This tag is often used by search engines (e.g. AltaVista) as a brief description of the contents of your page. If omitted, the first few lines may be shown instead, which is often less satisfactory.
The presence of a DESCRIPTION pre-processor command will override any description specified in a policy file (see 7.9).
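The override rules for the title, keywords and description (and for the style sheet policy described in 6.3.6.1) all follow the same order of precedence: preprocessor command, then policy file, then built-in default. A minimal Python sketch of that order, with hypothetical function names:

```python
from typing import Optional


def resolve_setting(preprocessor_value: Optional[str],
                    policy_value: Optional[str],
                    default: Optional[str] = None) -> Optional[str]:
    """Hypothetical sketch: a preprocessor command beats the policy file,
    which beats the built-in default."""
    if preprocessor_value:
        return preprocessor_value
    if policy_value:
        return policy_value
    return default


def page_title(filename: str, policy_title: Optional[str] = None,
               preprocessor_title: Optional[str] = None) -> str:
    """Text for <TITLE> ... </TITLE>, using the documented default title."""
    return resolve_setting(preprocessor_title, policy_title,
                           f"Converted from {filename}")
```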
This identifies the name of a text include file to be transcribed into the <HEAD> ... </HEAD> portion of the generated HTML page. This allows you to place JavaScript in your pages (though you'll be a little limited as to what it can act on).
Recently HTML has introduced style sheets (see 6.3.6.1 and 7.7).
This identifies the name of a text include file to be transcribed into the HTML file at the top of the <BODY> ... </BODY> portion of the generated HTML page. This can be used to add standard headers, logos and contact addresses to your HTML pages, and is especially useful to give a consistent "look and feel" when breaking your document up into a number of smaller HTML files.
This identifies the name of a text include file to be transcribed into the HTML file at the bottom of the <BODY> ... </BODY> portion of the generated HTML page. This can be used to add "return to home page" links and contact addresses to your HTML pages. Again, this helps to give a consistent "look and feel" when breaking your document up into a number of smaller HTML files.
This identifies the URL of any image to be placed in the BACKGROUND attribute of the <BODY> tag.
These policies identify the colours to be placed in the various attributes of the <BODY> tag. You can enter any value acceptable to HTML. Normally a value is expressed as a 6-digit hexadecimal value in the range 000000 (black) to FFFFFF (white), but certain colour names such as "white", "blue", "red" etc. may also be recognised by HTML. AscToHTM simply transcribes your value into the output file. The various policies control the colours of the foreground text (TEXT), the background (BGCOLOR), unvisited hyperlinks (LINK), visited hyperlinks (VLINK) and active hyperlinks (ALINK).
A value of "none" signals the defaults are to be used. By default AscToHTM changes the background colour to be white, and omits all the other <BODY> tag attributes.
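To make the transcription rule concrete, here is a Python sketch that builds a <BODY> tag from these policy values. The function body_tag is hypothetical, and writing the default white background as BGCOLOR="FFFFFF" is an assumption; the text above only says the background is made white.

```python
def body_tag(background: str = "none", bgcolor: str = "none", text: str = "none",
             link: str = "none", vlink: str = "none", alink: str = "none") -> str:
    """Hypothetical sketch: build the <BODY> tag from the colour policies."""
    if bgcolor.lower() == "none":
        bgcolor = "FFFFFF"  # assumption: how the default white background is written
    attributes = [("BACKGROUND", background), ("BGCOLOR", bgcolor), ("TEXT", text),
                  ("LINK", link), ("VLINK", vlink), ("ALINK", alink)]
    # Values are transcribed as given; "none" means leave the attribute out
    parts = [f'{name}="{value}"' for name, value in attributes
             if value.lower() != "none"]
    return "<BODY " + " ".join(parts) + ">"
```

With the sample [Added HTML] policy (Background Colour E0D0E0, Text Colour Red) this sketch produces <BODY BGCOLOR="E0D0E0" TEXT="Red">.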
AscToHTM has the following hyperlink policies set as defaults :-
    [Hyperlinks]
    Create hyperlinks : No
    Create mailto links : Yes
    Create NEWS links : Yes
    Only use known groups : Yes
    Cross-refs at level : 2
Hyperlinks can also be added by using a link dictionary (see 4.3.2 and 4.5.2).
In practice this policy means that all http, www and ftp URLs will be converted to hyperlinks.
This indicates that probable email addresses such as jaf@yrl.co.uk are to be converted into mailto hyperlinks.
This indicates that probable newsgroup references such as alt.games.mornington.cresent (sic) are to be converted.
This indicates whether or not only newsgroups from known hierarchies should be converted into news: hyperlinks. AscToHTM can detect possible newsgroups by looking for words like "something.like.this" and "news.answers". However, assuming these are newsgroups often leads to errors. Consequently, if this policy is set to "Yes", candidate newsgroups have to belong to a recognised USENET hierarchy such as "alt", "comp", "sci" etc.
If set to "No", then "something.like.this" will be turned into a news: hyperlink.
Later versions of AscToHTM may allow you to specify allowable USENET hierarchy roots.
This policy only takes effect if conversion of news hyperlinks is selected.
This defaults to "Yes".
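The effect of "Only use known groups" can be sketched in Python as a filter on candidate names. The candidate pattern, the function news_links and the list of hierarchies (the traditional Usenet "Big 8" plus "alt") are assumptions for illustration; AscToHTM's own list of recognised hierarchies is not given here.

```python
import re
from typing import List

# Assumption: the traditional Usenet "Big 8" hierarchies plus alt
KNOWN_HIERARCHIES = {"alt", "comp", "humanities", "misc", "news",
                     "rec", "sci", "soc", "talk"}

# Lower-case dotted words with at least one dot, e.g. "comp.lang.c"
CANDIDATE = re.compile(r"\b[a-z][a-z0-9+-]*(?:\.[a-z0-9+-]+)+\b")


def news_links(text: str, only_known_groups: bool = True) -> List[str]:
    """Hypothetical sketch: turn likely newsgroup names into news: links."""
    links = []
    for match in CANDIDATE.finditer(text):
        group = match.group(0)
        hierarchy = group.split(".", 1)[0]
        if only_known_groups and hierarchy not in KNOWN_HIERARCHIES:
            continue  # e.g. "something.like.this" is left alone
        links.append(f'<A HREF="news:{group}">{group}</A>')
    return links
```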
This indicates the section level at and above which all cross-references are to be converted to hyperlinks. For example, a value of 2 means all n.n, n.n.n etc. references are converted. A value of "1" might seem desirable, but is liable to give many false references (see 5.3.2).
This behaviour may be improved in later versions.
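The level test amounts to counting the components of each section number. The Python sketch below, with a hypothetical cross_refs function and a deliberately naive pattern, shows why a low level produces false references: at level 1 every bare number qualifies.

```python
import re
from typing import List

# Naive pattern: a number optionally followed by ".n" components
SECTION_REF = re.compile(r"\b\d+(?:\.\d+)*\b")


def cross_refs(text: str, level: int = 2) -> List[str]:
    """Hypothetical sketch: section references deep enough to be linked."""
    refs = []
    for match in SECTION_REF.finditer(text):
        ref = match.group(0)
        depth = ref.count(".") + 1  # "5" -> 1, "5.3" -> 2, "5.3.2" -> 3
        if depth >= level:
            refs.append(ref)
    return refs
```

Even at level 2, a decimal such as 3.50 would be picked up by this naive pattern, so a plain pattern match like this is only a starting point.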
AscToHTM has the following HTML policies that will only ever take effect if supplied in a user policy file :-
    [File generation]
    Input directory : (none)
    Output directory : (none)
    Use .HTM extension : No
    Use DOS filenames : No
    DOS filename root : (none)
    Split level : (none)
    Min HTML File size : (none)
    Add navigation bar : No
    Output policy file : Yes
    Output policy filename : (none)
    Generate diagnostics files : No
These policies control how your document is divided into one or more HTML files, and how those files are named and linked together with hyperlinks.
*** reserved for future use *** This policy will allow the source directory to be specified. It is not yet implemented, but the input directory is currently written to any output policy file created, indicating where the source files were found relative to the run location.
A value of blank indicates the current directory.
*** Not available in the shareware version *** This policy allows you to specify which directory you want your files output to.
If an output policy file is created, this indicates where the generated files were placed relative to the run location.
A value of blank indicates the current directory.
This policy specifies whether or not the generated HTML files should have a .HTM extension. The default is to use a ".html" extension, unless DOS-compatible files are requested.
This policy allows you to specify that the HTML file names must be DOS compatible. If selected, the filenames will all have a ".HTM" extension and be given upper case names.
Any file name whose root exceeds 8 characters will be shortened by keeping the first 3 characters and adding a unique 5-digit number derived from the longer name.
See the discussion in 4.2.2.4.
Where DOS filenames are used, this allows you to specify an up-to-5 character root to which any section numbers will be appended (see 6.3.3.6). If splitting a document at 2 levels we normally recommend a 3-character filename root.
Thus MYDOC.TXT given a root of MYD would produce MYD.HTM, MYD_1.HTM, MYD_1_1.HTM etc., which are all less than 8 characters long and so retain some readability.
If no root were specified, MYDOC_1_1.HTM would be renamed to MYDnnnnn.HTM where "nnnnn" would be a generated 5-digit code.
See the discussion in 4.2.2.4.
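The DOS naming rules above can be sketched in Python as follows. The 5-digit code AscToHTM derives from a long name is not documented, so this sketch substitutes a CRC-32 of the name; that choice, and the function dos_filename, are hypothetical.

```python
import zlib
from typing import Optional, Sequence


def dos_filename(source_root: str, section: Sequence[int] = (),
                 dos_root: Optional[str] = None) -> str:
    """Hypothetical sketch of DOS-compatible (8.3) output file names."""
    base = dos_root if dos_root else source_root
    name = "_".join([base] + [str(n) for n in section]).upper()
    if len(name) > 8:
        # assumption: a CRC of the long name stands in for AscToHTM's own
        # (undocumented) 5-digit code; keep the first 3 characters
        code = zlib.crc32(name.encode("utf-8")) % 100000
        name = f"{name[:3]}{code:05d}"
    return name + ".HTM"
```

With a 3-character root the names stay readable; without one, MYDOC_1_1 is too long and collapses to MYD plus a 5-digit code.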
This identifies the heading level at which the generated HTML should be split into smaller files. A value of "none" will put all the HTML into one file.
A value of "1" will create a new HTML file for each new major section. A value of "2" will create a new HTML file for each new n.n section, whilst "3" creates a new document for each n.n.n section, and so on.
The first file created normally has a name that matches the source file. Subsequent files append the section number, separated by underscores.
Thus a file called MYDOC.TXT will generate MYDOC.HTML, MYDOC_1.HTML, MYDOC_1_1.HTML etc.
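A Python sketch of how a section number might map to an output file under a given split level, following the naming pattern above. The function output_file is hypothetical, and it ignores the "Min HTML File size" policy described below.

```python
from typing import Optional, Sequence


def output_file(root: str, section: Sequence[int],
                split_level: Optional[int] = None, ext: str = ".html") -> str:
    """Hypothetical sketch: the file a given section is written to."""
    if split_level is None:
        return root + ext  # "none": everything goes into one file
    # Only the first split_level components of the section number count,
    # so section 1.2.3 goes into the 1.2 file when splitting at level 2
    parts = [str(n) for n in section[:split_level]]
    return "_".join([root] + parts) + ext
```

Text before the first section has an empty section number and so stays in the first file, MYDOC.html.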
This policy is only relevant when splitting the document into smaller output files, i.e. when a "split level" is specified (see 6.3.3.6). It specifies a minimum output HTML file size in lines (although this is only approximate).
This can be useful for documents that have chapters where all the content is in the sub-sections. In such documents you'd end up with a virtually empty chapter heading file if this policy is not used.
This policy is only relevant if you have elected to split your document into a number of smaller HTML files (see 6.3.3.6). In such cases it allows you to have a navigation bar inserted at the foot of each HTML page, before any standard footer is added.
The navigation bar consists of:
- A "Previous" link, to take you to the previous HTML page.
- A "Next" link, to take you to the next HTML page.
- A "Contents" link, to take you to the start of the next section in the contents list.
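A hypothetical Python sketch of assembling such a bar for one page of a split document. The function nav_bar, the " | " separator and the "#contents" anchor used in the example are assumptions; the only grounded detail is that, when splitting, the contents list sits at the bottom of the first file.

```python
from typing import List


def nav_bar(files: List[str], index: int, contents_href: str) -> str:
    """Hypothetical sketch: the navigation bar for the page files[index]."""
    links = []
    if index > 0:  # the first page has no "Previous" link
        links.append(f'<A HREF="{files[index - 1]}">Previous</A>')
    if index < len(files) - 1:  # the last page has no "Next" link
        links.append(f'<A HREF="{files[index + 1]}">Next</A>')
    links.append(f'<A HREF="{contents_href}">Contents</A>')
    return " | ".join(links)
```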
This policy allows you to specify that you want AscToHTM to output the policy that is being used. This will be a combination of the policy calculated by AscToHTM during the analysis pass and any user-supplied policy lines. The output policy file is written to the output directory with a .pol extension.
- Note_1:
- In earlier versions of AscToHTM the creation of an output policy file was the default, now it is not.
- Note_2:
- This policy has the same effect as the command line qualifier /POLICY (see 4.2.2.5). An output policy file will be created when either that qualifier is used or this policy is set to "Yes".
**** not supported in this release ****
This policy specifies whether or not diagnostics files should be produced. This has exactly the same effect as the /DEBUG qualifier has in command line versions (see 4.2.2.2).
AscToHTM has the following HTML policies that influence the detection and generation of contents lists :-
    [Contents]
    Expect Contents List : No
    Add contents list : No
    External contents list filename : (none)
6.3.4.1 "Expect contents list"
See the discussion in 4.2.2.3.
This policy specifies that AscToHTM should generate a contents list to match all the section headings that it marks up. This contents list will consist of hyperlinks to take you to the corresponding section and HTML file. The placement of the contents list depends on how you have decided to split up your output HTML (see 6.3.3.6).
If you decide to convert MYDOC.TXT to a single HTML file MYDOC.HTML, AscToHTM will create a separate file called CONTENTS_MYDOC.HTML and add a link to this file at the top of MYDOC.HTML. You can, if you wish, simply cut and paste this file into MYDOC.HTML.
If you decide to convert MYDOC.TXT into several files, then the contents list is placed at the bottom of MYDOC.HTML, and points to all the newly created files. Any text before the first section in your document will be placed before the contents list in MYDOC.HTML.
Whenever you elect to have a contents list generated, any lines perceived by AscToHTM as being part of a contents list in the original document will be discarded.
This is the name of the external contents file generated by AscToHTM, should such a file be wanted. By default the file will be called contents_<filename>.html.
The contents file should be in the same directory as the created HTML files.
AscToHTM has the following policies that can be used to influence the preprocessor (see "Using the preprocessor"), and hence the HTML output :-
    [Preprocessor]
    Use Preprocessor : Yes
    Include document section : Public_part
This policy tells AscToHTM whether or not the preprocessor should be used. If it isn't used, then all preprocessor directives are ignored and a straight conversion from input to output files occurs.
- Note:
- If this policy is set to "no", all related preprocessor policies will have no effect.
This policy tells AscToHTM which section types are to be included in the conversion (see 7.1). The name supplied should match that in the SECTION directive. A value of "all" indicates that all section types should be converted.
At present one line per section type is required. Support for lists of sections may be added in the future.
AscToHTM has the following "styling" policies that will normally be correctly calculated on the analysis pass :-
    [Style]
    Document Style Sheet : text.css
    Highlight Definition Text : No
    Use <DL> markup for defn. paras : Yes
    Allow automatic centring : No
    Minimum automatic <PRE> size : (none)
    Smallest allowed <Hn> tag : 5
    Largest allowed <Hn> tag : 2
    Ignore multiple blank lines : No
    Search for emphasis : Yes
6.3.6.1 "Document style sheet"
New in V2.1. This policy allows you to specify the URL of a style sheet file, usually with a .css extension. Style sheet files are a new HTML feature that allow you to specify fonts and colours to be applied to your document.
The resulting HTML is inserted into the <HEAD> section of the output page(s) as follows :-
<LINK REL="STYLESHEET" HREF="URL" TYPE="text/css">
The presence of a STYLE_SHEET pre-processor command will override any style sheet specified in a policy file (see 7.7).
This policy specifies whether or not the definition term (the part marked up in <DT> ... </DT>) should be placed in bold for greater emphasis (see 5.6).
This policy specifies whether or not definition paragraphs should be marked up using <DL><DT>..</DT><DD>..</DD></DL> markup. See the discussion in 5.6.2.
This policy allows automatic detection of centred text to be performed. This is normally left switched off, as it is prone to give errors. This algorithm may be refined in later versions.
This policy specifies the minimum number of lines that must appear pre-formatted before they can be placed in their own <PRE> ... </PRE> section. This is sometimes desirable, so the value is set to 1 by default. For example, you may have lines with page numbers at the top of each page in your document. Of course... page numbers make no sense in an HTML document.
- Note:
- Only values in the range 1-20 are likely to have an effect. Values above 20 are likely to simply disable this feature entirely. This limitation is due to the size of the readahead buffer AscToHTM uses.
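The minimum-size rule can be sketched as a scan for runs of consecutive pre-formatted lines. In this hypothetical Python sketch the per-line "looks pre-formatted" decision is taken as given, since the detection itself is outside the scope of this policy; the function pre_blocks is not part of AscToHTM.

```python
from typing import List, Optional, Tuple


def pre_blocks(looks_preformatted: List[bool],
               min_size: int = 1) -> List[Tuple[int, int]]:
    """Hypothetical sketch: runs of pre-formatted lines long enough for <PRE>.

    Returns (start, end) pairs, end exclusive, indexing the input lines.
    """
    blocks = []
    start: Optional[int] = None
    # A trailing False acts as a sentinel that closes any final run
    for i, flag in enumerate(looks_preformatted + [False]):
        if flag and start is None:
            start = i  # a run begins
        elif not flag and start is not None:
            if i - start >= min_size:  # keep only runs of min_size or more
                blocks.append((start, i))
            start = None
    return blocks
```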
These policies control the output heading sizes. By default <H2> is used for main level headings, with each subsequent heading level being one size smaller, down to <H3> (normal text size). In the contents list, entries are shown down to <H4>.
The software will ignore these values if they are out of range, or if the "largest" value represents a smaller heading (larger Hn) than the "smallest" value.
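A Python sketch of the level-to-tag mapping, using the sample [Style] values (largest 2, smallest 5) as defaults. Each level is one size smaller than the previous, clamped at the smallest tag. The function heading_tag is hypothetical, and falling back to 2 and 5 when the values are out of range is an assumption.

```python
def heading_tag(level: int, largest: int = 2, smallest: int = 5) -> str:
    """Hypothetical sketch: map a heading level (1 = chapter) to an <Hn> tag."""
    if not 1 <= largest <= smallest <= 6:
        # Out-of-range or inverted values are ignored; falling back to the
        # sample policy's 2 and 5 is an assumption
        largest, smallest = 2, 5
    n = min(largest + level - 1, smallest)  # one size smaller per level
    return f"H{n}"
```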
This policy specifies whether or not multiple blank lines should be ignored. Normally HTML ignores white space, but if this policy is selected then additional blank lines will be marked up as <BR>.
This policy specifies whether or not AscToHTM should look for emphasised text. Text can be emphasised by placing asterisks (*) either side of it, or underscores (_). AscToHTM will convert the enclosed text to bold and italic respectively.
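A rough Python equivalent of the emphasis search, using regular expressions. This is a hypothetical sketch, not AscToHTM's rule set: the function emphasise requires the marker to hug a word on both sides, so spaced arithmetic like "2 * 3 * 4" and identifiers like my_var_name are left alone.

```python
import re


def emphasise(text: str) -> str:
    """Hypothetical sketch: *text* becomes bold, _text_ becomes italic."""
    # Asterisks directly around one or more words -> <B>...</B>
    text = re.sub(r"\*(\w[^*]*?\w|\w)\*", r"<B>\1</B>", text)
    # Underscores around words, not inside an identifier -> <I>...</I>
    text = re.sub(r"\b_(\w[^_]*?\w|\w)_\b", r"<I>\1</I>", text)
    return text
```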
Link definitions appear as follows :-
    [Link Dictionary]
    Link definition : "A2HDOCO.TXT" = "Source text" + "/~jaf/A2HDOCO.TXT"
That is, the text to be matched, the text to be used in its place as the highlighted text, and the URL this link is to point to (in this case a relative URL). See the discussion in 4.2.2.2.