HTML files from text files with special but simple and unobtrusive markup. It is intended for generating HTML versions of plain-text documentation. It can also be used to create simple web pages.
Documentation files are often in plain-text format, which, while versatile, lacks the enhanced functionality of hyperlinks that allow one to jump between sections of the documentation or to resources outside it.
rTxt2htm parses text files written in a specific format for title, URLs, sections, code fragments, styled text, tables of contents, etc., and creates the necessary HTML elements for presentation in the HTML output. The format rTxt2htm uses is somewhat like the reStructuredText or
format.
A form interface is provided to upload the plain text files, or paste their content, and to set additional information for use in the generated HTML content.
1.2 Formatters
rTxt2htm looks for specific white-spacing, characters, etc. (formatters) in the plain-text files to create the necessary HTML elements.
The formatters that rTxt2htm uses are simple and unobtrusive, and yet meaningful inside plain-text files. A comparison of the HTML and the plain-text versions of this readme documentation shows this clearly.
Formatters (processing done in the shown order) are:
* A block of text with +-----(5 or more)+ at top and at bottom (leading or trailing spaces are okay) is rendered as plain, non-formatted, mono-spaced text for tables, ASCII diagrams, etc.; the rest of the formatters don't apply to its content. Like:
+~~~~ ~~~~+
| *hello* |
+~~~~ ~~~~+
* A block of text with == Content ==(any number of) at top and at least one empty line at bottom is considered a table of contents (TOC); the rest of the formatters, except those for styled text, don't apply to its content. Lines inside the block are made into TOC items that get auto-linked to the corresponding sections, etc., if they have the identifiers for those sections, etc.
The section identifiers, which can contain the period (.) character, can be numeric (like 1, 5.4.3 and 2.2.), or alphanumeric but inside round parentheses, like (A), (5i) and (A.5i.1). The HTML ID values generated by rTxt2htm for the identifiers are the same as the identifiers but prefixed with an s and with the parentheses replaced with underscores (_). E.g., s5.4.3 and s_A.5i.1_.
* A block of text flanked with /* style PHP comment markers will be shown in a subtle div element.
* Title, keywords, description, encoding and language for use with the HTML version are gleaned from lines like these (the lines are not shown in the output):
@@title: rTxt2htm documentation
@@language: en
@@keywords: rTxt2htm, text to HTML, convert, conversion, PHP, Labware, readme
@@encoding: utf-8
@@description: rTxt2htm generates HTML versions of plain text files
The encoding and the language should each be a value accepted by IANA. When such lines are missing, the information provided in the form is used. One may also manually edit the generated HTML files to alter such information.
* Four or more spaces before a sentence lead to the sentence being shown as code (a tab is considered equal to 4 spaces). Like this:
<this is some 'code'>
* Flanking a word or phrase with ' renders it as a special span element, like this (URL and bold or italics formatters are not applied to it). Flanking a word or phrase with ` italicizes it. Flanking a word or phrase with * makes it appear bold.
* A word followed by :-, a space, and then another word is rendered as the first word hyperlinked to the location pointed to by the second one. E.g.:
-- for rTxt2htm support, see section 3.2
-- rTxt2htm was created for documenting htmLawed
* Words with http:, https:, mailto:, ftp:, file:, and sftp: are rendered with appropriate hyperlinks. Like, http://www.bioinformatics.org/phplabware.
* Two = characters followed by optional spaces and then text followed by more = characters, on a new line that is preceded by an empty line, indicate a section start. The text is shown as an h2 element. Any o's at the end are for div closures. If the text has a leading number like 1 or 3.2.1, the section gets an anchor named the same as the number but prefixed with s, like s1 or s3.2.1.
* For sub-sections (rendered with an h3 element) and sub-sub-sections (rendered with an h4 element), the characters - and . respectively are used instead of the = character.
* Five or more underscores on a line by themselves and preceded by an empty line are rendered as an hr element (horizontal rule); any o's at the end are, as with the formatters for sections, etc., for div closures.
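Two of the rules in the list above are mechanical enough to sketch in code: the mapping from a section identifier to its HTML ID, and the gleaning of @@ metadata lines. The following is a minimal illustration in Python, not rTxt2htm's own PHP code; the function names section_id and extract_metadata and the regular expression are hypothetical, and the assumption that a metadata line starts at the beginning of a line (after optional spaces) is not confirmed by this readme.

```python
import re

# Hypothetical pattern for the five metadata keys named in this readme.
META_LINE = re.compile(r"^@@(title|language|keywords|encoding|description):\s*(.*)$")


def section_id(identifier):
    """Map a section identifier to an HTML ID: prefix with 's' and
    replace round parentheses with underscores."""
    return "s" + identifier.replace("(", "_").replace(")", "_")


def extract_metadata(text):
    """Return (metadata dict, text without the @@ lines).

    The @@ lines are dropped from the returned text, mirroring the
    statement that they are not shown in the output.
    """
    meta, body = {}, []
    for line in text.splitlines():
        match = META_LINE.match(line.strip())
        if match:
            meta[match.group(1)] = match.group(2).strip()
        else:
            body.append(line)
    return meta, "\n".join(body)


if __name__ == "__main__":
    print(section_id("5.4.3"))       # numeric identifier
    print(section_id("(A.5i.1)"))    # alphanumeric identifier in parentheses
    meta, rest = extract_metadata("@@title: rTxt2htm documentation\n@@language: en\nHello")
    print(meta, rest)
```

With the identifiers used as examples above, section_id gives s5.4.3 and s_A.5i.1_. Keys that are absent from the returned dictionary would, per the description above, be filled in from the form.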
Note: Empty spaces are preserved, so any indentation is preserved. For bold, italicized or otherwise stylized text, the characters [ and ( at the beginning, and the characters ?, ;, !, :, ,, ., ), and ] at the end, of a word/phrase are not stylized. The same is true for hyperlinking.
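The edge-character rule in the note can be illustrated with a short Python sketch. It is an assumption-laden illustration, not rTxt2htm's implementation: the helper names split_edges and bold are hypothetical, and the use of a b element for bold text is assumed, since this readme does not say which element is generated.

```python
LEADING = "[("          # characters left unstyled at the start
TRAILING = "?;!:,.)]"   # characters left unstyled at the end


def split_edges(phrase):
    """Split a phrase into (unstyled prefix, styled core, unstyled suffix)."""
    start = 0
    while start < len(phrase) and phrase[start] in LEADING:
        start += 1
    end = len(phrase)
    while end > start and phrase[end - 1] in TRAILING:
        end -= 1
    return phrase[:start], phrase[start:end], phrase[end:]


def bold(phrase):
    """Wrap only the core of the phrase in a (hypothetical) b element."""
    prefix, core, suffix = split_edges(phrase)
    if not core:
        return phrase  # nothing left to style
    return f"{prefix}<b>{core}</b>{suffix}"


if __name__ == "__main__":
    print(bold("(hello),"))  # parentheses and comma stay outside the element
```

For the input (hello), the sketch leaves the opening parenthesis, the closing parenthesis and the comma outside the styled element.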
Formatters for HTML lists, tables, colored text, etc., are missing because such information either cannot be expressed in plain-text format or is adequately functional in it without a need for a formatter.
rTxt2htm 1.2.1, 22 January 2019
Copyright Santosh Patnaik
GPL 3 license
A PHP Labware internal utility - http://www.bioinformatics.org/phplabware/internal_utilities/index.php