/*
htmLawed_TESTCASE.txt, 23 January 2023
To test htmLawed
Copyright Santosh Patnaik
Dual licensed with LGPL 3 and GPL 2+
A PHP Labware internal utility - www.bioinformatics.org/phplabware/internal_utilities/htmLawed
*/

This file has UTF-8-encoded text with both correct and incorrect/malformed HTML/XHTML code snippets to test htmLawed (test cases/samples). The entire text may also be used as a unit.

************************************************
when viewing this file in a web browser, set the
character encoding to Unicode/UTF-8
************************************************

--------------------- start --------------------

Try different $config and $spec values. Some text even when filtered in will not be displayed in a rendered web-page.
Attributes
Xml:lang:, ,
Standard, predefined value, or empty attribute: , ,
Required: , image
Quote & space variation: a, a, a
Invalid: a
Duplicated: a
Deprecated: a,

Casing:
Custom: image
Data-*: a
Admin-restricted?:
Attribute values
Duplicate ID value:, ,
(try 'my_' for prefix)
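The 'my_' hint above is about renaming duplicate ID values with a prefix instead of dropping them. The following is a minimal Python sketch of that idea, not htmLawed's own code. The function name `uniquify_ids` is hypothetical, and the renaming rule (prepend the prefix until the value is unused) is an assumption made for illustration.

```python
def uniquify_ids(ids, prefix="my_"):
    """Return the ID values with later duplicates renamed using the prefix.

    Illustrative only: the rule used here (prepend the prefix until the
    value is unused) is an assumption, not htmLawed's implementation.
    """
    seen = set()
    result = []
    for value in ids:
        candidate = value
        # Keep prepending the prefix until the candidate is unused.
        while candidate in seen:
            candidate = prefix + candidate
        seen.add(candidate)
        result.append(candidate)
    return result


print(uniquify_ids(["top", "top", "top", "nav"]))
```

The first occurrence of a value is kept as-is, so existing in-page links to it would still resolve; only the later duplicates change.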
Double-quotes in value:, ,
(try filter for CSS expression)
CSS expression:

Other: ,
(try 'maxlen', 'maxval', etc., for 'input' in '$spec')
Blockquotes
abc

abc
def

abc
def

abc
def
ghi

abc
def
ghi
QQQ
x

x
QQQ

x
QQQ
x

x
QQQ

x



(try with blockquote parent)
CDATA sections
Special characters inside: ]]>, 3.5, & 4 > 4 ]]>
Normal: , CDATA follows:
Malformed: , < ![CDATA check ]]>, , < ![CDATA check ] ]>
Invalid: >CDATA in tag content,
text not allowed
Complex-1: deprecated elements
The PHP script used for this webpage is htmLawedTest.php, from PHP Labware.
Complex-2: deprecated attributes
aa

image

Section

Para

  1. First item
  1. First item

Complex-3: embed, object, area


navigate the site: 1 | 3 | 4

value
Complex-4: nested and other tables
Cell
Cell
Cell
Cell Cell Cell
Cell
Cell Cell Cell

PCDATA wrong: Well
Hello

Missing tr:
Well

Complex-5: pseudo, disallowed or non-HTML tags
(Try different 'keep_bad' values) <*> Pseudotags <*> Non-HTML tag xml

Disallowed tag p

Elements
Unbalanced: check
Non-XHTML:

Malformed: < a href="">, , , , < /a>, < a href="">, a, a,
Invalid: a
Empty: a, a, atext
Content invalid: 12
Content invalid?:

(try setting 'form' as parent)
Casing:
Check for tidy:



hi
Customized element:
Custom element: Click me? A beautiful tree towering over an empty savannah
Custom element: Facebook G+ xx
Math: 2 = 2
SVG:
Entities
Special: & 3 < 2 & 5>4 and j >i >a & ia
Padding: B B f f  
Malformed: & #x27;, &x27;, ' &TILDE;, &tilde
Invalid: , �, , �, ￿, &bad;
Discouraged characters: , „, ﷠, 􏿾
Context: '>', <?
Casing: ', ', &TILDE;, ˜
(also check named-to-numeric and hexdec-to-decimal, and vice versa, conversions)
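The conversions mentioned in the note above can be illustrated outside htmLawed. The following is a rough Python sketch, assuming the standard-library table `html.entities.name2codepoint` as the set of known names. The helper names `named_to_numeric` and `hex_to_decimal` are hypothetical, and the code does not reproduce htmLawed's own entity handling.

```python
import re
from html.entities import name2codepoint


def named_to_numeric(text):
    """Replace known named entities (e.g. &tilde;) with decimal ones."""
    def repl(match):
        name = match.group(1)
        if name in name2codepoint:
            return "&#%d;" % name2codepoint[name]
        # Unknown or wrongly cased names (&bad;, &TILDE;) are left untouched.
        return match.group(0)
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


def hex_to_decimal(text):
    """Replace hexadecimal numeric entities (e.g. &#x27;) with decimal ones."""
    return re.sub(
        r"&#[xX]([0-9A-Fa-f]+);",
        lambda m: "&#%d;" % int(m.group(1), 16),
        text,
    )


print(named_to_numeric("&tilde; &TILDE; &bad;"))
print(hex_to_decimal("&#x27; &#X27;"))
```

Entity names are case-sensitive in this table, which is why the 'Casing' samples above are worth checking separately from the 'Malformed' ones.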
Format
Valid but ill-formatted: text text text text
p r e
text text

text none text text none t e x t
text none t e x t text none t e x t
p r e  
				pre
		
Cell
Cell
Cell
CellCellCell
Cell
CellCellCell
(try to compact or beautify)
Forms
(note nesting of 'form', missing required attributes, etc.)
pl
h


B:C:

(try each of these lines separately)
what
what (try with container as div and as form)
c a b
HTML comments (also CDATA)
Script inside:
Special characters inside: , , , c
Normal: , , comment:,
text not allowed

Malformed: , < ![CDATA check ]]>, < ![CDATA check ] ]>
Invalid:
>comment in tag content,
HTML5
figure and figcaption:
picture
Caption for the awesome picture
article:

A

B

C

E

F

G

meter:

Heat 150.

datalist:
Ins-Del
(depending on context, these elements can be of either block or inline type)

block


d


d

d

d
Lists
Invalid character data:
  • (item
  • )

Definition list:
a
bad
first one
b
second

Definition list, close-tags omitted:
a
bad
first one
b
second

Definition lists, nested:
T1
D1
T2
D2
t1
d1
t2
d2
T3
D3
T4
D4
t1
d1

Definition lists, nested, close-tags omitted:
T1
D1
T2
D2
t1
d1
t2
d2
T3
D3
T4
D4
t1
d1

Nested:
  • l1
  • l2
    1. lo1
    2. lo2
  • l3
  • l4
    1. lo3
    2. lo4
      1. lo5

Nested, directly:
  • l1
    1. l2
  • l3

Nested, close-tags omitted:
  • l1
  • l2
    1. lo1
    2. lo2
  • l3
  • l4
    1. lo3
    2. lo4
      1. lo5

Complex:
Menu:
Microdata
I am X but people call me Y. Find me at
Microsoft Word
Proprietary tag:

XML declaration:
XML-invalid character code-point (may not replicate):

“Where is he?” asked both Mary – the one so lovely – and Jane.

Nesting
Block or inline a:

text

hi

Non-English text-1
Inscrieţi-vă acum la a Zecea Conferinţă Internaţională
გთხოვთ ახლავე გაიაროთ რეგისტრაცია
večjezično računalništvo
อ.อ่าง
Зарегистрируйтесь сейчас на Десятую Международную Конференцию по
(this file should have utf-8 encoding; some characters may not be displayed because of missing fonts, etc.)
Non-English text-2: entities
用统一码
გთხოვთ
Inscreva-se agora para a Décima Conferência Internacional Sobre O Unicode, realizada entre os dias 10 e 12 de março de 1997 em Mainz na Alemanha.
Ruby
(need compatible browser)
さい とう のぶ W3C Associate Chairman
WWW (World Wide Web)
A (aaa)
Tables
Omitted closing tags:
h1c1h1c2
r1c1r1c2
r2c1r2c2

Nested, omitted closing tags:
h1c1h1c2
r1c1r1c2
h1c1h1c2
r1c1r1c2
r2c1r2c2
r2c1r2c2

Tag transformation
Font element with malicious code:

Font element intended as 'inline' element:

hi

Font element intended as 'block' element:
hi

Font element intended as 'block' element:
hi
QQQ

Tidy
White-space handling: abc def ghi abc def ghi
URLs
Relative and absolute: , , , , , ,
(try base URL value of 'http://a.com/b/')
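The base URL hint above is about turning relative URLs into absolute ones. As a rough illustration, the Python sketch below resolves a few relative references against 'http://a.com/b/' with the standard-library `urllib.parse.urljoin`. The sample URLs and the helper name `to_absolute` are hypothetical (the URLs of the test case itself are not shown here), and htmLawed's own resolution may differ in edge cases.

```python
from urllib.parse import urljoin

# Base URL suggested in the hint above.
BASE_URL = "http://a.com/b/"


def to_absolute(url, base=BASE_URL):
    """Resolve a possibly relative URL against the base URL."""
    return urljoin(base, url)


# Hypothetical sample URLs covering the common relative forms.
for sample in ["c/d.html", "../c", "/c", "#frag", "http://b.com/x"]:
    print(sample, "->", to_absolute(sample))
```

URLs that are already absolute pass through unchanged, so the same function can be applied to every URL in a snippet.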
CSS URLs:
,
,
,
,

Double URLs: b
Anti-spam: (try regex for 'http://a.com', etc.) , , , , , , ,
Soft-hyphen: ídis­c
XSS
<img onmouseover=confirm(1)// '';!--"=&{()}

test

test
Bad IE7: x
Opera: link
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: x
Bad IE7: x
Bad IE7: x
Bad IE7: x
Bad IE7: exp/*x
Bad IE7: hi
Bad IE7: hi
Bad IE7: test
Bad IE7: hi
Bad IE7: hi
Other
3 < 4
3 > 4
> 3
<._.> hi!
<<< ALERT >>>
some stuff

if(13age){say 'teen'}
age >51 and a smoking history of >51 pack-years was
age > 51 and a smoking history of >51 pack-years was
age <51 and a smoking history of <51 pack-years was
age < 51 and a smoking history of < 51 pack-years was
age >51 and a smoking history of >51 pack-years
age > 51 and a smoking history of >51 pack-years
age <51 and a smoking history of <51 pack-years
age < 51 and a smoking history of < 51 pack-years