
html-tagset's Introduction

HTML::Tagset

This module contains data tables useful in dealing with HTML.

It provides no functions or methods.

PREREQUISITES

This suite requires Perl 5.

HTML::Tagset doesn't use any nonstandard modules.

INSTALLATION

You install HTML::Tagset, as you would install any perl module library, by running these commands:

perl Makefile.PL
make
make test
make install

If you want to install a private copy of HTML::Tagset in your home directory, then you should try to produce the initial Makefile with something like this command:

perl Makefile.PL LIB=~/perl

DOCUMENTATION

POD-format documentation is included in Tagset.pm. POD is readable with the perldoc utility. See ChangeLog for recent changes.

AVAILABILITY

The latest version of HTML::Tagset is available from the CPAN. See https://metacpan.org/pod/HTML::Tagset

COPYRIGHT

Copyright 1999,2000 Sean M. Burke. Copyright 1995-2000 Gisle Aas. Copyright 2000-2024 Andy Lester.

This library is free software; you can redistribute it and/or modify it under the terms of the Artistic License version 2.0.

html-tagset's Issues

HTML5 discussion: Interface

Before any code gets written, I'd like to hash out the interface for HTML::Tagset 5.00+. My proposal follows, and I welcome your feedback. Some points may run counter to things I've said before. This is what I think makes the most sense right now, after seeing the discussion we've had over the past few weeks.

Guiding principles

  • Upgrading to HTML::Tagset 5.00 will not break existing code.
  • use HTML::Tagset; must give you HTML4 data, in existing hash format.
  • If you want to have HTML5 support, you must:
    • Install HTML::Tagset 5.00 or higher.
    • Change your use statement: use HTML::Tagset 'v5'
  • Add new hashes with more consistent naming, and in some cases, easier and more consistent data layout.
  • Add subs to provide an easier interface to the hashes.
  • Add ability to export subs and hashes.
  • Encourage people to use new subs and hashes.
  • We support only one of HTML4 or HTML5 at a time. You cannot do use HTML::Tagset ( 'v4', 'v5' );.

Existing HTML4 support

Existing code that uses HTML::Tagset without parameters will behave identically. You will get the same hashes you always have.

use HTML::Tagset;
my $is_head = $HTML::Tagset::isHeadElement{ $tag };

# Specifying 'v4' is identical.
use HTML::Tagset 'v4';
my $is_head = $HTML::Tagset::isHeadElement{ $tag };

You will also be able to use newly-named hashes and subs. All new names will be lowercase-and-underscores, and begin with tag_.

my $is_head = $HTML::Tagset::tag_is_head_element{ $tag };
my $is_head = HTML::Tagset::tag_is_head_element( $tag );

The newly-named hashes are NOT necessarily identical to the old hashes.
For example, the existing %linkElements looks like this:

our %linkElements = ( 'a' => ['href'], 'applet' => ['archive', 'codebase', 'code'], ... );

The newly-named hash %tag_is_link_attribute's values will be hashrefs. Same data, different layout.

our %tag_is_link_attribute = ( a => { href => 1 }, applet => { archive => 1, codebase => 1, code => 1 }, ... );

The new hashes will be easier to use.

# The old way.
my $linkElements = $HTML::Tagset::linkElements{ $tag };
if ( $linkElements ) {
    my %h = map { $_ => 1 } @{$linkElements};
    $is_link_attribute = $h{ $attr };
}

# The new way
my $is_link_attribute = $HTML::Tagset::tag_is_link_attribute{ $tag }{ $attr };
my $is_link_attribute = HTML::Tagset::tag_is_link_attribute( $tag, $attr );
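
The layout change can be sketched as a one-time conversion from the old arrayref values to the proposed hashref values. This is only a sketch under assumptions: the sample data is limited to the two entries quoted above, and the lexical %link_elements, %tag_is_link_attribute, and the sub tag_is_link_attribute() are stand-ins for the proposed names, not code from the module.

```perl
use strict;
use warnings;

# Sample data in the existing %linkElements layout: tag => arrayref of
# attribute names. Only the two entries quoted above, not the full table.
my %link_elements = (
    a      => ['href'],
    applet => [ 'archive', 'codebase', 'code' ],
);

# Build the proposed layout: tag => { attribute => 1, ... }.
my %tag_is_link_attribute;
for my $tag ( keys %link_elements ) {
    $tag_is_link_attribute{$tag} = { map { $_ => 1 } @{ $link_elements{$tag} } };
}

# Hypothetical sub form of the same lookup. The exists() check avoids
# autovivifying an entry for an unknown tag.
sub tag_is_link_attribute {
    my ( $tag, $attr ) = @_;
    return 0 unless exists $tag_is_link_attribute{$tag};
    return $tag_is_link_attribute{$tag}{$attr} ? 1 : 0;
}

printf "%s\n", tag_is_link_attribute( 'applet', 'code' ) ? 'yes' : 'no';
printf "%s\n", tag_is_link_attribute( 'a',      'src' )  ? 'yes' : 'no';
```

Deriving the new hash from the old one, rather than maintaining both by hand, would keep the two layouts from drifting apart.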

HTML5 support

You can specify 'v5' to get HTML5-based data.

use HTML::Tagset 'v5';
# Same as 'v4' but with v5 hashes and subs.
my $is_head = $HTML::Tagset::isHeadElement{ $tag };
my $is_head = $HTML::Tagset::tag_is_head_element{ $tag };
my $is_head = HTML::Tagset::tag_is_head_element( $tag );

You will also be able to export hashes and subs.

use HTML::Tagset qw( v4 :hashes );
use HTML::Tagset qw( v4 :subs );
use HTML::Tagset qw( v5 :all );
my $is_head = tag_is_head_element( $tag );

If you export, you MUST pass a version. You may not pass multiple versions.

use HTML::Tagset qw( :all );    # Fails, no version
use HTML::Tagset qw( v4 v5 );   # Fails, too many versions
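
The version rules above could be enforced by a small argument check run at import time. The following is a sketch under assumptions, not the module's implementation: parse_import_args() is a hypothetical helper, the error messages are invented, and rejecting unrecognized arguments is an extra assumption the proposal does not spell out.

```perl
use strict;
use warnings;

# Hypothetical helper: validate the arguments of "use HTML::Tagset ...".
# Returns the selected version followed by any export tags.
sub parse_import_args {
    my @args = @_;

    my @versions = grep { /\Av[45]\z/ } @args;
    my @tags     = grep { /\A:/ } @args;
    my @unknown  = grep { !/\Av[45]\z/ && !/\A:/ } @args;

    die "Unknown import argument: @unknown\n" if @unknown;
    die "Too many versions: @versions\n"      if @versions > 1;
    die "Exporting requires a version\n"      if @tags && !@versions;

    # A bare "use HTML::Tagset;" keeps the existing HTML4 behavior.
    my $version = @versions ? $versions[0] : 'v4';
    return ( $version, @tags );
}

# Try the cases discussed above and report which ones fail.
for my $case ( [], ['v5'], [qw( v5 :all )], [':all'], [qw( v4 v5 )] ) {
    my @result = eval { parse_import_args( @{$case} ) };
    my $label  = @{$case} ? "@{$case}" : '(none)';
    print "$label => ", ( $@ ? "fails: $@" : "ok: @result\n" );
}
```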

Disallow use of HTML::Tagset::* namespace

There will be no way to support both v4 and v5 at the same time.

Behind the scenes, the HTML::Tagset hashes and subs will probably be aliases to their ::v4 or ::v5 counterparts. I don't think we want to support people referring to the version-specific namespaces directly. They are only there for ease of development.

# YES
use HTML::Tagset 'v5';
my $is_head = HTML::Tagset::tag_is_head_element( $tag );
my $is_head = tag_is_head_element( $tag );
# NO, not guaranteed to be supported in the future.
my $is_head = HTML::Tagset::v5::tag_is_head_element( $tag );
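
The aliasing idea can be sketched with glob assignment. Everything here is hypothetical: the Sketch::Tagset package names, select_version(), and the tiny sample tables stand in for whatever the real ::v4 and ::v5 packages would contain.

```perl
use strict;
use warnings;

# Hypothetical version-specific packages with a tiny sample of data.
package Sketch::Tagset::v4;
our %tag_is_head_element = map { $_ => 1 } qw( title base basefont link meta );

package Sketch::Tagset::v5;
our %tag_is_head_element = map { $_ => 1 } qw( title base link meta );

package Sketch::Tagset;

# Public hash; it is aliased to one of the versioned hashes below.
our %tag_is_head_element;

# Point the public hash at one version's hash. Assigning a hash reference
# to a glob aliases the hash, so no data is copied.
sub select_version {
    my ($version) = @_;
    die "Unknown version: $version\n" unless $version =~ /\Av[45]\z/;
    no strict 'refs';
    *{'Sketch::Tagset::tag_is_head_element'}
        = \%{"Sketch::Tagset::${version}::tag_is_head_element"};
    return;
}

# Sub form of the lookup; it reads whichever hash is currently aliased.
sub tag_is_head_element {
    my ($tag) = @_;
    return $tag_is_head_element{$tag} ? 1 : 0;
}

package main;

Sketch::Tagset::select_version('v4');
print "v4 basefont: ", Sketch::Tagset::tag_is_head_element('basefont'), "\n";

Sketch::Tagset::select_version('v5');
print "v5 basefont: ", Sketch::Tagset::tag_is_head_element('basefont'), "\n";
```

Because the public name is only an alias, callers never need to mention the versioned packages, which is what makes it possible to leave them unsupported.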

Add hash for block elements

Migrated from RT originally by [email protected]

It would be nice to have a list of elements which "break line" when textified. I actually have this in my HTML::AsText::Fix:

# source: http://en.wikipedia.org/wiki/HTML_element#Block_elements
%isBlockElement = map {; $_ => 1 } qw(
  p
  h1 h2 h3 h4 h5 h6
  dl dt dd
  ol ul li
  dir
  address
  blockquote
  center
  del
  div
  hr
  ins
  noscript script
  pre
);

Not sure what to do with <br>: it breaks line but it's not a block element.
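
One possible way to handle <br>, sketched under assumptions: keep %isBlockElement as a pure block-element set and derive a second hashset for textification. The name %breaksLine is hypothetical, and the element list is trimmed for brevity.

```perl
use strict;
use warnings;

# Trimmed sample of the block-element list above.
my %isBlockElement = map {; $_ => 1 } qw( p h1 h2 div pre blockquote hr );

# Hypothetical superset for textification: block elements plus <br>.
my %breaksLine = ( %isBlockElement, br => 1 );

for my $tag (qw( p br span )) {
    printf "%-5s block=%d breaks_line=%d\n", $tag,
        $isBlockElement{$tag} ? 1 : 0,
        $breaksLine{$tag}     ? 1 : 0;
}
```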

Adding v5 support

I've created a new v5 branch. Let's work against that, and use this ticket for discussion.

Why do you want to hide HTML::Tagset::v[45] from PAUSE? I don't think we do.

As to tests:

  • Make a list of tags that are in v4 and not v5. Verify that. (font, center, etc.)
  • Make a list of tags that are in v5 and not v4. Verify that. (audio, video, mark, etc)
  • Make a list of tags that are in both. Verify that. (table, div, etc)
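
The three lists above amount to set differences and an intersection over the two tag tables. A minimal sketch, assuming two small sample hashsets (%v4 and %v5 here are illustrative, not the real tables):

```perl
use strict;
use warnings;

# Tiny illustrative samples, not the real tag tables.
my %v4 = map { $_ => 1 } qw( font center table div );
my %v5 = map { $_ => 1 } qw( audio video mark table div );

# Tags present in one version but not the other, and tags in both.
my @v4_only = sort grep { !$v5{$_} } keys %v4;
my @v5_only = sort grep { !$v4{$_} } keys %v5;
my @both    = sort grep { $v5{$_} } keys %v4;

print "v4 only: @v4_only\n";    # center font
print "v5 only: @v5_only\n";    # audio mark video
print "both:    @both\n";       # div table
```

In a real test file, each computed list would be compared against a hand-written expected list, for example with is_deeply() from Test::More.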

We should probably test the differences in attributes. https://www.w3.org/TR/html5-diff/ notes, for example, "A new placeholder attribute can be specified on the input and textarea elements." I think it should be pretty exhaustive. We only have to do this once.

Finally, as I read it, you have v5 as the default, which is what we should do. We just need to make it dead simple, one line of code ideally, for someone to change back to v4 in their existing code.

Various minor changes needed

Here are some HTML v4 changes needed, and some minor other things.

  1. add 'plaintext' to %isOptionalEndTag (admittedly very rarely used)
  2. add 'svg' => ['xmlns', 'xmlns:xlink', 'xmlns:svg', 'xlink:href'], to %linkElements ('svg' is a legit v4 tag, although I'm not sure how much detail you want to get into on its child tags)
  3. add 'reversed' to list of 'ol' attributes in %boolean_attr (requires turning the list into { 'name' => 1} format)
  4. don't duplicate any tag entry in the lists: build up lists from other lists, if possible, adding only tags which don't appear in one of the sublists
  5. in %isPhraseMarkup, add 'svg', 'bdi', 'data', 'iframe', 'picture', 'object', 'param', 'plaintext', 'xmp', 'listing', 'ilayer'
  6. MANIFEST file should NOT include MANIFEST itself
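
Item 3 suggests that an entry in %boolean_attr may be either a single attribute name or a { 'name' => 1 } hashref. A minimal sketch of a lookup that accepts both layouts, assuming a hypothetical helper is_boolean_attr() and a two-entry sample table:

```perl
use strict;
use warnings;

# Two-entry sample: one value in the proposed hashref format, one as a
# plain attribute name. Not the module's full table.
my %boolean_attr = (
    ol  => { compact => 1, reversed => 1 },
    img => 'ismap',
);

# Hypothetical helper that handles both value layouts.
sub is_boolean_attr {
    my ( $tag, $attr ) = @_;
    my $entry = $boolean_attr{$tag};
    return 0 unless defined $entry;
    return ref $entry
        ? ( $entry->{$attr} ? 1 : 0 )
        : ( $entry eq $attr ? 1 : 0 );
}

for my $pair ( [ ol => 'reversed' ], [ img => 'ismap' ], [ img => 'alt' ] ) {
    my ( $tag, $attr ) = @{$pair};
    printf "%s/%s: %d\n", $tag, $attr, is_boolean_attr( $tag, $attr );
}
```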

Per 74627, we need to figure out just what exactly is needed for various lists. See also PhilterPaper/HTML-Tagset, in which I have commented out all the HTML 5 tags, leaving only HTML 4. #2 for discussing putting them back in, in some way.

Documentation of %isHeadElement is misleading

Originally opened by "bkb" 2015-11-16 as RT 109044 "Documentation of %isHeadElement is misleading". Please close the RT ticket.

==============================================================================

The documentation for %isHeadElement claims

This hashset contains all elements that elements that should be
present only in the 'head' element of an HTML document.

However, %isHeadElement contains tags like <script> which may be present either in the head or the body section of an HTML document.

==============================================================================

I fixed this in PhilterPaper/HTML-Tagset by changing the POD and code:

=head2 hashset %HTML::Tagset::isHeadElement

This hashset contains all elements that may be
present in the 'head' element of an HTML document. Some, such as <script>,
may also be in the 'body'.

=cut

our %isHeadElement = (
  map {; $_ => 1 } qw(
    title 
    base basefont
    link 
    meta 
    object 
  ),
  %isHeadOrBodyElement,
);

<input> is not an empty element

Migrated from RT, originally by Tomas Doran @bobtfish

$ perl -MHTML::Tagset -e'warn $HTML::Tagset::emptyElement{input}'
1 at -e line 1.

This appears entirely wrong. <input> tags can contain content, as this is how you set previous values for them.

E.g. <input type="text" name="foo">initial value</input>
