gov.lanl.archive.rewrite
Class TagMagix

java.lang.Object
  extended by gov.lanl.archive.rewrite.TagMagix

public class TagMagix
extends Object

Library for updating arbitrary attributes in arbitrary tags to rewrite HTML documents so URI references point back into the Wayback Machine. Attempts to make minimal changes so nothing gets broken during this process.

Version:
$Date: 2011-05-24 19:26:48 -0600 (Tue, 24 May 2011) $, $Revision: 1668 $
Author:
brad

Field Summary
static String ANY_TAGNAME
           
 
Constructor Summary
TagMagix()
           
 
Method Summary
static String getBaseHref(StringBuilder page)
          Finds and returns the href value of a BASE tag in the HTML document held in the StringBuilder page.
static int getEndOfFirstTag(StringBuilder page, String tag)
           
static String getTagAttr(StringBuilder page, String tag, String attr)
          Finds and returns the value of the ATTR attribute of a TAG tag in the HTML document held in the StringBuilder page.
static String getTagAttrWhere(StringBuilder page, String tag, String findAttr, String whereAttr, String whereVal)
          Search through the HTML contained in page, returning the value of a particular attribute.
static void markupCSSImports(StringBuilder page, ResultURIConverter uriConverter, String captureDate, String baseUrl)
           
static void markupStyleUrls(StringBuilder page, ResultURIConverter uriConverter, String captureDate, String baseUrl)
           
static void markupTagREURIC(StringBuilder page, ResultURIConverter uriConverter, String captureDate, String baseUrl, Pattern pattern)
           
static void markupTagREURIC(StringBuilder page, ResultURIConverter uriConverter, String captureDate, String baseUrl, String tagName, String attrName)
          Alters the HTML document in page, updating URLs in the attrName attributes of all tagName tags such that: (1) absolute URLs are prefixed with wmPrefix + pageTS; (2) server-relative URLs are prefixed with wmPrefix + pageTS + (host of page); (3) path-relative URLs are prefixed with wmPrefix + pageTS + (attribute URL resolved against pageUrl).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ANY_TAGNAME

public static String ANY_TAGNAME
Constructor Detail

TagMagix

public TagMagix()
Method Detail

markupCSSImports

public static void markupCSSImports(StringBuilder page,
                                    ResultURIConverter uriConverter,
                                    String captureDate,
                                    String baseUrl)

markupStyleUrls

public static void markupStyleUrls(StringBuilder page,
                                   ResultURIConverter uriConverter,
                                   String captureDate,
                                   String baseUrl)

markupTagREURIC

public static void markupTagREURIC(StringBuilder page,
                                   ResultURIConverter uriConverter,
                                   String captureDate,
                                   String baseUrl,
                                   String tagName,
                                   String attrName)
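Alters the HTML document in page, updating URLs in the attrName attributes of all tagName tags such that:
1) absolute URLs are prefixed with: wmPrefix + pageTS
2) server-relative URLs are prefixed with: wmPrefix + pageTS + (host of page)
3) path-relative URLs are prefixed with: wmPrefix + pageTS + (attribute URL resolved against pageUrl)
The names wmPrefix, pageTS and pageUrl do not appear in the signature; they presumably correspond to the replay prefix produced by uriConverter, to captureDate, and to baseUrl, respectively.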

Parameters:
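page - StringBuilder holding the HTML document, which is modified in place
uriConverter - converter used to build the rewritten URIs
captureDate - capture timestamp of the page
baseUrl - URL of the page, which must be absolute
tagName - name of the tags to update
attrName - name of the attribute whose URL value is rewritten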
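
The three rewriting rules above can be illustrated with a minimal sketch. It is not the TagMagix implementation: the class UrlRewriteSketch and its rewrite method are hypothetical, the names wmPrefix, pageTS and pageUrl are borrowed from the description, and the "/" placed between the timestamp and the original URL is an assumption about the replay URI layout. It relies on java.net.URI.resolve, which happens to cover all three cases in a single call.

```java
import java.net.URI;

class UrlRewriteSketch {
    /** Hypothetical helper, not part of TagMagix. */
    static String rewrite(String wmPrefix, String pageTS, String pageUrl, String url) {
        URI base = URI.create(pageUrl);
        if (!base.isAbsolute()) {
            throw new IllegalArgumentException("pageUrl must be absolute: " + pageUrl);
        }
        // URI.resolve handles the three cases from the description:
        //   absolute URL     -> returned unchanged
        //   server-relative  -> scheme and host taken from pageUrl
        //   path-relative    -> resolved against the directory of pageUrl
        URI resolved = base.resolve(url);
        // Assumption: a "/" separates the timestamp from the original URL.
        return wmPrefix + pageTS + "/" + resolved;
    }

    public static void main(String[] args) {
        String prefix = "http://wayback.example/"; // example prefix, not a real service
        String ts = "20110524192648";              // example 14-digit timestamp
        String page = "http://example.org/a/b.html";
        for (String url : new String[] {"http://other.example/x.png", "/img/x.png", "c.css"}) {
            System.out.println(url + " -> " + rewrite(prefix, ts, page, url));
        }
    }
}
```

Because the sketch parses every URL strictly, an attribute value containing characters that are illegal in a URI (such as an unencoded space) makes it throw an IllegalArgumentException; real HTML would need more tolerant handling.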

markupTagREURIC

public static void markupTagREURIC(StringBuilder page,
                                   ResultURIConverter uriConverter,
                                   String captureDate,
                                   String baseUrl,
                                   Pattern pattern)

getTagAttr

public static String getTagAttr(StringBuilder page,
                                String tag,
                                String attr)
Finds and returns the value of the ATTR attribute of a TAG tag in the HTML document held in the StringBuilder page. Returns null if no TAG-ATTR is found.

Parameters:
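page - StringBuilder holding HTML
tag - name of the tag to search for
attr - name of the attribute within the tag to return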
Returns:
the value of attribute attr in tag within page, or null if none is found.
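
The lookup described above can be approximated with a regular expression. The following is a minimal sketch only, not the library's code: the class TagAttrSketch and the method findTagAttr are hypothetical, and the pattern is a simplification that assumes attribute values do not contain a ">" character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagAttrSketch {
    /** Hypothetical helper: returns the first attr value found in a tag element, or null. */
    static String findTagAttr(CharSequence page, String tag, String attr) {
        // Match "<tag ... attr = value" where value is double-quoted,
        // single-quoted, or unquoted; tag and attribute names are
        // compared case-insensitively.
        Pattern p = Pattern.compile(
            "<" + Pattern.quote(tag) + "\\b[^>]*?\\s" + Pattern.quote(attr)
            + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
            Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(page);
        if (!m.find()) {
            return null;
        }
        // Exactly one of the three value groups participates in the match.
        for (int g = 1; g <= 3; g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        StringBuilder page = new StringBuilder(
            "<head><BASE HREF=\"http://example.org/dir/\"></head>");
        System.out.println(findTagAttr(page, "base", "href"));
    }
}
```

The value is returned exactly as written in the document, without unescaping or trimming.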

getTagAttrWhere

public static String getTagAttrWhere(StringBuilder page,
                                     String tag,
                                     String findAttr,
                                     String whereAttr,
                                     String whereVal)
Searches through the HTML contained in page, returning the value of a particular attribute. This version matches only tags that contain a particular attribute-value pair, which is useful for extracting META tag values: for example, returning the value of the "content" attribute in a META tag that also contains an "http-equiv" attribute with the value "Content-Type". All comparisons are case-insensitive, but the value returned is the original attribute value, altered as little as possible. If nothing matches, returns null.

Parameters:
page - StringBuilder holding HTML
tag - String containing tagname of interest
findAttr - name of attribute within the tag to return
whereAttr - only match tags with an attribute whereAttr
whereVal - only match tags with whereAttr having this value
Returns:
the value of attribute findAttr in tag where the tag also contains an attribute whereAttr, with value whereVal, or null if nothing matches.
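
The META example from the description can be sketched as follows. This is an illustration under stated assumptions, not the TagMagix implementation: the class TagAttrWhereSketch and the method findTagAttrWhere are hypothetical, the parsing is regex-based, and attribute values are assumed not to contain ">".

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagAttrWhereSketch {
    // name = "value" | 'value' | value
    private static final Pattern ATTR = Pattern.compile(
        "([\\w:-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");

    /** Hypothetical helper, not part of TagMagix. */
    static String findTagAttrWhere(CharSequence page, String tag, String findAttr,
                                   String whereAttr, String whereVal) {
        Pattern tagPattern = Pattern.compile(
            "<" + Pattern.quote(tag) + "\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
        Matcher tags = tagPattern.matcher(page);
        while (tags.find()) {
            // Collect the attributes of this tag, keyed by lower-cased name.
            Map<String, String> attrs = new HashMap<>();
            Matcher a = ATTR.matcher(tags.group(1));
            while (a.find()) {
                String value = a.group(2) != null ? a.group(2)
                             : a.group(3) != null ? a.group(3) : a.group(4);
                attrs.putIfAbsent(a.group(1).toLowerCase(Locale.ROOT), value);
            }
            String where = attrs.get(whereAttr.toLowerCase(Locale.ROOT));
            if (where != null && where.equalsIgnoreCase(whereVal)) {
                String found = attrs.get(findAttr.toLowerCase(Locale.ROOT));
                if (found != null) {
                    return found; // original value, case preserved
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        StringBuilder page = new StringBuilder(
            "<head><meta name=\"x\" content=\"y\">"
            + "<META HTTP-EQUIV=\"content-type\" CONTENT=\"text/html; charset=UTF-8\"></head>");
        System.out.println(findTagAttrWhere(page, "meta", "content", "http-equiv", "Content-Type"));
    }
}
```

Names and the whereVal comparison ignore case, while the returned value keeps its original casing, mirroring the behaviour the description promises.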

getBaseHref

public static String getBaseHref(StringBuilder page)
Finds and returns the href value of a BASE tag in the HTML document held in the StringBuilder page. Returns null if no BASE-HREF is found.

Parameters:
page - StringBuilder holding HTML
Returns:
URL of base-href within page, or null if none is found.

getEndOfFirstTag

public static int getEndOfFirstTag(StringBuilder page,
                                   String tag)


Copyright © 2013. All Rights Reserved.