gov.lanl.archive.rewrite
Class TextDocument

java.lang.Object
  extended by gov.lanl.archive.rewrite.TextDocument

public class TextDocument
extends Object

Wraps the functionality for converting a Resource (InputStream + HTTP headers) into a StringBuilder, applying several common URL resolution methods to that StringBuilder, inserting arbitrary Strings into the page, and then converting the page back to a byte array.
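
The round trip described above (bytes to text, edit, text back to bytes) can be illustrated without this class. The following is a minimal, self-contained sketch, not the implementation of TextDocument: the class name RoundTripSketch and the method rewrite are hypothetical, and the single insert at offset 0 stands in for the real URL rewriting and insertion steps. The point it shows is that the same charset should be used for decoding and re-encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone illustration; not part of gov.lanl.archive.rewrite.
final class RoundTripSketch {

    // Decode the raw page, edit it as text, and re-encode with the same charset.
    static byte[] rewrite(byte[] rawPage, Charset charset, String toInsert) {
        StringBuilder sb = new StringBuilder(new String(rawPage, charset));
        sb.insert(0, toInsert); // stand-in for the real rewriting steps
        return sb.toString().getBytes(charset);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        byte[] raw = "<html><body>caf\u00e9</body></html>".getBytes(latin1);
        byte[] out = rewrite(raw, latin1, "<!-- archived copy -->");
        System.out.println(new String(out, latin1));
    }
}
```

Decoding with one charset and encoding with another would change the byte length of non-ASCII characters, which is why the charset is carried through the whole round trip in this sketch.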

Version:
$Date: 2010-09-28 16:28:38 -0600 (Tue, 28 Sep 2010) $, $Revision: 3262 $
Author:
luda inspired by brad

Field Summary
 StringBuilder sb
          the internal StringBuilder
 
Constructor Summary
TextDocument(Memento resource, String pageUrl, String captureDate, ResultURIConverter uriConverter)
           
 
Method Summary
 void addBase()
           
 byte[] getBytes()
           
 String getCharSet()
           
 String getJSIncludeString(String jsUrl)
           
 String getResult()
           
 void insertAtEndOfBody(String toInsert)
           
 void insertAtStartOfBody(String toInsert)
           
 void insertAtStartOfDocument(String toInsert)
           
 void insertAtStartOfHead(String toInsert)
           
 void readFully(InputStreamReader isr)
           
 void resolveAllPageUrls()
          Update all URLs inside the page, so they resolve correctly to absolute URLs within the Wayback service.
 void resolveASXRefUrls()
           
 void resolveCSSUrls()
           
 void resolvePageUrls()
          Update URLs inside the page, so those URLs which must be correct at page load time resolve correctly to absolute URLs.
 void setCharSet(String charSet)
           
 void setResultBytes(byte[] resultBytes)
           
 void stripHTML()
           
 void writeToOutputStream(OutputStream os)
          Write the contents of the page to the client.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

sb

public StringBuilder sb
the internal StringBuilder

Constructor Detail

TextDocument

public TextDocument(Memento resource,
                    String pageUrl,
                    String captureDate,
                    ResultURIConverter uriConverter)
Parameters:
resource -
pageUrl -
captureDate -
uriConverter -
Method Detail

addBase

public void addBase()

resolvePageUrls

public void resolvePageUrls()
Update URLs inside the page, so those URLs which must be correct at page load time resolve correctly to absolute URLs. This means ensuring there is a BASE HREF tag, adding one if it is missing, and then resolving the FRAME-SRC, META-URL, LINK-HREF, and SCRIPT-SRC tag-attribute pairs against the existing BASE HREF, or against the page's absolute URL if no BASE HREF was present.
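
The base-selection rule described above (use the existing BASE HREF, otherwise fall back to the page URL) can be sketched with the standard library. This is an illustrative sketch under stated assumptions, not the code of resolvePageUrls: the class BaseResolveSketch, its methods, and the regular expression are hypothetical, and the regex is a simplification that only handles quoted href values.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone illustration; not part of gov.lanl.archive.rewrite.
final class BaseResolveSketch {

    // Simplified matcher for <base ... href="..."> (quoted values only).
    private static final Pattern BASE_HREF = Pattern.compile(
            "<base\\s[^>]*href\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the BASE HREF found in the page, or the page URL if none exists.
    static String effectiveBase(CharSequence html, String pageUrl) {
        Matcher m = BASE_HREF.matcher(html);
        return m.find() ? m.group(1) : pageUrl;
    }

    // Resolves a possibly relative reference against the effective base.
    // URI.create throws IllegalArgumentException for syntactically invalid input.
    static String resolve(CharSequence html, String pageUrl, String ref) {
        return URI.create(effectiveBase(html, pageUrl)).resolve(ref).toString();
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body></body></html>";
        System.out.println(resolve(html, "http://example.com/a/b.html", "img/x.png"));
    }
}
```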


resolveAllPageUrls

public void resolveAllPageUrls()
Update all URLs inside the page, so they resolve correctly to absolute URLs within the Wayback service.
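
What "absolute URLs within the Wayback service" look like is not specified here. As a rough sketch only, the example below assumes a replay layout of prefix, capture date, slash, original absolute URL. That layout, the class ReplayUrlSketch, the method toReplayUrl, and the host archive.example are all assumptions for illustration; in this package the actual mapping is delegated to the ResultURIConverter passed to the constructor.

```java
import java.net.URI;

// Hypothetical stand-alone illustration; not part of gov.lanl.archive.rewrite.
final class ReplayUrlSketch {

    // Assumed layout: <replayPrefix><captureDate>/<absolute original URL>.
    static String toReplayUrl(String replayPrefix, String captureDate,
                              String baseUrl, String ref) {
        // First make the reference absolute against the page's base URL.
        String absolute = URI.create(baseUrl).resolve(ref).toString();
        // Then wrap it so that it points back into the archive.
        return replayPrefix + captureDate + "/" + absolute;
    }

    public static void main(String[] args) {
        System.out.println(toReplayUrl("http://archive.example/wayback/",
                "20100928162838", "http://example.com/a/", "b.html"));
    }
}
```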


resolveCSSUrls

public void resolveCSSUrls()

resolveASXRefUrls

public void resolveASXRefUrls()

stripHTML

public void stripHTML()

readFully

public void readFully(InputStreamReader isr)
               throws IOException
Parameters:
isr -
Throws:
IOException
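
Reading a reader to exhaustion into a StringBuilder is usually done with a buffered loop. The sketch below shows that pattern under the assumption that readFully behaves this way. The class ReadFullySketch and the helper decode are hypothetical, and the 4096-character buffer size is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone illustration; not part of gov.lanl.archive.rewrite.
final class ReadFullySketch {

    // Drains the reader into a StringBuilder, one buffer at a time.
    static StringBuilder readFully(InputStreamReader isr) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int read;
        while ((read = isr.read(buffer)) != -1) {
            sb.append(buffer, 0, read);
        }
        return sb;
    }

    // Convenience wrapper for in-memory input; converts IOException to unchecked.
    static String decode(byte[] raw, Charset charset) {
        try (InputStreamReader isr =
                new InputStreamReader(new ByteArrayInputStream(raw), charset)) {
            return readFully(isr).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "<p>h\u00e9llo</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(raw, StandardCharsets.UTF_8));
    }
}
```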

getBytes

public byte[] getBytes()
                throws UnsupportedEncodingException
Returns:
raw bytes contained in internal StringBuilder
Throws:
UnsupportedEncodingException

getResult

public String getResult()
                 throws UnsupportedEncodingException
Throws:
UnsupportedEncodingException

setResultBytes

public void setResultBytes(byte[] resultBytes)

writeToOutputStream

public void writeToOutputStream(OutputStream os)
                         throws IOException
Write the contents of the page to the client.

Parameters:
os -
Throws:
IOException

insertAtStartOfDocument

public void insertAtStartOfDocument(String toInsert)
Parameters:
toInsert -

insertAtStartOfHead

public void insertAtStartOfHead(String toInsert)
Parameters:
toInsert -

insertAtEndOfBody

public void insertAtEndOfBody(String toInsert)
Parameters:
toInsert -

insertAtStartOfBody

public void insertAtStartOfBody(String toInsert)
Parameters:
toInsert -
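
The insertion points used by the insertAt... methods are not documented. One plausible reading, sketched below, is: directly after the opening HEAD or BODY tag, and directly before the closing BODY tag. The class InsertSketch, both helper methods, and the fallback behaviour when a tag is missing (offset 0 or end of document) are assumptions made for this illustration, not behaviour confirmed by this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone illustration; not part of gov.lanl.archive.rewrite.
final class InsertSketch {

    // Inserts right after the first opening tag such as <head> or <body ...>.
    // Falls back to offset 0 when the tag is absent (an assumption).
    // 'tag' is expected to be a plain element name such as "head".
    static void insertAfterOpenTag(StringBuilder sb, String tag, String toInsert) {
        Matcher m = Pattern.compile("<" + tag + "\\b[^>]*>",
                Pattern.CASE_INSENSITIVE).matcher(sb);
        int at = m.find() ? m.end() : 0;
        sb.insert(at, toInsert);
    }

    // Inserts right before the last closing tag such as </body>.
    // Falls back to the end of the document when the tag is absent (an assumption).
    static void insertBeforeCloseTag(StringBuilder sb, String tag, String toInsert) {
        Matcher m = Pattern.compile("</" + tag + "\\s*>",
                Pattern.CASE_INSENSITIVE).matcher(sb);
        int at = sb.length();
        while (m.find()) {
            at = m.start(); // keep the position of the last match
        }
        sb.insert(at, toInsert);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(
                "<html><head><title>t</title></head><body><p>hi</p></body></html>");
        insertAfterOpenTag(sb, "head", "<!-- head start -->");
        insertBeforeCloseTag(sb, "body", "<!-- body end -->");
        System.out.println(sb);
    }
}
```

The word-boundary check after the tag name keeps "head" from matching a HEADER element, and matching is case-insensitive because archived pages often use upper-case tags.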

getJSIncludeString

public String getJSIncludeString(String jsUrl)
Parameters:
jsUrl -
Returns:
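
The return value is not documented here. Judging by the method name alone, a plausible result is an HTML SCRIPT element that references jsUrl. The sketch below illustrates that guess. The class JsIncludeSketch, its methods, the exact markup, and the attribute escaping are assumptions, not the documented behaviour of getJSIncludeString.

```java
// Hypothetical stand-alone illustration; not part of gov.lanl.archive.rewrite.
final class JsIncludeSketch {

    // Escapes characters that could break out of a double-quoted attribute value.
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;")
                    .replace("\"", "&quot;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Builds a SCRIPT element that loads the JavaScript at jsUrl (assumed markup).
    static String jsInclude(String jsUrl) {
        return "<script type=\"text/javascript\" src=\""
                + escapeAttribute(jsUrl) + "\"></script>";
    }

    public static void main(String[] args) {
        System.out.println(jsInclude("http://example.com/a.js?x=1&y=2"));
    }
}
```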

getCharSet

public String getCharSet()
Returns:
the charSet

setCharSet

public void setCharSet(String charSet)
Parameters:
charSet - the charSet to set


Copyright © 2013. All Rights Reserved.