PDF Document reading

We all like to work on something new and different from time to time.

This project reads in PDF files. Each PDF contains a street map showing the locations of various underground utility items. Our challenge was to build an application that lets someone in the field mark on the PDF where they plan to dig, and that highlights every point where their dig line crosses an existing utility, such as water mains, sewerage, electric power, gas, telecoms or fibre optic.

Annotations in PDF

A PDF document is fixed when it's created. Content can still be added or removed afterwards, but this is done by appending the instructions for the change to the end of the document. For our purposes, we wanted to let the user 'draw' straight lines over the top of the existing PDF, and to capture the exact start and end points of each line. Annotations were the chosen approach.

JavaScript

A PDF document can have automation code embedded in it, and Adobe publishes a fully documented JavaScript API for this purpose. We wrote some code that does the following:

  1. Get all the annotations from the current document
  2. Keep only the line annotations
  3. Get the start and end points of each line
  4. Populate these into a custom XML document
  5. Submit the data in XML format as an HTTP POST request to our server.
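
Steps 2 to 4 above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names buildLinesXml and escapeXml are hypothetical, and the element names are taken from the sample payload shown later in this post. The sketch assumes each annotation exposes type, author, page and points, as the annotation objects returned by this.getAnnots() do in Acrobat. The annotation list is passed in as a parameter so that the function can also be run outside a PDF viewer.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Hypothetical helper: build the XML payload from annotation-like objects.
// Inside Acrobat the array would come from this.getAnnots(), which returns
// null when the document has no annotations, hence the "|| []" guard.
function buildLinesXml(annots, jobNumber, assetOwner) {
    // Step 2: keep only the line annotations.
    var lines = (annots || []).filter(function (a) {
        return a.type === "Line";
    });

    var items = lines.map(function (a) {
        // Step 3: a Line annotation's points are [[x1, y1], [x2, y2]];
        // flatten them to "x1,y1,x2,y2".
        var flat = a.points[0].concat(a.points[1]).join(",");
        return "<lineDTOList>" +
            "<author>" + escapeXml(a.author) + "</author>" +
            "<pageNo>" + a.page + "</pageNo>" +
            "<modDate/>" + // left empty, as in the sample payload
            "<points>" + flat + "</points>" +
            "</lineDTOList>";
    });

    // Step 4: wrap the line entries in the custom XML document.
    return "<LinesDTO>" +
        "<jobNumber>" + escapeXml(jobNumber) + "</jobNumber>" +
        "<assetOwner>" + escapeXml(assetOwner) + "</assetOwner>" +
        "<lineDTOList>" + items.join("") + "</lineDTOList>" +
        "</LinesDTO>";
}
```

Inside the PDF, the call would look something like buildLinesXml(this.getAnnots(), 4456, "Water Company Sydney"), with the resulting string then parsed and submitted (step 5).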

Java Framework

We chose to use Apache PDFBox to handle the server-side operations. These included:

  1. Adding the JavaScript to the original PDF file to enable the above functions
  2. A controller to receive the incoming POST request
  3. A service to process the XML data and re-draw the lines on top of the existing pages as new annotations
  4. A service to read in all existing graphical lines drawn on the page
  5. A service to calculate the intersection points between the new line and the existing lines
  6. A service to draw the new intersection points as annotations onto the original page
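
The geometry behind item 5 is a standard segment-to-segment intersection test. The sketch below is shown in JavaScript only to match the snippet used elsewhere in this post; the service itself would be written in Java, and the arithmetic carries over unchanged. The function name segmentIntersection and the {x, y} point shape are assumptions made for illustration, not the project's actual code.

```javascript
// Returns the point {x, y} where segment p1-p2 crosses segment p3-p4,
// or null when they do not cross. Parallel (including collinear)
// segments are treated as not crossing in this sketch.
function segmentIntersection(p1, p2, p3, p4) {
    // Direction vectors of the two segments.
    var d1x = p2.x - p1.x, d1y = p2.y - p1.y;
    var d2x = p4.x - p3.x, d2y = p4.y - p3.y;

    // 2D cross product of the directions; zero means parallel.
    var denom = d1x * d2y - d1y * d2x;
    if (denom === 0) {
        return null;
    }

    // Solve p1 + t * d1 = p3 + u * d2 for t and u.
    var t = ((p3.x - p1.x) * d2y - (p3.y - p1.y) * d2x) / denom;
    var u = ((p3.x - p1.x) * d1y - (p3.y - p1.y) * d1x) / denom;

    // The lines only cross within both segments when 0 <= t, u <= 1.
    if (t < 0 || t > 1 || u < 0 || u > 1) {
        return null;
    }
    return { x: p1.x + t * d1x, y: p1.y + t * d1y };
}
```

Running the user's dig line against each existing line on the page, and collecting the non-null results, would give the points to draw back onto the page in item 6.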

Problems

Many, many problems were encountered along this journey.

Capture mouse positions for start and end of line

This turned out to be infeasible, because the PDF document can be rotated or scaled to a great extent. Instead, we chose to use the built-in straight-line tool in Adobe's PDF Reader. The Chrome plugin for Adobe Reader has a tool for drawing a freehand line, but the straight-line tool isn't enabled in the browser version. This leads to a clunky workflow, in which the user must download the PDF in one application and then edit and submit it, with its lines, in another.

Spring Boot unable to receive pdf POST request

The developer's first choice was to build a JSON object and send it to the server as a curl-style HTTP request. It turns out that posting this way from a PDF relies on the Net.HTTP.request function, which is not available to users of the free PDF Reader application.

The next option was to use the submitForm function, with cSubmitAs set to XML.

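// Sample payload. The string must use straight quotes; curly quotes are a syntax error.
var dXMLDoc = "<LinesDTO><jobNumber>4456</jobNumber><assetOwner>Water Company Sydney</assetOwner><lineDTOList><lineDTOList><author>CS Lewis</author><pageNo>0</pageNo><modDate/><points>1,2,3,4</points></lineDTOList></lineDTOList></LinesDTO>";

// Parse the string into an XML object that submitForm can send.
var myXML = XMLData.parse(dXMLDoc, false);
// Submit the parsed XML to the server as an HTTP POST.
this.submitForm({
    cURL: "http://localhost:8080/pdf",
    oXML: myXML,
    cSubmitAs: "XML"
});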

This function worked, in that the message was sent to the destination server, but Spring Boot failed to handle it as the header includes a rogue character.

The "Content-Type" header has a value of application/xml; charset=utf-8" (note the trailing double quote).

The developer tried to write a custom interceptor to repackage the incoming HTTP request, but this became too big of a rabbit hole.

In the end, the developer chose to rebuild the application using old-fashioned servlets.

Current Blocking point:

At the time of writing, the blocking point is that the lines drawn on the PDF can be picked up, but when they are reprinted they appear to be oriented differently.