Driving PhantomJS from DWScript

PhantomJS is a “headless WebKit” browser which can be used for various automation tasks, among which is the generation of thumbnails for BeginEnd.net 🙂

This article is a quick illustration of how it can be controlled from a DWScript WebServer.

PhantomJS comes as a self-contained executable which is controlled by passing it a JavaScript file as a parameter, so interacting with PhantomJS basically involves three steps: generating the JavaScript file, invoking the executable, and collecting the output.

I will illustrate it for the task of generating a website thumbnail, while getting rid of those pesky cookie law prompts you can find on Blogger.

Generating a JS file to take a website thumbnail

The first thing you will need is a directory that can be written to, along with a locally unique file name for the thumbnail. As a side note, generate that name internally, based on a counter for instance; do not base it on any kind of user-supplied input.
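A minimal sketch of the counter idea, written in JavaScript purely for illustration (the server code in this article is DWScript). The `makeNameGenerator` helper and the `thumb` prefix are hypothetical, and the counter here lives in memory only; a real server would need a counter that is shared across workers.

```javascript
// Hypothetical name generator: names derive from an internal counter only,
// never from anything the client sent.
function makeNameGenerator(prefix) {
  var counter = 0; // in-memory only, an assumption made for this sketch
  return function () {
    counter += 1;
    return prefix + counter;
  };
}

var nextName = makeNameGenerator("thumb");
console.log(nextName()); // prints "thumb1"
console.log(nextName()); // prints "thumb2"
```

Because the name never contains user input, it cannot be used to smuggle path separators or shell characters into the later file and command-line operations.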

In the case of DWSWebServer, file access for scripts through the FileXxx functions is restricted to the website path, so I suggest creating a sub-directory named something like “.phantomjs” and placing the phantomjs.exe binary as well as your temporary files there. Directories starting with a dot “.” are not exposed over http/https by the DWSWebServer, so prefixing with a dot is a simple way to hide a directory from the wild web.

The basic script to make a screen capture is documented in the screen-capture PhantomJS example. Since we are after thumbnails rather than full-size screenshots, we can use the zoomFactor property to downsize directly, which is also faster than rendering a full-size web page and downsizing it afterwards.
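As a worked example of the arithmetic: the script in this article sizes both the viewport and the clip rectangle to a 1280×1024 layout multiplied by the zoom factor. The `thumbnailSize` helper below is hypothetical, and the rounding is an assumption added so that zoom factors which do not divide evenly still give whole pixels.

```javascript
// Hypothetical helper: pixel size of the thumbnail for a given zoom factor.
function thumbnailSize(layoutWidth, layoutHeight, zoom) {
  return {
    width: Math.round(layoutWidth * zoom),   // rounding is an assumption of this sketch
    height: Math.round(layoutHeight * zoom)
  };
}

var size = thumbnailSize(1280, 1024, 0.25);
console.log(size.width + "x" + size.height); // prints "320x256"
```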

Another annoyance to take care of is the “cookie law” prompt, which will otherwise be displayed and will “deface” the thumbnail. PhantomJS allows running JavaScript inside the loaded web page, so we can make use of the evaluate method to inject a CSS style-sheet rule that will hide the prompt from Blogger.

The script creation thus looks like this:

var fjs := FileCreate(fileName + '.js');
FileWrite(fjs, #'
   var page = require("webpage").create();
   var z = 0.25;
   page.zoomFactor = z;
   page.viewportSize = { width: 1280*z, height: 1024*z };
   page.open(' + JSON.Stringify(websiteURL) + #', function () {
      page.clipRect = {top:0, left:0, width:1280*z, height:1024*z};
      page.evaluate(function() {
         var style = document.createElement("style");
         style.appendChild(document.createTextNode(""));
         document.head.appendChild(style);
         style.sheet.insertRule(".cookie-choices-info { display: none !important }", 0);
      });
      page.render(' + JSON.Stringify(fileName + '.png') + #');
      phantom.exit();
   });');
FileClose(fjs);

Multi-line strings can be leveraged to keep the clutter to a minimum.
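The `JSON.Stringify` calls in the script creation above matter as much as the multi-line strings: they turn the URL and the file name into quoted, escaped string literals, so a quote or backslash in either value cannot break out of the generated source. The sketch below redoes the same assembly in plain JavaScript to make that visible. The `buildCaptureScript` name and the sample URL are made up for illustration, and the cookie rule is left out for brevity.

```javascript
// Hypothetical helper mirroring the DWScript string assembly above.
function buildCaptureScript(websiteURL, pngFileName, zoom) {
  var w = 1280 * zoom, h = 1024 * zoom;
  return [
    'var page = require("webpage").create();',
    'page.zoomFactor = ' + zoom + ';',
    'page.viewportSize = { width: ' + w + ', height: ' + h + ' };',
    // JSON.stringify emits the surrounding quotes and escapes the content
    'page.open(' + JSON.stringify(websiteURL) + ', function () {',
    '   page.clipRect = { top: 0, left: 0, width: ' + w + ', height: ' + h + ' };',
    '   page.render(' + JSON.stringify(pngFileName) + ');',
    '   phantom.exit();',
    '});'
  ].join('\n');
}

// A URL containing a double quote and a backslash still yields valid source.
var script = buildCaptureScript('http://example.com/?q="a\\b"', 'thumb1.png', 0.25);
console.log(script);
```

Concatenating the raw URL between hand-written quotes would instead produce a syntax error for this input, or worse, let a crafted URL inject statements into the script PhantomJS runs.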

Invoking the PhantomJS executable

The next step is to invoke PhantomJS, which is an external executable.

By default this is not allowed in the DWSWebServer: you will have to set the “COM” option to true in the “DWScript” section of options.json. You should also raise the worker script timeout to allow enough time for PhantomJS (assuming you will invoke it from a worker):

{
	"Service" : {
		... your service options ...
	},
	"Server" : {
		... your server options ...
	},
	"DWScript" : {
		"WorkerTimeoutMSec" : 60000,
		"COM" : true
	}
}

With the COM connector enabled, we can now use the WshShell object.

While PhantomJS has been quite stable in my experience, if you are going to run it server-side you have to account for the risk of an infinite loop or overly long processing, and so make a provision to abort the conversion:

var wsh := CreateOleObject("WScript.Shell");
var p := wsh.Exec('path.to.phantomjs\phantomjs.exe "' + fileName + '.js"');
var n := 30; // 30 seconds timeout
while p.Status = 0 do begin
   n -= 1;
   if n < 0 then begin
      p.Terminate;
      Break;
   end else Sleep(1000);
end;
if n >= 0 then begin
   // ...phantomjs completed without having to be interrupted...
end;
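The control flow of that loop can be checked in isolation. The sketch below restates it in JavaScript with injected stand-ins: `waitOrTerminate`, `fakeProcess` and the no-op `sleep` are all hypothetical names for this illustration, not part of the WshShell or PhantomJS APIs. It returns true when the process ended on its own and false when the time budget ran out and the process had to be terminated.

```javascript
// Hypothetical restatement of the polling loop above.
// proc must offer isRunning() and terminate(); sleep(ms) is injected.
function waitOrTerminate(proc, maxSeconds, sleep) {
  var n = maxSeconds;
  while (proc.isRunning()) {
    n -= 1;
    if (n < 0) {
      proc.terminate();
      return false; // had to be interrupted
    }
    sleep(1000);
  }
  return true; // completed on its own
}

// Fake process that reports "running" for a fixed number of polls.
function fakeProcess(pollsUntilDone) {
  var polls = 0, terminated = false;
  return {
    isRunning: function () { polls += 1; return !terminated && polls <= pollsUntilDone; },
    terminate: function () { terminated = true; },
    wasTerminated: function () { return terminated; }
  };
}

var noSleep = function () {}; // stand-in so the example runs instantly
console.log(waitOrTerminate(fakeProcess(3), 30, noSleep));   // prints true
console.log(waitOrTerminate(fakeProcess(100), 30, noSleep)); // prints false
```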

Now all that is left is to move the file to a database or a static resources repository!

Just keep in mind that enabling COM can open up the whole host machine to the scripts, so ideally the web server service should be run under an account with “just enough rights” (like any other Windows service, for what it’s worth).

2 thoughts on “Driving PhantomJS from DWScript”

  1. It looks like you forgot to decrease n in your abort logic. It starts at 30 but never decreases, so the check n < 0 will never be true and the process will never be terminated.

    The if n < 0 statement at the end is also backwards when compared with the comment. if n = 0 then ‘process wasn’t terminated
