<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DelphiTools.info &#187; asm</title>
	<atom:link href="http://delphitools.info/tag/asm/feed/" rel="self" type="application/rss+xml" />
	<link>http://delphitools.info</link>
	<description>SamplingProfiler, DWS and other Delphi tools</description>
	<lastBuildDate>Thu, 02 Feb 2012 11:33:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Good Practices for JavaScript &#8220;asm&#8221; sections in DWS/OP4JS</title>
		<link>http://delphitools.info/2012/01/16/good-practices-for-javascript-asm-sections-in-dwsop4js/</link>
		<comments>http://delphitools.info/2012/01/16/good-practices-for-javascript-asm-sections-in-dwsop4js/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 07:57:19 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[asm]]></category>
		<category><![CDATA[DWS]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[OP4JS]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=1571</guid>
		<description><![CDATA[The compiler supports writing &#8220;asm&#8221; aka JavaScript section in the middle of Object Pascal, there are a few good practices as well as tips to keep in mind, let&#8217;s review the menu: Name conflicts and obfuscation support Do you really need an &#8220;asm&#8221; section? Don&#8217;t rely on implicit parameters structure Handling callbacks with &#8220;Variant&#8221; methods [...]]]></description>
			<content:encoded><![CDATA[<p>The compiler supports writing &#8220;asm&#8221; (i.e. JavaScript) sections in the middle of Object Pascal code. There are a few good practices and tips to keep in mind; let&#8217;s review the menu:</p>
<ol>
<li><a href="#1">Name conflicts and obfuscation support</a></li>
<li><a href="#2">Do you really need an &#8220;asm&#8221; section?</a></li>
<li><a href="#3">Don&#8217;t rely on implicit parameters structure</a></li>
<li><a href="#4">Avoid declaring variables in &#8220;asm&#8221; sections</a></li>
<li><a href="#5">Handling callbacks with &#8220;Variant&#8221; methods</a></li>
<li><a href="#6">Handling callbacks in an &#8220;asm&#8221; section</a></li>
<li><a href="#7">Current limitations</a></li>
</ol>
<h4><a name="#1"></a>1. Name conflicts and obfuscation support</h4>
<p>This should really be point zero: the first thing to keep in mind is that Pascal allows you to use identifiers whose names are reserved or predefined in JavaScript. Those can be language keywords (&#8220;this&#8221;, &#8220;delete&#8221;, etc.) or common DOM objects and properties (&#8220;document&#8221;, &#8220;window&#8221;).</p>
<p>The compiler automatically protects you from such conflicts by transparently renaming your identifiers (currently by adding a &#8220;$&#8221;+number at the end).</p>
<p>Then there is the obfuscator, which basically renames everything to short, meaningless names. That&#8217;s good for more than obfuscation: it reduces the size of the JavaScript and improves parsing and lookup performance in the browser.</p>
<p>The consequence is that in an &#8220;asm&#8221; section, you should prefix all Pascal identifiers with an &#8216;@&#8217;, so that the compiler can resolve (and rename) them correctly. For instance, in:</p>
<pre><strong>var</strong> window : <strong>String</strong>;
...
<strong>asm</strong>
   @window = window.name
<strong>end</strong>;</pre>
<p>The &#8216;@window&#8217; refers to the &#8216;window&#8217; string variable (which the compiler will rename), while &#8216;window.name&#8217; will be compiled &#8220;as is&#8221;, as it reads the &#8216;name&#8217; property of the global &#8216;window&#8217; JavaScript object.</p>
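<p>To see why the renaming matters, here is a minimal JavaScript sketch of two possible outputs for the snippet above. It is not actual compiler output. The <em>makePage</em> wrapper and its parameter are hypothetical stand-ins for the browser&#8217;s global scope, so the example also runs outside a browser. The <em>window$1</em> name simply follows the &#8220;$&#8221;+number convention described above.</p>

```javascript
// Hypothetical wrapper: `window` inside makePage stands in for the browser's
// global object, so the example also runs outside a browser.
function makePage(globalWindow) {
  const window = globalWindow;

  // Verbatim emission of `var window : String; ... @window = window.name`:
  // the local shadows the global, so `window.name` reads "".name.
  function naiveOutput() {
    var window = "";
    window = window.name;   // undefined: the global is unreachable here
    return window;
  }

  // Renamed emission, following the "$"+number convention.
  function renamedOutput() {
    var window$1 = "";
    window$1 = window.name; // `window` still refers to the outer "global"
    return window$1;
  }

  return { naiveOutput: naiveOutput, renamedOutput: renamedOutput };
}

const page = makePage({ name: "main-frame" });
console.log(page.naiveOutput());   // undefined
console.log(page.renamedOutput()); // main-frame
```

<p>If the local is emitted verbatim, it shadows the global, and <em>window.name</em> silently reads <em>undefined</em>. With the renamed local, both names coexist.</p>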
<h4><a name="#2"></a>2. Do you really need an &#8220;asm&#8221; section?</h4>
<p>You might in some odd cases (like this <a href="http://stackoverflow.com/questions/7202157/can-you-explain-why-10">gem</a>), but in many cases you don&#8217;t need &#8220;asm&#8221; at all. The language supports a &#8220;Variant&#8221; type that holds a raw JavaScript object, on which you can call methods and access properties, either directly or via indexes.</p>
<p>For instance, with v a Variant, the following code:</p>
<pre>v := v.getNext();
v['hello'] := v.space + 'world';</pre>
<p>will get compiled (almost) straight into</p>
<pre>v = v.getNext();
v['hello'] = v.space + 'world';</pre>
<p>When using Variant, you don&#8217;t have strong compile-time checks (it&#8217;s just you vs JavaScript), and property and function names are case-sensitive, so use them with care. This is similar in syntax and spirit to using OLE Variants in Delphi.</p>
<p>On the other hand, you have compiler support, and you get automatic casts when assigning a variant to a strong type (Integer, String, etc.), and you also get name conflict protection &amp; obfuscation support without having to &#8216;@&#8217; your Pascal references.</p>
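<p>Here is a minimal sketch of that compiled code in action. It assumes a hypothetical object (built by <em>makeNode</em>) with a <em>getNext()</em> method and a <em>space</em> property. It also shows that a case mismatch is only caught at runtime.</p>

```javascript
// Hypothetical value held by the Variant: an object with a getNext() method
// and a `space` property (names taken from the Pascal snippet above).
function makeNode(space, next) {
  return { space: space, getNext: function () { return next; } };
}

// The compiled form of:
//   v := v.getNext();
//   v['hello'] := v.space + 'world';
// wrapped in a function so it can be exercised.
function compiled(v) {
  v = v.getNext();
  v['hello'] = v.space + 'world';
  return v;
}

// Case matters on the JavaScript side: `GetNext` does not exist, and since
// there is no compile-time check, the mistake only surfaces when the call runs.
function callWithWrongCase(v) {
  try {
    v.GetNext();
    return "ok";
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other error";
  }
}

const tail = makeNode("hello ", null);
const head = makeNode("ignored", tail);
const result = compiled(head);
console.log(result === tail, result.hello); // true hello world
console.log(callWithWrongCase(head));       // TypeError
```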
<h4><a name="#3"></a>3. Don&#8217;t rely on implicit parameters structure</h4>
<p>Because they may change in future compiler revisions!</p>
<p>For instance, methods are currently invoked with an implicit &#8220;Self&#8221; parameter, followed by the declared ones. So currently &#8220;arguments[0]&#8221; is Self, and everything else comes after it. But don&#8217;t rely on it.</p>
<p>Future compiler revisions may change that parameter&#8217;s name, may obfuscate it, may remove it entirely in favor of an implicit &#8220;this&#8221;, may inline your function, etc.</p>
<p>So if you need explicit parameters, declare them. If you&#8217;re in a method and need to access the object (Self), use &#8220;@Self&#8221;. If you need to access a field of the current object, use &#8220;@Self.FieldName&#8221;, etc. That will keep working.</p>
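<p>The sketch below illustrates the risk. It contrasts two hypothetical shapes a compiled method could take. In the first, Self is passed as the first argument (the current convention described above). In the second, Self is passed as <em>this</em> (one of the possible future changes). The function names are made up for the example and are not DWS output.</p>

```javascript
const obj = { FName: "Eric" };

// Shape A (the current convention described above): Self is the first argument.
function shapeA_fragile(Self) { return arguments[0].FName; } // relies on the layout
function shapeA_robust(Self) { return Self.FName; }          // what "@Self.FName" resolves to

// Shape B (hypothetical future): Self is passed as `this`, with no implicit argument.
function shapeB_fragile() {
  // The same arguments[0] assumption now finds nothing.
  return arguments[0] === undefined ? undefined : arguments[0].FName;
}
function shapeB_robust() { return this.FName; } // the compiler would map "@Self" to `this`

console.log(shapeA_fragile(obj), shapeA_robust(obj));           // Eric Eric
console.log(shapeB_fragile.call(obj), shapeB_robust.call(obj)); // undefined Eric
```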
<h4><a name="#4"></a>4. Avoid declaring variables in &#8220;asm&#8221; sections</h4>
<p>Declare them in the parent function/method instead, and reference them with the &#8216;@&#8217; prefix.</p>
<p>There are three main reasons for that. First, doing so makes them case-insensitive. Second, it allows the obfuscator to obfuscate them. Third, you&#8217;ll get compiler warnings if you declare a variable but do not use it (or if you forgot to @-prefix it).</p>
<p>So don&#8217;t write that:</p>
<pre><strong>asm</strong>
   var myTemp;
   myTemp = ...whatever...;
   ...
<strong>end</strong>;</pre>
<p>But write this instead:</p>
<pre><strong>var</strong> myTemp : Variant;
...
<strong>asm</strong>
   @myTemp = ...whatever...;
   ...
<strong>end</strong>;</pre>
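<p>One way to picture what the &#8216;@&#8217; prefix buys you is a resolver. It looks up each @-prefixed name in a symbol table, case-insensitively as Pascal does, and substitutes the emitted (renamed or obfuscated) name. The sketch below is a simplified model, not the actual DWS implementation. The symbol table contents and the <em>resolveAsm</em> function are hypothetical.</p>

```javascript
// Hypothetical symbol table: lower-cased Pascal name -> emitted JavaScript name.
// Here myTemp was obfuscated to "a" and window renamed to "window$1".
const symbols = new Map([
  ["mytemp", "a"],
  ["window", "window$1"],
]);

// Replace every @identifier with its emitted name; plain JavaScript is left as is.
// Names missing from the table are collected, for illustration only.
function resolveAsm(src, table) {
  const unknown = [];
  const out = src.replace(/@([A-Za-z_][A-Za-z0-9_]*)/g, function (match, name) {
    const emitted = table.get(name.toLowerCase()); // Pascal names are case-insensitive
    if (emitted === undefined) {
      unknown.push(name);
      return match;
    }
    return emitted;
  });
  return { out: out, unknown: unknown };
}

console.log(resolveAsm("@MyTemp = window.name;", symbols).out);       // a = window.name;
console.log(resolveAsm("var myTemp; myTemp = 1;", symbols).out);      // var myTemp; myTemp = 1;
console.log(resolveAsm("@myTmp = 1;", symbols).unknown.join(","));    // myTmp
```

<p>A variable declared with <em>var</em> inside the asm section never goes through the table. It keeps its original, case-sensitive name, and the obfuscator cannot touch it.</p>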
<h4><a name="#5"></a>5. Handling callbacks with &#8220;Variant&#8221; methods</h4>
<p>A common need is to register a callback with a JavaScript object. When that object is held in a Variant, that&#8217;s fairly simple to achieve:</p>
<pre><strong>procedure</strong> DoImageLoaded;
<strong>begin</strong>
   ...
<strong>end</strong>;
...
<strong>var</strong> myImage : Variant; // will refer to an image object
...
myImage.onload(@DoImageLoaded);</pre>
<p>There we use the &#8216;@&#8217; operator Pascal-side to make it explicit that we want a function pointer, rather than a call to the function. The &#8216;@&#8217; isn&#8217;t necessary when the function being called is declared Pascal-side, as the compiler can then infer it from the parameter types. When invoking a Variant method, however, the compiler doesn&#8217;t know the parameter types.</p>
<p>Note that since function pointers are unified, you can get a function pointer from an object method or an interface method in the same fashion:</p>
<pre>myImage.onload(@myObject.DoImageLoaded);
myImage.onload(@myInterface.DoImageLoaded);</pre>
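<p>In JavaScript terms, a method pointer has to carry its object along, which is what a bound function does. The following sketch models the registration above. It uses a hypothetical image object whose <em>onload</em> method stores the callback (matching the call style used above) and a hypothetical <em>methodPointer</em> helper based on <em>Function.prototype.bind</em>. DWS may emit method pointers differently.</p>

```javascript
// Hypothetical image stand-in: onload(callback) stores the callback,
// fireLoad() simulates the load event by calling it as a plain function.
function makeImage() {
  let handler = null;
  return {
    onload: function (callback) { handler = callback; },
    fireLoad: function () { return handler ? handler() : undefined; },
  };
}

// A Pascal-side object whose method depends on its own state.
class TMyObject {
  constructor(tag) { this.tag = tag; }
  DoImageLoaded() { return "loaded by " + this.tag; }
}

// Hypothetical helper: a method pointer modeled as the method bound to its object.
function methodPointer(obj, method) {
  return method.bind(obj);
}

function DoImageLoaded() { return "standalone handler"; }

const myImage = makeImage();
myImage.onload(DoImageLoaded);   // like myImage.onload(@DoImageLoaded)
console.log(myImage.fireLoad()); // standalone handler

const myObject = new TMyObject("myObject");
myImage.onload(methodPointer(myObject, myObject.DoImageLoaded)); // like @myObject.DoImageLoaded
console.log(myImage.fireLoad()); // loaded by myObject
```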
<h4><a name="#6"></a>6. Handling callbacks in an &#8220;asm&#8221; section</h4>
<p>If you want to register the callback in an &#8220;asm&#8221; section, the situation is a little more complex. &#8220;@myObject.myMethod&#8221; will refer to the method&#8217;s function on the prototype, detached from its object context. This is fine for standalone functions or procedures, but may not do what you expect for object or interface methods.</p>
<p>The solution is to acquire the function pointer outside of the &#8220;asm&#8221; section:</p>
<pre><strong>var</strong> myCallback : Variant;
...
myCallback := @myObject.DoImageLoaded;
<strong>asm</strong>
   @myImage.onload(@myCallback);
<strong>end</strong>;</pre>
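<p>The difference between a detached method and a pointer acquired Pascal-side can be reproduced in plain JavaScript. In this sketch (the class and helper names are hypothetical), the prototype function loses its object when invoked as a bare callback, while the bound function keeps it. Modeling the acquired pointer with <em>bind</em> is an assumption, not a description of DWS internals.</p>

```javascript
// Class bodies run in strict mode, so a detached method sees `this === undefined`.
class TImageOwner {
  constructor(tag) { this.tag = tag; }
  DoImageLoaded() {
    return this === undefined ? "Self lost" : "loaded by " + this.tag;
  }
}

const owner = new TImageOwner("owner");

// Roughly what a bare method reference inside the asm section amounts to:
// the function from the prototype, with no object attached.
const detached = TImageOwner.prototype.DoImageLoaded;

// Roughly what `myCallback := @myObject.DoImageLoaded` amounts to:
// a function that remembers its object (modeled here with bind).
const myCallback = owner.DoImageLoaded.bind(owner);

// The image invokes its callback as a plain function call.
function fireCallback(callback) { return callback(); }

console.log(fireCallback(detached));   // Self lost
console.log(fireCallback(myCallback)); // loaded by owner
```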
<h4><a name="#7"></a>7. Current limitations</h4>
<p>Currently the parser for &#8220;asm&#8221; sections doesn&#8217;t really understand JavaScript:</p>
<ul>
<li>it still treats JS as an odd, invalid form of Pascal. Notably, {} delimit comments for it, so whatever is inside curly braces is passed through &#8220;as is&#8221;, and @ signs inside curly braces are (annoyingly) ignored</li>
<li>some unusual (but valid JS) operator combinations may throw off the parser. If that happens, place that code between curly braces and post a bug report</li>
</ul>
<p>Hopefully in time, there will be a proper JS parser, but currently the focus is more on the Pascal side, and &#8220;asm&#8221; sections are intended for handling corner cases more than as a main workhorse.</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2012/01/16/good-practices-for-javascript-asm-sections-in-dwsop4js/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kudos to the Firefox 4 TraceMonkey team!</title>
		<link>http://delphitools.info/2011/03/24/kudos-to-the-firefox-4-tracemonkey-team/</link>
		<comments>http://delphitools.info/2011/03/24/kudos-to-the-firefox-4-tracemonkey-team/#comments</comments>
		<pubDate>Thu, 24 Mar 2011 11:20:25 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[asm]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[JavaScript]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=893</guid>
		<description><![CDATA[I&#8217;ve been quite impressed with the JavaScript floating point performance in FireFox 4, which puts the Delphi compiler to shame. See for yourself this fractal rendering demo: Mandelbrot Set in HTML 5 Canvas I&#8217;ve made a version of the same code in Delphi XE (source + pre-compiled executable, 331 kB ZIP), and on my machine [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://delphitools.info/wp-content/uploads/2011/03/mandelTest.jpg"><img class="alignright size-full wp-image-897" style="margin-left: 10px; margin-right: 10px;" title="mandelTest" src="http://delphitools.info/wp-content/uploads/2011/03/mandelTest.jpg" alt="" width="200" height="200" /></a>I&#8217;ve been quite impressed with the <a href="http://www.amazon.com/gp/product/0596805527/ref=as_li_tf_tl?ie=UTF8&amp;tag=httpdelphiinf-20&amp;linkCode=as2&amp;camp=217145&amp;creative=399349&amp;creativeASIN=0596805527">JavaScript</a> floating point performance in <a href="http://www.mozilla.com/">FireFox 4</a>, which puts the Delphi compiler to shame. See for yourself this <a href="http://www.amazon.com/gp/product/0716711869/ref=as_li_tf_tl?ie=UTF8&amp;tag=httpdelphiinf-20&amp;linkCode=as2&amp;camp=217145&amp;creative=399349&amp;creativeASIN=0716711869">fractal</a> rendering demo:</p>
<p><a href="http://www.atopon.org/mandel/#">Mandelbrot Set in HTML 5 Canvas</a></p>
<p>I&#8217;ve made a version of the same code in Delphi XE (<a href="http://delphitools.info/wp-content/uploads/2011/03/MandelTest.zip">source + pre-compiled executable</a>, 331 kB ZIP). On my machine, at 480&#215;480 resolution, Firefox 4 renders the default view in <strong>124 ms</strong>, while the &#8220;regular&#8221; Delphi version, which is limited to the old x87 FPU, takes about <strong>200 ms</strong>&#8230;</p>
<p>It takes <em>manually SSE-enhanced</em> Delphi code to get back on top, with an <strong>87 ms</strong> render time. Sure, it&#8217;s quick, non-optimized scalar SSE code that could likely be improved. The point remains, though, that without asm, Delphi XE&#8217;s native compiler trails <a href="https://wiki.mozilla.org/JavaScript:TraceMonkey">TraceMonkey</a> in the floating point department&#8230;</p>
<p>So Embarcadero, how is that Delphi 64 version coming? Is it properly SSE-enabled?</p>
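<p>For reference, the inner loop of such a renderer is the classic escape-time iteration z = z&#178; + c. The sketch below is a minimal JavaScript version written for illustration. It is not the demo&#8217;s code, and the viewport bounds and iteration limit are arbitrary choices.</p>

```javascript
// Escape-time iteration for one point c = cx + i*cy:
// count iterations of z = z^2 + c until |z| > 2 or maxIter is reached.
function escapeTime(cx, cy, maxIter) {
  let x = 0, y = 0, i = 0;
  while (i < maxIter && x * x + y * y <= 4) {
    const xt = x * x - y * y + cx; // real part of z^2 + c
    y = 2 * x * y + cy;            // imaginary part of z^2 + c
    x = xt;
    i++;
  }
  return i;
}

// Render a width x height grid over the region [-2, 1] x [-1.5, 1.5].
function render(width, height, maxIter) {
  const counts = new Uint16Array(width * height);
  for (let py = 0; py < height; py++) {
    const cy = -1.5 + (3 * py) / height;
    for (let px = 0; px < width; px++) {
      const cx = -2 + (3 * px) / width;
      counts[py * width + px] = escapeTime(cx, cy, maxIter);
    }
  }
  return counts;
}

console.log(render(8, 8, 50).length); // 64
```

<p>Wrapping a call such as <em>render(480, 480, 256)</em> with <em>performance.now()</em> gives a rough point of comparison. Results will vary with the browser and the iteration limit.</p>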
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2011/03/24/kudos-to-the-firefox-4-tracemonkey-team/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
		<item>
		<title>The limitations of Delphi&#8217;s &#8220;inline&#8221;</title>
		<link>http://delphitools.info/2011/02/08/the-limitations-of-delphis-inline/</link>
		<comments>http://delphitools.info/2011/02/08/the-limitations-of-delphis-inline/#comments</comments>
		<pubDate>Tue, 08 Feb 2011 13:48:28 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[asm]]></category>
		<category><![CDATA[Compiler]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=838</guid>
		<description><![CDATA[Sometimes, the most simple-looking code can cause the Delphi compiler to stumble. I bumped on such a case recently, and simplified it to a bare-bones version that still exhibits the issue: type TFloatRec = record private Field : Double; public function RecGet : Double; inline; end; TMyClass = class private FRec : TFloatRec; public function [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes, the most simple-looking code can cause the Delphi compiler to stumble.</p>
<p>I bumped into such a case recently, and simplified it to a bare-bones version that still exhibits the issue:</p>
<pre><strong>type</strong>
   TFloatRec = <strong>record</strong>
      <strong>private</strong>
         Field : Double;
      <strong>public</strong>
         <strong>function </strong>RecGet : Double; <strong>inline</strong>;
   <strong>end</strong>;

   TMyClass = <strong>class</strong>
      <strong>private</strong>
         FRec : TFloatRec;
      <strong>public</strong>
         <strong>function </strong>Get : Double; <strong>virtual</strong>;
   <strong>end</strong>;

<strong>function </strong>TFloatRec.RecGet : Double;
<strong>begin</strong>
   Result:=Field; // here you could do a computation instead
<strong>end</strong>;

<strong>function </strong>TMyClass.Get : Double;
<strong>begin</strong>
   Result:=FRec.RecGet;
<strong>end</strong>;</pre>
<p>Basically all you have are trivial functions that return the value of a floating-point field.</p>
<p>Given the above, for the <em>TMyClass.Get</em> method, the optimal codegen would look just like</p>
<pre>fld qword ptr [eax+8]
ret</pre>
<p>Simple enough, eh? Yet here is what the Delphi XE compiler generates:</p>
<pre><strong>Unit1.pas.326: begin</strong>
0053A794 83C4F0           add esp,-$10
<strong>Unit1.pas.327: Result:=FRec.RecGet;</strong>
0053A797 83C008           add eax,$08
0053A79A 8B10             mov edx,[eax]
0053A79C 89542408         mov [esp+$08],edx
0053A7A0 8B5004           mov edx,[eax+$04]
0053A7A3 8954240C         mov [esp+$0c],edx
0053A7A7 8B442408         mov eax,[esp+$08]
0053A7AB 890424           mov [esp],eax
0053A7AE 8B44240C         mov eax,[esp+$0c]
0053A7B2 89442404         mov [esp+$04],eax
<strong>Unit1.pas.328: end;</strong>
0053A7B6 DD0424           fld qword ptr [esp]
0053A7B9 83C410           add esp,$10
0053A7BC C3               ret</pre>
<p>For those less fluent in asm, a direct pseudo-Pascal translation of the above would be:</p>
<pre><strong>var</strong>
   p : PDouble;
   temp1, temp2 : Double;
<strong>begin</strong>
   p:=@FRec.Field;
   temp1:=p^;
   temp2:=temp1;
   Result:=temp2;
<strong>end</strong>;</pre>
<p>And if <em>TMyClass.Get</em> is not virtual, but a static method with &#8220;inline&#8221;, you get the above with a <strong>third </strong>&#8220;<em>temp3</em>&#8221; Double (ie. it will perform even worse).</p>
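<p>These trips through temporaries aren&#8217;t innocuous. The temporaries live on the stack, so the CPU pipeline stalls while it waits for the round trips through the L1 cache. Here the Double is also written as two 32-bit halves and then read back by <em>fld</em> as a single 64-bit value, a pattern that typically defeats store-to-load forwarding. In practice, <em>a single one of those stalls can take as much time as half a dozen floating-point operations</em>.</p>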
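<p>To make the extra traffic concrete, the sketch below emulates the listing&#8217;s move sequence with JavaScript typed arrays. A 16-byte buffer stands in for the stack frame. The Double is copied as two 32-bit halves twice, then read back as one 64-bit value. It only models the data movement (it says nothing about timing), and the function name is hypothetical.</p>

```javascript
// Emulates the listing: FRec.Field is copied as two 32-bit halves into
// [esp+8..$0c] (temp1), then into [esp..4] (temp2), then read by fld as one Double.
function copyViaTemporaries(value) {
  const field = new Float64Array([value]);          // FRec.Field inside the object
  const fieldHalves = new Uint32Array(field.buffer); // the same 8 bytes as two dwords

  const frame = new ArrayBuffer(16);                 // add esp,-$10
  const frameDwords = new Uint32Array(frame);        // [esp], [esp+4], [esp+8], [esp+$0c]
  const frameDoubles = new Float64Array(frame);      // [esp] and [esp+8] as Doubles

  // temp1 := p^   (mov edx,[eax] / mov [esp+$08],edx, then the upper half)
  frameDwords[2] = fieldHalves[0];
  frameDwords[3] = fieldHalves[1];

  // temp2 := temp1   (mov eax,[esp+$08] / mov [esp],eax, then the upper half)
  frameDwords[0] = frameDwords[2];
  frameDwords[1] = frameDwords[3];

  // Result := temp2   (fld qword ptr [esp]: one 64-bit load over two 32-bit stores)
  return frameDoubles[0];
}

console.log(copyViaTemporaries(Math.PI) === Math.PI); // true
```

<p>The value comes out bit-for-bit identical, which is the point. The eight <em>mov</em> instructions add memory traffic without changing anything, where a single <em>fld</em> from the field would do.</p>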
<p>To get rid of the temporaries, there are two options. You can manually inline everything (both <em>RecGet</em> &amp; <em>Get</em>), but of course that doesn&#8217;t sit too well with encapsulation, or with virtual calls for that matter.</p>
<p>Or you can use inline asm instead. A single asm instruction is enough, and even with the calls between the functions, it will run circles around the Delphi compiler&#8217;s &#8220;inline&#8221; output.</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2011/02/08/the-limitations-of-delphis-inline/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>BASM? Yes we can!</title>
		<link>http://delphitools.info/2010/11/18/basm-yes-we-can/</link>
		<comments>http://delphitools.info/2010/11/18/basm-yes-we-can/#comments</comments>
		<pubDate>Thu, 18 Nov 2010 07:17:41 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[asm]]></category>
		<category><![CDATA[basm]]></category>
		<category><![CDATA[DWS]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=687</guid>
		<description><![CDATA[BASM may not be in the Delphi 64 preview, but a proof of concept of &#8220;BASM-for-DWS&#8221; is now available for DWScript in the SVN! It builds upon recently introduced &#8220;language extensions&#8221;, to allow &#8220;asm&#8221; blocks, which will be pre-parsed (to allow BASM-like references to local variables), and then fed to NASM for actual assembling (which [...]]]></description>
			<content:encoded><![CDATA[<p><a href="https://forums.embarcadero.com/thread.jspa?threadID=45533&amp;tstart=0">BASM may not be in the Delphi 64 preview</a>, but a proof of concept of &#8220;BASM-for-DWS&#8221; is now available for DWScript in the SVN!</p>
<p>It builds upon the recently introduced &#8220;language extensions&#8221; to allow &#8220;asm&#8221; blocks. These are pre-parsed (to allow BASM-like references to local variables), then fed to <a href="http://www.nasm.us/">NASM</a> (which you&#8217;ll need to download) for the actual assembling.</p>
<pre><strong>const </strong>cOne = 1.0;
<strong>function </strong>RSqrt(x : Float) : Float;
<strong>begin</strong>
   <strong>asm </strong><span style="color: #999999;">// Result := 1/Sqrt(x);</span>
      fld x;
      fsqrt;
      fld cOne;  <span style="color: #999999;">// could have used fld1 here</span>
      fdivr;
      fstp Result;
   <strong>end</strong>;
<strong>end</strong>;

PrintLn(RSqrt(1/4));</pre>
<p>You can have multiple asm sections, intersperse them with regular code, etc.</p>
<h4>Language extensions for DWScript</h4>
<p>The language extension mechanism allows you to hook into the compiler and handle part of the parsing of a source file. You can add to the language without affecting the DWS compiler core itself. If not used explicitly, language extensions won&#8217;t be compiled in, referenced, or otherwise impact DWS in any way.</p>
<p>You activate extensions per <em>TDelphiWebScript</em> component, by dropping the relevant extension component (a subclass of <em>TdwsCustomLangageExtension</em>) anywhere you wish, and linking the relevant Script component via the Script property.</p>
<p>More pragmatic targets for extensions would be introducing unsafe features (like DLL imports), compile-time parsed SQL, opcodes for industrial automatons/printers, constructs from other languages that may not fit with the general canon of Pascal, experiments with alien syntaxes, etc., you name it.</p>
<h4>Notes on DWScript&#8217;s asmExtension</h4>
<p>BASM was picked to experiment with the extension mechanism because it&#8217;s a bit of low-hanging fruit: a block delimited by <em>asm</em> and <em>end</em>, with access to the symbol table, that generates a custom expression, and whose actual parsing can be handled by a ready-made tool.</p>
<p>Note that the BASM part introduced here, though usable, will likely remain minimalistic, at least until Delphi 64 comes around.</p>
<p>The assembler uses NASM, and so supports all the instructions NASM supports, though there are a few things to keep in mind:</p>
<ul>
<li>you must place <em>nasm.exe</em> in the same directory as your application. Temporary files will be used during the assembly (in the temp folder). NASM may be bundled in a later version, but right now, it isn&#8217;t.</li>
<li>a semi-colon &#8216;;&#8217; is expected after instructions, a colon &#8216;:&#8217; after labels, and comments are regular ones (not NASM-style).</li>
<li>right now, BASM for DWS exposes constants, local variables, regular parameters (that are neither <em>var</em> nor <em>const</em>) and <em>Result</em> (for functions), when they are of type Integer (64bit), Float (double precision), String (pointer to the PChar) and Boolean (word).</li>
<li>DWS variables are stored in Variants, so its arrays are arrays of Variants, though the exposed address points to the data portion of the Variants.</li>
<li>the exposure mechanism is still simplistic (it works via defines). If you have a local variable named &#8216;<em>eax</em>&#8217;, for instance, it will take precedence over the similarly named register&#8230; The same goes for instructions.</li>
<li>the <em>EBP</em> register with offsets is used to expose variables, while constants are exposed with absolute addresses (and are unified, by the way). If you want to return before the exit point, you&#8217;ll have to &#8220;<em>pop ebp</em>&#8221; manually. As usual, you can do whatever you wish with <em>eax</em>, <em>edx</em> &amp; <em>ecx</em>; other registers have to be preserved.</li>
<li>you can use <em>@variable</em> to get only the address of a variable. For instance, for a float variable, <em>myfloat</em> could be &#8220;<em>qword [ebp-8]</em>&#8221; and <em>@myfloat</em> would be &#8220;<em>ebp-8</em>&#8221;.</li>
<li>you can only jump and call within the asm block.</li>
</ul>
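<p>The define-based exposure and its pitfalls can be modeled with a simple textual substitution. The sketch below is hypothetical: the locals table, the offsets and the <em>expose</em> function are illustrative, not DWS&#8217;s actual pre-parser. It does reproduce the behaviors listed above. <em>name</em> becomes a sized memory operand, <em>@name</em> becomes the bare address, and a local named like a register wins over the register.</p>

```javascript
// Hypothetical locals table as a pre-parser might build it: name -> stack slot.
const locals = new Map([
  ["x", { size: "qword", offset: -8 }],
  ["eax", { size: "dword", offset: -12 }], // a local unluckily named like a register
]);

// Format an EBP-relative address, e.g. "ebp-8".
function slot(v) {
  return "ebp" + (v.offset < 0 ? "" : "+") + v.offset;
}

// Define-like textual substitution over one asm line:
//   name  -> sized memory operand, e.g. "qword [ebp-8]"
//   @name -> bare address, e.g. "ebp-8"
// Being purely textual, it also rewrites a local named like a register or an instruction.
function expose(line, table) {
  return line.replace(/(@?)\b([A-Za-z_][A-Za-z0-9_]*)\b/g, function (match, at, name) {
    const v = table.get(name.toLowerCase());
    if (v === undefined) return match;
    return at ? slot(v) : v.size + " [" + slot(v) + "]";
  });
}

console.log(expose("fld x;", locals));         // fld qword [ebp-8];
console.log(expose("lea edx, [@x];", locals)); // lea edx, [ebp-8];
console.log(expose("mov eax, 1;", locals));    // mov dword [ebp-12], 1;
```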
<p>Of course, using any asm makes the script execution wholly unsafe, that&#8217;s why it&#8217;s in a language extension.</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2010/11/18/basm-yes-we-can/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Code Optimization: Go For the Jugular</title>
		<link>http://delphitools.info/2009/05/06/code-optimization-go-for-the-jugular/</link>
		<comments>http://delphitools.info/2009/05/06/code-optimization-go-for-the-jugular/#comments</comments>
		<pubDate>Wed, 06 May 2009 05:00:04 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Tips]]></category>
		<category><![CDATA[asm]]></category>
		<category><![CDATA[Bottleneck]]></category>
		<category><![CDATA[Breakpoint]]></category>
		<category><![CDATA[CPU]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Profiler]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=343</guid>
		<description><![CDATA[Code optimization can sometimes be experienced as a lengthy process, with disruptive effects on code readability and maintainability. For effective optimization, it is crucial to focus efforts on areas where minimal work and minimal changes will have to most impact, ie. go for the jugular The Prey I will illustrate this using SamplingProfiler in a [...]]]></description>
			<content:encoded><![CDATA[<p>Code optimization can sometimes be experienced as a lengthy process, with disruptive effects on code readability and maintainability. For effective optimization, it is crucial to focus efforts on areas where minimal work and minimal changes will have the most impact, i.e. go for the jugular.</p>
<h4>The Prey</h4>
<p>I will illustrate this using <a href="http://delphitools.info/samplingprofiler/">SamplingProfiler</a> in a small example, taken from a small library that deals with short vectors of varying length (but usually less than 10 dimensions), which I simplified, isolated &amp; anonymized for the purpose of this article.</p>
<pre>uses SysUtils, TypInfo;

type
   TDoWhat = (dwInc, dwDec);

procedure DoSomething1(var data : array of Integer; what : TDoWhat);
var
   i : Integer;
begin
   for i:=Low(data) to High(data) do
   begin
      case what of
         dwInc : Inc(data[i]);
         dwDec : Dec(data[i]);
      else
         raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
      end;
   end;
end;
</pre>
<h4>Get Meat into Belly</h4>
<p>Before starting any kind of optimization, one has to define goals and limits, i.e. figure out what &#8220;good enough&#8221; will be, rather than considering &#8220;good enough&#8221; to be whatever state the code is in once one has grown tired of optimizing it!</p>
<p>The sample code above is quite straightforward and simple. It would of course be possible to blow this code up to huge proportions for optimization&#8217;s sake. If you are after every last drop of CPU-cycle juice, and allow yourself to use every trick in the book, a fully optimized version could represent several thousand lines of code (I&#8217;m not exaggerating). If it&#8217;s your core business, it <em>might</em> be okay, but if it&#8217;s just a utility library, the increased maintenance burden could never be justified.</p>
<p>But since this article is intended more as an illustration than a discussion on the methodology, I&#8217;ll get straight to the buffalo (beef). For further reading on that subject, you can start from <a href="http://en.wikipedia.org/wiki/Big_O_notation">Big O Notation</a>, <a href="http://en.wikipedia.org/wiki/Benchmarking">Benchmarking</a> and <a href="http://en.wikipedia.org/wiki/Software_metrics">Software metrics</a> articles in wikipedia, there are also whole <a href="http://www.amazon.com/gp/product/0201729156?ie=UTF8&amp;tag=httpdelphiinf-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0201729156">books</a> on the subject.</p>
<h4>Stalking the Prey</h4>
<p>Looking at the above code, the first obvious optimization that developers suggest seems to be taking the conditional out of the loop, resulting in several case-specific loops. On small vectors, this nets about a 30% speedup. For further speedups, the suggestions are typically to go for loop unrolling, asm, and other heavy-handed solutions that come with a significant development time and code complexity increase.</p>
<p>Of course, readers of this website will know better than to jump straight into the code and apply optimization recipes: they would run the code through a profiler first. And since we&#8217;re dealing with a single procedure, an instrumenting profiler would be of little help, so they would run Sampling Profiler instead, and would get to see something like this:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-349" title="Going For The Jugular - Initial Profiling Results" src="http://delphitools.info/wp-content/uploads/2009/04/jugular-1.png" alt="Going For The Jugular - Initial Profiling Results" width="581" height="281" /></p>
<p>In this run, only the dwInc case was stressed (line 37). Obviously, the procedure spends less than 30% of its time doing what it was asked to do, and most of its time (33%) on the &#8220;<em>end</em>&#8221;, i.e. cleaning up, plus 8% setting up in &#8220;<em>begin</em>&#8221;. That&#8217;s 40%+ doing nothing but stack setup and cleanup work!</p>
<p>The conditional in the loop, which could have looked like the most worrying bit, is eating a bit less than 20% of the time.</p>
<p>What is the source of all that <em>begin/end</em> work? Place a breakpoint on <em>begin</em> and run. When the breakpoint is reached, hit Ctrl+Alt+C to open the CPU view, and you&#8217;ll see this:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-350" title="Going For The Jugular - CPU view near &quot;begin&quot;" src="http://delphitools.info/wp-content/uploads/2009/04/jugular-2.png" alt="Going For The Jugular - CPU view near &quot;begin&quot;" width="546" height="220" /></p>
<p>This is a fairly significant stack setup for such a small procedure, and those instructions with &#8220;<em>fs:</em>&#8221; at the bottom are setting up an (implicit) exception frame. An exception frame for what? If you haven&#8217;t guessed already, navigate your CPU view near the &#8220;<em>end</em>&#8221; line.</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-351" title="Going For The Jugular - CPU view near &quot;end&quot;" src="http://delphitools.info/wp-content/uploads/2009/04/jugular-3.png" alt="Going For The Jugular - CPU view near &quot;end&quot;" width="447" height="306" /></p>
<p>No wonder &#8220;<em>end</em>&#8221; was a bottleneck! The call to <em>UStrArrayClr</em> indicates that the exception frame is there to clean up several strings, namely those of the <em>raise Exception</em> line. One is the string returned by <em>GetEnumName</em>; the other is the result of the concatenation passed to <em>Exception.Create</em>.</p>
<h4>Isolate and Kill</h4>
<p>How to get rid of that exception frame? One typical way is to use &#8220;Exception.CreateFmt&#8221;, and pass only constant strings to it, but that is not possible here with the call to <em>GetEnumName</em>, which returns a string. The other way is to isolate the exception to its own (nested) procedure:</p>
<pre>procedure RaiseUnsupported(what : TDoWhat);
begin
   raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
end;</pre>
<p>and call <em>RaiseUnsupported</em> in the &#8220;<em>case else</em>&#8221;. Doing so moves the exception frame to the new procedure, where it&#8217;s irrelevant in terms of performance.</p>
<p>This simple change nets us a 33% speedup, i.e. we reclaimed most of the time lost in <em>begin/end</em>! We also got rid of the <em>UStrArrayClr</em> call. That call did essentially nothing, since the strings it was there to clear were never assigned (as long as the exception was not raised).</p>
<p>Note that if you use a nested procedure for <em>RaiseUnsupported</em>, you may be tempted not to pass it the &#8220;<em>what</em>&#8221; parameter, and to use the parent procedure&#8217;s &#8220;<em>what</em>&#8221; directly instead. However, by doing so, you&#8217;ll have the compiler use a special stack setup (so that the nested procedure can access the parent procedure&#8217;s variables). This setup will be faster than the exception frame it replaces, but with it, <em>begin/end</em> would still be taking about 18% of the CPU time spent in the procedure.</p>
<h4>Repeat Until Belly.Full;</h4>
<p>Those first 33% were easily gained. Let&#8217;s go for another round of SamplingProfiler:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-352" title="Going For The Jugular - Further Profiling Results" src="http://delphitools.info/wp-content/uploads/2009/04/jugular-4.png" alt="Going For The Jugular - Further Profiling Results" width="551" height="278" /></p>
<p>Things are more satisfying: the line performing the actual work is now taking up most of the CPU time. Second comes the <em>case of</em> line. For further speed improvements, we now need to move the conditional out of the loop:</p>
<pre>procedure DoSomething3(var data : array of Integer; what : TDoWhat);

   procedure RaiseUnsupported(what : TDoWhat);
   begin
      raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
   end;

var
   i : Integer;
begin
   case what of
      dwInc :
         for i:=Low(data) to High(data) do
            Inc(data[i]);
      dwDec :
         for i:=Low(data) to High(data) do
            Dec(data[i]);
   else
      RaiseUnsupported(what);
   end;
end;</pre>
<p>We have increased the line count noticeably, but most of the extra lines are cosmetic. What makes it a reasonable trade-off is that execution time has been reduced by 66% from the initial version: it now runs 3 times faster!</p>
<p>Are there any more easy gains to be had? Let&#8217;s run the last version through SamplingProfiler:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-355" title="Going For The Jugular - Final Profiling Results" src="http://delphitools.info/wp-content/uploads/2009/04/jugular-5.png" alt="Going For The Jugular - Final Profiling Results" width="551" height="296" /></p>
<p>More than 92% of the execution time now goes to the loop and the actual work. Only a wee bit is left for stack setup (line 96) and the <em>case of</em> (line 97). At this point, it is clear that going faster would require a significant increase in line count and code complexity, as the two-liner loops would have to be replaced with something heavier (unrolling, SIMD, etc.).</p>
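<p>To give an idea of what &#8220;heavier&#8221; means, here is a sketch of the <em>dwInc</em> loop manually unrolled by a factor of 4. The <em>IncAllUnrolled</em> name is hypothetical and this version was not benchmarked: whether it beats the two-liner depends on the CPU and on the code the compiler generates, so profile before adopting it.</p>
<pre>program UnrollSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: the dwInc loop unrolled by a factor of 4.
procedure IncAllUnrolled(var data : array of Integer);
var
   i, j, b, n : Integer;
begin
   n := Length(data);
   i := 0;   // open arrays are always indexed from 0
   // Main part: four increments per iteration, so the loop counter
   // and the branch are paid once per four elements.
   for b:=1 to n div 4 do begin
      Inc(data[i]);
      Inc(data[i+1]);
      Inc(data[i+2]);
      Inc(data[i+3]);
      Inc(i, 4);
   end;
   // Tail: the 0 to 3 elements left over when n is not a multiple of 4.
   for j:=i to n-1 do
      Inc(data[j]);
end;

var
   data : array [0..6] of Integer;
   k, sum : Integer;
begin
   for k:=0 to 6 do
      data[k] := k;          // 0+1+...+6 = 21
   IncAllUnrolled(data);
   sum := 0;
   for k:=0 to 6 do
      sum := sum + data[k];
   WriteLn(sum);             // 28
end.</pre>
<p>Even this, arguably the simplest of those techniques, turns a two-liner into a dozen lines with a separate tail loop, which illustrates the complexity cost.</p>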
<p><br class="spacer_" /></p>
<h4>Rest Under A Tree</h4>
<p>Some quick final notes to conclude.</p>
<p>When moving an exception to a procedure, there are two things to keep in mind:</p>
<ul>
<li>the exception will be raised from another place in the code, so to find out where it actually originated, you&#8217;ll have to look one step up the stack trace in your exception log&#8230; You do have exception logging with stack traces in place, don&#8217;t you?</li>
<li>the compiler won&#8217;t &#8220;know&#8221; that the called procedure always raises, and will assume execution continues after your <em>RaiseUnsupported</em>. You may therefore want to place an <em>Exit</em> after it (which will never be reached), to avoid warnings and allow the occasional register optimization by the compiler.</li>
</ul>
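<p>Here is a minimal sketch of the <em>Exit</em> trick, using a hypothetical <em>AddDelta</em> variant that picks an increment first and then runs a single loop; the name and the extra <em>dwReset</em> value are assumptions. Without the <em>Exit</em>, the compiler may warn that <em>delta</em> might not have been initialized, since it assumes execution can continue after <em>RaiseUnsupported</em>.</p>
<pre>program ExitAfterRaise;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
   SysUtils, TypInfo;

type
   // Assumption: dwReset is a hypothetical extra value that no branch handles.
   TDoWhat = (dwInc, dwDec, dwReset);

procedure RaiseUnsupported(what : TDoWhat);
begin
   raise Exception.Create('Unsupported: '+GetEnumName(TypeInfo(TDoWhat), Integer(what)));
end;

// Hypothetical variant: choose the delta first, then run a single loop.
procedure AddDelta(var data : array of Integer; what : TDoWhat);
var
   i, delta : Integer;
begin
   case what of
      dwInc : delta := 1;
      dwDec : delta := -1;
   else
      RaiseUnsupported(what);
      // Never reached, since RaiseUnsupported always raises. The compiler
      // cannot see that, so without this Exit it may warn that "delta"
      // might not have been initialized in the loop below.
      Exit;
   end;
   for i:=Low(data) to High(data) do
      Inc(data[i], delta);
end;

var
   data : array [0..2] of Integer;
begin
   data[0] := 10; data[1] := 20; data[2] := 30;
   AddDelta(data, dwDec);
   WriteLn(data[2]);   // 29
   try
      AddDelta(data, dwReset);
   except
      on E : Exception do
         WriteLn(E.Message);   // Unsupported: dwReset
   end;
end.</pre>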
<p>In the final version, we gained more than the previous profiling run hinted at: the new code allowed the compiler to make better use of the registers. Ofttimes, getting the fat out of the way is all you need to see improvements.</p>
<p>If you check the CPU view, you&#8217;ll see everything is quite efficient now, but even then, using all the remaining tricks in the book could probably still net noteworthy gains, albeit at a significant increase in complexity. I didn&#8217;t try, but I would guess a 2x or 3x speedup is about right.</p>
<p>If you ever need to go that route, SamplingProfiler can still help: on ASM code, it provides profiling data down to the individual ASM instruction&#8230; but that&#8217;s food for another article.</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/05/06/code-optimization-go-for-the-jugular/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Delphi 2009 hidden compiler switch?</title>
		<link>http://delphitools.info/2009/04/01/delphi-2009-hidden-compiler-switch/</link>
		<comments>http://delphitools.info/2009/04/01/delphi-2009-hidden-compiler-switch/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 11:46:26 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[asm]]></category>
		<category><![CDATA[CodeGear]]></category>
		<category><![CDATA[Command]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Site]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=269</guid>
		<description><![CDATA[This morning while debugging a statistical ichthyo-parser I stumbled upon what looked like a Delphi 2009 compiler bug: the compiler was outputting gibberish ASM opcodes&#8230; But after further investigation, it appeared this wasn&#8217;t complete gibberish, but (somewhat) correct MSIL bytecode! What&#8217;s more, a quick hexadecimal examination of dcc32.exe yielded that this MSIL [...]]]></description>
			<content:encoded><![CDATA[<p>This morning while debugging a statistical ichthyo-parser I stumbled upon what looked like a Delphi 2009 compiler bug: the compiler was outputting gibberish ASM opcodes&#8230; But after further investigation, it appeared this wasn&#8217;t complete gibberish, but (somewhat) correct MSIL bytecode!</p>
<p>What&#8217;s more, a quick hexadecimal examination of dcc32.exe revealed that this MSIL codegen can apparently be forced with an undocumented command-line compiler switch: <em>-af</em></p>
<p>The resulting exe won&#8217;t run because it&#8217;s a mix of Win32 headers and MSIL bytecode&#8230; What do you think?<br />
 Did CodeGear plan on supporting unmanaged code in managed executables, or managed code in native executables?</p>
<p><strong>Update:</strong> here is a <a rel="nofollow" href="http://delphitools.info/wp-content/uploads/2009/04/d2009-msil.jpg">screenshot</a> of the switch in action.</p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/04/01/delphi-2009-hidden-compiler-switch/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How familiar are you with code profiling?</title>
		<link>http://delphitools.info/2009/03/30/how-familiar-are-you-with-code-profiling/</link>
		<comments>http://delphitools.info/2009/03/30/how-familiar-are-you-with-code-profiling/#comments</comments>
		<pubDate>Mon, 30 Mar 2009 11:50:42 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[asm]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Newsgroup]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Poll]]></category>
		<category><![CDATA[Profiler]]></category>
		<category><![CDATA[Site]]></category>

		<guid isPermaLink="false">http://delphitools.info/?p=256</guid>
		<description><![CDATA[SamplingProfiler was initially released in the Delphi ASM newsgroup, and I&#8217;m curious about the audience of this website, so I&#8217;ve set up a small poll. How familiar are you with code profiling and/or Delphi code optimization? Can you tell apart instrumenting and sampling profilers merely by their respective heisenbugs, or does that profiler business sound like [...]]]></description>
			<content:encoded><![CDATA[<p>SamplingProfiler was initially released in the <a href="https://forums.codegear.com/forum.jspa?forumID=88">Delphi ASM newsgroup</a>, and I&#8217;m curious about the audience of this website, so I&#8217;ve set up a small poll.</p>
<p>How familiar are you with code profiling and/or Delphi code optimization? Can you tell apart instrumenting and sampling profilers merely by their respective <a href="http://en.wikipedia.org/wiki/Unusual_software_bug#Heisenbug">heisenbugs</a>, or does all that profiler business sound like a <a rel="nofollow" href="http://www.imdb.com/title/tt0115322/">TV series</a> from the last century?</p>
<p><img class="size-full wp-image-286" style="margin-left: 20px; margin-right: 20px; clear:both;" title="Poll - Familiarity with Profilers" src="http://delphitools.info/wp-content/uploads/2009/03/poll-familiarity.png" alt="Poll - Familiarity with Profilers" width="480" height="218" /></p>
]]></content:encoded>
			<wfw:commentRss>http://delphitools.info/2009/03/30/how-familiar-are-you-with-code-profiling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
