<?xml-stylesheet type="text/xsl" href="http://scispace.net/tow21/rss/rssstyles.xsl"?>
<rss version='2.0'   xmlns:dc='http://purl.org/dc/elements/1.1/'>
    <channel xml:base='http://scispace.net/tow21/'>
        <title><![CDATA[Toby White : Activity]]></title>
        <description><![CDATA[Activity for Toby White, hosted on SciSpace.net.]]></description>
        <generator>Elgg</generator>
        <link>http://scispace.net/tow21/</link>        
        <item>
            <title><![CDATA[Testing with nose, or why setuptools is annoying.]]></title>
            <link>http://scispace.net/tow21/weblog/2106.html</link>
            <guid isPermaLink="true">http://scispace.net/tow21/weblog/2106.html</guid>
            <pubDate>Fri, 19 Sep 2008 18:35:51 GMT</pubDate>
            <description><![CDATA[<p><span class="blog_post_source"><a href="http://feeds.feedburner.com/~r/Uszla/blog/~3/397414246/19">http://feeds.feedburner.com/~r/Uszla/blog/~3/397414246/19</a></span></p> <div class="para"><p>I use <a href="http://somethingaboutorange.com/mrl/projects/nose/"  title="//somethingaboutorange.com/mrl/projects/nose/"  class="http">nose</a> for my Python tests. It's not the only Python testing framework out there, but it seems to fit my needs.</p></div>
<div class="sidebarblock">
<div class="sidebar-content">
<div class="sidebar-title">Doctest discovery</div>
<div class="para"><p>I started using it mainly because test discovery for
doctests is far harder to arrange than it ought to
be. There's no point in me evangelizing doctests more than has already been
adequately done, but it's far too hard to run all the
doctests within a project, which rather defeats their purpose. nose seems to manage it perfectly well
without trying.</p></div>
</div></div>
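<div class="para"><p>For comparison, the standard library alone can be coaxed into this kind of discovery. The following is a minimal sketch, not how nose does it. <tt>doctest_suite_for_package</tt> is a hypothetical helper that walks a package with <tt>pkgutil</tt> and gathers every module's doctests into a single <tt>unittest</tt> suite.</p></div>
<div class="listingblock">
<div class="content">
<pre>
import doctest
import importlib
import pkgutil
import unittest


def doctest_suite_for_package(package):
    """Gather the doctests from a package and all of its submodules
    into one unittest suite (hypothetical helper, for illustration)."""
    suite = unittest.TestSuite()
    # The package's own __init__ may hold doctests too.
    suite.addTests(doctest.DocTestSuite(package))
    prefix = package.__name__ + '.'
    # walk_packages recurses into subpackages, yielding dotted names.
    for info in pkgutil.walk_packages(package.__path__, prefix):
        module = importlib.import_module(info.name)
        # A module without doctests just contributes an empty suite
        # (on Python 3.5 and later).
        suite.addTests(doctest.DocTestSuite(module))
    return suite
</pre></div></div>
<div class="para"><p>Running the lot is then a matter of <tt>unittest.TextTestRunner().run(doctest_suite_for_package(mypackage))</tt>, where <tt>mypackage</tt> stands for whichever package you have imported.</p></div>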
<div class="para"><p>Anyway: nose has this concept of plugins, which let you extend test discovery,
add extra fixtures, or whatever. Indeed, nose's core functionality
is implemented by bundled plugins. It picks up all available plugins
automatically by scanning the entry points of packages installed
by setuptools. This has the irritating effect that</p></div>
<div class="ilist"><ul>
<li>
<p>
nose itself can't work without being installed, since it needs to
find its own bundled plugins.
</p>
</li>
<li>
<p>
any additional plugins you write or use have to be installed as
well.
</p>
</li>
</ul></div>
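<div class="para"><p>The mechanism is easy to see in miniature. nose itself went through setuptools' <tt>pkg_resources</tt>. This minimal sketch swaps in the standard library's modern equivalent, <tt>importlib.metadata</tt> (Python 3.8+). The helper name <tt>installed_plugins</tt> is hypothetical. <tt>nose.plugins.0.10</tt> is the entry-point group that nose's plugin documentation asks plugins to register under.</p></div>
<div class="listingblock">
<div class="content">
<pre>
from importlib.metadata import entry_points


def installed_plugins(group):
    """List the entry points that installed distributions register under
    `group`. A plugin module that is merely importable from sys.path has
    no distribution metadata, so it never appears here."""
    eps = entry_points()
    if hasattr(eps, 'select'):        # Python 3.10 and later
        return list(eps.select(group=group))
    return list(eps.get(group, []))   # Python 3.8/3.9 return a plain dict


# nose's plugin group; prints nothing unless a nose plugin is installed.
for ep in installed_plugins('nose.plugins.0.10'):
    print(ep.name, '=', ep.value)
</pre></div></div>
<div class="para"><p>Everything hangs off installation metadata, which is exactly why an uninstalled nose, or an uninstalled plugin sitting next to your code, is invisible.</p></div>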
<div class="para"><p>Now I don't like this at the best of times; I get annoyed by software
that insists it knows better than me where it should live, and I
especially don't like blindly installing new software which might go and
stomp all over existing installed software. Once you introduce
versioning into the equation, I get even more annoyed; you end up with
a Python version of <a href="http://en.wikipedia.org/wiki/DLL_hell"  title="//en.wikipedia.org/wiki/DLL_hell"  class="http">DLL Hell</a>,
with one application needing version 0.8 of a package, and another
needing version 0.9.</p></div>
<div class="sidebarblock">
<div class="sidebar-content">
<div class="sidebar-title">Ruby gems</div>
<div class="para"><p>Incidentally - see also Tim Bray's
<a href="http://www.tbray.org/ongoing/When/200x/2008/08/31/Gem-Paranoia"  title="//www.tbray.org/ongoing/When/200x/2008/08/31/Gem-Paranoia"  class="http">take on
a related issue from the Ruby point of view</a>. I'm familiar enough
with Python to work my way around these problems when they arise, but
not so for Ruby - how do you install a gem-distributed package locally
without interfering with your system Ruby libraries?</p></div>
</div></div>
<div class="para"><p>But - since most of the rest of the world apparently shares none of my
concerns with these issues, I struggle manfully onwards.</p></div>
<div class="para"><p>This week, there's been a thread on the <a href="http://lists.idyll.org/listinfo/testing-in-python"  title="//lists.idyll.org/listinfo/testing-in-python"  class="http">testing-in-python</a> mailing
list, titled
"<a href="http://lists.idyll.org/pipermail/testing-in-python/2008-September/000966.html"  title="//lists.idyll.org/pipermail/testing-in-python/2008-September/000966.html"  class="http">why
you should distribute tests with your application / module</a>". I agree
100% - tests should always be distributed with applications; I think
almost all of the software I've ever written has had a bundled
test-suite. (This was particularly useful for Fortran software, where there is
such a wide range of compilers, but it still helps in tracking down
system-specific issues even in Python).</p></div>
<div class="para"><p>Unfortunately, nose's requirement to be installed via <tt>setup.py</tt> flies in the face of
this; most users won't have nose installed. I could just about forgive
nose for this, if I could rely on distributing custom plugins
with my package, and being able to pick them up from the local path;
but I can't even do that.</p></div>
<div class="para"><p>It's also worth noting that this is an issue even for software with a
very limited distribution. If you're working collaboratively on a
project, then all your colleagues need to be able to run tests too;
and if you're working in a heterogeneous environment, then adding
<a href="http://www.yosefk.com/blog/redundancy-vs-dependencies-which-is-worse.html"  title="//www.yosefk.com/blog/redundancy-vs-dependencies-which-is-worse.html"  class="http">additional dependencies and installation requirements</a> becomes rapidly
onerous, and liable to piss off your co-developers.</p></div>
<div class="para"><p>Anyway. To round this story off with at least a moderately cheerful
ending, I was happy enough with nose's usability not to abandon it,
but pissed off with its requirements enough to try and fix
them. There's a
<a href="http://code.google.com/p/python-nose/issues/detail?id=26&amp;colspec=ID%20Type%20Status%20Priority%20Stars%20Milestone%20Owner%20Summary"  title="//code.google.com/p/python-nose/issues/detail?id=26&amp;colspec=ID%20Type%20Status%20Priority%20Stars%20Milestone%20Owner%20Summary"  class="http">patch
in the nose bug-tracker</a> which at least partly fixes the issue, so
that nose will pick up plugins from sys.path.</p></div>
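<div class="para"><p>The patch itself isn't reproduced here. The sketch below illustrates only the general idea, and is not necessarily how the patch does it: plugin classes can be found on <tt>sys.path</tt> by a naming convention, with no installation step at all. The <tt>nose_plugin_</tt> prefix, the <tt>Plugin</tt> base class and <tt>discover_plugins</tt> itself are all hypothetical.</p></div>
<div class="listingblock">
<div class="content">
<pre>
import importlib
import inspect
import pkgutil


class Plugin(object):
    """Hypothetical plugin base class, standing in for a framework's own."""


def discover_plugins(prefix='nose_plugin_', base=Plugin, path=None):
    """Find plugin classes on sys.path by module naming convention alone.

    Every top-level module found on `path` (default: all of sys.path)
    whose name starts with `prefix` is imported, and the subclasses of
    `base` that it defines are returned. Directories passed as `path`
    must also be on sys.path so that the import itself succeeds.
    """
    importlib.invalidate_caches()   # notice files created since startup
    found = []
    for info in pkgutil.iter_modules(path):
        if not info.name.startswith(prefix):
            continue
        module = importlib.import_module(info.name)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            # Skip the base itself and classes merely imported into the module.
            if (issubclass(obj, base) and obj is not base
                    and obj.__module__ == module.__name__):
                found.append(obj)
    return found
</pre></div></div>
<div class="para"><p>A plugin shipped alongside your package as, say, <tt>nose_plugin_mine.py</tt> would then be picked up simply by being importable.</p></div>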
<div class="para"><p>I suspect in the long term, though, the answer to most of these issues
lies in the use of
<a href="http://pypi.python.org/pypi/virtualenv"  title="//pypi.python.org/pypi/virtualenv"  class="http">virtualenv</a>. So many people
insist on requiring a setuptools-based install that it will probably be
easier simply to isolate every app with its own dependencies in a
virtualenv, and just distribute that instead.</p></div>
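<div class="para"><p>virtualenv's approach later made it into the standard library as the <tt>venv</tt> module (Python 3.3+). As a minimal sketch using <tt>venv</tt> in place of virtualenv itself, creating an isolated environment per application looks something like this. The helper name <tt>make_isolated_env</tt> is hypothetical.</p></div>
<div class="listingblock">
<div class="content">
<pre>
import os
import venv


def make_isolated_env(target_dir):
    """Create a fresh, isolated Python environment in target_dir and
    return the path of its pyvenv.cfg (hypothetical helper)."""
    # with_pip=False keeps this quick and offline; use with_pip=True to
    # get a pip inside the environment for installing dependencies.
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(target_dir)
    # Every environment records where its base interpreter lives in pyvenv.cfg.
    return os.path.join(target_dir, 'pyvenv.cfg')
</pre></div></div>
<div class="para"><p>Each application's dependencies then get installed into its own environment, well away from the system Python.</p></div>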
<div class="para"><p>In the meantime, for anyone actually reading this: <strong>REQUIRING
SETUPTOOLS IS FUCKING ANNOYING, MMM'KAY? DON'T DO IT</strong></p></div>
<a href="http://uszla.me.uk/space/blog/2008/09/19"  title="//uszla.me.uk/space/blog/2008/09/19">@</a>]]></description>
        </item>
                
        <item>
            <title><![CDATA[New job + Python descriptors]]></title>
            <link>http://scispace.net/tow21/weblog/2071.html</link>
            <guid isPermaLink="true">http://scispace.net/tow21/weblog/2071.html</guid>
            <pubDate>Thu, 11 Sep 2008 17:37:54 GMT</pubDate>
            <description><![CDATA[<p><span class="blog_post_source"><a href="http://feeds.feedburner.com/~r/Uszla/blog/~3/389874272/11">http://feeds.feedburner.com/~r/Uszla/blog/~3/389874272/11</a></span></p> <div class="para"><p>So, there's been a bit of a hiatus in my blogging activity, which has
coincided with a change in my job.</p></div>
<div class="para"><p>I'm no longer employed by the university - as of the start of August
I've been working as a founder of a startup. We're still in stealth
mode, so output here will be work-related, but not too revealing,
initially at least. I think it's probably safe to say that there will
be much more Python than Fortran from now on!</p></div>
<div class="para"><p>Anyway, I thought it good practice to start writing English again,
after several weeks of nothing but Python. Naturally, these
English words will concern Python &#8230;</p></div>
<div class="para"><p>So today I used Python descriptors in anger for the first time. I hadn't
seen the particular pattern I used before, so I thought I'd write about it.</p></div>
<div class="para"><p>The problem I faced was how to nicely deal with an object which is
expensive to initialize, which there should only be one of, and which is used by a number of other
objects. If I were writing in Java, this would be a classic use-case
for a Singleton, with some form of delayed initialization. How to do
it in a more Pythonic way, though?</p></div>
<div class="para"><p>The easiest way to get Singleton-ish behaviour is probably to have the
<tt>ExpensiveObject</tt> defined in its own module, with one instance
instantiated as a module-level variable, and thus initialized on
module import. This means that any other objects which need access to
it can simply have a class attribute pointing at it.</p></div>
<div class="listingblock">
<div class="content">
<pre>
elsewhere.py:
<span class="htmlfontify-keyword">class</span> <span class="htmlfontify-type">ExpensiveObject</span>(object):
    ...

expensive_instance = ExpensiveObject()
</pre></div></div>
<div class="listingblock">
<div class="content">
<pre>
user.py:
<span class="htmlfontify-keyword">class</span> <span class="htmlfontify-type">ObjectUser</span>(object):
    <span class="htmlfontify-keyword">from</span> elsewhere <span class="htmlfontify-keyword">import</span> expensive_instance
    reference = expensive_instance
    ...
</pre></div></div>
<div class="para"><p>This doesn't delay instantiation, though - the
instantiation is performed whenever the <tt>ObjectUser</tt> definition is
processed. Since <tt>expensive_instance</tt> isn't always needed,
it's annoying to have to always create it.</p></div>
<div class="para"><p>To avoid that, we need to remove
<tt>expensive_instance</tt> from <tt>elsewhere</tt> and replace the <tt>ObjectUser</tt> attribute
reference with a function call.</p></div>
<div class="para"><p>We could do this in <tt>ObjectUser</tt> by overriding its <tt>__getattr__</tt>
appropriately, to do the usual trick: <tt>__getattr__</tt> is only consulted when
<tt>reference</tt> isn't found by normal lookup, so at that point we create the
object and put it there:</p></div>
<div class="listingblock">
<div class="content">
<pre>
<span class="htmlfontify-keyword">def</span> <span class="htmlfontify-function-name">__getattr__</span>(<span class="htmlfontify-py-pseudo-keyword-face">self</span>, name):
    <span class="htmlfontify-comment"># Only called once normal attribute lookup has failed.</span>
    <span class="htmlfontify-keyword">if</span> name == <span class="htmlfontify-string">'reference'</span>:
        <span class="htmlfontify-keyword">from</span> elsewhere <span class="htmlfontify-keyword">import</span> ExpensiveObject
        value = ExpensiveObject()
        <span class="htmlfontify-comment"># Cache on the class, so later lookups never reach __getattr__.</span>
        setattr(type(<span class="htmlfontify-py-pseudo-keyword-face">self</span>), name, value)
        <span class="htmlfontify-keyword">return</span> value
    <span class="htmlfontify-keyword">raise</span> AttributeError(name)
</pre></div></div>
<div class="para"><p>which has a few problems.</p></div>
<div class="ilist"><ul>
<li>
<p>
Firstly, this check is made on <strong>every</strong> failed attribute lookup on this
object (and on every attribute access whatsoever, had we used
<tt>__getattribute__</tt> instead), which is an unnecessary price.
</p>
</li>
<li>
<p>
Secondly, if we are doing lots of getattr tricks for other attributes as well, it's messy to have them all in the same method.
</p>
</li>
<li>
<p>
Thirdly, we've set expensive_instance to be a class attribute,
which means that every class for which we do this will get its own
expensive_instance.
</p>
</li>
</ul></div>
<div class="para"><p>We could solve the last two issues with inheritance - have a small
class (<tt>ExpensiveFactory</tt>?)
which does nothing but override <tt>__getattr__</tt> for the attribute of
interest. This isolates the <tt>__getattr__</tt> logic for this attribute,
and makes sure that only one copy of <tt>expensive_instance</tt> is
instantiated (as a class variable of <tt>ExpensiveFactory</tt>).</p></div>
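<div class="para"><p>Here is a minimal sketch of such a mixin. It assumes the attribute is called <tt>reference</tt>, as in the examples above, and uses a trivial stand-in for <tt>ExpensiveObject</tt>. <tt>OtherUser</tt> is a hypothetical second client class.</p></div>
<div class="listingblock">
<div class="content">
<pre>
class ExpensiveObject(object):
    """Trivial stand-in for the costly object."""


class ExpensiveFactory(object):
    """Mixin that lazily creates one shared ExpensiveObject as `reference`."""

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. before first creation.
        if name == 'reference':
            value = ExpensiveObject()
            # Store it on ExpensiveFactory itself, not type(self), so every
            # subclass finds this same instance via normal lookup from now on.
            ExpensiveFactory.reference = value
            return value
        raise AttributeError(name)


class ObjectUser(ExpensiveFactory):
    pass


class OtherUser(ExpensiveFactory):
    pass


print(ObjectUser().reference is OtherUser().reference)  # True
</pre></div></div>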
<div class="listingblock">
<div class="content">
<pre>
<span class="htmlfontify-keyword">class</span> <span class="htmlfontify-type">ObjectUser</span>(ExpensiveFactory):
    ...
</pre></div></div>
<div class="para"><p>But we still haven't solved the first problem (the cost of
<tt>__getattr__</tt>), and we've introduced another: if any of these child
classes want to override <tt>__getattr__</tt>, they have to remember to call
<tt>super()</tt> all the way up the inheritance hierarchy (see
<a href="http://fuhm.net/super-harmful/"  title="//fuhm.net/super-harmful/"  class="http">Python's Super considered harmful</a>).</p></div>
<div class="para"><p>Anyway - so this (I think) was exactly the reason that descriptors
were invented. Instead of <tt>ExpensiveFactory</tt>, we have
<tt>ExpensiveDescriptor</tt>:</p></div>
<div class="listingblock">
<div class="content">
<pre>
elsewhere.py:
<span class="htmlfontify-keyword">class</span> <span class="htmlfontify-type">ExpensiveDescriptor</span>(object):
    _expensive_instance = <span class="htmlfontify-py-pseudo-keyword-face">None</span>

    <span class="htmlfontify-keyword">def</span> <span class="htmlfontify-function-name">__get__</span>(<span class="htmlfontify-py-pseudo-keyword-face">self</span>, instance, owner):
        <span class="htmlfontify-comment"># Create the shared object on first access only.</span>
        <span class="htmlfontify-keyword">if</span> <span class="htmlfontify-py-pseudo-keyword-face">self</span>.__class__._expensive_instance <span class="htmlfontify-keyword">is</span> <span class="htmlfontify-py-pseudo-keyword-face">None</span>:
            <span class="htmlfontify-py-pseudo-keyword-face">self</span>.__class__._expensive_instance = ExpensiveObject()
        <span class="htmlfontify-keyword">return</span> <span class="htmlfontify-py-pseudo-keyword-face">self</span>.__class__._expensive_instance
</pre></div></div>
<div class="listingblock">
<div class="content">
<pre>
user.py:
<span class="htmlfontify-keyword">import</span> elsewhere

<span class="htmlfontify-keyword">class</span> <span class="htmlfontify-type">ObjectUser</span>(object):
    expensive_instance = elsewhere.ExpensiveDescriptor()
    ...
</pre></div></div>
<div class="para"><p>Whenever <tt>ObjectUser().expensive_instance</tt> is accessed, the descriptor's
<tt>__get__</tt> method is invoked. The first such access creates an <tt>ExpensiveObject</tt>;
nothing is created before then.</p></div>
<div class="para"><p>This happens for <tt>ObjectUser</tt>, and any classes which inherit from it,
without any further interference in them.</p></div>
<div class="para"><p>And <tt>__get__</tt> is only invoked for this one attribute, so accessing any other
attribute incurs no extra cost.</p></div>
<div class="para"><p>And, of course, since <tt>_expensive_instance</tt> is a class attribute of
the Descriptor, there should only ever be one created.</p></div>
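<div class="para"><p>Putting the pieces together, here is a self-contained, runnable version of the descriptor approach in a single file. It is a sketch under two assumptions. First, <tt>ExpensiveObject</tt> is a trivial stand-in that counts its instantiations. Second, <tt>AnotherUser</tt> and <tt>ObjectUserChild</tt> are hypothetical extra classes, added only to show that the instance is shared.</p></div>
<div class="listingblock">
<div class="content">
<pre>
class ExpensiveObject(object):
    """Stand-in for something costly to build; counts its instantiations."""
    created = 0

    def __init__(self):
        ExpensiveObject.created += 1


class ExpensiveDescriptor(object):
    """Non-data descriptor that builds one shared ExpensiveObject on first use."""
    _expensive_instance = None

    def __get__(self, instance, owner):
        cls = type(self)
        if cls._expensive_instance is None:
            cls._expensive_instance = ExpensiveObject()
        return cls._expensive_instance


class ObjectUser(object):
    expensive_instance = ExpensiveDescriptor()


class AnotherUser(object):
    expensive_instance = ExpensiveDescriptor()


class ObjectUserChild(ObjectUser):
    pass


print(ExpensiveObject.created)  # 0: defining the classes built nothing
a = ObjectUser().expensive_instance
b = AnotherUser().expensive_instance
c = ObjectUserChild().expensive_instance
print(ExpensiveObject.created)  # 1
print(a is b is c)              # True
</pre></div></div>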
<div class="para"><p>Actually, you could have <tt>ExpensiveDescriptor</tt> manipulate the
module attribute <tt>elsewhere.expensive_instance</tt> instead. This would let you
get at <tt>expensive_instance</tt> from anywhere in the code without
having to go through an object - but only after it had been
instantiated by one of the accessing objects. This might or might not be
useful, depending on your use cases.</p></div>
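<div class="para"><p>Here is a minimal sketch of that module-attribute variant. It is again collapsed into one file, so the module-level <tt>expensive_instance</tt> here plays the role of <tt>elsewhere.expensive_instance</tt>. The names are illustrative only.</p></div>
<div class="listingblock">
<div class="content">
<pre>
expensive_instance = None   # module attribute; None until first use


class ExpensiveObject(object):
    """Trivial stand-in for the costly object."""


class ExpensiveDescriptor(object):
    """Creates the shared object on first access and publishes it as the
    module attribute `expensive_instance`."""

    def __get__(self, instance, owner):
        global expensive_instance
        if expensive_instance is None:
            expensive_instance = ExpensiveObject()
        return expensive_instance


class ObjectUser(object):
    reference = ExpensiveDescriptor()


print(expensive_instance)          # None: nobody has asked for it yet
obj = ObjectUser().reference
print(expensive_instance is obj)   # True: now reachable module-wide
</pre></div></div>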
<div class="para"><p>Anyway, so that's why descriptors are brilliant!
For more reading, try:</p></div>
<div class="ilist"><ul>
<li>
<p>
<a href="http://users.rcn.com/python/download/Descriptor.htm"  title="//users.rcn.com/python/download/Descriptor.htm"  class="http">How-To Guide for Descriptors</a>
</p>
</li>
<li>
<p>
<a href="http://gulopine.gamemusic.org/2007/nov/23/python-descriptors-part-1-of-2/"  title="//gulopine.gamemusic.org/2007/nov/23/python-descriptors-part-1-of-2/"  class="http">Python Descriptors part 1 of 2</a>
</p>
</li>
<li>
<p>
<a href="http://gulopine.gamemusic.org/2007/nov/24/python-descriptors-part-2-of-2/"  title="//gulopine.gamemusic.org/2007/nov/24/python-descriptors-part-2-of-2/"  class="http">Python Descriptors part 2 of 2</a>
</p>
</li>
</ul></div>
<a href="http://uszla.me.uk/space/blog/2008/09/11"  title="//uszla.me.uk/space/blog/2008/09/11">@</a>]]></description>
        </item>
                
        <item>
            <title><![CDATA[XPath and QNames in content]]></title>
            <link>http://scispace.net/tow21/weblog/1217.html</link>
            <guid isPermaLink="true">http://scispace.net/tow21/weblog/1217.html</guid>
            <pubDate>Wed, 23 Apr 2008 21:27:31 GMT</pubDate>
            <description><![CDATA[<p><span class="blog_post_source"><a href="http://feeds.feedburner.com/~r/Uszla/blog/~3/276408652/23">http://feeds.feedburner.com/~r/Uszla/blog/~3/276408652/23</a></span></p> <p>As any fule no, QNames are how XML does namespaces.
Where a namespace has been declared:</p>
<div class="listingblock">
<div class="content">
<pre>
&lt;c:cml xmlns:c=<span class="htmlfontify-string">&quot;<a href="http://www.xml-cml.org/schema">http://www.xml-cml.org/schema</a>&quot;</span>&gt;
</pre></div></div>
<p>the "c" prefix on the element name is associated, via
the <tt>xmlns:c</tt> attribute, with the namespace URI. This is
trivially manipulable with any namespace-aware tool.</p>
<p>So far so good. However, when QNames are used in content (typically, as
an attribute value) then the situation is more complex. The two nodes
below are equivalent under QName-in-content processing.</p>
<div class="listingblock">
<div class="content">
<pre>
&lt;c:cml xmlns:c=<span class="htmlfontify-string">&quot;<a href="http://www.xml-cml.org/schema">http://www.xml-cml.org/schema</a>&quot;</span>
  att=<span class="htmlfontify-string">&quot;c:comp&quot;</span>/&gt;
&lt;d:cml xmlns:d=<span class="htmlfontify-string">&quot;<a href="http://www.xml-cml.org/schema">http://www.xml-cml.org/schema</a>&quot;</span>
  att=<span class="htmlfontify-string">&quot;d:comp&quot;</span>/&gt;
</pre></div></div>
<p>This usage is blessed by the W3C, <a href="http://www.w3.org/2001/tag/doc/qnameids.html"  title="//www.w3.org/2001/tag/doc/qnameids.html"  class="http">http://www.w3.org/2001/tag/doc/qnameids.html</a>, and <a href="http://www.w3.org/TR/xslt"  title="//www.w3.org/TR/xslt"  class="http">XSLT</a> depends on it working.</p>
<p>But it's significantly harder to work with using most XML toolkits.</p>
<div class="listingblock">
<div class="content">
<pre>
<span class="htmlfontify-function-name">node</span>()[@<span class="htmlfontify-variable-name">att</span>=<span class="htmlfontify-string">'string'</span>]
</pre></div></div>
<p>The above XPath returns all nodes which have <tt>att="string"</tt>.
However, it turns out that matching on a namespace-resolved QName needs the following:</p>
<div class="listingblock">
<div class="content">
<pre>
<span class="htmlfontify-function-name">node</span>()[substring-after(@att, <span class="htmlfontify-string">':'</span>)=<span class="htmlfontify-string">'comp'</span>
       and @att[../namespace::*
                 [name()=substring-before(../@att,<span class="htmlfontify-string">':'</span>)]
                =<span class="htmlfontify-string">'<a href="http://www.xml-cml.org/schema">http://www.xml-cml.org/schema</a>'</span>]
      ]
</pre></div></div>
<p>if you only allow for prefixed QNames (<em>e.g.</em> <tt>c:comp</tt> above). If you want to be able to match unprefixed QNames as well, that is, QNames in the default namespace:</p>
<div class="listingblock">
<div class="content">
<pre>
&lt;cml xmlns=<span class="htmlfontify-string">&quot;<a href="http://www.xml-cml.org/schema">http://www.xml-cml.org/schema</a>&quot;</span>
  att=<span class="htmlfontify-string">&quot;comp&quot;</span>/&gt;
</pre></div></div>
<p>then you need to extend the expression to the following:</p>
<div class="listingblock">
<div class="content">
<pre>
<span class="htmlfontify-function-name">node</span>()[(substring-after(@att, <span class="htmlfontify-string">':'</span>)=<span class="htmlfontify-string">'comp'</span>
        and @att[../namespace::*
                  [name()=substring-before(../@att,<span class="htmlfontify-string">':'</span>)]
                 =<span class="htmlfontify-string">'<a href="http://www.xml-cml.org/schema">http://www.xml-cml.org/schema</a>'</span>])
    or (@<span class="htmlfontify-variable-name">att</span>=<span class="htmlfontify-string">'comp'</span>
         and namespace::*[name()=<span class="htmlfontify-string">''</span>]
              =<span class="htmlfontify-string">'<a href="http://www.xml-cml.org/schema">http://www.xml-cml.org/schema</a>'</span>)
       ]
</pre></div></div>
<p>which is hardly transparent!</p>
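<p>By comparison, the same test is trivial in a general-purpose language once the in-scope namespace declarations are to hand. The following is a minimal Python sketch. <tt>resolve_qname</tt> is a hypothetical helper, and its <tt>nsmap</tt> argument is assumed to have the shape of lxml's <tt>Element.nsmap</tt> for the attribute's parent element, with <tt>None</tt> as the key for the default namespace.</p>
<div class="listingblock">
<div class="content">
<pre>
def resolve_qname(value, nsmap):
    """Resolve a QName found in content (e.g. an attribute value) to a
    (namespace URI, local name) pair.

    nsmap maps in-scope prefixes to namespace URIs, with the key None for
    the default namespace. Following the XPath above, an unprefixed QName
    is taken to be in the default namespace.
    """
    prefix, sep, local = value.partition(':')
    if not sep:
        # Unprefixed: use the default namespace, or None if there isn't one.
        return nsmap.get(None), value
    if prefix not in nsmap:
        raise ValueError('undeclared prefix in QName %r' % value)
    return nsmap[prefix], local


CML = 'http://www.xml-cml.org/schema'
# The three cml elements shown earlier all carry the same resolved QName.
print(resolve_qname('c:comp', {'c': CML}) == (CML, 'comp'))   # True
print(resolve_qname('d:comp', {'d': CML}) == (CML, 'comp'))   # True
print(resolve_qname('comp', {None: CML}) == (CML, 'comp'))    # True
</pre></div></div>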
<p>Much as I think XPath 2 is a bad idea in general, this is one area where it is
a significant step forward; it offers QName functions:</p>
<ul>
<li>
<p>
<a href="http://www.w3.org/TR/xquery-operators/#func-local-name-from-QName"  title="//www.w3.org/TR/xquery-operators/#func-local-name-from-QName"  class="http">fn:local-name-from-QName</a>
</p>
</li>
<li>
<p>
<a href="http://www.w3.org/TR/xquery-operators/#func-namespace-uri-from-QName"  title="//www.w3.org/TR/xquery-operators/#func-namespace-uri-from-QName"  class="http">fn:namespace-uri-from-QName</a>
</p>
</li>
</ul>
<p>which will do what they suggest. Of course XPath 2 then buggers things up again by saying:</p>
<div class="quoteblock">
<div class="quoteblock-content">
<p>In XPath Version 2.0, the namespace axis is deprecated and need not be supported by a host language</p>
<div class="attribution">
<span class="emphasis">W3C Recommendation 23 January 2007</span><br />
&#8212; XML Path Language (XPath) 2.0
</div></div></div>
<p>Who needs backwards compatibility anyway?</p>
<p>But since <a href="http://xmlsoft.org/"  title="//xmlsoft.org/"  class="http">libxml2</a> doesn't support XPath2, I don't propose to worry very much about it.</p>
<p>In any case, unwieldy though the above solutions are, they work correctly.</p>
<a href="http://uszla.me.uk/space/blog/2008/04/23"  title="//uszla.me.uk/space/blog/2008/04/23">@</a>]]></description>
        </item>
                
        <item>
            <title><![CDATA[m4Y - the Y combinator in m4]]></title>
            <link>http://scispace.net/tow21/weblog/1218.html</link>
            <guid isPermaLink="true">http://scispace.net/tow21/weblog/1218.html</guid>
            <pubDate>Tue, 22 Apr 2008 13:25:11 GMT</pubDate>
            <description><![CDATA[<p><span class="blog_post_source"><a href="http://feeds.feedburner.com/~r/Uszla/blog/~3/275386225/22">http://feeds.feedburner.com/~r/Uszla/blog/~3/275386225/22</a></span></p> <p>Just to get the goods up front:</p>
<div class="listingblock">
<div class="content">
<pre>
<span class="htmlfontify-keyword">define</span>(`m4Y', `<span class="htmlfontify-comment">dnl</span>
<span class="htmlfontify-keyword">pushdef</span>(`m4Y_recur',<span class="htmlfontify-comment">dnl</span>
`<span class="htmlfontify-keyword">pushdef</span>(`m4Y_LL',<span class="htmlfontify-comment">dnl</span>
`<span class="htmlfontify-variable-name">$1</span>''<span class="htmlfontify-keyword">changequote</span>([,])(['<span class="htmlfontify-keyword">changequote</span>([,])`<span class="htmlfontify-keyword">changequote</span>`]`$[]1'(``$[]1'')<span class="htmlfontify-comment">dnl</span>
['<span class="htmlfontify-keyword">changequote</span>([,])'<span class="htmlfontify-keyword">changequote</span>`])<span class="htmlfontify-keyword">changequote</span>`<span class="htmlfontify-comment">dnl</span>
(<span class="htmlfontify-keyword">changequote</span>([,])`$[]1'<span class="htmlfontify-keyword">changequote</span>)<span class="htmlfontify-comment">dnl</span>
`<span class="htmlfontify-keyword">popdef</span>(`m4Y_LL')')'<span class="htmlfontify-comment">dnl</span>
`m4Y_LL')<span class="htmlfontify-comment">dnl</span>
<span class="htmlfontify-keyword">pushdef</span>(`m4Y_LL',`<span class="htmlfontify-comment">dnl</span>
m4Y_recur(`m4Y_recur')'<span class="htmlfontify-keyword">changequote</span>([,])(`$[]1')<span class="htmlfontify-keyword">changequote</span>`<span class="htmlfontify-comment">dnl</span>
<span class="htmlfontify-keyword">popdef</span>(`m4Y_recur')`'<span class="htmlfontify-keyword">popdef</span>(`m4Y_LL')')`'<span class="htmlfontify-comment">dnl</span>
'`m4Y_LL')`'<span class="htmlfontify-comment">dnl</span>
</pre></div></div>
<p>So, as seems to be popular, I've been working
my way through <a href="http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&amp;tid=4825"  title="//mitpress.mit.edu/catalog/item/default.asp?ttype=2&amp;tid=4825"  class="http">The Little Schemer</a> over the last few weeks.</p>
<p>And, as is equally common, I ground to a halt at the derivation
at the end of Chapter IX, where they spring the
<a href="http://en.wikipedia.org/wiki/Fixed_point_combinator"  title="//en.wikipedia.org/wiki/Fixed_point_combinator"  class="http">Y Combinator</a>
on the unsuspecting audience. <a href="http://weblog.raganwald.com/2007/02/but-y-would-i-want-to-do-thing-like.html"  title="//weblog.raganwald.com/2007/02/but-y-would-i-want-to-do-thing-like.html"  class="http">The best way to understand it is to work through it by
yourself</a>, so I thought I would see if you could do one in
<a href="http://en.wikipedia.org/wiki/M4_%28computer_language%29"  title="//en.wikipedia.org/wiki/M4_%28computer_language%29"  class="http">m4</a>.
And it turns out you can, though it's not very pretty!</p>
<p>Clearly what the world needs is to know about it, so I wrote it
up, and you can follow the derivation in two essays:</p>
<ol>
<li>
<p>
<a href="http://uszla.me.uk/space/essays/m4HOP"  class="wiki"  title="essays/m4HOP was updated 1 hour, 19 minutes ago">Higher-Order Programming in m4</a>, which shows you
how to do proper quoting to get macro <a href="http://en.wikipedia.org/wiki/Currying"  title="//en.wikipedia.org/wiki/Currying"  class="http">Currying</a>
to work.
</p>
</li>
<li>
<p>
<a href="http://uszla.me.uk/space/essays/m4Y"  class="wiki"  title="essays/m4Y was updated 1 hour, 35 minutes ago">The Y Combinator in m4</a>, which uses those quoting
techniques to do the full derivation of <tt>m4Y</tt> above.
</p>
</li>
</ol>
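<p>For comparison, here is the same combinator in a language with first-class functions. This is a minimal Python sketch rather than a translation of <tt>m4Y</tt>. Because Python evaluates eagerly, it is the applicative-order variant, often called the Z combinator. <tt>length</tt> mirrors the function The Little Schemer uses in its derivation.</p>
<div class="listingblock">
<div class="content">
<pre>
def Y(f):
    """Applicative-order Y combinator (often called Z).

    Python evaluates arguments eagerly, so the self-application x(x) is
    wrapped in `lambda v: ...` to delay it until the recursive call is
    actually made; the plain Y combinator would recurse forever here.
    """
    return (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))


# Neither function below refers to itself by name: each is handed the
# function to recurse with (`recur`) by Y.
length = Y(lambda recur: lambda lst: 0 if not lst else 1 + recur(lst[1:]))
factorial = Y(lambda recur: lambda n: 1 if n == 0 else n * recur(n - 1))

print(length(['a', 'b', 'c']))  # 3
print(factorial(5))             # 120
</pre></div></div>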
<div class="quoteblock">
<div class="quoteblock-content">
<p>Beware that m4 may be dangerous for the health of compulsive programmers.</p>
<div class="attribution">
&#8212; The GNU m4 manual
</div></div></div>
<div class="technorati_tags"  align="right">Technorati Tags: <a href="http://technorati.com/tag/essays"  rel="tag">essays</a>, <a href="http://technorati.com/tag/m4"  rel="tag">m4</a>, <a href="http://technorati.com/tag/m4hop"  rel="tag">m4hop</a>, <a href="http://technorati.com/tag/m4y"  rel="tag">m4y</a>, <a href="http://technorati.com/tag/programming"  rel="tag">programming</a>, <a href="http://technorati.com/tag/space"  rel="tag">space</a></div><a href="http://uszla.me.uk/space/blog/2008/04/22"  title="//uszla.me.uk/space/blog/2008/04/22">@</a>]]></description>
        </item>
                
        <item>
            <title><![CDATA[asciidoc source code highlighting]]></title>
            <link>http://scispace.net/tow21/weblog/1219.html</link>
            <guid isPermaLink="true">http://scispace.net/tow21/weblog/1219.html</guid>
            <pubDate>Mon, 21 Apr 2008 21:49:05 GMT</pubDate>
            <description><![CDATA[<p><span class="blog_post_source"><a href="http://feeds.feedburner.com/~r/Uszla/blog/~3/274963001/21">http://feeds.feedburner.com/~r/Uszla/blog/~3/274963001/21</a></span></p> <p>As I've mentioned before, all the entries in this blog are written
in <a href="http://www.methods.co.nz/asciidoc/"  title="//www.methods.co.nz/asciidoc/"  class="http">asciidoc</a>, which is very nice
for a lightweight markup language, particularly in terms of
embedding code fragments and having them marked up nicely.</p>
<p><a href="http://www.methods.co.nz/asciidoc/"  title="//www.methods.co.nz/asciidoc/"  class="http">Asciidoc</a> uses
<a href="http://www.gnu.org/software/src-highlite/"  title="//www.gnu.org/software/src-highlite/"  class="http">GNU Source-highlight</a> as
its backend for generating pretty code fragments, which does a
reasonable job, and <a href="http://www.lorenzobettini.it/"  title="//www.lorenzobettini.it/"  class="http">its author</a> is very
responsive to feedback - he's fixed a couple of bugs in the Fortran
and XML modules for me.</p>
<p>However, I've been growing dissatisfied with its use, for two reasons.</p>
<ol>
<li>
<p>
It has a <strong>very</strong> heavyweight dependency on <a href="http://www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html"  title="//www.boost.org/doc/libs/1_35_0/libs/regex/doc/html/index.html"  class="http">Boost</a>, for its regex library.
Compiling Boost takes several hours, which seems to me like massive overkill for a bit of code highlighting.
</p>
</li>
<li>
<p>
It is purely regex-driven. Furthermore, all language front-ends are defined in terms of a mini-regex language. This means that its markup
capabilities are fundamentally limited to a very simple regex subset.
</p>
</li>
</ol>
<p>In any case, it can't approach the expressiveness of <a href="http://www.gnu.org/software/emacs/"  title="//www.gnu.org/software/emacs/"  class="http">Emacs</a> font-lock
highlighting, which is what I'm used to.</p>
<p>So, I thought it ought to be possible to abuse one of <a href="http://www.emacswiki.org/cgi-bin/wiki/SaveAsHtml"  title="//www.emacswiki.org/cgi-bin/wiki/SaveAsHtml"  class="http">several available emacs-lisp packages</a> to do the job, and indeed it was. The <a href="http://source.uszla.me.uk/misc/htmlfontify"  title="//source.uszla.me.uk/misc/htmlfontify"  class="http">script available here</a> is a wrapper around a modified
version of <a href="http://rtfm.etla.org/emacs/htmlfontify/"  title="//rtfm.etla.org/emacs/htmlfontify/"  class="http">htmlfontify</a>, and works like so:</p>
<div class="listingblock">
<div class="content">
<pre>
htmlfontify -mode $<span class="htmlfontify-variable-name">MODENAME</span> $<span class="htmlfontify-variable-name">FILENAME</span>
</pre></div></div>
<p>If <tt>$FILENAME</tt> is <tt>-</tt>, it reads its input from <tt>stdin</tt> instead. It prints
a properly marked-up fragment of HTML on <tt>stdout</tt>, fontified as Emacs
would fontify it in <tt>$MODENAME-mode</tt>.</p>
<div class="admonitionblock">
<table><tr>
<td class="icon">
<div class="title">Note</div>
</td>
<td class="content">Importantly, it is entirely standalone, with no dependencies beyond Emacs 21 or better, which is installed everywhere these days.</td>
</tr></table>
</div>
<p>It was easy to write an asciidoc filter around this, so code in this blog will henceforth be marked up by Emacs.</p>
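<p>The filter itself isn't reproduced here. The following is a rough sketch of the plumbing only, assuming the <tt>htmlfontify</tt> script is on <tt>PATH</tt> and behaves as described above. It shows a Python helper that an asciidoc filter could call. Both function names are hypothetical.</p>
<div class="listingblock">
<div class="content">
<pre>
import subprocess


def htmlfontify_command(mode, script='htmlfontify'):
    """Build the command line described above, reading code from stdin."""
    return [script, '-mode', mode, '-']


def fontify(code, mode, script='htmlfontify'):
    """Pipe `code` through htmlfontify and return its HTML fragment.

    `mode` is the Emacs major mode name without the -mode suffix,
    e.g. 'python' for python-mode.
    """
    result = subprocess.run(htmlfontify_command(mode, script),
                            input=code, capture_output=True,
                            text=True, check=True)
    return result.stdout
</pre></div></div>
<p>For example, <tt>fontify(source, 'python')</tt> would return the marked-up HTML for a Python fragment, ready to drop into the page.</p>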
<a href="http://uszla.me.uk/space/blog/2008/04/21"  title="//uszla.me.uk/space/blog/2008/04/21">@</a>]]></description>
        </item>
                
        <item>
            <title><![CDATA[Finder WebDAV bugs, part II]]></title>
            <link>http://scispace.net/tow21/weblog/1188.html</link>
            <guid isPermaLink="true">http://scispace.net/tow21/weblog/1188.html</guid>
            <pubDate>Wed, 16 Apr 2008 14:23:32 GMT</pubDate>
            <description><![CDATA[<p><span class="blog_post_source"><a href="http://feeds.feedburner.com/~r/Uszla/blog/~3/271602055/16">http://feeds.feedburner.com/~r/Uszla/blog/~3/271602055/16</a></span></p> <p>As a follow-up to my last post on this, some good news
and some bad.</p>
<p>Good news: Apple Engineering got back to me quickly, with a good understanding
of the authentication issue, and a good suggestion for how they might fix it. Of course
the fix won't emerge until at least 10.5.3.</p>
<p>Bad news: there is another bug lurking in Finder's WebDAV implementation
that I keep coming across. I haven't characterized it well enough to
report, but I'm noting it down here so Google has some record of it at
least.</p>
<p>The symptom is that after the state of the webdav server changes in
some way (certainly not every time it changes; I <strong>think</strong> this occurs
when a directory that was previously readable has become unreadable because
its permissions have changed), Finder refuses to eject the mounted disk,
with one of two error messages.</p>
<ol>
<li>
<p>
It complains that the disk is in use, even when it isn't, as can be
confirmed with <tt>lsof</tt>:
</p>
<div class="listingblock">
<div class="content">
<pre><tt>sh-3.2# lsof /Volumes/webdav_mount
lsof: WARNING: can't stat() webdav file system /Volumes/webdav_mount
      Output information may be incomplete.
      assuming "dev=2d000009" from mount table
</tt></pre></div></div>
<p>The fix for this is a simple</p>
<div class="listingblock">
<div class="content">
<pre><tt>umount -f /Volumes/webdav_mount
</tt></pre></div></div>
</li>
<li>
<p>
It gives the unhelpful message: "error code -8072"
</p>
<p>In this case, the fix is first to unmount the disk with <tt>umount</tt> as above, and then
to restart Finder.</p>
<div class="admonitionblock">
<table><tr>
<td class="icon">
<div class="title">Note</div>
</td>
<td class="content"><strong>Make sure to unmount the disk first!!!</strong> If you try and restart Finder (or logout, or
reboot) without doing so, then the OS is liable to hang in an unretrieveable state,
so that only pulling the power cable fixes it - which has happened to me more than
once.</td>
</tr></table>
</div>
</li>
</ol>
<a href="http://uszla.me.uk/space/blog/2008/04/16"  title="//uszla.me.uk/space/blog/2008/04/16">@</a>]]></description>
        </item>
                
        <item>
            <title><![CDATA[shell history]]></title>
            <link>http://scispace.net/tow21/weblog/1135.html</link>
            <guid isPermaLink="true">http://scispace.net/tow21/weblog/1135.html</guid>
            <pubDate>Thu, 10 Apr 2008 16:26:55 GMT</pubDate>
            <description><![CDATA[<p><span class="blog_post_source"><a href="http://feeds.feedburner.com/~r/Uszla/blog/~3/267795158/26">http://feeds.feedburner.com/~r/Uszla/blog/~3/267795158/26</a></span></p> <p>Borrowed from <a href="http://plasmasturm.org/log/497/"  title="//plasmasturm.org/log/497/"  class="http">plasmasturm</a>.</p>
<div class="listingblock">
<div class="content">
<pre><tt>sloth<span style="color:#990000">:~</span> tow$ <span style="font-weight: bold"><span style="color:#0000FF">history</span></span><span style="color:#990000">|</span>awk <span style="color:#FF0000">'{a[$2]++} END {for(i in a){printf "%5d</span><span style="color:#CC33CC">\t</span><span style="color:#FF0000">%s</span><span style="color:#CC33CC">\n</span><span style="color:#FF0000">",a[i],i}}'</span><span style="color:#990000">|</span>sort -rn<span style="color:#990000">|</span>head
   <span style="color:#993399">99</span>   ls
   <span style="color:#993399">85</span>   cd
   <span style="color:#993399">66</span>   git
   <span style="color:#993399">53</span>   xsltproc
   <span style="color:#993399">44</span>   vi
   <span style="color:#993399">38</span>   ssh
   <span style="color:#993399">22</span>   python
   <span style="color:#993399">15</span>   grep
    <span style="color:#993399">8</span>   wget
    <span style="color:#993399">8</span>   rm
</tt></pre></div></div>
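<p>For comparison, here is a rough Python equivalent of the same pipeline, offered as a sketch rather than a drop-in replacement. Like the awk version, it assumes that each line of <tt>history</tt> output has the form <tt>NUMBER COMMAND ARGS...</tt>, so that the command name is the second whitespace-separated field; the function name <tt>top_commands</tt> is made up for illustration.</p>
<div class="listingblock">
<div class="content">
<pre><tt>from collections import Counter

def top_commands(history_lines, n=10):
    """Count command names in shell history, most frequent first."""
    counts = Counter()
    for line in history_lines:
        # "  123  git status" splits into ["123", "git", "status"];
        # like awk's $2, take the second field as the command name.
        fields = line.split()
        if fields[1:]:  # skip lines with fewer than two fields
            counts[fields[1]] += 1
    # most_common() sorts by descending count, like "sort -rn | head".
    return counts.most_common(n)
</tt></pre></div></div>
<p>Given <tt>sys.stdin</tt> (or any iterable of lines), it returns (command, count) pairs, most frequent first. Unlike awk, it skips lines with fewer than two fields rather than counting them under an empty name.</p>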
<a href="http://uszla.me.uk/space/blog/2008/04/10/16/26"  title="//uszla.me.uk/space/blog/2008/04/10/16/26">@</a>]]></description>
        </item>
                
        <item>
            <title><![CDATA[The problem with visual programming languages]]></title>
            <link>http://scispace.net/tow21/weblog/1136.html</link>
            <guid isPermaLink="true">http://scispace.net/tow21/weblog/1136.html</guid>
            <pubDate>Thu, 10 Apr 2008 16:13:00 GMT</pubDate>
            <description><![CDATA[<p><span class="blog_post_source"><a href="http://feeds.feedburner.com/~r/Uszla/blog/~3/267772818/10">http://feeds.feedburner.com/~r/Uszla/blog/~3/267772818/10</a></span></p> <p>People seem to like <a href="http://en.wikipedia.org/wiki/Visual_programming_language"  title="//en.wikipedia.org/wiki/Visual_programming_language"  class="http">visual (or graphical) programming languages</a>, (VPLs), but I don't think they should.</p>
<h4>Reports of success</h4>
<p>This was prompted by a talk on Monday, at the <a href="http://royalsociety.org/event.asp?id=6066"  title="//royalsociety.org/event.asp?id=6066"  class="http">Royal Society meeting on environmental e-Science</a>. One of the speakers (I forget who) was demonstrating a workflow system, run through a VPL environment. He talked about having given a workshop, showing scientists how to use the system, and quoted one of them as saying (paraphrased from memory):</p>
<div class="quoteblock">
<div class="quoteblock-content">
<p>I've accomplished in one afternoon what it took me the whole of Summer 2005 to do.</p>
<div class="attribution">
</div></div></div>
<p>and used this as evidence for how wonderful such "friendly" VPL environments are.</p>
<p>I've been to a number of talks where such things are shown off, and it's undoubtedly true that placing these tools into the hands of some scientists does result in such reactions (although I suspect less often than their proponents like to think, and I'm not sure how long-lived such reactions are).</p>
<p>However, contrary to the conclusions usually drawn, I don't think that the praise should be given to the VPL environment. Indeed, I think such environments are actively harmful, and that the same reactions could be obtained by other means.</p>
<h4>Cleaner interfaces</h4>
<p>I suspect that such positive reactions aren't caused by the visual nature of the environment so much as by the fact that (compared with the typical workflow system of bodged-together Perl scripts)</p>
<ol>
<li>
<p>
interfaces between components are much simpler,
</p>
</li>
<li>
<p>
they've been pre-written by someone else, and
</p>
</li>
<li>
<p>
they've been designed to plug together.
</p>
</li>
</ol>
<p>But because humans are very visual creatures, it is the obvious differences in the visual aspect of the interface that get noticed, and it is to these that the positive effects are ascribed. There may well be a small advantage there, but I think it is fairly small and very easily overstated; see for example "<a href="http://portal.acm.org/citation.cfm?id=203251"  title="//portal.acm.org/citation.cfm?id=203251"  class="http">Why looking isn't always seeing: readership skills and graphical programming</a>", which I don't think enough of the relevant people have read.</p>
<p>When pulling things together using bodged Perl scripts, you are reliant on whatever interfaces someone else has written, both to the script itself and to any other programs it calls.</p>
<p>These interfaces are probably not quite what you want for your purposes, so you'll need to munge them a bit.</p>
<p>They're almost certainly not well-designed; indeed, probably little thought has gone into interface design at all, as opposed to ensuring that the necessary scientific job gets done.</p>
<p>They may not be very functionally oriented; i.e. not well-suited to being called as part of a larger workflow. For example, there may be lots of out-of-band set-up required in the way of global environment variables and so forth.</p>
<p>All of these problems will be ameliorated when building components for any workflow system, certainly one of the type likely to underlie typical VPLs. Interface design will be an integral part of making components that fit together into workflows of the type envisaged; components will have been co-designed, so that the interfaces between them suit their intended use; and they will have to be built so that they don't rely on global state.</p>
<p>The end result is that the components of such a system can be pulled together and made to interact far more easily than a set of programs designed in isolation with little thought for reuse. (Though if badly done, this may result in components which can't easily be reused outside the original workflow domain.) And of course this is true regardless of what programming interface is used to edit the resulting workflows.</p>
<p>However, I also believe that beyond this, VPLs are actively harmful.</p>
<h4>Text-munging tools</h4>
<p>My objection boils down to the fact that we have an enormous range of text-munging tools, but we have far fewer tools for munging whatever graphical representations are fed to us on the screen.</p>
<p>The issue that most obviously shows itself to me is version control, or revision tracking.</p>
<p>As any halfway-sensible programmer does, I keep all my projects under version control. The reasons are well-rehearsed, but I firmly believe that just as any program longer than 10 lines probably has a bug, any program longer than 10 lines should be kept under version control. The same applies to programs in VPLs. If you've got more than 10 or so components strung together, each of which probably has 5 or 6 tunable parameters, then you want to be able to preserve the state of the system, and record changes between them.</p>
<p>And this is not just so that you can roll back to previous versions, but also so that you can usefully compare source trees:</p>
<ul>
<li>
<p>
so that you can see the difference between last week's and this week's version
</p>
</li>
<li>
<p>
so that you can see the difference between your version and your colleague's version.
</p>
</li>
<li>
<p>
so that you can usefully merge in adaptations from variant source trees.
</p>
</li>
</ul>
<p>The tools for doing this are well-established for text source, as are standard
practices for improving source code. For example, reformatting (whitespace-only)
updates to source code are often deliberately kept separate from semantic changes.
Similarly, related changes are often grouped together into changesets.</p>
<p>Such practices facilitate the use of text-based tools, and as a result, management of multiple similar source trees is well understood and documented.</p>
<p>By comparison, if code were stored, for example, in a Word document (or, to make the same point more extravagantly, as a bitmap image showing the text), this would be impossible. The code would be just as easy for the programmer to read, but you'd lose all the power of automatic comparison: the stored files would diverge wildly with every minor change. This wouldn't stop you using version control and backtracking to previous versions, but your diffs would carry no useful information.</p>
<p>And so with visual programming. Clearly there is some serialization underlying whatever interface you're given, and more than likely that serialization is textual. So in principle, you could keep your programs under version control (though your IDE might not make it very easy).</p>
<p>However, the visual interface is at liberty to rewrite the serialization without consulting you; and what might be a minor, or indeed insignificant change to you (moving a box without changing its connectors; adding one additional connector) might well result in the serialization being completely restructured.</p>
<p>As a result, you end up in the same situation as with your Word-encoded source. You can't compare your workflows with a colleague's. You can't trivially diff your current version against that of 6 months ago and identify the key differences. You can't check it against yesterday's version and work out which change caused the testsuite breakage you noticed this morning.</p>
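<p>To make this concrete, here is a small sketch using Python's standard <tt>json</tt> and <tt>difflib</tt> modules. The workflow, its component names and its layout fields are invented, and real VPL file formats will differ; <tt>shuffled_dump</tt> is a hypothetical stand-in for an editor that is free to rewrite its file in a different order on each save. The same one-parameter edit is diffed against yesterday's file twice: once with a stable serialization, once with the shuffled one.</p>
<div class="listingblock">
<div class="content">
<pre><tt>import copy
import difflib
import json

# A hypothetical workflow: two components, their on-screen positions,
# and one connection between them.
workflow = {
    "components": {
        "fetch": {"x": 10, "y": 20, "url": "http://example.org/data"},
        "plot": {"x": 200, "y": 20, "colour": "red"},
    },
    "connections": [["fetch", "plot"]],
}

def stable_dump(wf):
    # Sorted keys and fixed indentation: the same logical content
    # always serializes to the same lines of text.
    return json.dumps(wf, sort_keys=True, indent=2).splitlines()

def shuffled_dump(wf):
    # Stand-in for an editor that reorders things when it saves
    # (here, reverse key order at every level of nesting).
    def reorder(obj):
        if isinstance(obj, dict):
            return {k: reorder(obj[k]) for k in sorted(obj, reverse=True)}
        return obj
    return json.dumps(reorder(wf), indent=2).splitlines()

def changed_lines(old, new):
    # Added and removed lines of a unified diff, without the file headers.
    diff = difflib.unified_diff(old, new, lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

# Yesterday's saved version, and today's edit: one parameter changed.
yesterday = stable_dump(workflow)
edited = copy.deepcopy(workflow)
edited["components"]["plot"]["colour"] = "blue"

stable_diff = changed_lines(yesterday, stable_dump(edited))
noisy_diff = changed_lines(yesterday, shuffled_dump(edited))
</tt></pre></div></div>
<p>Here <tt>stable_diff</tt> holds just the removed and re-added <tt>"colour"</tt> line, while <tt>noisy_diff</tt> is many times longer, even though <tt>json.loads</tt> reads both of today's files back as exactly the same workflow.</p>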
<p>Version control is the most obvious useful application of text-munging to me, but there are any number of others: automatic bug-finding tools, refactoring tools, and so on, all of which rely on the well-understood machinery of text analysis.</p>
<p>Even if some of these exist in a given visual IDE (and I wouldn't be surprised if a few did), they certainly don't all, nor would it be easy to transfer them between IDEs, since there is no agreed-upon standard way to do visual programming.</p>
<h4>Source control</h4>
<p>The open source movement relies on source to programs being available. What that means in precise terms begins to break down when we move away from the realm of traditional compiled languages; but in any case, what it means to me is the ability to usefully manipulate and inspect the algorithms behind a program.</p>
<p>You can't do this if you're given only the compiled program - all the usefully-human-readable information is in the source code, and is thrown away when compiling; minor changes in source code can result in very different machine code, and
decompilers are very imperfect instruments.</p>
<p>In fact, that's not quite true; there are three layers of information. There's the human-readable description of what the code should do, which is probably in someone's head. There's the partly human-, partly machine-readable form in the source code, which can be programmatically manipulated (or "compiled", as we like to say) into alternative forms, one of which is the third layer: machine code for execution.</p>
<p>When writing in VPLs, you radically lower the utility of the intermediate stage.
 The human-accessible code is in people's heads, and is transferred to a graphical form that isn't actually very human-readable - at least, not in the sense that I can write tools to do anything with it.</p>
<p>So until people devote as much research to software engineering and code management for non-textual code representations as has gone into textual ones, I don't think VPLs are ever going to be properly successful.</p>
<p>In fact, any code that is written in such environments is information that is being essentially thrown away - expertise that is being just as much wasted as if you threw away the source code to your compiled programs.</p>
<a href="http://uszla.me.uk/space/blog/2008/04/10"  title="//uszla.me.uk/space/blog/2008/04/10">@</a>]]></description>
        </item>
                
        <item>
            <title><![CDATA[NO2ID update]]></title>
            <link>http://scispace.net/tow21/weblog/1103.html</link>
            <guid isPermaLink="true">http://scispace.net/tow21/weblog/1103.html</guid>
            <pubDate>Fri, 04 Apr 2008 12:06:37 GMT</pubDate>
            <description><![CDATA[<p><span class="blog_post_source"><a href="http://feeds.feedburner.com/~r/Uszla/blog/~3/263921968/06">http://feeds.feedburner.com/~r/Uszla/blog/~3/263921968/06</a></span></p> <p>Finishing this political digression:</p>
<p>The motion <a href="http://uszla.me.uk/space/blog/2008/03/31"  title="//uszla.me.uk/space/blog/2008/03/31"  class="http">mentioned earlier</a> passed
under the rules of the local branch. A motion is now going forward to the UCU
national Congress requesting that UCU formally affiliate itself to the NO2ID
campaign.</p>
<p>Thanks to everyone who took the time to email a vote. The next stage is Congress
itself, which is at the end of May.</p>
<a href="http://uszla.me.uk/space/blog/2008/04/04/12/06"  title="//uszla.me.uk/space/blog/2008/04/04/12/06">@</a>]]></description>
        </item>
                
        <item>
            <title><![CDATA[Mac OS X Finder WebDAV client authentication bug]]></title>
            <link>http://scispace.net/tow21/weblog/1104.html</link>
            <guid isPermaLink="true">http://scispace.net/tow21/weblog/1104.html</guid>
            <pubDate>Fri, 04 Apr 2008 12:06:32 GMT</pubDate>
            <description><![CDATA[<p><span class="blog_post_source"><a href="http://feeds.feedburner.com/~r/Uszla/blog/~3/263921969/04">http://feeds.feedburner.com/~r/Uszla/blog/~3/263921969/04</a></span></p> <p>Recording this for the benefit of anyone else who comes across this problem:</p>
<p>The Mac OS X Finder can act as a webdav client, allowing data exposed through
a webdav interface to be mounted as an apparently native filesystem. This is
how, for example, <a href="http://www.apple.com/dotmac/idisk.html"  title="//www.apple.com/dotmac/idisk.html"  class="http">iDisk</a> works.</p>
<p>WebDAV authentication is handled the same way as in HTTP: for any given
operation, the server can demand authentication by returning a 401 error
and requiring an authentication token from the client. This token can
take various forms, but the simplest is Basic authentication: a username/password
combination.</p>
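<p>Basic authentication is simple enough to sketch in a few lines of Python. After a 401 response carrying a <tt>WWW-Authenticate: Basic</tt> challenge, the client joins username and password with a colon, base64-encodes the result, and sends it in an <tt>Authorization</tt> header. The function names are made up for illustration, and encoding the credentials as UTF-8 is an assumption (the original Basic scheme did not specify a character set). Note that the credentials are only encoded, not encrypted.</p>
<div class="listingblock">
<div class="content">
<pre><tt>import base64

def basic_auth_header(username, password):
    # Basic scheme: "Basic " + base64("username:password").
    # UTF-8 is assumed here for non-ASCII credentials.
    raw = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def parse_basic_auth_header(value):
    # Server-side inverse: recover (username, password) from the header.
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split at the first colon only: passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password
</tt></pre></div></div>
<p>For example, <tt>basic_auth_header("Aladdin", "open sesame")</tt> gives <tt>Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==</tt>, the worked example from RFC 2617.</p>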
<p>And indeed, when you ask to mount a webdav filesystem protected by Basic
authentication, Finder will pop up an appropriate dialogue box, and authenticate
you.</p>
<p>However - HTTP authentication works per-operation, not per-server. That is,
different resources on the same server can have different authentication
requirements; indeed different operations on the same resource may have
different requirements (for example, user A may be allowed to read and write,
user B may only be allowed to read).</p>
<p>Finder doesn't completely understand this; or rather, the UI it presents to
deal with this is broken. If you try and mount a directory, only some of
whose sub-directories you have access to, then Finder will try and read all
of the subdirectories. Given that you don't have access to everything it
is trying to read, it will then conclude you don't have access to anything
at all, and will only let you read any resources which are completely
unauthenticated.</p>
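<p>A toy model may make the distinction clearer. In the Python sketch below, access is decided per (path, method) pair, as HTTP allows; the ACL, paths and user names are all hypothetical, and a real server would express such rules in its own configuration. <tt>readable_all_or_nothing</tt> models the faulty inference just described, in which a single refusal is taken to mean the credentials are useless everywhere.</p>
<div class="listingblock">
<div class="content">
<pre><tt># Hypothetical ACL: (path prefix, HTTP method) -> set of users allowed,
# or None if no authentication is needed. The longest matching prefix wins.
ACL = {
    ("/", "GET"): {"alice", "bob"},
    ("/", "PUT"): {"alice"},
    ("/private/", "GET"): {"alice"},
    ("/public/", "GET"): None,
}

def status_for(path, method, user):
    """HTTP status this toy server would return (user None = anonymous)."""
    prefixes = [p for (p, m) in ACL if m == method and path.startswith(p)]
    if not prefixes:
        return 405  # no rule for this method: Method Not Allowed
    allowed = ACL[(max(prefixes, key=len), method)]
    if allowed is None or user in allowed:
        return 200
    return 401  # demand (other) credentials for this resource only

def readable_per_resource(paths, user):
    # What HTTP actually means: each resource is judged on its own.
    return {p for p in paths if status_for(p, "GET", user) == 200}

def readable_all_or_nothing(paths, user):
    # The faulty inference: one refusal anywhere is taken to mean the
    # credentials are useless, so only unauthenticated resources remain.
    if all(status_for(p, "GET", user) == 200 for p in paths):
        return set(paths)
    return {p for p in paths if status_for(p, "GET", None) == 200}
</tt></pre></div></div>
<p>With these rules, <tt>bob</tt> should be able to read <tt>/docs/</tt> and <tt>/public/</tt>; the all-or-nothing view leaves him only <tt>/public/</tt>, the one resource that needs no authentication at all.</p>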
<p>A temporary workaround is to move any such subdirectories elsewhere on the
server, and put temporary HTTP redirects in place. This seems to quieten
Finder down, and give the desired result. Unfortunately, it doesn't scale
usefully to cases where there are many such subdirectories, nor to cases where
permissions are liable to change relatively frequently.</p>
<p>As a result, it's essentially impossible to use Finder as a webdav
client for a server with any complex ACL policy.</p>
<p>I've reported this to Apple as <a href="rdar://problem/5837223"  title="//problem/5837223 on another Wiki"  class="interwiki">bug 5837223</a>.</p>
<a href="http://uszla.me.uk/space/blog/2008/04/04"  title="//uszla.me.uk/space/blog/2008/04/04">@</a>]]></description>
        </item>
        
    </channel>
</rss>