
Geobrowsers :: Blog

December 10, 2008

A friend I walk with is keen on micronavigation; his skill with map and compass is a good match for anyone wielding a GPS. This dependency on the earth's magnetic field, and our need to "correct for magnetic variation", makes me inquisitive about the nature of that field. As illustrated by the links below, it comes as no surprise that a vast amount of intellectual effort has been expended in understanding geomagnetism, from the early days of William Gilbert (1600) [2] up to modern-day MHD computations modelling the geodynamo [1].

I have investigated a simple model based on the idea that the earth's magnetic field may be represented by a single current loop. The field is computed by numerical integration of the Biot-Savart law, with the following parameters:

  • Radius of the current loop is 4000km
  • Loop carries a current of 1500MA

The following scilab script computes the magnetic field due to a current loop over a 22x22x22 region of total size 11Re, where Re is the radius of the earth (Re=6371km). Such a current loop is illustrated below.

For each computed field point the Biot-Savart law is used to compute the field due to the loop; the numerical integration over the current elements is performed using Simpson's rule. The scilab script to perform this computation is linked below. The script requires a function routine to compute the Simpson's rule integration; this is contained in the zip file geomagresources.zip. The resources file also contains the net file used to visualise the results with IBM Data Explorer.
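The approach described above can be sketched in Python. This is a minimal illustration, not the scilab script itself: the names `simpson` and `loop_field` are assumptions introduced here, and the loop is assumed to lie in the z=0 plane, centred on the origin. Only the loop radius and current are taken from the post.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in T m / A


def simpson(f, a, b, n):
    """Composite Simpson's rule for f on [a, b] with n (even) intervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def loop_field(point, radius, current, n=200):
    """Field (Bx, By, Bz) in tesla at `point` due to a circular current loop.

    Integrates the Biot-Savart law, dB = mu0 I / (4 pi) * dl x r / |r|^3,
    around the loop angle phi from 0 to 2 pi.
    """
    px, py, pz = point

    def integrand(component):
        def f(phi):
            # Position of the current element and its direction dl/dphi.
            sx, sy = radius * math.cos(phi), radius * math.sin(phi)
            dlx, dly = -radius * math.sin(phi), radius * math.cos(phi)
            # Vector from the current element to the field point.
            rx, ry, rz = px - sx, py - sy, pz
            r3 = (rx * rx + ry * ry + rz * rz) ** 1.5
            # dl x r, with dl having no z component.
            cross = (dly * rz, -dlx * rz, dlx * ry - dly * rx)
            return cross[component] / r3
        return f

    k = MU0 * current / (4 * math.pi)
    return tuple(k * simpson(integrand(c), 0.0, 2 * math.pi, n) for c in range(3))
```

Calling `loop_field((0.0, 0.0, 6.371e6), 4.0e6, 1.5e9)` gives the field on the loop axis at one earth radius; on the axis the result can be checked against the closed form mu0 I R^2 / (2 (R^2 + z^2)^(3/2)).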

Results calculated using the scilab model and using the bfield2.net data explorer network are shown below.

The scilab script file generates general data descriptions that may be read by IBM Data Explorer. Using IBM Data Explorer to run the visual program bfield2.net, we may view streamlines representing the magnetic field lines and individual magnetic field vectors at each spatial location. The visualisation has a control panel, labelled "controls", that may be used to explore these different aspects of the data set. The results have been compared with those generated using geomagnetic models provided by the geophysical data centre [7] and agree to within an order of magnitude. MHD models of the geodynamo enable an understanding of palaeomagnetism and the evolution of the earth's magnetic field. Results calculated using the MHD geomagnetic dynamo are shown below [1].

In 1838 the German mathematician and magnetician Carl Friedrich Gauss developed a method of representing the magnetic field in terms of a converging series of spherical harmonics, whose terms are functions of latitude, longitude and radial distance from the centre of the earth [5]. A wide range of techniques exists for the computation of magnetic fields, which is important in many medical, scientific and technological disciplines. An interesting method is one that uses an expansion of spherical harmonics in reciprocal space [9].
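For reference, the spherical harmonic representation is usually written as a scalar potential whose gradient gives the field. The notation below is the conventional one and is an assumption here rather than something taken from the linked pages: a is a reference radius, r the radial distance, θ the colatitude, φ the longitude, g and h the Gauss coefficients, and P the (Schmidt quasi-normalised) associated Legendre functions.

```latex
V(r,\theta,\phi) = a \sum_{n=1}^{N} \left(\frac{a}{r}\right)^{n+1}
    \sum_{m=0}^{n} \left[ g_n^m \cos(m\phi) + h_n^m \sin(m\phi) \right] P_n^m(\cos\theta),
\qquad
\mathbf{B} = -\nabla V
```

The n=1 terms describe a dipole, which is the part of the field that the single current loop model approximates at large distances.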

In the final blog entry in this series of three we will investigate charged particle motion in the earth's magnetic field.

Links

  1. Geodynamo simulations using MHD
  2. De magnete by William Gilbert (1600)
  3. The geodynamo
  4. Magnetic field of a current loop
  5. Gauss spherical harmonic model for representing the geomagnetic field
  6. Geophysical data centre- Geomagnetism
  7. Geophysical data centre- Geomagnetic models and software
  8. Numerical integration techniques for computing magnetic fields
  9. Calculating magnetic field using semi analytical methods - reciprocal space expansion
  10. IBM Data Explorer
  11. Scilab

Posted by Mike Griffiths | 0 comment(s)

October 31, 2008

This is the first of three articles about charged particle motion in electromagnetic fields and the earth's magnetic field. The articles will include simple demonstrations built using the matlab clone scilab and the visualisation tool IBM Open Data Explorer; both are open source applications. This first article describes a simple scilab-based application for modelling charged particle motion. The Lorentz force can be used to model a wide range of systems and phenomena, including

  • Motion of particles in colliders and their detectors e.g. the CERN Large Hadron Collider
  • Understanding the solar interior and atmosphere
  • Understanding the charged particles in the ionosphere e.g. the aurora borealis
  • Confinement of plasmas for experimental fusion reactors
  • Focusing of beams for electron microscopy

The Lorentz force is the force on a point charge due to electric and magnetic fields. The electric field gives rise to a force proportional to both the charge and the electric field intensity. The force generated by the magnetic field is perpendicular to the plane formed by the particle velocity vector and the magnetic field; this explains the action of the vector cross product term.
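Written out in the standard form (the symbols are the conventional ones and are assumed here: q is the charge, v the particle velocity, E and B the electric and magnetic fields), the force described above is:

```latex
\mathbf{F} = q\left(\mathbf{E} + \mathbf{v} \times \mathbf{B}\right),
\qquad
\left|\mathbf{F}_{\text{mag}}\right| = |q|\, v\, B \sin\theta
```

Here θ is the angle between v and B, so the magnetic part vanishes for motion along the field and does no work on the particle.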

The cross product term can be understood from the relativistic nature of the electromagnetic interaction. It is important to remember that the relativistically covariant Maxwell equations and the special theory of relativity enable us to understand the unified nature of the single electromagnetic interaction. When the Lorentz transformations are applied to the electric field we obtain a cross product relationship between the velocity of the particle and the magnetic field; it therefore appears that the magnetic interaction is generated by a relativistic effect. An article in the links provides a good description.

The scilab script uses the Lorentz force to update the position of a particle in a constant and uniform electromagnetic field. The equations of motion are solved using a simple Euler integration step. When executed, the script starts a number of dialogs in turn, requesting the user to

  • Define the initial velocity
  • Define the b field
  • Define the e field
  • Provide a title for the plot
  • The plot is drawn
  • The user is asked if they want to save the plot, if yes and OK are clicked a file save dialog opens.

The particle mass and charge are hard coded at the start of the script but can be altered if the user requires. Not surprisingly there is a lot of information about the Lorentz force and charged particle motion, including some interesting video content on YouTube; one such link is in the useful links section below.
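The update scheme described above can be sketched in Python. This is an illustration under stated assumptions, not a translation of the scilab script: the function names are hypothetical, the fields are uniform and constant as in the post, and the dialogs and plotting are left out.

```python
def cross(a, b):
    """Vector cross product of two 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def lorentz_acceleration(v, e_field, b_field, charge, mass):
    """Acceleration a = (q/m) (E + v x B) for a point charge."""
    vxb = cross(v, b_field)
    return tuple(charge / mass * (e + w) for e, w in zip(e_field, vxb))


def euler_trajectory(r0, v0, e_field, b_field, charge, mass, dt, steps):
    """Advance position and velocity with a simple forward Euler step.

    Returns the list of positions (including the start) and the final velocity.
    """
    r, v = tuple(r0), tuple(v0)
    path = [r]
    for _ in range(steps):
        a = lorentz_acceleration(v, e_field, b_field, charge, mass)
        # Position uses the old velocity, then velocity uses the old acceleration.
        r = tuple(ri + vi * dt for ri, vi in zip(r, v))
        v = tuple(vi + ai * dt for vi, ai in zip(v, a))
        path.append(r)
    return path, v
```

One design point worth knowing: in a pure magnetic field the forward Euler step slowly increases the particle speed, so circular orbits spiral outwards unless the time step is small compared with the gyration period.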

 

Useful Links

Scilab script requires the lorentz force function file

Matlab script requires the lorentz force function file

Scilab home page

IBM Open data explorer

Wikipedia on Lorentz force

Wikibook on the lorentz force

Science world info about the lorentz force

Article about relativistic transformation of electromagnetic fields

YouTube demonstration of Lorentz force

Keywords: lorentz, matlab, scilab

Posted by Mike Griffiths | 0 comment(s)

September 19, 2008

http://feeds.feedburner.com/~r/Uszla/blog/~3/397414246/19

I use nose for my Python tests. It's not the only Python testing framework out there, but it seems to fit my needs.

Anyway; so nose has this concept of plugins, which let you extend test discovery, or add extra fixtures, or whatever. Indeed, nose's core functionality is implemented by bundled plugins. It picks up all available plugins automatically by scanning the entrypoints from packages installed by setuptools. This has the irritating effect that

  • nose itself can't work without being installed, since it needs to find its own bundled plugins.

  • any additional plugins you write or use have to be installed as well.
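To see why installation is required, it helps to look at what entry point scanning involves. The sketch below uses the standard library's `importlib.metadata` rather than setuptools' own machinery, so it is an analogy for the mechanism and not nose's actual code; the helper name `plugin_names` is hypothetical, and the group name `nose.plugins.0.10` mentioned afterwards is my recollection of what nose registers, so treat it as an assumption.

```python
from importlib.metadata import entry_points


def plugin_names(group):
    """Names of entry points that installed distributions advertise for `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Pythons return a selectable collection.
        selected = eps.select(group=group)
    else:
        # Older Pythons return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)
```

Something like `plugin_names("nose.plugins.0.10")` would list only plugins whose packages have installed metadata; a plugin module merely sitting on the local path has no such metadata and is never seen.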

Now I don't like this at the best of times; I get annoyed by software that insists it knows better than me where it should live, and I especially don't like blindly installing new software which might go & stomp all over existing installed software. Once you introduce versioning into the equation, I get even more annoyed; you end up with a Python version of DLL Hell, with one application needing version 0.8 of a package, and another needing version 0.9.

But - since most of the rest of the world apparently shares none of my concerns with these issues, I struggle manfully onwards.

This week, there's been a thread on the testing-in-python mailing list "why you should distribute tests with your application / module". I agree 100% - tests should always be distributed with applications; I think almost all of the software I've ever written has had a bundled test-suite. (This was particularly useful for Fortran software, where there is such a wide range of compilers, but it still helps in tracking down system-specific issues even in Python).

Unfortunately, nose's setup.py requirements fly in the face of this; most users won't have nose installed. I could just about forgive nose for this, if I could rely on distributing custom plugins with my package, and being able to pick them up from the local path; but I can't even do that.

It's also worth noting that this is an issue even for software with a very limited distribution. If you're working collaboratively on a project, then all your colleagues need to be able to run tests too; and if you're working in a heterogeneous environment, then adding additional dependencies and installation requirements becomes rapidly onerous, and liable to piss off your co-developers.

Anyway. To round this story off with at least a moderately cheerful ending, I was happy enough with nose's usability not to abandon it, but pissed off with its requirements enough to try and fix them. There's a patch in the nose bug-tracker which at least partly fixes the issue, so that nose will pick up plugins from sys.path.

I suspect in the long term, though, the answer to most of these issues lies in the use of virtualenv. Enough people insist on requiring setuptools-based install, that it will probably be easier simply to isolate every app with its own dependencies in a virtualenv, and just distribute that instead.

In the meantime, for anyone actually reading this; REQUIRING SETUPTOOLS IS FUCKING ANNOYING, MMM'KAY? DON'T DO IT


Posted by Toby White | 0 comment(s)

September 11, 2008

http://feeds.feedburner.com/~r/Uszla/blog/~3/389874272/11

So, there's been a bit of a hiatus in my blogging activity, which has coincided with a change in my job.

I'm no longer employed by the university - as of the start of August I've been working as a founder of a startup. We're still in stealth mode, so output here will be work-related, but not too revealing, initially at least. I think it's probably safe to say that there will be much more Python than Fortran from now on!

Anyway, I thought it good practice to start writing English again, after several weeks of nothing but Python. Naturally of course these English words will concern Python …

So today I first used Python descriptors in anger. The particular pattern used I hadn't seen before, so I thought I'd write about it.

The problem I faced was how to nicely deal with an object which is expensive to initialize, which there should only be one of, and which is used by a number of other objects. If I were writing in Java, this would be a classic use-case for a Singleton, with some form of delayed initialization. How to do it in a more Pythonic way, though?

The easiest way to get Singleton-ish behaviour is probably to have the ExpensiveObject defined in its own module, with one instance instantiated as a module-level variable, and thus initialized on module import. This means that any other objects which need access to it can simply have a class attribute pointing at it.

elsewhere.py:
class ExpensiveObject(object):
    ...
expensive_instance = ExpensiveObject()
user.py:
class ObjectUser(object):
    from elsewhere import expensive_instance
    reference = expensive_instance
    ...

This doesn't delay instantiation, though - the instantiation is performed whenever the ObjectUser definition is processed. Since expensive_instance isn't always needed, it's annoying to have to always create it.

In order to avoid that, clearly we need to remove the expensive_instance from elsewhere and replace the ObjectUser attribute reference with a function call.

We could do this in ObjectUser by overriding its __getattr__ appropriately, to do the normal trick where we check whether expensive_instance is defined on this object and, if not, put it there:

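def __getattr__(self, name):
    # Attach the shared instance to the class the first time it is requested.
    if name == 'reference' and name not in self.__dict__:
        from elsewhere import expensive_instance
        setattr(self.__class__, name, expensive_instance)
    # The normal lookup now finds it (or raises AttributeError for other names).
    return object.__getattribute__(self, name)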

which has a few problems.

  • Firstly, this involves doing this check for every attribute access on this object, which is an unnecessary price.

  • Secondly, if we are doing lots of __getattr__ tricks for other attributes as well, it's messy to have them all in the same method.

  • Thirdly, we've set expensive_instance to be a class attribute, which means that every class for which we do this will get its own expensive_instance.

We could solve the second two issues with inheritance - have a small class (ExpensiveFactory?) which does nothing but override __getattr__ for the attribute of interest. This isolates the __getattr__ logic for this attribute, and makes sure that only one copy of expensive_instance is instantiated (as a class variable of ExpensiveFactory).

class ObjectUser(ExpensiveFactory):
    ...

But: we still haven't solved the first problem (speed of __getattr__) and we've introduced another - if any of these child classes want to override __getattr__, they have to remember to call super() all the way up the inheritance hierarchy (see Python's Super considered harmful).

Anyway - so this (I think) was exactly the reason that descriptors were invented. Instead of ExpensiveFactory, we have ExpensiveDescriptor:

elsewhere.py:
class ExpensiveDescriptor(object):
    _expensive_instance = None

    def __get__(self, instance, owner):
        # Create the shared ExpensiveObject on first access only.
        if self.__class__._expensive_instance is None:
            from elsewhere import ExpensiveObject
            self.__class__._expensive_instance = ExpensiveObject()
        return self.__class__._expensive_instance
user.py:
import elsewhere

class ObjectUser(object):
    expensive_instance = elsewhere.ExpensiveDescriptor()
    ...

Whenever ObjectUser().expensive_instance is accessed, the descriptor's __get__ method is invoked, and on the first access an ExpensiveObject is created - but not before then.

This happens for ObjectUser, and any classes which inherit from it, without any further interference in them.

And, __get__ is implemented to have no cost when accessing any other attributes.

And, of course, since _expensive_instance is a class attribute of the Descriptor, there should only ever be one created.

Actually, you could have ExpensiveDescriptor manipulating the module attribute elsewhere.expensive_instance - this would let you get at the expensive_instance from anywhere in the code without having to go through an object - but only after it had been instantiated by one of the accessing objects. Might or might not be useful, depending on your use cases.
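To make the behaviour concrete, here is a self-contained sketch of the same pattern in a single file. The `created` counter and the second user class `AnotherUser` are additions made purely for illustration; they are not part of the original code.

```python
class ExpensiveObject:
    """Stand-in for something costly to build; counts how often it is created."""
    created = 0

    def __init__(self):
        type(self).created += 1


class ExpensiveDescriptor:
    """Non-data descriptor that builds one shared ExpensiveObject on first access."""
    _expensive_instance = None

    def __get__(self, instance, owner):
        cls = type(self)
        if cls._expensive_instance is None:
            cls._expensive_instance = ExpensiveObject()
        return cls._expensive_instance


class ObjectUser:
    expensive_instance = ExpensiveDescriptor()


class AnotherUser:
    expensive_instance = ExpensiveDescriptor()
```

Defining the classes creates nothing. The first access to `ObjectUser().expensive_instance` builds the object, and `AnotherUser().expensive_instance` then returns that same object, because the cached instance lives on the descriptor class rather than on either user class.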

Anyway, so that's why descriptors are brilliant! For more reading, try:

  • How-To Guide for Descriptors

  • Python Descriptors part 1 of 2

  • Python Descriptors part 2 of 2


Posted by Toby White | 0 comment(s)

August 22, 2008

This week work has continued with understanding the visualisation process for MHD data generated using VAC. It has been quite challenging for the following reasons:

  • Processing a fairly large volume of data: 415 slices from a data set of 12 double precision fields over an array of 1976x400, held in a binary data file of approximately 40GB
  • For the model under investigation it was necessary to learn how to use the IDL visualisation tool
  • In order to visualise data using data explorer it was necessary to convert the data to an ascii format and then undertake some post processing and data reduction so as to enable data exploration; this was both time and space consuming
  • FORTRAN format statements within the data conversion routines provided by VAC resulted in the incorrect translation of data; this was found by using both matlab and IBM data explorer for visualisation.

Current Objectives 

  1. Enable data processing and production of visualisation output using a selection of different compute resources
  2. Process and visualise MHD data generated and stored on range of different compute resources
  3. Automate the production of metadata
  4. Develop visualisation tools enabling researchers to work collaboratively with a data set visualisation

The objectives identified above require that we use open source applications such as IBM data explorer and matlab clones such as scilab. Code for enabling collaborative visualisation has already been tested using IBM DX. Before testing modules using data explorer, output was checked against the output generated from existing routines with IDL. A rapid introduction to IDL was required, with frequent reference to the following

IDL tutorial

IDL tutorial2 

IDL online help 

Visualisation routines for IDL had already been written for the problem under investigation; these were fairly easy to use and generated high quality visualisations. However, given a lack of experience, it was difficult to generate new visualisations. The two other difficulties with IDL were as follows

  • Licensed package, only available on platforms where a valid license is available
  • Tools for collaborative visualisation have not yet been developed for IDL
From the tests undertaken this week, and given the tools provided by VAC, it is clear that the resulting application or research environment should enable researchers to exploit the benefits of the different analysis and visualisation tools. Thus the application will allow the user to provide scripts for driving selected visualisation tools.

Revised Objectives

  1. Prepare scripts that will be used with the EASA portal
  2. Write matlab data translation and processing task for VAC using the parfor loops with the matlab parallel computing toolbox.
  3. Use matlab and vac programs to generate jpg files from plots, use parallel matlab to run through all the steps
  4. Use DX to enable distributed processing on separate nodes   
  5. Use dx to enable customisable collaborative visualisation of data
  6. Write scilab equivalent of data translation task
  7. Investigate use of IBM DX data import module for importing binary VAC data directly into DX  
In the next phase of this work we are preparing the scripts for creating and running VAC models; one of the tasks will be to compare performance of the model with different parallel interconnects. An important test of the scripts will be to examine the possibility of applying them to a model provided by an independent researcher. Development of code enabling automated metadata capture is in progress.

 

Keywords: data, explorer, IBM dx, IDL, matlab, MHD, Visualisation

Posted by Mike Griffiths | 0 comment(s)

August 15, 2008

It is always rewarding to visualise data sets, particularly those visualisations that exhibit characteristics of the systems that we are attempting to model. I've had such an experience today.  

Data visualisation is a key part of the scientific research process. It

  • is an important stage in the knowledge creation and discovery process
  • aids the validation of data sets

The previous post describes some of the MHD modelling we are attempting to undertake. The simulation generated 400 time series points from a total of 80000 iterations and the stored configuration size was an array of 3x 2-vectors and 4 scalars. For each time step there is a total of 1976x400 points. Visualisation of the raw data can be achieved using tools such as Matlab, IDL, AVS or IBM Data Explorer.

For the case considered, visualising the raw data is quite challenging, and a recommended approach is to post-process the data by averaging or sampling from the computed mesh. The first stage in this process was the use of the convertdata routine provided with VAC.

The advantages of using data explorer include

  • Well supported with good documentation and an active user forum
  • A visual programming environment enabling rapid development of applications
  • A powerful data buffering capability; once a large data set has been read it is possible to modify and re-execute the program without having to reload the data set.
  • Each of the modules feature useful descriptions providing user guidelines

For the data set considered here we generated two applications using data explorer. The first application selects data and reduces it to a manageable volume. Very often there is a need to preserve the original "raw" data; this comes at a storage cost. This application can be run non-interactively and submitted to a job queue; given the size of the raw data it can take a few hours.

The second application is used to visualise the data. It is run in interactive mode and is used by the researcher for generating images and movies that might be shared with the community.

The main IBM data explorer modules used for the post processing stage are as follows

  • construct
  • regrid
  • CollectNamed
  • Connect
  • Export

Having imported a data set and selected a member of it, the data can be mapped onto a new grid; for this purpose we use the construct and regrid modules. The construct module is used to specify the form of the new grid: the counts enable us to specify the number of items in the new grid, and the deltas enable us to select the correct items from the input data field. The output from the construct module feeds into the grid input of the regrid module. The selected and imported data feeds into the input of the regrid module.

Having regridded the data, we can collect together all the selected data fields using the CollectNamed module; the resulting data object may then be exported to an output file for use in the final visualisation stage. We mention here the use of the connect module, which is used to provide connection information for the data set; this information enables data interpolation.
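The role of the counts and deltas in the data reduction step can be illustrated with a small Python sketch. This is only an analogy for selecting a coarser regular grid from a finer one: it is not how the Data Explorer modules are implemented, and the function name `subsample` and its arguments are hypothetical.

```python
def subsample(field, origin, counts, deltas):
    """Pick a coarse regular grid of samples out of a 2D field (list of rows).

    origin -- (row, column) index of the first sample
    counts -- number of samples to take in each direction
    deltas -- index stride between samples in each direction
    """
    (z0, x0), (nz, nx), (dz, dx) = origin, counts, deltas
    return [[field[z0 + i * dz][x0 + j * dx] for j in range(nx)]
            for i in range(nz)]
```

For a 1976x400 slice, counts of (494, 100) with deltas of (4, 4) would keep every fourth point in each direction, a sixteen-fold reduction in volume.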

At each stage of the research process, from preparation of the model to the generation of the raw data and the final post processing stages, there are quite a few data transformations. For studies where the researcher investigates a range of state points in the modelling phase space this can lead to an overwhelming collection of data, and the problems increase as the number of compute nodes and storage resources increases. This problem requires the use of metadata capture techniques, including suitable storage, search and querying mechanisms.

In later posts we will review metadata capture techniques and search and querying mechanisms. We will also look at data provenance capture and querying mechanisms. We add here that, given the nature of the research process and the exploration of generated data sets, provenance capture may provide an important research tool; we can envisage parallels with social bookmarking and "MyExperiment" style environments.

 

 

Keywords: data explorer, data provenance, metadata, mhd, visualisation

Posted by Mike Griffiths | 1 comment(s)

August 08, 2008

We've been undertaking further work to understand how the research undertaken by the SPARC group at The University of Sheffield may benefit from techniques promoted by e-Science. Our approach is to understand the user activities and capture requirements for Solar Physics Research. I've had a detailed discussion with one of the members of the research group and I have run through the process for model generation, pre processing, running the model, post processing and data visualisation.

One approach to constructing theoretical models of the sun requires the solution of the equations of Magnetohydrodynamics[1],[2]. The solution of these equations to provide models that might be representative of the solar interior and corona is particularly challenging.

The Versatile Advection Code (VAC) is used for much of the work at Sheffield, in its parallel version. VAC is a collection of FORTRAN source code that features a number of modules that may be developed and included by the user; for example, one of the models developed by the SPARC group investigates MHD for gravitationally stratified media.

The resulting computational tasks for 2D models require at least 10 processors; they take approximately 100hrs of wall clock time (on AMD 64 bit Opteron processors) and generate around 50GB of data. The nature of these computations requires the introduction of more sophisticated management techniques. The group will soon run 3D models, making the provision of these management techniques a more pressing need.

Given the requirement for storage and processing cores it seems that this project may benefit from access to additional compute clusters and the processing power and storage made available. Utilisation of resources in this way increases the complexity of the research process. This necessitates the use of management tools to simplify  the research process. We will provide prototype tools from the web accessible portal described in an earlier posting. To simplify the development task the application will be tested on the Sheffield node of the White Rose grid.

After writing the user module describing the physics for the problem of interest, the researcher undertakes the following steps; we assume here the use of 10 processors.

  1. Change parameters and source expressions hard coded in the user FORTRAN module.
  2. Before compiling the models set the parameters for building the model e.g. switch MPI on (or off) and set the mesh size. The setvac program enables these settings to be made.
    1. ./setvac -on=mpi   (switches MPI on)
    2. ./setvac -g=1976,44
    3. In the above case the model is a 1976x400 model; it will run on 10 processors and each processor will have 1976x(40+2+2) cells,
      which includes 2 layers of ghost cells at each boundary
  3. Edit the initialisation parameters (vacini.par) for example
    1. the domain for this will be 1976,400
    2. set the path to the .ini file
    3. set the atmosphere file; this is the density profile through the solar interior
  4. Edit vac.par
      set filelist; the filenamein and filename need to be set correctly
  5. Make the initialisation and simulation routines; the model must be compiled using the appropriate parallel compiler
    1. make -vacini
    2. make -vac
  6. Generate the vacin file as follows
    1. ./vacini < vacini.par
  7. Distribute the ini file across the processors using the distribution program, e.g.
    ./distribution -D -s =0 /data/username/VAC_NN/2_6Mnzx1976400.ini /data/username/VAC_NN/2_6mnzx1976400_np0110.ini
     Notation is as follows

    Model depth is 2_6Mm
    nzx  1976x400  (z direction is 1976)
    np is number of processors in z and x directions
    01 processors in z direction
    10 processors in x direction
  8. Submit job to batch queue using the correct parallel queue
  9. Gather all the generated data into same data files
    vac4.52/data/distribution 2_6Mnzx1023400_cont2_np0110.out test.out

    the processor rank in filename(if any) is ignored
  10. If successful, remove the old unmerged files
  11. Convert the data to dx format using convertdata
  12. Run the data visualiser
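The grid arithmetic and the processor suffix used in the steps above can be sketched as follows. This is an illustration inferred from the single worked example in the list (1976x400 on 01x10 processors giving -g=1976,44 and the suffix np0110); the function names are hypothetical, and the assumption that ghost cells are only added in a direction that is actually split across processors is read off that example rather than taken from VAC documentation.

```python
def local_grid(nz, nx, npz, npx, ghost=2):
    """Per-processor grid size, including ghost layers in decomposed directions."""
    if nz % npz or nx % npx:
        raise ValueError("grid must divide evenly among the processors")

    def size(n, parts):
        local = n // parts
        # Assumption: ghost layers are added on both sides only when split.
        return local + 2 * ghost if parts > 1 else local

    return size(nz, npz), size(nx, npx)


def np_tag(npz, npx):
    """Processor-count suffix as it appears in the distributed file names."""
    return "np%02d%02d" % (npz, npx)
```

With these, `local_grid(1976, 400, 1, 10)` reproduces the 1976,44 passed to setvac, and `np_tag(1, 10)` gives the np0110 part of the distributed ini file name.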
In future posts we will describe how these requirements and user activities are converted into proposed user application modules. Preliminary suggestions are as follows.
  • Web application generates parameter and initialisation files and compiles executables; the generated module is returned for execution/submission by the user.
  • Submit a job (as generated above) to the grid, ensure the generated data is moved to the correct location.
  • Post process the data and prepare for use by the visualisation tools
  • Tool to generate images and movies for a specified model
  • Metadata management and generation

Keywords: MHD, model, SPARC, VAC

Posted by Mike Griffiths | 0 comment(s)

August 07, 2008

Video conferencing technology is a powerful way of fostering and maintaining collaborative projects. This is a brief description of experiences in establishing a parallel computing special interest group with Clemson University in South Carolina.

The motivation for setting up such a group has been encouraged by the following events

  • Clemson has developed the Palmetto cluster, details are given in its top500 listing
  • Matlab is used for parallel computing at The University of Sheffield and Clemson University

Our first attempt at meeting up used Access Grid; at first both sites were using different Access Grid servers and we were unable to connect. Our workaround was to use the Evo collaboration network. This proved to be successful: video quality was excellent and audio quality was average (acceptable). Quality was limited by the fact that the session was run from a personal laptop. Fortunately, this was just a test session.

We've just checked that we can connect to the video conferencing facility. Evo uses a client called Koala. To connect to the video conference suite we select call from the menu on the Koala client. The call menu has an h323 item which, when selected, opens the call h.323 dialog. In this dialog, complete the entry in the h323.IP edit box; clicking the telephone in this dialog connects to the video conference.

So far so good....  we shall see if this can improve our audio quality for the next meeting.

Other Links

Clemson University Computing and Information Technology

News Item about Clemson develops Palmetto Cluster

Report from Matlab High Productivity Computing Advisory Board Meeting

Presentation to Matlab High Productivity Computing Advisory Board Meeting

Keywords: evo, matlab, parallel

Posted by Mike Griffiths | 3 comment(s)

August 04, 2008

The SPARC group at The University of Sheffield use Magnetohydrodynamics (MHD) to undertake research into Solar magnetoconvection. This work is undertaken in collaboration with The MHD Theory group at St. Andrews.

The work is computationally intensive and the group is one of the main users of the Sheffield node of the White Rose Grid. We are currently working to provide a simple web accessible front end that will:

  1. Enable submission of models to compute nodes at Sheffield and St. Andrews
  2. Manage transfer of data between compute resources
  3. Automate generation of metadata for data sets generated during runs and used for further modelling work

The researchers use a FORTRAN based code called the Versatile Advection Code. We have already generated a test portal application; however, further work is being undertaken with the SPARC group's research code.

The effort to prepare this for publication on the application portal is quite a challenge, and we find that most of our effort is used to prepare the source code for compilation on the compute nodes. This is faced by many researchers using new research codes; it is one of the objectives of the current work to remove some of this effort for researchers.

We're currently building the applications using different versions of MPI with different interconnect architectures. In this blog I'll detail some of the progress we make in developing the application portal for this group. 

As a taster, here is an example mp4 movie of magnetic field evolution from magnetoconvection studies in magnetohydrodynamics. The movie illustrates the suppression of the magnetic field by convective currents. The result was generated using the Hurlburt-Toomre model of solar magnetoconvection. For further information see the links below.

Slides on magneto convection.

Example VAC models for magneto convection studies.

 

 

Keywords: MHD, Solar, SPARC, VAC

Posted by Mike Griffiths | 0 comment(s)

April 23, 2008

http://feeds.feedburner.com/~r/Uszla/blog/~3/276408652/23

As any fule no, QNames are how XML does namespaces. Where a namespace has been declared:

and the "c" prefix on the element name is associated, via the xmlns attribute, with the namespace URI. This is trivially manipulable with any namespace-aware tool.

So far so good. However, when QNames are used in content (typically, as an attribute value) then the situation is more complex. The two nodes below are equivalent under QName-in-content processing.

<c:cml xmlns:c="http://www.xml-cml.org/schema"
  att="c:comp"/>
<d:cml xmlns:d="http://www.xml-cml.org/schema"
  att="d:comp"/>

This usage is blessed by the W3C, http://www.w3.org/2001/tag/doc/qnameids.html, and XSLT depends on it working.

But it's significantly harder to work with using most XML toolkits.

node()[@att='string']

The above XPath returns all nodes which have att="string". However, it turns out that matching on a namespace-resolved QName needs the following:

node()[substring-after(@att, ':')='comp'
       and @att[../namespace::*
                 [name()=substring-before(../@att,':')]
                ='http://www.xml-cml.org/schema']
      ]

if you only allow for prefixed QNames (eg c:comp above). If you want to be able to match unprefixed QNames as well, that is, QNames in the default namespace:

<cml xmlns="http://www.xml-cml.org/schema"
  att="comp"/>

then you need to extend the expression to the following:

node()[(substring-after(@att, ':')='comp'
        and @att[../namespace::*
                  [name()=substring-before(../@att,':')]
                 ='http://www.xml-cml.org/schema'])
    or (@att='comp'
         and namespace::*[name()='']
              ='http://www.xml-cml.org/schema')
       ]

which is hardly transparent!
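For comparison, here is what the same resolution looks like outside XPath, sketched with Python's standard library. It is an illustration only: the function name `resolve_qname_attribute` is hypothetical, it inspects just the root element, and it relies on `iterparse` reporting namespace declarations as `start-ns` events before the element that carries them.

```python
import io
import xml.etree.ElementTree as ET


def resolve_qname_attribute(xml_text, att="att"):
    """Return (namespace URI, local name) for the QName held in `att` on the root."""
    in_scope = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            # payload is (prefix, uri); the default namespace has prefix ''.
            prefix, uri = payload
            in_scope[prefix] = uri
        else:
            # payload is the root element; split its attribute value as a QName.
            value = payload.get(att)
            if value is None:
                return None
            prefix, _, local = value.rpartition(":")
            return in_scope.get(prefix), local
    return None
```

The prefixed forms (c:comp, d:comp) and the unprefixed default-namespace form all resolve to the same pair of namespace URI and local name, which is exactly the equivalence the XPath expressions above have to spell out by hand.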

Much as I think XPath 2 is a bad idea in general, this is one area where it is a significant step forward; it offers node functions:

which will do what they suggest. Of course XPath 2 then buggers things up again by saying:

In XPath Version 2.0, the namespace axis is deprecated and need not be supported by a host language

W3C Recommendation 23 January 2007
— XML Path Language (XPath) 2.0

Who needs backwards compatibility anyway?

But since libxml2 doesn't support XPath2, I don't propose to worry very much about it.

In any case, unwieldy though the above solutions are, they work correctly.


Posted by Toby White | 0 comment(s)
