Dtrace : oops Wednesday, 26 January 2011  
Just an update - I noticed some minor annoyances in dtrace, which I want to fix, so I can use it in anger. Here they are:

Can't trace syscalls for a 32-bit app on a 64-bit kernel. Duh! Yes, but I never finished that work so I forgot!

Access to "cpu" for printing doesn't work unless the /usr/lib/dtrace files are installed. Sometimes I run from a build dir, and that's annoying, so I will see if I can modify dtrace to look where the binary is running from rather than just the lib. (You can override the include dir on the command line, but it's too much work to read the help that dtrace provides or that I augmented :-) Double-duh!)

I notice some fbt "entry" points don't have corresponding "return" exits, which is a nuisance. Probably the disassembler is missing some magic in the instruction sequences, so I will look to tidy that up.

Still not happy with timestamps in dtrace - they appear to work, yet every time I try to measure entry-to-exit times, the values look "wrong", e.g. it almost looks like the timestamp (hrtime) code is freezing the value for all matched probes rather than returning the wall clock time. Not sure if it's me being a dtrace-noobie, or some silliness in the code. (There's also a bug in reading 64-bit timestamps where the low order 32 bits can wrap and lead to "negative" time occasionally; either that, or a problem when a context switch moves the thread to another CPU.)
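
If the occasional "negative" time does come from a 64-bit value being read as two 32-bit halves, the failure looks like the sketch below. This is a simulation only, not the dtrace hrtime code: SplitCounter, naive_read and safe_read are made-up names, and the "between" hook stands in for the clock advancing between the two reads.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class SplitCounter:
    """A 64-bit counter that can only be read one 32-bit half at a time."""

    def __init__(self, value):
        self.value = value & MASK64

    def read_hi(self):
        return (self.value >> 32) & MASK32

    def read_lo(self):
        return self.value & MASK32

    def tick(self, n=1):
        self.value = (self.value + n) & MASK64


def naive_read(counter, between=lambda: None):
    """Read high then low with no check: the halves may not belong together."""
    hi = counter.read_hi()
    between()  # the counter may advance here
    lo = counter.read_lo()
    return (hi << 32) | lo


def safe_read(counter, between=lambda: None):
    """Re-read the high half and retry until it is unchanged around the low read."""
    while True:
        hi1 = counter.read_hi()
        between()
        lo = counter.read_lo()
        hi2 = counter.read_hi()
        if hi1 == hi2:
            return (hi1 << 32) | lo


if __name__ == "__main__":
    c = SplitCounter(0x1FFFFFFFF)       # low half is about to wrap
    t0 = naive_read(c)
    t1 = naive_read(c, between=c.tick)  # wrap lands between the two reads
    print(hex(t0), hex(t1), t1 - t0)    # the delta comes out negative
```

The retry loop costs one extra read in the common case. It does nothing for the other suspect (timestamps taken on different CPUs whose clocks disagree), which needs a different fix.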

Also, could do better at providing context-switch provider probes, as that's something that is always useful.

And lastly, really need to provide a mutex provider, since the Linux kernel inlines the assembler for locks and semaphores, which makes it impossible to count or monitor them with probes.

That's enough to keep me busy for a few days, as my crisp fixes slip behind. (Getting line wrapping to work properly without display surprises is time consuming, even just trying to remember how the code was supposed to work.)

Posted at 23:08:06 by fox | Permalink
  Windows 7: You made a friend Tuesday, 25 January 2011  
I just had to do some home surgery. Son dropped laptop, and hard drive didn't recover. Shame, since the hard drive was supposed to be able to survive these (that's what Compaq/HP says in the sales blurb; me - I never believe any of that :-) )

After a quick, "what next? dash to the stores?", decided to reclaim an unused spare 80GB laptop drive and go with that : zero cost fix.

So, onto the Windows Vista recovery disks we made when the laptop was new. Great, after 3 CDROMs (or were they DVDs?), we reboot and we get a nice "Windows cannot proceed with the installation" type dialog, and it reboots. Nothing, including safe mode, will work.

Quick google search: you cannot install Vista from the recovery disks. Pardon? Like most PCs out there, this one shipped without recovery media, and you have to make your own. I do hate that. It's a "con" for the general public and they don't realise it. I have a copy of Vista on DVD, but after trying that, it was going nowhere fast.

So I tried the Windows 7 recovery disk - and it worked a treat. It installed (despite the recovery disk being for a Dell), and a short time later we had a fully functioning Windows 7 install. Wifi worked; laptop screen resolution was spot on, and absolutely nothing to dislike.

Of course, it doesn't have a license, and sooner or later, Windows 7 will remind us. But I don't really care. I paid for Vista - twice, once for a machine that never needed or used it, and once for the Compaq, which lost its hard drive, and for which the recovery disks were a waste of the life spent making them. (Maybe I can salvage my son's core data: iTunes library and Firefox bookmarks, but he isn't that fussed.)

World of Warcraft is busy downloading updates, and he is using the other PC until he can get a few seconds to walk across the room and carry on life as if nothing bad ever happens.

All the time, I am wondering: there must be more to life than watching progress bars on screens.

Posted at 21:10:34 by fox | Permalink
  Dtrace - 32bit apps on 64bit kernels Monday, 24 January 2011  
I have never really tested this, but there are issues if you use ustack() on a 32-bit app on a 64-bit kernel. I had previously written that stack walking on Linux with gcc is problematic because code is so often built with -fomit-frame-pointer, meaning the only correct way to walk a stack is via the DWARF debug records. (There is some DWARF support in the kernel code in dtrace, but it's not complete, and there's a danger, if it's invoked, that bad pointers can generate kernel GPFs.)
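
To make the frame pointer problem concrete, here is a minimal sketch of a frame-pointer walk over a simulated stack. It is not the dtrace ustack() code: walk_frames and the memory dict are invented for illustration, and the layout assumed is the usual x86 one (saved caller frame pointer at [fp], return address one word above it). The word size has to match the target process, which is exactly where a 32-bit app under a 64-bit kernel can go wrong.

```python
def walk_frames(memory, fp, word_size, max_depth=64):
    """Follow saved frame pointers and collect return addresses.

    memory    -- dict of address -> word value (stand-in for reading the target)
    fp        -- frame pointer of the innermost frame
    word_size -- 4 for a 32-bit target, 8 for a 64-bit target
    """
    pcs = []
    while fp and len(pcs) < max_depth:
        if fp not in memory or fp + word_size not in memory:
            break  # unreadable frame: stop rather than chase a bad pointer
        pcs.append(memory[fp + word_size])
        next_fp = memory[fp]
        if next_fp <= fp:
            break  # stack grows down, so caller frames must be at higher addresses
        fp = next_fp
    return pcs


# A made-up 32-bit stack with three frames.
STACK32 = {
    0x1000: 0x1010, 0x1004: 0x08048ABC,
    0x1010: 0x1020, 0x1014: 0x08048DEF,
    0x1020: 0x0000, 0x1024: 0x08048123,
}

if __name__ == "__main__":
    print([hex(pc) for pc in walk_frames(STACK32, 0x1000, 4)])
    print(walk_frames(STACK32, 0x1000, 8))  # wrong word size finds nothing
```

With -fomit-frame-pointer the value in the frame pointer register is just another general purpose register, so a walk like this either stops immediately or wanders off into garbage, which is why the DWARF records are the only reliable source.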

When the user space dtrace wakes up, having fetched a buffer of info from the kernel, it may include references to user processes, and, if "ustack()" is used, then dtrace will examine the running process to get the loaded libraries and walk the stack.

In theory, dtrace should handle this (mostly via the ELF libraries), but dtrace assumes a Solaris-style /proc filesystem, and not the Linux one. (The Linux port attempts to get this "right" but it's not fully proven.)
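
On the Linux side, the list of loaded libraries comes from the text file /proc/<pid>/maps (address range, permissions, offset, device, inode, optional pathname). A rough sketch of turning that into "which object owns this address" follows. The function names and the sample mappings are made up for illustration; this is not how the dtrace port is actually written.

```python
def parse_maps(text):
    """Parse the text of a Linux /proc/<pid>/maps file into a list of dicts."""
    mappings = []
    for line in text.splitlines():
        # The pathname is optional and may contain spaces, so split at most 5 times.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        mappings.append({
            "start": start,
            "end": end,
            "perms": fields[1],
            "offset": int(fields[2], 16),
            "path": fields[5].strip() if len(fields) > 5 else "",
        })
    return mappings


def find_object(mappings, addr):
    """Return the pathname of the mapping containing addr, or None."""
    for m in mappings:
        if m["start"] <= addr < m["end"]:
            return m["path"]
    return None


# Invented sample for a 32-bit process; real files have more entries.
SAMPLE = """\
08048000-08056000 r-xp 00000000 08:01 1234       /bin/cat
08056000-08057000 rw-p 0000d000 08:01 1234       /bin/cat
f7d00000-f7e60000 r-xp 00000000 08:01 5678       /lib32/libc-2.11.1.so
ffb00000-ffb21000 rw-p 00000000 00:00 0          [stack]
"""

if __name__ == "__main__":
    maps = parse_maps(SAMPLE)
    print(find_object(maps, 0xF7D12345))
```

For a live process the text would come from reading /proc/<pid>/maps, which needs permission to inspect that process.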

I will look at what/where the "gotchas" are. Would be nice to not worry about the binary type.

Posted at 21:54:58 by fox | Permalink
  dtrace + libelf Sunday, 23 January 2011  
Something funny happened on the way to the forum.

Dtrace (the user command) relies on the libelf library to allow introspection of target applications, and for the USDT code that creates probe-able libraries.

A bare Ubuntu install (like many other distros) provides a core set of packages to work, but not the development packages. The dtrace release tells you what you need.

What I have found is that there are so many libelfs out there and they do not all agree. E.g. some include ELF_C_MMAP_READ and some don't. Worse, the enums for the various values are different, so an app built with one set of headers but run against a different library can get strange and difficult-to-debug error codes from elf_begin() and friends.
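
A toy illustration of why differing enums hurt: the compiler bakes the number into the binary, so the library only ever sees an integer. The two layouts below are HYPOTHETICAL (the numeric values are invented and are not copied from any real libelf header); only the names come from the text above.

```python
from enum import IntEnum


# HYPOTHETICAL header layouts: same names, different order, one extra member.
class HeaderA(IntEnum):
    ELF_C_NULL = 0
    ELF_C_READ = 1
    ELF_C_WRITE = 2
    ELF_C_RDWR = 3


class HeaderB(IntEnum):
    ELF_C_NULL = 0
    ELF_C_READ = 1
    ELF_C_RDWR = 2
    ELF_C_WRITE = 3
    ELF_C_MMAP_READ = 4


def mismatches(compiled_against, linked_against):
    """Map each name to (compiled value, library value) where the two disagree.

    A library value of None means the library has no such command at all.
    """
    report = {}
    for member in compiled_against:
        other = linked_against.__members__.get(member.name)
        if other is None or int(other) != int(member):
            report[member.name] = (int(member), None if other is None else int(other))
    return report


if __name__ == "__main__":
    for name, (built, lib) in mismatches(HeaderB, HeaderA).items():
        print(name, built, lib)
```

In this made-up case, code built against HeaderB that asks for ELF_C_RDWR passes 2, which a HeaderA library would treat as ELF_C_WRITE. A build-time check of this kind is one shape the autodetection could take.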

I need to add some better autodetection to the dtrace code, or, one possibility is to move totally away from libelf and use the elf library I put together for the CRiSP/elfrewrite code. (Am loath to do that, but it would sever any dependencies and provide better support on old/very old systems.)

I put out a new release today to fix some more build regressions, but I have someone reporting a failure to build the "simple"/USDT example code. If anyone is trying this out, try:

$ make -i all

as a temporary workaround, assuming it's just the "simple" example which fails and nothing else.

Posted at 10:51:06 by fox | Permalink
  dtrace updates Friday, 21 January 2011  
I posted a new release of dtrace last night, and there are some silly issues to resolve in that release. It's fine on Ubuntu 10.04 and 10.10 and probably lots of earlier releases, but I thought it worth highlighting some of the blips I found on a RedHat AS4 build:

  1. If yacc is used, especially older yaccs, a bison construct can lead to a compile time error. Bison allows "string" tokens, so that on an error message it doesn't print something ugly, like "Syntax error near DT_OPEN_BRACE". Unfortunately, old yacc doesn't reject this; it happily accepts a token like "sizeof", which then causes a #define of sizeof - the keyword - and leads to strange and difficult-to-diagnose compiler errors. I will need to put in a yacc/bison detect to handle this.
  2. INET_ADDRSTRLEN is not being handled properly for older kernels, when the #define is in a file we are not including.
  3. The changes for the lockless ioctl() result in compile time problems on older kernels, since neither unlocked_ioctl() nor compat_ioctl() is available there and the code doesn't default to the old style ioctl(). (The major difference here is that the old ioctl() callback has a 'struct inode' and a 'struct file' argument, whereas both unlocked_ioctl and compat_ioctl only pass in a struct file. Some #ifdef's should handle this.)
  4. Another one or two minor issues.

Keep an eye out on the download page for a new release this weekend.

Posted at 21:58:37 by fox | Permalink
  dtrace progress 20110118 Tuesday, 18 January 2011  
Having done various "other" projects (which are still ongoing, including a TV Guide browser/diary), it's back to dtrace. Suddenly decided I was annoyed dtrace wasn't running on the latest kernels (thanks to everyone who prompted and reminded me of this).

It's a bit of a pain: 2.6.26 and 2.6.37 changed enough things to make life difficult doing a backwards/forwards series of code changes. Added to which, my current development laptop is missing some of the older kernels (available on my other machine), so even if I get it to compile, I cannot easily guarantee I haven't broken a prior kernel build.

Some things, like the "ioctl()" driver function, changed as Linux worked to remove the big-kernel-lock (BKL) - which is a good thing, but enough to complicate code having to handle old and new kernels.

Others are a bit more curious (e.g. kmalloc() prototype not visible unless <linux/slab.h> is included).

Also, USDT on my Ubuntu 8.04/32bit is generating bad ELF object files, which means a clean compile won't proceed to completion.

Then we have more changes to mutexes and the structures, so I need to be careful we don't break dtrace.

Hope to put out a new release soon that restores "up-to-dateness".

Posted at 22:05:50 by fox | Permalink
  lm technologies nano usb wifi adaptor and the mac-mini Friday, 14 January 2011  
Oh well - it only cost 12 pounds, but it has failed - getting the same erratic "no route to host" problems with this adaptor as if it wasn't there. That's: wifi, wired ethernet, airport express, homeplug, USB wifi - all the same symptoms on the appallingly poor mac mini hardware.

So, I now get to decide on what to replace the mac mini with. An Inspiron Zino is my current favorite - but the cpu specs are a bit on the low side.

Or maybe a small/refurb/cheap desktop. Or maybe a nettop device (which have very poor cpu performance). Oh well. <sob>.

Posted at 22:06:07 by fox | Permalink
  elfrewrite and lm technologies nano usb wifi adaptor Thursday, 13 January 2011  
Had a couple of bugs to fix in elfrewrite - it didn't correctly handle ELF binaries which had both .hash and .gnu.hash sections. Found this out when running on my older Ubuntu system and it caused bad binaries to be generated. I hadn't realised that was even going on with Ubuntu 10.04 and earlier (should have been obvious that -hash-style=both was the default).
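
For anyone wondering why having both sections matters: the two tables index the same symbols with different hash functions, so a tool that changes symbols has to keep both consistent. The sketch below shows the two standard hash functions only (the classic ELF hash for .hash, and the times-33 hash for .gnu.hash). It is not elfrewrite code, and it leaves out the bucket/chain and bloom filter layout entirely.

```python
def sysv_hash(name):
    """Classic ELF symbol hash, as used to index the .hash section."""
    h = 0
    for c in name.encode():
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def gnu_hash(name):
    """GNU symbol hash (h = h * 33 + c, seeded with 5381), as used by .gnu.hash."""
    h = 5381
    for c in name.encode():
        h = (h * 33 + c) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    for sym in ("printf", "malloc"):
        print(sym, hex(sysv_hash(sym)), hex(gnu_hash(sym)))
```

The same symbol lands in unrelated slots in the two tables, so updating one and leaving the other stale is an easy way to produce a binary that loads on one system and fails on another.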

Also, removed a silly dependency on libelf, so that it runs on more systems. (The elfrewrite is available as part of the crisp install; I may package it up separately, but it serves my purpose to do that).

The LM Tech nano usb is a teeny weeny USB wifi adaptor. I have had no end of problems with wired and wifi on my Intel Mac Mini - such a poor and broken piece of Apple tech. Nothing worked to make either reliable - with erratic lack of network connectivity. I am close to dumping this horrible piece of kit, but the LM Tech nano USB wifi seems to be working. Let's test it out for a few weeks. 150Mbps 802.11n - strong signal strength (strange - it reports 90+% signal strength whilst the wireless router reports 23%!). Not hugely fast (because of distance from mac mini to router and it's hiding behind, rather than in front of the mac - more signal blockage). But, for 12 GBP - it's a bargain (vs the 90 GBP for an airport express which has yet to serve the purpose I purchased it for).

Meanwhile, need to get back to some more coding updates. (Added a CSV macro to crisp, which now allows column-based searching, e.g. "col select 3==hello" shows all lines where column 3 contains "hello". Need to extend it more to support AND/OR scenarios, but it meets my use-case with room to extend.)
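
The idea behind the column search, sketched in Python rather than as a crisp macro. Assumptions: columns are numbered from 1, and the match is "contains" as described above (not strict equality). The col_select name and the sample data are invented here, and this is not the macro's implementation.

```python
import csv
import io


def col_select(text, column, needle):
    """Return the rows of CSV `text` whose 1-based `column` contains `needle`."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= column and needle in row[column - 1]:
            rows.append(row)
    return rows


SAMPLE = "id,name,greeting\n1,ann,hello\n2,bob,goodbye\n3,cy,hello there\n"

if __name__ == "__main__":
    for row in col_select(SAMPLE, 3, "hello"):
        print(row)
```

AND/OR would fall out of this by passing a list of (column, needle) pairs and combining the per-row tests with all() or any().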

Posted at 23:16:56 by fox | Permalink