dtrace progress 20080801 | Friday, 01 August 2008 |
On holiday, with a 32-bit Linux laptop - so I managed to clean up and fix a number of 32-bit issues with dtrace. It now works quite well on Ubuntu 8.
I have just put out a new release - and fixed the accidentally broken 64-bit dtrace userland binary.
I have been working on getting the D functions stack() and ustack() to work. For 32-bit kernels, ustack() seems to work - but I have a lot of work to do to get the Psymtab.c code to look up the symbol entries and print them symbolically. I hit issues with libelf.so not seeming to work properly for me, so I need to investigate that.
The kernel stack is probably wrong, and is a pain to get right because the Ubuntu kernel is built with -fomit-frame-pointer - the goal is to achieve what a kernel stack trace can do, along with symbols. (dtrace reads /proc/kallsyms, so we should be able to see something.)
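To illustrate the idea of turning raw kernel addresses into names, here is a minimal Python sketch (not the dtrace code itself) that parses /proc/kallsyms-style text and picks the nearest symbol at or below an address. The function names and the sample addresses are made up for the example.

```python
import bisect

# Hypothetical sample in the /proc/kallsyms layout: address, type, name, [module].
SAMPLE = """\
c0100000 T _text
c0101000 T do_fork
c0102000 t helper [mymod]
"""

def parse_kallsyms(text):
    """Parse kallsyms-style text into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols

def symbolize(symbols, address):
    """Return 'name+0xoffset' for the nearest symbol at or below address."""
    addresses = [start for start, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return hex(address)  # below the first known symbol: leave it raw
    base, name = symbols[index]
    return "%s+0x%x" % (name, address - base)

symbols = parse_kallsyms(SAMPLE)
print(symbolize(symbols, 0xc0101010))
```

On a live system the text would come from reading /proc/kallsyms instead of the sample string.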
The Sun code walks the stack looking at signals and interrupts - and I ripped out that code to get it to work at all. I will need to spend more effort here.
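For context on why -fomit-frame-pointer hurts: the simplest stack walk just follows the chain of saved frame pointers. The sketch below simulates that in Python over a dictionary standing in for memory. It assumes the usual x86 layout (saved frame pointer at [fp], return address one word above it), ignores signal and interrupt frames entirely, and uses a hypothetical function name and hypothetical addresses.

```python
def walk_frames(memory, frame_pointer, word_size=4, max_frames=64):
    """Follow saved frame pointers through `memory` (dict: address -> word).

    Assumes [fp] holds the caller's frame pointer and [fp + word_size]
    holds the return address. Returns the list of return addresses.
    """
    return_addresses = []
    while frame_pointer in memory and len(return_addresses) < max_frames:
        return_address = memory.get(frame_pointer + word_size)
        if return_address is None:
            break
        return_addresses.append(return_address)
        next_fp = memory[frame_pointer]
        if next_fp <= frame_pointer:
            break  # the stack grows down, so callers must be at higher addresses
        frame_pointer = next_fp
    return return_addresses

# Three fake frames; the outermost one has a saved frame pointer of 0.
fake_stack = {
    0x1000: 0x1010, 0x1004: 0xc0101010,
    0x1010: 0x1020, 0x1014: 0xc0102020,
    0x1020: 0x0,    0x1024: 0xc0103030,
}
print([hex(a) for a in walk_frames(fake_stack, 0x1000)])
```

When the compiler omits frame pointers, the word at [fp] is no longer a link to the caller's frame, so a walk like this produces garbage or stops early.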
I need to get the 64-bit kernel and user stack dumps working, as that will give a huge amount of functionality to probe the system. If I can get the symtabs to work - that will be a great milestone.
Unfortunately, Unix has gotten itself into a mess with the ELF file format: the simple libelf library has difficulty handling 32-bit or 64-bit files given that the process doing the reading may be 32-bit or 64-bit itself, and so interfaces such as <gelf.h> seem to exist to try and hide the word size issue.
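The word size of an ELF file is recorded in its identification bytes, independently of the word size of the process reading it. As a small sketch in Python (the function name is hypothetical; the magic number and the EI_CLASS values come from the ELF format itself):

```python
ELF_MAGIC = b"\x7fELF"
ELFCLASS32 = 1  # e_ident[EI_CLASS] value for 32-bit objects
ELFCLASS64 = 2  # e_ident[EI_CLASS] value for 64-bit objects

def elf_word_size(ident):
    """Return 32 or 64 given the first bytes of a file, or None if not ELF."""
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None
    # Byte 4 (EI_CLASS) says whether the file uses 32-bit or 64-bit structures.
    return {ELFCLASS32: 32, ELFCLASS64: 64}.get(ident[4])

# Hand-built identification bytes rather than a real file.
print(elf_word_size(b"\x7fELF\x02" + bytes(11)))
```

A reader can check this byte first and then pick the 32-bit or 64-bit header layout accordingly, which is roughly the dispatch that a class-independent interface hides from the caller.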
Given that Solaris is a pure ELF system, and that much of the symtab lookup on a Linux system is embedded in gdb (along with stack tracing), we end up with a bit of an emulation on top of an emulation in dtrace - but so far, no big issues (other than concern over the multitude of libelf.so variants in the wild).
dtrace is still not ready for a prime time production system - it may work for you, it may not, but hopefully, over time, the port will become more robust, and work in most areas.
I hit some issues with accessing /proc/PID/mem, where we can't access certain areas of memory - and I think this may be a bug in Linux kernels, but I need to work out if it's me at fault or Linux.
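One way to narrow down which areas fail is to compare what /proc/PID/maps says is readable with what /proc/PID/mem actually returns. The Python sketch below is only a diagnostic aid under assumptions: the helper names are hypothetical, the caller needs ptrace-level permission on the target, and some mappings may still refuse reads even when they are marked readable.

```python
def parse_maps_line(line):
    """Split one /proc/PID/maps line into (start, end, permissions)."""
    fields = line.split()
    start_text, end_text = fields[0].split("-")
    return int(start_text, 16), int(end_text, 16), fields[1]

def readable_regions(pid):
    """Yield (start, end) for every mapping that maps marks as readable."""
    with open("/proc/%d/maps" % pid) as maps_file:
        for line in maps_file:
            start, end, permissions = parse_maps_line(line)
            if permissions.startswith("r"):
                yield start, end

def try_read(pid, address, length):
    """Read from /proc/PID/mem, returning None where the read is refused."""
    try:
        with open("/proc/%d/mem" % pid, "rb", buffering=0) as mem_file:
            mem_file.seek(address)
            return mem_file.read(length)
    except (OSError, OverflowError, ValueError):
        return None

def report_unreadable(pid):
    """Print each readable-looking region whose first bytes cannot be read."""
    for start, end in readable_regions(pid):
        if not try_read(pid, start, 16):
            print("cannot read %x-%x" % (start, end))
```

Running report_unreadable() against a test process lists the ranges to look at more closely, which helps separate a mistake in the caller from behaviour in the kernel.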