dtrace progress 20081026 Sunday, 26 October 2008  
Getting there...

Now that the kernel stands up fine to running user space probes, I need to figure out where they are, or what to do with them.

They don't show up in 'dtrace -l', but I suspect that's more of a misunderstanding on my part about what/how.

The kernel does have the PID provider code linked in (fasttrap.c and related files), but some stuff is still commented out.

Perusing the Solaris kernel shows I have a number of things missing, such as static probes on process creation/exit, and other subsystems. This will require some hacking to get working, but I *don't think* it is important just now. (Hacking means I am going to have to disassemble these functions and patch in static probes, unless I can find good hooks within the kernel which I can daisy chain onto).

There's complexity in handling user programs, since the dtrace libraries rely on the procfs way of manipulating applications, whereas Linux has ptrace() along with the new fprobe() (?) interface.

I am tempted to create a procfs() driver for Linux to hide this complexity. I have never found the ptrace() syscall interface friendly for multithreaded apps, but then it's very easy to make mistakes and take a good while to debug them.

Am just poking around at the moment. This is the final lynchpin of dtrace -- if this can be made to work then the rest is just quality control.

I got my first quantize() graph out of dtrace today (it's always worked, I had just never gotten around to trying it out).
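For anyone who has not met quantize(): it buckets values by powers of two and draws a bar per bucket. The following is only a rough C sketch of that idea, not dtrace's own code; the names bucket_of, quantize_add and quantize_print are made up for illustration, and values are assumed to be at least 1.

```c
#include <stdint.h>
#include <stdio.h>

#define NBUCKETS 64

/* Index of the power-of-two bucket for v: the largest b with 2^b <= v.
 * Assumes v >= 1 (a value of 0 would also land in bucket 0 here). */
static int bucket_of(uint64_t v)
{
    int b = 0;

    while (v > 1) {
        v >>= 1;
        b++;
    }
    return b;
}

/* Count one value in its bucket. */
static void quantize_add(uint64_t *buckets, uint64_t v)
{
    buckets[bucket_of(v)]++;
}

/* Print one row per non-empty bucket: lower bound, a bar, the count. */
static void quantize_print(const uint64_t *buckets)
{
    for (int b = 0; b < NBUCKETS; b++) {
        if (buckets[b] == 0)
            continue;
        printf("%10llu |", (unsigned long long)1 << b);
        for (uint64_t i = 0; i < buckets[b]; i++)
            putchar('@');
        printf(" %llu\n", (unsigned long long)buckets[b]);
    }
}
```

Feed it a zeroed array of NBUCKETS counters, call quantize_add() once per sample, then quantize_print() at the end.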


Posted at 20:38:41 by Paul Fox | Permalink
  testing .. testing Saturday, 25 October 2008  
Just playing with my site/ftp upload script.... nothing to see here.

Posted at 21:46:43 by Paul Fox | Permalink
  Debugging a driver in Linux Friday, 24 October 2008  
How do you debug a driver in Linux?

Any form of programming results in bugs - unexpected behavior. A driver is no different - but the bugs are harder to fathom because many of your favorite debugging aids cannot work in kernel space.

My solution is printk(). It's very low tech but it works...

I am a fan of gdb and have written my own x86 kernel debuggers (for x386 and x186 processors), but porting them is a pain. Now we have 64-bit chips, I have to decide if I want to port my ancient code. (The debugger is powerful but nothing grand).

In a driver you have to crawl - one atom at a time; enabling huge wads of code and expecting it to work, well, er, won't. Disabling lots of code, ensuring the structure is there, and then adding a sprinkling of print statements works well.

I try to use vmware whilst debugging since bad pointers can corrupt filesystems, and losing hundreds of gigs of filesystem is not nice, nor is waiting to fsck or reformat/reinstall.
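The sprinkling of print statements can be kept tidy with a couple of macros. This is a minimal sketch, not code from the dtrace driver: TRACE_OUT, HERE, TRACE_VAL and init_step are invented names, and the non-kernel branch exists only so the same macros can be tried out in user space.

```c
#ifdef __KERNEL__
#include <linux/kernel.h>
/* In a driver, send the message to the kernel log. */
#define TRACE_OUT(fmt, ...) printk(KERN_INFO fmt, ##__VA_ARGS__)
#else
#include <stdio.h>
/* Outside the kernel, fall back to stderr so the macros can be tested. */
#define TRACE_OUT(fmt, ...) fprintf(stderr, fmt, ##__VA_ARGS__)
#endif

/* HERE() marks a checkpoint with function name and line number. */
#define HERE() TRACE_OUT("%s:%d: reached\n", __func__, __LINE__)

/* TRACE_VAL() logs one named integer value alongside its location. */
#define TRACE_VAL(x) \
    TRACE_OUT("%s:%d: %s=%ld\n", __func__, __LINE__, #x, (long)(x))

/* Hypothetical init routine: the risky part stays disabled until the
 * skeleton around it is known to work. */
static int init_step(int enable_hard_part)
{
    HERE();
    if (!enable_hard_part)
        return 0;               /* hard part still switched off */
    TRACE_VAL(enable_hard_part);
    return 1;
}
```

The last message to reach the log before a hang tells you roughly which atom to look at next.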

Linux has a nice feature: GPFs (bad pointers) are caught and logs written to /var/log/messages. If you are lucky no reboot is needed.

Having a stripped down startup is essential - being able to reboot in about 10s is ideal - no waiting for GUI startups. (I use rlogin or telnet or ssh into the vm session).

Mutex debugging is a pain, but I have found that Linux has a drop dead timer: if the kernel is unresponsive after 10+s a message is printed on the console. Next is a reboot (don't resume a VM snapshot, since you will not have access to what went wrong).

After reboot, /var/log/messages will have your printk() statements to help track down where you got stuck.

Next step is to avoid making bugs in the first place....


Posted at 23:49:10 by Paul Fox | Permalink
  DTrace Progress 20081023 Thursday, 23 October 2008  
Just a minor update on dtrace and USDT progress. As detailed in the last couple of entries, USDT work is progressing; we can now generate and compile ELF binaries which include user space probes.

I have been working on the /dev/dtrace_helper driver code, which is used by a user space app, to find where it's not working. This has led me into a corner of the Linux port which had been stubbed out or #define'd to compile, saving the work required to resolve it for a rainy day.

That day arrived: much of the Linux kernel driver code is just plain ol' C with a few bits of assembler. However, the dtrace helper ioctl() code needs to be able to find and store attributes of real processes. So, my process-shadowing veneer needs to be stronger than it is. This is the key code which negates the need to change the GPL Linux kernel source; by keeping the dtrace driver pure of GPL, it needs a way to find stuff which is not normally a concern to a driver.

This is not so much a GPL vs CDDL licensing issue, but more of an issue in that the Linux kernel changes, sometimes quite dramatically, from one release to another. If dtrace is not a part of the kernel, then it needs to be a good citizen and provide easy adaptation to new kernels, or kernels compiled with differing compile time options.

Applications like VMware have a similar issue - many times a new kernel will come out and VMware won't work on it, since the compiled code doesn't conform to the new headers or functions.

Of the many thousands of lines of code in dtrace, only a very few (10-20) care about this aspect of the kernel, e.g. convert a PID to a process structure, store/retrieve attributes affecting scheduling. But these few lines are the most difficult...or not...

Other tasks have been taking my time this week, so I shall be going back into the water in a few days....


Posted at 22:43:01 by Paul Fox | Permalink
  USDT -- some details Sunday, 19 October 2008  
(This text is located in the doc/usdt.html file in the dtrace/linux distribution; it will be updated as I can confirm implementation details).

User Defined Tracing - How it works

USDT (User-level Statically Defined Tracing) is a mechanism for user land applications to embed their own probes into executables. For example, a Perl or Python interpreter might use it to gain access to stack traces of applications which are already started.

The goal of the DTrace team was near zero overhead when not invoked. This works well - even commercial applications can embed probes and not worry about performance or run time dependencies.

There are a number of steps to make this work.

Here's an example:

# include <stdio.h>
# include <unistd.h>
# include <sys/sdt.h>
int main(int argc, char **argv)
{
	while (1) {
		printf("here on line %d\n", __LINE__);
		DTRACE_PROBE1(simple, saw__line, 0x1234);
		printf("here on line %d\n", __LINE__);
		DTRACE_PROBE1(simple, saw__word, 0x87654321);
		printf("here on line %d\n", __LINE__);
		DTRACE_PROBE1(simple, saw__word, 0xdeadbeef);
		printf("here on line %d\n", __LINE__);
		sleep(1);
	}
}

The DTRACE_PROBEx macros translate into a function call. To gain near-zero overhead, during linking, the function call is replaced by a series of NOP instructions. Here's an example disassembly of the above:

0000000000400e7c <main>:
  400e7c:       55                      push   %rbp
  400e7d:       48 89 e5                mov    %rsp,%rbp
  400e80:       48 83 ec 10             sub    $0x10,%rsp
  400e84:       89 7d fc                mov    %edi,0xfffffffffffffffc(%rbp)
  400e87:       48 89 75 f0             mov    %rsi,0xfffffffffffffff0(%rbp)
  400e8b:       be 07 00 00 00          mov    $0x7,%esi
  400e90:       bf df 11 40 00          mov    $0x4011df,%edi
  400e95:       b8 00 00 00 00          mov    $0x0,%eax
  400e9a:       e8 01 f9 ff ff          callq  4007a0 <printf@plt>
  400e9f:       bf 34 12 00 00          mov    $0x1234,%edi
  400ea4:       90                      nop
  400ea5:       90                      nop
  400ea6:       90                      nop
  400ea7:       90                      nop
  400ea8:       90                      nop
  400ea9:       be 09 00 00 00          mov    $0x9,%esi
  400eae:       bf df 11 40 00          mov    $0x4011df,%edi
  400eb3:       b8 00 00 00 00          mov    $0x0,%eax
  400eb8:       e8 e3 f8 ff ff          callq  4007a0 <printf@plt>
  400ebd:       bf 21 43 65 87          mov    $0x87654321,%edi
  400ec2:       90                      nop
  400ec3:       90                      nop
  400ec4:       90                      nop
  400ec5:       90                      nop
  400ec6:       90                      nop
  400ec7:       be 0b 00 00 00          mov    $0xb,%esi
  ....

When an application is built, dtrace is run on the object files to rewrite the objects, stubbing out the calls for probes, and creating a table in the executable of the places where the stubs are located. (The code for this is located in libdtrace/dt_link.c).
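To make the stubbing-out step concrete, here is a rough sketch of the idea on a plain byte buffer. It is not the code in dt_link.c: stub_out_call and struct probe_sites are invented for illustration, and it assumes each probe site is a 5-byte x86 `call rel32` (opcode 0xe8), which matches the five NOPs in the disassembly above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_CALL_OPCODE 0xe8   /* call rel32 */
#define X86_NOP         0x90
#define CALL_LEN        5      /* opcode byte + 4-byte displacement */
#define MAX_SITES       16

/* Table of the places where a probe call used to be. */
struct probe_sites {
    size_t offset[MAX_SITES];
    size_t count;
};

/* Replace the call at text[off] with NOPs and remember where it was.
 * Returns 0 on success, -1 if there is no call there or no room left. */
static int stub_out_call(uint8_t *text, size_t len, size_t off,
                         struct probe_sites *tab)
{
    if (off > len || len - off < CALL_LEN)
        return -1;
    if (text[off] != X86_CALL_OPCODE || tab->count >= MAX_SITES)
        return -1;

    memset(text + off, X86_NOP, CALL_LEN);
    tab->offset[tab->count++] = off;
    return 0;
}
```

The real work also has to deal with symbols and relocations in the object file; the sketch only shows the byte patching and the bookkeeping of where the stubs are.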

When the application is started up, a piece of code is executed (before main() is called). [Code located in libdtrace/drti.c]. This code looks at the current system, to see if dtrace is loaded into the kernel and communicates with the /dev/dtrace/helper driver to inform it that new probes are available in this process.
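A startup hook of this kind can be sketched as below. This is not drti.c itself: it assumes GCC's constructor attribute as the way to run code before main(), the names probe_startup, helper_checked and helper_present are invented, and the step that actually describes the probes to the driver is left out.

```c
#include <fcntl.h>
#include <unistd.h>

#define HELPER_DEV "/dev/dtrace/helper"   /* device path as named above */

static int helper_checked;   /* 1 once the startup hook has run */
static int helper_present;   /* 1 if the helper device could be opened */

/* Runs before main() because of the (GCC-specific) constructor attribute. */
__attribute__((constructor))
static void probe_startup(void)
{
    int fd = open(HELPER_DEV, O_RDWR);

    helper_checked = 1;
    if (fd < 0)
        return;              /* dtrace not loaded: just run normally */

    helper_present = 1;
    /* A real implementation would tell the driver about this process's
     * probes here; that step is omitted in this sketch. */
    close(fd);
}
```

The important property is the failure path: if the device is not there, the application carries on as if the probes did not exist.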

Voila! We are done. Or nearly.

At this point, whilst the application is running, 'dtrace -l' should reveal your new probes.

The Kernel

When a user elects to monitor the probe, the patched (NOP-ed) code will be changed into a call back into the kernel, to notify it that the function/probe is being invoked.

Cancellation of the probe will undo the patched code and we are done.
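The enable/cancel pair amounts to save, patch and restore. The sketch below shows that on an ordinary byte buffer; it is not the kernel code. The names are invented, and instead of a call it assumes a one-byte trap instruction (0xcc, int3 on x86) is written over the first NOP, which is an assumption to be confirmed against the implementation.

```c
#include <stdint.h>

#define TRAP_BYTE 0xcc   /* x86 int3; assumed trap instruction */

struct probe_patch {
    uint8_t *addr;       /* first byte of the NOP-ed probe site */
    uint8_t  saved;      /* original byte, valid while enabled */
    int      enabled;
};

/* Enable: remember the original byte, then overwrite it with the trap. */
static void probe_enable(struct probe_patch *p)
{
    if (p->enabled)
        return;
    p->saved = *p->addr;
    *p->addr = TRAP_BYTE;
    p->enabled = 1;
}

/* Cancel: put the original byte back. */
static void probe_disable(struct probe_patch *p)
{
    if (!p->enabled)
        return;
    *p->addr = p->saved;
    p->enabled = 0;
}
```

Both functions are idempotent, so enabling twice cannot lose the saved original byte.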


Posted at 12:50:18 by Paul Fox | Permalink
  DTrace USDT progress Sunday, 19 October 2008  
I think the generation of a compiled executable now works. I can create an executable which links in the dtrace probes, and I have some, hopefully minor, fixes to make to drti.o so it can communicate with the driver.

Next step is ensuring the driver helper functions are enabled and to test it out with a simple example in the release.

It's been a slow week trying to debug just a few lines of code in dt_link.c, as GNU object files and Sun ones are subtly different in the way undef symbols are defined in object files.

Stay tuned...


Posted at 10:10:43 by Paul Fox | Permalink
  DTrace Progress 20081012 Sunday, 12 October 2008  
I have managed to get back into the swing of things on DTrace. My focus this week has been user defined dtrace probes (USDT). I have created a sample script in the usdt/ dir so I can work through what needs to be done.

It's not finished yet, but here's what I have found:

libdtrace/drti.c needs to be compiled to an object file, and linked in with target apps. (This now compiles).

"dtrace -G" is the magic used to convert a prototype file to the object file needed to link with the application to be probed.

We need the dtrace 'helper' device - something I had commented out early on, since I wasn't sure what it was. This is now enabled in /dev/dtrace_helper. (If I can work out how to create a /dev/dtrace/ dir, then I can more closely mimic Solaris; not a big deal whether this is done).

libdtrace/dt_link.c has needed a few minor mods to fit in with the new device name, plus some workarounds for assumptions about /usr/ccs/, which is not the compiler directory under Linux.

A small complication is the need to store state on a per proc/task basis, but the shadow mechanism (par_alloc) is used for this. (I need to intercept process/task death to do the garbage collection; subject for another day).

This will be a major milestone to get this working; people have been asking about USDT for Perl/Ruby, and I want to put a probe into CRiSP - just so I can understand the ins and outs.
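The shadow idea is to keep dtrace's own per-process record beside the kernel's, keyed by PID, instead of adding fields to the kernel's process structure. Here is a minimal user-space sketch of such a table; it is not the actual par_alloc code, and struct shadow_proc, shadow_lookup and shadow_free are hypothetical names.

```c
#include <stdlib.h>

#define SHADOW_BUCKETS 64

/* One shadow record per process we have been told about. */
struct shadow_proc {
    int                 pid;
    void               *helper_state;   /* whatever dtrace wants to keep */
    struct shadow_proc *next;
};

static struct shadow_proc *shadow_tab[SHADOW_BUCKETS];

/* Find the record for pid; optionally create it if it does not exist. */
static struct shadow_proc *shadow_lookup(int pid, int create)
{
    unsigned h = (unsigned)pid % SHADOW_BUCKETS;
    struct shadow_proc *sp;

    for (sp = shadow_tab[h]; sp != NULL; sp = sp->next)
        if (sp->pid == pid)
            return sp;
    if (!create)
        return NULL;

    sp = calloc(1, sizeof *sp);
    if (sp == NULL)
        return NULL;
    sp->pid = pid;
    sp->next = shadow_tab[h];
    shadow_tab[h] = sp;
    return sp;
}

/* Garbage collection: call this when the process is known to have died. */
static void shadow_free(int pid)
{
    unsigned h = (unsigned)pid % SHADOW_BUCKETS;
    struct shadow_proc **pp;

    for (pp = &shadow_tab[h]; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->pid == pid) {
            struct shadow_proc *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
    }
}
```

A kernel version would need locking around the table, and something has to call the free routine on process exit, which is exactly the interception problem mentioned above.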


Posted at 21:34:18 by Paul Fox | Permalink
  USDT - Dtrace Wednesday, 08 October 2008  
Had two requests today about USDT - User defined Dtrace probes. Wasn't totally sure what this was about, but found a good URL:

http://blogs.sun.com/barts/entry/putting_user_defined_dtrace_probe

which lets me get started. So, in theory, once I can compile with a decent header file, and reasonable portability, I can go digging in the kernel driver to see if we can invoke/intercept it.

Keep fingers crossed !


Posted at 22:40:30 by Paul Fox | Permalink
  fcterm progress Monday, 06 October 2008  
What is fcterm? It's a color terminal emulator.

I had recently taken a short break from dtrace whilst enhancing fcterm to include the following features:

Infinite scrollback (sort of, now spills to files and can page in from the files, but you don't really want infinite scroll).

Performance: now faster than all the competition.

Auto-restart: if the X server crashes, fcterm will carry all of your pty state over to the newly started X server, without missing a heartbeat. Yes, it waits for a new X server to come along and continues from where you left off; no more lost editing sessions, or shell sessions.

Fixed a typo in dtrace today...time to go back in the water... stay tuned....


Posted at 23:46:30 by Paul Fox | Permalink