Dtrace - that's all forks ! (just kidding) Saturday, 31 January 2009  
One sillyism in the code fixed, and now USDT is working beautifully - the target app no longer core dumps after the first trap.

What does this mean?

It means the dtrace experiment is over. It works on Linux.

Yes, there are cleanups to do and some missing code to handle forks and garbage collection of shadow procs and stuff and stuff.

But we have now exercised pretty much all of the code, and I need to do some more USDT exercises (like strings and stack dumps; I need to re-research the ruby stuff on Adam's blog).

Of course, the code is very kernel specific - the kernel changes often from one release to the next, and it may be possible to get smarter about handling forwards/backwards compatibility.

At some point in the future, I want to write D scripts for Linux and not be debugging dtrace. Time will tell what we can do, and there are lots of existing D scripts to learn from.

I'll continue writing up progress on dtrace, and hopefully more people can try it out and report back on kernel build issues.


Posted at 00:23:24 by Paul Fox | Permalink
  dtrace progress - USDT works almost Thursday, 29 January 2009  
After some head scratching, here's an example of USDT on Linux:
$ dtrace -n :::saw-line
dtrace: description ':::saw-line' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0   1899                    main:saw-line

This example is taken from the simple-c example app bundled in the distribution. At this point, the target app died with a SIGTRAP since I haven't finished testing.

What made this work?

The code I have ported (my fault) confuses Sun's 'regs' array with Linux's 'pt_regs' array. I've done some mappings so we get the correct interrupt level context, but had to comment out a few references to registers unsupported on Linux (e.g. %gs, %fs, etc). I assume the references are needed in probe context for D apps that want them.

Shame that the target user space binary died, but now hopefully I can make even more progress, e.g. for the other trap types (which I don't fully understand yet, but then, I am being thick).

I'll release new code with these changes in.


Posted at 23:29:27 by Paul Fox | Permalink
  Linux is depressing...Very much so Monday, 26 January 2009  
I find Linux depressing. It really is depressing. The whole lot is out of control, all in the interests of kindness.

I upgraded one laptop last week to Ubuntu 8.10. Worked like a dream.

Until I tried to suspend the laptop. Sometimes it wouldn't recover, and most of the time the stupid, very very stupid NetworkManager ... didn't. After some research, I found a kit to replace this.

But, still the same issues ... not working reliably after suspending. And definitely not reconnecting to the WPA wifi.

I can't believe how horrible and complicated this has become. In the old days there were just config files and /etc/rc.d files to worry about. Now there are unnamed daemons controlling everything, with layers of too much complexity.

I updated the kernel to 2.6.28.2. I lost my sound. I lose my sound every time I build a stock kernel, and I still don't know what goes wrong or where.

Then there's my master server - Fedora Core 8. I tried to upgrade that to 9 or 10, and now it too won't restore (non-WiFi) ethernet on start, without me going to the other room to poke it with a sharp and hot stick ('ifconfig eth0 down; ifconfig eth0 up; route add default gw ....')

Honestly, I am ready to retire and give up on this.

At this rate, Windows 7 will be my prime operating system, or I am going to live in a cave where people don't upgrade things that weren't broken.

Of course, it's all my fault. I thought it was time to get 'real'. Silly me.


Posted at 20:54:40 by Paul Fox | Permalink
  CRiSP - Unified Linux binary 9.3.6a Thursday, 22 January 2009  
Job done: From now on, only two Linux releases of CRiSP will be produced - linux-x86_32 and linux-x86_64.

This is presently being produced on a Fedora Core (FC8) box running glibc2.7, but runs on AS2.1, AS3, AS4, Ubuntu 7/8 (and probably 9+ as well).

Having tracked down what was causing such dependencies, and the arms-race to keep up with distros, I found that really only a couple of things caused this. Strangely, the ancient C functions in ctype.h (isalpha(), isdigit(), etc) were the biggest nuisance, since later glibcs use GCC smarts and libc versioning to disallow a new binary running on an earlier release.

I wrote a tool to patch the ELF section headers to remove the enforced GLIBC dependencies, and it works.

(I wasted a lot of time, because my FC8 box got updated and the X11 libs/headers moved around and I thought my unification was triggering the bizarre errors I was getting).

(Along the way, my dynamic IP address also changed, which I only found out when I tried to download from my own site).

And to make matters worse, one of my laptops suffered a "I am going to update apt-get and break your system badly". This was an old Knoppix release - which was nice since it had an old GLIBC release, but an upgrade pushed me into glibc2.7 territory, violating the first rule of software development: don't update things because it seems like a good idea. It isn't :-)

So, now that laptop gets the Ubuntu treatment. I'm feeling a lot happier with apt-get, and now two systems (plus 1 vmware) are all on Ubuntu, with the master being RedHat (which is now a black sheep because it has no easy upgrade without the pain of a big download or Ubuntu). Still, diversity is good.

What else would I do if I didn't have to fight silly issues ?! Dtrace maybe ....

Yes, but now I need to fix fcterm - the terminal emulator, which added to the chores above by core dumping whilst running gdb (a UTF-8 issue).


Posted at 22:33:17 by Paul Fox | Permalink
  Inches away .. dtrace progress 20090118 Sunday, 18 January 2009  
I can now intercept application INT3 breakpoint traps, and pass them into dtrace. It's not quite right yet (and, if you load dtrace into your kernel, it will presently break gdb and single step / breakpoints), but I hope to fix that.

So now, we can have a USDT app tell the kernel it has probes, have /usr/bin/dtrace monitor the probes, have the app hit INT3 to jump into the kernel, and the next bit is to have the dtrace engine talk back to the application.

I peeked at FreeBSD again, only to find all this is commented out over there, so we are ahead in this area compared to FreeBSD. Next is to work out some details in dtrace_user_probe(), and just use it for a bit.


Posted at 21:54:03 by Paul Fox | Permalink
  CRiSP and a universal Linux binary Sunday, 18 January 2009  
For years - since the day CRiSP for Linux was first built - I have been plagued by Linux binary (ABI) portability problems, meaning that CRiSP has had to be built for every combination of glibc (and now, 32 and 64 bit) platform.

Why? Because, if you run a later crisp on an earlier system, the binaries will refuse to run, complaining about glibc mismatches.

This drives me nuts. For years I had been meaning to see what the cause was, and I was surprised. Very surprised how the glibc maintainers could do this.

No other platform - Windows, Mac, or any other Unix - has this problem. (Well, Mac can be nearly as bad, but definitely not Windows, or any SVR4/BSD derivative - to my knowledge).

Take <ctype.h> from the standard C library. It's existed since practically day 1 of the C language, providing useful functions like isalpha(), isdigit(), etc. Did you realise that this family can cause binary compatibility problems? Well, it does. Somewhere in glibc 2.[567] they made these functions Unicode and obscure-aware (e.g. isalpha(EOF) should not cause an array bounds indexing violation). So, the simple #defines or array lookups of old are replaced with calls to a function in libc.so, which may not exist in older libc.so's. Yuk. This isn't an option that you turn on because you want it, and it's almost undocumented.

So, one of the most trivial functions in libc.so is being replaced by a private implementation.

pthreads is another issue - I am aware that at some point in the past, the size of structures for pthreads changed, and this caused portability issues for apps. Instead of hiding this in the implementation, they use versioning of symbols.

GCC 4.x supports functions for detecting stack frame smashing, and this is turned on by default. If you compile with -D_FORTIFY_SOURCE=0, then these compatibility issues are removed. (I am not advising others to do that; I test my apps with valgrind and my own builtin memory corruption detector).

I had to do lots of stuff to find this out, e.g.

objdump -T binary | grep GLIBC

will tell some of the story.

objdump -p binary | grep VER

will tell the rest of the story. The definitions for VERNEED, VERNEEDNUM and VERSYM stop a later binary running on an older system. When I have finished writing a tool to strip these out of a binary, I will be able to run a glibc 2.7 application on AS2.1 (glibc 2.3 or glibc 2.2).
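As a rough illustration of reading that objdump output, here is a minimal Python sketch that pulls the GLIBC_x.y version tags out of `objdump -T` text and reports the newest one a binary asks for. The function names and the sample lines are my own stand-ins (hand-written in the shape of objdump output, not taken from a real CRiSP binary), and this only inspects the dependencies; it does not patch anything.

```python
import re

# Hand-written sample in the shape of `objdump -T` output; the symbols and
# versions are illustrative, not taken from a real CRiSP binary.
SAMPLE = """\
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 printf
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.3   __ctype_b_loc
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.4   __stack_chk_fail
"""

def glibc_versions(objdump_text):
    """Return the sorted, de-duplicated GLIBC version tags as int tuples."""
    found = set()
    for tag in re.findall(r"GLIBC_(\d+(?:\.\d+)*)", objdump_text):
        found.add(tuple(int(part) for part in tag.split(".")))
    return sorted(found)

def newest_required(objdump_text):
    """Return the newest glibc version the binary asks for, or None."""
    versions = glibc_versions(objdump_text)
    return versions[-1] if versions else None

if __name__ == "__main__":
    newest = newest_required(SAMPLE)
    print(".".join(str(n) for n in newest))
```

In practice the text would come from running objdump on the binary; if the newest tag is later than the glibc on the oldest target system, the loader will refuse to run the binary there.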

I will then be able to build just two Linux releases: 32 and 64 bit, and use my latest development system to create a binary compatible release.

I have to say that doing this puts the onus on me to work around whatever the symbol versioning was there to protect against, which is a nuisance.

I have lots of vmware and systems running a variety of Linux releases, but it's an annoyance to have customers tell me that Ubuntu 8.10 isn't supported, even though I use it myself (for dtrace work).


Posted at 21:43:44 by Paul Fox | Permalink
  dtrace for OpenBSD ? Saturday, 17 January 2009  
Just reading on the OpenBSD mailing list about ZFS for OpenBSD, and someone was saying wouldn't dtrace be better. I was wondering about that comment. Yes, porting dtrace to OpenBSD should be easier than for Linux, given that OpenBSD and FreeBSD share a common BSD ancestry. I don't know the relative maturity of one vs the other, although I think FreeBSD has a bigger user base, but, in theory, it should be doable.

Would I do it? Maybe, if someone asked. But before then ... Linux needs to get a little bit further forward.

With regard to Linux dtrace, I have a piece of glue to place on the interrupt vector which handles a user-space breakpoint trap. I can see the code in Solaris, and now need to work out the best place to put this in Linux, and that should handle the full cycle from user-to-kernel-to-user-to-kernel which is needed for USDT. Let me see how I can get on with this, and then some cleanups can start to happen....


Posted at 22:52:51 by Paul Fox | Permalink
  dtrace on windows Friday, 16 January 2009  
I wonder if that grabs your attention :-)

I was wondering if it was doable/viable/workable. To be honest, I don't see why not.

I am not proposing to attempt this (not unless I am really bored and Linux dtrace is 'finished').

But technically, most of the dtrace code is just plain-ol-C. There are bits to hook into the kernel and userspace, but the dtrace code is modular and segregated enough that the Unix specific pieces are actually relatively small.

For anyone who has tackled Windows device drivers (and they are not that difficult, although they operate in a more complex way than Unix), it should be doable.

There are more layers in Windows (core kernel, ntdll.dll, win32, user, gdi, ...), but the fundamentals of reading/writing memory are what is crucial.

Of course, Windows doesn't support ELF, and I would hate to run a 'dtrace -l' inside a CMD.EXE window.


Posted at 23:39:02 by Paul Fox | Permalink
  CRiSP and Large Files Friday, 16 January 2009  
Just wanted to take a detour away from dtrace for a moment. I rarely comment or write on CRiSP, even though it is a mature baby.

Someone asked me about editing/viewing large files in CRiSP. I thought I would crib some of the mail I sent.

Here's a question: What is the largest file you could edit on a 16-bit machine? 32-bit? 64-bit? (CRiSP has survived these CPU architectural changes over the years).

The answer is the same for all: how big is your hard drive? Naive coding would lead to just loading the file into memory, and hence you would be limited to the size of RAM and the addressability of the CPU. This has never been a good thing: if you spend all your time in the same editor for small files, you almost certainly want to use that tool for large files too, e.g. >4GB files.
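To make the contrast with "just load the file into memory" concrete, here is a minimal Python sketch of the general idea: read only the window you need, or stream the file in fixed-size chunks. This is not how CRiSP is implemented (I am not describing its internals here); the function names and chunk size are illustrative assumptions.

```python
import os
import tempfile

def read_window(path, offset, size):
    """Read at most `size` bytes starting at `offset`, touching nothing else."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

def count_lines(path, chunk_size=1 << 20):
    """Count newlines by streaming fixed-size chunks instead of loading the file."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total

if __name__ == "__main__":
    # Small stand-in file; the same calls work unchanged on a multi-GB file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"line one\nline two\nline three\n")
    print(read_window(tmp.name, 9, 8))
    print(count_lines(tmp.name))
    os.unlink(tmp.name)
```

Memory use is bounded by the window or chunk size rather than by the file size, which is why the limit becomes the disk and your patience, not RAM.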

The largest file I tried to test in CRiSP is around 16 GB. I didn't go much further (this was a 32-bit cpu), because it got boring waiting for the file to page in via the O/S, but it works.

Of course, you can find a weak spot in this: just try taking a huge file and do a search and replace of every character in the file. CRiSP will attempt to save the undo information and you will wait a long time for the I/O. At least CRiSP tries - and tries to be efficient.

So, the answer to the question is: How long do you want to wait?

CRiSP can support almost infinitely large files (up to the size of your hard disk or filesystem), but what you do next will really depend.

It's worth reiterating this point: what matters is whether your tool of choice can survive being pushed to extremes, and whether its performance degrades linearly, exponentially, or catastrophically. That is an interesting topic for technically interested people. Maybe not for everyone.


Posted at 20:29:21 by Paul Fox | Permalink
  dtrace progress 20090115 Thursday, 15 January 2009  
Some degree of success! We can now run a USDT enabled process, run dtrace on the probes of that process, and I can see the INT 0x3 (0xcc) instruction being written to the probe points of that proc. The kernel writes a breakpoint instruction with the goal of /usr/bin/dtrace monitoring the child for SIGTRAP signals. (And, presumably, to fire the callback for the process .. not sure what happens next).

I know the kernel isn't logging the triggered probe (or maybe my example simple.c is too simple!)

Alas, the proc falls over when it hits the SIGTRAP, since the ptrace parent isn't doing the right thing.

To see this happen, I modified simple.c to checksum its own code (very simple hack) and could see the checksum change, immediately followed by the SIGTRAP abort.

Next step is to get /usr/bin/dtrace to trace the child properly. Let's see what happens.
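The checksum trick can be sketched as a simulation. The Python below works on a plain byte buffer standing in for a function body, not on real process memory, and is not the hack I put in simple.c; the names and the choice of CRC32 are illustrative assumptions. It just shows why overwriting one byte with the INT3 opcode (0xcc) is visible to a checksum.

```python
import zlib

INT3 = 0xCC  # x86 breakpoint opcode

def checksum(code):
    """CRC32 of a code region; a single patched byte changes the result."""
    return zlib.crc32(bytes(code))

def patch_breakpoint(code, offset):
    """Overwrite one byte with INT3 and return the original byte."""
    saved = code[offset]
    code[offset] = INT3
    return saved

if __name__ == "__main__":
    # Stand-in for a function body: a few NOPs followed by RET.
    text = bytearray([0x90, 0x90, 0x90, 0x90, 0xC3])
    before = checksum(text)
    saved = patch_breakpoint(text, 2)
    print("patched:", checksum(text) != before)
    text[2] = saved  # restoring the byte restores the checksum
    print("restored:", checksum(text) == before)
```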

As always, latest code on my dtrace download site.


Posted at 23:29:19 by Paul Fox | Permalink
  STUPID STUPID ME ! dtrace progress Tuesday, 13 January 2009  
Found it! After days/weeks of perusing source code, trying to understand the PID provider and fasttrap code, and pulling (what little) hair I have out, I found it.

When a user space app registers itself as a provider, it would not show up in 'dtrace -l'. Why?

Because I am stupid and missed the blindingly obvious.

Fasttrap.c has a limit on how many user space providers can be created - to avoid crashing or DoSing the kernel. But I forgot (or rather, didn't realise) that the variable was not set. (In Sun land, they read the attributes from kernel config variables, but I had commented that out).

Stupid me! Now I can see the provider. Here's an example:

/home/fox/src/dtrace/drivers/dtrace@vmubuntu: dtrace -l | tail
 1859        fbt              fuse                         fuse_iget entry
 1860        fbt              fuse                         fuse_iget return
 1861        fbt              fuse                  fuse_set_nowrite entry
 1862        fbt              fuse                  fuse_set_nowrite return
 1863        fbt              fuse                   fuse_abort_conn entry
 1864        fbt              fuse                   fuse_abort_conn return
 1865        fbt              fuse             fuse_flush_writepages entry
 1866        fbt              fuse             fuse_flush_writepages return
 1867 simple5555          simple-c                              main saw-line
 1868 simple5555          simple-c                              main saw-word

Now, hopefully I can make some real progress.


Posted at 23:14:39 by Paul Fox | Permalink
  dtrace progress Sunday, 11 January 2009  
Progress is slow at the moment. In the continuing battle to get USDT to work, I am reaching some roadblocks.

The 'easy' part was getting core dtrace into the kernel - wherever something was wrong, I would crash the kernel, so, I could track down where it broke and work backwards.

With USDT it's slightly different. After getting a userland binary to have probes in it, it runs and tells the kernel it is probe-able. Kernel trace messages show the probe exists, yet 'dtrace -l' doesn't list the probe. (I am using MacOS to compare what *should* happen with what *does* happen on Linux). I am obviously missing something here.

It's a bit of chicken-and-egg trying to work out the flaw, e.g. it could be the userspace implementation not being complete, or it could be a silliness in the kernel, or even something I have forgotten to do.

Interestingly, when running a USDT app, it declares the probes, and you can see them (eg on the Mac) with 'dtrace -l'.

You can run in two ways: run the app on its own, and attach to the probe with dtrace, or, do both together, launch dtrace to fire the app and monitor the probes.

Interestingly, on the Mac, gcc seems to have some enhancements to allow the inline probe declarations to work. Statically disassembling the binary and disassembling whilst the app is running shows the kernel correctly putting in "INT 3" instructions into the userspace code area.

It's possible on Linux that dtrace is too divorced from the real kernel, or I just had something stubbed out.

I also hit a problem with "dtrace -c ..." in Linux. I don't know if this is a pthreads issue or a Linux issue, but Linux doesn't allow ptrace(PTRACE_CONT) to be executed from a child thread, when the child target process is forked from the main thread. In Linux, the target proc and the controlling thread are like siblings instead of parent-child. (I solved this temporarily by moving fork/exec creation to the monitoring thread, but it's still a bit flaky).

I am spending a lot of time statically reviewing the dtrace code to work out where the problem is. I can find lots of code I want to be executed to handle USDT, but, am missing a vital cog to make it hang together...


Posted at 16:44:27 by Paul Fox | Permalink
  dtrace for freebsd 7.1 Saturday, 10 January 2009  
FreeBSD 7.1 came out this week to a mild amount of fanfare. That's a good thing. It's great that people spend a lot of effort on distros for themselves and their own communities.

I grabbed the distro and the source to see what had changed in dtrace. It looks like "not a lot" compared with the source snapshots I had from earlier in 2008.

Alas, disappointingly, USDT dtrace doesn't work. (I couldn't get dtrace to work at all in FreeBSD from the stock download for x86-64; I guess I need to rebuild the kernel).

Searching the web reveals user land tracing is not complete. This is a shame, because I have been using the FreeBSD model of implementation for Linux. I have had a hard time, because it looks like there are subtle things wrong/broken in FreeBSD/USDT tracing (e.g. the way a process is launched and ptrace() is used to attach to the process is missing some key lines of code).

I have spent the last week poring over the subtleties of what FreeBSD does, along with Sun and Apple. I should be able to get this bit to work, however I am not sure about other aspects of the tracing, such as aborting or skipping over syscalls. (The ptrace() syscall is simply not as powerful as Sun's /procfs interface).

I know most of the ELF code works for symtab lookups, so I should be able to make some new progress. I'll update the blog and put out a new source tarball when I feel happy with what I have.


Posted at 09:43:47 by Paul Fox | Permalink
  dtrace progress 20090105 Monday, 05 January 2009  
As always, things have been slow, but they sped up over the last few days. (I've been ill with flu over Xmas, which didn't help; every thought of dtrace made my head explode!)

First, the /proc/$$/ctl driver sort-of-nearly-almost-but-doesn't work. It hooks into the kernel and can respond to calls, but there's a problem/difficulty: I haven't figured out how to simply intercept syscall entry/exit on a per process/thread basis, without lots of kernel hacking or a brute force patch on entry/exit to the syscall handlers. This would be against the ideal of dtrace having a zero-impact approach to monitoring. Maybe it's doable long term (I do so love the Solaris approach to procfs; ptrace doesn't cut the mustard).

In any case, this may not matter; I have spent more time understanding the libdtrace library and how it handles:

dtrace -c prog
and how it grabs a running process. I took a new look at FreeBSD and noticed it used the only other valid alternative: ptrace, so I am grabbing ideas and code from FreeBSD to see if I can make progress.

Side note: using the Apple code is rather pointless, since it relies on the underlying MACH OS calls to do process manipulation and there's nothing similar in Linux - i.e. an uphill struggle.

The FreeBSD code is nice and simple, except it does rely on the EVENT subsystem in FreeBSD for inter-thread communication (not sure I fully follow it). I have stubbed it out for now - just so I can get something/anything working.

Hopefully when this is done, I can handle the reverse journey for USDT.

Let's hope 2009 is a better dtrace year. It will be a long slog to get dtrace reliable, and the more that people try it or comment on it, the better. I feel comfortable that key parts of dtrace just work, but I haven't addressed quality. (I am slowly trying to clean up compiler warnings, for instance, which many times obscure real silliness on my part).


Posted at 18:59:48 by Paul Fox | Permalink