dtrace progress - the difficult climb Sunday, 30 November 2008  
Things have been quiet on the release front recently, so it's probably worth describing why.

In order to get USDT to work, we need a decent mechanism to handle process debugging (start/stop). I created /dev/dtrace_ctl, but wasn't happy that this is a global driver rather than a PID-specific driver.

So I have been studying the /proc filesystem code to see how to get a new entry into /proc/pid/ctl.

If I were modifying the kernel source, this would be easy. But I am not.

So, we have to take the point of view of a rootkit: we need to intercept key parts of the proc driver so we can install a new entry in the static data structure (tgid_base_stuff, in file fs/proc/base.c).

This is a counted array, not a null-terminated one, and there are two references to it in the kernel, both in base.c, so we need to intercept those and put in our own table.
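As a rough userland illustration of why the table has to be swapped rather than extended in place: a counted array carries its length separately, so a static table cannot grow, and every place that uses the old pointer and count has to be redirected to a larger copy. The sketch below is not kernel code. The struct entry type and the table_with_extra() helper are hypothetical stand-ins invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's per-entry structure. */
struct entry {
    const char *name;
    unsigned int mode;
};

/*
 * Build a copy of a counted table with one extra entry appended.
 * Returns the new table (caller frees it) and stores its length in
 * *new_count, or returns NULL if the allocation fails.
 */
struct entry *table_with_extra(const struct entry *orig, size_t count,
                               struct entry extra, size_t *new_count)
{
    struct entry *copy = malloc((count + 1) * sizeof(*copy));

    if (copy == NULL)
        return NULL;

    /* The original is left untouched; only the copy gains the new slot. */
    memcpy(copy, orig, count * sizeof(*copy));
    copy[count] = extra;
    *new_count = count + 1;
    return copy;
}
```

With a null-terminated table a sentinel would mark the end, but the copy would still be needed because the static original has no spare slot. With a counted table, both the pointer and the count at each reference have to change together.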

The goal here is to have /proc/pid/ctl, so the libproc.a code can issue SVR4-style PIOC procfs ioctls. There's no guarantee that this approach, difficult as I am making it for myself, will work, or that I won't hit a bigger issue once I have tackled this one. (Nobody said I had to 'do this right'!)

Patching the kernel is nasty: the kernel will evolve and dtrace won't work, but we can live with that. Let's make it work on kernels which exist, and see what happens in the future.

I am going to do a small amount of code reorg (moving internals in fbt_linux.c to dtrace_linux.c, where I am trying to keep most of the general code).

Posted at 19:24:34 by Paul Fox
  Schroedinger's /proc Tuesday, 25 November 2008  
Whilst examining the kernel source, specifically the implementation of the /proc filesystem, I was wondering:

Does the content of /proc exist, if you are not looking at it ?!

There are two answers to the question:

Yes. /proc entries come and go as processes get created and die, but maintaining them that way would add a lot of overhead to process creation and death, which would be a problem.

No. It's an illusion: when you look, e.g. via opendir()/getdents()/open(), the entries are constructed for you.

Why does this matter? Well, in dtrace we need to track process creation (I am concentrating on /proc/pid/ctl: how to create a sub-entry in /proc which appears and disappears with the process it is tied to).
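The appear/disappear behaviour is easy to observe from userland. The sketch below forks a child, checks that /proc/<pid> can be seen while the child is alive, then kills and reaps it and checks again. The function name proc_entry_tracks_child() is made up for this example, and it assumes a Linux system with /proc mounted.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if /proc/<pid> was visible while a child was alive and was
 * gone once the child had been killed and reaped, 0 otherwise, and -1
 * if fork() failed.
 */
int proc_entry_tracks_child(void)
{
    char path[64];
    struct stat st;
    int seen_alive, seen_dead;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        pause();            /* child: sleep until a signal arrives */
        _exit(0);
    }

    snprintf(path, sizeof(path), "/proc/%ld", (long)pid);
    seen_alive = (stat(path, &st) == 0);

    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);  /* reap, so the pid is really gone */
    seen_dead = (stat(path, &st) == 0);

    return seen_alive && !seen_dead;
}
```

Nothing in this program creates or removes the directory. The lookup simply succeeds or fails depending on whether the process exists at the moment of asking.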

I am being thick: I am studying the code, and am nearly there in totally understanding it, but missing some 'glue' logic.

What it does is succinct and clever.

The "Quantum" nature of /proc can be exposed. I conjured up a test to try it out.

Look at /proc/$$/fd/0 -- stdin for the current process. What would happen if we did something like this:

$ sleep 10000 < /proc/1234/fd/0 &

and killed the process (in this example, pid 1234). What would happen to our sleep's fd/0 stdin? It doesn't show as a deleted file entry.

Go on. Try it.
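For those who would rather run the experiment as a single program, here is a rough C version of the same idea. It is a sketch under assumptions: a temporary file stands in for stdin, the child simply inherits the descriptor, and the function name proc_fd_survives_owner() is invented for this example. It reports whether a descriptor opened through /proc/<pid>/fd/<n> can still be read once <pid> is dead.

```c
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a descriptor opened via /proc/<pid>/fd/<n> is still
 * readable after <pid> has died, 0 if not, -1 on setup failure.
 */
int proc_fd_survives_owner(void)
{
    char tmpl[] = "/tmp/procfd-XXXXXX";
    char path[64], buf[8];
    ssize_t n = -1;
    int src, via_proc;
    pid_t pid;

    src = mkstemp(tmpl);                /* stand-in for the victim's stdin */
    if (src < 0)
        return -1;
    if (write(src, "hello", 5) != 5 || (pid = fork()) < 0) {
        close(src);
        unlink(tmpl);
        return -1;
    }
    if (pid == 0) {
        pause();                        /* child: hold the inherited fd open */
        _exit(0);
    }

    /* Open the child's descriptor through its /proc entry. */
    snprintf(path, sizeof(path), "/proc/%ld/fd/%d", (long)pid, src);
    via_proc = open(path, O_RDONLY);

    kill(pid, SIGKILL);                 /* now remove the owner */
    waitpid(pid, NULL, 0);

    if (via_proc >= 0) {
        n = read(via_proc, buf, sizeof(buf));
        close(via_proc);
    }
    close(src);
    unlink(tmpl);
    return via_proc < 0 ? -1 : (n == 5);
}
```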

I then thought about what would happen if a new proc 1234 is created. (I had to create a fork bomb, by accident, to get the pid in question reused!)

It opened my eyes to a potential security hole (which, fortunately, does not exist).

But the cleverness and subtlety of something we take for granted on a day-to-day basis is amazing...

Posted at 22:41:59 by Paul Fox
  DTrace Progress .. slow .. Sunday, 23 November 2008  
I haven't updated the blog in a while, so I should explain why...

Things have been hectic in the day job, but that's only a partial excuse. I have been sweeping around trying to find other things to do before continuing with DTrace.

These other things include some minor updates to CRiSP, and lots of enhancements/fixes to fcterm. fcterm, as I have written before, is the fast terminal emulator around, but has specific features, like multiple tabs, infinite scrollback, and an ability to not die if the X server crashes (actually, it retries the connection within 10s, which is useful if you need to bounce the X server, e.g. when using something like Hummingbird X rather than a native X server).

Anyway, back to the plot. The next phase is to get dtrace to run an ELF executable, e.g. a USDT-conforming app, and this requires access to /proc/pid/ctl, part of the SVR4 procfs(4) filesystem, which does not exist on Linux.

We can either use the ptrace() syscall, or write our own mechanism and so avoid lots of changes to the userland dtrace libs+bins.

ptrace() has survived well for decades, despite some shortcomings, but it misses a crucial feature which Solaris truss(1) has: run-on-last-close. If you kill -9 a truss app, the truss'ed app will continue to run. This is not the case for a ptrace()'ed app: it will be stuck in the STOPped state.
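To make the stopped state concrete, here is a minimal sketch of a ptrace()'ed child parking itself until its tracer acts. The function name traced_child_stops() is hypothetical, and the sketch only shows the stop being observed by the tracer. It does not try to reproduce what happens when the tracer itself is killed.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the traced child was seen in the stopped state,
 * 0 if it was not, and -1 on error.
 */
int traced_child_stops(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: ask to be traced by the parent, then stop ourselves. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);
    }

    /* Tracer: wait for the child to report its stop. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status))
        return 0;       /* child exited instead; already reaped */

    /*
     * The child now sits here until the tracer does something with it.
     * This sketch just cleans up by killing and reaping it.
     */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return 1;
}
```

A run-on-last-close facility would instead let the kernel resume the child by itself when the controlling descriptor goes away, which is the behaviour the ctl file is meant to provide.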

Fixing this isn't difficult in the kernel source, but we are not going to modify the kernel source.

So, /dev/dtrace_ctl has been born to try and implement the functionality. I would like to implement /proc/pid/ctl, but haven't yet found out how to hook into /proc from outside the kernel without having concerns about PID death affecting us. (I may be worrying unnecessarily.)

I will try the dtrace_ctl driver approach and may retry the /proc approach too.

Posted at 20:23:21 by Paul Fox