Dtrace - that's all forks! (just kidding) | Saturday, 31 January 2009 |
What does this mean?
It means the dtrace experiment is over. It works on Linux.
Yes, there are cleanups to do and some missing code to handle forks and garbage collection of shadow procs and stuff and stuff.
But we have now exercised pretty much all of the code, and I need to do some more USDT exercises (like strings and stack dumps; I need to re-research the ruby stuff on Adam's blog).
Of course, the code is very kernel specific - the kernel changes often from one release to the next, and it may be possible to get smarter about handling forwards/backwards compatibility.
At some point in the future, I want to write D scripts for Linux and not be debugging dtrace. Time will tell what we can do, and there are lots of existing D scripts to learn from.
I'll continue writing up progress on dtrace, and hopefully more people can try it out and report back on kernel build issues.
dtrace progress - USDT works almost | Thursday, 29 January 2009 |
$ dtrace -n :::saw-line
dtrace: description ':::saw-line' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0   1899                    main:saw-line
This example is taken from the simple-c example app bundled in the distribution. At this point, the target app died with a SIGTRAP since I haven't finished testing.
What made this work?
The code I have ported (my fault) confuses Sun's 'regs' array with Linux's 'pt_regs' array. I've done some mappings so we get the correct interrupt level context, but had to comment out a few references to unsupported registers on Linux (eg %gs, %fs, etc). I assume the references are needed in probe context for D apps that want them.
Shame that the target user space binary died, but now hopefully I can make even more progress, e.g. for the other trap types (which I don't fully understand yet, but then, I am being thick).
I'll release new code with these changes in.
Linux is depressing...Very much so | Monday, 26 January 2009 |
I upgraded one laptop last week to Ubuntu 8.10. Worked like a dream.
Until I tried to suspend the laptop, and sometimes it wouldn't recover and most of the time the stupid, very very stupid NetworkManager ... didn't. After some research, I found a kit to replace this.
But, still the same issues ... not working reliably after suspending. And definitely not reconnecting to the WPA wifi.
I can't believe how horrible and complicated this has become. In the old days there were just config files and /etc/rc.d files to worry about. Now there are unnamed daemons controlling everything with layers of too much complexity.
I updated the kernel to 2.6.28.2. I lost my sound. I lose my sound every time I build a stock kernel, and I still don't know what/where.
Then my master server - Fedora Core 8. I tried to upgrade that to 9 or 10, and now it too won't restore (non-WiFi) ethernet on start, without me going to the other room to poke it with a sharp and hot stick ('ifconfig eth0 down; ifconfig eth0 up; route add default gw ....')
Honestly, I am ready to retire and give up on this.
At this rate, Windows 7 will be my prime operating system, or I am going to live in a cave where people don't upgrade things that weren't broken.
Of course, it's all my fault. I thought it was time to get 'real'. Silly me.
CRiSP - Unified Linux binary 9.3.6a | Thursday, 22 January 2009 |
This is presently being produced on a Fedora Core (FC8) box running glibc2.7, but runs on AS2.1, AS3, AS4, Ubuntu 7/8 (and probably 9+ as well).
Having tracked down what was causing such dependencies, and the arms-race to keep up with distros, I found that really only a couple of things caused this. Strangely, the ancient C functions in ctype.h (isalpha(), isdigit(), etc) were the biggest nuisance, since later glibcs use GCC smarts and libc versioning to disallow a new binary running on an earlier release.
I wrote a tool to patch the ELF section headers to remove the enforced GLIBC dependencies, and it works.
(I wasted a lot of time, because my FC8 box got updated and the X11 libs/headers moved around and I thought my unification was triggering the bizarre errors I was getting).
(Along the way, my dynamic IP address also changed, which I only found out when I tried to download from my own site).
And to make matters worse, one of my laptops suffered an "I am going to update apt-get and break your system badly". This was an old Knoppix release - which was nice since it was an old GLIBC release, but an upgrade pushed me into glibc2.7 territory, invalidating the first rule of software development: don't update things because it seems like a good idea. It isn't :-)
So, now that laptop gets the Ubuntu treatment. I'm feeling a lot happier with apt-get, and now two systems (plus 1 vmware) are all on Ubuntu, with the master being RedHat (which is now a black sheep because it has no easy upgrade without the pain of a big download or Ubuntu). Still, diversity is good.
What else would I do if I didn't have to fight silly issues ?! Dtrace maybe ....
Yes, but now I need to fix fcterm - the terminal emulator, which added to the chores above by core dumping whilst running gdb (a UTF-8 issue).
Inches away .. dtrace progress 20090118 | Sunday, 18 January 2009 |
So now, we can have a USDT app tell the kernel it has probes, have /usr/bin/dtrace monitor the probes, have the app hit INT3 to jump into the kernel, and the next bit is to have the dtrace engine talk back to the application.
I peeked at FreeBSD again, only to find all this is commented out over there, so we are ahead in this area compared to FreeBSD. Next is to work out some details in dtrace_user_probe(), and just use it for a bit.
CRiSP and a universal Linux binary | Sunday, 18 January 2009 |
Why? Because, if you run a later crisp on an earlier system, the binaries will refuse to run, complaining about glibc mismatches.
This drives me nuts. For years I had been meaning to see what the cause was, and I was surprised. Very surprised how the glibc maintainers could do this.
No other platform (Windows, Mac, or any other Unix) has this problem. (Well, Mac can be nearly as bad, but definitely not Windows, or any SVR4/BSD derivative - to my knowledge).
Take the standard C library header <ctype.h>. It's existed since practically day 1 of the C language, providing useful functions like isalpha(), isdigit(), etc. Did you realise that this family can cause binary compatibility problems? Well, it does. Somewhere in glibc 2.[567] they made these functions Unicode and obscure-aware (eg, isalpha(EOF) should not cause an array bounds indexing violation). So, the simple #defines or array lookups of old are replaced with calls to a function in libc.so, which may not exist in older libc.so's. Yuk. This isn't an option that is turned on because you want it, and it's almost undocumented.
So, one of the most trivial functions in libc.so is being replaced by a private implementation.
pthreads is another issue - I am aware that at some point in the past, the size of structures for pthreads changed, and this caused portability issues for apps. Instead of hiding this in the implementation, they use versioning of symbols.
GCC 4.x supports functions for detecting stack frame smashing, and this is turned on by default. If you compile with -D_FORTIFY_SOURCE=0, then these binary compatibility issues are removed. (I am not advising others to do that; I test my apps with valgrind and my own builtin memory corruption detector).
I had to do lots of stuff to find this out, e.g.
objdump -T binary | grep GLIBC
will tell some of the story, and
objdump -p binary | grep VER
will tell the rest of the story. The definitions for VERNEED, VERNEEDNUM and VERSYM stop a later binary running on an old system. When I have finished writing a tool to strip this out of a binary, then I can run a glibc 2.7 application on an AS2.1 (glibc 2.3 or glibc 2.2).
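As a rough illustration of reading the first command's output (this is not the stripping tool, just a way to see which glibc a binary demands), the sketch below scans `objdump -T` text for undefined symbols carrying a GLIBC_x.y tag and reports the highest version required. The function names and the sample lines are made up for the example; real output will differ per binary.

```python
import re
from collections import defaultdict

# Matches version tags such as GLIBC_2.2.5 or (GLIBC_2.4)
VERSION_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)*)")

def glibc_requirements(objdump_text):
    """Map each required GLIBC version (as a tuple) to the symbols needing it."""
    needed = defaultdict(list)
    for line in objdump_text.splitlines():
        match = VERSION_RE.search(line)
        # Only undefined (*UND*) symbols are imports from libc.so
        if not match or "*UND*" not in line:
            continue
        version = tuple(int(part) for part in match.group(1).split("."))
        symbol = line.split()[-1]
        needed[version].append(symbol)
    return dict(needed)

def minimum_glibc(objdump_text):
    """Return the newest GLIBC version the binary asks for, or None."""
    reqs = glibc_requirements(objdump_text)
    return max(reqs) if reqs else None

# Illustrative sample only, shaped like `objdump -T binary` output
SAMPLE = """\
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 printf
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.3   __ctype_b_loc
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.4   __stack_chk_fail
"""

if __name__ == "__main__":
    for version, symbols in sorted(glibc_requirements(SAMPLE).items()):
        print(".".join(map(str, version)), symbols)
    print("needs at least glibc", minimum_glibc(SAMPLE))
```

Versions are compared as integer tuples rather than strings, so that 2.10 sorts after 2.7.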
I will then be able to build just two Linux releases: 32 and 64 bit, and use my latest development system to create a binary compatible release.
I have to say that doing this means the onus is on me to work around why such symbol versioning occurs, but it's a nuisance.
I have lots of vmware and systems running a variety of Linux releases, but it's an annoyance to have customers tell me that Ubuntu 8.10 isn't supported, even though I use it myself (for dtrace work).
dtrace for OpenBSD ? | Saturday, 17 January 2009 |
Would I do it? Maybe, if someone asked. But before then ... Linux needs to get a little bit further forward.
With regard to Linux dtrace, I have a piece of glue to place on the interrupt vector which handles a user-space breakpoint trap. I can see the code in Solaris, and now need to work out the best place to put this in Linux, and that should handle the full cycle from user-to-kernel-to-user-to-kernel which is needed for USDT. Let me see how I can get on with this, and then some cleanups can start to happen....
dtrace on windows | Friday, 16 January 2009 |
I was wondering if it was doable/viable/workable. To be honest, I don't see why not.
I am not proposing to attempt this (not unless I am really bored and Linux dtrace is 'finished').
But technically, most of the dtrace code is just plain-ol-C. There are bits to hook into the kernel and userspace, but the dtrace code is modular and segregated enough that the Unix specific pieces are actually relatively small.
For anyone who has tackled Windows device drivers (and they are not that difficult, although they operate in a more complex way than Unix), it should be doable.
There are more layers in Windows (core kernel, ntdll.dll, win32, user, gdi, ...), but the fundamentals of reading/writing memory are what is crucial.
Of course, Windows doesn't support ELF, and I would hate to run a 'dtrace -l' inside a CMD.EXE window.
CRiSP and Large Files | Friday, 16 January 2009 |
Someone asked me about editing/viewing large files in CRiSP. I thought I would crib some of the mail I sent.
Here's a question: What is the largest file you could edit on a 16-bit machine? 32-bit? 64-bit? (CRiSP has survived these CPU architectural changes over the years).
The answer is the same for all: how big is your hard drive? Naive coding would lead to just loading the file into memory and hence you would be limited to the size of RAM and addressability of the CPU. This has never been a good thing: if you spend all your time in the same editor for small files, you almost certainly want to use that tool for large files too, e.g. >4GB files.
The largest file I tried to test in CRiSP is around 16 GB. I didn't go much further (this was a 32-bit cpu), because it got boring waiting for the file to page in via the O/S, but it works.
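To make the "don't load the whole file" point concrete, here is a minimal sketch, which is not how CRiSP itself is implemented. It assumes a sparse index of line offsets is good enough for an editor-like viewer: one pass over the file in fixed-size chunks records where every Nth line starts, and after that any window of lines can be fetched with a seek. All names (`build_line_index`, `read_lines`, the chunk size) are invented for the illustration.

```python
CHUNK = 64 * 1024  # bytes read per step; an arbitrary illustrative size

def build_line_index(path, every=1000, chunk_size=CHUNK):
    """Return byte offsets of lines 0, every, 2*every, ... (0-based)."""
    offsets = [0]
    newlines_seen = 0
    pos = 0  # file offset of the start of the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                nl = chunk.find(b"\n", start)
                if nl == -1:
                    break
                newlines_seen += 1
                if newlines_seen % every == 0:
                    # The next line begins just past this newline
                    offsets.append(pos + nl + 1)
                start = nl + 1
            pos += len(chunk)
    return offsets

def read_lines(path, offsets, first_line, count, every=1000):
    """Fetch `count` lines starting at 0-based `first_line` via the index."""
    slot = min(first_line // every, len(offsets) - 1)
    lines = []
    with open(path, "rb") as f:
        f.seek(offsets[slot])
        # Skip forward from the indexed line to the requested one
        for _ in range(first_line - slot * every):
            if not f.readline():
                return lines
        for _ in range(count):
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip(b"\n"))
    return lines
```

Memory use is bounded by the chunk size plus the index, not by the file size; the cost moves to the initial indexing pass, which is where the waiting comes from.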
Of course, you can find a weak spot in this: just try taking a huge file and do a search and replace of every character in the file. CRiSP will attempt to save the undo information and you will wait a long time for the I/O. At least CRiSP tries - and tries to be efficient.
So, the answer to the question is: How long do you want to wait?
CRiSP can support almost infinitely large files (up to the size of your hard disk or filesystem), but what you do next will really depend.
It's worth reiterating this point: whether your tool of choice can survive being pushed to extremes, and whether its performance degrades linearly, exponentially, or catastrophically, is an interesting topic for technically interested people. Maybe not for everyone.
dtrace progress 20090115 | Thursday, 15 January 2009 |
I know the kernel isn't logging the triggered probe (or maybe my example simple.c is too simple!)
Alas, the proc falls over when it hits the SIGTRAP, since the ptrace parent isn't doing the right thing.
To see this happen, I modified simple.c to checksum its own code (very simple hack) and could see the checksum change, immediately followed by the SIGTRAP abort.
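The self-checksum trick can be simulated without touching a real process. The sketch below is an assumption-laden stand-in, not the change made to simple.c: a bytearray plays the part of the function's machine code, and overwriting one byte with the x86 breakpoint opcode (0xCC, i.e. INT 3) plays the part of the kernel arming a probe. The helper names are made up.

```python
import zlib

INT3 = 0xCC  # x86 one-byte breakpoint instruction

def checksum(code):
    """CRC32 over the code bytes; any single-byte patch changes it."""
    return zlib.crc32(bytes(code))

def plant_breakpoint(code, offset):
    """Overwrite one byte with INT3 and return the byte that was there."""
    original = code[offset]
    code[offset] = INT3
    return original

# Stand-in for a tiny function: push rbp; mov rbp,rsp; 5 x nop; pop rbp; ret
code = bytearray(b"\x55\x48\x89\xe5\x90\x90\x90\x90\x90\x5d\xc3")

before = checksum(code)
saved = plant_breakpoint(code, 4)   # patch the first nop (the "probe site")
after = checksum(code)

print("checksum changed:", before != after)

code[4] = saved                     # disarm the probe again
print("checksum restored:", checksum(code) == before)
```

A changed checksum followed by the SIGTRAP shows the patch did land in the process; what is missing is someone handling the trap.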
Next step is to get /usr/bin/dtrace to trace the child properly. Let's see what happens.
As always, latest code on my dtrace download site.
STUPID STUPID ME ! dtrace progress | Tuesday, 13 January 2009 |
When a user space app registers itself as a provider, it would not show up in 'dtrace -l'. Why?
Because I am stupid and missed the blindingly obvious.
Fasttrap.c has a limit on how many user space providers can be created - to avoid crashing or DoSing the kernel. But I forgot (or rather, didn't realise) the variable was not set. (In Sun land, they read the attributes from kernel config variables, but I had commented that out).
Stupid me! Now I can see the provider. Here's an example:
/home/fox/src/dtrace/drivers/dtrace@vmubuntu: dtrace -l | tail
 1859        fbt      fuse   fuse_iget               entry
 1860        fbt      fuse   fuse_iget               return
 1861        fbt      fuse   fuse_set_nowrite        entry
 1862        fbt      fuse   fuse_set_nowrite        return
 1863        fbt      fuse   fuse_abort_conn         entry
 1864        fbt      fuse   fuse_abort_conn         return
 1865        fbt      fuse   fuse_flush_writepages   entry
 1866        fbt      fuse   fuse_flush_writepages   return
 1867 simple5555  simple-c   main                    saw-line
 1868 simple5555  simple-c   main                    saw-word
Now, hopefully I can make some real progress.
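The failure mode is easy to reproduce in miniature. This is a toy model only, not the fasttrap.c code: the class, its field names and the default value of 64 are all invented, and the only thing it borrows from the story above is that a safety limit left at zero silently rejects every registration.

```python
class ProviderTable:
    """Toy registry with a cap on how many providers may be created."""

    def __init__(self, max_providers=0):
        # 0 models the "variable was not set" case described above
        self.max_providers = max_providers
        self.providers = []

    def register(self, name):
        """Add a provider unless the cap has been reached."""
        if len(self.providers) >= self.max_providers:
            return False  # rejected quietly, so nothing shows up in a listing
        self.providers.append(name)
        return True

unset = ProviderTable()                     # limit never initialised
print(unset.register("simple5555"))         # rejected
print(unset.providers)

configured = ProviderTable(max_providers=64)  # 64 is an arbitrary example
print(configured.register("simple5555"))    # accepted
print(configured.providers)
```

The registration call itself looks healthy from the application's side in both cases, which is why the listing was the only symptom.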
dtrace progress | Sunday, 11 January 2009 |
The 'easy' part was getting core dtrace into the kernel - wherever something was wrong, I would crash the kernel, so, I could track down where it broke and work backwards.
With USDT it's slightly different. After getting a userland binary to have probes in it, it runs and tells the kernel it is probable. Kernel trace messages show the probe exists, yet 'dtrace -l' doesn't list the probe. (I am using MacOS to compare what *should* happen with what *does* happen on Linux). I am obviously missing something here.
It's a bit of chicken-and-egg trying to work out the flaw, e.g. it could be the userspace implementation not being complete, or it could be a silliness in the kernel, or even something I have forgotten to do.
Interestingly, when running a USDT app, it declares the probes, and you can see them (eg on the Mac) with 'dtrace -l'.
You can run in two ways: run the app on its own, and attach to the probe with dtrace, or, do both together, launch dtrace to fire the app and monitor the probes.
Interestingly, on the Mac, gcc seems to have some enhancements to allow the inline probe declarations to work. Statically disassembling the binary and disassembling whilst the app is running shows the kernel correctly putting in "INT 3" instructions into the userspace code area.
It's possible on Linux that dtrace is too divorced from the real kernel, or I just had something stubbed out.
I also hit a problem with "dtrace -c ..." in Linux. I don't know if this is a pthreads issue or a Linux issue, but Linux doesn't allow ptrace(PTRACE_CONT) to be executed from a child thread when the child target process is forked from the main thread. In Linux, the target proc and the controlling thread are like siblings instead of parent-child. (I solved this temporarily by moving fork/exec creation to the monitoring thread, but it's still a bit flaky).
I am spending a lot of time statically reviewing the dtrace code to work out where the problem is. I can find lots of code I want to be executed to handle USDT, but, am missing a vital cog to make it hang together...
dtrace for freebsd 7.1 | Saturday, 10 January 2009 |
I grabbed the distro and the source to see what had changed in dtrace. It looks like "not a lot" from the source snapshots I had earlier in 2008.
Alas, disappointingly, USDT dtrace doesn't work. (I couldn't get dtrace to work at all in FreeBSD from the stock download for x86-64; I guess I need to rebuild the kernel).
Searching the web reveals user land tracing is not complete. This is a shame, because I have been using the FreeBSD model of implementation for Linux. I have had a hard time, because it looks like there are subtle things wrong/broken in FreeBSD/USDT tracing (e.g. the way a process is launched and ptrace() is used to attach to the process is missing some key lines of code).
I have spent the last week poring over the subtleties of what FreeBSD do, along with Sun and Apple. I should be able to get this bit to work, however I am not sure about other aspects of the tracing, such as aborting or skipping over syscalls. (The ptrace() syscall is simply not as powerful as Sun's /procfs interface).
I know most of the ELF code works for symtab lookups, so I should be able to make some new progress. I'll update the blog and put out a new source tarball when I feel happy with what I have.
dtrace progress 20090105 | Monday, 05 January 2009 |
First, the /proc/$$/ctl driver sort-of-nearly-almost-but-doesn't work. It hooks into the kernel and can respond to calls, but there's a problem/difficulty: I haven't figured out how to simply intercept syscall entry/exit on a per process/thread basis, without lots of kernel hacking or a brute force patch on entry/exit to the syscall handlers. This would be against the ideal of dtrace having a zero-impact approach to monitoring. Maybe it's doable long term (I do so love the Solaris approach to procfs; ptrace doesn't cut the mustard).
In any case, this may not matter; I have spent more time understanding how the libdtrace library handles:
dtrace -c prog
and how it grabs a running process. I took a new look at FreeBSD and noticed it used the only other valid alternative, ptrace, so I am grabbing ideas and code from FreeBSD to see if I can make progress.
Side note: using the Apple code is rather pointless, since it relies on the underlying Mach OS calls to do process manipulation and there's nothing similar in Linux - i.e. an uphill struggle.
The FreeBSD code is nice and simple, except it does rely on the EVENT subsystem in FreeBSD for inter-thread communication (not sure I fully follow it). I have stubbed it out for now - just so I can get something/anything working.
Hopefully when this is done, I can handle the reverse journey for USDT.
Let's hope 2009 is a better dtrace year. It will be a long slog to get dtrace reliable, and the more that people try it or comment on it, the better. I feel comfortable that key parts of dtrace just work, but I haven't addressed quality. (I am slowly trying to clean up compiler warnings, for instance, which many times obscure real silliness on my behalf).