Strace .. strace .. again | Thursday, 14 November 2013 |
I omitted to put in the punch-line. How do you see what strace is doing?
$ strace strace -o /dev/null df
Is good. But this confused me:
$ strace -f strace df
I will let you ponder the "why".
Off to debug an issue with stat and time_t....
strace strace ... | Wednesday, 13 November 2013 |
Long before I ever started on dtrace, I used truss, on Solaris. Brilliant tool. (My hobby/interest has always been tools to help debug programs or systems). truss has some great features, but it was very beholden to Solaris.
I wrote my own truss - I called it ptrace; this was in the days before Linux was a successful and broad operating system. It did what I would call "hacky" things to enhance what truss did. And it worked on the various Unix flavors. By the time Linux became successful and prominent, strace appeared. Strace for many years was much reduced in functionality and reliability - on Linux - compared to truss or ptrace (my tool).
I had run out of ideas for ptrace, and I note my last change was back in 2007.
In 2010 or so I picked up the baton on dtrace.
strace has evolved and so has the Linux kernel; last I looked at the strace source, it was disappointing - reflecting some deficiencies in the Linux kernel (in terms of process control and debugging).
More recently I have had a chance to look at strace, and my ptrace, to reassess the state of the art in user space tracing. (Dtrace, as good as it is, is somewhat crude for doing what strace/ptrace and truss do - dtrace doesn't make it easy to create a standalone application that doesn't need full privs and has a good quality command line parser; dtrace comes with a truss emulation, but it's not refined and it's not good at decoding the arguments to syscalls... but I digress).
Looking at strace, it lacks a facility I was interested in, and which ptrace has: stack dumps of the syscalls. I found a package via Google called strace+ but it won't build, and I gave up trying to figure out what was wrong, such was the brokenness of the build.
So, I re-evaluated my ptrace. I had last touched it in 2007, which was a while ago, and it just didn't work reliably. It didn't acknowledge 64 bit processors/processes or a mixed 64/32 bit world. And it didn't compile anymore.
After a couple of days, I got it up and running again; I started adding the best bits of strace and other enhancements, and now it's pretty good. It's reliable (thanks to bug fixes, but also because the kernel ptrace(2) enhancements make it much more resilient). On older Linux kernels, kill -9 of strace or ptrace could hang or kill the processes being traced. On modern Linux kernels, this horrific situation is resolved.
I have an arsenal of LD_PRELOAD bits - which are useful for debugging or monitoring specific scenarios, but strace/ptrace/dtrace are great for pure unobtrusive debugging.
I used the strace source to help fix/understand some of the issues in my own code, and am now considering adding more functionality - much more than truss/strace has. FYI, here's the help/usage for ptrace as it currently stands (some features are broken - I need to fix them - especially the i386 specific code).
I may or may not release ptrace as source - I don't necessarily have an interest in maintaining it - as it potentially changes at a fast pace to suit whatever situation I am debugging. (ptrace gives me the luxury of writing C code rather than D code, in user space, to do very specific things - similar to LD_PRELOAD, but in a way that can rarely accidentally kill the target; and ptrace is more portable than Dtrace to systems where you don't have root access to debug scenarios).
ptrace: Trace process execution. (C) 1990-2014 PD Fox, Foxtrot Systems Ltd
Usage: trace [-delay nn] [-d nn] [-gethostid nnn] [-trap] [-gethostname name]
             [-llib ...] [-o output] [-fchnt] [-p pid] [-s size]
             [-v [!]syscall,...] [-r [!]fd,...] [-w [!]fd,...]
             [-size nn] [-stack:nn] [-regs] [command]

  -a                  Print argv on execv calls.
  -flush              Flush output as we go along.
  -func               List of functions to trace
  -gethostid          Intercept gethostid() system call and fake return value.
  -gethostname        Intercept gethostname() system call and fake return value.
  -hex                Dump ASCII strings in 1-byte hex
  -hex2               Dump ASCII strings in 2-byte hex
  -hex4               Dump ASCII strings in 4-byte hex
  -name               Sort by name.
  -nest               Allow for nested functions
  -time               Sort by syscall time.
  -tee file           Write output to specified file and stdout.
  -trap               Map SIGTRAP signal
  -trace              Trace with -l switch
  -pc                 Show PC of system call
  -c                  Display system call counts
  -delay nn           Sleep for nn msec before each syscall
  -d nn               Detach after nn calls to gethostid()
  -e                  Dump out exec() functions
  -f                  Follow child processes
  -h                  Display strings in hex/asc (read()/write()).
  -llib               Preload shared library.
  -m                  Intercept page faults.
  -multiline          When printing certain arguments, use multilines to make pretty printing.
  -n                  Print network addresses numerically.
  -nosyscalls         Don't print syscalls (monitor page faults only)
  -o file             Write output to specified file.
  -p pid              Trace specified process.
  -ptr                Show pointers for arguments.
  -q                  Quiet mode -- dont print output.
  -r [!]fd,...        Dump read buffers for specified file descriptors.
  -regs               Dump registers.
  -s                  Trace list of signals.
  -size size          Specify size of strings to print out.
  -stack:nn           Dump call stack (depth of nn).
  -t                  Print timestamps. (msec accuracy)
  -tt                 Print timestamps. (usec accuracy)
  -v [!]syscall,...   Specify syscalls to [ignore]/trace.
  -verbose            Add extra detail for some args.
  -w [!]fd,...        Dump write buffers for specified file descriptors.
  -warp YYYYMMDD-HH:MM:SS
                      Warp clock system calls.

Advanced switches:
  -nouse_process_vm   Avoid Linux 3.4 dependency
Set PTRACE_OPTS to pass in command line arguments.
Version: b6
Things I hate... | Thursday, 07 November 2013 |
I normally don't mind filling in the odd survey - especially when I have actually used the site. But, as you hunt around the web, many times you shift away immediately from a site. Grabbing you at the entrance just makes you go away and *never* come back.
Just wanted to check news and traffic info on a site - and "Bang!" up pops the dreaded survey. No thanks - I needed the info quickly.
I filled in a BBC iPlayer survey the other day and specifically stated they could contact me to get "real" information, not the worthless rubbish of a questionnaire that pops up, but nobody cares.
It really is a shame.
Or the sites which pop up a "Do you need help?" when you haven't even had a chance to look.
Please, web designers - treat your customers with respect.
And as for the EU Cookie directive - that was as well thought out as a wet fish trying to get a suntan.
Oh well..back to pivot tables.
libdwarf vs libdw - revisited | Sunday, 03 November 2013 |
Unfortunately, there are two versions of this library - libdwarf and libdw - and your head will swim trying to figure out the difference.
Over the weekend I upgraded to Ubuntu 13.10 (gcc-4.8), and dtrace would no longer compile; specifically - the tail part for creating the ctfconvert tool (mkctf.sh) would abort with an error - which is really intractable.
I looked at the code causing the error, but the error is inside a function (dwarf_loclist). There are two families of dwarf libs - one has this function, the other doesn't.
I spent some time perusing the FreeBSD source code, the libdwarf source code, and hacking on ctfconvert. All to no avail. Trying to use the FreeBSD version of dwarf.c relied on the specific version of the dwarf libs on FreeBSD. Between the libdwarf vs libdw confusion, and the fact that FreeBSD has its own (later?) version of libdw, it's all so confusing.
From what I read on the web, libdw is the "new" version to replace libdwarf. RedHat helped build this. Some distros have one or the other. So, there's no "correct" thing to do - we currently use libdwarf, but if we use libdw, we may find issues across other distros and older and future versions. It truly is a mess. Added to which there's near zero documentation, except in the comments.
This annoys me; users who download dtrace complain about the build error (which is actually not really an error - I have modified the warning to let people know that if ctfconvert/mkctf.sh won't run, they can still use dtrace).
I realised that there is a way to create a portable libdwarf library and solve this nicely. I created a very basic tool (tools/readelf.pl) which parses the output of:
$ readelf --debug-dump=info ...
This dumps out the DWARF debug info in a way that exposes the records in a dwarf section, and allows a simple parser to spit out all the struct/union definitions. Instead of directly linking with libdwarf, I could use (something like) this script as a pipe, to read the output of readelf. This moves the whole portability issue to that of readelf itself. Assuming readelf works, then dtrace would be immune from libdwarf/libdw confusion.
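To give a flavour of how simple such a parser can be, here is a rough sketch in Python (not the actual tools/readelf.pl, which is Perl). The SAMPLE text is hand-written to resemble readelf's DWARF dump, and the names parse_types, DIE_RE and NAME_RE are made up for this illustration; real output carries many more attributes and can vary between binutils versions.

```python
import re

# Hand-written sample resembling `readelf --debug-dump=info` output.
# This is an assumption for illustration, not captured from a real binary.
SAMPLE = """\
 <1><2d>: Abbrev Number: 2 (DW_TAG_structure_type)
    <2e>   DW_AT_name        : (indirect string, offset: 0x10): timeval
    <32>   DW_AT_byte_size   : 16
 <2><39>: Abbrev Number: 3 (DW_TAG_member)
    <3a>   DW_AT_name        : (indirect string, offset: 0x18): tv_sec
 <2><45>: Abbrev Number: 3 (DW_TAG_member)
    <46>   DW_AT_name        : tv_usec
 <1><52>: Abbrev Number: 4 (DW_TAG_union_type)
    <53>   DW_AT_name        : sigval
    <57>   DW_AT_byte_size   : 8
 <2><5e>: Abbrev Number: 3 (DW_TAG_member)
    <5f>   DW_AT_name        : sival_int
"""

# A DIE header line: nesting depth, offset, abbrev number and tag.
DIE_RE = re.compile(r"^\s*<(\d+)><[0-9a-f]+>: Abbrev Number: \d+ \((DW_TAG_\w+)\)")
# A DW_AT_name attribute line; the value may be prefixed by "(indirect string, ...): ".
NAME_RE = re.compile(r"^\s*<[0-9a-f]+>\s+DW_AT_name\s*:\s*(.*)$")

KINDS = {"DW_TAG_structure_type": "struct", "DW_TAG_union_type": "union"}


def parse_types(lines):
    """Return a list of (kind, name, [member names]) for each struct/union.

    Deliberately simplistic: members of an outer struct that appear after a
    nested struct definition are not attributed back to the outer struct.
    """
    types = []        # entries of the form [kind, name, members]
    cur = None        # entry currently being filled in
    cur_depth = -1    # nesting depth of that entry
    pending = None    # "type" or "member": who owns the next DW_AT_name
    for line in lines:
        die = DIE_RE.match(line)
        if die:
            depth, tag = int(die.group(1)), die.group(2)
            pending = None
            if tag in KINDS:
                cur = [KINDS[tag], None, []]
                cur_depth = depth
                types.append(cur)
                pending = "type"
            elif tag == "DW_TAG_member" and cur is not None and depth == cur_depth + 1:
                pending = "member"
            elif depth <= cur_depth:
                cur = None    # we have left the struct/union
            continue
        attr = NAME_RE.match(line)
        if attr and pending:
            # Strip an "(indirect string, offset: 0x..): " prefix if present.
            name = attr.group(1).rsplit("): ", 1)[-1].strip()
            if pending == "type":
                cur[1] = name
            else:
                cur[2].append(name)
            pending = None
    return [(kind, name, members) for kind, name, members in types]


if __name__ == "__main__":
    for kind, name, members in parse_types(SAMPLE.splitlines()):
        print(f"{kind} {name} {{ {', '.join(members)} }}")
```

To run it against a real object file, the lines could come from a pipe instead of SAMPLE, e.g. by feeding it the stdout of the readelf command above via subprocess.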
I started looking at this - readelf.pl could grow into a more concrete "list-the-types" tool, but that's dirty. I started looking at the failing code, but I ran up against the issue of having the right source code for my Ubuntu libdwarf tool and/or understanding the impact on prior versions of libdwarf (i.e. we may break dtrace for older distros).
Eventually, I realised that the error is caused by gcc-4.8 defaulting to DWARF-4 format (I have only briefly looked to spot the differences). Since the error is caused by one file (driver/ctf_struct.c), which is used to spit out all the kernel struct/unions for D scripts, it's actually a much simpler fix to force gcc to use the DWARF-2 spec - the one we have all used for many years. This fixes the build error on Ubuntu-13.10. It's a suboptimal hack (though not really a hack, since nobody cares how dtrace is compiled - certainly not most users, unless they might use some future kernel debugger).
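One way to confirm which DWARF version a compiler actually emitted is to look at the "Version:" field in each compilation unit header of the readelf dump. The following is only a sketch in Python: the SAMPLE text is hand-written to resemble that header layout, and dwarf_versions is a name invented for this example, not part of dtrace or binutils.

```python
import re

# Hand-written sample resembling the compilation unit headers printed by
# `readelf --debug-dump=info`; an assumption for illustration only.
SAMPLE = """\
Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length:        0x4e (32-bit)
   Version:       4
   Abbrev Offset: 0x0
   Pointer Size:  8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : GNU C 4.8.1
  Compilation Unit @ offset 0x52:
   Length:        0x3a (32-bit)
   Version:       2
   Abbrev Offset: 0x40
   Pointer Size:  8
"""

VERSION_RE = re.compile(r"^\s*Version:\s+(\d+)\s*$")


def dwarf_versions(lines):
    """Return the sorted list of distinct DWARF versions seen in CU headers."""
    versions = set()
    in_header = False
    for line in lines:
        if line.lstrip().startswith("Compilation Unit @ offset"):
            in_header = True      # the Version: line follows shortly
            continue
        if in_header:
            m = VERSION_RE.match(line)
            if m:
                versions.add(int(m.group(1)))
                in_header = False
    return sorted(versions)


if __name__ == "__main__":
    print(dwarf_versions(SAMPLE.splitlines()))
```

If an object built from driver/ctf_struct.c reported only version 2 here, that would suggest the DWARF-2 setting took effect.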
So, it is fixed. I still dislike libdwarf and the total confusion, and maybe one day I will finish off proper support for DWARF-4, or cut my own libdwarf, or ... whatever.