Dtrace release 2010-08-30 Monday, 30 August 2010  
This release includes a working kernel ctf file dump. Finally, the following trace script should work:

        printf("pathname=%s\n", args[2]->fi_pathname);

Note that we are referring to fi_pathname with no struct definition in sight! This works because dtrace reads /usr/lib/dtrace/io.d, and that file works together with the kernel CTF file.

A big problem with this release affects those of you on older kernels: an older kernel likely means an older glibc release, and unless you have libdwarf.a you won't be able to compile the ctfconvert utility. Compilation will proceed, but you won't get the kernel CTF file (linux-`uname -r`.ctf).

And hence, you won't have access to the args[n] arguments to io:::start.

I'm debating what to do - some older systems have libdw.a, but that's effectively an out of date version of the required library and is missing various systems. (Essentially, it's gcc which is the important bit - if GCC can generate certain DWARF constructs, but libdw.a cannot parse them, then we lose).

The alternative is to add libdwarf source to the distro or provide an autodownload link. I'll see what people have to say/complain before doing anything in this space.

Better to have a working dtrace for the latest kernels, and try to get people to upgrade to match the requirements. (I don't like doing this, but it's acceptable if I am to make any progress).

Posted at 20:07:51 by fox | Permalink
  Blocking on dtrace Saturday, 28 August 2010  
Just having some mental blocks on dtrace. Specifically, we now create a kernel .ctf file - but it looks like this is missing something we need, despite putting the bufinfo_t structure into the stubs file.

Without that, we cannot invoke args[n] because dtrace doesn't know how to convert the struct.

Hopefully it won't take too long to figure out what is going on, and then I can do another release.

Posted at 21:31:48 by fox | Permalink
  DTrace status - 20100825 Wednesday, 25 August 2010  
A dtrace spectator (Edward Peschko) asked me for a "status" of dtrace - a very good suggestion, so that people don't have to figure out from the titbits in my blog what works and what doesn't. So this is my attempt to catalog working features and in-progress or broken features.

So here goes - feel free to send me feedback where I am not answering a question, or am vague, or wrong, or ... whatever you like!

I will attempt to keep this succinct and update periodically.

Working Features

o Works on AS4/64 bit kernels, Ubuntu 8.xx - 10.xx (32-bit and 64-bit). Not every kernel version tested, but should build on at least 2.6.12 onwards.
o Tested up to 2.6.32 kernels, but not proven/tested under later kernels.
o FBT Provider: fully functional, except argument types are not presently supported. You can access arg0, arg1, ... as values or pointers, but there is no type info to support structure accessing. Should be safe to probe all functions in the kernel. (DTrace keeps a toxic list of functions we mustn't touch.)
o INSTR Provider: example provider for instruction level tracing - works on 32/64 bit kernels.
o SYSCALL provider: fully implemented - can trace all syscalls.
o SDT Provider: first provider io:::start/io:::done "works", but I am still debugging the typed arguments (args[0], args[1], args[2]).
o dtrace.conf permissioning model to avoid the need for root access.
o stack()/ustack(): implemented, but at the mercy of code which doesn't have frame pointers (in kernel or user space - stacks may have bogus entries).
o Kernel GPF protection: all D scripts and pointer accesses are protected from panicking the kernel.
o D scripts should work, except where they rely on providers/features not yet ready in Linux/dtrace (e.g. cannot run standard scripts like iosnoop.d or other scripts out there).

Partially Working / Not yet quality controlled:

o USDT: basic C/C++ works - demo program supplied, but not fully quality controlled that all D functions work.
o CTF framework for the kernel: since we don't build the kernel, the kernel lacks the .SUNW_ctf symbol table needed by dtrace to allow D scripts to use struct/union pointers to view data structures. Providing an alternate mechanism is currently work in progress.

Not really working / not yet implemented

o PID provider: can trace specific pids, but the process control leverages the disruptive ptrace() syscall, and if dtrace is killed, the target process may be left in a STOPped state. (Not good for system daemons, e.g. if a bug strikes or a session is abruptly disconnected.)
o Java, Python, Ruby, etc: no attempt to test the other languages in a USDT context. For non-Java, it should be possible to build instrumented apps (drti.o is provided to link with). I haven't reviewed what the jstack() code does, nor tested whether it works. Getting this to work shouldn't be an issue technically, assuming the process control issues in the PID provider are resolved.
o Most of the SDT probes (NFS, VM, etc) are not yet implemented. They will require special work because we do not modify kernel source and the kernel build is not in our control.


Q Can this be used in a production context?
A Don't know. I believe so, but please do not take my word for it. If your kernel is a recent release (2.6.28+) then it should work out of the box, and certainly syscall tracing appears to work. (I have dtraced an X server with GNOME starting up without issue).

Q Why are you doing this?
A Because it's relaxing and a challenge, and I want dtrace to be available everywhere so that when I want it, I know it's there. If distros could bundle it - superb. If I have to build it myself, less so.

Q What can I do?
A You can use it, report issues, track progress, submit patches or suggestions.

Q What about Oracle, the CDDL license, GPL?
A What about them? Nobody has told me this is breaking any license, and I have no qualms with Oracle.

Q What about systemtap?
A Super! Competition. dtrace can be used around systemtap - trace systemtap executing. I don't know much about systemtap, and it may do more clever things than dtrace. In which case Dtrace/linux will evolve to try and be a better systemtap than systemtap.

Q Is this real?
A Yes

Posted at 22:26:25 by fox | Permalink
  CTF On Linux: ctfconvert Monday, 23 August 2010  
I am adding the ctfconvert binary in the next release of dtrace. This is an important utility because it helps to complete the loop on having CTF (Compact C Type Format) data on Linux.

This data is needed so that SDT can access kernel structures, e.g. for the io:::start probe and others.

I spent the last week trying to work out where/what this missing piece was. Why can't dtrace use structures defined in /usr/lib/dtrace/*.d? Now I know why: because everything it possibly wants to know is sitting in the kernel binary. Except on Linux, where we have no CTF data in the kernel.

So we need to provide an alternate mechanism to get to the Linux kernel structures. Enter ctfconvert.

I took the code from Apple/MacOS, which is pretty much the same as the Solaris code plus/minus some Apple code changes.

So, here's the first Linux object file with a .SUNW_ctf section.

As a note: in the next dtrace release, you will need the libdwarf-dev and binutils-dev packages if you want this binary to be built. (DTrace will build without those, but display a warning).

Next step is to figure out how to get a shadowed kernel file. (Should be straightforward, but it's now on my todo list).

/home/fox/src/dtrace@delly: build/ctfconvert -L lab1 -o /tmp/ctf build/ctfconvert.obj/ctfconvert.o
/home/fox/src/dtrace@delly: objdump -h /tmp/ctf

/tmp/ctf: file format elf64-x86-64

Sections:
Idx Name            Size      VMA               LMA               File off  Algn
  0 .text           00000659  0000000000000000  0000000000000000  00000040  2**2
                    CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data           00000000  0000000000000000  0000000000000000  000011b0  2**2
                    CONTENTS, ALLOC, LOAD, DATA
  2 .bss            0000001c  0000000000000000  0000000000000000  000011b0  2**3
                    ALLOC
  3 .rodata         00000402  0000000000000000  0000000000000000  000011b0  2**4
                    CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  4 .comment        00000024  0000000000000000  0000000000000000  000019f0  2**0
                    CONTENTS, READONLY
  5 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00001a14  2**0
                    CONTENTS, READONLY
  6 .eh_frame       000000a8  0000000000000000  0000000000000000  00001a18  2**3
                    CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  7 .SUNW_ctf       00000764  0000000000000000  0000000000000000  00002458  2**2
                    CONTENTS, READONLY

Posted at 21:47:00 by fox | Permalink
  MacMini - wifi problems Sunday, 22 August 2010  
My MacMini has driven me nuts over the last couple of years. I have the older Intel MacMini, 2GHz dual core, and have had problems with the wifi (lots of similar reports on the web). This is a design flaw in the hardware.

I was so close to just jumping up and down on it and melting it.

Originally, the wifi would stop working for a day or two, about once per month. After a while, it got to the point of working for maybe one or two days before giving up.

There are numerous reports of heat problems with the card - which is bad news. If the problem strikes, it will get worse until total failure.

I tried a homeplug ethernet connection to bypass the wifi. This worked for a while, but eventually I had problems with this too. I was never sure if it was the macmini getting flaky or the homeplug. I do so hate devices that won't tell you what they are doing, so I relented and went back to the wifi.

When the problems got really bad, I searched the net for various solutions. USB wifi devices seemed to be the solution, but I am loath to buy one of these devices, only to find Snow Leopard isn't supported, or some future O/S. Much better if Apple supports it - more chance of driver updates when needed.

So I eventually settled on an Airport Express. I hadn't really wanted one of these - although I had thought that having a bridging wifi might solve/help the problems. I had a lot of problems getting it to work. (My setup is complicated because I wanted to join my existing LAN). But after a few hours of screaming at it, and the macmini, it works.

I even learned about WDS, and the Airport Express has an output for speakers, which might solve another problem I have. So, I like the device - very much overpriced, and it gets a little hot to the touch, but let's see if it can last a year or two.

Posted at 21:17:49 by fox | Permalink
  Dtrace: D is not C. Repeat after me. Friday, 20 August 2010  
I'm getting a little frustrated with Dtrace. I feel I understand a lot about it, but the fact that it looks like C doesn't mean it bears any resemblance to C. At all. No!

Here are some issues:

  1. No support for "if". Everything must be using the ?: construct.
  2. No loops. Ok, we understand that - because we are running in kernel context. But we could allow time limits or instruction execution limits to avoid long execution times.
  3. No functions. No macros. (You can use the C preprocessor but that's optional, and not a part of the default/core language). Functions are nice - they let you break up long complex scripts, and avoid repetition. But we can't.
  4. No #include. Well, we can have it if we use the C preprocessor.
  5. No "struct" or "typedef" inheritance from /usr/lib/dtrace

This last one is strange. dtrace reads the files in /usr/lib/dtrace "in a strange way" to hide and augment implementation scenarios.

With SDT in Linux, but no CTF support (kernel data structures), we can achieve the goal of ensuring base types and structures are usable in D scripts by creating public struct/union/typedefs. These are needed for static probes and the "translators" which map from internal C format to D format, in a safe way.

But we can't put struct/union/typedef in a "*.d" file in /usr/lib/dtrace, because although dtrace reads these files, it discards the structs.

Most D scripts don't explicitly "#include" - but dtrace arranges things so it knows the type of args[0], args[1], ... for the probe.

So, as of today, I have io:::start and io:::done working, but only if I code the struct/translator into the calling script. That's not how Apple/Solaris work.

What gets me is the "surprise" factor - I'll work out a solution that makes this seamless, but it's annoying to bump into an "it doesn't work that way!" scenario.

Posted at 22:52:47 by fox | Permalink
  dtrace: d_path() - a wonderful function Tuesday, 17 August 2010  
Given a "file" structure, d_path() is a cool function which gives you the full path of the underlying file. This is what the /proc/pid/fd code uses to give you the mapping of file descriptors to real filenames.

And now, dtrace for linux, in the io:::start provider can reflect the Sun standard structures for buf_t, fileinfo_t etc.

I need to polish off the code, but it's looking good - we can see the full filename of the entity being manipulated, which proves the io::: provider will be most useful not only for writing D scripts, but also for porting existing Dtrace utilities (like iosnoop.d).

Posted at 23:39:01 by fox | Permalink
  DTrace Progress 20100816 - SDT Probes - at last Monday, 16 August 2010  
At last some progress in the SDT probes area. I had started work a few weeks ago and hit some issues in trying to align the invoked probe with the expected user interface arguments (translators).

I have finally got a version of this working (now need to clean it up). There was some subtlety in sdt_subr.c as I tried to isolate how the args work, along with a lot of time studying the stack tracing mechanism. I went up various dead-ends, but can now get to a point where I can communicate an io:::start probe to the outside world.

I hope to clean this up. (I am not convinced my io:::start probe is the "right" one, e.g. compared to a Mac, I am firing on all reads/writes, but at least I can get the cosmetics working and then hunt a better place to plant the probe).

Also found that if a translator mis-references structure offsets, this will cause problems and/or a kernel segmentation violation - so I need to ensure this is safe from userland destruction techniques.

Posted at 21:22:51 by fox | Permalink
  Security idea Monday, 16 August 2010  
Was wondering - there are so many sites out there trying to crack into your home or work systems - but why is it so easy for them?

Consider this. If I have a secured host, then typically it sits behind a firewall, and it takes a lot of dedication to get this right, and keep it right.

And what makes life so difficult is you cannot sit back and assume you have done a good job. You have to track software updates and exposures and be careful if a new exposure is found.

But how do the crackers do this? Typically, things like dictionary attacks against known ports, e.g. trying all known passwords against the ssh port.

Some of these are easily defeated - just pick a non standard port for your ssh daemon. And your ftp daemon. And your web server. And your PHP server. And ...

This is crazy/insane.

How about this for an idea. Have a "randomise" button on all your appliances. In the same way that a video/mp3 server can connect automatically to iTunes or some other home network (NAS, NFS, CIFS, etc) and you can use DHCP to autoallocate resources, you would have a tool - which all systems respond to.

When you select "randomise" all services on your internal intranet randomise the ports used for standard services, and add a new encryption or obfuscation key. All devices need to partake in this.

Now, external entities have nothing to target. They don't know what ports to use, and if they find a port, they don't know what's behind it. Normal sniffing techniques won't work - it's all encrypted or obfuscated. Things like HTTP protocols or FTP protocols could be reprogrammed to use different words in the header request, or put a random preamble there.

Even the bytes in packets could be jumbled up, so packet injection won't work.

There are a lot of details to work out - on each tool, protocol and network layer. But I can't think of a way to hack a network when there is nothing to gain insight into. (Internal to outgoing connections would need to negotiate if randomisation is possible, and/or routers would need to be developed to allow intranet to extranet connections).

This would save so many problems, trying to adjust the files in /etc and manually reconfiguring systems.

Posted at 21:10:17 by fox | Permalink
  Linux build sizes - 6GB - insane ! Sunday, 15 August 2010  
Just compiling up a new Linux kernel. By the time it had finished, my disk usage went up by 6GB. It's really painful configuring a new kernel (despite copying the old kernel config) - I am probably doing something wrong, but I find it offensive that vmlinux.o, prior to linking, is 256MB in size (the final vmlinux executable is 96MB).

I know most of this is debug symbols, but it's a pain to compile and find space in the VMs.

I wish the kernel linking phase was clever enough to optimise and strip out most of this stuff. It really is huge.

(Yes, I appreciate /bin/ld and gcc do their best to optimise things away, and I know it doesn't hurt performance, but it's unmanageable to be going from a 1-2MB kernel to one requiring 96MB of disk space).

C'est la vie.

Posted at 20:20:50 by fox | Permalink
  Blog script link Saturday, 14 August 2010  
Forgot to add the link to the perl script for blogging - it may not be useful to people, but I offer it up, so people can play.


Posted at 14:54:39 by fox | Permalink
  New blogger script Saturday, 14 August 2010  
I've updated the blog.pl script so that I can better keep the crisp blog and the blogspot blogs in sync.


I like the crtags.blogspot.com way of creating blogs - I use a CRiSP macro to allow me to compose and send updates via the Google CLI tools.

But the older blog - http://crisp.demon.co.uk/blog - carries a lot of google indexing status (search for "dtrace linux"). And I know some people are looking at the crisp weblog, which then doesn't get updated so frequently - because I have had to tinker with my machines and the way I do things.

Hopefully, going forward, I can more easily keep the sites in sync. (It's the same data).

I just need to fix one thing, and then I can go back to Dtrace and CRiSP to do more fruitful things.

Posted at 14:47:58 by fox | Permalink
  dtrace linux progress 20100808 Sunday, 08 August 2010  
Been a busy week trying to fix a variety of issues - some good, some not so good.

Recent work has looked at the stack/ustack functions to make them reliable and working on 32/64 bit platforms. This is getting closer to completion (I have decided to comment out the DWARF code for now and rely on the algorithm the Linux kernel does for stacks - try and assume a frame pointer, but if not present, do word-at-a-time sampling).
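To make the two strategies concrete, here is a small user-space model in C. It is only a sketch under stated assumptions, not the dtrace or Linux kernel code: the "stack" is a plain array, saved frame pointers are stored as array indices rather than real addresses, and the names walk_frames(), scan_stack() and the text range parameters are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Does v look like a code address, i.e. fall in [lo, hi)? */
static int in_text(uintptr_t v, uintptr_t lo, uintptr_t hi)
{
    return v >= lo && v < hi;
}

/*
 * Frame-pointer walk over a modelled stack. At index fp the stack holds
 * the index of the caller's frame, and at fp + 1 the return address.
 * Stops when the chain stops moving toward the stack base or when a
 * return address does not look like code. Returns the number of pcs.
 */
size_t walk_frames(const uintptr_t *stack, size_t nwords, size_t fp,
                   uintptr_t text_lo, uintptr_t text_hi,
                   uintptr_t *pcs, size_t max)
{
    size_t n = 0;

    while (n < max && nwords >= 2 && fp <= nwords - 2) {
        uintptr_t ret = stack[fp + 1];
        size_t next = (size_t)stack[fp];

        if (!in_text(ret, text_lo, text_hi))
            break;
        pcs[n++] = ret;
        if (next <= fp)         /* end of chain, or a corrupt link */
            break;
        fp = next;
    }
    return n;
}

/*
 * Fallback when there is no usable frame pointer: look at every word and
 * keep anything that looks like a code address. Stale values left on the
 * stack are indistinguishable from real return addresses, which is where
 * bogus entries come from.
 */
size_t scan_stack(const uintptr_t *stack, size_t nwords,
                  uintptr_t text_lo, uintptr_t text_hi,
                  uintptr_t *pcs, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < nwords && n < max; i++)
        if (in_text(stack[i], text_lo, text_hi))
            pcs[n++] = stack[i];
    return n;
}
```

On a modelled stack of {3, 0x400100, 0x400abc, 6, 0x400200, 0xdeadbeef, 0, 0x400300} with a text range of 0x400000 to 0x500000, the frame walk starting at index 0 reports the three linked return addresses, while the word scan reports four, because it also picks up the stale 0x400abc.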

I added a definition for the uid_t typedef, so that the "uid" function can be used (Apple/Solaris rely on the kernel providing CTF typedefs to fill in the gaps; eventually we need to do something similar - I have ideas on how to approach this).

Some typos in the scripts/tools fixed also.

I've introduced some regressions in the last few days - hope to restabilise dtrace for linux in the coming days.

Posted at 21:24:59 by fox | Permalink
  Silly me - ustack isn't prime time yet Tuesday, 03 August 2010  
The reason invoking ustack() causes dtrace to hang procs in the system is simply because it's doing a "grab" (ptrace(PTRACE_ATTACH)) on the target so we can get a symtab for the ustack.

Since I hadn't fully finished debugging that, it can result in the target app hanging on the ptrace call - which explains why it was causing me a headache.

So, looks like that's next on my target list to fix.

Posted at 23:51:54 by fox | Permalink
  DTrace: Why not "if"? Tuesday, 03 August 2010  
I don't understand the rationale for not supporting if-then-else statements in DTrace. It leads to illegible code. I know why while/for-loops are not supported, and I do wonder about the syntax of probes - which precludes functions/macros from being implemented too.

I think what's needed is a DTrace++ which extends the syntax whilst maintaining the current virtual machine.

Just fixing some things I don't like (hangs/crashes the kernel). Just found that doing this:

$ dtrace -n "io:::{ustack();}"
causes traced processes to be stuck on a SIGSTOP signal. Need to figure out where this is coming from, and why.

I have temporarily disabled ustack() from doing anything useful - so I can figure this out, and hope to restore it soon.

I am also contemplating how to generate function prototypes from the kernel source so we can access typed arguments to kernel functions (since we don't have a CTF ELF section in the kernel, we need some way to get high level access to structures; still debugging translators).

On another note, I am realising that I need to use DTrace more as a real user - it's helping to find things that are broken or just not easily implementable without low level kernel knowledge. E.g. timing a process being blocked due to a task schedule is almost easy, but we don't have access to the arguments to deref the process (struct task_struct) easily.

So, all in all, lots of things to experiment with and move forward, but reliability/dependability is a key goal for the short term.

Posted at 23:19:55 by fox | Permalink
  Where in the world is args[] setup ? ! Sunday, 01 August 2010  
This is driving me nuts. I am trying to debug a translator - a simple one for SDT probes. Here's the translator:
translator fileinfo_t <buf_t *B> {
	fi_offset = B->fi_offset;

In Dtrace, when a probe fires, we get the arguments to the probe via arg0, arg1, .... In addition, DTrace arranges args[0..n] to be a 'translated' version of a kernel binary structure. The "translator" maps from kernel format to publicly visible format.

(See http://wikis.sun.com/display/DTrace/Translators for more details).

What is frustrating is that, with suitable type casting, arg2 holds my pointer for io:::start, but args[2], when going through the translator, doesn't access the same pointer ("B" in the case above).

I know this won't make sense to most of the world, but I cannot even find the code which sets up the args[] array.

The key here is that when probes fire, the memory we need to print things out has to be copied from kernel space to user space. DTrace does this by assuming everything is some form of struct/union, and the whole translator business allows DTrace to marshall the data structures in a safe and coherent way. Without them, D code would be fugly with lots of pointer/typecasts. In a sense, a translator is like a method, but it's a funny method which deals with user-level typecasting, rather than executing procedural code. If done properly, most users won't ever know how/what happens - they will just do stuff.
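For readers who think in C rather than D, the idea can be loosely modelled as a function that copies fields from a private layout into a stable public one. This is only an analogy and a sketch: struct kfile and translate_fileinfo() are invented names, the public struct borrows just the fi_offset and fi_pathname field names mentioned in these posts, and real translators are declarative D constructs evaluated by dtrace, not C functions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical private layout, standing in for a kernel structure. */
struct kfile {
    long long   pos;    /* current file offset */
    const char *path;   /* may be NULL if unknown */
};

/* Public, stable view that scripts would be written against. */
typedef struct fileinfo {
    long long fi_offset;
    char      fi_pathname[256];
} fileinfo_t;

/*
 * The "translator": the only place that knows the private layout.
 * It never hands out raw pointers and always fills the public struct,
 * even when the input is missing.
 */
void translate_fileinfo(const struct kfile *f, fileinfo_t *out)
{
    memset(out, 0, sizeof *out);

    if (f == NULL) {
        snprintf(out->fi_pathname, sizeof out->fi_pathname, "<none>");
        return;
    }

    out->fi_offset = f->pos;
    snprintf(out->fi_pathname, sizeof out->fi_pathname, "%s",
             f->path != NULL ? f->path : "<unknown>");
}
```

The point of the indirection is that a consumer only ever touches fileinfo_t, so the private struct can change shape without breaking anything written against the public names.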

Why do I even care? Because I am trying to get /usr/lib/dtrace/io.d to be correct and allow use of 3rd party scripts which utilise the IO provider to intercept when apps do I/O and sleep.

Oh well, more on another day.

Posted at 21:19:27 by fox | Permalink