Are your backups *good*? Sunday, 22 April 2012  
I wanted to recount a little scenario that happened to me, as a warning to others.

I've been writing a "makefile" generator tool. The makefiles for CRiSP have become a little unwieldy over the years - specifically, parallelisation issues, incomplete dependencies, and an inability to adapt them quickly to new scenarios.

CRiSP runs on MacOS, Linux and Windows. The Mac and Linux makefiles are shared (as are all the Unix makefiles), with an auxiliary small makefile to handle the Cocoa part of the build cycle. Unfortunately, for Windows, nothing is shared - it has a complete set of its own makefiles.

I wanted to compile CRiSP under MinGW - a port of GCC to Windows - because I wanted to use GCC and avoid issues with Microsoft Visual Studio. MinGW provides a Unix-like environment, but as far as my makefiles are concerned, Windows is too far from Unix and MinGW is too far from Windows.

So, I built a makefile generator. It has some good features - automatic, recursive header-file dependencies, avoidance of "cd", a parallelisable build tree, and proper cleaning - deleting only what was built.

That last feature had a bug in it: the initial version did the equivalent of "rm -f */*". Unfortunately, when I ran it, it hadn't cd'ed to the build directory, so it deleted all my files in subdirectories. Oops!
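Here is a minimal sketch of the "clean only what is built" idea - this is not CRiSP's actual generator, and the manifest filename is made up - where the clean step removes only files the build recorded, and refuses to touch anything outside the build tree:

import os

# Hypothetical sketch: the generator records every file it creates in a
# manifest inside the build directory, and "clean" deletes only those.
def clean_build(build_dir, manifest_name=".build-manifest"):
    build_dir = os.path.realpath(build_dir)
    manifest = os.path.join(build_dir, manifest_name)
    if not os.path.isfile(manifest):
        raise SystemExit("no manifest found - refusing to delete anything")

    with open(manifest) as f:
        for line in f:
            target = os.path.realpath(os.path.join(build_dir, line.strip()))
            # Guard against entries that would escape the build directory.
            if not target.startswith(build_dir + os.sep):
                continue
            if os.path.isfile(target):
                os.remove(target)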

Not to worry - I do backups every few days and propagate the sources to other machines. So I mounted the USB drive and went to look at the .tar.bz2 file containing the current sources. (Alas, it was about 2-3 days old, but that didn't matter.)

What I found was a corrupted .tar.bz2 file. My initial thought was: *how* did this happen? My backup script is used all the time, and I have validated the backups, but this was strange and new.

Never mind, I reconstituted the missing files from my other backups and systems.

But I was curious: what could cause this to happen?

On investigation, I found the following worrying sign. I tend to back up to a USB flash drive. I use Linux. I use swsusp to suspend to RAM. My current kernel/distro has a bug in it: when you suspend with a USB drive mounted, on wakeup it doesn't realise the filesystem was mounted or that the hardware needs to be reprobed. "mount" would show the filesystem as mounted, but the device appeared totally empty. If I unmounted the rogue device and remounted it, all the files were present.

I am guessing that I did a backup, left the device mounted, suspended the system, and didn't notice for a while (1-2 days); eventually this may have led to the corruption.

Moral: even if your backup system is perfect and does validation, your operating system (or some other component) may work against you - and you may not know it.

In my case, I may have to strengthen the backup system - perhaps by applying md5sums to the files and validating the copy written to the device, or by caching the backups on the local hard disk and verifying before dropping the local copy.
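As a rough sketch of that idea (the paths here are made up, and this is not my actual backup script): checksum the archive locally, copy it to the mounted device, then read the copy back and compare before trusting it:

import hashlib, shutil

def md5_of(path, bufsize=1 << 20):
    # Stream the file through md5 rather than reading it all at once.
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            block = f.read(bufsize)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

def backup(archive="/home/me/backup/src.tar.bz2",   # hypothetical paths
           dest="/mnt/usb/src.tar.bz2"):
    local_sum = md5_of(archive)
    shutil.copy2(archive, dest)
    # Re-read the copy from the device and verify it matches.
    if md5_of(dest) != local_sum:
        raise SystemExit("backup verification failed: " + dest)
    print("backup verified:", local_sum)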

How good are *your* backups?


Posted at 16:41:46 by fox | Permalink
  dtrace, ftrace, ltrace, strace ... so many to choose from! Wednesday, 11 April 2012  

Vicente-Cheng has been asking questions about some curious kernel messages when dtrace is loaded into the kernel, and it seems like a good topic to cover in more detail. So, here is his question:

The dtrace_xcall code patch failed. (Actually, I don't know how you can tell that from the log - is it from this description?) "ftrace failed to modify [ffffffffa04cdd44] dtrace_xcall+0x4/0x28 [dtracedrv]"

Something in the ftrace module goes wrong when the dtrace kernel module is loading.

The tainted kernel is normal, because of the different licenses (GPL & CDDL).

Let's do a little history lesson.

In the beginning, Solaris gave rise to a tool called "truss", a tool for tracing system calls. It was a breakthrough tool (versions for other OSes existed previously, but the truss implementation was simple and easy to use; the BSD kernels had their own tool ... but I digress). truss lets you understand what calls a program makes, and you can see the parameters to each syscall. Great for educational purposes or for diagnosing performance problems in an application.

On Linux systems (and other Unixes), strace was created to emulate truss - a system call tracing tool.

"ltrace" is another tool, which can be used to trace any dynamic library function call. Sometimes very useful, but at the library level, the number of calls executed can be huge - something as simple as starting an X Windows program can involve lots of nested calls to fopen, string library and other calls - but because of the level of detail - they can make it hard to understand what an application is doing.

A few years back, dtrace was born. dtrace is a tracing tool which runs at the system level - this means you can trace all processes, into and out of the kernel, rather than selecting a single process to trace.

There are two common modes of operation: tracing system calls (you can see who executes each call, whereas with strace/truss you pick a single process and see what calls it makes), and tracing kernel function calls (seeing which process triggers the probe). [dtrace does other things too, such as tracing userland shared libraries, just like ltrace, but again you can find which processes invoke a function rather than having to know up front which one to trace.]

When tracing in the kernel (and user space), the basic mechanism is that of a breakpoint: one or more breakpoint instructions are placed at the locations you choose (using the dtrace probe syntax), and dtrace converts these breakpoint traps into the events and callbacks that a D script can trigger on.

DTrace works at the function level: you pick a function to trace, and dtrace computes the start of the function (very easy - the symbol table gives us the start address of each function) and also its end addresses. (A function can have multiple exit points, so dtrace needs to "find" each of these.) Finding the exit points of a function involves disassembling it and trapping the RET instruction. (Solaris/dtrace traps the LEAVE instruction which precedes the RET instruction, but this is not very useful on Linux when using gcc, since the compiler rarely emits a LEAVE/RET instruction sequence.)
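Purely as a toy illustration of that idea (real dtrace walks the code with a proper x86 disassembler; a naive byte scan like the one below can be fooled, because 0xc3 can also appear inside the operand bytes of longer instructions):

RET = 0xC3   # x86 near-return opcode

def find_candidate_exits(code, func_start, func_len):
    # Return offsets within the function body that hold a RET opcode byte.
    body = code[func_start:func_start + func_len]
    return [func_start + i for i, b in enumerate(body) if b == RET]

# Fabricated bytes standing in for a tiny function: push/mov/pop/ret.
example = bytes([0x55, 0x48, 0x89, 0xE5, 0x5D, 0xC3])
print(find_candidate_exits(example, 0, len(example)))   # -> [5]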

It's possible for the disassembler to get confused, and Linux's dtrace is tuned for the type of output typical of gcc. This can result in not finding the end of a function, or in getting confused by inlined constant tables (gcc used to create switch-statement jump tables and inline them in the code body; fortunately, it no longer does this - all read-only data is put in its own ELF section, and code in memory is exactly that -- code -- nothing else).

dtrace needs to intercept kernel module loading, so that it can expose traceable functions, and also module unloading - otherwise it could leave a trap planted in memory that has been freed or reused for something else.

Let's switch to ftrace: ftrace is a Linux kernel subsystem, similar to dtrace, and an integral part of the kernel. Because of the licensing conflict between dtrace (CDDL) and the kernel (GPL), ftrace exists in its own right. It works similarly to dtrace - intercepting module loads and letting you place traps in the kernel. (ftrace is different in many respects from dtrace - it provides a richer API for kernel tracing, but it can allow dangerous user scripts to crash the kernel.)

ftrace is quite vocal - if it detects anything it cannot handle in a loadable module, it will log a message indicating what it couldn't handle and why. This results in a kernel log message, such as the following, when dtrace is loaded:

[  357.679832] WARNING: at kernel/trace/ftrace.c:1509 ftrace_bug+0x28f/0x2e0()
[  357.679834] Hardware name: VirtualBox
[  357.679835] Modules linked in: dtracedrv(PO+) lockd sunrpc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev snd_intel8x0 parport_pc parport i2c_piix4 i2c_core e1000 snd_ac97_codec microcode ac97_bus snd_pcm snd_page_alloc snd_timer snd soundcore uinput [last unloaded: scsi_wait_scan]
[  357.679855] Pid: 2350, comm: insmod Tainted: P           O 3.3.1-3.fc17.x86_64 #1
[  357.679857] Call Trace:
[  357.679862]  [] warn_slowpath_common+0x7f/0xc0
[  357.679866]  [] ? __probe_kernel_read+0x46/0x70
[  357.679874]  [] ? ctf_hash_size+0x4/0x20 [dtracedrv]
[  357.679881]  [] ? ctf_hash_size+0x4/0x20 [dtracedrv]
[  357.679883]  [] warn_slowpath_null+0x1a/0x20
[  357.679885]  [] ftrace_bug+0x28f/0x2e0
[  357.679888]  [] ftrace_process_locs+0x39c/0x560
[  357.679891]  [] ftrace_module_notify+0x47/0x50
[  357.679895]  [] notifier_call_chain+0x4d/0x70
[  357.679898]  [] __blocking_notifier_call_chain+0x58/0x80
[  357.679901]  [] blocking_notifier_call_chain+0x16/0x20
[  357.679904]  [] sys_init_module+0x10a3/0x20b0
[  357.679906]  [] system_call_fastpath+0x16/0x1b
[  357.679908] ---[ end trace 54106b526adf7ab1 ]---
[  357.679909] ftrace failed to modify [] ctf_hash_size+0x4/0x20 [dtracedrv]
[  357.679915]  actual: e8:0f:62:03:00

Here, the code in ftrace.c at line 1509 is complaining. It's not obvious what it's complaining about, but the last two lines highlight what ftrace was confused by. The "actual" message shows the instruction it didn't like. Those bytes correspond to:

e8 0f 62 03 00          call   0x00036214
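
For the curious, the 0x00036214 target comes from the 4-byte little-endian displacement following the e8 opcode, added to the address of the next instruction (assumed here to sit at offset 0 purely for illustration):

import struct

insn = bytes([0xE8, 0x0F, 0x62, 0x03, 0x00])
disp = struct.unpack("<i", insn[1:5])[0]    # signed little-endian rel32
target = 0 + len(insn) + disp               # next-instruction address + rel32
print(hex(target))                          # 0x36214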

I don't know what it doesn't like about that, but it is possibly suggesting that ftrace, in computing the call-graph of the kernel, may have found a code path which could result in a kernel hang (this might be dtrace violating a kernel API constraint, or simply ftrace not realising how dtrace works).

To date, I have not observed any side effects from this warning (I haven't tried to use ftrace on a system running dtrace, so it's possible that ftrace or dtrace may cause a hang when they are enabled together).

By the way - always treat kernel stack traces (such as the one above), or even those from dtrace itself, with suspicion. Because of the way GCC compiles code, it is not possible to create a 100% accurate stack trace in all scenarios. The stack walker looks at all the words on the stack and lists out potentially likely active stack frames. (The "?" marks in the trace show where the stack walker is highlighting questionable stack boundaries; walking the stack has enough gotchas and complexity that I may return to the subject in a future blog post.)
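To give a feel for the heuristic (this is only a toy sketch with fabricated addresses, not the kernel's actual stack walker): scan the saved stack words and keep any that fall within the kernel text range - these are candidate return addresses, not proof of a live frame, which is why such entries get a "?" in the real output:

def candidate_frames(stack_words, text_start, text_end):
    # Keep any word that looks like an address inside the code (text) range.
    return [w for w in stack_words if text_start <= w < text_end]

# Fabricated stack contents and text range, purely for illustration.
words = [0x12345, 0xffffffff81045a10, 0x0, 0xffffffff8107b2c3]
print([hex(w) for w in candidate_frames(words,
                                        0xffffffff81000000,
                                        0xffffffff82000000)])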


Posted at 21:09:56 by fox | Permalink
  New CRiSP Features coming Sunday, 01 April 2012  
CRiSP has had an impressive life - starting out as an editor on MS-DOS 3.0. It has been updated to run on all CPUs, operating systems and platforms - from 16-bit to 80-bit processors (yes, 80: it was ported to the CRAY-1).

I am just finalising some new features (document map, annotation window), plus another feature which some people may find useful - file-saving.

Until recently, you could use CRiSP to view files and explore aspects of the files and directories, such as cross-referencing. You had the ability to do infinite-undo (years before anyone else could do the same), and CRiSP was the first syntax coloring editor on the Unix platform.

But the new feature means that, as well as editing your files, you can now save them. Previously, you could load a file and make changes. On your next session, you could make additional changes, but only as long as you remade the prior session's edits.

When storing files in source code control systems, and building new versions of software, having to type in your edits repeatedly was a slight problem. (CRiSP has a fast autorepeat, so this helped enormously when scrolling to the relevant parts of a buffer.)

With the "File/Save" feature (which no other editor on the market yet supports, although I am guessing some will try to copy CRiSP's unique traits), your files are saved, permanently. When you return to editing, CRiSP will be able to retrieve the last modified file.

It's taken a while to get this feature working properly, and it had me doing some research into various algorithms which allow for file saving. (Many of the algorithms are quite technical, and caused me much *grief* trying to get all the permutations working.)

The first file-saving implementation is geared around cellular automata (otherwise known as "Paper"). With this, you commit your changes to paper, and when you restart CRiSP, you no longer have to remember what changes you have made, since the "Paper" copy will let you see them immediately. The advantage of this "cellular" technology is that you can keep older versions of the files you have saved.


Posted at 13:45:16 by fox | Permalink