Saturday, January 24, 2009

FreeBSD kernel process tracing

The ktrace utility enables kernel trace logging for a process, recording events such as the system calls it makes. By default, it logs to a file named ktrace.out, but you can write to another file by passing its name with the -f parameter. You either pass the command you want to run to ktrace, or give the pid of a process that is already running.
You also tell ktrace which kinds of events (trace points) to log with the -t parameter. -t accepts the following options:

c trace system calls
n trace namei translations
i trace I/O
s trace signal processing
t trace various structures
u userland traces
w context switches
+ trace the default set of trace points - c, n, i, s, t, u

Once tracing is enabled, the kernel keeps logging until the traced process exits or its trace points are cleared. To stop tracing a process that is still running, run ktrace with the -c parameter and pass the pid of the process with -p.

To trace and log a process that is already running, pass its process ID (pid) to ktrace with the -p parameter. Here is a simple example that uses ktrace to trace the find command, logging the default set of trace points plus context switches:

# ktrace -t+w /usr/bin/find /

The log created by ktrace can then be displayed with the kdump utility:

# kdump -f ktrace.out

kdump reads ktrace.out by default. If you wrote the log to a different file with ktrace -f, pass the same filename to kdump with its -f parameter.
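Putting the pieces together, here is a minimal sketch of tracing an already running process for a fixed time and then stopping, assuming a FreeBSD host with ktrace and kdump in the PATH. The function name trace_for and its arguments are hypothetical; the ktrace flags it uses (-f, -t, -p, -c) are the ones described above.

```sh
#!/bin/sh
# trace_for PID SECONDS [LOGFILE]
# Hypothetical helper: trace a running process for SECONDS, then stop.
# FreeBSD only; you need permission to trace PID (typically the same user, or root).
trace_for() {
    pid=$1
    secs=$2
    log=${3:-ktrace.out}    # same default file name ktrace itself uses

    # Start tracing: default trace points (+) plus context switches (w).
    ktrace -f "$log" -t +w -p "$pid" || return 1

    sleep "$secs"

    # Clear the trace points on that process so logging stops.
    ktrace -c -p "$pid"
}

# Usage (pid 1234 is a placeholder):
#   trace_for 1234 10 /tmp/trace.out
#   kdump -f /tmp/trace.out | less
```

Wrapping the start and stop calls in one function keeps the -f and -p arguments consistent between them, so the log you pass to kdump is the one that was actually written.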

Thursday, January 22, 2009

FreeBSD network tuning

A few days ago, I couldn't reach one of my FreeBSD 7.1 servers via ssh. The machine was not responding to any packets on the network. This server gets moderate network traffic from both short-lived and long-lived connections. After logging in from the console, I ran the "vmstat -z" command.
Looking closely at the vmstat -z output, I found that kernel zone allocations had failed for the following zones (columns: ITEM: SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES):
mbuf_cluster: 2048, 25600, 1278, 24322, 393553280, 1384
tcptw: 52, 5184, 0, 5184, 3348441, 1304539
tcpreass: 20, 1690, 0, 1690, 10759020, 503124

tcptw -> TCP TIME_WAIT state entries
tcpreass -> TCP reassembly queue entries
mbuf_cluster -> mbuf clusters, the 2 KB buffers that hold network packet data in the FreeBSD kernel
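To spot failing zones without reading the whole listing, the output can be filtered. This is a minimal sketch, assuming the FreeBSD 7.x vmstat -z layout shown above (zone name, then SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES). zone_failures is a hypothetical helper name; later releases may append columns after FAILURES, which is why the script reads field 7 rather than the last field.

```sh
#!/bin/sh
# zone_failures: read `vmstat -z` output on stdin and print every zone
# whose FAILURES column is non-zero.
# Assumes the FreeBSD 7.x layout: "name: SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES".
zone_failures() {
    # Split on commas and colons; the header line has neither, so NF < 7 skips it.
    awk -F'[,:]' 'NF >= 7 && $7 + 0 > 0 {
        name = $1
        sub(/^[ \t]+/, "", name)   # strip any leading blanks from the zone name
        printf "%s limit=%d used=%d failures=%d\n", name, $3, $4, $7
    }'
}

# Usage on a live system:
#   vmstat -z | zone_failures
```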

I decided to raise the default limits. I increased the following values in /etc/sysctl.conf (one name=value per line, without spaces around the =):
net.inet.tcp.maxtcptw=12000
kern.ipc.nmbclusters=32768

Then I ran /etc/rc.d/sysctl restart to apply them; running vmstat -z again shows whether the new limits are in effect. tcpreass is a bit different: its limit, net.inet.tcp.reass.maxsegments, is a boot-time tunable, so it has to be set in /boot/loader.conf and only takes effect after rebooting the machine.

net.inet.tcp.reass.maxsegments="4096"
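Besides vmstat -z, you can read the values back with sysctl -n, which prints only the value. This is a minimal sketch; check_sysctl is a hypothetical helper name. It is exactly this kind of check that reveals the surprise in the next paragraph, where net.inet.tcp.reass.maxsegments does not keep the value from loader.conf.

```sh
#!/bin/sh
# check_sysctl NAME EXPECTED
# Hypothetical helper: compare a live sysctl value with the one you configured.
# Returns 0 on match, 1 on mismatch, 2 if sysctl could not read NAME.
check_sysctl() {
    actual=$(sysctl -n "$1") || return 2   # -n prints only the value
    if [ "$actual" = "$2" ]; then
        echo "$1 = $actual (ok)"
    else
        echo "$1 = $actual (expected $2)"
        return 1
    fi
}

# After /etc/rc.d/sysctl restart:
#   check_sysctl kern.ipc.nmbclusters 32768
#   check_sysctl net.inet.tcp.maxtcptw 12000
# After rebooting with the loader.conf setting:
#   check_sysctl net.inet.tcp.reass.maxsegments 4096
```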

After rebooting, I checked the results, but the tcpreass limit was not the value I had written to /boot/loader.conf. To find out why, I examined the FreeBSD 7.1 kernel sources and saw that tcp_reass_init() uses EVENTHANDLER_REGISTER to register a handler that calls tcp_reass_zone_change() whenever nmbclusters is changed through sysctl. The logic is simple. When the machine boots, tcp_reass_init() calculates the default tcpreass limit as nmbclusters / 16. If net.inet.tcp.reass.maxsegments is set in /boot/loader.conf, it skips the auto-calculated default and uses the given value instead. Either way, it then registers the event handler that watches for changes to nmbclusters and recalculates the tcpreass limit. In my case, the kernel first picked up my value, but when /etc/rc.d/sysctl applied the new kern.ipc.nmbclusters from /etc/sysctl.conf, the kernel detected the change and recalculated the limit as nmbclusters / 16, overwriting my setting. So I removed the parameter from /boot/loader.conf and instead raised the mbuf cluster limit a little higher in /etc/sysctl.conf. The reason is that the TCP reassembly queue holds mbufs, so the kernel auto-tunes its limit from nmbclusters to keep the system from running out of mbuf clusters.
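The boot-time logic and the event handler described above can be modeled in a few lines of shell. This is a toy model, not kernel code: the hypothetical functions reass_init and reass_zone_change stand in for tcp_reass_init() and tcp_reass_zone_change(), and a shell variable tcp_reass_maxseg holds the tcpreass limit. It replays the scenario from this post, starting from the 25600 clusters in the vmstat -z output above.

```sh
#!/bin/sh
# Toy model of how the tcpreass limit is set, following the behavior
# described above for FreeBSD 7.1. Not kernel code.

# Boot time: default is nmbclusters / 16, unless the loader tunable
# net.inet.tcp.reass.maxsegments was set (passed here as optional $2).
reass_init() {
    nmbclusters=$1
    if [ -n "$2" ]; then
        tcp_reass_maxseg=$2
    else
        tcp_reass_maxseg=$((nmbclusters / 16))
    fi
}

# Event handler: runs when kern.ipc.nmbclusters is changed via sysctl and
# recalculates the limit from the new value, ignoring any loader tunable.
reass_zone_change() {
    nmbclusters=$1
    tcp_reass_maxseg=$((nmbclusters / 16))
}

# Boot with 25600 clusters and maxsegments="4096" in /boot/loader.conf ...
reass_init 25600 4096
echo "after boot:   $tcp_reass_maxseg"   # after boot:   4096
# ... then /etc/rc.d/sysctl raises kern.ipc.nmbclusters to 32768.
reass_zone_change 32768
echo "after sysctl: $tcp_reass_maxseg"   # after sysctl: 2048
```

The second step is why the loader.conf value disappears: any later change to nmbclusters wins, so raising nmbclusters itself is the setting that sticks.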

NOTE: keep in mind that each mbuf cluster takes 2 KB of memory, so raising kern.ipc.nmbclusters also raises how much kernel memory network buffers can consume.
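As a quick worked example of that note, assuming the 2048-byte cluster size shown in the SIZE column of vmstat -z, the shell arithmetic below gives the memory ceiling in MB for the clusters alone at a given kern.ipc.nmbclusters (integer division rounds down). mbuf_cluster_mem_mb is a hypothetical helper name.

```sh
#!/bin/sh
# mbuf_cluster_mem_mb N: memory in MB that N mbuf clusters can occupy,
# assuming 2048 bytes per cluster. Integer arithmetic, rounded down.
mbuf_cluster_mem_mb() {
    echo $(( $1 * 2048 / 1024 / 1024 ))
}

mbuf_cluster_mem_mb 25600   # prints 50
mbuf_cluster_mem_mb 32768   # prints 64
```

So moving from 25600 to 32768 clusters allows roughly 14 MB more memory to be used by network buffers.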