Friday, February 13, 2009

IRQ affinity in Linux

Hardware components such as Ethernet cards and disk controllers raise interrupts when they need attention from the CPU, for example when an Ethernet card receives a packet from the network. You can see how your machine's interrupts are spread across its CPUs by looking at the /proc/interrupts proc entry.

# cat /proc/interrupts

This output shows which device is on which IRQ and how many interrupts each CPU has processed for that device. Normally you will not run into problems and there is no need to change which CPU handles which IRQ. In some cases, though, the default can hurt, for example on a Linux box running as a firewall with high incoming or outgoing traffic. Suppose the firewall has 2 Ethernet cards and both are handling many packets. You may then see high CPU usage on one of your CPUs, which can be caused by the many interrupts your network cards produce. You can check this in /proc/interrupts by seeing whether that one CPU is handling the interrupts of both cards.

If this is the case, you can look for the most idle CPUs on your system and assign each Ethernet card's IRQ to a separate CPU. Beware that you can only do this on a system where the device's interrupts go through the IO-APIC. You can check whether your device uses the IO-APIC in the interrupt controller column of /proc/interrupts.
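To see how one card's interrupts are spread over the CPUs, the /proc/interrupts output can be filtered by device name. The following is only a rough sketch: the function name irq_counts is made up for this post, and it assumes the usual layout (a header row of CPU names, then one row per IRQ with the device names in the last columns).

```shell
# irq_counts DEVICE
# Hypothetical helper: reads /proc/interrupts-style text on stdin and
# prints one line per CPU for every IRQ row that lists DEVICE.
irq_counts() {
    awk -v dev="$1" '
        NR == 1 { ncpu = NF; next }          # header row: CPU0 CPU1 ...
        {
            # Device names come after the IRQ label and the per-CPU counts.
            found = 0
            for (i = ncpu + 2; i <= NF; i++) {
                name = $i
                sub(/,$/, "", name)          # shared IRQs are comma separated
                if (name == dev) found = 1
            }
            if (!found) next
            irq = $1
            sub(/:$/, "", irq)
            for (c = 0; c < ncpu; c++)
                printf "irq %s cpu%d %s\n", irq, c, $(c + 2)
        }'
}

# Usage (on a live system):
#   irq_counts eth0 < /proc/interrupts
```

If one CPU carries nearly all the counts for both cards, that is the situation described above.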

For Ethernet cards, there is also a driver interface called NAPI, which can reduce the number of interrupts raised by incoming network traffic.

You can see which CPU or CPUs serve each IRQ by looking in the /proc/irq directory. The layout is very simple: every IRQ in use on the system has a directory named after its IRQ number, and each directory contains a file called smp_affinity that holds the CPU setting. The file's content shows which CPUs currently serve the IRQ, as a hex bitmask. In this example, eth0 is on irq 25 and eth1 is on irq 26. Let's say we want irq 25 to be served only by cpu3. First we have to work out the hex value for cpu3. The calculation is shown below.

          Binary   Hex
  CPU 0    0001     1
  CPU 1    0010     2
  CPU 2    0100     4
+ CPU 3    1000     8
-----------------------
  all      1111     f

The calculation is shown for a 4-CPU system for simplicity. Normally the value for all CPUs on a system is shown as an 8-digit hex value (for example 00000008), and leading zeros can be left out when writing it. In binary, every bit represents one CPU. The bit for cpu3 is binary 1000, which is 8 in hex. We then write that value into the smp_affinity file for irq 25, as shown below.

# echo 8 > /proc/irq/25/smp_affinity
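The same value can be worked out in the shell by shifting a 1 left by the CPU number. This is a minimal sketch; cpu_mask is a name invented here, and it assumes a CPU number below 32 so that the result fits in a single 8-digit hex group.

```shell
# cpu_mask N
# Hypothetical helper: prints the hex affinity mask that selects only CPU N.
# Assumes N < 32 (one 8-digit hex group).
cpu_mask() {
    # Bit N set, all other bits clear, printed as lowercase hex.
    printf '%x\n' "$((1 << $1))"
}

# Usage:
#   cpu_mask 3    prints 8
#   cpu_mask 6    prints 40
```

On an 8-CPU machine the table above simply grows by four more bits, so CPU 6 is binary 0100 0000, which is 40 in hex.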

You can check the setting by reading the file back.

# cat /proc/irq/25/smp_affinity
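The hex value read back from smp_affinity can be turned into a list of CPU numbers by testing each bit in turn. The sketch below uses a made-up function name, mask_to_cpus, and assumes the mask is a single hex group with no commas (fewer than 32 CPUs).

```shell
# mask_to_cpus HEXMASK
# Hypothetical helper: prints the CPU numbers whose bits are set in HEXMASK.
# Assumes a single hex group, e.g. 8 or 00000008, without commas.
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    list=""
    while [ "$mask" -ne 0 ]; do
        # If the lowest bit is set, this CPU is part of the mask.
        if [ $((mask & 1)) -eq 1 ]; then
            list="${list:+$list }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$list"
}

# Usage:
#   mask_to_cpus 00000008    prints 3
#   mask_to_cpus f           prints 0 1 2 3
```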

As another example, let's say we want irq 25 to be handled by cpu0 and cpu1.

          Binary   Hex
  CPU 0    0001     1
+ CPU 1    0010     2
-----------------------
           0011     3

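Adding up the per-CPU bits as in the table is the same as OR-ing them together. A small sketch, assuming CPU numbers below 32 and using the invented name cpus_to_mask:

```shell
# cpus_to_mask CPU...
# Hypothetical helper: prints the hex affinity mask covering every CPU given.
# Assumes all CPU numbers are below 32.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        # Set the bit that belongs to this CPU.
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

# Usage:
#   cpus_to_mask 0 1        prints 3
#   cpus_to_mask 0 1 2 3    prints f
```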
Setting the bits for cpu0 and cpu1 gives us the value 3. Likewise, if we need all CPUs to handle our device's IRQ, we set every bit in the calculation and write the resulting hex value, which is f, into smp_affinity.

Linux also has irqbalance, a daemon that automatically distributes interrupts across the CPUs in the system. In some cases it gives bad performance, and you need to stop the service and set the affinities by hand, as described above, to get higher performance. Alternatively, irqbalance can be configured to leave given IRQs alone or to avoid given CPUs. In that case you can tell it not to touch your manually configured IRQs and your preferred CPUs, and let it balance the rest of the IRQs and CPUs for you.


Oat's Personal Blog said...

I experienced overruns on my network interface when irqs are not balanced among all CPUs. But when they are all spread out there is no overrun.

For some reason, new Dell servers and Linux kernels don't automatically balance out the IRQs anymore. Do you have any ideas?

Unknown said...

You probably got overruns because the kernel cannot handle the incoming hardware interrupts, due to the high number of network packets.

Handling a network IRQ across many CPUs is not a good idea; it will probably add latency when packet reassembly is required.

I think you have two choices. The first is channel bonding. The second is tuning InterruptThrottleRate: if you increase it, you get higher CPU usage and lower latency; if you lower it, you get less CPU usage but higher latency on packets. It's a trade-off. Also, if we're talking about high incoming network traffic, you need to increase the RX queue limit.

leighporter said...

I know this was a while ago, but...
Is it the case that whichever core handles the interrupt will also process the packet, whether it is traversing a bridge, IP routing, or IP reassembly (if required)? Or will one core process the interrupt and then share the subsequent processing across all cores?


Anonymous said...

How would the calculation go if there were 8 CPUs and I wanted to target CPU 6?

Anonymous said...

Hi, thanks, it works fine.
Under Ubuntu, sudoers don't have the right to do this. You need to be logged in as root.