Wednesday, May 11, 2016

Memory Corruption Detection – An approach using Remote Debugging


While working on an activity to identify memory corruption issues in a C++ application built on a highly asynchronous I/O methodology with time-sensitive/dependent processing legs, where call stacks were typically 20 or more frames deep (making for a very large number of possible orderings of call frames reaching a particular function), it was becoming difficult and time-consuming to find those memory corruptions/segmentation violations that happened only on rare and specific call-frame orders.
The quest to be efficient led to learning something new. Before describing how the corruptions were handled, a few terms need to be defined.

Definitions

Memory Corruption

Memory corruption (as defined in Wikipedia [3]) occurs in a computer program when the contents of a memory location are unintentionally modified due to programming errors; this is termed violating memory safety. When the corrupted memory contents are used later in that program, it leads either to program crash or to strange and bizarre program behavior. Nearly 10% of application crashes on Windows systems are due to heap corruption.

Why is detecting memory corruption a tiresome activity?

Memory corruption is one of the most intractable classes of programming errors, for the following reasons [3]:
  1. The source of the memory corruption and its manifestation may be far apart, making it hard to correlate the cause and the effect (a short illustration follows this list).
  2. Symptoms appear under unusual conditions, making it hard to consistently reproduce the error.
  3. Many times, normal runs of the program, and even runs under a debugger, complete successfully even though the program has lurking corruption issues.
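To make reason 1 concrete, below is a purely illustrative C sketch (the structure, names and sizes are made up for this post, not taken from the real application): the out-of-bounds write and the failure it eventually causes sit in different call paths, which is exactly what makes the cause hard to correlate with the effect.

    #include <stdlib.h>
    #include <string.h>

    /* Toy illustration only: the corrupting write happens in fill_name(),
     * but the damage typically surfaces much later, when the neighbouring
     * allocation (or the heap metadata) is used. */
    struct record {
        char name[8];
    };

    static void fill_name(struct record *r, const char *s)
    {
        strcpy(r->name, s);          /* no bounds check: overflows for long names */
    }

    int main(void)
    {
        struct record *r = malloc(sizeof *r);
        char *other = malloc(16);    /* an unrelated allocation */

        strcpy(other, "important");
        fill_name(r, "a-name-that-is-too-long");   /* invalid write past 'name' */

        /* ... much later, possibly in a different thread or call path ... */
        free(other);                 /* may crash or misbehave only here */
        free(r);
        return 0;
    }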

Valgrind

Memory-profiling tools like valgrind, in addition to finding memory leaks, find issues such as: a) illegal reads/writes, b) use of uninitialised or un-addressable values, c) illegal frees, d) mismatched de-allocation functions, e) overlapping source and destination blocks in copy operations, and f) suspicious memory-allocation sizes.
But with valgrind one gets the memory-profiling report at the end of the execution, and there is no effective way to know the state of the system and the process when the corruption started happening, i.e. when one of the above-mentioned error points was hit.
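As a small, hypothetical illustration (again not taken from the real application), the toy C program below would trip two of the categories listed above, (b) use of an uninitialised value and (c) an illegal free:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *flags = malloc(4 * sizeof *flags);

        if (flags[0])                /* (b) branch depends on an uninitialised value */
            puts("flag set");

        free(flags);
        free(flags);                 /* (c) illegal free: the block is freed twice */

        return 0;
    }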

gdbserver

gdbserver is a program that makes it possible to debug other programs remotely. Running on the same system as the program to be debugged, it allows the GNU Debugger (gdb) to connect from another system; that is, only the executable to be debugged needs to be resident on the target system ("target"), while the source code and a copy of the binary being debugged reside on the developer’s local computer ("host") [1].

Requirement

What was required here was the ability to use the memory profiler (valgrind) along with the debugger (gdb), such that the profiler, on hitting any of the valgrind-monitorable errors, breaks automatically, allowing inspection of the states of the various objects and threads that led to the scenario. Note that manual or conditional breakpoints were ineffective, as in most cases the process flow was working fine, and breaking manually had impacts on other time-sensitive legs.
No single tool was expected to provide this combined functionality, but unexpectedly a way was available by using the two tools in an integrated manner.

Approach

What was needed was the ability to use the functionality of the debugger (gdb) together with the memory profiler (valgrind), such that the process automatically pauses/breaks on hitting any of the possible memory-corruption points. This allows one to inspect the state of the various process and thread variables/objects that led to the condition.
This integration was easily achievable using valgrind's remote-debugging hook (its gdbserver implementation). Though remote debugging is originally meant for debugging a program from a separate machine, it can be used on the same machine as the program being run. This is what was used to detect nasty memory corruptions in a wild asynchronous multi-threaded world in a very simple and efficient way.
The steps below describe briefly what has to be done.
Run the program as below:
              $ valgrind --vgdb=yes --vgdb-error=0 <program name>
where --vgdb=yes enables vgdb, the gdbserver built into valgrind, and
--vgdb-error specifies the number of errors after which the gdbserver should freeze the program and wait for a GDB connection (with 0, the program is frozen right at startup, and every error reported thereafter also stops it).
On another shell, issue the following commands:
$ gdb <program name>
(gdb) target remote | vgdb
(gdb) continue
Below is a snapshot of the shell where a SIGTRAP was received by the gdb client from valgrind running in another shell.
Now valgrind, on encountering any suspected or confirmed error (like an invalid write or a double free), will signal its gdbserver (vgdb), which in turn informs gdb, making the program break and allowing inspection of the process, thread and object state to drill down on the situation leading to the corruption.

Conclusion

One has to experience the approach to appreciate its power. It was found to be very effective in resolving memory corruption issues fast.

References

[1]    Debugging remote programs - http://davis.lbl.gov/Manuals/GDB/gdb_17.html
[2]    Using and understanding the Valgrind core: Advanced Topics - http://valgrind.org/docs/manual/manual-core-adv.html
[3]    Memory Corruption - https://en.wikipedia.org/wiki/Memory_corruption

Tuesday, April 5, 2016

Server Load Balancing using iptables

Load Balancing Objective

When a single resource (computer, disk drive, network link, CPU, memory, database server, application server, etc.) is not able to meet the desired workload requirement, one has to resort to a load-balancing method to distribute the workload among multiple servers. The load-balancing approach chosen should optimize resource use, maximize throughput, minimize response time, and ensure a fair distribution of load across the resources.

Figure 1 - Load Balancer balancing traffic from n clients to m servers

Methods (iptables)

Load balancing can be done in myriad ways (at Layers 2 through 7). This post focuses on a load-balancing approach using iptables (available on Linux), which is a Layer-3 based, cost-effective and efficient way to implement server load balancing. With the data-forwarding path operating at Layer 3 (the IP layer) of the network stack in the OS kernel, very high throughput is achievable on low-end hardware in comparison to any approach that uses user-plane/proxy-based load balancing. Considering that the overhead of Layer-7 switching in user space is very high (Intel DPDK excluded), having the load-balancing decision tree inside a kernel module avoids the overhead of context switching between user space and kernel space.

Netfilter

iptables uses Netfilter, a framework provided by the Linux kernel that allows various networking-related operations to be implemented in the form of customized handlers. Netfilter offers functions and operations for packet filtering, network address translation and port translation, which provide the functionality required for directing packets through a network, as well as the ability to prohibit packets from reaching sensitive locations within a computer network.
Netfilter is extended simply by using the series of hooks it provides at various points in the IPv4/IPv6 protocol stack.
  Figure 2 - Packet Traversing in the Netfilter System
Kernel modules can register to listen at any of these hooks, which are called from the core networking code. Each registered module is called in priority order and is free to manipulate the packet. The custom module responds back to the Netfilter core framework with one of the following actions (a minimal hook-registration sketch follows the list):
  • NF_ACCEPT: continue traversal as normal.
  • NF_DROP: drop the packet; don't continue traversal.
  • NF_STOLEN: I've taken over the packet; don't continue traversal.
  • NF_QUEUE: queue the packet (usually for userspace handling).
  • NF_REPEAT: call this hook again.
The priority numbers associated with the hooks provide a deterministic ordering of the registered handlers.
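As a rough sketch (not from the original post, with illustrative names), this is approximately how a kernel module registers a handler at one of these hook points; the exact registration call and hook-function signature vary somewhat across kernel versions:

    #include <linux/module.h>
    #include <linux/netfilter.h>
    #include <linux/netfilter_ipv4.h>
    #include <linux/skbuff.h>

    /* Called for every IPv4 packet hitting the PRE_ROUTING hook point. */
    static unsigned int example_hook(void *priv, struct sk_buff *skb,
                                     const struct nf_hook_state *state)
    {
        /* Inspect or modify the packet here, then tell Netfilter what to do. */
        return NF_ACCEPT;            /* continue traversal as normal */
    }

    static struct nf_hook_ops example_ops = {
        .hook     = example_hook,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_PRE_ROUTING,
        .priority = NF_IP_PRI_FIRST, /* priority decides ordering among handlers */
    };

    static int __init example_init(void)
    {
        return nf_register_hook(&example_ops);  /* newer kernels: nf_register_net_hook() */
    }

    static void __exit example_exit(void)
    {
        nf_unregister_hook(&example_ops);
    }

    module_init(example_init);
    module_exit(example_exit);
    MODULE_LICENSE("GPL");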

iptables

iptables is the user-space module for packet selection, modification and action, built over the Netfilter framework. It creates the following tables for packet processing:
  • ‘filter’ table: for packet filtering
  • ‘nat’ table: for Network Address Translation.
  • ‘mangle’ table: for packet mangling. Mangling is the modification of IP header fields, such as IP addresses for NAT, and TTL (Time to Live) and TOS (Type of Service) for QoS.
Tables like ‘raw’ and the connection-tracking machinery have been left out of scope to focus on the core framework only. iptables also allows registration of new rule tables and makes packets traverse through them. Each table ultimately gets registered to one of the Netfilter hooks.

Figure 3 - iptables hooks for tables

Iptables Chains

The iptables rules are grouped into chains, which in turn are contained in the tables (filter, mangle, nat) mentioned above or in any user-defined tables/sub-tables. iptables allows rule chains to be associated with the hook points. The following built-in chains are supported:
Chain         Netfilter Hook        Allowed Tables        Description
INPUT         NF_IP_LOCAL_IN        mangle, filter        Rules for packets terminating on the local host
OUTPUT        NF_IP_LOCAL_OUT       nat, mangle, filter   Rules for packets originating from the local host
FORWARD       NF_IP_FORWARD         mangle, filter        Rules for packets that neither originate from nor terminate on the local host
PREROUTING    NF_IP_PRE_ROUTING     mangle, nat           Packets traverse this chain before the routing decision is made
POSTROUTING   NF_IP_POST_ROUTING    mangle, nat           Packets traverse this chain after the routing decision is made

Filtering Criteria

Matching criteria include protocol type, destination or source address, destination or source port, destination or source network, input or output interface, headers, and connection state, among many others. Extensibility is also possible, to allow matching on new fields, e.g. a regular-expression match on the payload section (a rough kernel-side sketch of such an extension follows).
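As an assumption-laden sketch of what such a match extension could look like on the kernel side (the name "example", the trivial match logic, and the omission of the required user-space libxt plugin are all simplifications for illustration, not a tested module):

    #include <linux/module.h>
    #include <linux/skbuff.h>
    #include <linux/netfilter/x_tables.h>

    /* Match callback of a hypothetical iptables match extension.
     * A real extension would scan the packet payload here (e.g. run a
     * pattern match) and also ship a user-space libxt_example.so so that
     * iptables can parse '-m example ...' on the command line. */
    static bool example_mt(const struct sk_buff *skb, struct xt_action_param *par)
    {
        return true;                 /* sketch: match every packet */
    }

    static struct xt_match example_mt_reg __read_mostly = {
        .name     = "example",
        .revision = 0,
        .family   = NFPROTO_IPV4,
        .match    = example_mt,
        .me       = THIS_MODULE,
    };

    static int __init example_mt_init(void)
    {
        return xt_register_match(&example_mt_reg);
    }

    static void __exit example_mt_exit(void)
    {
        xt_unregister_match(&example_mt_reg);
    }

    module_init(example_mt_init);
    module_exit(example_mt_exit);
    MODULE_LICENSE("GPL");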

Simple Load Balancing Rules Example

Rules:
iptables -t nat -A PREROUTING -p tcp --dport 80 -m state --state NEW -m statistic --mode nth --every 2 --packet 0 -j DNAT --to-destination 10.1.1.1:80
iptables -t nat -A PREROUTING -p tcp --dport 80 -m state --state NEW -m statistic --mode nth --every 1 --packet 0 -j DNAT --to-destination 10.1.1.2:80
The above two rules do simple round-robin load balancing between two web servers (10.1.1.1 and 10.1.1.2) for any new incoming connection on port 80 on the load-balancer data path, by rewriting the destination IP address (DNAT).
More complex rules using the number of connections, traffic rate, payload, etc. can also be created with iptables.

Highlights of the Solution

Direct Routing L3/L4 Load Balancer – Suited for highly efficient server load balancing while providing a transparent path. Note that any proxy-like approach, where two IP (TCP/UDP) connections are bridged, has very high processing overhead due to two full network-stack traversals and kernel-to-user-space switching for handling packets.
NAT – Allows the load balancer to become the entry (ingress router) and exit (gateway) point of the cluster of servers. Note that the client does not always need to send requests to the load balancer's own IP address; the destination can be one of the (otherwise unassigned public) IP addresses advertised by the gateway to the external network.
Simplified Load Balancer – Very low development effort, as the required functionality is realized with the existing software framework (Netfilter and iptables) available on Linux.
Stateful Load Balancer – iptables provides stateful packet filtering, with the states being NEW, ESTABLISHED, RELATED and INVALID. An example of RELATED is the FTP control and data sessions, where the data session is related to the earlier established control session. This stateful capability can be used in a load balancer where the decision to load-balance is made on the NEW state, and the handling of subsequent packets is done by the NAT session tables.
Connection and Traffic Accounting – iptables allows accounting extensions where bandwidth figures (ingress/egress packet and byte counts) can be used to arrive at the load-balancing decision.
Protocol Specific Decision – Common protocol- and port-based rules can be created to handle interesting traffic differently, for example if the requirement is to load-balance only web requests and pass others through.
Extensibility – High level of extensibility through the hooks, e.g. a load-balancing decision based on Deep Packet Inspection (DPI), i.e. inspection of packet payloads to arrive at the decision.

Conclusion

The Netfilter packet-filtering framework and iptables are meant to implement firewall solutions on Linux servers. The Netfilter kernel hooks work at Layer 3 of the networking stack and provide powerful control over packets as they are processed by the system. The extensible framework allows creating a high-throughput, stable, feature-rich load balancer with very low development effort, if any.

References

[1]    Simple Stateful Load Balancer with iptables and NAT - https://www.webair.com/community/simple-stateful-load-balancer-with-iptables-and-nat/
[2]    Advanced Features of netfilter/iptables - http://linuxgazette.net/108/odonovan.html
[3]    Firewalling with netfilter/iptables - http://linuxgazette.net/103/odonovan.html
[4]    Load balancing (computing) - https://en.wikipedia.org/wiki/Load_balancing_(computing)