Commit graph

89135 commits

Author SHA1 Message Date
Patrick McHardy 4910a08799 [NETFILTER]: nf_nat: add DCCP protocol support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:50 +02:00
Patrick McHardy 2bc780499a [NETFILTER]: nf_conntrack: add DCCP protocol support
Add DCCP conntrack helper. Thanks to Gerrit Renker <gerrit@erg.abdn.ac.uk>
for review and testing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:49 +02:00
Patrick McHardy d63a650736 [NETFILTER]: Add partial checksum validation helper
Move the UDP-Lite conntrack checksum validation to a generic helper
similar to nf_checksum() and make it fall back to nf_checksum()
in case the full packet is to be checksummed and hardware checksums
are available. This is to be used by DCCP conntrack, which also
needs to verify partial checksums.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:49 +02:00
Patrick McHardy 6185f870e2 [NETFILTER]: nf_nat: add UDP-Lite support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:48 +02:00
Patrick McHardy 2d2d84c40e [NETFILTER]: nf_nat: remove unused name from struct nf_nat_protocol
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:48 +02:00
Patrick McHardy ca6a507490 [NETFILTER]: nf_conntrack_netlink: clean up NAT protocol parsing
Move responsibility for setting the IP_NAT_RANGE_PROTO_SPECIFIED flag
to the NAT protocol, properly propagate errors and get rid of ugly
return value convention.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:47 +02:00
Patrick McHardy 535b57c7c1 [NETFILTER]: nf_nat: move NAT ctnetlink helpers to nf_nat_proto_common
Move to nf_nat_proto_common and rename to nf_nat_proto_... since they're
also used by protocols that don't have port numbers.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:47 +02:00
Patrick McHardy 5abd363f73 [NETFILTER]: nf_nat: fix random mode not to overwrite port rover
The port rover should not get overwritten when using random mode,
otherwise other rules will also use more or less random ports.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:46 +02:00
Patrick McHardy 937e0dfd87 [NETFILTER]: nf_nat: add helpers for common NAT protocol operations
Add generic ->in_range and ->unique_tuple ops to avoid duplicating them
again and again for future NAT modules and save a few bytes of text:

net/ipv4/netfilter/nf_nat_proto_tcp.c:
  tcp_in_range     |  -62 (removed)
  tcp_unique_tuple | -259 # 271 -> 12, # inlines: 1 -> 0, size inlines: 7 -> 0
 2 functions changed, 321 bytes removed

net/ipv4/netfilter/nf_nat_proto_udp.c:
  udp_in_range     |  -62 (removed)
  udp_unique_tuple | -259 # 271 -> 12, # inlines: 1 -> 0, size inlines: 7 -> 0
 2 functions changed, 321 bytes removed

net/ipv4/netfilter/nf_nat_proto_gre.c:
  gre_in_range |  -62 (removed)
 1 function changed, 62 bytes removed

vmlinux:
 5 functions changed, 704 bytes removed

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:46 +02:00
Patrick McHardy 544473c166 [NETFILTER]: {ip,ip6,arp}_tables: return EAGAIN for invalid SO_GET_ENTRIES size
Rule dumping is performed in two steps: first userspace gets the
ruleset size using getsockopt(SO_GET_INFO) and allocates memory,
then it calls getsockopt(SO_GET_ENTRIES) to actually dump the
ruleset. When another process changes the ruleset in between the
sizes from the first getsockopt call doesn't match anymore and
the kernel aborts. Unfortunately it returns EAGAIN, as for multiple
other possible errors, so userspace can't distinguish this case
from real errors.

Return EAGAIN so userspace can retry the operation.

Fixes (with current iptables SVN version) netfilter bugzilla #104.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:45 +02:00
Patrick McHardy fa913ddf63 [NETFILTER]: nf_conntrack_sip: clear address in parse_addr()
Some callers pass uninitialized structures, clear the address to make
sure later comparisions work properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:45 +02:00
Jan Engelhardt c2f9c68398 [NETFILTER]: Explicitly initialize .priority in arptable_filter
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:44 +02:00
Jan Engelhardt 3bb0362d2f [NETFILTER]: remove arpt_(un)register_target indirection macros
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:44 +02:00
Jan Engelhardt 95eea855af [NETFILTER]: remove arpt_target indirection macro
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:43 +02:00
Jan Engelhardt 4abff0775d [NETFILTER]: remove arpt_table indirection macro
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:43 +02:00
Jan Engelhardt 72b72949db [NETFILTER]: annotate rest of nf_nat_* with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:42 +02:00
Jan Engelhardt 58c0fb0ddd [NETFILTER]: annotate rest of nf_conntrack_* with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:42 +02:00
Jan Engelhardt 5452e425ad [NETFILTER]: annotate {arp,ip,ip6,x}tables with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:35 +02:00
Jan Engelhardt 3cf93c96af [NETFILTER]: annotate xtables targets with const and remove casts
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:05 +02:00
Jan Engelhardt b9f61b1603 [NETFILTER]: xt_sctp: simplify xt_sctp.h
The use of xt_sctp.h flagged up -Wshadow warnings in userspace, which
prompted me to look at it and clean it up. Basic operations have been
directly replaced by library calls (memcpy, memset is both available
in the kernel and userspace, and usually faster than a self-made
loop). The is_set and is_clear functions now use a processing time
shortcut, too.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:04 +02:00
Robert P. J. Day fdccecd0cc [NETFILTER]: Use non-deprecated __RW_LOCK_UNLOCKED macro
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:03 +02:00
Robert P. J. Day 0718300c06 [NETFILTER]: bridge netfilter: use non-deprecated __RW_LOCK_UNLOCKED macro.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:03 +02:00
Alexey Dobriyan 666953df35 [NETFILTER]: ip_tables: per-netns FILTER/MANGLE/RAW tables for real
Commit 9335f047fe aka
"[NETFILTER]: ip_tables: per-netns FILTER, MANGLE, RAW"
added per-netns _view_ of iptables rules. They were shown to user, but
ignored by filtering code. Now that it's possible to at least ping loopback,
per-netns tables can affect filtering decisions.

netns is taken in case of
	PRE_ROUTING, LOCAL_IN -- from in device,
	POST_ROUTING, LOCAL_OUT -- from out device,
	FORWARD -- from in device which should be equal to out device's netns.
		   This code is relatively new, so BUG_ON was plugged.

Wrappers were added to a) keep code the same from CONFIG_NET_NS=n users
(overwhelming majority), b) consolidate code in one place -- similar
changes will be done in ipv6 and arp netfilter code.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:02 +02:00
Patrick McHardy 36e2a1b0f7 [NETFILTER]: {ip,ip6}t_LOG: print MARK value in log output
Dump the mark value in log messages similar to nfnetlink_log. This
is useful for debugging complex setups where marks are used for
routing or traffic classification.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:01 +02:00
Alexey Dobriyan b916f7d4b7 [NETFILTER]: nf_conntrack: less hairy ifdefs around proc and sysctl
Patch splits creation of /proc/net/nf_conntrack, /proc/net/stat/nf_conntrack
and net.netfilter hierarchy into their own functions with dummy ones
if PROC_FS or SYSCTL is not set. Also, remove dead "ret = 0" write
while I'm at it.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:01 +02:00
Patrick McHardy 159d83363b [BRIDGE]: Fix crash in __ip_route_output_key with bridge netfilter
The bridge netfilter code attaches a fake dst_entry with a pointer to a
fake net_device structure to skbs it passes up to IPv4 netfilter. This
leads to crashes when the skb is passed to __ip_route_output_key when
dereferencing the namespace pointer.

Since bridging can currently only operate in the init_net namespace,
the easiest fix for now is to initialize the nd_net pointer of the
fake net_device struct to &init_net.

Should fix bugzilla 10323: http://bugzilla.kernel.org/show_bug.cgi?id=10323

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 00:46:01 -07:00
Pavel Emelyanov 4dee959723 [NETFILTER]: ipt_CLUSTERIP: fix race between clusterip_config_find_get and _entry_put
Consider we are putting a clusterip_config entry with the "entries"
count == 1, and on the other CPU there's a clusterip_config_find_get
in progress:

CPU1:							CPU2:
clusterip_config_entry_put:				clusterip_config_find_get:
if (atomic_dec_and_test(&c->entries)) {
	/* true */
							read_lock_bh(&clusterip_lock);
							c = __clusterip_config_find(clusterip);
							/* found - it's still in list */
							...
							atomic_inc(&c->entries);
							read_unlock_bh(&clusterip_lock);

	write_lock_bh(&clusterip_lock);
	list_del(&c->list);
	write_unlock_bh(&clusterip_lock);
	...
	dev_put(c->dev);

Oops! We have an entry returned by the clusterip_config_find_get,
which is a) not in list b) has a stale dev pointer.

The problems will happen when the CPU2 will release the entry - it
will remove it from the list for the 2nd time, thus spoiling it, and
will put a stale dev pointer.

The fix is to make atomic_dec_and_test under the clusterip_lock.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 00:44:52 -07:00
Gerrit Renker f5572855ec [SKB]: __skb_queue_tail = __skb_insert before
This expresses __skb_queue_tail() in terms of __skb_insert(),
using __skb_insert_before() as auxiliary function.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 00:05:28 -07:00
Gerrit Renker 7de6c03336 [SKB]: __skb_append = __skb_queue_after
This expresses __skb_append in terms of __skb_queue_after, exploiting that

  __skb_append(old, new, list) = __skb_queue_after(list, old, new).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 00:05:09 -07:00
Gerrit Renker bf29927588 [SKB]: __skb_queue_after(prev) = __skb_insert(prev, prev->next)
By reordering, __skb_queue_after() is expressed in terms of __skb_insert().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 00:04:51 -07:00
Gerrit Renker f525c06d12 [SKB]: __skb_dequeue = skb_peek + __skb_unlink
By rearranging the order of declarations, __skb_dequeue() is expressed in terms of

 * skb_peek() and
 * __skb_unlink(),

thus in effect mirroring the analogue implementation of __skb_dequeue_tail().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 00:04:12 -07:00
Rami Rosen 0912ea38de [IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().
This patches adds a call to increment IPSTATS_MIB_OUTFORWDATAGRAMS
when forwarding the packet in ip6_mr_forward() in the IPv6 multicast
routing module (net/ipv6/ip6mr.c).

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:59:13 -07:00
YOSHIFUJI Hideaki 9625ed72e8 [IPV6] ADDRCONF: Don't generate temporary address for ip6-ip6 interface.
As far as I can remember, I was going to disable privacy extensions
on all "tunnel" interfaces.  Disable it on ip6-ip6 interface as well.

Also, just remove ifdefs for SIT for simplicity.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:47:11 -07:00
YOSHIFUJI Hideaki b077d7abab [IPV6] ADDRCONF: Ensure disabling multicast RS even if privacy extensions are disabled.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:42:18 -07:00
YOSHIFUJI Hideaki e9df2e8fd8 [IPV6]: Use appropriate sock tclass setting for routing lookup.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:40:51 -07:00
YOSHIFUJI Hideaki 7cd636fe9c [IPV6]: IPv6 extension header structures need to be packed.
struct ipv6_opt_hdr is the common structure for IPv6 extension
headers, and it is common to increment the pointer to get
the real content.  On the other hand, since the structure
consists only of 1-byte next-header field and 1-byte length
field, size of that structure depends on architecture; 2 or 4.
Add "packed" attribute to get 2.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:33:52 -07:00
Jan Engelhardt 0b18542b7f [NET]: Sink IPv6 menuoptions into its own submenu
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:30:47 -07:00
YOSHIFUJI Hideaki e7712f1a7c [IPV6]: Share common code-paths for sticky socket options.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:21:52 -07:00
YOSHIFUJI Hideaki cee8947338 [IPV6] MROUTE: Do not call ipv6_find_idev() directly.
Since NETDEV_REGISTER notifier chain is responsible for creating
inet6_dev{}, we do not need to call ipv6_find_idev() directly here.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:21:16 -07:00
David S. Miller b45e9189c0 [IPV6]: Fix ipv6 address fetching in raw6_icmp_error().
Fixes kernel bugzilla 10437

Based almost entirely upon a patch by Dmitry Butskoy.

When deciding what raw sockets to deliver the ICMPv6
to, we should use the addresses in the ICMPv6 quoted
IPV6 header, not the top-level one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:14:15 -07:00
Patrick McHardy 2ed9926e16 [NET]: Return more appropriate error from eth_validate_addr().
Paul Bolle wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9923 would have been much easier to
> track down if eth_validate_addr() would somehow complain aloud if an address 
> is invalid. Shouldn't it make at least some noise?

I guess it should return -EADDRNOTAVAIL similar to eth_mac_addr()
when validation fails.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:45:40 -07:00
Paul Bolle d2dcba612b [ISDN]: Do not validate ISDN net device address prior to interface-up
Commit bada339 (Validate device addr prior to interface-up) caused a regression
in the ISDN network code, see: http://bugzilla.kernel.org/show_bug.cgi?id=9923
The trivial fix is to remove the pointer to eth_validate_addr() in the
net_device struct in isdn_net_init().
    
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:44:20 -07:00
Pavel Emelyanov 671a1c7401 [NETNS][DCCPV6]: Make per-net socket lookup.
The inet6_lookup family of functions requires a net to lookup
a socket in, so give a proper one to them.

No more things to do for dccpv6, since routing is OK and the
ipv4-like transport layer filtering is not done for ipv6.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:33:06 -07:00
Pavel Emelyanov 334527d351 [NETNS][DCCPV6]: Actually create ctl socket on each net and use it.
Move the call to inet_ctl_sock_create to init callback (and
inet_ctl_sock_destroy to exit one) and use proper ctl sock
in dccp_v6_ctl_send_reset.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:32:45 -07:00
Pavel Emelyanov 0204774191 [NETNS][DCCPV6]: Move the dccp_v6_ctl_sk on the struct net.
And replace all its usage with init_net's socket.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:32:25 -07:00
Pavel Emelyanov 8231bd270d [NETNS][DCCPV6]: Add dummy per-net operations.
They will be responsible for ctl socket initialization, but
currently they are void.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:32:02 -07:00
Pavel Emelyanov 68d185980f [NETNS][DCCPV6]: Don't pass NULL to ip6_dst_lookup.
This call uses the sock to get the net to lookup the routing
in. With CONFIG_NET_NS this code will OOPS, since the sk ptr
is NULL.

After looking inside the ip6_dst_lookup and drawing the analogy
with respective ipv6 code, it seems, that the dccp ctl socket 
is a good candidate for the first argument.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:31:32 -07:00
Pavel Emelyanov fc5f8580d3 [NETNS][DCCPV4]: Enable DCCPv4 in net namespaces.
This enables sockets creation with IPPROTO_DCCP and enables
the ip level to pass DCCP packets to the DCCP level.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:31:05 -07:00
Pavel Emelyanov b9901a84c9 [NETNS][DCCPV4]: Make per-net socket lookup.
The inet_lookup family of functions requires a net to lookup
a socket in, so give a proper one to them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:30:43 -07:00
Pavel Emelyanov f54873982c [NETNS][DCCPV4]: Use proper net to route the reset packet.
The dccp_v4_route_skb used in dccp_v4_ctl_send_reset, currently
works with init_net's routing tables - fix it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:30:19 -07:00