linux/net/ipv4
Neal Cardwell d88270eef4 tcp: fix tcp_mark_head_lost to check skb len before fragmenting
This commit fixes a corner case in tcp_mark_head_lost() which was
causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.

tcp_mark_head_lost() was assuming that if a packet has
tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
maintained, this is not always true.

For example, suppose the sender sends four 1-byte packets and has the
last 3 packets SACKed. It will merge the last 3 packets in the write
queue into an skb with pcount = 3 and len = 3 bytes. If another
recovery happens after a sack reneging event, tcp_mark_head_lost()
may attempt to split the skb assuming it has more than 2*MSS bytes.

This sounds very counterintuitive, but as the commit description for
the related commit c0638c247f ("tcp: don't fragment SACKed skbs in
tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
coalesces adjacent regions of SACKed skbs, and when doing this it
preserves the sum of their packet counts in order to reflect the
real-world dynamics on the wire. The c0638c247f commit tried to
avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
where the non-proportionality between pcount and skb->len/mss is known
to be possible. However, that commit did not handle the case where
during a reneging event one of these weird SACKed skbs becomes an
un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.
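
The coalescing behaviour described above can be modelled in user space.
This is only a sketch: struct model_skb, model_coalesce() and
model_pcount_matches_len() are hypothetical names for illustration, not
kernel APIs, and the MSS of 1460 used in the usage note is an assumed
value.

```c
#include <assert.h>

/* Hypothetical stand-in for the two sk_buff properties discussed here. */
struct model_skb {
	unsigned int len;    /* payload bytes held by the skb */
	unsigned int pcount; /* packet count, as tcp_skb_pcount() would report */
};

/*
 * Merge src into dst the way the message describes for SACKed regions:
 * the byte lengths add up, and so do the packet counts.
 */
static void model_coalesce(struct model_skb *dst, const struct model_skb *src)
{
	dst->len += src->len;
	dst->pcount += src->pcount;
}

/*
 * Return 1 if pcount equals the number of MSS-sized segments needed to
 * carry len bytes, 0 if the two have drifted apart.
 */
static int model_pcount_matches_len(const struct model_skb *skb,
				    unsigned int mss)
{
	assert(mss > 0); /* guard the division below */
	return skb->pcount == (skb->len + mss - 1) / mss;
}
```

Coalescing three 1-byte skbs with this model yields len = 3 and
pcount = 3, and model_pcount_matches_len() returns 0 for an MSS of 1460,
which is the non-proportional state the message refers to.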

The fix is to simply mark the entire skb lost when this happens.
This makes the recovery slightly more aggressive in such corner
cases before we detect reordering. But once we detect reordering
this code path is bypassed because FACK is disabled.
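
The length check can be illustrated with a small user-space model. This
is a sketch under stated assumptions rather than the kernel code itself:
struct model_skb, model_fragment() and model_mark_prefix_lost() are
hypothetical names, the assert() stands in for the WARN_ON(len >
skb->len) in tcp_fragment(), and the MSS of 1460 in the usage note is an
assumed value.

```c
#include <assert.h>

/* Hypothetical stand-in for the two sk_buff properties discussed here. */
struct model_skb {
	unsigned int len;    /* payload bytes held by the skb */
	unsigned int pcount; /* packet count, as tcp_skb_pcount() would report */
};

/*
 * Stand-in for tcp_fragment(): keep only the first len bytes in skb.
 * The assert models the WARN_ON(len > skb->len) mentioned above.
 */
static void model_fragment(struct model_skb *skb, unsigned int len)
{
	assert(len <= skb->len);
	skb->len = len;
}

/*
 * Mark a prefix of 'packets' segments lost and return the number of
 * bytes marked. The prefix is split off only when it is strictly
 * shorter than the skb; otherwise the entire skb is marked lost.
 */
static unsigned int model_mark_prefix_lost(struct model_skb *skb,
					   unsigned int packets,
					   unsigned int mss)
{
	unsigned int lost = packets * mss;

	if (lost < skb->len)
		model_fragment(skb, lost);
	return skb->len;
}
```

For the skb from the example (len = 3, pcount = 3), asking for 2 packets
at an MSS of 1460 gives lost = 2920, which is not below len, so no
fragmentation is attempted and all 3 bytes are marked lost. For an skb
holding 4380 bytes, the same request splits off a 2920-byte prefix.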

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 16:02:48 -08:00
..
netfilter inet: frag: Always orphan skbs inside ip_defrag() 2016-01-28 16:00:46 -08:00
af_inet.c net: add validation for the socket syscall protocol argument 2015-12-14 16:09:30 -05:00
ah4.c ah4: Fix error return in ah_input(). 2015-08-25 13:38:50 -07:00
arp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-20 06:08:27 -07:00
cipso_ipv4.c
datagram.c net: Set sk_txhash from a random number 2015-07-29 22:44:04 -07:00
devinet.c netlink: Rightsize IFLA_AF_SPEC size calculation 2015-10-21 19:15:20 -07:00
esp4.c esp4: Switch to new AEAD interface 2015-05-28 11:23:20 +08:00
fib_frontend.c net: Flush local routes when device changes vrf association 2015-12-13 23:58:44 -05:00
fib_lookup.h ipv4: consider TOS in fib_select_default 2015-07-24 22:46:11 -07:00
fib_rules.c net: ipv6: use common fib_default_rule_pref 2015-09-09 14:19:50 -07:00
fib_semantics.c net: Fix prefsrc lookups 2015-11-04 21:34:37 -05:00
fib_trie.c fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key 2015-10-27 18:14:51 -07:00
fou.c udp: restrict offloads to one namespace 2016-01-10 17:28:24 -05:00
gre_demux.c gre: Remove support for sharing GRE protocol hook. 2015-08-10 14:03:54 -07:00
gre_offload.c ipv6: gre: support SIT encapsulation 2015-10-26 22:01:18 -07:00
icmp.c Revert "ipv4/icmp: redirect messages can use the ingress daddr as source" 2015-10-14 06:01:07 -07:00
igmp.c ipv4: igmp: Allow removing groups from a removed interface 2015-12-03 12:07:05 -05:00
inet_connection_sock.c tcp: ensure proper barriers in lockless contexts 2015-11-15 18:36:38 -05:00
inet_diag.c net: diag: support v4mapped sockets in inet_diag_find_one_icsk() 2016-01-20 18:51:31 -08:00
inet_fragment.c inet: kill unused skb_free op 2016-01-05 22:25:57 -05:00
inet_hashtables.c tcp/dccp: fix hashdance race for passive sessions 2015-10-23 05:42:21 -07:00
inet_lro.c
inet_timewait_sock.c tcp/dccp: fix timewait races in timer handling 2015-09-21 16:32:29 -07:00
inetpeer.c net: Add helper function to compare inetpeer addresses 2015-08-28 13:32:36 -07:00
ip_forward.c net: Pass net into dst_output and remove dst_output_okfn 2015-10-08 04:26:54 -07:00
ip_fragment.c inet: frag: Always orphan skbs inside ip_defrag() 2016-01-28 16:00:46 -08:00
ip_gre.c ip_tunnel: Move stats update to iptunnel_xmit() 2015-12-25 23:32:23 -05:00
ip_input.c ipv4: Pass struct net into ip_defrag and ip_check_defrag 2015-10-12 19:44:16 -07:00
ip_options.c
ip_output.c net: preserve IP control block during GSO segmentation 2016-01-15 14:35:24 -05:00
ip_sockglue.c ipv4: fix a potential deadlock in mcast getsockopt() path 2015-11-04 21:29:59 -05:00
ip_tunnel.c ip_tunnel: Move stats update to iptunnel_xmit() 2015-12-25 23:32:23 -05:00
ip_tunnel_core.c ipv4: fix endianness warnings in ip_tunnel_core.c 2016-01-08 21:30:43 -05:00
ip_vti.c ip_tunnel: Move stats update to iptunnel_xmit() 2015-12-25 23:32:23 -05:00
ipcomp.c
ipconfig.c net/ipv4/ipconfig: Rejoin broken lines in console output 2015-11-24 12:00:09 -05:00
ipip.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-31 18:20:10 -05:00
ipmr.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
Kconfig ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV 2016-01-25 10:45:41 -08:00
Makefile tcp: track the packet timings in RACK 2015-10-21 07:00:48 -07:00
netfilter.c ipv4: Pass struct net into ip_route_me_harder 2015-09-29 20:21:32 +02:00
ping.c ipv4: eliminate lock count warnings in ping.c 2016-01-08 21:30:43 -05:00
proc.c net: track success and failure of TCP PMTU probing 2015-07-21 22:36:33 -07:00
protocol.c
raw.c net: Propagate lookup failure in l3mdev_get_saddr to caller 2016-01-04 22:58:30 -05:00
route.c net: Do not drop to make_route if oif is l3mdev 2015-10-08 05:18:47 -07:00
syncookies.c net: Allow accepted sockets to be bound to l3mdev domain 2015-12-18 14:43:38 -05:00
sysctl_net_ipv4.c ipv4: Namespecify the tcp_keepalive_intvl sysctl knob 2016-01-10 17:32:09 -05:00
tcp.c net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label 2016-01-14 16:00:49 -08:00
tcp_bic.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_cdg.c tcp: do not slow start when cwnd equals ssthresh 2015-07-09 14:22:52 -07:00
tcp_cong.c tcp: remove tcp_ecn_make_synack() socket argument 2015-09-25 13:00:38 -07:00
tcp_cubic.c tcp_cubic: do not set epoch_start in the future 2015-09-17 22:35:07 -07:00
tcp_dctcp.c tcp: allow dctcp alpha to drop to zero 2015-10-23 02:46:52 -07:00
tcp_diag.c net: diag: Support destroying TCP sockets. 2015-12-15 23:26:52 -05:00
tcp_fastopen.c tcp/dccp: fix hashdance race for passive sessions 2015-10-23 05:42:21 -07:00
tcp_highspeed.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_htcp.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_hybla.c tcp: do not slow start when cwnd equals ssthresh 2015-07-09 14:22:52 -07:00
tcp_illinois.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_input.c tcp: fix tcp_mark_head_lost to check skb len before fragmenting 2016-01-28 16:02:48 -08:00
tcp_ipv4.c tcp: fix NULL deref in tcp_v4_send_ack() 2016-01-21 11:20:14 -08:00
tcp_lp.c
tcp_memcontrol.c mm: memcontrol: switch to the updated jump-label API 2016-01-14 16:00:49 -08:00
tcp_metrics.c net: Add helper function to compare inetpeer addresses 2015-08-28 13:32:36 -07:00
tcp_minisocks.c tcp: honour SO_BINDTODEVICE for TW_RST case too 2015-12-22 17:03:05 -05:00
tcp_offload.c tcp: reserve tcp_skb_mss() to tcp stack 2015-06-11 16:33:10 -07:00
tcp_output.c net: tcp_memcontrol: simplify linkage between socket and page counter 2016-01-14 16:00:49 -08:00
tcp_probe.c
tcp_recovery.c tcp: use RACK to detect losses 2015-10-21 07:00:53 -07:00
tcp_scalable.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_timer.c ipv4: Namespecify the tcp_keepalive_intvl sysctl knob 2016-01-10 17:32:09 -05:00
tcp_vegas.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_vegas.h
tcp_veno.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_westwood.c
tcp_yeah.c tcp_yeah: don't set ssthresh below 2 2016-01-11 17:25:16 -05:00
tunnel4.c
udp.c udp: fix potential infinite loop in SO_REUSEPORT logic 2016-01-19 13:52:25 -05:00
udp_diag.c soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF 2016-01-04 22:49:59 -05:00
udp_impl.h
udp_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-01-11 23:55:43 -05:00
udp_tunnel.c ip_tunnel: Move stats update to iptunnel_xmit() 2015-12-25 23:32:23 -05:00
udplite.c
xfrm4_input.c netfilter: Pass net into okfn 2015-09-17 17:18:37 -07:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-24 06:54:12 -07:00
xfrm4_policy.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2015-12-22 16:26:31 -05:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c