2023 is coming to an end, so let's take a quick look back at BPF's progress over the past year.
BPF Verifier
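The BPF verifier remains the hardest part of the BPF world to understand, at more than 20,000 lines of code. Over the past year it received the following improvements: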
- BPF verifier's state equivalence checks
- Enable struct_ops programs to be sleepable in verifier
- BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities
- Add support for cpuv4 instructions
- Fix BPF verifier's check_subprogs to not unnecessarily mark a subprogram with has_tail_call
- Rework BPF verifier log behavior and implement it as a rotating log by default with the option to retain old-style fixed log behavior
- Fix BPF verifier in the __reg_bound_offset's 64->32 tnum sub-register known bits information propagation
- Remove a misleading BPF verifier env->bypass_spec_v1 check on variable offset stack read as earlier Spectre checks cover this
- Improve BPF verifier handling of '<non_const>' to better detect whether in particular jmp32 branches are taken
- Add support for refcounted local kptrs to the verifier for allowing shared ownership, useful for adding a node to both the BPF list and rbtree
- Improve verifier u32 scalar equality checking in order to enable LLVM transformations which earlier had to be disabled specifically for BPF backend
- Add precision propagation to verifier for subprogs and callbacks
- Various dyn-pointer verifier improvements to relax restrictions
- Fix regsafe() in verifier to call check_ids() for scalar registers
- Fix a BPF verifier issue related to bpf_kptr_xchg() with local kptr where the map's value kptr type and locally allocated obj type mismatch
- Fix BPF verifier's check_func_arg_reg_off() function wrt graph root/node which bypassed reg->off == 0 enforcement
- Lift BPF verifier restriction in networking BPF programs to treat comparison of packet pointers not as a pointer leak
- Improve BPF verifier log output for scalar registers to better disambiguate their internal state wrt defaults vs min/max values matching
- Inherit system-wide cpu_mitigations_off() setting for Spectre v1/v4 security mitigations in BPF verifier
- Improve BPF verifier's JEQ/JNE branch taken logic to also consider signed bounds knowledge
- Huge batch of verifier changes to improve BPF register bounds logic and range support along with a large test suite, and verifier log improvements
- Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques
- Complete precision tracking support for register spills
- Fix verification of possibly-zero-sized stack accesses
- Fix access to uninit stack slots
- Track aligned STACK_ZERO cases as imprecise spilled registers
- Fix verifier retval logic
- Add verifier support for annotating user's global BPF subprogram arguments with few commonly requested annotations for a better developer experience
- Support BPF verifier tracking of BPF_JNE which helps cases when the compiler transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like
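The verifier keeps growing more complex. The good news is that it is also getting smarter, so some of the baffling verification errors you hit on earlier versions may already be fixed. The bad news is that the learning curve keeps getting steeper. Fortunately, whenever something goes wrong, we can always say: it's all the verifier's fault.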
JIT
In 2023, JIT changes centered on BPF trampoline support and on support for the newly added BPF CPU v4 instructions:
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks
- Simplify the parsing logic of structure parameters for BPF trampoline in the x86-64 JIT compiler
- Support for up to 12 arguments in BPF trampoline
- Add support for BPF trampoline on s390x
- Implement BPF trampoline for RV64 JIT compiler
- Implement workarounds in the mips BPF JIT for DADDI/R4000
- Enable kfunc support for riscv BPF JIT
- Support 64-bit pointers to kfuncs needed for archs like s390x
- Support new insns from cpu v4
- Add support for BPF cpu v4 instructions to the arm64 JIT compiler
- Add support for BPF cpu v4 instructions to the riscv64 JIT compiler
- Add mcpu=v4 support to arm32
- Fix bpf tailcall interaction with bpf trampoline
- Implement BPF CPUv4 support for s390x BPF JIT
- Add LoongArch support to libbpf's bpf_tracing helper header
- Fix LoongArch BPF JIT to always use 4 instructions for function address so that instruction sequences don't change between passes
- Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT
- Add support for BPF cpu v4 instructions to the LoongArch JIT
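As of the kernel 6.7 development cycle, we have completed BPF CPU v4 instruction support for LoongArch. Once BPF trampoline support is filled in, we will have caught up with the mainstream architectures.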
XDP
XDP/AF_XDP development was also fairly active, focused mainly on multi-buffer and driver support:
- Implement XDP hints via kfuncs with initial support for RX hash and timestamp metadata kfuncs
- Add multi-buffer XDP support to ice driver
- Add capability to export the XDP features supported by the NIC
- Add skb and XDP typed dynptrs which give BPF programs more ergonomic and less brittle iteration through data and variable-sized accesses
- Add XDP hint kfunc metadata for RX hash/timestamp for igc
- Add multi-buffer support to AF_XDP
- Add a tracepoint for XDP attach failures
- Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload
- Support for VLAN tag in XDP hints
Helpers
Helpers are an important way for BPF programs to interact with the kernel, and they received a series of improvements as well:
- Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers
- Optimize bpf_local_storage_elem by removing 56 bytes of padding
- Fix bpf_fib_lookup to only return valid neighbors and add an option to skip the neigh table lookup
- Extend bpf_fib_lookup helper to allow passing the route table ID
- Extend the BPF fib lookup helpers for IPv4/IPv6 to support retrieving the source IP address with a new BPF_FIB_LOOKUP_SRC flag
libbpf, selftests and docs
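Every development cycle brings a large number of updates to libbpf and selftests, and the selftests are excellent first-hand material for learning BPF development. On the documentation side, the BPF instruction set appears to be going through IETF standardization, possibly for better interoperability across operating systems.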
Highlights
The patches on the mailing list are too many to count; these are the ones I found most interesting:
- bpf_assert(), bpf_throw(), exceptions in bpf progs
- Add BPF programmable net device where bpf_mprog defines the logic of its xmit routine. It can operate in L3 and L2 mode (netkit)
- Introduce BPF token object
- Expand bpf_cgrp_storage to support cgroup1 non-attach
- BPF file verification via fsverity
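Of these, the BPF token series was withdrawn, and Linus himself replied, with profanity.
Over the past year, the hard-working Huawei engineers have been consistently active in the community. Hats off to them!
Happy New Year, everyone. See you in 2024.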