Tungsten Fabric Knowledge Base | A Look Inside vRouter Internals



  • Original link:
    https://github.com/tnaganawa/tungstenfabric-docs/blob/master/TungstenFabricKnowledgeBase.md

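    Author: Tatsuya Naganawa. Translator: TF translation team (TF编译组)

    This series is a companion to the "Tungsten Fabric Primer" (Tungsten Fabric入门宝典) and covers additional topics related to Tungsten Fabric deployment.

    vhost0 device

    When vRouter first starts, it creates the vhost0 interface and moves the IP and MAC addresses originally assigned to the physical interface over to vhost0.

    A natural assumption, then, is that vhost0 is the vRouter itself: that it answers ARP requests from the external fabric, and that traffic first passes through vhost0 before reaching the virtual machines.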

    transit traffic:
     vm - vhost0 - eth0
    self traffic:
     vhost0 - eth0
    

    In fact, that is not the case.

    transit traffic:
     vm - (dp-core) - eth0
    self traffic:
     vhost0 - (dp-core) - eth0
    

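    vhost0 is similar to an irb interface in a bridge domain served by dp-core, and eth0 is one of the L2 interfaces in that bridge domain.

    So when eth0 first receives an ARP request from the fabric, dp-core returns the ARP response, using the MAC address originally assigned to eth0.

    Other compute nodes will then send traffic to this vRouter node, either overlay traffic or self traffic.

    For overlay traffic (which dp-core identifies by UDP port or GRE header), dp-core strips the outer IP header and the label, then does a VRF lookup to forward the packet to the specific VM that the label indicates.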

    • With L3 VXLAN, it does a route lookup in the routing table of the L3 VRF
    • With MPLS, the label itself identifies the final interface
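The classification step described above can be sketched as a small C function. This is only an illustration under stated assumptions, not the dp-core implementation: the function name, the enum, and the port constants are hypothetical. 4789 and 6635 are the IANA-registered ports for VXLAN and MPLS-over-UDP, and a given deployment may use different values.

```c
#include <stdint.h>

/* Hypothetical names; the real dp-core uses its own types and constants. */
enum encap_kind {
    ENCAP_NONE,      /* not tunnelled: candidate for self traffic */
    ENCAP_MPLS_GRE,  /* MPLS label carried over GRE */
    ENCAP_MPLS_UDP,  /* MPLS label carried over UDP */
    ENCAP_VXLAN      /* VXLAN network identifier carried over UDP */
};

#define SKETCH_PROTO_UDP      17    /* IP protocol number for UDP */
#define SKETCH_PROTO_GRE      47    /* IP protocol number for GRE */
#define SKETCH_PORT_VXLAN     4789  /* IANA port, assumed here */
#define SKETCH_PORT_MPLS_UDP  6635  /* IANA port, assumed here */

/* Decide the encapsulation from the outer IP protocol and UDP destination port. */
enum encap_kind classify_outer(uint8_t ip_proto, uint16_t udp_dport)
{
    if (ip_proto == SKETCH_PROTO_GRE)
        return ENCAP_MPLS_GRE;
    if (ip_proto == SKETCH_PROTO_UDP) {
        if (udp_dport == SKETCH_PORT_VXLAN)
            return ENCAP_VXLAN;
        if (udp_dport == SKETCH_PORT_MPLS_UDP)
            return ENCAP_MPLS_UDP;
    }
    return ENCAP_NONE;
}
```

A packet classified as ENCAP_NONE and addressed to the node itself would be the self-traffic case, while the other three results lead to the decapsulation and lookup paths listed in the bullets.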

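    When dp-core receives self traffic, it uses hif_rx in vhost_tx (which in turn calls the Linux function netif_rx with the skb as its argument) to deliver the traffic to the Linux interface on the vRouter node, that is, vhost0.

    So for self traffic, packets always pass through dp-core in both the rx and tx directions, while transit traffic does not pass through vhost0 at all.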

    skb to vr_packet

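    The Linux network stack uses sk_buff as the in-memory representation of a packet.

    dp-core uses vr_packet instead, so how the two are converted is an interesting topic.

    The vp_os_packet function is used for this.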

    static inline struct sk_buff *
    vp_os_packet(struct vr_packet *pkt)
    {
        return CONTAINER_OF(cb, struct sk_buff, pkt);
    }
    

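    So vr_packet actually lives inside the skb structure, in sk_buff->cb, the control buffer that the kernel leaves free for private use by whichever layer currently owns the skb. An skb and its vr_packet can therefore be converted to each other with pointer arithmetic.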
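The pointer arithmetic can be tried in user space with simplified stand-in structures. This is a minimal sketch: the struct layouts, the macro name, and both helper functions are hypothetical, and only the idea (a packet descriptor stored in cb, recovered with an offsetof-based macro) comes from the text above.

```c
#include <stddef.h>

/* Stand-in for vr_packet; the real structure has different fields. */
struct vr_packet_sketch {
    unsigned char *vp_head;
    unsigned short vp_data;
    unsigned short vp_len;
};

/* Stand-in for sk_buff; only the 48-byte control buffer matters here. */
struct sk_buff_sketch {
    int len;
    _Alignas(8) char cb[48];
    int mark;
};

/* Step back from the address of a member to the enclosing structure. */
#define CONTAINER_OF_SKETCH(member, type, ptr) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* skb -> packet descriptor: the descriptor is simply the start of cb. */
static inline struct vr_packet_sketch *
skb_to_pkt(struct sk_buff_sketch *skb)
{
    return (struct vr_packet_sketch *)skb->cb;
}

/* packet descriptor -> skb: subtract the offset of cb inside the skb. */
static inline struct sk_buff_sketch *
pkt_to_skb(struct vr_packet_sketch *pkt)
{
    return CONTAINER_OF_SKETCH(cb, struct sk_buff_sketch, pkt);
}
```

Neither direction copies or allocates anything, which is why the conversion is cheap enough to do on every packet.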

    Note that cb is at most 48 bytes, so vr_packet cannot be larger than that. There is some discussion of this issue here:
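One common way to guard a size limit like this is a compile-time assertion, so that adding a field which pushes the structure past 48 bytes breaks the build instead of corrupting memory at run time. The sketch below is an assumption about how such a guard could look, not a copy of what tf-vrouter does: the structure, its fields, and the helper are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKB_CB_SIZE_SKETCH 48  /* size of sk_buff->cb, per the text above */

/* Hypothetical per-packet metadata that must fit inside cb. */
struct pkt_meta_sketch {
    unsigned char *head;   /* start of packet buffer */
    void *interface;       /* ingress interface */
    void *nexthop;         /* forwarding decision */
    uint16_t data_off;     /* offsets into the buffer */
    uint16_t tail_off;
    uint16_t len;
    uint16_t end_off;
    uint16_t network_h;
    uint16_t inner_network_h;
    uint16_t flags;
    uint8_t ttl;
    uint8_t type;
};

/* The build fails here if the structure ever outgrows the control buffer. */
_Static_assert(sizeof(struct pkt_meta_sketch) <= SKB_CB_SIZE_SKETCH,
               "packet metadata no longer fits in sk_buff->cb");

/* Run-time form of the same check, usable for arbitrary sizes. */
int fits_in_cb(size_t size)
{
    return size <= SKB_CB_SIZE_SKETCH;
}
```

Inside kernel code the same check is usually written with BUILD_BUG_ON against sizeof of the cb member.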

    https://github.com/tungstenfabric/tf-vrouter/blob/master/include/vr_packet.h#L195-L198


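    Linux interfaces created by vRouter

    Several interfaces are created when the vrouter-agent container first starts, and they are not removed even if vrouter-agent stops.

    What they are used for is an interesting topic.

    In short, a vif interface in vrouter.ko is always bound to a corresponding Linux netdevice, so creating a vRouter interface with vif --create or similar also creates a Linux netdevice, which is visible in ip link or ls /sys/class/net.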
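The same check that ip link or ls /sys/class/net gives can be done from C with the standard interface-name functions. This is a small sketch with hypothetical helper names. It only asks the kernel whether a netdevice exists and knows nothing about vifs.

```c
#include <net/if.h>
#include <stdio.h>

/* Returns 1 if a Linux netdevice with this name exists, 0 otherwise. */
int netdev_exists(const char *name)
{
    return if_nametoindex(name) != 0;
}

/* Prints every netdevice name; returns the count, or -1 on error. */
int print_netdevs(void)
{
    struct if_nameindex *list = if_nameindex();
    int count = 0;

    if (list == NULL)
        return -1;
    /* The array is terminated by an entry whose if_index is 0. */
    for (struct if_nameindex *p = list; p->if_index != 0; p++) {
        printf("%u: %s\n", p->if_index, p->if_name);
        count++;
    }
    if_freenameindex(list);
    return count;
}
```

On a node like the one in this article, netdev_exists("vhost0") would be expected to return 1 once vhost0 has been brought up.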

    An illustration from "ip tuntap list":

    [root@ip-172-31-12-55 ~]# ip -o a
    1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
    1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
    2: ens3    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
    3: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
    16: vhost0    inet 172.31.12.55/20 brd 172.31.15.255 scope global dynamic vhost0\       valid_lft 3118sec preferred_lft 3118sec
    16: vhost0    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
    17: pkt0    inet6 fe80::5094:6cff:fefb:42f7/64 scope link \       valid_lft forever preferred_lft forever
    [root@ip-172-31-12-55 ~]# ip tuntap list
    pkt0: tap
    [root@ip-172-31-12-55 ~]#
    

    So from Linux's point of view, pkt0 is actually a tap device.
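For readers who have not created a tap device before, the usual Linux recipe is to open /dev/net/tun and issue the TUNSETIFF ioctl. The sketch below shows that generic recipe. Whether vrouter-agent uses exactly these flags for pkt0 is an assumption, and the function names are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Fill in the request for a tap (L2) device with no packet-info header. */
void tap_request_init(struct ifreq *ifr, const char *name)
{
    memset(ifr, 0, sizeof(*ifr));
    ifr->ifr_flags = IFF_TAP | IFF_NO_PI;  /* flags assumed for illustration */
    strncpy(ifr->ifr_name, name, IFNAMSIZ - 1);
}

/*
 * Create (or attach to) the named tap device.
 * Returns a file descriptor for reading and writing frames, or -1 on error.
 * Needs CAP_NET_ADMIN, so it normally has to run as root.
 */
int tap_open(const char *name)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0)
        return -1;
    tap_request_init(&ifr, name);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A device created this way appears in ip tuntap list as a tap device, matching the pkt0: tap line in the output above.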

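    In a sense, the vif command attaches a vRouter interface to a Linux netdevice created by, for example, nova-vif-driver (such as tapxxxx-xxxx), so that packets passing through that device are received by dp-core.

    So when CNI or a similar component finds the tap device attached to a container, it calls the vrouter-api, which internally creates a vif with the same name as the tap device, so that packets entering the tap device are forwarded to vRouter (dp-core).

    Some special devices are created when vrouter-agent starts: vhost0, pkt0, pkt1, pkt2, and pkt3.

    As mentioned above, vhost0 is similar to dp-core's irb interface, so it receives packets destined for the vRouter node itself after dp-core has finished routing.

    Because the vrouter-agent container creates /etc/sysconfig/network-scripts/{ifup-vhost,ifdown-vhost} at startup, vhost0 can be controlled directly with ifup / ifdown. Internally these scripts run vif --add vhost0, so the interface can also be created and deleted directly from the command line.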

    https://github.com/tungstenfabric/tf-container-builder/blob/master/containers/vrouter/base/network-functions-vrouter-kernel#L41

    pkt1, pkt2, and pkt3 are interfaces defined in linux_pkt_dev_alloc, which is called from vrouter_linux_init, the module_init of vrouter.ko.

    linux/vrouter_mod.c
     module_init(vrouter_linux_init);
    
    static int
    linux_pkt_dev_alloc(void)
    {
        if (pkt_gro_dev == NULL) {
            pkt_gro_dev = linux_pkt_dev_init("pkt1", &pkt_gro_dev_setup,
                                             &pkt_gro_dev_rx_handler);
            if (pkt_gro_dev == NULL) {
                vr_module_error(-ENOMEM, __FUNCTION__, __LINE__, 0);
                return -ENOMEM;
            }
        }
    
        if (pkt_l2_gro_dev == NULL) {
            pkt_l2_gro_dev = linux_pkt_dev_init("pkt3", &pkt_l2_gro_dev_setup,
                                             &pkt_gro_dev_rx_handler);
            if (pkt_l2_gro_dev == NULL) {
                vr_module_error(-ENOMEM, __FUNCTION__, __LINE__, 0);
                return -ENOMEM;
            }
        }
    
        if (pkt_rps_dev == NULL) {
            pkt_rps_dev = linux_pkt_dev_init("pkt2", &pkt_rps_dev_setup,
                                            &pkt_rps_dev_rx_handler);
            if (pkt_rps_dev == NULL) {
                vr_module_error(-ENOMEM, __FUNCTION__, __LINE__, 0);
                return -ENOMEM;
            }
        }
    
        return 0;
    }
    

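    These devices are used for GRO and RPS, which are important for the performance of the kernel vRouter.

    • They are initialized with an empty net_device_ops and a random Ethernet address.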
    linux/vr_host_interface.c
    
    
    /*
     * pkt_rps_dev_ops - netdevice operations on RPS packet device. Currently,
     * no operations are needed, but an empty structure is required to
     * register the device.
     *
     */
    static struct net_device_ops pkt_rps_dev_ops;
    
    (snip)
    
    /*
     * pkt_rps_dev_setup - fill in the relevant fields of the RPS packet device
     */
    static void
    pkt_rps_dev_setup(struct net_device *dev)
    {
        /*
         * Initializing the interfaces with basic parameters to setup address
         * families.
         */
        random_ether_addr(dev->dev_addr);
        dev->addr_len = ETH_ALEN;
    
        dev->hard_header_len = ETH_HLEN;
    
        dev->type = ARPHRD_VOID;
        dev->netdev_ops = &pkt_rps_dev_ops;
        dev->mtu = 65535;
    
        return;
    }
    

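    pkt0 is slightly different: it is used to send packets from dp-core to vrouter-agent.

    It is created at vrouter-agent's request when vrouter-agent first starts, as a tap device for communicating with vrouter-agent.

    So if dp-core sends a packet to this interface, vrouter-agent receives it and processes it internally (ARP, DHCP, and so on are handled this way).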
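The kind of decision involved can be sketched as a predicate over a few header fields. This is purely illustrative: the function name and constants are hypothetical, only the two packet types named above (ARP and DHCP) are covered, and the real trap logic in dp-core depends on interface flags and other state that is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETHERTYPE_IPV4  0x0800
#define SKETCH_ETHERTYPE_ARP   0x0806
#define SKETCH_IP_PROTO_UDP    17
#define SKETCH_DHCP_SERVER     67    /* UDP port a DHCP client sends to */

/* Would this packet be handed to the agent over pkt0 in this simplified model? */
bool punt_to_agent(uint16_t ethertype, uint8_t ip_proto, uint16_t udp_dport)
{
    if (ethertype == SKETCH_ETHERTYPE_ARP)
        return true;
    if (ethertype == SKETCH_ETHERTYPE_IPV4 &&
        ip_proto == SKETCH_IP_PROTO_UDP &&
        udp_dport == SKETCH_DHCP_SERVER)
        return true;
    return false;
}
```

Packets for which such a predicate is true would be written to pkt0 and read by vrouter-agent from its end of the tap device. Everything else stays on the forwarding path inside dp-core.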

    To further illustrate this behavior, here are the results of ip -o addr, ip link, and vif --list after each of the following steps: modprobe vrouter, ifup vhost0, and vrouter-agent startup.

    # docker-compose -f /etc/contrail/vrouter/docker-compose.yaml down
    # ifdown vhost0
    # modprobe vrouter
    
    [root@ip-172-31-12-55 ~]# ip -o a
    1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
    1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
    2: ens3    inet 172.31.12.55/20 brd 172.31.15.255 scope global dynamic ens3\       valid_lft 3561sec preferred_lft 3561sec
    2: ens3    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
    3: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
    [root@ip-172-31-12-55 ~]# 
    
    [root@ip-172-31-12-55 ~]# ip link
    1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: ens3:  mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether 06:6c:0b:c8:dd:64 brd ff:ff:ff:ff:ff:ff
    3: docker0:  mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
        link/ether 02:42:34:e8:c3:14 brd ff:ff:ff:ff:ff:ff
    9: pkt1: <> mtu 65535 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/void be:f9:01:0e:4d:38 brd 00:00:00:00:00:00
    10: pkt3: <> mtu 65535 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/void 46:f8:5c:cb:79:8e brd 00:00:00:00:00:00
    11: pkt2: <> mtu 65535 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/void a2:b0:40:5c:03:d4 brd 00:00:00:00:00:00
    [root@ip-172-31-12-55 ~]#
    [root@ip-172-31-12-55 ~]# vif --list
    Vrouter Interface Table
    
    Flags: P=Policy, X=Cross Connect, S=Service Chain, Mr=Receive Mirror
           Mt=Transmit Mirror, Tc=Transmit Checksum Offload, L3=Layer 3, L2=Layer 2
           D=DHCP, Vp=Vhost Physical, Pr=Promiscuous, Vnt=Native Vlan Tagged
           Mnp=No MAC Proxy, Dpdk=DPDK PMD Interface, Rfl=Receive Filtering Offload, Mon=Interface is Monitored
           Uuf=Unknown Unicast Flood, Vof=VLAN insert/strip offload, Df=Drop New Flows, L=MAC Learning Enabled
           Proxy=MAC Requests Proxied Always, Er=Etree Root, Mn=Mirror without Vlan Tag, HbsL=HBS Left Intf
           HbsR=HBS Right Intf, Ig=Igmp Trap Enabled
    
    vif0/4350   OS: pkt3
                Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
                Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
                RX packets:0  bytes:0 errors:0
                TX packets:0  bytes:0 errors:0
                Drops:0
    
    vif0/4351   OS: pkt1
                Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
                Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
                RX packets:0  bytes:0 errors:0
                TX packets:0  bytes:0 errors:0
                Drops:0
    
    [root@ip-172-31-12-55 ~]# 
    
    
    # ifup vhost0
    
    [root@ip-172-31-12-55 ~]# ip -o a
    1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
    1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
    2: ens3    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
    3: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
    12: vhost0    inet 172.31.12.55/20 brd 172.31.15.255 scope global dynamic vhost0\       valid_lft 3594sec preferred_lft 3594sec
    12: vhost0    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
    [root@ip-172-31-12-55 ~]# 
    [root@ip-172-31-12-55 ~]# 
    [root@ip-172-31-12-55 ~]# ip link
    1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: ens3:  mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether 06:6c:0b:c8:dd:64 brd ff:ff:ff:ff:ff:ff
    3: docker0:  mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
        link/ether 02:42:34:e8:c3:14 brd ff:ff:ff:ff:ff:ff
    9: pkt1:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/void be:f9:01:0e:4d:38 brd 00:00:00:00:00:00
    10: pkt3:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/void 46:f8:5c:cb:79:8e brd 00:00:00:00:00:00
    11: pkt2:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/void a2:b0:40:5c:03:d4 brd 00:00:00:00:00:00
    12: vhost0:  mtu 9001 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
        link/ether 06:6c:0b:c8:dd:64 brd ff:ff:ff:ff:ff:ff
    [root@ip-172-31-12-55 ~]# 
    
    [root@ip-172-31-12-55 ~]# vif --list
    Vrouter Interface Table
    
    Flags: P=Policy, X=Cross Connect, S=Service Chain, Mr=Receive Mirror
           Mt=Transmit Mirror, Tc=Transmit Checksum Offload, L3=Layer 3, L2=Layer 2
           D=DHCP, Vp=Vhost Physical, Pr=Promiscuous, Vnt=Native Vlan Tagged
           Mnp=No MAC Proxy, Dpdk=DPDK PMD Interface, Rfl=Receive Filtering Offload, Mon=Interface is Monitored
           Uuf=Unknown Unicast Flood, Vof=VLAN insert/strip offload, Df=Drop New Flows, L=MAC Learning Enabled
           Proxy=MAC Requests Proxied Always, Er=Etree Root, Mn=Mirror without Vlan Tag, HbsL=HBS Left Intf
           HbsR=HBS Right Intf, Ig=Igmp Trap Enabled
    
    vif0/2      OS: ens3 (Speed 10000, Duplex 1)
                Type:Physical HWaddr:06:6c:0b:c8:dd:64 IPaddr:0.0.0.0
                Vrf:0 Mcast Vrf:65535 Flags:XTcL3L2Vp QOS:0 Ref:1
                RX packets:54  bytes:13325 errors:0
                TX packets:39  bytes:4452 errors:0
                Drops:0
    
    vif0/16     OS: vhost0
                Type:Host HWaddr:06:6c:0b:c8:dd:64 IPaddr:0.0.0.0
                Vrf:0 Mcast Vrf:65535 Flags:XL3L2 QOS:0 Ref:1
                RX packets:39  bytes:4452 errors:0
                TX packets:54  bytes:13325 errors:0
                Drops:0
    
    vif0/4350   OS: pkt3
                Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
                Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
                RX packets:0  bytes:0 errors:0
                TX packets:0  bytes:0 errors:0
                Drops:0
    
    vif0/4351   OS: pkt1
                Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
                Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
                RX packets:0  bytes:0 errors:0
                TX packets:0  bytes:0 errors:0
                Drops:0
    
    [root@ip-172-31-12-55 ~]# 
    
    # docker-compose -f /etc/contrail/vrouter/docker-compose.yaml up -d
    
    [root@ip-172-31-12-55 ~]# ip -o a
    1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
    1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
    2: ens3    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
    3: docker0    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\       valid_lft forever preferred_lft forever
    16: vhost0    inet 172.31.12.55/20 brd 172.31.15.255 scope global dynamic vhost0\       valid_lft 3552sec preferred_lft 3552sec
    16: vhost0    inet6 fe80::46c:bff:fec8:dd64/64 scope link \       valid_lft forever preferred_lft forever
    17: pkt0    inet6 fe80::5094:6cff:fefb:42f7/64 scope link \       valid_lft forever preferred_lft forever
    [root@ip-172-31-12-55 ~]# 
    [root@ip-172-31-12-55 ~]# ip link
    1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: ens3:  mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether 06:6c:0b:c8:dd:64 brd ff:ff:ff:ff:ff:ff
    3: docker0:  mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
        link/ether 02:42:34:e8:c3:14 brd ff:ff:ff:ff:ff:ff
    13: pkt1:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/void 36:72:98:97:9b:31 brd 00:00:00:00:00:00
    14: pkt3:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/void 92:aa:52:e8:d5:c5 brd 00:00:00:00:00:00
    15: pkt2:  mtu 65535 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/void 42:b2:46:73:3d:6c brd 00:00:00:00:00:00
    16: vhost0:  mtu 9001 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
        link/ether 06:6c:0b:c8:dd:64 brd ff:ff:ff:ff:ff:ff
    17: pkt0:  mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
        link/ether 52:94:6c:fb:42:f7 brd ff:ff:ff:ff:ff:ff
    [root@ip-172-31-12-55 ~]# 
    [root@ip-172-31-12-55 ~]# vif --list
    Vrouter Interface Table
    
    Flags: P=Policy, X=Cross Connect, S=Service Chain, Mr=Receive Mirror
           Mt=Transmit Mirror, Tc=Transmit Checksum Offload, L3=Layer 3, L2=Layer 2
           D=DHCP, Vp=Vhost Physical, Pr=Promiscuous, Vnt=Native Vlan Tagged
           Mnp=No MAC Proxy, Dpdk=DPDK PMD Interface, Rfl=Receive Filtering Offload, Mon=Interface is Monitored
           Uuf=Unknown Unicast Flood, Vof=VLAN insert/strip offload, Df=Drop New Flows, L=MAC Learning Enabled
           Proxy=MAC Requests Proxied Always, Er=Etree Root, Mn=Mirror without Vlan Tag, HbsL=HBS Left Intf
           HbsR=HBS Right Intf, Ig=Igmp Trap Enabled
    
    vif0/0      OS: ens3 (Speed 10000, Duplex 1) NH: 4
                Type:Physical HWaddr:06:6c:0b:c8:dd:64 IPaddr:0.0.0.0
                Vrf:0 Mcast Vrf:65535 Flags:TcL3L2VpEr QOS:-1 Ref:7
                RX packets:165  bytes:97837 errors:0
                TX packets:156  bytes:124911 errors:0
                Drops:0
    
    vif0/1      OS: vhost0 NH: 5
                Type:Host HWaddr:06:6c:0b:c8:dd:64 IPaddr:172.31.12.55
                Vrf:0 Mcast Vrf:65535 Flags:PL3DEr QOS:-1 Ref:8
                RX packets:159  bytes:125878 errors:0
                TX packets:192  bytes:98971 errors:0
                Drops:7
    
    vif0/2      OS: pkt0
                Type:Agent HWaddr:00:00:5e:00:01:00 IPaddr:0.0.0.0
                Vrf:65535 Mcast Vrf:65535 Flags:L3Er QOS:-1 Ref:3
                RX packets:31  bytes:2666 errors:0
                TX packets:34  bytes:13535 errors:0
                Drops:0
    
    vif0/4350   OS: pkt3
                Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
                Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
                RX packets:0  bytes:0 errors:0
                TX packets:0  bytes:0 errors:0
                Drops:0
    
    vif0/4351   OS: pkt1
                Type:Stats HWaddr:00:00:00:00:00:00 IPaddr:0.0.0.0
                Vrf:65535 Mcast Vrf:65535 Flags:L3L2 QOS:0 Ref:1
                RX packets:0  bytes:0 errors:0
                TX packets:0  bytes:0 errors:0
                Drops:0
    
    [root@ip-172-31-12-55 ~]#
    
