
Learning the Linux 4.12 Kernel Network Stack (1.7): Network Device Initialization (struct net_device)

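Among the data structures behind Linux network devices, net_device is arguably the most important one. It is created and initialized by the driver of the corresponding network device, and it serves the kernel networking subsystem.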

1. The kernel-doc comment of struct net_device

struct net_device is a large structure. Before looking at its definition, let's read the comment that documents it:

/**
 *  struct net_device - The DEVICE structure.
 *      Actually, this whole structure is a big mistake.  It mixes I/O  // The design is admittedly flawed: hardware I/O fields sit next to
 *      data with strictly "high-level" data, and it has to know about   // purely protocol-level data, so the structure has to know about
 *      almost every data structure used in the INET module.    // almost every data structure used by the INET code.
 *
 *  @name:  This is the first field of the "visible" part of this structure   // The interface name. A driver may pass a fixed name or a template such as "eth%d";
 *      (i.e. as seen by users in the "Space.c" file).  It is the name  // with a template the kernel fills in the first free number (eth0, eth1, eth2, ...).
 *      of the interface.
 *
 *  @name_hlist:    Device name hash chain, please keep it close to name[]  // Node in the hash table keyed by interface name
 *  @ifalias:   SNMP alias  // Alias used by SNMP
 *  @mem_end:   Shared memory end   // A device may own a shared memory region; mem_start and mem_end delimit it
 *  @mem_start: Shared memory start
 *  @base_addr: Device I/O address // Base I/O address of the hardware. The device's registers are mapped into an address range, so accessing those addresses operates the device; base_addr is where that range starts. It is filled in at probe time.
 *  @irq:       Device IRQ number  // IRQ number assigned to the device
 *
 *  @carrier_changes:   Stats to monitor carrier on<->off transitions
 *
 *  @state:     Generic network queuing layer state, see netdev_state_t // Device state used by the queuing layer; an important field
 *  @dev_list:  The global list of network devices  // Links every net_device into one list, so all network devices in the system can be reached through it
 *  @napi_list: List entry used for polling NAPI devices // Devices that support NAPI are linked here so the polling code can find them quickly
 *  @unreg_list:    List entry  when we are unregistering the // Devices being unregistered are put on this list
 *          device; see the function unregister_netdev
 *  @close_list:    List entry used when we are closing the device // Devices being closed are put on this list
 *  @ptype_all:     Device-specific packet handlers for all protocols  // Handlers bound to this device that see packets of every protocol (for example packet taps); often empty
 *  @ptype_specific: Device-specific, protocol-specific packet handlers
 *
 *  @adj_list:  Directly linked devices, like slaves for bonding
 *  @features:  Currently active device features // Flags describing the capabilities and offloads currently active on the interface
 *  @hw_features:   User-changeable features // Hardware features that user space is allowed to toggle
 *
 *  @wanted_features:   User-requested features
 *  @vlan_features:     Mask of features inheritable by VLAN devices // Features that VLAN devices stacked on this device may inherit
 *
 *  @hw_enc_features:   Mask of features inherited by encapsulating devices  // Offloads the hardware can still perform on encapsulated (tunnelled) traffic
 *              This field indicates what encapsulation
 *              offloads the hardware is capable of doing,
 *              and drivers will need to set them appropriately.
 *
 *  @mpls_features: Mask of features inheritable by MPLS
 *
 *  @ifindex:   interface index  // Index assigned by the kernel, unique per device (1, 2, 3, ...)
 *  @group:     The group the device belongs to  // The group this device belongs to
 *
 *  @stats:     Statistics struct, which was left as a legacy, use  // Legacy interface counters, kept for older user-space interfaces
 *          rtnl_link_stats64 instead
 *
 *  @rx_dropped:    Dropped packets by core network,  // Packets dropped by the core network stack, not by the driver
 *          do not use this in drivers
 *  @tx_dropped:    Dropped packets by core network,
 *          do not use this in drivers
 *  @rx_nohandler:  nohandler dropped packets by core network on
 *          inactive devices, do not use this in drivers
 *
 *  @wireless_handlers: List of functions to handle Wireless Extensions,  // Hooks for the wireless extensions
 *              instead of ioctl,
 *              see <net/iw_handler.h> for details.
 *  @wireless_data: Instance data managed by the core of wireless extensions
 *
 *  @netdev_ops:    Includes several pointers to callbacks,   // Very important: the functions that operate on the device are collected here and set when the
 *          if one wants to override the ndo_*() functions   // device is initialized. For the supported operations, see struct net_device_ops.
 *  @ethtool_ops:   Management operations  // Operations behind ethtool
 *  @ndisc_ops: Includes callbacks for different IPv6 neighbour
 *          discovery handling. Necessary for e.g. 6LoWPAN.
 *  @header_ops:    Includes callbacks for creating,parsing,caching,etc  // Functions that handle the L2 header
 *          of Layer 2 headers.
 *
 *  @flags:     Interface flags (a la BSD)  // Interface state flags such as up/down; can be changed from user space
 *  @priv_flags:    Like 'flags' but invisible to userspace,  // Like flags, but not visible to user space
 *          see if.h for the definitions
 *  @gflags:    Global flags ( kept as legacy )  // Global flags, used together with flags
 *  @padded:    How much padding added by alloc_netdev()  // A net_device is aligned when it is allocated; this is the number of padding bytes that were added
 *  @operstate: RFC2863 operstate
 *  @link_mode: Mapping policy to operstate
 *  @if_port:   Selectable AUI, TP, ...  // Rarely used today; on devices with several media types it selects which port is used
 *  @dma:       DMA channel  // DMA channel assigned to the device, if it uses one
 *  @mtu:       Interface MTU value  // Typically 1500 for Ethernet
 *  @min_mtu:   Interface Minimum MTU value
 *  @max_mtu:   Interface Maximum MTU value
 *  @type:      Interface hardware type  // Hardware type of the interface; mostly Ethernet today
 *  @hard_header_len: Maximum hardware header length.
 *  @min_header_len:  Minimum hardware header length
 *
 *  @needed_headroom: Extra headroom the hardware may need, but not in all  // Extra space the hardware may need in front of the packet data
 *            cases can this be guaranteed
 *  @needed_tailroom: Extra tailroom the hardware may need, but not in all
 *            cases can this be guaranteed. Some cases also use
 *            LL_MAX_HEADER instead to allocate the skb
 *
 *  interface address info:
 *
 *  @perm_addr:     Permanent hw address  // Address burned into the hardware, read into this field during initialization
 *  @addr_assign_type:  Hw address assignment type  // How the hardware address was assigned (permanent, random, set from user space, ...)
 *  @addr_len:      Hardware address length // 6 bytes for Ethernet
 *  @neigh_priv_len:    Used in neigh_alloc()
 *  @dev_id:        Used to differentiate devices that share  // Rarely needed; it matters when several devices share one MAC address. Such products exist:
 *              the same link layer address // the MAC address is the same but the hardware differs, and they work fine
 *  @dev_port:      Used to differentiate devices that share  // Used when several network ports implement the same function
 *              the same function
 *  @addr_list_lock:    XXX: need comments on this one
 *  @uc_promisc:        Counter that indicates promiscuous mode  // Outside promiscuous mode a NIC only accepts unicast frames sent to its own address. If the device
 *              has been enabled due to the need to listen to  // must also receive frames for other unicast MACs but cannot filter on them, promiscuous mode is
 *              additional unicast addresses in a device that  // turned on for that reason, and this field records it
 *              does not implement ndo_set_rx_mode()
 *  @uc:            unicast mac addresses  // Additional unicast MAC addresses the device should accept
 *  @mc:            multicast mac addresses  // Multicast (not broadcast) MAC addresses the device should accept
 *  @dev_addrs:     list of device hw addresses  // A device may use several MAC addresses at once; they are kept on this list
 *  @queues_kset:       Group of all Kobjects in the Tx and RX queues  // kset grouping the kobjects of the Tx and Rx queues
 *  @promiscuity:       Number of times the NIC is told to work in  // A counter rather than a boolean: the device stays in promiscuous mode while it is above 0
 *              promiscuous mode; if it becomes 0 the NIC will
 *              exit promiscuous mode
 *  @allmulti:      Counter, enables or disables allmulticast mode  // Enables or disables all-multicast mode; can be set with ifconfig
 *
 *  @vlan_info: VLAN info  // As the name says
 *  @dsa_ptr:   dsa specific data  // Protocol-specific pointers follow
 *  @tipc_ptr:  TIPC specific data
 *  @atalk_ptr: AppleTalk link
 *  @ip_ptr:    IPv4 specific data
 *  @dn_ptr:    DECnet specific data
 *  @ip6_ptr:   IPv6 specific data
 *  @ax25_ptr:  AX.25 specific data
 *  @ieee80211_ptr: IEEE 802.11 specific data, assign before registering
 *
 *  @dev_addr:  Hw address (before bcast,  // The MAC address of the device
 *          because most packets are unicast)
 *
 *  @_rx:           Array of RX queues  // Settings for the receive path
 *  @num_rx_queues:     Number of RX queues
 *              allocated at register_netdev() time
 *  @real_num_rx_queues:    Number of RX queues currently active in device
 *
 *  @rx_handler:        handler for received packets   // Handler for received packets
 *  @rx_handler_data:   XXX: need comments on this one
 *  @ingress_queue:     XXX: need comments on this one
 *  @broadcast:     hw bcast address // Broadcast address
 *
 *  @rx_cpu_rmap:   CPU reverse-mapping for RX completion interrupts,
 *          indexed by RX queue number. Assigned by driver.
 *          This must only be set if the ndo_rx_flow_steer
 *          operation is defined
 *  @index_hlist:       Device index hash chain
 *
 *  @_tx:           Array of TX queues  // Settings for the transmit path
 *  @num_tx_queues:     Number of TX queues allocated at alloc_netdev_mq() time
 *  @real_num_tx_queues:    Number of TX queues currently active in device
 *  @qdisc:         Root qdisc from userspace point of view
 *  @tx_queue_len:      Max frames per queue allowed
 *  @tx_global_lock:    XXX: need comments on this one
 *
 *  @xps_maps:  XXX: need comments on this one
 *
 *  @watchdog_timeo:    Represents the timeout that is used by   // Set at initialization. Once the network layer decides a transmission has timed out, the driver's tx_timeout handler is called
 *              the watchdog (see dev_watchdog())
 *  @watchdog_timer:    List of timers
 *
 *  @pcpu_refcnt:       Number of references to this device  // Reference count of this device, kept per CPU
 *  @todo_list:     Delayed register/unregister  // The following fields are used for registration and unregistration
 *  @link_watch_list:   XXX: need comments on this one
 *
 *  @reg_state:     Register/unregister state machine
 *  @dismantle:     Device is going to be freed
 *  @rtnl_link_state:   This enum represents the phases of creating
 *              a new link
 *
 *  @needs_free_netdev: Should unregister perform free_netdev?
 *  @priv_destructor:   Called from unregister
 *  @npinfo:        XXX: need comments on this one
 *  @nd_net:        Network namespace this network device is inside
 *
 *  @ml_priv:   Mid-layer private  // Statistics pointers follow
 *  @lstats:    Loopback statistics
 *  @tstats:    Tunnel statistics
 *  @dstats:    Dummy statistics
 *  @vstats:    Virtual ethernet statistics
 *
 *  @garp_port: GARP // GARP (Generic Attribute Registration Protocol) port, not gratuitous ARP
 *  @mrp_port:  MRP  // MRP (Multiple Registration Protocol) port
 *
 *  @dev:       Class/net/name entry   // A network device is still an ordinary device, so it carries the usual attributes held in struct device
 *  @sysfs_groups:  Space for optional device, statistics and wireless
 *          sysfs groups
 *
 *  @sysfs_rx_queue_group:  Space for optional per-rx queue attributes
 *  @rtnl_link_ops: Rtnl_link_ops    // rtnetlink link operations
 *
 *  @gso_max_size:  Maximum size of generic segmentation offload
 *  @gso_max_segs:  Maximum number of segments that can be passed to the
 *          NIC for GSO
 *
 *  @dcbnl_ops: Data Center Bridging netlink ops  // Data Center Bridging operations (not ordinary bridging)
 *  @num_tc:    Number of traffic classes in the net device
 *  @tc_to_txq: XXX: need comments on this one
 *  @prio_tc_map:   XXX: need comments on this one
 *
 *  @fcoe_ddp_xid:  Max exchange id for FCoE LRO by ddp
 *
 *  @priomap:   XXX: need comments on this one
 *  @phydev:    Physical device may attach itself
 *          for hardware timestamping
 *
 *  @qdisc_tx_busylock: lockdep class annotating Qdisc->busylock spinlock
 *  @qdisc_running_key: lockdep class annotating Qdisc->running seqcount
 *
 *  @proto_down:    protocol port state information can be sent to the
 *          switch driver and used to set the phys state of the
 *          switch port.
 *
 *  FIXME: cleanup struct net_device such that network protocol info
 *  moves out.
 */
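The @promiscuity entry in the comment describes a counter rather than an on/off switch. The following is a minimal user-space sketch of that idea, not the kernel implementation: `struct mock_netdev`, its `hw_promisc` field and `mock_set_promiscuity()` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the part of net_device discussed above. */
struct mock_netdev {
    unsigned int promiscuity;  /* how many users asked for promiscuous mode */
    bool hw_promisc;           /* what the imaginary NIC is doing right now */
};

/*
 * inc is +1 to request promiscuous mode and -1 to drop a request.
 * The NIC leaves promiscuous mode only when the counter returns to 0.
 * Returns 0 on success, -1 on an unbalanced release.
 */
int mock_set_promiscuity(struct mock_netdev *dev, int inc)
{
    if (inc < 0 && dev->promiscuity == 0)
        return -1;

    if (inc > 0)
        dev->promiscuity++;
    else if (inc < 0)
        dev->promiscuity--;

    dev->hw_promisc = dev->promiscuity > 0;
    return 0;
}
```

With a counter, two independent users (say, a packet sniffer and a bridge) can each request and release promiscuous mode without switching it off under the other.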

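2. The struct net_device definition

The comment above gives a basic introduction to struct net_device. Next comes the definition itself. This structure is important, so the better you know it, the better.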

struct net_device {
    char            name[IFNAMSIZ];
    struct hlist_node   name_hlist;
    char            *ifalias;
    /*
     *  I/O specific fields
     *  FIXME: Merge these and struct ifmap into one
     */
    unsigned long       mem_end;
    unsigned long       mem_start;
    unsigned long       base_addr;
    int         irq;

    atomic_t        carrier_changes;

    /*
     *  Some hardware also needs these fields (state,dev_list,
     *  napi_list,unreg_list,close_list) but they are not
     *  part of the usual set specified in Space.c.
     */

    unsigned long       state;

    struct list_head    dev_list;
    struct list_head    napi_list;
    struct list_head    unreg_list;
    struct list_head    close_list;
    struct list_head    ptype_all;
    struct list_head    ptype_specific;

    struct {
        struct list_head upper;
        struct list_head lower;
    } adj_list;

    netdev_features_t   features;
    netdev_features_t   hw_features;
    netdev_features_t   wanted_features;
    netdev_features_t   vlan_features;
    netdev_features_t   hw_enc_features;
    netdev_features_t   mpls_features;
    netdev_features_t   gso_partial_features;

    int         ifindex;
    int         group;

    struct net_device_stats stats;

    atomic_long_t       rx_dropped;
    atomic_long_t       tx_dropped;
    atomic_long_t       rx_nohandler;

#ifdef CONFIG_WIRELESS_EXT
    const struct iw_handler_def *wireless_handlers;
    struct iw_public_data   *wireless_data;
#endif
    const struct net_device_ops *netdev_ops;
    const struct ethtool_ops *ethtool_ops;
#ifdef CONFIG_NET_SWITCHDEV
    const struct switchdev_ops *switchdev_ops;
#endif
#ifdef CONFIG_NET_L3_MASTER_DEV
    const struct l3mdev_ops *l3mdev_ops;
#endif
#if IS_ENABLED(CONFIG_IPV6)
    const struct ndisc_ops *ndisc_ops;
#endif

#ifdef CONFIG_XFRM
    const struct xfrmdev_ops *xfrmdev_ops;
#endif

    const struct header_ops *header_ops;

    unsigned int        flags;
    unsigned int        priv_flags;

    unsigned short      gflags;
    unsigned short      padded;

    unsigned char       operstate;
    unsigned char       link_mode;

    unsigned char       if_port;
    unsigned char       dma;

    unsigned int        mtu;
    unsigned int        min_mtu;
    unsigned int        max_mtu;
    unsigned short      type;
    unsigned short      hard_header_len;
    unsigned char       min_header_len;

    unsigned short      needed_headroom;
    unsigned short      needed_tailroom;

    /* Interface address info. */
    unsigned char       perm_addr[MAX_ADDR_LEN];
    unsigned char       addr_assign_type;
    unsigned char       addr_len;
    unsigned short      neigh_priv_len;
    unsigned short          dev_id;
    unsigned short          dev_port;
    spinlock_t      addr_list_lock;
    unsigned char       name_assign_type;
    bool            uc_promisc;
    struct netdev_hw_addr_list  uc;
    struct netdev_hw_addr_list  mc;
    struct netdev_hw_addr_list  dev_addrs;

#ifdef CONFIG_SYSFS
    struct kset     *queues_kset;
#endif
    unsigned int        promiscuity;
    unsigned int        allmulti;


    /* Protocol-specific pointers */

#if IS_ENABLED(CONFIG_VLAN_8021Q)
    struct vlan_info __rcu  *vlan_info;
#endif
#if IS_ENABLED(CONFIG_NET_DSA)
    struct dsa_switch_tree  *dsa_ptr;
#endif
#if IS_ENABLED(CONFIG_TIPC)
    struct tipc_bearer __rcu *tipc_ptr;
#endif
    void            *atalk_ptr;
    struct in_device __rcu  *ip_ptr;
    struct dn_dev __rcu     *dn_ptr;
    struct inet6_dev __rcu  *ip6_ptr;
    void            *ax25_ptr;
    struct wireless_dev *ieee80211_ptr;
    struct wpan_dev     *ieee802154_ptr;
#if IS_ENABLED(CONFIG_MPLS_ROUTING)
    struct mpls_dev __rcu   *mpls_ptr;
#endif

/*
 * Cache lines mostly used on receive path (including eth_type_trans())
 */
    /* Interface address info used in eth_type_trans() */
    unsigned char       *dev_addr;

#ifdef CONFIG_SYSFS
    struct netdev_rx_queue  *_rx;

    unsigned int        num_rx_queues;
    unsigned int        real_num_rx_queues;
#endif

    struct bpf_prog __rcu   *xdp_prog;
    unsigned long       gro_flush_timeout;
    rx_handler_func_t __rcu *rx_handler;
    void __rcu      *rx_handler_data;

#ifdef CONFIG_NET_CLS_ACT
    struct tcf_proto __rcu  *ingress_cl_list;
#endif
    struct netdev_queue __rcu *ingress_queue;
#ifdef CONFIG_NETFILTER_INGRESS
    struct nf_hook_entry __rcu *nf_hooks_ingress;
#endif

    unsigned char       broadcast[MAX_ADDR_LEN];
#ifdef CONFIG_RFS_ACCEL
    struct cpu_rmap     *rx_cpu_rmap;
#endif
    struct hlist_node   index_hlist;

/*
 * Cache lines mostly used on transmit path
 */
    struct netdev_queue *_tx ____cacheline_aligned_in_smp;
    unsigned int        num_tx_queues;
    unsigned int        real_num_tx_queues;
    struct Qdisc        *qdisc;
#ifdef CONFIG_NET_SCHED
    DECLARE_HASHTABLE   (qdisc_hash, 4);
#endif
    unsigned long       tx_queue_len;
    spinlock_t      tx_global_lock;
    int         watchdog_timeo;

#ifdef CONFIG_XPS
    struct xps_dev_maps __rcu *xps_maps;
#endif
#ifdef CONFIG_NET_CLS_ACT
    struct tcf_proto __rcu  *egress_cl_list;
#endif

    /* These may be needed for future network-power-down code. */
    struct timer_list   watchdog_timer;

    int __percpu        *pcpu_refcnt;
    struct list_head    todo_list;

    struct list_head    link_watch_list;

    enum { NETREG_UNINITIALIZED=0,
           NETREG_REGISTERED,   /* completed register_netdevice */
           NETREG_UNREGISTERING,    /* called unregister_netdevice */
           NETREG_UNREGISTERED, /* completed unregister todo */
           NETREG_RELEASED,     /* called free_netdev */
           NETREG_DUMMY,        /* dummy device for NAPI poll */
    } reg_state:8;

    bool dismantle;

    enum {
        RTNL_LINK_INITIALIZED,
        RTNL_LINK_INITIALIZING,
    } rtnl_link_state:16;

    bool needs_free_netdev;
    void (*priv_destructor)(struct net_device *dev);

#ifdef CONFIG_NETPOLL
    struct netpoll_info __rcu   *npinfo;
#endif

    possible_net_t          nd_net;

    /* mid-layer private */
    union {
        void                    *ml_priv;
        struct pcpu_lstats __percpu     *lstats;
        struct pcpu_sw_netstats __percpu    *tstats;
        struct pcpu_dstats __percpu     *dstats;
        struct pcpu_vstats __percpu     *vstats;
    };

#if IS_ENABLED(CONFIG_GARP)
    struct garp_port __rcu  *garp_port;
#endif
#if IS_ENABLED(CONFIG_MRP)
    struct mrp_port __rcu   *mrp_port;
#endif

    struct device       dev;
    const struct attribute_group *sysfs_groups[4];
    const struct attribute_group *sysfs_rx_queue_group;

    const struct rtnl_link_ops *rtnl_link_ops;

    /* for setting kernel sock attribute on TCP connection setup */
#define GSO_MAX_SIZE        65536
    unsigned int        gso_max_size;
#define GSO_MAX_SEGS        65535
    u16         gso_max_segs;

#ifdef CONFIG_DCB
    const struct dcbnl_rtnl_ops *dcbnl_ops;
#endif
    u8          num_tc;
    struct netdev_tc_txq    tc_to_txq[TC_MAX_QUEUE];
    u8          prio_tc_map[TC_BITMASK + 1];

#if IS_ENABLED(CONFIG_FCOE)
    unsigned int        fcoe_ddp_xid;
#endif
#if IS_ENABLED(CONFIG_CGROUP_NET_PRIO)
    struct netprio_map __rcu *priomap;
#endif
    struct phy_device   *phydev;
    struct lock_class_key   *qdisc_tx_busylock;
    struct lock_class_key   *qdisc_running_key;
    bool            proto_down;
};
#define to_net_dev(d) container_of(d, struct net_device, dev)
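The to_net_dev() macro at the end of the listing relies on container_of(): given a pointer to the embedded struct device, it steps back to the net_device that contains it. Here is a small user-space sketch of the same trick. The `mock_*` types and the simplified `container_of` (without the kernel's type checking) are illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

/* Simplified user-space container_of: subtract the member's offset. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature structures, for illustration only. */
struct mock_device {
    int id;
};

struct mock_net_device {
    char name[16];
    int ifindex;
    struct mock_device dev;   /* embedded, like "struct device dev" above */
};

#define to_mock_net_dev(d) container_of(d, struct mock_net_device, dev)

/* Given only the embedded device, return the ifindex of its owner. */
int mock_ifindex_of(struct mock_device *d)
{
    return to_mock_net_dev(d)->ifindex;
}
```

This is why generic driver-model code, which only ever sees a struct device pointer, can still reach the network-specific fields.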

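3. How the structures around a network device are organized

The net_device structure holds all the information related to a network device and its driver. Some categories of information are grouped into separate structures that hang off net_device: the IPv4 configuration, for example, lives in struct in_device (reached through ip_ptr), while the driver's private data lives in a private area allocated together with net_device.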


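Network devices are linked together through several lists; how exactly is covered later. As we saw above, every net_device consists of many members, and some of those members head lists of their own, for example the multicast address list and ip_ptr. Then there is priv: this kernel version no longer declares an explicit priv pointer, but alloc_netdev shows that space is still reserved for it whenever the sizeof_priv argument is greater than 0.

Let's look at one particularly important member, ip_ptr (struct in_device __rcu  *ip_ptr). It points to a struct in_device object. What does that object represent? Every network device can be given IP addresses, and these settings can be changed from user space. The information is specific to each interface. Not every interface needs it, but when an address is configured, it is stored in the list that hangs off ip_ptr.

Now let's compare the code with real output: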
root:/# ifconfig br-lan
br-lan    Link encap:Ethernet  HWaddr 0A:02:8E:93:DD:3B  
          inet addr:192.168.1.129  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::802:8eff:fe93:dd3b/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:211672 errors:0 dropped:0 overruns:0 frame:0
          TX packets:120803 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:15794642 (15.0 MiB)  TX bytes:24446287 (23.3 MiB)

struct in_device {
    struct net_device   *dev;  // points back to the owning net_device
    atomic_t        refcnt;   // how many references are held on this object
    int         dead;
    struct in_ifaddr    *ifa_list;  /* IP ifaddr chain      */

    struct ip_mc_list __rcu *mc_list;   /* IP multicast filter chain    */
    struct ip_mc_list __rcu * __rcu *mc_hash;

    int         mc_count;   /* Number of installed mcasts   */
    spinlock_t      mc_tomb_lock;
    struct ip_mc_list   *mc_tomb;
    unsigned long       mr_v1_seen;
    unsigned long       mr_v2_seen;
    unsigned long       mr_maxdelay;
    unsigned char       mr_qrv;
    unsigned char       mr_gq_running;
    unsigned char       mr_ifc_count;
    struct timer_list   mr_gq_timer;    /* general query timer */
    struct timer_list   mr_ifc_timer;   /* interface change timer */

    struct neigh_parms  *arp_parms;
    struct ipv4_devconf cnf;
    struct rcu_head     rcu_head;
};

Why is ifa_list a chain rather than a single object? Just as an interface can carry several MAC addresses, it can also carry several IP addresses. ifa_list chains the IPv4 addresses configured on the interface (a primary address plus any secondary ones); IPv6 addresses are kept separately, under ip6_ptr.
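To make the address chain concrete, here is a user-space sketch of an interface that carries more than one IPv4 address. The `mock_ifaddr` and `mock_in_device` types are heavily reduced, hypothetical stand-ins for in_ifaddr and in_device, and the helper functions are invented for illustration; the kernel's own list handling involves locking and RCU that are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced versions of in_ifaddr and in_device. */
struct mock_ifaddr {
    struct mock_ifaddr *ifa_next;  /* next address on the same interface */
    uint32_t ifa_address;          /* IPv4 address (host byte order here) */
    uint32_t ifa_mask;             /* netmask (host byte order here) */
};

struct mock_in_device {
    struct mock_ifaddr *ifa_list;  /* head of the address chain */
};

/* Append at the tail, so the first address added stays at the head. */
void mock_add_addr(struct mock_in_device *in_dev, struct mock_ifaddr *ifa)
{
    struct mock_ifaddr **pp = &in_dev->ifa_list;

    while (*pp)
        pp = &(*pp)->ifa_next;
    ifa->ifa_next = NULL;
    *pp = ifa;
}

/* Return the configured address whose subnet contains addr, or NULL. */
struct mock_ifaddr *mock_find_subnet(struct mock_in_device *in_dev,
                                     uint32_t addr)
{
    struct mock_ifaddr *ifa;

    for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next)
        if ((addr & ifa->ifa_mask) == (ifa->ifa_address & ifa->ifa_mask))
            return ifa;
    return NULL;
}
```

For example, adding 192.168.1.129/24 and then 10.0.0.1/8 yields a two-element chain, and a lookup for 192.168.1.5 returns the first entry.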

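The figure below shows how the memory for ip_ptr and priv is allocated. Note that the list ip_ptr points to is allocated separately, wherever the allocator happens to place it, whereas the priv area is different: it sits directly behind the net_device structure.

1. The device-independent layer uses the in_device{} structure to store IP addresses and, indirectly, neighbour information.
2. The network abstraction layer uses the net_device{} structure to store what all devices have in common: name, index, addresses and so on.
3. The data of the device-specific layer is defined by the driver developer and typically includes hardware transmit and receive buffers, chip register information and the like. This memory normally follows net_device{} directly and is allocated together with net_device{} when the driver creates it. It is still reached through a priv pointer for convenience (in this kernel version, through netdev_priv()).
The private area that priv points to does follow net_device, but in practice alignment padding is added between the two, so the layout looks more like this:

To understand this better, let's go straight to the code: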

struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
        unsigned char name_assign_type,
        void (*setup)(struct net_device *),
        unsigned int txqs, unsigned int rxqs)
{
    struct net_device *dev;
    size_t alloc_size;
    struct net_device *p;

         .......

    alloc_size = sizeof(struct net_device);  // start with the size of net_device itself
    if (sizeof_priv) {   // did the caller ask for a private area, and how large?
        /* ensure 32-byte alignment of private area */
        alloc_size = ALIGN(alloc_size, NETDEV_ALIGN);  // round up so the private area starts on a 32-byte boundary
        alloc_size += sizeof_priv;
    }
    /* ensure 32-byte alignment of whole construct */
    alloc_size += NETDEV_ALIGN - 1; // 32 - 1 = 31 spare bytes

    p = kvzalloc(alloc_size, GFP_KERNEL | __GFP_REPEAT); // net_device and priv are allocated together here
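The size calculation above can be replayed in user space. The sketch below mirrors the arithmetic of the excerpt with a tiny hypothetical `struct mock_net_device`. Everything prefixed `mock_`/`MOCK_` is invented for illustration. The steps after the allocation (aligning the start pointer, recording the skipped bytes in `padded`, and locating the private area) are not shown in the excerpt; they are my reading of how the padded field documented earlier is meant to be used.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MOCK_NETDEV_ALIGN 32
/* Round x up to the next multiple of a; a must be a power of two. */
#define MOCK_ALIGN(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical miniature of net_device; the real one is far larger. */
struct mock_net_device {
    char name[16];
    unsigned short padded;   /* bytes skipped in front of the structure */
};

/* Same arithmetic as the excerpt above. */
size_t mock_alloc_size(size_t sizeof_priv)
{
    size_t alloc_size = sizeof(struct mock_net_device);

    if (sizeof_priv) {
        alloc_size = MOCK_ALIGN(alloc_size, MOCK_NETDEV_ALIGN);
        alloc_size += sizeof_priv;
    }
    alloc_size += MOCK_NETDEV_ALIGN - 1;   /* slack for aligning the start */
    return alloc_size;
}

/* Assumption: the start is aligned and the offset is kept in padded. */
struct mock_net_device *mock_alloc_netdev(size_t sizeof_priv)
{
    char *p = calloc(1, mock_alloc_size(sizeof_priv));
    struct mock_net_device *dev;

    if (!p)
        return NULL;
    dev = (struct mock_net_device *)
        MOCK_ALIGN((uintptr_t)p, MOCK_NETDEV_ALIGN);
    dev->padded = (unsigned short)((char *)dev - p);
    return dev;
}

/* The private area starts at the aligned end of the structure. */
void *mock_netdev_priv(struct mock_net_device *dev)
{
    return (char *)dev +
           MOCK_ALIGN(sizeof(struct mock_net_device), MOCK_NETDEV_ALIGN);
}

/* Undo the alignment offset to recover the pointer calloc() returned. */
void mock_free_netdev(struct mock_net_device *dev)
{
    free((char *)dev - dev->padded);
}
```

The extra 31 bytes are what make it safe to shift the start forward to a 32-byte boundary without the private area running past the end of the allocation.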

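This is how multiple net_device structures end up being laid out:

As mentioned earlier, net_device objects are tied together by several lists. Which ones? Let's take a look:

As the figure shows, there are three of them: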

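dev_name_head: lookup by interface name (dev->name); the corresponding function is dev_get_by_name()

dev_index_head: lookup by interface index (dev->ifindex); the corresponding function is dev_get_by_index()

dev_base: lookup by any other attribute, such as device type, MAC address or flags, by walking the full list of devices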
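The three lookup paths can be sketched in user space as two small hash tables plus one plain list. This is a simplified model under stated assumptions: the table size, the hash function, and every `mock_*` name are hypothetical, and locking, RCU and network namespaces are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_HASH_SIZE 8   /* hypothetical; deliberately small */

struct mock_net_device {
    char name[16];
    int ifindex;
    unsigned char addr[6];
    struct mock_net_device *next_by_name;   /* chain in the name table */
    struct mock_net_device *next_by_index;  /* chain in the index table */
    struct mock_net_device *next;           /* list of all devices */
};

static struct mock_net_device *name_head[MOCK_HASH_SIZE];
static struct mock_net_device *index_head[MOCK_HASH_SIZE];
static struct mock_net_device *dev_base;

static unsigned int mock_name_hash(const char *name)
{
    unsigned int h = 0;

    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % MOCK_HASH_SIZE;
}

/* Link the device into all three structures. */
void mock_register(struct mock_net_device *dev)
{
    unsigned int n = mock_name_hash(dev->name);
    unsigned int i = (unsigned int)dev->ifindex % MOCK_HASH_SIZE;

    dev->next_by_name = name_head[n];
    name_head[n] = dev;
    dev->next_by_index = index_head[i];
    index_head[i] = dev;
    dev->next = dev_base;
    dev_base = dev;
}

/* Name lookup: hash the name, then walk only that bucket. */
struct mock_net_device *mock_get_by_name(const char *name)
{
    struct mock_net_device *d = name_head[mock_name_hash(name)];

    while (d && strcmp(d->name, name) != 0)
        d = d->next_by_name;
    return d;
}

/* Index lookup: same idea, keyed by ifindex. */
struct mock_net_device *mock_get_by_index(int ifindex)
{
    struct mock_net_device *d =
        index_head[(unsigned int)ifindex % MOCK_HASH_SIZE];

    while (d && d->ifindex != ifindex)
        d = d->next_by_index;
    return d;
}

/* Any other key (here the MAC address) needs a walk over every device. */
struct mock_net_device *mock_get_by_hwaddr(const unsigned char *addr)
{
    struct mock_net_device *d = dev_base;

    while (d && memcmp(d->addr, addr, 6) != 0)
        d = d->next;
    return d;
}
```

The trade-off is the usual one: the two hashed keys get short bucket walks, while every other search criterion pays for a scan of all registered devices.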


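Now that we understand net_device, the following articles will continue with loading the device driver module, registering the device, and bringing the device up.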