Wednesday, June 27, 2012

NAS, SAN, Data Center Bridging (DCB)

  • DCB is a suite of Institute of Electrical and Electronics Engineers (IEEE) standards that enable Converged Fabrics in the data center, where storage, data networking, cluster Inter-Process Communication (IPC), and management traffic all share the same Ethernet network infrastructure.
  • DCB provides hardware-based bandwidth allocation to a specific type of network traffic and enhances Ethernet transport reliability with the use of priority-based flow control.
    Hardware-based bandwidth allocation is essential if traffic bypasses the operating system and is offloaded to a converged network adapter, which might support Internet Small Computer System Interface (iSCSI), Remote Direct Memory Access (RDMA) over Ethernet, or Fibre Channel over Ethernet (FCoE).
    Priority-based flow control is essential if the upper layer protocol, such as Fibre Channel, assumes a lossless underlying transport.
    https://docs.microsoft.com/en-us/windows-server/networking/technologies/dcb/dcb-top

  • In addition, you can use Windows PowerShell commands to enable Data Center Bridging (DCB), create a Hyper-V Virtual Switch with an RDMA virtual NIC (vNIC), and create a Hyper-V Virtual Switch with SET and RDMA vNICs. https://docs.microsoft.com/en-us/windows-server/networking/sdn/software-defined-networking

  • It is surprisingly hard to find a common definition of the All Flash Array (or AFA), but one thing that everyone appears to agree on is the shared nature of AFAs – they are network-attached, shared storage (i.e. SAN or NAS).

  • An all-flash array is a shared storage array in which all of the persistent storage media is flash memory.

      Hybrid AFAs
      The hybrid AFA is the poor man’s flash array. Its performance can best be described as “disk plus” and it is extremely likely to descend from a product which is available in all-disk, mixed (disk+SSD) or all-SSD configurations. Put simply, a hybrid AFA is a disk array in which the disks have been swapped out for SSDs.

      SSD-based AFAs
      All-flash arrays that have been architected with flash in mind but which only use flash in the form of solid-state drives (SSDs). A typical SSD-based AFA consists of two controllers (usually Intel x86-based servers) and one or more shelves of SSDs.

      Ground-Up AFAs
      The final category is the ground-up designed AFA – one that is architected and built from the ground up to use raw flash media straight from the NAND flash fabricators.
      A ground-up array implements many of its features in hardware and also takes a holistic approach to managing the NAND flash media because it is able to orchestrate the behavior of the flash across the entire array (whereas SSDs are essentially isolated packages of flash).
      https://flashdba.com/2015/06/25/all-flash-arrays-what-is-an-afa/

      What to look for in an AFA? Given that the purpose of an AFA is speed, look first at IOPS.
      Connectivity matters too, because AFAs have grown out of their original Fibre Channel SAN environment and now come in iSCSI or NAS configurations as well. In fact, the fastest AFAs are generally Ethernet-based, using multiple 40 GbE links, and Ethernet will soon extend its lead with RDMA, NVMe over Ethernet, and 100 GbE support. An alternative is InfiniBand, which is more expensive and harder to manage, but which claims an edge in latency.
      https://www.networkcomputing.com/storage/choosing-all-flash-array/1250381987


    • This IDC study examines the role of flash storage in the data center, defining the many ways in which it is being deployed. This study defines the different markets for flash-optimized storage solutions, both internal and external and discusses the workloads that products in each of the three all-flash array (AFA) markets are targeting. The definitions provided in this taxonomy represent the scope of IDC's flash-based storage systems research.
    • https://www.idc.com/getdoc.jsp?containerId=US42606418

    • NAS provides both storage and a file system. A SAN (Storage Area Network) provides only block-based storage and leaves file system concerns on the "client" side. SAN protocols include Fibre Channel (carrying SCSI), iSCSI, ATA over Ethernet (AoE), and HyperSCSI.
    NAS appears to the client OS (operating system) as a file server (the client can map network drives to shares on that server), whereas a disk available through a SAN still appears to the client OS as a disk, visible in disk and volume management utilities (along with the client's local disks), and available to be formatted with a file system and mounted.

    http://en.wikipedia.org/wiki/Storage_area_network
    http://www.webopedia.com/TERM/S/SAN.html
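    To make the block-versus-file difference concrete, here is a minimal sketch (server address, target IQN, and device node are made-up placeholders): an NFS export arrives as a ready-made file system, while an iSCSI LUN arrives as a raw disk that still has to be formatted.

        # NAS (file level): the server exports a file system, the client simply mounts it
        mount -t nfs 192.168.0.10:/export/data /mnt/nas

        # SAN (block level, here via iSCSI): after discovery and login the LUN appears
        # as a raw disk (the device name /dev/sdb depends on the system)
        iscsiadm -m node -T iqn.2012-06.com.example:target0 -p 192.168.0.10 --login
        mkfs.ext4 /dev/sdb        # create a file system ourselves, as with a local disk
        mount /dev/sdb /mnt/san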


    • SANs typically utilize Fibre Channel connectivity, while NAS solutions typically use TCP/IP networks, such as Ethernet.

    But the real difference is in how the data is accessed: a SAN accesses data as blocks, while a NAS accesses data as files.

    NAS requires a dedicated piece of hardware, usually referred to as the head, which connects to the LAN. This device is responsible for authenticating clients and managing file operations, a lot like ordinary file servers, using established network protocols. Additionally, NAS devices typically run a built-in OS, without a monitor and keyboard.

    A SAN typically uses Fibre Channel and connects a set of storage devices that are capable of sharing certain low-level data among themselves.
    https://www.zadarastorage.com/blog/tech-corner/san-versus-nas-whats-the-difference/



    • NAS vs SAN

    iSCSI vs Fibre Channel
    iSCSI uses the existing Ethernet network to connect to the SAN.
    Cheaper, but with lower performance.

    Fibre Channel needs a second, dedicated switch.
    Better performance.
    Requires a dedicated host bus adapter (HBA) card.
    • With the features we built into Openfiler, you can take advantage of file-based Network Attached Storage and block-based Storage Area Networking functionality in a single cohesive framework.
    www.openfiler.com


    • openmediavault is the next generation network attached storage (NAS) solution based on Debian Linux. It contains services like SSH, (S)FTP, SMB/CIFS, DAAP media server, RSync, BitTorrent client and many more. Thanks to the modular design of the framework it can be enhanced via plugins.

    https://www.openmediavault.org/


    • FreeNAS vs TrueNAS

    http://www.freenas.org/blog/freenas-vs-truenas/


    • NAS4Free supports sharing across Windows, Apple, and UNIX-like systems. It includes ZFS v5000, Software RAID (0, 1, 5), disk encryption, S.M.A.R.T./email reports, etc., with the following protocols: CIFS/SMB (Samba), Active Directory Domain Controller (Samba), FTP, NFS, TFTP, AFP, RSYNC, Unison, iSCSI (initiator and target), HAST, CARP, Bridge, UPnP, and BitTorrent, all highly configurable through its web interface. NAS4Free can be installed on a Compact Flash/USB/SSD key or hard disk, or booted from a LiveCD/LiveUSB with a small USB key for config storage.

    https://www.nas4free.org
    • The iSCSI Extensions for RDMA (iSER) is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol to use Remote Direct Memory Access (RDMA). RDMA is provided either by the Transmission Control Protocol (TCP) with RDMA services (iWARP), which uses the existing Ethernet setup and therefore requires no major hardware investment, by RoCE (RDMA over Converged Ethernet), which does not need the TCP layer and therefore provides lower latency, or by InfiniBand. iSER permits data to be transferred directly into and out of SCSI memory buffers without intermediate data copies and with minimal CPU intervention.
    From Wikipedia, the free encyclopedia

    • iSCSI
    Internet Small Computer Systems Interface, an Internet Protocol (IP)-based storage networking standard for linking data storage facilities. It provides block-level access to storage devices by carrying SCSI commands over a TCP/IP network. iSCSI is used to facilitate data transfers over intranets and to manage storage over long distances. It can be used to transmit data over local area networks (LANs), wide area networks (WANs), or the Internet and can enable location-independent data storage and retrieval.
    From Wikipedia, the free encyclopedia

    • BACKSTORES
    Backstores are different kinds of local storage resources that the kernel target uses to "back" the SCSI devices it exports. The mappings to local storage resources that each backstore creates are called storage objects.

    FILEIO
    Allows files to be treated as disk images. When storage objects of this type are created, they can support either write-back or write-thru operation. Using write-back enables the local filesystem cache, which will improve performance but increase the risk of data loss. It is also possible to use fileio with local block device files if a buffered operation is needed.
    PSCSI
    Allows a local SCSI device of any type to be shared. It is generally advised to prefer the block backstore if sharing a block SCSI device is desired.
    RAMDISK
    Allows kernel memory to be shared as a block SCSI device. Since memory is volatile, the contents of the ramdisk will be lost if the system restarts, so this backstore is best used for testing only.
    https://www.systutorials.com/docs/linux/man/8-targetcli/
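    As a rough illustration of the backstores above (names, paths, and sizes are made up for the example), targetcli can create the corresponding storage objects like this:

        # fileio: a 10 GiB disk image treated as a disk (write-back vs write-thru is
        # controlled by the write_back attribute; defaults can differ by version)
        targetcli /backstores/fileio create disk01 /srv/iscsi/disk01.img 10G

        # block: share a local block device; ramdisk: volatile memory, for testing only
        targetcli /backstores/block create blk01 /dev/sdc
        targetcli /backstores/ramdisk create rd01 1G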


    • When running a website or similar workload, in general, the best measurement of the disk subsystem is known as IOPS: Input/Output Operations per Second. https://www.binarylane.com.au/support/solutions/articles/1000055889-how-to-benchmark-disk-i-o
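    A quick way to get an IOPS number on Linux is fio; the sketch below drives 4 KiB random reads at queue depth 32 (the file path, size, and runtime are arbitrary example values):

        fio --name=randread-test --filename=/mnt/test/fio.dat --size=1G \
            --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --direct=1 \
            --runtime=60 --time_based --group_reporting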


    • In iSCSI terminology, the system that shares the storage is known as the target. The storage can be a physical disk, or an area representing multiple disks or a portion of a physical disk. For example, if the disk(s) are formatted with ZFS, a zvol can be created to use as the iSCSI storage.
    The clients which access the iSCSI storage are called initiators. To initiators, the storage available through iSCSI appears as a raw, unformatted disk known as a LUN. Device nodes for the disk appear in /dev/ and the device must be separately formatted and mounted.

    FreeBSD provides a native, kernel-based iSCSI target and initiator.
    https://www.freebsd.org/doc/handbook/network-iscsi.html
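    A minimal sketch of that workflow on FreeBSD, loosely following the handbook (pool name, IQN, and addresses are placeholders; check the handbook for the authoritative ctl.conf syntax):

        # on the target host: carve a zvol out of an existing pool
        zfs create -V 16G tank/iscsi0

        # add to /etc/ctl.conf (ctld's configuration file):
        #   portal-group pg0 {
        #       discovery-auth-group no-authentication
        #       listen 0.0.0.0
        #   }
        #   target iqn.2012-06.com.example:target0 {
        #       auth-group no-authentication
        #       portal-group pg0
        #       lun 0 {
        #           path /dev/zvol/tank/iscsi0
        #       }
        #   }

        # then enable and start the target daemon
        sysrc ctld_enable=YES
        service ctld start

        # on the initiator host: the LUN shows up as a new da(4) device, e.g. /dev/da1
        sysrc iscsid_enable=YES
        service iscsid start
        iscsictl -A -p 10.0.0.1 -t iqn.2012-06.com.example:target0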




    • LinuxIO (LIO) has been the Linux SCSI target since kernel version 2.6.38.[1][2] It supports a rapidly growing number of fabric modules, and all existing Linux block devices as backstores.
    http://linux-iscsi.org/wiki/LIO
    • iSCSI stands for Internet Small Computer Systems Interface, an IP-based storage standard that works on top of the Internet Protocol by carrying SCSI commands over an IP network.
    iSCSI transports block-level data between an iSCSI initiator on a client machine and an iSCSI target on a storage device (server).


    What is iSCSI?
    It is a network storage protocol above TCP/IP.
    This protocol encapsulates SCSI data into TCP packets.
    iSCSI allows us to connect a host to a storage device (disk array, tape drive, etc.) via a simple Ethernet connection.
    This solution is cheaper than a Fibre Channel SAN (Fibre Channel HBAs and switches are expensive).
    From the host's view, the storage array's LUNs appear as local disks.
    iSCSI devices should not be confused with NAS devices (for example NFS).
    The most important difference is that NFS volumes can be accessed by multiple hosts, whereas an iSCSI volume can normally be accessed by only one host.

    Some critics say that iSCSI has worse performance than Fibre Channel and causes high CPU load on the host machines. With Gigabit Ethernet the speed is usually sufficient. To overcome the high CPU load, some vendors developed iSCSI TOEs (TCP Offload Engines): the card has a built-in network chip that creates and computes the TCP frames. The Linux kernel does not support this directly, so the card vendors write their own drivers for the OS.

    Initiator:
    The initiator is the name of the iSCSI client. The iSCSI client has a block level access to the iSCSI devices, which can be a disk, tape drive, DVD/CD writer. One client can use multiple iSCSI devices.

    Target:
    The target is the name of the iSCSI server. The iSCSI server offers its devices (disks, tape, dvd/cd ... etc.) to the clients. One device can be accessed by one client.

    iSCSI naming:
    The iSCSI name consists of two parts: type string and unique name string.
    The type string can be the following:
        iqn. : iSCSI qualified name
        eui. : EUI-64 bit identifier

    Most of the implementations use the iqn format.

    initiator name: iqn.1993-08.org.debian:01.35ef13adb6d

    iqn            : we use the iSCSI qualified name address type.
    1993-08   : the year and month in which the naming authority acquired the domain name that is used in the iSCSI name.
    org.debian : reversed DNS name that identifies the organizational naming authority.
    01.35ef13adb6d    : this string is defined by the naming authority.

    Our target name is similar (iqn.1992-08.com.netapp:sn.84211978). The difference is that it contains the serial number of the NetApp filer.

    The Open-iSCSI project is the newest implementation.
    It can be used with 2.6.11 kernels and up.
    It contains kernel modules and an iscsid daemon.
    /etc/init.d/open-iscsi start

    The configuration files are under the /etc/iscsi directory:
        iscsid.conf:         Configuration file for the iscsi daemon. It is read at startup.
        initiatorname.iscsi:    The name of the initiator, which the daemon reads at startup.
        nodes directory:         The directory contains the nodes and their targets.
        send_targets directory: The directory contains the discovered targets.


    https://www.howtoforge.com/iscsi_on_linux
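    On the initiator side, a typical Open-iSCSI session looks roughly like this (the portal address and target IQN are placeholders; the discovery results end up in the nodes and send_targets directories mentioned above):

        # discover the targets offered by a portal
        iscsiadm -m discovery -t sendtargets -p 192.168.0.10:3260

        # log in to one of the discovered targets; a new block device (e.g. /dev/sdb) appears
        iscsiadm -m node -T iqn.2012-06.com.example:target0 -p 192.168.0.10:3260 --login

        # list active sessions, and log out again when finished
        iscsiadm -m session
        iscsiadm -m node -T iqn.2012-06.com.example:target0 -p 192.168.0.10:3260 --logout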

    • Open-iSCSI is partitioned into user and kernel parts.
    The kernel portion of Open-iSCSI is maintained as part of the Linux kernel and is licensed under the GPL version 2. The kernel part implements iSCSI data path (that is, iSCSI Read and iSCSI Write), and consists of several loadable kernel modules and drivers.

    The Open-iSCSI user space is maintained on the project's GitHub account.
    User space contains the entire control plane: configuration manager, iSCSI Discovery, Login and Logout processing, connection-level error processing, Nop-In and Nop-Out handling, etc.

    The Open-iSCSI user space consists of a daemon process called iscsid, and a management utility iscsiadm.
    http://www.open-iscsi.com


    • iSCSI (Internet Small Computer Systems Interface). Like Fibre Channel, iSCSI provides all of the necessary components for the construction of a Storage Area Network.
    iSCSI Initiator
    The initiator issues SCSI commands, which are packaged in IP packets.
    iSCSI Software Initiator: an iSCSI Initiator implemented by software.
         Linux Open-iSCSI Initiator
        Microsoft iSCSI Software Initiator
        VMware iSCSI Software Initiator for VMware ESX/ESXi

    iSCSI Target
    Such an iSCSI Target can provide one or more so-called logical units (LUs). The abbreviation “LUN” is often used for the term “logical unit” (although this abbreviation actually means “LU Number” or “logical unit number”).

    https://www.thomas-krenn.com/en/wiki/ISCSI_Basics

    • Open-source SCSI targets
    The main open-source multiprotocol SCSI targets in the industry include:
    LIO (Linux-IO), the standard open-source SCSI target in Linux, by Datera, Inc.

    Linux also has out-of-tree or legacy SCSI targets, among them:
    STGT (SCSI Target Framework), which was the standard multiprotocol SCSI target in Linux. It aimed to simplify SCSI target driver creation and maintenance. Its key goals were clean integration into the scsi-mid layer and implementing a large portion of the target in user space. STGT was superseded by LIO with Linux kernel 2.6.38.


    http://www.linux-iscsi.org/wiki/Features#Comparison

    • targetcli is the general management platform for LinuxIO.
    LinuxIO target instances can be created for the various supported fabrics, for example:
    iSER (over InfiniBand) on Mellanox HCAs
    http://www.linux-iscsi.org/wiki/Targetcli
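    Continuing the hypothetical targetcli example from the backstores section above, an iSCSI target exporting that fileio object could be wired up like this (IQNs are placeholders; recent targetcli versions create a default portal automatically):

        # create the target (a default TPG, tpg1, is created with it)
        targetcli /iscsi create iqn.2012-06.com.example:target0
        # map the backstore as a LUN and allow the example initiator IQN from above
        targetcli /iscsi/iqn.2012-06.com.example:target0/tpg1/luns create /backstores/fileio/disk01
        targetcli /iscsi/iqn.2012-06.com.example:target0/tpg1/acls create iqn.1993-08.org.debian:01.35ef13adb6d
        # listen on all addresses, port 3260, and persist the configuration
        targetcli /iscsi/iqn.2012-06.com.example:target0/tpg1/portals create 0.0.0.0 3260
        targetcli saveconfig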

    • LinuxIO provides target support for various InfiniBand Host Channel Adapters (HCAs).
    The LinuxIO supports iSER and SRP target mode operation on Mellanox HCAs.

    The InfiniBand architecture specification defines a connection between processor nodes and high-performance I/O nodes such as storage devices. InfiniBand forms a superset of the Virtual Interface Architecture (VIA).

    InfiniBand is an industry standard, channel-based, switched-fabric, interconnect architecture for servers. It is used predominantly in high-performance computing (HPC), and recently has enjoyed increasing popularity for SANs. Its features include high throughput, low latency, quality of service and failover, and it is designed to be scalable.

    InfiniBand over Ethernet (IBoE): A technology that makes high-bandwidth, low-latency communication possible over DCB Ethernet networks. Typically called RDMA over Converged Ethernet (RoCE).

    RDMA over Converged Ethernet (RoCE): A network protocol that allows RDMA over DCB ("lossless") Ethernet networks by running the IB transport protocol using Ethernet frames. RoCE is a link layer protocol and hence allows communication between any two hosts in the same Ethernet broadcast domain. RoCE packets consist of standard Ethernet frames with an IEEE assigned Ethertype, a GRH, unmodified IB transport headers and payload.[1] RoCE is sometimes also called InfiniBand over Ethernet (IBoE).

    iSCSI Extensions for RDMA (iSER): A protocol model defined by the IETF that maps the iSCSI protocol directly over RDMA and is part of the "Data Mover" architecture.

    Host Channel Adapter (HCA): provides the mechanism to connect InfiniBand devices to processors and memory.

    Converged Enhanced Ethernet (CEE): A set of standards that allow enhanced communication over an Ethernet network. CEE is typically called Data Center Bridging (DCB).
    Data Center Bridging (DCB): A set of standards that allow enhanced communication over an Ethernet network. DCB is sometimes called Converged Enhanced Ethernet (CEE), or loosely "lossless" Ethernet.


    http://www.linux-iscsi.org/wiki/InfiniBand


    • The SCSI RDMA Protocol (SRP) is a network protocol that allows one computer system to access SCSI devices attached to another computer system via RDMA

    SRP is based on the SCSI protocol, which is a point-to-point protocol with corresponding design limitations. In contrast, iSER is based on iSCSI, and thus better accommodates modern network requirements, including complex topologies, multipathing, target discovery, etc. Hence, iSER is most likely the best choice for InfiniBand networks going forward
    http://www.linux-iscsi.org/wiki/SRP

    • Sharing via iSCSI
    Unfortunately, sharing ZFS datasets via iSCSI is not yet supported with ZFS on Linux
    https://pthree.org/2012/12/31/zfs-administration-part-xv-iscsi-nfs-and-samba
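    A common workaround, sketched here under the assumption that the LIO/targetcli stack described earlier is available (pool and object names are made up), is to create a zvol and export its block device by hand:

        # a 10 GiB zvol; its block device appears under /dev/zvol/
        zfs create -V 10G tank/iscsivol0

        # hand the zvol to a LIO block backstore and map it into an iSCSI target
        targetcli /backstores/block create iscsivol0 /dev/zvol/tank/iscsivol0
        targetcli /iscsi create iqn.2012-06.com.example:zfs0
        targetcli /iscsi/iqn.2012-06.com.example:zfs0/tpg1/luns create /backstores/block/iscsivol0
        targetcli saveconfig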

    • iSCSI is a way to share storage over a network. Unlike NFS, which works at the file system level, iSCSI works at the block device level.

    https://www.freebsd.org/doc/handbook/network-iscsi.html

    • iSCSI on Gluster can be set up using the Linux target driver (tgt). This is a user-space daemon that accepts iSCSI (as well as iSER and FCoE). It interprets iSCSI CDBs and converts them into some other I/O operation, according to user configuration. In our case, we can convert the CDBs into file operations that run against a gluster file. The file represents the LUN, and the offset in the file represents the LBA.

    LIO, which is included in RHEL 7, is a replacement for the Linux target driver.

    In this setup a single path leads to gluster, which represents a performance bottleneck and single point of failure. For HA and load balancing, it is possible to set up two or more paths to different gluster servers using mpio; if the target name is equivalent over each path, mpio will coalesce both paths into a single device.
    https://docs.gluster.org/en/v3/Administrator%20Guide/GlusterFS%20iSCSI/
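    A rough sketch of the tgt-based variant (mount point, image size, and IQN are placeholders): the gluster volume is mounted over FUSE, a file on it backs the LUN, and tgtadm exports it:

        # mount the gluster volume and create the file that will back the LUN
        mount -t glusterfs gluster1:/vol0 /mnt/gluster
        truncate -s 20G /mnt/gluster/lun0.img

        # define a target (tid 1 is arbitrary) and attach the file as LUN 1
        tgtadm --lld iscsi --op new --mode target --tid 1 \
               --targetname iqn.2012-06.com.example:gluster-lun0
        tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
               --backing-store /mnt/gluster/lun0.img
        tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address ALL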


    • Edge computing

    Edge computing is computing that’s done at or near the source of the data — oftentimes collecting data from hundreds or even thousands of IoT devices. Because the edge processes data close to the source, it needs to be secure. With Docker containers, you can securely distribute software to the edge and run containerized applications on a lightweight framework that can be easily patched and upgraded
    https://www.docker.com/solutions/docker-edge

    why vSAN?
    vSAN is a new storage protocol.
    Fibre Channel, block-level iSCSI, and NFS protocols already exist,
    so why another one? Because of operational requirements:
    a VM's VMDK on ESXi is backed by a datastore, which in turn is backed by either a LUN (block) or NFS.
    Say that LUN is configured with RAID 5.
    What if a new VM's VMDK requires a LUN configured with RAID 6? You have to create a new datastore (DS) backed by a RAID 6 LUN.
    The service requirement is delivered not at the per-VM level but at the per-LUN level.
    How big a datastore is required has to be decided upfront;
    the storage administrator has to preconfigure everything.
    vSAN answers this with SPBM, Storage Policy Based Management.
    vSAN is a storage solution that provides storage on a per-VM basis.
    vSAN is a single datastore (DS) which provides all storage services.
    A vSAN cluster can host VMs whose storage is protected with, for example, RAID 5 or RAID 1, side by side.
    With vSAN there is no need to preconfigure storage services.
    vSAN is distributed: it determines where the data for the VMs in the cluster is stored, and the SPBM policies assigned to the VMs regulate this.
    The vSAN cluster lives on a network that requires connectivity among the hosts; recent versions need no multicast, just regular unicast.
    vSAN supports both Layer 2 and Layer 3 networking (the esxcli sketch below shows how to inspect a host's vSAN configuration).
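    A few esxcli commands give a quick view of a host's vSAN setup from the ESXi shell (a sketch only; the exact sub-commands can vary between vSAN versions):

        esxcli vsan cluster get      # cluster membership and state of the local host
        esxcli vsan network list     # which vmkernel interface carries vSAN traffic
        esxcli vsan storage list     # disks claimed by vSAN on this host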

    • Introduction to VMware vSAN

    vSAN lowers total cost of ownership (TCO).
    It converts DAS (Direct Attached Storage) and x86 servers into a hyper-converged infrastructure (HCI).
    vSAN is a software-defined enterprise storage solution.

    • SAS (based on the SCSI command set) and SATA (based on the ATA command set) are historic protocols developed for mechanical media. They do not have the characteristics to take advantage of the benefits of flash media.

    NVMe is a standard based on peripheral component interconnect express (PCIe), and is built for physical slot architecture.
    NVMe also allows for the use of 2.5-inch form factor solid-state drives via the U.2 connector.
    http://www.computerweekly.com/feature/Storage-briefing-NVMe-vs-SATA-and-SAS
    • SATA, or Serial ATA, is a type of connection interface used by the SSD to communicate data with your system

    You can think of PCIe, or Peripheral Component Interconnect Express, as a more direct connection to a motherboard — a motherboard extension, if you will. It’s typically used with things like graphics cards and network cards, which need low latency, but has proven useful for data storage as well.
    M.2 ("M dot two") and U.2 ("U dot two") are form factor standards that specify the shape, dimensions, and layouts of a physical device. Both the M.2 and U.2 standards support both SATA and PCIe connections.
    https://www.makeuseof.com/tag/pcie-vs-sata-type-ssd-best/
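    On Linux it is easy to check which interface each installed drive actually uses; a small sketch (device names and the presence of nvme-cli are assumptions):

        # the TRAN column shows the transport: sata, nvme, usb, ...
        lsblk -d -o NAME,TRAN,SIZE,MODEL

        # with the nvme-cli package installed, NVMe namespaces can be listed directly
        nvme list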



    • The Storage Performance Development Kit (SPDK) provides a set of tools and libraries for writing high performance, scalable, user-mode storage applications. It achieves high performance by moving all of the necessary drivers into userspace and operating in a polled mode instead of relying on interrupts, which avoids kernel context switches and eliminates interrupt handling overhead.

    https://github.com/spdk/spdk
    • The Storage Performance Development Kit (SPDK) provides a set of tools and libraries for writing high performance, scalable, user-mode storage applications. It achieves high performance through the use of a number of key techniques:

    Moving all of the necessary drivers into userspace, which avoids syscalls and enables zero-copy access from the application.
    Polling hardware for completions instead of relying on interrupts, which lowers both total latency and latency variance.
    Avoiding all locks in the I/O path, instead relying on message passing.
    https://spdk.io/
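    A minimal sketch of getting started with SPDK (paths follow the repository layout at the time of writing and may differ between releases):

        git clone https://github.com/spdk/spdk
        cd spdk
        git submodule update --init
        sudo ./scripts/pkgdep.sh        # install build dependencies
        ./configure && make

        # unbind NVMe devices from the kernel driver and reserve hugepages
        # so the userspace, polled-mode drivers can take over
        sudo ./scripts/setup.sh

        # run one of the bundled NVMe examples (binary location varies by version)
        sudo ./examples/nvme/hello_world/hello_world
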
    • VPLEX implements a distributed "virtualization" layer within and across geographically disparate Fibre Channel storage area networks and data centers

    XtremIO uses remote procedure calls (RPC) for control messages and remote direct memory access (RDMA) for moving data blocks.


    Switched Fabric or switching fabric is a network topology in which network nodes interconnect via one or more network switches (particularly crossbar switches). 
    Because a switched fabric network spreads network traffic across multiple physical links, it yields higher total throughput than broadcast networks, such as the early 10BASE5 version of Ethernet, or most wireless networks such as Wi-Fi. 
    The generation of high-speed serial data interconnects that appeared in 2001–2004 which provided point-to-point connectivity between processor and peripheral devices are sometimes referred to as fabrics; however, they lack features such as a message passing protocol

    In the Fibre Channel Switched Fabric (FC-SW-6) topology, devices are connected to each other through one or more Fibre Channel switches.
    This topology has the best scalability of the three FC topologies (the other two are Arbitrated Loop and point-to-point).
    It is the only one requiring switches.
    Visibility among devices (called nodes) in a fabric is typically controlled with Fibre Channel zoning. 
    Multiple switches in a fabric usually form a mesh network, with devices being on the "edges" ("leaves") of the mesh.
    Most Fibre Channel network designs employ two separate fabrics for redundancy. 
    The two fabrics share the edge nodes (devices), but are otherwise unconnected. 
    One of the advantages of such setup is capability of failover, meaning that in case one link breaks or a fabric goes out of order, datagrams can be sent via the second fabric. 
    The fabric topology allows the connection of up to the theoretical maximum of 16 million devices, limited only by the available address space (2^24). 


    Intelligent Storage System
    A new breed of storage solutions, known as intelligent storage systems, has evolved. 
    These arrays have an operating environment that controls the management, allocation, and utilization of storage resources. 
    These storage systems are configured with large amounts of memory called cache.

    Components of an Intelligent Storage System
    An intelligent storage system consists of four key components: front end, cache, back end, and physical disks. An I/O request received from the host at the front-end port is processed through cache and the back end, to enable storage and retrieval of data from the physical disk. A read request can be serviced directly from cache if we find the requested data in cache.

    http://www.sanadmin.net/2015/10/fc-storage.html


