Thursday, November 1, 2018

cache


  • Reading from a disk is very slow compared to accessing (real) memory.


Consider how often the command ls might be run on a system with many users. By reading the information from disk only once and then keeping it in memory until no longer needed, one can speed up all but the first read. This is called disk buffering, and the memory used for the purpose is called the buffer cache.

Since memory is a scarce resource, the buffer cache usually cannot be big enough (it can't hold all the data one ever wants to use).
When the cache fills up, the data that has been unused for the longest time is discarded, and the memory thus freed is used for the new data.

http://www.tldp.org/LDP/sag/html/buffer-cache.html
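
A minimal sketch of that eviction behaviour, assuming a toy in-memory key/value store (the names BufferCache, read_block, and fetch_from_disk are illustrative, not kernel APIs):

from collections import OrderedDict

class BufferCache:
    """Toy buffer cache: keeps recently read blocks in memory and discards
    the least recently used block when the cache is full."""

    def __init__(self, capacity, fetch_from_disk):
        self.capacity = capacity
        self.fetch_from_disk = fetch_from_disk   # slow path, e.g. an actual disk read
        self.blocks = OrderedDict()              # block_id -> data, oldest first

    def read_block(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)    # mark as most recently used
            return self.blocks[block_id]         # fast path: served from memory
        data = self.fetch_from_disk(block_id)    # slow path: only the first read pays it
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # discard the least recently used block
        return data

Every read after the first is served from memory until the block is eventually evicted, which is exactly the speed-up the ls example relies on.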


  • (Computer) Memory: every computer system has two kinds of memory.

Primary memory/storage (also known as internal memory) is the only kind directly accessible to the CPU. The CPU continuously reads instructions stored there and executes them as required.
E.g. RAM, ROM, processor registers, processor cache, etc.

Secondary memory/storage (also known as external memory) is not directly accessible by the CPU. The computer usually uses its I/O channels to access secondary storage and transfers the desired data using a data buffer (an intermediate area) in primary storage. Secondary storage does not lose its data when the device is powered down; it is non-volatile.
E.g. HDD, SSD, ODD (optical disc drive), USB flash drives or keys, floppy disks, magnetic tape, etc.
https://www.quora.com/What-is-the-difference-between-computer-RAM-and-memory

  • Buffer vs. cache

There are fundamental differences in intent between the process of caching and the process of buffering.
Fundamentally, caching realizes a performance increase for transfers of data that is being repeatedly transferred. While a caching system may realize a performance increase upon the initial (typically write) transfer of a data item, this performance increase is due to buffering occurring within the caching system.

With read caches, a data item must have been fetched from its residing location at least once in order for subsequent reads of the data item to realize a performance increase by virtue of being able to be fetched from the cache's (faster) intermediate storage rather than the data's residing location.
With write caches, a performance increase of writing a data item may be realized upon the first write of the data item by virtue of the data item immediately being stored in the cache's intermediate storage, deferring the transfer of the data item to its residing storage at a later stage or else occurring as a background process.

With typical caching implementations, a data item that is read or written for the first time is effectively being buffered; and in the case of a write, mostly realizing a performance increase for the application from where the write originated. Additionally, the portion of a caching protocol where individual writes are deferred to a batch of writes is a form of buffering.
The portion of a caching protocol where individual reads are deferred to a batch of reads is also a form of buffering.
In practice, caching almost always involves some form of buffering, while strict buffering does not involve caching.

A buffer is a temporary memory location that is traditionally used because CPU instructions cannot directly address data stored in peripheral devices. Thus, addressable memory is used as an intermediate stage.
Additionally, such a buffer may be feasible when a large block of data is assembled or disassembled (as required by a storage device), or when data may be delivered in a different order than that in which it is produced.
Also, a whole buffer of data is usually transferred sequentially (for example, to hard disk), so buffering itself sometimes increases transfer performance or reduces the variation or jitter of the transfer's latency, as opposed to caching, where the intent is to reduce the latency.
A cache also increases transfer performance, but a cache's sole purpose is to reduce accesses to the underlying slower storage.

https://en.wikipedia.org/wiki/Cache_(computing)
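
The "individual writes are deferred to a batch of writes" idea can be sketched as a plain write buffer, with no caching involved (WriteBuffer and flush_batch are illustrative names only):

class WriteBuffer:
    """Toy write buffer: collects individual writes and hands them to the
    slower device as one large sequential batch (buffering, not caching)."""

    def __init__(self, batch_size, flush_batch):
        self.batch_size = batch_size
        self.flush_batch = flush_batch   # e.g. one large sequential write to disk
        self.pending = []

    def write(self, record):
        self.pending.append(record)      # caller returns immediately
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_batch(self.pending)
            self.pending = []            # nothing is kept around for later reads

Unlike a cache, nothing is retained after the flush, so repeated reads of the same data gain nothing; the benefit is purely in smoothing and batching the transfer.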




  • Buffer


1. A buffer is a container that holds data for a short period of time
2. A buffer is normal-speed storage
3. A buffer is mostly used for I/O operations
4. A buffer is part of RAM only
5. A buffer is made from dynamic RAM
6. A buffer's policy is first-in, first-out
Cache
1. A cache is storage for speeding up certain operations
2. A cache is a high-speed storage area
3. A cache is used during R/W operations
4. A cache can also be part of the disk
5. A cache is made from static RAM
6. A cache's policy is least recently used (LRU)
https://www.youtube.com/watch?v=BYDIekbwz-o

  • Read-Through Cache

A read-through cache sits in line with the database. When there is a cache miss, it loads the missing data from the database, populates the cache, and returns it to the application.

Both cache-aside and read-through strategies load data lazily, that is, only when the data is first read.
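
A sketch of the read-through pattern described above (ReadThroughCache and the load_from_db callback are hypothetical names, not part of any particular library):

class ReadThroughCache:
    """Read-through cache: sits in front of the database and fills itself
    lazily, on the first read of each key (a cache miss)."""

    def __init__(self, load_from_db):
        self.load_from_db = load_from_db   # slow path: the real database query
        self.store = {}

    def get(self, key):
        if key not in self.store:                      # cache miss
            self.store[key] = self.load_from_db(key)   # load from the database, populate the cache
        return self.store[key]                         # later reads are served from the cache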

  • Write-Through Cache

Data is first written to the cache and then to the database. The cache sits in line with the database, and writes always go through the cache to the main database.

https://codeahoy.com/2017/08/11/caching-strategies-and-how-to-choose-the-right-one/
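
A matching write-through sketch, with the same hypothetical callback style (write_to_db stands in for the real database write):

class WriteThroughCache:
    """Write-through cache: every write lands in the cache first and is then
    written synchronously to the database before the call returns."""

    def __init__(self, write_to_db):
        self.write_to_db = write_to_db
        self.store = {}

    def put(self, key, value):
        self.store[key] = value        # 1. write to the cache
        self.write_to_db(key, value)   # 2. write through to the main database

    def get(self, key):
        return self.store.get(key)     # recently written data is already cached

Because every put also hits the database, a write-through cache never holds data the database does not, at the cost of slower writes.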



  • dm-cache is a component (more specifically, a target) of the Linux kernel's device mapper, which is a framework for mapping block devices onto higher-level virtual block devices. It allows one or more fast storage devices, such as flash-based solid-state drives (SSDs), to act as a cache for one or more slower storage devices such as hard disk drives (HDDs); this effectively creates hybrid volumes and provides secondary storage performance improvements
The design of dm-cache requires three physical storage devices for the creation of a single hybrid volume; dm-cache uses those storage devices to separately store actual data, cache data, and required metadata.
dm-cache uses solid-state drives (SSDs) as an additional level of indirection while accessing hard disk drives (HDDs), improving the overall performance by using fast flash-based SSDs as caches for the slower mechanical HDDs based on rotational magnetic media. As a result, the speed of costly SSDs is combined with the storage capacity offered by slower but less expensive HDDs. Moreover, in the case of storage area networks (SANs) used in cloud environments as shared storage systems for virtual machines, dm-cache can also improve overall performance and reduce the load of SANs by providing data caching using client-side local storage.
dm-cache is implemented as a component of the Linux kernel's device mapper, which is a volume management framework that allows various mappings to be created between physical and virtual block devices.
https://en.wikipedia.org/wiki/Dm-cache

  • The latest lvm2 tools have support for lvmcache, which is a front end to dm-cache and is much easier to use.
What's dm-cache?
dm-cache is a device-mapper-level solution for caching blocks of data from mechanical hard drives on solid-state SSDs. The goal is to significantly speed up throughput and latency to frequently accessed files.
There are three ways you can do this:
    Create two traditional partitions
    Use device mapper's dm-linear feature to split up a single partition
    Use LVM as a front end to device mapper (a sketch of this approach follows below)
https://blog.kylemanna.com/linux/ssd-caching-using-dmcache-tutorial/
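
As a rough illustration of the LVM front-end route (the third option above), the following Python sketch simply shells out to the usual lvmcache commands. The volume group vg0, the logical volume vg0/data and the SSD /dev/sdb are placeholders, the cache-pool size is arbitrary, and the exact options should be checked against lvmcache(7) before running this on anything but a disposable test machine:

import subprocess

VG, ORIGIN_LV, SSD = "vg0", "vg0/data", "/dev/sdb"     # placeholder names

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("pvcreate", SSD)                                   # make the SSD an LVM physical volume
run("vgextend", VG, SSD)                               # add it to the existing volume group
run("lvcreate", "--type", "cache-pool", "-L", "10G",   # carve a cache pool out of the SSD
    "-n", "cachepool", VG, SSD)
run("lvconvert", "--type", "cache",                    # attach the pool to the slow origin LV
    "--cachepool", VG + "/cachepool", ORIGIN_LV)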


  • It is possible to achieve the same solution in Red Hat Enterprise Linux by configuring an SSD to act as a cache device for a larger HDD. This has the added benefit of allowing you to choose your storage vendor without relying on their cache implementation. As SSD prices drop and capacities increase, the cache devices can be replaced without worrying about the underlying data devices.
A supported solution in Red Hat Enterprise Linux is to use a dm-cache device. Since this is part of device mapper, we don't need to worry about kernel modules and kernel configuration options, and no tuning has been necessary for the tests performed.
https://www.redhat.com/en/blog/improving-read-performance-dm-cache

  • Modern operating systems do not normally write files immediately to RAID systems or hard disks. Temporary memory that is not currently in use will be used to cache writes and reads.

To ensure that I/O performance measurements are not distorted by these caches (temporary memory), dd's oflag parameter can be used (a rough Python equivalent of the direct-I/O variant is sketched after this list):
    direct (use direct I/O for data)
    dsync (use synchronized I/O for data)
    sync (likewise, but also for metadata)
    For measuring write performance, the data to be written should be read from /dev/zero and ideally written to an empty RAID array, hard disk or partition (such as using of=/dev/sda for the first hard disk or of=/dev/sda2 for the second partition on the first hard disk). When writing to a device (such as /dev/sda), the data stored there will be lost. For that reason, you should only use empty RAID arrays, hard disks or partitions.
    If this is not possible, a normal file in the file system (such as using of=/root/testfile) can be written.
    In order to get results closer to real-life, we recommend performing the tests described several times (three to ten times, for example).

https://www.thomas-krenn.com/en/wiki/Linux_I/O_Performance_Tests_using_dd
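
As a hedged sketch, the following Python script does roughly what dd if=/dev/zero of=/root/testfile bs=1M count=1024 oflag=direct does: it writes a gigabyte of zeros with O_DIRECT so the page cache is bypassed, and reports throughput. The target path and sizes are only examples, and O_DIRECT requires a suitably aligned buffer, which is why an anonymous mmap (page-aligned) is used:

import mmap
import os
import time

PATH = "/root/testfile"      # example target; never point this at a device holding data you need
BLOCK = 1024 * 1024          # 1 MiB per write, like bs=1M
COUNT = 1024                 # 1024 blocks = 1 GiB total, like count=1024

buf = mmap.mmap(-1, BLOCK)   # anonymous mapping: page-aligned, as O_DIRECT requires
buf.write(b"\x00" * BLOCK)   # fill with zeros, the equivalent of reading from /dev/zero

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
start = time.monotonic()
for _ in range(COUNT):
    os.write(fd, buf)        # direct I/O: bypasses the operating system's write cache
os.fsync(fd)
elapsed = time.monotonic() - start
os.close(fd)

print(f"wrote {BLOCK * COUNT / 1e6:.0f} MB in {elapsed:.2f} s "
      f"({BLOCK * COUNT / elapsed / 1e6:.1f} MB/s)")

As with dd, repeating the run several times and averaging gives a more realistic figure.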


  • /dev/zero is a special file in Unix-like operating systems that provides as many null characters (ASCII NUL, 0x00) as are read from it. One of the typical uses is to provide a character stream for initializing data storage. 

A hybrid HDD/SSD caching setup has some usability benefits
PC users commonly pair a midsize (60GB to 120GB) SSD with a larger hard-disk drive, dedicating the SSD to handling the operating system and frequently used applications and data, while using the HDD for bulk storage. Such a configuration provides excellent overall performance and storage capabilities. Smart Response Technology hides all or part of the SSD from the operating system, and it caches data automatically. No additional drive letters are necessary, and data moves dynamically to and from the SSD based on individual usage patterns. The result is a system that delivers SSD-like performance and HDD-like capacities, without the user having to manage multiple drive letters.

https://www.pcworld.com/article/248828/how_to_setup_intel_smart_response_ssd_caching_technology.html

  • Linux device mapper writecache

A computer cache is a component (typically leveraging some sort of performant memory) that temporarily stores data for current write and future read I/O requests.
In the event of write operations, the data to be written is staged and will eventually be scheduled and flushed to the slower device intended to store it. 
As for read operations, the general idea is to read data from the slower device no more than once and keep that data in memory for as long as it is still needed.
Historically, operating systems have been designed to enable local (and volatile) random access memory (RAM) to act as this temporary cache.

Using I/O Caching
Unlike their traditional spinning hard disk drive (HDD) counterparts, SSDs comprise a collection of computer chips (non-volatile NAND memory) with no moving parts.
To keep costs down and still invest in the needed capacities, one logical solution is to buy a large number of HDDs and a small number of SSDs and enable the SSDs to act as a performant cache for the slower HDDs.

Common Methods of Caching
However, you should understand that the biggest pain point for a slower HDD is not accessing sectors sequentially for read and write workloads; the issue is random workloads and, more specifically, small random I/O workloads.

    Writeback caching. In this mode, newly written data is cached but not immediately written to the destination target (a sketch of this mode follows after this list).
    Write-through caching. This mode writes new data to the target while still maintaining it in cache for future reads.
    Write-around caching or a general-purpose read cache. Write-around caching avoids caching new write data and instead focuses on caching read I/O operations for future read requests.
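
A sketch of the writeback mode from the list above, in the same illustrative style as the earlier write-through example (WriteBackCache and write_to_backing_store are hypothetical names):

class WriteBackCache:
    """Writeback cache: writes land in the cache and are acknowledged
    immediately; dirty entries are flushed to the slow device later."""

    def __init__(self, write_to_backing_store):
        self.write_to_backing_store = write_to_backing_store
        self.store = {}
        self.dirty = set()

    def put(self, key, value):
        self.store[key] = value    # fast: only the cache is touched
        self.dirty.add(key)        # remember that the backing store is now stale

    def get(self, key):
        return self.store.get(key)

    def flush(self):
        # Scheduled later, e.g. periodically, on eviction or at shutdown.
        for key in list(self.dirty):
            self.write_to_backing_store(key, self.store[key])
        self.dirty.clear()

The trade-off of any writeback scheme is visible here: data still sitting in dirty when the cache device fails has never reached the destination target.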

Many userspace libraries, tools, and kernel drivers exist to enable high-speed caching.

dm-cache
dm-cache is the caching component of the Linux kernel's device mapper, described above.

bcache
Very similar to dm-cache, bcache too is a Linux kernel driver, although it differs in a few ways. For instance, the user is able to attach more than one SSD as a cache, and bcache is designed to reduce write amplification by turning random write operations into sequential writes.

dm-writecache
Fairly new to the Linux caching scene, dm-writecache was officially merged into the 4.18 Linux kernel.
Unlike the other caching solutions mentioned already, the focus of dm-writecache is strictly writeback caching and nothing more: no read caching, no write-through caching. The thought process for not caching reads is that read data should already be in the page cache, which makes complete sense.

Other Caching Tools

    RapidDisk. This dynamically allocatable memory disk Linux module uses RAM and can also be used as a front-end write-through and write-around caching node for slower media.
    Memcached. A cross-platform userspace library with an API for applications, Memcached also relies on RAM to boost the performance of databases and other applications (a small usage sketch follows below).
    ReadyBoost. A Microsoft product, ReadyBoost was introduced in Windows Vista and is included in later versions of Windows. Similar to dm-cache and bcache, ReadyBoost enables SSDs to act as a cache for slower HDDs.

http://www.admin-magazine.com/HPC/Articles/Linux-writecache?utm_source=ADMIN+Newsletter&utm_campaign=HPC_Update_130_2019-11-14_LInux_writecache&utm_medium=email
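
As a small illustration of the Memcached item above, this sketch uses the third-party pymemcache client in a cache-aside pattern. It assumes a memcached server is already listening on localhost:11211, and load_user_from_db is a stand-in for a real (slow) database query:

from pymemcache.client.base import Client   # third-party client: pip install pymemcache

cache = Client(("localhost", 11211))

def load_user_from_db(user_id):
    # Placeholder for a real database lookup.
    return f"user-record-{user_id}"

def get_user(user_id):
    key = f"user:{user_id}"
    record = cache.get(key)
    if record is not None:
        return record.decode()                    # hit: memcached returns bytes
    record = load_user_from_db(user_id)           # miss: fall back to the database
    cache.set(key, record.encode(), expire=300)   # keep it in RAM for five minutes
    return record

print(get_user(42))   # the first call hits the "database"; later calls are served from memcached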

  • Building and Installing the rapiddisk kernel modules and utilities

https://github.com/pkoutoupis/rapiddisk/