Flash Research Comprehensive


Prawar Poudel

prawar(dot)poudel(at)hotmail(dot)com



This document contains short descriptions of all the research items read by Prawar Poudel during his/her course of research. The items studied can be categorized into the following classes.

Please follow the link below to jump to the respective section:



NVM Introduction

Memory forms an integral component of every computing system. It is the part of the computer used for storage and retrieval of the code and data that are essential for the computer's operation. It is this retrieval of code and data that makes computers, or computing systems, perform all the tasks that we, the users, instruct them to do.

When the term "memory" is used, it generally means the Random Access Memory (RAM) or the memory where the instructions for current (and maybe more) operations are stored. But, this is just a small part of memory system in a computer system. Now a reader might be wondering where the bulk files, multimedia, etc are stored. There is another memory, which is more generally called "storage". It is in this storage system that all bulk files are stored. In the conventional workstation computer settings, Magnetic Disks form the storage system. Nowadays, Solid State Drives (SSDs) are getting traction as well for storage purposes.

In conventional computer settings, memory or RAM is composed of Dynamic Random Access Memory (DRAM). These memories lose their content when power is removed; thus they form "Volatile Memory". Another example of volatile memory is SRAM, which is extensively used in caches in conventional PC settings, while in embedded systems SRAM often serves as the main RAM.

Magnetic Disks and Optical Disks are examples of non-volatile memory, as they do not lose their content when the power is turned off. These devices store content as a physical change in the medium, and this physical change does not depend on power being supplied. SRAM and DRAM, however, need to be supplied with power in order to function or retain their content.

Both kinds of non-volatile memory (NVM) presented above, namely Magnetic and Optical Disks, are not semiconductor based. Thus, when people talk about NVM, these two are usually left out. We will present a short description of them in a section below; however, they will not be a major focus.

Semiconductor NVMs are based on a principle similar to Magnetic and Optical Disks in the sense that the data to be stored is recorded as a change in a physical property, and they do not need a constant supply of power to hold the data. Flash Memory is currently the dominant semiconductor NVM, and it has still not realized its full market potential. As mentioned above, flash memories are used for storage in newer computing systems as SSD drives, while they have found their majority usage in portable storage media. In low-end computing systems, a special kind of flash memory is even used in place of RAM. We will look at different kinds of flash memories in later sections.

Other semiconductor NVMs are listed below:

Each of these has its own section below. Please find the appropriate section and study the further descriptions presented. For each topic, we will present a brief introduction along with some relevant papers and their discussion.



Flash Memory

Following are the papers studied:


  1. 3D Stacked Architecture:
    • The first and most straightforward idea: stacking multiple planar layers of memory arrays yields a 3D array.
    • Here, drain and bitline contacts are shared between NAND strings belonging to different layers, while source/wordline contacts and source/drain selectors are associated with separate layers.
    • Cost and process technology considerations for this architecture can be derived from those of the planar products.
    • The major hurdle is the thermal budget of the manufacturing process required to grow and populate the additional layers.
    • Here each layer is manufactured separately, and thus the architecture is flexible.
    • Since the layers are fabricated independently, there is a significant difference in the threshold voltage distributions with ISPP programming.

  2. BiCS Architecture:
    • Control Gates are the different rectangular plates stacked on top of each other.
    • The bottom rectangular plate is the ensemble of Source Line Selectors terminating the flash strings.
    • Multiple holes are drilled through the stack and filled with poly-silicon in order to form a series of vertically arranged NAND flash memory cells.
    • Bitline Selector (BLS) and Bitline (BL) contacts are on top of the structure.
    • Each cell in the BiCS architecture works in depletion-mode since the poly-silicon constituting the body of the transistor is lightly n-doped with a uniform profile or even left un-doped. This reduces the manufacturing complexity of the p-n junction along the vertical direction of the plugs (also called pillars)
    • The CG plate intersection with a pillar maps a single memory cell. Each NAND Flash string of cells is connected to a BL contact via BLS, whereas the bottom of the string is connected to a common source diffusion formed directly on the process substrate made of silicon.
  3. P-BiCS
    • BiCS evidenced some critical issues such as poor reliability characteristics of the memory cells in terms of endurance and data retention, poor SLS performance (i.e., cut-off), and a high resistance of the SL, which limits the sensing performance.
    • To solve these issues, a pipe-shaped BiCS architecture has been developed, namely P-BiCS. This integration approach adopts a U-shaped vertical NAND string.
  4. VRAT Architecture
    • Vertical Recess Array Transistor by Samsung
    • Still a charge trap layer is used
    • no staircase structure
  5. VSAT Architecture
    • Vertical Stacked Array Transistor






Security Primitives in Flash Memory



Flash PUFs

The concepts presented here in bullets are drawn from the paper PUFs at a Glance.

Following is the list of papers studied while creating this document. The description of each paper is maintained and linked here so that anyone doing a survey can fetch the appropriate description in a timely fashion.


This paper discusses seven techniques for generating fingerprints from flash memory. Published in 2011; the author P. Prabhu was affiliated with UCSD while the collaborators are from Cornell University. The original paper can be found at this link. Their conclusion is that four of the seven techniques provide usable signatures.

The setup consists of a Xilinx FPGA connected to the flash chip through a custom-built flash controller. Measurements can be made with a resolution of 10 ns.


This paper demonstrates NAND flash memory as a source of entropy for TRNG and PUF generation, and is work done by Wang et al. at Cornell University. The idea is based on repeated partial program operations on the flash memory. The original paper can be found at this link.

For the RNG, they make use of Random Telegraph Noise (RTN) as the source of randomness. RTN is the alternating capture and emission of carriers at a defect site in a very small electronic device. This capture and emission, which is random with exponentially distributed durations, generates discrete variations in the channel current.

To observe this noise, the flash memory needs to be in an unreliable state so that the noise affects the output. Thus a partial program operation is used to achieve a state that is in between the erased and programmed states. The initial algorithm is described as follows:

  1. The flash memory block is partially programmed. The duration of the partial program applied is T (unspecified).
  2. The flash memory block is read N times.
  3. Then, for each of the bits, it is checked whether RTN is present.
  4. If it is, the number of partial program operations it took is noted, and the bit position is marked as selected.
  5. This is repeated from Step 1 until all the bits are marked as selected.
The second part of the algorithm can then be used to generate random numbers.
  1. Partially program the flash cells to the appropriate levels (as dictated by the number of partial program operations from above).
  2. Read each bit M times.
  3. Record the sequence of up-times and down-times. Since this is RTN, the durations of the up-times and down-times are randomly distributed.
  4. Produce random bits from the up-times and down-times: if the duration is odd, output a 1, else output a 0.
  5. Perform debiasing.
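A minimal sketch of steps 4 and 5 above, assuming the up-times and down-times have already been measured as integer counts of read intervals; the function names and the simple von Neumann debiaser are illustrative choices, not taken from the paper.

```python
# Sketch of RNG bit extraction from RTN up/down durations (steps 4 and 5 above).
# Assumes durations are integer counts of read intervals; all names are illustrative.

def durations_to_bits(durations):
    """Map each up-time/down-time to a raw bit: odd duration -> 1, even duration -> 0."""
    return [d & 1 for d in durations]

def von_neumann_debias(bits):
    """Classic von Neumann debiasing: consume bits in pairs, keep 01 -> 0 and 10 -> 1."""
    out = []
    for b0, b1 in zip(bits[0::2], bits[1::2]):
        if b0 != b1:
            out.append(b0)
    return out

if __name__ == "__main__":
    # Example durations (in read intervals) observed for one RTN-active cell.
    durations = [3, 8, 5, 5, 2, 7, 1, 4, 6, 9]
    raw = durations_to_bits(durations)
    print("raw bits:     ", raw)
    print("debiased bits:", von_neumann_debias(raw))
```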

For the device fingerprint, repeated partial program operations are utilized. The following describes the steps involved:

  1. Initialize bitRank[i] = 0 for all bits i in a page.
  2. Perform a partial program operation on the flash page (T < rated program time).
  3. For all the bits in the page:
    • If the bit is programmed and bitRank[this bit] = 0, set bitRank[this bit] = partial program number.
  4. Go to Step 2 until 99% of the bits in the page are programmed.
This essentially records the order in which the flash bits attain programmed state from erased state in a flash page.
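A minimal simulation of the bitRank procedure above; the partial program operation is emulated with a hypothetical per-bit number of rounds needed to program, since the real command sequence is device-specific.

```python
import random

def bitrank_fingerprint(num_bits=64, stop_fraction=0.99, seed=0):
    """Simulate the fingerprinting loop: record, for every bit, the partial-program
    round in which it first flips from erased (1) to programmed (0)."""
    rng = random.Random(seed)
    # Hypothetical per-bit "rounds needed to program", standing in for process variation.
    rounds_needed = [rng.randint(1, 50) for _ in range(num_bits)]
    bit_rank = [0] * num_bits                   # Step 1: bitRank[i] = 0 for all bits
    round_no = 0
    while sum(r > 0 for r in bit_rank) < stop_fraction * num_bits:   # Step 4
        round_no += 1                            # Step 2: one more partial program operation
        for i in range(num_bits):                # Step 3: scan all bits in the page
            if round_no >= rounds_needed[i] and bit_rank[i] == 0:
                bit_rank[i] = round_no           # first round in which this bit programmed
    return bit_rank

if __name__ == "__main__":
    print(bitrank_fingerprint(num_bits=16))
```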


This paper characterizes the errors and sources of errors in flash memory. The authors were affiliated with Cornell University at the time of publication. The original paper can be found at this link.

Since this paper is not focused on PUFs but on analyzing the nature of variations in flash, this section will be populated later.


This paper presents three techniques for a PUF-based key generator using NAND flash memory. The techniques are partial erase, partial programming, and program disturbance. The primary author was a member of the Data Assurance and Communication Security Research Center, China. The original paper can be found at this link.

This paper focuses on robust keys rather than just determining keys from the flash memories. They find the cells in the flash memory whose outputs are the most reliable over the lifetime of the flash memory. The positions of such flash cells form the helper data.

The partial erase based operation is more or less similar to the idea proposed in the Wang et al. paper presented above. However, since this proposal is based on partial erase operations, the number of (fixed duration) partial erase operations that it takes for each flash cell in a page to erase is recorded and serves as the identity of that particular cell. In a way, it captures the order in which the cells attain the erased state from a programmed state.

The total number of partial erase operations is, however, limited to PENum. Cells that do not attain the erased state even after the maximum number of partial erase operations are assigned PENum+1.

The partial program based operation mirrors the partial erase based operation above. Here, the order in which the cells attain the programmed state is captured: the number of (fixed duration) partial program operations required by each flash cell to reach the programmed state is recorded. After a fixed number of partial program operations PPNum, the cells that are still not in the programmed state are assigned PPNum+1.

Program disturb based PUF generation induces disturbance in an adjacent flash page by performing repeated program operations on a page. A fixed number of program operations is performed, and after each program operation the state of the adjacent page is read to check whether any cells have been disturbed enough to flip to the programmed state. Following the same approach, the number of program operations on the stressed page that it takes to flip each cell of the adjacent page is recorded.
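The three enrollment procedures above share the same counting structure; a minimal sketch for the partial erase case is shown below, including the PENum cap and the PENum+1 assignment for cells that never reach the erased state. The read_page and partial_erase callables are placeholders for device-specific operations, and the simulation in the usage example is purely illustrative.

```python
def enroll_partial_erase(read_page, partial_erase, pe_num):
    """Record, for each cell, how many fixed-duration partial erase operations it takes
    to reach the erased state; cells that never erase within pe_num tries get pe_num + 1.
    read_page() returns a list of bits (1 = erased); partial_erase() applies one pulse."""
    counts = None
    for attempt in range(1, pe_num + 1):
        partial_erase()
        page = read_page()
        if counts is None:
            counts = [None] * len(page)
        for i, bit in enumerate(page):
            if bit == 1 and counts[i] is None:
                counts[i] = attempt          # first pulse after which this cell read as erased
    return [c if c is not None else pe_num + 1 for c in counts]

if __name__ == "__main__":
    # Tiny simulation: each cell needs a different number of pulses to erase.
    import random
    rng = random.Random(1)
    needed = [rng.randint(1, 12) for _ in range(16)]
    pulses = 0
    def partial_erase():
        global pulses
        pulses += 1
    def read_page():
        return [1 if pulses >= n else 0 for n in needed]
    print(enroll_partial_erase(read_page, partial_erase, pe_num=10))
```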

For reliable cell selection, they employ two methods as follows:


This is not a paper but a tutorial session presented by Sanu Matthew. Matthew was with Circuit Research Labs, Intel, at Hillsboro, Oregon, when this presentation was made at ISQED 2020.

This work is not NAND based but 14nm CMOS based, and it is presented here because of the idea it carries. The publication can be found at this link.

The idea presented can be divided into two major categories:

The idea presented for PUF or TRNG generation is to select candidate bits for either operation. A selection criterion is chosen, for example: read the bits multiple times (64 times as presented) and pick the ones with a higher bias (towards either the 0 or the 1 state, determined using an entropy computation) for PUF generation. Time-varying bits can be used for the TRNG. Since the bits are time-variant, Temporal Majority Voting (TMV) is used. This idea could easily be ported to flash memories, combined with the idea of flash cells oscillating. Please refer to this paper.

The idea of self-calibration is presented, whereby the entropy is tracked for the TRNG. If the output of the von Neumann extractor for the TRNG is less than 1 bit/cycle, then a different column is chosen for the TRNG. (The entropy source bits are arranged in a 64x8 organization of rows and columns.)
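A minimal sketch of the bit-classification idea described above: read each candidate bit many times (64 here), use the observed bias to route strongly biased bits to the PUF (with temporal majority voting as the response) and unstable bits to the TRNG. The threshold and names are illustrative, not the presentation's.

```python
def classify_bits(read_counts, num_reads=64, stable_threshold=0.9):
    """read_counts[i] = number of times bit i read as 1 out of num_reads.
    Strongly biased bits become PUF bits (temporal majority vote as the value);
    bits close to 50/50 are kept as entropy sources for the TRNG."""
    puf_bits, trng_sources = {}, []
    for i, ones in enumerate(read_counts):
        p1 = ones / num_reads
        if p1 >= stable_threshold or p1 <= 1.0 - stable_threshold:
            puf_bits[i] = 1 if p1 >= 0.5 else 0   # temporal majority vote
        else:
            trng_sources.append(i)                # time-varying bit: TRNG candidate
    return puf_bits, trng_sources

if __name__ == "__main__":
    counts = [64, 2, 33, 61, 30, 0, 35, 64]       # example: 1-counts over 64 reads
    puf, trng = classify_bits(counts)
    print("PUF bits:", puf)
    print("TRNG source bit indices:", trng)
```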


This document presents PUF generation for NOR flash memory. The author was associated with Virginia Tech, and the document is the author's master's thesis. The original document can be found at this link.

The platforms used for demonstration of the idea are the Altera DE1-SoC and the Altera DE2-115.

The basic idea used in this project is partial programming. The threshold voltage variation is utilized to generate the PUF. Here, the address of the flash memory location is the challenge, and the response is the bit position of the cell with the minimum threshold voltage. Essentially, the idea is to find the flash cell that gets programmed first while applying repeated partial programming operations on a flash memory location.

    Following are the steps involved:
  1. A partial programming time T is chosen for a flash location.
  2. The flash location is erased.
  3. Program the flash location for duration T
  4. Read the value of the flash location:
    • If exactly one bit is flipped, the position of the flipped bit is encoded into a 3-bit form.
    • If more than one bit is flipped, T is reduced and go to Step 3.
    • Else, go to Step 3.
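A minimal simulation of the loop above for an 8-bit location, assuming the location is re-erased before retrying with a smaller T and that the per-bit programming times are distinct. The partial program is emulated by hypothetical per-bit programming times, and the 3-bit response is simply the index of the fastest-programming bit.

```python
def find_fastest_bit(program_times, t_initial):
    """program_times[i] = cumulative programming time bit i needs before it flips.
    Repeatedly apply partial programming of duration T to an erased location; if more
    than one bit flips in the same step, shrink T and retry, until exactly one bit
    has flipped. The response is that bit's position encoded in 3 bits."""
    t = t_initial
    while True:
        elapsed = 0.0                                   # fresh (erased) location
        while True:
            elapsed += t                                # one partial program of duration T
            flipped = [i for i, pt in enumerate(program_times) if pt <= elapsed]
            if len(flipped) == 1:
                return format(flipped[0], "03b")        # 3-bit encoding of the fastest bit
            if len(flipped) > 1:
                t /= 2                                  # resolution too coarse: reduce T
                break                                   # retry from an erased location

if __name__ == "__main__":
    # Hypothetical per-bit programming times reflecting threshold voltage variation.
    times = [5.3, 4.1, 6.7, 4.0, 5.9, 7.2, 4.8, 6.1]
    print("response:", find_fastest_bit(times, t_initial=2.0))
```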


This document presents PUF generation and TRNG for Superflash NOR memory. The author was associated with Arizona State University. The original document can be found at this link.

The main idea exploited in this paper is the erase speed variability of the flash cells. The memory used in this research is 1.5T SST Superflash memory, which has a faster erase because split-gate flash memory has a different organization than stacked-gate flash memory.

Since such memories are highly efficient, a program operation takes only one clock cycle while an erase requires multiple cycles. Thus, in this research, they employ partial erase operations by interrupting the erase operation. The erase operation, however, is still too fast to be interrupted at nominal operating conditions, so the operating voltage VDD is reduced to lower the internal charge pump voltage.

Interrupting the erase operation shows that the number of flash cells that attain the erased state increases monotonically with longer erase times. They pick a time that gives a little more than 50% 1's (flash cells in the erased state). The distribution at this point in time shows that the flash cells in the 1 and 0 states are almost randomly distributed over the block.

For authentication, the same operation is repeated, but such that it yields less than 50% 1's. The idea is again based on monotonicity: if a binary signature with (for example) 45% 1s is compared against a (challenge) signature with 55% 1s, the bits that are 1 in the 45% signature should already be 1 in the 55% signature.
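A minimal sketch of this monotonicity check: every cell that reads 1 in the signature with fewer 1s (shorter interrupted erase) should also read 1 in the signature with more 1s. The containment test and the zero-violation threshold are illustrative simplifications.

```python
def monotonic_match(sig_fewer_ones, sig_more_ones, max_violations=0):
    """Both signatures come from interrupting the erase at two different durations.
    Monotonicity: every cell already erased (1) in the signature with fewer 1s must
    also be erased in the signature with more 1s; count violations of that rule."""
    violations = sum(1 for a, b in zip(sig_fewer_ones, sig_more_ones) if a == 1 and b == 0)
    return violations <= max_violations

if __name__ == "__main__":
    auth_readout = [1, 0, 0, 1, 0, 1, 0, 0]   # ~45% ones (shorter interrupted erase)
    enrolled_sig = [1, 0, 1, 1, 0, 1, 1, 0]   # ~55% ones (enrollment signature)
    print("authentic:", monotonic_match(auth_readout, enrolled_sig))
```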


This paper demonstrates a technique for generating a PUF from NAND flash memory such that the PUF is aging-resistant, by introducing tunable parameters. The authors of this paper were associated with The University of Alabama in Huntsville when it was published. The original paper can be found at this link.

The basic idea behind the PUF generation in this article is program disturb. However, this program disturb method differs from previous proposals in the sense that here a single flash memory page is stressed and analyzed (unlike previous implementations, where the disturbance in an adjacent page was observed).

Following are the steps involved for PUF generation:

These unstable bits are filtered out to generate a more accurate PUF. For this, the progression of the flash page from the completely erased to the programmed state is plotted at different stress levels (numbers of program operations). Based on observations from this plot, two threshold stress levels are identified: one at an early stage to identify stable 0 cells (PS0) and a second at a late stage to identify stable 1 cells (PS1). This gives an idea of the flash cells that attain the programmed state quickly (stable 0s) and the cells that resist the change and maintain their state the longest (stable 1s).

To generate a PUF of n bits, at least n/2 stable 0s are generated and the rest are stable 1s, thus removing the unstable flash cells that change their state between the two thresholds discussed above.

To obtain at least n/2 stable 0s, PS0 program operations are performed as discussed above. This flips at least n/2 bits to 0 and gives information about the early flippers, i.e. the stable 0s. Next, the page is stressed for PS1 program operations, ensuring that only n/2 erased bits are left; these are the stable 1s. The complete sequence (entire page) is compared after PS0 and after PS1 stresses: if the state of a flash cell is the same at both points, it contributes to the PUF, otherwise it is an unstable bit.
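A minimal sketch of the stable-cell selection just described, given the page contents read after PS0 and after PS1 program-disturb stresses: cells already 0 at PS0 are stable 0s, cells still 1 at PS1 are stable 1s, and everything that flips in between is discarded as unstable. Names are illustrative, and the n/2 balancing step is omitted.

```python
def select_stable_cells(page_after_ps0, page_after_ps1):
    """page_after_psX[i] is the value of cell i after PSX program-disturb stresses
    (1 = still erased, 0 = flipped to programmed). Returns PUF bit positions and values."""
    positions, bits = [], []
    for i, (b0, b1) in enumerate(zip(page_after_ps0, page_after_ps1)):
        if b0 == 0 and b1 == 0:          # flipped early: stable 0
            positions.append(i); bits.append(0)
        elif b0 == 1 and b1 == 1:        # still erased even after heavy stress: stable 1
            positions.append(i); bits.append(1)
        # cells with b0 == 1 and b1 == 0 flipped between PS0 and PS1 -> unstable, skipped
    return positions, bits

if __name__ == "__main__":
    after_ps0 = [1, 0, 1, 1, 0, 1, 1, 0]
    after_ps1 = [1, 0, 0, 1, 0, 0, 1, 0]
    print(select_stable_cells(after_ps0, after_ps1))
```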

For authentication, the number of stresses to be performed lies between PS0 and PS1 (PS0 < PSU < PS1).

With usage or aging, the rate at which bits flip from the erased to the programmed state under program disturb decreases, which might present an issue for the PUF. To counter this, PSU is adapted to the usage the page has endured:

PSUnew = PSU + k*npe, where k is a tuning constant and npe is the number of program/erase cycles the page has endured.





Sanitization of Flash Memory

The content here is derived from the proposal that was submitted to NSF by Dr Biswajit Ray and Dr Aleksandar Milenkovic.

The major idea is that instant deletion of data from memory has become extremely important to preserve the privacy of the user. According to the Data Protection Act (DPA) 2018, the deletion of information should be real and the data should not be recoverable in any way. However, current SSDs do not offer any permanent data deletion strategy.

NAND flash, or flash memory in general, follows an erase-before-write paradigm, which means that before any program operation an erase has to be performed. But the granularity of an erase (a block) is much larger than that of a program (a page); thus any modification of data is performed by copying the data, modifying it, and writing it to a new location. The old data is simply marked as invalid or is unlinked, while the original data content still remains. These pages are technically unreachable or unaddressable through the Flash Translation Layer (FTL).

For page-level deletion of data, Wei et al. introduced the concept of "scrubbing", which means writing 0s to all locations of the flash page. This essentially deletes the data from a page via a program operation; since a program operation is possible at page-level granularity, deletion from a page is possible. However, the "scrubbing" technique does not properly delete the data. There have been experimental demonstrations where data is recovered from a "scrubbed" flash memory by analyzing the physical properties of the scrubbed flash cells.


The content here is still derived from the same proposal.


The content here is derived from the paper that can be found at this link.

It is challenging to erase a file without a large performance penalty or reliability issues in modern NAND. Evanesco is a new technique for high-density 3D NAND flash memory. Evanesco, instead of physically destroying data, blocks access to the data. Two commands, pLock and bLock, are designed that block access to the page and the block, respectively, of the deleted data. These locked regions can only be accessed after an erase operation, and thus the claim is that strong security is guaranteed. Erase- or program-based data deletion techniques would rapidly reduce the quality of the flash memory.

Performance analysis was done on 160 3D TLC NAND memories on FlashBench with a flash model extended with their proposal. The benchmarks are workloads collected from enterprise servers and mobile systems.

In the new architecture, a read request to the locked location will always return all 0s.

The FTL is special embedded software that is employed in flash-based storage systems. The FTL writes in an append-only fashion, which means new data is stored in a new physical page, for performance reasons (avoiding long block erases). Thus a logical-to-physical mapping table (L2P) is maintained. To update data, the updated data is written to a new free page, the entry in the L2P table is updated to the new physical address, the state of the new physical page is changed from free to valid, and the old physical page's state is marked invalid.

When the system is about to run out of free pages, a Garbage Collector is invoked. It reclaims free pages by erasing victim blocks (blocks with invalid pages). If there are valid pages in a victim block that is to be erased, these pages have to be copied elsewhere first and remapped in the L2P.
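A minimal model of the append-only behavior just described, showing why an update or deletion leaves the stale copy physically intact: the L2P entry is redirected to a fresh page and the old page is merely marked invalid, waiting for garbage collection. This is an illustrative simulation, not any vendor's FTL.

```python
class TinyFTL:
    """Append-only page-mapped FTL: logical page -> physical page (L2P).
    Updates never overwrite in place; the old physical page is only marked invalid."""
    def __init__(self, num_pages):
        self.l2p = {}                           # logical page address -> physical page number
        self.state = ["free"] * num_pages       # "free" | "valid" | "invalid"
        self.data = [None] * num_pages          # stale data survives here until a block erase

    def write(self, lpa, payload):
        ppn = self.state.index("free")          # next free physical page
                                                # (no garbage collection in this sketch)
        self.data[ppn] = payload
        self.state[ppn] = "valid"
        if lpa in self.l2p:                     # update: invalidate the old copy only
            self.state[self.l2p[lpa]] = "invalid"
        self.l2p[lpa] = ppn

if __name__ == "__main__":
    ftl = TinyFTL(num_pages=4)
    ftl.write(lpa=0, payload="secret v1")
    ftl.write(lpa=0, payload="secret v2")       # out-of-place update
    print(ftl.l2p)                              # {0: 1}
    print(ftl.state)                            # ['invalid', 'valid', 'free', 'free']
    print(ftl.data[0])                          # "secret v1" still physically present
```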

The tendency to have multiple copies of a file while updating (or deleting) it is called the data versioning problem in this paper. An experiment is conducted to measure how many invalid versions of a file exist throughout the lifetime of the file. The tools used are VerTrace (an extension of IOPro) and FlashBench; all three of these tools were made by the authors. Essentially, the goal is to keep track of the number of valid and invalid pages of a file at any time in the flash memory. Three benchmark traces, Mobile, MailServer and DBServer, are used. The maximum storage emulated is 16 GiB. The ultimate goal of the evaluation is to find how many invalid versions of a file exist and for how long the invalid versions remain.

The observations show that files have a large number of invalid pages for a long time.

Existing techniques destroy data by changing the Vth of the flash cells. The scrubbing technique increases the Vth of all flash cells in a WL so that the Vth distributions of the different states are mixed together, making identification of the original data impossible.

But this technique is not efficient in MLC or TLC NAND, as there are multiple pages in a single wordline. Thus an efficient reprogramming-based sanitization technique is proposed for MLC NAND memory. This technique uses one-shot programming with a lowered voltage such that the content can be safely destroyed while the other pages in the same wordline are not impacted. Zero copy overhead is incurred, as no copy operation is needed, and sanitization of the LSB and MSB pages can be done independently. But there is always a chance of over-programming, where the shift of Vth is too large.


The content here is derived from the paper that can be found at this link.

Access to a page is controlled by an access-permission (AP) flag. There are two kinds of AP flags inside the flash chip, pAP (for a page) and bAP (for a block), controlled by the two commands pLock and bLock respectively. The command pLock<ppn> locks the physical page number ppn by setting its pAP flag to the disabled state. Similarly, bLock<pbn> blocks access to the physical block number pbn by setting its bAP flag to the disabled state. No unlock commands are present; unlocking happens automatically once the block is erased. Thus, once locked, the data is permanently inaccessible until the next erase cycle. The logic is implemented inside the flash chip itself.

Page Level Sanitization: For each page, along with the main data storage area, some spare storage space is available for flags, and the AP flags are stored in this location. For every access made to a page, the data copied to the cache or page buffer is sent to the output only if the pAP flag is not disabled. In MLC and TLC, multiple flags are needed in each WL; for individual programming of these flag cells, self-boost program inhibit (SBPI) is used, which allows flash cells in a single WL to be selectively programmed by choosing different voltage settings for different BLs.

Block Level Sanitization: If a large number of pages are to be sanitized, pLock becomes non-trivial and incurs overhead. A single block sanitization can sanitize a large number of pages at once. This only works in 3D NAND, as the bAP flag is implemented in the SSL (Source Select Line) of 3D NAND: 3D NAND uses a normal flash cell for the SSL (which allows the SSL to act as a WL) rather than a transistor.
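A minimal model of the access-permission gating described above: a read first checks the block-level bAP flag and then the page-level pAP flag and returns all 0s if either is disabled, and only a block erase re-enables access. The class and method names are illustrative, not the paper's interface.

```python
class LockableBlock:
    """Toy model of Evanesco-style access-permission flags on one flash block."""
    def __init__(self, pages_per_block, page_size=4):
        self.pages = [[1] * page_size for _ in range(pages_per_block)]  # 1 = erased
        self.pap = [True] * pages_per_block     # per-page access permission (pAP)
        self.bap = True                         # per-block access permission (bAP)

    def p_lock(self, ppn):
        self.pap[ppn] = False                   # pLock<ppn>: disable access to one page

    def b_lock(self):
        self.bap = False                        # bLock<pbn>: disable access to the whole block

    def read(self, ppn):
        if not self.bap or not self.pap[ppn]:
            return [0] * len(self.pages[ppn])   # locked data always reads as all 0s
        return list(self.pages[ppn])

    def erase(self):
        self.pages = [[1] * len(p) for p in self.pages]
        self.pap = [True] * len(self.pap)       # only an erase re-enables access
        self.bap = True

if __name__ == "__main__":
    blk = LockableBlock(pages_per_block=2)
    blk.pages[0] = [1, 0, 1, 1]                 # some programmed data
    blk.p_lock(0)
    print(blk.read(0))                          # [0, 0, 0, 0]: locked page unreadable until erase
```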


The original paper can be found at this link

This paper deals with the opposite of sanitization, as it discusses techniques to extract data from a NAND flash memory that has been acquired as part of an investigation.

When the chips are obtained intact in a circuit, a method called chip-off is used to remove the chip from the circuit using some heating technique.

Two major observations made in the experiments are:

This work explores a hardware-based approach: the fine-grained read reference voltage control mechanism implemented in modern NAND, called read retry.

Read retry can compensate for the charge leakage that occurs due to retention loss and thermal-based chip removal.

Citation number [7], a paper found at this link, discusses reverse engineering the content of NAND memory. The thesis at this link also discusses the same line of work.

Modern NAND flash has some form of read-retry method, i.e. dynamically adjusting the read reference voltages in a fine-grained manner.

The paper mentions that the flash controller tries the read operation with a number of reference voltages. However, the exact details of the read retry operation are not made public by the manufacturers, as noted in the paper.

RBER increases with retention time, but with the read-retry method of reading, the error decreases. However, read retry mode A only reduces the error at high PE cycle counts, and most of the time it is not beneficial as it is not controllable or visible.





SSD papers

Following is the list of papers studied about SSDs.


  • The original paper can be found at this link. This paper was presented at USENIX FAST 2020 and was awarded best paper.
  • This is a large-scale study focused on enterprise storage systems, conducted on 1.4 million SSDs of NetApp, which is a major storage vendor.
  • NetApp storage systems employ the WAFL file system and the Data ONTAP operating system, which uses software RAID to provide resiliency against drive failures.
  • Data over the network is serviced using file-based protocols such as NFS and CIFS/SMB or block-based protocols such as iSCSI.
  • NetApp systems in the field send weekly NetApp Active IQ bundles that track a very large set of system and device parameters. This study is based on mining this collection of NetApp Active IQ messages.
  • Different types of failures are categorized. The most severe of them, which prompts replacement of drives, is the SCSI error. These errors are due to ECC errors. The majority of other errors were recovered by RAID reconstruction.
  • Replacement rate: number of device failures divided by number of device years.
  • For the causes of errors or the factors impacting replacement rates:
    • Usage and Age: increasing failure rates over a long initial period of 12-15 months, followed by 6-12 months of decreasing failure rates before finally stabilizing, for 3D-TLC and eMLC (enterprise MLC) drives.
    • 3D-TLC failure rates are higher than those of other types, thus the replacement rate is higher. Also, 3D-TLC uses 10-15X more space for spare blocks.
    • Higher capacity drives have higher replacement rates and more severe failures too (unresponsive drives).
    • High-density drives have higher replacement rates.
    • The study emphasizes the importance of firmware updates.
    • Most of the systems use only 15% of the rated life of the device, thus there should be no concern in moving to QLC.



  • SRAM



    DRAM



    Security Primitives in SRAM and DRAM



    FRAM

    FRAM or FeRAM is a class of memory that achieves non-volatility with a structure similar to DRAM. The capacitor in a DRAM cell has a dielectric layer that needs periodic recharge to hold its charge, which makes DRAM a volatile memory; in FRAM this layer is replaced by a ferroelectric material. The presence of this ferroelectric material (typically Lead Zirconate Titanate, PZT) makes FRAM non-volatile (ref).


    Writing to FRAM is performed in a process similar to storing charge in a capacitor. A field is applied across the dielectric by charging the plates on either side of it. This causes the atoms inside the dielectric to take an "up" or "down" orientation, thus storing a '1' or a '0'. This change in polarity on application of an electric field produces a power-efficient binary switch (ref). The ferroelectric crystal and its orientation are not affected by magnetic fields.


    Destructive Read: Reading in FRAM is done through the transistor in the 1T-1C structure. The transistor forces the cell into a particular state. For example, let us assume the transistor forces the cell into the '0' state. If the cell was already in the '0' state, there is no change on the output lines. However, if the cell was in the '1' state, the reorientation of atoms in the dielectric causes a brief pulse of current on the output lines as the electrons are pushed out. Detection of this pulse indicates there was a '1' in the FRAM cell. This process causes the original value to be overwritten, thus the read is destructive and the value needs to be re-written.


    Ferroelectric Property: The dielectric, or ferroelectric, material has a crystal structure. The PZT crystal is organized in a perovskite structure where the Lead atoms are at the outermost layer with O atoms inside them. The Zr/Ti atom in the center forms the cation and has two equal low-energy states. On application of an electric field, the cation moves in the direction of the applied field. This causes the low-energy state to be aligned in the direction of the applied field, while the high-energy state is aligned in the opposite direction. The movement of the cation causes either "up" polarization or "down" polarization, thus creating the two states (ref).



    RRAM

    Resistive RAM or ReRAM works in a fashion similar to PCM (ref), i.e. by changing the resistance of a dielectric solid-state material. Here a metal oxide is sandwiched between two metal electrodes (ref). RRAM works by creating defects in the oxide layer to form a filament, or conductive path, by the application of a high voltage.


    The filament can be broken, resulting in a high-resistance state, and re-formed, resulting in a low-resistance state.



    MRAM

    MRAM stands for Magnetoresistive Random Access Memory. Unlike other conventional memory technologies, where information is stored in the form of the presence or absence of charge or of current flow, information in MRAM is stored in magnetic storage elements.

    Structurally, MRAM consists of MRAM cells just like any other memory. Each MRAM cell contains two magnetic plates that are separated by an insulating layer. These plates are ferromagnetic in nature, where one of the plates is a permanent magnet while the magnetization of the other can be changed under the influence of an externally applied field.

    Based on whether the magnetizations of the two plates are in the parallel or antiparallel direction, the resistance of the MRAM cell is different. Measuring this electrical resistance is the reading process. If the two plates have the same magnetic alignment, the logic state is said to be '1', while if the two plates have different magnetic alignments, the logic state is said to be '0'. (In the parallel orientation, the likelihood of electrons tunneling through the insulating layer is higher than in the antiparallel orientation. Thus, in the parallel orientation, the resistance offered is low and is read as 1, and vice versa.)

    The two layers or plates can be referred to as (i) free layer, and (ii) fixed layer.


    Magnetic Tunnel Junction

    A Magnetic Tunnel Junction is a structure that consists of two layers of magnetic metal separated by an insulating layer. Here, the insulating layer between the magnetic layers is very thin, which allows electrons to tunnel from one magnetic layer to the other when a bias voltage is applied between the electrodes.


    Tunnel Magnetoresistance

    The tunneling current can be changed based on the orientation of the magnetic plates relative to each other. This change in current based on the magnetic alignment of the plates is called Tunnel Magnetoresistance.


    Toggle MRAM

    The structure consists of two magnetic layers and an insulating layer in between. When a bias is applied, the electrons that are spin polarized by the magnetic layers tunnel across the dielectric.


    STT MRAM

    A newer technique is Spin Torque Transfer MRAM, which uses Spin Torque Transfer technology. Under this technology, an electric current that is unpolarized (an unpolarized current contains 50% of electrons in each spin orientation) is passed through the fixed layer. This causes the current to become spin-polarized, thus creating a spin-polarized current.

    When this spin-polarized current is directed into the free layer, angular momentum can be transferred to this layer, changing the orientation of the free layer. Here, the torque carried by the spin-polarized current is transferred to the free layer, creating a parallel orientation between the fixed and free layers. If it is required to change the orientation from parallel to antiparallel, the current direction is reversed. In this case, electrons are sent from the free layer to the fixed layer; the majority electrons pass into the fixed layer while a minority of electrons are reflected; the reflected electrons transfer their angular momentum to the free layer, thus changing its orientation to antiparallel.


    Spin Orbit MRAM

    SOT MRAM is a more recent kind of MRAM technology where an additional layer of heavy metal is attached to the free layer. When a current is passed through the heavy metal layer, a spin-polarized current is created in the direction perpendicular to the unpolarized current. This transfers its angular momentum to the free layer, thus performing the switching operation on the free layer (ref). The basic principle is that a charge current in a heavy metal creates a spin current in a transverse direction (the spin Hall effect) (ref). While this is still an under-development technology, researchers at ETH Zurich have reported write operations in the range of 100 picoseconds (ref) and (ref).


    Write Disturb in MRAM

    Writing to the memory is performed by changing the magnetic orientation of the plate whose magnetization can be changed. The conventional method of writing requires a substantial amount of current on the word and bit lines (the fields being perpendicular to each other). Another drawback of MRAM is that the induced field might affect multiple cells, thus limiting scaling below 100nm (ref). This problem, where the field overlaps multiple cells, causes write disturb in MRAM.




    PCM

    Phase Change Memories (PCM) utilize the unique property of some materials to remain in multiple states as the basis of information storage. Since the change induced in the state remains until the state is changed again, or reverted, the storage is non-volatile.


    The material commonly used is chalcogenide glass, which can stay in an amorphous as well as a crystalline solid state. In older generations of PRAM, the state of the chalcogenide glass was changed by producing heat through a heating element carrying current. This heating element is made of TiN and was used to change the state of the glass to the amorphous solid state by a quick application of heat (heated over 600 Celsius). Holding the material in the crystallization temperature range for some time (slower cooling) would switch the state back to the crystalline state. Alternative technologies for changing states using lasers have been proposed in research (ref).


    The amorphous state offers more resistance and thus represents a binary 0, while the crystalline state represents a binary 1.


    The material used in PCM can exhibit multiple intermediate states between the amorphous and crystalline solid states. This allows multiple bits to be stored in a single PCM cell. Intel and ST Microelectronics are currently working on a design with four states, two of which are partially crystalline states in addition to the previous two states (ref).


    PRAMs are highly temperature sensitive as data can be lost with application of high temperature. However, they offer fast write times and have an endurance of about 100 million write cycles.



    Optane

    Following are the papers studied:





    1. Access with Low Request Scale:
      • Optane is claimed to be up to 1000x faster than SSDs. But is it?
      • Optane SSD users should issue small requests and maintain a small number of outstanding IOs.
      • This is needed to extract low latency, but also to exploit the full bandwidth of the Optane SSD.
      • The variables of analysis are (i) request size and (ii) queue depth. Analyses of reads and writes are performed.
      • For read operations, large request sizes and large queue depths do not work better with the Optane SSD.
      • For write operations, the two are almost comparable in all cases, while Optane is still poorer for large request sizes and large queue depths.
      • Optane internally uses a RAID-like organization of memory dies.
      • The interleaving degrees (numbers of channels) of the Optane and the flash SSD are examined through experiments (ref 18 and 19) and found to be 7 and 128 respectively. The figure of 7 channels is also found in the hardware description (ref 3).
      • This shows Optane has limited internal parallelism.
      • The limited parallelism is one reason that Optane performs better with a small queue depth.
    2. Random Access is OK
      • In SSDs and HDDs, better performance is seen with sequential access than with random access. But Optane is a random access block device.
      • Experiments were performed on both SSD-based and Optane-based systems to verify this.
      • The flash SSD performs better on sequential access, while Optane has comparable performance for read operations.
      • For smaller request sizes, Optane actually favors random writes over sequential writes. Similarly, the flash SSD favors sequential writes only for small request sizes, while in the other cases they are comparable.
      • Optane prefers random access because of the ability to perform in-place updates in 3D XPoint memory. In Optane, there is no difference in address translation cost for random versus sequential workloads.
    3. Avoided Crowded Accesses
      • The client should not issue parallel accesses to a single chunk in an Optane-based system, because the Optane SSD contains shared resources.
      • Experiments were performed to study the performance when issuing parallel requests for different sectors within the same chunk.
      • Latency increases with increasing queue depth.
    4. Control Overall Load
      • For optimal latency on the Optane SSD, the client must control the overall load of both reads and writes.
      • This observation is derived from the performance of Optane serving mixed reads and writes. In the experiment, random 4KB requests are issued, varying the percentage of writes from 0% to 100%, with QD = 64 (large enough to achieve full throughput for both the Optane SSD and the Flash SSD).
      • In Optane, reads and writes are treated equally. The latency plots overlap for different ratios of writes to total accesses.
      • Latency is not a function of the write/read mix, but depends on other factors of the overall load.
      • For the flash SSD, write operations increase the latency.
    5. Avoid Tiny Accesses
      • The client must not issue requests smaller than 4KB.
      • Latency might be the same for small requests, but for maximizing throughput it is better to issue 4KB requests.
    6. Issue 4KB Aligned Requests
      • For the best latency, requests should be aligned to 8 sectors, that is, 4KB.
      • In the experiment, the latency of individual read requests is measured (QD = 1); each read is issued to a position A + offset, where A is a random position aligned to 32KB and offset is a 512-byte sector within that 32KB (a small sketch of this access pattern follows this list).
      • A periodic latency pattern is observed, which shows that Optane favors aligned requests.
      • The best case is when requests are aligned to 8 sectors.
    7. Forget Garbage Collection
      • There is no need to worry about garbage collection on Optane.
      • In flash SSDs, after the device is full, any further writes are slow because they trigger garbage collection. But the write latency for Optane is sustained, which shows that garbage collection has no cost.
      • The Optane SSD has LBA-based mapping. I have to study this myself later on.
      • Flash SSDs use a log-structured layout, thus the best throughput pattern occurs when we read according to the written order.
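A small sketch of the access pattern used in the alignment experiment of item 6: each read goes to A + offset, where A is a random 32KB-aligned position and offset is one of the 64 sectors inside that 32KB window; offsets that are multiples of 8 sectors are the 4KB-aligned ones expected to see the best latency. This only generates the offsets, it does not issue real I/O, and the device size is an assumed value.

```python
import random

SECTOR = 512
WINDOW = 32 * 1024            # 32KB alignment window used in the experiment
DEVICE_SIZE = 16 * 1024**3    # assumed device size, for the sketch only

def aligned_read_positions(n, seed=0):
    """Yield (byte_offset, is_4k_aligned) pairs mimicking the experiment's pattern."""
    rng = random.Random(seed)
    for _ in range(n):
        a = rng.randrange(0, DEVICE_SIZE, WINDOW)    # random 32KB-aligned base A
        sector = rng.randrange(WINDOW // SECTOR)     # 512-byte sector offset within the window
        off = a + sector * SECTOR
        yield off, (off % 4096 == 0)                 # multiples of 8 sectors are 4KB aligned

if __name__ == "__main__":
    hits = sum(aligned for _, aligned in aligned_read_positions(10000))
    print(f"{hits / 10000:.1%} of random sector offsets happen to be 4KB aligned")
```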




    Study of File Systems

    Following is the list of papers studied about file systems.









    Magnetic Disks



    Others

    Following are the papers studied: