Storage - System Overview

Figure: Simplified structure.

Introduction

We need to store 500-700 GB of data on tape and HDDs each day, so we need a well-designed storage pipeline.

This section has two parts: using the tape, and the storage structure.

Utilizing the Tape

Loading a Tape Cartridge

Before a tape can be read, it must be loaded into the tape drive. This is easily done through the MSL2024's web management interface.

Figure: Web management interface of the MSL2024.

Data Storage on Tape

Before storing data on tape, I had to figure out how the tape system works and how much data each tape can hold. I connected the tape drive to compression server A with an LC-LC fiber cable, and connected the library management interface with an RJ45 cable.

Following the instructions on HP's website, I downloaded HPE Library and Tape Tools, HPE LTFS Cartridge Browser, and HPE LTFS Configuration Tool. LTFS stands for Linear Tape File System, which is required to store data on the tape. Because tape is written sequentially, this file system cannot reclaim the space of deleted data. When a file is removed, it disappears from the file system, but the space it occupied is not released. The only way to reclaim the space is to reformat the entire tape.

Figure: Tape management tools.
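The space accounting described above can be illustrated with a toy model. This is only a sketch of the behavior (deleting hides a file but frees nothing; formatting frees everything). The class `AppendOnlyVolume` and its methods are hypothetical names made up for illustration and are not part of LTFS or any HPE tool.

```python
class AppendOnlyVolume:
    """Toy model of space accounting on an append-only medium (not the LTFS API)."""

    def __init__(self, capacity_gb: float) -> None:
        self.capacity_gb = capacity_gb
        self.used_gb = 0.0                   # space consumed on the medium
        self.files: dict[str, float] = {}    # visible files -> size in GB

    def write(self, name: str, size_gb: float) -> None:
        # New data is always appended, so it always consumes fresh space.
        if self.used_gb + size_gb > self.capacity_gb:
            raise OSError("volume full")
        self.used_gb += size_gb
        self.files[name] = size_gb

    def delete(self, name: str) -> None:
        # The file disappears from the index, but used space is unchanged.
        del self.files[name]

    def format(self) -> None:
        # Reformatting is the only operation that releases space.
        self.files.clear()
        self.used_gb = 0.0

    @property
    def free_gb(self) -> float:
        return self.capacity_gb - self.used_gb


vol = AppendOnlyVolume(capacity_gb=2000)
vol.write("batch_1.tar", 600)
vol.delete("batch_1.tar")
print(vol.files, vol.free_gb)  # {} 1400.0
```

In this model, the tape looks empty after the delete, yet 600 GB remain unavailable until `format()` is called.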

Tape Capacity

The tapes we use are LTO-6, marketed with a capacity of 6.25 TB per tape. However, that figure assumes LTO's built-in compression at a 2.5:1 ratio; the native (uncompressed) capacity is 2.5 TB. Since our data is already compressed, we should expect to store close to the native capacity, not 6 TB, per tape.

Even though the OS reports that each tape can hold 2.29 TB, this figure is less accurate than the one reported for an HDD. In quick tests, each tape held about 2.10-2.25 TB of data.

Figure: Capacity of the tape as shown in the OS.
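The relationship between the marketed and native figures is simple arithmetic, sketched below. It assumes the LTO-6 specification values of 2.5 TB native capacity and a 2.5:1 marketed compression ratio; the function name `effective_capacity_tb` is made up for illustration.

```python
NATIVE_TB = 2.5        # LTO-6 native (uncompressed) capacity, decimal TB
MARKETED_RATIO = 2.5   # compression ratio assumed by the 6.25 TB figure


def effective_capacity_tb(compression_ratio: float) -> float:
    """Amount of source data that fits on one tape at a given compression ratio."""
    return NATIVE_TB * compression_ratio


print(effective_capacity_tb(MARKETED_RATIO))  # 6.25
print(effective_capacity_tb(1.0))             # 2.5 (already-compressed data)
```

A ratio of 1.0 models data that the drive cannot compress any further, which is the case we expect for our files.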
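Combining the measured capacity with the daily volume from the introduction gives a rough idea of how many daily batches fit on one tape. The sketch below uses decimal units (1 TB = 1000 GB) and assumes a batch is never split across tapes; the function name is hypothetical.

```python
def full_batches_per_tape(tape_gb: int, daily_gb: int) -> int:
    """Whole daily batches that fit on one tape, without splitting a batch."""
    return tape_gb // daily_gb


# Worst case: smallest measured capacity, largest daily volume.
worst = full_batches_per_tape(2100, 700)
# Best case: largest measured capacity, smallest daily volume.
best = full_batches_per_tape(2250, 500)
print(worst, best)  # 3 4
```

Under these assumptions, a tape lasts roughly three to four days before it has to be swapped.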

Disabling Compression

During the test, I found the data transfer speed to be very unstable. It ranged from 160 MB/s down to 0 KB/s, which is surprising for a linear file system without a cache. I guessed the cause might be compression, so I turned it off. However, the problem persisted with compression disabled. The amount of data that fits on a tape did not change either, so I decided to leave compression turned off.

Figure: Disabling compression.

Copying Tools

Since compression was not the cause, I suspected Windows Explorer. Tape is a rare storage medium today, and Windows does a lot of "smart tricks" under the hood, so it may well interact poorly with tape. I therefore tried FastCopy, which is generally a better tool than Windows Explorer for large-scale data transfers.

The result, however, was even worse. FastCopy was not designed for tape systems, and mechanisms such as parallel transfers can severely degrade performance on a linear file system. Since the amount of data that Windows File Explorer can copy to tape each day meets our needs, I decided to set this problem aside for now.
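If the problem is revisited, one option would be a copy routine that is strictly sequential: one file at a time, in a fixed order, through a single large buffer. The following is a minimal sketch of that idea, not something that was tested against the tape. The function name, the 64 MiB buffer size, and the example paths in the comment are all assumptions, and whether this avoids the stalls on LTFS is unknown.

```python
import shutil
from pathlib import Path

CHUNK_BYTES = 64 * 1024 * 1024  # assumed buffer size, not a tuned value


def copy_sequentially(src_dir: Path, dst_dir: Path) -> int:
    """Copy files one at a time, in sorted order; return the bytes copied."""
    total = 0
    # Sorting gives a deterministic order; nothing runs in parallel.
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        with src.open("rb") as fin, dst.open("wb") as fout:
            shutil.copyfileobj(fin, fout, CHUNK_BYTES)
        total += src.stat().st_size
    return total


# Hypothetical usage; both paths are placeholders:
# copy_sequentially(Path("D:/batch"), Path("T:/batch"))
```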

Structure

Since copying files to tape takes hours, we have to allow an additional day for each batch of data before it can be processed.
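A rough estimate shows why the copy takes hours. The sketch below computes the copy time for the largest daily batch (700 GB) at a few sustained average rates in decimal units. Only 160 MB/s comes from the measurements above, where it was the peak rate; 80 and 40 MB/s are hypothetical averages chosen to illustrate the effect of the unstable speed.

```python
def copy_hours(batch_gb: float, avg_mb_per_s: float) -> float:
    """Hours needed to copy a batch at a sustained average rate (1 GB = 1000 MB)."""
    return batch_gb * 1000 / avg_mb_per_s / 3600


# 160 MB/s is the observed peak; the lower rates are illustrative assumptions.
for rate in (160, 80, 40):
    print(f"{rate} MB/s: {copy_hours(700, rate):.1f} h")
```

Even at the peak rate, a full batch needs more than an hour, and the time doubles each time the average rate halves.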

Storage Structure
