Managing terabytes of data is not an easy task, so we had to develop a series of scripts to automate most of the process.
This section discusses the basic design principles we followed while developing the management scripts.
Since I'm not the only one who manages the data, the scripts need to be user-friendly. That is, each script has to be runnable on its own by a regular user.
Modularize and Parameterize
We may need to modify the configuration in the future (and we have already changed the compression parameter once). Therefore, I decided to make the scripts as configurable as possible and to avoid hard-coded values.
In addition, most reusable functions should be separated into individual files.
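As a rough illustration of the parameterization idea, the sketch below exposes settings as command-line options with defaults instead of hard-coding them. It is not taken from the actual scripts: the function name `parse_settings`, the option names, and the default values are all assumptions made for this example.

```python
import argparse


def parse_settings(argv=None):
    """Parse script settings from the command line (hypothetical options)."""
    parser = argparse.ArgumentParser(
        description="Illustrative data-management script with configurable settings."
    )
    # Assumed option: the directory that holds the data to process.
    parser.add_argument("--input-dir", default="./data",
                        help="directory containing the files to process")
    # Assumed option: the compression parameter that may change in the future.
    parser.add_argument("--compression-level", type=int, default=6,
                        help="compression level passed to the compressor")
    # Assumed option: how many tasks to run at the same time.
    parser.add_argument("--workers", type=int, default=4,
                        help="number of parallel workers")
    # argv=None makes argparse read sys.argv[1:]; a list can be passed for testing.
    return parser.parse_args(argv)
```

A script would call `parse_settings()` with no arguments to read its own command line, so changing the compression level becomes a matter of passing `--compression-level 9` rather than editing the code. Keeping a helper like this in its own file also lets several scripts share the same options.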
When possible, multithreading should be used. Tasks like extracting the download links of 50 accessions at the same time benefit greatly from parallel processing. The servers we used generally have over 40 logical cores and can run 50 processes at the same time in most cases.
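The following is a minimal sketch of how a batch of accessions might be processed in parallel with a thread pool. It is only an illustration under stated assumptions: `fetch_download_link` is a hypothetical stand-in for the real lookup, and the `example.org` URL is a placeholder, not the actual data source.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_download_link(accession):
    """Hypothetical placeholder for the real link lookup of one accession."""
    # A real implementation would query the data repository here.
    return f"https://example.org/download/{accession}"


def fetch_all_links(accessions, max_workers=50):
    """Look up the download links of many accessions concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input accessions.
        links = list(pool.map(fetch_download_link, accessions))
    return dict(zip(accessions, links))
```

Threads suit this kind of task because link extraction mostly waits on the network. For CPU-heavy steps such as compression, separate processes would likely make better use of the server's logical cores.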
Navigate through the Script Development Section
- Design Principle
- Major Scripts
- Recovery from Missing Files
- Human Interaction Improvement and Auxiliary Scripts