Bioinformatics

DNA

I'm interested in bioinformatics and have carried out a major project for a company I cannot name.

We compare patients' RNA sequencing data against our own data and public data from the NCBI and ENA databases to produce medical suggestions for patients, especially those with less common diseases.

Due to security concerns, I can only provide a very small amount of information here. I'm willing to provide more details if you are interested.


Hardware

  • 3 * DELL R740 each with dual Intel Xeon Platinum 8269CY
  • 3 * DELL R630 each with dual Intel Xeon E5-2678V3
  • 3 * HP DL380 G9 each with dual Intel Xeon E5-2678V3
  • 1 * HP MSL2024 LTO6 Tape Library
Rack B computing cluster

Project Pipeline

1. Crawl the JSON metadata of all required projects from the ENA website.
2. Extract the download links.
3. Work around download restrictions with proxy servers and automated scripts (aria2c and Aspera).
4. Run integrity checks and send the data to analysis.
5. Collect results and handle errors.
6. Store the results and the original data.
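
To make the link extraction and integrity check more concrete, here is a minimal Python sketch. It is not the project's actual code. It assumes the crawled JSON looks like an ENA file report, with `fastq_ftp` and `fastq_md5` fields that hold semicolon-separated values; the sample record, host name, and function names are hypothetical.

```python
import hashlib
import json

# Hypothetical sample shaped like an ENA file report (assumed field names).
SAMPLE_REPORT = """
[
  {"run_accession": "RUN0001",
   "fastq_ftp": "ftp.example.org/data/RUN0001_1.fastq.gz;ftp.example.org/data/RUN0001_2.fastq.gz",
   "fastq_md5": "0123456789abcdef0123456789abcdef;fedcba9876543210fedcba9876543210"},
  {"run_accession": "RUN0002", "fastq_ftp": "", "fastq_md5": ""}
]
"""


def extract_links(report_json):
    """Return (run_accession, url, md5) tuples for every file in the report."""
    links = []
    for record in json.loads(report_json):
        # Runs without files have empty fields, so drop empty strings.
        urls = [u for u in record.get("fastq_ftp", "").split(";") if u]
        md5s = [m for m in record.get("fastq_md5", "").split(";") if m]
        for url, md5 in zip(urls, md5s):
            # Assumption: links may come without a scheme, so add one.
            if "://" not in url:
                url = "ftp://" + url
            links.append((record["run_accession"], url, md5))
    return links


def md5_matches(path, expected_md5, chunk_size=1 << 20):
    """Hash a downloaded file in chunks and compare it with the expected MD5."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

Reading the file in chunks keeps memory use flat even for multi-gigabyte FASTQ files. A file that fails `md5_matches` could be queued for another download before anything is sent to analysis.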

project pipeline

Code Snippets

download script snippet 1
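
As an illustration of the download step, here is a minimal Python sketch of how extracted links might be handed to aria2c. It is not the project's actual script. It assumes aria2c's input-file format (a URI line followed by indented per-download options) and its `--input-file`, `--continue`, `--max-connection-per-server`, and `--all-proxy` options; the function names, directory, and proxy address are hypothetical.

```python
def build_aria2_input(links, out_dir="downloads"):
    """Build aria2c input-file text from (url, md5) pairs.

    Each URI line is followed by indented options that apply to that download.
    """
    lines = []
    for url, md5 in links:
        lines.append(url)
        lines.append(f"  dir={out_dir}")
        # Ask aria2c to verify the file against the published MD5.
        lines.append(f"  checksum=md5={md5}")
    return "\n".join(lines) + "\n"


def build_aria2_command(input_file, proxy=None, connections=4):
    """Return the aria2c argument list; the caller decides how to run it."""
    cmd = [
        "aria2c",
        f"--input-file={input_file}",
        "--continue=true",  # resume partially downloaded files
        f"--max-connection-per-server={connections}",
    ]
    if proxy:
        # Hypothetical proxy address supplied by the caller.
        cmd.append(f"--all-proxy={proxy}")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the input text has been written to `input_file`. Building the command as a list avoids shell quoting problems with URLs.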