The Genomics industry has grown astronomically in the past decade, and it will only keep growing as demand for personalized medicine increases and Direct-to-Consumer DNA testing continues to flourish. On top of that, the cost of having your entire genome sequenced has dropped significantly in recent years. Thanks to this huge increase in testing, we now have a colossal amount of genomic data: the public archives for raw sequencing data have been doubling in size every 18 months! Also keep in mind that a single whole-genome sequence produces approximately 200 gigabytes of raw data, and analyzing that data creates additional gigabytes of output and requires even more computing power. That brings us to the point of this write-up: how and where do we securely store this highly sensitive data, which also happens to require a lot of computing power to analyze? This is where Cloud Computing comes into the picture…
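To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It takes the two figures quoted above at face value: archives doubling every 18 months and roughly 200 GB of raw data per genome. The genome counts are made-up examples, and the sketch ignores the extra data produced during analysis.

```python
# Back-of-the-envelope storage arithmetic for the figures quoted in the text.
# Assumptions (illustrative, not measured): public archives double every
# 18 months, and each whole genome produces roughly 200 GB of raw data.

RAW_GB_PER_GENOME = 200   # approximate raw data per whole genome, in GB
DOUBLING_MONTHS = 18      # doubling period for public sequencing archives


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Return how many times larger an archive becomes after `months`."""
    return 2 ** (months / doubling_months)


def raw_storage_tb(genomes: int, gb_per_genome: float = RAW_GB_PER_GENOME) -> float:
    """Raw storage for `genomes` whole genomes, in terabytes (1 TB = 1,000 GB)."""
    return genomes * gb_per_genome / 1000


if __name__ == "__main__":
    print(f"Growth over 10 years: ~{growth_factor(120):.0f}x")            # ~102x
    print(f"1,000 genomes: {raw_storage_tb(1000):,.0f} TB of raw data")     # 200 TB
    print(f"100,000 genomes: {raw_storage_tb(100_000):,.0f} TB of raw data")  # 20,000 TB
```

Doubling every 18 months compounds quickly. Over ten years that is 120 / 18 ≈ 6.7 doublings, or roughly a hundredfold increase, before any analysis output is counted.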
The Cloud Service Models:
Infrastructure-as-a-Service (IaaS)
Primary business driver is large-scale raw compute & storage
Primary threats include lack of due diligence on tenant isolation
Platform-as-a-Service (PaaS)
Primary business driver is developing and deploying applications
Primary threats include poor or absent DevSecOps
Software-as-a-Service (SaaS)
Primary business driver is consuming a specific service
Primary threats include insufficiently granular data access controls
The Cloud Deployment Models:
Private
Major business driver is low risk tolerance & heavy regulation
Top concern: highest operating costs
Public
Major business driver is lower cost of ownership
Top concern: security governance
Community
Major business driver is knowledge access & sharing
Top concern: granular control over community sharing
Hybrid
Major business driver is diverse service needs
Top concern: proper federation controls
Cloud Computing offers many other benefits, including flexibility, capital cost control, access to skilled staff, and environmental benefits. Let’s look in more detail at the ways Genomics specifically could benefit from Cloud Computing.
Elasticity, in particular, would be a great advantage for the world of Genomics. It means a researcher can use as many computers as needed to finish an analysis, which makes the research process much quicker. It also lets multiple researchers contribute to the same research project and share data effortlessly. Another perk of elasticity is that users rent resources and pay only for what they actually use. Cloud resources are rented in virtual slices called ‘instances.’ Providers advertise a menu of instance types with their capabilities listed: amount of disk space, processor speed, amount of memory, etc.
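The elasticity and pay-per-use ideas can be illustrated with a toy Python model. Everything concrete in it is a hypothetical placeholder, not any real provider's catalogue or pricing. That includes the instance names, their vCPU/memory/disk sizes, and the hourly prices. The model also assumes the analysis is perfectly parallel and that each instance is billed per started hour.

```python
# A toy model of elastic, pay-per-use cloud computing.
# The instance names, sizes and hourly prices below are hypothetical
# placeholders, not any provider's real catalogue or pricing.
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int            # stands in for "processor speed" on a provider's menu
    memory_gb: int
    disk_gb: int
    usd_per_hour: float


# Hypothetical "menu" of instance types, like the ones providers advertise.
MENU = [
    InstanceType("small", vcpus=4, memory_gb=16, disk_gb=100, usd_per_hour=0.20),
    InstanceType("large", vcpus=32, memory_gb=128, disk_gb=1000, usd_per_hour=1.60),
]


def elastic_run(core_hours: float, instance: InstanceType, count: int) -> tuple[float, float]:
    """Return (wall-clock hours, cost in USD) for spreading a perfectly
    parallel job of `core_hours` across `count` identical instances.

    Billing is simplified to "per started hour" on each instance."""
    if count <= 0:
        raise ValueError("count must be positive")
    wall_hours = core_hours / (instance.vcpus * count)
    billed_hours = math.ceil(wall_hours) * count   # you pay only while instances run
    return wall_hours, billed_hours * instance.usd_per_hour


if __name__ == "__main__":
    large = MENU[1]
    for count in (1, 10, 100):
        hours, cost = elastic_run(3200, large, count)
        print(f"{count:>3} x {large.name}: {hours:g} h wall-clock, ${cost:,.2f}")
```

In this idealized model, a 3,200 core-hour job costs $160.00 whether it runs on 1, 10 or 100 "large" instances. The wall-clock time, however, drops from 100 hours to 1 hour. Real genomic pipelines are rarely perfectly parallel, and billing granularity differs between providers, so actual savings will vary.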
A container is a standard unit of software that packages up code and all of its dependencies so the application runs quickly and reliably from one computing environment to another. Because many containers are made freely available for download, they save a lot of time, especially for a researcher who could otherwise spend upwards of a year developing a genomic analysis pipeline.
Even with all the positives associated with moving to The Cloud, security concerns are still holding back organizations in the Genomics industry from making the switch. In fact, many of them still rely on HPC (High Performance Computing) instead of the far more storage-friendly and convenient option of cloud computing. My main area of focus with regard to The Cloud is security and data privacy. I go into depth on why protecting genetic data is so important in these previous blogs: Consumer Genetic Testing & Privacy Concerns Part 1 and Part 2. Another topic worth touching on when discussing Genomics and data protection is de-identification and the reality of re-identification. I go into depth on this topic in this blog.
Some Top Threats to Cloud Computing, according to the CSA (Cloud Security Alliance):
Data breaches
Misconfiguration & inadequate change control
Lack of cloud security architecture & strategy
Account hijacking
Insufficient identity, credential, access & key management
One of the cloud's benefits is also one of its biggest security concerns: broad network access. The same accessibility that lets users reach their data and resources from anywhere also makes them more accessible to malicious actors.
Something else to consider: Genomic data is mostly analyzed in research or healthcare environments. On average, healthcare institutions spend only about 2-3% of their budget on information technology…and, more specifically, only 0.19% of their overall budget on security! For comparison, financial institutions spend over ten times that (33%).
Cloud Computing will only keep growing in popularity, and in the process it will continue to change how organizations manage computational resources, as well as how conveniently scientists can collaborate on research.
In the next part of this series, we will go into more depth about security concerns and dive into the laws, regulations, and policies that apply to Genomics in The Cloud (e.g., dbGaP, HIPAA, GINA).