In 2020, data integrity is more important than ever to the scientific community. So why are labs still using spreadsheets to record their sample inventory?
Well, spreadsheets such as Microsoft Excel are quick to set up and simple to update, and most people have experience of using them. Why should they look any further?
The obvious answer is just one word: errors.
There is a longer, equally significant answer: a spreadsheet is not a database.
Let us look closer:
Excel disasters affect the world
In 2005, Ray Panko, the world expert on spreadsheet errors, stated that “94% of the 88 spreadsheets audited in 7 studies have contained errors”. Spreadsheet errors have cost firms worldwide many millions of dollars – and the repercussions are not only financial.
A widely reported case in October 2020 caused UK Covid-19 results to be lost, delaying the contact-tracing process and putting lives at risk. The BBC reported that “a badly thought-out use of Microsoft's Excel software was the reason nearly 16,000 coronavirus cases went unreported in England”. The older file format used for processing could not hold the number of records needed, and a lack of basic data controls meant the problem wasn’t spotted.
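A truncation of this kind can be caught with a very basic data control: count what you received and compare it with what you stored. The sketch below is illustrative only and is not the process used in the case above. The function names are hypothetical, and the 65,536-row figure is the per-sheet limit of the legacy .xls format.

```python
# Per-sheet row limit of the legacy .xls file format.
XLS_MAX_ROWS = 65_536


class TruncationError(Exception):
    """Raised when records would be, or have been, silently dropped."""


def check_fits_legacy_sheet(records, header_rows=1, max_rows=XLS_MAX_ROWS):
    """Return the number of rows needed, or raise if the sheet cannot hold them.

    Hypothetical helper: call it before writing records to a single sheet.
    """
    needed = len(records) + header_rows
    if needed > max_rows:
        raise TruncationError(
            f"{needed} rows needed but only {max_rows} available; "
            f"{needed - max_rows} would be lost"
        )
    return needed


def verify_import(source_count, loaded_count):
    """Basic data control: the rows loaded must equal the rows received."""
    if source_count != loaded_count:
        raise TruncationError(
            f"received {source_count} records but loaded {loaded_count}"
        )
```

Neither check is sophisticated, which is the point: a failure that stops the process loudly is far cheaper than records that vanish quietly.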
A different issue with less drastic results was reported by The Verge, where “some 27 human genes have been renamed, all because Microsoft Excel kept misreading their symbols as dates”. Excel offers no option to turn off this default conversion.
This problem is widespread. In fact, in 2016, Genome Biology reported that “approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions”.
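One way to reduce this risk is to screen a gene list for date-like symbols before it goes anywhere near a spreadsheet. The following is a minimal sketch under stated assumptions: the month list and pattern are our own approximation, not Excel's actual parsing rules, and the function name `date_like_symbols` is hypothetical.

```python
import re

# Month names and abbreviations a spreadsheet might read as the start of a date.
# This list is an assumption for illustration, not Excel's real rule set.
_MONTHS = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month token, an optional hyphen, then one or two digits, e.g. SEPT1 or MARCH1.
_DATE_LIKE = re.compile(
    r"^(?:%s)-?\d{1,2}$" % "|".join(_MONTHS), re.IGNORECASE
)


def date_like_symbols(symbols):
    """Return the symbols that look like dates, preserving input order."""
    return [s for s in symbols if _DATE_LIKE.match(s.strip())]
```

Calling `date_like_symbols(["SEPT1", "TP53", "MARCH1"])` flags `SEPT1` and `MARCH1` and leaves `TP53` alone. Flagged symbols can then be imported as text, or kept out of the spreadsheet altogether.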
The examples given above are down to methodology and programming failures, but many errors are simple data entry ones.
While knowledgeable Excel users can avoid some problems, not all users are experts. Mistakes are easy to introduce because people are prone to error, and people are also very bad at spotting when they’ve made a mistake.
In 2016, Panko concluded that “spreadsheet developers and corporations are highly overconfident in the accuracy of their spreadsheets”.
Spreadsheet errors have become such a concern that the European Spreadsheet Risks Interest Group (EuSpRIG) was formed to offer independent information on Spreadsheet Risk Management. Its website (http://www.eusprig.org) documents ‘horror stories’ and ways they could have been avoided.
Succeeding without Excel-ing
Setting up a spreadsheet might be simple, but keeping on top of your sample management is not!
Sample managers need to know many things, including:
- Is there an audit trail of every action and location update on that sample?
- Has someone updated the data and, if so, who?
- Can sample lineage be accurately tracked?
- Are the remaining amounts recalculated as substances are aliquoted?
- Is it easy to capture additional information relating to a sample?
- Can I query the data to generate reports and metrics?
- Can I quickly find samples matching certain criteria?
- Can I communicate with lab automation to drive workflows and track processes?
- Can I scale the system as processes change or instruments are upgraded?
To answer these questions, you need a database. A spreadsheet tabulates data and can create graphs from it. A database is designed for the long-term storage of records that will be subject to changes. It defines and structures data and includes sophisticated summarization and reporting tools, so you can easily search across criteria and capture additional information.
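To make the difference concrete, here is a minimal sketch of how a database can answer several of the questions above: who changed what, how much is left after aliquoting, and where a sample came from. It uses Python's built-in `sqlite3` module. The table names, column names and helper functions are hypothetical and chosen for illustration; this is not how any particular sample management product is implemented.

```python
import sqlite3

# Hypothetical schema: every sample may point at the parent it was taken from,
# and every action is written to a separate audit table.
SCHEMA = """
CREATE TABLE samples (
    id        INTEGER PRIMARY KEY,
    barcode   TEXT NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES samples(id),
    location  TEXT NOT NULL,
    volume_ul REAL NOT NULL
);
CREATE TABLE audit_log (
    id        INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES samples(id),
    username  TEXT NOT NULL,
    action    TEXT NOT NULL,
    logged_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_inventory():
    """Create an empty in-memory inventory for demonstration purposes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def _log(conn, sample_id, user, action):
    conn.execute(
        "INSERT INTO audit_log (sample_id, username, action) VALUES (?, ?, ?)",
        (sample_id, user, action),
    )


def register(conn, user, barcode, location, volume_ul, parent_id=None):
    """Add a sample, record who added it, and return its id."""
    cur = conn.execute(
        "INSERT INTO samples (barcode, parent_id, location, volume_ul) "
        "VALUES (?, ?, ?, ?)",
        (barcode, parent_id, location, volume_ul),
    )
    _log(conn, cur.lastrowid, user, f"registered at {location} with {volume_ul} uL")
    return cur.lastrowid


def aliquot(conn, user, parent_id, child_barcode, location, volume_ul):
    """Create a child sample and deduct its volume from the parent."""
    row = conn.execute(
        "SELECT volume_ul FROM samples WHERE id = ?", (parent_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no sample with id {parent_id}")
    if volume_ul > row[0]:
        raise ValueError("not enough volume remaining in the parent sample")
    conn.execute(
        "UPDATE samples SET volume_ul = volume_ul - ? WHERE id = ?",
        (volume_ul, parent_id),
    )
    _log(conn, parent_id, user, f"aliquoted {volume_ul} uL into {child_barcode}")
    return register(conn, user, child_barcode, location, volume_ul, parent_id)


def lineage(conn, sample_id):
    """Return barcodes from the given sample back to its original source."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, barcode, parent_id, depth) AS (
            SELECT id, barcode, parent_id, 0 FROM samples WHERE id = ?
            UNION ALL
            SELECT s.id, s.barcode, s.parent_id, c.depth + 1
            FROM samples s JOIN chain c ON s.id = c.parent_id
        )
        SELECT barcode FROM chain ORDER BY depth
        """,
        (sample_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

With a connection from `open_inventory()`, registering a 500 µL stock and calling `aliquot` for 50 µL leaves 450 µL on the parent, and three rows in `audit_log` show who did what. In a spreadsheet, each of those steps relies on someone remembering to update the right cells.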
A database also brings data integrity. Relational databases follow standardised integrity rules to ensure that the data they contain are accurate and accessible and human error is minimised. They also make it easier to track and manage user access and updates.
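As a small illustration of integrity rules at work, the sketch below declares a few constraints and lets the database refuse bad entries. It uses Python's built-in `sqlite3` module. The tables and the `try_insert` helper are hypothetical, and a production system would define many more rules than these.

```python
import sqlite3


def open_store():
    """Create a tiny in-memory store with integrity rules declared up front."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE freezers (name TEXT PRIMARY KEY);
        CREATE TABLE samples (
            barcode   TEXT PRIMARY KEY,                          -- no duplicates
            freezer   TEXT NOT NULL REFERENCES freezers(name),   -- must exist
            volume_ul REAL NOT NULL CHECK (volume_ul >= 0)       -- no negatives
        );
        INSERT INTO freezers VALUES ('Freezer A');
        """
    )
    return conn


def try_insert(conn, barcode, freezer, volume_ul):
    """Return None on success, or the database's rejection message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO samples VALUES (?, ?, ?)",
                (barcode, freezer, volume_ul),
            )
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None
```

Here a duplicate barcode, a freezer that does not exist, a negative volume and a missing volume are all rejected at the point of entry. A spreadsheet cell would accept every one of them without complaint.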
Finally, spreadsheets such as Microsoft Excel are not designed for regulated environments – but the biopharma industry is a regulated one. The chief requirements to meet for electronic records are the EU’s Annex 11 and the FDA’s 21 CFR Part 11. It is possible to meet these requirements with a spreadsheet, but the process is time-consuming. Spreadsheet errors are commonly cited in FDA warning letters.
Several companies offer database-based software designed specifically to manage and track samples used in life sciences, which aims to address all of these issues and provide the exact information that sample managers need to know. These sample management software or LIMS solutions are scalable and range from cloud-based subscription services to bespoke tools for multi-site management, with everything in between. Some solutions have been on the market for decades and are trusted by biopharma companies, contract research organisations and academia worldwide.
For instance, Titian’s Mosaic sample management software can log and track any sample, stored in any container, in any type of store, from collection to disposal, with a complete audit trail.
- Mosaic FreezerManagement is affordable software for managing and tracking all types of sample inventory in your freezers, and it goes far beyond other inventory management systems.
- Mosaic SampleBank is the ideal solution for tracking and ordering from centralised sample banks as it offers extended automation integration and workflow management.
If you are researching sample management, try out our Resources pages where there are white papers, application notes, videos, webinars and more.
R Panko, Spreadsheet Errors, What We Know, What We Think We Can Do, 2005
BBC website, Why using Microsoft's tool caused Covid-19 results to be lost
The Verge, Scientists rename human genes to stop Microsoft Excel from misreading them as dates
Ziemann et al, Gene name errors are widespread in the scientific literature, Genome Biology, 2016
R Panko, What We Don't Know About Spreadsheet Errors Today, Cornell University, 2016