Headline: How to stop wasting so much time on bioprocess data gathering, preparation, and analysis

Companies are losing big money by neglecting the data management process

The 2016 CrowdFlower report on data scientists reveals that they spend around 20% of their working time collecting data and around 60% cleaning and organizing it. That is around 80% of the work week spent on data gathering and preparation, or 4 days out of 5! A case study published by pharmaceuticalonline.com confirms that something should be done: the authors report that data input, capture and search time can be slashed by 75% when a structured electronic data management system is used.

Over the last few months, we interviewed key bioprocess players such as operators, scientists, and supervisors. They all agreed that they, their teammates, and their employees easily spend 25-50% of their time, on average, on data manipulation, calculation, and analysis. That is 1-2 days a week. Every week! For every one of them!

The process of data input, capture, cleaning, organizing and analyzing is costly, and it is still largely performed manually in our industry. If we want to increase the efficiency and the profitability of bioprocess development and biomanufacturing, to sustain growth and to enable new production processes, we have to develop and implement solutions that automate the flow of data treatment and data analysis, from its source up to the final point of use: actionable performance indicators. The sooner we automate data management and data analysis, the easier it will be and the larger the return on investment will be.

Traditional tools are incomplete and unfit

Different tools have been developed or adapted in an attempt to solve, or to anticipate, the problem of time-consuming data management and analysis. It is surprising how many teams use Microsoft Excel as their “database” and as THE tool for process data analysis. Excel is a highly flexible tool for custom calculations and for “quick and dirty” analysis, with plenty of pre-configured functions and graph templates. However, everything has to be “programmed”, “customized” or “implemented” in Excel. Every piece of information has to be entered into a cell. Every formula has to be known and written as well. Then, every graph has to be configured. Only after all this effort does one get access to some valuable performance indicators.

In addition to the time spent customizing worksheets, two problems result from using Excel as a database and as a tool for analyzing fermentation or cell culture data. First, a lot of time is lost creating a new worksheet or a new workbook (usually from a custom template) for each and every new batch/experiment/production, just to launch the calculations with the new dataset. Second, the user ends up with a tremendous number of worksheets and workbooks as their “database” for process data. This “database” is not integrated, so a lot of time is usually wasted finding information from past batches and comparing results across batches/experiments/productions.

One way to accelerate the calculation of indicators is to use Excel macros. They are an easy and capable way to automate some data cleaning and calculations. However, macros cannot replace the entire data management process, from capture at every source (e.g., probe, HPLC) up to the final point of use, a real-time and actionable performance indicator. Moreover, these macros first have to be developed, sometimes adapted to new situations, and launched for every new fermentation/culture/dataset. Some teammates have to be continuously dedicated to running, adapting and maintaining them. It is NOT a profitable long-term solution, nor is it scalable.

Another attempt to solve the problem is to use an Electronic Lab Notebook (also known as an electronic laboratory notebook, or ELN), combined with a Laboratory Information Management System (LIMS) and a Scientific Data Management System (SDMS). ELNs are software designed to replace paper laboratory notebooks and are generally used by scientists, engineers, and technicians to document research, experiments, and procedures performed in a laboratory. ELNs accelerate the process of searching and retrieving pieces of information since everything is stored in a structured database. Even though some ELNs can be linked to data sources for direct incorporation of data from instruments, they are NOT designed to run real-time calculations and to generate analyzed, structured information.

Some companies use a combination of different software programs to try to fulfill their needs. It is not rare to see an installation using the Distributed Control Unit (DCU) from a bioreactor maker for real-time monitoring of some signals; Microsoft Excel as the database gathering information from the DCU, offline analyzers, manual test results, and more; a generic analysis program (usually a statistical analysis program) for offline or delayed analysis to support operational decisions; and a specialized graphics program to prepare the dataset and the analysis results for reporting and presentations. It is a costly solution: multiple software programs have to be bought, updates for all of them have to be acquired at some point, and a lot of time is wasted by users transferring information from program to program. Moreover, the user has to go through this data management process over and over again, for each and every new batch/experiment/production, even with the use of templates.

Another problem with this approach lies in the knowledge and training users need to operate generic analysis programs such as a statistical analysis package. These programs are well developed and can run a multitude of different analyses on the same input dataset. The user has to know exactly which functions will generate valid results in a specific situation and how to interpret those results accurately. Such statistical analysis programs are difficult for untrained users to use, if not completely worthless to them, for the specific purpose of getting access to simple and actionable performance indicators. For such elaborate software, the learning curve can be very steep and long. It can be a strong obstacle to sharing information and making it “understandable” to every user.

Some of the biggest teams and companies have dedicated personnel to develop “home-made”/custom software programs, or they hire a firm to develop one. The biggest advantage of this option is having software modeled on the exact and precise needs of the organization. However, a custom software program is a costly option, since the organization needs dedicated personnel to develop and maintain the software, make it evolve, and integrate new requirements from the users. Besides, a custom software program is normally not based on a common standard, making it incompatible with other software programs in the industry and difficult to scale.

New solutions have to fit operations’ needs and be scalable

New solutions have to be developed keeping in mind that the “industrial biotechnologies” industry (the Biotech Industry, in the sense of bioprocess development and biomanufacturing) is at the beginning of an exponential economic growth phase. Through the combination of synthetic biology with well-known bioprocesses, we already see more and more applications reaching commercial production. Consequently, there will be more and more new companies and new users requiring new solutions for data management and data analysis. As in any new market sector where competition is increasing, users want to accelerate the development of new bioprocesses, as well as the implementation and start-up of these new applications.

The new data management and analysis solutions have to be developed with the typical user, their needs and their daily routine at the center of the process. This is an obvious concept for software development in almost every sphere of application but, in my opinion, it is not the case for most scientific software and process control applications.

For efficient bioprocess operations, each and every step of the daily tasks performed by team members needs to be interconnected and seamlessly integrated. A good starting point is to make operations’ raw information, as well as high-level performance indicators, accessible in real time and at any time afterward, to anyone involved, from anywhere. That being said, data security and safety have to be central throughout the development of the new solutions.

To maximize performance and efficiency, a new tool has to meet the needs of modern bioprocess users. Optimizing the final performance of a bioprocess requires being firm about the result (product quality and yield) while remaining flexible in the means/operations used to reach it. Accordingly, continuous optimization of the bioprocess final performance will be possible if users have real-time (RT-) access to Significant Performance Indicators (SPIs). RT-SPIs result from Intelligent Automated Analysis (IAA) of bioprocess raw data, based on information available from both current operations and previous/historical batches (or time windows for continuous bioprocesses). Thus, IAA requires the automated gathering of bioprocess information and the automation of the analysis, triggered as soon as a new piece of information is available from any source. Consequently, the optimization of bioprocess final performance requires information to flow automatically from its source (e.g., inline probe, offline HPLC, manual test) up to the final point of use: SPIs.
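
As an illustration of this flow, here is a minimal sketch in Python of an in-memory, event-triggered analysis. Every name in it (DataPoint, SPIEngine, current_titer, and so on) is a hypothetical assumption made for the example and does not refer to any existing product or library API.

    # Minimal sketch of an automated flow from data sources to SPIs.
    # All names are illustrative assumptions, not an actual product API.
    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import Callable, Dict, List


    @dataclass
    class DataPoint:
        batch_id: str
        source: str        # e.g. "inline_probe", "offline_HPLC", "manual_test"
        variable: str      # e.g. "biomass", "glucose", "product"
        value: float
        timestamp: datetime


    @dataclass
    class SPIEngine:
        """Gathers every new piece of information and re-runs the analysis immediately."""
        calculations: List[Callable[[List[DataPoint]], Dict[str, float]]]
        history: List[DataPoint] = field(default_factory=list)

        def ingest(self, point: DataPoint) -> Dict[str, float]:
            # The analysis is triggered as soon as a new piece of information arrives,
            # using both the new point and the historical dataset.
            self.history.append(point)
            spis: Dict[str, float] = {}
            for calc in self.calculations:
                spis.update(calc(self.history))
            return spis  # real-time Significant Performance Indicators


    def current_titer(history: List[DataPoint]) -> Dict[str, float]:
        # Illustrative SPI: most recent product concentration reported by any source.
        products = [p.value for p in history if p.variable == "product"]
        return {"titer_g_per_L": products[-1]} if products else {}


    engine = SPIEngine(calculations=[current_titer])
    engine.ingest(DataPoint("B-042", "inline_probe", "biomass", 6.1, datetime.now()))
    print(engine.ingest(DataPoint("B-042", "offline_HPLC", "product", 2.4, datetime.now())))
    # -> {'titer_g_per_L': 2.4}

The point is not this particular code but the pattern: every new measurement, whatever its source, lands in the same structure and immediately triggers the same analysis.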

Therefore, the new solutions required to bring the Biotech Industry’s operations efficiency to the next level must possess specific characteristics. Regarding the Automation of Data Capture, the new solutions should enable:

  • Automated combination, into a single application, of signal monitoring from bioreactor probes and from downstream units;
  • Automated monitoring of inline and at-line analyzers;
  • Automated retrieval of analysis results from offline analyzers such as HPLC, GC-MS, etc.;
  • Reduction of the data manipulations required when performing manual analyses of samples.

The dataset generated by all these sources of information has to be easily accessible to the users and to the automated analysis routines. Thus, information has to be stored in a Structured and Searchable Database (a minimal sketch of such a schema follows the list below):

  • Automated gathering of every piece of information into a standardized database to facilitate further analysis;
  • Embedded search and reference tools to accelerate access to past/historical information.
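
As a rough illustration of what “structured and searchable” can mean in practice, here is a minimal sketch using Python’s built-in sqlite3 module; the table and column names are assumptions made for the example, not a proposed standard.

    # Minimal sketch of a standardized, searchable store for bioprocess data.
    # The schema (table and column names) is an assumption made for illustration only.
    import sqlite3

    conn = sqlite3.connect(":memory:")  # a file or server database would be used in practice
    conn.execute("""
        CREATE TABLE measurements (
            batch_id  TEXT NOT NULL,
            source    TEXT NOT NULL,   -- inline_probe, offline_HPLC, manual_test, ...
            variable  TEXT NOT NULL,   -- biomass, glucose, product, pH, ...
            value     REAL NOT NULL,
            timestamp TEXT NOT NULL
        )
    """)

    # Automated gathering: every capture routine writes to the same structure.
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?, ?, ?)",
        [
            ("B-041", "offline_HPLC", "glucose", 12.5, "2017-10-02T08:00:00"),
            ("B-042", "offline_HPLC", "glucose", 11.8, "2017-10-09T08:00:00"),
        ],
    )

    # Embedded search: finding historical information becomes a one-line query
    # instead of a hunt through dozens of workbooks.
    rows = conn.execute(
        "SELECT batch_id, value, timestamp FROM measurements "
        "WHERE variable = ? ORDER BY timestamp",
        ("glucose",),
    ).fetchall()
    print(rows)

Once every piece of information sits in one schema like this, comparing past and current batches stops being a manual search exercise.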

Having real-time access to the entire set of information stored in a structured database will enable the implementation of Automated Data Analysis (sketched right after the list below):

  • Automated data preparation;
  • Embedded, pre-configured, “easy-to-use” algorithms for the calculations performed by the majority of biotech scientists, operators and supervisors;
  • Automatically triggered “pipelines of calculations” for new batches/experiments/productions.
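
To make the idea of an automatically triggered “pipeline of calculations” concrete, here is a minimal sketch of a calculation recipe applied step by step to each new batch; the step names and the data layout are assumptions for illustration only.

    # Minimal sketch of a "calculation recipe" applied to every new batch.
    # Step names and data layout are illustrative assumptions only.
    from typing import Callable, Dict, List

    Dataset = Dict[str, List[float]]   # variable name -> time series
    Step = Callable[[Dataset], Dataset]


    def drop_failed_readings(data: Dataset) -> Dataset:
        # Data preparation: discard the sentinel value left by a failed reading.
        return {name: [v for v in series if v >= 0.0] for name, series in data.items()}


    def add_consumed_glucose(data: Dataset) -> Dataset:
        # Pre-configured calculation: cumulative substrate consumption.
        glucose = data["glucose"]
        data["glucose_consumed"] = [glucose[0] - g for g in glucose]
        return data


    RECIPE: List[Step] = [drop_failed_readings, add_consumed_glucose]


    def run_recipe(data: Dataset, recipe: List[Step] = RECIPE) -> Dataset:
        # The same recipe is triggered for every new batch/experiment/production.
        for step in recipe:
            data = step(data)
        return data


    batch = {"glucose": [20.0, 14.2, -1.0, 6.3]}   # -1.0 marks a failed reading
    print([round(v, 2) for v in run_recipe(batch)["glucose_consumed"]])  # -> [0.0, 5.8, 13.7]

Adding a new calculation then means appending one more step to the recipe, not rebuilding a workbook.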

Automated data analysis should be advanced enough to generate simple, actionable and practical Significant Performance Indicators:

  • “Easy to understand”, NOT requiring any special scientific or financial training to support enlightened decisions;
  • Rich enough to capture the whole culture behavior through yields and rates, in addition to buffer and medium composition (illustrated in the sketch after this list);
  • Intuitive visualization tools, for a quick and easy learning curve;
  • Meaningful, revealing dashboards that show what matters at first sight.
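
As an example of the “yields and rates” behind such indicators, here is a minimal sketch of two classic culture calculations, the apparent biomass yield on substrate and the specific growth rate; the sample values are made up for illustration.

    # Minimal sketch of two classic culture indicators; sample values are made up.
    import math


    def biomass_yield_on_substrate(x0: float, x1: float, s0: float, s1: float) -> float:
        # Apparent yield Y_X/S = (X1 - X0) / (S0 - S1), in g biomass per g substrate.
        return (x1 - x0) / (s0 - s1)


    def specific_growth_rate(x0: float, x1: float, t0: float, t1: float) -> float:
        # mu = ln(X1 / X0) / (t1 - t0), in 1/h, assuming exponential growth over the interval.
        return math.log(x1 / x0) / (t1 - t0)


    # Made-up sample points: biomass and glucose in g/L at t = 4 h and t = 8 h.
    print(round(biomass_yield_on_substrate(x0=1.2, x1=4.8, s0=18.0, s1=9.0), 2))  # 0.4
    print(round(specific_growth_rate(x0=1.2, x1=4.8, t0=4.0, t1=8.0), 3))         # 0.347

Presented as a trend on a dashboard, such numbers are readable at a glance without any statistical training.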

Finally, the new solutions cannot be relevant for just one lab, one bioreactor or one facility. They have to be developed to meet the needs for Scalability:

  • Facilitating the addition of new calculations and of new “calculation recipes” or “calculation pipelines”;
  • Facilitating the addition of completely new types of analysis, for the tool to evolve with the industry and the technologies to come;
  • Enabling the addition of new pieces of equipment such as probes, sensors, and instruments;
  • Enabling the integration of new bioreactors, production lines, facilities, etc.;
  • Being flexible enough for the average user to modify existing dashboards and to add new visual tools;
  • Enabling seamless sharing of information with teammates;
  • Enabling the integration of other software and systems to facilitate the flow of information and to accelerate data processing;
  • Having access to frequent updates to avoid obsolescence;
  • Being robust and reliable enough to avoid requiring dedicated personnel for programming and maintenance;
  • Being an all-inclusive system that provides “peace of mind” at every scale.

Some companies are developing solutions that encompass more and more of these characteristics. However, each of them carries its own historical paradigms, which sometimes makes it difficult to adjust to the current needs of the industry. BioIntelligence Technologies Inc. is focusing its efforts on developing, maintaining and offering integrated systems that meet these needs of the growing and very interesting Biotech Industry.

Written by
Joel Sirois, P.Eng. PhD

October 2017

Copyright 2024 - BioIntelligence Technologies inc. - All rights reserved.