Biotechnology is one of the most promising technologies for replacing energy-intensive or environmentally polluting processes with sustainable alternatives. Yet the enormous complexity of biological systems makes it difficult to apply this technology reliably on an industrial scale. The production of raw materials or pharmaceuticals is typically optimized in high-throughput processes in which thousands of steps are carried out in parallel. The issue with these automated systems is that each is specialized for a single biotechnological process.
In this context, the goal of the KIWI-biolab is to develop self-learning robotic systems able to independently determine the optimal experimental conditions for certain biotechnological processes using large-scale database search. Assisted by machine learning methods, the system should also be able to improve these processes and develop new solutions. In addition, the system must be able to recognize image data, such as microscopic images or other optical signals, to autonomously monitor the production process and optimize it.
The development of such a system requires close interdisciplinary cooperation across diverse research fields, including computer science, engineering and biotechnology. The KIWI-biolab brings together international experts, collaborators and partners from industry, whose contributions are organized into four task forces: Active Learning, Model Predictive Control, Signal Processing, and Automation.
TF1: Machine Learning for Bioprocess Forecasting and Control and Improved Enzyme Screening
This joint Task Force of TU Berlin and the University of Hildesheim is involved in several machine learning tasks within the KIWI-biolab, building on the experience of the ISMLL Hildesheim in time series forecasting, supervised learning for heterogeneous data and meta learning on the one hand, and the experience of the bioprocess group at TU Berlin with mechanistic modelling of bioprocesses on the other hand.
Task Force 1 is developing forecasting models for bioprocesses, given culture conditions, feeding profiles and metadata of processes and organisms. Traditionally, these processes have been modelled by estimating the parameters of biologically and biochemically motivated mechanistic models. We propose to use domain-agnostic forecasting models alongside hybrid models that incorporate mechanistic components and mass balances.
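As a rough illustration of what a mechanistic component with mass balances can look like, the sketch below integrates a fed-batch model with Monod growth kinetics. Everything in it is an assumption made for illustration and not taken from the project: the Monod kinetics, the parameter values (`mu_max`, `k_s`, `y_xs`), the explicit Euler integration, and the `correction` hook, which stands in for a learned term in a hybrid model.

```python
def monod_rate(s, mu_max=0.5, k_s=0.1):
    """Specific growth rate (1/h) from the Monod equation (assumed kinetics)."""
    return mu_max * s / (k_s + s)


def simulate_fed_batch(x0, s0, v0, feed_rate, s_feed, hours, dt=0.01,
                       y_xs=0.5, correction=lambda x, s: 0.0):
    """Integrate biomass X (g/L), substrate S (g/L) and volume V (L).

    Mass balances for a fed-batch reactor, explicit Euler integration.
    `correction` is a hypothetical placeholder for a data-driven term
    added to the mechanistic growth rate; it defaults to zero.
    """
    x, s, v = x0, s0, v0
    n_steps = int(round(hours / dt))
    for _ in range(n_steps):
        mu = monod_rate(s) + correction(x, s)
        dilution = feed_rate / v              # dilution by the feed (1/h)
        dx = mu * x - dilution * x            # biomass balance
        ds = -mu * x / y_xs + dilution * (s_feed - s)  # substrate balance
        x += dt * dx
        s = max(s + dt * ds, 0.0)             # substrate cannot go negative
        v += dt * feed_rate                   # volume balance
    return x, s, v


if __name__ == "__main__":
    print(simulate_fed_batch(0.1, 10.0, 1.0, feed_rate=0.1, s_feed=100.0, hours=2.0))
```

Replacing `correction` with a model fitted to residuals of past experiments is one simple way to combine the two model families; a purely domain-agnostic forecaster would instead learn the whole trajectory from data.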
An important goal of the Task Force is to model the forecasting problem across experiments and organisms, in order to transfer information from old experiments to new ones, thus decreasing the number of experiments required for the characterization of a new process. This can be seen as an instance of (Bayesian) meta learning.
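The idea of transferring information from old experiments can be illustrated with a deliberately simple Bayesian example. In this sketch, parameter estimates from earlier experiments define a Gaussian prior, and a few measurements from a new experiment update it through the conjugate normal rule. The Gaussian assumptions, the known noise variance and all numbers are illustrative only; the meta-learning models developed in the project are not described in this text and are likely far richer.

```python
from statistics import mean, variance


def prior_from_history(estimates):
    """Gaussian prior (mean, variance) built from earlier parameter estimates."""
    return mean(estimates), variance(estimates)


def posterior(prior_mean, prior_var, observations, noise_var):
    """Conjugate normal update for a parameter observed with known Gaussian noise."""
    n = len(observations)
    if n == 0:
        return prior_mean, prior_var
    precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


if __name__ == "__main__":
    # Hypothetical growth-rate estimates (1/h) from five earlier experiments.
    history = [0.42, 0.50, 0.46, 0.54, 0.48]
    m0, v0 = prior_from_history(history)
    # A single noisy estimate from the new process already shifts the belief.
    print(posterior(m0, v0, [0.60], noise_var=0.002))
```

Because the prior already carries information, the posterior variance shrinks faster than it would from the new data alone, which is the mechanism behind needing fewer experiments for a new process.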
TF2: Model Predictive Control for High Throughput Bioprocess Development
Bioprocess development is accelerating rapidly, driven by the integration of advanced liquid handling stations with embedded parallel mini-bioreactors into modern laboratories. Nevertheless, the complexity of these parallel processes demands model-based methods for process design, optimization, scale-up, and scale-down to exploit their full capacity.
Even though some applications have been reported in parallel mini-bioreactors (Abt 2008), further development is needed to achieve fully autonomous operation of up to 48 parallel dynamic experiments in a process-wide design and optimization scheme. This Task Force will deliver the backbone for online State and Parameter Estimation (SPE), nonlinear Model Predictive Control (nMPC), and online sustainable process optimization.
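The receding-horizon principle behind nMPC can be sketched in a few lines. The example below is a toy under stated assumptions, not the project's controller: the constant-volume Monod model, its parameters, the squared-error cost on the substrate setpoint, and the exhaustive search over a small grid of candidate feed rates (in place of a nonlinear optimizer) are all chosen for brevity.

```python
def step(x, s, feed, dt=0.1, mu_max=0.5, k_s=0.1, y_xs=0.5):
    """One Euler step of a toy constant-volume model; feed in g substrate/(L h)."""
    mu = mu_max * s / (k_s + s)
    x_next = x + dt * mu * x
    s_next = max(s + dt * (feed - mu * x / y_xs), 0.0)
    return x_next, s_next


def predicted_cost(x, s, feed, setpoint, horizon):
    """Sum of squared substrate deviations if `feed` is held over the horizon."""
    cost = 0.0
    for _ in range(horizon):
        x, s = step(x, s, feed)
        cost += (s - setpoint) ** 2
    return cost


def mpc_move(x, s, setpoint, candidates, horizon=10):
    """Pick the candidate feed rate with the lowest predicted cost."""
    return min(candidates, key=lambda f: predicted_cost(x, s, f, setpoint, horizon))


def closed_loop(x, s, setpoint, candidates, n_steps):
    """Receding horizon: re-optimize at every step, apply only the first move."""
    history = []
    for _ in range(n_steps):
        feed = mpc_move(x, s, setpoint, candidates)
        x, s = step(x, s, feed)
        history.append((x, s, feed))
    return history


if __name__ == "__main__":
    grid = [0.5 * i for i in range(11)]
    for x, s, feed in closed_loop(1.0, 1.0, 0.2, grid, 20):
        print(f"X={x:.3f}  S={s:.3f}  feed={feed:.1f}")
```

In a real setting the model state would come from the online state and parameter estimator, and each of the parallel reactors would run its own instance of this loop.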
The state and parameter estimation challenges will be tackled using robust moving horizon estimation (MHE) for complex nonlinear systems with event-based simulations, appropriate estimation methods (e.g. maximum likelihood) with statistical error models, and advanced regularization techniques for handling large parameter spaces.
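To make the estimation idea concrete, the sketch below estimates a growth rate by maximum likelihood on a sliding window of the most recent measurements. It assumes exponential growth X(t) = X0 * exp(mu * t) and independent Gaussian noise on ln X, in which case the maximum likelihood estimate reduces to ordinary least squares. It only illustrates the windowing idea: a full MHE would also include a dynamic model, constraints and an arrival cost, none of which appear here.

```python
import math


def ml_growth_rate(times, biomass):
    """ML estimate of (mu, ln X0) for X(t) = X0 * exp(mu * t).

    Assumes i.i.d. Gaussian noise on ln X, so the estimate is the
    ordinary least-squares fit of a line to (t, ln X).
    """
    n = len(times)
    y = [math.log(b) for b in biomass]
    t_mean = sum(times) / n
    y_mean = sum(y) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_ty = sum((t - t_mean) * (v - y_mean) for t, v in zip(times, y))
    mu = s_ty / s_tt
    return mu, y_mean - mu * t_mean


def moving_horizon_estimates(times, biomass, window):
    """Re-estimate mu on the most recent `window` samples as data arrive."""
    estimates = []
    for end in range(window, len(times) + 1):
        mu, _ = ml_growth_rate(times[end - window:end], biomass[end - window:end])
        estimates.append(mu)
    return estimates


if __name__ == "__main__":
    t = [float(i) for i in range(10)]
    x = [0.1 * math.exp(0.3 * ti) for ti in t]   # synthetic, noise-free data
    print(moving_horizon_estimates(t, x, window=4))
```

Restricting the fit to a recent window lets the estimate follow parameters that drift during a cultivation, at the price of higher variance for short windows.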
TF3: Advanced Signal Processing
The goal of Task Force 3 is to investigate how signal processing and visual computing methods can support biomolecular processes through fast online analysis of the large amounts of data generated in parallel platforms. A machine learning layer for data analysis and smart sensor development will select the analysis methods to be used, calibrate them, and assess their implementation for online operation. The different sensors, probes, enzymatic analyses and microscopic images deliver noisy data. Tools that process the data and extract valuable information are essential to maximize the knowledge gained from each experiment.
To this end, the following methods will be implemented:
- Blind Signal Separation (BSS) for complex spectral data with high background noise and low-concentration metabolites (mL-μL) in the cultivations.
- Spectral unmixing by (Bayesian) non-negative matrix factorization and matrix completion, where spectra of components may be partially known.
- Supervised learning of relevant properties of spectra/images from annotated data using deep learning.
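For the spectral unmixing item above, the following sketch covers the simplest case, in which the component spectra are fully known and only their non-negative abundances must be estimated. It uses the standard multiplicative update from non-negative matrix factorization with the spectra held fixed. The toy spectra, the iteration count and the function name `unmix` are assumptions for illustration; the Bayesian and matrix-completion variants mentioned in the list are not shown.

```python
def unmix(spectrum, components, n_iter=500, eps=1e-12):
    """Estimate non-negative abundances a with sum_k a[k] * components[k] ~ spectrum.

    Multiplicative NMF update a <- a * (W^T v) / (W^T W a), where the
    columns of W are the (assumed known) component spectra.
    """
    k = len(components)
    a = [1.0] * k
    # Precompute W^T v and W^T W once; they do not change between iterations.
    wtv = [sum(c * v for c, v in zip(comp, spectrum)) for comp in components]
    wtw = [[sum(p * q for p, q in zip(ci, cj)) for cj in components]
           for ci in components]
    for _ in range(n_iter):
        # All entries are updated simultaneously from the previous iterate.
        a = [a[i] * wtv[i] / (sum(wtw[i][j] * a[j] for j in range(k)) + eps)
             for i in range(k)]
    return a


if __name__ == "__main__":
    # Two hypothetical, partially overlapping component spectra.
    c1 = [1.0, 2.0, 1.0, 0.0, 0.0]
    c2 = [0.0, 0.0, 1.0, 2.0, 1.0]
    mixture = [2.0 * p + 0.5 * q for p, q in zip(c1, c2)]
    print(unmix(mixture, [c1, c2]))
```

The update keeps the abundances non-negative by construction, which is why it suits concentrations; when the spectra are only partially known, the same kind of update can be applied alternately to the spectra and the abundances.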
TF4: Robot Operation and Control
Our team is responsible for the operation and control of the high-throughput laboratories. This includes generating the data that feed the models to be developed in Task Forces 1 and 2, and experimentally implementing and applying their artificial intelligence-based approaches as well as the analytical strategies developed in Task Force 3.
The robotic facilities of the KIWI-biolab must be able to autonomously perform the experiments designed by higher-level applications and programs and thereby develop a robust and efficient process. The optimization strategies proposed by the machine learning algorithms represent a major challenge for the control of the robotic facility. Fast and effective rescheduling of resources and analysis methods is required to ensure robust operation under changing strategies.
The Model Predictive Control (MPC) strategy will be directly embedded in the dynamic machine learning algorithms developed in Task Forces 1 and 2, building an advanced active learning framework. Convolutional neural network algorithms based on strictly continuously differentiable functions will be developed to describe the complex dynamics of the biological processes.