Diagnosing Cancer Through Machine Learning
Imagine going to your GP for a screening test. Within minutes the test tells you whether or not you will get cancer in the near future. Would you really want to have access to such information? The fact is, it is already there in our bodies — you just have to decode it. This is the task of Dr. Kosmas Kepesidis, a physicist and data scientist who has recently joined the Broadband Infrared Diagnostics (BIRD) team at the Laboratory for Attosecond Physics. His name is fitting — ‘Kosmas’ derives from the Greek for ‘cosmos’ or ‘world’ and that is exactly what Kosmas studies: the microcosm of molecules in our blood. He does this with the help of algorithms, in other words, numbers.
The scientists on the BIRD team are developing a medical diagnostics tool to detect cancer based on the analysis of infrared light waves. These are emitted when ultrashort laser pulses excite molecules in the blood. The resulting spectra contain fingerprint-like information about the blood’s molecular make-up and thus the state of the patient’s health.
The problem is that, unlike the abstract models used in physics, biological systems are highly complex. Thousands of data points are collected and no one quite knows what to look for. Who even has the time to sift through them? Kosmas is therefore developing software that uses machine learning algorithms to carry out predictive modelling. In other words, he uses advanced computational methods to predict outcomes, such as whether a given molecular fingerprint is an indicator of early-stage cancer.
First, thousands of samples are collected from patients with and without cancer, which yields two massive mounds of data. Eventually, the goal is to create further stacks to differentiate between distinct types of cancer. These mounds of data are then pre-processed. For instance, decisions have to be made about which patterns constitute ‘noise’ and can be ignored. Next, Kosmas performs a so-called ‘dimensionality reduction’, i.e. he ‘zooms in’ on those features of the data that are relevant. The third stage is the search for a model: which algorithm is best suited to cancer diagnostics? Kosmas hopes to use artificial ‘neural networks’, algorithms loosely modelled on biological nervous systems. Rather than following explicitly programmed rules like conventional computer programs, such algorithms learn abstract, high-level patterns from the data. For this to work, Kosmas needs lots of data, which are currently being collected in hospitals around the world. Finally, once a model is found, Kosmas will subject it to rigorous testing.
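For readers who like to see such a workflow spelled out, the stages described above (collect, pre-process, reduce dimensions, fit a model, test) can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the BIRD team's software: the ‘spectra’ are synthetic random numbers with a made-up bump for the ‘cancer’ class, all function names are hypothetical, and a simple threshold classifier stands in for the neural networks Kosmas hopes to use. Only the standard library is used so that the sketch runs anywhere; real work would rely on a dedicated machine learning library.

```python
import math
import random


def make_spectrum(rng, has_signal, n_points=40):
    """Synthetic stand-in for a spectrum: noise, plus a made-up bump if has_signal."""
    spectrum = []
    for i in range(n_points):
        value = rng.gauss(0.0, 1.0)
        if has_signal and 15 <= i < 25:
            value += 1.5  # invented class difference, for illustration only
        spectrum.append(value)
    return spectrum


def standardise(train, data):
    """Pre-processing: centre and scale every feature using training statistics."""
    n, d = len(train), len(train[0])
    means = [sum(row[j] for row in train) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in train) / n) or 1.0
            for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in data]


def first_component(rows, iterations=100):
    """Dimensionality reduction: leading principal direction via power iteration."""
    d = len(rows[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        # w = X^T (X v), proportional to (covariance matrix) * v for centred rows
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def run_pipeline(seed=0, n_per_class=150):
    """Run all stages on synthetic data and return the held-out accuracy."""
    rng = random.Random(seed)
    # 1. Collect: label 1 = 'cancer', label 0 = 'control' (both synthetic).
    samples = [(make_spectrum(rng, label == 1), label)
               for label in (0, 1) for _ in range(n_per_class)]
    rng.shuffle(samples)
    split = int(0.7 * len(samples))
    train, test = samples[:split], samples[split:]
    train_x = [x for x, _ in train]

    # 2. Pre-process, using statistics from the training data only.
    train_z = standardise(train_x, train_x)
    test_z = standardise(train_x, [x for x, _ in test])

    # 3. Reduce each spectrum to a single score along the leading direction.
    direction = first_component(train_z)

    def score(row):
        return sum(a * b for a, b in zip(row, direction))

    # 4. Model: a threshold halfway between the two class means of the score.
    scores_1 = [score(r) for r, (_, y) in zip(train_z, train) if y == 1]
    scores_0 = [score(r) for r, (_, y) in zip(train_z, train) if y == 0]
    mean_1 = sum(scores_1) / len(scores_1)
    mean_0 = sum(scores_0) / len(scores_0)
    threshold = (mean_0 + mean_1) / 2
    sign = 1 if mean_1 > mean_0 else -1

    def predict(row):
        return 1 if sign * (score(row) - threshold) > 0 else 0

    # 5. Test on samples the model has never seen.
    correct = sum(predict(r) == y for r, (_, y) in zip(test_z, test))
    return correct / len(test)


if __name__ == "__main__":
    print(f"Held-out accuracy: {run_pipeline():.2f}")
```

Note that the scaling and the reduction step are both computed from the training samples alone. The test samples are kept aside until the very end, which is what makes the final accuracy an honest estimate.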
He hopes that his research will culminate in a simple-to-use app that enables physicians to analyse blood samples on the spot. How long it will take to develop this software is as yet uncertain. ‘There are a lot of factors involved,’ he says. And while a screening test which, having analysed a blood sample, outputs either ‘cancerous’ or ‘non-cancerous’ is clearly sufficient for daily life at the doctor’s office, scientists of course want to know exactly which features of the molecular fingerprint are responsible for such diagnoses. But Kosmas relishes the risks and uncertainty involved in doing cutting-edge science. The tools change constantly. ‘I do not know what my work will look like in a few months’ time. I expect it to change a lot.’