Q and A with Dr Nicola Mulder on bioinformatics and H3Africa
November / December 2015 | Volume 14, Issue 6
Nicola Mulder, Ph.D.
Nicola Mulder, Ph.D., is a Professor at South Africa's University of Cape Town and heads its bioinformatics center, the Computational Biology Group, which is playing a leading role in developing H3ABioNet, the NIH-funded Pan-Africa Bioinformatics Network for Human Heredity and Health in Africa (H3Africa). She is one of the founding members of the Global Organisation for Bioinformatics Learning, Education and Training. Mulder has a doctoral degree in medical microbiology and spent nearly a decade at the European Bioinformatics Institute before returning to South Africa. In 2014 and 2015, she was included on the Thomson Reuters highly cited list of the world's most influential scientific minds.
What is H3ABioNet and why is it important?
Basically, it's a Pan-African bioinformatics network to build capacity for genomics research in Africa. It includes over 30 institutions on the continent and two abroad. We are in existence because the H3Africa program is going to be generating a lot of data from multiple projects, multiple sites and different types of data from clinical, metagenomics and pharmacogenomics studies. We need to manage, store, process and interpret this data to turn it into knowledge. Historically, researchers in Africa have sent their samples overseas and got somebody else to analyze the data. Our network wants to switch that mentality and empower the African scientists to do their own research, do the data interpretation and do the analysis. If somebody else handles your data, it's much harder to go back and say, "Well, have you tried this, have you tried that, what about this?" whereas if you start the analysis from scratch, you have the power to do that. Generally, African scientists haven't had the resources or the training to do this before.
What progress have you made?
We've held 19 workshops, trained about 456 people and also placed 25 fellows in one- to two-month internships. Bioinformaticians usually need specialized training, as do bioinformatics users. Clinicians and geneticists doing the science have seldom been trained in statistics, large-scale data manipulation and analysis. Actually, bioinformatics is one of the cheapest sciences you can do, because you don't need lab equipment - you need the data and somebody to work with it on a computer. It does bring challenges because the Internet is essential and speed is generally very poor in Africa, let alone when the electricity goes down. Another challenge is the lack of infrastructure for people to work with when they get home after training.
To sustain the program, we are developing trainers and supporting institutions as they establish their own degree programs. In the first two to three years, all the trainers were international, but then it started switching over and some courses have gone 80-100 percent local, others perhaps 50 percent local. The University of Bamako in Mali just launched their first master's degree program in bioinformatics, institutions in Kenya are going to start soon, and some in Tanzania and Nigeria also want their own program. Some institutions have managed to leverage research funding and, in general, scientists are more confident to write grants together and explore new research opportunities. At the University of Cape Town, we're developing a brand new master's degree in data science with the statistics department. That's going to train data scientists who are better able to manage large datasets from different disciplines.
What training is needed?
There are various kinds of specialized training. We split it into the bioinformatics user, the bioinformatics scientist and the bioinformatics engineer. And you need people across that spectrum. So some in the genetics lab will be the users - take the results coming out of big pipelines and then do the downstream analysis and the interpretation of the data. They just need a little bit of computational skills to know the programs to use, how to move data around and how to visualize and interpret it.
Your engineer is on the technical side, developing the algorithms, setting up the pipelines on the high-performance cluster, making sure that there's sufficient storage available for processing and setting up the infrastructure. Because the data's becoming so big, there's no way you could run it on a PC, so people are having to look at cloud solutions and other options.
Then you've got your scientist who's really doing quite a lot of the data analysis and the interplay between the user and the engineer. In terms of interpretation, they need to ask, "Is it making sense statistically, is there the power to say what I'm saying about this set of samples, is it valid?" So in my opinion, all lab people who are going to do any form of high throughput biology need to have statistics and at least a little bit of programming. You don't need to be an expert, because the bioinformatics engineer can help you. But if the data comes back and you see that something could be tweaked, you need to know the parameters - if we did change this, how would it change the results - and the different kinds of statistical tests and which is the most appropriate for the data.
How does it work?
We've got an e-learning section - that's where we are collecting information on existing online courses - and then we have courses that we've run. To design those, we work with the H3Africa consortium and say, right, what data's coming next, what course do you need next and when do you need it? And then we set up that course, usually a five-day intensive training course, and one of our training nodes will bid to host that course - based on where the demand is, the cost and getting the lecturers in and out - and when that's decided, we'll offer the course. There's such a need, we get 100 applications for 25 places. We want to run more regional courses or workshops and encourage Africa-Africa collaboration. If you go to the U.S. for training and come back to an African institution, there are usually not the same computing facilities in place. If it was in another African setting, it may be slightly more similar to what's at home.
How do you see genomics developing in Africa?
When you've got the data and you know what to do with it, that can lead to new discoveries, because there are so many ways you can look at the same dataset. You give it to one group and they'll come up with one answer, you give it to another group and they'll ask a totally different question of the same data. So, empowering African scientists to think out of the box is paving the way for novel discoveries. We're trying to impress upon people that just because you're in Africa with limited resources, it doesn't mean you're going to be behind in the field, you can find a niche area and you can lead the field.