Machine learning algorithms developed to select high-yield food crops could be applied to ‘hyperspectral analysis’ in other disciplines, from astronomy to espionage
— by University of Illinois at Urbana-Champaign
To help researchers better predict high-yielding crop traits, a team from the University of Illinois has stacked together six high-powered machine learning algorithms that are used to interpret hyperspectral data. They demonstrated that this technique improved the predictive power of a recent study by up to 15 percent compared with using just one algorithm.
Hyperspectral data comprises maps of the full light spectrum — not just the visible wavelengths — and has many other applications, from understanding the health of the Great Barrier Reef to tracking the rate of loss of the Amazon rainforest.
“We are empowering scientists from many fields, who are not necessarily experts in computational analysis, to translate their enormous datasets into beneficial results,” said first author Peng Fu, a postdoctoral researcher at Illinois, who led this work for a research project called Realizing Increased Photosynthetic Efficiency (RIPE). “Now scientists do not need to scratch their heads to figure out which machine learning algorithms to use; they can apply six or more algorithms, for the price of one, to make more accurate predictions.”
RIPE for the picking of high-yield crops
RIPE, which is led by Illinois, is engineering crops to be more productive by improving photosynthesis, the natural process all plants use to convert sunlight into energy and yields. RIPE is supported by the Bill & Melinda Gates Foundation, the U.S. Foundation for Food and Agriculture Research (FFAR), and the U.K. Government’s Department for International Development (DFID).
In a recent study, the team introduced spectral analysis as a means to quickly identify photosynthetic improvements that could increase yields. In this new study, published in Frontiers in Plant Science, the team improved their previous predictions of photosynthetic capacity by as much as 15 percent using machine learning, with computers automatically applying the six algorithms to the dataset without human help.
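For readers who want to see what stacking looks like in practice, the following is a minimal sketch, not the study's own code. It assumes the open-source scikit-learn library, uses synthetic random numbers as a stand-in for real hyperspectral measurements, and picks six base learners purely for illustration; the six algorithms the team actually used may differ. A second-level model (here `RidgeCV`, also an assumption) learns how to combine the base learners' cross-validated predictions.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Lasso, Ridge, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in for hyperspectral data (NOT the study's data):
# 200 "leaves" measured at 50 "wavelength bands", with a trait that
# depends on two of the bands plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 2.0 * X[:, 10] - 1.0 * X[:, 30] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Six illustrative base learners (hypothetical choices for this sketch).
base_learners = [
    ("ridge", Ridge(alpha=1.0)),
    ("lasso", Lasso(alpha=0.01)),
    ("svr", SVR(kernel="linear")),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
    ("boosting", GradientBoostingRegressor(random_state=0)),
]

# The final estimator is trained on out-of-fold predictions from the
# base learners (5-fold cross-validation), then used to combine them.
stack = StackingRegressor(estimators=base_learners,
                          final_estimator=RidgeCV(), cv=5)
stack.fit(X_train, y_train)

# Compare each base learner on its own with the stacked model (R^2 on held-out data).
single_r2 = {name: model.score(X_test, y_test)
             for name, model in stack.named_estimators_.items()}
stacked_r2 = stack.score(X_test, y_test)

for name, r2 in single_r2.items():
    print(f"{name:>9}: R^2 = {r2:.3f}")
print(f"{'stacked':>9}: R^2 = {stacked_r2:.3f}")
```

On this toy data the stacked model should land close to the best individual learner. Whether stacking beats every single algorithm, and by how much, depends on the dataset, which is why the comparison is printed rather than assumed.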
“I’ve loved seeing what’s possible when you can use computational power to exploit the data for all it’s worth,” said co-author Katherine Meacham-Hensold, a RIPE postdoctoral researcher at Illinois, who led the previous study in Remote Sensing of Environment. “It’s exciting to see what a data analyst like Peng can do with my data. Now other non-data-analyst scientists can test several powerful algorithms to figure out which one will help them leverage their data to the fullest extent.”
Stacks of applications
Further studies will prove the relevance of this stacked algorithm technique to the plant science community and other fields of study.
“By applying the expertise of data analysts to address the needs of plant physiologists like myself, we ended up refining a technique that is relevant to other hyperspectral datasets,” said co-author Carl Bernacchi, a RIPE research leader and scientist with the U.S. Department of Agriculture, who is based at Illinois’ Carl R. Woese Institute for Genomic Biology. “The next step is to test more stacked machine learning algorithms on datasets from many more crop species and explore the utility of this technique to estimate other parameters, such as abiotic stresses from drought or disease.”
“As scientists, we should try to use our domain knowledge to explain advanced performance from machine learning methods,” said co-author Kaiyu Guan, an assistant professor in Illinois’ College of Agricultural, Consumer and Environmental Sciences (ACES). “Combining computational methods and domain disciplines allows us to possibly unravel what causes the measurable differences in hyperspectral datasets, which is an unsolved mystery in our work and worth future exploration.”