When seasonal rains arrive late in Indonesia, farmers often take it as a sign that it is not worth investing in fertilizer for their crops. Sometimes they opt out of planting annual crops altogether. Generally, they're making the right decision, as a late start to the rainy season is usually associated with the state of the El Niño Southern Oscillation (ENSO) and low rainfall in the coming months.
New research published in the Nature journal Scientific Reports shows that ENSO, the weather-shaping cycle of warming and cooling of the Pacific Ocean along the Equator, is a strong predictor of cacao harvests up to two years before a harvest.
This is potentially very good news for smallholder farmers, scientists, and the global chocolate industry. The ability to predict harvest sizes well in advance could shape on-farm investment decisions, improve tropical crop research programs, and reduce risk and uncertainty in the chocolate industry.
Researchers say that the same methods - which pair advanced machine learning with rigorous, short-term data collection on farmer practices and yields - can be applied to other rain-dependent crops, including coffee and olives.
"The key innovation in this research is that you can effectively substitute weather data with ENSO data," said Thomas Oberthür, a co-author and business developer at the African Plant Nutrition Institute (APNI) in Morocco. "Any crop that shares a production relationship with ENSO can be explored using this method."
About 80 percent of global cropland depends on direct rainfall (as opposed to irrigation), accounting for almost 60 percent of production. But rainfall data is sparse and highly variable in many of these regions, making it difficult for scientists, policymakers and farmers groups to adapt to the vagaries of the weather.
No weather data? No problem
For the study, researchers used a type of machine learning that did not require weather records for the Indonesian cacao farms that participated in the research.
Rather, they relied on data on fertilizer application, yields and farm type, which they plugged into a Bayesian Neural Network (BNN). They found that ENSO phases predicted 75% of the variation in yields.
In other words, the sea-surface temperature of the Pacific accurately predicted cacao harvests in a large majority of cases for the farms in the study. In some cases, accurate predictions were possible 25 months before the harvest.
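To make the idea of a lead time concrete, the sketch below pairs each harvest with a sea-surface temperature reading taken a fixed number of months earlier. It is only an illustration, not the study's code: the function names, the monthly dictionary layout and the sample numbers are all assumptions made up for this example.

```python
from datetime import date


def months_before(harvest_month: date, lead_months: int) -> date:
    """Return the first day of the month that lies `lead_months` before `harvest_month`."""
    total = harvest_month.year * 12 + (harvest_month.month - 1) - lead_months
    return date(total // 12, total % 12 + 1, 1)


def pair_with_lagged_index(harvests, sst_index, lead_months):
    """Pair each harvest yield with the index value recorded `lead_months` earlier.

    harvests:  dict mapping harvest month (first of month) -> yield (hypothetical units)
    sst_index: dict mapping month (first of month) -> sea-surface temperature index
    Harvests with no index reading at the lagged month are skipped.
    """
    pairs = []
    for harvest_month, yield_value in sorted(harvests.items()):
        lagged_month = months_before(harvest_month, lead_months)
        if lagged_month in sst_index:
            pairs.append((sst_index[lagged_month], yield_value))
    return pairs


# Toy values for illustration only; these are not data from the study.
toy_harvests = {date(2018, 3, 1): 1.0, date(2018, 4, 1): 2.0}
toy_index = {date(2016, 2, 1): 0.5, date(2016, 3, 1): -0.5}
toy_pairs = pair_with_lagged_index(toy_harvests, toy_index, lead_months=25)
```

Each resulting pair is one (predictor, outcome) example that a model could learn from. A real analysis would also include inputs such as farm type and fertilizer application.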
For the uninitiated, a model that can accurately predict 50% of yield variation is usually cause to celebrate. And such long-range predictive accuracy for crop yields is rare.
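A common way to express "share of variation predicted" is the coefficient of determination, or R-squared. Whether the study reported exactly this metric is an assumption here. The sketch below shows how such a figure is typically computed from observed and predicted yields, and the function name and sample values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two or more paired observations")
    mean_observed = sum(observed) / len(observed)
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_total == 0:
        raise ValueError("observed values do not vary, so R-squared is undefined")
    return 1 - ss_residual / ss_total


# Toy values for illustration only; these are not data from the study.
toy_observed = [1.0, 2.0, 3.0, 4.0]
toy_predicted = [1.5, 1.5, 3.5, 3.5]
toy_score = r_squared(toy_observed, toy_predicted)
```

A score of 1 means the predictions match the observations exactly, while a score of 0 means the model does no better than always guessing the average yield. On this scale, the figures in the article correspond to 0.75 and 0.50.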
"What this allows us to do is superimpose different management practices - such as fertilization regimes - on farms and deduce, with a high level of confidence, those interventions that work," said James Cock, a co-author and emeritus researcher at the Alliance of Bioversity International and CIAT. "This is a whole paradigm shift toward operational research."
Cock, a plant physiologist, said that while randomized controlled trials (RCTs) are generally considered the gold standard in research, they are extremely costly and consequently often impossible to perform in developing tropical agricultural areas. The approach used here costs much less, requires no expensive collection of weather records and provides useful guidelines on how to better manage crops under variable weather.
Ross Chapman, a data analyst and the study's lead author, explained some of the key benefits of machine learning methods over conventional data analysis approaches:
"The BNN modeling differs from standard regression modeling because the algorithm takes input variables, such as sea-surface temperature and farm type, and then automatically 'learns' to recognize responses in other variables, such as crop yield," Chapman said. "The learning process uses the same fundamental process that the human mind learns to recognize objects and patterns from real-life experience. In contrast, standard models require manual supervision of different variables via human-generated equations."
The value of shared data
While machine learning may promise better crop yield predictions in the absence of weather data, scientists - or farmers themselves - still need to accurately collect certain production information and have that data readily available if machine-learning models are going to work.
In the case of the Indonesian cacao farms in the study, farmers had been part of a major chocolate company's training program on best practices. They kept track of inputs such as fertilizer application and freely shared that data for analysis. An organization with a local presence, the International Plant Nutrition Institute (IPNI), kept tidy records for researchers to use.
In addition, scientists had previously divided the farms into ten groups with similar topography and soil conditions. The researchers used data on harvests, fertilizer applications and yields from 2013 to 2018 to build their model.
The knowledge gained by cacao growers gives them confidence about how and when to invest in fertilizers. The agronomic skills this vulnerable group acquired shield them against a loss on their investment, which typically occurs when weather is adverse.
Thanks to their collaboration with the researchers, now their knowledge can be, in a way, shared with growers of other crops in other regions of the world.
"This research could not have happened without dedicated farmers, IPNI and a strong farmers' support organization, Community Solutions International, to pull everyone together," Cock said, emphasizing the importance of multidisciplinary collaboration and balancing stakeholders' different needs.
"What scientists want is to know why something happens," he said. "Farmers want to know what works."
APNI's Oberthür said strong predictive modeling could benefit both farmers and researchers, and fuel further collaboration.
"You need to have tangible results if you're a farmer who is also collecting data, which is a lot of work," Oberthür said. "This modeling, which can provide farmers with beneficial information, may help incentivize data collection since farmers will see that they are contributing to something that provides benefits to them on their farms."