MIT’s new validation technique for more accurate forecasts

Researchers often make spatial predictions, such as sea surface temperature for weather forecasting, air pollution levels for health studies, or invasive species prevalence for ecological management.

Trusting these predictions is crucial, but traditional validation methods struggle with mismatches between validation and prediction locations. This can make a forecast seem accurate when it isn’t.

MIT researchers developed a technique for assessing prediction-validation methods and proved that two classical approaches can fail on spatial problems. They identified why these approaches fail and used that insight to create a new validation method designed for spatial predictions.

Their new method was more accurate than traditional techniques in experiments with real and simulated data. They tested it on realistic spatial problems, such as predicting wind speed at Chicago O'Hare Airport and forecasting air temperature at five U.S. metro locations.

This method could help predict sea surface temperatures, estimate the effects of air pollution on diseases, and more. It could also lead to more reliable evaluations of new predictive methods.

Tamara Broderick, an MIT associate professor, worked with oceanographers and atmospheric scientists to develop machine-learning models for spatial problems, and in these settings the team found traditional validation methods to be inaccurate.

Their analysis showed that traditional methods rest on assumptions that are wrong for spatial data: they treat validation and test data as independent and identically distributed (i.i.d.), which often fails in spatial applications.

For example, EPA air pollution sensors used for validation aren't independent, since the agency sites new sensors partly based on where existing sensors already are. And if validation data come from cities while test data come from rural areas, the two sets have different statistical properties and aren't identically distributed.
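To make the failure concrete, here is a minimal sketch we constructed for illustration (not the authors' code or data): validation sites clustered in one region stand in for city sensors, prediction sites sit in another region, and a naive random holdout sharply understates the true error at the prediction sites.

```python
# Illustrative sketch: spatial covariate shift breaks a naive i.i.d. holdout.
import numpy as np

rng = np.random.default_rng(0)

def field(x):
    """A smooth 1-D spatial signal standing in for, e.g., pollution levels."""
    return np.sin(3 * x) + 0.5 * x

# Validation sites cluster near x ~ 0.2 ("cities");
# prediction sites cluster near x ~ 0.8 ("rural areas").
x_val = rng.normal(0.2, 0.05, 200)
x_test = rng.normal(0.8, 0.05, 200)
y_val = field(x_val) + rng.normal(0, 0.1, x_val.size)
y_test = field(x_test) + rng.normal(0, 0.1, x_test.size)

# A deliberately local predictor: a line fit only to the validation region.
coef = np.polyfit(x_val, y_val, 1)
predict = lambda x: np.polyval(coef, x)

# Naive holdout: score on a random subset of the validation sites.
holdout = rng.permutation(x_val.size)[:50]
err_holdout = np.mean((predict(x_val[holdout]) - y_val[holdout]) ** 2)
err_true = np.mean((predict(x_test) - y_test) ** 2)

print(f"holdout MSE (looks good): {err_holdout:.3f}")
print(f"true MSE at prediction sites: {err_true:.3f}")  # far larger
```

The predictor looks accurate where it was validated and fails where it is actually used, which is exactly the mismatch the researchers set out to catch.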

Broderick notes that their experiments showed incorrect results in spatial cases when these assumptions broke down. The researchers needed a new approach.

The researchers instead designed a method for spatial contexts that assumes validation and test data vary smoothly in space. For example, air pollution levels are unlikely to change dramatically between neighboring houses.

This regularity assumption fits many spatial processes and makes it possible to evaluate spatial predictors. According to Broderick, no one had previously evaluated these issues systematically or used them to improve validation methods.
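One simple way to picture how a smoothness assumption can power validation (our illustrative sketch, not the authors' actual estimator): treat the squared prediction errors at validation sites as a spatial signal and kernel-smooth it toward the prediction locations. The function name and the `bandwidth` parameter are hypothetical choices made for this example.

```python
# Illustrative sketch of smoothness-based error estimation.
import numpy as np

def estimate_error(predict, X_val, y_val, X_pred, bandwidth=0.1):
    """Estimate the mean squared error of `predict` at each row of X_pred.

    Assumes the error surface varies smoothly in space, so squared residuals
    at nearby validation sites are informative about error at a new site.
    """
    resid2 = (predict(X_val) - y_val) ** 2          # squared residuals
    # Pairwise distances between prediction and validation locations.
    d = np.linalg.norm(X_pred[:, None, :] - X_val[None, :, :], axis=-1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)         # Gaussian kernel weights
    w /= w.sum(axis=1, keepdims=True)               # normalize per location
    return w @ resid2                               # smoothed error estimates
```

Unlike a random holdout, an estimate of this kind is anchored to the actual prediction locations, so a mismatch between where data were validated and where predictions are made is reflected in the reported accuracy.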

To use their technique, one inputs a predictor, the prediction locations, and validation data; the system then estimates how accurate the predictor will be at those locations. However, effectively assessing the validation technique itself was challenging.

Broderick explains, “We are not evaluating a method but evaluating an evaluation. So, we had to step back, think carefully, and get creative about the appropriate experiments we could use.”

First, the researchers tested their method using simulated data with controlled parameters. Then they used more realistic, semi-simulated data created by modifying real data. Finally, they ran experiments on fully real data.
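The first of those stages can be pictured as follows (our illustration, not the paper's exact protocol): simulate spatial data from a known, smooth process so the true error of any predictor is computable, and a validation method's estimate can then be checked against that ground truth. The Gaussian-process setup and length scale below are assumptions chosen for the sketch.

```python
# Illustrative sketch: simulated spatial data with known parameters.
import numpy as np

rng = np.random.default_rng(1)

def se_kernel(X, Y, length_scale=0.2):
    """Squared-exponential covariance, which generates smooth spatial fields."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Sample one smooth field at random 2-D locations, then split spatially:
# validation sites on the left half, prediction sites on the right.
X = rng.uniform(0, 1, size=(300, 2))
y = rng.multivariate_normal(np.zeros(300), se_kernel(X, X) + 1e-8 * np.eye(300))
val, test = X[:, 0] < 0.5, X[:, 0] >= 0.5

# Any predictor can go here; a constant fit keeps the sketch short.
pred = np.full(test.sum(), y[val].mean())
true_err = np.mean((pred - y[test]) ** 2)  # ground truth, known by design
print(f"true MSE at prediction sites: {true_err:.3f}")
```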

Across these three types of data, they evaluated their technique on tasks including predicting flat prices in England and forecasting wind speed. In most experiments, their method was more accurate than traditional ones.

In the future, they plan to improve uncertainty quantification in spatial settings and explore other areas, such as time-series data, where their assumption could enhance predictor performance.

The research will be presented at the International Conference on Artificial Intelligence and Statistics (AISTATS).

Journal Reference:

  1. David R. Burt, Yunyi Shen, and Tamara Broderick. Consistent Validation for Predictive Methods in Spatial Settings. arXiv:2402.03527v2

Source: Tech Explorist
