Using Large Earth Observation Models to Map Soy Infrastructure in Brazil - Ode Partners
How Ode and Trase teamed up to find unmapped grain storage facilities at scale.
Soy is one of the leading drivers of deforestation in Brazil. Slowing deforestation depends on better supply chain transparency and the ability for buyers, retailers, and regulators to trace a shipment back to its origins.
Understanding soy supply chains in Brazil means knowing where the infrastructure is: the silos, warehouses, and storage facilities that sit between the field and end use. Brazil's soy-growing regions span millions of square kilometers, and existing datasets of known facilities are incomplete. Rural storage sites are especially hard to catalog because they are smaller than the industrial hubs found in cities and change more often as new farms come online.
To see if geospatial AI could help close this gap, the supply chain transparency initiative Trase partnered with Ode to build a two-stage detection pipeline powered by Large Earth Observation Models. Ode led the first stage, a coarse, neighborhood-level filter built on a simple logistic regression over embeddings from Clay, a geospatial foundation model. That stage hit 92% accuracy and 80% recall at the chip level, with the linear approach actually outperforming a more complex neural network. Trase then took the work further by layering on Google's AlphaEarth as a precise, point-level localizer, and using Clay again as a final filter to disambiguate confusable structures. Together the pipeline surfaced 4,200 high-confidence candidates for previously unmapped facilities across Brazil.
How the collaboration worked
Trase had just finished a strategy refresh that pointed to AI playing a much bigger role in their data work. Mapping commodity infrastructure is core to what they do, so they were looking for fire starters: working examples of what modern, foundation-model-powered workflows actually look like in practice. They came to Ode to help them explore.
The engagement kicked off with Trase introducing their team and their existing facility dataset. They were upfront that the data was likely incomplete, especially in rural areas where facilities are smaller, more numerous, and more dynamic than the static industrial sites near cities. The shared problem statement had two parts: help fill the rural gap, and find a way to get from coarse area predictions down to point-level locations, which is not something Clay does natively.
Before splitting up the work, Ode shipped a small embedding run over a few test municipalities. The point was to give Jailson Soares, the data scientist leading the work on Trase's side, something concrete to put his hands on. He needed to build intuition for what these embeddings represent conceptually, and to get tactical reps with the code: how to load them, how to manipulate them, how to attach geometries, how to train a model on top of them. Within a week and a half, the two teams had a shared vocabulary and a shared way of working with the new modality.
From there the project split into two parallel workstreams that converged at the end:
- Ode took the chip-level coarse filter, which required running Clay across the full area of interest (AOI).
- Trase took the path from chips to points, exploring how to pin down individual structures inside the regions Ode's filter would surface.
The two teams reconvened roughly two weeks later. Ode brought the full embedding run and a trained chip-level classifier. Jailson brought a working prototype that combined AlphaEarth at the pixel level with hand-labeled reference points. The pieces fit together cleanly, and the rest of the project was about hardening the pipeline and folding Clay back in as a final disambiguation step.
This split made sense for the project. The practitioners at Trase know this data better than anyone, including its quirks, its gaps, and what good looks like at the facility level. Ode's job was not to replace that expertise but to put new tools in front of it. By introducing foundation-model embeddings as a working modality and showing how to train on top of them, we gave the Trase team a way to analyze their own data faster and at larger scale, then got out of the way so they could push it further than we could on our own.
Defining the area of interest
To make the compute manageable, the first step was bounding the search. We vectorized a soy crop map from the GLAD Commodity Crop Mapping and Monitoring dataset covering South America and added a 50 km buffer around it. The buffer ensures we capture facilities that sit near but not directly within mapped soy-growing areas.
Even with that constraint, the AOI worked out to roughly 1.5 million chips, the largest embedding run Ode had ever done. It pushed us to add optimizations we now use everywhere: instead of downloading imagery one chip at a time, we pull a larger image that spans many chips at once and stream it through the GPU in a single pass. The throughput gains are significant, and the workflow is cleaner.
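The batching idea can be sketched in a few lines. This is a minimal illustration, not Ode's production code: it assumes one large mosaic has already been downloaded, and the function names (`chip_windows`, `batches`) and the 2,048-pixel mosaic size are hypothetical.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (row_start, col_start, row_stop, col_stop)


def chip_windows(height: int, width: int, chip: int = 256) -> Iterator[Window]:
    """Yield non-overlapping chip windows that fit fully inside a mosaic.

    Partial chips along the right and bottom edges are skipped here; a real
    pipeline might pad them or cover them with a neighbouring mosaic.
    """
    for row in range(0, height - chip + 1, chip):
        for col in range(0, width - chip + 1, chip):
            yield (row, col, row + chip, col + chip)


def batches(windows: Iterator[Window], batch_size: int) -> Iterator[List[Window]]:
    """Group windows into batches, one batch per GPU forward pass."""
    batch: List[Window] = []
    for window in windows:
        batch.append(window)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


# A hypothetical 2,048 x 2,048 pixel mosaic holds an 8 x 8 grid of 256-pixel chips.
windows = list(chip_windows(2048, 2048))
window_batches = list(batches(iter(windows), batch_size=16))
```

One download now feeds 64 chips instead of one, which is where the throughput gain comes from.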
Compressing satellite data with Clay
A single 256x256 pixel Sentinel-2 chip across four spectral bands contains 262,144 values. Multiply that by 1.5 million chips and the numbers get unwieldy fast.
Clay, an open source Large Earth Observation Model, addresses this by compressing each chip into a 1,024-dimensional embedding, roughly 0.4% of the original satellite image file size. These embeddings lose information, but they retain enough to power downstream classification tasks. The benefits compound: lower storage costs, faster training, and quick iteration. Our final classifier trained in minutes on a consumer-grade desktop.
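The arithmetic behind those figures is easy to check. The sketch below counts values rather than bytes, so it assumes pixels and embedding entries are stored at the same width; the constant names are ours.

```python
CHIP_SIDE_PX = 256      # chip width and height in pixels
NUM_BANDS = 4           # red, green, blue, near-infrared
EMBEDDING_DIM = 1024    # Clay embedding length stated in the article
NUM_CHIPS = 1_500_000   # approximate size of the embedding run

values_per_chip = CHIP_SIDE_PX * CHIP_SIDE_PX * NUM_BANDS
ratio = EMBEDDING_DIM / values_per_chip

print(f"values per chip: {values_per_chip:,}")
print(f"embedding size relative to chip: {ratio:.2%}")
print(f"raw values across the run: {values_per_chip * NUM_CHIPS:,}")
print(f"embedding values across the run: {EMBEDDING_DIM * NUM_CHIPS:,}")
```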
To generate embeddings, we pulled Sentinel-2 imagery for March and April 2024, near-harvest months when storage facilities are most visible, and created median composites from the 10-meter bands (red, green, blue, and near-infrared). Pixels were then aggregated into chips of 256 x 256 pixels, each covering 2.56 km x 2.56 km.
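A median composite takes, for each pixel, the median of all observations in the time window. The sketch below shows the idea for a single band on toy data. Treating masked pixels as `None` is our assumption; the article does not describe how clouds were handled.

```python
from statistics import median
from typing import List, Optional

Scene = List[List[Optional[float]]]  # one band, rows x cols; None marks a masked pixel


def median_composite(scenes: List[Scene]) -> Scene:
    """Per-pixel median across scenes, ignoring masked (None) observations."""
    rows, cols = len(scenes[0]), len(scenes[0][0])
    composite: Scene = []
    for r in range(rows):
        out_row: List[Optional[float]] = []
        for c in range(cols):
            valid = [s[r][c] for s in scenes if s[r][c] is not None]
            out_row.append(median(valid) if valid else None)
        composite.append(out_row)
    return composite


# Three hypothetical 2 x 2 red-band scenes; the 0.90 value mimics a bright cloud.
scenes = [
    [[0.10, 0.20], [0.30, None]],
    [[0.12, 0.90], [0.28, 0.40]],
    [[0.11, 0.22], [None, 0.44]],
]
composite = median_composite(scenes)
```

The cloud-like outlier at 0.90 drops out of the composite because the median of 0.20, 0.90, and 0.22 is 0.22.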
Figure 1. Known soy infrastructure in Brazil (9,608 facilities). Each dot represents the location of a soy facility mapped by Trase at the start of the project. This dataset is known to be incomplete, and closing that gap is a primary motivation for this work.
The embedding dataset combines two strategies. The first is a pair of offset grids, ensuring that facilities near grid borders still appear within at least one chip. The second applies small random jitters around known facility locations as a form of data augmentation, giving the model multiple views of the same infrastructure in different positions within the image.
Figure 2. The two embedding strategies. The offset grid (left) captures facilities near borders. The jittered embeddings (right) provide multiple views of each known location. This serves as cheap data augmentation and increases the count of positively lab
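Both strategies can be sketched with plain coordinate arithmetic. This is an illustration under assumptions, not the project's code: coordinates are hypothetical projected metres, the second grid is shifted by half a chip (the article does not state the offset), and the 1,000 m jitter limit is an arbitrary choice that keeps the facility inside every jittered chip.

```python
import random
from typing import List, Tuple

CHIP_M = 2560  # chip side in metres: 256 pixels at 10 m

Point = Tuple[float, float]


def grid_origins(x_min: int, y_min: int, x_max: int, y_max: int, offset: int = 0) -> List[Point]:
    """Lower-left corners of chips on a regular grid shifted by `offset` metres."""
    n_cols = (x_max - x_min - offset) // CHIP_M
    n_rows = (y_max - y_min - offset) // CHIP_M
    return [
        (x_min + offset + i * CHIP_M, y_min + offset + j * CHIP_M)
        for j in range(n_rows)
        for i in range(n_cols)
    ]


def jittered_origins(facility: Point, n_views: int, max_shift_m: float,
                     rng: random.Random) -> List[Point]:
    """Chip origins centred on a facility, then shifted by a random offset.

    Keeping max_shift_m below half a chip guarantees the facility stays inside.
    """
    fx, fy = facility
    half = CHIP_M / 2
    return [
        (fx - half + rng.uniform(-max_shift_m, max_shift_m),
         fy - half + rng.uniform(-max_shift_m, max_shift_m))
        for _ in range(n_views)
    ]


def contains(origin: Point, point: Point) -> bool:
    """True if `point` falls inside the chip whose lower-left corner is `origin`."""
    ox, oy = origin
    px, py = point
    return ox <= px < ox + CHIP_M and oy <= py < oy + CHIP_M


# The facility sits 10 m from a vertical border of the base grid.
facility = (2550.0, 5000.0)
base_grid = grid_origins(0, 0, 10240, 10240)
shifted_grid = grid_origins(0, 0, 10240, 10240, offset=CHIP_M // 2)
views = jittered_origins(facility, n_views=5, max_shift_m=1000.0, rng=random.Random(42))
```

In the base grid the facility hugs a chip edge; in the shifted grid the same facility lands well inside a chip, and each jittered view shows it at a different position.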
Stage one: chip-level classification (Ode)
The first stage, owned by Ode, asked a deliberately coarse question: does this 2.56 km x 2.56 km area contain soy infrastructure or not? Even a rough yes/no at this scale is valuable, because it lets the more expensive downstream steps focus on a fraction of the original search space. That gain shows up in geographic space, and it shows up again any time you want to rerun the analysis through time.
When working with a foundation model like Clay, you have a few options. You can use the embeddings directly and compute cosine similarity against known seed points. At the other end, you can fully fine-tune the entire model toward your task, which is the most flexible approach but dramatically increases data requirements and compute costs.
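For reference, the similarity-search option looks roughly like this. It is a sketch on toy 3-dimensional vectors standing in for 1,024-dimensional Clay embeddings; the function names and chip ids are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_seed_similarity(chips: Dict[str, Sequence[float]],
                            seeds: List[Sequence[float]]) -> List[Tuple[str, float]]:
    """Score each chip by its best similarity to any seed, highest first."""
    scores = {
        chip_id: max(cosine_similarity(embedding, seed) for seed in seeds)
        for chip_id, embedding in chips.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


seeds = [[1.0, 0.0, 0.0]]  # embedding of a chip known to contain a facility
chips = {
    "chip_a": [0.9, 0.1, 0.0],
    "chip_b": [0.0, 1.0, 0.0],
    "chip_c": [0.5, 0.5, 0.0],
}
ranking = rank_by_seed_similarity(chips, seeds)
```

No training is involved, which makes this the cheapest option, but it also cannot learn which embedding dimensions matter for the task.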
We took a middle path. Clay's encoder stayed frozen, and we trained a small logistic regression on top of the embeddings. This gave us the representational power of Clay without the overhead of full fine-tuning, and the frozen embeddings turned out to be more than sufficient.
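With the encoder frozen, each embedding is just a fixed feature vector, and the only parameters learned are one weight per dimension plus a bias. The sketch below trains such a model with plain gradient descent on toy 2-dimensional vectors. It is an illustration, not the classifier we shipped, and the learning rate and epoch count are arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that embedding x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X: List[Sequence[float]], y: List[int],
                              lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy "embeddings": positives cluster near (1, 1), negatives near (-1, -1).
X = [[1.0, 0.8], [0.9, 1.2], [1.1, 0.7], [-1.0, -0.9], [-0.8, -1.1], [-1.2, -0.7]]
y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic_regression(X, y)
```

Because the embeddings never change, they can be computed once and reused across every experiment, which is why training takes minutes.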
Handling incomplete ground truth
THE CHALLENGE
The ground truth dataset contained 9,608 known facility locations and was almost certainly incomplete. In an early approach, treating every unlabeled chip as a negative example created a 1:22 class imbalance and, more problematically, mislabeled many true positives as negatives.
HOW WE APPROACHED IT
We switched to random negative sampling at a 1:5 positive-to-negative ratio. Instead of assuming all unlabeled chips are negative, we randomly sample a controlled number of negatives from the unlabeled pool. This sharply reduced label noise. We also applied class weights (3.0 for positives, 0.6 for negatives) to further balance the training signal.
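A sketch of both steps follows. The sampling helper and chip ids are hypothetical. The weight formula is the common "balanced" heuristic, n / (2 x class count); the article does not say how the weights were chosen, but at a 1:5 ratio this heuristic reproduces 3.0 and 0.6 exactly.

```python
import random
from typing import Dict, List, Sequence


def sample_negatives(unlabeled_ids: Sequence[str], n_positives: int,
                     neg_per_pos: int = 5, seed: int = 0) -> List[str]:
    """Draw a fixed number of negatives at random from the unlabeled pool."""
    rng = random.Random(seed)
    return rng.sample(list(unlabeled_ids), n_positives * neg_per_pos)


def balanced_class_weights(n_pos: int, n_neg: int) -> Dict[int, float]:
    """Weights inversely proportional to class frequency: n / (2 * n_class)."""
    total = n_pos + n_neg
    return {1: total / (2 * n_pos), 0: total / (2 * n_neg)}


# Toy pool of unlabeled chips and 20 positives, giving 100 sampled negatives.
unlabeled = [f"chip_{i}" for i in range(1000)]
negatives = sample_negatives(unlabeled, n_positives=20)

# Training-set counts from the article: 215,334 samples, 35,889 of them positive.
weights = balanced_class_weights(n_pos=35_889, n_neg=215_334 - 35_889)
print(weights)
```

Some sampled negatives will still be unmapped facilities, but far fewer than when every unlabeled chip is treated as a negative.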
Geographic generalization
THE CHALLENGE
Spatial autocorrelation is a well-known issue in geospatial modeling. Nearby locations tend to look alike, so a model that trains and tests on interleaved data can appear to perform well while actually memorizing spatial patterns rather than learning meaningful features.
HOW WE APPROACHED IT
We split the data geographically, training on everything south of approximately latitude -16 and testing on everything north of it. This ensures the model is evaluated on a region it has never seen. The training set contained 215,334 samples (35,889 positive) and the test set contained 69,732 samples (11,622 positive), so positives make up one in six samples (about 16.7%) in each.
Figure 5. Geographic train/test split. The model trains on southern data (blue) and is evaluated on the northern region (red).
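The split itself is a one-line filter on latitude. In this sketch the sample records and their `lat` field are hypothetical; the threshold is the approximate value given above.

```python
from typing import Dict, List, Tuple

SPLIT_LATITUDE = -16.0  # approximate dividing latitude from the article

Sample = Dict[str, float]


def geographic_split(samples: List[Sample],
                     split_lat: float = SPLIT_LATITUDE) -> Tuple[List[Sample], List[Sample]]:
    """Train on samples south of split_lat, test on samples at or north of it."""
    train = [s for s in samples if s["lat"] < split_lat]
    test = [s for s in samples if s["lat"] >= split_lat]
    return train, test


# Hypothetical chip centres; southern latitudes are more negative.
samples = [
    {"id": 1, "lat": -24.9, "lon": -53.4},
    {"id": 2, "lat": -17.8, "lon": -50.9},
    {"id": 3, "lat": -12.5, "lon": -55.7},
    {"id": 4, "lat": -12.1, "lon": -45.8},
]
train, test = geographic_split(samples)
```

Unlike a random split, no test chip has a near neighbour in the training set, so the score reflects transfer to a new region.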
Initial results
The model trained in minutes and posted these metrics on the geographically distinct test set:
| Metric | Logistic Regression | Neural Network | Delta |
|---|---|---|---|
| Accuracy | 91.7% | 91.4% | +0.4 |
| Precision | 73.2% | 75.6% | -2.4 |
| Recall | 79.6% | 71.2% | +8.5 |
| F1 Score | 76.3% | 73.3% | +3.0 |
| AUC | 96.1% | 95.3% | +0.9 |
Logistic regression beat the neural network on every metric except precision, and that is a real takeaway. It means the foundation model's representations are already doing most of the work, and piling more model complexity on top doesn't really help.
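As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 scores can be recomputed from the other two rows of the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) as reported in the table above.
reported = {
    "Logistic Regression": (0.732, 0.796),
    "Neural Network": (0.756, 0.712),
}
f1 = {name: round(100 * f1_score(p, r), 1) for name, (p, r) in reported.items()}
print(f1)
```

Both values match the table: 76.3 for logistic regression and 73.3 for the neural network.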
Telescoping to higher resolution
A 2.56 km chip is large relative to a soy storage facility, so chip-level predictions tell you the neighborhood, not the address. To tighten the search before handing off to the point-level pipeline, we generated a second set of embeddings at 128x128 pixels (1.28 km x 1.28 km) inside the high-confidence chips, then trained an identical classifier on the smaller chips with the same geographic split.
Figure 6. Two 256x256px chips outlined with a heavy orange border. The red, pink, and white boxes represent the modeled results on the smaller chips, with dark red representing the highest probability of containing a facility. The orange dot is a known fac
| Metric | Logistic Regression | Neural Network | Delta |
|---|---|---|---|
| Accuracy | 81.7% | 84.4% | -2.7 |
| Precision | 58.4% | 68.5% | -10.1 |
| Recall | 72.0% | 59.9% | +12.1 |
| F1 Score | 64.5% | 63.9% | +0.5 |
| AUC | 86.4% | 87.2% | -0.9 |
Telescoping was useful but less accurate than the full chip (logistic regression accuracy fell from 91.7% to 81.7%, and AUC from 96.1% to 86.4%), possibly because the smaller field of view loses helpful context. Taking the max prediction across the four quadrants still nudged the search area in the right direction, which is what matters for the next stage.
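The quadrant step can be sketched as follows. The probabilities are made up for illustration, and the window layout (row and column pixel offsets within the parent chip) is our own convention.

```python
from typing import Dict, Tuple

Window = Tuple[int, int, int, int]  # (row_start, col_start, row_stop, col_stop)


def quadrant_windows(chip: int = 256) -> Dict[str, Window]:
    """Pixel windows of the four half-size quadrants inside a chip."""
    half = chip // 2
    return {
        "top_left": (0, 0, half, half),
        "top_right": (0, half, half, chip),
        "bottom_left": (half, 0, chip, half),
        "bottom_right": (half, half, chip, chip),
    }


def best_quadrant(probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Return the quadrant with the highest predicted probability."""
    name = max(probabilities, key=probabilities.get)
    return name, probabilities[name]


windows = quadrant_windows()
# Hypothetical classifier outputs for the four 128 x 128 pixel quadrants.
quadrant_probabilities = {
    "top_left": 0.12,
    "top_right": 0.81,
    "bottom_left": 0.30,
    "bottom_right": 0.07,
}
name, probability = best_quadrant(quadrant_probabilities)
search_window = windows[name]
```

Keeping only the top quadrant cuts the area handed to the next stage to a quarter of the original chip.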
Stage two: from neighborhoods to exact locations (Trase)
While Ode was finishing the coarse filter, Jailson was building the path to point-level predictions. He worked in parallel and operated mostly off hand-labeled reference points (around 300 of them) rather than waiting on Ode's chip output, which let both teams move fast.
His pipeline combined Google's AlphaEarth as a pixel-level localizer with Clay as a patch-level disambiguator.
Figure 7. A probability heat map from the AlphaEarth model. Green indicates high probability of soy infrastructure. The bright green directly over known silos is exactly what we want to see.
Stage 2A, pixel-level candidates with AlphaEarth.
This stage runs on the Google Earth Engine (GEE) platform, a cloud-based environment that gives quick access to AlphaEarth embeddings at continental scale. The embeddings are paired with the reference samples and used as input features to train a random forest classifier. The classifier produces a dense probability map over the search area, revealing the actual shapes of candidate structures.
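The article does not say how the probability map was turned into candidate points. One plausible approach, shown here purely as an assumption, is to threshold the map, group neighbouring high-probability pixels into connected blobs, and take each blob's centroid. The 0.8 threshold and the toy map are arbitrary.

```python
from collections import deque
from typing import List, Tuple


def candidate_points(prob_map: List[List[float]], threshold: float = 0.8) -> List[Tuple[float, float]]:
    """Centroids (row, col) of 4-connected blobs of pixels at or above threshold."""
    rows, cols = len(prob_map), len(prob_map[0])
    seen = [[False] * cols for _ in range(rows)]
    points: List[Tuple[float, float]] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or prob_map[r][c] < threshold:
                continue
            # Breadth-first flood fill collects one connected blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                cr, cc = queue.popleft()
                blob.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and prob_map[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            points.append((sum(p[0] for p in blob) / len(blob),
                           sum(p[1] for p in blob) / len(blob)))
    return points


# Toy probability map with one 2 x 2 blob and one isolated pixel.
prob_map = [
    [0.1, 0.9, 0.9, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.85],
]
points = candidate_points(prob_map)
```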
Stage 2B, patch-level refinement with Clay.
The candidate points from the first sub-stage were used to extract 64x64 pixel Clay patches, which were passed through a dedicated neural network classifier trained on 321 facility reference samples. This step was critical because the AlphaEarth signal was getting confused by other rural infrastructure that looks structurally similar from above, particularly poultry CAFOs (concentrated animal feeding operations). Clay let Jailson sort high-confidence candidates into the thing we want and the things we don't want before any human review.
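Extracting a patch around a candidate comes down to computing a pixel window centred on the point. The sketch below shifts the window inward when a point sits near the image edge; that edge handling is our assumption, not something the article specifies.

```python
from typing import Tuple

Window = Tuple[int, int, int, int]  # (row_start, col_start, row_stop, col_stop)


def patch_window(row: int, col: int, height: int, width: int, size: int = 64) -> Window:
    """A size x size window centred on (row, col), shifted to stay inside the image.

    Assumes the image is at least `size` pixels in each dimension.
    """
    half = size // 2
    row_start = min(max(row - half, 0), height - size)
    col_start = min(max(col - half, 0), width - size)
    return (row_start, col_start, row_start + size, col_start + size)


# A candidate in the middle of a 256 x 256 image, and one near the top-right corner.
centre_patch = patch_window(100, 100, height=256, width=256)
corner_patch = patch_window(5, 250, height=256, width=256)
```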
The combined approach surfaced 4,200 high-confidence candidates for previously unmapped soy facilities across Brazil, filling gaps in traditional datasets and providing confidence levels at both the pixel and patch level.
Iterative refinement
The pipeline is designed to be repeated. Each round produces high-confidence predictions that a human analyst can review, in our case through visual inspection in QGIS. Confirmed true positives get added to the ground truth, strengthening the training data for the next iteration. The false positive rate drops as the negative class gets cleaner, and recall improves as previously missing facilities enter the labeled set.
This matters given the starting point: 9,608 known facilities across millions of square kilometers. No single model run will find everything, but each cycle closes the gap.
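One round of that loop can be sketched as a simple update of the label sets. The ids are hypothetical, and treating rejected candidates as confirmed negatives is our assumption about how the negative class gets cleaner.

```python
from typing import Dict, Set, Tuple


def fold_in_review(positives: Set[str], negatives: Set[str],
                   review: Dict[str, bool]) -> Tuple[Set[str], Set[str]]:
    """Return updated label sets after an analyst reviews candidate sites.

    `review` maps a candidate id to True (confirmed facility) or False (rejected).
    """
    confirmed = {site for site, ok in review.items() if ok}
    rejected = {site for site, ok in review.items() if not ok}
    new_positives = positives | confirmed
    # A confirmed facility must not remain in the negative set.
    new_negatives = (negatives | rejected) - new_positives
    return new_positives, new_negatives


positives = {"site_1", "site_2"}
negatives = {"site_7", "site_8"}  # randomly sampled, so some may be unmapped facilities
review = {"site_7": True, "site_9": True, "site_10": False}
positives, negatives = fold_in_review(positives, negatives, review)
```

Here `site_7` had been sampled as a negative but turned out to be a facility, so it moves to the positive set before the next training run.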
Final results
The pipeline operates at two scales.
- Chip-level filtering. A logistic regression trained on frozen Clay embeddings of 2.56 km chips achieved 92% accuracy and a 76% F1 score on a geographically distinct test set, validating its ability to generalize across large unseen areas. This first filter focuses higher-resolution, more expensive approaches on areas more likely to yield new true positives.
- Pixel-level localization. The AlphaEarth plus Clay pipeline identified 4,200 high-confidence candidates for additional soy storage facilities across Brazil, moving from neighborhood-level detection to point-level specificity.
The chip-level model trained in minutes on consumer hardware. The embeddings themselves required substantial compute to generate, but they compressed 1.5 million satellite chips into a format that makes rapid iteration possible from then on.
What the collaboration produced
Beyond the 4,200 new candidate facilities, the engagement left Trase with three things they did not have when it started.
- The first is a coarse-filter technique they can apply again, on their own, whenever they need to narrow a large geographic or temporal search space before running expensive analysis.
- The second is hands-on familiarity with foundation-model embeddings as a working modality, including the practical mechanics of generating, storing, and modeling on top of them.
- The third is a working two-stage pipeline, half built by Ode and half built by Trase, that can be pointed at new commodities or new regions with relatively little adaptation.
A few technical lessons stood out and are worth carrying into future projects:
- Random negative sampling and overlapping positive chips created a dataset suitable for training downstream classifiers on Clay embeddings. The same strategy transfers to other domains where positive labels are sparse and unreliable.
- Geographic train/test splits provided honest performance estimates by preventing spatial autocorrelation from inflating metrics. The split was harsh, and the metrics still warranted excitement.
- Clay's composability made it useful at multiple stages of the pipeline. At 256 pixels it served as a coarse filter, at 128 pixels it telescoped into neighborhoods, and at 64 pixels it refined AlphaEarth's pixel-level candidates. The same frozen encoder, applied at different resolutions and paired with different downstream models, addressed fundamentally different tasks without retraining the foundation model itself.
- These techniques are not limited to soy infrastructure. The same pipeline could be adapted to map other types of facilities, from mining operations to energy infrastructure to agricultural processing plants, anywhere that incomplete ground truth and vast geographic areas make manual identification impractical.
There is still work to do, but Trase's vision of a global database for commodities linked to deforestation is closer than ever. With that kind of system in place, companies could align traceability and reporting across borders, making it easier to meet deforestation-free commitments and act on them consistently.