SILO GeoTIFF Download and Visualization Demo¶
This notebook demonstrates the Cloud-Optimized GeoTIFF (COG) functionality
in the weather_tools package.
We'll download rainfall data for a specific region in South Australia over
the past 9 days using reduced resolution (overview level) and visualize
the results.
Key Features Demonstrated¶
Spatial subsetting: Download only data within a polygon boundary
Overview levels: Use pyramid overviews for reduced resolution (faster downloads)
Time series: Retrieve multiple days of data efficiently
HTTP range requests: COG files allow downloading only the pixels we need
Setup and Imports¶
from datetime import date, timedelta
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import cm
from shapely.geometry import Polygon
from weather_tools.silo_geotiff import download_and_read_geotiffs
Define the Study Area¶
We'll use a polygon in South Australia as our area of interest.
# Coordinates for the polygon (longitude, latitude)
coords = [
[136.0, -32.0],
[136.0, -36.2],
[140.0, -36.2],
[140.0, -32.0],
[136.0, -32.0],
]
# Create a Shapely Polygon object
study_area = Polygon(coords)
print(f"Study area bounds: {study_area.bounds}")
Study area bounds: (136.0, -36.2, 140.0, -32.0)
Define the Date Range¶
We'll download the past 9 days of rainfall data ending yesterday
(today's data may not yet be available).
# Calculate date range (past 9 days)
end_date = date.today() - timedelta(days=1) # Yesterday
start_date = end_date - timedelta(days=8) # 9 days total
print(f"Date range: {start_date} to {end_date}")
print(f"Number of days: {(end_date - start_date).days + 1}")
Date range: 2025-11-12 to 2025-11-20 Number of days: 9
Download Rainfall Data¶
Using read_geotiff_timeseries(), we'll:
Query only the
daily_rainvariableClip to our polygon geometry
Use
overview_level=2for 4× reduced resolution (faster, smaller file sizes)Stream directly from S3 without caching to disk (
save_to_disk=False)
Note: Overview levels work by powers of 2:
overview_level=0: 2× reduced resolutionoverview_level=1: 4× reduced resolutionoverview_level=2: 8× reduced resolution
For this demo, we'll use level 1 (4× reduction) to balance speed and detail.
data["daily_rain"]
(masked_array(
data=[[[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
...,
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
...,
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
[[ 2.0000e-01, 2.0000e-01, 2.0000e-01, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 2.0000e-01, 2.0000e-01, 2.0000e-01, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 3.0000e-01, 2.0000e-01, 2.0000e-01, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
...,
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00]],
...,
[[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
...,
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 3.0000e-01,
3.0000e-01, 4.0000e-01],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 3.0000e-01,
3.0000e-01, 3.0000e-01],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 3.0000e-01,
3.0000e-01, 3.0000e-01]],
[[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
...,
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 5.0000e-01,
5.0000e-01, 5.0000e-01],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 5.0000e-01,
5.0000e-01, 4.0000e-01],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 5.0000e-01,
4.0000e-01, 4.0000e-01]],
[[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,
0.0000e+00, 0.0000e+00],
...,
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 1.0000e-01,
1.0000e-01, 1.0000e-01],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 1.0000e-01,
1.0000e-01, 1.0000e-01],
[-3.2767e+04, -3.2767e+04, -3.2767e+04, ..., 2.0000e-01,
1.0000e-01, 2.0000e-01]]],
mask=False,
fill_value=np.float64(1e+20),
dtype=float32),
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': -32767.0, 'width': 40, 'height': 42, 'count': 9, 'crs': CRS.from_wkt('GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]'), 'transform': Affine(0.10125, 0.0, 135.975,
0.0, -0.10119047619047619, -31.975), 'blockxsize': 40, 'blockysize': 2, 'tiled': False, 'compress': 'deflate', 'interleave': 'band'})
print("Downloading rainfall data from SILO S3...")
data = download_and_read_geotiffs(
variables=["daily_rain"],
start_date=start_date,
end_date=end_date,
geometry=study_area,
save_to_disk=False, # Stream from S3, don't cache
overview_level=1, # 4× reduced resolution
)
rainfall, profile = data["daily_rain"]
print(f"\nDownloaded data shape: {rainfall.shape}")
print(f" - Time steps: {rainfall.shape[0]}")
print(f" - Height: {rainfall.shape[1]} pixels")
print(f" - Width: {rainfall.shape[2]} pixels")
Downloading rainfall data from SILO S3... Downloaded data shape: (9, 42, 40) - Time steps: 9 - Height: 42 pixels - Width: 40 pixels
Visualize the Results¶
Plot each day's rainfall as a 3×3 grid with a shared color scale.
- Grey background indicates no data or masked values (ocean areas)
# Create 3x3 subplot grid
fig, axes = plt.subplots(3, 3, figsize=(15, 15))
axes = axes.flatten()
# Generate date labels
date_list = [start_date + timedelta(days=i) for i in range(9)]
# Calculate global min/max for shared colorbar scale
# First, clean all the data and collect valid values
all_valid_values = []
cleaned_rainfall = []
for day_data in rainfall:
day_cleaned = day_data.copy()
day_cleaned[day_cleaned < 0] = np.nan # Set negative values to NaN
cleaned_rainfall.append(day_cleaned)
# Collect non-NaN values
valid = day_cleaned[~np.isnan(day_cleaned)]
if len(valid) > 0:
all_valid_values.extend(valid.flatten())
# Determine shared scale
if all_valid_values:
vmin = 0 # Rainfall minimum is always 0
vmax = np.max(all_valid_values)
else:
vmin, vmax = 0, 1 # Fallback if no valid data
print(f"Shared colorbar scale: 0 to {vmax:.2f} mm")
# Create custom colormap with grey background for NaN values
cmap = cm.get_cmap("viridis").copy()
cmap.set_bad(color="#EAE9E9") # Grey for missing/NaN data
# Plot each day with shared scale
for idx, (ax, day_data, day_date) in enumerate(zip(axes, cleaned_rainfall, date_list)):
# Create the plot with shared vmin/vmax and custom colormap
im = ax.imshow(day_data, cmap=cmap, interpolation="nearest", vmin=vmin, vmax=vmax)
ax.set_title(f"{day_date.strftime('%Y-%m-%d')}", fontsize=12, fontweight="bold")
ax.axis("off")
# Add colorbar (all will show the same scale)
cbar = plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
cbar.set_label("Rainfall (mm)", fontsize=9)
plt.suptitle(
f"Daily Rainfall - South Australia\n{start_date} to {end_date}",
fontsize=16,
fontweight="bold",
y=0.995,
)
plt.tight_layout()
plt.show()
Shared colorbar scale: 0 to 24.40 mm
/var/folders/b0/wq9x9vj13dv_75w41x6dmcyh0000gn/T/ipykernel_40328/80531725.py:33: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.
cmap = cm.get_cmap("viridis").copy()
Summary Statistics¶
# Calculate statistics across the entire period
stats_data = []
for idx, day_date in enumerate(date_list):
day_data = rainfall[idx]
day_data[day_data < 0] = np.nan # Clean negative values
valid_data = day_data[~np.isnan(day_data)]
if len(valid_data) > 0:
stats_data.append(
{
"Date": day_date.strftime("%Y-%m-%d"),
"Mean (mm)": np.mean(valid_data),
"Median (mm)": np.median(valid_data),
"Max (mm)": np.max(valid_data),
"Min (mm)": np.min(valid_data),
}
)
else:
stats_data.append(
{
"Date": day_date.strftime("%Y-%m-%d"),
"Mean (mm)": np.nan,
"Median (mm)": np.nan,
"Max (mm)": np.nan,
"Min (mm)": np.nan,
"Std Dev (mm)": np.nan,
"Total (mm)": np.nan,
}
)
# Create DataFrame and display
stats_df = pd.DataFrame(stats_data)
print("\nRainfall Statistics Summary:")
print(stats_df.to_string(index=False, float_format="%.2f"))
Rainfall Statistics Summary:
Date Mean (mm) Median (mm) Max (mm) Min (mm)
2025-11-12 0.01 0.00 0.20 0.00
2025-11-13 0.09 0.00 1.80 0.00
2025-11-14 0.91 0.40 7.90 0.00
2025-11-15 0.63 0.00 24.40 0.00
2025-11-16 0.34 0.00 4.10 0.00
2025-11-17 1.71 1.10 15.90 0.00
2025-11-18 0.15 0.00 2.40 0.00
2025-11-19 0.14 0.00 2.40 0.00
2025-11-20 0.03 0.00 0.60 0.00
/Users/hfsi/Developer/weather_tools/.venv/lib/python3.12/site-packages/numpy/_core/fromnumeric.py:867: UserWarning: Warning: 'partition' will ignore the 'mask' of the MaskedArray. a.partition(kth, axis=axis, kind=kind, order=order)
Key Takeaways¶
Efficient Downloads: Using COG overview levels and spatial subsetting, we downloaded
only the data we needed without retrieving full continental grids.
HTTP Range Requests: The entire operation used HTTP range requests to fetch only
relevant pixels from S3, avoiding unnecessary data transfer.
No Local Storage: With
save_to_disk=False, we streamed data directly into memorywithout creating local cache files.
Flexible Geometries: The same approach works with any Shapely geometry (Point, Polygon,
MultiPolygon, etc.).
Next Steps¶
Use
save_to_disk=Truewith acache_dirto cache files for reuseDownload multiple variables simultaneously (e.g.,
["daily_rain", "max_temp", "min_temp"])