
    SILO GeoTIFF Download and Visualization Demo¶

    This notebook demonstrates the Cloud-Optimized GeoTIFF (COG) functionality in the weather_tools package. We'll download rainfall data for a specific region in South Australia over the past 9 days at reduced resolution (using an overview level) and visualize the results.

    Key Features Demonstrated¶

    • Spatial subsetting: Download only data within a polygon boundary

    • Overview levels: Use pyramid overviews for reduced resolution (faster downloads)

    • Time series: Retrieve multiple days of data efficiently

    • HTTP range requests: COG files allow downloading only the pixels we need
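    To make the spatial-subsetting idea concrete: a COG reader converts the polygon's bounding box into a pixel window via the raster's affine transform, then issues range requests for only the tiles intersecting that window. Below is a minimal, illustrative sketch of the window arithmetic. `window_from_bounds` is a hypothetical helper, not part of weather_tools; the transform values are copied from the profile printed later in this notebook.

```python
import math

def window_from_bounds(transform, minx, miny, maxx, maxy):
    """Map a geographic bounding box to (col_off, row_off, width, height).

    `transform` holds the affine coefficients (a, b, c, d, e, f) so that
    x = c + a * col and y = f + e * row (b = d = 0 for north-up rasters).
    """
    a, _, c, _, e, f = transform
    col_min = math.floor((minx - c) / a)
    col_max = math.ceil((maxx - c) / a)
    # e is negative for north-up rasters, so the *maximum* y gives the top row
    row_min = math.floor((maxy - f) / e)
    row_max = math.ceil((miny - f) / e)
    return col_min, row_min, col_max - col_min, row_max - row_min

# Level-1 overview transform, taken from the profile shown later in this notebook
transform = (0.10125, 0.0, 135.975, 0.0, -0.10119047619047619, -31.975)

# Bounding box of the South Australia study area used below
col_off, row_off, width, height = window_from_bounds(transform, 136.0, -36.2, 140.0, -32.0)
print(col_off, row_off, width, height)  # → 0 0 40 42
```

    The 40 × 42 window matches the array shape downloaded later in this notebook, which is why only the tiles covering the study area need to be fetched rather than the full continental grid.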

    Setup and Imports¶

    In [ ]:
    from datetime import date, timedelta
    
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from matplotlib import cm
    from shapely.geometry import Polygon
    
    from weather_tools.silo_geotiff import download_and_read_geotiffs
    

    Define the Study Area¶

    We'll use a polygon in South Australia as our area of interest.

    In [ ]:
    # Coordinates for the polygon (longitude, latitude)
    coords = [
        [136.0, -32.0],
        [136.0, -36.2],
        [140.0, -36.2],
        [140.0, -32.0],
        [136.0, -32.0],
    ]
    
    # Create a Shapely Polygon object
    study_area = Polygon(coords)
    
    print(f"Study area bounds: {study_area.bounds}")
    
    Study area bounds: (136.0, -36.2, 140.0, -32.0)
    

    Define the Date Range¶

    We'll download the past 9 days of rainfall data ending yesterday (today's data may not yet be available).

    In [ ]:
    # Calculate date range (past 9 days)
    end_date = date.today() - timedelta(days=1)  # Yesterday
    start_date = end_date - timedelta(days=8)  # 9 days total
    
    print(f"Date range: {start_date} to {end_date}")
    print(f"Number of days: {(end_date - start_date).days + 1}")
    
    Date range: 2025-11-12 to 2025-11-20
    Number of days: 9
    

    Download Rainfall Data¶

    Using download_and_read_geotiffs(), we'll:

    • Query only the daily_rain variable

    • Clip to our polygon geometry

    • Use overview_level=1 for 4× reduced resolution (faster, smaller file sizes)

    • Stream directly from S3 without caching to disk (save_to_disk=False)

    Note: Overview levels work by powers of 2:

    • overview_level=0: 2× reduced resolution

    • overview_level=1: 4× reduced resolution

    • overview_level=2: 8× reduced resolution

    For this demo, we'll use level 1 (4× reduction) to balance speed and detail.
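    Numerically, the powers-of-2 convention above means each overview level halves the pixel count per side once more than the last. The helper below is purely illustrative of that convention and is not a weather_tools function:

```python
def reduction_factor(overview_level: int) -> int:
    """Per-side reduction factor for a given overview level (per the note above)."""
    return 2 ** (overview_level + 1)

for level in range(3):
    factor = reduction_factor(level)
    # Data volume shrinks with the square of the per-side factor
    print(f"overview_level={level}: {factor}x per side, ~{factor ** 2}x less data")
```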

    In [ ]:
    print("Downloading rainfall data from SILO S3...")
    
    data = download_and_read_geotiffs(
        variables=["daily_rain"],
        start_date=start_date,
        end_date=end_date,
        geometry=study_area,
        save_to_disk=False,  # Stream from S3, don't cache
        overview_level=1,  # 4× reduced resolution
    )
    
    rainfall, profile = data["daily_rain"]
    print(f"\nDownloaded data shape: {rainfall.shape}")
    print(f"  - Time steps: {rainfall.shape[0]}")
    print(f"  - Height: {rainfall.shape[1]} pixels")
    print(f"  - Width: {rainfall.shape[2]} pixels")
    Downloading rainfall data from SILO S3...
    
    Downloaded data shape: (9, 42, 40)
      - Time steps: 9
      - Height: 42 pixels
      - Width: 40 pixels
    

    Each entry in data is a (masked_array, profile) tuple: the pixel values for all nine days plus the raster metadata (CRS, affine transform, and the nodata value of -32767).

    In [11]:
    data["daily_rain"]
    Out[11]:
    (masked_array(
       data=[[[ 0.0000e+00,  0.0000e+00,  0.0000e+00, ...,  0.0000e+00,
                 0.0000e+00,  0.0000e+00],
              ...,
              [-3.2767e+04, -3.2767e+04, -3.2767e+04, ...,  0.0000e+00,
                 0.0000e+00,  0.0000e+00]],
     
             ...,
     
             [[ 0.0000e+00,  0.0000e+00,  0.0000e+00, ...,  0.0000e+00,
                 0.0000e+00,  0.0000e+00],
              ...,
              [-3.2767e+04, -3.2767e+04, -3.2767e+04, ...,  2.0000e-01,
                 1.0000e-01,  2.0000e-01]]],
       mask=False,
       fill_value=np.float64(1e+20),
       dtype=float32),
     {'driver': 'GTiff', 'dtype': 'float32', 'nodata': -32767.0, 'width': 40, 'height': 42, 'count': 9, 'crs': CRS.from_wkt('GEOGCS["WGS 84",...AUTHORITY["EPSG","4326"]]'), 'transform': Affine(0.10125, 0.0, 135.975,
            0.0, -0.10119047619047619, -31.975), 'blockxsize': 40, 'blockysize': 2, 'tiled': False, 'compress': 'deflate', 'interleave': 'band'})
    

    Visualize the Results¶

    Plot each day's rainfall as a 3×3 grid with a shared color scale.

    • Grey background indicates no data or masked values (ocean areas)
    In [13]:
    # Create 3x3 subplot grid
    fig, axes = plt.subplots(3, 3, figsize=(15, 15))
    axes = axes.flatten()
    
    # Generate date labels
    date_list = [start_date + timedelta(days=i) for i in range(9)]
    
    # Calculate global min/max for shared colorbar scale
    # First, clean all the data and collect valid values
    all_valid_values = []
    cleaned_rainfall = []
    
    for day_data in rainfall:
        day_cleaned = day_data.copy()
        day_cleaned[day_cleaned < 0] = np.nan  # Set negative values to NaN
        cleaned_rainfall.append(day_cleaned)
    
        # Collect non-NaN values
        valid = day_cleaned[~np.isnan(day_cleaned)]
        if len(valid) > 0:
            all_valid_values.extend(valid.flatten())
    
    # Determine shared scale
    if all_valid_values:
        vmin = 0  # Rainfall minimum is always 0
        vmax = np.max(all_valid_values)
    else:
        vmin, vmax = 0, 1  # Fallback if no valid data
    
    print(f"Shared colorbar scale: 0 to {vmax:.2f} mm")
    
    # Create custom colormap with grey background for NaN values
    cmap = plt.get_cmap("viridis").copy()  # plt.get_cmap avoids the deprecated cm.get_cmap
    cmap.set_bad(color="#EAE9E9")  # Grey for missing/NaN data
    
    # Plot each day with shared scale
    for idx, (ax, day_data, day_date) in enumerate(zip(axes, cleaned_rainfall, date_list)):
        # Create the plot with shared vmin/vmax and custom colormap
        im = ax.imshow(day_data, cmap=cmap, interpolation="nearest", vmin=vmin, vmax=vmax)
        ax.set_title(f"{day_date.strftime('%Y-%m-%d')}", fontsize=12, fontweight="bold")
        ax.axis("off")
    
        # Add colorbar (all will show the same scale)
        cbar = plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
        cbar.set_label("Rainfall (mm)", fontsize=9)
    
    plt.suptitle(
        f"Daily Rainfall - South Australia\n{start_date} to {end_date}",
        fontsize=16,
        fontweight="bold",
        y=0.995,
    )
    plt.tight_layout()
    plt.show()
    
    Shared colorbar scale: 0 to 24.40 mm
    
    
    [Figure: 3×3 grid of daily rainfall maps, one panel per day, shared 0–24.4 mm colour scale]

    Summary Statistics¶

    In [14]:
    # Calculate statistics across the entire period
    stats_data = []
    
    for idx, day_date in enumerate(date_list):
        day_data = rainfall[idx].copy()  # Copy so the original array is not mutated
        day_data[day_data < 0] = np.nan  # Clean negative (nodata) values
        valid_data = np.asarray(day_data[~np.isnan(day_data)])  # Plain array avoids masked-array warnings
    
        if len(valid_data) > 0:
            stats_data.append(
                {
                    "Date": day_date.strftime("%Y-%m-%d"),
                    "Mean (mm)": np.mean(valid_data),
                    "Median (mm)": np.median(valid_data),
                    "Max (mm)": np.max(valid_data),
                    "Min (mm)": np.min(valid_data),
                }
            )
        else:
            stats_data.append(
                {
                    "Date": day_date.strftime("%Y-%m-%d"),
                    "Mean (mm)": np.nan,
                    "Median (mm)": np.nan,
                    "Max (mm)": np.nan,
                    "Min (mm)": np.nan,
                }
            )
    
    # Create DataFrame and display
    stats_df = pd.DataFrame(stats_data)
    print("\nRainfall Statistics Summary:")
    print(stats_df.to_string(index=False, float_format="%.2f"))
    
    Rainfall Statistics Summary:
          Date  Mean (mm)  Median (mm)  Max (mm)  Min (mm)
    2025-11-12       0.01         0.00      0.20      0.00
    2025-11-13       0.09         0.00      1.80      0.00
    2025-11-14       0.91         0.40      7.90      0.00
    2025-11-15       0.63         0.00     24.40      0.00
    2025-11-16       0.34         0.00      4.10      0.00
    2025-11-17       1.71         1.10     15.90      0.00
    2025-11-18       0.15         0.00      2.40      0.00
    2025-11-19       0.14         0.00      2.40      0.00
    2025-11-20       0.03         0.00      0.60      0.00
    
    

    Key Takeaways¶

    1. Efficient Downloads: Using COG overview levels and spatial subsetting, we downloaded only the data we needed without retrieving full continental grids.

    2. HTTP Range Requests: The entire operation used HTTP range requests to fetch only the relevant pixels from S3, avoiding unnecessary data transfer.

    3. No Local Storage: With save_to_disk=False, we streamed data directly into memory without creating local cache files.

    4. Flexible Geometries: The same approach works with any Shapely geometry (Point, Polygon, MultiPolygon, etc.).
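    As a sketch of the flexible-geometry point: any Shapely geometry can define the area of interest. For example (hypothetical site coordinates, assuming download_and_read_geotiffs accepts the result via its geometry parameter as used above), buffering a Point yields a roughly circular Polygon:

```python
from shapely.geometry import Point

# Hypothetical site near Adelaide (longitude, latitude)
site = Point(138.6, -34.9)

# Buffer by ~0.25 degrees to get a circular area of interest
area_of_interest = site.buffer(0.25)

print(type(area_of_interest).__name__)  # → Polygon
print(area_of_interest.bounds)
```

    The resulting Polygon could be passed as geometry=area_of_interest in the download call above, exactly like the rectangular study_area.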

    Next Steps¶

    • Use save_to_disk=True with a cache_dir to cache files for reuse

    • Download multiple variables simultaneously (e.g., ["daily_rain", "max_temp", "min_temp"])

    2026-03-13