CHIRPS + IPUMSI¶

Linking IPUMSI geography data to CHIRPS precipitation data. A test case using Bangladesh.¶

In social research, it might sometimes be useful to connect IPUMS International geography data with climate data. Linking these data sources makes it possible to explore how social variables might change over time in relation to environmental variables. For example, in their 2020 paper in Global Environmental Change, Mueller et al. estimated the effects that climate-related variables had on migration in Botswana, Kenya, and Zambia between 1989 and 2011 https://doi.org/10.1016/j.gloenvcha.2020.102183. Often, however, even expert social scientists lack experience joining social data to climate data, which is usually large and packaged in unfamiliar ways.

In the following tutorial, we will show you one way to access a reputable source of precipitation data and then process these data to get estimates of average precipitation for an areal unit. For our example, we will be remotely downloading the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) https://www.chc.ucsb.edu/data/chirps, processing it locally, and then arriving at daily estimates of the average precipitation in Bangladesh's 1991-2010 harmonized level one and level two geography units. Harmonized geographical units are units that remain consistent over time, and they can be useful for exploring social change over multiple years. Level one and level two refer to the resolution of the spatial data, where level two data is considered more granular than level one data. For more information and to download harmonized shapefiles, see IPUMS International's Geography and GIS page https://international.ipums.org/international/gis.shtml.

In the following tutorial, a basic understanding of python programming is assumed, but this example should be largely self-contained. Where appropriate, some helpful links will be provided.

Set Up¶

Before we can obtain average precipitation estimates for the 1991-2010 harmonized level one and level two Bangladesh geography units, we need to import the requisite libraries and create a few useful functions.

Remember that in Python, in order to import libraries, you need to install the requisite packages. Instructions for installing python libraries prior to importing them can be found on the web. This is one place to start https://packaging.python.org/en/latest/tutorials/installing-packages/

Libraries¶

Once you've installed the necessary packages, we can begin importing them. Information on the packages used in this tutorial can be found by searching for the package on the PyPI repository website. https://pypi.org/project/requests/

In [1]:
#import the libraries necessary for running this notebook.  
import requests #package to make HTTP/1.1 requests
import xarray #package for working with multi-dimensional arrays (e.g. climate data over space and time)
import pandas #two-dimensional data processing package
import geopandas #package for importing and manipulating geographic data (shapefile polygon data)
import rioxarray #package for using xarray with spatial data
import dask #package for multiprocessing data with xarray
import matplotlib.pyplot as plot #package for visualizing data
import shapely #package for manipulating spatial geometry data
import bottleneck #package for performing mean calculations quickly

from dask.diagnostics import ProgressBar #this will allow us to see our progress when manipulating the large multi-dimensional array containing our precipitation data.
/Users/computer/miniconda3/envs/ipums_py/lib/python3.10/site-packages/geopandas/_compat.py:123: UserWarning: The Shapely GEOS version (3.11.1-CAPI-1.17.1) is incompatible with the GEOS version PyGEOS was compiled with (3.10.4-CAPI-1.16.2). Conversions between both will be slow.
  warnings.warn(
/var/folders/s2/sd6pfh1s3px7tndrnp9cdx940000gn/T/ipykernel_97444/1510659582.py:5: UserWarning: Shapely 2.0 is installed, but because PyGEOS is also installed, GeoPandas will still use PyGEOS by default for now. To force to use and test Shapely 2.0, you have to set the environment variable USE_PYGEOS=0. You can do this before starting the Python process, or in your code before importing geopandas:

import os
os.environ['USE_PYGEOS'] = '0'
import geopandas

In a future release, GeoPandas will switch to using Shapely by default. If you are using PyGEOS directly (calling PyGEOS functions on geometries from GeoPandas), this will then stop working and you are encouraged to migrate from PyGEOS to Shapely 2.0 (https://shapely.readthedocs.io/en/latest/migration_pygeos.html).
  import geopandas #package for importing and manipulating geographic data (shapefile polygon data)

Functions¶

Once the packages have been installed and imported, we will begin by writing a few bespoke functions that will make our code easier to follow. We won't use these functions right now, but they will be useful later. In general, it is good practice to write and use functions when possible. This will make your work more reproducible and less prone to error.

In [2]:
#create a function to download the CHIRPS data, this will make downloading multiple sample years easier later on
def get_chirps_data(year, #year of the data you want to download
                    resolution, #resolution of the data
                    download_directory = ""): #place to save the data locally (defaults to the home directory)
                        
    #create a filename for the specific data of interest                    
    download_filename = f"chirps-v2.0.{year}.days_{resolution}.nc"

    #create the path to the file of interest. For this tutorial, we are interested in daily data, and we want to download the data in netcdf format.  
    #Note that there are other paths to other data formats and resolutions available, and you can modify this section of the script accordingly:
    #Just go to the https://data.chc.ucsb.edu/ webpage, find the product and format you'd like to use, and copy that information here. 
    #Note, however, that depending on the path selected, you may need to modify other parts of this script.             
    chirps_url = f"https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/netcdf/{resolution}/{download_filename}"

    #request the data from UCSB
    request_object = requests.get(chirps_url,
                                  allow_redirects = True)

    #create a download path. this is where your data will be saved                    
    download_path = f"{download_directory}{download_filename}"
    
    #save the requested data to the download_path file
    with open(download_path, 'wb') as request_content:
        for chunk in request_object.iter_content(chunk_size = 128):
            request_content.write(chunk)
In [3]:
#create a function that calculates the average CHIRPS variable estimate within a defined geography.
#this is the function that will actually open the CHIRPS data, reduce it to the area and time of interest, and then calculate its average value within that area.
#remember, the goal is to calculate an average precipitation value for each unit in Bangladesh's harmonized geography across all time points relevant to that geography.
def get_mean_chirps_variable_by_geography(shp_path, #location of the shapefile whose geographies you want CHIRPS estimates for
                                          nc_path, #location of the netcdf file containing the spatio-temporal CHIRPS data
                                          variable): #name of of the variable of interest

    #read in the shapefile containing the geography information (in this case, the IPUMSI harmonized geography file)
    ipumsi_shp = geopandas.read_file(shp_path)
    
    #open the chirps netcdf files and set the coordinate reference system
    chirps = xarray.open_dataset(nc_path)
    chirps_epsg = chirps.rio.write_crs("epsg:4326")
    
    #create an empty list to store the unit-level geography data
    dataframes = []
    #for each region in the IPUMS Internation geography shapefile
    for region in range(0, len(ipumsi_shp)):
        
        #get the name of the administrative unit of interest
        admin = ipumsi_shp["ADMIN_NAME"][region]
        #get the geometry of the administrative unit of interest
        geometry = ipumsi_shp["geometry"][region]
        #reformat the geography so it plays nicely with rioxarray
        geometry_formatted = [shapely.geometry.mapping(geometry)]
    
        #clip the xarray to the regional geometry
        clipped = chirps_epsg.rio.clip(geometry_formatted,
                                       ipumsi_shp.crs,
                                       all_touched = False, #note here we use all_touched = False, this means that only pixels (CHIRPS data) whose center is within the area of interest are included in the clip.  If all_touched equals true, then all pixels touching the area of interest are included in the clip.
                                       from_disk = True,
                                       invert = False)
        
        #get a mean of the CHIRPS variable pixel values accross the spatial dimension of the geography (this means you'll have an average estimate for the region in unit of your time dimension)
        chirps_variable = clipped.mean(dim = ["latitude", "longitude"])
        #convert this multi-dimensional array to a two-dimensional panda array and orgnaize it
        chirps_variable_df = chirps_variable.to_pandas()[[variable]].rename(columns = {variable: admin})
        #add this region's temporal data to the list of other regions temporal data
        dataframes.append(chirps_variable_df)
    #concatenate all of these individual region dataframes into a single dataframe
    dataframe = pandas.concat(dataframes, axis = 1)

    #return the dataframe
    return(dataframe)

Processing¶

Now that we've imported the important packages and created the functions we'll need, we can begin processing our data.

Download the CHIRPS-2.0 daily data¶

The first step in our data pipeline will be to acquire the CHIRPS data. We are going to download daily CHIRPS data in netcdf format. Though not always familiar to social scientists, the netcdf data format is commonly used to store climate information because it can efficiently handle large amounts of data. Luckily, there are python tools designed to handle it. To download our data from CHIRPS, we will use the bespoke function we created above. Remember, this is just one example of acquiring data. This function could be modified to make other http requests for Climate Hazard Group Data (e.g. temperature) by changing the function's "chirps_url" object to a different address.

In [3]:
#Create a list of sample years to download (these are the years for which Bangladesh data is available on IPUMSI)
sample_year_list = [1991, 2001, 2011]
In [ ]:
#a simple loop to download the daily chirps-2.0 data for each sample year
#Note: This could take a long time depending on your system and internet connection.  Each sample year will have a relatively large file associated with it
for sample_year in sample_year_list:
    get_chirps_data(year = sample_year,
                    resolution = "p05", #we could also use p25
                    download_directory = "chirps_raw/") #here I am saving these data to an empty directory I created called chirps_raw.  To follow along with this tutorial, it is important that the downloaded data be saved in a separate empty directory.
    

Extract the CHIRPS data covering Bangladesh¶

Once our requested data has downloaded to our "chirps_raw" directory (or whatever you chose to call it), we can begin processing it. As mentioned above, the requested dataset is really large (it's a global dataset with more than 1000 estimates of precipitation at each grid location), so our next step will be to reduce the dataset down to only include our values relevant to our country of interest. This will significantly speed up our processing later.

For this tutorial, we are going to reduce the dataset down to the extent of Bangladesh. Remember we are just using Bangladesh as a test case. It is a good candidate for this tutorial because the harmonized level one and level two geography files available through IPUMS International will allow us to explore average precipitation in Bangladesh's harmonized areal units through time, but you could use any country shapefile.

To begin, we need to import our harmonized Bangladesh shapefile. We can obtain this file from the IPUMS International website https://international.ipums.org/international/gis.shtml. Here we'll be using the "spatially-harmonized first-level geography" file for Bangladesh. Simply download this zipped file from IPUMS International, unzip the file, and we'll be ready to work with the harmonized data.

First, let's visualize our shapefile.

In [4]:
#read in the ipumsi harmonized shapefile. My file is located within the unzipped directory I downloaded from the IPUMSI website.
geo1_bd = geopandas.read_file("geo1_bd1991_2011/geo1_bd1991_2011.shp") #this is just the path to your file.
In [5]:
#visualize the harmonized first-level bangladesh geography data
geo1_bd.plot()
Out[5]:
<Axes: >

Now that we have a sense of what the Bangladesh harmonized level-one geography looks like, we want to reduce our CHIRPS data down to just it's extent to improve our processing speed. First, we'll open up the global CHIRPS precipitation data we downloaded and assign the appropriate coordinate reference system to the object. Next, we'll make sure the Bangladesh shapefile has the same coordinate reference system (they need to match) and create a "bounding box" for Bangladesh, which is just a set of coordinates whose internal area covers all points within our shapefile, that we can use to reduce the CHIRPS data down to only data relevant to Bangladesh. Finally, we'll send a set of instructions through the "xarray" package to make the data reduction. That is, to "clip" the CHIRPS data to only include the values that were relevant to Bangladesh.

In [13]:
#open the chirps netcdf files. Here we are openning all of the sample years of the raw data and reading it into one large multidimensional array (time and space)
precipitation_xarray = xarray.open_mfdataset("chirps_raw/*.nc", parallel = False)

#In order to manipulated the spatial data, we need to assign a coordinate reference system to the data.  The CHIRPS data can be assigned an epsg code of 4326
precipitation_xarray_epsg = precipitation_xarray.rio.write_crs("epsg:4326")
In [14]:
#check to make sure the shapefile has the same coordinate reference system as the CHIRPS data
#here they do, but if they didn't you could use the .to_crs() method https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_crs.html
geo1_bd.crs
Out[14]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In [15]:
#get a bounding box of geo1_bd geometry and format it to work with rioxarray.  This will allow us to clip our large global CHIRPS dataset to only the extent of bangladesh and reduce our processing time.
bd_bbox = shapely.geometry.box(*geo1_bd.total_bounds)
bd_bbox_formatted = [shapely.geometry.mapping(bd_bbox)]
In [16]:
#clip the precipitation_xarray to barisal's geometry, including all pixels that touch Bangladesh.
geo1_bd_chirps_xarry = precipitation_xarray_epsg.rio.clip(bd_bbox_formatted,
                                                          geo1_bd.crs,
                                                          all_touched = True,
                                                          from_disk = True,
                                                          invert = False)

Note, however, that at this point xarray hasn't actually clipped our global dataset. Instead, we've created a set of instructions that we want xarray to carry out on our command.

The next step will be to ask xarray to perform these instructions and save our new clipped CHIRPS dataset (the much smaller dataset that only includes data relevant to bangladesh). We can save these data anywhere we'd like, but for the purposes of this tutorial, don't save these data to the same location where we saved the raw chirps global dataset we downloaded earlier. It might also be useful to keep that data separate in case you want to clip a different country from that same global data.

In [15]:
#create an operation to save the bangladesh chirps data. Here I am simply saving a new netcdf file "bd_chirps.nc" that will contain the daily CHIRPS data for only the Bangladesh area for the 1991, 2001, and 2010 years
delayed_obj = geo1_bd_chirps_xarry.to_netcdf("bd_chirps.nc",
                                             compute = False)
In [16]:
#save the bangladesh chirps data and show a progress bar for the clipping and saving process
#note that this process can take a long time
with ProgressBar():
    results = delayed_obj.compute()
[########################################] | 100% Completed | 527.57 s

We should now have a new netcdf file (in this case named "bd_chirps.nc") that we can use for calculating the daily average precipitation estimates.

Average Daily Precipitation One Region Example¶

For calculating our daily average precipitation estimates for each 1991-2011 harmonized region in Bangladesh, we can just use one of our bespoke functions: get_mean_chirps_variable_by_geography()

But, it is useful to understand how this process works, so in the following cells, we'll demonstrate the process of calculating these values for one of the 1991-2011 Bangladesh first-level harmonized geography regions.

First, let's open our new Bangladesh-specific CHIRPS data, and visualize one day of precipitation across our harmonized Bangladesh level one geography.

In [18]:
#open the chirps netcdf files
bd_chirps = xarray.open_dataset("bd_chirps.nc")
In [19]:
#again we need to add a coordinate reference system to our CHIRPS data in order to work with it spatially
bd_chirps_epsg = bd_chirps.rio.write_crs("epsg:4326")
In [20]:
#visualize the CHIRPS data on January 1st 1991 across all of Bangladesh.
ax = geo1_bd.boundary.plot(color = "red")
bd_chirps_epsg.sel(time = "1991-01-01")["precip"].plot(ax = ax)
Out[20]:
<matplotlib.collections.QuadMesh at 0x12efb7850>

Now, that we've got an idea of what the data we're working with looks like. Let's visualize the process of calculating daily average precipitation for a single harmonized Bangladesh level one unit. For instruction purposes, we'll use the areal unit "Barisal".

First, we'll extract Barisal's geometry and then clip our Bangladesh-specific CHIRPS data to the Barisal extent using two-different clipping approaches.

In [21]:
#separate out Barisal (a region in the 1991-2010 first-level harmonized Bangladesh geometry) to visualize the process
barisal = geo1_bd[geo1_bd["ADMIN_NAME"] == "Barisal"]
barisal.plot()
Out[21]:
<Axes: >
In [22]:
#get a bounding box of barisal's geometry
barisal_geometry = barisal["geometry"][0]
barisal_geometry_formatted = [shapely.geometry.mapping(barisal_geometry)]

Next, let's visualize the two different options we have for clipping our CHIRPS data to Barisal's geometry. First we'll use the "all_touched = True" option in the rio.clip method. This will clip our CHIRPS data to include all data for which the CHIRPS data grid cell touches Barisal.

In [23]:
#clip the precipitation_xarray to barisal's geometry, including all pixels that touch barisal
barisal_clipped_all = bd_chirps_epsg.rio.clip(barisal_geometry_formatted,
                                              barisal.crs,
                                              all_touched = True,
                                              from_disk = True,
                                              invert = False)
In [24]:
ax = barisal.boundary.plot(color = "red")
barisal_clipped_all.sel(time = "1991-01-01")["precip"].plot(ax = ax)
Out[24]:
<matplotlib.collections.QuadMesh at 0x131a04a90>

Next, we'll use the "all_touched = False" option. This will clip our CHIRPS data to include only those data for which the centroid of the CHIRPS data grid cell touches Barisal.

In [25]:
#clip the precipitation_xarray to barisal's geometry, including all pixels that touch barisal
barisal_clipped = bd_chirps_epsg.rio.clip(barisal_geometry_formatted,
                                          barisal.crs,
                                          all_touched = False,
                                          from_disk = True,
                                          invert = False)
In [26]:
ax = barisal.boundary.plot(color = "red")
barisal_clipped.sel(time = "1991-01-01")["precip"].plot(ax = ax)
Out[26]:
<matplotlib.collections.QuadMesh at 0x131afc580>

From the above map overlays, we can see that the data included in these two methods differs. This means that the mean precipitation values we estimate for a region on a given day will differ slightly depending on which method we select. This difference may or may not be meaningful to you depending on your research. In the following cells, and in the function we created above, we used the "all_touched = False" option, meaning we only use CHIRPS data whose grid cell centroid is located within a region. This, though, can be modified depending on your needs by simply modifying above function's "all_touched" option value.

In the above visualizations, we singled out the precipitation on a single day in order to explore our dataset, but we want to get average precipitation values for Barisal each date in our CHIRPS data. We can do this using the mean method on our "xarray" object. We will take the mean of our precipitation values across the spatial dimensions.

In [28]:
#understand xarray's architecture. Note that there are three dimensions (latitude, longitude, and time) and one data variable.
#for more on xarray and how it can be used with netcdf files check out its documentation https://docs.xarray.dev/en/stable/
barisal_clipped
Out[28]:
<xarray.Dataset>
Dimensions:      (latitude: 25, longitude: 23, time: 1095)
Coordinates:
  * latitude     (latitude) float32 21.82 21.88 21.92 ... 22.92 22.97 23.02
  * longitude    (longitude) float32 89.88 89.92 89.98 ... 90.88 90.92 90.98
  * time         (time) datetime64[ns] 1991-01-01 1991-01-02 ... 2011-12-31
    spatial_ref  int64 0
Data variables:
    precip       (time, latitude, longitude) float32 nan nan nan ... nan nan nan
Attributes: (12/15)
    Conventions:       CF-1.6
    title:             CHIRPS Version 2.0
    history:           created by Climate Hazards Group
    version:           Version 2.0
    date_created:      2015-11-20
    creator_name:      Pete Peterson
    ...                ...
    reference:         Funk, C.C., Peterson, P.J., Landsfeld, M.F., Pedreros,...
    comments:           time variable denotes the first day of the given day.
    acknowledgements:  The Climate Hazards Group InfraRed Precipitation with ...
    ftp_url:           ftp://chg-ftpout.geog.ucsb.edu/pub/org/chg/products/CH...
    website:           http://chg.geog.ucsb.edu/data/chirps/index.html
    faq:               http://chg-wiki.geog.ucsb.edu/wiki/CHIRPS_FAQ
xarray.Dataset
    • latitude: 25
    • longitude: 23
    • time: 1095
    • latitude
      (latitude)
      float32
      21.82 21.88 21.92 ... 22.97 23.02
      units :
      degrees_north
      standard_name :
      latitude
      long_name :
      latitude
      axis :
      Y
      array([21.824997, 21.875   , 21.924995, 21.974998, 22.024994, 22.074997,
             22.125   , 22.174995, 22.224998, 22.274994, 22.324997, 22.375   ,
             22.424995, 22.474998, 22.524994, 22.574997, 22.625   , 22.674995,
             22.724998, 22.774994, 22.824997, 22.875   , 22.924995, 22.974998,
             23.024994], dtype=float32)
    • longitude
      (longitude)
      float32
      89.88 89.92 89.98 ... 90.92 90.98
      units :
      degrees_east
      standard_name :
      longitude
      long_name :
      longitude
      axis :
      X
      array([89.875   , 89.92499 , 89.975006, 90.024994, 90.07501 , 90.125   ,
             90.17499 , 90.225006, 90.274994, 90.32501 , 90.375   , 90.42499 ,
             90.475006, 90.524994, 90.57501 , 90.625   , 90.67499 , 90.725006,
             90.774994, 90.82501 , 90.875   , 90.92499 , 90.975006], dtype=float32)
    • time
      (time)
      datetime64[ns]
      1991-01-01 ... 2011-12-31
      standard_name :
      time
      axis :
      T
      array(['1991-01-01T00:00:00.000000000', '1991-01-02T00:00:00.000000000',
             '1991-01-03T00:00:00.000000000', ..., '2011-12-29T00:00:00.000000000',
             '2011-12-30T00:00:00.000000000', '2011-12-31T00:00:00.000000000'],
            dtype='datetime64[ns]')
    • spatial_ref
      ()
      int64
      0
      crs_wkt :
      GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]
      semi_major_axis :
      6378137.0
      semi_minor_axis :
      6356752.314245179
      inverse_flattening :
      298.257223563
      reference_ellipsoid_name :
      WGS 84
      longitude_of_prime_meridian :
      0.0
      prime_meridian_name :
      Greenwich
      geographic_crs_name :
      WGS 84
      grid_mapping_name :
      latitude_longitude
      spatial_ref :
      GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]
      GeoTransform :
      89.84999986128373 0.05000027743252841 0.0 21.799997011820476 0.0 0.04999987284342448
      array(0)
    • precip
      (time, latitude, longitude)
      float32
      nan nan nan nan ... nan nan nan nan
      units :
      mm/day
      standard_name :
      convective precipitation rate
      long_name :
      Climate Hazards group InfraRed Precipitation with Stations
      time_step :
      day
      geostatial_lat_min :
      -50.0
      geostatial_lat_max :
      50.0
      geostatial_lon_min :
      -180.0
      geostatial_lon_max :
      180.0
      grid_mapping :
      spatial_ref
      array([[[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
      ...
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]],
      
             [[nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              ...,
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan],
              [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)
    • latitude
      PandasIndex
      PandasIndex(Float64Index([21.824996948242188,             21.875,  21.92499542236328,
                    21.974998474121094, 22.024993896484375, 22.074996948242188,
                                22.125,  22.17499542236328, 22.224998474121094,
                    22.274993896484375, 22.324996948242188,             22.375,
                     22.42499542236328, 22.474998474121094, 22.524993896484375,
                    22.574996948242188,             22.625,  22.67499542236328,
                    22.724998474121094, 22.774993896484375, 22.824996948242188,
                                22.875,  22.92499542236328, 22.974998474121094,
                    23.024993896484375],
                   dtype='float64', name='latitude'))
    • longitude
      PandasIndex
      PandasIndex(Float64Index([           89.875, 89.92498779296875, 89.97500610351562,
                    90.02499389648438, 90.07501220703125,            90.125,
                    90.17498779296875, 90.22500610351562, 90.27499389648438,
                    90.32501220703125,            90.375, 90.42498779296875,
                    90.47500610351562, 90.52499389648438, 90.57501220703125,
                               90.625, 90.67498779296875, 90.72500610351562,
                    90.77499389648438, 90.82501220703125,            90.875,
                    90.92498779296875, 90.97500610351562],
                   dtype='float64', name='longitude'))
    • time
      PandasIndex
      PandasIndex(DatetimeIndex(['1991-01-01', '1991-01-02', '1991-01-03', '1991-01-04',
                     '1991-01-05', '1991-01-06', '1991-01-07', '1991-01-08',
                     '1991-01-09', '1991-01-10',
                     ...
                     '2011-12-22', '2011-12-23', '2011-12-24', '2011-12-25',
                     '2011-12-26', '2011-12-27', '2011-12-28', '2011-12-29',
                     '2011-12-30', '2011-12-31'],
                    dtype='datetime64[ns]', name='time', length=1095, freq=None))
  • Conventions :
    CF-1.6
    title :
    CHIRPS Version 2.0
    history :
    created by Climate Hazards Group
    version :
    Version 2.0
    date_created :
    2015-11-20
    creator_name :
    Pete Peterson
    creator_email :
    pete@geog.ucsb.edu
    institution :
    Climate Hazards Group. University of California at Santa Barbara
    documentation :
    http://pubs.usgs.gov/ds/832/
    reference :
    Funk, C.C., Peterson, P.J., Landsfeld, M.F., Pedreros, D.H., Verdin, J.P., Rowland, J.D., Romero, B.E., Husak, G.J., Michaelsen, J.C., and Verdin, A.P., 2014, A quasi-global precipitation time series for drought monitoring: U.S. Geological Survey Data Series 832, 4 p., http://dx.doi.org/110.3133/ds832.
    comments :
    time variable denotes the first day of the given day.
    acknowledgements :
    The Climate Hazards Group InfraRed Precipitation with Stations development process was carried out through U.S. Geological Survey (USGS) cooperative agreement #G09AC000001 "Monitoring and Forecasting Climate, Water and Land Use for Food Production in the Developing World" with funding from: U.S. Agency for International Development Office of Food for Peace, award #AID-FFP-P-10-00002 for "Famine Early Warning Systems Network Support," the National Aeronautics and Space Administration Applied Sciences Program, Decisions award #NN10AN26I for "A Land Data Assimilation System for Famine Early Warning," SERVIR award #NNH12AU22I for "A Long Time-Series Indicator of Agricultural Drought for the Greater Horn of Africa," The National Oceanic and Atmospheric Administration award NA11OAR4310151 for "A Global Standardized Precipitation Index supporting the US Drought Portal and the Famine Early Warning System Network," and the USGS Land Change Science Program.
    ftp_url :
    ftp://chg-ftpout.geog.ucsb.edu/pub/org/chg/products/CHIRPS-latest/
    website :
    http://chg.geog.ucsb.edu/data/chirps/index.html
    faq :
    http://chg-wiki.geog.ucsb.edu/wiki/CHIRPS_FAQ
In [29]:
#calculate the mean of barisal across the spatial dimensions (latitude, longitude).  This will yield an average value across space for each time point (each day in the 1991, 2001, and 2010 years) for Barisal
barisal_precip = barisal_clipped.mean(dim = ["latitude", "longitude"])

#restructure the data to make it into a long-format two-dimensional array
barisal_precip_df = barisal_precip.to_pandas()[["precip"]].rename(columns = {"precip": "barisal"})
In [30]:
barisal_precip_df
Out[30]:
barisal
time
1991-01-01 1.733025
1991-01-02 5.439487
1991-01-03 0.683614
1991-01-04 0.000000
1991-01-05 0.000000
... ...
2011-12-27 0.057392
2011-12-28 0.685558
2011-12-29 0.000000
2011-12-30 0.000000
2011-12-31 0.000000

1095 rows × 1 columns

Use Our Function¶

The above cells are useful for visualizing the process of calculating daily average precipitation for a harmonized first-level region, but we want to calculate these values for each of the regions in our harmonized shapefile, and rather than going through this whole process for each region, we can use the function we created at the beginning of this notebook. This function performs the operations we described in the cells above, but loops through each region in the harmonized shapefile, creating a single dataframe containing a daily estimated average precipitation value for each harmonized region.

In [31]:
#run our function on the 1991-2011 harmonized first-level geography data
precip_estimates_level_one = get_mean_chirps_variable_by_geography(shp_path = "geo1_bd1991_2011/geo1_bd1991_2011.shp", #path to our shapefile
                                                                   nc_path = "bd_chirps.nc", #path to the CHIRPS data we clipped to the Bangladesh extent
                                                                   variable = "precip") #name of the variable in the CHIRPS data
In [32]:
#We can easily run it on the 1991-2011 harmonized second-level geography data, which can also be obtained from IPUMS International
precip_estimates_level_two = get_mean_chirps_variable_by_geography(shp_path = "geo2_bd1991_2011/geo2_bd1991_2011.shp",
                                                                   nc_path = "bd_chirps.nc",
                                                                   variable = "precip")
In [30]:
#Finally, we can save the data out to our local computer
precip_estimates_level_one.to_csv("precip_estimates_level_one.csv")
precip_estimates_level_two.to_csv("precip_estimates_level_two.csv")

We can now integrate these precipitation data with other IPUMS International variables to enhance our social research.

Note that the above is just one example of how to integrate climate data into social research using some of the geography data made available by IPUMS International. This script can be modified to handle other data sources, but often different data sources will require different methods of integration. Other sources of contextual information that might be useful to researchers using IPUMS International data can be found at https://international.ipums.org/international/geog_contextual.shtml