NOAA Operational Model Archive and Distribution System

GrADS Data Server User's Guide

1. Accessing data from a web browser

  • Browsing server contents

    To browse a directory of the datasets being served on a GDS, point your web browser to the base URL of the GDS. This will usually be a URL of the form
    http://machine.domain:9090/dods/.

    This directory listing provides links to "info", "dds", and "das" for each dataset. The first link leads to a Web page with a brief summary of the dataset, followed by a complete metadata listing. The other two lead to the DODS Data Descriptor Structure, which specifies the logical structure of the dataset, and the Data Attribute Structure, which provides descriptive information about it.
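
    These listings can also be requested directly by appending .info, .dds, or .das to a dataset URL, following the usual DODS convention (dataset below is a placeholder name):

      http://machine.domain:9090/dods/dataset.info
      http://machine.domain:9090/dods/dataset.dds
      http://machine.domain:9090/dods/dataset.das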

    You can also retrieve a complete dataset listing for a GDS by adding /xml to its base URL.

    If you are given a GDS dataset URL, you can enter that URL in your web browser, and get the "info" listing. This listing will contain links back to the dataset directory for the GDS.

    Note: Many DODS data objects with distinct URLs will often be considered a single "dataset" from a scientific point of view. However, the word "dataset" is used here in a technical sense, to mean a single DODS data object.

  • Retrieving data subsets as ASCII text

    The GDS can provide subsets of any dataset it is serving, in ASCII comma-delimited format. To retrieve a subset, enter a URL of the form
    http://gds-base-url/dataset.ascii?constraint.

    The constraint portion of the URL should be a DODS constraint expression. Some basic constraints:

    • A constraint of the form var requests the complete contents of the variable.
    • A constraint of the form var[a:b] returns the subset of the variable from element a through element b.
    • A constraint of the form var[a:n:b] returns every nth element between a and b.

    For subsets of variables with multiple dimensions, each dimension must have a constraint. So a constraint for a subset of a three-dimensional variable would appear as var[a1:b1][a2:b2][a3:b3], or var[a1:n1:b1][a2:n2:b2][a3:n3:b3].
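
    For example, assuming a dataset named example that contains a three-dimensional variable tmp (both names are hypothetical), the following URL would return elements 0 through 5 of the first dimension, 10 through 20 of the second, and every 2nd element between 30 and 40 of the third, as ASCII text:

      http://machine.domain:9090/dods/example.ascii?tmp[0:5][10:20][30:2:40]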


2. Accessing data from a DODS-enabled analysis tool

  • Opening a dataset and retrieving data subsets

    You can retrieve data from a GrADS Data Server using any DODS-enabled desktop analysis tool (aka "client"), such as GrADS itself.

    To do this, provide a URL instead of a path name to your client's open command. You can then use the data as if it were a local data set; the client will automatically retrieve data as needed.
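
    For example, using GrADS as the client (the dataset URL and the variable name var below are placeholders):

      ga-> sdfopen http://machine.domain:9090/dods/dataset
      ga-> display var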

  • Performing remote analysis

    Remote analysis is especially useful for calculations that consume a large quantity of remote input data but generate a small output, such as averaging and correlation functions. This type of calculation runs much faster on the server, and you only need to download the small result instead of the entire set of inputs.

    In order to do data analysis on the server, you construct a URL containing a GrADS expression, and then open that URL with your client. The server performs the analysis task and returns the results to the client as a DODS dataset containing a single variable called result. This result dataset can then be used exactly as if it were an original dataset (see above). It can even be used as input to further analysis expressions on the server, allowing calculations that use multiple stages of intermediate results to be performed remotely.

    The URL for an analysis operation is created by appending _expr_ to the server's base URL,
    http://machine_name:9090/dods/_expr_
    followed immediately by three sets of curly braces containing the arguments:

    {dataset1,dataset2,...}{expression}{x1:x2,y1:y2,z1:z2,t1:t2}

    The first set of curly braces contains a list of all the datasets on the server that are used in the GrADS expression. If the datasets are in a subdirectory, the name of the subdirectory should be included in the dataset name.

    Source datasets can include the results of previous analysis expressions, allowing you to perform multi-stage calculations. To use a previous analysis result as a source, put its shorthand name in the list of datasets. The shorthand name for a result dataset is contained in the dataset's title attribute (in GrADS you can view this by typing q file), and has the form _expr_nnnn where nnnn will be some number.
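
    For example, a second-stage average over a previous result might look like the following sketch (the shorthand name _expr_1234 is hypothetical; substitute the name from your own result dataset's title attribute):

      ga-> sdfopen http://machine_name:9090/dods/_expr_{_expr_1234}
      {ave(result.1,t=1,t=12)}{0:360,-90:90,1000:1000,1jan1960:1jan1960}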

    The second set of curly braces contains the GrADS expression to be evaluated. This describes the actual calculation to be performed, using GrADS syntax.

    The third set of curly braces contains the boundaries for the expression evaluation in world coordinates (latitude, longitude, elevation, time). These boundaries may not vary in more than two of the four dimensions. The first three coordinate pairs should be given as real numbers. The last pair are time coordinates, and should be in the format recognized by the set time command in GrADS: [hh[mm]z][dd][mon][yyyy]. For example, 0z1jan2000.

    Specifically, the analysis is performed as follows:

    1. GrADS is invoked.
    2. The source datasets are opened in the order they are listed in the first set of curly braces.
    3. The dimension environment is set according to the parameters in the third set of curly braces.
    4. The expression in the second set of curly braces is evaluated and saved as a new dataset.

    Thus, a variable in the nth listed dataset should be referred to as var_name.n in the analysis expression. For instance, if dataset2 contains a variable called foo, this variable should be referred to in the expression as foo.2. The expression will be evaluated against the grid of the last dataset opened.

    Following are some examples of remote analysis, using GrADS as a client. The GrADS Data Servers are no longer running, but the examples may be useful as a learning tool.

    1. Global Averaging:
      The following expression will return a time series of globally-averaged monthly mean surface air temperatures based on NCEP reanalysis data being served on the GDS at the Climate Diagnostics Center:

      ga-> sdfopen http://web2.cdc.noaa.gov:9090/dods/_expr_{sfc_air_mon_mean}
      {tloop(aave(air,global))}{0:0,0:0,1:1,jan1948:jan2001}

    2. Variable Comparison:
      A GDS running at NCAR is distributing a set of ensemble members from the "Climate of the 20th Century" runs of the COLA atmospheric general circulation model. We will compare the relative humidity "rh" from the first two datasets, namely "C20C_A" and "C20C_B". Suppose we want to find a global time-average of their difference at the 1000 mb level in 1960. Using GrADS as our client, we would open the following URL:

      ga-> sdfopen http://motherlode.ucar.edu:9090/dods/_expr_{C20C_A,C20C_B}
      {ave((rh.1-rh.2),time=1jan1960,time=1dec1960)}
      {0:360,-90:90,1000:1000,1nov1976:1nov1976}
      ga-> display result

      The analysis results are returned in the variable "result" in the opened dataset. Note that the world coordinate boundaries specified in the third set of curly braces fix the time to 1nov1976 -- this can be set to any arbitrary time because the time dimension specification is overridden by the GrADS expression which tells the server to average over the period from January 1960 to December 1960.

    3. A More Complex Analysis Operation:
      Suppose you wanted to calculate the mean 500mb height anomaly associated with warm tropical SST anomalies. The Reynolds SST Analyses are used to create a time series of the area-averaged SST anomaly between 180 and 90W and between 10S and 10N. An "ENSO" mask is then defined for SST anomalies greater than 1 degree. Using this mask, a mean 500mb height associated with the warm SST anomalies is calculated from the NCEP/NCAR Reanalysis Data. All these operations are packaged into a single URL:

      ga-> sdfopen http://cola8.iges.org:9090/dods/_expr_{ssta,z5a}
      {tmave(const(maskout(aave(ssta.1,lon=-180,lon=-90,lat=-10,lat=10),
      aave(ssta.1,lon=-180,lon=-90,lat=-10,lat=10)-1.0),1),
      z5a.2(lev=500),t=1, t=600)}
      {0:360,0:90,500:500,jan1950:jan1950}

  • Uploading data

    The GDS also allows the client to upload data that can then be used as a source in analysis expressions. This capability is still experimental, and will be fully documented and made officially available with the release of GrADS 1.9.


3. Using remote data in scripts

You do not have to do anything special to adapt a script to work with remote data. All you need to do is replace local filenames with URLs. This is because from your client program's point of view, a remote dataset behaves exactly like a local dataset except that access is slower.

However, because remote data retrieval is not instantaneous, existing scripts that do not take this into account may run very slowly. Thus it is often desirable to modify the script to improve its efficiency.

The key to writing efficient scripts is fine-tuning your use of I/O requests. DODS-enabled clients such as GrADS only provide the illusion of a continuous connection with a remote dataset. In fact, a new connection is made to the server every time you request data from the I/O layer (for instance, by using the "display" command). The speed of these connections depends on network latency and server response time, but is generally much slower than an equivalent request from a local disk. Thus, reducing the number of network connections, and the quantity of data sent over the network, will often significantly speed up your script.

Following are some guidelines for writing efficient scripts. The examples given use the GrADS scripting language, but the principles apply to most DODS-enabled clients:

  • Avoid multiple opens. Opening a GDS data file generates as many as eight separate network requests, so try to avoid opening the same file more than once.

  • Store remote data locally if you plan to reuse it. DODS has a limited ability to cache remote data locally, the way a web browser does with web pages. However, this only works when you request the exact same subset. Thus, if you use different parts of the same remote data subset in multiple places in your script, you are actually requesting it multiple times over the network.

    To avoid this, request data once from the server, and then store it in local memory or on disk. In GrADS, you can do this using 'define' or 'set gxout fwrite'. For example:

      'sdfopen http://cola8.iges.org:9090/dods/mrf0930'
      'set lat 22 52'
      'set lon 233 295'
      'set t 1 15'
      'define slp = slp/100'
      'd slp'

    Now you can use the variable 'slp' as many times as you wish in your script without any additional network requests.

    Note to GrADS users: The 'define' command automatically loops through each time step in the dimension environment, so using 'define' may not always improve your performance if you are accessing time series at a single point. For example:

      'set lon -90'
      'set lat 40'
      'set lev 500'
      'set t 1 15'
      'define ztser = z'

    The above example will result in 15 separate requests for data from the server, one for each time. Each request will only obtain a single data value! If time is the only varying dimension, it is far better to display the data using the 'display' command (which doesn't automatically loop through time) or, if you're going to display the data more than once, use 'set gxout fwrite' to preserve a local copy. We will be fixing this behavior in version 1.9 of GrADS.
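
    For example, if time is the only varying dimension, a minimal fwrite sketch might look like this (the output filename ztser.dat is arbitrary):

      'set lon -90'
      'set lat 40'
      'set lev 500'
      'set t 1 15'
      'set gxout fwrite'
      'set fwrite ztser.dat'
      * a single 'd' command fetches the whole time series in one request
      'd z'
      'disable fwrite'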

  • Evaluate expressions on the server side when appropriate. It may save time and server resources to package your request into an analysis expression. A good rule of thumb is to use analysis expressions when the size of the result data set is smaller than the total size of the input data, e.g. when doing spatial or time averaging.

    For example, the following script example opens two separate MRF forecast data files and then uses the 'const' function to merge one variable from each of them to form one continuous time series:

      'sdfopen http://cola8.iges.org:9090/dods/mrf010800'
      'sdfopen http://cola8.iges.org:9090/dods/mrf010800b'
      'set lat 0'
      'set lon 0'
      'set t 1 31'
      'define tt = const(t.1,0,-u) + const(t.2,0,-u)'
      'd tt'

    This second version of the script example creates the same continuous time series using an analysis expression. This script runs three times faster than the first version, and hits the server half as many times.

      baseurl    = 'http://cola8.iges.org:9090/dods/_expr_'
      datasets   = '{mrf010800,mrf010800b}'
      expression = '{const(t.1,0,-u)+const(t.2,0,-u)}'
      dimensions = '{0:0,0:0,1000:1000,00Z08JAN2002:00Z23JAN2002}'
      'sdfopen '%baseurl%datasets%expression%dimensions
      'set t 1 31'
      'define tt = result.1'
      'd tt'

    Note, however, that there is overhead on the server associated with each analysis expression. Thus, if the size of the expression output is the same as, or larger than, its inputs, it will be more efficient to retrieve the inputs first and do the analysis locally.

  • Try to move data requests outside loops. If you are looping over a grid of data points, when possible you should retrieve the whole area you intend to use with a single request, and store it locally for use in the loop. Otherwise you will be making a new network request for each data point inside the loop, which can cause extremely slow performance.
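
    For example, the following sketch (the dataset URL and the variable t2m are hypothetical) retrieves an entire latitude band once with 'define', then loops over the local copy without generating further network requests:

      'sdfopen http://machine.domain:9090/dods/dataset'
      'set lat 40'
      'set lon 233 295'
      'set t 1'
      'define tband = t2m'
      * the loop reads from the defined variable in memory, not from the server
      lon = 233
      while (lon <= 295)
        'set lon 'lon
        'd tband'
        lon = lon + 1
      endwhile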

NCEP NOMADS Version 2.2.7, Dec. 2023