metScanR is an R package that enables users to quickly locate and work with freely available meteorological (MET) data across multiple networks. This package can currently find data across 107,000 stations among 18 different networks across the globe. The wide range of networks and their associated, but varying documentation, meta-data, data formats, and even station identifiers can pose a major roadblock to finding, wrangling, and synthesizing MET data.
metScanR currently allows for a user to ‘bypass’ many steps involved in finding MET data. A user can:
metScanR will return an R list object containing all weather stations that meet the criteria. metScanR also empowers users to explore data by providing an interactive map of all returned MET stations (powered by Leaflet).
This brief tutorial is intended for users who are either familiar or unfamiliar with R. The R code highlighted below can be copy-pasted and executed inside an R script.
A general workflow for locating meteorological data is outlined below.
There are two primary functions that a user can interact with:
siteFinder: searches for MET stations, returns a nested list
mapSiteFinder: takes a nested list returned by siteFinder, returns an interactive Leaflet map of all stations in the list
In this example we’ll do the following:
length(getNetwork("SCAN"))
getVars(): “conductivity”, turbidity, ground water, gauge height
getNetwork, mapSiteFinder(getVars("conductivity"))
getDates(start = "1800", end = "2015")
The release of version 1.0.0 has brought not only more MET station data (from 13,000 stations to 107,000) but also enhanced functionality. The expansion in MET station data required a total re-working of how data are stored and accessed. The current data set occupies a 600 megabyte binary file; this large file is loaded from a remote GitHub repo into R’s ‘background’ when the package is loaded.
A roughly eightfold increase in data also necessitated a new data structure, nested lists, and functions to access those data. A series of get... functions are now used to quickly pull data from the metScanR_DB object. Storing data as a list rather than a data.frame has two key benefits: (1) lists take up less disk space (and random access memory) and (2) lists are processed much more quickly.
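To make the nested-list layout concrete, here is a minimal sketch built from a hand-made two-station list. The object mock_db and the station "MOCK:0001" are hypothetical stand-ins, not part of metScanR; only the field names (namez, platform, location) follow the station output printed later in this tutorial.

```r
# Hypothetical stand-in for metScanR_DB: a named list with one entry per station.
mock_db <- list(
  "USC00226177" = list(
    namez    = "NATCHEZ",
    platform = "COOP",
    location = data.frame(latitude_dec = 31.589, longitude_dec = -91.3409, elev = 59.4)
  ),
  "MOCK:0001" = list(
    namez    = "EXAMPLE STATION",
    platform = "SCAN",
    location = data.frame(latitude_dec = 40.05, longitude_dec = -105.27, elev = 1600)
  )
)

# Single brackets keep the list wrapper; double brackets extract the station itself.
one_station_list <- mock_db["USC00226177"]
station          <- mock_db[["USC00226177"]]

# Fields inside a station are reached with $.
station_name <- station$namez
station_elev <- station$location$elev

# Pull one field from every station in the list.
platforms <- vapply(mock_db, function(s) s$platform, character(1))
```

Because every station follows the same layout, the same accessor code works on any element of the list.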
siteFinder: an all in one “wrapper” that accesses the functionality of several lower level get functions (listed below)
Calling the get functions directly will return data more quickly
mapSiteFinder: will map the outputs returned from siteFinder and/or get functions
getCountry: query MET stations by country of origin
getDates: query by startdate, enddate, or date range (startdate AND enddate)
getElevation: query by station elevation
getId: find specific stations by unique identifier
getNearby: find stations near a specific location and radius
getNetwork: return list of stations from a given MET network
getVars: query stations by environmental variables measured
An object, metScanR_DB, is imported into the metScanR environment when the library is loaded in R. This metScanR_DB object contains all meta-data for the MET stations captured by this project. All metScanR functions return meta-data in a nested and named list that follows the same general structure for every station in the database.
If we return an object named “data” from a getVars function call (e.g. data <- getVars("conductivity")[1], which gives you the first list element, or station, of the result), the data are formatted as such (with a description in parentheses):
In the near future, metScanR will provide functionality for directly downloading MET data via existing APIs. We are also planning on including meta-data from Ameriflux and NADP stations.
install.packages("metScanR")
If you encounter a bug, please provide a reproducible example on this package’s GitHub issues page.
Find station meta-data near a given coordinate and assign the output to the object scenario1:
library(metScanR)
## Warning: package 'metScanR' was built under R version 3.3.3
## Welcome to metScanR! This package takes a few extra seconds to load because it checks for updates to an external database upon startup. Thank you for your patience.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
scenario1 <- siteFinder(lat=40.05,lon=-105.27,startDate="2000-01-05",radius=45) # returns 40 stations
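A radius search like the one above can be pictured as a great-circle distance filter. The sketch below is not metScanR's implementation: the helper haversine_km, the mock station table, and the assumption that the radius is in kilometres are all illustrative.

```r
# Great-circle (haversine) distance in kilometres between points in decimal degrees.
haversine_km <- function(lat1, lon1, lat2, lon2, earth_radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * earth_radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical stations; only the search point comes from the tutorial.
mock_stations <- data.frame(
  id  = c("A", "B", "C"),
  lat = c(40.05, 40.40, 41.05),
  lon = c(-105.27, -105.27, -105.27),
  stringsAsFactors = FALSE
)

# Distance from the search point to every station, then keep those within 45 (assumed km).
dist_km <- haversine_km(40.05, -105.27, mock_stations$lat, mock_stations$lon)
nearby  <- mock_stations[dist_km <= 45, ]
nearby$id
```

Station C sits one degree of latitude (about 111 km) north of the search point, so it falls outside the radius.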
Our search can be narrowed to more specific variables of interest via the vars parameter. There are hundreds of weather/climatological variables included in the metScanR database. Rather than requiring the user to know the exact syntax, case, phrase, or network of variables before they search the database, we have implemented a ‘fuzzy search’ function that interprets a user’s text input for the vars argument. The ‘fuzzy search’ function attempts to match the user’s input against a large set of overlapping keywords for each variable listed in the database.
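The idea of matching free text against overlapping keywords can be sketched with base R's approximate matcher, agrepl(). This is only an illustration under stated assumptions: the keyword table mock_terms and the helper match_vars are hypothetical, and metScanR's own fuzzy search may work differently.

```r
# Hypothetical keyword table: each variable code maps to overlapping search terms.
mock_terms <- list(
  TMAX = c("air temperature", "maximum temperature", "temp"),
  TMIN = c("air temperature", "minimum temperature", "temp"),
  PRCP = c("precipitation", "rain", "rainfall"),
  SNWD = c("snow depth", "snow")
)

# Return the variable codes whose keywords approximately match the user's text.
# agrepl() tolerates small edits, so case differences and minor typos still match.
match_vars <- function(query, terms = mock_terms, max_distance = 0.1) {
  hits <- vapply(
    terms,
    function(keywords) {
      any(agrepl(query, keywords, ignore.case = TRUE, max.distance = max_distance))
    },
    logical(1)
  )
  names(terms)[hits]
}

exact_hits <- match_vars("Air Temperature")
typo_hits  <- match_vars("air temperture")
```

Both calls should pick out the temperature codes, since the misspelled query is within one edit of the keyword "air temperature".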
In this example we search for any variables associated with air temperature measurements:
scenario2 <- siteFinder(lat=40.05,lon=-105.27,startDate="2000-01-05",radius=45,network="COOP",vars="air temperature", includeUnk = TRUE) # returns 15 stations
Typing the object name scenario2 shows the resulting output, where values such as “TMAX”, “TMIN”, and “AVGT” are returned (i.e. variables associated with air temperature measurements). Note that an error message is the resulting output if none of the stations within the search have variables that match the vars argument.
metScanR was designed to support users who want to quickly and interactively locate meteorological data. The output of the siteFinder() and get... functions can therefore be visualized by passing the output data to the mapSiteFinder() function:
mapSiteFinder(scenario1)
The mapSiteFinder() function produces an interactive Leaflet map with every MET station represented by a colored circle. Colors denote which network a particular station belongs to. Users can click on circles to view key station meta-data such as name, platform, unique identifier, monitoring start date, monitoring end date, and station elevation. Users can also pan, zoom, and click on stations to better visualize the spatial arrangement of stations and/or networks.
mapSiteFinder(scenario1) returns 56 stations across 5 networks. Attempting to find, download, and organize data from this many stations and networks can be a time-consuming task. One strategy for reducing the complexity of this task is to ‘eyeball’ which MET networks have the most extensive spatial coverage or time series (i.e. broadest monitoring start and end dates).
A brief glance at the map shows that the “COOP” network fits our criteria:
scenario3 <- siteFinder(lat=40.05,lon=-105.27,startDate="2000-01-05",radius=45,network="COOP")
mapSiteFinder(scenario3)
## count the number of stations
length(names(scenario3))
## [1] 19
Here we see that the data set has been reduced to 19 stations from the same network, thus simplifying our original task.
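If you would rather count than eyeball, the number of stations per network can be tallied from the returned list. The sketch below uses a hypothetical mock_result list and assumes that each station's platform field holds its network name, as it does in the station output printed later in this tutorial.

```r
# Hypothetical siteFinder-style result: a named list of stations with a platform field.
mock_result <- list(
  "COOP:A"  = list(platform = "COOP"),
  "COOP:B"  = list(platform = "COOP"),
  "COOP:C"  = list(platform = "COOP"),
  "SCAN:1"  = list(platform = "SCAN"),
  "USCRN:1" = list(platform = "USCRN")
)

# Extract the platform of every station, then tabulate and sort by frequency.
platforms      <- vapply(mock_result, function(s) s$platform, character(1))
network_counts <- sort(table(platforms), decreasing = TRUE)

# The first name is the network with the most stations in the search area.
top_network <- names(network_counts)[1]
```

The same pattern should work on a real siteFinder result, provided its stations carry a platform field.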
The next section walks you through how to browse through the many environmental variables logged by metScanR.
There are 830 different environmental variables listed in the metScanR_DB. You can search through the list of terms associated with any station by combining two functions in RStudio, View() and plyr’s ldply, to neatly display, and then search through, the data:
View(plyr::ldply(metScanR:::metScanR_terms))
In the search bar, typing in a term like “temperature” will narrow the number of rows displayed.
mapSiteFinder(getVars("conductivity"))
Thanks to the RNRCS package authors for adding NRCS station meta-data!
## Save huge map as a PDF or png to share with others, or render as HTML
maps <- mapSiteFinder(getNetwork("SCAN"))
maps
oldest <- getDates(startDate = "1800-01-01", endDate="2015-01-01") # search for stations that begin at the turn of the 19th century and monitor to 2015
length(oldest) # count to see how many stations metScanR finds
## [1] 2
names(oldest) # see what the stationIDs are
## [1] "USC00226177" "USW00013782"
oldest$USC00226177
## $namez
## [1] "NATCHEZ"
##
## $identifiers
## idType id
## 1 GHCND USC00226177
## 5 GHCNMLT USC00226177
## 9 COOP 226177
## 13 NWSLI NATM6
## 14 NCDCSTNID 20011197
##
## $platform
## [1] "COOP"
##
## $elements
## element date.begin date.end
## 1 PRECIP 1948-01-01 2017-04-17
## 2 TEMP 1948-01-01 present
## DAPR DAPR 1953 2016
## MDPR MDPR 1953 2016
## PRCP PRCP 1892 2017
## SNOW SNOW 1894 2017
## SNWD SNWD 1909 2017
## TMAX TMAX 1892 2017
## TMIN TMIN 1892 2017
## TOBS TOBS 1901 2017
## WT01 WT01 1906 1962
## WT03 WT03 1915 1991
## WT04 WT04 1898 2011
## WT05 WT05 1915 1985
## WT06 WT06 1936 2011
## WT07 WT07 1935 1944
## WT08 WT08 1919 1949
## WT11 WT11 1934 1990
## WT14 WT14 1924 1978
## WT16 WT16 1895 1929
## WT18 WT18 1898 1924
##
## $location
## latitude_dec longitude_dec elev country state county utcoffset
## 1 31.589 -91.3409 59.4 UNITED STATES MS ADAMS -6
## date.begin date.end
## 1 1799-01-01 present
## find stations with soil moisture
soilMoisture <- getVars("soil moisture")
## find stations with snow depth
snow <- getVars("snow depth")
## determine which stations have both variables by finding intersecting station names between the two lists
colocated <- intersect(names(soilMoisture), names(snow))
colocated %>% head
## [1] "SCAN:2221" "SCAN:2214" "SCAN:2216" "SCAN:2213" "SCAN:2210" "SCAN:2211"
Returning just the meta-data for the colocated stations is a little more involved, as it requires filtering the metScanR_DB list object. You can subset a list either by numeric index or the element name (if it exists). The nicely formatted metScanR_DB object is a named list, so we access the colocated data via the station’s identifier:
## this method takes about 1.7 seconds on my system
## for each stationid, subset the metScanR_DB object and return the meta-data as a smaller list
colocated_metadata <- lapply(colocated, function(x) metScanR:::metScanR_DB[x])
colocated_metadata %>% head(3)
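The two ways of subsetting a named list by identifier give differently shaped results, which matters for later processing. Here is a small sketch on a hypothetical mock_db (the station names are invented) comparing the lapply approach with indexing by the whole character vector at once.

```r
# Hypothetical stand-in for metScanR_DB with three stations.
mock_db <- list(
  "SCAN:2221" = list(namez = "STATION A"),
  "SCAN:2214" = list(namez = "STATION B"),
  "SCAN:9999" = list(namez = "STATION C")
)
colocated_ids <- c("SCAN:2221", "SCAN:2214")

# lapply over single-bracket subsets gives an unnamed list of one-element lists.
nested <- lapply(colocated_ids, function(x) mock_db[x])

# Indexing with the whole character vector gives one flat list named by station id.
flat <- mock_db[colocated_ids]

# With the flat form, a station is one step away.
first_name <- flat[["SCAN:2221"]]$namez
```

The flat form is usually easier to pass on to other list-processing code, since each element is a station rather than a wrapper around one.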
Since metScanR only returns station meta-data at the moment (functionality for downloading station data directly will be implemented in the future), users may want a ‘hard copy’ .csv of station identifiers to plug into various APIs or websites supported by MET networks:
write.csv(x = scenario1, file = "path/to/your/folder/metScanR_output.csv", na="")
You must specify a local file path for the output via the file argument.
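write.csv() expects something it can treat as a data.frame, and a nested list of stations may not coerce cleanly. One option is to flatten the fields you need into one row per station first. The sketch below uses a hypothetical mock_result list whose field names follow the station output shown earlier; the column names in the table are my own choice.

```r
# Hypothetical siteFinder-style result with two stations.
mock_result <- list(
  "USC00226177" = list(
    namez = "NATCHEZ", platform = "COOP",
    location = data.frame(latitude_dec = 31.589, longitude_dec = -91.3409, elev = 59.4)
  ),
  "MOCK:0001" = list(
    namez = "EXAMPLE STATION", platform = "SCAN",
    location = data.frame(latitude_dec = 40.05, longitude_dec = -105.27, elev = 1600)
  )
)

# Build one single-row data.frame per station from the fields of interest.
station_rows <- lapply(names(mock_result), function(id) {
  s <- mock_result[[id]]
  data.frame(
    station_id = id,
    name       = s$namez,
    platform   = s$platform,
    latitude   = s$location$latitude_dec,
    longitude  = s$location$longitude_dec,
    elevation  = s$location$elev,
    stringsAsFactors = FALSE
  )
})

# Stack the rows into one table and write it out.
station_table <- do.call(rbind, station_rows)
out_file <- tempfile(fileext = ".csv")  # replace with your own path
write.csv(station_table, file = out_file, row.names = FALSE, na = "")
```

The resulting file has one line per station, which is convenient for pasting identifiers into a network's website or API.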