2 WQMA Oracle Data Warehouse
2.1 Connecting to the WQMA Data Warehouse
If you have not already, please set up an ODBC connection to the WQMA data warehouse following the instructions in the ODBC section.
2.1.1 Install nexus
The nexus package is available on GitHub. To install packages from GitHub, first install the pak package.
install.packages("pak")
Use pak to install the nexus package from GitHub. The code below specifies the organization (“BWAM”) and the repository (“nexus”). If this fails, try the next code chunk.
pak::pak("BWAM/nexus")
If pak fails to install the package, try using devtools and gitcreds. The difficulty with this option is that you must set up a GitHub PAT. To set up a PAT, see Managing Git(Hub) Credentials.
devtools::install_github(
repo = "BWAM/nexus",
auth_token = gitcreds::gitcreds_get()$password)
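In practice, setting up a PAT amounts to creating a token on GitHub and storing it once with gitcreds. The chunk below is a sketch of that one-time setup (it assumes the gitcreds package is installed; both calls are interactive):

```r
# One-time setup: create a PAT at https://github.com/settings/tokens,
# then run gitcreds_set() and paste the token when prompted. It is
# stored in your operating system's Git credential store.
gitcreds::gitcreds_set()

# Verify that a credential is now available.
gitcreds::gitcreds_get()
```

After this, the devtools::install_github() call above can retrieve the token with gitcreds::gitcreds_get() without the PAT ever appearing in your code.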
2.1.2 Get Connected
To work with the data warehouse, you must first connect to it. This is very similar to specifying a file directory containing multiple CSV files you want to import into R. nexus provides the function get_connected() to simplify the process of connecting to the data warehouse. At a minimum, you must provide your data warehouse username. You will be prompted to enter your password in a pop-up window. You should never supply your password in plain text because it increases the chances you accidentally share your password with others. For example, it is very easy to accidentally share your password when committing and pushing other changes to GitHub.
When you are finished querying the data warehouse, please use
`DBI::dbDisconnect(conn = insert-your-connection-name-here)`.
This message is displayed once per session.
con <- nexus::get_connected(username = "ZSMITH")
2.1.3 Understanding the Data Warehouse Structure
A primary goal of the WQMA Data Warehouse and nexus is to allow users to focus on the task at hand and NOT have to spend much energy cleaning data or learning how tables relate to one another. This section outlines a number of functions that can be used to rapidly learn about the data warehouse and its contents.
2.1.3.1 Tables and Columns and Definitions, Oh My!
The DBI function dbListTables() can be used to get a vector of tables in a database. For the WQMA data warehouse, we also need to specify the name of the schema we want to explore.
DBI::dbListTables(con, schema_name = "WQMA_OWNER")
[1] "TAXONOMY_REFERENCE_JUNCTION" "PARAMETER"
[3] "SITE" "TAXONOMIC_TRAIT"
[5] "SAMPLE_DELIVERY_GROUP" "BASIN"
[7] "RESULT" "ACCESSIONS_TABLES"
[9] "ACCESSIONS_DATA_MAP" "PREVIOUS_TAXON_NAME"
[11] "QUALITY_CONTROL" "PROJECT"
[13] "PREVIOUS_SITE_ID" "ACCESSIONS_FLD_XFM"
[15] "EVENT" "SAMPLE"
[17] "WATERBODY" "ACCESSIONS_CDE_MAP"
[19] "TAXONOMY_REFERENCE" "ACCESSIONS"
[21] "RESULT_QUALIFIER" "TAXONOMIC_ABUNDANCE"
[23] "PARAMETER_NAME" "TAXONOMY"
[25] "ACCESSIONS_KEY_MAP"
However, it is recommended that you use the nexus function get_data_dictionary() because it provides much more information than dbListTables(). Setting the argument just_tables = TRUE will return a tibble (i.e., a fancy data frame) of all tables in the data warehouse and their definitions.
(dictionary <- nexus::get_data_dictionary(just_tables = TRUE))
# A tibble: 19 × 2
table_name definition
<chr> <chr>
1 BASIN This table represents the 17 major drainage basi…
2 EVENT A table representing all distinct sampling event…
3 PARAMETER This table represents the distinct parameters pr…
4 PARAMETER_NAME This table provides descriptions of the paramete…
5 PREVIOUS_SITE_ID This table represents site IDs that were previou…
6 PREVIOUS_TAXON_NAME This table represents taxonomic names that were …
7 PROJECT This table represents the monitoring projects th…
8 QUALITY_CONTROL This table contains quality control results asso…
9 RESULT This table represents numeric and/or categorical…
10 RESULT_QUALIFIER The table provides a detailed description of res…
11 SAMPLE This table represents distinct samples collected…
12 SAMPLE_DELIVERY_GROUP A table containing all valid Sample Delivery Gro…
13 SITE This table represents the distinct monitoring lo…
14 TAXONOMIC_ABUNDANCE This table contains taxonomic abundances— the co…
15 TAXONOMIC_TRAIT This table represents taxonomic traits, such as …
16 TAXONOMY This table represents the tax's observed in New …
17 TAXONOMY_REFERENCE This table identifies taxonomic reference materi…
18 TAXONOMY_REFERENCE_JUNCTION This is a junction table to facilitate the relat…
19 WATERBODY This table represents waterbodies in New York St…
Running get_data_dictionary() with all the default arguments will return all table and column definitions in the data warehouse.
(dictionary <- nexus::get_data_dictionary())
# A tibble: 218 × 11
table_name column_name examples definition db_data_type character_limit
<chr> <chr> <list> <chr> <chr> <dbl>
1 BASIN BASIN <chr [17]> A unique 2… NUMBER NA
2 BASIN BASIN_NAME <chr [17]> The names … VARCHAR2(50) 50
3 BASIN <NA> <NULL> This table… <NA> NA
4 EVENT EVENT_DATETIME <chr [4]> The date a… DATETIME NA
5 EVENT EVENT_ID <chr [10]> A unique s… VARCHAR2(50) 50
6 EVENT PROJECT <chr [10]> A monitori… VARCHAR2(10… 100
7 EVENT QAPP_ID <chr [5]> The Qualit… VARCHAR2(50) 50
8 EVENT SITE_ID <dbl [10]> A unique s… NUMBER NA
9 EVENT <NA> <NULL> A table re… <NA> NA
10 PARAMETER CASRN <chr [10]> A CAS Regi… VARCHAR2(50) 50
# ℹ 208 more rows
# ℹ 5 more variables: not_null <lgl>, surrogate_key <lgl>, primary_key <chr>,
# foreign_key <chr>, parent_table <chr>
The dictionary contains an examples field that provides examples of the type of data contained in each column. If there are fewer than 20 unique values in a column, all possible values are shown. For example, you can see all of the possible values for WATERBODY_TYPE using the following code:
dictionary |>
dplyr::filter(table_name %in% "WATERBODY",
column_name %in% "WATERBODY_TYPE") |>
dplyr::pull(examples)
[[1]]
[1] "lake" "river_stream" "other" "outfall"
You can supply a character vector of table names to the argument table_vec to subset the tibble to only the table or tables of interest. The example below only shows the definitions for the BASIN and WATERBODY tables.
(dictionary <- nexus::get_data_dictionary(table_vec = c("BASIN",
"WATERBODY")))
# A tibble: 12 × 11
table_name column_name examples definition db_data_type character_limit
<chr> <chr> <list> <chr> <chr> <dbl>
1 BASIN BASIN <chr> A unique … NUMBER NA
2 BASIN BASIN_NAME <chr> The names… VARCHAR2(50) 50
3 BASIN <NA> <NULL> This tabl… <NA> NA
4 WATERBODY WATERBODY_CODE <chr> A unique … VARCHAR2(50) 50
5 WATERBODY BEACH_PRESENT <chr> Identifie… VARCHAR2(15) 15
6 WATERBODY FISHERIES_INDEX_… <chr> This is a… VARCHAR2(75) 75
7 WATERBODY PUBLIC_WATER_SUP… <chr> Identifie… VARCHAR2(15) 15
8 WATERBODY USGS_POND_NUMBER <chr> A unique … VARCHAR2(15) 15
9 WATERBODY WATERBODY_ALTERN… <chr> Some wate… VARCHAR2(10… 100
10 WATERBODY WATERBODY_NAME <chr> The name … VARCHAR2(10… 100
11 WATERBODY WATERBODY_TYPE <chr> Inidicate… VARCHAR2(50) 50
12 WATERBODY <NA> <NULL> This tabl… <NA> NA
# ℹ 5 more variables: not_null <lgl>, surrogate_key <lgl>, primary_key <chr>,
# foreign_key <chr>, parent_table <chr>
Similar to table_vec, you can supply a character vector of column names to the argument column_vec to subset the tibble to only the column or columns of interest. This is useful for:
- Getting the definition of a column (or columns) of interest.
- Quickly identifying which table a column comes from.
The example below shows the definitions for two columns of interest from two different tables.
(dictionary <- nexus::get_data_dictionary(
column_vec = c("BASIN_NAME",
"WATERBODY_NAME")
))
# A tibble: 2 × 11
table_name column_name examples definition db_data_type character_limit
<chr> <chr> <list> <chr> <chr> <dbl>
1 BASIN BASIN_NAME <chr [17]> The names o… VARCHAR2(50) 50
2 WATERBODY WATERBODY_NAME <chr [10]> The name of… VARCHAR2(10… 100
# ℹ 5 more variables: not_null <lgl>, surrogate_key <lgl>, primary_key <chr>,
# foreign_key <chr>, parent_table <chr>
2.1.4 Get Data
2.1.4.1 Single Table Queries
To query data from a single table you can use functions from the dplyr and DBI packages. Below I have established a remote connection to the BASIN table in the data warehouse. I have not yet pulled the data into R. You may think of this as a preview of the data that you will pull into R. As you can see, you need to provide a connection to a database, the schema, and the table name.
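The chunk that creates this remote connection does not appear above. A sketch, mirroring the EVENT example later in this section (the name basin_tbl matches the collect() call below), would be:

```r
# Establish a remote (lazy) connection to the BASIN table.
# No rows are pulled into R until dplyr::collect() is called.
basin_tbl <- dplyr::tbl(con, I("WQMA_OWNER.BASIN"))
basin_tbl
```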
# Source: table<WQMA_OWNER.BASIN> [?? x 6]
# Database: Oracle 19.00.0000[ZSMITH@WQMAP/]
BASIN BASIN_NAME CREATE_DATE END_DATE UPDATE_DATE
<chr> <chr> <dttm> <dttm> <dttm>
1 01 Lake Erie-… 2024-03-26 13:32:45 NA NA
2 02 Allegheny … 2024-03-26 13:32:45 NA NA
3 03 Lake Ontar… 2024-03-26 13:32:45 NA NA
4 04 Genesee Ri… 2024-03-26 13:32:45 NA NA
5 05 Chemung Ri… 2024-03-26 13:32:45 NA NA
6 06 Susquehann… 2024-03-26 13:32:45 NA NA
7 07 Seneca-One… 2024-03-26 13:32:45 NA NA
8 08 Black Rive… 2024-03-26 13:32:45 NA NA
9 09 St. Lawren… 2024-03-26 13:32:45 NA NA
10 10 Lake Champ… 2024-03-26 13:32:45 NA NA
# ℹ more rows
# ℹ 1 more variable: UPDATED_BY_GUID <chr>
To bring the data into R, you need to use the dplyr function collect().
(basin_df <- dplyr::collect(basin_tbl))
# A tibble: 17 × 6
BASIN BASIN_NAME CREATE_DATE END_DATE UPDATE_DATE
<chr> <chr> <dttm> <dttm> <dttm>
1 01 Lake Erie-… 2024-03-26 13:32:45 NA NA
2 02 Allegheny … 2024-03-26 13:32:45 NA NA
3 03 Lake Ontar… 2024-03-26 13:32:45 NA NA
4 04 Genesee Ri… 2024-03-26 13:32:45 NA NA
5 05 Chemung Ri… 2024-03-26 13:32:45 NA NA
6 06 Susquehann… 2024-03-26 13:32:45 NA NA
7 07 Seneca-One… 2024-03-26 13:32:45 NA NA
8 08 Black Rive… 2024-03-26 13:32:45 NA NA
9 09 St. Lawren… 2024-03-26 13:32:45 NA NA
10 10 Lake Champ… 2024-03-26 13:32:45 NA NA
11 11 Upper Huds… 2024-03-26 13:32:45 NA NA
12 12 Mohawk Riv… 2024-03-26 13:32:45 NA NA
13 13 Lower Huds… 2024-03-26 13:32:45 NA NA
14 14 Delaware R… 2024-03-26 13:32:45 NA NA
15 15 Passaic-Ne… 2024-03-26 13:32:45 NA NA
16 17 Atlantic O… 2024-03-26 13:32:45 NA NA
17 16 Housatonic… 2024-03-26 13:32:45 NA NA
# ℹ 1 more variable: UPDATED_BY_GUID <chr>
You can use dplyr functions to work with the data in the database; dplyr will translate these functions to SQL for you.
This example:
1. Uses the basin_tbl connection established before.
2. Filters the BASIN column to the rows that represent the major drainage basins “03” and “07”.
3. Retains only the BASIN_NAME column.
4. Collects the data from the database into R.
# 1
basin_tbl |>
# 2
dplyr::filter(BASIN %in% c("03", "07")) |>
# 3
dplyr::select(BASIN_NAME) |>
# 4
dplyr::collect()
# A tibble: 2 × 1
BASIN_NAME
<chr>
1 Lake Ontario & Minor Tribs.
2 Seneca-Oneida-Oswego River Basin
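If you are curious about the SQL that dplyr generates, you can print it with dplyr::show_query() before collecting. A sketch using the basin_tbl connection above:

```r
# Print the translated SQL instead of executing the query.
basin_tbl |>
  dplyr::filter(BASIN %in% c("03", "07")) |>
  dplyr::select(BASIN_NAME) |>
  dplyr::show_query()
```

This is a useful sanity check that your filters will run in the database rather than in R.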
This example:
1. Connects to the “EVENT” table.
2. Filters to the rows where the EVENT_DATETIME column represents an event that occurred between July 1-7 of 2022.
3. Collects the data from the database into R.
# 1
dplyr::tbl(con,
I("WQMA_OWNER.EVENT")) |>
# 2
dplyr::filter(
dplyr::between(
x = EVENT_DATETIME,
left = to_date("2022-07-01", "YYYY-MM-DD HH24:MI:SS"),
right = to_date("2022-07-07", "YYYY-MM-DD HH24:MI:SS")
)
) |>
# 3
dplyr::collect()
# A tibble: 215 × 9
EVENT_ID EVENT_DATETIME SITE_ID PROJECT QAPP_ID CREATE_DATE
<chr> <dttm> <dbl> <chr> <chr> <dttm>
1 13-FDRY-0.4_… 2022-07-06 09:15:00 27578 Undefi… Unknown 2024-03-26 14:26:14
2 13-STNY-0.8_… 2022-07-06 15:00:00 28025 Undefi… Unknown 2024-03-26 14:26:14
3 13-STNY-0.8_… 2022-07-05 15:00:00 28025 Undefi… Unknown 2024-03-26 14:26:14
4 13-GUNK-40.3… 2022-07-05 13:15:00 27611 PWS pr… Unknown 2024-03-26 14:26:11
5 13-WKLEI_T4-… 2022-07-05 15:30:00 28182 PWS pr… Unknown 2024-03-26 14:26:11
6 14-COUL-0.9_… 2022-07-05 10:00:00 28349 PWS pr… Unknown 2024-03-26 14:26:11
7 07-SXML-2.2_… 2022-07-06 11:45:00 25790 PWS pr… Unknown 2024-03-26 14:26:11
8 07-SXML-8.0_… 2022-07-06 10:30:00 25791 PWS pr… Unknown 2024-03-26 14:26:11
9 07-SXML-12.4… 2022-07-06 09:15:00 25789 PWS pr… Unknown 2024-03-26 14:26:11
10 13-CRUM-1.5_… 2022-07-06 11:00:00 27545 PWS pr… Unknown 2024-03-26 14:26:11
# ℹ 205 more rows
# ℹ 3 more variables: END_DATE <dttm>, UPDATE_DATE <dttm>,
# UPDATED_BY_GUID <chr>
2.1.4.2 Multiple Table Queries (Or Operating on the Data Model)
In most situations, it is recommended that you acquire data using the get_data_model() function from nexus. This function has been tailored to establish a remote connection to each table in the data warehouse and define the relationships between tables.
(data_model <- nexus::get_data_model(con = con))
── Table source ────────────────────────────────────────────────────────────────
src: Oracle 19.00.0000[ZSMITH@WQMAP/]
── Metadata ────────────────────────────────────────────────────────────────────
Tables: `BASIN`, `EVENT`, `PARAMETER`, `PARAMETER_NAME`, `PREVIOUS_SITE_ID`, … (19 total)
Columns: 200
Primary keys: 13
Foreign keys: 18
We can visualize the table relationships with the get_erd() function.
nexus::get_erd(data_model)
If you want to collect a single table, such as BASIN, you can use the following code. The get_data_model() function hides some of the complexity of connecting to the table through dplyr, such as establishing the schema, and makes it simple to import data into R.
(basin_df2 <- data_model$BASIN |>
dplyr::collect())
# A tibble: 17 × 2
BASIN BASIN_NAME
<chr> <chr>
1 01 Lake Erie-Niagara River Basin
2 02 Allegheny River Basin
3 03 Lake Ontario & Minor Tribs.
4 04 Genesee River Basin
5 05 Chemung River Basin
6 06 Susquehanna River Basin
7 07 Seneca-Oneida-Oswego River Basin
8 08 Black River Basin
9 09 St. Lawrence River Basin
10 10 Lake Champlain River Basin
11 11 Upper Hudson River Basin
12 12 Mohawk River Basin
13 13 Lower Hudson River Basin
14 14 Delaware River Basin
15 15 Passaic-Newark (Ramapo River)
16 16 Housatonic River Basin
17 17 Atlantic Ocean-Long Island Sound
Or…
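The chunk for this alternative is not shown. A sketch that produces the same result by pulling the table out of the data model with dm::pull_tbl() (the name basin_df3 is illustrative):

```r
# Pull the remote BASIN table from the dm object, then collect it.
(basin_df3 <- data_model |>
  dm::pull_tbl(BASIN) |>
  dplyr::collect())
```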
# A tibble: 17 × 2
BASIN BASIN_NAME
<chr> <chr>
1 01 Lake Erie-Niagara River Basin
2 02 Allegheny River Basin
3 03 Lake Ontario & Minor Tribs.
4 04 Genesee River Basin
5 05 Chemung River Basin
6 06 Susquehanna River Basin
7 07 Seneca-Oneida-Oswego River Basin
8 08 Black River Basin
9 09 St. Lawrence River Basin
10 10 Lake Champlain River Basin
11 11 Upper Hudson River Basin
12 12 Mohawk River Basin
13 13 Lower Hudson River Basin
14 14 Delaware River Basin
15 15 Passaic-Newark (Ramapo River)
16 16 Housatonic River Basin
17 17 Atlantic Ocean-Long Island Sound
2.1.5 Example Queries
You cannot import all of the data from the WQMA Data Warehouse into R; the connection will time out or you will crash your R session. However, there is no need to read all of the data into R. With DBI, dplyr, dm, and nexus, you have the ability to do the following and more within the database:
- preview data
- filter data
- select only the columns of interest
- join tables together
You should try to narrow your focus to the smallest amount of data you need for your task. Queries will return results much faster if you limit the number of rows and columns that need to be transferred from the data warehouse into R. This might mean filtering the rows by basin, waterbody, site, sampling event, project, parameter, etc. or a combination of these factors to get only the data necessary. Similarly, selecting only the tables and columns of interest will help to speed queries up.
If you are feeling nervous about not pulling in all of the data, I hope the examples below put that concern to rest.
nexus is designed to work with the R package dm. dm stands for “data model”; it provides a number of useful functions for working with relational data and has great documentation on how to use those functions (see the Tutorials, Cheatsheet, and Technical Articles on the dm website). Many of the functions are designed both to execute locally in R and to be translated into a SQL query executed by a relational database or data warehouse. When possible, it is recommended that you favor dm queries over dbplyr, DBI, and custom SQL queries. It is not bad to write dbplyr, DBI, and custom SQL queries, but dm provides an elegant syntax for leveraging relational data that would be very difficult and time intensive to re-implement with other tools; in other words, dm is generally more efficient because it is easier to write and understand.
2.1.5.1 Lake Sites with Phosphorus Data Collected in the Last 5 Years
1. Define the time period of interest as start_date and end_date.
2. Start with the data_model object defined previously.
3. Filter the WATERBODY table to only the rows where the column WATERBODY_TYPE is “lake”.
4. Filter the PARAMETER table to only the rows where the column PARAMETER_NAME is “phosphorus”.
5. Filter the EVENT table to only the rows where the column EVENT_DATETIME represents a date in the last 5 years.
6. In this case, we are only interested in the SITE table. It’s important to note that we can use the data model to perform queries on tables besides the SITE table that ultimately influence the rows of the SITE table returned.
7. In this case, we are only interested in keeping the columns associated with the site identifier, SITE_CODE, and the site’s coordinates (LATITUDE and LONGITUDE).
8. Collect the data from the database into R.
# 1
start_date <- Sys.Date() - lubridate::years(5)
end_date <- Sys.Date()
site_remote <- data_model |> # 2
dm::dm_filter(
WATERBODY = WATERBODY_TYPE == "lake", # 3
PARAMETER = PARAMETER_NAME == "phosphorus", # 4
EVENT = dplyr::between( # 5
x = EVENT_DATETIME,
left = to_date(start_date, "YYYY-MM-DD HH24:MI:SS"),
right = to_date(end_date, "YYYY-MM-DD HH24:MI:SS")
)
) |>
dm::pull_tbl(SITE) |> # 6
dplyr::select( # 7
SITE_CODE,
LATITUDE,
LONGITUDE
)
site_dm <- dplyr::collect(site_remote) # 8
2.1.5.2 Site
(single_df <- site_dm |>
dm::dm_flatten_to_tbl(.start = RESULT, .recursive = TRUE) |>
dplyr::select(
EVENT_ID,
SITE_CODE,
REPLICATE,
SAMPLE_METHOD,
SAMPLE_METHOD_DESCRIPTION,
PARAMETER_NAME,
RESULT_VALUE,
UNIT,
LATITUDE,
LONGITUDE,
BASIN,
BASIN_NAME,
WATERBODY_NAME,
WATERBODY_TYPE
))
# A tibble: 1,693 × 14
EVENT_ID SITE_CODE REPLICATE SAMPLE_METHOD SAMPLE_METHOD_DESCRI…¹
<chr> <chr> <chr> <chr> <chr>
1 07-ONON-1.0_1986090… 07-ONON-… 1 survey sample_event_info_sur…
2 07-ONON-1.0_1986090… 07-ONON-… 1 multiprobe unknown
3 07-ONON-1.0_1986090… 07-ONON-… 1 multiprobe unknown
4 07-ONON-1.0_1986090… 07-ONON-… 1 multiprobe unknown
5 07-ONON-1.0_1986090… 07-ONON-… 1 multiprobe unknown
6 07-ONON-1.0_1986090… 07-ONON-… 1 survey macroinvertebrate_fie…
7 07-ONON-1.0_1986090… 07-ONON-… 1 survey macroinvertebrate_fie…
8 07-ONON-1.0_1986090… 07-ONON-… 1 survey macroinvertebrate_fie…
9 07-ONON-1.0_1986090… 07-ONON-… 1 survey macroinvertebrate_fie…
10 07-ONON-1.0_1986090… 07-ONON-… 1 survey macroinvertebrate_fie…
# ℹ 1,683 more rows
# ℹ abbreviated name: ¹SAMPLE_METHOD_DESCRIPTION
# ℹ 9 more variables: PARAMETER_NAME <chr>, RESULT_VALUE <dbl>, UNIT <chr>,
# LATITUDE <dbl>, LONGITUDE <dbl>, BASIN <chr>, BASIN_NAME <chr>,
# WATERBODY_NAME <chr>, WATERBODY_TYPE <chr>
2.1.5.3 Project
project_remote <- data_model |>
dm::dm_filter(PROJECT = PROJECT_TYPE == "RIBS Routine") |>
dm::dm_select_tbl(
WATERBODY,
SITE,
EVENT,
SAMPLE,
RESULT,
PARAMETER
)
project_dm <- dplyr::collect(project_remote)
(single_df <- project_dm |>
dm::dm_flatten_to_tbl(.start = RESULT, .recursive = TRUE) |>
dplyr::select(
EVENT_ID,
SITE_CODE,
REPLICATE,
SAMPLE_METHOD,
SAMPLE_METHOD_DESCRIPTION,
PARAMETER_NAME,
RESULT_VALUE,
UNIT,
LATITUDE,
LONGITUDE,
BASIN,
WATERBODY_NAME,
WATERBODY_TYPE
))
# A tibble: 18,911 × 13
EVENT_ID SITE_CODE REPLICATE SAMPLE_METHOD SAMPLE_METHOD_DESCRI…¹
<chr> <chr> <chr> <chr> <chr>
1 13-ROND-9.9_2018061… 13-ROND-… 1 unknown unknown
2 13-ROND-9.9_2018061… 13-ROND-… 1 unknown unknown
3 13-ROND-9.9_2018061… 13-ROND-… 1 unknown unknown
4 13-ROND-9.9_2018061… 13-ROND-… 1 unknown unknown
5 13-ROND-9.9_2018061… 13-ROND-… 1 unknown unknown
6 13-ROND-9.9_2018061… 13-ROND-… 1 unknown unknown
7 13-ROND-9.9_2018061… 13-ROND-… 1 unknown unknown
8 13-ROND-9.9_2018061… 13-ROND-… 1 unknown unknown
9 13-ROND-9.9_2018061… 13-ROND-… 1 unknown unknown
10 13-ROND-9.9_2018061… 13-ROND-… 1 unknown unknown
# ℹ 18,901 more rows
# ℹ abbreviated name: ¹SAMPLE_METHOD_DESCRIPTION
# ℹ 8 more variables: PARAMETER_NAME <chr>, RESULT_VALUE <dbl>, UNIT <chr>,
# LATITUDE <dbl>, LONGITUDE <dbl>, BASIN <chr>, WATERBODY_NAME <chr>,
# WATERBODY_TYPE <chr>
2.1.5.4 Parameter
chloride_remote <- dm::dm_filter(data_model,
PROJECT = PROJECT_TYPE == "RIBS Routine",
PARAMETER = PARAMETER_NAME == "chloride")
chloride_dm <- dplyr::collect(chloride_remote)
(single_df <- chloride_dm |>
dm::dm_flatten_to_tbl(.start = RESULT, .recursive = TRUE) |>
dplyr::select(
EVENT_ID,
SITE_CODE,
REPLICATE,
SAMPLE_METHOD,
SAMPLE_METHOD_DESCRIPTION,
PARAMETER_NAME,
RESULT_VALUE,
UNIT,
LATITUDE,
LONGITUDE,
BASIN,
BASIN_NAME,
WATERBODY_NAME,
WATERBODY_TYPE
))
# A tibble: 623 × 14
EVENT_ID SITE_CODE REPLICATE SAMPLE_METHOD SAMPLE_METHOD_DESCRI…¹
<chr> <chr> <chr> <chr> <chr>
1 01-BUFF-1.7_2018041… 01-BUFF-… 1 unknown unknown
2 01-BUFF-1.7_2018061… 01-BUFF-… 1 unknown unknown
3 01-BUFF-1.7_2018081… 01-BUFF-… 1 unknown unknown
4 01-BUFF-1.7_2018101… 01-BUFF-… 1 unknown unknown
5 01-BUFF-1.7_2019041… 01-BUFF-… 1 unknown unknown
6 01-BUFF-1.7_2019061… 01-BUFF-… 1 unknown unknown
7 01-BUFF-1.7_2019081… 01-BUFF-… 1 unknown unknown
8 01-BUFF-1.7_2019102… 01-BUFF-… 1 unknown unknown
9 01-BUFF-1.7_2020042… 01-BUFF-… 1 unknown unknown
10 01-BUFF-1.7_2020061… 01-BUFF-… 1 unknown unknown
# ℹ 613 more rows
# ℹ abbreviated name: ¹SAMPLE_METHOD_DESCRIPTION
# ℹ 9 more variables: PARAMETER_NAME <chr>, RESULT_VALUE <dbl>, UNIT <chr>,
# LATITUDE <dbl>, LONGITUDE <dbl>, BASIN <chr>, BASIN_NAME <chr>,
# WATERBODY_NAME <chr>, WATERBODY_TYPE <chr>
2.2 Set Up Oracle ODBC Connection
2.2.1 Objective
Set up an ODBC connection to the WQMA Data Warehouse.
2.2.2 What is an ODBC?
ODBC stands for Open Database Connectivity; once configured, it allows the user to connect to a database from another program, such as R.
2.2.3 Oracle’s Instant Client ODBC Driver
Oracle’s Instant Client ODBC software is a standalone package that offers the full functionality of the Oracle ODBC driver (except the Oracle service for Microsoft Transaction Server) with a simple install. –From: https://www.oracle.com/database/technologies/releasenote-odbc-ic.html
In other words, Oracle’s Instant Client is a set of software that needs to be installed on your computer in order for you to connect to an Oracle database.
I don’t care what Oracle says, this is anything but simple for the average user.
I was directed to the Oracle Instant Client by Posit’s webpage on connecting to an Oracle database.
2.2.4 Procedure
- Follow the instructions in the Installing Oracle Instant Client ODBC/On Windows section. You must first install the Instant Client Basic package and then install the Instant Client ODBC package. I have copied and modified the text from that webpage to try to make the steps easier to follow.
- “Install the Instant Client Basic or Basic Light package, as described above.”
  - Follow the instructions at the head of the page in the “Installing Oracle Instant Client Basic and Basic Lite” section, which boil down to installing Oracle’s Instant Client from here.
  - Select the Instant Client for Microsoft Windows (x64).
  - Select the most recent version of Instant Client for Microsoft Windows and a table will expand.
  - Download the Basic Package to the root of your Downloads folder. You cannot download the file directly to your C-drive (C:\).
  - Right-click the downloaded zip file in your Downloads folder, select Extract All, browse to your C-drive (C:\), and select Extract.
- “Download the Instant Client ODBC package. Unzip it in the same directory as your Basic or Basic Light package.”
  - The ODBC package is located on the same downloads page as the Basic Package installed in step 2.1 above. If you need to navigate back to this page, follow steps 2.1.1 - 2.1.3.
  - Scroll down until you see the ODBC Package; it is under the subheading Development and Runtime - optional packages.
  - Download the ODBC package to your Downloads folder. You cannot download the file directly to your C-drive (C:\).
  - Right-click the downloaded zip file in your Downloads folder, select Extract All, browse to your C-drive (C:\), select the folder you created in step 2.1.5, and select Extract.
- The C-drive directory created in step 2.1 must be added to your computer’s PATH. This step requires admin rights.
  - In your Windows search bar, look up and open “Edit the system environment variables.”
  - In the Advanced tab of the pop-up window, click the “Environment Variables…” button.
  - In the “System variables” section, find the existing “Path” variable, select the “Path” row, and click the “Edit…” button.
  - In the pop-up, click the “New” button, click the “Browse…” button, and select the C-drive path to the Instant Client directory created in step 2.1.
- “Execute odbc_install.exe from the Instant Client directory.” This step requires admin rights.
  - Navigate to the folder created in step 2.2.4.
  - Double-click odbc_install.exe.
- Set up a 64-bit ODBC connection.
  - In your Windows search bar, type “ODBC Data Sources”.
  - Select the option that ends in “64-bit”.
  - In the window that appears, select the “Add” button.
  - Select the Oracle Instant Client option (it should look something like “Oracle in instant_client_19_19”) and click the “Finish” button. If you do not see the Oracle Instant Client option, then your ODBC installation was not successful; please try installing the ODBC package again.
  - Enter the following configuration:
    - Data Source Name: wqma_prod
    - Description: WQMA Data Warehouse Production Version
    - TNS Service Name: WQMAP
    - User ID: [Enter your database User ID]; for example, JDOE.
  - Click the “Test Connection” button, enter your database password, and click the “OK” button. If there are no errors, you should receive a message stating “Connection successful.” If you entered a temporary password, you will be prompted for a new password.
  - Click the “OK” button in the Oracle ODBC Driver Configuration window.
Congratulations! You have successfully downloaded, installed, and configured the necessary software to connect to an Oracle database.
2.2.5 Keyring Setup (Optional)
If you do not want to enter your password each time you establish a connection, you can set up a keyring to securely store and access your password. Again, you should NEVER type out your password in plain text.
keyring::key_set("your-user-name",
keyring = "wqma-example")
keyring::key_get("your-user-name",
keyring = "wqma-example")
Once the keyring is established, you can point to it in the get_connected() function using the keyring argument, and you will not need to provide your password each time you connect to the WQMA data warehouse.
con <- nexus::get_connected(username = "your-user-name",
keyring = "wqma-example")