# Survey Data
```{r setup, echo = F, message=F}
#devtools::install_github("slucey/Rsurvey/Survdat")
#Load packages
library(knitr)
library(rmarkdown)
library(Survdat)
```
<!--The purpose of this form is to provide the methods used to collect, process, and analyze derived data prior to data submission for all EDAB reports. This information will be compiled into the State of the Ecosystem Tech Memo, AND be used to meet metadata requirements put forth by the Public Access to Research Results (PARR) directive.
For the tech memo, we will be documenting the methods behind each indicator to the level of a peer-reviewed document.-->
## Contributor name(s)
Sean Lucey
## Data name
Survdat (Survey database)
### Indicator category
1. Database pull
<!--2. Database pull with analysis
3. Synthesis of published information
4. Extensive analysis; not yet published
5. Published methods-->
## Methods
The Northeast Fisheries Science Center (NEFSC) has been conducting standardized bottom trawl surveys
in the fall since 1963 and in the spring since 1968. The surveys follow a stratified random design. Fish
species and several invertebrate species are enumerated on a tow-by-tow basis [@Azarovitz_1981].
The data are housed in the NEFSC's survey database (SVDBS), maintained by the Ecosystem Survey Branch.
Direct pulls from the database are not advisable, as there have been several gear modifications and
vessel changes over the course of the time series [@Miller_2010]. Survdat was developed as a database
query that applies the appropriate calibration factors to produce a seamless time series since the 1960s.
As such, it is the basis for many of the other analyses conducted for the State of the Ecosystem
report that involve fisheries-independent data.
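Because the surveys follow a stratified random design, indicators derived from Survdat are typically
computed downstream as area-weighted stratum means. The sketch below illustrates that idea only; it is
not part of Survdat itself, and the input column names (`STRATUM`, `BIOMASS`, `AREA`) are assumed for
illustration.
```{r stratified mean sketch, eval = F, echo = T}
library(data.table)

#Illustrative stratified mean: tows has one row per tow (STRATUM, BIOMASS);
#strata has one row per stratum (STRATUM, AREA); column names are assumed
strat.mean <- function(tows, strata){
  #Mean catch per tow within each stratum
  mu <- tows[, .(mean.bio = mean(BIOMASS)), by = STRATUM]
  mu <- merge(mu, strata, by = 'STRATUM')
  #Weight each stratum mean by its share of the total survey area
  mu[, W := AREA / sum(AREA)]
  mu[, sum(W * mean.bio)]
}
```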
The Survdat script can be broken down into two sections. The first section pulls the raw data from SVDBS.
While the script is able to pull data from more than just the spring and fall bottom trawl surveys,
for the purposes of the State of the Ecosystem reports only the spring and fall data are used.
Survdat identifies the research cruises associated with the seasonal bottom trawl surveys and pulls
the station and biological data. Station data include tow identification (cruise, station,
and stratum), tow location and date, as well as several environmental variables (depth, surface/bottom salinity,
and surface/bottom temperature). Stations are filtered for representativeness using a station, haul, gear
(SHG) code for tows prior to 2009 and a tow, operations, gear, and acquisition (TOGA) code from 2009
onward. The codes that correspond to a representative tow (SHG <= 136 or TOGA <= 1324) are the same as
those used by assessment biologists at the NEFSC. Biological data include the total biomass and abundance
by species, as well as lengths and numbers at length.
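In the extraction script below this filter is applied within the SQL queries; expressed directly in
data.table syntax (assuming the `SHG` and `TOGA` columns were retained in the station pull), the logic
is equivalent to:
```{r shg toga sketch, eval = F, echo = T}
#Keep representative tows only: SHG codes before the 2009 vessel change,
#TOGA codes afterwards (cruise codes <= 200900 denote the pre-Bigelow era)
station <- station[(CRUISE6 <= 200900 & SHG  <= 136) |
                   (CRUISE6 >  200900 & TOGA <= 1324)]
```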
The second section of the Survdat script applies the calibration factors. There are four calibration
factors applied (Table 1). Calibration factors are pulled directly from SVDBS. Vessel conversions were made from
either the NOAA Ship *Delaware II* or NOAA Ship *Henry B. Bigelow* to the NOAA Ship *Albatross IV*, which was
the primary vessel for most of the time series. The *Albatross IV* was decommissioned in 2009, and the
*Bigelow* is now the primary vessel for the bottom trawl survey.
```{r calibration factors, eval = T, echo = F}
cal.factors <- data.frame(Name    = c('Door Conversion', 'Net Conversion',
                                      'Vessel Conversion I', 'Vessel Conversion II'),
                          Code    = c('DCF', 'GCF', 'VCF', 'BCF'),
                          Applied = c('<1985', '1973 - 1981 (Spring)',
                                      'Delaware II records', 'Henry B. Bigelow records'))
kable(cal.factors, caption = 'Calibration factors applied by Survdat')
```
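As a worked illustration of what a conversion does, a door conversion factor of 1.3 (a hypothetical
value; real factors are read from SVDBS) would rescale a 10 kg pre-1985 catch to 13 kg for the affected
species. In the style of the extraction script below:
```{r dcf sketch, eval = F, echo = T}
#Illustrative only: hypothetical door conversion factor (1.3) for a single
#species code (73); the script below loops over all species with DCF values
survdat[YEAR < 1985 & SVSPP == 73, BIOMASS := BIOMASS * 1.3]
```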
The output from Survdat is an RData file that contains all of the station and biological data, corrected
as noted above, from the NEFSC Spring Bottom Trawl Survey and NEFSC Fall Bottom Trawl Survey. The RData
file contains a data.table, an enhanced version of the base data.frame (https://cran.r-project.org/web/packages/data.table/data.table.pdf).
A series of tools has also been developed to work with the Survdat data set
(https://github.com/slucey/RSurvey).
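For example, once the RData file is loaded, annual biomass totals can be computed with data.table
syntax. Note that the length merge repeats the tow-level `BIOMASS` value across length rows, so records
should be reduced to unique tow-by-species rows first (column names as produced by the extraction
script below):
```{r survdat usage sketch, eval = F, echo = T}
library(data.table)
load('Survdat.RData') #loads the survdat data.table saved by the script below

#Reduce to one record per tow x species x sex before summing biomass
tow.bio <- unique(survdat[, list(CRUISE6, STATION, STRATUM, TOW, SVSPP,
                                 CATCHSEX, YEAR, SEASON, BIOMASS)])
#Total fall biomass by year (illustrative aggregation)
fall.bio <- tow.bio[SEASON == 'FALL', .(TotBio = sum(BIOMASS, na.rm = T)),
                    by = YEAR]
```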
### Data source(s) (All types)
Survdat is a database query of the NEFSC survey database (SVDBS).
### Data extraction
Extraction methods are described above. The R code used in the extraction process is provided below.
```{r survdat, echo = T, eval = F}
#Survdat.r
#This script will generate data from the NEFSC bottom trawl surveys
#SML
#-------------------------------------------------------------------------------
#User parameters
if(Sys.info()['sysname'] == "Windows"){
  data.dir <- "L:\\Rworkspace\\RSurvey"
  out.dir  <- "L:\\EcoAP\\Data\\survey"
  memory.limit(4000)
}
if(Sys.info()['sysname'] == "Linux"){
  data.dir <- "/home/slucey/slucey/Rworkspace/RSurvey"
  out.dir  <- "/home/slucey/slucey/EcoAP/Data/survey"
  uid <- 'slucey'
  cat("Oracle Password: ")
  pwd <- scan(stdin(), character(), n = 1)
}
shg.check  <- 'y' # y = use only SHG <= 136 or TOGA <= 1324 (>2008)
raw.check  <- 'n' # y = save data without conversions (survdat.raw); will still
                  #     save data with conversions (survdat)
all.season <- 'n' # y = save data with purpose code 10, not just spring/fall
                  #     (survdat.allseason); will not save regular survdat
use.SAD    <- 'y' # y = grab data from Survey Analysis Database (SAD) for
                  #     assessed species
#-------------------------------------------------------------------------------
#Required packages
library(RODBC); library(data.table)
#-------------------------------------------------------------------------------
#Created functions
#Convert a vector to a quoted, comma-separated string for an SQL "in" clause
#e.g. sqltext(c(196301, 196302)) returns "'196301','196302'"
sqltext <- function(x){
  out <- x[1]
  if(length(x) > 1){
    for(i in 2:length(x)){
      out <- paste(out, x[i], sep = "','")
    }
  }
  out <- paste("'", out, "'", sep = '')
  return(out)
}
#-------------------------------------------------------------------------------
#Begin script
if(Sys.info()['sysname'] == "Windows"){
  channel <- odbcDriverConnect()
} else {
  channel <- odbcConnect('sole', uid, pwd)
}
#Generate cruise list
if(all.season == 'n'){
  cruise.qry <- "select unique year, cruise6, svvessel, season
                 from mstr_cruise
                 where purpose_code = 10
                 and year >= 1963
                 and (season = 'FALL'
                 or season = 'SPRING')
                 order by year, cruise6"
}
if(all.season == 'y'){
  cruise.qry <- "select unique year, cruise6, svvessel, season
                 from mstr_cruise
                 where purpose_code = 10
                 and year >= 1963
                 order by year, cruise6"
}
cruise <- as.data.table(sqlQuery(channel, cruise.qry))
cruise <- na.omit(cruise)
setkey(cruise, CRUISE6, SVVESSEL)
#Use cruise codes to select other data
cruise6 <- sqltext(cruise$CRUISE6)
#Station data
if(shg.check == 'y'){
  preHB.station.qry <- paste("select unique cruise6, svvessel, station, stratum,
                             tow, decdeg_beglat as lat, decdeg_beglon as lon,
                             begin_est_towdate as est_towdate, avgdepth as depth,
                             surftemp, surfsalin, bottemp, botsalin
                             from Union_fscs_svsta
                             where cruise6 in (", cruise6, ")
                             and SHG <= 136
                             and cruise6 <= 200900
                             order by cruise6, station", sep = '')
  HB.station.qry <- paste("select unique cruise6, svvessel, station, stratum,
                          tow, decdeg_beglat as lat, decdeg_beglon as lon,
                          begin_est_towdate as est_towdate, avgdepth as depth,
                          surftemp, surfsalin, bottemp, botsalin
                          from Union_fscs_svsta
                          where cruise6 in (", cruise6, ")
                          and TOGA <= 1324
                          and cruise6 > 200900
                          order by cruise6, station", sep = '')
  preHB.sta <- as.data.table(sqlQuery(channel, preHB.station.qry))
  HB.sta    <- as.data.table(sqlQuery(channel, HB.station.qry))
  station   <- rbindlist(list(preHB.sta, HB.sta))
}
if(shg.check == 'n'){
  station.qry <- paste("select unique cruise6, svvessel, station, stratum, tow,
                       decdeg_beglat as lat, decdeg_beglon as lon,
                       begin_est_towdate as est_towdate, avgdepth as depth,
                       surftemp, surfsalin, bottemp, botsalin
                       from UNION_FSCS_SVSTA
                       where cruise6 in (", cruise6, ")
                       order by cruise6, station", sep = '')
  station <- as.data.table(sqlQuery(channel, station.qry))
}
setkey(station, CRUISE6, SVVESSEL)
#merge cruise and station
survdat <- merge(cruise, station)
#Catch data
catch.qry <- paste("select cruise6, station, stratum, tow, svspp, catchsex,
                   expcatchnum as abundance, expcatchwt as biomass
                   from UNION_FSCS_SVCAT
                   where cruise6 in (", cruise6, ")
                   and stratum not like 'YT%'
                   order by cruise6, station, svspp", sep = '')
catch <- as.data.table(sqlQuery(channel, catch.qry))
setkey(catch, CRUISE6, STATION, STRATUM, TOW)
#merge with survdat
setkey(survdat, CRUISE6, STATION, STRATUM, TOW)
survdat <- merge(survdat, catch, by = key(survdat))
#Length data
length.qry <- paste("select cruise6, station, stratum, tow, svspp, catchsex,
                    length, expnumlen as numlen
                    from UNION_FSCS_SVLEN
                    where cruise6 in (", cruise6, ")
                    and stratum not like 'YT%'
                    order by cruise6, station, svspp, length", sep = '')
len <- as.data.table(sqlQuery(channel, length.qry))
setkey(len, CRUISE6, STATION, STRATUM, TOW, SVSPP, CATCHSEX)
#merge with survdat
setkey(survdat, CRUISE6, STATION, STRATUM, TOW, SVSPP, CATCHSEX)
survdat <- merge(survdat, len, all.x = T)
if(raw.check == 'y'){
  survdat.raw <- survdat
  save(survdat.raw, file = file.path(out.dir, "Survdat_raw.RData"))
}
#Conversion Factors
#need to make abundance column a double instead of an integer
survdat[, ABUNDANCE := as.double(ABUNDANCE)]
#Grab all conversion factors off the network
convert.qry <- "select *
                from survan_conversion_factors"
convert <- as.data.table(sqlQuery(channel, convert.qry))
#DCF < 1985 Door Conversion
dcf.spp <- convert[DCF_WT > 0, SVSPP]
for(i in 1:length(dcf.spp)){
  survdat[YEAR < 1985 & SVSPP == dcf.spp[i],
          BIOMASS := BIOMASS * convert[SVSPP == dcf.spp[i], DCF_WT]]
}
dcf.spp <- convert[DCF_NUM > 0, SVSPP]
for(i in 1:length(dcf.spp)){
  survdat[YEAR < 1985 & SVSPP == dcf.spp[i],
          ABUNDANCE := round(ABUNDANCE * convert[SVSPP == dcf.spp[i], DCF_NUM])]
}
#GCF Spring 1973-1981 Net Conversion
gcf.spp <- convert[GCF_WT > 0, SVSPP]
for(i in 1:length(gcf.spp)){
  survdat[SEASON == 'SPRING' & YEAR > 1972 & YEAR < 1982 & SVSPP == gcf.spp[i],
          BIOMASS := BIOMASS / convert[SVSPP == gcf.spp[i], GCF_WT]]
}
gcf.spp <- convert[GCF_NUM > 0, SVSPP]
for(i in 1:length(gcf.spp)){
  survdat[SEASON == 'SPRING' & YEAR > 1972 & YEAR < 1982 & SVSPP == gcf.spp[i],
          ABUNDANCE := round(ABUNDANCE / convert[SVSPP == gcf.spp[i], GCF_NUM])]
}
#VCF SVVESSEL = DE Vessel Conversion
vcf.spp <- convert[VCF_WT > 0, SVSPP]
for(i in 1:length(vcf.spp)){
  survdat[SVVESSEL == 'DE' & SVSPP == vcf.spp[i],
          BIOMASS := BIOMASS * convert[SVSPP == vcf.spp[i], VCF_WT]]
}
vcf.spp <- convert[VCF_NUM > 0, SVSPP]
for(i in 1:length(vcf.spp)){
  survdat[SVVESSEL == 'DE' & SVSPP == vcf.spp[i],
          ABUNDANCE := round(ABUNDANCE * convert[SVSPP == vcf.spp[i], VCF_NUM])]
}
#Bigelow >2008 Vessel Conversion - need flat files (not on network)
#Use Bigelow conversions for Pisces as well (PC)
#Calibration files provide weight (pW) and number (pw) ratios per species
big.fall   <- as.data.table(read.csv(file.path(data.dir,
                                               'bigelow_fall_calibration.csv')))
big.spring <- as.data.table(read.csv(file.path(data.dir,
                                               'bigelow_spring_calibration.csv')))
bf.spp <- big.fall[pW != 1, svspp]
for(i in 1:length(bf.spp)){
  survdat[SVVESSEL %in% c('HB', 'PC') & SEASON == 'FALL' & SVSPP == bf.spp[i],
          BIOMASS := BIOMASS / big.fall[svspp == bf.spp[i], pW]]
}
bf.spp <- big.fall[pw != 1, svspp]
for(i in 1:length(bf.spp)){
  survdat[SVVESSEL %in% c('HB', 'PC') & SEASON == 'FALL' & SVSPP == bf.spp[i],
          ABUNDANCE := round(ABUNDANCE / big.fall[svspp == bf.spp[i], pw])]
}
bs.spp <- big.spring[pW != 1, svspp]
for(i in 1:length(bs.spp)){
  survdat[SVVESSEL %in% c('HB', 'PC') & SEASON == 'SPRING' & SVSPP == bs.spp[i],
          BIOMASS := BIOMASS / big.spring[svspp == bs.spp[i], pW]]
}
bs.spp <- big.spring[pw != 1, svspp]
for(i in 1:length(bs.spp)){
  survdat[SVVESSEL %in% c('HB', 'PC') & SEASON == 'SPRING' & SVSPP == bs.spp[i],
          ABUNDANCE := round(ABUNDANCE / big.spring[svspp == bs.spp[i], pw])]
}
if(use.SAD == 'y'){
  sad.qry <- "select svspp, cruise6, stratum, tow, station, sex as catchsex,
              catch_wt_B_cal, catch_no_B_cal, length, length_no_B_cal
              from STOCKEFF.I_SV_MERGED_CATCH_CALIB_O"
  sad <- as.data.table(sqlQuery(channel, sad.qry))
  setkey(sad, CRUISE6, STRATUM, TOW, STATION, SVSPP, CATCHSEX, LENGTH)
  sad <- unique(sad)
  survdat <- merge(survdat, sad, by = key(sad), all.x = T)
  #Carry over SAD values to survdat columns and delete SAD columns
  survdat[!is.na(CATCH_WT_B_CAL),  BIOMASS   := CATCH_WT_B_CAL]
  survdat[!is.na(CATCH_NO_B_CAL),  ABUNDANCE := CATCH_NO_B_CAL]
  survdat[, NUMLEN := as.double(NUMLEN)]
  survdat[!is.na(LENGTH_NO_B_CAL), NUMLEN    := LENGTH_NO_B_CAL]
  survdat[, c('CATCH_WT_B_CAL', 'CATCH_NO_B_CAL', 'LENGTH_NO_B_CAL') := NULL]
}
odbcClose(channel)
if(all.season == 'n') save(survdat, file = file.path(out.dir, "Survdat.RData"))
if(all.season == 'y') save(survdat, file = file.path(out.dir,
                                                     "Survdat_allseason.RData"))
```
### Data analysis
Not applicable. Survdat is a database pull (indicator category 1); the extraction and calibration steps applied to the data are described in the Methods section above.
### Further metadata
<!--Fill out the following metadata required by PARR under each subheader.-->
#### Public availability statement
<!--Will source data be available to the public? If not, is a Data Access Waiver necessary?-->
Yes, source data will be available to the public.
#### Point of contact - Whom should be contacted?
Sean Lucey
#### Data steward (can be POC; in charge of keeping data updated)
Sean Lucey
## References
<!-- References must be submitted in BibTeX format. When compiling this document, they will appear in this section. There is no need to write them out here. Don't forget to include the name of your .bib file in the YAML header (fill in XYZ.bib in YAML header)!!!!!-->