Package 'RDS' reference manual

Title:	Respondent-Driven Sampling
Description:	Provides functionality for carrying out estimation with data collected using Respondent-Driven Sampling. This includes Heckathorn's RDS-I and RDS-II estimators as well as Gile's Sequential Sampling estimator. The package is part of the "RDS Analyst" suite of packages for the analysis of respondent-driven sampling data. See Gile and Handcock (2010) <doi:10.1111/j.1467-9531.2010.01223.x>, Gile and Handcock (2015) <doi:10.1111/rssa.12091> and Gile, Beaudry, Handcock and Ott (2018) <doi:10.1146/annurev-statistics-031017-100704>.
Authors:	Mark S. Handcock [aut, cre], Krista J. Gile [aut], Ian E. Fellows [aut], W. Whipple Neely [ctb]
Maintainer:	Mark S. Handcock <[email protected]>
License:	LGPL-2.1
Version:	0.9-10
Built:	2025-02-04 03:55:11 UTC
Source:	https://github.com/cran/RDS

indexing

Description

indexing

Usage

## S3 method for class 'rds.data.frame'
x[i, j, ..., drop, warn = TRUE]

Arguments

`x`	object
`i`	indices
`j`	indices
`...`	unused
`drop`	drop
`warn`	Warn if any new seeds are created

Details

Subsetting of RDS recruitment trees does not always yield a full RDS tree. In this case, subjects whose recruiter is no longer in the dataset are considered seeds. A warning is issued if the 'warn' parameter is TRUE.

Examples

dat <- data.frame(id=c(1,2,3,4,5), recruiter.id=c(2,-1,2,-1,4),
                  network.size.variable=c(4,8,8,2,3))
r <- as.rds.data.frame(dat)
r[1:3,] # A valid pruning of the RDS tree.
r[c(1,5),warn=FALSE] # recruiter.id of last row set to -1 (i.e. a seed) to maintain validity of tree

indexing

Description

indexing

Usage

## S3 replacement method for class 'rds.data.frame'
 x[i, j] <- value

Arguments

`x`	object
`i`	indices
`j`	indices
`value`	value

Details

Indexed assignment. If the result is not a valid rds.data.frame, an error is emitted.

converts to character with minimal loss of precision for numeric variables

Description

converts to character with minimal loss of precision for numeric variables

Usage

as.char(x, ...)

Arguments

`x`	the value
`...`	passed to either format or as.character.

Coerces a data.frame object into an rds.data.frame object.

Description

This function converts a regular R data frame into an rds.data.frame. The greatest advantage of this is that it performs integrity checks and will fail if the recruitment information in the original data frame is incomplete.

Usage

as.rds.data.frame(
  df,
  id = if (is.null(attr(df, "id"))) "id" else attr(df, "id"),
  recruiter.id = if (is.null(attr(df, "recruiter.id"))) {
    "recruiter.id"
  } else attr(df, "recruiter.id"),
  network.size = if (is.null(attr(df, "network.size.variable"))) {
    "network.size.variable"
  } else attr(df, "network.size.variable"),
  population.size = if (all(is.na(get.population.size(df, FALSE)))) {
    NULL
  } else get.population.size(df, FALSE),
  max.coupons = if (is.null(attr(df, "max.coupons"))) {
    NULL
  } else attr(df, "max.coupons"),
  notes = if (is.null(attr(df, "notes"))) {
    NULL
  } else attr(df, "time"),
  time = if (is.null(attr(df, "time"))) {
    NULL
  } else attr(df, "time"),
  check.valid = TRUE
)

Arguments

`df`	A data.frame representing an RDS sample.
`id`	The unique identifier.
`recruiter.id`	The unique identifier of the recruiter of this row.
`network.size`	The number of alters (i.e. possible recruitees).
`population.size`	The size of the population from which this RDS sample has been drawn. Either a single number, or a vector of length three indicating low, mid and high estimates.
`max.coupons`	The number of recruitment coupons distributed to each enrolled subject (i.e. the maximum number of recruitees for any subject).
`notes`	Data set notes.
`time`	The name of the recruitment time variable. Optional.
`check.valid`	If true, validity checks are performed to ensure that the data is well formed.

Value

An rds.data.frame object

Examples

dat <- data.frame(id=c(1,2,3,4,5), recruiter.id=c(2,-1,2,-1,4),
                  network.size.variable=c(4,8,8,2,3))
as.rds.data.frame(dat)
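
The integrity checks mentioned above are not spelled out in this manual. As a rough illustration only, the base R sketch below tests one plausible condition: every recruiter id is either the seed placeholder (assumed here to be -1, as in the example data) or the id of another row. The helper `recruiters.are.valid` is hypothetical and is not part of the RDS package; the checks actually performed by as.rds.data.frame may differ.

```r
# Hypothetical helper (not part of the RDS package): TRUE when ids are unique,
# no subject recruits themselves, and every recruiter id is either the seed
# placeholder or the id of a row in the data.
recruiters.are.valid <- function(df, id = "id", recruiter.id = "recruiter.id",
                                 seed.rid = -1) {
  ids <- as.character(df[[id]])
  rids <- as.character(df[[recruiter.id]])
  is.seed <- rids == as.character(seed.rid)
  !anyDuplicated(ids) && all(is.seed | rids %in% ids) && all(ids != rids)
}

dat <- data.frame(id = c(1, 2, 3, 4, 5), recruiter.id = c(2, -1, 2, -1, 4),
                  network.size.variable = c(4, 8, 8, 2, 3))
ok <- recruiters.are.valid(dat)              # recruitment information complete
bad <- recruiters.are.valid(dat[c(1, 5), ])  # recruiters 2 and 4 are missing
```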

Does various checks and throws errors if x is not a valid rds.data.frame

Description

Does various checks and throws errors if x is not a valid rds.data.frame

Usage

assert.valid.rds.data.frame(x, ...)

Arguments

`x`	an rds.data.frame
`...`	unused

Details

Throws an informative message if x is malformed.

Performs a bootstrap test of independence between two categorical variables

Description

Performs a bootstrap test of independence between two categorical variables

Usage

bootstrap.contingency.test(
  rds.data,
  row.var,
  col.var,
  number.of.bootstrap.samples = 1000,
  weight.type = c("HCG", "RDS-II", "Arithmetic Mean"),
  table.only = FALSE,
  verbose = TRUE,
  ...
)

Arguments

`rds.data`	an rds.data.frame
`row.var`	the name of the first categorical variable
`col.var`	the name of the second categorical variable
`number.of.bootstrap.samples`	The number of simulated bootstrap populations
`weight.type`	The type of weighting to use for the contingency table. Only large sample methods are allowed.
`table.only`	If TRUE, only the weighted table is returned, without bootstrapping.
`verbose`	level of output
`...`	Additional parameters for compute.weights

Details

This function first estimates a Homophily Configuration Graph model for the underlying network under the assumption that the two variables are independent and that the population size is large. It then draws bootstrap RDS samples from this population distribution and calculates the chi-squared statistic on the weighted contingency table. Weights are calculated using the HCG estimator assuming a large population size.
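
To make the test statistic concrete, the following base R sketch builds a weighted contingency table and computes the Pearson chi-squared statistic from it. The data and weights are made up for illustration; in the function itself the weights come from the HCG estimator and the reference distribution from the bootstrap samples, neither of which is reproduced here.

```r
# Toy data; the weights are invented for illustration only.
x <- factor(c("a", "a", "b", "b", "a", "b"))
y <- factor(c("u", "v", "u", "v", "u", "v"))
w <- c(1, 2, 1, 2, 1, 2)

# Weighted contingency table: each cell sums the weights of its observations.
tab <- xtabs(w ~ x + y)

# Expected cell totals under independence, then the Pearson statistic.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
chi.sq <- sum((tab - expected)^2 / expected)
```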

Examples

data(faux)
bootstrap.contingency.test(rds.data=faux, row.var="X", col.var="Y",
  number.of.bootstrap.samples=50, verbose=FALSE)

Calculates incidence and bootstrap confidence intervals for immunoassay data collected with RDS

Description

Calculates incidence and bootstrap confidence intervals for immunoassay data collected with RDS

Usage

bootstrap.incidence(
  rds.data,
  recent.variable,
  hiv.variable,
  N = NULL,
  weight.type = c("Gile's SS", "RDS-I", "RDS-I (DS)", "RDS-II", "Arithmetic Mean", "HCG"),
  mean.duration = 200,
  frr = 0.01,
  post.infection.cutoff = 730,
  number.of.bootstrap.samples = 1000,
  se.mean.duration = 0,
  se.frr = 0,
  confidence.level = 0.95,
  verbose = TRUE,
  ...
)

Arguments

`rds.data`	an rds.data.frame
`recent.variable`	The name of the variable indicating recent infection
`hiv.variable`	The name of the variable indicating HIV infection
`N`	Population size
`weight.type`	A string giving the type of estimator to use. The options are `"Gile's SS"`, `"RDS-I"`, `"RDS-I (DS)"`, `"RDS-II"`, `"Arithmetic Mean"`, and `"HCG"`. It defaults to `"Gile's SS"`.
`mean.duration`	Estimated mean duration of recent infection (MDRI) (days)
`frr`	Estimated false-recent rate (FRR)
`post.infection.cutoff`	Post-infection time cut-off T, separating "true-recent" from "false-recent" results (days)
`number.of.bootstrap.samples`	The number of bootstrap samples used to construct the interval.
`se.mean.duration`	The standard error of the mean.duration estimate
`se.frr`	The standard error of the false recency estimate
`confidence.level`	The level of confidence for the interval
`verbose`	verbosity control
`...`	additional arguments to compute.weights

Details

The recent.variable and hiv.variable arguments should be the names of logical variables. Otherwise they are converted to logical using as.numeric(x) > 0.5.
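
The conversion rule can be illustrated in base R. The helper `to.logical` is a hypothetical name used only for this sketch, which assumes the rule is applied exactly as stated above.

```r
# Hypothetical helper restating the documented rule: logical input is kept,
# anything else is compared against 0.5 after numeric conversion.
to.logical <- function(x) if (is.logical(x)) x else as.numeric(x) > 0.5

from.numeric <- to.logical(c(0, 1, 1, NA))   # missing values stay missing
from.character <- to.logical(c("0", "1"))    # numeric strings are converted
```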

This function estimates incidence using RDS sampling weights. Confidence intervals are constructed using HCG bootstraps. See http://www.incidence-estimation.org/ for additional information on (non-RDS) incidence estimation.

Examples

data(faux)
faux$hiv <- faux$X == "blue"
faux$recent <- NA
faux$recent[faux$hiv] <- runif(sum(faux$hiv)) < .2
faux$recent[runif(nrow(faux)) > .5] <- NA
faux$hiv[is.na(faux$recent)][c(1,6,10,21)] <- NA
attr(faux,"time") <- "wave"
bootstrap.incidence(faux,"recent","hiv",weight.type="RDS-II", number.of.bootstrap.samples=100)

Bottleneck Plot

Description

Bottleneck Plot

Usage

bottleneck.plot(
  rds.data,
  outcome.variable,
  est.func = RDS.II.estimates,
  as.factor = FALSE,
  n.eval.points = 25,
  ...
)

Arguments

`rds.data`	An rds.data.frame.
`outcome.variable`	A character vector of outcome variables.
`est.func`	A function taking rds.data and outcome.variable as parameters and returning an rds.weighted.estimate object.
`as.factor`	Convert all outcome variables to factors
`n.eval.points`	number of evaluation points to calculate the estimates at
`...`	additional parameters for est.func.

References

Krista J. Gile, Lisa G. Johnston, Matthew J. Salganik. Diagnostics for Respondent-driven Sampling. eprint arXiv:1209.6254, 2012.

Examples

data(fauxmadrona)
bottleneck.plot(fauxmadrona,"disease")

Compute estimates of the sampling weights of the respondent's observations based on various estimators

Description

Compute estimates of the sampling weights of the respondent's observations based on various estimators

Usage

compute.weights(
  rds.data,
  weight.type = c("Gile's SS", "RDS-I", "RDS-I (DS)", "RDS-II", "Arithmetic Mean", "HCG"),
  N = NULL,
  subset = NULL,
  control = control.rds.estimates(),
  ...
)

Arguments

`rds.data`	An `rds.data.frame` that indicates recruitment patterns by a pair of attributes named “id” and “recruiter.id”.
`weight.type`	A string giving the type of estimator to use. The options are `"Gile's SS"`, `"RDS-I"`, `"RDS-I (DS)"`, `"RDS-II"`, `"Arithmetic Mean"`, and `"HCG"`. It defaults to `"Gile's SS"`.
`N`	An estimate of the number of members of the population being sampled. If `NULL` it is read as the `population.size.mid` attribute of the `rds.data` frame. If that is missing, the weights will sum to 1. Note that this parameter is required for Gile's SS.
`subset`	An optional criterion to subset `rds.data` by. It is an R expression which, when evaluated, subsets the data. For example, `subset = seed > 0` excludes seeds. It can also be the name of a logical vector of the same length as the outcome variable, where TRUE means include the observation in the analysis. If `NULL` then no subsetting is done.
`control`	A list of control parameters for algorithm tuning. Constructed using `control.rds.estimates`.
`...`	Additional parameters passed to the individual weighting algorithms.

Value

A vector of weights for each of the respondents. It is of the same size as the number of rows in rds.data.
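
As an illustration of what such a weight vector looks like, the sketch below computes weights inversely proportional to the reported network size, which is the idea behind the RDS-II (Volz-Heckathorn) estimator, and normalizes them to sum to 1. It uses base R only and made-up data; it is a simplified sketch, not the package's implementation, and the other weight types are more involved.

```r
# Made-up network sizes (degrees) and a binary outcome for five respondents.
degree <- c(4, 8, 8, 2, 3)
y <- c(1, 0, 1, 0, 0)

# Weights proportional to 1/degree, normalized to sum to 1.
w <- (1 / degree) / sum(1 / degree)

# Weighted estimate of the proportion with y == 1.
est <- sum(w * y)
```

With an rds.data.frame, the corresponding call would be along the lines of compute.weights(rds.data, weight.type = "RDS-II").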

Named element accessor for ergm control lists

Description

Utility method that overrides the standard ‘$’ list accessor to disable partial matching for ergm control.list objects

Usage

## S3 method for class 'control.list'
object$name

Arguments

`object`	list-coercible object with elements to be searched
`name`	literal character name of list element to search for and return

Details

Executes getElement instead of $ so that element names must match exactly to be returned and partially matching names will not return the wrong object.
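
The difference between partial and exact matching can be seen with an ordinary list in base R. The list `ctrl` below is a made-up stand-in for a control list.

```r
# A plain list standing in for a control list (illustrative only).
ctrl <- list(confidence.level = 0.95)

partial <- ctrl$conf                          # `$` on a plain list matches partially
exact <- getElement(ctrl, "conf")             # exact matching: no such element
full <- getElement(ctrl, "confidence.level")  # the full name is found
```

Here `partial` and `full` both hold 0.95, while `exact` is NULL, which is the behaviour the overridden accessor enforces for control.list objects.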

Value

Returns the named list element exactly matching name, or NULL if no matching element is found.

Author(s)

Pavel N. Krivitsky

Auxiliary for Controlling RDS.bootstrap.intervals

Description

Auxiliary function providing a user interface for fine-tuning the RDS.bootstrap.intervals algorithm, which computes interval estimates via bootstrapping.

Usage

control.rds.estimates(
  confidence.level = 0.95,
  SS.infinity = 0.01,
  lowprevalence = c(8, 14),
  discrete.cutoff = 0.8,
  useC = TRUE,
  number.of.bootstrap.samples = NULL,
  hcg.reltol = sqrt(.Machine$double.eps),
  hcg.BS.reltol = 1e+05 * sqrt(.Machine$double.eps),
  hcg.max.optim = 500,
  seed = NULL
)

Arguments

`confidence.level`	The confidence level for the confidence intervals. The default is 0.95 for 95%.
`SS.infinity`	The sample proportion, `n/N`, below which the computation of the SS weights should simplify to that of the `RDS-II` weights.
`lowprevalence`	Standard confidence interval procedures can be inaccurate when the outcome expected count is close to zero. This sets conditions where alternatives to the standard are used for the `ci.type="hmg"` option. See Details for its use.
`discrete.cutoff`	The minimum proportion of the values of the outcome variable that need to be unique before the variable is judged to be continuous.
`useC`	Use a C-level implementation of Gile's bootstrap (rather than the R level). The implementations should be computationally equivalent (except for speed).
`number.of.bootstrap.samples`	The number of bootstrap samples to take in estimating the uncertainty of the estimator. If `NULL` it defaults to the number necessary to compute the standard error to accuracy 0.001.
`hcg.reltol`	Relative convergence tolerance for the HCG estimator. The algorithm stops if it is unable to reduce the log-likelihood by a factor of `reltol * (abs(log-likelihood) + reltol)` at a step. Defaults to `sqrt(.Machine$double.eps)`, typically about `1e-8`.
`hcg.BS.reltol`	Relative convergence tolerance for the bootstrap of the HCG estimator. It has the same interpretation as `hcg.reltol` except it is applied to each bootstrap sample. It is typically the same or larger than `hcg.reltol`.
`hcg.max.optim`	The number of iterations on the likelihood optimization for the HCG estimator.
`seed`	Seed value (integer) for the random number generator. See `set.seed`

Details

This function is only used within a call to the RDS.bootstrap.intervals function.

Some of the arguments are not yet fully implemented. The function will evolve to incorporate more arguments as the package develops.

Standard confidence interval procedures can be inaccurate when the expected count of the outcome is close to zero. In these cases the combined Agresti-Coull and bootstrap-t interval of Mantalos and Zografos (2008) can be used. The lowprevalence argument is a vector of two parameters setting the conditions under which the approximation is used. The first is the penalty term on the differential activity. If the observed count of the rare group minus the product of the first parameter and the differential activity is lower than the second parameter, the low prevalence approximation is used.
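
The decision rule in the last paragraph can be written as a one-line check. The function `use.low.prevalence` and its argument names are hypothetical and serve only to restate the rule; how the package computes the differential activity and applies the rule internally is not shown here.

```r
# Hypothetical helper restating the documented rule; not an RDS function.
use.low.prevalence <- function(n.rare, differential.activity,
                               lowprevalence = c(8, 14)) {
  (n.rare - lowprevalence[1] * differential.activity) < lowprevalence[2]
}

few <- use.low.prevalence(n.rare = 20, differential.activity = 1)   # 20 - 8 = 12
many <- use.low.prevalence(n.rare = 40, differential.activity = 1)  # 40 - 8 = 32
```

With the default c(8, 14), the first case falls below the threshold of 14 and the second does not.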

Value

A list with arguments as components.

Convergence Plots

Description

This function creates diagnostic convergence plots for RDS estimators.

Usage

convergence.plot(
  rds.data,
  outcome.variable,
  est.func = RDS.II.estimates,
  as.factor = FALSE,
  n.eval.points = 25,
  ...
)

Arguments

`rds.data`	An rds.data.frame.
`outcome.variable`	A character vector of outcome variables.
`est.func`	A function taking rds.data and outcome.variable as parameters and returning an rds.weighted.estimate object.
`as.factor`	Convert all outcome variables to factors
`n.eval.points`	number of evaluation points to calculate the estimates at
`...`	additional parameters for est.func.

References

Krista J. Gile, Lisa G. Johnston, Matthew J. Salganik. Diagnostics for Respondent-driven Sampling. eprint arXiv:1209.6254, 2012.

Examples

data(faux)
convergence.plot(faux,c("X","Y"))

Counts the number of recruiter->recruitee transitions between different levels of the grouping variable.

Description

Counts the number of recruiter->recruitee transitions between different levels of the grouping variable.

Usage

count.transitions(rds.data, group.variable)

Arguments

`rds.data`	An rds.data.frame
`group.variable`	The name of a categorical variable in rds.data

Examples

data(faux)
count.transitions(faux,"X")
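
For intuition, the same kind of table can be assembled by hand in base R. The sketch below uses made-up data, assumes seeds are marked with recruiter.id -1, and cross-tabulates the group of each recruiter against the group of each recruit; the layout of the object returned by count.transitions may differ.

```r
# Made-up recruitment data with a two-level grouping variable X.
dat <- data.frame(id = 1:5, recruiter.id = c(2, -1, 2, -1, 4),
                  X = c("blue", "red", "red", "blue", "blue"),
                  stringsAsFactors = FALSE)

recruited <- dat$recruiter.id != -1                    # seeds have no recruiter
recruiter.row <- match(dat$recruiter.id[recruited], dat$id)
groups <- sort(unique(dat$X))

# Rows: group of the recruiter. Columns: group of the recruit.
transitions <- table(
  recruiter = factor(dat$X[recruiter.row], levels = groups),
  recruitee = factor(dat$X[recruited], levels = groups)
)
```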

Calculates estimates at each successive wave of the sampling process

Description

Calculates estimates at each successive wave of the sampling process

Usage

cumulative.estimate(
  rds.data,
  outcome.variable,
  est.func = RDS.II.estimates,
  n.eval.points = 25,
  ...
)

Arguments

`rds.data`	An rds.data.frame
`outcome.variable`	The outcome
`est.func`	A function taking rds.data and outcome.variable as parameters and returning an rds.weighted.estimate object
`n.eval.points`	number of evaluation points to calculate the estimates at
`...`	additional parameters for est.func

Differential Activity between groups

Description

Differential Activity between groups

Usage

differential.activity.estimates(
  rds.data,
  outcome.variable,
  weight.type = "Gile's SS",
  N = NULL,
  subset = NULL,
  ...
)

Arguments

`rds.data`	An rds.data.frame object
`outcome.variable`	A character string of column names representing categorical variables.
`weight.type`	A string giving the type of estimator to use. The options are `"Gile's SS"`, `"RDS-I"`, `"RDS-II"`, `"RDS-I (DS)"`, and `"Arithmetic Mean"`. It defaults to `"Gile's SS"`.
`N`	The population size.
`subset`	An expression defining a subset of rds.data.
`...`	Additional parameters passed to compute.weights.

Details

This function estimates the ratio of the average degree of one population group divided by the average degree of those in another population group.
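
A minimal sketch of such a ratio in base R follows. It assumes weights inversely proportional to degree (in the style of RDS-II) and made-up data; the choice of weights and of which group forms the numerator are assumptions of the sketch, not statements about the package.

```r
# Made-up degrees and group labels for six respondents.
degree <- c(4, 8, 8, 2, 3, 6)
group <- c("a", "b", "b", "a", "a", "b")
w <- 1 / degree  # assumed RDS-II-style weights

# Weighted mean degree within one group.
mean.degree <- function(g) {
  in.g <- group == g
  sum(w[in.g] * degree[in.g]) / sum(w[in.g])
}

# Ratio of the estimated mean degree of group "b" to that of group "a".
da <- mean.degree("b") / mean.degree("a")
```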

Examples

data(faux)
differential.activity.estimates(faux,"X",weight.type="RDS-II")

Convert the output of print.rds.interval.estimate from a character data.frame to a numeric matrix

Description

Convert the output of print.rds.interval.estimate from a character data.frame to a numeric matrix

Usage

export.rds.interval.estimate(x, proportion = TRUE)

Arguments

`x`	An object, typically the result of print.rds.interval.estimate.
`proportion`	Logical. Should the outcome be treated as a proportion and converted to a percentage?

A Simulated RDS Data Set

Description

This is a faux set used to demonstrate RDS functions and analysis. It is used in some simple examples and has categorical variables "X", "Y" and "Z".

Format

An rds.data.frame object

References

Gile, Krista J., Handcock, Mark S., 2010 Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327.

Examples

data(faux)
RDS.I.estimates(rds.data=faux,outcome.variable='X')

A Simulated RDS Data Set with no seed dependency

Description

This is a faux set used to illustrate how the estimators perform under different populations and RDS schemes.

Format

An rds.data.frame

Details

The population had N=1000 nodes. In this case, the sample size is 500 so that there is a relatively small sample fraction (50%). There is homophily on disease status (R=5) and there is differential activity by disease status whereby the infected nodes have mean degree roughly twice that of the uninfected (w=1.8).

In the sampling, the seeds are chosen randomly from the full population, so there is no dependency induced by seed selection.

Each sample member is given 2 uniquely identified coupons to distribute to other members of the target population in their acquaintance. Further each respondent distributes their coupons completely at random from among those they are connected to.

Here are the results for this data set and the sister fauxsycamore data set:

Name	City	Type	Mean	RDS I (SH)	RDS II (VH)	SS
fauxsycamore	Oxford	seed dependency, 70%	0.2408	0.1087	0.1372	0.1814
fauxmadrona	Seattle	no seed dependency, 50%	0.2592	0.1592	0.1644	0.1941

Even with only a 50% sample, the VH is substantially biased, and the SS does much better.

Source

The original network is included as fauxmadrona.network as a network object.
The data set also includes the data.frame of the RDS data set as fauxmadrona.
Use data(package="RDS") to get a full list of datasets.

References

Gile, Krista J., Handcock, Mark S., 2010 Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327.

A Simulated RDS Data Set with extreme seed dependency

Description

This is a faux set used to demonstrate RDS functions and analysis. The population had N=715 nodes. In this case, the sample size is 500 so that there is a relatively large sample fraction (70%). There is homophily on disease status (R=5) and there is differential activity by disease status whereby the infected nodes have mean degree roughly twice that of the uninfected (w=1.8).

Format

An rds.data.frame plus the original network as a network object

Details

In the sampling the seeds are chosen randomly from the infected population, so there is extreme dependency induced by seed selection.

With 70% sample, the VH is substantially biased, so the SS (and presumably MA) do much better. We expect the MA to perform a bit better than the SS.

It is network 702 and its sample from YesYes on mosix. Look for "extract702.R"
The original network is included as fauxsycamore.network as a network object.
The data set also includes the data.frame of the RDS data set as fauxsycamore.
Use data(package="RDS") to get a full list of datasets.

References

Gile, Krista J., Handcock, Mark S., 2010 Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327.

A Simulated RDS Data Set

Description

This is a faux set used to demonstrate RDS functions and analysis.

Format

An rds.data.frame object

References

Gile, Krista J., Handcock, Mark S., 2010 Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327.

Get Horvitz-Thompson estimator assuming inclusion probability proportional to the inverse of network.var (i.e. degree).

Description

Get Horvitz-Thompson estimator assuming inclusion probability proportional to the inverse of network.var (i.e. degree).

Usage

get.h.hat(
  rds.data,
  group.variable,
  network.var = attr(rds.data, "network.size")
)

Arguments

`rds.data`	An rds.data.frame
`group.variable`	The grouping variable.
`network.var`	The network.size variable.

Get the subject id

Description

Get the subject id

Usage

get.id(x, check.type = TRUE)

Arguments

`x`	an rds.data.frame object
`check.type`	if true, x is required to be of type rds.data.frame

Details

Returns the variable indicated by the 'id' attribute, coerced to a character vector.

Returns the network size of each subject (i.e. their degree).

Description

Returns the network size of each subject (i.e. their degree).

Usage

get.net.size(x, check.type = TRUE)

Arguments

`x`	the rds.data.frame
`check.type`	if true, x is required to be of type rds.data.frame

Calculates the number of (direct) recruits for each respondent.

Description

Calculates the number of (direct) recruits for each respondent.

Usage

get.number.of.recruits(data)

Arguments

`data`	An rds.data.frame

Examples

data(fauxmadrona)
nr <- get.number.of.recruits(fauxmadrona)
#frequency of number recruited by each id
barplot(table(nr))

Returns the population size associated with the data.

Description

Returns the population size associated with the data.

Usage

get.population.size(x, check.type = TRUE)

Arguments

`x`	the rds.data.frame
`check.type`	if true, x is required to be of type rds.data.frame

Returns the recruitment time for each subject

Description

Returns the recruitment time for each subject

Usage

get.recruitment.time(
  x,
  to.numeric = TRUE,
  wave.fallback = FALSE,
  check.type = TRUE
)

Arguments

`x`	the rds.data.frame
`to.numeric`	if true, time will be converted into a numeric variable.
`wave.fallback`	if true, subjects' recruitment times are ordered by wave and then by data.frame index if no recruitment time variable is available.
`check.type`	if true, x is required to be of type rds.data.frame

Get recruiter id

Description

Get recruiter id

Usage

get.rid(x, check.type = TRUE)

Arguments

`x`	an rds.data.frame object
`check.type`	if true, x is required to be of type rds.data.frame

Details

Returns the variable indicated by the 'recruiter.id' attribute, coerced to a character vector.

Calculates the root seed id for each node of the recruitment tree.

Description

Calculates the root seed id for each node of the recruitment tree.

Usage

get.seed.id(data)

Arguments

`data`	An rds.data.frame

Examples

data(fauxmadrona)
seeds <- get.seed.id(fauxmadrona)
#number recruited by each seed
barplot(table(seeds))

Gets the recruiter id associated with the seeds

Description

Gets the recruiter id associated with the seeds

Usage

get.seed.rid(x, check.type = TRUE)

Arguments

`x`	an rds.data.frame object
`check.type`	if true, x is required to be of type rds.data.frame

Details

All seed nodes must have the same placeholder recruiter id.

Markov chain stationary distribution

Description

Markov chain stationary distribution

Usage

get.stationary.distribution(mle)

Arguments

`mle`	The transition probabilities

Value

A vector of proportions representing the proportion in each group at the stationary distribution of the Markov chain.
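
One common way to obtain a stationary distribution from a matrix of transition probabilities is to take the left eigenvector associated with eigenvalue 1 and rescale it to sum to 1. The base R sketch below does this for a made-up two-group matrix `P` with rows summing to 1; the form of the `mle` argument and the method used inside get.stationary.distribution are not documented here and may differ.

```r
# Made-up transition probabilities between two groups (rows sum to 1).
P <- matrix(c(0.9, 0.1,
              0.5, 0.5), nrow = 2, byrow = TRUE)

# Left eigenvectors of P are the eigenvectors of t(P).
e <- eigen(t(P))
k <- which.max(Re(e$values))   # position of the eigenvalue equal to 1
v <- Re(e$vectors[, k])

# Rescale so the entries are proportions.
stationary <- v / sum(v)
```

For this matrix the result is (5/6, 1/6), and multiplying it by `P` returns the same vector.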

Calculates the depth of the recruitment tree (i.e. the recruitment wave) at each node.

Description

Calculates the depth of the recruitment tree (i.e. the recruitment wave) at each node.

Usage

get.wave(data)

Arguments

`data`	An rds.data.frame

Examples

data(fauxmadrona)
#wave of each subject
w <- get.wave(fauxmadrona)
#number recruited in each wave
barplot(table(w))
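
The notion of wave can be illustrated without the package by following each subject's chain of recruiters back to a seed. The sketch assumes seeds carry recruiter.id -1 and are counted as wave 0, and that the recruitment data form a valid tree; `wave.of` is a hypothetical helper.

```r
# Made-up recruitment data: subjects 2 and 4 are seeds.
id <- c(1, 2, 3, 4, 5)
recruiter.id <- c(2, -1, 2, -1, 4)

# Hypothetical helper: count the steps from subject i back to its seed.
# Assumes a valid tree (no cycles, every recruiter present in `id`).
wave.of <- function(i) {
  depth <- 0
  rid <- recruiter.id[match(i, id)]
  while (rid != -1) {
    depth <- depth + 1
    rid <- recruiter.id[match(rid, id)]
  }
  depth
}

wave <- vapply(id, wave.of, numeric(1))
```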

Weights using Gile's SS estimator

Description

Weights using Gile's SS estimator

Usage

gile.ss.weights(
  degs,
  N,
  number.ss.samples.per.iteration = 500,
  number.ss.iterations = 5,
  hajek = TRUE,
  SS.infinity = 0.04,
  se = FALSE,
  ...
)

Arguments

`degs`	subjects' degrees (i.e. network sizes).
`N`	Population size estimate.
`number.ss.samples.per.iteration`	The number of samples to use to estimate inclusion probabilities in a probability proportional to size without replacement design.
`number.ss.iterations`	Number of iterations to use in Gile's SS algorithm.
`hajek`	Should the Hajek estimator be used? If false, the HT estimator is used.
`SS.infinity`	The sample proportion, `n/N`, below which the computation of the SS weights should simplify to that of the `RDS-II` weights.
`se`	Should covariances be included.
`...`	unused
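
The difference between the two estimators selected by `hajek` can be sketched for a mean, given inclusion probabilities. The probabilities below are made up; in gile.ss.weights they are estimated by simulation, and the variable names are illustrative only.

```r
# Made-up outcome, inclusion probabilities and population size.
y <- c(1, 0, 1, 0)
incl.prob <- c(0.1, 0.2, 0.4, 0.2)
N <- 20

# Horvitz-Thompson (HT) estimate of the mean: divides by the known N.
ht <- sum(y / incl.prob) / N

# Hajek estimate of the mean: divides by the estimated population size.
hajek <- sum(y / incl.prob) / sum(1 / incl.prob)
```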

RDS data.frame has recruitment time information

Description

RDS data.frame has recruitment time information

Usage

has.recruitment.time(x, check.type = TRUE)

Arguments

`x`	the rds.data.frame
`check.type`	if true, x is required to be of type rds.data.frame

HCG parametric bootstrap replicate weights

Description

HCG parametric bootstrap replicate weights

Usage

hcg.replicate.weights(
  rds.data,
  outcome.variable,
  number.of.bootstrap.samples = 500,
  include.sample.weights = FALSE,
  N = NULL,
  small.fraction = FALSE
)

Arguments

`rds.data`	An rds.data.frame
`outcome.variable`	The column name of the variable defining the groups for the homophily configuration graph
`number.of.bootstrap.samples`	The number of bootstrap replicate weights to be generated
`include.sample.weights`	If TRUE, the first column of the returned frame contains the HCG weights for the sample
`N`	The population size
`small.fraction`	If TRUE, the sample size is assumed to be small compared to the population size

Details

This function generates bootstrap replicate weights which may be used to analyze RDS data in other packages or software systems (e.g. the survey package with svrepdesign).

Value

A data.frame of replicate weights. If include.sample.weights is TRUE, the first column contains the HCG weights for the observed sample.

Examples

## Not run: 
data("fauxmadrona")
set.seed(1)
# Generate replicate weights
result <- hcg.replicate.weights(fauxmadrona, "disease", 50, TRUE)
# Analyze with survey package and compare to internal function
if(require(survey)){
  set.seed(1)
  design <- svrepdesign(fauxmadrona, type = "bootstrap", 
    weights= result[[1]], repweights = result[-1])
  svymean(~disease, design) |> print()
  RDS.bootstrap.intervals(fauxmadrona, "disease", "HCG", "HCG", 
  number.of.bootstrap.samples = 50) |> print()
}

## End(Not run)

homophily configuration graph weights

Description

homophily configuration graph weights

Usage

hcg.weights(
  rds.data,
  outcome.variable,
  N = NULL,
  small.fraction = FALSE,
  reltol = sqrt(.Machine$double.eps),
  max.optim = 500,
  theta.start = NULL,
  weights.include.seeds = TRUE,
  ...
)

Arguments

`rds.data`	An rds.data.frame
`outcome.variable`	The variable used to base the weights on.
`N`	Population size
`small.fraction`	should a small sample fraction be assumed
`reltol`	Relative convergence tolerance for the HCG estimator. The algorithm stops if it is unable to reduce the log-likelihood by a factor of `reltol * (abs(log-likelihood) + reltol)` at a step. Defaults to `sqrt(.Machine$double.eps)`, typically about `1e-8`.
`max.optim`	The number of iterations on the likelihood optimization for the HCG estimator.
`theta.start`	The initial value of theta used in the likelihood optimization for the HCG estimator. If NULL, the default, it is the margin of the table of counts for the transitions.
`weights.include.seeds`	logical; Should the weights be computed including the influence of the seeds?
`...`	Unused

Examples

data(fauxtime)
hcg.weights(fauxtime,"var1",N=3000)
fauxtime$NETWORK[c(1,100,40,82,77)] <- NA

This function computes an estimate of the population homophily and the recruitment homophily based on a categorical variable.

Description

This function computes an estimate of the population homophily and the recruitment homophily based on a categorical variable.

Usage

homophily.estimates(
  rds.data,
  outcome.variable,
  weight.type = NULL,
  uncertainty = NULL,
  recruitment = FALSE,
  N = NULL,
  to.group0.variable = NULL,
  to.group1.variable = NULL,
  number.ss.samples.per.iteration = NULL,
  confidence.level = 0.95
)

Arguments

`rds.data`	An `rds.data.frame` that indicates recruitment patterns by a pair of attributes named “id” and “recruiter.id”.
`outcome.variable`	A string giving the name of the variable in the `rds.data` that contains a categorical or numeric variable to be analyzed.
`weight.type`	A string giving the type of estimator to use. The options are `"Gile's SS"`, `"RDS-I"`, `"RDS-II"`, `"RDS-I/DS"`, `"Good-Fellows"` and `"Arithmetic Mean"`. If `NULL` it defaults to `"Gile's SS"`.
`uncertainty`	A string giving the type of uncertainty estimator to use. The options are `"Gile's SS"` and `"Salganik"`. This is usually determined by `weight.type` to be consistent with the estimator's origins (e.g., for `"Gile's SS"`, `"RDS-I"`, `"RDS-II"`, `"RDS-I/DS"`, and `"Arithmetic Mean"`). Hence its current functionality is limited. If `NULL` it defaults to `"Gile's SS"`.
`recruitment`	A logical indicating if the homophily in the recruitment chains should be computed also. The default is FALSE.
`N`	An estimate of the number of members of the population being sampled. If `NULL` it is read as the `population.size.mid` attribute of the `rds.data` frame. If that is missing it defaults to 1000.
`to.group0.variable`	For each survey respondent, the number of people in their network who have group variable value 0. Usually this is not available. The default is to not use this variable.
`to.group1.variable`	For each survey respondent, the number of people in their network who have group variable value 1. Usually this is not available. The default is to not use this variable.
`number.ss.samples.per.iteration`	The number of samples to take in estimating the inclusion probabilities in each iteration of the sequential sampling algorithm. If `NULL` it is read as the `number.ss.samples.per.iteration` attribute of `rds.data`. If that is missing it defaults to 5000.
`confidence.level`	The confidence level for the confidence intervals. The default is 0.95 for 95%.

Value

If outcome.variable is binary then the homophily estimate of 0 versus 1 is returned, otherwise a vector of differential homophily estimates is returned.

Recruitment Homophily

The recruitment homophily is a homophily measure for the recruitment process. It addresses the question: Do respondents differentially recruit people like themselves? That is, it measures the homophily on a variable in the recruitment chains. Take as an example infection status. In this case, it is the ratio of the number of recruits that have the same infection status as their recruiter to the number we would expect if there were no homophily on infection status. The difference from the Population Homophily (see below) is that this is in the recruitment chain rather than the population of social ties. For example, if the recruitment homophily on infection status is about 1, we see little effect of recruitment homophily on infection status (as the numbers of homophilous pairs are close to what we would expect by chance).
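
The ratio described in this section can be sketched in base R on a handful of made-up recruiter-recruit pairs. The sketch assumes that the count "expected if there was no homophily" is the one implied by independence of recruiter and recruit status in the recruitment table; that choice, and all variable names, are assumptions of the illustration, and the package's estimator may be computed differently (for example with weighting).

```r
# Hypothetical recruiter-recruit pairs (1 = infected, 0 = not infected).
recruiter.status <- c(1, 1, 1, 1, 0, 0, 0, 0)
recruit.status   <- c(1, 1, 1, 0, 0, 0, 0, 1)

# Cross-tabulate recruiter status against recruit status.
tab <- table(
  recruiter = factor(recruiter.status, levels = c(0, 1)),
  recruit   = factor(recruit.status,   levels = c(0, 1))
)

# Observed number of same-status recruitments (diagonal of the table).
observed.same <- sum(diag(tab))

# Expected same-status recruitments if recruiter and recruit status
# were independent (assumed null model for this sketch).
expected.same <- sum(rowSums(tab) * colSums(tab)) / sum(tab)

recruitment.homophily <- observed.same / expected.same
print(recruitment.homophily)
```

A ratio near 1 corresponds to the "little effect" case described in this section; values above 1 indicate more same-status recruitment than chance would produce.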

Population Homophily

This is an estimate of the homophily of a given variable in the underlying networked population. For example, consider HIV status. The population homophily is the homophily in the HIV status of two people who are tied in the underlying population social network (a “couple”). Specifically, the population homophily is the ratio of the expected number of HIV discordant couples absent homophily to the expected number of HIV discordant couples with the homophily. Hence larger values of population homophily indicate more homophily on HIV status. For example, a value of 1 means the couples are random with respect to HIV status. A value of 2 means there are half as many HIV discordant couples as we would expect if there were no homophily in the population. This measure is meaningful across different levels of differential activity. As we do not see most of the population network, we estimate the population homophily from the RDS data. As an example, suppose the population homophily on HIV is 0.75, so there are about a third more HIV discordant couples than expected due to chance (1/0.75 is about 1.33). So there is actually heterophily on HIV in the population. If the population homophily on sex is 1.1, there are about 9% fewer mixed-sex couples than expected due to chance (1/1.1 is about 0.91). Hence there is modest homophily on sex.
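
The ratio defined in this section can be worked through on a fully observed toy network. The six-person network and trait values below are invented for illustration, and "absent homophily" is taken to mean that ties attach in proportion to degree, which is the same form as the "True homophily" expression in the Examples. This is a sketch of the definition, not of the estimator applied to RDS data.

```r
# Hypothetical undirected network on 6 people, with a binary trait.
trait <- c(1, 1, 1, 0, 0, 0)
edges <- rbind(c(1, 2), c(2, 3), c(1, 3),   # ties among trait == 1
               c(4, 5), c(5, 6), c(4, 6),   # ties among trait == 0
               c(3, 4))                     # the only discordant tie

# Build a symmetric adjacency matrix from the edge list.
a <- matrix(0, 6, 6)
a[edges] <- 1
a[edges[, 2:1]] <- 1

deg <- rowSums(a)
d1  <- sum(deg[trait == 1])   # total degree of the trait == 1 group
d0  <- sum(deg[trait == 0])   # total degree of the trait == 0 group

# Expected discordant ties if ties formed in proportion to degree only.
expected.discordant <- d1 * d0 / sum(deg)

# Observed discordant ties, each counted once.
observed.discordant <- sum(a[trait == 1, trait == 0])

population.homophily <- expected.discordant / observed.discordant
print(population.homophily)
```

Here the two groups are tightly knit with a single tie between them, so the ratio is well above 1, indicating strong homophily on the trait.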

Author(s)

Mark S. Handcock with help from Krista J. Gile

References

Gile, Krista J., Handcock, Mark S., 2010, Respondent-driven Sampling: An Assessment of Current Methodology. Sociological Methodology 40, 285-327.

Examples

## Not run: 
data(fauxmadrona)
names(fauxmadrona)
#
# True value:
#
if(require(network)){
	a=as.sociomatrix(fauxmadrona.network)
	deg <- apply(a,1,sum)
	dis <- fauxmadrona.network %v% "disease"
	deg1 <- apply(a[dis==1,],1,sum)
	deg0 <- apply(a[dis==0,],1,sum)
	# differential activity
	mean(deg1)/ mean(deg0)
	p=mean(dis)
	N=1000
	# True homophily
	p*(1-p)*mean(deg0)*mean(deg1)*N/(mean(deg)*sum(a[dis==1,dis==0]))
}
# HT based estimators using the to.group information
data(fauxmadrona)
homophily.estimates(fauxmadrona,outcome.variable="disease",
  to.group0.variable="tonondiseased", to.group1.variable="todiseased",
  N=1000)
# HT based estimators not using the to.group information
homophily.estimates(fauxmadrona,outcome.variable="disease",
  N=1000,weight.type="RDS-II")

## End(Not run)

Imputes missing degree values

Description

Imputes missing degree values

Usage

impute.degree(
  rds.data,
  trait.variable = NULL,
  N = NULL,
  method = c("mean", "quantile"),
  quantile = 0.5,
  recruitment.lower.bound = TRUE,
  round.degree = TRUE
)

Arguments

`rds.data`	an rds.data.frame
`trait.variable`	the name of the variable in rds.data to stratify the imputation by
`N`	population size
`method`	If "mean", the weighted mean value is imputed, otherwise a quantile is used.
`quantile`	If method is "quantile", this is the quantile that is used. Defaults to median
`recruitment.lower.bound`	If TRUE, then for each individual, the degree is taken to be the maximum of the number of recruits plus one and the reported degree, so that observed recruitment acts as a lower bound
`round.degree`	Should degrees be integer rounded.

Details

This function imputes degree values using the weighted mean or quantile values of the non-missing degrees. Weights are calculated using Gile's SS if N is not NULL, or RDS-II if it is. If a trait variable is specified, means and quantiles are calculated within the levels of the trait variable.
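
The stratified weighted-mean imputation described here can be sketched in base R. The sketch assumes RDS-II-style weights proportional to the inverse of the reported degree and uses invented data and variable names; it is an illustration of the idea rather than the function's actual implementation.

```r
# Hypothetical reported degrees (NA = missing) and a stratifying trait.
degree <- c(2, 4, NA, 10, NA, 5)
trait  <- c("a", "a", "a", "b", "b", "b")

# Assumed RDS-II-style weights: inversely proportional to degree.
w <- 1 / degree

imputed <- degree
for (g in unique(trait)) {
  in.group <- trait == g
  observed <- in.group & !is.na(degree)
  # Weighted mean of the non-missing degrees within this trait level.
  group.mean <- weighted.mean(degree[observed], w[observed])
  # Fill the missing values in this level, rounded to an integer degree.
  imputed[in.group & is.na(degree)] <- round(group.mean)
}
print(imputed)
```

Because the weights down-weight high-degree respondents, the imputed value in each stratum sits below the unweighted mean of the observed degrees.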

Examples

data(faux)
rds.data <- faux
rds.data$network.size[c(1,2,30,52,81,101,108,111)] <- NA
impute.degree(rds.data)
impute.degree(rds.data,trait.variable="X")
impute.degree(rds.data,trait.variable="X",method="quantile")

Estimates each person's personal visibility based on their self-reported degree and the number of their (direct) recruits. It uses the time the person was recruited as a factor in determining the number of recruits they produce.

Description

Estimates each person's personal visibility based on their self-reported degree and the number of their (direct) recruits. It uses the time the person was recruited as a factor in determining the number of recruits they produce.

Usage

impute.visibility(
  rds.data,
  max.coupons = NULL,
  type.impute = c("median", "distribution", "mode", "mean"),
  recruit.time = NULL,
  include.tree = FALSE,
  reflect.time = FALSE,
  parallel = 1,
  parallel.type = "PSOCK",
  interval = 10,
  burnin = 5000,
  mem.optimism.prior = NULL,
  df.mem.optimism.prior = 5,
  mem.scale.prior = 2,
  df.mem.scale.prior = 10,
  mem.overdispersion = 15,
  return.posterior.sample.visibilities = FALSE,
  verbose = FALSE
)

Arguments

`rds.data`	An rds.data.frame
`max.coupons`	The number of recruitment coupons distributed to each enrolled subject (i.e. the maximum number of recruitees for any subject). By default it is taken from the attribute or data, else the maximum recorded number of coupons.
`type.impute`	The type of imputation based on the conditional distribution. It can be `distribution`, `mode`, `median`, or `mean`, where `distribution` is a random draw from the conditional distribution. In the usage above, `median` is listed first and is therefore the default.
`recruit.time`	vector; An optional value for the date/time that the person was interviewed. It needs to resolve as a numeric vector with as many elements as there are rows of the data with non-missing values of the network variable. If it is a character name of a variable in the data then that variable is used. If it is NULL then the sequence number of the recruit in the data is used. If it is NA then the recruitment is not used in the model. Otherwise, the recruitment time is used in the model to better predict the visibility of the person.
`include.tree`	logical; If `TRUE`, augment the reported network size by the number of recruits and one for the recruiter (if any). This reflects a more accurate value for the visibility, but is not the self-reported degree. In particular, it typically produces a positive visibility (compared to a possibly zero self-reported degree).
`reflect.time`	logical; If `FALSE` then the `recruit.time` is the time before the end of the study (instead of the time since the survey started or chronological time).
`parallel`	count; the number of parallel processes to run for the Monte-Carlo sample. This uses MPI or PSOCK. The default is 1, that is not to use parallel processing.
`parallel.type`	The type of parallel processing to use. The options are "PSOCK" or "MPI". This requires the corresponding type to be installed. The default is "PSOCK".
`interval`	count; the number of proposals between sampled statistics.
`burnin`	count; the number of proposals before any MCMC sampling is done. It typically is set to a fairly large number.
`mem.optimism.prior`	scalar; A hyper parameter being the mean of the distribution of the optimism parameter.
`df.mem.optimism.prior`	scalar; A hyper parameter being the degrees-of-freedom of the prior for the optimism parameter. This gives the equivalent sample size that would contain the same amount of information inherent in the prior.
`mem.scale.prior`	scalar; A hyper parameter being the scale of the concentration of baseline negative binomial measurement error model.
`df.mem.scale.prior`	scalar; A hyper parameter being the degrees-of-freedom of the prior for the standard deviation of the dispersion parameter in the visibility model. This gives the equivalent sample size that would contain the same amount of information inherent in the prior for the standard deviation.
`mem.overdispersion`	scalar; A parameter being the overdispersion of the negative binomial distribution that is the baseline for the measurement error model.
`return.posterior.sample.visibilities`	logical; If TRUE then return a matrix of dimension `samplesize` by `n` of posterior draws from the visibility distribution for those in the survey. The sample for the `i`th person is the `i`th column. The default is FALSE so that the vector of imputes defined by `type.impute` is returned.
`verbose`	logical; if this is `TRUE`, the program will print out additional information.
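
To make the `type.impute` options concrete, the base R sketch below summarises a small matrix of posterior draws laid out as described for `return.posterior.sample.visibilities` (one column per person). The draws are invented numbers and `post.mode` is a hypothetical helper written for this sketch; how the package forms its imputes internally is not shown in this manual.

```r
# Hypothetical posterior draws: 5 draws (rows) for 2 people (columns).
draws <- matrix(c(3, 4, 4, 5, 4,
                  1, 1, 2, 6, 1), nrow = 5)

# Helper (hypothetical): most frequent value among the draws.
post.mode <- function(v) {
  tab <- table(v)
  as.numeric(names(tab)[which.max(tab)])
}

set.seed(1)
imputes <- list(
  median       = apply(draws, 2, median),
  mean         = colMeans(draws),
  mode         = apply(draws, 2, post.mode),
  # "distribution": one random draw per person from their own column.
  distribution = apply(draws, 2, function(v) v[sample.int(length(v), 1)])
)
print(imputes)
```

The first three summaries are deterministic given the draws, whereas the `distribution` option changes from call to call unless the random seed is fixed.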

References

McLaughlin, Katherine R.; Johnston, Lisa G.; Jakupi, Xhevat; Gexha-Bunjaku, Dafina; Deva, Edona and Handcock, Mark S. (2023) Modeling the Visibility Distribution for Respondent-Driven Sampling with Application to Population Size Estimation, Annals of Applied Statistics, doi:10.1093/jrsssa/qnad031

Examples

## Not run: 
data(fauxmadrona)
# The next line fits the model for the self-reported personal
# network sizes and imputes the personal network sizes 
# It may take up to 60 seconds.
visibility <- impute.visibility(fauxmadrona)
# frequency of estimated personal visibility
table(visibility)

## End(Not run)

Estimates each person's personal visibility based on their self-reported degree and the number of their (direct) recruits. It uses the time the person was recruited as a factor in determining the number of recruits they produce.

Description

Usage

impute.visibility_mle(
  rds.data,
  max.coupons = NULL,
  type.impute = c("distribution", "mode", "median", "mean"),
  recruit.time = NULL,
  include.tree = FALSE,
  unit.scale = NULL,
  unit.model = c("cmp", "nbinom"),
  optimism = FALSE,
  guess = NULL,
  reflect.time = TRUE,
  maxit = 100,
  K = NULL,
  verbose = TRUE
)

Arguments

`rds.data`	An rds.data.frame
`max.coupons`	The number of recruitment coupons distributed to each enrolled subject (i.e. the maximum number of recruitees for any subject). By default it is taken from the attribute or data, else the maximum recorded number of coupons.
`type.impute`	The type of imputation based on the conditional distribution. It can be `distribution`, `mode`, `median`, or `mean`, with the first, the default, being a random draw from the conditional distribution.
`recruit.time`	vector; An optional value for the date/time that the person was interviewed. It needs to resolve as a numeric vector with as many elements as there are rows of the data with non-missing values of the network variable. If it is a character name of a variable in the data then that variable is used. If it is NULL then the sequence number of the recruit in the data is used. If it is NA then the recruitment is not used in the model. Otherwise, the recruitment time is used in the model to better predict the visibility of the person.
`include.tree`	logical; If `TRUE`, augment the reported network size by the number of recruits and one for the recruiter (if any). This reflects a more accurate value for the visibility, but is not the self-reported degree. In particular, it typically produces a positive visibility (compared to a possibly zero self-reported degree).
`unit.scale`	numeric; If not `NULL` it sets the numeric value of the scale parameter of the distribution of the unit sizes. For the negative binomial, it is the multiplier on the variance of the negative binomial compared to a Poisson (via the Poisson-Gamma mixture representation). Sometimes the scale is unnaturally large (e.g. 40), so this gives the option of fixing it (rather than using the MLE of it). The model is fit with the parameter fixed at this passed value.
`unit.model`	The type of distribution for the unit sizes. It can be of `nbinom`, meaning a negative binomial. In this case, `unit.scale` is the multiplier on the variance of the negative binomial compared to a Poisson of the same mean. The alternative is `cmp`, meaning a Conway-Maxwell-Poisson distribution. In this case, `unit.scale` is the scale parameter compared to a Poisson of the same mean (values less than one mean under-dispersed and values over one mean over-dispersed). The default is `cmp`.
`optimism`	logical; If `TRUE` then add a term to the model allowing the (proportional) inflation of the self-reported degrees relative to the unit sizes.
`guess`	vector; if not `NULL`, the initial parameter values for the MLE fitting.
`reflect.time`	logical; If `FALSE` then the `recruit.time` is the time before the end of the study (instead of the time since the survey started or chronological time).
`maxit`	integer; The maximum number of iterations in the likelihood maximization. By default it is 100.
`K`	integer; The maximum degree. All self-reported degrees above this are recorded as being at least K. By default it is the 95th percentile of the self-reported network sizes.
`verbose`	logical; if this is `TRUE`, the program will print out additional information.
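
The description of `unit.scale` as a variance multiplier relative to a Poisson can be checked numerically in base R. In the sketch below, `mu`, `scale` and `size` are names chosen for the illustration; it relies only on the standard fact that a negative binomial with mean `mu` and dispersion `size` has variance `mu + mu^2 / size`, and makes no claim about how the package parameterises the model internally.

```r
mu    <- 8     # hypothetical mean unit size
scale <- 3     # hypothetical variance multiplier (the role of unit.scale)

# Variance of a negative binomial is mu + mu^2 / size, so choosing
# size = mu / (scale - 1) gives variance scale * mu.
size <- mu / (scale - 1)

# Numerically compute the mean and variance from the probability mass
# function over a range wide enough that the omitted tail is negligible.
x <- 0:2000
p <- dnbinom(x, size = size, mu = mu)
nb.mean <- sum(x * p)
nb.var  <- sum((x - nb.mean)^2 * p)

print(c(mean = nb.mean, variance = nb.var, poisson.variance = mu))
```

A Poisson with the same mean would have variance equal to `mu`, so the computed variance is `scale` times larger, matching the multiplier interpretation.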

References

McLaughlin, K.R., M.S. Handcock, and L.G. Johnston, 2015. Inference for the visibility distribution for respondent-driven sampling. In JSM Proceedings. Alexandria, VA: American Statistical Association. 2259-2267.

Examples

## Not run: 
data(fauxmadrona)
# The next line fits the model for the self-reported personal
# network sizes and imputes the personal network sizes 
# It may take up to 60 seconds.
visibility <- impute.visibility(fauxmadrona)
# frequency of estimated personal visibility
table(visibility)

## End(Not run)

Is an instance of rds.data.frame

Description

Is an instance of rds.data.frame

Usage

is.rds.data.frame(x)

Arguments

`x`	An object to be tested.

Is an instance of rds.interval.estimate

Description

Is an instance of rds.interval.estimate

Usage

is.rds.interval.estimate(x)

Arguments

`x`	An object to be tested.

Is an instance of rds.interval.estimate.list. This is a (typically time ordered) sequence of RDS estimates of a comparable quantity.

Description

Is an instance of rds.interval.estimate.list. This is a (typically time ordered) sequence of RDS estimates of a comparable quantity.

Usage

is.rds.interval.estimate.list(x)

Arguments

`x`	An object to be tested.

Compute a test of trend in prevalences based on a likelihood-ratio statistic

Description

This function takes a series of point estimates and their associated standard errors and computes the p-value for the test of a monotone decrease in the population prevalences (in sequence order). The p-value for a monotone increase is also reported. An optional plot of the estimates and the null distribution of the test statistics is provided. More formally, let the $K$ population prevalences in sequence order be $p_1, \ldots, p_K$ . We test the null hypothesis:

$H_0 : p_1 = \ldots = p_K$

$H_1 : p_1 \ge p_2 \ge \ldots \ge p_K$

with at least one inequality strict. The alternative hypothesis is a monotone decreasing trend. A likelihood ratio statistic for this test has been derived (Bartholomew 1959). The null distribution of the likelihood ratio statistic is very complex but can be determined by a simple Monte Carlo process.
Alternatively, we can test the null hypothesis:

$H_0 : p_1 \ge p_2 \ge \ldots \ge p_K$

$H_1 : \overline{H_0}$

Again, the null distribution of the likelihood ratio statistic can be determined by a simple Monte Carlo process. In both cases we also test for:

$H : p_1 \le p_2 \le \ldots \le p_K$

that is, a monotonically increasing trend. The function requires the isotone library.
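
The Monte Carlo idea for the first test (equality against a monotone trend) can be sketched in base R without the isotone package. The sketch assumes independent, normally distributed estimates with known standard errors, fits the order-restricted means with a hand-written weighted pool-adjacent-violators routine, and simulates the statistic under equal prevalences. The function names `pava.increasing` and `lrt.statistic` are hypothetical and this is not the package's implementation; the input values are those used in the Examples.

```r
# Weighted pool-adjacent-violators fit, non-decreasing in index order.
pava.increasing <- function(y, w) {
  val <- numeric(0); wt <- numeric(0); len <- integer(0)
  for (i in seq_along(y)) {
    val <- c(val, y[i]); wt <- c(wt, w[i]); len <- c(len, 1L)
    k <- length(val)
    # Merge the last two blocks while they violate monotonicity.
    while (k > 1 && val[k - 1] > val[k]) {
      new.wt <- wt[k - 1] + wt[k]
      val[k - 1] <- (wt[k - 1] * val[k - 1] + wt[k] * val[k]) / new.wt
      wt[k - 1]  <- new.wt
      len[k - 1] <- len[k - 1] + len[k]
      val <- val[-k]; wt <- wt[-k]; len <- len[-k]
      k <- k - 1
    }
  }
  rep(val, times = len)
}

# Statistic for equality versus a monotone trend: weighted squared
# distance between the order-restricted fit and the pooled mean.
lrt.statistic <- function(x, sigma, decreasing = TRUE) {
  w <- 1 / sigma^2
  fit <- if (decreasing) -pava.increasing(-x, w) else pava.increasing(x, w)
  sum(w * (fit - weighted.mean(x, w))^2)
}

x     <- c(0.16, 0.15, 0.30)
sigma <- c(0.04, 0.04, 0.10)

L.decreasing <- lrt.statistic(x, sigma, decreasing = TRUE)
L.increasing <- lrt.statistic(x, sigma, decreasing = FALSE)

# Null distribution: simulate estimates with equal means (0 without
# loss of generality) and the given standard errors.
set.seed(1)
B <- 2000
null.dec <- replicate(B, lrt.statistic(rnorm(length(x), 0, sigma), sigma, TRUE))
null.inc <- replicate(B, lrt.statistic(rnorm(length(x), 0, sigma), sigma, FALSE))

p.decreasing <- mean(null.dec >= L.decreasing)
p.increasing <- mean(null.inc >= L.increasing)
print(c(p.decreasing = p.decreasing, p.increasing = p.increasing))
```

For these inputs the decreasing fit pools all three estimates into their weighted mean, so the statistic for a decreasing trend is essentially zero and provides no evidence of a decrease.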

Usage

LRT.trend.test(
  data,
  variables = colnames(data),
  null = "monotone",
  confidence.level = 0.95,
  number.of.bootstrap.samples = 5000,
  plot = NULL,
  seed = 1
)

Arguments

`data`	A two-row matrix or data.frame of prevalence estimates and their standard errors. The first row contains the prevalence estimates and the second the standard errors. The columns are the comparison groups in the order (e.g., time) in which they are to be assessed. The row names of `data` should be "estimate" and "sigma". This is
`variables`	A character vector of column names to select from `data`.
`null`	A character string indicating the null hypothesis to use. The value `"monotone"` uses the various monotone hypotheses as the nulls. If not `"monotone"`, the null is chosen to be that of equality of the means over all periods.
`confidence.level`	The confidence level for the confidence intervals. The default is 0.95 for 95%.
`number.of.bootstrap.samples`	The number of Monte Carlo draws to determine the null distribution of the likelihood ratio statistic.
`plot`	A character vector of choices, a subset of `estimates`, `distributions`. If `estimates` is given then a plot of the estimates and nominal 95% confidence bands (as error bars) is produced. If `distributions` is given then a plot is produced of the null distributions of the likelihood ratio statistic with the observed likelihood ratio statistics plotted as a vertical dashed line.
`seed`	The value of the random number seed. Preset by default to allow reproducibility.

Value

A list with components

pvalue.increasing: The p-value for the test of a monotone increase in population prevalence.
pvalue.decreasing: The p-value for the test of a monotone decrease in population prevalence.
L: The value of the likelihood-ratio statistic.
x: The passed vector of prevalence estimates in the order (e.g., time).
sigma: The passed vector of standard error estimates corresponding to x.

Author(s)

Mark S. Handcock

References

Bartholomew, D. J. (1959). A test of homogeneity for ordered alternatives. Biometrika 46 36-48.

Examples


d <- t(data.frame(estimate=c(0.16,0.15,0.3), sigma=c(0.04,0.04,0.1)))
colnames(d) <- c("time_1","time_2","time_3") 
LRT.trend.test(d,number.of.bootstrap.samples=1000)

Compute a test of trend in prevalences based on a likelihood-ratio statistic

Description

This function takes a series of point estimates and their associated standard errors and computes the p-value for the test of a monotone decrease in the population prevalences (in sequence order). The p-value for a monotone increase is also reported. More formally, let the $K$ population prevalences in sequence order be $p_1, \ldots, p_K$ . We test the null hypothesis:

$H_0 : p_1 = \ldots = p_K$

$H_1 : p_1 \ge p_2 \ge \ldots \ge p_K$

with at least one inequality strict. A likelihood ratio statistic for this test has been derived (Bartholomew 1959). The null distribution of the likelihood ratio statistic is very complex but can be determined by a simple Monte Carlo process.
We also test the null hypothesis:

$H_0 : p_1 \ge p_2 \ge \ldots \ge p_K$

$H_1 : \overline{H_0}$

The null distribution of the likelihood ratio statistic is very complex but can be determined by a simple Monte Carlo process. The function requires the isotone library.

Usage

LRT.value.trend(x, sigma)

Arguments

`x`	A vector of prevalence estimates in the order (e.g., time).
`sigma`	A vector of standard error estimates corresponding to `x`.

Value

A list with components

pvalue.increasing: The p-value for the test of a monotone increase in population prevalence.
pvalue.decreasing: The p-value for the test of a monotone decrease in population prevalence.
L: The value of the likelihood-ratio statistic.
x: The passed vector of prevalence estimates in the order (e.g., time).
sigma: The passed vector of standard error estimates corresponding to x.

Author(s)

Mark S. Handcock

References

Bartholomew, D. J. (1959). A test of homogeneity for ordered alternatives. Biometrika 46 36-48.

Examples


## Not run: 
x <- c(0.16,0.15,0.3)
sigma <- c(0.04,0.04,0.1)
LRT.value.trend(x,sigma)

## End(Not run)

MA Estimates

Description

This function computes the model-assisted (MA) estimates for a categorical variable or numeric variable.

Usage

MA.estimates(
  rds.data,
  trait.variable,
  seed.selection = "degree",
  number.of.seeds = NULL,
  number.of.coupons = NULL,
  number.of.iterations = 3,
  N = NULL,
  M1 = 25,
  M2 = 20,
  seed = 1,
  initial.sampling.probabilities = NULL,
  MPLE.samplesize = 50000,
  SAN.maxit = 5,
  SAN.nsteps = 2^19,
  sim.interval = 10000,
  number.of.cross.ties = NULL,
  max.degree = NULL,
  parallel = 1,
  parallel.type = "PSOCK",
  full.output = FALSE,
  verbose = TRUE
)

Arguments

`rds.data`	An `rds.data.frame` that indicates recruitment patterns by a pair of attributes named “id” and “recruiter.id”.
`trait.variable`	A string giving the name of the variable in the `rds.data` that contains a categorical or numeric variable to be analyzed.
`seed.selection`	An estimate of the mechanism guiding the choice of seeds. The choices are "allwithtrait" indicating that all the seeds had the trait; "random" meaning they were, as if, a simple random sample of individuals from the population; "sample" indicating that the seeds are taken as those in the sample (and resampled for the population with that composition if necessary); "degree" is proportional to the degree of the individual; "allwithtraitdegree" indicating that all the seeds had the trait and the probability of being a seed is proportional to the degree of the respondent.
`number.of.seeds`	The number of seeds chosen to initiate the sampling.
`number.of.coupons`	The number of coupons given to each respondent.
`number.of.iterations`	The number of iterations used at the core of the algorithm.
`N`	An estimate of the number of members of the population being sampled. If `NULL` it is read as the `pop.size.mid` attribute of the `rds.data` frame. If that is missing it defaults to 1000.
`M1`	The number of networked populations generated at each iteration.
`M2`	The number of (full) RDS samples generated for each networked population at each iteration.
`seed`	The random number seed used to initiate the computations.
`initial.sampling.probabilities`	Initial sampling probabilities for the algorithm. If missing, they are taken as proportional to degree, which is almost always the best starting value.
`MPLE.samplesize`	Number of samples to take in the computation of the maximum pseudolikelihood estimator (MPLE) of the working model parameter. The default is almost always sufficient.
`SAN.maxit`	A ceiling on the number of simulated annealing iterations.
`SAN.nsteps`	Number of MCMC proposals for all the annealing runs combined.
`sim.interval`	Number of MCMC steps between each of the M1 sampled networks per iteration.
`number.of.cross.ties`	The expected number of ties between those with the trait and those without. If missing, it is computed based on the respondent's reports of the number of ties they have to population members who have the trait (i.e. `ties.to.trait.variable`) and do not have the trait (i.e. `ties.not.to.trait.variable`).
`max.degree`	Impose ceiling on degree size.
`parallel`	Number of processors to use in the computations. The default is 1, that is no parallel processing.
`parallel.type`	The type of cluster to start. e.g. 'PSOCK', 'MPI', etc.
`full.output`	More verbose output
`verbose`	Should verbose diagnostics be printed while the algorithm is running.

Value

If trait.variable is numeric then the model-assisted estimate of the mean is returned, otherwise a vector of proportion estimates is returned. If full.output=TRUE this leads to:

If full.output=FALSE this leads to an object of class rds.interval.estimate which is a list with component

estimate

the numerical point estimate of proportion of the trait.variable.

interval

a matrix with six columns and one row per category of trait.variable:

point estimate: The HT estimate of the population mean.
95% Lower Bound: Lower 95% confidence bound
95% Upper Bound: Upper 95% confidence bound
Design Effect: The design effect of the RDS
s.e.: standard error
n: count of the number of sample values with that value of the trait

rds.data

an rds.data.frame that indicates recruitment patterns by a pair of attributes named “id” and “recruiter.id”.

N

an estimate of the number of members of the population being sampled. If NULL it is read as the population.size.mid attribute of the rds.data frame. If that is missing it defaults to 1000.

M1

the number of networked populations generated at each iteration.

M2

the number of (full) RDS samples generated for each networked population at each iteration.

seed

the random number seed used to initiate the computations.

seed.selection

an estimate of the mechanism guiding the choice of seeds. The choices are

"allwithtrait": indicating that all the seeds had the trait;
"random": meaning they were, in effect, a simple random sample of individuals from the population;
"sample": indicating that the seeds are taken as those in the sample (and resampled for the population with that composition if necessary);
"degree": indicating that the probability of being a seed is proportional to the degree of the individual;
"allwithtraitdegree": indicating that all the seeds had the trait and the probability of being a seed is proportional to the degree of the respondent.

number.of.seeds

The number of seeds chosen to initiate the sampling.

number.of.coupons

The number of coupons given to each respondent.

number.of.iterations

The number of iterations used at the core of the algorithm.

outcome.variable

The name of the outcome variable

weight.type

The type of weighting used (i.e. MA)

uncertainty

The type of uncertainty estimate used (i.e. MA)

details

A list of other diagnostic output from the computations.

varestBS

Output from the bootstrap procedure. A list with two elements: var is the bootstrap variance, and BSest is the vector of bootstrap estimates themselves.

coefficient

estimate of the parameter of the ERGM for the network.

Author(s)

Krista J. Gile with help from Mark S. Handcock

References

Gile, Krista J. 2011 Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation, Journal of the American Statistical Association, 106, 135-146.

Gile, Krista J., Handcock, Mark S., 2010. Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327. <doi:10.1111/j.1467-9531.2010.01223.x>

Gile, Krista J., Beaudry, Isabelle S. and Handcock, Mark S., 2018 Methods for Inference from Respondent-Driven Sampling Data, Annual Review of Statistics and Its Application <doi:10.1146/annurev-statistics-031017-100704>.

Examples


## Not run: 
data(faux)
MA.estimates(rds.data=faux,trait.variable='X')

## End(Not run)

Diagnostic plots for the RDS recruitment process

Description

Diagnostic plots for the RDS recruitment process

Usage

## S3 method for class 'rds.data.frame'
plot(
  x,
  plot.type = c("Recruitment tree", "Network size by wave", "Recruits by wave",
    "Recruits per seed", "Recruits per subject"),
  stratify.by = NULL,
  ...
)

Arguments

`x`	An rds.data.frame object.
`plot.type`	the type of diagnostic.
`stratify.by`	A factor used to color or stratify the plot elements.
`...`	Additional arguments for the underlying plot function if applicable.

Details

Several types of diagnostics are supported by the plot.type argument. 'Recruitment tree' displays a network plot of the RDS recruitment process. 'Network size by wave' monitors systematic changes in network size based on how far subjects are from their seed. 'Recruits by wave' displays counts of subjects based on how far they are from their seed. 'Recruits per seed' shows the total tree size for each seed. 'Recruits per subject' shows counts of how many subjects are recruited by each non-terminal subject.
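
The quantities behind the wave-based diagnostics can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy data, the helper `compute_wave` and the convention that a recruiter.id of -1 marks a seed are all hypothetical choices made for this sketch.

```r
# Hypothetical toy recruitment data; -1 marks a seed in this sketch.
dat <- data.frame(id = c(1, 2, 3, 4, 5),
                  recruiter.id = c(2, -1, 2, -1, 4))

# Wave of each subject: 0 for seeds, otherwise 1 + the wave of the recruiter.
compute_wave <- function(id, recruiter.id, seed.id = -1) {
  wave <- rep(NA_integer_, length(id))
  wave[recruiter.id == seed.id] <- 0L
  # Propagate outwards one wave at a time until every subject is labelled.
  while (anyNA(wave)) {
    recruiter.wave <- wave[match(recruiter.id, id)]
    newly <- is.na(wave) & !is.na(recruiter.wave)
    if (!any(newly)) stop("some subjects are not connected to a seed")
    wave[newly] <- recruiter.wave[newly] + 1L
  }
  wave
}

dat$wave <- compute_wave(dat$id, dat$recruiter.id)

# Counts of subjects at each distance from their seed ('Recruits by wave').
recruits_by_wave <- table(dat$wave)

# Number of recruits made by each recruiting subject ('Recruits per subject').
recruits_per_subject <- table(dat$recruiter.id[dat$recruiter.id != -1])

print(recruits_by_wave)
print(recruits_per_subject)
```

Subjects 2 and 4 are seeds (wave 0) and the remaining three subjects sit in wave 1, so the tables summarize a very shallow tree.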

Value

Either nothing (for the recruitment tree plot), or a ggplot2 object.

Examples

data(fauxmadrona)
## Not run: 
plot(fauxmadrona)

## End(Not run)
plot(fauxmadrona, plot.type='Recruits by wave')
plot(fauxmadrona, plot.type='Recruits per seed')
plot(fauxmadrona, plot.type='Recruits per subject')

plot(fauxmadrona, plot.type='Recruits by wave', stratify.by='disease')
plot(fauxmadrona, plot.type='Recruits per seed', stratify.by='disease')
plot(fauxmadrona, plot.type='Recruits per subject', stratify.by='disease')

Prints a differential.activity.estimate object

Description

Prints a differential.activity.estimate object

Usage

## S3 method for class 'differential.activity.estimate'
print(x, ...)

Arguments

`x`	a differential.activity.estimate object
`...`	unused

Displays a pvalue.table

Description

Displays a pvalue.table

Usage

## S3 method for class 'pvalue.table'
print(x, ...)

Arguments

`x`	a pvalue.table object
`...`	additional parameters passed to print.data.frame.

Displays an rds.contin.bootstrap

Description

Displays an rds.contin.bootstrap

Usage

## S3 method for class 'rds.contin.bootstrap'
print(x, show.table = FALSE, ...)

Arguments

`x`	an rds.contin.bootstrap object
`show.table`	Display weighted contingency table
`...`	additional parameters passed to print.matrix.

Displays an rds.data.frame

Description

Displays an rds.data.frame

Usage

## S3 method for class 'rds.data.frame'
print(x, ...)

Arguments

`x`	an rds.data.frame object
`...`	additional parameters passed to print.data.frame.

Prints an `rds.interval.estimate` object

Description

Prints an rds.interval.estimate object

Usage

## S3 method for class 'rds.interval.estimate'
print(x, as.percentage = NULL, ...)

Arguments

`x`	an `rds.interval.estimate` object
`as.percentage`	logical. Print the interval estimates as percentages (as distinct from proportions). The default, NULL, means that it will determine if the variable is discrete or continuous and only print them as percentages if they are discrete.
`...`	unused

Summarizing Generalized Linear Model Fits with Odds Ratios

Description

print.summary.svyglm.RDS is a version of print.summary.svyglm that reports odds ratios in place of coefficients in the summary table. This applies only to the binomial family; otherwise it is identical to print.summary.svyglm. The default in print.summary.svyglm is to display the log-odds-ratios; this method displays the exponentiated form and a 95% confidence interval. The p-values are still displayed.
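
The transformation from log-odds coefficients to odds ratios can be illustrated with base R alone. The sketch below uses an ordinary unweighted `glm` fit on the built-in `mtcars` data rather than a survey-weighted fit, and the Wald-type 95% interval is an assumption of this sketch; the table name `odds.ratio.table` is hypothetical and the layout printed by the package may differ.

```r
# Illustration with an ordinary (unweighted) logistic regression on a
# built-in data set; this is a stand-in for a survey-weighted binomial fit.
fit <- glm(am ~ wt, data = mtcars, family = binomial())

coefs <- summary(fit)$coefficients   # estimates on the log-odds scale
z <- qnorm(0.975)                    # multiplier for a 95% Wald interval (assumed)

odds.ratio.table <- cbind(
  "Odds Ratio" = exp(coefs[, "Estimate"]),
  "2.5 %"      = exp(coefs[, "Estimate"] - z * coefs[, "Std. Error"]),
  "97.5 %"     = exp(coefs[, "Estimate"] + z * coefs[, "Std. Error"]),
  "Pr(>|z|)"   = coefs[, "Pr(>|z|)"]  # p-values are unchanged by exponentiation
)
print(signif(odds.ratio.table, 4))
```

Exponentiating the endpoints of the log-odds interval keeps the interval on the odds-ratio scale strictly positive, which a symmetric interval around the odds ratio itself would not guarantee.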

Usage

## S3 method for class 'summary.svyglm.RDS'
print(
  x,
  digits = max(3, getOption("digits") - 3),
  symbolic.cor = x$symbolic.cor,
  signif.stars = getOption("show.signif.stars"),
  ...
)

Arguments

`x`	an object of class `"summary.svyglm.RDS"`, usually, a result of a call to `RDS::summary.svyglm`.
`digits`	the number of significant digits to use when printing.
`symbolic.cor`	logical. If `TRUE`, print the correlations in a symbolic form (see `symnum`) rather than as numbers.
`signif.stars`	logical. If `TRUE`, ‘significance stars’ are printed for each coefficient.
`...`	further arguments passed to or from other methods.

Examples


## For examples see example(svyglm)

RDS Bootstrap Interval Estimates

Description

This function computes an interval estimate for one or more categorical variables. It optionally uses attributes of the RDS data set to determine the type of estimator and type of uncertainty estimate to use.

Usage

RDS.bootstrap.intervals(
  rds.data,
  outcome.variable,
  weight.type = NULL,
  uncertainty = NULL,
  N = NULL,
  subset = NULL,
  confidence.level = 0.95,
  number.of.bootstrap.samples = NULL,
  fast = TRUE,
  useC = TRUE,
  ci.type = "t",
  control = control.rds.estimates(),
  to.factor = FALSE,
  cont.breaks = 3,
  ...
)

Arguments

`rds.data`	An `rds.data.frame` that indicates recruitment patterns by a pair of attributes named “id” and “recruiter.id”.
`outcome.variable`	A string giving the name of the variable in the `rds.data` that contains a categorical or numeric variable to be analyzed.
`weight.type`	A string giving the type of estimator to use. The options are `"Gile's SS"`, `"RDS-I"`, `"RDS-II"`, `"RDS-I (DS)"`, and `"Arithmetic Mean"`. If `NULL` it defaults to `"Gile's SS"`.
`uncertainty`	A string giving the type of uncertainty estimator to use. The options are `"SRS"`, `"Gile"` and `"Salganik"`. This is usually determined by `weight.type` to be consistent with the estimator's origins. The estimators RDS-I, RDS-I (DS), and RDS-II default to `"Salganik"`, "Arithmetic Mean" defaults to `"SRS"` and "Gile's SS" defaults to the `"Gile"` bootstrap.
`N`	An estimate of the number of members of the population being sampled. If `NULL` it is read as the `population.size.mid` attribute of the `rds.data` frame. If that is missing it defaults to 1000.
`subset`	An optional criterion to subset `rds.data` by. It is a character string giving an R expression which, when evaluated, subsets the data. For example, `"seed > 0"` excludes seeds. It can also be the name of a logical vector of the same length as the outcome variable, where TRUE means include the observation in the analysis. If `NULL` then no subsetting is done.
`confidence.level`	The confidence level for the confidence intervals. The default is 0.95 for 95%.
`number.of.bootstrap.samples`	The number of bootstrap samples to take in estimating the uncertainty of the estimator. If `NULL` it defaults to the number necessary to compute the standard error to accuracy 0.001.
`fast`	Use a fast bootstrap where the weights are reused from the estimator rather than being recomputed for each bootstrap sample.
`useC`	Use a C-level implementation of Gile's bootstrap (rather than the R-level one). The two implementations should give computationally equivalent estimates and differ only in speed.
`ci.type`	Type of confidence interval to use, if possible. If "t", use lower and upper confidence interval values based on the standard deviation of the bootstrapped values and a t multiplier. If "pivotal", use lower and upper confidence interval values based on the basic bootstrap (also called the pivotal confidence interval). If "quantile", use lower and upper confidence interval values based on the quantiles of the bootstrap sample. If "proportion", use the "t" interval unless the estimated proportion is less than 0.15 or the bounds are outside [0,1]. In that case, try the "quantile" interval and constrain the bounds to be compatible with [0,1].
`control`	A list of control parameters for algorithm tuning. Constructed using `control.rds.estimates`.
`to.factor`	force variable to be a factor
`cont.breaks`	For continuous variates, some bootstrap procedures require categorical data. In these cases, in order to construct each bootstrap replicate, the outcome variable is split into cont.breaks categories.
`...`	Additional arguments for RDS.*.estimates.
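
The interval types named by `ci.type` can be sketched from a vector of bootstrap replicates in base R. This is an illustration under stated assumptions rather than the package's code: the point estimate and replicates are simulated, the degrees of freedom for the t multiplier are an assumption, and the helper `clamp01` is hypothetical.

```r
set.seed(1)
# Hypothetical inputs: a point estimate and bootstrap replicates of it.
estimate <- 0.30
boot <- rnorm(200, mean = 0.30, sd = 0.04)
conf.level <- 0.95
alpha <- 1 - conf.level

# "t": estimate +/- t multiplier * standard deviation of the replicates.
# The degrees of freedom used here are an assumption of this sketch.
se <- sd(boot)
ci.t <- estimate + c(-1, 1) * qt(1 - alpha / 2, df = length(boot) - 1) * se

# "quantile": empirical quantiles of the bootstrap replicates.
ci.quantile <- unname(quantile(boot, c(alpha / 2, 1 - alpha / 2)))

# "pivotal" (basic bootstrap): reflect the quantiles around the estimate.
ci.pivotal <- 2 * estimate - rev(ci.quantile)

# Safeguard in the spirit of "proportion": keep the bounds inside [0, 1].
clamp01 <- function(ci) pmin(pmax(ci, 0), 1)

intervals <- rbind(t = clamp01(ci.t),
                   pivotal = clamp01(ci.pivotal),
                   quantile = clamp01(ci.quantile))
colnames(intervals) <- c("lower", "upper")
print(round(intervals, 3))
```

When the bootstrap distribution is roughly symmetric around the estimate, as here, the three intervals nearly coincide; they diverge for skewed replicates, for example proportions close to 0.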

Value

An object of class rds.interval.estimate summarizing the inference. The confidence interval and standard error are based on the bootstrap procedure. In addition, the object has attribute bsresult which provides details of the bootstrap procedure. The contents of the bsresult attribute depend on the uncertainty used. If uncertainty=="Salganik" then bsresult is a vector of standard deviations of the bootstrap samples. If uncertainty=="Gile" then bsresult is a list with components for the bootstrap point estimate, the bootstrap samples themselves and the standard deviations of the bootstrap samples. If uncertainty=="SRS" then bsresult is NULL.

References

Gile, Krista J. 2011 Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation, Journal of the American Statistical Association, 106, 135-146.

Gile, Krista J., Handcock, Mark S., 2010. Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327. <doi:10.1111/j.1467-9531.2010.01223.x>

Examples


## Not run: 
data(fauxmadrona)
RDS.bootstrap.intervals(rds.data=fauxmadrona, weight.type="RDS-II",
     uncertainty="Salganik",
     outcome.variable="disease", N=1000, number.of.bootstrap.samples=50)

data(fauxtime)
RDS.bootstrap.intervals(rds.data=fauxtime, weight.type="HCG",
     uncertainty="HCG",
     outcome.variable="var1", N=1000, number.of.bootstrap.samples=10)

## End(Not run)

Compares the rates of two variables against one another.

Description

Compares the rates of two variables against one another.

Usage

RDS.compare.proportions(first.interval, second.interval, M = 10000)

Arguments

`first.interval`	An `rds.interval.estimate` object fit with either "Gile" or "Salganik" uncertainty.
`second.interval`	An `rds.interval.estimate` object fit with either "Gile" or "Salganik" uncertainty.
`M`	The number of bootstrap resamplings to use

Details

This function performs a bootstrap test comparing the rates of two variables against one another.

Examples

## Not run: 
data(faux)
int1 <- RDS.bootstrap.intervals(faux, outcome.variable=c("X"),
     weight.type="RDS-II", uncertainty="Salganik", N=1000,
     number.ss.samples.per.iteration=1000,
     confidence.level=0.95, number.of.bootstrap.samples=100)
int2 <- RDS.bootstrap.intervals(faux, outcome.variable=c("Y"),
     weight.type="RDS-II", uncertainty="Salganik", N=1000,
     number.ss.samples.per.iteration=1000,
     confidence.level=0.95, number.of.bootstrap.samples=100)
RDS.compare.proportions(int1,int2)

## End(Not run)

Compares the rates of two variables against one another.

Description

Compares the rates of two variables against one another.

Usage

RDS.compare.two.proportions(
  data,
  variables,
  confidence.level = 0.95,
  number.of.bootstrap.samples = 5000,
  plot = FALSE,
  seed = 1
)

Arguments

`data`	An object of class `rds.interval.estimates.list` with attribute `variables` containing a character vector of names of objects of class `rds.interval.estimate`.
`variables`	A character vector of column names to select from `data`.
`confidence.level`	The confidence level for the confidence intervals. The default is 0.95 for 95%.
`number.of.bootstrap.samples`	The number of Monte Carlo draws to determine the null distribution of the likelihood ratio statistic.
`plot`	Logical, if TRUE then a plot is produced of the null distribution of the likelihood ratio statistic with the observed statistics plotted as a vertical dashed line.
`seed`	The value of the random number seed. Preset by default to allow reproducibility.

Value

An object of class pvalue.table containing the cross-tabulation of p-values for comparing the two classes

Homophily Configuration Graph Estimates

Description

This function computes the Homophily Configuration Graph type estimates for a categorical variable.

Usage

RDS.HCG.estimates(
  rds.data,
  outcome.variable,
  N = NULL,
  subset = NULL,
  small.fraction = FALSE,
  empir.lik = TRUE,
  to.factor = FALSE,
  cont.breaks = 3
)

Arguments

`rds.data`	An `rds.data.frame` with recruitment time set.
`outcome.variable`	A string giving the name of the variable in the `rds.data` that contains a categorical variable to be analyzed.
`N`	Population size to be used to calculate the empirical likelihood interval. If NULL, this value is taken to be the population.size.mid attribute of the data and if that is not set, no finite population correction is used.
`subset`	An optional criterion to subset `rds.data` by. It is an R expression which, when evaluated, subsets the data. For example, `subset = seed > 0` excludes seeds. It can also be the name of a logical vector of the same length as the outcome variable, where TRUE means include the observation in the analysis. If `NULL` then no subsetting is done.
`small.fraction`	Should a small sample fraction be assumed
`empir.lik`	Should confidence intervals be estimated using empirical likelihood.
`to.factor`	force variable to be a factor
`cont.breaks`	If variable is numeric, how many discretization points should be used in the calculation of the weights.

Value

If empir.lik is TRUE, an object of class rds.interval.estimate is returned. This is a list with components

estimate: The numerical point estimate of proportion of the trait.variable.
interval: A matrix with six columns and one row per category of trait.variable:
- point estimate: The HT estimate of the population mean.
- 95% Lower Bound: Lower 95% confidence bound.
- 95% Upper Bound: Upper 95% confidence bound.
- Design Effect: The design effect of the RDS.
- s.e.: Standard error.
- n: Count of the number of sample values with that value of the trait.

Otherwise, an object of class rds.HCG.estimate is returned.

Author(s)

Ian E. Fellows

Examples


data(fauxtime)
RDS.HCG.estimates(rds.data=fauxtime,outcome.variable='var1')

Compute RDS-I Estimates

Description

This function computes the RDS-I type estimates for a categorical variable. It is also referred to as the Salganik-Heckathorn estimator.
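
For a two-category trait, the Salganik-Heckathorn form of the estimator can be sketched in base R as follows. This is a minimal illustration on made-up data, not the package's implementation: the vectors `trait`, `degree` and `recruiter.trait` are hypothetical, and the harmonic-mean degree estimate follows Salganik and Heckathorn (2004) as understood here.

```r
# Hypothetical sample: trait group, self-reported degree, and the trait
# group of each subject's recruiter (NA for seeds).
trait           <- c("A", "A", "B", "B", "A", "B", "B", "A", "B", "A")
degree          <- c(5, 10, 2, 4, 8, 3, 6, 10, 2, 5)
recruiter.trait <- c(NA, "A", "A", "B", "B", NA, "B", "A", "A", "B")

# Cross-group recruitment proportions: rows are the recruiter's group,
# columns the recruit's group. Seeds (NA recruiter) are dropped by table().
recruit.tab <- table(recruiter = recruiter.trait, recruit = trait)
trans <- prop.table(recruit.tab, margin = 1)

# Group-specific mean degree, estimated by the harmonic mean so that
# high-degree (over-sampled) subjects are down-weighted.
D <- tapply(degree, trait, function(d) length(d) / sum(1 / d))

# RDS-I estimate of the population proportion in group A.
p.A <- (D[["B"]] * trans["B", "A"]) /
  (D[["A"]] * trans["A", "B"] + D[["B"]] * trans["B", "A"])
print(p.A)
```

In this toy sample half of the respondents are in group A, but because group A reports larger degrees the estimate falls below the naive sample proportion of 0.5.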

Usage

RDS.I.estimates(
  rds.data,
  outcome.variable,
  N = NULL,
  subset = NULL,
  smoothed = FALSE,
  empir.lik = TRUE,
  to.factor = FALSE,
  cont.breaks = 3
)

Arguments

`rds.data`	An `rds.data.frame` that indicates recruitment patterns by a pair of attributes named “id” and “recruiter.id”.
`outcome.variable`	A string giving the name of the variable in the `rds.data` that contains a categorical variable to be analyzed.
`N`	Population size to be used to calculate the empirical likelihood interval. If NULL, this value is taken to be the population.size.mid attribute of the data and if that is not set, no finite population correction is used.
`subset`	An optional criterion to subset `rds.data` by. It is an R expression which, when evaluated, subsets the data. For example, `subset = seed > 0` excludes seeds. It can also be the name of a logical vector of the same length as the outcome variable, where TRUE means include the observation in the analysis. If `NULL` then no subsetting is done.
`smoothed`	Logical, if TRUE then the “data smoothed” version of RDS-I is used, where it is assumed that the observed Markov process is reversible.
`empir.lik`	Should confidence intervals be estimated using empirical likelihood.
`to.factor`	force variable to be a factor
`cont.breaks`	The number of categories used for the RDS-I adjustment when the variate is continuous.
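
Discretizing a continuous variate into `cont.breaks` categories can be sketched in base R as below. The use of quantile-based cut points is an assumption of this sketch; the manual does not state how the package places its breaks, and the simulated variable `x` is hypothetical.

```r
set.seed(2)
x <- rexp(50)          # hypothetical continuous outcome
cont.breaks <- 3       # number of categories to form

# Assumed rule: equal-probability cut points taken from the sample quantiles.
breaks <- quantile(x, probs = seq(0, 1, length.out = cont.breaks + 1))

# Integer category codes 1..cont.breaks; include.lowest keeps the minimum.
x.cat <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
print(table(x.cat))
```

Quantile-based breaks give categories of roughly equal size, which keeps every category populated when the variate is skewed.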

Value

If empir.lik is TRUE, an object of class rds.interval.estimate is returned. This is a list with components

estimate: The numerical point estimate of proportion of the trait.variable.
interval: A matrix with six columns and one row per category of trait.variable:
- point estimate: The HT estimate of the population mean.
- 95% Lower Bound: Lower 95% confidence bound.
- 95% Upper Bound: Upper 95% confidence bound.
- Design Effect: The design effect of the RDS.
- s.e.: Standard error.
- n: Count of the number of sample values with that value of the trait.

Otherwise, an object of class rds.I.estimate is returned.

Author(s)

Mark S. Handcock and W. Whipple Neely

References

Gile, Krista J., Handcock, Mark S., 2010. Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327. <doi:10.1111/j.1467-9531.2010.01223.x>

Neely, W. W., 2009. Bayesian methods for data from respondent driven sampling. Dissertation in-progress, Department of Statistics, University of Wisconsin, Madison.

Salganik, M., Heckathorn, D. D., 2004. Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology 34, 193-239.

Volz, E., Heckathorn, D., 2008. Probability based estimation theory for Respondent Driven Sampling. The Journal of Official Statistics 24 (1), 79-97.

Examples


data(faux)
RDS.I.estimates(rds.data=faux,outcome.variable='X')
RDS.I.estimates(rds.data=faux,outcome.variable='X',smoothed=TRUE)

RDS-I weights

Description

RDS-I weights

Usage

rds.I.weights(rds.data, outcome.variable, N = NULL, smoothed = FALSE, ...)

Arguments

`rds.data`	An rds.data.frame
`outcome.variable`	The variable used to base the weights on.
`N`	Population size
`smoothed`	Should the data smoothed RDS-I weights be computed.
`...`	Unused

RDS-II Estimates

Description

This function computes the RDS-II estimates for a categorical variable or the RDS-II estimate for a numeric variable.
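
The RDS-II (Volz-Heckathorn) point estimate is an inverse-degree weighted mean, which can be sketched in base R. The data below are made up for illustration and the variable names are hypothetical; the sketch covers only the point estimate, not the interval computation.

```r
# Hypothetical sample: binary trait indicator and self-reported degree.
trait  <- c(1, 1, 0, 0, 1, 0, 0, 1, 0, 1)
degree <- c(5, 10, 2, 4, 8, 3, 6, 10, 2, 5)

# Weights inversely proportional to degree: well-connected respondents are
# assumed more likely to be sampled, so they count for less.
w <- 1 / degree

# Weighted (Hajek-type) mean; for a 0/1 trait this is a proportion estimate.
p.hat <- sum(w * trait) / sum(w)

# The same formula gives the estimated mean of a numeric variable.
age <- c(23, 35, 41, 29, 52, 38, 27, 44, 31, 36)
mean.age.hat <- sum(w * age) / sum(w)

print(c(proportion = p.hat, mean.age = mean.age.hat))
```

The unweighted sample proportion here is 0.5, while the weighted estimate is lower because respondents with the trait report larger degrees.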

Usage

RDS.II.estimates(
  rds.data,
  outcome.variable,
  N = NULL,
  subset = NULL,
  empir.lik = TRUE,
  to.factor = FALSE
)

Arguments

`rds.data`	An `rds.data.frame` that indicates recruitment patterns by a pair of attributes named “id” and “recruiter.id”.
`outcome.variable`	A string giving the name of the variable in the `rds.data` that contains a categorical or numeric variable to be analyzed.
`N`	Population size to be used to calculate the empirical likelihood interval. If NULL, this value is taken to be the population.size.mid attribute of the data and if that is not set, no finite population correction is used.
`subset`	An optional criterion to subset `rds.data` by. It is an R expression which, when evaluated, subsets the data. For example, `subset = seed > 0` excludes seeds. It can also be the name of a logical vector of the same length as the outcome variable, where TRUE means include the observation in the analysis. If `NULL` then no subsetting is done.
`empir.lik`	If true, and outcome.variable is numeric, standard errors based on empirical likelihood will be given.
`to.factor`	force variable to be a factor

Value

If outcome.variable is numeric then the RDS-II estimate of the mean is returned, otherwise a vector of proportion estimates is returned. If empir.lik is TRUE, an object of class rds.interval.estimate is returned. This is a list with components

estimate: The numerical point estimate of proportion of the trait.variable.
interval: A matrix with six columns and one row per category of trait.variable:
- point estimate: The HT estimate of the population mean.
- 95% Lower Bound: Lower 95% confidence bound.
- 95% Upper Bound: Upper 95% confidence bound.
- Design Effect: The design effect of the RDS.
- s.e.: Standard error.
- n: Count of the number of sample values with that value of the trait.

Otherwise, an object of class rds.II.estimate is returned.

Author(s)

Mark S. Handcock and W. Whipple Neely

References

Gile, Krista J., Handcock, Mark S., 2010. Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327. <doi:10.1111/j.1467-9531.2010.01223.x>

Salganik, M., Heckathorn, D. D., 2004. Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology 34, 193-239.

Volz, E., Heckathorn, D., 2008. Probability based estimation theory for Respondent Driven Sampling. The Journal of Official Statistics 24 (1), 79-97.

Examples


data(faux)
RDS.II.estimates(rds.data=faux,outcome.variable='X')
RDS.II.estimates(rds.data=faux,outcome.variable='X',subset= Y!="blue")

An object of class rds.interval.estimate

Description

This function creates an object of class rds.interval.estimate.

Usage

rds.interval.estimate(
  estimate,
  outcome.variable,
  weight.type,
  uncertainty,
  weights,
  N = NULL,
  conf.level = 0.95,
  csubset = ""
)

Arguments

`estimate`	The numerical point estimate of proportion of the `trait.variable`.
`outcome.variable`	A string giving the name of the variable in the `rds.data` that contains a categorical variable to be analyzed.
`weight.type`	A string giving the type of estimator to use. The options are `"Gile's SS"`, `"RDS-I"`, `"RDS-II"`, `"RDS-I (DS)"`, and `"Arithmetic Mean"`. If `NULL` it defaults to `"Gile's SS"`.
`uncertainty`	A string giving the type of uncertainty estimator to use. The options are `"SRS"`, `"Gile"` and `"Salganik"`. This is usually determined by `weight.type` to be consistent with the estimator's origins. The estimators `"RDS-I"`, `"RDS-I (DS)"`, `"RDS-II"` default to `"Salganik"`, "Arithmetic Mean" defaults to `"SRS"` and "Gile's SS" defaults to the `"Gile"` bootstrap.
`weights`	A numerical vector of sampling weights for the sample, in order of the sample. They should be inversely proportional to the first-order inclusion probabilities, although this is not assessed or enforced.
`N`	An estimate of the number of members of the population being sampled. If `NULL` it is read as the `population.size.mid` attribute of the `rds.data` frame. If that is missing it defaults to 1000.
`conf.level`	The confidence level for the confidence intervals. The default is 0.95 for 95%.
`csubset`	A character string representing text to add to the output label. Typically this will be the expression used to define the subset of the data used for the estimate.

Value

An object of class rds.interval.estimate is returned. This is a list with components

estimate: The numerical point estimate of proportion of the trait.variable.
interval: A matrix with six columns and one row per category of trait.variable:
- point estimate: The HT estimate of the population mean.
- 95% Lower Bound: Lower 95% confidence bound.
- 95% Upper Bound: Upper 95% confidence bound.
- Design Effect: The design effect of the RDS.
- s.e.: Standard error.
- n: Count of the number of sample values with that value of the trait.
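
The six columns of the interval matrix can be illustrated by assembling one row by hand in base R. This is a sketch under stated assumptions, not the constructor's code: the data, the standard error `se`, the normal-theory bounds, and the design effect taken as the ratio of the estimator's variance to the simple-random-sampling variance are all assumptions made for illustration.

```r
# Hypothetical inputs: a binary outcome, sampling weights and a standard error.
y <- c(1, 1, 0, 0, 1, 0, 0, 1, 0, 1)
weights <- 1 / c(5, 10, 2, 4, 8, 3, 6, 10, 2, 5)
conf.level <- 0.95
se <- 0.09                      # e.g. obtained from a bootstrap (made up here)

estimate <- sum(weights * y) / sum(weights)      # weighted proportion
z <- qnorm(1 - (1 - conf.level) / 2)             # normal multiplier (assumed)
n <- length(y)
srs.var <- estimate * (1 - estimate) / n         # variance under simple random sampling

interval <- matrix(
  c(estimate, estimate - z * se, estimate + z * se,
    se^2 / srs.var,                               # design effect (assumed definition)
    se, sum(y == 1)),
  nrow = 1,
  dimnames = list("1", c("point estimate", "95% Lower Bound", "95% Upper Bound",
                         "Design Effect", "s.e.", "n"))
)
print(round(interval, 3))
```

A design effect above 1 indicates that the estimate is less precise than a simple random sample of the same size would give.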

Author(s)

Mark S. Handcock

References

Gile, Krista J., Handcock, Mark S., 2010. Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327. <doi:10.1111/j.1467-9531.2010.01223.x>

Salganik, M., Heckathorn, D. D., 2004. Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology 34, 193-239.

Volz, E., Heckathorn, D., 2008. Probability based estimation theory for Respondent Driven Sampling. The Journal of Official Statistics 24 (1), 79-97.

Examples


data(faux)
RDS.I.estimates(rds.data=faux,outcome.variable='X',smoothed=TRUE)

Gile's SS Estimates

Description

This function computes the sequential sampling (SS) estimates for a categorical variable or numeric variable.

Usage

RDS.SS.estimates(
  rds.data,
  outcome.variable,
  N = NULL,
  subset = NULL,
  number.ss.samples.per.iteration = 500,
  number.ss.iterations = 5,
  control = control.rds.estimates(),
  hajek = TRUE,
  empir.lik = TRUE,
  to.factor = FALSE
)

Arguments

`rds.data`	An `rds.data.frame` that indicates recruitment patterns by a pair of attributes named “id” and “recruiter.id”.
`outcome.variable`	A string giving the name of the variable in the `rds.data` that contains a categorical or numeric variable to be analyzed.
`N`	An estimate of the number of members of the population being sampled. If `NULL` it is read as the `population.size.mid` attribute of the `rds.data` frame. If that is missing it defaults to 1000.
`subset`	An optional criterion to subset `rds.data` by. It is an R expression which, when evaluated, subsets the data. For example, `subset = seed > 0` excludes seeds. It can also be the name of a logical vector of the same length as the outcome variable, where TRUE means include the observation in the analysis. If `NULL` then no subsetting is done.
`number.ss.samples.per.iteration`	The number of samples to take in estimating the inclusion probabilities in each iteration of the sequential sampling algorithm. If `NULL` it is read as the eponymous attribute of `rds.data`. If that is missing it defaults to 500.
`number.ss.iterations`	The number of iterations of the sequential sampling algorithm. If that is missing it defaults to 5.
`control`	A list of control parameters for algorithm tuning. Constructed using `control.rds.estimates`.
`hajek`	logical; Use the standard Hajek-type estimator of Gile (2011) or the standard Hortitz-Thompson. The default is TRUE.
`empir.lik`	If true, and outcome.variable is numeric, standard errors based on empirical likelihood will be given.
`to.factor`	force variable to be a factor
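
The difference between the two estimators selected by `hajek` can be sketched in base R. This is an illustration only, not the package's internal code: the outcome `y`, the inclusion probabilities `pi_hat` and the population size `N` are made-up toy values, whereas `RDS.SS.estimates` obtains its inclusion probabilities from the sequential sampling algorithm.

```r
# Toy data (hypothetical): a binary outcome and assumed inclusion probabilities.
y      <- c(1, 0, 0, 1, 1)
pi_hat <- c(0.010, 0.004, 0.002, 0.008, 0.005)
N      <- 1000                      # assumed population size

w <- 1 / pi_hat                     # inverse-probability weights

# Horvitz-Thompson: weighted total divided by the assumed population size.
ht_mean <- sum(w * y) / N

# Hajek: weighted total divided by the estimated population size, sum(w).
hajek_mean <- sum(w * y) / sum(w)

c(HT = ht_mean, Hajek = hajek_mean)
```

For a binary outcome the Hajek form is a weighted average and so always lies between 0 and 1, while the Horvitz-Thompson form depends on how well `N` matches the sum of the weights.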

Value

If outcome.variable is numeric then the Gile SS estimate of the mean is returned, otherwise a vector of proportion estimates is returned. If empir.lik is true, an object of class rds.interval.estimate is returned. This is a list with components

estimate: The numerical point estimate of the proportion of the outcome.variable.
interval: A matrix with six columns and one row per category of trait.variable:
- point estimate: The HT estimate of the population mean.
- 95% Lower Bound: Lower 95% confidence bound.
- 95% Upper Bound: Upper 95% confidence bound.
- Design Effect: The design effect of the RDS.
- s.e.: Standard error.
- n: Count of the number of sample values with that value of the trait.

Otherwise, an object of class rds.SS.estimate is returned.

Author(s)

Krista J. Gile with help from Mark S. Handcock

References

Gile, Krista J., 2011. Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation, Journal of the American Statistical Association, 106, 135-146.

Gile, Krista J., Handcock, Mark S., 2010. Respondent-driven Sampling: An Assessment of Current Methodology, Sociological Methodology, 40, 285-327. <doi:10.1111/j.1467-9531.2010.01223.x>

Gile, Krista J., Handcock, Mark S., 2015. Network Model-Assisted Inference from Respondent-Driven Sampling Data, Journal of the Royal Statistical Society, A. <doi:10.1111/rssa.12091>

Salganik, M., Heckathorn, D. D., 2004. Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology 34, 193-239.

Volz, E., Heckathorn, D., 2008. Probability based estimation theory for Respondent Driven Sampling. The Journal of Official Statistics 24 (1), 79-97.

Examples


data(fauxmadrona)
RDS.SS.estimates(rds.data=fauxmadrona,outcome.variable="disease",N=1000)

Create RDS samples with given characteristics

Description

Create RDS samples with given characteristics

Usage

rdssampleC(
  net,
  nnodes = network.size(net),
  nsamp0,
  fixinitial,
  nsamp,
  replace,
  coupons,
  select = NULL,
  bias = NULL,
  rds.samp = NULL,
  seed.distribution = NULL,
  attrall = FALSE,
  trait.variable = "disease",
  nsims = 1,
  seeds = NULL,
  prob.network.recall = 1,
  verbose = TRUE
)

Arguments

`net`	the network object from which to draw a sample
`nnodes`	the number of nodes in the network (by default, `network.size(net)`)
`nsamp0`	the number of seeds to be drawn (i.e. the size of the 0th wave of sampling)
`fixinitial`	a variable that indicates the distribution from which to draw the initial seeds, if the seeds variable is NULL and the seed.distribution variable is NULL
`nsamp`	number of individuals in each RDS sample
`replace`	sampling with replacement
`coupons`	number of coupons
`select`	not used
`bias`	not used
`rds.samp`	not used
`seed.distribution`	a variable that indicates the distribution from which to draw the initial seeds
`attrall`	Whether all the information about the sample should be returned
`trait.variable`	attribute of interest
`nsims`	number of RDS samples to draw
`seeds`	an array of seeds. Default is NULL, in which case the function draws the seeds from the nodes of the network.
`prob.network.recall`	simulates the probability that an individual will remember any particular link
`verbose`	Print verbose output

Value

A list with the following elements:

nsample: vector of indices of sampled nodes
wsample: vector of waves of each sampled node
degsample: vector of degrees of sampled nodes
attrsample: vector of attributes of sampled nodes
toattr: vector of numbers of referrals to nodes with the attribute
tonoattr: vector of numbers of referrals to nodes without the attribute
nominators: recruiter of each sampled node

Import data from the 'RDSAT' format as an `rds.data.frame`

Description

This function imports RDSAT data files as rds.data.frame objects.

Usage

read.rdsat(file, delim = c("<auto>", "\t", " ", ","), N = NULL)

Arguments

`file`	the name of the file which the data are to be read from. If it does not contain an _absolute_ path, the file name is _relative_ to the current working directory, 'getwd()'. Tilde-expansion is performed where supported. As from R 2.10.0 this can be a compressed file (see 'file')
`delim`	The separator defining columns. <auto> will guess the delimiter based on the file.
`N`	The population size (Optional).

Examples

fn <- paste0(path.package("RDS"),"/extdata/nyjazz.rdsat")
rd <- read.rdsat(fn)
plot(rd)

Import data saved using write.rdsobj

Description

Import data saved using write.rdsobj

Usage

read.rdsobj(file)

Arguments

`file`	the name of the file which the data are to be read from. If it does not contain an _absolute_ path, the file name is _relative_ to the current working directory, 'getwd()'. Tilde-expansion is performed where supported. As from R 2.10.0 this can be a compressed file (see 'file')

Plots the recruitment network using the Reingold Tilford algorithm.

Description

Plots the recruitment network using the Reingold Tilford algorithm.

Usage

reingold.tilford.plot(
  x,
  vertex.color = NULL,
  vertex.color.scale = hue_pal(),
  vertex.size = 2,
  vertex.size.range = c(1, 5),
  edge.arrow.size = 0,
  vertex.label.cex = 0.2,
  vertex.frame.color = NA,
  vertex.label = get.id(x),
  show.legend = TRUE,
  plot = TRUE,
  ...
)

Arguments

`x`	An rds.data.frame
`vertex.color`	The name of the categorical variable in x to color the points with.
`vertex.color.scale`	The scale to create the color palette.
`vertex.size`	The size of the vertex points. Either a number or the name of a column of x.
`vertex.size.range`	If vertex.size represents a variable, vertex.size.range is a vector of length 2 representing the minimum and maximum cex for the points.
`edge.arrow.size`	The size of the arrow from recruiter to recruitee.
`vertex.label.cex`	The size expansion factor for the vertex.labels.
`vertex.frame.color`	the color of the outside of the vertex.points.
`vertex.label`	The name of a variable to use as vertex labels. NA implies no labels.
`show.legend`	If true and either vertex.color or vertex.size represent variables, legends will be displayed at the bottom of the plot.
`plot`	Logical; if TRUE, a plot of the recruitment tree is produced.
`...`	Additional parameters passed to plot.igraph.

Value

A two-column matrix of the positions of the nodes in the recruitment tree.

Examples

## Not run: 
data(fauxmadrona)
data(faux)
reingold.tilford.plot(faux)
reingold.tilford.plot(fauxmadrona,vertex.color="disease")

## End(Not run)

Determines the recruiter.id from recruitment coupon information

Description

Determines the recruiter.id from recruitment coupon information

Usage

rid.from.coupons(
  data,
  subject.coupon = NULL,
  coupon.variables,
  subject.id = NULL,
  seed.id = "seed"
)

Arguments

`data`	a data.frame
`subject.coupon`	The variable representing the coupon returned by subject
`coupon.variables`	The variable representing the coupon ids given to the subject
`subject.id`	The variable representing the subject's id
`seed.id`	The recruiter.id to assign to seed subjects.

Examples

fpath <- system.file("extdata", "nyjazz.csv", package="RDS")
dat <- read.csv(fpath)
dat$recruiter.id <- rid.from.coupons(dat,"own.coupon",
                      paste0("coupon.",1:7),"id")

# create an rds.data.frame
rds <- as.rds.data.frame(dat,network.size="network.size")
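
The coupon matching itself can be sketched in base R on toy data. This is a minimal illustration and not the package's implementation: `match_recruiter` is a hypothetical helper, the data are made up, and treating a missing or unmatched coupon as a seed is an assumption of the sketch.

```r
# Toy coupon data (hypothetical): each subject arrives with `own.coupon`
# and is handed up to two coupons to pass on.
toy <- data.frame(
  id         = c("a", "b", "c", "d"),
  own.coupon = c(NA, 101, 102, 201),
  coupon.1   = c(101, 201, NA, NA),
  coupon.2   = c(102, 202, NA, NA)
)

# Hypothetical helper, not part of the RDS package: for each subject, find
# the subject who was issued the coupon that the subject returned.
match_recruiter <- function(data, subject.coupon, coupon.variables,
                            subject.id, seed.id = "seed") {
  issued <- as.matrix(data[, coupon.variables, drop = FALSE])
  vapply(data[[subject.coupon]], function(cpn) {
    # Assumption: a subject without a coupon is a seed.
    if (is.na(cpn)) return(seed.id)
    owner <- which(rowSums(issued == cpn, na.rm = TRUE) > 0)
    # Assumption: an unmatched or ambiguous coupon is also treated as a seed.
    if (length(owner) == 1) as.character(data[[subject.id]][owner]) else seed.id
  }, character(1))
}

toy$recruiter.id <- match_recruiter(toy, "own.coupon",
                                    c("coupon.1", "coupon.2"), "id")
toy
```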

Set the class of the control list

Description

This function sets the class of the control list, with the default being the name of the calling function.

Usage

set.control.class(
  myname = as.character(RDS::ult(sys.calls(), 2)[[1L]]),
  control = get("control", pos = parent.frame())
)

Arguments

`myname`	Name of the class to set.
`control`	Control list. Defaults to the `control` variable in the calling function.

Value

The control list with class set.

Displays an rds.data.frame

Description

Displays an rds.data.frame

Usage

show.rds.data.frame(x, ...)

Arguments

`x`	an rds.data.frame object.
`...`	additional parameters passed to print.data.frame.

Summarizing Generalized Linear Model Fits with Odds Ratios for Survey Data

Description

RDS::summary.svyglm.RDS is a version of summary.svyglm that reports odds ratios in place of coefficients in the summary table. This applies only to the binomial family; otherwise it is identical to summary.svyglm. By default summary.svyglm displays the log-odds-ratios, whereas this function displays their exponentiated form and a 95% confidence interval. p-values are still displayed.

Usage

## S3 method for class 'svyglm.RDS'
summary(object, correlation = FALSE, df.resid = NULL, odds = TRUE, ...)

Arguments

`object`	an object of class `"svyglm"`, usually, a result of a call to `svyglm`.
`correlation`	logical; if `TRUE`, the correlation matrix of the estimated parameters is returned and printed.
`df.resid`	Optional denominator degrees of freedom for Wald tests.
`odds`	logical; Should the coefficients be reported as odds (rather than log-odds)?
`...`	further arguments passed to or from other methods.

Details

svyglm fits a generalised linear model to data from a complex survey design, with inverse-probability weighting and design-based standard errors.

There is no anova method for svyglm as the models are not fitted by maximum likelihood.

See the manual page on svyglm for details of that function.

Value

RDS::summary.svyglm.RDS returns an object of class "summary.svyglm.RDS", a list with components

`call`	the component from `object`.
`family`	the component from `object`.
`deviance`	the component from `object`.
`contrasts`	the component from `object`.
`df.residual`	the component from `object`.
`null.deviance`	the component from `object`.
`df.null`	the component from `object`.
`deviance.resid`	the deviance residuals: see `residuals.svyglm`.
`coefficients`	the matrix of coefficients, standard errors, z-values and p-values. Aliased coefficients are omitted.
`aliased`	named logical vector showing if the original coefficients are aliased.
`dispersion`	either the supplied argument or the inferred/estimated dispersion if the latter is `NULL`.
`df`	a 3-vector of the rank of the model and the number of residual degrees of freedom, plus number of coefficients (including aliased ones).
`cov.unscaled`	the unscaled (`dispersion = 1`) estimated covariance matrix of the estimated coefficients.
`cov.scaled`	ditto, scaled by `dispersion`.
`correlation`	(only if `correlation` is true.) The estimated correlations of the estimated coefficients.
`symbolic.cor`	(only if `correlation` is true.) The value of the argument `symbolic.cor`.
`odds`	Are the coefficients reported as odds (rather than log-odds)?

Examples


## For examples see example(svyglm)
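
The relation between the log-odds and odds-ratio scales can be sketched with an ordinary `glm` fit. This is an illustration under stated assumptions: `stats::glm` stands in for `svyglm` (so there are no survey weights or design-based standard errors), the data are made up, and the interval is a plain Wald interval rather than whatever the summary method computes.

```r
# Toy data (hypothetical): binary outcome y and binary exposure x.
toy <- data.frame(
  x = rep(c(0, 1), times = c(50, 50)),
  y = c(rep(c(1, 0), times = c(10, 40)),   # x = 0: 10 of 50 with y = 1
        rep(c(1, 0), times = c(25, 25)))   # x = 1: 25 of 50 with y = 1
)

fit <- glm(y ~ x, family = binomial, data = toy)

log_odds <- coef(fit)                      # coefficients on the log-odds scale
odds     <- exp(log_odds)                  # the same coefficients as odds ratios
ci       <- exp(confint.default(fit))      # Wald 95% interval, exponentiated

cbind(OR = odds, ci)
```

The entry for `x` is the odds of `y` when `x = 1` divided by the odds when `x = 0`; exponentiating the interval endpoints keeps the interval valid on the odds-ratio scale.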

Calculates the MLE, i.e. the row proportions of the transition matrix

Description

Calculates the maximum likelihood estimate (MLE) of the Markov transition matrix, i.e. the row proportions of the transition counts.

Usage

transition.counts.to.Markov.mle(transition.counts)

Arguments

`transition.counts`	a matrix or table of transition counts

Details

Deprecated. Use prop.table(transition.counts, 1) instead.
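
A minimal base-R sketch of the row-proportion calculation, using made-up counts. For a first-order Markov chain the maximum likelihood estimate of each transition probability is the observed count divided by its row total, which is what `prop.table` with margin 1 computes.

```r
# Toy transition counts (hypothetical): rows are the recruiter's group,
# columns are the recruit's group.
transition.counts <- matrix(c(30, 10,
                               5, 15),
                            nrow = 2, byrow = TRUE,
                            dimnames = list(from = c("A", "B"),
                                            to   = c("A", "B")))

# Row proportions: each row is divided by its row total.
P <- prop.table(transition.counts, 1)
P
rowSums(P)   # every row sums to 1
```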

Extract or replace the ultimate (last) element of a vector or a list, or an element counting from the end.

Description

Extract or replace the *ult*imate (last) element of a vector or a list, or an element counting from the end.

Usage

ult(x, i = 1L)

Arguments

`x`	a vector or a list.
`i`	index from the end of the list to extract or replace (where 1 is the last element, 2 is the penultimate element, etc.).

Value

An element of 'x'.

Examples

x <- 1:5
(last <- ult(x))
(penultimate <- ult(x, 2)) # 2nd last.

Volz-Heckathorn (RDS-II) weights

Description

Volz-Heckathorn (RDS-II) weights

Usage

vh.weights(degs, N = NULL)

Arguments

`degs`	The degrees (i.e. network sizes) of the sample units.
`N`	Population size
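
The idea behind RDS-II weighting is that each sample unit is weighted in inverse proportion to its degree. A minimal sketch on toy data follows; `inverse_degree_weights` is a hypothetical helper rather than the package's `vh.weights`, and normalising the weights to sum to 1 is an assumption of the sketch (the package may scale them differently, for example using `N`).

```r
# Hypothetical helper, not the package's vh.weights(): weights proportional
# to 1/degree, normalised here to sum to 1 (the scaling is an assumption).
inverse_degree_weights <- function(degs) {
  w <- 1 / degs
  w / sum(w)
}

degs <- c(2, 4, 8, 4)            # toy self-reported network sizes
y    <- c(1, 0, 1, 0)            # toy binary outcome

w <- inverse_degree_weights(degs)
w
sum(w * y)                       # weighted proportion estimate in the RDS-II style
```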

writes an rds.data.frame recruitment tree as a GraphViz file

Description

writes an rds.data.frame recruitment tree as a GraphViz file

Usage

write.graphviz(x, file)

Arguments

`x`	An rds.data.frame.
`file`	A character vector representing the file

Writes out the RDS tree in NetDraw format

Description

Writes out the RDS tree in NetDraw format

Usage

write.netdraw(x, file = NULL, by.seed = FALSE)

Arguments

`x`	An rds.data.frame.
`file`	a character vector representing a file.
`by.seed`	If true, separate files will be created for each seed.

Details

If by.seed is false, two files are created using 'file' as a base name. paste0(file,".DL") contains the edge information, and paste0(file,".vna") contains the nodal attributes.

Writes out the RDS tree in RDSAT format

Description

Writes out the RDS tree in RDSAT format

Usage

write.rdsat(x, file = NULL)

Arguments

`x`	An rds.data.frame.
`file`	a character vector representing a file.

Export an rds.data.frame to file

Description

Export an rds.data.frame to file

Usage

write.rdsobj(x, file)

Arguments

`x`	The rds.data.frame to export
`file`	The name of the file to create.
