Title: | Foundation of vector-like and list-like containers in Bioconductor |
---|---|
Description: | The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantic of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages). |
Authors: | Hervé Pagès [aut, cre], Michael Lawrence [aut], Patrick Aboyoun [aut], Aaron Lun [ctb], Beryl Kanali [ctb] (Converted vignettes from Sweave to RMarkdown) |
Maintainer: | Hervé Pagès <[email protected]> |
License: | Artistic-2.0 |
Version: | 0.45.2 |
Built: | 2024-11-16 03:33:47 UTC |
Source: | https://github.com/bioc/S4Vectors |
The S4Vectors package defines aggregate
methods
for Vector, Rle, and List objects.
## S4 method for signature 'Vector' aggregate(x, by, FUN, start=NULL, end=NULL, width=NULL, frequency=NULL, delta=NULL, ..., simplify=TRUE) ## S4 method for signature 'Rle' aggregate(x, by, FUN, start=NULL, end=NULL, width=NULL, frequency=NULL, delta=NULL, ..., simplify=TRUE) ## S4 method for signature 'List' aggregate(x, by, FUN, start=NULL, end=NULL, width=NULL, frequency=NULL, delta=NULL, ..., simplify=TRUE)
## S4 method for signature 'Vector' aggregate(x, by, FUN, start=NULL, end=NULL, width=NULL, frequency=NULL, delta=NULL, ..., simplify=TRUE) ## S4 method for signature 'Rle' aggregate(x, by, FUN, start=NULL, end=NULL, width=NULL, frequency=NULL, delta=NULL, ..., simplify=TRUE) ## S4 method for signature 'List' aggregate(x, by, FUN, start=NULL, end=NULL, width=NULL, frequency=NULL, delta=NULL, ..., simplify=TRUE)
x |
|
by |
An object with If |
FUN |
The function, found via |
start , end , width
|
The start, end, and width of the subsets. If |
frequency , delta
|
Optional arguments that specify the sampling frequency and increment
within the subsets (in the same fashion as |
... |
Optional arguments to |
simplify |
A logical value specifying whether the result should be simplified to a vector or matrix if possible. |
Subsets of x
can be specified either via the by
argument
or via the start
, end
, width
, frequency
, and
delta
arguments.
For example, if start
and end
are specified, then:
aggregate(x, FUN=FUN, start=start, end=end, ..., simplify=simplify)
is equivalent to:
sapply(seq_along(start), function(i) FUN(x[start[i]:end[i]], ...), simplify=simplify)
(replace x[start[i]:end[i]]
with 2D-style subsetting
x[start[i]:end[i], ]
if x
is a DataFrame object).
The aggregate
function in the stats
package.
The start
,
end
, and
width
generic functions defined in
the BiocGenerics package.
x <- Rle(10:2, 1:9) aggregate(x, x > 4, mean) aggregate(x, FUN=mean, start=1:26, width=20) ## Note that aggregate() works on a DataFrame object the same way it ## works on an ordinary data frame: aggregate(DataFrame(state.x77), list(Region=state.region), mean) aggregate(weight ~ feed, data=DataFrame(chickwts), mean) library(IRanges) by <- IRanges(start=1:26, width=20, names=LETTERS) aggregate(x, by, is.unsorted)
x <- Rle(10:2, 1:9) aggregate(x, x > 4, mean) aggregate(x, FUN=mean, start=1:26, width=20) ## Note that aggregate() works on a DataFrame object the same way it ## works on an ordinary data frame: aggregate(DataFrame(state.x77), list(Region=state.region), mean) aggregate(weight ~ feed, data=DataFrame(chickwts), mean) library(IRanges) by <- IRanges(start=1:26, width=20, names=LETTERS) aggregate(x, by, is.unsorted)
The virtual class Annotated
is used to standardize the
storage of metadata with a subclass.
The Annotated
class supports the storage of global metadata in a
subclass. This is done through the metadata
slot that stores a list
object.
In the code snippet below, x
is an Annotated object.
metadata(x)
, metadata(x) <- value
:Get or set the list holding arbitrary R objects as annotations. May be, and often is, empty.
P. Aboyoun
The Vector class, which extends Annotated directly.
showClass("Annotated") # shows (some of) the known subclasses ## If the IRanges package was not already loaded, this will show ## more subclasses: library(IRanges) showClass("Annotated")
showClass("Annotated") # shows (some of) the known subclasses ## If the IRanges package was not already loaded, this will show ## more subclasses: library(IRanges) showClass("Annotated")
bindROWS
and bindCOLS
are low-level generic functions
defined in the S4Vectors package for binding objects along their
1st or 2nd dimension. They are the workhorses behind higher-level
operations like c()
, rbind()
, or cbind()
on
most vector-like or rectangular objects defined in Bioconductor.
They are not intended to be used directly by the end user.
bindROWS(x, objects=list(), use.names=TRUE, ignore.mcols=FALSE, check=TRUE) bindCOLS(x, objects=list(), use.names=TRUE, ignore.mcols=FALSE, check=TRUE)
bindROWS(x, objects=list(), use.names=TRUE, ignore.mcols=FALSE, check=TRUE) bindCOLS(x, objects=list(), use.names=TRUE, ignore.mcols=FALSE, check=TRUE)
x |
An S4 object. |
objects |
A list of S4 objects to bind to |
use.names |
Should the names on the input objects be propagated? By default they are. |
ignore.mcols |
Should the metadata columns on the input objects be ignored? By defaut they are not (i.e. they are propagated). |
check |
Should the result object be validated before being returned to the user? By default it is. |
An object of the same class as x
.
Hervé Pagès
Some low-level string utilities to operate on ordinary character vectors. For more advanced string manipulations, see the Biostrings package.
unstrsplit(x, sep="") ## more to come...
unstrsplit(x, sep="") ## more to come...
x |
A list-like object where each list element is a character vector, or a character vector (identity). |
sep |
A single string containing the separator. |
unstrsplit(x, sep)
is equivalent to (but much faster than)
sapply(x, paste0, collapse=sep)
. It performs the reverse
transformation of strsplit( , fixed=TRUE)
, that is,
if x
is a character vector with no NAs and sep
a single
string, then unstrsplit(strsplit(x, split=sep, fixed=TRUE), sep)
is identical to x
. A notable exception to this though is when
strsplit
finds a match at the end of a string, in which case the
last element of the output (which should normally be an empty string)
is not returned (see ?strsplit
for the details).
A character vector with one string per list element in x
.
Hervé Pagès
The strsplit
function in the base
package.
x <- list(A=c("abc", "XY"), B=NULL, C=letters[1:4]) unstrsplit(x) unstrsplit(x, sep=",") unstrsplit(x, sep=" => ") data(islands) x <- names(islands) y <- strsplit(x, split=" ", fixed=TRUE) x2 <- unstrsplit(y, sep=" ") stopifnot(identical(x, x2)) ## But... names(x) <- x y <- strsplit(x, split="in", fixed=TRUE) x2 <- unstrsplit(y, sep="in") y[x != x2] ## In other words: strsplit() behavior sucks :-/
x <- list(A=c("abc", "XY"), B=NULL, C=letters[1:4]) unstrsplit(x) unstrsplit(x, sep=",") unstrsplit(x, sep=" => ") data(islands) x <- names(islands) y <- strsplit(x, split=" ", fixed=TRUE) x2 <- unstrsplit(y, sep=" ") stopifnot(identical(x, x2)) ## But... names(x) <- x y <- strsplit(x, split="in", fixed=TRUE) x2 <- unstrsplit(y, sep="in") y[x != x2] ## In other words: strsplit() behavior sucks :-/
The DataFrame
class extends the RectangularData virtual
class supports the storage of any type of object (with length
and [
methods) as columns.
On the whole, the DataFrame
behaves very similarly to
data.frame
, in terms of construction, subsetting, splitting,
combining, etc. The most notable exceptions have to do with handling
of the row names:
The row names are optional. This means calling rownames(x)
will return NULL
if there are no row names. Of course, it
could return seq_len(nrow(x))
, but returning NULL
informs, for example, combination functions that no row names are
desired (they are often a luxury when dealing with large data).
The row names are not required to be unique.
Subsetting by row names does not use partial matching.
As DataFrame
derives from Vector
, it is
possible to set an annotation
string. Also, another
DataFrame
can hold metadata on the columns.
For a class to be supported as a column, it must have length
and [
methods, where [
supports subsetting only by
i
and respects drop=FALSE
. Optionally, a method may be
defined for the showAsCell
generic, which should return a
vector of the same length as the subset of the column passed to
it. This vector is then placed into a data.frame
and converted
to text with format
. Thus, each element of the vector should be
some simple, usually character, representation of the corresponding
element in the column.
DataFrame(..., row.names = NULL, check.names = TRUE,
stringsAsFactors)
:Constructs a DataFrame
in similar fashion to
data.frame
. Each argument in ...
is coerced to
a DataFrame
and combined column-wise.
The row names should be given in
row.names
; otherwise, they are inherited from the
arguments, as in data.frame
. Explicitly passing
NULL
to row.names
ensures that there are no rownames.
If check.names
is TRUE
, the column names will
be checked for syntactic validity and made unique, if necessary.
To store an object of a class that does not support coercion to
DataFrame
, wrap it in I()
. The class must still have
methods for length
and [
.
The stringsAsFactors
argument is ignored. The coercion of
column arguments to DataFrame determines whether strings
become factors.
make_zero_col_DFrame(nrow)
:Constructs a zero-column DFrame object with nrow
rows.
Intended for developers to use in other packages and typically
not needed by the end user.
In the following code snippets, x
is a DataFrame
.
dim(x)
:Get the length two integer vector indicating in the first and second element the number of rows and columns, respectively.
dimnames(x)
, dimnames(x) <- value
:Get and set the two element list containing the row names
(character vector of length nrow(x)
or NULL
)
and the column names (character vector of length ncol(x)
).
as(from, "DataFrame")
:By default, constructs a new DataFrame
with from
as
its only column. If from
is a matrix
or
data.frame
, all of its columns become columns in the new
DataFrame
. If from
is a list, each element becomes a
column, recycling as necessary. Note that for the DataFrame
to behave correctly, each column object must support element-wise
subsetting via the [
method and return the number of elements with
length
. It is recommended to use the DataFrame
constructor, rather than this interface.
as.list(x)
: Coerces x
, a DataFrame
,
to a list
.
as.data.frame(x, row.names=NULL, optional=FALSE,
make.names=TRUE)
:Coerces x
, a DataFrame
, to a data.frame
.
Each column is coerced to a data.frame
and then column
bound together. If row.names
is NULL
, they are
propagated from x
, if it has any. Otherwise, they are
inferred by the data.frame
constructor.
Like the as.data.frame()
method for class matrix
,
the method for class DataFrame
supports the make.names
argument. make.names
can be set to TRUE
or FALSE
to indicate what should happen if the row names of x
(or the
row names supplied via the row.names
argument) are invalid
(e.g. contain duplicates). If they are invalid, and make.names
is TRUE
(the default), they get "fixed" by going thru
make.names(*, unique=TRUE)
. Otherwise (i.e. if make.names
is FALSE
), an error is raised. Note that unlike the method
for class matrix
, make.names=NA
is not supported.
NOTE: Conversion of x
to a data.frame
is not
supported if x
contains any list
, SimpleList
,
or CompressedList
columns.
as(from, "data.frame")
: Coerces a DataFrame
to a data.frame
by calling as.data.frame(from)
.
as.matrix(x)
: Coerces the DataFrame
to a
matrix
, if possible.
as.env(x, enclos = parent.frame())
:Creates an environment from x
with a symbol for each
colnames(x)
. The values are not actually copied into the
environment. Rather, they are dynamically bound using
makeActiveBinding
. This prevents unnecessary copying
of the data from the external vectors into R vectors. The values
are cached, so that the data is not copied every time the symbol
is accessed.
In the following code snippets, x
is a DataFrame
.
x[i,j,drop]
: Behaves very similarly to the
[.data.frame
method, except i
can be a
logical Rle
object and subsetting by matrix
indices
is not supported. Indices containing NA
's are also not
supported.
x[i,j] <- value
: Behaves very similarly to the
[<-.data.frame
method.
x[[i]]
: Behaves very similarly to the
[[.data.frame
method, except arguments j
and exact
are not supported. Column name matching is
always exact. Subsetting by matrices is not supported.
x[[i]] <- value
: Behaves very similarly to the
[[<-.data.frame
method, except argument j
is not supported.
The show()
method for DataFrame objects obeys global options
showHeadLines
and showTailLines
for controlling the number
of head and tail rows to display.
See ?get_showHeadLines
for more information.
Michael Lawrence
DataFrame-combine for combining DataFrame objects.
DataFrame-utils for other common operations on DataFrame objects.
TransposedDataFrame objects.
RectangularData and SimpleList which DataFrame extends directly.
get_showHeadLines
for controlling the number of
DataFrame rows to display.
score <- c(1L, 3L, NA) counts <- c(10L, 2L, NA) row.names <- c("one", "two", "three") df <- DataFrame(score) # single column df[["score"]] df <- DataFrame(score, row.names = row.names) #with row names rownames(df) df <- DataFrame(vals = score) # explicit naming df[["vals"]] # arrays ary <- array(1:4, c(2,1,2)) sw <- DataFrame(I(ary)) # a data.frame sw <- DataFrame(swiss) as.data.frame(sw) # swiss, without row names # now with row names sw <- DataFrame(swiss, row.names = rownames(swiss)) as.data.frame(sw) # swiss # subsetting sw[] # identity subset sw[,] # same sw[NULL] # no columns sw[,NULL] # no columns sw[NULL,] # no rows ## select columns sw[1:3] sw[,1:3] # same as above sw[,"Fertility"] sw[,c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)] ## select rows and columns sw[4:5, 1:3] sw[1] # one-column DataFrame ## the same sw[, 1, drop = FALSE] sw[, 1] # a (unnamed) vector sw[[1]] # the same sw[["Fertility"]] sw[["Fert"]] # should return 'NULL' sw[1,] # a one-row DataFrame sw[1,, drop=TRUE] # a list ## duplicate row, unique row names are created sw[c(1, 1:2),] ## indexing by row names sw["Courtelary",] subsw <- sw[1:5,1:4] subsw["C",] # no partial match (unlike with data.frame) ## row and column names cn <- paste("X", seq_len(ncol(swiss)), sep = ".") colnames(sw) <- cn colnames(sw) rn <- seq(nrow(sw)) rownames(sw) <- rn rownames(sw) ## column replacement df[["counts"]] <- counts df[["counts"]] df[[3]] <- score df[["X"]] df[[3]] <- NULL # deletion
score <- c(1L, 3L, NA) counts <- c(10L, 2L, NA) row.names <- c("one", "two", "three") df <- DataFrame(score) # single column df[["score"]] df <- DataFrame(score, row.names = row.names) #with row names rownames(df) df <- DataFrame(vals = score) # explicit naming df[["vals"]] # arrays ary <- array(1:4, c(2,1,2)) sw <- DataFrame(I(ary)) # a data.frame sw <- DataFrame(swiss) as.data.frame(sw) # swiss, without row names # now with row names sw <- DataFrame(swiss, row.names = rownames(swiss)) as.data.frame(sw) # swiss # subsetting sw[] # identity subset sw[,] # same sw[NULL] # no columns sw[,NULL] # no columns sw[NULL,] # no rows ## select columns sw[1:3] sw[,1:3] # same as above sw[,"Fertility"] sw[,c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)] ## select rows and columns sw[4:5, 1:3] sw[1] # one-column DataFrame ## the same sw[, 1, drop = FALSE] sw[, 1] # a (unnamed) vector sw[[1]] # the same sw[["Fertility"]] sw[["Fert"]] # should return 'NULL' sw[1,] # a one-row DataFrame sw[1,, drop=TRUE] # a list ## duplicate row, unique row names are created sw[c(1, 1:2),] ## indexing by row names sw["Courtelary",] subsw <- sw[1:5,1:4] subsw["C",] # no partial match (unlike with data.frame) ## row and column names cn <- paste("X", seq_len(ncol(swiss)), sep = ".") colnames(sw) <- cn colnames(sw) rn <- seq(nrow(sw)) rownames(sw) <- rn rownames(sw) ## column replacement df[["counts"]] <- counts df[["counts"]] df[[3]] <- score df[["X"]] df[[3]] <- NULL # deletion
Various methods are provided to combine DataFrame objects along their rows or columns, or to merge them.
In the code snippets below, all the input objects are expected to be DataFrame objects.
rbind(...)
: Creates a new DataFrame object by
aggregating the rows of the input objects.
Very similar to rbind.data.frame()
, except
in the handling of row names. If all elements have row names, they
are concatenated and made unique. Otherwise, the result does not
have row names.
The returned DataFrame object inherits its metadata and
metadata columns from the first input object.
cbind(...)
: Creates a new DataFrame object by
aggregating the columns of the input objects.
Very similar to cbind.data.frame()
.
The returned DataFrame object inherits its metadata from
the first input object.
The metadata columns of the returned DataFrame object are
obtained by combining the metadata columns of the input object with
combineRows()
.
combineRows(x, ...)
:combineRows()
is a generic
function documented in the man page for RectangularData
objects (see ?RectangularData
).
The method for DataFrame objects behaves as documented in
that man page.
combineCols(x, ..., use.names=TRUE)
:combineCols()
is a generic function documented in the man page for
RectangularData objects (see ?RectangularData
).
The method for DataFrame objects behaves as documented in
that man page.
combineUniqueCols(x, ..., use.names=TRUE)
: This function
is documented in the man page for RectangularData objects
(see ?RectangularData
).
merge(x, y, ...)
: Merges two DataFrame objects
x
and y
, with arguments in ...
being
the same as those allowed by the base merge()
. It is
allowed for either x
or y
to be a data.frame
.
Michael Lawrence, Hervé Pagès, and Aaron Lun
DataFrame-utils for other common operations on DataFrame objects.
DataFrame objects.
TransposedDataFrame objects.
RectangularData objects.
## --------------------------------------------------------------------- ## rbind() ## --------------------------------------------------------------------- x1 <- DataFrame(A=1:5, B=letters[1:5], C=11:15) y1 <- DataFrame(B=c(FALSE, NA, TRUE), C=c(FALSE, NA, TRUE), A=101:103) rbind(x1, y1) x2 <- DataFrame(A=Rle(101:103, 3:1), B=Rle(51:52, c(1, 5))) y2 <- DataFrame(A=runif(2), B=Rle(c("a", "b"))) rbind(x2, y2) ## --------------------------------------------------------------------- ## combineRows() ## --------------------------------------------------------------------- y3 <- DataFrame(A=runif(2)) combineRows(x2, y3) y4 <- DataFrame(B=Rle(c("a", "b")), C=runif(2)) combineRows(x2, y4) combineRows(y4, x2) combineRows(y4, x2, DataFrame(D=letters[1:3], B=301:303)) ## --------------------------------------------------------------------- ## combineCols() ## --------------------------------------------------------------------- X <- DataFrame(x=1) Y <- DataFrame(y="A") Z <- DataFrame(z=TRUE) combineCols(X, Y, Z, use.names=FALSE) Y <- DataFrame(y=LETTERS[1:2]) rownames(X) <- "foo" rownames(Y) <- c("foo", "bar") rownames(Z) <- "bar" combineCols(X, Y, Z) ## --------------------------------------------------------------------- ## combineUniqueCols() ## --------------------------------------------------------------------- X <- DataFrame(x=1) Y <- DataFrame(y=LETTERS[1:2], dup=1:2) Z <- DataFrame(z=TRUE, dup=2L) rownames(X) <- "foo" rownames(Y) <- c("foo", "bar") rownames(Z) <- "bar" combineUniqueCols(X, Y, Z) Z$dup <- 3 combineUniqueCols(X, Y, Z) ## --------------------------------------------------------------------- ## merge() ## --------------------------------------------------------------------- x6 <- DataFrame(key=c(155, 2, 33, 17, 2, 26, 1), aa=1:7) y6 <- DataFrame(key=1:26, bb=LETTERS) merge(x6, y6, by="key") merge(x6, y6, by="key", all.x=TRUE)
## --------------------------------------------------------------------- ## rbind() ## --------------------------------------------------------------------- x1 <- DataFrame(A=1:5, B=letters[1:5], C=11:15) y1 <- DataFrame(B=c(FALSE, NA, TRUE), C=c(FALSE, NA, TRUE), A=101:103) rbind(x1, y1) x2 <- DataFrame(A=Rle(101:103, 3:1), B=Rle(51:52, c(1, 5))) y2 <- DataFrame(A=runif(2), B=Rle(c("a", "b"))) rbind(x2, y2) ## --------------------------------------------------------------------- ## combineRows() ## --------------------------------------------------------------------- y3 <- DataFrame(A=runif(2)) combineRows(x2, y3) y4 <- DataFrame(B=Rle(c("a", "b")), C=runif(2)) combineRows(x2, y4) combineRows(y4, x2) combineRows(y4, x2, DataFrame(D=letters[1:3], B=301:303)) ## --------------------------------------------------------------------- ## combineCols() ## --------------------------------------------------------------------- X <- DataFrame(x=1) Y <- DataFrame(y="A") Z <- DataFrame(z=TRUE) combineCols(X, Y, Z, use.names=FALSE) Y <- DataFrame(y=LETTERS[1:2]) rownames(X) <- "foo" rownames(Y) <- c("foo", "bar") rownames(Z) <- "bar" combineCols(X, Y, Z) ## --------------------------------------------------------------------- ## combineUniqueCols() ## --------------------------------------------------------------------- X <- DataFrame(x=1) Y <- DataFrame(y=LETTERS[1:2], dup=1:2) Z <- DataFrame(z=TRUE, dup=2L) rownames(X) <- "foo" rownames(Y) <- c("foo", "bar") rownames(Z) <- "bar" combineUniqueCols(X, Y, Z) Z$dup <- 3 combineUniqueCols(X, Y, Z) ## --------------------------------------------------------------------- ## merge() ## --------------------------------------------------------------------- x6 <- DataFrame(key=c(155, 2, 33, 17, 2, 26, 1), aa=1:7) y6 <- DataFrame(key=1:26, bb=LETTERS) merge(x6, y6, by="key") merge(x6, y6, by="key", all.x=TRUE)
The DataFrame
class provides methods to compare across
rows of the DataFrame
, including ordering and matching. Each
DataFrame
is effectively treated as a vector of rows.
## S4 method for signature 'DataFrame' sameAsPreviousROW(x) ## S4 method for signature 'DataFrame,DataFrame' match(x, table, nomatch = NA_integer_, incomparables = NULL, ...) ## S4 method for signature 'DataFrame' order(..., na.last = TRUE, decreasing = FALSE, method = c("auto", "shell", "radix")) ## S4 method for signature 'DataFrame,DataFrame' pcompare(x, y) ## S4 method for signature 'DataFrame,DataFrame' e1 == e2 ## S4 method for signature 'DataFrame,DataFrame' e1 <= e2
## S4 method for signature 'DataFrame' sameAsPreviousROW(x) ## S4 method for signature 'DataFrame,DataFrame' match(x, table, nomatch = NA_integer_, incomparables = NULL, ...) ## S4 method for signature 'DataFrame' order(..., na.last = TRUE, decreasing = FALSE, method = c("auto", "shell", "radix")) ## S4 method for signature 'DataFrame,DataFrame' pcompare(x, y) ## S4 method for signature 'DataFrame,DataFrame' e1 == e2 ## S4 method for signature 'DataFrame,DataFrame' e1 <= e2
x , table , y , e1 , e2
|
A |
nomatch , incomparables
|
See |
... |
For For |
decreasing , na.last , method
|
See |
The treatment of a DataFrame
as a “vector of rows”
is useful in many cases, e.g., when each row is a record that needs
to be ordered or matched. The methods provided here allow the use of
all methods described in ?Vector-comparison
, including
sorting, matching, de-duplication, and so on.
Careful readers will notice this behaviour differs from the usual
semantics of a data.frame
, which acts as a list-like vector
of columns. This discrepancy rarely causes problems, as it is not
particularly common to compare columns of a data.frame
in
the first place.
Note that a match
method for DataFrame
objects is
explicitly defined to avoid calling the corresponding method for
List
objects, which would yield the (undesired) list-like
semantics. The same rationale is behind the explicit definition of
<=
and ==
despite the availability of pcompare
.
For sameAsPreviousROW
: see sameAsPreviousROW
.
For match
: see match
.
For order
: see order
.
For pcompare
, ==
and <=
: see pcompare
.
Aaron Lun
# Mocking up a DataFrame. DF <- DataFrame( A=sample(LETTERS, 100, replace=TRUE), B=sample(5, 100, replace=TRUE) ) # Matching: match(DF, DF[1:10,]) selfmatch(DF) unique(DF) # Ordering, alone and with other vectors: sort(DF) order(DF, runif(nrow(DF))) # Parallel comparison: DF==DF DF==DF[1,]
# Mocking up a DataFrame. DF <- DataFrame( A=sample(LETTERS, 100, replace=TRUE), B=sample(5, 100, replace=TRUE) ) # Matching: match(DF, DF[1:10,]) selfmatch(DF) unique(DF) # Ordering, alone and with other vectors: sort(DF) order(DF, runif(nrow(DF))) # Parallel comparison: DF==DF DF==DF[1,]
Common operations on DataFrame objects.
In the code snippet below, x
is a DataFrame object.
split(x, f, drop = FALSE)
:Splits x
into a SplitDataFrameList
object, according to f
, dropping elements corresponding
to unrepresented levels if drop
is TRUE
.
In the code snippet below, x
is a DataFrame object.
by(data, INDICES, FUN, ..., simplify = TRUE)
:Apply FUN
to each group of data
, a DataFrame,
formed by the factor (or list of factors) INDICES
. Exactly
the same contract as as.data.frame
.
In the code snippets below, x
is a DataFrame object.
na.omit(object)
:Returns a subset with incomplete cases removed.
na.exclude(object)
:Returns a subset with incomplete cases removed (but to be included with NAs in statistical results).
is.na(x)
:Returns a logical matrix indicating which cells are missing.
complete.cases(x)
:Returns a logical vector identifying which cases have no missing values.
In the code snippet below, x
is a DataFrame object.
transform(`_data`, ...)
: adds or replaces columns based on
expressions in ...
. See transform
.
A number of wrappers are implemented for performing statistical procedures, such as model fitting, with DataFrame objects.
xtabs(formula = ~., data, subset, na.action,
exclude = c(NA, NaN), drop.unused.levels = FALSE)
:Michael Lawrence
by
in the base package.
na.omit
in the stats package.
transform
in the base package.
xtabs
in the stats package.
splitAsList
in this package (S4Vectors).
SplitDataFrameList objects in the IRanges package.
DataFrame objects.
## split sw <- DataFrame(swiss) swsplit <- split(sw, sw[["Education"]]) ## rbind & cbind do.call(rbind, as.list(swsplit)) cbind(DataFrame(score), DataFrame(counts)) df <- DataFrame(as.data.frame(UCBAdmissions)) xtabs(Freq ~ Gender + Admit, df)
## split sw <- DataFrame(swiss) swsplit <- split(sw, sw[["Education"]]) ## rbind & cbind do.call(rbind, as.list(swsplit)) cbind(DataFrame(score), DataFrame(counts)) df <- DataFrame(as.data.frame(UCBAdmissions)) xtabs(Freq ~ Gender + Admit, df)
The DataFrameFactor class is a subclass of the Factor class where the levels are the rows of a DataFrame. It provides a few methods to mimic the behavior of an actual DataFrame while retaining the memory efficiency of the Factor structure.
DataFrameFactor(x, levels, index=NULL, ...) # constructor function
DataFrameFactor(x, levels, index=NULL, ...) # constructor function
x , levels
|
DataFrame objects. At least one of When See |
index |
|
... |
Optional metadata columns. |
A DataFrameFactor object.
DataFrameFactor objects support the same set of accessors as Factor
objects. In addition, it mimics some aspects of the DataFrame
interface. The general principle is that, for these methods, a
DataFrameFactor x
behaves like the expanded DataFrame
unfactor(x)
.
x$name
will return column name
from
levels(x)
and expand it according to the indices in x
.
x[i, j, ..., drop=TRUE]
will return a new DataFrameFactor
subsetted to entries i
, where the levels are subsetted by column to
contain only columns j
. If the resulting levels only have one column
and drop=TRUE
, the expanded values of the column are returned
directly.
dim(x)
will return the length of the DataFrameFactor and the
number of columns in its levels.
dimnames(x)
will return the names of the DataFrameFactor and
the column names in its levels.
The DataFrame-like methods implemented here are for convenience only.
Users should not assume that the DataFrameFactor complies with other aspects of
the DataFrame interface, due to fundamental differences between a DataFrame and
the Factor parent class, e.g., in the interpretation of their
“length”. Outside of the methods listed above, the DataFrameFactor is
not guaranteed to work as a drop-in replacement for a DataFrame - use
unfactor(x)
instead.
Aaron Lun
Factor objects for the parent class.
df <- DataFrame(X=sample(5, 100, replace=TRUE), Y=sample(c("A", "B"), 100, replace=TRUE)) dffac <- DataFrameFactor(df) dffac dffac$X dffac[,c("Y", "X")] dffac[1:10,"X"] colnames(dffac) # The usual Factor methods may also be used: unfactor(dffac) levels(dffac) as.integer(dffac)
df <- DataFrame(X=sample(5, 100, replace=TRUE), Y=sample(c("A", "B"), 100, replace=TRUE)) dffac <- DataFrameFactor(df) dffac dffac$X dffac[,c("Y", "X")] dffac[1:10,"X"] colnames(dffac) # The usual Factor methods may also be used: unfactor(dffac) levels(dffac) as.integer(dffac)
expand
transforms a DataFrame object into a new
DataFrame object where the columns specified by the user are
unlisted. The transformed DataFrame object has the same colnames
as the original but typically more rows.
## S4 method for signature 'DataFrame' expand(x, colnames, keepEmptyRows = FALSE, recursive = TRUE)
## S4 method for signature 'DataFrame' expand(x, colnames, keepEmptyRows = FALSE, recursive = TRUE)
x |
A DataFrame object with list-like columns or a Vector
object with list-like metadata columns (i.e. with list-like columns in
|
colnames |
A |
keepEmptyRows |
A |
recursive |
If |
A DataFrame object that has been expanded row-wise to match the length of the unlisted columns.
DataFrame objects.
library(IRanges) aa <- CharacterList("a", paste0("d", 1:2), paste0("b", 1:3), c(), "c") bb <- CharacterList(paste0("sna", 1:2),"foo", paste0("bar",1:3),c(),"hica") df <- DataFrame(aa=aa, bb=bb, cc=11:15) ## Expand by all list-like columns (aa, bb), dropping rows with empty ## list elements: expand(df) ## Expand the aa column only: expand(df, colnames="aa", keepEmptyRows=TRUE) expand(df, colnames="aa", keepEmptyRows=FALSE) ## Expand the aa and then the bb column: expand(df, colnames=c("aa","bb"), keepEmptyRows=TRUE) expand(df, colnames=c("aa","bb"), keepEmptyRows=FALSE) ## Expand the aa and dd column in parallel: df$dd <- relist(seq_along(unlist(aa)), aa) expand(df, colnames=c("aa","dd"), recursive=FALSE)
library(IRanges) aa <- CharacterList("a", paste0("d", 1:2), paste0("b", 1:3), c(), "c") bb <- CharacterList(paste0("sna", 1:2),"foo", paste0("bar",1:3),c(),"hica") df <- DataFrame(aa=aa, bb=bb, cc=11:15) ## Expand by all list-like columns (aa, bb), dropping rows with empty ## list elements: expand(df) ## Expand the aa column only: expand(df, colnames="aa", keepEmptyRows=TRUE) expand(df, colnames="aa", keepEmptyRows=FALSE) ## Expand the aa and then the bb column: expand(df, colnames=c("aa","bb"), keepEmptyRows=TRUE) expand(df, colnames=c("aa","bb"), keepEmptyRows=FALSE) ## Expand the aa and dd column in parallel: df$dd <- relist(seq_along(unlist(aa)), aa) expand(df, colnames=c("aa","dd"), recursive=FALSE)
The Factor class serves a similar role as factor in base R (a.k.a. ordinary factor) except that the levels of a Factor object can be any vector-like object, that is, they can be an ordinary vector or a Vector derivative, or even an ordinary factor or another Factor object.
A notable difference with ordinary factors is that Factor objects cannot
contain NA
s, at least for now.
Factor(x, levels, index=NULL, ...) # constructor function
Factor(x, levels, index=NULL, ...) # constructor function
x , levels
|
At least one of When When |
index |
|
... |
Optional metadata columns. |
There are 4 different ways to use the Factor()
constructor function:
Factor(x, levels)
(i.e. index
is missing):
In this case match(x, levels)
is used internally to encode
x
as a Factor object. An error is returned if some elements
in x
cannot be matched to levels
so it's important to
make sure that all the elements in x
are represented in
levels
when doing Factor(x, levels)
.
Factor(x)
(i.e. levels
and index
are missing):
This is equivalent to Factor(x, levels=unique(x))
.
Factor(levels=levels, index=index)
(i.e. x
is missing):
In this case the encoding of the Factor object is supplied via
index
, that is, index
must be an integer (or numeric)
vector of valid positive indices (no NA
s) into levels
.
This is the most efficient way to construct a Factor object.
Factor(levels=levels)
(i.e. x
and index
are
missing): This is a convenient way to construct a 0-length Factor
object with the specified levels. In other words, it's equivalent
to Factor(levels=levels, index=integer(0))
.
A Factor object.
Factor objects support the same set of accessors as ordinary factors. That is:
length(x)
to get the length of Factor object x
.
names(x)
and names(x) <- value
to get and set the
names of Factor object x
.
levels(x)
and levels(x) <- value
to get and set the
levels of Factor object x
.
nlevels(x)
to get the number of levels of Factor
object x
.
as.integer(x)
to get the encoding of Factor object x
.
Note that length(as.integer(x))
and
names(as.integer(x))
are the same as length(x)
and names(x)
, respectively.
In addition, because Factor objects are Vector derivatives, they
support the mcols()
and metadata()
getters and setters.
unfactor(x)
can be used to decode Factor object x
.
It returns an object of the same class as levels(x)
and same length
as x
. Note that it is the analog of as.character()
on ordinary
factors, with the notable difference that unfactor(x)
propagates the
names on x
.
For convenience, unfactor(x)
also works on ordinary factor x
.
unfactor()
supports extra arguments use.names
and
ignore.mcols
to control whether the names and metadata columns
on the Factor object to decode should be propagated or not.
By default they are propagated, that is, the default values for
use.names
and ignore.mcols
are TRUE
and
FALSE
, respectively.
From vector or Vector to Factor: coercion of a vector-like object x
to Factor is supported via as(x, "Factor")
and is equivalent to
Factor(x)
. There are 2 IMPORTANT EXCEPTIONS to this:
If x
is an ordinary factor, as(x, "Factor")
returns
a Factor with the same levels, encoding, and names, as x
.
Note that after coercing an ordinary factor to Factor, going back
to factor again (with as.factor()
) restores the original
object with no loss.
If x
is a Factor object, as(x, "Factor")
is either
a no-op (when x
is a Factor instance), or a
demotion to Factor (when x
is a Factor derivative like
GRangesFactor).
From Factor to integer: as.integer(x)
is supported on Factor object
x
and returns its encoding (see Accessors section above).
From Factor to factor: as.factor(x)
is supported on Factor object
x
and returns an ordinary factor where the levels are
as.character(levels(x))
.
From Factor to character: as.character(x)
is supported on Factor
object x
and is equivalent to unfactor(as.factor(x))
, which
is also equivalent to as.character(unfactor(x))
.
A Factor object can be subsetted with [
, like an ordinary factor.
2 or more Factor objects can be concatenated with c()
.
Note that, unlike with ordinary factors, c()
on Factor objects
preserves the class i.e. it returns a Factor object. In other words,
c()
acts as an endomorphism on Factor objects.
The levels of c(x, y)
are obtained by appending to levels(x)
the levels in levels(y)
that are "new" i.e. that are not already
in levels(x)
.
append()
, which is implemented on top of c()
, also works
on Factor objects.
Factor objects support comparing (e.g. ==
, !=
, <=
,
<
, match()
) and ordering (e.g. order()
, sort()
,
rank()
) operations. All these operations behave like they would
on the unfactored versions of their operands.
For example F1 <= F2
, match(F1, F2)
, and sort(F1)
,
are equivalent to unfactor(F1) <= unfactor(F2)
,
match(unfactor(F1), unfactor(F2))
, and sort(unfactor(F1))
,
respectively.
Hervé Pagès, with contributions from Aaron Lun
factor in base R.
GRangesFactor objects in the GenomicRanges package.
IRanges objects in the IRanges package.
Vector objects for the parent class.
anyDuplicated
in the BiocGenerics
package.
showClass("Factor") # Factor extends Vector ## --------------------------------------------------------------------- ## CONSTRUCTOR & ACCESSORS ## --------------------------------------------------------------------- library(IRanges) set.seed(123) ir0 <- IRanges(sample(5, 8, replace=TRUE), width=10, names=letters[1:8], ID=paste0("ID", 1:8)) ## Use explicit levels: ir1 <- IRanges(1:6, width=10) F1 <- Factor(ir0, levels=ir1) F1 length(F1) names(F1) levels(F1) # ir1 nlevels(F1) as.integer(F1) # encoding ## If we don't specify the levels, they'll be set to unique(ir0): F2 <- Factor(ir0) F2 length(F2) names(F2) levels(F2) # unique(ir0) nlevels(F2) as.integer(F2) ## --------------------------------------------------------------------- ## DECODING ## --------------------------------------------------------------------- unfactor(F1) stopifnot(identical(ir0, unfactor(F1))) stopifnot(identical(ir0, unfactor(F2))) unfactor(F1, use.names=FALSE) unfactor(F1, ignore.mcols=TRUE) ## --------------------------------------------------------------------- ## COERCION ## --------------------------------------------------------------------- F2b <- as(ir0, "Factor") # same as Factor(ir0) stopifnot(identical(F2, F2b)) as.factor(F2) as.factor(F1) as.character(F1) # same as unfactor(as.factor(F1)), # and also same as as.character(unfactor(F1)) ## On an ordinary factor 'f', 'as(f, "Factor")' and 'Factor(f)' are ## NOT the same: f <- factor(sample(letters, 500, replace=TRUE), levels=letters) as(f, "Factor") # same levels as 'f' Factor(f) # levels **are** 'f'! stopifnot(identical(f, as.factor(as(f, "Factor")))) ## --------------------------------------------------------------------- ## CONCATENATION ## --------------------------------------------------------------------- ir3 <- IRanges(c(5, 2, 8:6), width=10) F3 <- Factor(levels=ir3, index=2:4) F13 <- c(F1, F3) F13 levels(F13) stopifnot(identical(c(unfactor(F1), unfactor(F3)), unfactor(F13))) ## --------------------------------------------------------------------- ## COMPARING & ORDERING ## --------------------------------------------------------------------- F1 == F2 # same as unfactor(F1) == unfactor(F2) order(F1) # same as order(unfactor(F1)) order(F2) # same as order(unfactor(F2)) ## The levels of the Factor influence the order of the table: table(F1) table(F2)
showClass("Factor") # Factor extends Vector ## --------------------------------------------------------------------- ## CONSTRUCTOR & ACCESSORS ## --------------------------------------------------------------------- library(IRanges) set.seed(123) ir0 <- IRanges(sample(5, 8, replace=TRUE), width=10, names=letters[1:8], ID=paste0("ID", 1:8)) ## Use explicit levels: ir1 <- IRanges(1:6, width=10) F1 <- Factor(ir0, levels=ir1) F1 length(F1) names(F1) levels(F1) # ir1 nlevels(F1) as.integer(F1) # encoding ## If we don't specify the levels, they'll be set to unique(ir0): F2 <- Factor(ir0) F2 length(F2) names(F2) levels(F2) # unique(ir0) nlevels(F2) as.integer(F2) ## --------------------------------------------------------------------- ## DECODING ## --------------------------------------------------------------------- unfactor(F1) stopifnot(identical(ir0, unfactor(F1))) stopifnot(identical(ir0, unfactor(F2))) unfactor(F1, use.names=FALSE) unfactor(F1, ignore.mcols=TRUE) ## --------------------------------------------------------------------- ## COERCION ## --------------------------------------------------------------------- F2b <- as(ir0, "Factor") # same as Factor(ir0) stopifnot(identical(F2, F2b)) as.factor(F2) as.factor(F1) as.character(F1) # same as unfactor(as.factor(F1)), # and also same as as.character(unfactor(F1)) ## On an ordinary factor 'f', 'as(f, "Factor")' and 'Factor(f)' are ## NOT the same: f <- factor(sample(letters, 500, replace=TRUE), levels=letters) as(f, "Factor") # same levels as 'f' Factor(f) # levels **are** 'f'! stopifnot(identical(f, as.factor(as(f, "Factor")))) ## --------------------------------------------------------------------- ## CONCATENATION ## --------------------------------------------------------------------- ir3 <- IRanges(c(5, 2, 8:6), width=10) F3 <- Factor(levels=ir3, index=2:4) F13 <- c(F1, F3) F13 levels(F13) stopifnot(identical(c(unfactor(F1), unfactor(F3)), unfactor(F13))) ## --------------------------------------------------------------------- ## COMPARING & ORDERING ## --------------------------------------------------------------------- F1 == F2 # same as unfactor(F1) == unfactor(F2) order(F1) # same as order(unfactor(F1)) order(F2) # same as order(unfactor(F2)) ## The levels of the Factor influence the order of the table: table(F1) table(F2)
A FilterMatrix
object is a matrix meant for storing
the logical output of a set of FilterRules
, where
each rule corresponds to a column. The FilterRules
are stored
within the FilterMatrix
object, for the sake of
provenance. In general, a FilterMatrix
behaves like an
ordinary matrix
.
In the code snippets below, x
is a FilterMatrix
object.
filterRules(x)
: Get the FilterRules
corresponding to the columns of the matrix.
FilterMatrix(matrix, filterRules)
: Constructs a
FilterMatrix
, from a given matrix
and
filterRules
. Not usually called by the user, see
evalSeparately
.
summary(object, discarded = FALSE, percent = FALSE)
:Returns a numeric vector containing the total number of records
(nrow
), the number passed by each filter, and the number of
records that passed every filter. If discarded
is
TRUE
, then the numbers are inverted (i.e., the values are
subtracted from the number of rows). If percent
is
TRUE
, then the numbers are percent of total.
Michael Lawrence
evalSeparately
is the typical way to generate this
object.
FilterRules objects.
A FilterRules
object is a collection of filter
rules, which can be either expression
or function
objects. Rules can be disabled/enabled individually, facilitating
experimenting with different combinations of filters.
It is common to split a dataset into subsets during data
analysis. When data is large, however, representing subsets (e.g. by
logical vectors) and storing them as copies might become too costly in
terms of space. The FilterRules
class represents
subsets as lightweight expression
and/or function
objects. Subsets can then be calculated when needed (on the fly). This
avoids copying and storing a large number of subsets. Although it
might take longer to frequently recalculate a subset, it often is a
relatively fast operation and the space savings tend to be more than
worth it when data is large.
Rules may be either expressions or functions. Evaluating an expression
or invoking a function should result in a logical vector. Expressions
are often more convenient, but functions (i.e. closures) are generally
safer and more powerful, because the user can specify the enclosing
environment. If a rule is an expression, it is evaluated inside the
envir
argument to the eval
method (see below). If a
function, it is invoked with envir
as its only
argument. See examples.
In the code snippets below, x
is a FilterRules
object.
active(x)
: Get the logical vector of length
length(x)
, where TRUE
for an element indicates that
the corresponding rule in x
is active (and inactive
otherwise). Note that names(active(x))
is equal to
names(x)
.
active(x) <- value
: Replace the active state of the
filter rules. If value
is a logical vector, it should be of
length length(x)
and indicate which rules are
active. Otherwise, it can be either numeric or character vector, in which
case it sets the indicated rules (after dropping NA's) to active and
all others to inactive. See examples.
FilterRules(exprs = list(), ..., active = TRUE)
:Constructs a FilterRules
with the rules given in the list
exprs
or in ...
. The initial active state of the rules
is given by active
, which is recycled as
necessary. Elements in exprs
may be either character (parsed
into an expression), a language object (coerced to an expression), an
expression, or a function that takes at least one
argument. IMPORTANTLY, all arguments in ...
are
quote()
'd and then coerced to an expression. So,
for example, character data is only parsed if it is a literal.
The names of the filters are taken from the names of
exprs
and ...
, if given. Otherwise,
the character vectors take themselves as their name and the
others are deparsed (before any coercion). Thus, it is recommended
to always specify meaningful names. In any case, the names
are made valid and unique.
In the code snippets below, x
is a FilterRules
object.
x[i]
: Subsets the filter rules using the
same interface as for Vector
.
x[[i]]
: Extracts an expression or function via the same
interface as for List
.
x[[i]] <- value
: The same interface as for
List
. The default active state for new
rules is TRUE
.
In the code snippets below, x
is a FilterRules
object.
x & y
: Appends the rules in y
to the rules in
x
.
c(x, ..., recursive = FALSE)
: Concatenates the
FilterRule
instances in ...
onto the end of x
.
append(x, values, after = length(x))
: Appends the
values
FilterRules
instance onto x
at the
index given by after
.
eval(expr, envir = parent.frame(),
enclos = if (is.list(envir) || is.pairlist(envir))
parent.frame() else baseenv())
:Evaluates a FilterRules
instance (passed as the
expr
argument). Expression rules are
evaluated in envir
, while function rules are invoked with
envir
as their only argument. The evaluation of a rule
should yield a logical vector. The results from the rule
evaluations are combined via the AND operation (i.e. &
) so
that a single logical vector is returned from eval
.
evalSeparately(expr, envir = parent.frame(), enclos = if
(is.list(envir) || is.pairlist(envir)) parent.frame() else
baseenv())
: Evaluates separately each rule in a
FilterRules
instance (passed as the expr
argument). Expression rules are evaluated in envir
, while
function rules are invoked with envir
as their only
argument. The evaluation of a rule should yield a logical
vector. The results from the rule evaluations are combined into
a logical matrix, with a column for each rule. This is
essentially the parallel evaluator, while eval
is the
serial evaluator.
subsetByFilter(x, filter)
: Evaluates filter
on
x
and uses the result to subset x
. The result
contains only the elements in x
for which filter
evaluates to TRUE
.
summary(object, subject)
:Returns an integer vector with the number of elements
in subject
that pass each rule in object
, along with
a count of the elements that pass all filters.
When a closure (function) is included as a filter in a
FilterRules
object, it is converted to a FilterClosure
,
which is currently nothing more than a marker class that extends
function
. When a FilterClosure
filter is extracted,
there are some accessors and utilities for manipulating it:
params
:Gets a named list of the objects that are present in the enclosing environment (without inheritance). This assumes that a filter is constructed via a constructor function, and the objects in the frame of the constructor (typically, the formal arguments) are the parameters of the filter.
Michael Lawrence
FilterMatrix objects for storing the logical output of a set of FilterRules objects.
## constructing a FilterRules instance ## an empty set of filters filters <- FilterRules() ## as a simple character vector filts <- c("peaks", "promoters") filters <- FilterRules(filts) active(filters) # all TRUE ## with functions and expressions filts <- list(peaks = expression(peaks), promoters = expression(promoters), find_eboxes = function(rd) rep(FALSE, nrow(rd))) filters <- FilterRules(filts, active = FALSE) active(filters) # all FALSE ## direct, quoted args (character literal parsed) filters <- FilterRules(under_peaks = peaks, in_promoters = "promoters") filts <- list(under_peaks = expression(peaks), in_promoters = expression(promoters)) ## specify both exprs and additional args filters <- FilterRules(filts, diffexp = de) filts <- c("promoters", "peaks", "introns") filters <- FilterRules(filts) ## evaluation df <- DataFrame(peaks = c(TRUE, TRUE, FALSE, FALSE), promoters = c(TRUE, FALSE, FALSE, TRUE), introns = c(TRUE, FALSE, FALSE, FALSE)) eval(filters, df) fm <- evalSeparately(filters, df) identical(filterRules(fm), filters) summary(fm) summary(fm, percent = TRUE) fm <- evalSeparately(filters, df, serial = TRUE) ## set the active state directly active(filters) <- FALSE # all FALSE active(filters) <- TRUE # all TRUE active(filters) <- c(FALSE, FALSE, TRUE) active(filters)["promoters"] <- TRUE # use a filter name ## toggle the active state by name or index active(filters) <- c(NA, 2) # NA's are dropped active(filters) <- c("peaks", NA)
## constructing a FilterRules instance ## an empty set of filters filters <- FilterRules() ## as a simple character vector filts <- c("peaks", "promoters") filters <- FilterRules(filts) active(filters) # all TRUE ## with functions and expressions filts <- list(peaks = expression(peaks), promoters = expression(promoters), find_eboxes = function(rd) rep(FALSE, nrow(rd))) filters <- FilterRules(filts, active = FALSE) active(filters) # all FALSE ## direct, quoted args (character literal parsed) filters <- FilterRules(under_peaks = peaks, in_promoters = "promoters") filts <- list(under_peaks = expression(peaks), in_promoters = expression(promoters)) ## specify both exprs and additional args filters <- FilterRules(filts, diffexp = de) filts <- c("promoters", "peaks", "introns") filters <- FilterRules(filts) ## evaluation df <- DataFrame(peaks = c(TRUE, TRUE, FALSE, FALSE), promoters = c(TRUE, FALSE, FALSE, TRUE), introns = c(TRUE, FALSE, FALSE, FALSE)) eval(filters, df) fm <- evalSeparately(filters, df) identical(filterRules(fm), filters) summary(fm) summary(fm, percent = TRUE) fm <- evalSeparately(filters, df, serial = TRUE) ## set the active state directly active(filters) <- FALSE # all FALSE active(filters) <- TRUE # all TRUE active(filters) <- c(FALSE, FALSE, TRUE) active(filters)["promoters"] <- TRUE # use a filter name ## toggle the active state by name or index active(filters) <- c(NA, 2) # NA's are dropped active(filters) <- c("peaks", NA)
The Hits class is a container for representing a set of hits between a set of left nodes and a set of right nodes. Note that only the hits are stored in the object. No information about the left or right nodes is stored, except their number.
For example, the findOverlaps
function, defined
and documented in the IRanges package, returns the hits between
the query
and subject
arguments in a Hits
object.
## Constructor functions Hits(from=integer(0), to=integer(0), nLnode=0L, nRnode=0L, ..., sort.by.query=FALSE) SelfHits(from=integer(0), to=integer(0), nnode=0L, ..., sort.by.query=FALSE)
## Constructor functions Hits(from=integer(0), to=integer(0), nLnode=0L, nRnode=0L, ..., sort.by.query=FALSE) SelfHits(from=integer(0), to=integer(0), nnode=0L, ..., sort.by.query=FALSE)
from , to
|
2 integer vectors of the same length.
The values in |
nLnode , nRnode
|
Number of left and right nodes. |
... |
Metadata columns to set on the Hits object. All the metadata columns must
be vector-like objects of the same length as |
sort.by.query |
Should the hits in the returned object be sorted by query? If yes, then a SortedByQueryHits object is returned (SortedByQueryHits is a subclass of Hits). |
nnode |
Number of nodes. |
In the code snippets below, x
is a Hits object.
length(x)
:get the number of hits
from(x)
: Equivalent to as.data.frame(x)[[1]]
.
to(x)
: Equivalent to as.data.frame(x)[[2]]
.
nLnode(x)
, nrow(x)
:get the number of left nodes
nRnode(x)
, ncol(x)
:get the number of right nodes
countLnodeHits(x)
:Counts the number of hits for each left node, returning an integer vector.
countRnodeHits(x)
:Counts the number of hits for each right node, returning an integer vector.
The following accessors are just aliases for the above accessors:
queryHits(x)
: alias for from(x)
.
subjectHits(x)
: alias for to(x)
.
queryLength(x)
: alias for nLnode(x)
.
subjectLength(x)
: alias for nRnode(x)
.
countQueryHits(x)
: alias for countLnodeHits(x)
.
countSubjectHits(x)
: alias for countRnodeHits(x)
.
In the code snippets below, x
is a Hits object.
as.matrix(x)
: Coerces x
to a two
column integer matrix, with each row representing a hit
between a left node (first column) and a right node (second
column).
as.table(x)
: Counts the number of hits for
each left node in x
and outputs the counts as a table
.
as(x, "DataFrame")
: Creates a DataFrame by
combining the result of as.matrix(x)
with mcols(x)
.
as.data.frame(x)
: Attempts to coerce the result of
as(x, "DataFrame")
to a data.frame
.
In the code snippets below, x
is a Hits object.
x[i]
:Return a new Hits object made of the elements selected by i
.
x[i, j]
:Like the above, but allow the user to conveniently subset the metadata
columns thru j
.
x[i] <- value
:Replacement version of x[i]
.
See ?`[`
in this package (the S4Vectors
package) for more information about subsetting Vector derivatives and
for an important note about the x[i, j]
form.
c(x, ..., ignore.mcols=FALSE)
:Concatenate Hits object x
and the Hits objects in
...
together.
See ?c
in this package (the S4Vectors
package) for more information about concatenating Vector derivatives.
In the code snippets below, x
is a Hits object.
t(x)
:Transpose x
by interchanging the left and right nodes. This
allows, for example, counting the number of hits for each right node
using as.table
.
remapHits(x, Lnodes.remapping=NULL, new.nLnode=NA,
Rnodes.remapping=NULL, new.nRnode=NA)
:Only supports SortedByQueryHits objects at the moment.
Remaps the left and/or right nodes in x
. The left nodes are
remapped thru the map specified via the Lnodes.remapping
and
new.nLnode
arguments. The right nodes are remapped thru the
map specified via the Rnodes.remapping
and new.nRnode
arguments.
Lnodes.remapping
must represent a function defined on the
1..M interval that takes values in the 1..N interval, where N is
nLnode(x)
and M is the value specified by the user via the
new.nLnode
argument. Note that this mapping function doesn't
need to be injective or surjective. Also it is not represented by an R
function but by an integer vector of length M with no NAs. More precisely
Lnodes.remapping
can be NULL (identity map), or a vector of
nLnode(x)
non-NA integers that are >= 1 and
<= new.nLnode
, or a factor of length nLnode(x)
with no NAs (a factor is treated as an integer vector, and, if missing,
new.nLnode
is taken to be its number of levels). Note that
a factor will typically be used to represent a mapping function that is
not injective.
The same applies to the Rnodes.remapping
.
remapHits
returns a Hits object where from(x)
and
to(x)
have been remapped thru the 2 specified maps. This
remapping is actually only the 1st step of the transformation, and is
followed by 2 additional steps: (2) the removal of duplicated hits,
and (3) the reordering of the hits (first by query hits, then by subject
hits). Note that if the 2 maps are injective then the remapping won't
introduce duplicated hits, so, in that case, step (2) is a no-op (but
is still performed). Also if the "query map" is strictly ascending and
the "subject map" ascending then the remapping will preserve the order
of the hits, so, in that case, step (3) is also a no-op (but is still
performed).
breakTies(x, method=c("first", "last"), rank)
: Restrict the
hits so that every left node maps to at most one right node. If
method
is “first”, for each left node, select the
edge with the first (lowest rank) right node, if any. If
method
is “last”, select the edge with the last
(highest rank) right node. If rank
is not missing, it
should be a formula specifying an alternative ranking according to
its terms (see rank
).
A SelfHits object is a Hits object where the left and right nodes are
identical. For a SelfHits object x
, nLnode(x)
is equal to
nRnode(x)
. The object can be seen as an oriented graph where
nLnode
is the nb of nodes and the hits are the (oriented) edges.
SelfHits objects support the same set of accessors as Hits objects
plus the nnode()
accessor that is equivalent to nLnode()
and nRnode()
.
We also provide two little utilities to operate on a SelfHits object
x
:
isSelfHit(x)
: A self hit is an edge from a node to
itself. isSelfHit(x)
returns a logical vector parallel
to x
indicating which elements in x
are self hits.
isRedundantHit(x)
: When there is more than 1 edge between
2 given nodes (regardless of orientation), the extra edges are considered
to be redundant hits. isRedundantHit(x)
returns a logical
vector parallel to x
indicating which elements in x
are redundant hits.
Michael Lawrence and Hervé Pagès
Hits-comparison for comparing and ordering hits.
The findOverlaps
function in the
IRanges package which returns SortedByQueryHits object.
Hits-examples in the IRanges package, for some examples of Hits object basic manipulation.
setops-methods in the IRanges package, for set operations on Hits objects.
from <- c(5, 2, 3, 3, 3, 2) to <- c(11, 15, 5, 4, 5, 11) id <- letters[1:6] Hits(from, to, 7, 15, id) Hits(from, to, 7, 15, id, sort.by.query=TRUE) ## --------------------------------------------------------------------- ## selectHits() ## --------------------------------------------------------------------- x <- c("a", "b", "a", "c", "d") table <- c("a", "e", "d", "a", "a", "d") hits <- findMatches(x, table) # sorts the hits by query hits selectHits(hits, select="all") # no-op selectHits(hits, select="first") selectHits(hits, select="first", nodup=TRUE) selectHits(hits, select="last") selectHits(hits, select="last", nodup=TRUE) selectHits(hits, select="arbitrary") selectHits(hits, select="count") ## --------------------------------------------------------------------- ## remapHits() ## --------------------------------------------------------------------- Lnodes.remapping <- factor(c(a="A", b="B", c="C", d="D")[x], levels=LETTERS[1:4]) remapHits(hits, Lnodes.remapping=Lnodes.remapping) ## See ?`Hits-examples` in the IRanges package for more examples of basic ## manipulation of Hits objects. ## --------------------------------------------------------------------- ## SelfHits objects ## --------------------------------------------------------------------- hits2 <- SelfHits(c(2, 3, 3, 3, 3, 3, 4, 4, 4), c(4, 3, 2:4, 2, 2:3, 2), 4) ## Hits 2 and 4 are self hits (from 3rd node to itself): which(isSelfHit(hits2)) ## Hits 4, 6, 7, 8, and 9, are redundant hits: which(isRedundantHit(hits2)) hits3 <- findMatches(x) hits3[!isSelfHit(hits3)] hits3[!(isSelfHit(hits3) | isRedundantHit(hits3))]
from <- c(5, 2, 3, 3, 3, 2) to <- c(11, 15, 5, 4, 5, 11) id <- letters[1:6] Hits(from, to, 7, 15, id) Hits(from, to, 7, 15, id, sort.by.query=TRUE) ## --------------------------------------------------------------------- ## selectHits() ## --------------------------------------------------------------------- x <- c("a", "b", "a", "c", "d") table <- c("a", "e", "d", "a", "a", "d") hits <- findMatches(x, table) # sorts the hits by query hits selectHits(hits, select="all") # no-op selectHits(hits, select="first") selectHits(hits, select="first", nodup=TRUE) selectHits(hits, select="last") selectHits(hits, select="last", nodup=TRUE) selectHits(hits, select="arbitrary") selectHits(hits, select="count") ## --------------------------------------------------------------------- ## remapHits() ## --------------------------------------------------------------------- Lnodes.remapping <- factor(c(a="A", b="B", c="C", d="D")[x], levels=LETTERS[1:4]) remapHits(hits, Lnodes.remapping=Lnodes.remapping) ## See ?`Hits-examples` in the IRanges package for more examples of basic ## manipulation of Hits objects. ## --------------------------------------------------------------------- ## SelfHits objects ## --------------------------------------------------------------------- hits2 <- SelfHits(c(2, 3, 3, 3, 3, 3, 4, 4, 4), c(4, 3, 2:4, 2, 2:3, 2), 4) ## Hits 2 and 4 are self hits (from 3rd node to itself): which(isSelfHit(hits2)) ## Hits 4, 6, 7, 8, and 9, are redundant hits: which(isRedundantHit(hits2)) hits3 <- findMatches(x) hits3[!isSelfHit(hits3)] hits3[!(isSelfHit(hits3) | isRedundantHit(hits3))]
==
, !=
, <=
, >=
, <
, >
,
match()
, %in%
, order()
, sort()
, and
rank()
can be used on Hits objects to compare and order hits.
Note that only the "pcompare"
, "match"
, and "order"
methods are actually defined for Hits objects. This is all what is
needed to make all the other comparing and ordering operations (i.e.
==
, !=
, <=
, >=
, <
, >
,
%in%
, sort()
, and rank()
) work on these objects
(see ?`Vector-comparison`
for more information about this).
## S4 method for signature 'Hits,Hits' pcompare(x, y) ## S4 method for signature 'Hits,Hits' match(x, table, nomatch=NA_integer_, incomparables=NULL, method=c("auto", "quick", "hash")) ## S4 method for signature 'Hits' order(..., na.last=TRUE, decreasing=FALSE, method=c("auto", "shell", "radix"))
## S4 method for signature 'Hits,Hits' pcompare(x, y) ## S4 method for signature 'Hits,Hits' match(x, table, nomatch=NA_integer_, incomparables=NULL, method=c("auto", "quick", "hash")) ## S4 method for signature 'Hits' order(..., na.last=TRUE, decreasing=FALSE, method=c("auto", "shell", "radix"))
x , y , table
|
Compatible Hits objects, that is, Hits objects with the same subject and query lengths. |
nomatch |
The value to be returned in the case when no match is found.
It is coerced to an |
incomparables |
Not supported. |
method |
For For |
... |
One or more Hits objects. The additional Hits objects are used to break ties. |
na.last |
Ignored. |
decreasing |
|
Only hits that belong to Hits objects with same subject and query lengths can be compared.
Hits are ordered by query hit first, and then by subject hit.
On a Hits object, order
, sort
, and rank
are consistent with this order.
pcompare(x, y)
:Performs element-wise (aka "parallel") comparison of 2 Hits
objects x
and y
, that is, returns an integer vector where
the i-th element is less than, equal to, or greater than zero if
x[i]
is considered to be respectively less than, equal to, or
greater than y[i]
. See ?`Vector-comparison`
for
how x
or y
is recycled when the 2 objects don't have the
same length.
match(x, table, nomatch=NA_integer_, method=c("auto", "quick", "hash"))
:Returns an integer vector of the length of x
, containing the
index of the first matching hit in table
(or nomatch
if
there is no matching hit) for each hit in x
.
order(...)
:Returns a permutation which rearranges its first argument (a Hits object) into ascending order, breaking ties by further arguments (also Hits objects).
Hervé Pagès
Hits objects.
Vector-comparison for general information about comparing, ordering, and tabulating vector-like objects.
## --------------------------------------------------------------------- ## A. ELEMENT-WISE (AKA "PARALLEL") COMPARISON OF 2 Hits OBJECTS ## --------------------------------------------------------------------- hits <- Hits(c(2, 4, 4, 4, 5, 5), c(3, 1, 3, 2, 3, 2), 6, 3) hits pcompare(hits, hits[3]) pcompare(hits[3], hits) hits == hits[3] hits != hits[3] hits >= hits[3] hits < hits[3] ## --------------------------------------------------------------------- ## B. match(), %in% ## --------------------------------------------------------------------- table <- hits[-c(1, 3)] match(hits, table) hits %in% table ## --------------------------------------------------------------------- ## C. order(), sort(), rank() ## --------------------------------------------------------------------- order(hits) sort(hits) rank(hits)
## --------------------------------------------------------------------- ## A. ELEMENT-WISE (AKA "PARALLEL") COMPARISON OF 2 Hits OBJECTS ## --------------------------------------------------------------------- hits <- Hits(c(2, 4, 4, 4, 5, 5), c(3, 1, 3, 2, 3, 2), 6, 3) hits pcompare(hits, hits[3]) pcompare(hits[3], hits) hits == hits[3] hits != hits[3] hits >= hits[3] hits < hits[3] ## --------------------------------------------------------------------- ## B. match(), %in% ## --------------------------------------------------------------------- table <- hits[-c(1, 3)] match(hits, table) hits %in% table ## --------------------------------------------------------------------- ## C. order(), sort(), rank() ## --------------------------------------------------------------------- order(hits) sort(hits) rank(hits)
Perform set operations on Hits objects.
union(x, y)
, intersect(x, y)
, setdiff(x, y)
, and
setequal(x, y)
work on Hits objects x
and y
only if the objects are compatible Hits objects, that is, if they
have the same subject and query lengths. These operations return respectively
the union, intersection, (asymmetric!) difference, and equality of the
sets of hits in x
and y
.
union
returns a Hits object obtained by appending to x
the hits in y
that are not already in x
.
intersect
returns a Hits object obtained by keeping only
the hits in x
that are also in y
.
setdiff
returns a Hits object obtained by dropping from
x
the hits that are in y
.
setequal
returns TRUE
if x
and y
contain the
same sets of hits and FALSE
otherwise.
union
, intersect
, and setdiff
propagate the names and
metadata columns of their first argument (x
).
Hervé Pagès and Michael Lawrence
Hits objects.
Hits-comparison for comparing and ordering hits.
BiocGenerics::union
,
BiocGenerics::intersect
,
and BiocGenerics::setdiff
in the BiocGenerics package for general information about
these generic functions.
x <- Hits(c(2, 4, 4, 4, 5, 5), c(3, 1, 3, 2, 3, 2), 6, 3, score=11:16) x y <- Hits(c(1, 3, 4, 4, 5, 5, 5), c(3, 3, 2, 1, 2, 1, 3), 6, 3, score=21:27) y union(x, y) union(y, x) # same hits as in union(x, y), but in different order intersect(x, y) intersect(y, x) # same hits as in intersect(x, y), but in # different order setdiff(x, y) setdiff(y, x) setequal(x, y)
x <- Hits(c(2, 4, 4, 4, 5, 5), c(3, 1, 3, 2, 3, 2), 6, 3, score=11:16) x y <- Hits(c(1, 3, 4, 4, 5, 5, 5), c(3, 3, 2, 1, 2, 1, 3), 6, 3, score=21:27) y union(x, y) union(y, x) # same hits as in union(x, y), but in different order intersect(x, y) intersect(y, x) # same hits as in intersect(x, y), but in # different order setdiff(x, y) setdiff(y, x) setequal(x, y)
The HitsList class stores a set of Hits objects. It's typically
used to represent the result of findOverlaps
on
two IntegerRangesList objects.
Roughly the same set of utilities are provided for HitsList as for Hits:
The as.matrix
method coerces a HitsList object in a
similar way to Hits, except a column is prepended that indicates
which space (or element in the query IntegerRangesList)
to which the row corresponds.
The as.table
method flattens or unlists the list, counts the
number of hits for each query range and outputs the counts as a
table
, which has the same shape as from a single Hits
object.
To transpose a HitsList object x
, so that the subject
and query in each space are interchanged, call t(x)
. This
allows, for example, counting the number of hits for each subject
element using as.table
.
queryHits(x)
: Equivalent to
unname(as.matrix(x)[,1])
.
subjectHits(x)
: Equivalent to
unname(as.matrix(x)[,2])
.
space(x)
: gets the character vector naming the space
in the query IntegerRangesList for each hit,
or NULL
if the query did not have any names.
In the code snippets below, x
is a HitsList object.
as.matrix(x)
: calls as.matrix
on each
Hits, combines them row-wise and offsets the
indices so that they are aligned with the result of calling
unlist
on the query and subject.
as.table(x)
: counts the number of hits for each
query element in x
and outputs the counts as a table
,
which is aligned with the result of calling unlist
on the query.
t(x)
: Interchange the query and subject in each space
of x
, returns a transposed HitsList object.
This class is highly experimental. It has not been well tested and may disappear at any time.
Michael Lawrence
findOverlaps
in the IRanges package,
which returns a HitsList object when the query and subject are
IntegerRangesList objects.
hits <- Hits(rep(1:20, each=5), 100:1, 20, 100) hlist <- splitAsList(hits, 1:5) hlist hlist[[1]] hlist[[2]] ## Some sanity checks: hits1 <- Hits(c(4, 4, 15, 15), c(1, 2, 3, 4), 20, 4) hits2 <- Hits(c(4, 4, 15, 15), c(1, 2, 3, 4), 20, 4, sort.by.query=TRUE) fA <- c(1, 1, 2, 2) hlist1A <- split(hits1, fA) hlist2A <- split(hits2, fA) stopifnot(identical(as(hlist1A, "SortedByQueryHitsList"), hlist2A)) stopifnot(identical(hlist1A, as(hlist2A, "HitsList"))) fB <- c(1, 2, 1, 2) hlist1B <- split(hits1, fB) hlist2B <- split(hits2, fB) stopifnot(identical(as(hlist1B, "SortedByQueryHitsList"), hlist2B)) stopifnot(identical(hlist1B, as(hlist2B, "HitsList")))
hits <- Hits(rep(1:20, each=5), 100:1, 20, 100) hlist <- splitAsList(hits, 1:5) hlist hlist[[1]] hlist[[2]] ## Some sanity checks: hits1 <- Hits(c(4, 4, 15, 15), c(1, 2, 3, 4), 20, 4) hits2 <- Hits(c(4, 4, 15, 15), c(1, 2, 3, 4), 20, 4, sort.by.query=TRUE) fA <- c(1, 1, 2, 2) hlist1A <- split(hits1, fA) hlist2A <- split(hits2, fA) stopifnot(identical(as(hlist1A, "SortedByQueryHitsList"), hlist2A)) stopifnot(identical(hlist1A, as(hlist2A, "HitsList"))) fB <- c(1, 2, 1, 2) hlist1B <- split(hits1, fB) hlist2B <- split(hits2, fB) stopifnot(identical(as(hlist1B, "SortedByQueryHitsList"), hlist2B)) stopifnot(identical(hlist1B, as(hlist2B, "HitsList")))
Some low-level utility functions to operate on ordinary integer vectors.
isSequence(x, of.length=length(x)) toListOfIntegerVectors(x, sep=",") ## more to come...
isSequence(x, of.length=length(x)) toListOfIntegerVectors(x, sep=",") ## more to come...
x |
For For |
of.length |
The expected length of the integer sequence. |
sep |
The separator represented as a single-letter string. |
isSequence()
returns TRUE
or FALSE
depending
on whether x
is identical to seq_len(of.length)
or not.
toListOfIntegerVectors()
is a fast and memory-efficient
implementation of
lapply(strsplit(x, sep, fixed=TRUE), as.integer)
but, unlike the above code, it will raise an error if the input contains NAs or strings that don't represent integer values.
A list parallel to x
where each list element is an integer
vector.
Hervé Pagès
## --------------------------------------------------------------------- ## isSequence() ## --------------------------------------------------------------------- isSequence(1:5) # TRUE isSequence(5:1) # FALSE isSequence(0:5) # FALSE isSequence(integer(0)) # TRUE isSequence(1:5, of.length=5) # TRUE (the expected length) isSequence(1:5, of.length=6) # FALSE (not the expected length) ## --------------------------------------------------------------------- ## toListOfIntegerVectors() ## --------------------------------------------------------------------- x <- c("1116,0,-19", " +55291 , 2476,", "19184,4269,5659,6470,6721,7469,14601", "7778889, 426900, -4833,5659,6470,6721,7096", "19184 , -99999") y <- toListOfIntegerVectors(x) y ## When it doesn't choke on an NA or string that doesn't represent ## an integer value, toListOfIntegerVectors() is equivalent to ## the function below but is faster and more memory-efficient: toListOfIntegerVectors2 <- function(x, sep=",") { lapply(strsplit(x, sep, fixed=TRUE), as.integer) } y2 <- toListOfIntegerVectors2(x) stopifnot(identical(y, y2))
## --------------------------------------------------------------------- ## isSequence() ## --------------------------------------------------------------------- isSequence(1:5) # TRUE isSequence(5:1) # FALSE isSequence(0:5) # FALSE isSequence(integer(0)) # TRUE isSequence(1:5, of.length=5) # TRUE (the expected length) isSequence(1:5, of.length=6) # FALSE (not the expected length) ## --------------------------------------------------------------------- ## toListOfIntegerVectors() ## --------------------------------------------------------------------- x <- c("1116,0,-19", " +55291 , 2476,", "19184,4269,5659,6470,6721,7469,14601", "7778889, 426900, -4833,5659,6470,6721,7096", "19184 , -99999") y <- toListOfIntegerVectors(x) y ## When it doesn't choke on an NA or string that doesn't represent ## an integer value, toListOfIntegerVectors() is equivalent to ## the function below but is faster and more memory-efficient: toListOfIntegerVectors2 <- function(x, sep=",") { lapply(strsplit(x, sep, fixed=TRUE), as.integer) } y2 <- toListOfIntegerVectors2(x) stopifnot(identical(y, y2))
isSorted
and isStrictlySorted
test if a vector-like object
is sorted or strictly sorted, respectively.
isConstant
tests if a vector-like or array-like object is constant.
Currently only isConstant
methods for vectors or arrays of type
integer or double are implemented.
isSorted(x) isStrictlySorted(x) isConstant(x)
isSorted(x) isStrictlySorted(x) isConstant(x)
x |
A vector-like object. Can also be an array-like object for
|
Vector-like objects of length 0 or 1 are always considered to be sorted, strictly sorted, and constant.
Strictly sorted and constant objects are particular cases of sorted objects.
isStrictlySorted(x)
is equivalent to
isSorted(x) && !anyDuplicated(x)
A single logical i.e. TRUE
, FALSE
or NA
.
Hervé Pagès
duplicated
and unique
.
## --------------------------------------------------------------------- ## A. isSorted() and isStrictlySorted() ## --------------------------------------------------------------------- x <- 1:10 isSorted(x) # TRUE isSorted(-x) # FALSE isSorted(rev(x)) # FALSE isSorted(-rev(x)) # TRUE isStrictlySorted(x) # TRUE x2 <- rep(x, each=2) isSorted(x2) # TRUE isStrictlySorted(x2) # FALSE ## --------------------------------------------------------------------- ## B. "isConstant" METHOD FOR integer VECTORS ## --------------------------------------------------------------------- ## On a vector with no NAs: stopifnot(isConstant(rep(-29L, 10000))) ## On a vector with NAs: stopifnot(!isConstant(c(0L, NA, -29L))) stopifnot(is.na(isConstant(c(-29L, -29L, NA)))) ## On a vector of length <= 1: stopifnot(isConstant(NA_integer_)) ## --------------------------------------------------------------------- ## C. "isConstant" METHOD FOR numeric VECTORS ## --------------------------------------------------------------------- ## This method does its best to handle rounding errors and special ## values NA, NaN, Inf and -Inf in a way that "makes sense". ## Below we only illustrate handling of rounding errors. ## Here values in 'x' are "conceptually" the same: x <- c(11/3, 2/3 + 4/3 + 5/3, 50 + 11/3 - 50, 7.00001 - 1000003/300000) ## However, due to machine rounding errors, they are not *strictly* ## equal: duplicated(x) unique(x) ## only *nearly* equal: all.equal(x, rep(11/3, 4)) # TRUE ## 'isConstant(x)' uses 'all.equal()' internally to decide whether ## the values in 'x' are all the same or not: stopifnot(isConstant(x)) ## This is not perfect though: isConstant((x - 11/3) * 1e8) # FALSE on Intel Pentium paltforms # (but this is highly machine dependent!)
## --------------------------------------------------------------------- ## A. isSorted() and isStrictlySorted() ## --------------------------------------------------------------------- x <- 1:10 isSorted(x) # TRUE isSorted(-x) # FALSE isSorted(rev(x)) # FALSE isSorted(-rev(x)) # TRUE isStrictlySorted(x) # TRUE x2 <- rep(x, each=2) isSorted(x2) # TRUE isStrictlySorted(x2) # FALSE ## --------------------------------------------------------------------- ## B. "isConstant" METHOD FOR integer VECTORS ## --------------------------------------------------------------------- ## On a vector with no NAs: stopifnot(isConstant(rep(-29L, 10000))) ## On a vector with NAs: stopifnot(!isConstant(c(0L, NA, -29L))) stopifnot(is.na(isConstant(c(-29L, -29L, NA)))) ## On a vector of length <= 1: stopifnot(isConstant(NA_integer_)) ## --------------------------------------------------------------------- ## C. "isConstant" METHOD FOR numeric VECTORS ## --------------------------------------------------------------------- ## This method does its best to handle rounding errors and special ## values NA, NaN, Inf and -Inf in a way that "makes sense". ## Below we only illustrate handling of rounding errors. ## Here values in 'x' are "conceptually" the same: x <- c(11/3, 2/3 + 4/3 + 5/3, 50 + 11/3 - 50, 7.00001 - 1000003/300000) ## However, due to machine rounding errors, they are not *strictly* ## equal: duplicated(x) unique(x) ## only *nearly* equal: all.equal(x, rep(11/3, 4)) # TRUE ## 'isConstant(x)' uses 'all.equal()' internally to decide whether ## the values in 'x' are all the same or not: stopifnot(isConstant(x)) ## This is not perfect though: isConstant((x - 11/3) * 1e8) # FALSE on Intel Pentium paltforms # (but this is highly machine dependent!)
List objects are Vector objects with a "[["
,
elementType
and elementNROWS
method.
The List class serves a similar role as list in base R.
It adds one slot, the elementType
slot, to the two slots shared by
all Vector objects.
The elementType
slot is the preferred location for List
subclasses to store the type of data represented in the sequence. It is
designed to take a character of length 1 representing the class of the
sequence elements. While the List class performs no validity checking
based on elementType
, if a subclass expects elements to be of a
given type, that subclass is expected to perform the necessary validity
checking. For example, the subclass IntegerList (defined
in the IRanges package) has elementType = "integer"
and its
validity method checks if this condition is TRUE.
To be functional, a class that inherits from List must define at least
a "[["
method (in addition to the minimum set of Vector
methods).
List objects and derivatives are typically constructed using one of the following methods:
Many constructor functions are provided in S4Vectors and other
Bioconductor packages for List objects and derivatives e.g.
List()
, IntegerList()
,
RleList()
,
IntegerRangesList()
,
GRangesList()
, etc...
Which one to use depends on the particular type of List derivative
one wishes to construct e.g. use IntegerList()
to get an IntegerList object,
RleList()
to get an RleList
object, etc...
Note that the name of a constructor function is always the name of a valid class. See the man page of a particular constructor function for the details.
Many coercion methods are defined in S4Vectors and other Bioconductor packages to turn all kinds of objects into List objects.
One general and convenient way to convert any vector-like object
x
into a List is to call as(x, "List")
. This will
yield an object from a subclass of List. Note that this subclass
will typically extend CompressedList but not necessarily
(see ?CompressedList
in the IRanges
package for more information about CompressedList
objects).
However, if a specific type of List derivative is desired (e.g. CompressedGRangesList), then coercing explicitly to that class is preferrable as it is more robust and more readable.
splitAsList()
, relist()
,
or extractList()
splitAsList()
behaves like base::split()
except that it returns a List derivative instead of an ordinary
list. See ?splitAsList
for more information.
The relist()
methods for List objects and
derivatives, as well as the extractList()
function, are defined in the IRanges package.
They provide very efficient ways to construct a List derivative from
the vector-like object passed to their first argument (flesh
for relist()
and x
for extractList()
).
See ?extractList
in the IRanges
package for more information.
In the following code snippets, x
is a List object.
length(x)
:Get the number of list elements in x
.
names(x)
, names(x) <- value
:Get or set the names of the elements in the List.
mcols(x, use.names=FALSE)
, mcols(x) <- value
:Get or set the metadata columns. See Vector man page for more information.
elementType(x)
:Get the scalar string naming the class from which all elements must derive.
elementNROWS(x)
:Get the length (or nb of row for a matrix-like object) of each of
the elements. Equivalent to sapply(x, NROW)
.
isEmpty(x)
:Returns a logical indicating either if the sequence has no elements or if all its elements are empty.
To List.
as(x, "List")
: Converts a vector-like object into a
List, usually a CompressedList derivative.
One notable exception is when x
is an ordinary list,
in which case as(x, "List")
returns a SimpleList
derivative.
To explicitly request a SimpleList derivative, call
as(x, "SimpleList")
.
See ?CompressedList
(you might need to load
the IRanges package first) and ?SimpleList
for
more information about the CompressedList and SimpleList
representations.
From List. In the code snippets below, x
is a List object.
as.list(x, ...)
, as(from, "list")
:Turns x
into an ordinary list.
unlist(x, recursive=TRUE, use.names=TRUE)
:Concatenates the elements of x
into a single vector-like
object (of class elementType(x)
).
as.data.frame(x, row.names=NULL, optional=FALSE ,
value.name="value", use.outer.mcols=FALSE,
group_name.as.factor=FALSE, ...)
:Coerces a List
to a data.frame
. The result has the
same length as unlisted x
with two additional columns,
group
and group_name
. group
is an integer
that indicates which list element the record came from.
group_name
holds the list name associated with each
record; value is character
by default and factor
when
group_name.as.factor
is TRUE.
When use.outer.mcols
is TRUE the metadata columns on the
outer list elements of x
are replicated out and included
in the data.frame
. List objects that unlist to a
single vector (column) are given the column name 'value' by default.
A custom name can be provided in value.name
.
Splitting values in the resulting data.frame
by the original
groups in x
should be done using the group
column as
the f
argument to splitAsList
. To relist data, use
x
as the skeleton
argument to relist
.
In the code snippets below, x
is a List object.
x[i]
:Return a new List object made of the list elements selected by
subscript i
. Subscript i
can be of any type supported
by subsetting of a Vector object (see Vector man page for the
details), plus the following types: IntegerList,
LogicalList, CharacterList,
integer-RleList, logical-RleList,
character-RleList, and IntegerRangesList.
Those additional types perform subsetting within the list elements
rather than across them.
x[i] <- value
:Replacement version of x[i]
.
x[[i]]
:Return the selected list element i
, where i
is an
numeric or character vector of length 1.
x[[i]] <- value
:Replacement version of x[[i]]
.
x$name
, x$name <- value
:Similar to x[[name]]
and x[[name]] <- value
, but
name
is taken literally as an element name.
P. Aboyoun and H. Pagès
splitAsList for splitting a vector-like object into a List object.
relist and extractList in the IRanges package for efficiently constructing a List derivative from a vector-like object.
List-utils for common operations on List objects.
Vector objects for the parent class.
The SimpleList class for a direct extension of the List class.
The CompressedList class defined in the IRanges package for another direct extension of the List class.
The IntegerList, RleList, and IRanges classes and constructors defined in the IRanges package for some examples of List derivatives.
showClass("List") # shows only the known subclasses define in this package ## --------------------------------------------------------------------- ## A. CONSTRUCTION ## --------------------------------------------------------------------- x <- sample(500, 20) y0 <- splitAsList(x, x %% 4) y0 levels <- paste0("G", 1:10) f1 <- factor(sample(levels, length(x), replace=TRUE), levels=levels) y1 <- splitAsList(x, f1) y1 f2 <- factor(sample(levels, 26, replace=TRUE), levels=levels) y2 <- splitAsList(letters, f2) y2 library(IRanges) # for the NumericList() constructor and the # coercion to CompressedCharacterList NumericList(A=runif(10), B=NULL, C=runif(3)) ## Another way to obtain 'splitAsList(letters, f2)' but using ## 'splitAsList()' should be preferred as it is a lot more efficient: y2b <- as(split(letters, f2), "CompressedCharacterList") # inefficient! stopifnot(identical(y2, y2b)) ## --------------------------------------------------------------------- ## B. SUBSETTING ## --------------------------------------------------------------------- ## Single-bracket and double-bracket subsetting behave like on ordinary ## lists: y1[c(10, 1, 2, 2)] y1[c(-10, -1, -2)] y1[c(TRUE, FALSE)] y1[c("G8", "G1")] head(y1) tail(y1, n=3) y1[[2]] # note the difference with y1[2] y1[["G2"]] # note the difference with y1["G2"] y0[["3"]] y0[[3]] ## In addition to all the forms of subscripting supported by ordinary ## lists, List objects and derivatives accept a subscript that is a ## list-like object. This form of subsetting is called "list-style ## subsetting": i <- list(4:3, -2, 1) # ordinary list y1[i] i <- y1 >= 200 # LogicalList object y1[i] ## List-style subsetting also works with an RleList or IntegerRangesList ## subscript: i <- RleList(y1 >= 200) # RleList object y1[i] i <- IRangesList(RleList(y1 >= 200)) # IRangesList object y1[i] ## --------------------------------------------------------------------- ## C. THE "UNLIST -> TRANFORM -> RELIST" IDIOM ## --------------------------------------------------------------------- ## The "unlist -> transform -> relist" idiom is a very efficient way to ## apply the same simple transformation to all the **inner elements** of ## a list-like object (i.e. to all the elements of its list elements). ## The result is another list-like object with the same shape as the ## original object (but not necessarily the same class): relist(sqrt(unlist(y1)), y1) relist(toupper(unlist(y2)), y2) ## However note that sqrt(), toupper(), and many other base functions, ## can be used directly on a List derivative. This is because the IRanges ## package defines methods for these functions that know how to handle ## List objects: sqrt(y1) # same as 'relist(sqrt(unlist(y1)), y1)' toupper(y2) # same as 'relist(toupper(unlist(y2)), y2)'
showClass("List") # shows only the known subclasses define in this package ## --------------------------------------------------------------------- ## A. CONSTRUCTION ## --------------------------------------------------------------------- x <- sample(500, 20) y0 <- splitAsList(x, x %% 4) y0 levels <- paste0("G", 1:10) f1 <- factor(sample(levels, length(x), replace=TRUE), levels=levels) y1 <- splitAsList(x, f1) y1 f2 <- factor(sample(levels, 26, replace=TRUE), levels=levels) y2 <- splitAsList(letters, f2) y2 library(IRanges) # for the NumericList() constructor and the # coercion to CompressedCharacterList NumericList(A=runif(10), B=NULL, C=runif(3)) ## Another way to obtain 'splitAsList(letters, f2)' but using ## 'splitAsList()' should be preferred as it is a lot more efficient: y2b <- as(split(letters, f2), "CompressedCharacterList") # inefficient! stopifnot(identical(y2, y2b)) ## --------------------------------------------------------------------- ## B. SUBSETTING ## --------------------------------------------------------------------- ## Single-bracket and double-bracket subsetting behave like on ordinary ## lists: y1[c(10, 1, 2, 2)] y1[c(-10, -1, -2)] y1[c(TRUE, FALSE)] y1[c("G8", "G1")] head(y1) tail(y1, n=3) y1[[2]] # note the difference with y1[2] y1[["G2"]] # note the difference with y1["G2"] y0[["3"]] y0[[3]] ## In addition to all the forms of subscripting supported by ordinary ## lists, List objects and derivatives accept a subscript that is a ## list-like object. This form of subsetting is called "list-style ## subsetting": i <- list(4:3, -2, 1) # ordinary list y1[i] i <- y1 >= 200 # LogicalList object y1[i] ## List-style subsetting also works with an RleList or IntegerRangesList ## subscript: i <- RleList(y1 >= 200) # RleList object y1[i] i <- IRangesList(RleList(y1 >= 200)) # IRangesList object y1[i] ## --------------------------------------------------------------------- ## C. THE "UNLIST -> TRANFORM -> RELIST" IDIOM ## --------------------------------------------------------------------- ## The "unlist -> transform -> relist" idiom is a very efficient way to ## apply the same simple transformation to all the **inner elements** of ## a list-like object (i.e. to all the elements of its list elements). ## The result is another list-like object with the same shape as the ## original object (but not necessarily the same class): relist(sqrt(unlist(y1)), y1) relist(toupper(unlist(y2)), y2) ## However note that sqrt(), toupper(), and many other base functions, ## can be used directly on a List derivative. This is because the IRanges ## package defines methods for these functions that know how to handle ## List objects: sqrt(y1) # same as 'relist(sqrt(unlist(y1)), y1)' toupper(y2) # same as 'relist(toupper(unlist(y2)), y2)'
Various functions and methods for looping on List objects, functional programming on List objects, and evaluation of an expression in a List object.
## Looping on List objects: ## ------------------------ ## S4 method for signature 'List' lapply(X, FUN, ...) ## S4 method for signature 'List' sapply(X, FUN, ..., simplify=TRUE, USE.NAMES=TRUE) endoapply(X, FUN, ...) revElements(x, i) mendoapply(FUN, ..., MoreArgs=NULL) pc(...) ## Functional programming methods for List objects: ## ------------------------------------------------ ## S4 method for signature 'List' Reduce(f, x, init, right=FALSE, accumulate=FALSE) ## S4 method for signature 'List' Filter(f, x) ## S4 method for signature 'List' Find(f, x, right=FALSE, nomatch=NULL) ## S4 method for signature 'List' Map(f, ...) ## S4 method for signature 'List' Position(f, x, right=FALSE, nomatch=NA_integer_) ## Evaluation of an expression in a List object: ## --------------------------------------------- ## S4 method for signature 'List' within(data, expr, ...) ## Constructing list matrices: ## --------------------------------------------- ## S4 method for signature 'List' rbind(..., deparse.level=1L) ## S4 method for signature 'List' cbind(..., deparse.level=1L)
## Looping on List objects: ## ------------------------ ## S4 method for signature 'List' lapply(X, FUN, ...) ## S4 method for signature 'List' sapply(X, FUN, ..., simplify=TRUE, USE.NAMES=TRUE) endoapply(X, FUN, ...) revElements(x, i) mendoapply(FUN, ..., MoreArgs=NULL) pc(...) ## Functional programming methods for List objects: ## ------------------------------------------------ ## S4 method for signature 'List' Reduce(f, x, init, right=FALSE, accumulate=FALSE) ## S4 method for signature 'List' Filter(f, x) ## S4 method for signature 'List' Find(f, x, right=FALSE, nomatch=NULL) ## S4 method for signature 'List' Map(f, ...) ## S4 method for signature 'List' Position(f, x, right=FALSE, nomatch=NA_integer_) ## Evaluation of an expression in a List object: ## --------------------------------------------- ## S4 method for signature 'List' within(data, expr, ...) ## Constructing list matrices: ## --------------------------------------------- ## S4 method for signature 'List' rbind(..., deparse.level=1L) ## S4 method for signature 'List' cbind(..., deparse.level=1L)
X , x
|
A list, data.frame or List object. |
FUN |
The function to be applied to each element of |
... |
For For |
simplify , USE.NAMES
|
See |
MoreArgs |
A list of other arguments to |
i |
Index specifying the elements to replace. Can be anything supported
by |
f , init , right , accumulate , nomatch
|
See |
data |
A List object. |
expr |
Expression to evaluate. |
deparse.level |
See |
Like the standard lapply
function defined in the
base package, the lapply
method for List objects
returns a list of the same length as X
, with each element being
the result of applying FUN
to the corresponding element of X
.
Like the standard sapply
function defined in the
base package, the sapply
method for List objects
is a user-friendly version of lapply
by default returning a vector
or matrix if appropriate.
endoapply
and mendoapply
perform the endomorphic equivalents
of lapply
and mapply
by returning
objects of the same class as the inputs rather than an ordinary list.
revElements(x, i)
reverses the list elements in x
specified
by i
. It's equivalent to, but faster than, doing
x[i] <- endoapply(x[i], rev)
.
pc(...)
combine list-like objects by concatenating them in an
element-wise fashion. It's similar to, but faster than,
mapply(c, ..., SIMPLIFY=FALSE)
. With the following differences:
pc()
ignores the supplied objects that are NULL.
pc()
does not recycle its arguments. All the supplied
objects must have the same length.
If one of the supplied objects is a List object, then
pc()
returns a List object.
pc()
always returns a homogenous list or List object,
that is, an object where all the list elements have the same type.
The R base package defines some higher-order functions that are commonly
found in Functional Programming Languages.
See ?base::Reduce
for the details, and, in particular,
for a description of their arguments.
The S4Vectors package provides methods for List objects, so,
in addition to be an ordinary vector or list, the x
argument can
also be a List object.
within
evaluates expr
within as.env(data)
via
eval(data)
. Similar to with
, except assignments made
during evaluation are taken as assignments into data
, i.e.,
new symbols have their value appended to data
, and assigning
new values to existing symbols results in replacement.
There are methods for cbind
and rbind
that will bind
multiple lists together into a basic list matrix. The usual
geometric constraints apply. In the future, this might return a List
(+ dimensions), but for now the return value is an ordinary list.
endoapply
returns an object of the same class as X
,
each element of which is the result of applying FUN
to the
corresponding element of X
.
mendoapply
returns an object of the same class as the first
object specified in ...
, each element of which is the result
of applying FUN
to the corresponding elements of ...
.
pc
returns a list or List object of the same length as the
input objects.
See ?base::Reduce
for the value returned by the
functional programming methods.
See ?base::within
for the value returned by
within
.
cbind
and rbind
return a list matrix.
P. Aboyoun and H. Pagès
The List class.
base::lapply
and base::mapply
for the default lapply
and mapply
methods.
base::Reduce
for the default functional
programming methods.
base::within
for the default within
method.
base::cbind
and
base::rbind
for the default matrix binding
methods.
a <- data.frame(x = 1:10, y = rnorm(10)) b <- data.frame(x = 1:10, y = rnorm(10)) endoapply(a, function(x) (x - mean(x))/sd(x)) mendoapply(function(e1, e2) (e1 - mean(e1)) * (e2 - mean(e2)), a, b) x <- list(a=11:13, b=26:21, c=letters) y <- list(-(5:1), c("foo", "bar"), 0.25) pc(x, y) library(IRanges) x <- IntegerList(a=11:13, b=26:21, c=31:36, d=4:2) y <- NumericList(-(5:1), 1:2, numeric(0), 0.25) pc(x, y) Reduce("+", x) Filter(is.unsorted, x) pos1 <- Position(is.unsorted, x) stopifnot(identical(Find(is.unsorted, x), x[[pos1]])) pos2 <- Position(is.unsorted, x, right=TRUE) stopifnot(identical(Find(is.unsorted, x, right=TRUE), x[[pos2]])) y <- x * 1000L Map("c", x, y) rbind(x, y) cbind(x, y)
a <- data.frame(x = 1:10, y = rnorm(10)) b <- data.frame(x = 1:10, y = rnorm(10)) endoapply(a, function(x) (x - mean(x))/sd(x)) mendoapply(function(e1, e2) (e1 - mean(e1)) * (e2 - mean(e2)), a, b) x <- list(a=11:13, b=26:21, c=letters) y <- list(-(5:1), c("foo", "bar"), 0.25) pc(x, y) library(IRanges) x <- IntegerList(a=11:13, b=26:21, c=31:36, d=4:2) y <- NumericList(-(5:1), 1:2, numeric(0), 0.25) pc(x, y) Reduce("+", x) Filter(is.unsorted, x) pos1 <- Position(is.unsorted, x) stopifnot(identical(Find(is.unsorted, x), x[[pos1]])) pos2 <- Position(is.unsorted, x, right=TRUE) stopifnot(identical(Find(is.unsorted, x, right=TRUE), x[[pos2]])) y <- x * 1000L Map("c", x, y) rbind(x, y) cbind(x, y)
The LLint class is a container for storing a vector of large integers (i.e. long long int values at the C level).
LLint(length=0L) as.LLint(x) is.LLint(x)
LLint(length=0L) as.LLint(x) is.LLint(x)
length |
A non-negative number (i.e. integer, double, or LLint value) specifying the desired length. |
x |
Object to be coerced or tested. |
LLint vectors aim to provide the same functionality as integer vectors in base R but their values are stored as long long int values at the C level vs int values for integer vectors. Note that on Intel platforms long long int values are 64-bit and int values 32-bit only. Therefore LLint vectors can hold values in the +/-9.223e18 range (approximately) vs +/-2.147e9 only for integer vectors.
NAs are supported and the NA_LLint_
constant is predefined for
convenience as as(NA, "LLint")
.
Names are not supported for now.
Coercions from/to logical, integer, double, and character are supported.
Operations from the Arith
, Compare
and
Summary
groups are supported.
More operations coming soon...
Hervé Pagès
## A long long int uses 8 bytes (i.e. 64 bits) in C: .Machine$sizeof.longlong ## --------------------------------------------------------------------- ## SIMPLE EXAMPLES ## --------------------------------------------------------------------- LLint() LLint(10) as.LLint(3e9) as.LLint("3000000000") x <- as.LLint(1:10 * 111111111) x * x 5 * x # result as vector of doubles (i.e. 'x' coerced to double) 5L * x # result as LLint vector (i.e. 5L coerced to LLint vector) max(x) min(x) range(x) sum(x) x <- as.LLint(1:20) prod(x) x <- as.LLint(1:21) prod(x) # result is out of LLint range (+/-9.223e18) prod(as.numeric(x)) x <- as.LLint(1:75000) sum(x * x * x) == sum(x) * sum(x) ## Note that max(), min() and range() *always* return an LLint vector ## when called on an LLint vector, even when the vector is empty: max(LLint()) # NA with no warning min(LLint()) # NA with no warning ## This differs from how max(), min() and range() behave on an empty ## integer vector: max(integer()) # -Inf with a warning min(integer()) # Inf with a warning ## --------------------------------------------------------------------- ## GOING FROM STRINGS TO INTEGERS ## --------------------------------------------------------------------- ## as.integer() behaves like as.integer(as.double()) on a character ## vector. With the following consequence: s <- "-2.9999999999999999" as.integer(s) # -3 ## as.LLint() converts the string *directly* to LLint, without ## coercing to double first: as.LLint(s) # decimal part ignored ## --------------------------------------------------------------------- ## GOING FROM DOUBLE-PRECISION VALUES TO INTEGERS AND VICE-VERSA ## --------------------------------------------------------------------- ## Be aware that a double-precision value is not guaranteed to represent ## exactly an integer > 2^53. This can cause some surprises: 2^53 == 2^53 + 1 # TRUE, yep! ## And therefore: as.LLint(2^53) == as.LLint(2^53 + 1) # also TRUE ## This can be even more disturbing when passing a big literal integer ## value because the R parser will turn it into a double-precision value ## before passing it to as.LLint(): x1 <- as.LLint(9007199254740992) # same as as.LLint(2^53) x1 x2 <- as.LLint(9007199254740993) # same as as.LLint(2^53 + 1) x2 x1 == x2 # still TRUE ## However, no precision is lost if a string literal is used instead: x1 <- as.LLint("9007199254740992") x1 x2 <- as.LLint("9007199254740993") x2 x1 == x2 # FALSE x2 - x1 d1 <- as.double(x1) d2 <- as.double(x2) # warning! d1 == d2 # TRUE ## --------------------------------------------------------------------- ## LLint IS IMPLEMENTED AS AN S4 CLASS ## --------------------------------------------------------------------- class(LLint(10)) typeof(LLint(10)) # S4 storage.mode(LLint(10)) # S4 is.vector(LLint(10)) # FALSE is.atomic(LLint(10)) # FALSE ## This means that an LLint vector cannot go in an ordinary data ## frame: ## Not run: data.frame(id=as.LLint(1:5)) # error! ## End(Not run) ## A DataFrame needs to be used instead: DataFrame(id=as.LLint(1:5)) ## --------------------------------------------------------------------- ## SANITY CHECKS ## --------------------------------------------------------------------- x <- as.integer(c(0, 1, -1, -3, NA, -99)) y <- as.integer(c(-6, NA, -4:3, 0, 1999, 6:10, NA)) xx <- as.LLint(x) yy <- as.LLint(y) ## Operations from "Arith" group: stopifnot(identical(x + y, as.integer(xx + yy))) stopifnot(identical(as.LLint(y + x), yy + xx)) stopifnot(identical(x - y, as.integer(xx - yy))) stopifnot(identical(as.LLint(y - x), yy - xx)) stopifnot(identical(x * y, as.integer(xx * yy))) stopifnot(identical(as.LLint(y * x), yy * xx)) stopifnot(identical(x / y, xx / yy)) stopifnot(identical(y / x, yy / xx)) stopifnot(identical(x %/% y, as.integer(xx %/% yy))) stopifnot(identical(as.LLint(y %/% x), yy %/% xx)) stopifnot(identical(x %% y, as.integer(xx %% yy))) stopifnot(identical(as.LLint(y %% x), yy %% xx)) stopifnot(identical(x ^ y, xx ^ yy)) stopifnot(identical(y ^ x, yy ^ xx)) ## Operations from "Compare" group: stopifnot(identical(x == y, xx == yy)) stopifnot(identical(y == x, yy == xx)) stopifnot(identical(x != y, xx != yy)) stopifnot(identical(y != x, yy != xx)) stopifnot(identical(x <= y, xx <= yy)) stopifnot(identical(y <= x, yy <= xx)) stopifnot(identical(x >= y, xx >= yy)) stopifnot(identical(y >= x, yy >= xx)) stopifnot(identical(x < y, xx < yy)) stopifnot(identical(y < x, yy < xx)) stopifnot(identical(x > y, xx > yy)) stopifnot(identical(y > x, yy > xx)) ## Operations from "Summary" group: stopifnot(identical(max(y), as.integer(max(yy)))) stopifnot(identical(max(y, na.rm=TRUE), as.integer(max(yy, na.rm=TRUE)))) stopifnot(identical(min(y), as.integer(min(yy)))) stopifnot(identical(min(y, na.rm=TRUE), as.integer(min(yy, na.rm=TRUE)))) stopifnot(identical(range(y), as.integer(range(yy)))) stopifnot(identical(range(y, na.rm=TRUE), as.integer(range(yy, na.rm=TRUE)))) stopifnot(identical(sum(y), as.integer(sum(yy)))) stopifnot(identical(sum(y, na.rm=TRUE), as.integer(sum(yy, na.rm=TRUE)))) stopifnot(identical(prod(y), as.double(prod(yy)))) stopifnot(identical(prod(y, na.rm=TRUE), as.double(prod(yy, na.rm=TRUE))))
## A long long int uses 8 bytes (i.e. 64 bits) in C: .Machine$sizeof.longlong ## --------------------------------------------------------------------- ## SIMPLE EXAMPLES ## --------------------------------------------------------------------- LLint() LLint(10) as.LLint(3e9) as.LLint("3000000000") x <- as.LLint(1:10 * 111111111) x * x 5 * x # result as vector of doubles (i.e. 'x' coerced to double) 5L * x # result as LLint vector (i.e. 5L coerced to LLint vector) max(x) min(x) range(x) sum(x) x <- as.LLint(1:20) prod(x) x <- as.LLint(1:21) prod(x) # result is out of LLint range (+/-9.223e18) prod(as.numeric(x)) x <- as.LLint(1:75000) sum(x * x * x) == sum(x) * sum(x) ## Note that max(), min() and range() *always* return an LLint vector ## when called on an LLint vector, even when the vector is empty: max(LLint()) # NA with no warning min(LLint()) # NA with no warning ## This differs from how max(), min() and range() behave on an empty ## integer vector: max(integer()) # -Inf with a warning min(integer()) # Inf with a warning ## --------------------------------------------------------------------- ## GOING FROM STRINGS TO INTEGERS ## --------------------------------------------------------------------- ## as.integer() behaves like as.integer(as.double()) on a character ## vector. With the following consequence: s <- "-2.9999999999999999" as.integer(s) # -3 ## as.LLint() converts the string *directly* to LLint, without ## coercing to double first: as.LLint(s) # decimal part ignored ## --------------------------------------------------------------------- ## GOING FROM DOUBLE-PRECISION VALUES TO INTEGERS AND VICE-VERSA ## --------------------------------------------------------------------- ## Be aware that a double-precision value is not guaranteed to represent ## exactly an integer > 2^53. This can cause some surprises: 2^53 == 2^53 + 1 # TRUE, yep! ## And therefore: as.LLint(2^53) == as.LLint(2^53 + 1) # also TRUE ## This can be even more disturbing when passing a big literal integer ## value because the R parser will turn it into a double-precision value ## before passing it to as.LLint(): x1 <- as.LLint(9007199254740992) # same as as.LLint(2^53) x1 x2 <- as.LLint(9007199254740993) # same as as.LLint(2^53 + 1) x2 x1 == x2 # still TRUE ## However, no precision is lost if a string literal is used instead: x1 <- as.LLint("9007199254740992") x1 x2 <- as.LLint("9007199254740993") x2 x1 == x2 # FALSE x2 - x1 d1 <- as.double(x1) d2 <- as.double(x2) # warning! d1 == d2 # TRUE ## --------------------------------------------------------------------- ## LLint IS IMPLEMENTED AS AN S4 CLASS ## --------------------------------------------------------------------- class(LLint(10)) typeof(LLint(10)) # S4 storage.mode(LLint(10)) # S4 is.vector(LLint(10)) # FALSE is.atomic(LLint(10)) # FALSE ## This means that an LLint vector cannot go in an ordinary data ## frame: ## Not run: data.frame(id=as.LLint(1:5)) # error! ## End(Not run) ## A DataFrame needs to be used instead: DataFrame(id=as.LLint(1:5)) ## --------------------------------------------------------------------- ## SANITY CHECKS ## --------------------------------------------------------------------- x <- as.integer(c(0, 1, -1, -3, NA, -99)) y <- as.integer(c(-6, NA, -4:3, 0, 1999, 6:10, NA)) xx <- as.LLint(x) yy <- as.LLint(y) ## Operations from "Arith" group: stopifnot(identical(x + y, as.integer(xx + yy))) stopifnot(identical(as.LLint(y + x), yy + xx)) stopifnot(identical(x - y, as.integer(xx - yy))) stopifnot(identical(as.LLint(y - x), yy - xx)) stopifnot(identical(x * y, as.integer(xx * yy))) stopifnot(identical(as.LLint(y * x), yy * xx)) stopifnot(identical(x / y, xx / yy)) stopifnot(identical(y / x, yy / xx)) stopifnot(identical(x %/% y, as.integer(xx %/% yy))) stopifnot(identical(as.LLint(y %/% x), yy %/% xx)) stopifnot(identical(x %% y, as.integer(xx %% yy))) stopifnot(identical(as.LLint(y %% x), yy %% xx)) stopifnot(identical(x ^ y, xx ^ yy)) stopifnot(identical(y ^ x, yy ^ xx)) ## Operations from "Compare" group: stopifnot(identical(x == y, xx == yy)) stopifnot(identical(y == x, yy == xx)) stopifnot(identical(x != y, xx != yy)) stopifnot(identical(y != x, yy != xx)) stopifnot(identical(x <= y, xx <= yy)) stopifnot(identical(y <= x, yy <= xx)) stopifnot(identical(x >= y, xx >= yy)) stopifnot(identical(y >= x, yy >= xx)) stopifnot(identical(x < y, xx < yy)) stopifnot(identical(y < x, yy < xx)) stopifnot(identical(x > y, xx > yy)) stopifnot(identical(y > x, yy > xx)) ## Operations from "Summary" group: stopifnot(identical(max(y), as.integer(max(yy)))) stopifnot(identical(max(y, na.rm=TRUE), as.integer(max(yy, na.rm=TRUE)))) stopifnot(identical(min(y), as.integer(min(yy)))) stopifnot(identical(min(y, na.rm=TRUE), as.integer(min(yy, na.rm=TRUE)))) stopifnot(identical(range(y), as.integer(range(yy)))) stopifnot(identical(range(y, na.rm=TRUE), as.integer(range(yy, na.rm=TRUE)))) stopifnot(identical(sum(y), as.integer(sum(yy)))) stopifnot(identical(sum(y, na.rm=TRUE), as.integer(sum(yy, na.rm=TRUE)))) stopifnot(identical(prod(y), as.double(prod(yy)))) stopifnot(identical(prod(y, na.rm=TRUE), as.double(prod(yy, na.rm=TRUE))))
Pairs
is a Vector
that stores two parallel vectors (any
object that can be a column in a DataFrame
). It
provides conveniences for performing binary operations on the vectors,
as well as for converting between an equivalent list
representation. Virtually all of the typical R vector operations
should behave as expected.
A typical use case is representing the pairing from a
findOverlaps
call, for which
findOverlapPairs
is a shortcut.
Pairs(first, second, ..., names = NULL, hits = NULL)
:Constructs a Pairs object by aligning the vectors first
and
second
. The vectors must have the same length, unless
hits
is specified. Arguments in ...
are combined
as columns in the mcols
of the result. The names
argument specifies the names on the result. If hits
is not
NULL
, it should be a Hits
object that
collates the elements in first
and second
to produce
the corresponding pairs.
In the code snippets below, x
is a Pairs
object.
names(x)
, names(x) <- value
:get or set the names
first(x)
, first(x) <- value
:get or set the first member of each pair
second(x)
, second(x) <- value
:get or set the second member of each pair
zipup(x)
: Interleaves the Pairs
object
x
into a list, where each element is composed of a
pair. The type of list depends on the type of the elements.
zipdown(x)
: The inverse of zipup()
. Converts
x
, a list where every element is of length 2, to a
Pairs
object, by assuming that each element of the list
represents a pair.
In the code snippets below, x
is a Pairs
object.
x[i]
:Subset the Pairs object.
Michael Lawrence
Hits-class, a typical way to define a pairing.
findOverlapPairs
in the IRanges
package, which generates an instance of this class based on
overlaps.
setops-methods in the IRanges package, for set operations on Pairs objects.
p <- Pairs(1:10, Rle(1L, 10), score=rnorm(10), names=letters[1:10]) identical(first(p), 1:10) mcols(p)$score p as.data.frame(p) z <- zipup(p) first(p) <- Rle(1:10) identical(zipdown(z), p)
p <- Pairs(1:10, Rle(1L, 10), score=rnorm(10), names=letters[1:10]) identical(first(p), 1:10) mcols(p)$score p as.data.frame(p) z <- zipup(p) first(p) <- Rle(1:10) identical(zipdown(z), p)
RectangularData is a virtual class with no slots to be extended by classes that aim at representing objects with a 2D rectangular shape.
Some examples of RectangularData extensions are:
The DataFrame class defined in this package (S4Vectors).
The DelayedMatrix class defined in the DelayedArray package.
The SummarizedExperiment and Assays classes defined in the SummarizedExperiment package.
Any object that belongs to a class that extends RectangularData is called a RectangularData derivative.
Users should be able to access and manipulate RectangularData derivatives
via the standard 2D API defined in base R, that is, using things like
dim()
, nrow()
, ncol()
, dimnames()
, the 2D form
of [
(x[i, j]
), rbind()
, cbind()
, etc...
Not all RectangularData derivatives will necessarily support the full
2D API but they must support at least dim()
, nrow(x)
,
ncol(x)
, NROW(x)
, and NCOL(x)
. And of course,
dim()
must return an integer vector of length 2 on any of these
objects.
Developers who implement RectangularData extensions should also make
sure that they support low-level operations bindROWS()
and
bindCOLS()
.
In the following code snippets, x
is a RectangularData derivative.
Not all RectangularData derivatives will support all these accessors.
dim(x)
:Length two integer vector defined as c(nrow(x), ncol(x))
.
Must work on any RectangularData derivative.
nrow(x)
, ncol(x)
:Get the number of rows and columns, respectively. Must work on any RectangularData derivative.
NROW(x)
, NCOL(x)
:Same as nrow(x)
and ncol(x)
, respectively.
Must work on any RectangularData derivative.
dimnames(x)
:Length two list of character vectors defined as
list(rownames(x), colnames(x))
.
rownames(x)
, colnames(x)
:Get the names of the rows and columns, respectively.
In the code snippets below, x
is a RectangularData derivative.
x[i, j, drop=TRUE]
:Return a new RectangularData derivative of the same class as x
made of the selected rows and columns.
For single row and/or column selection, the drop
argument
specifies whether or not to "drop the dimensions" of the result.
More precisely, when drop=TRUE
(the default), a single
row or column is returned as a vector-like object (of length/NROW
equal to ncol(x)
if a single row, or equal to nrow(x)
if a single column).
Not all RectangularData derivatives support the drop
argument.
For example DataFrame and DelayedMatrix
objects support it (only for a single column
selection for DataFrame objects), but
SummarizedExperiment objects don't
(drop
is ignored for these objects and subsetting always
returns a SummarizedExperiment
derivative of the same class as x
).
head(x, n=6L)
:If n
is non-negative, returns the first n rows of the
RectangularData derivative.
If n
is negative, returns all but the last abs(n)
rows of the RectangularData derivative.
tail(x, n=6L)
:If n
is non-negative, returns the last n rows of the
RectangularData derivative.
If n
is negative, returns all but the first abs(n)
rows of the RectangularData derivative.
subset(x, subset, select, drop=FALSE)
:Return a new RectangularData derivative using:
logical expression indicating rows to keep, where missing values are taken as FALSE.
expression indicating columns to keep.
passed on to [
indexing operator.
In the code snippets below, all the input objects are expected to be RectangularData derivatives.
rbind(...)
:Creates a new RectangularData derivative by aggregating the rows of the input objects.
cbind(...)
:Creates a new RectangularData derivative by aggregating the columns of the input objects.
combineRows(x, ...)
: Creates a new RectangularData derivative
(of the same class as x
) by aggregating the rows of the input
objects. Unlike rbind()
, combineRows()
will handle cases
involving differences in the column names of the input objects by
adding the missing columns to them, and filling these
columns with NA
s. The column names of the returned object are a
union of the column names of the input objects.
Behaves like an endomorphism with respect to its first argument
i.e. returns an object of the same class as x
.
Finally note that this is a generic function with methods defined for DataFrame objects and other RectangularData derivatives.
combineCols(x, ..., use.names=TRUE)
: Creates a new
RectangularData derivative (of the same class as x
) by
aggregating the columns of the input objects. Unlike cbind()
,
combineCols()
will handle cases involving differences in the
number of rows of the input objects.
If use.names=TRUE
, all objects are expected to have
non-NULL
, non-duplicated row names. These row names do not have
to be the same, or even shared, across the input objects. Missing rows
in any individual input object are filled with NA
s, such
that the row names of the returned object are a union of the row names
of the input objects.
If use.names=FALSE
, all objects are expected to have the same
number of rows, and this function behaves the same as cbind()
.
The row names of the returned object is set to rownames(x)
.
Differences in the row names between input objects are ignored.
Behaves like an endomorphism with respect to its first argument
i.e. returns an object of the same class as x
.
Finally note that this is a generic function with methods defined for DataFrame objects and other RectangularData derivatives.
combineUniqueCols(x, ..., use.names=TRUE)
: Same as
combineCols()
, but this function will attempt to collapse
multiple columns with the same name across the input objects into a
single column in the output. This guarantees that the column names
in the output object are always unique. The only exception is for
unnamed columns, which are not collapsed. The function works on any
rectangular objects for which combineCols()
works.
When use.names=TRUE
, collapsing is only performed if the
duplicated column has identical values for the shared rows in the
input objects involved. Otherwise, the contents of the later
input object is simply ignored with a warning. Similarly, if
use.names=FALSE
, the duplicated columns must be identical
for all rows in the affected input objects.
Behaves like an endomorphism with respect to its first argument
i.e. returns an object of the same class as x
.
Finally note that this function is implemented on top of
combineCols()
and is expected to work on any RectangularData
derivatives for which combineCols()
works.
Hervé Pagès and Aaron Lun
DataFrame for a RectangularData extension that mimics
data.frame
objects from base R.
DataFrame-combine for combineRows()
,
combineCols()
, and combineUniqueCols()
examples
involving DataFrame objects.
data.frame objects in base R.
showClass("RectangularData") # shows (some of) the known subclasses
showClass("RectangularData") # shows (some of) the known subclasses
The Rle class is a general container for storing an atomic vector
that is stored in a run-length encoding format. It is based on the
rle
function from the base package.
Rle(values, lengths)
:This constructor creates an Rle instance out of an atomic
vector or factor object values
and an integer or numeric vector
lengths
with all positive elements that represent how many times
each value is repeated. The length of these two vectors must be the same.
lengths
can be missing in which case values
is turned
into an Rle.
In the code snippets below, x
is an Rle object:
runLength(x)
:Returns the run lengths for x
.
runValue(x)
:Returns the run values for x
.
nrun(x)
:Returns the number of runs in x
.
start(x)
:Returns the starts of the runs for x
.
end(x)
:Returns the ends of the runs for x
.
width(x)
:Same as runLength(x)
.
In the code snippets below, x
is an Rle object:
runLength(x) <- value
:Replaces x
with a new Rle object using run values
runValue(x)
and run lengths value
.
runValue(x) <- value
:Replaces x
with a new Rle object using run values
value
and run lengths runLength(x)
.
In the code snippets below, from
is an atomic vector:
as(from, "Rle")
:This coercion creates an Rle instances out of an atomic
vector from
.
In the code snippets below, x
and from
are Rle objects:
as.vector(x, mode="any")
, as(from, "vector")
:Creates an atomic vector based on the values contained in
x
. The vector will be coerced to the requested mode
,
unless mode
is "any", in which case the most appropriate
type is chosen.
as.factor(x)
, as(from, "factor")
: Creates a factor object
based on the values contained in x
.
as.data.frame(x)
, as(from, "data.frame")
: Creates
a data.frame
with a single column holding the result of
as.vector(x)
.
decode(x)
: Converts an Rle to its native form, such as an
atomic vector or factor. Calling decode
on a non-Rle will
return x
by default, so it is generally safe for ensuring
that an object is native.
In the code snippets below, x
is an Rle object:
x[i, drop=getOption("dropRle", default=FALSE)]
:Subsets x
by index i
, where i
can be positive
integers, negative integers, a logical vector of the same length as
x
, an Rle object of the same length as x
containing logical values, or an IRanges object.
When drop=FALSE
returns an Rle object. When drop=TRUE
,
returns an atomic vector.
x[i] <- value
:Replaces elements in x
specified by i
with corresponding
elements in value
. Supports the same types for i
as
x[i]
.
x %in% table
:Returns a logical Rle representing set membership in
table
.
c(x, ..., ignore.mcols=FALSE)
:Concatenate Rle object x
and the Rle objects in
...
together.
See ?c
in this package (the S4Vectors
package) for more information about concatenating Vector derivatives.
append(x, values, after = length(x))
:Insert one Rle into another Rle.
values
the Rle to insert.
after
the subscript in x
after which the values
are to be inserted.
findRun(x, vec)
:Returns an integer vector indicating the run indices in Rle vec
that are referenced by the indices in the integer vector x
.
head(x, n = 6L)
:If n
is non-negative, returns the first n elements of x
.
If n
is negative, returns all but the last abs(n)
elements
of x
.
is.na(x)
:Returns a logical Rle indicating which values are NA
.
is.finite(x)
:Returns a logical Rle indicating which values are finite.
is.unsorted(x, na.rm = FALSE, strictly = FALSE)
:Returns a logical value specifying if x
is unsorted.
na.rm
remove missing values from check.
strictly
check for _strictly_ increasing values.
length(x)
:Returns the underlying vector length of x
.
match(x, table, nomatch = NA_integer_, incomparables = NULL)
:Matches the values in x
to table
:
table
the values to be matched against.
nomatch
the value to be returned in the case when no match is found.
incomparables
a vector of values that cannot be matched.
Any value in x
matching a value in this vector is assigned
the nomatch
value.
rep(x, times, length.out, each)
, rep.int(x, times)
:Repeats the values in x
through one of the following conventions:
times
Vector giving the number of times to repeat each
element if of length length(x)
, or to repeat the whole vector
if of length 1.
length.out
Non-negative integer. The desired length of the output vector.
each
Non-negative integer. Each element of x
is
repeated each
times.
rev(x)
:Reverses the order of the values in x
.
show(object)
:Prints out the Rle object in a user-friendly way.
order(..., na.last=TRUE, decreasing=FALSE, method=c("auto", "shell", "radix"))
:Returns a permutation which rearranges its first argument
into ascending or descending order, breaking ties by further
arguments. See order
.
sort(x, decreasing=FALSE, na.last=NA)
:Sorts the values in x
.
decreasing
If TRUE
, sort values in decreasing
order. If FALSE
, sort values in increasing order.
na.last
If TRUE
, missing values are placed last.
If FALSE
, they are placed first. If NA
, they are
removed.
subset(x, subset)
:Returns a new Rle object made of the subset using logical vector
subset
.
table(...)
:Returns a table containing the counts of the unique values. Supported
arguments include useNA
with values of ‘no’ and ‘ifany’.
Multiple Rle's must be concatenated with c()
before calling
table
.
tabulate(bin, nbins = max(bin, 1L, na.rm = TRUE))
:Just like tabulate
, except optimized for Rle.
tail(x, n = 6L)
:If n
is non-negative, returns the last n elements of x
.
If n
is negative, returns all but the first abs(n)
elements
of x
.
unique(x, incomparables = FALSE, ...)
:Returns the unique run values. The incomparables
argument takes a
vector of values that cannot be compared with FALSE
being a special
value that means that all values can be compared.
In the code snippets below, x
and y
are Rle object or
some other vector-like object:
setdiff(x, y)
: Returns the unique elements in
x
that are not in y
.
union(x, y)
:Returns the unique elements in either x
or y
.
intersect(x, y)
:Returns the unique elements in both x
and y
.
P. Aboyoun
Rle-utils, Rle-runstat, and aggregate for more operations on Rle objects.
x <- Rle(10:1, 1:10) x runLength(x) runValue(x) nrun(x) diff(x) unique(x) sort(x) x[c(1,3,5,7,9)] x > 4 x2 <- Rle(LETTERS[c(21:26, 25:26)], 8:1) table(x2) y <- Rle(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,TRUE)) y as.vector(y) rep(y, 10) c(y, x > 5)
x <- Rle(10:1, 1:10) x runLength(x) runValue(x) nrun(x) diff(x) unique(x) sort(x) x[c(1,3,5,7,9)] x > 4 x2 <- Rle(LETTERS[c(21:26, 25:26)], 8:1) table(x2) y <- Rle(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,TRUE)) y as.vector(y) rep(y, 10) c(y, x > 5)
The runsum
, runmean
, runmed
, runwtsum
,
runq
functions calculate the sum, mean, median, weighted sum,
and order statistic for fixed width running windows.
runsum(x, k, endrule = c("drop", "constant"), ...) runmean(x, k, endrule = c("drop", "constant"), ...) ## S4 method for signature 'Rle' smoothEnds(y, k = 3) ## S4 method for signature 'Rle' runmed(x, k, endrule = c("median", "keep", "drop", "constant"), algorithm = NULL, print.level = 0) runwtsum(x, k, wt, endrule = c("drop", "constant"), ...) runq(x, k, i, endrule = c("drop", "constant"), ...)
runsum(x, k, endrule = c("drop", "constant"), ...) runmean(x, k, endrule = c("drop", "constant"), ...) ## S4 method for signature 'Rle' smoothEnds(y, k = 3) ## S4 method for signature 'Rle' runmed(x, k, endrule = c("median", "keep", "drop", "constant"), algorithm = NULL, print.level = 0) runwtsum(x, k, wt, endrule = c("drop", "constant"), ...) runq(x, k, i, endrule = c("drop", "constant"), ...)
x , y
|
The data object. |
k |
An integer indicating the fixed width of the running window. Must be
odd when |
endrule |
A character string indicating how the values at the beginning and the end (of the data) should be treated. |
wt |
A numeric vector of length |
i |
An integer in [0, k] indicating which order statistic to calculate. |
algorithm , print.level
|
See |
... |
Additional arguments passed to methods. Specifically,
|
The runsum
, runmean
, runmed
, runwtsum
,
and runq
functions provide efficient methods for calculating
the specified numeric summary by performing the looping in compiled code.
An object of the same class as x
.
P. Aboyoun and V. Obenchain
runmed
, Rle-class, RleList-class
x <- Rle(1:10, 1:10) runsum(x, k = 3) runsum(x, k = 3, endrule = "constant") runmean(x, k = 3) runwtsum(x, k = 3, wt = c(0.25, 0.5, 0.25)) runq(x, k = 5, i = 3, endrule = "constant") ## Missing and non-finite values x <- Rle(c(1, 2, NA, 0, 3, Inf, 4, NaN)) runsum(x, k = 2) runsum(x, k = 2, na.rm = TRUE) runmean(x, k = 2, na.rm = TRUE) runwtsum(x, k = 2, wt = c(0.25, 0.5), na.rm = TRUE) runq(x, k = 2, i = 2, na.rm = TRUE) ## max value in window ## The .naive_runsum() function demonstrates the semantics of ## runsum(). This test ensures the behavior is consistent with ## base::sum(). .naive_runsum <- function(x, k, na.rm=FALSE) sapply(0:(length(x)-k), function(offset) sum(x[1:k + offset], na.rm=na.rm)) x0 <- c(1, Inf, 3, 4, 5, NA) x <- Rle(x0) target1 <- .naive_runsum(x0, 3, na.rm = TRUE) target2 <- .naive_runsum(x, 3, na.rm = TRUE) stopifnot(target1 == target2) current <- as.vector(runsum(x, 3, na.rm = TRUE)) stopifnot(target1 == current) ## runmean() and runwtsum() : x <- Rle(c(2, 1, NA, 0, 1, -Inf)) runmean(x, k = 3) runmean(x, k = 3, na.rm = TRUE) runwtsum(x, k = 3, wt = c(0.25, 0.50, 0.25)) runwtsum(x, k = 3, wt = c(0.25, 0.50, 0.25), na.rm = TRUE) ## runq() : runq(x, k = 3, i = 1, na.rm = TRUE) ## smallest value in window runq(x, k = 3, i = 3, na.rm = TRUE) ## largest value in window ## When na.rm = TRUE, it is possible the number of non-NA ## values in the window will be less than the 'i' specified. ## Here we request the 4th smallest value in the window, ## which tranlates to the value at the 4/5 (0.8) percentile. x <- Rle(c(1, 2, 3, 4, 5)) runq(x, k=length(x), i=4, na.rm=TRUE) ## The same request on a Rle with two missing values ## finds the value at the 0.8 percentile of the vector ## at the new length of 3 after the NA's have been removed. ## This translates to round((0.8) * 3). x <- Rle(c(1, 2, 3, NA, NA)) runq(x, k=length(x), i=4, na.rm=TRUE)
x <- Rle(1:10, 1:10) runsum(x, k = 3) runsum(x, k = 3, endrule = "constant") runmean(x, k = 3) runwtsum(x, k = 3, wt = c(0.25, 0.5, 0.25)) runq(x, k = 5, i = 3, endrule = "constant") ## Missing and non-finite values x <- Rle(c(1, 2, NA, 0, 3, Inf, 4, NaN)) runsum(x, k = 2) runsum(x, k = 2, na.rm = TRUE) runmean(x, k = 2, na.rm = TRUE) runwtsum(x, k = 2, wt = c(0.25, 0.5), na.rm = TRUE) runq(x, k = 2, i = 2, na.rm = TRUE) ## max value in window ## The .naive_runsum() function demonstrates the semantics of ## runsum(). This test ensures the behavior is consistent with ## base::sum(). .naive_runsum <- function(x, k, na.rm=FALSE) sapply(0:(length(x)-k), function(offset) sum(x[1:k + offset], na.rm=na.rm)) x0 <- c(1, Inf, 3, 4, 5, NA) x <- Rle(x0) target1 <- .naive_runsum(x0, 3, na.rm = TRUE) target2 <- .naive_runsum(x, 3, na.rm = TRUE) stopifnot(target1 == target2) current <- as.vector(runsum(x, 3, na.rm = TRUE)) stopifnot(target1 == current) ## runmean() and runwtsum() : x <- Rle(c(2, 1, NA, 0, 1, -Inf)) runmean(x, k = 3) runmean(x, k = 3, na.rm = TRUE) runwtsum(x, k = 3, wt = c(0.25, 0.50, 0.25)) runwtsum(x, k = 3, wt = c(0.25, 0.50, 0.25), na.rm = TRUE) ## runq() : runq(x, k = 3, i = 1, na.rm = TRUE) ## smallest value in window runq(x, k = 3, i = 3, na.rm = TRUE) ## largest value in window ## When na.rm = TRUE, it is possible the number of non-NA ## values in the window will be less than the 'i' specified. ## Here we request the 4th smallest value in the window, ## which tranlates to the value at the 4/5 (0.8) percentile. x <- Rle(c(1, 2, 3, 4, 5)) runq(x, k=length(x), i=4, na.rm=TRUE) ## The same request on a Rle with two missing values ## finds the value at the 0.8 percentile of the vector ## at the new length of 3 after the NA's have been removed. ## This translates to round((0.8) * 3). x <- Rle(c(1, 2, 3, NA, NA)) runq(x, k=length(x), i=4, na.rm=TRUE)
Common operations on Rle objects.
Rle objects have support for S4 group generic functionality:
Arith
"+"
, "-"
, "*"
, "^"
,
"%%"
, "%/%"
, "/"
Compare
"=="
, ">"
, "<"
, "!="
,
"<="
, ">="
Logic
"&"
, "|"
Ops
"Arith"
, "Compare"
, "Logic"
Math
"abs"
, "sign"
, "sqrt"
,
"ceiling"
, "floor"
, "trunc"
, "cummax"
,
"cummin"
, "cumprod"
, "cumsum"
, "log"
,
"log10"
, "log2"
, "log1p"
, "acos"
,
"acosh"
, "asin"
, "asinh"
, "atan"
,
"atanh"
, "exp"
, "expm1"
, "cos"
,
"cosh"
, "sin"
, "sinh"
, "tan"
, "tanh"
,
"gamma"
, "lgamma"
, "digamma"
, "trigamma"
Math2
"round"
, "signif"
Summary
"max"
, "min"
, "range"
,
"prod"
, "sum"
, "any"
, "all"
Complex
"Arg"
, "Conj"
, "Im"
,
"Mod"
, "Re"
See S4groupGeneric for more details.
In the code snippets below, x
is an Rle object:
summary(object, ..., digits = max(3, getOption("digits") - 3))
:Summarizes the Rle object using an atomic vector convention. The
digits
argument is used for number formatting with
signif()
.
In the code snippets below, x
is an Rle object:
!x
:Returns logical negation (NOT) of x
.
which(x)
:Returns an integer vector representing the TRUE
indices of
x
.
In the code snippets below, x
is an Rle object:
diff(x, lag = 1, differences = 1
:Returns suitably lagged and iterated differences of x
.
lag
An integer indicating which lag to use.
differences
An integer indicating the order of the difference.
pmax(..., na.rm = FALSE)
, pmax.int(..., na.rm = FALSE)
:Parallel maxima of the Rle input values. Removes NA
s when
na.rm = TRUE
.
pmin(..., na.rm = FALSE)
, pmin.int(..., na.rm = FALSE)
:Parallel minima of the Rle input values. Removes NA
s when
na.rm = TRUE
.
which.max(x)
: Returns the index of the first element matching
the maximum value of x
.
mean(x, na.rm = FALSE)
:Calculates the mean of x
. Removes NA
s when
na.rm = TRUE
.
var(x, y = NULL, na.rm = FALSE)
:Calculates the variance of x
or covariance of x
and y
if both are supplied. Removes NA
s when na.rm = TRUE
.
cov(x, y, use = "everything")
, cor(x, y, use = "everything")
:Calculates the covariance and correlation respectively of Rle objects
x
and y
.
The use
argument is an optional character string giving a method for
computing covariances in the presence of missing values. This must be
(an abbreviation of) one of the strings "everything"
,
"all.obs"
, "complete.obs"
, "na.or.complete"
, or
"pairwise.complete.obs"
.
sd(x, na.rm = FALSE)
:Calculates the standard deviation of x
. Removes NA
s
when na.rm = TRUE
.
median(x, na.rm = FALSE)
:Calculates the median of x
. Removes NA
s when
na.rm = TRUE
.
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7, ...)
:Calculates the specified quantiles of x
.
probs
A numeric vector of probabilities with values in [0,1].
na.rm
If TRUE
, removes NA
s from x
before the quantiles are computed.
names
If TRUE
, the result has names describing the
quantiles.
type
An integer between 1 and 9 selecting one of the nine
quantile algorithms detailed in quantile
.
Further arguments passed to or from other methods.
mad(x, center = median(x), constant = 1.4826, na.rm = FALSE, low = FALSE, high = FALSE)
:Calculates the median absolute deviation of x
.
center
The center to calculate the deviation from.
constant
The scale factor.
na.rm
If TRUE
, removes NA
s from x
before the mad is computed.
low
If TRUE
, compute the 'lo-median'.
high
If TRUE
, compute the 'hi-median'.
IQR(x, na.rm = FALSE)
:Calculates the interquartile range of x
.
na.rm
If TRUE
, removes NA
s from x
before the IQR is computed.
smoothEnds(y, k = 3)
:Smooth end points of an Rle y
using subsequently smaller
medians and Tukey's end point rule at the very end.
k
An integer indicating the width of largest median window; must be odd.
In the code snippets below, x
is an Rle object:
nchar(x, type = "chars", allowNA = FALSE)
:Returns an integer Rle representing the number of characters in the
corresponding values of x
.
type
One of c("bytes", "chars", "width")
.
allowNA
Should NA
be returned for invalid multibyte
strings rather than throwing an error?
substr(x, start, stop)
, substring(text, first, last = 1000000L)
:Returns a character or factor Rle containing the specified substrings
beginning at start
/first
and ending at
stop
/last
.
chartr(old, new, x)
:Returns a character or factor Rle containing a translated version of
x
.
old
A character string specifying the characters to be translated.
new
A character string specifying the translations.
tolower(x)
:Returns a character or factor Rle containing a lower case version of
x
.
toupper(x)
:Returns a character or factor Rle containing an upper case version of
x
.
sub(pattern, replacement, x, ignore.case = FALSE,
perl = FALSE, fixed = FALSE, useBytes = FALSE)
:Returns a character or factor Rle containing replacements based on
matches determined by regular expression matching. See sub
for a description of the arguments.
gsub(pattern, replacement, x, ignore.case = FALSE,
perl = FALSE, fixed = FALSE, useBytes = FALSE)
:Returns a character or factor Rle containing replacements based on
matches determined by regular expression matching. See gsub
for a description of the arguments.
paste(..., sep = " ", collapse = NULL)
:Returns a character or factor Rle containing a concatenation of
the values in ...
.
In the code snippets below, x
is an Rle object:
levels(x)
, levels(x) <- value
:Gets and sets the factor levels, respectively.
nlevels(x)
:Returns the number of factor levels.
droplevels(x)
:Drops unused factor levels.
P. Aboyoun
Rle objects.
x <- Rle(10:1, 1:10) x sqrt(x) x^2 + 2 * x + 1 range(x) sum(x) mean(x) z <- c("the", "quick", "red", "fox", "jumps", "over", "the", "lazy", "brown", "dog") z <- Rle(z, seq_len(length(z))) chartr("a", "@", z) toupper(z)
x <- Rle(10:1, 1:10) x sqrt(x) x^2 + 2 * x + 1 range(x) sum(x) mean(x) z <- c("the", "quick", "red", "fox", "jumps", "over", "the", "lazy", "brown", "dog") z <- Rle(z, seq_len(length(z))) chartr("a", "@", z) toupper(z)
shiftApply
loops and applies a function overs subsequences
of vector-like objects X
and Y
.
shiftApply(SHIFT, X, Y, FUN, ..., OFFSET=0L, simplify=TRUE, verbose=FALSE)
shiftApply(SHIFT, X, Y, FUN, ..., OFFSET=0L, simplify=TRUE, verbose=FALSE)
SHIFT |
A non-negative integer vector of shift values. |
X , Y
|
The vector-like objects to shift. |
FUN |
The function, found via |
... |
Further arguments for |
OFFSET |
A non-negative integer offset to maintain throughout the shift operations. |
simplify |
A logical value specifying whether or not the result should be simplified to a vector or matrix if possible. |
verbose |
A logical value specifying whether or not to
print the |
Let i
be the indices in SHIFT
,
X_i = window(X, 1 + OFFSET, length(X) - SHIFT[i])
, and
Y_i = window(Y, 1 + SHIFT[i], length(Y) - OFFSET)
.
shiftApply
calculates the set of FUN(X_i, Y_i, ...)
values
and returns the results in a convenient form.
set.seed(0) lambda <- c(rep(0.001, 4500), seq(0.001, 10, length = 500), seq(10, 0.001, length = 500)) xRle <- Rle(rpois(1e7, lambda)) yRle <- Rle(rpois(1e7, lambda[c(251:length(lambda), 1:250)])) cor(xRle, yRle) shifts <- seq(235, 265, by=3) corrs <- shiftApply(shifts, yRle, xRle, FUN=cor) cor(xRle, yRle) shiftApply(249:251, yRle, xRle, FUN=function(x, y) var(x, y) / (sd(x) * sd(y)))
set.seed(0) lambda <- c(rep(0.001, 4500), seq(0.001, 10, length = 500), seq(10, 0.001, length = 500)) xRle <- Rle(rpois(1e7, lambda)) yRle <- Rle(rpois(1e7, lambda[c(251:length(lambda), 1:250)])) cor(xRle, yRle) shifts <- seq(235, 265, by=3) corrs <- shiftApply(shifts, yRle, xRle, FUN=cor) cor(xRle, yRle) shiftApply(249:251, yRle, xRle, FUN=function(x, y) var(x, y) / (sd(x) * sd(y)))
Low-level utilities that control display of vector-like objects.
get_showHeadLines() set_showHeadLines(n=5) get_showTailLines() set_showTailLines(n=5)
get_showHeadLines() set_showHeadLines(n=5) get_showTailLines() set_showTailLines(n=5)
n |
A non-negative integer that controls the number of vector elements to display. |
For the sake of keeping display compact, the show()
methods
for Vector derivatives only display 5 head and 5 tail vector
elements.
However, the number of head and tail elements to display can be changed
by setting global options showHeadLines
and showTailLines
to the desired values.
get_showHeadLines()
, set_showHeadLines()
,
get_showTailLines()
, and set_showTailLines()
are
convenience functions for getting/setting these global options.
get_showHeadLines()
and get_showTailLines()
return the
current showHeadLines
and showTailLines
values.
set_showHeadLines()
and set_showTailLines()
return the
showHeadLines
and showTailLines
values before the
change, invisibly.
library(IRanges) ir <- IRanges(start=11:45, width=10) ir # displays 5 head and 5 tail ranges set_showHeadLines(18) ir # displays 18 head ranges set_showHeadLines() # back to default
library(IRanges) ir <- IRanges(start=11:45, width=10) ir # displays 5 head and 5 tail ranges set_showHeadLines(18) ir # displays 18 head ranges set_showHeadLines() # back to default
The (non-virtual) SimpleList class extends the List virtual class.
The SimpleList class is the simplest, most generic concrete implementation of the List abstraction. It provides an implementation that subclasses can easily extend.
In a SimpleList object the list elements are stored internally in an ordinary list.
See the List man page for a quick overview of how to construct List objects in general.
The following constructor is provided for SimpleList objects:
SimpleList(...)
:Takes possibly named objects as elements for the new SimpleList object.
Same as for List objects. See the List man page for more information.
All the coercions documented in the List man page apply to SimpleList objects.
Same as for List objects. See the List man page for more information.
Same as for List objects. See ?`List-utils`
for
more information.
When a SimpleList object is displayed, the "Simple" prefix is removed
from the real class name of the object.
See classNameForDisplay
for more information about this.
List objects for the parent class.
The CompressedList class defined in the IRanges package for a more efficient alternative to SimpleList.
The SimpleIntegerList class defined in the IRanges package for a SimpleList subclass example.
The DataFrame class for another SimpleList subclass example.
## Displaying a SimpleList object: x1 <- SimpleList(a=letters, i=Rle(22:20, 4:2)) class(x1) ## The "Simple" prefix is removed from the real class name of the ## object: x1 library(IRanges) x2 <- IntegerList(11:12, integer(0), 3:-2, compress=FALSE) class(x2) ## The "Simple" prefix is removed from the real class name of the ## object: x2 ## This is controlled by internal helper classNameForDisplay(): classNameForDisplay(x2)
## Displaying a SimpleList object: x1 <- SimpleList(a=letters, i=Rle(22:20, 4:2)) class(x1) ## The "Simple" prefix is removed from the real class name of the ## object: x1 library(IRanges) x2 <- IntegerList(11:12, integer(0), 3:-2, compress=FALSE) class(x2) ## The "Simple" prefix is removed from the real class name of the ## object: x2 ## This is controlled by internal helper classNameForDisplay(): classNameForDisplay(x2)
split
divides the data in a vector-like object x
into the
groups defined by f
.
NOTE: This man page is for the split
methods defined in the
S4Vectors package. See ?base::split
for the
default method (defined in the base package).
## S4 method for signature 'Vector,ANY' split(x, f, drop=FALSE, ...) ## S4 method for signature 'ANY,Vector' split(x, f, drop=FALSE, ...) ## S4 method for signature 'Vector,Vector' split(x, f, drop=FALSE, ...) ## S4 method for signature 'list,Vector' split(x, f, drop=FALSE, ...) splitAsList(x, f, drop=FALSE, ...) relistToClass(x)
## S4 method for signature 'Vector,ANY' split(x, f, drop=FALSE, ...) ## S4 method for signature 'ANY,Vector' split(x, f, drop=FALSE, ...) ## S4 method for signature 'Vector,Vector' split(x, f, drop=FALSE, ...) ## S4 method for signature 'list,Vector' split(x, f, drop=FALSE, ...) splitAsList(x, f, drop=FALSE, ...) relistToClass(x)
x , f
|
2 vector-like objects of the same length. |
drop |
Logical indicating if levels that do not occur should be dropped (if
|
... |
Extra arguments passed to any of the first 3 Extra arguments passed to the last Extra arguments passed to |
The first 3 split()
methods just delegate to splitAsList()
.
The last split()
method just does:
split(x, as.vector(f), drop=drop, ...)
splitAsList()
is an S4 generic function. It is the workhorse
behind the first 3 split()
methods above. It behaves like
base::split()
except that it returns a List derivative
instead of an ordinary list. The exact class of this List
derivative depends only on the class of x
and can be obtained
independently with relistToClass(x)
.
Note that relistToClass(x)
is the opposite of elementType(y)
in the sense that the former returns the class of the result of relisting
(or splitting) x
while the latter returns the class of the result
of unlisting (or unsplitting) y
.
More formally, if x
is an object that is relistable and y
a list-like object:
relistToClass(x) is class(relist(x, some_skeleton)) elementType(y) is class(unlist(y))
Therefore, for any object x
for which relistToClass(x)
is defined and returns a valid class,
elementType(new(relistToClass(x)))
should return class(x)
.
splitAsList()
and the first 3 split()
methods behave like
base::split()
except that they return a List
derivative (of class relistToClass(x)
) instead of an
ordinary list. Like with base::split()
, all the
list elements in this object have the same class as x
.
The split
function in the base package.
The relist
methods and
extractList
generic function defined
in the IRanges package.
## On an Rle object: x <- Rle(101:105, 6:2) split(x, c("B", "B", "A", "B", "A")) ## On a DataFrame object: groups <- c("group1", "group2") DF <- DataFrame( a=letters[1:10], i=101:110, group=rep(factor(groups, levels=groups), c(3, 7)) ) split(DF, DF$group) ## Use splitAsList() if you need to split an ordinary vector into a ## List object: split(letters, 1:2) # ordinary list splitAsList(letters, 1:2) # List object
## On an Rle object: x <- Rle(101:105, 6:2) split(x, c("B", "B", "A", "B", "A")) ## On a DataFrame object: groups <- c("group1", "group2") DF <- DataFrame( a=letters[1:10], i=101:110, group=rep(factor(groups, levels=groups), c(3, 7)) ) split(DF, DF$group) ## Use splitAsList() if you need to split an ordinary vector into a ## List object: split(letters, 1:2) # ordinary list splitAsList(letters, 1:2) # List object
The S4Vectors package defines stack
methods
for List and matrix
objects.
It also introduces mstack()
, a variant of stack
where the list is taken as the list of arguments in ...
.
## S4 method for signature 'List' stack(x, index.var="name", value.var="value", name.var=NULL) ## S4 method for signature 'matrix' stack(x, row.var=names(dimnames(x))[1L], col.var=names(dimnames(x))[2L], value.var="value") mstack(..., .index.var="name")
## S4 method for signature 'List' stack(x, index.var="name", value.var="value", name.var=NULL) ## S4 method for signature 'matrix' stack(x, row.var=names(dimnames(x))[1L], col.var=names(dimnames(x))[2L], value.var="value") mstack(..., .index.var="name")
x |
A List derivative (for the |
index.var , .index.var
|
A single string specifying the column name for the index (source name) column. |
value.var |
A single string specifying the column name for the values. |
name.var |
TODO |
row.var , col.var
|
TODO |
... |
The objects to stack. Each of them should be a Vector
or |
As with stack
on a list
, stack
on a
List derivative constructs a DataFrame with two columns:
one for the unlisted values, the other indicating the name of the
element from which each value was obtained. index.var
specifies the column name for the index (source name) column and
value.var
specifies the column name for the values.
[TODO: Document stack()
method for matrix
objects.]
library(IRanges) starts <- IntegerList(c(1, 5), c(2, 8)) ends <- IntegerList(c(3, 8), c(5, 9)) rgl <- IRangesList(start=starts, end=ends) rangeDataFrame <- stack(rgl, "space", "ranges")
library(IRanges) starts <- IntegerList(c(1, 5), c(2, 8)) ends <- IntegerList(c(3, 8), c(5, 9)) rgl <- IRangesList(start=starts, end=ends) rangeDataFrame <- stack(rgl, "space", "ranges")
Low-level utility functions and classes defined in the S4Vectors package to support subsetting of vector-like objects. They are not intended to be used directly.
The TransposedDataFrame class is a container for representing a transposed DataFrame object, that is, a rectangular data container where the rows are the variables and the columns the observations.
A typical situation for using a TransposedDataFrame object
is when one needs to store a DataFrame object in the
assay()
component of
a SummarizedExperiment object
but the rows in the DataFrame object should correspond to the
samples and the columns to the features. In this case the
DataFrame object must first be transposed so that the variables
in it run "horizontally" instead of "vertically". See the Examples
section at the bottom of this man page for an example.
TransposedDataFrame objects are constructed by calling t()
on a DataFrame object.
Like for a DataFrame object, or, more generally, for a data-frame-like object, the length of a TransposedDataFrame object is its number of variables. However, unlike for a data-frame-like object, its length is also its number of rows, not its number of columns. For this reason, a TransposedDataFrame object is NOT considered to be a data-frame-like object.
Hervé Pagès
DataFrame objects.
SummarizedExperiment objects in the SummarizedExperiment package.
## A DataFrame object with 3 variables: df <- DataFrame(aa=101:126, bb=letters, cc=Rle(c(TRUE, FALSE), 13), row.names=LETTERS) dim(df) length(df) df$aa tdf <- t(df) tdf dim(tdf) length(tdf) tdf$aa t(tdf) # back to 'df' stopifnot(identical(df, t(tdf))) tdf$aa <- 0.05 * tdf$aa x1 <- DataFrame(A=1:5, B=letters[1:5], C=11:15) y1 <- DataFrame(B=c(FALSE, NA, TRUE), C=c(FALSE, NA, TRUE), A=101:103) cbind(t(x1), t(y1)) stopifnot(identical(t(rbind(x1, y1)), cbind(t(x1), t(y1)))) ## A TransposedDataFrame object can be used in the assay() component of a ## SummarizedExperiment object if the transposed layout is needed i.e. if ## the rows and columns of the original DataFrame object need to be treated ## as the samples and features (in this order) of the SummarizedExperiment ## object: library(SummarizedExperiment) se1 <- SummarizedExperiment(df) se1 assay(se1) # the 3 variables run "vertically" se2 <- SummarizedExperiment(tdf) se2 assay(se2) # the 3 variables run "horizontally"
## A DataFrame object with 3 variables: df <- DataFrame(aa=101:126, bb=letters, cc=Rle(c(TRUE, FALSE), 13), row.names=LETTERS) dim(df) length(df) df$aa tdf <- t(df) tdf dim(tdf) length(tdf) tdf$aa t(tdf) # back to 'df' stopifnot(identical(df, t(tdf))) tdf$aa <- 0.05 * tdf$aa x1 <- DataFrame(A=1:5, B=letters[1:5], C=11:15) y1 <- DataFrame(B=c(FALSE, NA, TRUE), C=c(FALSE, NA, TRUE), A=101:103) cbind(t(x1), t(y1)) stopifnot(identical(t(rbind(x1, y1)), cbind(t(x1), t(y1)))) ## A TransposedDataFrame object can be used in the assay() component of a ## SummarizedExperiment object if the transposed layout is needed i.e. if ## the rows and columns of the original DataFrame object need to be treated ## as the samples and features (in this order) of the SummarizedExperiment ## object: library(SummarizedExperiment) se1 <- SummarizedExperiment(df) se1 assay(se1) # the 3 variables run "vertically" se2 <- SummarizedExperiment(tdf) se2 assay(se2) # the 3 variables run "horizontally"
The Vector virtual class serves as the heart of the S4Vectors package and has over 90 subclasses. It serves a similar role as vector in base R.
The Vector class supports the storage of global and element-wise metadata:
The global metadata annotates the object as a whole:
this metadata is accessed via the metadata
accessor and
is represented as an ordinary list;
The element-wise metadata annotates individual elements
of the object: this metadata is accessed via the mcols
accessor (mcols
stands for metadata columns) and
is represented as a DataFrame object with a row for each
element and a column for each metadata variable. Note that the
element-wise metadata can also be NULL
.
To be functional, a class that inherits from Vector must define at
least a length
and a "["
method.
In the following code snippets, x
is a Vector object.
length(x)
:Get the number of elements in x
.
lengths(x, use.names=TRUE)
:Get the length of each of the elements.
Note: The lengths
method for Vector objects is currently
defined as an alias for elementNROWS
(with addition
of the use.names
argument), so is equivalent to
sapply(x, NROW)
, not to sapply(x, length)
.
NROW(x)
:Equivalent to either nrow(x)
or length(x)
, depending on
whether x
has dimensions (i.e. dim(x)
is not NULL
)
or not (i.e. dim(x)
is NULL
).
names(x)
, names(x) <- value
:Get or set the names of the elements in the Vector.
rename(x, value, ...)
:Replace the names of x
according to a mapping defined by a named
character vector, formed by concatenating value
with any
arguments in ...
. The names of the character vector
indicate the source names, and the corresponding values the
destination names. This also works on a plain old vector
.
unname(x)
: removes the names from x
, if any.
nlevels(x)
:Returns the number of factor levels.
mcols(x, use.names=TRUE)
, mcols(x) <- value
:Get or set the metadata columns.
If use.names=TRUE
and the metadata columns are not NULL
,
then the names of x
are propagated as the row names of the
returned DataFrame object.
When setting the metadata columns, the supplied value must be NULL
or a DataFrame object holding element-wise metadata.
elementMetadata(x, use.names=FALSE)
,
elementMetadata(x) <- value
,
values(x, use.names=FALSE)
,
values(x) <- value
:Alternatives to mcols
functions. Their use is discouraged.
as(from, "data.frame")
, as.data.frame(from)
:Coerces from
, a Vector
, to a data.frame
by
first coercing the Vector
to a vector
via
as.vector
. Note that many Vector
derivatives do not
support as.vector
, so this coercion is possible only for
certain types.
as.env(x)
:Constructs an environment object containing the elements of
mcols(x)
.
In the code snippets below, x
is a Vector object.
x[i]
:When supported, return a new Vector object of the same class as x
made of the elements selected by i
. i
can be missing;
an NA-free logical, numeric, or character vector or factor (as ordinary
vector or Rle object); or a IntegerRanges object.
x[i, j]
:Like the above, but allow the user to conveniently subset the metadata
columns thru j
.
NOTE TO DEVELOPERS: A Vector subclass with a true 2-D semantic (e.g.
SummarizedExperiment) needs to overwrite
the "["
method for Vector objects. This means that code intended
to operate on an arbitrary Vector derivative x
should not use
this feature as there is no guarantee that x
supports it. For
this reason this feature should preferrably be used interactively
only.
x[i] <- value
:Replacement version of x[i]
.
In the code snippets below, x
is a Vector object.
subset(x, subset, select, drop=FALSE, ...)
:Return a new Vector object made of the subset using logical vector
subset
, where missing values are taken as FALSE.
TODO: Document select
, drop
, and ...
.
window(x, start=NA, end=NA, width=NA)
:Extract the subsequence from x
that corresponds to the window
defined by start
, end
, and width
.
At most 2 of start
, end
, and width
can be set
to a non-NA
value, which must be a non-negative integer.
More precisely:
If width
is set to NA
, then start
or
end
or both can be set to NA
. In this case
start=NA
is equivalent to start=1
and
end=NA
is equivalent to end=length(x)
.
If width
is set to a non-negative integer value, then
one of start
or end
must be set to a non-negative
integer value and the other one to NA
.
head(x, n=6L)
:If n
is non-negative, returns the first n elements of the Vector
object.
If n
is negative, returns all but the last abs(n)
elements
of the Vector object.
tail(x, n=6L)
:If n
is non-negative, returns the last n elements of the Vector
object.
If n
is negative, returns all but the first abs(n)
elements
of the Vector object.
rev(x)
:Return a new Vector object made of the original elements in the reverse order.
rep(x, times, length.out, each)
: and rep.int(x, times)
:
Repeats the values in x
through one of the following conventions:
times
: Vector giving the number of times to repeat each
element if of length length(x)
, or to repeat the whole
vector if of length 1.
length.out
: Non-negative integer. The desired length of
the output vector.
each
: Non-negative integer. Each element of x
is
repeated each
times.
In the code snippets below, x
is a Vector object.
c(x, ..., ignore.mcols=FALSE)
:Concatenate x
and the Vector objects in ...
together.
Any object in ...
should belong to the same class as x
or to one of its subclasses. If not, then an attempt will be made to
coerce it with as(object, class(x), strict=FALSE)
.
NULL
s are accepted and ignored.
The result of the concatenation is an object of the same class
as x
.
Handling of the metadata columns:
If only one of the Vector objects has metadata columns,
then the corresponding metadata columns are attached to
the other Vector objects and set to NA
.
When multiple Vector objects have their own metadata columns, the user must ensure that each such DataFrame have identical layouts to each other (same columns defined), in order for the concatenation to be successful, otherwise an error will be thrown.
The user can call c(x, ..., ignore.mcols=FALSE)
in
order to concatenate Vector objects with differing sets of
metadata columns, which will result in the concatenated
object having NO metadata columns.
IMPORTANT NOTE: Be aware that calling c
with named arguments
(e.g. c(a=x, b=y)
) tends to break method dispatch so please
make sure that args
is an unnamed list when using
do.call(c, args)
to concatenate a list of objects together.
append(x, values, after=length(x))
:Insert the Vector
values
onto x
at the position
given by after
. values
must have an elementType
that extends that of x
.
expand.grid(...)
: Find cartesian product of every
vector in ...
and return a data.frame, each column of
which corresponds to an argument.
See expand.grid
.
[FOR ADVANCED USERS OR DEVELOPERS]
Displaying of a Vector object is controlled by 2 internal helpers,
classNameForDisplay
and showAsCell
.
For most objects classNameForDisplay(x)
just returns class(x)
.
However, for some objects it can return the name of a parent class that is
more suitable for display because it's simpler and as informative as the
real class name. See SimpleList objects (defined in this package)
and CompressedList objects (defined in the IRanges
package) for examples of objects for which classNameForDisplay
returns the name of a parent class.
showAsCell(x)
produces a character vector parallel to
x
(i.e. with one string per vector element in x
) that
contains compact string representations of each elements in x
.
Note that classNameForDisplay
and showAsCell
are generic
functions so developers can implement methods to control how their own
Vector extension gets displayed.
Vector-comparison for comparing, ordering, and tabulating vector-like objects.
Vector-setops for set operations on vector-like objects.
Vector-merge for merging vector-like objects.
Factor for a direct Vector extension that serves a similar role as factor in base R.
List for a direct Vector extension that serves a similar role as list in base R.
extractList for grouping elements of a vector-like object into a list-like object.
DataFrame which is the type of object returned by the
mcols
accessor.
The Annotated class, which Vector extends.
showClass("Vector") # shows (some of) the known subclasses
showClass("Vector") # shows (some of) the known subclasses
Generic functions and methods for comparing, ordering, and tabulating vector-like objects.
## Element-wise (aka "parallel") comparison of 2 Vector objects ## ------------------------------------------------------------ pcompare(x, y) ## S4 method for signature 'Vector,Vector' e1 == e2 ## S4 method for signature 'Vector,ANY' e1 == e2 ## S4 method for signature 'ANY,Vector' e1 == e2 ## S4 method for signature 'Vector,Vector' e1 <= e2 ## S4 method for signature 'Vector,ANY' e1 <= e2 ## S4 method for signature 'ANY,Vector' e1 <= e2 ## S4 method for signature 'Vector,Vector' e1 != e2 ## S4 method for signature 'Vector,ANY' e1 != e2 ## S4 method for signature 'ANY,Vector' e1 != e2 ## S4 method for signature 'Vector,Vector' e1 >= e2 ## S4 method for signature 'Vector,ANY' e1 >= e2 ## S4 method for signature 'ANY,Vector' e1 >= e2 ## S4 method for signature 'Vector,Vector' e1 < e2 ## S4 method for signature 'Vector,ANY' e1 < e2 ## S4 method for signature 'ANY,Vector' e1 < e2 ## S4 method for signature 'Vector,Vector' e1 > e2 ## S4 method for signature 'Vector,ANY' e1 > e2 ## S4 method for signature 'ANY,Vector' e1 > e2 ## sameAsPreviousROW() ## ------------------- sameAsPreviousROW(x) ## match() ## ------- ## S4 method for signature 'Vector,Vector' match(x, table, nomatch = NA_integer_, incomparables = NULL, ...) ## selfmatch() ## ----------- selfmatch(x, ...) ## duplicated() & unique() ## ----------------------- ## S4 method for signature 'Vector' duplicated(x, incomparables=FALSE, ...) ## S4 method for signature 'Vector' unique(x, incomparables=FALSE, ...) ## %in% ## ---- ## S4 method for signature 'Vector,Vector' x %in% table ## S4 method for signature 'Vector,ANY' x %in% table ## S4 method for signature 'ANY,Vector' x %in% table ## findMatches() & countMatches() ## ------------------------------ findMatches(x, table, select=c("all", "first", "last"), ...) countMatches(x, table, ...) ## sort() ## ------ ## S4 method for signature 'Vector' sort(x, decreasing=FALSE, na.last=NA, by) ## rank() ## ------ ## S4 method for signature 'Vector' rank(x, na.last = TRUE, ties.method = c("average", "first", "last", "random", "max", "min"), by) ## xtfrm() ## ------- ## S4 method for signature 'Vector' xtfrm(x) ## table() ## ------- ## S4 method for signature 'Vector' table(...)
## Element-wise (aka "parallel") comparison of 2 Vector objects ## ------------------------------------------------------------ pcompare(x, y) ## S4 method for signature 'Vector,Vector' e1 == e2 ## S4 method for signature 'Vector,ANY' e1 == e2 ## S4 method for signature 'ANY,Vector' e1 == e2 ## S4 method for signature 'Vector,Vector' e1 <= e2 ## S4 method for signature 'Vector,ANY' e1 <= e2 ## S4 method for signature 'ANY,Vector' e1 <= e2 ## S4 method for signature 'Vector,Vector' e1 != e2 ## S4 method for signature 'Vector,ANY' e1 != e2 ## S4 method for signature 'ANY,Vector' e1 != e2 ## S4 method for signature 'Vector,Vector' e1 >= e2 ## S4 method for signature 'Vector,ANY' e1 >= e2 ## S4 method for signature 'ANY,Vector' e1 >= e2 ## S4 method for signature 'Vector,Vector' e1 < e2 ## S4 method for signature 'Vector,ANY' e1 < e2 ## S4 method for signature 'ANY,Vector' e1 < e2 ## S4 method for signature 'Vector,Vector' e1 > e2 ## S4 method for signature 'Vector,ANY' e1 > e2 ## S4 method for signature 'ANY,Vector' e1 > e2 ## sameAsPreviousROW() ## ------------------- sameAsPreviousROW(x) ## match() ## ------- ## S4 method for signature 'Vector,Vector' match(x, table, nomatch = NA_integer_, incomparables = NULL, ...) ## selfmatch() ## ----------- selfmatch(x, ...) ## duplicated() & unique() ## ----------------------- ## S4 method for signature 'Vector' duplicated(x, incomparables=FALSE, ...) ## S4 method for signature 'Vector' unique(x, incomparables=FALSE, ...) ## %in% ## ---- ## S4 method for signature 'Vector,Vector' x %in% table ## S4 method for signature 'Vector,ANY' x %in% table ## S4 method for signature 'ANY,Vector' x %in% table ## findMatches() & countMatches() ## ------------------------------ findMatches(x, table, select=c("all", "first", "last"), ...) countMatches(x, table, ...) ## sort() ## ------ ## S4 method for signature 'Vector' sort(x, decreasing=FALSE, na.last=NA, by) ## rank() ## ------ ## S4 method for signature 'Vector' rank(x, na.last = TRUE, ties.method = c("average", "first", "last", "random", "max", "min"), by) ## xtfrm() ## ------- ## S4 method for signature 'Vector' xtfrm(x) ## table() ## ------- ## S4 method for signature 'Vector' table(...)
x , y , e1 , e2 , table
|
Vector-like objects. |
nomatch |
See |
incomparables |
The The See The |
select |
Only |
ties.method |
See |
decreasing , na.last
|
See |
by |
A formula referencing the metadata columns by which to sort,
e.g., |
... |
A Vector object for Otherwise, extra arguments supported by specific methods. In particular:
|
Doing pcompare(x, y)
on 2 vector-like objects x
and y
of length 1 must return an integer less than, equal to, or greater than zero
if the single element in x
is considered to be respectively less than,
equal to, or greater than the single element in y
.
If x
or y
have a length != 1, then they are typically expected
to have the same length so pcompare(x, y)
can operate element-wise,
that is, in that case it returns an integer vector of the same length
as x
and y
where the i-th element is the result of compairing
x[i]
and y[i]
. If x
and y
don't have the same
length and are not zero-length vectors, then the shortest is first
recycled to the length of the longest. If one of them is a zero-length
vector then pcompare(x, y)
returns a zero-length integer vector.
selfmatch(x, ...)
is equivalent to match(x, x, ...)
. This
is actually how the default ANY
method is implemented. However note
that the default selfmatch(x, ...)
for Vector x
will
typically be more efficient than match(x, x, ...)
, and can be made
even more so if a specific selfmatch
method is implemented for a
given subclass.
findMatches
is an enhanced version of match
which, by default
(i.e. if select="all"
), returns all the matches in a Hits
object.
countMatches
returns an integer vector of the length of x
containing the number of matches in table
for each element
in x
.
For pcompare
: see Details section above.
For sameAsPreviousROW
: a logical vector of length equal to x
,
indicating whether each entry of x
is equal to the previous entry.
The first entry is always FALSE
for a non-zero-length x
.
For match
and selfmatch
: an integer vector of the
same length as x
.
For duplicated
, unique
, and %in%
: see
?BiocGenerics::duplicated
,
?BiocGenerics::unique
,
and ?`%in%`
.
For findMatches
: a Hits object by default (i.e. if
select="all"
).
For countMatches
: an integer vector of the length of x
containing the number of matches in table
for each element
in x
.
For sort
: see ?BiocGenerics::sort
.
For xtfrm
: see ?base::xtfrm
.
For table
: a 1D array of integer values promoted to the
"table"
class. See ?BiocGeneric::table
for more information.
The following notes are for developers who want to implement comparing, ordering, and tabulating methods for their own Vector subclass.
Subclass comparison methods can be split into various categories. The first category must be implemented for each subclass, as these do not have sensible defaults for arbitrary Vector objects:
The S4Vectors package provides no order
method for
Vector objects. So calling order
on a Vector
derivative for which no specific order
method is defined
will use base::order
, which calls xtfrm
, with in
turn calls order
, which calls xtfrm
, and so on.
This infinite recursion of S4 dispatch eventually results in an
error about reaching the stack limit.
To avoid this behavior, a specialized order
method needs
to be implemented for specific Vector subclasses (e.g.
for Hits and IntegerRanges objects).
sameAsPreviousROW
is default implemented on top of the
==
method, so will work out-of-the-box on Vector
objects for which ==
works as expected. However, ==
is default implemented on top of pcompare
, which itself has
a default implementation that relies on sameAsPreviousROW
!
This again leads to infinite recursion and an error about the stack
limit.
To avoid this behavior, a specialized sameAsPreviousROW
method
must be implemented for specific Vector subclasses.
The second category contains methods that have default implementations provided for all Vector objects and their derivatives. These methods rely on the first category to provide sensible default behaviour without further work from the developer. However, it is often the case that greater efficiency can be achieved for a specific data structure by writing a subclass-specific version of these methods.
The pcompare
method for Vector objects is implemented
on top of order
and sameAsPreviousROW
, and so will
work out-of-the-box on Vector derivatives for which
order
and sameAsPreviousROW
work as expected.
The xtfrm
method for Vector objects is also implemented
on top of order
and sameAsPreviousROW
, and so will
also work out-of-the-box on Vector derivatives for which
order
and sameAsPreviousROW
work as expected.
selfmatch
is itself implemented on top of xtfrm
(indirectly, via grouping
) so it will work
out-of-the-box on Vector objects for which xtfrm
works as expected.
The match
method for Vector objects is
implemented on top of selfmatch
, so works out-of-the-box
on Vector objects for which selfmatch
works as expected.
(A careful reader may notice that xtfrm
and order
could be
swapped between categories to achieve the same effect. Similarly,
sameAsPreviousROW
and pcompare
could also be swapped. The exact
categorization of these methods is left to the discretion of the developer,
though this is mostly academic if both choices are specialized.)
The third category also contains methods that have default implementations, but unlike the second category, these defaults are straightforward and generally do not require any specialization for efficiency purposes.
The 6 traditional binary comparison operators are: ==
,
!=
, <=
, >=
, <
, and >
.
The S4Vectors package provides the following methods for
these operators:
setMethod("==", c("Vector", "Vector"), function(e1, e2) { pcompare(e1, e2) == 0L } ) setMethod("<=", c("Vector", "Vector"), function(e1, e2) { pcompare(e1, e2) <= 0L } ) setMethod("!=", c("Vector", "Vector"), function(e1, e2) { !(e1 == e2) } ) setMethod(">=", c("Vector", "Vector"), function(e1, e2) { e2 <= e1 } ) setMethod("<", c("Vector", "Vector"), function(e1, e2) { !(e2 <= e1) } ) setMethod(">", c("Vector", "Vector"), function(e1, e2) { !(e1 <= e2) } )
With these definitions, the 6 binary operators work out-of-the-box
on Vector objects for which pcompare
works the
expected way. If pcompare
is not implemented, then it's
enough to implement ==
and <=
methods to have the
4 remaining operators (!=
, >=
, <
, and
>
) work out-of-the-box.
The duplicated
, unique
, and %in%
methods for
Vector objects are implemented on top of selfmatch
,
duplicated
, and match
, respectively, so they work
out-of-the-box on Vector objects for which selfmatch
,
duplicated
, and match
work the expected way.
Also the default findMatches
and countMatches
methods
are implemented on top of match
and selfmatch
so they
work out-of-the-box on Vector objects for which those things
work the expected way.
The sort
method for Vector objects is implemented on
top of order
, so it works out-of-the-box on Vector
objects for which order
works the expected way.
The table
method for Vector objects is implemented on
top of selfmatch
, order
, and as.character
, so
it works out-of-the-box on a Vector object for which those
things work the expected way.
Hervé Pagès, with contributions from Aaron Lun
The Vector class.
Hits-comparison for comparing and ordering hits.
Vector-setops for set operations on vector-like objects.
Vector-merge for merging vector-like objects.
IntegerRanges-comparison in the IRanges package for comparing and ordering ranges.
==
and %in%
in the base package,
and BiocGenerics::match
,
BiocGenerics::duplicated
,
BiocGenerics::unique
,
BiocGenerics::order
,
BiocGenerics::sort
,
BiocGenerics::rank
in the
BiocGenerics package for general information about
the comparison/ordering operators and functions.
The Hits class.
BiocGeneric::table
in the
BiocGenerics package.
## --------------------------------------------------------------------- ## A. SIMPLE EXAMPLES ## --------------------------------------------------------------------- y <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L) selfmatch(y) x <- c(unique(y), 999L) findMatches(x, y) countMatches(x, y) ## See ?`IntegerRanges-comparison` for more examples (on IntegerRanges ## objects). You might need to load the IRanges package first. ## --------------------------------------------------------------------- ## B. FOR DEVELOPERS: HOW TO IMPLEMENT THE BINARY COMPARISON OPERATORS ## FOR YOUR Vector SUBCLASS ## --------------------------------------------------------------------- ## The answer is: don't implement them. Just implement pcompare() and the ## binary comparison operators will work out-of-the-box. Here is an ## example: ## (1) Implement a simple Vector subclass. setClass("Raw", contains="Vector", representation(data="raw")) setMethod("length", "Raw", function(x) length(x@data)) setMethod("[", "Raw", function(x, i, j, ..., drop) { x@data <- x@data[i]; x } ) x <- new("Raw", data=charToRaw("AB.x0a-BAA+C")) stopifnot(identical(length(x), 12L)) stopifnot(identical(x[7:3], new("Raw", data=charToRaw("-a0x.")))) ## (2) Implement a "pcompare" method for Raw objects. setMethod("pcompare", c("Raw", "Raw"), function(x, y) {as.integer(x@data) - as.integer(y@data)} ) stopifnot(identical(which(x == x[1]), c(1L, 9L, 10L))) stopifnot(identical(x[x < x[5]], new("Raw", data=charToRaw(".-+"))))
## --------------------------------------------------------------------- ## A. SIMPLE EXAMPLES ## --------------------------------------------------------------------- y <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L) selfmatch(y) x <- c(unique(y), 999L) findMatches(x, y) countMatches(x, y) ## See ?`IntegerRanges-comparison` for more examples (on IntegerRanges ## objects). You might need to load the IRanges package first. ## --------------------------------------------------------------------- ## B. FOR DEVELOPERS: HOW TO IMPLEMENT THE BINARY COMPARISON OPERATORS ## FOR YOUR Vector SUBCLASS ## --------------------------------------------------------------------- ## The answer is: don't implement them. Just implement pcompare() and the ## binary comparison operators will work out-of-the-box. Here is an ## example: ## (1) Implement a simple Vector subclass. setClass("Raw", contains="Vector", representation(data="raw")) setMethod("length", "Raw", function(x) length(x@data)) setMethod("[", "Raw", function(x, i, j, ..., drop) { x@data <- x@data[i]; x } ) x <- new("Raw", data=charToRaw("AB.x0a-BAA+C")) stopifnot(identical(length(x), 12L)) stopifnot(identical(x[7:3], new("Raw", data=charToRaw("-a0x.")))) ## (2) Implement a "pcompare" method for Raw objects. setMethod("pcompare", c("Raw", "Raw"), function(x, y) {as.integer(x@data) - as.integer(y@data)} ) stopifnot(identical(which(x == x[1]), c(1L, 9L, 10L))) stopifnot(identical(x[x < x[5]], new("Raw", data=charToRaw(".-+"))))
A merge
method for vector-like objects.
## S4 method for signature 'Vector,Vector' merge(x, y, ..., all=FALSE, all.x=NA, all.y=NA, sort=TRUE)
## S4 method for signature 'Vector,Vector' merge(x, y, ..., all=FALSE, all.x=NA, all.y=NA, sort=TRUE)
x , y , ...
|
Vector-like objects, typically all of the same class and typically not list-like objects (even though some list-like objects like IntegerRanges and DNAStringSet are supported). Duplicated elements in each object are removed with a warning. |
all |
|
all.x , all.y
|
To be used only when merging 2 objects (binary merge).
Both If |
sort |
Whether to sort the merged result. |
This merge
method acts much like merge.data.frame
,
except for 3 important differences:
The matching is based on the vector values, not arbitrary columns in a table.
Self merging is a no-op if sort=FALSE
(or object already
sorted) and if the object has no duplicates.
This merge
method accepts an arbitrary number of vector-like
objects (n-ary merge).
If some of the objects to merge are list-like objects not supported by
the method described here, then the merging is simply done by calling
base::merge()
on the objects. This might succeed or not...
A vector-like object of the same class as the input objects (if they all have the same class) containing the merged vector values and metadata columns.
The Vector class.
Vector-comparison for comparing and ordering vector-like objects.
Vector-setops for set operations on vector-like objects.
library(GenomicRanges) x <- GRanges(c("chr1:1-1000", "chr2:2000-3000"), score=c(0.45, 0.1), a1=c(5L, 7L), a2=c(6, 8)) y <- GRanges(c("chr2:150-151", "chr1:1-10", "chr2:2000-3000"), score=c(0.7, 0.82, 0.1), b1=c(0L, 5L, 1L), b2=c(1, -2, 1)) merge(x, y) merge(x, y, all=TRUE) merge(x, y, all.x=TRUE) merge(x, y, all.y=TRUE) ## Shared metadata columns must agree: mcols(x)$score[2] <- 0.11 #merge(x, y) # error! ## NAs agree with anything: mcols(x)$score[2] <- NA merge(x, y)
library(GenomicRanges) x <- GRanges(c("chr1:1-1000", "chr2:2000-3000"), score=c(0.45, 0.1), a1=c(5L, 7L), a2=c(6, 8)) y <- GRanges(c("chr2:150-151", "chr1:1-10", "chr2:2000-3000"), score=c(0.7, 0.82, 0.1), b1=c(0L, 5L, 1L), b2=c(1, -2, 1)) merge(x, y) merge(x, y, all=TRUE) merge(x, y, all.x=TRUE) merge(x, y, all.y=TRUE) ## Shared metadata columns must agree: mcols(x)$score[2] <- 0.11 #merge(x, y) # error! ## NAs agree with anything: mcols(x)$score[2] <- NA merge(x, y)
Perform set operations on Vector objects.
## S4 method for signature 'Vector,Vector' union(x, y) ## S4 method for signature 'Vector,Vector' intersect(x, y) ## S4 method for signature 'Vector,Vector' setdiff(x, y) ## S4 method for signature 'Vector,Vector' setequal(x, y)
## S4 method for signature 'Vector,Vector' union(x, y) ## S4 method for signature 'Vector,Vector' intersect(x, y) ## S4 method for signature 'Vector,Vector' setdiff(x, y) ## S4 method for signature 'Vector,Vector' setequal(x, y)
x , y
|
Vector-like objects. |
The union
, intersect
, and setdiff
methods for
Vector objects return a Vector object containing respectively
the union, intersection, and (asymmetric!) difference of the 2 sets of
vector elements in x
and y
.
The setequal
method for Vector objects checks for set
equality between x
and y
.
They're defined as follow:
setMethod("union", c("Vector", "Vector"), function(x, y) unique(c(x, y)) ) setMethod("intersect", c("Vector", "Vector"), function(x, y) unique(x[x %in% y]) ) setMethod("setdiff", c("Vector", "Vector"), function(x, y) unique(x[!(x %in% y)]) ) setMethod("setequal", c("Vector", "Vector"), function(x, y) all(x %in% y) && all(y %in% x) )
so they work out-of-the-box on Vector objects for which c
,
unique
, and %in%
are defined.
union
returns a Vector object obtained by appending to x
the elements in y
that are not already in x
.
intersect
returns a Vector object obtained by keeping only
the elements in x
that are also in y
.
setdiff
returns a Vector object obtained by dropping from
x
the elements that are in y
.
setequal
returns TRUE
if x
and y
contain the
same sets of vector elements and FALSE
otherwise.
union
, intersect
, and setdiff
propagate the names and
metadata columns of their first argument (x
).
Hervé Pagès
Vector-comparison for comparing and ordering vector-like objects.
Vector-merge for merging vector-like objects.
Vector objects.
BiocGenerics::union
,
BiocGenerics::intersect
,
and BiocGenerics::setdiff
in the BiocGenerics package for general information about
these generic functions.
## See ?`Hits-setops` for some examples.
## See ?`Hits-setops` for some examples.
The zipup
and zipdown
functions convert between two
parallel vectors and a list of doublets (elements of length 2). The
metaphor, borrowed from Python's zip
, is that of a zipper. The
zipup
function interleaves the elements of the parallel vectors
into a list of doublets. The inverse operation is zipdown
,
which returns a Pairs
object.
zipup(x, y, ...) zipdown(x, ...)
zipup(x, y, ...) zipdown(x, ...)
x , y
|
For |
... |
Arguments passed to methods. |
For zipup
, a list-like object, where every element is of length 2.
For zipdown
, a Pairs
object.
Pairs objects.
z <- zipup(1:10, Rle(1L, 10)) pairs <- zipdown(z)
z <- zipup(1:10, Rle(1L, 10)) pairs <- zipdown(z)