For Loop to Read in Several Files R
Programming in R
Introduction
General Overview
One of the main attractions of using the R (http://cran.at.r-project.org) environment is the ease with which users can write their own programs and custom functions. The R programming syntax is extremely easy to learn, even for users with no previous programming experience. Once the basic R programming control structures are understood, users can use the R language as a powerful environment to perform complex custom analyses of almost any type of data.
Format of this Manual
In this manual all commands are given in code boxes, where the R code is printed in black, the comment text in blue and the output generated by R in green. All comments/explanations start with the standard comment sign '#' to prevent them from being interpreted by R as commands. This way the content in the code boxes can be pasted with their comment text into the R console to evaluate their utility. Occasionally, several commands are printed on one line and separated by a semicolon ';'. Commands starting with a '$' sign need to be executed from a Unix or Linux shell. Windows users can simply ignore them.
R Basics
The R & BioConductor manual provides a general introduction to the usage of the R environment and its basic command syntax.
Code Editors for R
Several excellent code editors are available that provide functionalities like R syntax highlighting, auto code indenting and utilities to send code/functions to the R console.
- Basic code editors provided by Rguis
- RStudio: GUI-based IDE for R
- Vim-R-Tmux: R working environment based on vim and tmux
- Emacs (ESS add-on package)
- gedit and Rgedit
- RKWard
- Eclipse
- Tinn-R
- Notepad++ (NppToR)
Programming in R using Vim or Emacs Programming in R using RStudio
Integrating R with Vim and Tmux
Users interested in integrating R with vim and tmux may want to consult the Vim-R-Tmux configuration page.
Finding Help
Reference list on R programming (selection)
- R Programming for Bioinformatics, by Robert Gentleman
- Advanced R, by Hadley Wickham
- S Programming, by W. N. Venables and B. D. Ripley
- Programming with Data, by John M. Chambers
- R Help & R Coding Conventions, Henrik Bengtsson, Lund University
- Programming in R (Vincent Zoonekynd)
- Peter's R Programming Pages, University of Warwick
- Rtips, Paul Johnsson, University of Kansas
- R for Programmers, Norm Matloff, UC Davis
- High-Performance R, Dirk Eddelbuettel tutorial presented at useR-2008
- C/C++ level programming for R, Gopi Goswami
Control Structures
Conditional Executions
Comparison Operators
- equal: ==
- not equal: !=
- greater/less than: > <
- greater/less than or equal: >= <=
Logical Operators
- and: &
- or: |
- not: !
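As a minimal sketch of the operators listed above (the vector x and the result names are made up for illustration), the comparison and logical operators work element-wise on vectors:

```r
x <- c(2, 5, 8)              # hypothetical sample vector
eq <- x == 5                 # element-wise test for equality
neq <- x != 5                # element-wise test for inequality
both <- x > 2 & x <= 8       # TRUE where both conditions hold
either <- x < 3 | x > 7      # TRUE where at least one condition holds
neither <- !either           # negates the previous result
print(rbind(eq, neq, both, either, neither))
```

Each result has the same length as x, which is why these operators combine well with ifelse and with logical subsetting.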
If Statements
If statements operate on length-one logical vectors.
Syntax
if(cond1==TRUE) { cmd1 } else { cmd2 }
Example
if(1==0) {
    print(1)
} else {
    print(2)
}
[1] 2
Avoid inserting newlines between '} else'.
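The newline rule can be checked without typing at the console. The following sketch parses two code strings (the names bad and good are chosen only for this illustration): at top level R considers the if statement complete at the end of the line, so an else that starts a new line cannot be attached to it.

```r
bad <- "if(1==0) { 1 }\nelse { 2 }"    # 'else' starts a new line
good <- "if(1==0) { 1 } else { 2 }"    # 'else' follows the closing brace
bad_parses <- tryCatch({ parse(text=bad); TRUE }, error=function(e) FALSE)
good_value <- eval(parse(text=good))
bad_parses   # FALSE, because the parser stops at the unexpected 'else'
good_value   # 2
```

Inside a function body or any other block enclosed in braces the parser keeps reading, so the restriction mainly affects code typed or sourced at top level.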
Ifelse Statements
Ifelse statements operate on vectors of variable length.
Syntax
ifelse(test, true_value, false_value)
Example
x <- 1:10 # Creates sample data
ifelse(x<5 | x>8, x, 0)
[1] 1 2 3 4 0 0 0 0 9 10
Loops
The most commonly used loop structures in R are for, while and apply loops. Less common are repeat loops. The break statement is used to break out of loops, and next halts the processing of the current iteration and advances the looping index.
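A small sketch of how next and break interact in a for loop (the cutoff 7 and the name kept are arbitrary choices for this illustration):

```r
kept <- NULL
for(i in 1:10) {
    if(i %% 2 == 0) next   # skips the rest of the body for even numbers
    if(i > 7) break        # leaves the loop entirely once an odd i exceeds 7
    kept <- c(kept, i)
}
kept
```

Only the odd values up to 7 are stored. The loop ends at i = 9, because i = 8 is already skipped by next.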
For Loop
For loops are controlled by a looping vector. In every iteration of the loop one value in the looping vector is assigned to a variable that can be used in the statements of the body of the loop. Usually, the number of loop iterations is defined by the number of values stored in the looping vector and they are processed in the same order as they are stored in the looping vector.
Syntax
for(variable in sequence) {
statements
}
Example
mydf <- iris
myve <- NULL # Creates empty storage container
for(i in seq(along=mydf[,1])) {
    myve <- c(myve, mean(as.numeric(mydf[i, 1:3]))) # Note: inject approach is much faster than append with 'c'. See below for details.
}
myve
[1] 3.333333 3.100000 3.066667 3.066667 3.333333 3.666667 3.133333 3.300000
[9] 2.900000 3.166667 3.533333 3.266667 3.066667 2.800000 3.666667 3.866667
Example: condition
x <- 1:10
z <- NULL
for(i in seq(along=x)) {
    if(x[i] < 5) {
        z <- c(z, x[i] - 1)
    } else {
        z <- c(z, x[i] / x[i])
    }
}
z
[1] 0 1 2 3 1 1 1 1 1 1
Example: stop on condition and print error message
x <- 1:10
z <- NULL
for(i in seq(along=x)) {
    if (x[i]<5) {
        z <- c(z,x[i]-1)
    } else {
        stop("values need to be <5")
    }
}
Error: values need to be <5
z
[1] 0 1 2 3
While Loop
Similar to the for loop, but the iterations are controlled by a conditional statement.
Syntax
while(condition) statements
Example
z <- 0
while(z < 5) {
    z <- z + 2
    print(z)
}
[1] 2
[1] 4
[1] 6
Apply Loop Family
For Two-Dimensional Data Sets: apply
Syntax
apply(X, MARGIN, FUN, ARGs)
X: array, matrix or data.frame; MARGIN: 1 for rows, 2 for columns, c(1,2) for both; FUN: one or more functions; ARGs: possible arguments for function
Example
## Example for applying predefined mean function
apply(iris[,1:3], 1, mean)
[1] 3.333333 3.100000 3.066667 3.066667 3.333333 3.666667 3.133333 3.300000
...
## With custom function
x <- 1:10
test <- function(x) { # Defines some custom function
    if(x < 5) {
        x-1
    } else {
        x / x
    }
}
apply(as.matrix(x), 1, test) # Returns same result as previous for loop
[1] 0 1 2 3 1 1 1 1 1 1
## Same as above but with a single line of code
apply(as.matrix(x), 1, function(x) { if (x<5) { x-1 } else { x/x } })
[1] 0 1 2 3 1 1 1 1 1 1
For Ragged Arrays: tapply
Applies a function to array categories of variable lengths (ragged array). Grouping is defined by factor.
Syntax
tapply(vector, factor, FUN)
Example
## Computes mean values of vector aggregates defined by factor
tapply(as.vector(iris[,4]), factor(iris[,5]), mean)
    setosa versicolor  virginica
     0.246      1.326      2.026
## The aggregate function provides related utilities
aggregate(iris[,1:4], list(iris$Species), mean)
     Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
For Vectors and Lists: lapply and sapply
Both apply a function to vector or list objects. The function lapply returns a list, while sapply attempts to return the simplest data object, such as vector or matrix instead of list.
Syntax
lapply(X, FUN)
sapply(X, FUN)
Example
## Creates a sample list
mylist <- as.list(iris[1:3,1:3])
mylist
$Sepal.Length
[1] 5.1 4.9 4.7
$Sepal.Width
[1] 3.5 3.0 3.2
$Petal.Length
[1] 1.4 1.4 1.3
## Compute sum of each list component and return result as list
lapply(mylist, sum)
$Sepal.Length
[1] 14.7
$Sepal.Width
[1] 9.7
$Petal.Length
[1] 4.1
## Compute sum of each list component and return result as vector
sapply(mylist, sum)
Sepal.Length  Sepal.Width Petal.Length
        14.7          9.7          4.1
Other Loops
Repeat Loop
Syntax
repeat statements
Loop is repeated until a break is specified. This means there needs to be a second statement to test whether or not to break from the loop.
Example
z <- 0
repeat {
z <- z + 1
print(z)
    if(z > 100) break
}
Improving Speed Performance of Loops
Looping over very large data sets can become slow in R. However, this limitation can be overcome by eliminating certain operations in loops or avoiding loops over the data intensive dimension in an object altogether. The latter can be achieved by performing mainly vector-to-vector or matrix-to-matrix computations, which often run over 100 times faster than the corresponding for() or apply() loops in R. For this purpose, one can make use of the existing speed-optimized R functions (e.g.: rowSums, rowMeans, table, tabulate) or one can design custom functions that avoid expensive R loops by using vector- or matrix-based approaches. Alternatively, one can write programs that will perform all time consuming computations on the C-level.
(1) Speed comparison of for loops with an append versus an inject step:
myMA <- matrix(rnorm(1000000), 100000, 10, dimnames=list(1:100000, paste("C", 1:10, sep="")))
results <- NULL
system.time(for(i in seq(along=myMA[,1])) results <- c(results, mean(myMA[i,])))
   user  system elapsed
 39.156   6.369  45.559
results <- numeric(length(myMA[,1]))
system.time(for(i in seq(along=myMA[,1])) results[i] <- mean(myMA[i,]))
   user  system elapsed
  1.550   0.005   1.556
The inject approach is 20-50 times faster than the append version.
(2) Speed comparison of apply loop versus rowMeans for computing the mean for each row in a big matrix:
system.time(myMAmean <- apply(myMA, 1, mean))
   user  system elapsed
  1.452   0.005   1.456
system.time(myMAmean <- rowMeans(myMA))
   user  system elapsed
  0.005   0.001   0.006
The rowMeans approach is over 200 times faster than the apply loop.
(3) Speed comparison of apply loop versus vectorized approach for computing the standard deviation of each row:
system.time(myMAsd <- apply(myMA, 1, sd))
   user  system elapsed
  3.707   0.014   3.721
myMAsd[1:4]
        1         2         3         4
0.8505795 1.3419460 1.3768646 1.3005428
system.time(myMAsd <- sqrt((rowSums((myMA-rowMeans(myMA))^2)) / (length(myMA[1,])-1)))
   user  system elapsed
  0.020   0.009   0.028
myMAsd[1:4]
        1         2         3         4
0.8505795 1.3419460 1.3768646 1.3005428
The vector-based approach in the last step is over 100 times faster than the apply loop.
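Before trusting a vectorized rewrite it is worth confirming on a small input that it reproduces the loop-based result. A minimal check, assuming a made-up 10 x 5 test matrix (smallMA) and an arbitrary seed:

```r
set.seed(1)                              # arbitrary seed for reproducibility
smallMA <- matrix(rnorm(50), 10, 5)      # hypothetical small test matrix
sd_apply <- apply(smallMA, 1, sd)        # row-wise standard deviation with apply
# Vectorized version: subtracting rowMeans() recycles one mean per row,
# because matrices are stored column by column
sd_vec <- sqrt(rowSums((smallMA - rowMeans(smallMA))^2) / (ncol(smallMA) - 1))
all.equal(sd_apply, sd_vec)
```

all.equal compares within a numerical tolerance, which is appropriate here because the two computations accumulate floating point error differently.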
(4) Example for computing the mean for any custom selection of columns without compromising the speed performance:
## In the following the columns are named according to their selection in myList
myList <- tapply(colnames(myMA), c(1,1,1,2,2,2,3,3,4,4), list)
myMAmean <- sapply(myList, function(x) rowMeans(myMA[,x]))
colnames(myMAmean) <- sapply(myList, paste, collapse="_")
myMAmean[1:4,]
    C1_C2_C3   C4_C5_C6       C7_C8     C9_C10
1  0.0676799 -0.2860392  0.09651984 -0.7898946
2 -0.6120203 -0.7185961  0.91621371  1.1778427
3  0.2960446 -0.2454476 -1.18768621  0.9019590
4  0.9733695 -0.6242547  0.95078869 -0.7245792
## Alternative to achieve the same result with similar performance, but in a much less elegant fashion
myselect <- c(1,1,1,2,2,2,3,3,4,4) # The columns are named according to the selection stored in myselect
myList <- tapply(seq(along=myMA[1,]), myselect, function(x) paste("myMA[ ,", x, "]", sep=""))
myList <- sapply(myList, function(x) paste("(", paste(x, collapse=" + "),")/", length(x)))
myMAmean <- sapply(myList, function(x) eval(parse(text=x)))
colnames(myMAmean) <- tapply(colnames(myMA), myselect, paste, collapse="_")
myMAmean[1:4,]
    C1_C2_C3   C4_C5_C6       C7_C8     C9_C10
1  0.0676799 -0.2860392  0.09651984 -0.7898946
2 -0.6120203 -0.7185961  0.91621371  1.1778427
3  0.2960446 -0.2454476 -1.18768621  0.9019590
4  0.9733695 -0.6242547  0.95078869 -0.7245792
Functions
A very useful feature of the R environment is the possibility to expand existing functions and to easily write custom functions. In fact, most of the R software can be viewed as a series of R functions.
Syntax to define functions
myfct <- function(arg1, arg2, ...) {
function_body
}
The value returned by a function is the value of the function body, which is usually an unassigned final expression, e.g.: return()
Syntax to call functions
myfct(arg1=..., arg2=...)
Syntax Rules for Functions
General
Functions are defined by (1) assignment with the keyword function, (2) the declaration of arguments/variables (arg1, arg2, ...) and (3) the definition of operations (function_body) that perform computations on the provided arguments. A function name needs to be assigned to call the function (see below).
Naming
Function names can be almost anything. However, the usage of names of existing functions should be avoided.
Arguments
It is often useful to provide default values for arguments (e.g.: arg1=1:10). This way they don't need to be provided in a function call. The argument list can also be left empty (myfct <- function() { fct_body }) when a function is expected to always return the same value(s). The argument '...' can be used to allow one function to pass on argument settings to another.
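A minimal sketch of the '...' mechanism, using a hypothetical wrapper (mymean) around the base function mean:

```r
mymean <- function(x, ...) {
    mean(x, ...)                  # forwards any extra arguments unchanged to mean()
}
v <- c(1, 2, NA, 4)               # sample vector with one missing value
r1 <- mymean(v)                   # NA, because mean() uses na.rm=FALSE by default
r2 <- mymean(v, na.rm=TRUE)       # na.rm is passed through '...' to mean()
r1
r2
```

The wrapper never names na.rm itself, so any other argument accepted by mean (for example trim) can be passed the same way.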
Function body
The actual expressions (commands/operations) are defined in the function body which should be enclosed by braces. The individual commands are separated by semicolons or new lines (preferred).
Calling functions
Functions are called by their name followed by parentheses containing possible argument names. Empty parentheses after the function name will result in an error message when a function requires certain arguments to be provided by the user. The function name alone will print the definition of a function.
Scope
Variables created inside a function exist only for the lifetime of a function. Thus, they are not accessible outside of the function. To force variables in functions to exist globally, one can use this special assignment operator: '<<-'. If a global variable is used in a function, then the global variable will be masked only inside the function.
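The difference between local assignment and '<<-' can be seen in a short sketch (the variable counter and both function names are invented for this illustration):

```r
counter <- 0                      # variable defined outside the functions
local_update <- function() {
    counter <- counter + 1        # creates a local copy; the outer variable is masked, not changed
    counter
}
global_update <- function() {
    counter <<- counter + 1       # assigns to the variable in the enclosing environment
    counter
}
a <- local_update()
after_local <- counter            # still 0
b <- global_update()
after_global <- counter           # now 1
c(a, after_local, b, after_global)
```

Strictly speaking, '<<-' searches the enclosing environments for an existing variable of that name and only falls back to the global environment if none is found. For functions defined at top level this amounts to a global assignment.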
Example: Function basics
myfct <- function(x1, x2=5) {
    z1 <- x1/x1
    z2 <- x2*x2
    myvec <- c(z1, z2)
    return(myvec)
}
myfct # prints definition of function
myfct(x1=2, x2=5) # applies function to values 2 and 5
[1] 1 25
myfct(2, 5) # the argument names are not necessary, but then the order of the specified values becomes important
myfct(x1=2) # does the same as before, but the default value '5' is used in this case
Example: Function with optional arguments
myfct2 <- function(x1=5, opt_arg) {
    if(missing(opt_arg)) { # 'missing()' is used to test whether a value was specified as an argument
        z1 <- 1:10
    } else {
        z1 <- opt_arg
    }
    cat("my function returns:", "\n")
    return(z1/x1)
}
myfct2(x1=5) # performs calculation on default vector (z1) that is defined in the function body
my function returns:
[1] 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0
myfct2(x1=5, opt_arg=30:20) # a custom vector is used instead when the optional argument (opt_arg) is specified
my function returns:
[1] 6.0 5.8 5.6 5.4 5.2 5.0 4.8 4.6 4.4 4.2 4.0
Control utilities for functions: return, warning and stop
Return
The evaluation flow of a function may be terminated at any stage with the return function. This is often used in combination with conditional evaluations.
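A small sketch of an early return combined with conditional evaluation (the function safe_log is a made-up example, not part of R):

```r
safe_log <- function(x) {
    if(!is.numeric(x)) return(NA)   # leaves the function early for non-numeric input
    if(any(x <= 0)) return(NA)      # leaves early where log() would give NaN or -Inf
    log(x)                          # otherwise the value of the last expression is returned
}
safe_log("a")
safe_log(-1)
safe_log(1)
```

Once return is reached, none of the later statements in the function body are evaluated.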
Stop
To stop the action of a function and print an error message, one can use the stop function.
Warning
To print a warning message in unexpected situations without aborting the evaluation flow of a function, one can use the function warning("...").
myfct <- function(x1) {
    if (x1>=0) print(x1) else stop("This function did not finish, because x1 < 0")
    warning("Value needs to be > 0")
}
myfct(x1=2)
[1] 2
Warning message:
In myfct(x1 = 2) : Value needs to be > 0
myfct(x1=-2)
Error in myfct(x1 = -2) : This function did not finish, because x1 < 0
Useful Utilities
Debugging Utilities
Several debugging utilities are available for R. The most important utilities are: traceback(), browser(), options(error=recover), options(error=NULL) and debug(). The Debugging in R page provides an overview of the available resources.
Regular Expressions
R's regular expression utilities work similarly to those in other languages. To learn how to use them in R, one can consult the main help page on this topic with ?regexp. The following gives a few basic examples.
The grep function can be used for finding patterns in strings, here the letter A in the vector month.name.
month.name[grep("A", month.name)]
[1] "April" "August"
Example for using regular expressions to substitute a pattern by another one using the sub/gsub function with a back reference. Remember: single escapes '\' need to be double escaped '\\' in R.
gsub("(i.*a)", "xxx_\\1", "virginica", perl = TRUE)
[1] "vxxx_irginica"
Example for split and paste functions
x <- gsub("(a)", "\\1_", month.name[1], perl=TRUE) # performs substitution with back reference which inserts in this example a '_' character
x
[1] "Ja_nua_ry"
strsplit(x, "_") # splits string on inserted character from above
[[1]]
[1] "Ja"  "nua" "ry"
paste(rev(unlist(strsplit(x, NULL))), collapse="") # reverses character string by splitting first all characters into vector fields and then collapsing them with paste
[1] "yr_aun_aJ"
Example for importing specific lines in a file with a regular expression. First, an external file is created with the cat function, then all lines of this file are imported into a vector with readLines. The specific elements (lines) are then retrieved with the grep function, and the resulting lines are split into vector fields with strsplit.
cat(month.name, file="zzz.txt", sep="\n")
x <- readLines("zzz.txt")
x <- x[c(grep("^J", as.character(x), perl = TRUE))]
t(as.data.frame(strsplit(x, "u")))
[,1] [,2]
c..Jan....ary.. "Jan" "ary"
c..J....ne.. "J" "ne"
c..J....ly.. "J" "ly"
Interpreting Character String as Expression
Example
mylist <- ls() # generates vector of object names in session
mylist[1] # prints name of 1st entry in vector but does not execute it as expression that returns the values of that object
get(mylist[1]) # uses 1st entry name in vector and executes it as expression
eval(parse(text=mylist[1])) # alternative approach to obtain similar result
Time, Date and Sleep
Example
system.time(ls()) # returns CPU (and other) times that an expression used, here ls()
   user  system elapsed
      0       0       0
date() # returns the current system date and time
[1] "Wed Dec 11 15:31:17 2012"
Sys.sleep(1) # pause execution of R expressions for a given number of seconds (e.g. in loop)
Calling External Software with System Command
The system command allows one to call any command-line software from within R on Linux, UNIX and OSX systems.
system("...") # provide under '...' command to run external software e.g. Perl, Python, C++ programs
Related utilities on Windows operating systems
x <- shell("dir", intern=T) # reads current working directory and assigns to file
shell.exec("C:/Documents and Settings/Administrator/Desktop/my_file.txt") # opens file with associated program
Miscellaneous Utilities
(1) Batch import and export of many files.
In the following example all file names ending with *.txt in the current directory are first assigned to a list (the '$' sign is used to anchor the match to the end of a string). Second, the files are imported one-by-one using a for loop where the original names are assigned to the generated data frames with the assign function. Consult help with ?read.table to understand arguments row.names=1 and comment.char = "A". Third, the data frames are exported using their names for file naming and appending the extension *.out.
files <- list.files(pattern="\\.txt$") # the escaped dot matches a literal '.' only
for(i in files) {
    x <- read.table(i, header=TRUE, comment.char = "A", sep="\t")
    assign(i, x)
    print(i)
    write.table(x, paste(i, c(".out"), sep=""), quote=FALSE, sep="\t", col.names = NA)
}
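As an alternative to assign, the imported tables can be collected in a single named list, which is often easier to process further. The sketch below is self-contained: it first writes two small sample files into a scratch directory (the directory name batch_demo and the file names a.txt and b.txt are made up) and then reads them back with lapply.

```r
mydir <- file.path(tempdir(), "batch_demo")        # hypothetical scratch directory
dir.create(mydir, showWarnings=FALSE)
for(n in c("a", "b")) {                            # writes two small tab-delimited sample files
    write.table(iris[1:3, 1:2], file.path(mydir, paste(n, ".txt", sep="")), quote=FALSE, sep="\t")
}
files <- list.files(mydir, pattern="\\.txt$", full.names=TRUE)
dflist <- lapply(files, read.table, header=TRUE, sep="\t")   # one data frame per file
names(dflist) <- basename(files)                   # names list components after their files
sapply(dflist, dim)                                # rows and columns of each imported table
```

Individual tables are then available as dflist[["a.txt"]], and the whole collection can be passed to lapply or sapply without looking objects up by name with get.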
(2) Running Web Applications (basics on designing web client/crawling/scraping scripts in R)
Example for obtaining MW values for peptide sequences from the EXPASY's pI/MW Tool web page.
myentries <- c("MKWVTFISLLFLFSSAYS", "MWVTFISLL", "MFISLLFLFSSAYS")
myresult <- NULL
for(i in myentries) {
    myurl <- paste("http://ca.expasy.org/cgi-bin/pi_tool?protein=", i, "&resolution=monoisotopic", sep="")
    x <- url(myurl)
    res <- readLines(x)
    close(x)
    mylines <- res[grep("Theoretical pI/Mw:", res)]
    myresult <- c(myresult, as.numeric(gsub(".*/ ", "", mylines)))
    print(myresult)
    Sys.sleep(1) # halts process for one sec to give web service a break
}
final <- data.frame(Pep=myentries, MW=myresult)
cat("\n The MW values for my peptides are:\n")
final
                 Pep      MW
1 MKWVTFISLLFLFSSAYS 2139.11
2          MWVTFISLL 1108.60
3     MFISLLFLFSSAYS 1624.82
Running R Programs
(1) Executing an R script from the R console
source("my_script.R")
(2.1) Syntax for running R programs from the command-line. Requires in first line of my_script.R the following statement: #!/usr/bin/env Rscript
$ Rscript my_script.R # or just ./my_script.R after making file executable with 'chmod +x my_script.R'
All commands starting with a '$' sign need to be executed from a Unix or Linux shell.
(2.2) Alternatively, one can use the following syntax to run R programs in BATCH mode from the command-line.
$ R CMD BATCH [options] my_script.R [outfile]
The output file lists the commands from the script file and their outputs. If no outfile is specified, the name of the input file is used with .Rout appended. To stop all the usual R command line information from being written to the outfile, add this as the first line of the my_script.R file: options(echo=FALSE). If the command is run like this: R CMD BATCH --no-save my_script.R, then nothing will be saved in the .Rdata file, which can often become very large. More on this can be found on the help pages: $ R CMD BATCH --help or ?BATCH.
(2.3) Another alternative for running R programs as silently as possible.
$ R --slave < my_infile > my_outfile
Argument --slave makes R run as 'quietly' as possible.
(3) Passing Command-Line Arguments to R Programs
Create an R script, here named test.R, like this one:
######################
myarg <- commandArgs()
print(iris[1:myarg[6], ])
######################
Then run it from the command-line like this:
$ Rscript test.R 10
In the given example the number 10 is passed on from the command-line as an argument to the R script, which uses it to return the first 10 rows of the iris sample data to STDOUT. If several arguments are provided, they are returned as additional elements of the character vector (here myarg[7], myarg[8], and so on).
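The position of the user-supplied value in the vector returned by commandArgs() depends on how R was started. A somewhat more robust sketch uses commandArgs(trailingOnly=TRUE), which returns only the arguments given after the script name, one element per argument. The helper parse_n and its default of 5 rows are assumptions made for this illustration.

```r
parse_n <- function(args, default=5L) {
    if(length(args) < 1) return(default)            # no argument supplied: use default
    n <- suppressWarnings(as.integer(args[1]))      # arguments always arrive as character strings
    if(is.na(n)) default else n                     # non-numeric input: use default
}
n <- parse_n(commandArgs(trailingOnly=TRUE))
print(head(iris, n))
```

Saved as test.R, this would be run in the same way as above ($ Rscript test.R 10). Without an argument it prints the first 5 rows.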
(4) Submitting R script to a Linux cluster via Torque
Create the following shell script my_script.sh
#################################
#!/bin/bash
cd $PBS_O_WORKDIR
R CMD BATCH --no-save my_script.R
#################################
This script doesn't need to have executable permissions. Use the following qsub command to send this shell script to the Linux cluster from the directory where the R script my_script.R is located. To utilize several CPUs on the Linux cluster, one can split the input data into several smaller subsets and execute for each subset a separate process from a dedicated directory.
$ qsub my_script.sh
Here is a short R script that generates the required files and directories automatically and submits the jobs to the nodes: submit2cluster.R. For more details, see also this 'Tutorial on Parallel Programming in R' by Hanna Sevcikova
(5) Submitting jobs to Torque or any other queuing/scheduling system via the BatchJobs package. This package provides one of the most advanced resources for submitting jobs to queuing systems from within R. A related package is BiocParallel from Bioconductor which extends many functionalities of BatchJobs to genome data analysis. Useful documentation for BatchJobs: Technical Report, GitHub page, Slide Show, Config samples.
library(BatchJobs)
loadConfig(conffile = ".BatchJobs.R")
    ## Loads configuration file. Here .BatchJobs.R containing only this line:
    ## cluster.functions <- makeClusterFunctionsTorque("torque.tmpl")
    ## The template file torque.tmpl is expected to be in the current working
    ## directory. It can be downloaded from here:
    ## https://github.com/tudo-r/BatchJobs/blob/master/examples/cfTorque/simple.tmpl
getConfig() # Returns BatchJobs configuration settings
reg <- makeRegistry(id="BatchJobTest", work.dir="results")
    ## Constructs a registry object. Output files from R will be stored under directory "results",
    ## while the standard objects from BatchJobs will be stored in the directory "BatchJobTest-files".
print(reg)
## Some test function
f <- function(x) {
    system("ls -al >> test.txt")
    x
}
ids <- batchMap(reg, fun=f, 1:10) # Adds jobs to registry object (here reg)
print(ids)
showStatus(reg)
done <- submitJobs(reg, resources=list(walltime=3600, nodes="1:ppn=4", memory="4gb")) # Submit jobs or chunks of jobs to batch system via cluster function
loadResult(reg, 1) # Load results from BatchJobTest-files/jobs/01/1-result.RData
Object-Oriented Programming (OOP)
R supports two systems for object-oriented programming (OOP): an older S3 system and a more recently introduced S4 system. The latter is more formal and supports multiple inheritance, multiple dispatch and introspection. Many of these features are not available in the older S3 system. In general, the OOP approach taken by R is to separate the class specifications from the specifications of generic functions (function-centric system). The following introduction is restricted to the S4 system since it is nowadays the preferred OOP method for R. More information about OOP in R can be found in the following introductions: Vincent Zoonekynd's introduction to S3 Classes, S4 Classes in 15 pages, Christophe Genolini's S4 Intro, The R.oo package, BioC Course: Advanced R for Bioinformatics, Programming with R by John Chambers and R Programming for Bioinformatics by Robert Gentleman.
Define S4 Classes
(A) Define S4 Classes with setClass() and new()
y <- matrix(1:50, 10, 5) # Sample data set
setClass(Class="myclass",
    representation=representation(a="ANY"),
    prototype=prototype(a=y[1:2,]), # Defines default value (optional)
    validity=function(object) { # Can be defined in a separate step using setValidity
        if(!is.matrix(object@a)) { # is.matrix() also works in R >= 4.0, where class() of a matrix has length two
            return(paste("expected matrix, but obtained", class(object@a)[1]))
        } else {
            return(TRUE)
        }
    }
)
The setClass function defines classes. Its most important arguments are
- Class: the name of the class
- representation: the slots that the new class should have and/or other classes that this class extends.
- prototype: an object providing default data for the slots.
- contains: the classes that this class extends.
- validity, access, version: control arguments included for compatibility with S-Plus.
- where: the environment to use to store or remove the definition as meta data.
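The validity check can also be attached after the class has been defined. The following sketch uses setValidity from the methods package on a separate, made-up class name (myclass_v) so that it does not interfere with myclass from the example above.

```r
library(methods)
setClass("myclass_v", representation(a="ANY"))     # hypothetical class used only in this sketch
setValidity("myclass_v", function(object) {
    if(!is.matrix(object@a)) {
        paste("expected matrix, but obtained", class(object@a)[1])   # returned string becomes the error message
    } else {
        TRUE
    }
})
ok <- new("myclass_v", a=matrix(1:4, 2, 2))        # passes the validity check
err <- tryCatch(new("myclass_v", a=iris), error=function(e) conditionMessage(e))
err                                                # message reports the rejected data frame
```

A validity function returns TRUE for a valid object and otherwise a character string describing the problem.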
(B) The function new creates an instance of a class (here myclass)
myobj <- new("myclass", a=y)
myobj
An object of class "myclass"
Slot "a":
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   11   21   31   41
[2,]    2   12   22   32   42
...
new("myclass", a=iris) # Returns an error message due to wrong input type (iris is data frame)
Error in validObject(.Object) :
  invalid class "myclass" object: expected matrix, but obtained data.frame
The arguments of new are:
- Class: the name of the class
- ...: Data to include in the new object with arguments according to slots in class definition.
(C) A more generic way of creating class instances is to define an initialization method (details below)
setMethod("initialize", "myclass", function(.Object, a) {
    .Object@a <- a/a
    .Object
})
[1] "initialize"
new("myclass", a = y)
An object of class "myclass"
Slot "a":
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    1    1    1    1    1
...
(D) Usage and helper functions
myobj@a # The '@' extracts the contents of a slot. Usage should be limited to internal functions!
initialize(.Object=myobj, a=as.matrix(cars[1:3,])) # Creates a new S4 object from an old one.
# removeClass("myclass") # Removes object from current session; does not apply to associated methods.
(E) Inheritance: allows one to define new classes that inherit all properties (e.g. data slots, methods) from their existing parent classes
setClass("myclass1", representation(a = "character", b = "character"))
setClass("myclass2", representation(c = "numeric", d = "numeric"))
setClass("myclass3", contains=c("myclass1", "myclass2"))
new("myclass3", a=letters[1:4], b=letters[1:4], c=1:4, d=4:1)
An object of class "myclass3"
Slot "a":
[1] "a" "b" "c" "d"
Slot "b":
[1] "a" "b" "c" "d"
Slot "c":
[1] 1 2 3 4
Slot "d":
[1] 4 3 2 1
getClass("myclass1")
Class "myclass1" [in ".GlobalEnv"]
Slots:
Name:          a         b
Class: character character
Known Subclasses: "myclass3"
getClass("myclass2")
Class "myclass2" [in ".GlobalEnv"]
Slots:
Name:        c       d
Class: numeric numeric
Known Subclasses: "myclass3"
getClass("myclass3")
Class "myclass3" [in ".GlobalEnv"]
Slots:
Name:          a         b         c         d
Class: character character   numeric   numeric
Extends: "myclass1", "myclass2"
The argument contains allows one to extend existing classes; this propagates all slots of parent classes.
(F) Coerce objects to another class
setAs(from="myclass", to="character", def=function(from) as.character(as.matrix(from@a)))
as(myobj, "character")
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
...
(G) Virtual classes are constructs for which no instances will be or can be created. They are used to link together classes which may have distinct representations (e.g. cannot inherit from each other) but for which one wants to provide similar functionality. Often it is desired to create a virtual class and to then have several other classes extend it. Virtual classes can be defined by leaving out the representation argument or including the class VIRTUAL:
setClass("myVclass")
setClass("myVclass", representation(a = "character", "VIRTUAL"))
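A short sketch of how a virtual class can link two otherwise unrelated classes. All class and generic names here (Shape, Circle, Square, area, describe) are invented for illustration.

```r
library(methods)
setClass("Shape", representation("VIRTUAL"))                        # virtual parent class
setClass("Circle", contains="Shape", representation(r="numeric"))
setClass("Square", contains="Shape", representation(s="numeric"))
setGeneric("area", function(obj) standardGeneric("area"))
setMethod("area", "Circle", function(obj) pi * obj@r^2)             # class-specific computations
setMethod("area", "Square", function(obj) obj@s^2)
setGeneric("describe", function(obj) standardGeneric("describe"))
setMethod("describe", "Shape", function(obj) {                      # defined once, inherited by both subclasses
    paste(class(obj), "with area", round(area(obj), 2))
})
sq <- describe(new("Square", s=2))
ci <- describe(new("Circle", r=1))
virtual_error <- tryCatch(new("Shape"), error=function(e) "cannot instantiate")
c(sq, ci, virtual_error)
```

The describe method is written against the virtual class only, yet it dispatches for both subclasses, while a direct call to new("Shape") fails.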
(H) Functions to introspect classes
- getClass("myclass")
- getSlots("myclass")
- slotNames("myclass")
- extends("myclass2")
Assign Generics and Methods
Assign generics and methods with setGeneric() and setMethod()
(A) Accessor function (to avoid usage of '@')
setGeneric(name="acc", def=function(x) standardGeneric("acc"))
setMethod(f="acc", signature="myclass", definition=function(x) {
    return(x@a)
})
acc(myobj)
     [,1] [,2] [,3] [,4] [,5]
[1,] 1 11 21 31 41
[2,] 2 12 22 32 42
...
(B.1) Replacement method using custom accessor function (acc<-)
setGeneric(name="acc<-", def=function(x, value) standardGeneric("acc<-"))
setReplaceMethod(f="acc", signature="myclass", definition=function(x, value) {
    x@a <- value
    return(x)
})
## After this the following replace operations with 'acc' work on new object class
acc(myobj)[1,1] <- 999 # Replaces first value
colnames(acc(myobj)) <- letters[1:5] # Assigns new column names
rownames(acc(myobj)) <- letters[1:10] # Assigns new row names
myobj
An object of class "myclass"
Slot "a":
    a  b  c  d  e
a 999 11 21 31 41
b 2 12 22 32 42
...
(B.2) Replacement method using "[" operator ([<-)
setReplaceMethod(f="[", signature="myclass", definition=function(x, i, j, value) {
x@a[i,j] <- value
return(x)
})
myobj[1,2] <- 999
myobj
An object of class "myclass"
Slot "a":
    a   b  c  d  e
a 999 999 21 31 41
b 2 12 22 32 42
...
(C) Define behavior of "[" subsetting operator (no generic required!)
setMethod(f="[", signature="myclass",
    definition=function(x, i, j, ..., drop) {
        x@a <- x@a[i,j]
        return(x)
})
myobj[1:2,] # Standard subsetting works now on new class
An object of class "myclass"
Slot "a":
a b c d e
a 999 999 21 31 41
b 2 12 22 32 42
...
(D) Define impress beliefs
setMethod(f= show", signature="myclass", definition=office(object) {
true cat("An instance of ", "\"", class(object), "\"", " with ", length(acc(object)[,1]), " elements", "\north", sep="")
if(length(acc(object)[,1])>=5) {
print(every bit.data.frame(rbind(acc(object)[one:2,], ...=rep("...", length(acc(object)[ane,])),
acc(object)[(length(acc(object)[,1])-1):length(acc(object)[,ane]),])))
} else {
print(acc(object))
}})
myobj # Prints object with custom method
An case of "myclass" with 10 elements
a b c d e
a 999 999 21 31 41
b 2 12 22 32 42
... ... ... ... ... ...
i 9 19 29 39 49
j 10 20 30 40 50
(E) Define a data specific function (here randomize row order)
setGeneric(name="randomize", def=function(x) standardGeneric("randomize"))
setMethod(f="randomize", signature="myclass", definition=function(x) {
acc(x)[sample(1:length(acc(x)[,1]), length(acc(x)[,1])), ]
})
randomize(myobj)
a b c d e
j 10 20 30 40 50
b 2 12 22 32 42
...
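Because the row order is drawn with sample(), every call returns a different result. If a reproducible order is needed, the random number generator can be seeded first. The sketch below shows the idea on a plain matrix that stands in for the slot content; the matrix 'm' and the seed value are arbitrary choices for illustration.

```r
## Hypothetical example matrix standing in for the slot content of an object
m <- matrix(1:50, 10, 5, dimnames=list(letters[1:10], letters[1:5]))

set.seed(42)                          # arbitrary seed, chosen only for illustration
r1 <- m[sample(nrow(m)), ]            # shuffle row order
set.seed(42)
r2 <- m[sample(nrow(m)), ]            # same seed gives the same row order
identical(r1, r2)                     # checks that both shuffles agree
setequal(rownames(r1), rownames(m))   # rows are permuted, none are lost
```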
(F) Define a graphical plotting function and allow user to access it with generic plot function
setMethod(f="plot", signature="myclass", definition=function(x, ...) {
barplot(as.matrix(acc(x)), ...)
})
plot(myobj)
(G) Functions to inspect methods
- showMethods(class="myclass")
- findMethods("randomize")
- getMethod("randomize", signature="myclass")
- existsMethod("randomize", signature="myclass")
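The inspection functions listed above can be tried out on a small self-contained case. In the following sketch the class 'demoClass3' and the generic 'nrows' are hypothetical and exist only to give the calls something to find.

```r
library(methods)

## Hypothetical generic and class, defined only for this illustration
setClass("demoClass3", representation(a="matrix"))
setGeneric(name="nrows", def=function(x) standardGeneric("nrows"))
setMethod(f="nrows", signature="demoClass3", definition=function(x) nrow(x@a))

isGeneric("nrows")                                 # is 'nrows' an S4 generic?
existsMethod("nrows", signature="demoClass3")      # method defined directly for this class?
existsMethod("nrows", signature="numeric")         # no method was defined for 'numeric'
m <- getMethod("nrows", signature="demoClass3")    # retrieve the method definition
m(new("demoClass3", a=matrix(1:6, 2, 3)))          # the retrieved method can be called directly
```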
Building R Packages
To get familiar with the structure, building and submission process of R packages, users should carefully read the documentation on this topic available on these sites:
- Writing R Extensions, R web site
- R Packages, by Hadley Wickham
- R Package Primer, by Karl Broman
- Package Guidelines, Bioconductor
- Advanced R Programming Class, Bioconductor
Short Overview of Package Building Process
(A) Automated package building with the package.skeleton function:
package.skeleton(name="mypackage", code_files=c("script1.R", "script2.R"))
Note: this is an optional but very convenient function to get started with a new package. The given example will create a directory named mypackage containing the skeleton of the package for all functions, methods and classes defined in the R script(s) passed on to the code_files argument. The basic structure of the package directory is described in the Writing R Extensions manual. The package directory will also contain a file named 'Read-and-delete-me' with the following instructions for completing the package:
- Edit the help file skeletons in man, possibly combining help files for multiple functions.
- Edit the exports in NAMESPACE, and add necessary imports.
- Put any C/C++/Fortran code in src.
- If you have compiled code, add a useDynLib() directive to NAMESPACE.
- Run R CMD build to build the package tarball.
- Run R CMD check to check the package tarball.
- Read Writing R Extensions for more information.
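To see what package.skeleton generates without touching the current working directory, the call can be pointed at a temporary location. The following sketch assumes a hypothetical one-line script with a function 'myfct'; the directory name 'skeleton_demo' is likewise made up for this example.

```r
## Write a small script with one function into a temporary directory
## (file, directory and function names are hypothetical, chosen only for this sketch)
tmp <- file.path(tempdir(), "skeleton_demo")
dir.create(tmp, showWarnings=FALSE)
script <- file.path(tmp, "script1.R")
writeLines("myfct <- function(x) x + 1", script)

## Build the skeleton next to the script instead of in the working directory
package.skeleton(name="mypackage", code_files=script, path=tmp, force=TRUE)

## List what was generated
list.files(file.path(tmp, "mypackage"), recursive=TRUE)
```

The listing should include the DESCRIPTION and NAMESPACE files, the 'Read-and-delete-me' file, a copy of the script in the R directory and help file skeletons in man.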
(B) Once a package skeleton is available one can build the package from the command-line (Linux/OS X):
$ R CMD build mypackage
This will create a tarball of the package with its version number encoded in the file name, e.g.: mypackage_1.0.tar.gz.
Subsequently, the package tarball needs to be checked for errors with:
$ R CMD check mypackage_1.0.tar.gz
All issues in a package's source code and documentation should be addressed until R CMD check returns no error or warning messages anymore.
(C) Install package from source:
Linux:
install.packages("mypackage_1.0.tar.gz", repos=NULL)
OS X:
install.packages("mypackage_1.0.tar.gz", repos=NULL, type="source")
Windows requires a zip archive for installing R packages, which can be most conveniently created from the command-line (Linux/OS X) by installing the package in a local directory (here tempdir) and then creating a zip archive from the installed package directory:
$ mkdir tempdir
$ R CMD INSTALL -l tempdir mypackage_1.0.tar.gz
$ cd tempdir
$ zip -r mypackage mypackage
## The resulting mypackage.zip archive can be installed under Windows like this:
install.packages("mypackage.zip", repos=NULL)
This procedure only works for packages which do not rely on compiled code (C/C++). Instructions to fully build an R package under Windows can be found here and here.
(D) Maintain/expand an existing package:
- Add new functions, methods and classes to the script files in the ./R directory in your package
- Add their names to the NAMESPACE file of the package
- Additional *.Rd help templates can be generated with the prompt*() functions like this:
source("myscript.R") # imports functions, methods and classes from myscript.R
prompt(myfct) # writes help file myfct.Rd
promptClass("myclass") # writes file myclass-class.Rd
promptMethods("mymeth") # writes help file mymeth.Rd
- The resulting *.Rd help files can be edited in a text editor and properly rendered and viewed from within R like this:
library(tools)
Rd2txt("./mypackage/man/myfct.Rd") # renders *.Rd files as they look in final help pages
checkRd("./mypackage/man/myfct.Rd") # checks *.Rd help file for problems
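The template generation step can also be tried in isolation. The sketch below writes the help skeleton for a hypothetical function 'myfct' into a temporary file (via the filename argument of prompt) rather than into the working directory, and then shows the first lines of the result.

```r
## Hypothetical function to document
myfct <- function(x, y=2) x^y

## Write the Rd skeleton into a temporary file instead of the working directory
rd_file <- file.path(tempdir(), "myfct.Rd")
prompt(myfct, filename=rd_file)

## Inspect the first lines of the generated template
head(readLines(rd_file))
```

The generated file is only a template; its placeholder sections still need to be filled in by hand before the help page is complete.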
(E) Submit package to a public repository
Download one of the above exercise files, then start editing this R source file with a programming text editor, such as Vim, Emacs or one of the R GUI text editors. Here is the HTML version of the code with syntax coloring.
Sample Scripts
Batch Operations on Many Files
## (1) Start R from an empty test directory
## (2) Create some files as sample data
for(i in month.name) {
    mydf <- data.frame(Month=month.name, Rain=runif(12, min=10, max=100), Evap=runif(12, min=1000, max=2000))
    write.table(mydf, file=paste(i , ".infile", sep=""), quote=F, row.names=F, sep="\t")
}
## (3) Import created files, perform calculations and export to renamed files
files <- list.files(pattern=".infile$")
for(i in seq(along=files)) { # start for loop with numeric or character vector; numeric vector is often more flexible
    x <- read.table(files[i], header=TRUE, row.names=1, comment.char = "A", sep="\t")
    x <- data.frame(x, sum=apply(x, 1, sum), mean=apply(x, 1, mean)) # calculates sum and mean for each data frame
    assign(files[i], x) # generates data frame object and names it after content in variable 'i'
    print(files[i], quote=F) # prints loop iteration to screen to check its status
    write.table(x, paste(files[i], c(".out"), sep=""), quote=FALSE, sep="\t", col.names = NA)
}
## (4) Same as above, but file naming by index data frame. This way one can organize file names by external table.
name_df <- data.frame(Old_name=sort(files), New_name=sort(month.abb))
for(i in seq(along=name_df[,1])) {
    x <- read.table(as.vector(name_df[i,1]), header=TRUE, row.names=1, comment.char = "A", sep="\t")
    x <- data.frame(x, sum=apply(x, 1, sum), mean=apply(x, 1, mean))
    assign(as.vector(name_df[i,2]), x) # generates data frame object and names it after 'i' entry in column 2
    print(as.vector(name_df[i,1]), quote=F)
    write.table(x, paste(as.vector(name_df[i,2]), c(".out"), sep=""), quote=FALSE, sep="\t", col.names = NA)
}
## (5) Append content of all input files to one file.
files <- list.files(pattern=".infile$")
all_files <- data.frame(files=NULL, Month=NULL, Gain=NULL , Loss=NULL, sum=NULL, mean=NULL) # creates empty data frame container
for(i in seq(along=files)) {
    x <- read.table(files[i], header=TRUE, row.names=1, comment.char = "A", sep="\t")
    x <- data.frame(x, sum=apply(x, 1, sum), mean=apply(x, 1, mean))
    x <- data.frame(file=rep(files[i], length(x[,1])), x) # adds file tracking column to x
    all_files <- rbind(all_files, x) # appends data from all files to data frame 'all_files'
    write.table(all_files, file="all_files.xls", quote=FALSE, sep="\t", col.names = NA)
}
## (6) Write the above code into a text file and execute it with the commands 'source' and 'BATCH'.
source("my_script.R") # execute from R console
$ R CMD BATCH my_script.R # execute from shell
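As an alternative to growing a data frame inside a for loop, the files can be read into a list with lapply and combined in one step with do.call(rbind, ...). The following is a minimal sketch of that pattern, not a replacement for the steps above; it works in a temporary directory, and the file names 'set1' to 'set3' and the directory name 'batch_demo' are hypothetical.

```r
## Work in a temporary directory so the sketch leaves the working directory untouched
demo_dir <- file.path(tempdir(), "batch_demo")
dir.create(demo_dir, showWarnings=FALSE)

## Create three small tab-delimited sample files (file names are hypothetical)
for(i in c("set1", "set2", "set3")) {
    mydf <- data.frame(Month=month.name, Rain=runif(12, min=10, max=100), Evap=runif(12, min=1000, max=2000))
    write.table(mydf, file=file.path(demo_dir, paste(i, ".infile", sep="")), quote=FALSE, row.names=FALSE, sep="\t")
}

## Read every file into one element of a named list
files <- list.files(demo_dir, pattern="\\.infile$", full.names=TRUE)
df_list <- lapply(files, function(f) {
    x <- read.table(f, header=TRUE, sep="\t")
    x <- data.frame(x, sum=rowSums(x[, c("Rain", "Evap")]), mean=rowMeans(x[, c("Rain", "Evap")]))
    data.frame(file=basename(f), x)   # adds file tracking column
})
names(df_list) <- basename(files)

## Combine all data frames in a single step
all_files <- do.call(rbind, df_list)
dim(all_files)
```

Keeping the imported data frames in a named list avoids creating many separate objects with assign, and each one remains accessible by name, e.g. df_list[["set1.infile"]].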
Large-scale Array Analysis
Sample script to perform large-scale expression array analysis with complex queries: lsArray.R. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/lsArray.R")
Graphical Procedures: Feature Map Example
Script to plot feature maps of genes or chromosomes: featureMap.R. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/featureMap.txt")
Sequence Analysis Utilities
Includes sequence batch import, sub-setting, pattern matching, AA Composition, NEEDLE, PHYLIP, etc. The script 'sequenceAnalysis.R' demonstrates how R can be used as a powerful tool for managing and analyzing large sets of biological sequences. This example also shows how easy it is to integrate R with the EMBOSS project or other external programs. The script provides the following functionality:
- Batch sequence import into R data frame
- Motif searching with hit statistics
- Analysis of sequence composition
- All-against-all sequence comparisons
- Generation of phylogenetic trees
To demonstrate the utilities of the script, users can simply execute it from R with the following source command:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/sequenceAnalysis.txt")
Pattern Matching and Positional Parsing of Sequences
Functions for importing sequences into R, retrieving reverse and complement of nucleotide sequences, pattern searching, positional parsing and exporting search results in HTML format: patternSearch.R. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/patternSearch.R")
Identify Over-Represented Strings in Sequence Sets
Functions for finding over-represented words in sets of DNA, RNA or protein sequences: wordFinder.R. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/wordFinder.R")
Translate DNA into Protein
Script 'translateDNA.R' for translating NT sequences into AA sequences (required codon table). To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/translateDNA.R")
Subsetting of Structure Definition Files (SDF)
Script for importing and subsetting SDF files: sdfSubset.R. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/sdfSubset.R")
Managing Latex BibTeX Databases
Script for importing BibTeX databases into R, retrieving the individual references with a full-text search function and viewing the results in R or in HubMed: BibTex.R. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/BibTex.R")
Loan Payments and Amortization Tables
This script calculates monthly and annual mortgage or loan payments, generates amortization tables and plots the results: mortgage.R. A Shiny App using this function has been created by Antoine Soetewey here. To demo what the script does, run it like this:
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/mortgage.R")
Course Assignment: GC Content, Reverse & Complement
Apply the above information to write a function that calculates for a set of DNA sequences their GC content and generates their reverse and complement. Here are some useful commands that can be incorporated in this function:
## Generate an example data frame with ID numbers and DNA sequences
fx <- function(test) {
    x <- as.integer(runif(20, min=1, max=5))
    x[x==1] <- "A"; x[x==2] <- "T"; x[x==3] <- "G"; x[x==4] <- "C"
    paste(x, sep = "", collapse ="")
}
z1 <- c()
for(i in 1:50) {
    z1 <- c(fx(i), z1)
}
z1 <- data.frame(ID=seq(along=z1), Seq=z1)
z1
## Write each character of sequence into separate vector field and reverse its order
my_split <- strsplit(as.character(z1[1,2]),"")
my_rev <- rev(my_split[[1]])
paste(my_rev, collapse="")
## Generate the sequence complement by replacing G|C|A|T by C|G|T|A
## Use 'apply' or 'for loop' to apply the above operations to all sequences in sample data frame 'z1'
## Calculate in the same loop the GC content for each sequence using the following command
table(my_split[[1]])/length(my_split[[1]])
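One possible way to combine these commands into a single function is sketched below. It is only one of several reasonable solutions to the assignment; the function name 'seq_stats' and its output column names are made up for this example, and base R's chartr is used here for the complement step.

```r
## Hypothetical helper: returns the reverse complement and GC content
## for a character vector of DNA sequences
seq_stats <- function(seqs) {
    seqs <- as.character(seqs)
    chars <- strsplit(seqs, "")   # one character vector per sequence
    ## reverse each sequence, then complement it with chartr (G<->C, A<->T)
    rev_seq <- sapply(chars, function(s) paste(rev(s), collapse=""))
    rev_comp <- chartr("GCAT", "CGTA", rev_seq)
    ## GC content: fraction of G and C characters per sequence
    gc <- sapply(chars, function(s) mean(s %in% c("G", "C")))
    data.frame(Seq=seqs, RevComp=rev_comp, GC=gc)
}

seq_stats(c("AAGC", "GGCC", "ATAT"))
```

For the sample data frame generated above, the function could be called as seq_stats(z1$Seq).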
- Serbo-Croatian version translated by Jovana Milutinovich
Source: http://manuals.bioinformatics.ucr.edu/home/programming-in-r