For Loop to Read in Several Files R

Programming in R

Contents

1 Introduction
2 R Basics
3 Code Editors for R
4 Integrating R with Vim and Tmux
5 Finding Help
6 Control Structures
  6.1 Conditional Executions
    6.1.1 Comparison Operators
    6.1.2 Logical Operators
    6.1.3 If Statements
    6.1.4 Ifelse Statements
  6.2 Loops
    6.2.1 For Loop
    6.2.2 While Loop
    6.2.3 Apply Loop Family
      6.2.3.1 For Two-Dimensional Data Sets: apply
      6.2.3.2 For Ragged Arrays: tapply
      6.2.3.3 For Vectors and Lists: lapply and sapply
    6.2.4 Other Loops
    6.2.5 Improving Speed Performance of Loops
7 Functions
8 Useful Utilities
  8.1 Debugging Utilities
  8.2 Regular Expressions
  8.3 Interpreting Character String as Expression
  8.4 Time, Date and Sleep
  8.5 Calling External Software with System Command
  8.6 Miscellaneous Utilities
9 Running R Programs
10 Object-Oriented Programming (OOP)
  10.1 Define S4 Classes
  10.2 Assign Generics and Methods
11 Building R Packages
12 Reproducible Research by Integrating R with Latex or Markdown
13 R Programming Exercises
  13.1 Exercise Slides
  13.2 Sample Scripts
    13.2.1 Batch Operations on Many Files
    13.2.2 Large-scale Array Analysis
    13.2.3 Graphical Procedures: Feature Map Example
    13.2.4 Sequence Analysis Utilities
    13.2.5 Pattern Matching and Positional Parsing of Sequences
    13.2.6 Identify Over-Represented Strings in Sequence Sets
    13.2.7 Translate DNA into Protein
    13.2.8 Subsetting of Structure Definition Files (SDF)
    13.2.9 Managing Latex BibTeX Databases
    13.2.10 Loan Payments and Amortization Tables
    13.2.11 Course Assignment: GC Content, Reverse & Complement
14 Translation of this Page

Introduction

[ Slides ] [ R Code ]

General Overview

One of the main attractions of using the R (http://cran.at.r-project.org) environment is the ease with which users can write their own programs and custom functions. The R programming syntax is extremely easy to learn, even for users with no previous programming experience. Once the basic R programming control structures are understood, users can use the R language as a powerful environment to perform complex custom analyses of almost any type of data.

Format of this Manual

In this manual all commands are given in code boxes, where the R code is printed in black, the comment text in blue and the output generated by R in green. All comments/explanations start with the standard comment sign '#' to prevent them from being interpreted by R as commands. This way the content in the code boxes can be pasted with their comment text into the R console to evaluate their utility. Occasionally, several commands are printed on one line and separated by a semicolon ';'. Commands starting with a '$' sign need to be executed from a Unix or Linux shell. Windows users can simply ignore them.

R Basics

The R & BioConductor manual provides a general introduction to the usage of the R environment and its basic command syntax.

Code Editors for R

Several excellent code editors are available that provide functionalities like R syntax highlighting, auto code indenting and utilities to send code/functions to the R console.

Basic code editors provided by Rguis
RStudio: GUI-based IDE for R
Vim-R-Tmux: R working environment based on vim and tmux
Emacs (ESS add-on package)
gedit and Rgedit
RKWard
Eclipse
Tinn-R
Notepad++ (NppToR)

[Screenshots: Programming in R using Vim or Emacs; Programming in R using RStudio]

Integrating R with Vim and Tmux

Users interested in integrating R with vim and tmux may want to consult the Vim-R-Tmux configuration page.

Finding Help

Reference list on R programming (selection)

R Programming for Bioinformatics, by Robert Gentleman
Advanced R, by Hadley Wickham
S Programming, by W. N. Venables and B. D. Ripley
Programming with Data, by John M. Chambers
R Help & R Coding Conventions, Henrik Bengtsson, Lund University
Programming in R (Vincent Zoonekynd)
Peter's R Programming Pages, University of Warwick
Rtips, Paul Johnsson, University of Kansas
R for Programmers, Norm Matloff, UC Davis
High-Performance R, Dirk Eddelbuettel tutorial presented at useR-2008
C/C++ level programming for R, Gopi Goswami

Control Structures

Conditional Executions

Comparison Operators

equal: ==
not equal: !=
greater/less than: > <
greater/less than or equal: >= <=

Logical Operators

and: &
or: |
not: !

If Statements

If statements operate on length-one logical vectors.

Syntax

if(condition) { cmd1 } else { cmd2 }

Example

if(1==0) { print(1) } else { print(2) }
[1] 2

Avoid inserting a newline between '}' and 'else'. At the top level R would otherwise treat the if statement as complete and fail on the else.
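Since if() expects a single TRUE or FALSE, a longer logical vector is usually reduced to length one first, for instance with any() or all(). The following is a minimal sketch; the vector vals and the message variables are made up for illustration.

```r
vals <- c(2, 7, 11)   # hypothetical sample values

# any() and all() collapse a logical vector to length one for if()
if (any(vals > 10)) {
    msg_any <- "at least one value exceeds 10"
} else {
    msg_any <- "no value exceeds 10"
}

if (all(vals > 10)) {
    msg_all <- "all values exceed 10"
} else {
    msg_all <- "not all values exceed 10"
}

print(msg_any)
print(msg_all)
```

For element-wise decisions on a whole vector, ifelse is the more natural tool.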

Ifelse Statements

Ifelse statements operate on vectors of variable length.

Syntax

ifelse(test, true_value, false_value)

Example

x <- 1:10 # Creates sample data
ifelse(x<5 | x>8, x, 0)
[1] 1 2 3 4 0 0 0 0 9 10

Loops

The most commonly used loop structures in R are for, while and apply loops. Less common are repeat loops. The break function is used to break out of loops, and next halts the processing of the current iteration and advances the looping index.
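The effect of next and break can be sketched with a short loop. The vector name odd_squares and the limits used here are arbitrary choices for illustration.

```r
odd_squares <- numeric(0)        # hypothetical result container
for (i in 1:20) {
    if (i %% 2 == 0) next        # skip even numbers and continue with the next i
    if (i^2 > 100) break         # leave the loop once the square exceeds 100
    odd_squares <- c(odd_squares, i^2)
}
odd_squares
```

The loop collects the squares of 1, 3, 5, 7 and 9 and stops when i reaches 11.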

For Loop

For loops are controlled by a looping vector. In every iteration of the loop one value in the looping vector is assigned to a variable that can be used in the statements of the body of the loop. Usually, the number of loop iterations is defined by the number of values stored in the looping vector and they are processed in the same order as they are stored in the looping vector.

Syntax

for(variable in sequence) { statements }

Example

mydf <- iris
myve <- NULL # Creates empty storage container
for(i in seq(along=mydf[,1])) {
    myve <- c(myve, mean(as.numeric(mydf[i, 1:3]))) # Note: inject approach is much faster than append with 'c'. See below for details.
}
myve
[1] 3.333333 3.100000 3.066667 3.066667 3.333333 3.666667 3.133333 3.300000
[9] 2.900000 3.166667 3.533333 3.266667 3.066667 2.800000 3.666667 3.866667

Example: condition

x <- 1:10
z <- NULL
for(i in seq(along=x)) {
    if(x[i] < 5) {
        z <- c(z, x[i] - 1)
    } else {
        z <- c(z, x[i] / x[i])
    }
}
z
[1] 0 1 2 3 1 1 1 1 1 1

Example: stop on condition and print error message

x <- 1:10
z <- NULL
for(i in seq(along=x)) {
    if (x[i]<5) {
        z <- c(z,x[i]-1)
    } else {
        stop("values need to be <5")
    }
}
Error: values need to be <5
z
[1] 0 1 2 3

While Loop

Similar to the for loop, but the iterations are controlled by a conditional statement.

Syntax

while(condition) statements

Example

z <- 0
while(z < 5) {
    z <- z + 2
    print(z)
}
[1] 2
[1] 4
[1] 6

Apply Loop Family

For Two-Dimensional Data Sets: apply

Syntax

apply(X, MARGIN, FUN, ARGs)

X: array, matrix or data.frame; MARGIN: 1 for rows, 2 for columns, c(1,2) for both; FUN: one or more functions; ARGs: possible arguments for function

Example

## Example for applying predefined mean function
apply(iris[,1:3], 1, mean)
[1] 3.333333 3.100000 3.066667 3.066667 3.333333 3.666667 3.133333 3.300000
...

## With custom function
x <- 1:10
test <- function(x) { # Defines some custom function
    if(x < 5) {
        x-1
    } else {
        x / x
    }
}
apply(as.matrix(x), 1, test) # Returns same result as previous for loop
[1] 0 1 2 3 1 1 1 1 1 1

## Same as above but with a single line of code
apply(as.matrix(x), 1, function(x) { if (x<5) { x-1 } else { x/x } })
[1] 0 1 2 3 1 1 1 1 1 1
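The MARGIN settings and the passing of additional arguments described in the syntax section above can be sketched on a small matrix. The matrix m and the result names are hypothetical and only serve as illustration.

```r
# Made-up 2x3 matrix with row and column names
m <- matrix(1:6, nrow=2, dimnames=list(c("r1", "r2"), c("a", "b", "c")))

row_sums <- apply(m, 1, sum)                     # MARGIN=1: one result per row
col_max  <- apply(m, 2, max)                     # MARGIN=2: one result per column
doubled  <- apply(m, c(1,2), function(v) v * 2)  # c(1,2): function is applied to every cell
col_q    <- apply(m, 2, quantile, probs=0.5)     # extra arguments are passed on to FUN

row_sums
col_max
doubled
col_q
```

With MARGIN=c(1,2) the result keeps the dimensions of the input, which makes this form a cell-wise operation.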

For Ragged Arrays: tapply

Applies a function to array categories of variable lengths (ragged array). Grouping is defined by a factor.

Syntax

tapply(vector, factor, FUN)

Example

## Computes mean values of vector aggregates defined by factor
tapply(as.vector(iris[,4]), factor(iris[,5]), mean)
    setosa versicolor  virginica
     0.246      1.326      2.026

## The aggregate function provides related utilities
aggregate(iris[,1:4], list(iris$Species), mean)
     Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026

For Vectors and Lists: lapply and sapply

Both apply a function to vector or list objects. The function lapply returns a list, while sapply attempts to return the simplest data object, such as a vector or matrix, instead of a list.

Syntax

lapply(X, FUN)
sapply(X, FUN)

Example

## Creates a sample list
mylist <- as.list(iris[1:3,1:3])
mylist
$Sepal.Length
[1] 5.1 4.9 4.7

$Sepal.Width
[1] 3.5 3.0 3.2

$Petal.Length
[1] 1.4 1.4 1.3

## Compute sum of each list component and return result as list
lapply(mylist, sum)
$Sepal.Length
[1] 14.7

$Sepal.Width
[1] 9.7

$Petal.Length
[1] 4.1

## Compute sum of each list component and return result as vector
sapply(mylist, sum)
Sepal.Length  Sepal.Width Petal.Length
        14.7          9.7          4.1

Other Loops

Repeat Loop

Syntax

repeat statements

Loop is repeated until a break is specified. This means there needs to be a second statement to test whether or not to break from the loop.

Example

z <- 0
repeat {
    z <- z + 1
    print(z)
    if(z > 100) break
}

Improving Speed Performance of Loops

Looping over very large data sets can become slow in R. However, this limitation can be overcome by eliminating certain operations in loops or avoiding loops over the data intensive dimension in an object altogether. The latter can be achieved by performing mainly vector-to-vector or matrix-to-matrix computations which often run over 100 times faster than the corresponding for() or apply() loops in R. For this purpose, one can make use of the existing speed-optimized R functions (e.g.: rowSums, rowMeans, table, tabulate) or one can design custom functions that avoid expensive R loops by using vector- or matrix-based approaches. Alternatively, one can write programs that will perform all time consuming computations on the C-level.

(1) Speed comparison of for loops with an append versus an inject step:

myMA <- matrix(rnorm(1000000), 100000, 10, dimnames=list(1:100000, paste("C", 1:10, sep="")))
results <- NULL
system.time(for(i in seq(along=myMA[,1])) results <- c(results, mean(myMA[i,])))
   user  system elapsed
 39.156   6.369  45.559

results <- numeric(length(myMA[,1]))
system.time(for(i in seq(along=myMA[,1])) results[i] <- mean(myMA[i,]))
   user  system elapsed
  1.550   0.005   1.556

The inject approach is 20-50 times faster than the append version.

(2) Speed comparison of apply loop versus rowMeans for computing the mean for each row in a large matrix:

system.time(myMAmean <- apply(myMA, 1, mean))
   user  system elapsed
  1.452   0.005   1.456

system.time(myMAmean <- rowMeans(myMA))
   user  system elapsed
  0.005   0.001   0.006

The rowMeans approach is over 200 times faster than the apply loop.

(3) Speed comparison of apply loop versus vectorized approach for computing the standard deviation of each row:

system.time(myMAsd <- apply(myMA, 1, sd))
   user  system elapsed
  3.707   0.014   3.721
myMAsd[1:4]
        1         2         3         4
0.8505795 1.3419460 1.3768646 1.3005428

system.time(myMAsd <- sqrt((rowSums((myMA-rowMeans(myMA))^2)) / (length(myMA[1,])-1)))
   user  system elapsed
  0.020   0.009   0.028
myMAsd[1:4]
        1         2         3         4
0.8505795 1.3419460 1.3768646 1.3005428

The vector-based approach in the last step is over 100 times faster than the apply loop (3.721 versus 0.028 seconds elapsed).
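The vectorized expression in (3) is the usual sample standard deviation, sqrt(sum((x - mean(x))^2) / (n - 1)), computed for all rows at once. The following sketch checks that both approaches agree; it assumes a small random matrix (here called small) in place of myMA so that it runs quickly.

```r
set.seed(1)                              # arbitrary seed for reproducibility
small <- matrix(rnorm(200), 20, 10)      # small stand-in for myMA

sd_apply <- apply(small, 1, sd)          # loop-based version

# rowMeans() has one value per row; because matrices are stored column by
# column, the subtraction recycles it so that row i is centered by mean i
sd_vector <- sqrt(rowSums((small - rowMeans(small))^2) / (ncol(small) - 1))

all.equal(sd_apply, sd_vector)
```

The same centering trick does not work for column-wise statistics without transposing or using sweep(), since recycling always runs down the columns.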

(4) Example for computing the mean for any custom selection of columns without compromising the speed performance:

## In the following, the columns are named according to their selection in myList
myList <- tapply(colnames(myMA), c(1,1,1,2,2,2,3,3,4,4), list)
myMAmean <- sapply(myList, function(x) rowMeans(myMA[,x]))
colnames(myMAmean) <- sapply(myList, paste, collapse="_")
myMAmean[1:4,]
    C1_C2_C3   C4_C5_C6       C7_C8     C9_C10
1  0.0676799 -0.2860392  0.09651984 -0.7898946
2 -0.6120203 -0.7185961  0.91621371  1.1778427
3  0.2960446 -0.2454476 -1.18768621  0.9019590
4  0.9733695 -0.6242547  0.95078869 -0.7245792

## Alternative to achieve the same result with similar performance, but in a much less elegant fashion
myselect <- c(1,1,1,2,2,2,3,3,4,4) # The columns are named according to the selection stored in myselect
myList <- tapply(seq(along=myMA[1,]), myselect, function(x) paste("myMA[ ,", x, "]", sep=""))
myList <- sapply(myList, function(x) paste("(", paste(x, collapse=" + "),")/", length(x)))
myMAmean <- sapply(myList, function(x) eval(parse(text=x)))
colnames(myMAmean) <- tapply(colnames(myMA), myselect, paste, collapse="_")
myMAmean[1:4,]
    C1_C2_C3   C4_C5_C6       C7_C8     C9_C10
1  0.0676799 -0.2860392  0.09651984 -0.7898946
2 -0.6120203 -0.7185961  0.91621371  1.1778427
3  0.2960446 -0.2454476 -1.18768621  0.9019590
4  0.9733695 -0.6242547  0.95078869 -0.7245792

Functions

A very useful feature of the R environment is the possibility to expand existing functions and to easily write custom functions. In fact, most of the R software can be viewed as a series of R functions.

Syntax to define functions

myfct <- function(arg1, arg2, ...) { function_body }

The value returned by a function is the value of the function body, which is usually an unassigned final expression, e.g.: return()

Syntax to call functions

myfct(arg1=..., arg2=...)

Syntax Rules for Functions

General

Functions are defined by (1) assignment with the keyword function, (2) the declaration of arguments/variables (arg1, arg2, ...) and (3) the definition of operations (function_body) that perform computations on the provided arguments. A function name needs to be assigned to call the function (see below).

Naming

Function names can be almost anything. However, the usage of names of existing functions should be avoided.

Arguments

It is often useful to provide default values for arguments (e.g.: arg1=1:10). This way they don't need to be provided in a function call. The argument list can also be left empty (myfct <- function() { fct_body }) when a function is expected to always return the same value(s). The argument '...' can be used to allow one function to pass on argument settings to another.
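How default values and the '...' argument work together can be sketched with a small wrapper. The function round_mean and the vector vals are hypothetical names introduced only for this illustration.

```r
# Hypothetical wrapper: 'digits' has a default value, '...' is handed on to mean()
round_mean <- function(x, digits=1, ...) {
    round(mean(x, ...), digits)
}

vals <- c(1.26, 2.31, NA, 10)            # made-up values with one missing entry
round_mean(vals)                         # NA, because mean() sees the missing value
round_mean(vals, na.rm=TRUE)             # na.rm travels through '...' to mean()
round_mean(vals, digits=2, na.rm=TRUE)   # default for digits is overridden
```

The wrapper itself never mentions na.rm, so any other argument accepted by mean(), such as trim, can be passed the same way.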

Function body

The actual expressions (commands/operations) are defined in the function body which should be enclosed by braces. The individual commands are separated by semicolons or new lines (preferred).

Calling functions

Functions are called by their name followed by parentheses containing possible argument names. Empty parentheses after the function name will result in an error message when a function requires certain arguments to be provided by the user. The function name alone will print the definition of a function.

Variables created inside a function exist only for the lifetime of a function. Thus, they are not accessible outside of the function. To force variables in functions to exist globally, one can use this special assignment operator: '<<-'. If a variable with the same name as a global variable is assigned inside a function, then the global variable is masked only inside that function.
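The difference between local assignment and '<<-' can be sketched as follows. The variable counter and both function names are chosen for illustration only.

```r
counter <- 0                      # variable defined outside the functions

local_update <- function() {
    counter <- 100                # creates a local variable; the outer one is untouched
    counter
}

global_update <- function() {
    counter <<- counter + 1       # '<<-' modifies the variable in the enclosing environment
    invisible(counter)
}

local_update()     # returns the local value 100
counter            # the outer variable is unchanged
global_update()
counter            # the outer variable has been incremented
```

Because '<<-' changes state outside the function, it is usually reserved for cases where returning a value is not practical.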

Example: Function basics

myfct <- function(x1, x2=5) {
    z1 <- x1/x1
    z2 <- x2*x2
    myvec <- c(z1, z2)
    return(myvec)
}
myfct # prints definition of function
myfct(x1=2, x2=5) # applies function to values 2 and 5
[1] 1 25

myfct(2, 5) # the argument names are not necessary, but then the order of the specified values becomes important
myfct(x1=2) # does the same as before, but the default value '5' is used in this case

Example: Function with optional arguments

myfct2 <- function(x1=5, opt_arg) {
    if(missing(opt_arg)) { # 'missing()' is used to test whether a value was specified as an argument
        z1 <- 1:10
    } else {
        z1 <- opt_arg
    }
    cat("my function returns:", "\n")
    return(z1/x1)
}
myfct2(x1=5) # performs calculation on default vector (z1) that is defined in the function body
my function returns:
[1] 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0

myfct2(x1=5, opt_arg=30:20) # a custom vector is used instead when the optional argument (opt_arg) is specified
my function returns:
[1] 6.0 5.8 5.6 5.4 5.2 5.0 4.8 4.6 4.4 4.2 4.0

Control utilities for functions: return, warning and stop

Return

The evaluation flow of a function may be terminated at any stage with the return function. This is often used in combination with conditional evaluations.
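An early return combined with conditional checks might look like the following sketch. The helper safe_log is a hypothetical function written for this illustration.

```r
# Hypothetical helper: returns early for input that cannot be processed
safe_log <- function(x) {
    if (!is.numeric(x)) return(NA)   # leaves the function immediately
    if (any(x <= 0)) return(NA)      # second exit point for invalid values
    log(x)                           # value of the last expression is returned
}

safe_log(c(1, exp(1)))   # regular path
safe_log(-1)             # early return because of a non-positive value
safe_log("a")            # early return because of a non-numeric input
```

Only the first call reaches the final expression; the other two leave through one of the return statements.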

Stop

To stop the action of a function and print an error message, one can use the stop function.

Warning

To print a warning message in unexpected situations without aborting the evaluation flow of a function, one can use the function warning("...").

myfct <- function(x1) {
    if (x1>=0) print(x1) else stop("This function did not finish, because x1 < 0")
    warning("Value needs to be > 0")
}
myfct(x1=2)
[1] 2
Warning message:
In myfct(x1 = 2) : Value needs to be > 0

myfct(x1=-2)
Error in myfct(x1 = -2) : This function did not finish, because x1 < 0

Useful Utilities

Debugging Utilities

Several debugging utilities are available for R. The most important utilities are: traceback(), browser(), options(error=recover), options(error=NULL) and debug(). The Debugging in R page provides an overview of the available resources.

Regular Expressions

R's regular expression utilities work similarly to those in other languages. To learn how to use them in R, one can consult the main help page on this topic with ?regexp. The following gives a few basic examples.

The grep function can be used for finding patterns in strings, here the letter A in the vector month.name.

month.name[grep("A", month.name)]
[1] "April"  "August"

Example for using regular expressions to substitute a pattern by another one using the sub/gsub function with a back reference. Remember: single escapes '\' need to be double escaped '\\' in R.

gsub("(i.*a)", "xxx_\\1", "virginica", perl = TRUE)
[1] "vxxx_irginica"

Example for split and paste functions

x <- gsub("(a)", "\\1_", month.name[1], perl=TRUE) # performs substitution with back reference which inserts in this example a '_' character
x
[1] "Ja_nua_ry"

strsplit(x, "_") # splits string on inserted character from above
[[1]]
[1] "Ja"  "nua" "ry"

paste(rev(unlist(strsplit(x, NULL))), collapse="") # reverses character string by first splitting all characters into vector fields and then collapsing them with paste
[1] "yr_aun_aJ"

Example for importing specific lines in a file with a regular expression. The following example demonstrates the retrieval of specific lines from an external file with a regular expression. First, an external file is created with the cat function, all lines of this file are imported into a vector with readLines, the specific elements (lines) are then retrieved with the grep function, and the resulting lines are split into vector fields with strsplit.

cat(month.name, file="zzz.txt", sep="\n")
x <- readLines("zzz.txt")
x <- x[c(grep("^J", as.character(x), perl = TRUE))]
t(as.data.frame(strsplit(x, "u")))
                [,1]  [,2]
c..Jan....ary.. "Jan" "ary"
c..J....ne..    "J"   "ne"
c..J....ly..    "J"   "ly"

Interpreting Character String as Expression

Example

mylist <- ls() # generates vector of object names in session
mylist[1] # prints name of 1st entry in vector but does not execute it as expression that returns the values of the corresponding object
get(mylist[1]) # uses 1st entry name in vector and executes it as expression
eval(parse(text=mylist[1])) # alternative approach to obtain similar result
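The same idea can be tried on a known object instead of whatever ls() happens to return first. In this sketch the object myvalue and the string objname are made-up names.

```r
myvalue <- 1:5                         # hypothetical object
objname <- "myvalue"                   # its name stored as a character string

get(objname)                           # looks up the object named by the string
eval(parse(text=objname))              # parses the string as R code and evaluates it
eval(parse(text="sum(myvalue) * 2"))   # works for whole expressions, not only for names
exists("no_such_object_here")          # exists() can check a name before get() is called
```

get() only resolves object names, while eval(parse(text=...)) evaluates arbitrary code, so the latter should not be fed with untrusted strings.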

Time, Date and Sleep

Example

system.time(ls()) # returns CPU (and other) times that an expression used, here ls()
   user  system elapsed
      0       0       0

date() # returns the current system date and time
[1] "Wed Dec 11 15:31:17 2012"

Sys.sleep(1) # pauses execution of R expressions for a given number of seconds (e.g. in loop)

Calling External Software with System Command

The system command allows calling any command-line software from within R on Linux, UNIX and OSX systems.

system("...") # provide under '...' the command to run external software, e.g. Perl, Python, C++ programs

Related utilities on Windows operating systems

x <- shell("dir", intern=TRUE) # reads the listing of the current working directory and assigns it to x
shell.exec("C:/Documents and Settings/Administrator/Desktop/my_file.txt") # opens file with associated program

Miscellaneous Utilities

(1) Batch import and export of many files.
In the following example all file names ending with *.txt in the current directory are first assigned to a character vector (the '$' sign is used to anchor the match to the end of a string). Second, the files are imported one-by-one using a for loop where the original names are assigned to the generated data frames with the assign function. Consult help with ?read.table to understand arguments row.names=1 and comment.char = "A". Third, the data frames are exported using their names for file naming and appending the extension *.out.

files <- list.files(pattern="\\.txt$") # '\\.' matches a literal dot
for(i in files) {
    x <- read.table(i, header=TRUE, comment.char = "A", sep="\t")
    assign(i, x)
    print(i)
    write.table(x, paste(i, c(".out"), sep=""), quote=FALSE, sep="\t", col.names = NA)
}
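As a possible alternative to assign(), the imported data frames can be collected in a named list, which is often easier to loop over afterwards. The sketch below is self-contained: it first writes three made-up sample files into a temporary directory and then imports them. The names mydir and tables are assumptions for this illustration.

```r
# Create three small tab-delimited files in a temporary directory (made-up data)
mydir <- file.path(tempdir(), "batch_demo")
dir.create(mydir, showWarnings=FALSE)
for (n in c("a", "b", "c")) {
    df <- data.frame(id=1:3, value=c(10, 20, 30))
    write.table(df, file.path(mydir, paste0(n, ".txt")),
                quote=FALSE, sep="\t", row.names=FALSE)
}

# Import all *.txt files; '\\.' matches a literal dot and '$' anchors the end
files <- list.files(mydir, pattern="\\.txt$", full.names=TRUE)
tables <- list()
for (f in files) {
    tables[[basename(f)]] <- read.table(f, header=TRUE, sep="\t")
}

names(tables)          # one list component per imported file
sapply(tables, nrow)   # number of rows in each data frame
```

Individual tables are then available as tables[["a.txt"]], and functions such as lapply can process all of them in one call.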

(2) Running Web Applications (basics on designing web client/crawling/scraping scripts in R)
Example for obtaining MW values for peptide sequences from the EXPASY's pI/MW Tool web page.

myentries <- c("MKWVTFISLLFLFSSAYS", "MWVTFISLL", "MFISLLFLFSSAYS")
myresult <- NULL
for(i in myentries) {
    myurl <- paste("http://ca.expasy.org/cgi-bin/pi_tool?protein=", i, "&resolution=monoisotopic", sep="")
    x <- url(myurl)
    res <- readLines(x)
    close(x)
    mylines <- res[grep("Theoretical pI/Mw:", res)]
    myresult <- c(myresult, as.numeric(gsub(".*/ ", "", mylines)))
    print(myresult)
    Sys.sleep(1) # halts process for 1 sec to give web service a break
}
final <- data.frame(Pep=myentries, MW=myresult)
cat("\n The MW values for my peptides are:\n")
final
                 Pep      MW
1 MKWVTFISLLFLFSSAYS 2139.11
2          MWVTFISLL 1108.60
3     MFISLLFLFSSAYS 1624.82

Running R Programs

(1) Executing an R script from the R console

source("my_script.R")

(2.1) Syntax for running R programs from the command-line. Requires in the first line of my_script.R the following statement: #!/usr/bin/env Rscript

$ Rscript my_script.R # or just ./my_script.R after making the file executable with 'chmod +x my_script.R'

All commands starting with a '$' sign need to be executed from a Unix or Linux shell.

(2.2) Alternatively, one can use the following syntax to run R programs in BATCH mode from the command-line.

$ R CMD BATCH [options] my_script.R [outfile]

The output file lists the commands from the script file and their outputs. If no outfile is specified, the name of the infile is used with .Rout appended. To stop all the usual R command line information from being written to the outfile, add this as the first line to the my_script.R file: options(echo=FALSE). If the command is run like this: R CMD BATCH --no-save my_script.R, then nothing will be saved in the .Rdata file, which can often become very large. More on this can be found on the help pages: $ R CMD BATCH --help or ?BATCH.

(2.3) Another alternative for running R programs as silently as possible.

$ R --slave < my_infile > my_outfile

Argument --slave makes R run as 'quietly' as possible.

(3) Passing Command-Line Arguments to R Programs
Create an R script, here named test.R, like this one:

######################
myarg <- commandArgs()
print(iris[1:myarg[6], ])
######################

Then run it from the command-line like this:

$ Rscript test.R 10

In the given example the number 10 is passed on from the command-line as an argument to the R script which is used to return to STDOUT the first 10 rows of the iris sample data. If several arguments are provided, commandArgs() returns each of them as a separate element of the character vector; a single quoted argument that contains several values can be split in R with the strsplit function.
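A slightly more robust variant, sketched here under the assumption that the script is started with Rscript, uses commandArgs(trailingOnly=TRUE). This returns only the user-supplied arguments, so no fixed position such as 6 is needed. The variable n_rows and the fallback of 5 rows are chosen for illustration.

```r
# Sketch of an alternative test.R
args <- commandArgs(trailingOnly=TRUE)   # only the arguments given after the script name

# Use the first argument as row count; fall back to 5 rows if none was given
n_rows <- if (length(args) >= 1) as.integer(args[1]) else 5L

print(iris[seq_len(n_rows), ])
```

Started as 'Rscript test.R 10' this prints the first 10 rows, and without an argument it prints 5.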

(4) Submitting R script to a Linux cluster via Torque
Create the following shell script my_script.sh

#################################
#!/bin/bash
cd $PBS_O_WORKDIR
R CMD BATCH --no-save my_script.R
#################################

This script doesn't need to have executable permissions. Use the following qsub command to send this shell script to the Linux cluster from the directory where the R script my_script.R is located. To utilize several CPUs on the Linux cluster, one can split the input data into several smaller subsets and execute for each subset a separate process from a dedicated directory.

$ qsub my_script.sh

Here is a short R script that generates the required files and directories automatically and submits the jobs to the nodes: submit2cluster.R. For more details, see also this 'Tutorial on Parallel Programming in R' by Hanna Sevcikova

(5) Submitting jobs to Torque or any other queuing/scheduling system via the BatchJobs package. This package provides one of the most advanced resources for submitting jobs to queuing systems from within R. A related package is BiocParallel from Bioconductor which extends many functionalities of BatchJobs to genome data analysis. Useful documentation for BatchJobs: Technical Report, GitHub page, Slide Show, Config samples.

library(BatchJobs)
loadConfig(conffile = ".BatchJobs.R")
    ## Loads configuration file. Here .BatchJobs.R containing only this line:
    ## cluster.functions <- makeClusterFunctionsTorque("torque.tmpl")
    ## The template file torque.tmpl is expected to be in the current working
    ## directory. It can be downloaded from here:
    ## https://github.com/tudo-r/BatchJobs/blob/master/examples/cfTorque/simple.tmpl

getConfig() # Returns BatchJobs configuration settings
reg <- makeRegistry(id="BatchJobTest", work.dir="results")
    ## Constructs a registry object. Output files from R will be stored under directory "results",
    ## while the standard objects from BatchJobs will be stored in the directory "BatchJobTest-files".
print(reg)

## Some test function
f <- function(x) {
    system("ls -al >> test.txt")
    x
}

## Adds jobs to registry object (here reg)
ids <- batchMap(reg, fun=f, 1:10)
print(ids)
showStatus(reg)

## Submit jobs or chunks of jobs to batch system via cluster function
done <- submitJobs(reg, resources=list(walltime=3600, nodes="1:ppn=4", memory="4gb"))

## Load results from BatchJobTest-files/jobs/01/1-result.RData
loadResult(reg, 1)

Object-Oriented Programming (OOP)

R supports 2 systems for object-oriented programming (OOP). An older S3 organisation and a more than recently introduced S4 system. The latter is more formal, supports multiple inheritance, multiple dispatch and introspection. Many of these features are not available in the older S3 organisation. In general, the OOP approach taken past R is to split up the class specifications from the specifications of generic functions (part-centric system). The post-obit introduction is restricted to the S4 system since information technology is present the preferred OOP method for R. More information virtually OOP in R tin be found in the following introductions: Vincent Zoonekynd's introduction to S3 Classes, S4 Classes in 15 pages, Christophe Genolini'southward S4 Intro, The R.oo bundle, BioC Course: Advanced R for Bioinformatics, Programming with R by John Chambers and R Programming for Bioinformatics by Robert Gentleman.

Define S4 Classes

(A) Define S4 Classes with setClass() and new()

y <- matrix(1:50, 10, 5) # Sample data set
setClass(Class="myclass",
    representation=representation(a="ANY"),
    prototype=prototype(a=y[1:2,]), # Defines default value (optional)
    validity=function(object) { # Can be defined in a separate step using setValidity
        if(!is.matrix(object@a)) { # is.matrix() is used because class() of a matrix has length 2 in R >= 4.0
            return(paste("expected matrix, but obtained", class(object@a)[1]))
        } else {
            return(TRUE)
        }
    }
)


The setClass function defines classes. Its most important arguments are

Class: the name of the class
representation: the slots that the new class should have and/or other classes that this class extends.
prototype: an object providing default data for the slots.
contains: the classes that this class extends.
validity, access, version: control arguments included for compatibility with S-Plus.
where: the environment to use to store or remove the definition as metadata.
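As a minimal sketch of how these arguments fit together, the following defines a class with default slot values and registers its validity check in a separate step with setValidity(). The class name Interval and its slots are assumptions made for this illustration only.

```r
library(methods)

# Hypothetical example class; 'Interval', 'lower' and 'upper' are illustrative names
setClass("Interval",
    representation=representation(lower="numeric", upper="numeric"),
    prototype=prototype(lower=0, upper=1))      # default slot values

# Validity check registered in a separate step
setValidity("Interval", function(object) {
    if(length(object@lower) != 1 || length(object@upper) != 1) {
        return("lower and upper must each have length 1")
    }
    if(object@lower > object@upper) {
        return("lower must not exceed upper")
    }
    TRUE
})

iv_default <- new("Interval")                    # uses the prototype values
iv_custom <- new("Interval", lower=2, upper=5)   # passes the validity check
bad <- tryCatch(new("Interval", lower=5, upper=2),
                error=function(e) conditionMessage(e))
bad                                              # message from the failed check
```

Returning a character string from the validity function, rather than calling stop(), lets R assemble the final error message itself.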

(B) The function new creates an instance of a class (here myclass)

myobj <- new("myclass", a=y)
myobj
An object of class "myclass"
Slot "a":
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   11   21   31   41
[2,]    2   12   22   32   42
...

new("myclass", a=iris) # Returns an error message due to wrong input type (iris is data frame)
Error in validObject(.Object) : invalid class "myclass" object: expected matrix, but obtained data.frame

The arguments of new are:

Class: the name of the class
...: Data to include in the new object with arguments according to slots in class definition.

(C) A more generic way of creating class instances is to define an initialization method (details below)

setMethod("initialize", "myclass", function(.Object, a) {
    .Object@a <- a/a
    .Object
})
new("myclass", a = y)
An object of class "myclass"
Slot "a":
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    1    1    1    1    1
...
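An initialize method that assigns slots directly, as above, bypasses the validity check of the class. One common alternative, sketched here with a hypothetical class named Scaled, is to do the custom processing first and then hand over to the default method with callNextMethod(), which fills the slots and validates the object.

```r
library(methods)

# Hypothetical class used only for this sketch
setClass("Scaled", representation(a="matrix"))
setValidity("Scaled", function(object) {
    if(!is.numeric(object@a)) "slot 'a' must be numeric" else TRUE
})

setMethod("initialize", "Scaled", function(.Object, ..., a=matrix(numeric(0), 0, 0)) {
    if(length(a) > 0 && is.numeric(a)) a <- a / max(abs(a))   # custom pre-processing
    callNextMethod(.Object, ..., a=a)    # default method fills slots and validates
})

s <- new("Scaled", a=matrix(1:6, 2, 3))
range(s@a)                               # values are scaled to a maximum of 1
err <- tryCatch(new("Scaled", a=matrix(letters[1:4], 2, 2)),
                error=function(e) conditionMessage(e))
err                                      # validity check still applies
```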

(D) Usage and helper functions

myobj@a # The '@' extracts the contents of a slot. Usage should be limited to internal functions!
initialize(.Object=myobj, a=as.matrix(cars[1:3,])) # Creates a new S4 object from an old one.
# removeClass("myclass") # Removes object from current session; does not apply to associated methods.

(E) Inheritance: allows one to define new classes that inherit all properties (e.g. data slots, methods) from their existing parent classes

setClass("myclass1", representation(a = "character", b = "character"))
setClass("myclass2", representation(c = "numeric", d = "numeric"))
setClass("myclass3", contains=c("myclass1", "myclass2"))
new("myclass3", a=letters[1:4], b=letters[1:4], c=1:4, d=4:1)
An object of class "myclass3"
Slot "a":
[1] "a" "b" "c" "d"

Slot "b":
[1] "a" "b" "c" "d"

Slot "c":
[1] 1 2 3 4

Slot "d":
[1] 4 3 2 1

getClass("myclass1")
Class "myclass1" [in ".GlobalEnv"]

Slots:

Name:          a         b
Class: character character

Known Subclasses: "myclass3"

getClass("myclass2")
Class "myclass2" [in ".GlobalEnv"]

Slots:

Name:        c       d
Class: numeric numeric

Known Subclasses: "myclass3"

getClass("myclass3")
Class "myclass3" [in ".GlobalEnv"]

Slots:

Name:          a         b         c         d
Class: character character   numeric   numeric

Extends: "myclass1", "myclass2"

The argument contains allows one to extend existing classes; this propagates all slots of parent classes.
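Inheritance propagates methods as well as slots. The following sketch, which uses hypothetical classes Shape and Circle and a made-up generic describe, shows a child object being dispatched to its parent's method, and a child method that reuses the parent method through callNextMethod().

```r
library(methods)

# Hypothetical parent and child classes for illustration
setClass("Shape", representation(name="character"))
setClass("Circle", contains="Shape", representation(r="numeric"))

setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "Shape", function(x) paste("shape:", x@name))

circ <- new("Circle", name="c1", r=2)
is(circ, "Shape")            # the child is also an instance of the parent class
inherited_result <- describe(circ)   # the parent method is used for the child

# A child-specific method can build on the parent method
setMethod("describe", "Circle", function(x) {
    paste(callNextMethod(), "with radius", x@r)
})
own_result <- describe(circ)
own_result
```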

(F) Coerce objects to another class

setAs(from="myclass", to="character", def=function(from) as.character(as.matrix(from@a)))
as(myobj, "character")
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" ...

(G) Virtual classes are constructs for which no instances will be or can be created. They are used to link together classes which may have distinct representations (e.g. cannot inherit from each other) but for which one wants to provide similar functionality. Often it is desired to create a virtual class and to then have several other classes extend it. Virtual classes can be defined by leaving out the representation argument or including the class VIRTUAL:

setClass("myVclass")
setClass("myVclass", representation(a = "character", "VIRTUAL"))
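A short sketch of why virtual classes are useful: a single method defined on the virtual class serves every class that extends it. The class names BioSeq, DNAseq and AAseq and the generic seqLength are hypothetical and chosen only for this example.

```r
library(methods)

# Hypothetical virtual class with a shared slot; it cannot be instantiated itself
setClass("BioSeq", representation(seq="character", "VIRTUAL"))
setClass("DNAseq", contains="BioSeq")
setClass("AAseq", contains="BioSeq")

# One method on the virtual class serves all classes extending it
setGeneric("seqLength", function(x) standardGeneric("seqLength"))
setMethod("seqLength", "BioSeq", function(x) nchar(x@seq))

dna <- new("DNAseq", seq="ATGC")
aa <- new("AAseq", seq="MK")
lens <- c(seqLength(dna), seqLength(aa))
lens

isVirtualClass("BioSeq")
inst <- tryCatch(new("BioSeq"), error=function(e) "cannot instantiate a virtual class")
inst
```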

(H) Functions to introspect classes

getClass("myclass")
getSlots("myclass")
slotNames("myclass")
extends("myclass2")

Assign Generics and Methods

Assign generics and methods with setGeneric() and setMethod()

(A) Accessor function (to avoid usage of '@')

setGeneric(name="acc", def=function(x) standardGeneric("acc"))
setMethod(f="acc", signature="myclass", definition=function(x) {
    return(x@a)
})
acc(myobj)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   11   21   31   41
[2,]    2   12   22   32   42
...

(B.1) Replacement method using custom accessor function (acc<-)

setGeneric(name="acc<-", def=function(x, value) standardGeneric("acc<-"))
setReplaceMethod(f="acc", signature="myclass", definition=function(x, value) {
    x@a <- value
    return(x)
})
## After this the following replace operations with 'acc' work on new object class
acc(myobj)[1,1] <- 999 # Replaces first value
colnames(acc(myobj)) <- letters[1:5] # Assigns new column names
rownames(acc(myobj)) <- letters[1:10] # Assigns new row names
myobj
An object of class "myclass"
Slot "a":
    a  b  c  d  e
a 999 11 21 31 41
b   2 12 22 32 42
...

(B.2) Replacement method using "[" operator ([<-)

setReplaceMethod(f="[", signature="myclass", definition=function(x, i, j, value) {
    x@a[i,j] <- value
    return(x)
})
myobj[1,2] <- 999
myobj
An object of class "myclass"
Slot "a":
    a   b  c  d  e
a 999 999 21 31 41
b   2  12 22 32 42
...
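Replacement methods like the ones above assign to a slot directly, which does not re-run the validity check of the class. A possible safeguard, sketched here with a hypothetical class Temp and accessor celsius, is to call validObject() inside the replacement method before returning the modified object.

```r
library(methods)

# Hypothetical class and accessor names, used only for this sketch
setClass("Temp", representation(celsius="numeric"))
setValidity("Temp", function(object) {
    if(any(object@celsius < -273.15)) "temperature below absolute zero" else TRUE
})

setGeneric("celsius", function(x) standardGeneric("celsius"))
setMethod("celsius", "Temp", function(x) x@celsius)

setGeneric("celsius<-", function(x, value) standardGeneric("celsius<-"))
setReplaceMethod("celsius", "Temp", function(x, value) {
    x@celsius <- value
    validObject(x)       # re-run the validity check after direct slot assignment
    x
})

t1 <- new("Temp", celsius=c(20, 25))
celsius(t1)[2] <- 30     # valid replacement is applied
res <- tryCatch({celsius(t1) <- -300; "accepted"}, error=function(e) "rejected")
res                      # invalid replacement is refused and t1 stays unchanged
```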

(C) Define behavior of "[" subsetting operator (no generic required!)

setMethod(f="[", signature="myclass", definition=function(x, i, j, ..., drop) {
    x@a <- x@a[i,j]
    return(x)
})
myobj[1:2,] # Standard subsetting works now on new class
An object of class "myclass"
Slot "a":
    a   b  c  d  e
a 999 999 21 31 41
b   2  12 22 32 42
...

(D) Define print behavior

setMethod(f="show", signature="myclass", definition=function(object) {
    cat("An instance of ", "\"", class(object), "\"", " with ", length(acc(object)[,1]), " elements", "\n", sep="")
    if(length(acc(object)[,1])>=5) {
        print(as.data.frame(rbind(acc(object)[1:2,], ...=rep("...", length(acc(object)[1,])), acc(object)[(length(acc(object)[,1])-1):length(acc(object)[,1]),])))
    } else {
        print(acc(object))
    }})
myobj # Prints object with custom method
An instance of "myclass" with 10 elements
      a   b   c   d   e
a   999 999  21  31  41
b     2  12  22  32  42
... ... ... ... ... ...
i     9  19  29  39  49
j    10  20  30  40  50

(E) Define a data specific function (here randomize row order)

setGeneric(name="randomize", def=function(x) standardGeneric("randomize"))
setMethod(f="randomize", signature="myclass", definition=function(x) {
    acc(x)[sample(1:length(acc(x)[,1]), length(acc(x)[,1])), ]
})
randomize(myobj)
   a  b  c  d  e
j 10 20 30 40 50
b  2 12 22 32 42
...

(F) Define a graphical plotting function and allow user to access it with generic plot function

setMethod(f="plot", signature="myclass", definition=function(x, ...) {
    barplot(as.matrix(acc(x)), ...)
})
plot(myobj)

(G) Functions to inspect methods

showMethods(classes="myclass")
findMethods("randomize")
getMethod("randomize", signature="myclass")
existsMethod("randomize", signature="myclass")
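When inspecting methods it helps to distinguish between methods defined directly for a class and methods reached through inheritance. The sketch below uses hypothetical classes ParentCls and ChildCls and a made-up generic total: getMethod() and existsMethod() only consider directly defined methods, whereas selectMethod() also follows inheritance.

```r
library(methods)

# Hypothetical classes and generic for illustration
setClass("ParentCls", representation(x="numeric"))
setClass("ChildCls", contains="ParentCls")
setGeneric("total", function(obj) standardGeneric("total"))
setMethod("total", "ParentCls", function(obj) sum(obj@x))

isGeneric("total")
existsMethod("total", "ParentCls")    # defined directly for ParentCls
existsMethod("total", "ChildCls")     # nothing defined directly for ChildCls

direct <- getMethod("total", "ChildCls", optional=TRUE)        # NULL: no direct method
inherited <- selectMethod("total", "ChildCls", optional=TRUE)  # method found via ParentCls
child_total <- total(new("ChildCls", x=c(1, 2, 3)))
child_total
```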

Building R Packages

To get familiar with the structure, building and submission process of R packages, users should carefully read the documentation on this topic available on these sites:

Writing R Extensions, R web site
R Packages, by Hadley Wickham
R Package Primer, by Karl Broman
Package Guidelines, Bioconductor
Advanced R Programming Class, Bioconductor

Short Overview of Package Building Process

(A) Automatic package building with the package.skeleton function:

package.skeleton(name="mypackage", code_files=c("script1.R", "script2.R"))

Note: this is an optional but very convenient function to get started with a new package. The given example will create a directory named mypackage containing the skeleton of the package for all functions, methods and classes defined in the R script(s) passed on to the code_files argument. The basic structure of the package directory is described here. The package directory will also contain a file named 'Read-and-delete-me' with the following instructions for completing the package:

Edit the help file skeletons in man, possibly combining help files for multiple functions.
Edit the exports in NAMESPACE, and add necessary imports.
Put any C/C++/Fortran code in src.
If you have compiled code, add a useDynLib() directive to NAMESPACE.
Run R CMD build to build the package tarball.
Run R CMD check to check the package tarball.
Read Writing R Extensions for more information.
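Besides script files, package.skeleton can also build a skeleton from objects that already exist in an environment. The following sketch does this inside a temporary directory so that nothing is written to the working directory. The function gc_fraction and the package name demoPkg are placeholders invented for this example.

```r
# Hypothetical function to be packaged; the name is an assumption for this sketch
gc_fraction <- function(seq) {
    chars <- strsplit(toupper(seq), "")[[1]]
    mean(chars %in% c("G", "C"))
}

# Create the skeleton below a temporary parent directory
pkg_parent <- tempfile("pkgdemo")
dir.create(pkg_parent)
package.skeleton(name="demoPkg", list="gc_fraction",
                 environment=environment(), path=pkg_parent)

# Inspect what was generated
pkg_dir <- file.path(pkg_parent, "demoPkg")
list.files(pkg_dir, recursive=TRUE)
```

The listing should include DESCRIPTION, NAMESPACE, the R and man directories and the 'Read-and-delete-me' file mentioned above.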

(B) Once a package skeleton is available one can build the package from the command-line (Linux/OS X):

$ R CMD build mypackage

This will create a tarball of the package with its version number encoded in the file name, e.g.: mypackage_1.0.tar.gz.
Subsequently, the package tarball needs to be checked for errors with:

$ R CMD check mypackage_1.0.tar.gz

All issues in a package's source code and documentation should be addressed until R CMD check returns no error or warning messages anymore.

(C) Install package from source:

Linux:

install.packages("mypackage_1.0.tar.gz", repos=NULL)

OS X:

install.packages("mypackage_1.0.tar.gz", repos=NULL, type="source")


Windows requires a zip archive for installing R packages, which can be most conveniently created from the command-line (Linux/OS X) by installing the package in a local directory (here tempdir) and then creating a zip archive from the installed package directory:

$ mkdir tempdir
$ R CMD INSTALL -l tempdir mypackage_1.0.tar.gz
$ cd tempdir
$ zip -r mypackage mypackage

## The resulting mypackage.zip archive can be installed under Windows like this:
install.packages("mypackage.zip", repos=NULL)


This procedure only works for packages which do not rely on compiled code (C/C++). Instructions to fully build an R package under Windows can be found here and here.

(D) Maintain/expand an existing package:

Add new functions, methods and classes to the script files in the ./R directory in your package
Add their names to the NAMESPACE file of the package
Additional *.Rd help templates can be generated with the prompt*() functions like this:

source("myscript.R") # imports functions, methods and classes from myscript.R
prompt(myfct) # writes help file myfct.Rd
promptClass("myclass") # writes file myclass-class.Rd
promptMethods("mymeth") # writes help file mymeth.Rd

The resulting *.Rd help files can be edited in a text editor and properly rendered and viewed from within R like this:

library(tools)
Rd2txt("./mypackage/man/myfct.Rd") # renders *.Rd files as they look in final help pages
checkRd("./mypackage/man/myfct.Rd") # checks *.Rd help file for issues

(E) Submit package to a public repository

Download one of the above exercise files, then start editing this R source file with a programming text editor, such as Vim, Emacs or one of the R GUI text editors. Here is the HTML version of the code with syntax coloring.

Sample Scripts

Batch Operations on Many Files

## (1) Start R from an empty test directory
## (2) Create some files as sample data
for(i in month.name) {
    mydf <- data.frame(Month=month.name, Rain=runif(12, min=10, max=100), Evap=runif(12, min=1000, max=2000))
    write.table(mydf, file=paste(i , ".infile", sep=""), quote=FALSE, row.names=FALSE, sep="\t")
}

## (3) Import created files, perform calculations and export to renamed files
files <- list.files(pattern=".infile$")
for(i in seq(along=files)) { # start for loop with numeric or character vector; numeric vector is often more flexible
    x <- read.table(files[i], header=TRUE, row.names=1, comment.char = "A", sep="\t")
    x <- data.frame(x, sum=apply(x, 1, sum), mean=apply(x, 1, mean)) # calculates sum and mean for each data frame
    assign(files[i], x) # generates data frame object and names it after content in variable 'i'
    print(files[i], quote=FALSE) # prints loop iteration to screen to check its status
    write.table(x, paste(files[i], c(".out"), sep=""), quote=FALSE, sep="\t", col.names = NA)
}

## (4) Same as above, but file naming by index data frame. This way one can organize file names by external table.
name_df <- data.frame(Old_name=sort(files), New_name=sort(month.abb))
for(i in seq(along=name_df[,1])) {
    x <- read.table(as.vector(name_df[i,1]), header=TRUE, row.names=1, comment.char = "A", sep="\t")
    x <- data.frame(x, sum=apply(x, 1, sum), mean=apply(x, 1, mean))
    assign(as.vector(name_df[i,2]), x) # generates data frame object and names it after 'i' entry in column 2
    print(as.vector(name_df[i,1]), quote=FALSE)
    write.table(x, paste(as.vector(name_df[i,2]), c(".out"), sep=""), quote=FALSE, sep="\t", col.names = NA)
}

## (5) Append content of all input files to one file.
files <- list.files(pattern=".infile$")
all_files <- data.frame(files=NULL, Month=NULL, Gain=NULL, Loss=NULL, sum=NULL, mean=NULL) # creates empty data frame container
for(i in seq(along=files)) {
    x <- read.table(files[i], header=TRUE, row.names=1, comment.char = "A", sep="\t")
    x <- data.frame(x, sum=apply(x, 1, sum), mean=apply(x, 1, mean))
    x <- data.frame(file=rep(files[i], length(x[,1])), x) # adds file tracking column to x
    all_files <- rbind(all_files, x) # appends data from all files to data frame 'all_files'
    write.table(all_files, file="all_files.xls", quote=FALSE, sep="\t", col.names = NA)
}

## (6) Write the above code into a text file and execute it with the commands 'source' and 'BATCH'.
source("my_script.R") # execute from R console
$ R CMD BATCH my_script.R # execute from shell
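The same batch import can also be written without assign() and without growing a data frame inside the loop. The following is a sketch of that alternative, not the tutorial's own method: it reads all files into one named list with lapply and combines them in a single step with do.call(rbind, ...). It runs in a temporary directory, and the reduced sample data (three files with four rows each) as well as the variable names are assumptions chosen to keep the example small.

```r
# Work in a temporary directory so no existing files are touched
work_dir <- tempfile("batch_demo")
dir.create(work_dir)

# Create three small tab-delimited sample files
set.seed(1)
for(m in month.name[1:3]) {
    mydf <- data.frame(Month=month.abb[1:4], Rain=runif(4, 10, 100), Evap=runif(4, 1000, 2000))
    write.table(mydf, file=file.path(work_dir, paste0(m, ".infile")),
                quote=FALSE, row.names=FALSE, sep="\t")
}

files <- list.files(work_dir, pattern="\\.infile$", full.names=TRUE)

# Read every file into one named list instead of creating objects with assign()
df_list <- lapply(files, function(f) {
    x <- read.table(f, header=TRUE, row.names=1, sep="\t")
    data.frame(x, sum=rowSums(x), mean=rowMeans(x))
})
names(df_list) <- sub("\\.infile$", "", basename(files))

# Combine all tables in a single step, keeping track of the source file
all_files <- do.call(rbind, lapply(names(df_list), function(n) {
    data.frame(file=n, Month=rownames(df_list[[n]]), df_list[[n]], row.names=NULL)
}))
dim(all_files)
```

Keeping the imported tables in a list makes it easy to apply further operations to all of them with lapply or sapply, and avoids filling the workspace with one object per file.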

Large-scale Array Analysis

Sample script to perform large-scale expression array analysis with complex queries: lsArray.R. To demo what the script does, run it like this:

source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/lsArray.R")

Graphical Procedures: Feature Map Example

Script to plot feature maps of genes or chromosomes: featureMap.R. To demo what the script does, run it like this:

source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/featureMap.txt")

Sequence Analysis Utilities

Includes sequence batch import, sub-setting, pattern matching, AA Composition, NEEDLE, PHYLIP, etc. The script 'sequenceAnalysis.R' demonstrates how R can be used as a powerful tool for managing and analyzing large sets of biological sequences. This example also shows how easy it is to integrate R with the EMBOSS project or other external programs. The script provides the following functionality:

Batch sequence import into R data frame
Motif searching with hit statistics
Analysis of sequence composition
All-against-all sequence comparisons
Generation of phylogenetic trees

To demonstrate the utilities of the script, users can simply execute it from R with the following source command:

source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/sequenceAnalysis.txt")

Pattern Matching and Positional Parsing of Sequences

Functions for importing sequences into R, retrieving reverse and complement of nucleotide sequences, pattern searching, positional parsing and exporting search results in HTML format: patternSearch.R. To demo what the script does, run it like this:

source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/patternSearch.R")

Identify Over-Represented Strings in Sequence Sets

Functions for finding over-represented words in sets of DNA, RNA or protein sequences: wordFinder.R. To demo what the script does, run it like this:

source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/wordFinder.R")

Translate DNA into Protein

Script 'translateDNA.R' for translating NT sequences into AA sequences (required codon table). To demo what the script does, run it like this:

source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/translateDNA.R")

Subsetting of Structure Definition Files (SDF)

Script for importing and subsetting SDF files: sdfSubset.R. To demo what the script does, run it like this:

source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/sdfSubset.R")

Managing Latex BibTeX Databases

Script for importing BibTeX databases into R, retrieving the individual references with a full-text search function and viewing the results in R or in HubMed: BibTex.R. To demo what the script does, run it like this:

source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/BibTex.R")

Loan Payments and Amortization Tables

This script calculates monthly and annual mortgage or loan payments, generates amortization tables and plots the results: mortgage.R. A Shiny App using this function has been created by Antoine Soetewey here. To demo what the script does, run it like this:

source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/mortgage.R")

Course Assignment: GC Content, Reverse & Complement

Apply the above information to write a function that calculates for a set of DNA sequences their GC content and generates their reverse and complement. Here are some useful commands that can be incorporated in this function:

## Generate an example data frame with ID numbers and DNA sequences
fx <- function(test) {
    x <- as.integer(runif(20, min=1, max=5))
    x[x==1] <- "A"; x[x==2] <- "T"; x[x==3] <- "G"; x[x==4] <- "C"
    paste(x, sep = "", collapse ="")
}
z1 <- c()
for(i in 1:50) {
    z1 <- c(fx(i), z1)
}
z1 <- data.frame(ID=seq(along=z1), Seq=z1)
z1

## Write each character of sequence into separate vector field and reverse its order
my_split <- strsplit(as.character(z1[1,2]),"")
my_rev <- rev(my_split[[1]])
paste(my_rev, collapse="")

## Generate the sequence complement by replacing G|C|A|T by C|G|T|A
## Use 'apply' or 'for loop' to apply the above operations to all sequences in sample data frame 'z1'
## Calculate in the same loop the GC content for each sequence using the following command
table(my_split[[1]])/length(my_split[[1]])
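One possible way to combine these commands is sketched below. It is only one of several approaches and not a reference solution for the assignment. The helper names rev_comp and gc_content are made up for this example, and chartr is used here for the base substitution instead of repeated replacements.

```r
# Hypothetical helper names; one possible approach to the assignment
rev_comp <- function(seqs) {
    vapply(strsplit(seqs, ""), function(s) {
        # reverse the characters, then swap G<->C and A<->T in one pass
        chartr("GCAT", "CGTA", paste(rev(s), collapse=""))
    }, character(1))
}

gc_content <- function(seqs) {
    # fraction of G and C characters per sequence
    vapply(strsplit(seqs, ""), function(s) mean(s %in% c("G", "C")), numeric(1))
}

seqs <- c("ATGC", "GGCA", "AATT")
result <- data.frame(Seq=seqs, RevComp=rev_comp(seqs), GC=gc_content(seqs))
result
```

Both helpers are vectorized over the input, so they can be applied directly to the Seq column of a data frame such as 'z1', e.g. rev_comp(as.character(z1$Seq)).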

Serbo-Croatian version translated by Jovana Milutinovich


Source: http://manuals.bioinformatics.ucr.edu/home/programming-in-r
