For Loop to Read in Several Files R

Programming in R

Contents

  1. 1 Introduction
  2. 2 R Basics
  3. 3 Code Editors for R
  4. 4 Integrating R with Vim and Tmux
  5. 5 Finding Help
  6. 6 Control Structures
    1. 6.1 Conditional Executions
      1. 6.1.1 Comparison Operators
      2. 6.1.2 Logical Operators
      3. 6.1.3 If Statements
      4. 6.1.4 Ifelse Statements
    2. 6.2 Loops
      1. 6.2.1 For Loop
      2. 6.2.2 While Loop
      3. 6.2.3 Apply Loop Family
        1. 6.2.3.1 For Two-Dimensional Data Sets: apply
        2. 6.2.3.2 For Ragged Arrays: tapply
        3. 6.2.3.3 For Vectors and Lists: lapply and sapply
      4. 6.2.4 Other Loops
      5. 6.2.5 Improving Speed Performance of Loops
  7. 7 Functions
  8. 8 Useful Utilities
    1. 8.1 Debugging Utilities
    2. 8.2 Regular Expressions
    3. 8.3 Interpreting Character String as Expression
    4. 8.4 Time, Date and Sleep
    5. 8.5 Calling External Software with System Command
    6. 8.6 Miscellaneous Utilities
  9. 9 Running R Programs
  10. 10 Object-Oriented Programming (OOP)
    1. 10.1 Define S4 Classes
    2. 10.2 Assign Generics and Methods
  11. 11 Building R Packages
  12. 12 Reproducible Research by Integrating R with Latex or Markdown
  13. 13 R Programming Exercises
    1. 13.1 Exercise Slides
    2. 13.2 Sample Scripts
      1. 13.2.1 Batch Operations on Many Files
      2. 13.2.2 Large-scale Array Analysis
      3. 13.2.3 Graphical Procedures: Feature Map Example
      4. 13.2.4 Sequence Analysis Utilities
      5. 13.2.5 Pattern Matching and Positional Parsing of Sequences
      6. 13.2.6 Identify Over-Represented Strings in Sequence Sets
      7. 13.2.7 Translate DNA into Protein
      8. 13.2.8 Subsetting of Structure Definition Files (SDF)
      9. 13.2.9 Managing Latex BibTeX Databases
      10. 13.2.10 Loan Payments and Amortization Tables
      11. 13.2.11 Course Assignment: GC Content, Reverse & Complement
  14. 14 Translation of this Page

Introduction

General Overview

One of the main attractions of using the R (http://cran.at.r-project.org) environment is the ease with which users can write their own programs and custom functions. The R programming syntax is extremely easy to learn, even for users with no previous programming experience. Once the basic R programming control structures are understood, users can use the R language as a powerful environment to perform complex custom analyses of almost any type of data.

Format of this Manual

In this manual all commands are given in code boxes, where the R code is printed in black, the comment text in blue and the output generated by R in green. All comments/explanations start with the standard comment sign '#' to prevent them from being interpreted by R as commands. This way the content in the code boxes can be pasted with their comment text into the R console to evaluate their utility. Occasionally, several commands are printed on one line and separated by a semicolon ';'. Commands starting with a '$' sign need to be executed from a Unix or Linux shell. Windows users can simply ignore them.

R Basics

The R & BioConductor manual provides a general introduction to the usage of the R environment and its basic command syntax.


Code Editors for R

Several excellent code editors are available that provide functionalities like R syntax highlighting, auto code indenting and utilities to send code/functions to the R console.

  • Basic code editors provided by R GUIs
  • RStudio: GUI-based IDE for R
  • Vim-R-Tmux: R working environment based on vim and tmux
  • Emacs (ESS add-on package)
  • gedit and Rgedit
  • RKWard
  • Eclipse
  • Tinn-R
  • Notepad++ (NppToR)

[Figures: Programming in R using Vim or Emacs; Programming in R using RStudio]

Integrating R with Vim and Tmux

Users interested in integrating R with vim and tmux may want to consult the Vim-R-Tmux configuration page.

Finding Help

Reference list on R programming (selection)

  • R Programming for Bioinformatics, by Robert Gentleman
  • Advanced R, by Hadley Wickham
  • S Programming, by W. N. Venables and B. D. Ripley
  • Programming with Data, by John M. Chambers
  • R Help & R Coding Conventions, Henrik Bengtsson, Lund University
  • Programming in R (Vincent Zoonekynd)
  • Peter's R Programming Pages, University of Warwick
  • Rtips, Paul Johnsson, University of Kansas
  • R for Programmers, Norm Matloff, UC Davis
  • High-Performance R, Dirk Eddelbuettel tutorial presented at useR-2008
  • C/C++ level programming for R, Gopi Goswami

Control Structures

Conditional Executions

Comparison Operators

  • equal: ==
  • not equal: !=
  • greater/less than: > <
  • greater/less than or equal: >= <=

Logical Operators

  • and: &
  • or: |
  • not: !

If Statements

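If statements operate on length-one logical vectors.

Syntax

if(cond1 == TRUE) { cmd1 } else { cmd2 }

Example

if(1==0) {
print(1)
} else {
print(2)
}
[1] 2

Avoid inserting newlines between '} else'. At the top level of a script or the console, R treats the if statement as complete once the closing brace is followed by a newline, so an else on the next line causes a syntax error.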


Ifelse Statements

Ifelse statements operate on vectors of variable length.

Syntax

ifelse(test, true_value, false_value)

Example

x <- 1:10 # Creates sample data
ifelse(x<5 | x>8, x, 0)
[1]  1  2  3  4  0  0  0  0  9 10
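
The difference between if and ifelse can be sketched as follows. The vector v and the labels are illustrative only and do not come from the manual. Since if() needs a single TRUE or FALSE, a vector test is first reduced with any(), while ifelse() evaluates the test element by element.

```r
v <- c(2, 7, 9)                          # illustrative sample vector

# ifelse() tests every element and returns a vector of the same length
labels <- ifelse(v > 5, "high", "low")

# if() needs a length-one condition, so the vector test is reduced with any()
if (any(v > 5)) {
  status <- "at least one value above 5"
} else {
  status <- "no value above 5"
}

print(labels)
print(status)
```

Using all() instead of any() would require every element to pass the test.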


Loops

The most commonly used loop structures in R are for, while and apply loops. Less common are repeat loops. The break function is used to break out of loops, and next halts the processing of the current iteration and advances the looping index.
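
A minimal sketch of next and break inside a for loop; the variable name kept and the thresholds are arbitrary choices for illustration.

```r
kept <- NULL
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers and continue with the next iteration
  if (i > 7) break        # leave the loop entirely once an odd i exceeds 7
  kept <- c(kept, i)
}
print(kept)
```

Only the odd values up to 7 are collected, because even values are skipped by next and the loop ends at i = 9 through break.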


For Loop

For loops are controlled by a looping vector. In every iteration of the loop one value in the looping vector is assigned to a variable that can be used in the statements of the body of the loop. Usually, the number of loop iterations is defined by the number of values stored in the looping vector and they are processed in the same order as they are stored in the looping vector.

Syntax

for(variable in sequence) {
statements
}

Example

mydf <- iris
myve <- NULL # Creates empty storage container
for(i in seq(along=mydf[,1])) {
myve <- c(myve, mean(as.numeric(mydf[i, 1:3]))) # Note: inject approach is much faster than append with 'c'. See below for details.
}
myve
[1] 3.333333 3.100000 3.066667 3.066667 3.333333 3.666667 3.133333 3.300000
[9] 2.900000 3.166667 3.533333 3.266667 3.066667 2.800000 3.666667 3.866667

Example: condition

x <- 1:10
z <- NULL
for(i in seq(along=x)) {
if(x[i] < 5) {
z <- c(z, x[i] - 1)
} else {
z <- c(z, x[i] / x[i])
}
}
z
[1] 0 1 2 3 1 1 1 1 1 1

Example: stop on condition and print error message

x <- 1:10
z <- NULL
for(i in seq(along=x)) {
if (x[i]<5) {
z <- c(z,x[i]-1)
} else {
stop("values need to be <5")
}
}
Error: values need to be <5
z
[1] 0 1 2 3

While Loop

Similar to the for loop, but the iterations are controlled by a conditional statement.

Syntax

while(condition) statements

Example

z <- 0
while(z < 5) {
z <- z + 2
print(z)
}
[1] 2
[1] 4
[1] 6

Apply Loop Family

For Two-Dimensional Data Sets: apply

Syntax

apply(X, MARGIN, FUN, ARGs)

X: array, matrix or data.frame; MARGIN: 1 for rows, 2 for columns, c(1,2) for both; FUN: one or more functions; ARGs: possible arguments for function

Example

## Example for applying predefined mean function
apply(iris[,1:3], 1, mean)
[1] 3.333333 3.100000 3.066667 3.066667 3.333333 3.666667 3.133333 3.300000
...

## With custom function
x <- 1:10
test <- function(x) { # Defines some custom function
if(x < 5) {
x-1
} else {
x / x
}
}
apply(as.matrix(x), 1, test) # Returns same result as previous for loop
[1] 0 1 2 3 1 1 1 1 1 1

## Same as above but with a single line of code
apply(as.matrix(x), 1, function(x) { if (x<5) { x-1 } else { x/x } })
[1] 0 1 2 3 1 1 1 1 1 1
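
The ARGs slot of the syntax can be illustrated with a small sketch. The matrix m is made-up sample data; na.rm is a regular argument of mean() that apply hands through unchanged.

```r
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)   # illustrative matrix with one missing value

# Arguments given after FUN are passed on to FUN, here na.rm=TRUE goes to mean()
row_means <- apply(m, 1, mean, na.rm = TRUE)  # MARGIN=1: one value per row
col_means <- apply(m, 2, mean, na.rm = TRUE)  # MARGIN=2: one value per column

print(row_means)
print(col_means)
```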

For Ragged Arrays: tapply

Applies a function to array categories of variable lengths (ragged array). Grouping is defined by factor.

Syntax

tapply(vector, factor, FUN)

Example

## Computes mean values of vector aggregates defined by factor
tapply(as.vector(iris[,4]), factor(iris[,5]), mean)
setosa versicolor  virginica
0.246      1.326      2.026

## The aggregate function provides related utilities
aggregate(iris[,1:4], list(iris$Species), mean)

Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026


    For Vectors and Lists: lapply and sapply

    Both apply a function to vector or list objects. The function lapply returns a list, while sapply attempts to return the simplest data object, such as vector or matrix instead of list.

    Syntax

    lapply(X, FUN)
    sapply(X, FUN)

    Example

    ## Creates a sample list
    mylist <- as.list(iris[1:3,1:3])
    mylist
    $Sepal.Length
    [1] 5.1 4.9 4.7

    $Sepal.Width
    [1] 3.5 3.0 3.2

    $Petal.Length
    [1] 1.4 1.4 1.3


    ## Compute sum of each list component and return result as list
    lapply(mylist, sum)
    $Sepal.Length
    [1] 14.7

    $Sepal.Width
    [1] 9.7

    $Petal.Length
    [1] 4.1

    ## Compute sum of each list component and return result as vector
    sapply(mylist, sum)
    Sepal.Length  Sepal.Width Petal.Length
    14.7          9.7          4.1
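
Because sapply decides the return type on its own, base R also offers vapply, which is not covered in the text above. The sketch below assumes that each list component should yield exactly one number; vapply stops with an error if that assumption is violated.

```r
mylist <- as.list(iris[1:3, 1:3])    # same sample list as in the example

# FUN.VALUE declares the expected result per component: one numeric value
sums <- vapply(mylist, sum, FUN.VALUE = numeric(1))
print(sums)
```

This behaves like sapply here, but the result type is fixed in advance, which can make scripts more predictable.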

      Other Loops

      Repeat Loop

      Syntax

      repeat statements

      Loop is repeated until a break is specified. This means there needs to be a second statement to test whether or not to break from the loop.

      Example

      z <- 0
      repeat {
      z <- z + 1
      print(z)
      if(z > 100) break
      }

      Improving Speed Performance of Loops

      Looping over very large data sets can become slow in R. However, this limitation can be overcome by eliminating certain operations in loops or avoiding loops over the data intensive dimension in an object altogether. The latter can be achieved by performing mainly vector-to-vector or matrix-to-matrix computations which often run over 100 times faster than the corresponding for() or apply() loops in R. For this purpose, one can make use of the existing speed-optimized R functions (e.g.: rowSums, rowMeans, table, tabulate) or one can design custom functions that avoid expensive R loops by using vector- or matrix-based approaches. Alternatively, one can write programs that will perform all time consuming computations on the C-level.

      (1) Speed comparison of for loops with an append versus an inject step:

      myMA <- matrix(rnorm(1000000), 100000, 10, dimnames=list(1:100000, paste("C", 1:10, sep="")))
      results <- NULL
      system.time(for(i in seq(along=myMA[,1])) results <- c(results, mean(myMA[i,])))
         user  system elapsed
      39.156   6.369  45.559

      results <- numeric(length(myMA[,1]))
      system.time(for(i in seq(along=myMA[,1])) results[i] <- mean(myMA[i,]))
         user  system elapsed
      1.550   0.005   1.556


        The inject approach is 20-50 times faster than the append version.

        (2) Speed comparison of apply loop versus rowMeans for computing the mean for each row in a large matrix:

        system.time(myMAmean <- apply(myMA, 1, mean))
          user  system elapsed
        1.452   0.005   1.456

        system.time(myMAmean <- rowMeans(myMA))
           user  system elapsed
        0.005   0.001   0.006


        The rowMeans approach is over 200 times faster than the apply loop.

        (3) Speed comparison of apply loop versus vectorized approach for computing the standard deviation of each row:

        system.time(myMAsd <- apply(myMA, 1, sd))
           user  system elapsed
        3.707   0.014   3.721

        myMAsd[1:4]
                1         2         3         4
        0.8505795 1.3419460 1.3768646 1.3005428

        system.time(myMAsd <- sqrt((rowSums((myMA-rowMeans(myMA))^2)) / (length(myMA[1,])-1)))
           user  system elapsed
        0.020   0.009   0.028

        myMAsd[1:4]
                1         2         3         4
        0.8505795 1.3419460 1.3768646 1.3005428


        The vector-based approach in the last step is over 100 times faster than the apply loop.
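
The equivalence of the two standard deviation computations can be checked on a small matrix before trusting the faster version. This is a sketch with made-up data; smallMA and the seed are arbitrary, and no timings are claimed.

```r
set.seed(1)                                    # arbitrary seed for reproducibility
smallMA <- matrix(rnorm(50), 10, 5)            # small stand-in for myMA

# Row-wise standard deviation with an apply loop
sd_apply <- apply(smallMA, 1, sd)

# Same quantity with matrix arithmetic: squared deviations from the row means,
# summed per row and divided by (number of columns - 1)
sd_vec <- sqrt(rowSums((smallMA - rowMeans(smallMA))^2) / (ncol(smallMA) - 1))

print(all.equal(sd_apply, sd_vec))
```

The subtraction works because rowMeans returns one value per row and R recycles that vector down each column of the matrix.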

        (4) Example for computing the mean for any custom selection of columns without compromising the speed performance:

        ## In the following the columns are named according to their selection in myList
        myList <- tapply(colnames(myMA), c(1,1,1,2,2,2,3,3,4,4), list)
        myMAmean <- sapply(myList, function(x) rowMeans(myMA[,x]))
        colnames(myMAmean) <- sapply(myList, paste, collapse="_")
        myMAmean[1:4,]
            C1_C2_C3   C4_C5_C6       C7_C8     C9_C10
        1  0.0676799 -0.2860392  0.09651984 -0.7898946
        2 -0.6120203 -0.7185961  0.91621371  1.1778427
        3  0.2960446 -0.2454476 -1.18768621  0.9019590
        4  0.9733695 -0.6242547  0.95078869 -0.7245792

        ## Alternative to achieve the same result with similar performance, but in a much less elegant way
        myselect <- c(1,1,1,2,2,2,3,3,4,4) # The columns are named according to the selection stored in myselect
        myList <- tapply(seq(along=myMA[1,]), myselect, function(x) paste("myMA[ ,", x, "]", sep=""))
        myList <- sapply(myList, function(x) paste("(", paste(x, collapse=" + "),")/", length(x)))
        myMAmean <- sapply(myList, function(x) eval(parse(text=x)))
        colnames(myMAmean) <- tapply(colnames(myMA), myselect, paste, collapse="_")
        myMAmean[1:4,]
            C1_C2_C3   C4_C5_C6       C7_C8     C9_C10
        1  0.0676799 -0.2860392  0.09651984 -0.7898946
        2 -0.6120203 -0.7185961  0.91621371  1.1778427
        3  0.2960446 -0.2454476 -1.18768621  0.9019590
        4  0.9733695 -0.6242547  0.95078869 -0.7245792

        Functions

        A very useful feature of the R environment is the possibility to expand existing functions and to easily write custom functions. In fact, most of the R software can be viewed as a series of R functions.

        Syntax to define functions

        myfct <- function(arg1, arg2, ...) {
        function_body
        }

        The value returned by a function is the value of the function body, which is usually an unassigned final expression, e.g.: return()

        Syntax to call functions

        myfct(arg1=..., arg2=...)

        Syntax Rules for Functions

        General

        Functions are defined by (1) assignment with the keyword function, (2) the declaration of arguments/variables (arg1, arg2, ...) and (3) the definition of operations (function_body) that perform computations on the provided arguments. A function name needs to be assigned to call the function (see below).

        Naming

        Function names can be almost anything. However, the usage of names of existing functions should be avoided.

        Arguments

        It is often useful to provide default values for arguments (e.g.: arg1=1:10). This way they don't need to be provided in a function call. The argument list can also be left empty (myfct <- function() { fct_body }) when a function is expected to return always the same value(s). The argument '...' can be used to allow one function to pass on argument settings to another.

        Function body

        The actual expressions (commands/operations) are defined in the function body which should be enclosed by braces. The individual commands are separated by semicolons or new lines (preferred).

        Calling functions

        Functions are called by their name followed by parentheses containing possible argument names. Empty parentheses after the function name will result in an error message when a function requires certain arguments to be provided by the user. The function name alone will print the definition of a function.

        Variables created inside a function exist only for the lifetime of a function. Thus, they are not accessible outside of the function. To force variables in functions to exist globally, one can use this special assignment operator: '<<-'. If a variable with the same name as a global variable is assigned inside a function, then the global variable is masked only inside the function.
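
The scoping rules described above can be sketched with two small functions. The names counter, local_update and global_update are hypothetical and chosen only for this illustration.

```r
counter <- 0

local_update <- function() {
  counter <- counter + 1    # creates a local variable; the outer 'counter' stays untouched
  counter
}

global_update <- function() {
  counter <<- counter + 1   # assigns to 'counter' in the enclosing environment
  counter
}

a <- local_update()
after_local <- counter      # still 0: the local assignment only masked the outer variable
b <- global_update()
after_global <- counter     # now 1: '<<-' modified the outer variable

print(c(a, after_local, b, after_global))
```

In most code it is cleaner to return values and assign them at the call site; '<<-' is best kept for cases where a side effect is really intended.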

        Example: Function basics

        myfct <- function(x1, x2=5) {
        z1 <- x1/x1
        z2 <- x2*x2
        myvec <- c(z1, z2)
        return(myvec)
        }
        myfct # prints definition of function
        myfct(x1=2, x2=5) # applies function to values 2 and 5
        [1]  1 25

        myfct(2, 5) # the argument names are not necessary, but then the order of the specified values becomes important
        myfct(x1=2) # does the same as before, but the default value '5' is used in this case


        Example: Function with optional arguments

        myfct2 <- function(x1=5, opt_arg) {
        if(missing(opt_arg)) { # 'missing()' is used to test whether a value was specified as an argument
        z1 <- 1:10
        } else {
        z1 <- opt_arg
        }
        cat("my function returns:", "\n")
        return(z1/x1)
        }
        myfct2(x1=5) # performs calculation on default vector (z1) that is defined in the function body
        my function returns:
        [1] 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0

        myfct2(x1=5, opt_arg=30:20) # a custom vector is used instead when the optional argument (opt_arg) is specified
        my function returns:
        [1] 6.0 5.8 5.6 5.4 5.2 5.0 4.8 4.6 4.4 4.2 4.0


        Control utilities for functions: return, warning and stop

        Return

        The evaluation flow of a function may be terminated at any stage with the return function. This is often used in combination with conditional evaluations.

        Stop

        To stop the action of a function and print an error message, one can use the stop function.

        Warning

        To print a warning message in unexpected situations without aborting the evaluation flow of a function, one can use the function warning("...").

        myfct <- function(x1) {
        if (x1>=0) print(x1) else stop("This function did not finish, because x1 < 0")
        warning("Value needs to be > 0")
        }
        myfct(x1=2)
        [1] 2
        Warning message:
        In myfct(x1 = 2) : Value needs to be > 0

        myfct(x1=-2)
        Error in myfct(x1 = -2) : This function did not finish, because x1 < 0
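
Errors and warnings raised with stop and warning can be intercepted by the caller with base R's tryCatch, which is not discussed in the text above. The following is a sketch under that assumption; checked_sqrt and safe_call are hypothetical helper names.

```r
checked_sqrt <- function(x1) {             # hypothetical function that signals conditions
  if (x1 < 0) stop("x1 must not be negative")
  if (x1 == 0) warning("x1 is zero")
  sqrt(x1)
}

safe_call <- function(x1) {                # hypothetical wrapper that never aborts
  tryCatch(
    checked_sqrt(x1),
    warning = function(w) paste("warning:", conditionMessage(w)),
    error   = function(e) paste("error:", conditionMessage(e))
  )
}

r1 <- safe_call(4)     # normal result
r2 <- safe_call(0)     # the warning is caught and turned into a message string
r3 <- safe_call(-1)    # the error is caught instead of stopping the script
print(list(r1, r2, r3))
```

This pattern can be useful inside loops over many inputs, where one failing element should not terminate the whole run.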

        Useful Utilities

        Debugging Utilities

        Several debugging utilities are available for R. The most important utilities are: traceback(), browser(), options(error=recover), options(error=NULL) and debug(). The Debugging in R page provides an overview of the available resources.

        Regular Expressions

        R's regular expression utilities work similarly to those in other languages. To learn how to use them in R, one can consult the main help page on this topic with ?regexp. The following gives a few basic examples.

        The grep function can be used for finding patterns in strings, here letter A in vector month.name.

        month.name[grep("A", month.name)]
        [1] "April"  "August"

        Example for using regular expressions to substitute a pattern by another one using the sub/gsub function with a back reference. Remember: single escapes '\' need to be double escaped '\\' in R.

        gsub("(i.*a)", "xxx_\\1", "virginica", perl = TRUE)
        [1] "vxxx_irginica"

        Example for split and paste functions

        x <- gsub("(a)", "\\1_", month.name[1], perl=TRUE) # performs substitution with back reference which inserts in this example a '_' character
        x
        [1] "Ja_nua_ry"

        strsplit(x, "_") # splits string on inserted character from above
        [[1]]
        [1] "Ja"  "nua" "ry"

        paste(rev(unlist(strsplit(x, NULL))), collapse="") # reverses character string by splitting first all characters into vector fields and then collapsing them with paste
        [1] "yr_aun_aJ"


        Example for importing specific lines in a file with a regular expression. The following example demonstrates the retrieval of specific lines from an external file with a regular expression. First, an external file is created with the cat function, all lines of this file are imported into a vector with readLines, the specific elements (lines) are then retrieved with the grep function, and the resulting lines are split into vector fields with strsplit.

        cat(month.name, file="zzz.txt", sep="\n")
        x <- readLines("zzz.txt")
        x <- x[c(grep("^J", as.character(x), perl = TRUE))]
        t(as.data.frame(strsplit(x, "u")))
                        [,1]  [,2]
        c..Jan....ary.. "Jan" "ary"
        c..J....ne..    "J"   "ne"
        c..J....ly..    "J"   "ly"

        Interpreting Character String as Expression

        Example

        mylist <- ls() # generates vector of object names in session
        mylist[1] # prints name of 1st entry in vector but does not execute it as expression that returns values of that object
        get(mylist[1]) # uses 1st entry name in vector and executes it as expression
        eval(parse(text=mylist[1])) # alternative approach to obtain similar result
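
Since the content of ls() depends on the session, the same idea can be sketched with a known object. The names myvalue and objname are made up for this illustration.

```r
myvalue <- c(10, 20, 30)                  # illustrative object
objname <- "myvalue"                      # its name stored as a character string

via_get  <- get(objname)                  # looks up the object that carries this name
via_eval <- eval(parse(text = objname))   # parses the string as R code and evaluates it

# parse/eval also handles whole expressions, which get() cannot do
via_expr <- eval(parse(text = "sum(myvalue) / length(myvalue)"))

print(via_get)
print(via_expr)
```

get() is usually the safer choice when only an object lookup is needed, because eval(parse()) executes any code contained in the string.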

        Time, Date and Sleep

        Example

        system.time(ls()) # returns CPU (and other) times that an expression used, here ls()
           user  system elapsed
        0       0       0

        date() # returns the current system date and time
        [1] "Wed Dec 11 15:31:17 2012"

        Sys.sleep(1) # pause execution of R expressions for a given number of seconds (e.g. in loop)


        Calling External Software with System Command

        The system command allows one to call any command-line software from within R on Linux, UNIX and OSX systems.

        system("...") # provide under '...' command to run external software e.g. Perl, Python, C++ programs

        Related utilities on Windows operating systems

        x <- shell("dir", intern=TRUE) # reads listing of current working directory and assigns it to x
        shell.exec("C:/Documents and Settings/Administrator/Desktop/my_file.txt") # opens file with associated program


        Miscellaneous Utilities

        (1) Batch import and export of many files.
        In the following example all file names ending with *.txt in the current directory are first assigned to a vector (the '$' sign is used to anchor the match to the end of a string). Second, the files are imported one-by-one using a for loop where the original names are assigned to the generated data frames with the assign function. Consult help with ?read.table to understand arguments row.names=1 and comment.char = "A". Third, the data frames are exported using their names for file naming and appending the extension *.out.

        files <- list.files(pattern="\\.txt$") # the dot is escaped so that it matches a literal '.'
        for(i in files) {
        x <- read.table(i, header=TRUE, comment.char = "A", sep="\t")
        assign(i, x)
        print(i)
        write.table(x, paste(i, c(".out"), sep=""), quote=FALSE, sep="\t", col.names = NA)
        }
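
An alternative to assign is to collect the imported data frames in one named list, which keeps the workspace tidy and works well with lapply. The sketch below is self-contained: it first writes two small demo files into a scratch directory (tmpdir, the file names and the table content are all made up), then imports and exports them.

```r
tmpdir <- file.path(tempdir(), "batch_demo")      # hypothetical scratch directory
dir.create(tmpdir, showWarnings = FALSE)

# Create two small tab-delimited demo files
for (n in c("a", "b")) {
  write.table(data.frame(id = 1:3, value = c(1.5, 2.5, 3.5)),
              file.path(tmpdir, paste(n, ".txt", sep = "")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Import all *.txt files into a list named by file name
files <- list.files(tmpdir, pattern = "\\.txt$", full.names = TRUE)
tables <- lapply(files, read.table, header = TRUE, sep = "\t")
names(tables) <- basename(files)

# Export each data frame under its original name plus the extension .out
for (n in names(tables)) {
  write.table(tables[[n]], file.path(tmpdir, paste(n, ".out", sep = "")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
print(names(tables))
```

Individual tables are then accessed by name, for instance tables[["a.txt"]].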

        (2) Running Web Applications (basics on designing web client/crawling/scraping scripts in R)
        Example for obtaining MW values for peptide sequences from the EXPASY's pI/MW Tool web page.

        myentries <- c("MKWVTFISLLFLFSSAYS", "MWVTFISLL", "MFISLLFLFSSAYS")
        myresult <- NULL
        for(i in myentries) {
        myurl <- paste("http://ca.expasy.org/cgi-bin/pi_tool?protein=", i, "&resolution=monoisotopic", sep="")
        x <- url(myurl)
        res <- readLines(x)
        close(x)
        mylines <- res[grep("Theoretical pI/Mw:", res)]
        myresult <- c(myresult, as.numeric(gsub(".*/ ", "", mylines)))
        print(myresult)
        Sys.sleep(1) # halts process for one sec to give web service a break
        }
        final <- data.frame(Pep=myentries, MW=myresult)
        cat("\n The MW values for my peptides are:\n")
        final
                         Pep      MW
        1 MKWVTFISLLFLFSSAYS 2139.11
        2          MWVTFISLL 1108.60
        3     MFISLLFLFSSAYS 1624.82

                              

        Running R Programs

        (1) Executing an R script from the R console

        source("my_script.R")

        (2.1) Syntax for running R programs from the command-line. To execute the script directly, the first line of my_script.R needs to contain the following statement: #!/usr/bin/env Rscript

        $ Rscript my_script.R # or just ./my_script.R after making file executable with 'chmod +x my_script.R'

        All commands starting with a '$' sign need to be executed from a Unix or Linux shell.

        (2.2) Alternatively, one can use the following syntax to run R programs in BATCH mode from the command-line.

        $ R CMD BATCH [options] my_script.R [outfile]

        The output file lists the commands from the script file and their outputs. If no outfile is specified, the name used is that of the input file with .Rout appended. To stop all the usual R command line information from being written to the outfile, add this as first line to my_script.R file: options(echo=FALSE). If the command is run like this R CMD BATCH --no-save my_script.R, then nothing will be saved in the .RData file, which can often become very large. More on this can be found on the help pages: $ R CMD BATCH --help or ?BATCH.

        (2.3) Another alternative for running R programs as silently as possible.

        $ R --slave < my_infile > my_outfile

        Argument --slave makes R run as 'quietly' as possible.

        (3) Passing Command-Line Arguments to R Programs
        Create an R script, here named test.R, like this one:

        ######################
        myarg <- commandArgs()
        print(iris[1:myarg[6], ])
        ######################


        Then run it from the command-line like this:

        $ Rscript test.R 10

        In the given example the number 10 is passed on from the command-line as an argument to the R script which is used to return to STDOUT the first 10 rows of the iris sample data. Arguments always arrive in R as character strings. If several values are provided inside one string, it needs to be split in R with the strsplit function.
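
The fixed position myarg[6] depends on how many options Rscript itself adds to the call. A less position-dependent sketch uses commandArgs(trailingOnly = TRUE), which returns only the arguments given after the script name. The helper parse_n and its default of 5 rows are assumptions made for this illustration.

```r
# Hypothetical helper: converts the first trailing argument into a row count
parse_n <- function(args, default = 5) {
  if (length(args) == 0) return(default)       # no argument given: use the default
  n <- suppressWarnings(as.integer(args[1]))   # arguments arrive as character strings
  if (is.na(n)) default else n                 # non-numeric input: use the default
}

args <- commandArgs(trailingOnly = TRUE)       # only the arguments after the script name
n <- parse_n(args)
print(head(iris, n))
```

Saved as test.R, a call like 'Rscript test.R 10' would then print the first 10 rows, and a call without arguments the first 5.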


        (4) Submitting R script to a Linux cluster via Torque
        Create the following shell script my_script.sh

        #################################
        #!/bin/bash
        cd $PBS_O_WORKDIR
        R CMD BATCH --no-save my_script.R
        #################################

        This script doesn't need to have executable permissions. Use the following qsub command to send this shell script to the Linux cluster from the directory where the R script my_script.R is located. To utilize several CPUs on the Linux cluster, one can split the input data into several smaller subsets and execute for each subset a separate process from a dedicated directory.

        $ qsub my_script.sh

        Here is a short R script that generates the required files and directories automatically and submits the jobs to the nodes: submit2cluster.R. For more details, see also this 'Tutorial on Parallel Programming in R' by Hanna Sevcikova

        (5) Submitting jobs to Torque or any other queuing/scheduling system via the BatchJobs package. This package provides one of the most advanced resources for submitting jobs to queuing systems from within R. A related package is BiocParallel from Bioconductor which extends many functionalities of BatchJobs to genome data analysis. Useful documentation for BatchJobs: Technical Report, GitHub page, Slide Show, Config samples.

        library(BatchJobs)
        loadConfig(conffile = ".BatchJobs.R")
        ## Loads configuration file. Here .BatchJobs.R containing only this line:
                ## cluster.functions <- makeClusterFunctionsTorque("torque.tmpl")
                ## The template file torque.tmpl is expected to be in the current working
                ## directory. It can be downloaded from here:
                ## https://github.com/tudo-r/BatchJobs/blob/master/examples/cfTorque/simple.tmpl

        getConfig() # Returns BatchJobs configuration settings
        reg <- makeRegistry(id="BatchJobTest", work.dir="results")
        ## Constructs a registry object. Output files from R will be stored under directory "results",
        ## while the standard objects from BatchJobs will be stored in the directory "BatchJobTest-files".
        print(reg)

        ## Some test function
        f <- function(x) {
                system("ls -al >> test.txt")
                x
        }

        ## Adds jobs to registry object (here reg)
        ids <- batchMap(reg, fun=f, 1:10)
        print(ids)
        showStatus(reg)

        ## Submit jobs or chunks of jobs to batch system via cluster function
        done <- submitJobs(reg, resources=list(walltime=3600, nodes="1:ppn=4", memory="4gb"))

        ## Load results from BatchJobTest-files/jobs/01/1-result.RData
        loadResult(reg, 1)


        Object-Oriented Programming (OOP)

        R supports two systems for object-oriented programming (OOP): an older S3 system and a more recently introduced S4 system. The latter is more formal, supports multiple inheritance, multiple dispatch and introspection. Many of these features are not available in the older S3 system. In general, the OOP approach taken by R is to separate the class specifications from the specifications of generic functions (function-centric system). The following introduction is restricted to the S4 system since it is nowadays the preferred OOP method for R. More information about OOP in R can be found in the following introductions: Vincent Zoonekynd's introduction to S3 Classes, S4 Classes in 15 pages, Christophe Genolini's S4 Intro, The R.oo package, BioC Course: Advanced R for Bioinformatics, Programming with R by John Chambers and R Programming for Bioinformatics by Robert Gentleman.

          Define S4 Classes

          (A) Define S4 Classes with setClass() and new()

          y <- matrix(1:50, 10, 5) # Sample data set
          setClass(Class="myclass",
          representation=representation(a="ANY"),
          prototype=prototype(a=y[1:2,]), # Defines default value (optional)
          validity=function(object) { # Can be defined in a separate step using setValidity
          if(!is.matrix(object@a)) { # is.matrix() is used because class() of a matrix has length two in R >= 4.0
          return(paste("expected matrix, but obtained", class(object@a)[1]))
          } else {
          return(TRUE)
          }
          }
          )

          The setClass function defines classes. Its most important arguments are

          • Class: the name of the class
          • representation: the slots that the new form should have and/or other classes that this grade extends.
          • prototype: an object providing default data for the slots.
          • contains: the classes that this course extends.
          • validity, access, version : command arguments included for compatibility with S-Plus.
          • where: the environment to use to shop or remove the definition equally meta data.

          (B) The function new creates an instance of a class (here myclass)

          myobj <- new("myclass", a=y)
          myobj
          An object of class "myclass"
          Slot "a":
               [,1] [,2] [,3] [,4] [,5]
          [1,]    1   11   21   31   41
          [2,]    2   12   22   32   42
          ...

          new("myclass", a=iris) # Returns an error message due to wrong input type (iris is a data frame)
          Error in validObject(.Object) :
          invalid class "myclass" object: expected matrix, but obtained data.frame


          The arguments of new are:

          • Class: the name of the class
          • ...: Data to include in the new object with arguments according to slots in class definition.

          (C) A more generic way of creating class instances is to define an initialization method (details below)

          setMethod("initialize", "myclass", function(.Object, a) {
              .Object@a <- a/a
              .Object
          })
          new("myclass", a = y)
          An object of class "myclass"
          Slot "a":
               [,1] [,2] [,3] [,4] [,5]
          [1,]    1    1    1    1    1
          [2,]    1    1    1    1    1
          ...

          (D) Usage and helper functions

          myobj@a # The '@' extracts the contents of a slot. Usage should be limited to internal functions!
          initialize(.Object=myobj, a=as.matrix(cars[1:3,])) # Creates a new S4 object from an old one.
          # removeClass("myclass") # Removes object from current session; does not apply to associated methods.


          (E) Inheritance allows defining new classes that inherit all properties (e.g. data slots, methods) from their existing parent classes

          setClass("myclass1", representation(a = "character", b = "character"))
          setClass("myclass2", representation(c = "numeric", d = "numeric"))
          setClass("myclass3", contains=c("myclass1", "myclass2"))
          new("myclass3", a=letters[1:4], b=letters[1:4], c=1:4, d=4:1)
          An object of class "myclass3"
          Slot "a":
          [1] "a" "b" "c" "d"

          Slot "b":
          [1] "a" "b" "c" "d"

          Slot "c":
          [1] 1 2 3 4

          Slot "d":
          [1] 4 3 2 1

          getClass("myclass1")
          Class "myclass1" [in ".GlobalEnv"]

          Slots:

          Name:          a         b
          Class: character character

          Known Subclasses: "myclass3"

          getClass("myclass2")
          Class "myclass2" [in ".GlobalEnv"]

          Slots:

          Name:        c       d
          Class: numeric numeric

          Known Subclasses: "myclass3"

          getClass("myclass3")
          Class "myclass3" [in ".GlobalEnv"]

          Slots:

          Name:          a         b         c         d
          Class: character character   numeric   numeric

          Extends: "myclass1", "myclass2"


          The argument contains allows extending existing classes; this propagates all slots of the parent classes.
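Methods are inherited along with slots. The following sketch, which uses the hypothetical class names parentA and childA and a hypothetical generic describe, shows a method defined for the parent class being dispatched for an object of the child class.

```r
library(methods)

# Hypothetical parent and child classes (illustration only)
setClass("parentA", representation(a="character"))
setClass("childA", contains="parentA", representation(n="numeric"))

# Hypothetical generic with a method defined only for the parent class
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "parentA", function(x) {
    paste("labels:", paste(x@a, collapse=","))
})

obj <- new("childA", a=c("x", "y"), n=2)
inherits_parent <- is(obj, "parentA") # child objects are also instances of the parent
desc <- describe(obj)                 # dispatches to the method inherited from parentA
```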

          (F) Coerce objects to another class

          setAs(from="myclass", to="character", def=function(from) as.character(as.matrix(from@a)))
          as(myobj, "character")
          [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
          ...

          (G) Virtual classes are constructs for which no instances will be or can be created. They are used to link together classes which may have distinct representations (e.g. cannot inherit from each other) but for which one wants to provide similar functionality. Often it is desired to create a virtual class and to then have several other classes extend it. Virtual classes can be defined by leaving out the representation argument or including the class VIRTUAL:

          setClass("myVclass")
          setClass("myVclass", representation(a = "character", "VIRTUAL"))
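As a sketch of how a virtual class can link otherwise unrelated classes, the example below defines a virtual class shape with two concrete subclasses. All class and generic names (shape, circle, rect, area, describeArea) are hypothetical and serve only as an illustration.

```r
library(methods)

# Hypothetical virtual class with two concrete subclasses (illustration only)
setClass("shape", representation("VIRTUAL"))
setClass("circle", contains="shape", representation(r="numeric"))
setClass("rect", contains="shape", representation(w="numeric", h="numeric"))

# Each subclass supplies its own area method
setGeneric("area", function(obj) standardGeneric("area"))
setMethod("area", "circle", function(obj) pi * obj@r^2)
setMethod("area", "rect", function(obj) obj@w * obj@h)

# Shared functionality is written once against the virtual class
setGeneric("describeArea", function(obj) standardGeneric("describeArea"))
setMethod("describeArea", "shape", function(obj) {
    paste(class(obj), "with area", round(area(obj), 2))
})

rect_desc <- describeArea(new("rect", w=2, h=3))
# Creating an instance of the virtual class itself fails
virt_err <- tryCatch(new("shape"), error=function(e) "cannot instantiate")
```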

          (H) Functions to inspect classes

          • getClass("myclass")
          • getSlots("myclass")
          • slotNames("myclass")
          • extends("myclass2")

          Assign Generics and Methods

          Assign generics and methods with setGeneric() and setMethod()

          (A) Accessor function (to avoid usage of '@')

          setGeneric(name="acc", def=function(x) standardGeneric("acc"))
          setMethod(f="acc", signature="myclass", definition=function(x) {
              return(x@a)
          })
          acc(myobj)
               [,1] [,2] [,3] [,4] [,5]
          [1,]    1   11   21   31   41
          [2,]    2   12   22   32   42
          ...

          (B.1) Replacement method using accessor function (acc<-)

          setGeneric(name="acc<-", def=function(x, value) standardGeneric("acc<-"))
          setReplaceMethod(f="acc", signature="myclass", definition=function(x, value) {
              x@a <- value
              return(x)
          })
          ## After this the following replace operations with 'acc' work on the new object class
          acc(myobj)[1,1] <- 999 # Replaces first value
          colnames(acc(myobj)) <- letters[1:5] # Assigns new column names
          rownames(acc(myobj)) <- letters[1:10] # Assigns new row names
          myobj
          An object of class "myclass"
          Slot "a":
              a  b  c  d  e
          a 999 11 21 31 41
          b   2 12 22 32 42

          ...

          (B.2) Replacement method using "[" operator ([<-)

          setReplaceMethod(f="[", signature="myclass", definition=function(x, i, j, value) {
          x@a[i,j] <- value
          return(x)
          })
          myobj[1,2] <- 999
          myobj
          An object of class "myclass"
          Slot "a":
              a   b  c  d  e
          a 999 999 21 31 41
          b   2  12 22 32 42
          ...

          (C) Define behavior of "[" subsetting operator (no generic required!)

          setMethod(f="[", signature="myclass",
          definition=function(x, i, j, ..., drop) {
          x@a <- x@a[i,j]
              return(x)
          })
          myobj[1:2,] # Standard subsetting works now on new class
          An object of class "myclass"
          Slot "a":
              a   b  c  d  e
          a 999 999 21 31 41
          b   2  12 22 32 42

          ...

          (D) Define print behavior

          setMethod(f="show", signature="myclass", definition=function(object) {
              cat("An instance of ", "\"", class(object), "\"", " with ", length(acc(object)[,1]), " elements", "\n", sep="")
              if(length(acc(object)[,1])>=5) {
                  # Prints only the first two and last two rows, separated by a row of "..."
                  print(as.data.frame(rbind(acc(object)[1:2,], ...=rep("...", length(acc(object)[1,])),
                      acc(object)[(length(acc(object)[,1])-1):length(acc(object)[,1]),])))
              } else {
                  print(acc(object))
              }})
          myobj # Prints object with custom method
          An instance of "myclass" with 10 elements
                a   b   c   d   e
          a   999 999  21  31  41
          b     2  12  22  32  42
          ... ... ... ... ... ...
          i     9  19  29  39  49
          j    10  20  30  40  50

            (E) Define a data specific function (here randomize row order)

            setGeneric(name="randomize", def=function(x) standardGeneric("randomize"))
            setMethod(f="randomize", signature="myclass", definition=function(x) {
            acc(x)[sample(1:length(acc(x)[,1]), length(acc(x)[,1])), ]
            })
            randomize(myobj)
                a   b  c  d  e
            j  10  20 30 40 50
            b   2  12 22 32 42
            ...

            (F) Define a graphical plotting function and allow user to access it with generic plot function

            setMethod(f="plot", signature="myclass", definition=function(x, ...) {
            barplot(as.matrix(acc(x)), ...)
            })
            plot(myobj)

            (G) Functions to inspect methods

            • showMethods(classes="myclass")
            • findMethods("randomize")
            • getMethod("randomize", signature="myclass")
            • existsMethod("randomize", signature="myclass")

            Building R Packages

            To get familiar with the structure, building and submission process of R packages, users should carefully read the documentation on this topic available on these sites:

            • Writing R Extensions, R web site
            • R Packages, by Hadley Wickham
            • R Package Primer, by Karl Broman
            • Package Guidelines, Bioconductor
            • Advanced R Programming Class, Bioconductor

            Short Overview of Package Building Process

            (A) Automatic package building with the package.skeleton function:

            package.skeleton(name="mypackage", code_files=c("script1.R", "script2.R"))

            Note: this is an optional but very convenient function to get started with a new package. The given example will create a directory named mypackage containing the skeleton of the package for all functions, methods and classes defined in the R script(s) passed on to the code_files argument. The basic structure of the package directory is described here. The package directory will also contain a file named 'Read-and-delete-me' with the following instructions for completing the package:

            • Edit the help file skeletons in man, possibly combining help files for multiple functions.
            • Edit the exports in NAMESPACE, and add necessary imports.
            • Put any C/C++/Fortran code in src.
            • If you have compiled code, add a useDynLib() directive to NAMESPACE.
            • Run R CMD build to build the package tarball.
            • Run R CMD check to check the package tarball.
            • Read Writing R Extensions for more information.

            (B) Once a package skeleton is available one can build the package from the command-line (Linux/OS X):

            $ R CMD build mypackage

            This will create a tarball of the package with its version number encoded in the file name, e.g.: mypackage_1.0.tar.gz.
            Subsequently, the package tarball needs to be checked for errors with:

            $ R CMD check mypackage_1.0.tar.gz

            All issues in a package's source code and documentation should be addressed until R CMD check returns no error or warning messages anymore.

            (C) Install package from source:

            Linux:

            install.packages("mypackage_1.0.tar.gz", repos=NULL)

            OS X:

            install.packages("mypackage_1.0.tar.gz", repos=NULL, type="source")


            Windows requires a zip archive for installing R packages, which can be most conveniently created from the command-line (Linux/OS X) by installing the package in a local directory (here tempdir) and then creating a zip archive from the installed package directory:

            $ mkdir tempdir
            $ R CMD INSTALL -l tempdir mypackage_1.0.tar.gz
            $ cd tempdir
            $ zip -r mypackage mypackage

            ## The resulting mypackage.zip archive can be installed under Windows like this:
            install.packages("mypackage.zip", repos=NULL)


            This procedure only works for packages which do not rely on compiled code (C/C++). Instructions to fully build an R package under Windows can be found here and here.

            (D) Maintain/expand an existing package:

            • Add new functions, methods and classes to the script files in the ./R directory in your package
            • Add their names to the NAMESPACE file of the package
            • Additional *.Rd help templates can be generated with the prompt*() functions like this:

            source("myscript.R") # imports functions, methods and classes from myscript.R
            prompt(myfct) # writes help file myfct.Rd
            promptClass("myclass") # writes file myclass-class.Rd
            promptMethods("mymeth") # writes help file mymeth-methods.Rd

            • The resulting *.Rd help files can be edited in a text editor and properly rendered and viewed from within R like this:

            library(tools)
            Rd2txt("./mypackage/man/myfct.Rd") # renders *.Rd files as they look in terminal help pages
            checkRd("./mypackage/man/myfct.Rd") # checks *.Rd help file for issues

            (E) Submit package to a public repository

            Download one of the above exercise files, then start editing this R source file with a programming text editor, such as Vim, Emacs or one of the R GUI text editors. Here is the HTML version of the code with syntax coloring.


            Sample Scripts

            Batch Operations on Many Files

            ## (1) Start R from an empty test directory
            ## (2) Create some files as sample data

            for(i in month.name) {
                mydf <- data.frame(Month=month.name, Rain=runif(12, min=10, max=100), Evap=runif(12, min=1000, max=2000))
                write.table(mydf, file=paste(i , ".infile", sep=""), quote=FALSE, row.names=FALSE, sep="\t")
            }

            ## (3) Import created files, perform calculations and export to renamed files
            files <- list.files(pattern=".infile$")
            for(i in seq(along=files)) { # start for loop with numeric or character vector; numeric vector is often more flexible
                x <- read.table(files[i], header=TRUE, row.names=1, comment.char = "A", sep="\t")
                x <- data.frame(x, sum=apply(x, 1, sum), mean=apply(x, 1, mean)) # calculates sum and mean for each data frame
                assign(files[i], x) # generates data frame object and names it after content in variable 'i'
                print(files[i], quote=FALSE) # prints loop iteration to screen to check its status
                write.table(x, paste(files[i], c(".out"), sep=""), quote=FALSE, sep="\t", col.names = NA)
            }

            ## (4) Same as above, but file naming by index data frame. This way one can organize file names by external table.
            name_df <- data.frame(Old_name=sort(files), New_name=sort(month.abb))
            for(i in seq(along=name_df[,1])) {
                x <- read.table(as.vector(name_df[i,1]), header=TRUE, row.names=1, comment.char = "A", sep="\t")
                x <- data.frame(x, sum=apply(x, 1, sum), mean=apply(x, 1, mean))
                assign(as.vector(name_df[i,2]), x) # generates data frame object and names it after 'i' entry in column 2
                print(as.vector(name_df[i,1]), quote=FALSE)
                write.table(x, paste(as.vector(name_df[i,2]), c(".out"), sep=""), quote=FALSE, sep="\t", col.names = NA)
            }

            ## (5) Append content of all input files to one file.
            files <- list.files(pattern=".infile$")
            all_files <- data.frame(files=NULL, Month=NULL, Gain=NULL , Loss=NULL, sum=NULL, mean=NULL) # creates empty data frame container
            for(i in seq(along=files)) {
                x <- read.table(files[i], header=TRUE, row.names=1, comment.char = "A", sep="\t")
                x <- data.frame(x, sum=apply(x, 1, sum), mean=apply(x, 1, mean))
                x <- data.frame(file=rep(files[i], length(x[,1])), x) # adds file tracking column to x
                all_files <- rbind(all_files, x) # appends data from all files to data frame 'all_files'
                write.table(all_files, file="all_files.xls", quote=FALSE, sep="\t", col.names = NA)
            }

            ## (6) Write the above code into a text file and execute it with the commands 'source' and 'BATCH'.
            source("my_script.R") # execute from R panel
            $ R CMD BATCH my_script.R # execute from shell
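Step (5) above grows all_files with rbind and rewrites the output file on every iteration. A possible alternative, sketched here under the assumption that all input files share the same columns, reads each file with lapply, combines the results once with do.call(rbind, ...) and writes the combined table a single time. The helper read_one, the directory name batch_demo and the output name all_files.tsv are hypothetical, and the sketch works in a temporary directory so that no existing files are touched.

```r
# Work in a dedicated temporary directory (hypothetical name 'batch_demo')
demo_dir <- file.path(tempdir(), "batch_demo")
dir.create(demo_dir, showWarnings=FALSE)
old_wd <- setwd(demo_dir)

# Create three small sample files, analogous to step (2)
for(m in month.name[1:3]) {
    mydf <- data.frame(Month=month.name, Rain=runif(12, 10, 100), Evap=runif(12, 1000, 2000))
    write.table(mydf, file=paste0(m, ".infile"), quote=FALSE, row.names=FALSE, sep="\t")
}

# Hypothetical helper: import one file, add sum/mean columns and a file tracking column
read_one <- function(f) {
    x <- read.table(f, header=TRUE, sep="\t")
    x$sum <- x$Rain + x$Evap
    x$mean <- rowMeans(x[, c("Rain", "Evap")])
    data.frame(file=f, x)
}

files <- list.files(pattern="\\.infile$")
all_files <- do.call(rbind, lapply(files, read_one)) # combine all data frames in one step
write.table(all_files, file="all_files.tsv", quote=FALSE, sep="\t", row.names=FALSE)

setwd(old_wd) # restore the original working directory
```

Unlike the loop in step (5), this sketch keeps the default comment.char, so every month row is imported.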


            Large-scale Array Analysis

            Sample script to perform large-scale expression array analysis with complex queries: lsArray.R. To demo what the script does, run it like this:

            source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/lsArray.R")

            Graphical Procedures: Feature Map Example

            Script to plot feature maps of genes or chromosomes: featureMap.R. To demo what the script does, run it like this:

            source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/featureMap.txt")

            Sequence Analysis Utilities

            Includes sequence batch import, sub-setting, pattern matching, AA Composition, NEEDLE, PHYLIP, etc. The script 'sequenceAnalysis.R' demonstrates how R can be used as a powerful tool for managing and analyzing large sets of biological sequences. This example also shows how easy it is to integrate R with the EMBOSS project or other external programs. The script provides the following functionality:

            • Batch sequence import into R data frame
            • Motif searching with hit statistics
            • Analysis of sequence composition
            • All-against-all sequence comparisons
            • Generation of phylogenetic trees

            To demonstrate the utilities of the script, users can simply execute it from R with the following source command:

            source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/sequenceAnalysis.txt")

            Pattern Matching and Positional Parsing of Sequences

            Functions for importing sequences into R, retrieving reverse and complement of nucleotide sequences, pattern searching, positional parsing and exporting search results in HTML format: patternSearch.R. To demo what the script does, run it like this:

            source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/patternSearch.R")

            Identify Over-Represented Strings in Sequence Sets

            Functions for finding over-represented words in sets of DNA, RNA or protein sequences: wordFinder.R. To demo what the script does, run it like this:

            source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/wordFinder.R")

            Translate DNA into Protein

            Script 'translateDNA.R' for translating NT sequences into AA sequences (required codon table). To demo what the script does, run it like this:

            source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/translateDNA.R")

            Subsetting of Structure Definition Files (SDF)

            Script for importing and subsetting SDF files: sdfSubset.R. To demo what the script does, run it like this:

            source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/sdfSubset.R")

            Managing Latex BibTeX Databases

            Script for importing BibTeX databases into R, retrieving the individual references with a full-text search function and viewing the results in R or in HubMed: BibTex.R. To demo what the script does, run it like this:

            source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/BibTex.R")

            Loan Payments and Amortization Tables

            This script calculates monthly and annual mortgage or loan payments, generates amortization tables and plots the results: mortgage.R. A Shiny App using this function has been created by Antoine Soetewey here. To demo what the script does, run it like this:

            source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/mortgage.R")

            Course Assignment: GC Content, Reverse & Complement

            Apply the above information to write a function that calculates for a set of DNA sequences their GC content and generates their reverse and complement. Here are some useful commands that can be incorporated in this function:

            ## Generate an example data frame with ID numbers and DNA sequences
            fx <- function(test) {
                x <- as.integer(runif(20, min=1, max=5))
                x[x==1] <- "A"; x[x==2] <- "T"; x[x==3] <- "G"; x[x==4] <- "C"
                paste(x, sep = "", collapse ="")
            }
            z1 <- c()
            for(i in 1:50) {
            z1 <- c(fx(i), z1)
            }
            z1 <- data.frame(ID=seq(along=z1), Seq=z1)
            z1

            ## Write each character of sequence into separate vector field and reverse its order
            my_split <- strsplit(as.character(z1[1,2]),"")
            my_rev <- rev(my_split[[1]])
            paste(my_rev, collapse="")

            ## Generate the sequence complement by replacing G|C|A|T by C|G|T|A
            ## Use 'apply' or 'for loop' to apply the above operations to all sequences in sample data frame 'z1'
            ## Calculate in the same loop the GC content for each sequence using the following command

            table(my_split[[1]])/length(my_split[[1]])
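One possible way to combine these commands is sketched below. It is not the only solution to the assignment: the function names rev_comp and gc_content are hypothetical, and the sketch swaps the table() call for a direct count of G and C characters and uses chartr() for the complement step.

```r
# Hypothetical helper: reverse complement of a character vector of DNA sequences
rev_comp <- function(seqs) {
    comp <- chartr("ATGC", "TACG", seqs) # complement: A<->T, G<->C
    vapply(strsplit(comp, ""), function(ch) paste(rev(ch), collapse=""), character(1))
}

# Hypothetical helper: fraction of G and C characters in each sequence
gc_content <- function(seqs) {
    vapply(strsplit(seqs, ""), function(ch) mean(ch %in% c("G", "C")), numeric(1))
}

seqs <- c("ATGC", "GGGA", "TTTT")
rc <- rev_comp(seqs)   # "GCAT" "TCCC" "AAAA"
gc <- gc_content(seqs) # 0.50 0.75 0.00
result <- data.frame(ID=seq(along=seqs), Seq=seqs, RevComp=rc, GC=gc)
```

Both helpers are vectorized over their input, so they can be applied directly to the Seq column of the sample data frame z1 after converting it with as.character().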


            • Serbo-Croatian version translated by Jovana Milutinovich


            Source: http://manuals.bioinformatics.ucr.edu/home/programming-in-r
