Processing of Excel/CSV files

Deciding how to handle the CSV and Excel files used for analysis

What are the files?

  • CellProfiler output

  • Sarcomere App output

  • ImageJ measurements

  • Sota tool and other tools

  • Ilastik output

  • One-step Excel file output

  • RNA-seq output

Use cases

1. Stack the contents of multiple files on top of each other (row-wise concatenation)

2. Compute summary statistics for groups defined by a column

3. Get the unique values ("levels") of one or more specified columns

4. Split one file into multiple files based on the value ("level") in a column

5. Load an xlsx file, process it, and save the result as another sheet in the same file

6. Generate a table in Word, sorted by a specified column

7. Generate a report of what was done
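Use cases 1–4 could be sketched as small, plain functions. This is only an illustration in Python using the standard library; the function names and the column-index interface are assumptions, not a decided design:

```python
import csv
from collections import defaultdict
from statistics import mean

def stack_files(paths):
    """Use case 1: concatenate rows of CSV files that share a header."""
    header, rows = None, []
    for path in paths:
        with open(path, newline="") as fh:
            reader = csv.reader(fh)
            file_header = next(reader)
            if header is None:
                header = file_header
            rows.extend(reader)
    return header, rows

def unique_levels(rows, col):
    """Use case 3: unique values ("levels") of one column."""
    return sorted({row[col] for row in rows})

def group_means(rows, group_col, value_col):
    """Use case 2: mean of a numeric column for each group level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    return {level: mean(vals) for level, vals in groups.items()}

def split_by_level(rows, col):
    """Use case 4: split rows into one list per level of a column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[col]].append(row)
    return dict(parts)
```

For example, with `rows = [["a", "1"], ["a", "3"], ["b", "2"]]`, `unique_levels(rows, 0)` gives the levels of the first column and `group_means(rows, 0, 1)` averages the second column per level.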

Why not do it in R?

  • RStudio takes long to open and constantly requires updates

  • Looking through different CSVs in RStudio takes time

  • I do not know how to work with Excel files in R

  • It is difficult to specify which files to operate on (globbing on the command line is much easier)

  • I do not know how to run R scripts from the command line and supply arguments; I do know how to do that in bash
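The globbing point above is worth noting: the shell expands the glob before any script runs, so a command-line tool simply receives the expanded list of paths. A minimal sketch in Python (the script name `process.py` and the `--group-col` option are hypothetical):

```python
import argparse

# The shell expands the glob before the script starts, so
#     python process.py results/*.csv
# arrives here as an already-expanded list of file paths.
parser = argparse.ArgumentParser(description="Process measurement CSV files")
parser.add_argument("files", nargs="+", help="CSV files to process")
parser.add_argument("--group-col", default="Genotype",
                    help="column that defines the groups")

# Parse an example argument list instead of the real sys.argv,
# just to show what the script would receive.
args = parser.parse_args(["a.csv", "b.csv", "--group-col", "Well"])
print(args.files, args.group_col)
```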

Why do it in R?

  • I can manipulate the values (addition/subtraction, etc.)

  • I can do statistics (t-tests, etc.)

  • I can plot

  • The R community is bigger, so it is easier to share the tool and get recognition

  • It can be made into a web app

How will I share it?
