2007-09-26

How to feed the program?

The first challenge is deciding which function to use to feed the data into the program.
I checked the data file: despite its .xls extension, it is actually a tab-separated text file, so I decided to use the read.table() function.

read.table() has very flexible parameters for reading in titles/column names and for changing the separator (^t, ^p, "." and so on). However, R also provides several variants of read.table(), which differ only slightly in their defaults.

One of them, read.delim(), fits my needs. It reads the first row as the column names (check with colnames()) and recognizes tab as the separator (type ?read.delim for details).
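
As a rough illustration of how read.delim() relates to read.table(), here is a minimal sketch. The toy file and its column names (Probe, Gene, C1, P1) are made up for the example and are not the real data; the read.table() arguments spelled out below are the defaults that read.delim() is documented to use.

```r
# Hypothetical toy file: tab-separated text hiding behind an .xls extension
tmp <- tempfile(fileext = ".xls")
writeLines(c("Probe\tGene\tC1\tP1",
             "p001\tabc1\t18291.58\t19402.70",
             "p002\txyz2\t120.50\t98.25"), tmp)

# read.delim(): header row and tab separator are the defaults
a <- read.delim(tmp)

# read.table() with the same settings written out explicitly
b <- read.table(tmp, header = TRUE, sep = "\t", quote = "\"",
                dec = ".", fill = TRUE, comment.char = "")

colnames(a)      # the first row became the column names
identical(a, b)  # compare the two results
```

Spelling the arguments out this way is only useful for seeing what the shortcut does; for a plain tab-separated file, read.delim() alone is probably enough.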

So far so good, right? Don't celebrate too early..
Since our purpose is to visualize the data as a heatmap, I was eager to see it first.. (what?! just put the fold change or anything else on the back burner). I naively thought it would be no big deal to throw the data into heatmap(). Here comes the problem... Orz

>t<-read.delim("sch.dat")
>t
>colnames(t)
>dim(t)
>heatmap(t[2:7072,6:29],col=rev(heat.colors(100)))
Error in heatmap(t[2:7072, 6:29], col = rev(heat.colors(100))) :
'x' must be a numeric matrix
#columns 1:5 are not numeric; they hold the probe/gene information
#It's so weird: the range I selected, t[2:7072, 6:29], should be ALL numeric.
#Lesson learned: what you see is NOT always what you think!!!
> t[1,7:8]
C2.A.. C3.A..
1 18291.58 19402.7
> is.numeric(t[1,7:8]) # check if it is numeric.. XD XD XD it's NOT.. Orz.. no wonder.. (t[1,7:8] is still a data frame, and is.numeric() is FALSE for a data frame)
[1] FALSE
>NewSet <- array(rnorm(7071*25),dim=c(7071,25)) # pre-fill an array with random numbers (just to be sure it's numeric!!!)
>for(i in 1:7071){for(j in 1:24){NewSet[i,j]=t[i,5+j]}} # copy the data from the original table into the new array (columns 1:24 only; column 25 keeps its random values)
>NewSet[1,2:3] # looks almost identical to t[1,7:8]
[1] 18291.58 19402.70
>is.numeric(NewSet[1,2:3])
[1] TRUE
>rownames(NewSet)
NULL
>colnames(NewSet)
NULL
#now the data matrix has no row/col names, so we can't tell which is which!
#assign names to them
>rownames(NewSet)=t[,1] # row=genes
>colnames(NewSet)<-c("c1","c2","c3","c4","c5","c6","c7","c8","c9","c10","c11","c12","p1","p2","p3","p4","p5","p6","p7","p8","p9","p10","p11","p12","") ## I assign the row/col names because heatmap() automatically picks them up and shows them on the plot
## OK! now the data is ready for heatmap !!!!?
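
A possible shortcut for the copy loop above, offered as a sketch rather than as what was actually run: is.numeric() returns FALSE for any data frame, even one whose columns are all numeric, so converting the numeric columns with as.matrix() may be all that is needed. The toy data frame df and its column names are made up for the example.

```r
# Hypothetical toy data frame standing in for the table from read.delim()
df <- data.frame(Probe = c("p001", "p002", "p003"),
                 C1 = c(18291.58, 120.50, 7.25),
                 P1 = c(19402.70, 98.25, 6.50))

# A data frame is a list of columns, so the frame itself is not "numeric"
is.numeric(df[1, 2:3])

# Checking column by column shows that the values themselves are numeric
sapply(df[, 2:3], is.numeric)

# as.matrix() on all-numeric columns gives a numeric matrix
# and carries the column names over
m <- as.matrix(df[, 2:3])

# Row names still have to be assigned by hand (row = gene/probe)
rownames(m) <- as.character(df$Probe)

is.numeric(m)
dimnames(m)
```

Under these assumptions, m could then be passed to heatmap(m, col=rev(heat.colors(100))) directly, with no pre-filled array and no spare 25th column to worry about.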
