File Format Help

Experiment Results Files

The database generates .pcl files when data are retrieved. After clustering, a .cdt file is generated, and .gtr and .atr files may also be generated. The complete dataset without any processing can also be downloaded as an Excel file.

  • pcl file format

    The pcl file is a tab-delimited pre-clustering file. The first three columns are as follows:


    • Name
      This column contains a name ascribed to the entity on that row, such as an ORF name or CLONEID. The column itself can be named anything, but by convention it is named YORF when it contains yeast ORF names, CLID when it contains clone IDs, and LUID when it contains LUIDs. These names exist simply so that, after clustering, the Treeview software can use the contents of the column in URLs without additional configuration. This column MUST contain some text on every row.
    • Description
      This column can contain descriptive information about the entity, e.g. process, function, or gene symbol. It too can be named anything. Its cells can optionally be left blank, but the column itself must be present.
    • GWEIGHT
      This column allows you to give genes different weights. For instance, if a gene appears on an array twice, you may want to give each occurrence a weight of 0.5. For the most part, people leave this column with a value of 1 for every gene. This column must be present, and each row must have an entry.
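The rules for the first three columns can be checked mechanically. The following is a minimal sketch, not part of any official tool: the function name check_leading_columns is hypothetical, the sample row is made up, and it assumes that GWEIGHT entries are numeric (the text above only gives 1 and 0.5 as examples).

```python
def check_leading_columns(fields):
    """Return a list of problems in the first three columns of one data row.

    `fields` is one row of a pcl file already split on tabs.
    """
    if len(fields) < 3:
        return ["row has fewer than three columns"]

    problems = []
    name, description, gweight = fields[0], fields[1], fields[2]

    # The Name column must contain some text on every row.
    if not name.strip():
        problems.append("Name column is empty")

    # The Description cell may be blank, so it is not checked.

    # Assumption: a GWEIGHT entry is a number such as 1 or 0.5.
    try:
        float(gweight)
    except ValueError:
        problems.append(f"GWEIGHT {gweight!r} is not a number")

    return problems


# Made-up example row: name, description, GWEIGHT, then two data values.
row = "YAL001C\texample description\t1\t0.25\t-1.3".split("\t")
print(check_leading_columns(row))
```

A row that passes yields an empty list; a row with a blank name or a missing weight yields one message per problem.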

    In addition the file must begin with the following two rows:


    • Row 1
      This contains the column headers as described above for columns 1, 2 and 3, then contains the experiment names for all the data columns that exist in the file. Each data column must have a text entry as a name for that column.
    • Row 2
      This is the EWEIGHT row. The entry in the first column for this row should say EWEIGHT; then, for each experiment, there should be an EWEIGHT value. This will usually be 1, but if the same experiment appears twice, you may want to give each repeat an EWEIGHT of 0.5.

      The remaining cells in the file contain the actual data, such that the row and column specify which gene and which experiment a particular value corresponds to.

      If you modify or create your pcl file in Excel, choose Save As... from the File menu and save the file as type Text (Tab delimited).


    • The Data

      In general the pcl file will contain log-transformed data, which is needed for clustering to work properly.
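The layout described above can be illustrated by writing a small pcl file programmatically. This is a hedged sketch rather than the database's own export code: write_pcl is a hypothetical helper, the gene and experiment names are made up, the log base (2) is an assumption since the text only says "log-transformed", and leaving the Description and GWEIGHT cells of the EWEIGHT row blank is also an assumption.

```python
import csv
import io
import math


def write_pcl(out, experiments, rows):
    """Write a minimal pcl file to the text stream `out`.

    `rows` is a list of (name, description, gweight, ratios) tuples,
    where `ratios` holds one positive raw ratio per experiment.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")

    # Row 1: headers for the three leading columns, then experiment names.
    writer.writerow(["YORF", "NAME", "GWEIGHT"] + list(experiments))

    # Row 2: EWEIGHT in the first column, one weight per experiment.
    # Assumption: the second and third cells are left blank.
    writer.writerow(["EWEIGHT", "", ""] + ["1"] * len(experiments))

    # Remaining rows: one gene per row, with log-transformed data.
    # Assumption: base-2 logarithm.
    for name, description, gweight, ratios in rows:
        logs = [f"{math.log2(r):.3f}" for r in ratios]
        writer.writerow([name, description, gweight] + logs)


buf = io.StringIO()
write_pcl(buf, ["exp1", "exp2"], [("YAL001C", "example gene", 1, [2.0, 0.5])])
print(buf.getvalue())
```

Writing to a stream keeps the sketch independent of the file system; passing an open file (with newline="") instead of the StringIO buffer would produce a file on disk.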

  • cdt file format

    When you cluster a .pcl file you will generate a .cdt (clustered data table) file, which contains the original data, reordered to reflect the clustering. In addition, if you clustered by genes, you will get a .gtr file (gene tree), and if you clustered by experiments, you will get a .atr file (array tree). These tree files record the history of how the cluster was built, and can be used to construct how the tree(s) should look.

  • gtr file format

    The .gtr (gene tree) file records the order in which the genes (rows) were joined during clustering.

  • atr file format

    The .atr (array tree) file records the order in which the arrays (columns) were joined during clustering.