**********************************************************************
Mouse OneArray(TM) Annotation
Release 1.3, 2008-04-23
-----------------------------------------------------
PhalanxBio, Inc.
1400 Page Mill Road, Building B
Palo Alto, CA 94304
U.S.A.
Tel: 650-320-8669
Fax: 650-320-8488

Phalanx Biotech Group, Inc.
6F, No. 6, Technology Road 5,
Hsinchu Science Park,
Hsinchu 30078, Taiwan, R.O.C.
Tel: 886-3-5781168
Fax: 886-3-5785099
**********************************************************************

DISTRIBUTION NOTICE

All content copyright 2008 Phalanx Biotech Group, Inc., herein
referred to as Phalanx. All rights reserved.

This document describes the format and content of the Probe
Annotation File for the Phalanx Mouse OneArray(TM) (also referred to
as 'MOA').

Mouse OneArray(TM) and OneArray(TM) are trademarks of Phalanx Biotech
Group, Inc.

Additional information: http://www.phalanxbiotech.com/

If you have questions or comments, please send them via email to:
feedback@phalanxbiotech.com

Mouse OneArray Annotation Release README
----------------------------------------

Summary:

Annotation based on: Ensembl Mus_musculus 49_37b

Group 1 - gene specific: exon      22341
Group 2 - intron hit                2197
Group 3 - intergenic                4053
Group 4 - multi-gene hits           1084
Group 5 - no hit to genome           221
Group 6 - >200 hits to genome         26
                                   -----
Total Mouse OneArray probes:       29922

======================================================================
TABLE OF CONTENTS
======================================================================

1. INTRODUCTION
   1.1 Release 1.3
2. CONTENT
   2.1 Methodology
   2.2 Probe Annotation Groups
   2.3 Multi-gene Hits
3. ORGANIZATION OF DATA FILES
   3.1 Files Included in the MOA Annotation Distribution
   3.2 Probe Annotation File (PAF)
   3.3 General Feature Format Version 2 (GFF2)
   3.4 UCSC Browser Extensible Data format (BED)
4. CREATING CUSTOM TRACKS IN GENOME BROWSERS
   4.1 Ensembl Genome Browser
   4.2 UCSC Genome Browser
5. ANNOTATION ADMINISTRATION
   5.1 Release Versioning
   5.2 Release Schedule
   5.3 Disclaimer

======================================================================
1. INTRODUCTION
======================================================================

Phalanx microarray annotations link probe sequences to their target
genes and transcripts in the Ensembl database.

1.1 Release 1.3
---------------

The 1.3 Mouse OneArray (MOA) annotation distribution contains
up-to-date information regarding the targets of MOA probe sequences.
The latest annotations will always be available for download at the
following FTP site:

ftp://ftp.phalanxbiotech.com/pub/annotations/moa/current/

======================================================================
2. CONTENT
======================================================================

2.1 Methodology
---------------

The annotation pipeline has the following stages:

* Align probe sequences to the mouse genome

  The BLAT alignment utility was used to align the MOA probe
  sequences to the mouse genome, using a 90% minimum identity cutoff.
  The -oneOff parameter was set to 1, to allow for a one base pair
  mismatch in the BLAT tile. Probes passing the BLAT alignment filter
  were then filtered again to remove those not considered potential
  cross-hybridization candidates: probes were excluded if the
  coverage was less than 25 bp at 100% identity.

* Transcript assignment

  Each probe was then assigned to a transcript based on the
  chromosome positions annotated in the Ensembl database. Probes not
  assigned to transcripts are considered 'intergenic'.

* Probe grouping

  The probe assignments were further grouped to break down matches
  based on their location within a gene, or proximity to a gene.
  These groups are described in section 2.2.

2.2 Probe Annotation Groups
---------------------------

1. Group 1 - gene specific: exon
2. Group 2 - intron hit
3. Group 3 - intergenic (>2kb from a gene)
4. Group 4 - multi-gene hits
5. Group 5 - no hit to genome
6. Group 6 - >200 hits to genome

2.3 Multi-gene Hits
-------------------

Group 4 probes hit multiple distinct genes. This group includes
probes that may cross-hybridize to gene families or unrelated genes.
All Ensembl gene identifiers are included in the Ensembl_gene column
of the probe annotation file.

======================================================================
3. ORGANIZATION OF DATA FILES
======================================================================

3.1 Files Included in the MOA Annotation Distribution
-----------------------------------------------------

Files are compressed with the GNU zip (gzip) compression utility.

README                 - Release information
CHANGELOG              - Track changes between releases
CHECKSUMS              - Checksum, file integrity check
phalanx_moa_1.3.paf.gz - Probe Annotation File
phalanx_moa_1.3.gff.gz - General Feature Format file
phalanx_moa_1.3.bed.gz - Browser Extensible Data format

3.2 Probe Annotation File (PAF)
-------------------------------

The Probe Annotation File is a tab-delimited text file. The following
annotation fields are included:

1. probe_id           - Phalanx Probe ID
2. group              - See section 2.2
3. Ensembl_gene       - Ensembl Gene ID
4. Ensembl_transcript - Ensembl Transcript ID(s)
5. chr_name           - Chromosome name
6. gene_symbol        - HGNC approved gene symbol(s)
7. gene_description   - Description of gene product's function
8. go_term            - Gene Ontology ID(s)

3.3 General Feature Format Version 2 (GFF2)
-------------------------------------------

The General Feature Format Version 2 (GFF2) is used to store
annotations based on their genomic coordinates. The GFF2
specification can be obtained at the following URL:

http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml

The GFF file in the Phalanx MOA annotation distribution contains the
following fields:

1. seqname   - chromosome name
2. source    - set to "blat"
3. feature   - set to "Phalanx_probe"
4. start     - genomic start position of probe
5. end       - genomic end position of probe
6. score     - set to probe annotation group (1-4, section 2.2)
7. strand    - set to either "+" or "-"
8. frame     - set to "."
9. attribute - contains probe_id and Ensembl_gene identifier

3.4 UCSC Browser Extensible Data format (BED)
---------------------------------------------

The Browser Extensible Data format (BED) is used by the UCSC Genome
Browser to display custom annotation tracks. For instructions on
using the BED file in the UCSC Genome Browser see section 4.2.

The Phalanx annotation release provides a BED file that includes the
genomic locations of Phalanx MOA probes. The Phalanx BED file
contains the following fields:

 1. chrom       - chromosome name
 2. chromStart  - genome start position of probe
 3. chromEnd    - genome end position of probe
 4. name        - Phalanx Probe ID
 5. score       - set to probe annotation group (1-4, section 2.2)
 6. strand      - set to either "+" or "-"
 7. thickStart  - same as chromStart
 8. thickEnd    - same as chromEnd
 9. itemRgb     - determined by annotation grouping
10. blockCount  - set to 1
11. blockSizes  - set to length of alignment
12. blockStarts - set to 1

More information on the BED file format can be found at the UCSC
Genome Browser help pages:

http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BED

======================================================================
4. CREATING CUSTOM TRACKS IN GENOME BROWSERS
======================================================================

4.1 Ensembl Genome Browser
--------------------------

The Ensembl Genome Browser uses the Distributed Annotation System
(BioDAS) for displaying genomic annotations provided by third
parties. Phalanx maintains a BioDAS server for our probe locations.

 1. Open the Ensembl home page in your web browser:

    http://www.ensembl.org/Homo_sapiens/index.html

 2. In the search text box type a gene name (in this example we will
    use 'BRCA2'), then click "GO".

 3. On the SearchView page click on the ContigView link next to the
    record labeled:
    "Ensembl protein_coding gene ENSG00000139618"

 4. The ContigView page provides four levels of genome visualization:

    * Chromosome
    * Overview
    * Detailed view
    * Basepair view

    On the "Detailed view" pane, click on the "DAS Sources" drop down
    menu. When the menu expands, click on the link "Manage
    sources...". This will open a new window called "DasconfView".

 5. On the left hand side of the "DasconfView" window, under Manage
    Sources, click on the "Add Data Source Link".

 6. Step 1 of the DAS Wizard: click on the yellow button labeled
    "Enter server".

 7. In the text box labeled "DAS Server URL" enter the following URL:

    http://das.phalanxbiotech.com:9000/das

    then click on the yellow button labeled "DSN list".

 8. A list of Phalanx annotations will be displayed. Select the check
    box next to 'phalanx_moa_1_3' and click "Next".

 9. Step 2: accept the default choices by clicking "Next".

10. Step 3: at this point you may change the default color or name of
    the track, then click "Finished".

11. A page displaying all available DAS Sources is presented. Scroll
    to the bottom, where the DAS source 'phalanx_moa_1_3' should be
    visible. Click "Close Window" in the top right hand corner.

12. On the ContigView page Phalanx probes will now be displayed as a
    custom track.

More information on BioDAS can be found at the following URL:

http://www.biodas.org/

4.2 UCSC Genome Browser
-----------------------

The UCSC Genome Browser can display Phalanx probes as a custom track
using the BED file described in section 3.4.

1. Open the "Add Custom Tracks" page in your web browser:

   http://genome.ucsc.edu/cgi-bin/hgCustom

2. Select Mouse genome, then copy and paste the following URL into
   the text box labeled "Paste URLs or data":

   ftp://ftp.phalanxbiotech.com/pub/annotations/moa/current/phalanx_moa_1.3.bed.gz

3. Click "Submit", and then on the "Manage Custom Tracks" page, click
   on the "go to genome browser" button.

By default you will be returned to the Genome Browser home page,
displaying Phalanx probe sequences on chromosome 1 with all other
tracks hidden.

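Before uploading the BED file as a custom track, it can be useful to
check how many probes fall into each annotation group. The following
is a minimal Python sketch, not part of the Phalanx distribution. It
relies only on the field order given in section 3.4 (the score column
holds the annotation group). The function names, the handling of
possible "track"/"browser" header lines, and the sample records
(coordinates, probe names, colors) are assumptions made up for
illustration and are not taken from the real file.

```python
import gzip
from collections import Counter


def count_probes_by_group(lines):
    """Tally BED records by their score column (annotation group).

    Assumes the 12-field layout from section 3.4, where field 5
    (index 4) is the score.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and any header lines (assumed, not confirmed).
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        counts[int(fields[4])] += 1
    return counts


def count_probes_in_file(path):
    """Same tally, reading a gzip-compressed BED file from disk."""
    with gzip.open(path, "rt") as handle:
        return count_probes_by_group(handle)


# Made-up records for illustration only; not real MOA probes.
sample = [
    'track name="example"',
    "chr1\t100\t160\tprobe_A\t1\t+\t100\t160\t0,0,255\t1\t60\t1",
    "chr1\t500\t560\tprobe_B\t3\t-\t500\t560\t255,0,0\t1\t60\t1",
    "chr2\t900\t960\tprobe_C\t1\t+\t900\t960\t0,0,255\t1\t60\t1",
]

for group, n in sorted(count_probes_by_group(sample).items()):
    print(f"Group {group}: {n} probe(s)")
```

To run the same tally on a downloaded copy of the release, call
count_probes_in_file("phalanx_moa_1.3.bed.gz") with the path to the
local file.
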
======================================================================
5. ANNOTATION ADMINISTRATION
======================================================================

5.1 Release versioning
----------------------

The following release versioning scheme will be used for this and
future releases of probe set annotations.

0.1-0.8 - Alpha releases
0.9     - First beta release to the company and collaborators
0.9.1
0.9.2
0.9.x   - Minor updates to the beta release based on user feedback
1.0     - Final release to customers
1.1     - Increments with bi-monthly Ensembl update
2.0     - Major version update e.g. addition of new annotation fields

5.2 Release schedule
--------------------

The MOA probe annotations are based on the Ensembl core genomic
database. Thus, the Phalanx bioinformatics team endeavors to maintain
up-to-date versions of the annotations in sync with the release of
the Ensembl database, which is currently every two months.

5.3 Disclaimer
--------------

The data provided in the MOA annotation release is provided as-is by
Phalanx Biotech Group, Inc. While Phalanx uses reasonable efforts to
include accurate, complete and up-to-date data, Phalanx makes no
warranties or representations as to the accuracy or completeness of
the annotation data and assumes no liability or responsibility for
any error or omission in the content. The recipient agrees to
determine and assume responsibility for the applicability of the data
provided.