wget – Getting your data. All sequencing centers are going to deliver your data this way: the data will be in the cloud or on a remote server, and you download it

gzip, mv, cp – Organizing, managing, verifying and backing up data

gzip, cat – Getting it ready for analysis

 

Super User (sudo) command

-The sudo command is used when your user account needs elevated permissions to perform a task. This is usually the case when installing a new program or installing updates

Downloading files from the internet

-The wget command will go to a website and download the content. It is used for downloading sequences from online databases

-This can also be used with files stored on Dropbox

-Right-clicking in the terminal automatically pastes the copied link

Copy, Move and Remove

  • The mv command will move a file from its source location to a new location. This command is also used to rename files …. mv combine.txt ../combine.txt
  • The cp command will copy files from a source to a destination …. cp test.txt ../test.txt
  • The rm command will delete a file …. rm test.txt
  • To remove a folder, use the rm command with the -rf options …. rm -rf test_folder (this deletes the folder and everything in it without asking for confirmation)
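
As a minimal sketch of these four commands working together, the lines below build a small scratch folder and then move, rename, copy and delete files inside it. The folder and file names (practice, run1, old_run, combined.txt) are made up for this example and are not part of the course data.

```shell
# Build a scratch area so nothing important is touched (names are placeholders)
mkdir -p practice/run1 practice/old_run
echo "ACGT" > practice/run1/combine.txt
echo "scratch" > practice/old_run/tmp.txt

# mv with a different folder moves the file
mv practice/run1/combine.txt practice/combine.txt
# mv within the same folder renames the file
mv practice/combine.txt practice/combined.txt
# cp leaves the original in place and creates a second copy
cp practice/combined.txt practice/combined_backup.txt
# rm deletes a single file
rm practice/combined_backup.txt
# rm -rf deletes a folder and everything inside it, with no confirmation
rm -rf practice/old_run

# Show what is left in the scratch folder
ls practice
```

Because rm does not use a recycle bin, it is worth running ls on the target first when you are unsure what a path contains.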

Redirecting Output

  • By default, the output of the commands you type into Bash is directed to the display
  • You can use the “>” sign to redirect the output to a file. If the file already exists, it is overwritten …. ls > list.txt
  • If you want to add output to the same file, use “>>”. This will append data to the existing file instead of overwriting it …. cat >> test.txt (appends whatever you type until you press Ctrl+D)
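
The difference between “>” and “>>” can be sketched as follows. The file names samples.txt and list.txt and the sample names are placeholders chosen for this example.

```shell
# ">" creates samples.txt, or overwrites it if it already exists
echo "sample_A" > samples.txt

# ">>" adds each new line to the end of the existing file
echo "sample_B" >> samples.txt
echo "sample_C" >> samples.txt

# Save the directory listing to a file instead of printing it to the display
ls > list.txt

# Print the file to check that all three lines are there
cat samples.txt
```

Running the first line again would reset samples.txt to a single line, which is why “>>” is the safer choice when building up a file step by step.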

Viewing File Content using less

To view the contents of a file in the Terminal, use the less command…. less list_files.txt

  • The less command is similar to cat, but lets you control scrolling through the file more precisely using the arrow keys

The head and tail commands

  • The head command lets you see the first 10 lines of a file, and the tail command lets you see the last 10 lines. This is useful when looking at big sequencing data files
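
A short sketch of head and tail on a sample file. The file numbers.txt is generated here with seq purely for illustration; with real data you would point the same commands at your sequence file.

```shell
# Create a sample file with 100 numbered lines (placeholder for a large data file)
seq 1 100 > numbers.txt

# First 10 lines (the default)
head numbers.txt

# Last 10 lines (the default)
tail numbers.txt

# The -n option changes how many lines are shown.
# Four lines is handy for FASTQ files, where one read takes up four lines.
head -n 4 numbers.txt
tail -n 4 numbers.txt
```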

Working with compressed files using gzip

  • Compression is when a file is reduced in size by removing redundant information. To compress a file, use the gzip command. Sequences are typically compressed when they are placed in databases, so you may need to decompress the data to view it
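
The following is a minimal sketch of compressing, checking, viewing and decompressing files with gzip, plus combining compressed files with cat. The FASTA files and their contents are invented for this example.

```shell
# Two tiny example files (made-up sequences)
printf '>seq1\nACGT\n' > a.fasta
printf '>seq2\nTTGA\n' > b.fasta

# Compress: each file is replaced by a .gz version (-f overwrites an existing .gz)
gzip -f a.fasta b.fasta

# Verify that the compressed file is intact
gzip -t a.fasta.gz

# Print the decompressed content to the display without changing the .gz file
gzip -dc a.fasta.gz

# Compressed files can be joined with cat; the result is still a valid gzip file
cat a.fasta.gz b.fasta.gz > all.fasta.gz

# Decompress: all.fasta.gz is replaced by all.fasta
gzip -d -f all.fasta.gz
cat all.fasta
```

Note that gzip removes the original file once the compressed version has been written, so a.fasta no longer exists after the compress step.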

Grep Command

  • The grep command searches a file for a word you specify. There are options you can place in the command to return information about what was found after the search word
  • Grep will be used extensively after Prokka as it will allow us to find sequences for specific genes
  • Use the -A option followed by a number to print that many trailing lines after each line containing the search term
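
As a sketch of how grep and the -A option can pull out a gene and its sequence, the example below searches a small FASTA file. The file genes.fasta, the gene names and the sequences are invented here, and the example assumes each sequence sits on a single line.

```shell
# A small made-up FASTA file with three genes, one sequence line each
printf '>gene1 dnaA\nATGACC\n>gene2 recA\nATGGCT\n>gene3 gyrB\nATGTCG\n' > genes.fasta

# Print only the line that contains the search word
grep "recA" genes.fasta

# -A 1 also prints the one line after each match: the header plus its sequence
grep -A 1 "recA" genes.fasta

# -c counts matching lines; counting ">" gives the number of sequences
grep -c ">" genes.fasta
```

If a sequence is wrapped over several lines, the number after -A has to be raised to cover all of them.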

Intro to file formats

1. fasta (programs will process raw data and output in this format)

2. fastq.gz (raw sequence data will come in this format)

3. fastq (same as fastq.gz but not compressed)

FASTQ File Format Analysis

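  • The first line of each read starts with “@” and is the sequence identifier. Among its fields are the X and Y coordinates of the cluster within the tile
  • The “+” line indicates the break between the sequence and its quality scores
  • The filter flag is Y if the read is filtered (did not pass), N otherwise
  • The control number is 0 when none of the control bits are on, otherwise it is an even number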
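
To make the layout concrete, here is a sketch that writes a single FASTQ read to a file and picks the fields described above out of its identifier line. The read itself (instrument name, coordinates, sequence and quality string) is made up, and the field positions assume an Illumina-style identifier with colon-separated fields.

```shell
# One made-up read: identifier, sequence, "+" separator, quality scores
cat > example.fastq <<'EOF'
@SIM:1:FCX:1:15:6329:1045 1:N:0:2
GATTTGGGGT
+
IIIIIIIIII
EOF

# The identifier is the first of the four lines
header=$(head -n 1 example.fastq)

# Before the space, fields 6 and 7 are the X and Y coordinates within the tile
echo "$header" | cut -d' ' -f1 | cut -d: -f6,7

# After the space, field 2 is the filter flag (Y or N)
echo "$header" | cut -d' ' -f2 | cut -d: -f2

# After the space, field 3 is the control number
echo "$header" | cut -d' ' -f2 | cut -d: -f3

# Each read takes four lines, so the number of reads is the line count divided by 4
echo $(( $(wc -l < example.fastq) / 4 ))
```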