Bio Namespace

The base namespace for .NET Bio

Classes

	Class	Description
	AATree<T>	Arne Andersson self-balancing binary search tree.
	AATree<TKey, TValue>	Dictionary-like implementation using AATree.
	Alphabets	The currently supported and built-in alphabets for sequence items.
	AmbiguousDnaAlphabet	Ambiguous symbols in DNA.
	AmbiguousProteinAlphabet	Ambiguous characters in proteins.
	AmbiguousRnaAlphabet	Ambiguous symbols in RNA.
	BigArray<T>
	BigList<T>	Represents a strongly typed list of objects. Uses BigArray to store objects.
	CloneLibrary	Reads library information from a resource file. The singleton design pattern is used so that only one instance of the class is created.
	DerivedSequence	This is a temporary implementation of DerivedSequence to support reversing and complementing a sequence.
	DifferenceNode	Node that tracks difference between the two sequences.
	DnaAlphabet	The basic alphabet that describes symbols used in DNA sequences. This alphabet allows not only for the four base nucleotide symbols, but also for various ambiguities, termination, and gap symbols. The character representations come from the NCBI4na standard and are used in many sequence file formats. The NCBI4na standard is the same as the IUPACna standard with only the addition of the gap character. The entries in this dictionary are (Symbol - Name): A - Adenine; C - Cytosine; M - A or C; G - Guanine; R - G or A; S - G or C; V - G or C or A; T - Thymine; W - A or T; Y - T or C; H - A or C or T; K - G or T; D - G or A or T; B - G or T or C; - - Gap; N - A or G or T or C.
	IndexedItem<T>	IndexedItem holds an item and its index. The index is the zero-based position of the item. This class is used in SparseSequence to get the known sequence items with their positions. This class implements the IComparable interface, and all comparisons are based on the index, not on the item.
	MetadataListItem<T>	It is common for a biological sequence file to contain lists of certain types of metadata, such as features or references, which can be stored as MetadataListItems. A MetadataListItem contains a key (which might not be unique), a free-text field of top-level information (such as a sequence location), and a list of sub-items, each consisting of a key and a data field of type T. If the sub-items have unique keys, a string type can be used for T. But if the sub-item keys are not unique, a list of strings should be used for T.
	PlatformManager	Platform manager - this holds all the platform specific services.
	ProteinAlphabet	The basic alphabet that describes symbols used in sequences of amino acids that come from codon encodings of RNA. This alphabet allows for the twenty amino acids as well as a termination and gap symbol. The character representations come from the NCBIstdaa standard and are used in many sequence file formats. The NCBIstdaa standard has all the same characters as NCBIeaa and IUPACaa, but adds Selenocysteine, termination, and gap symbols to the latter. The entries in this dictionary are (Symbol - Extended Symbol - Name): A - Ala - Alanine; C - Cys - Cysteine; D - Asp - Aspartic Acid; E - Glu - Glutamic Acid; F - Phe - Phenylalanine; G - Gly - Glycine; H - His - Histidine; I - Ile - Isoleucine; K - Lys - Lysine; L - Leu - Leucine; M - Met - Methionine; N - Asn - Asparagine; O - Pyl - Pyrrolysine; P - Pro - Proline; Q - Gln - Glutamine; R - Arg - Arginine; S - Ser - Serine; T - Thr - Threonine; U - Sel - Selenocysteine; V - Val - Valine; W - Trp - Tryptophan; Y - Tyr - Tyrosine; * - Ter - Termination; - - --- - Gap.
	QualitativeSequence	This class holds quality scores along with the sequence data.
	RnaAlphabet	The basic alphabet that describes symbols used in RNA sequences. This alphabet allows not only for the four base nucleotide symbols, but also for various ambiguities, termination, and gap symbols. The symbol representations come from the NCBI4na standard and are used in many sequence file formats. The NCBI4na standard is the same as the IUPACna standard with only the addition of the gap symbol. The entries in this dictionary are (Symbol - Name): A - Adenine; C - Cytosine; M - A or C; G - Guanine; R - G or A; S - G or C; V - G or C or A; U - Uracil; W - A or U; Y - U or C; H - A or C or U; K - G or U; D - G or A or U; B - G or U or C; - - Gap; N - A or G or U or C.
	Sequence	This is the standard implementation of the ISequence interface. It contains the raw data that defines the contents of a sequence. Sequence is an enumerable of bytes, so its items can be accessed as follows: Sequence mySequence = new Sequence(Alphabets.DNA, "GATTC"); foreach (byte symbol in mySequence) { ... } The results will be based on the alphabet associated with the sequence. Common alphabets include those for DNA, RNA, and amino acids. Users who wish to get at the underlying data directly can do so through Sequence as well. This may be useful when writing algorithms against the sequence where performance is especially important. For these advanced users, access is provided to the encoding classes associated with the sequence.
	SequenceEqualityComparer	This class gives the Sequence Equality Comparer.
	SequenceRange	A SequenceRange holds the data necessary to represent a region within a sequence defined by its start and end index without necessarily holding any of the sequence item data. At a minimum, an ID, start index, and end index are required. Additional metadata can be stored as well using generic key-value pairs.
	SequenceRangeGrouping	A grouping of SequenceRange objects sorted by their ID values. The purpose of these groups is to allow a set of SequenceRange objects to be associated together by bucketing them into groups, where each bucket has a unique SequenceRange ID and all SequenceRange objects within the bucket have that same ID.
	SequenceStatistics	SequenceStatistics is used to keep track of the number of occurrences of each symbol within a sequence.
	SimpleConsensusResolver	Calculates the consensus for a list of symbols using a simple frequency fraction method. Normal (non-gap) symbols are given a weight of 100. The confidence of a symbol is the sum of the weights for that symbol, divided by the total number of symbols occurring at that position. If any symbols have confidence >= threshold, the symbol corresponding to the set of these high-confidence symbols is used. If no symbol meets the threshold, the symbol corresponding to the set of all the symbols at that position is used. For ambiguous symbols, the corresponding set of base symbols is retrieved, and for the frequency calculation each base symbol is given a weight of (100 / number of base symbols).
	SnpItem	Represents a single nucleotide polymorphism (SNP) at a particular position for a certain chromosome, with the two possible allele values for that position.
	SparseSequence	SparseSequence can hold a discontinuous sequence. Use this class to store sequence items together with their known positions from a long continuous sequence. This class uses a SortedDictionary to store the sequence items with their positions. Positions are the zero-based indexes at which the sequence items are present in the original continuous sequence. For example, to store sequence items at positions 10, 101, 200, and 1501, this class can be used as shown in the code below. // Create a SparseSequence by specifying the Alphabet. SparseSequence mySparseSequence = new SparseSequence(Alphabets.DNA); // By default Count will be set to zero. To insert a sequence item at a position greater than zero, // Count has to be set to a value greater than the maximum position value. // If you try to insert a sequence item at a position greater than the count, an exception will occur. // You can limit the SparseSequence length by setting the count to the desired value. In this example it will be 1502, as the maximum index is 1501. mySparseSequence.Count = 1502; // To access a value in a SparseSequence, use the indexer or an enumerator as below. // Accessing the SparseSequence using the indexer. byte seqItem1 = mySparseSequence[10]; // this will return sequence item A. byte seqItem2 = mySparseSequence[1501]; // this will return sequence item G. byte seqItem3 = mySparseSequence[102]; // this will return null as there is no sequence item at this position. // Accessing the SparseSequence using an enumerator. foreach (byte seqItem in mySparseSequence) {…}
	StringListValidator	A validator for string values that has a specific list of allowed values.
	WordMatch	WordMatch stores the region of similarity between two sequences.
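
The frequency fraction method described for SimpleConsensusResolver can be sketched for a single alignment column as follows. This is only an illustration under stated assumptions, not the .NET Bio implementation: the `ConsensusSketch` class, its `Resolve` method, and the `BaseSets` table are hypothetical names, the threshold is assumed to be a percentage, and gap symbols are assumed to be ignored both in the weights and in the symbol count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of .NET Bio.
public static class ConsensusSketch
{
    // IUPAC nucleotide codes and the base symbols each one stands for.
    // Every value is written in alphabetical order so it can be used as a lookup key.
    private static readonly Dictionary<char, string> BaseSets = new Dictionary<char, string>
    {
        ['A'] = "A", ['C'] = "C", ['G'] = "G", ['T'] = "T",
        ['M'] = "AC", ['R'] = "AG", ['W'] = "AT", ['S'] = "CG",
        ['Y'] = "CT", ['K'] = "GT", ['V'] = "ACG", ['H'] = "ACT",
        ['D'] = "AGT", ['B'] = "CGT", ['N'] = "ACGT"
    };

    // Returns the consensus symbol for one column; threshold is a percentage (0-100).
    public static char Resolve(IEnumerable<char> column, double threshold)
    {
        var weights = new Dictionary<char, double>();
        int total = 0;

        foreach (char symbol in column)
        {
            if (symbol == '-') continue; // assumption: gaps are skipped entirely

            string bases = BaseSets[symbol];
            total++;

            // A normal symbol contributes 100; an ambiguous symbol splits
            // 100 evenly across its base symbols.
            foreach (char b in bases)
            {
                weights.TryGetValue(b, out double current);
                weights[b] = current + 100.0 / bases.Length;
            }
        }

        if (total == 0) return '-'; // assumption: an all-gap column stays a gap

        // Keep the base symbols whose confidence meets the threshold;
        // fall back to every base symbol seen when none does.
        List<char> confident = weights
            .Where(kv => kv.Value / total >= threshold)
            .Select(kv => kv.Key)
            .ToList();
        List<char> chosen = confident.Count > 0 ? confident : weights.Keys.ToList();

        // Map the chosen set back to the single symbol that represents it.
        string key = new string(chosen.OrderBy(c => c).ToArray());
        return BaseSets.First(kv => kv.Value == key).Key;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(ConsensusSketch.Resolve("AAAG", 50)); // A (A has confidence 75)
        Console.WriteLine(ConsensusSketch.Resolve("AAAG", 80)); // R (no symbol reaches 80, so {A, G})
    }
}
```

With a threshold of 50, only A (confidence 75) qualifies. With a threshold of 80, nothing qualifies, so the sketch falls back to the set of all observed base symbols, {A, G}, whose symbol is R.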

Structures

	Structure	Description
	CloneLibraryInformation	Stores Information of Library.
	DifferenceNodeCompareFeature	Structure that maintains node structure for feature list.

Interfaces

	Interface	Description
	IAlphabet	An alphabet defines a set of symbols common to a particular representation of a biological sequence. The symbols in these alphabets are those you would find as the individual sequence items in an ISequence variable. The symbols in an alphabet may represent a particular biological structure, or they may represent information helpful in understanding a sequence. For instance, gap symbols, termination symbols, and symbols representing items whose definition remains ambiguous are all allowed.
	IConsensusResolver	Framework to compute the consensus for a list of symbols. For example, one can construct the consensus for a set of aligned sequences in the following way: Sequence 1: A G T C G A; Sequence 2: A G G C - A; Sequence 3: A G G T G -; Consensus: A G G C G A. In this example, we might choose the character that occurs the maximum number of times as the consensus. This means that the consensus for the characters at position 1, {A, A, A}, is A, while the consensus for the characters at position 3, {T, G, G}, is G, and so on. This interface provides the framework for consensus generation. Implement this interface to provide different implementations for building a consensus.
	IParameterValidator	A simple interface to an object that can check a value for conformance to any required validation rules.
	IQualitativeSequence	Sequence with qualitative data.
	ISequence	Implementations of ISequence make up one of the core sets of data structures in Bio. It is these sequences that store data relevant to DNA, RNA, and amino acid structures. Several algorithms for alignment, assembly, and analysis take these items as their basic data inputs and outputs.
	ISequenceRange	A SequenceRange holds the data necessary to represent a region within a sequence defined by its start and end index without necessarily holding any of the sequence item data. At a minimum, an ID, start index, and end index are required. Additional metadata can be stored as well using generic key-value pairs.
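
The consensus example given for IConsensusResolver (pick the character that occurs most often in each column) can be sketched as follows. The `IColumnResolver` interface and the `MajorityResolver` and `BuildConsensus` names are hypothetical and exist only for this illustration; the actual members of IConsensusResolver may differ. Ties are resolved in favor of the symbol seen first, which is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical interface for illustration; not the .NET Bio IConsensusResolver.
public interface IColumnResolver
{
    char GetConsensus(IReadOnlyList<char> column);
}

// Picks the symbol that occurs most often in the column.
public sealed class MajorityResolver : IColumnResolver
{
    public char GetConsensus(IReadOnlyList<char> column)
    {
        // GroupBy keeps first-occurrence order and OrderByDescending is stable,
        // so a tie goes to the symbol encountered first.
        return column
            .GroupBy(symbol => symbol)
            .OrderByDescending(group => group.Count())
            .First()
            .Key;
    }
}

public static class Program
{
    // Applies the resolver to every column of equally long aligned sequences.
    public static string BuildConsensus(IReadOnlyList<string> aligned, IColumnResolver resolver)
    {
        int length = aligned[0].Length;
        var consensus = new char[length];
        for (int i = 0; i < length; i++)
        {
            consensus[i] = resolver.GetConsensus(aligned.Select(s => s[i]).ToList());
        }
        return new string(consensus);
    }

    public static void Main()
    {
        // The three aligned sequences from the IConsensusResolver description.
        var aligned = new[] { "AGTCGA", "AGGC-A", "AGGTG-" };
        Console.WriteLine(BuildConsensus(aligned, new MajorityResolver())); // AGGCGA
    }
}
```

Keeping the per-column rule behind an interface means a different strategy, such as a threshold-based one, can be swapped in without changing the loop over columns.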

Enumerations

	Enumeration	Description
	FastQFormatType	A FastQFormatType specifies the format of quality scores.
	IntersectOutputType	This enum indicates the type of output an intersect operation should return.
	SubtractOutputType	This enum indicates the type of output a subtract operation should return.