
24 January 2010

History of Bioinformatics

The first comprehensive collection of amino acid sequences was assembled in the "Atlas of Protein Sequence and Structure", published by the National Biomedical Research Foundation. This collection was edited by M. O. Dayhoff from 1965 to 1978.

Dayhoff and colleagues also made remarkable contributions to amino acid sequence comparison by developing software to detect distantly related sequences and to study evolutionary relationships, among other applications.

The European Molecular Biology Laboratory (EMBL) established its data library in 1980 to collect, organize and distribute nucleotide sequence data and related information. This function is now performed by the European Bioinformatics Institute (EBI) in Hinxton, UK.

In 1988, the National Center for Biotechnology Information (NCBI) was established in the U.S. NCBI serves as a primary repository and distributor of biological sequence information.
Some time later, the DNA Data Bank of Japan (DDBJ) was established. The National Biomedical Research Foundation created the Protein Information Resource (PIR) in 1984.
PIR helps researchers identify and interpret protein sequence data. All of these databases cooperate closely and exchange data regularly. The data banks serve as an important resource for all researchers interested in biological phenomena, especially the molecular aspects of the biological sciences.

Managing and analyzing the rapidly accumulating sequence data required new software and statistical methods. This attracted researchers from computer science and mathematics to the new field of bioinformatics.

As a result, a variety of methods and tools were developed that greatly facilitate the management, exploitation and dissemination of biological information; this is one of the great benefits of bioinformatics.

Origin of bioinformatics / biological databases:

The first bioinformatics / biological databases were built a few years after the first protein sequences became available. The first protein sequence reported was that of bovine insulin in 1956, consisting of 51 residues. Nearly a decade later, the first nucleic acid sequence was reported: yeast alanine tRNA, with 77 bases. Just one year later, Dayhoff gathered all the available sequence data to create the first bioinformatics database.

The Protein Data Bank followed in 1972 with a collection of ten X-ray crystallographic protein structures, and the SWISS-PROT protein sequence database began in 1987. A large selection of data resources of different types and sizes is now available, either in the public domain or from commercial third parties. All the original databases were organized in a very simple way, with data entries stored in flat files, either one file per entry or a single large text file. Later, lookup indices were added to allow convenient keyword searches of header information.
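To make the flat-file organization and the header lookup indices more concrete, here is a minimal Python sketch, not a reconstruction of any historical system. It assumes a simplified layout loosely modeled on line-coded flat files such as EMBL and SWISS-PROT entries (a two-letter code at the start of each line, DE for description lines, and // closing each entry). The sample records, their placeholder sequence lines, and the function names parse_entries and build_keyword_index are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample flat file (simplified layout, placeholder data):
# each line starts with a two-letter code, and "//" ends each entry.
FLAT_FILE = """\
ID   ENTRY1
DE   Bovine insulin precursor
SQ   (sequence data)
//
ID   ENTRY2
DE   Yeast alanine transfer RNA
SQ   (sequence data)
//
"""


def parse_entries(text):
    """Split a flat file into entries, each a dict of line code -> list of values."""
    entries = []
    current = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            # End of an entry: store it and start collecting the next one.
            entries.append(dict(current))
            current = defaultdict(list)
        elif line.strip():
            # Assumed layout: code in columns 1-2, value from column 6 onward.
            code, value = line[:2], line[5:].strip()
            current[code].append(value)
    return entries


def build_keyword_index(entries):
    """Build a lookup index: each lower-cased word in DE (header) lines -> entry IDs."""
    index = defaultdict(set)
    for entry in entries:
        entry_id = entry["ID"][0]
        for description in entry.get("DE", []):
            for word in description.lower().split():
                index[word].add(entry_id)
    return index


entries = parse_entries(FLAT_FILE)
index = build_keyword_index(entries)
print(sorted(index["insulin"]))  # ['ENTRY1']
print(sorted(index["rna"]))      # ['ENTRY2']
```

Once the index is built, a keyword query becomes a dictionary lookup rather than a scan of the whole file, which is the convenience such indices were added to provide.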

Origin of tools:

Once databases had been established, tools for searching them became available: at first in a very simple way, looking for targeted keywords and short sequence words, and later with more sophisticated pattern-matching and alignment-based methods. The rapid but less rigorous BLAST algorithm has been the mainstay of sequence database searching since it was introduced a decade ago, supplemented by the more rigorous but slower FASTA and Smith-Waterman algorithms. Suites of analysis algorithms, written by leading academic researchers in Stanford, CA; Cambridge, UK; and Madison, WI for their in-house projects, began to become more widely available for basic sequence analysis. These algorithms were typically single-function black boxes that took input and produced output in the form of formatted files. UNIX-style commands were used to drive the algorithms, with some suites offering hundreds of commands, each taking different command options and input formats.

Since these early efforts, significant progress has been made in automating the collection of sequence information. Rapid innovation in biochemistry and instrumentation has brought us to the point where the whole genome sequences of at least 20 organisms, primarily microbial pathogens, are known, and projects to elucidate more than 100 prokaryotic and eukaryotic genomes are under way. Groups are now even competing to complete the sequence of the entire human genome.

With new technologies we can directly examine changes in the expression levels of both mRNA and proteins in living cells, both in disease and in response to an external challenge. We can go on to identify patterns of response in cells that lead us to an understanding of the mechanism of action of an agent on a tissue. The volume of data associated with projects of this nature is unprecedented in the pharma industry, and will have a profound influence on the ways in which data are used and experiments are performed in pharmaceutical research and development.
This is all the more significant because a large proportion of the interesting data is in the hands of commercial genomics companies, so pharmcos are unable to gain exclusive access to many gene sequences or their expression profiles. Competition between co-licensees of a genome database is effectively a race to establish a mechanistic role, or another use, for a gene in a disease condition in order to secure a patent position on that gene. Much of this work is carried out with IT tools. Despite the enormous progress in sequencing and expression-analysis technologies, and the corresponding growth in the volume of data in public, private and commercial databases, the tools used for storing, retrieving, analyzing and disseminating bioinformatics data are still very similar to the originals assembled by researchers 15 to 20 years ago. Many are simple extensions of the original academic systems, which have served the needs of both academic and commercial users for many years. These systems are now starting to fall behind as they struggle to keep pace with rapid developments in the pharma industry. Databases are still collected, organized, disseminated and searched using flat files. Relational databases are still few and far between, and object-relational or fully object-oriented systems are rarer still in mainstream applications. Interfaces still rely on command lines, fat-client interfaces that must be installed on every desktop, or HTML/CGI forms. While the tools were in the hands of bioinformatics specialists, pharmcos were relatively undemanding of them. Now that the problems have extended to the general discovery process, far more flexible and scalable solutions are needed to serve pharma R&D informatics requirements.
