Friday, November 11, 2011

GHC: Anita Borg Denice Denton Emerging Leader Award Winner

This year's ABI Denice Denton Emerging Leader award winner is Tiffani Williams from Texas A&M University.

Discovering Relationships in the Tree of Life

Dr. Williams has been studying phylogenetic trees to discover relationships. She opened with the example of the dentist in Florida in 1990 who gave HIV to one of his patients. Even though HIV can mutate from person to person, phylogenetic trees can show the source of the virus, and they could prove that the dentist did indeed give the virus to his patient. The same technique was also used in a court case to identify a man who intentionally gave HIV to 6 women - he is deservedly spending the next 70 years in prison.

More work in this area is used for studying big cats - to see which cats are most closely related. For example, the lion, leopard, jaguar, tiger and snow leopard are part of the same group, but the clouded leopard is not. By studying this, researchers can try to help save the species.

Dr. Williams did a great job showing that some of the most interesting work is cross-disciplinary - you need computer science, genetics and statistics to help save species!

But, these trees can be very large, expensive to store and impossible to easily transfer. Compressed files help, but you might lose useful data.

Storage is cheap, in theory, but upgrading and adding storage to your laptop is not easy and sometimes simply not possible.

Phylogenetic trees are represented in Newick format, a notation based on balanced parentheses, something like this: (((A,B),D),C,(E,F)); It was actually pretty clear when Dr. Williams used the laser pointer :-)
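To make the notation a bit more concrete, here is a minimal sketch of a Newick reader in Python. It is my own illustration, not code from the talk: `parse_newick` is a made-up name, and it only handles bare leaf names (no branch lengths or labels on internal nodes).

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of leaf names."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume '('
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
            return tuple(children)
        # Otherwise this is a leaf: read characters up to the next delimiter.
        start = pos
        while text[pos] not in ",();":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a leaf name at position {pos}")
        return text[start:pos]

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError("unbalanced parentheses or trailing text")
    return tree


tree = parse_newick("(((A,B),D),C,(E,F));")
```

Each pair of parentheses becomes one tuple, so the nesting of the tuples mirrors the branching of the tree. A string with one closing parenthesis too many is rejected with a `ValueError`.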

The problem: one simple phylogenetic tree can have 32 Newick patterns! This makes it hard both to compress trees and to identify relationships. Dr. Williams came up with a way to store a unique tree as a unique binary code - then a simple hash algorithm can identify related trees.
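Here is a rough sketch of how a "unique binary code" could work. This is my own reconstruction using a common trick (bipartition bitmasks), not necessarily Dr. Williams's exact method. The nested tuples stand in for already-parsed Newick strings, and `bipartitions` and the taxa order are names I made up for the example.

```python
def bipartitions(tree, taxa):
    """Return the non-trivial splits of a tree as a frozenset of bitmasks.

    `tree` is a nested tuple of leaf names; `taxa` fixes which bit each
    leaf gets, so every tree in a collection must use the same list.
    """
    index = {name: i for i, name in enumerate(taxa)}
    full = (1 << len(taxa)) - 1
    found = set()

    def walk(node):
        if isinstance(node, str):
            return 1 << index[node]
        mask = 0
        for child in node:
            mask |= walk(child)
        # A split and its complement describe the same edge, so always
        # keep the side that excludes the first taxon.
        side = full ^ mask if mask & 1 else mask
        # Skip trivial splits: fewer than two leaves on either side.
        if bin(side).count("1") > 1 and bin(full ^ side).count("1") > 1:
            found.add(side)
        return mask

    walk(tree)
    return frozenset(found)


taxa = ["A", "B", "C", "D", "E", "F"]
written_one_way = ((("A", "B"), "D"), "C", ("E", "F"))
written_another = (("F", "E"), "C", ("D", ("B", "A")))

# Both spellings give the same set of codes, so they land in the same
# bucket of an ordinary hash table (a dict keyed by the frozenset).
buckets = {}
for name, t in [("first", written_one_way), ("second", written_another)]:
    buckets.setdefault(bipartitions(t, taxa), []).append(name)
```

The point is that the bitmasks depend only on which leaves sit on each side of a branch, not on the order in which the Newick string happens to list them.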

The hash table can be further compressed with shorthand, like a special symbol that means "all trees have this relationship", and another for cases where fewer trees lack a relationship than have it. And this can all be compressed using TreeZip and stored in plain text!
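As I understood the shorthand idea, each relationship is stored in whichever form is shortest. Here is a toy sketch of that; the function name `encode_row` and the `*`, `+` and `-` symbols are made up for illustration and are not the actual file format.

```python
def encode_row(tree_ids_with_relationship, n_trees):
    """Encode which trees contain one relationship, using the shortest form."""
    present = sorted(set(tree_ids_with_relationship))
    if len(present) == n_trees:
        return "*"  # hypothetical symbol: every tree has this relationship
    present_set = set(present)
    absent = [i for i in range(n_trees) if i not in present_set]
    if len(absent) < len(present):
        # Fewer trees lack it than have it, so list the exceptions instead.
        return "-" + ",".join(map(str, absent))
    return "+" + ",".join(map(str, present))


rows = [
    encode_row([0, 1, 2, 3], 4),  # shared by all four trees
    encode_row([0, 1, 3], 4),     # only tree 2 lacks it
    encode_row([1], 4),           # only tree 1 has it
]
```

Since the output is just short strings, it stays readable as plain text while still shrinking the common cases.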

As much fun as compression is, Dr. Williams advises against using it on humans - we don't like to be compressed into a group, especially when it comes to negative stereotypes.

I learned so much today - I'd love to take an entire class from her!
This post syndicated from Thoughts on security, beer, theater and biking!
