What do you think about using hashes in the DC network?
- They are good
- I prefer the NMDC style; I’m too lazy to wait for the hashes to be created
- I don’t care
What is a hash? According to one definition: “A number generated by applying a mathematical formula to a document or sequence of text. The hash is significantly shorter than the original text and is unique to the original document.” One could say it’s a summary of some larger data. (In practice, “unique” means that two different files producing the same hash is extremely unlikely, not strictly impossible.)
OK, so what does this have to do with ADC?
ADC forces clients to use hashes. Why should we use hashes? There are several reasons.
First of all, any file in the world of DC can be identified by a unique hash. So if you have a file named x, you can find alternative sources for it using the hash, which ensures that you really found the file you wanted and not some fake with the same name. NMDC searches for alternatives based on name and size only. Hashes also make “magnet URI links” possible for any file around the world. These work like links to files: they include a known name and the hash of the file. This way people can point to a file with a link and be sure that others find the right one. Bitzi.com provides file searches all over the net: http://bitzi.com/search/
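To make the magnet link idea concrete, here is a minimal sketch in Python. It assumes the commonly seen layout for TTH magnets, with `xt=urn:tree:tiger:` carrying the Base32 root hash, `xl` the size in bytes and `dn` the display name. The function name `make_magnet` and the sample values are made up for illustration, and the hash below is a placeholder, not the TTH of any real file.

```python
from urllib.parse import quote


def make_magnet(tth_base32: str, size: int, name: str) -> str:
    """Build a magnet URI from a TTH root, a file size and a display name."""
    # xt = exact topic (the hash), xl = exact length in bytes, dn = display name
    return (
        "magnet:?xt=urn:tree:tiger:" + tth_base32
        + "&xl=" + str(size)
        + "&dn=" + quote(name)  # percent-encode spaces and other unsafe characters
    )


# Placeholder values: 39 Base32 characters stand in for a 192-bit Tiger root.
link = make_magnet("A" * 39, 31457280, "setup program.exe")
print(link)
```

Anyone who receives such a link can search by the hash instead of the name, so a renamed copy of the same file still matches.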
The hash also helps when downloading. Let’s say you are downloading a 30-megabyte executable file. Somewhere around the middle some corrupt data arrives. Oops, what a pity: NMDC users have to download it all over again. That is, of course, if they realize that the download was the problem and don’t just throw away the file, saying it’s not working.
When clients hash your files, they split each file into small leaves, something like 1 KB each, and build a tree on top of them. A hash is made for each leaf, then a number of leaf hashes are concatenated and hashed again, resulting in a higher-level hash. And so on, until you get to the top, where you have a single hash. That’s the tree root hash. If the root hash you were given matches the hash computed from your downloaded file, then it’s OK. If not, clients go down the tree until they find the leaf that caused the mismatch and download it again. So you have to download just 1 KB instead of the whole file.
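The tree idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python's standard library has no Tiger, so SHA-256 stands in for it, hashes are combined two at a time, and an unpaired hash is simply carried up a level. Real TTH differs in such details. The helper names (`leaf_hashes`, `tree_root`, `bad_leaves`) are made up, and for brevity `bad_leaves` compares the leaf lists directly instead of walking down from the root.

```python
import hashlib
from typing import List

LEAF_SIZE = 1024  # 1 KB leaves, as described above


def h(data: bytes) -> bytes:
    # Stand-in hash function (SHA-256); actual TTH uses Tiger.
    return hashlib.sha256(data).digest()


def leaf_hashes(data: bytes) -> List[bytes]:
    """Split the data into 1 KB leaves and hash each one."""
    chunks = [data[i:i + LEAF_SIZE] for i in range(0, len(data), LEAF_SIZE)] or [b""]
    return [h(c) for c in chunks]


def tree_root(hashes: List[bytes]) -> bytes:
    """Combine hashes pairwise, level by level, until one root remains."""
    level = hashes
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            # An unpaired hash at the end of a level is carried up unchanged.
            nxt.append(h(pair[0] + pair[1]) if len(pair) == 2 else pair[0])
        level = nxt
    return level[0]


def bad_leaves(expected: List[bytes], received: bytes) -> List[int]:
    """Return the indexes of leaves whose hash differs from the expected one."""
    actual = leaf_hashes(received)
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


original = bytes(range(256)) * 20        # 5120 bytes, so 5 leaves
expected = leaf_hashes(original)

damaged = bytearray(original)
damaged[2500] ^= 0xFF                    # flip one byte inside the third leaf
damaged = bytes(damaged)

print(tree_root(expected) == tree_root(leaf_hashes(damaged)))  # roots no longer match
print(bad_leaves(expected, damaged))     # only these leaves need a re-download
```

A single flipped byte changes the root, but the leaf comparison narrows the damage down to one 1 KB block, which is all that has to be fetched again.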
How good is that? Well, I think you already guessed by now…
ADC can use multiple hash algorithms, but the one used at the moment (and the only one with a specification) is Tiger. You guessed right: TTH stands for Tiger Tree Hash.
So don’t be afraid of the hash, because it’s good for you: it makes your downloads cleaner and safer, lets you find genuine alternate sources for any file, and provides a global file identification scheme.
(ADC also uses hashes in global client identification and in password hashing, described in other posts.)
[ Additional info: http://en.wikipedia.org/wiki/Hash_tree ]