Back to ComputerTerms, InformationRetrieval

Signature files typically use superimposed coding:

  1. Each document is divided into logical blocks, each containing D distinct words (StopWords are usually removed before the blocks are formed).

  2. Each word yields a binary "word signature": a hash function maps the word to a pattern F bits long with m bits set to 1.

  3. The word signatures are OR'd together (bitwise) to form the block signature.

  4. The block signatures are concatenated together to form the document signature.
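
The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference implementation: the values of F, M and D, the tiny stop-word list, the use of SHA-256 to pick bit positions, and all function names are hypothetical choices made for the example.

```python
import hashlib

F = 64  # signature length in bits (assumed value)
M = 3   # bits set to 1 per word signature (assumed value)
D = 4   # distinct words per logical block (assumed value)
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}  # illustrative only


def word_signature(word, f=F, m=M):
    """Step 2: hash a word to an f-bit integer with exactly m bits set."""
    if not 0 < m <= f:
        raise ValueError("need 0 < m <= f")
    sig = 0
    counter = 0
    # Keep hashing with a changing salt until m distinct positions are set.
    while bin(sig).count("1") < m:
        digest = hashlib.sha256(f"{word}:{counter}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % f)
        counter += 1
    return sig


def superimpose(words, f=F, m=M):
    """Step 3: OR the word signatures together into one block signature."""
    sig = 0
    for w in words:
        sig |= word_signature(w, f, m)
    return sig


def block_signatures(text, d=D, f=F, m=M):
    """Step 1 and step 3: split into blocks of d distinct non-stop words."""
    sigs = []
    current = set()
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if not word or word in STOP_WORDS:
            continue
        current.add(word)
        if len(current) == d:
            sigs.append(superimpose(current, f, m))
            current = set()
    if current:  # last block may hold fewer than d words
        sigs.append(superimpose(current, f, m))
    return sigs


def document_signature(text, d=D, f=F, m=M):
    """Step 4: concatenate the block signatures as one bit string."""
    return "".join(format(s, f"0{f}b") for s in block_signatures(text, d, f, m))
```

For example, `document_signature("the quick brown fox jumps over the lazy dog")` drops "the", forms two blocks (four words, then three), and returns a string of 2 × F bits. Because a block signature is the OR of its word signatures, every bit set in a word's signature is also set in the signature of the block that contains that word.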


SignatureFile (last edited 2006-02-19 20:50:24 by yakko)