Differences between revisions 1 and 2
Revision 1 as of 2004-04-08 15:15:03
Size: 585
Editor: yakko
Comment:
Revision 2 as of 2004-04-08 15:17:33
Size: 583
Editor: yakko
Comment:

Back to ComputerTerms, InformationRetrieval

Signature files typically use superimposed coding:

  1. Each document is divided into logical blocks, each containing D distinct words (stop words are usually removed before the blocks are formed).
  2. Each word is hashed to a binary "word signature" that is F bits long and has m bits set to 1.
  3. The word signatures in a block are OR'd together to form the block signature.
  4. The block signatures are concatenated to form the document signature.
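
The four steps above can be sketched in Python. This is only an illustration under stated assumptions: the page does not say which hash function is used, so SHA-256 stands in for "some kind of hash code", and the function names, the default values of D, F and m, and the choice to represent the concatenated document signature as a list of per-block integers are all hypothetical.

```python
import hashlib


def word_signature(word, F=64, m=3):
    """Step 2: return an F-bit integer with exactly m bits set to 1.

    Assumption: bit positions come from SHA-256 of the word plus a counter;
    any hash that spreads words over the F positions would do.
    """
    if not 0 < m <= F:
        raise ValueError("need 0 < m <= F")
    sig = 0
    counter = 0
    while bin(sig).count("1") < m:
        digest = hashlib.sha256(f"{word}:{counter}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % F
        sig |= 1 << position  # a repeated position leaves sig unchanged, so we retry
        counter += 1
    return sig


def split_into_blocks(words, D, stop_words=frozenset()):
    """Step 1: group words into blocks of D distinct, non-stop words."""
    blocks, current = [], []
    for word in words:
        word = word.lower()
        if word in stop_words or word in current:
            continue
        current.append(word)
        if len(current) == D:
            blocks.append(current)
            current = []
    if current:  # the last block may hold fewer than D words
        blocks.append(current)
    return blocks


def block_signature(block, F=64, m=3):
    """Step 3: OR the word signatures of one block together."""
    sig = 0
    for word in block:
        sig |= word_signature(word, F, m)
    return sig


def document_signature(words, D=4, F=64, m=3, stop_words=frozenset()):
    """Step 4: the document signature is the sequence of block signatures."""
    blocks = split_into_blocks(words, D, stop_words)
    return [block_signature(block, F, m) for block in blocks]


if __name__ == "__main__":
    text = "the cat sat on the mat the cat ran".split()
    for sig in document_signature(text, D=2, F=16, m=3, stop_words={"the", "on"}):
        print(format(sig, "016b"))
```

Because several words are OR'd into the same F bits, a block signature can have fewer than D × m bits set when word signatures overlap.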


SignatureFile (last edited 2006-02-19 20:50:24 by yakko)