A good reference is ftp://ftp.remotesensing.org/pub/libtiff/TIFF6.pdf

head -c 526 /u/jasper/00000013.tif | od -x

TIF File Header (8 bytes):
0000000 4949 2a00 0800 0000
        |||| |||| \\\\ \\\\--> 8 = Offset to First IFD (Image File Directory)
        |||| \\\\--> Constant decimal 42, stored (in this case) in Little Endian order
        \\\\--> x4949 = "II" = Data is in Little Endian order
                x4d4d = "MM" = Data is in Big Endian order

First IFD:
0000008 1300 --> Number of Directory Entries (x13 = 19)

Each of the 19 Directory Entries is 12 bytes:
  2-byte Tag   (see ftp://ftp.remotesensing.org/pub/libtiff/TIFF6.pdf)
  2-byte Type  (1=Byte, 2=ASCII, 3=2-byte Integer (SHORT), 4=4-byte Integer (LONG),
                5=2 Longs, numerator/denominator (RATIONAL)); TIFF 6.0 also defines types 6-12
  4-byte Count (number of values of that Type, not bytes)
  4-byte Value, or an Offset to the value if the value is longer than 4 bytes

Offset  Tag  Type Count     Value      Tag Name
000000A fe00 0400 0100 0000 0200 0000  254=NewSubfileType             2=Single page of a multi-page image
0000016 0001 0400 0100 0000 1009 0000  256=ImageWidth                 x910=2320 pixels per scanline
0000022 0101 0400 0100 0000 500d 0000  257=ImageLength                xd50=3408 rows (aka scanlines)
000002E 0201 0300 0100 0000 0100 0000  258=BitsPerSample              1=Number of bits per component
000003A 0301 0300 0100 0000 0400 0000  259=Compression                4=Group 4 Fax: CCITT T.6 bi-level
0000046 0601 0300 0100 0000 0000 0000  262=PhotometricInterpretation  0=WhiteIsZero
0000052 0a01 0300 0100 0000 0100 0000  266=FillOrder                  1=Lower column values (leftmost pixels) in the higher-order bits of each byte
000005E 1101 0400 0100 0000 5e01 0000  273=StripOffsets               Strips start at offset x15e
000006A 1201 0300 0100 0000 0100 0000  274=Orientation                1=0th row=top and 0th column=left
0000076 1501 0300 0100 0000 0100 0000  277=SamplesPerPixel            1=Number of components per pixel
0000082 1601 0300 0100 0000 500d 0000  278=RowsPerStrip               xd50=3408 rows
000008E 1701 0400 0100 0000 cef4 0000  279=StripByteCounts            xf4ce=62670 bytes in that strip after compression
000009A 1a01 0500 0100 0000 f200 0000  282=XResolution                Numerator/Denominator at offset xf2
00000A6 1b01 0500 0100 0000 fa00 0000  283=YResolution                Numerator/Denominator at offset xfa
00000B2 2801 0300 0100 0000 0200 0000  296=ResolutionUnit             2=Inch
00000BE 2901 0300 0200 0000 0000 0100  297=PageNumber                 Page 1 of 1 (0-based page 0, 1 page total)
00000CA 3101 0200 1d00 0000 0201 0000  305=Software(1)                Length=29 at offset x102 (ASCII)
00000D6 3201 0200 1400 0000 1f01 0000  306=DateTime                   Length=20 at offset x11F (YYYY:MM:DD HH:MM:SS format)
00000E2 3b01 0200 2b00 0000 3301 0000  315=Artist                     Length=43 at offset x133 (ASCII)
00000EE 0000 0000  <== Offset of next IFD (0=None)

The Real Data:
00000F2 2c01 0000 0100 0000  300/1 = 300 pixels per ResolutionUnit (inch) horizontally
00000FA 2c01 0000 0100 0000  300/1 = 300 pixels per ResolutionUnit (inch) vertically
0000102 [1]0;1;0;1;1;META-63021;.002           Supposedly "Software", but really page start info (see below)
000011F 2000: 1:12 09:47:24                    Date & time in YYYY:MM:DD HH:MM:SS format
0000133 1996-98 AccuSoft Inc., All rights reserved  Artist
000015E ffff ffff ffff e5b4 b08b 2022 896e ...  Strips start here

If you read the metadata.doc file on the DVD, you'll see the explanation of that first "Software" field. The TIFF specification says TIFF tag x131 (decimal 305) is supposed to hold the name of the software that wrote the image, but the US PTO uses the first "Software" tag for its own purpose. Note that subsequent "Software" tags do not contain this data; only the first one does.

For my 00000013.tif example shown above, [1]0;1;0;1;1;META-63021;.002 decodes like so:

  [1]        = Total number of pages in image
  0          = Biblio page start
  1          = Abstracts page start
  0          = Drawings page start
  1          = Descriptions page start
  1          = Claims page start
  META-63021 = The metadata starts at the 63,021st byte (first byte is byte 1)
  .002       = I don't know; this isn't described in metadata.doc.

The metadata for 00000013.tif doesn't follow the syntax described in metadata.doc, though, and that file DOES have other decompression problems. The metadata for 00000012 and 00000014 is OK.

The data for US05224775 from usp364 also makes sense and is very illustrative.
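The header and first-IFD walk-through for 00000013.tif is mechanical enough to script. Below is a minimal Python sketch using only the standard library. It parses the 8-byte header, reads each 12-byte directory entry, decodes a few common value types, and splits the PTO "Software" string.

The sketch assumes the whole file fits in memory and looks only at the first IFD. The names read_first_ifd, entry_values and decode_pto_software are hypothetical helpers, not libtiff or PTO APIs.

As a sanity check on META-63021: StripOffsets (x15E = 350) plus StripByteCounts (62670) is 63020. The 63,021st byte (counting from 1) is therefore the first byte after the single strip, which matches where the metadata is said to start.

```python
import struct

# Size in bytes of one value of each TIFF 6.0 field type:
# 1=BYTE 2=ASCII 3=SHORT 4=LONG 5=RATIONAL 6=SBYTE 7=UNDEFINED
# 8=SSHORT 9=SLONG 10=SRATIONAL 11=FLOAT 12=DOUBLE
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1,
              8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def read_first_ifd(data: bytes):
    """Parse the 8-byte header and the first IFD of an in-memory TIFF.

    Returns (byte_order, entries): byte_order is '<' or '>' for struct,
    and each entry is (tag, type, count, raw_value_bytes).
    """
    if data[:2] == b"II":
        bo = "<"  # "II": little-endian
    elif data[:2] == b"MM":
        bo = ">"  # "MM": big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(bo + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")

    (n_entries,) = struct.unpack(bo + "H", data[ifd_offset:ifd_offset + 2])
    entries = []
    for i in range(n_entries):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, typ, count = struct.unpack(bo + "HHI", data[start:start + 8])
        size = TYPE_SIZES.get(typ, 1) * count
        if size <= 4:
            # Values that fit are stored in the entry itself, left-justified.
            raw = data[start + 8:start + 8 + size]
        else:
            # Otherwise the last 4 bytes are an offset to the value.
            (offset,) = struct.unpack(bo + "I", data[start + 8:start + 12])
            raw = data[offset:offset + size]
        entries.append((tag, typ, count, raw))
    return bo, entries


def entry_values(bo: str, typ: int, count: int, raw: bytes):
    """Decode ASCII, SHORT, LONG and RATIONAL values; other types stay raw."""
    if typ == 2:
        return raw.rstrip(b"\x00").decode("latin-1")
    if typ == 3:
        return list(struct.unpack(f"{bo}{count}H", raw))
    if typ == 4:
        return list(struct.unpack(f"{bo}{count}I", raw))
    if typ == 5:
        words = struct.unpack(f"{bo}{2 * count}I", raw)
        return [(words[i], words[i + 1]) for i in range(0, len(words), 2)]
    return raw


def decode_pto_software(s: str) -> dict:
    """Split a PTO 'Software' string such as '[1]0;1;0;1;1;META-63021;.002'.

    Order: total pages, then Biblio, Abstracts, Drawings, Descriptions and
    Claims start pages, then META-<1-based byte offset>, then an unknown field.
    """
    if not s.startswith("[") or "]" not in s:
        raise ValueError("unexpected Software field: " + s)
    total, rest = s[1:].split("]", 1)
    fields = rest.split(";")
    names = ["biblio", "abstracts", "drawings", "descriptions", "claims"]
    result = {"total_pages": int(total)}
    result.update({name: int(v) for name, v in zip(names, fields[:5])})
    result["metadata_byte"] = int(fields[5].split("-", 1)[1])
    result["unknown"] = fields[6] if len(fields) > 6 else None
    return result


# Hypothetical usage:
# with open("00000013.tif", "rb") as f:
#     bo, entries = read_first_ifd(f.read())
# for tag, typ, count, raw in entries:
#     if tag == 305:  # the first Software tag carries the PTO page-start info
#         print(decode_pto_software(entry_values(bo, typ, count, raw)))
```

Keeping the raw bytes per entry and decoding them separately keeps the directory walk independent of which tags you care about.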
The US05224775 image contains the original 20-page image as well as a Certificate of Correction on page 21 and two Reexamination documents on pages 22-25 and 26-28. The page start info for US05224775 is:

  [28]          = 28 total pages
  0;            = Biblio starts on page 0 (there is none?)
  1;            = Abstracts start on page 1
  2;            = Drawings start on page 2
  11;           = Descriptions start on page 11
  1;            = Claims start on page 1
  META-1730906; = The metadata starts at the 1,730,906th byte

That metadata is a bunch of comma-delimited fields:

  05224775 = Patent Number
  US       = Country Code
  A1       = Kind Code
  19930706 = Issue Date (YYYYMMDD)
  28       = Total # Pages
  NULL     = Missing Page Flag
  NULL     = Withdrawn Flag
  1        = Abstract Starting Page
  1        = Abstract Ending Page
  2        = Drawing Starting Page
  10       = Drawing Ending Page
  11       = Description Starting Page
  18       = Description Ending Page
  18       = Claims Starting Page
  20       = Claims Ending Page
  21       = Certificate of Correction Starting Page
  21       = Certificate of Correction Ending Page
  22       = Reexamination Starting Page
  28       = Reexamination Ending Page
  CERT_OF_CORR=1 = Number of Certificates of Correction in this image

Then, if the above is not zero, for each Certificate of Correction:

  19941011 = Publication Date (YYYYMMDD)
  NULL     = Missing Pages Flag
  21       = Starting Page
  21       = Ending Page

  RE_EXAM=2 = Number of Reexaminations in this image

Then, for each Reexamination (here, two):

  19940719 = Publication Date (YYYYMMDD)
  B1       = Sequence Number
  NULL     = Missing Pages Flag
  22       = Starting Page
  25       = Ending Page

  20020423 = Publication Date (YYYYMMDD)
  C2       = Sequence Number
  NULL     = Missing Pages Flag
  26       = Starting Page
  28       = Ending Page
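The comma-delimited trailer lends itself to a small parser. Below is a minimal Python sketch that rests on three assumptions:

1. The trailer runs from the META-<n> byte to the end of the file.
2. The fields appear in exactly the order listed for US05224775, separated by commas with no quoting.
3. The CERT_OF_CORR=<n> and RE_EXAM=<n> markers are always present, possibly with a count of 0.

The function names and the SAMPLE string are hypothetical. SAMPLE is reconstructed from the field list above, not copied from the actual file. Values are kept as strings because flags can be NULL.

```python
# Fixed leading fields, in the order listed for US05224775 (assumed layout).
FIXED_FIELDS = [
    "patent_number", "country_code", "kind_code", "issue_date", "total_pages",
    "missing_page_flag", "withdrawn_flag",
    "abstract_start", "abstract_end", "drawing_start", "drawing_end",
    "description_start", "description_end", "claims_start", "claims_end",
    "cert_of_corr_start", "cert_of_corr_end", "reexam_start", "reexam_end",
]
CERT_FIELDS = ["publication_date", "missing_pages_flag", "start_page", "end_page"]
REEXAM_FIELDS = ["publication_date", "sequence_number", "missing_pages_flag",
                 "start_page", "end_page"]


def read_metadata(path: str, meta_byte: int) -> str:
    """Read from the 1-based META-<n> byte to end of file (assumed extent)."""
    with open(path, "rb") as f:
        f.seek(meta_byte - 1)  # META-n counts the first byte as byte 1
        return f.read().decode("latin-1")


def parse_pto_metadata(text: str) -> dict:
    """Split the comma-delimited trailer into a dict of named fields."""
    fields = [f.strip() for f in text.strip("\x00 \r\n").split(",")]
    n = len(FIXED_FIELDS)
    if len(fields) < n:
        raise ValueError("too few fields")
    record = dict(zip(FIXED_FIELDS, fields[:n]))
    rest, pos = fields[n:], 0

    def take_group(marker: str, names: list) -> list:
        # Reads 'MARKER=<count>' followed by <count> groups of len(names) fields.
        nonlocal pos
        key, value = rest[pos].split("=", 1)
        if key != marker:
            raise ValueError(f"expected {marker}, found {rest[pos]}")
        pos += 1
        groups = []
        for _ in range(int(value)):
            groups.append(dict(zip(names, rest[pos:pos + len(names)])))
            pos += len(names)
        return groups

    record["certificates_of_correction"] = take_group("CERT_OF_CORR", CERT_FIELDS)
    record["reexaminations"] = take_group("RE_EXAM", REEXAM_FIELDS)
    return record


# Hypothetical trailer reconstructed from the US05224775 field list above:
SAMPLE = ("05224775,US,A1,19930706,28,NULL,NULL,1,1,2,10,11,18,18,20,21,21,22,28,"
          "CERT_OF_CORR=1,19941011,NULL,21,21,"
          "RE_EXAM=2,19940719,B1,NULL,22,25,20020423,C2,NULL,26,28")
```

With real files you would call parse_pto_metadata(read_metadata(path, n)), where n is taken from the META-<n> entry. If a trailer does not match the assumed layout, the marker check raises ValueError instead of silently misaligning the groups.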