To run smartSwitch by hand,

  - cd /tmp   (or some directory that you have write permissions to)
  - /dfs/prod/ips/converters/smartSwitch -p /cdrom/mepa2003037 \
      -fc /dfs/prod/ips/converters/config/smartSwitch.cfg mepa2003037
  - echo $?

The return code (110 in this example) identifies the type of CD/DVD it
is, as explained in the cfg file (Jouve EPA Text CD in American English).

You can look in that /dfs/prod/ips/converters/config/smartSwitch.cfg file
to see what smartSwitch used to identify the disk. In this example, it was

    patentFormat = Jouve_EPA
    returnCode = 110
    searchType = File
    fileName = contents
    searchString = EUROPEAN  # and
    searchString = A1

which says if the file "contents" exists, and contains both the strings
"EUROPEAN" and "A1", then return with returnCode=110.

------------------------------------------------------------------------

To test the conversion for a week's worth of data in JAPIO,

Log on to ips01i as ipsadmin or jasper if you want. Just not as root.

    cd /dfs/convdata
    mv usa01week12 usa01week12.save
    mkdir usa01week12
    cp -p usa01week12.save/01week12rb.zip usa01week12
    cd usa01week12
    /ips/converters/Convert2DB2 usa01week12

------------------------------------------------------------------------

The steps /ips/prod/converters/ConvUspto goes through to convert a
usa*week* "CD" are

1) mkdir's & cd's to /ips/convdata/usa01week12

2) ftp from the USPTO site if need be,
       ftp ftp.uspto.gov
       login anonymous
       bin
       cd pub/patdata/01     (2-digit year)
       get 01week12rb.zip

3) unzip 04week14.zip to get the XML file, pgb040406.XML, which is XML
   describing patents. The patent number tag is
       D0487951
   There are long lines in this XML file that grep can't handle,
   so to count the number of patents, use
       gnugrep -c '^' pgb040406.XML

4) Run
       /ips/prod/converters/XML2gb.pl *XML 20040406
   which creates pgb040406.gbk (what we want) and a huge
   pgb010320.sta error message file. This takes about 23 minutes to run.
   This Greenbook file has fixed-length, 80-byte lines.
   Patent numbers are on WKU lines (e.g., WKU D04879511), so to count,
       grep -c ^WKU pgb040406.gbk

5) "Fixes" the file, i.e. converts it to Green Book Format, by calling
       /ips/prod/converters/txt2asc.pl <*.gbk > usa04week14.fix
   This pads it to fixed-length, 80-byte records WITHOUT carriage returns
   or new line characters (!!), so it's now difficult to read. You can
   set your window width to 80 characters, then "more" it.

6) Copies /ips/converters/config/country.ld and
   /ips/converters/config/ila.dat
       cp -p /ips/converters/config/country.ld .
       cp -p /ips/converters/config/ila.dat .

7) Converts the "fixed" Green Book file, usa01week12.fix, by calling
       /ips/converters/v4apsnpo usa01week12 usa01week12.fix v4rep -icnt=US -npo -ilang=EN_US >>scrout2

8) Erases usa01week12.fix, *.gbk, *.sgm, country.ld, and ila.dat files.

9) Erases all zero-length files.

Prior to 10-23-200, the US PTO released SGML files, which the ConvUspto
converter processed in basically the same way.
   ...
   3) 01week12rb.zip unzipped to pgb010320.sgm
   4) A different Redbook-to-Greenbook converter, rb2gb.pl, was used
          /ips/converters/rb2gb.pl *sgm 20010320
      which created pgb010320.gbk (what we want) and pgb010320.sta,
      which is an error message file.

------------------------------------------------------------------------

To see what's on one of the miwo20010xx CD's in Japan,

1) Log on as root on ipspat
2) dce_login jasper
3) cd /dfs/prod/converters
4) ./essai /wo/miwo2001023/mimosa.rom | more     OR   > miwo2001023.data

   /=================================\
   | Here in San Jose, you could     |
   |   cd /dfs/ips/converters        |
   |   essai /cd/wld92035.rom | more |
   \=================================/

OR

1) Log on as root on ips01i
2) dce_login jasper
3) cd /dfs/prod/converters
4) ./essai /cdrom/miwo2000029/mimosa.rom | more     OR   > miwo2000029.data

This generates a SpyFile in the current directory that doesn't have
anything interesting in it if it ran ok. It has debug information in it,
though, when it doesn't run ok.
Below is some sample data for patent WO00139725A2, which had 2 priorities,
US199000164149 and the second with a missing priority number, which the
IPS converter turned into 16 question marks ("????????????????"). The IPN
converter turns missing APN numbers into a single dash ("-").

PN' 'WO 0139725 A2 20010607'
AN' 'US 0041961 20001108'
PR' 'US 60/164,149 19991108'
PR' 'US 20001108'
MC' 'A 61K '
DS' 'AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG'
RO' 'AP EA EP OA'
PA' 'L.A.M. PHARMACEUTICAL CORP.'
IN' 'DRIZEN, Alan'
IN' 'MICALIZZI, Michael'
RP' 'NATH, Gary, M.'
ET' 'DRUG PREPARATIONS'
FT' 'PREPARATIONS DE MEDICAMENTS'
ABE' 'Semisolid, sustained-release drug delivery compositions based on hyaluronic acid and its salts, and more particularly to the manufacture and use of such compositions.'
ABF' 'L'invention concerne des compositions semi-solides de médicaments à libération lente basé es sur l'acide hyaluronique et es sels, et concerne plus particulièrement la fabrication et l'utilisation de telles compositions.'

------------------------------------------------------------------------

Here's another example from the miwo2002008 DVD.

root@ipspat:/wo/miwo2002008>/dfs/prod/converters/essai mimosa.rom | head
PN' 'WO 0213592 A2 20020221'
AN' 'ES 0000313 20000807'
DS' 'AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ L K LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW GH GM KE LS M Z SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN G L MR NE SN TD TG'
RO' 'AP EA EP OA'
PA' 'CREACIONES AROMATICAS INDUSTRIALES, SA'
PA' 'MARTINEZ ORTEGA, Alberto'
IN' 'MARTINEZ ORTEGA, Alberto'
RP' 'MANRESA VAL, Manuel'
ET' 'AROMATIZING PRE-MIXTURE TO BE INCLUDED AS ADDITIVE IN A COMPOUND FEED FOR SLAUGHTER ANIMALS AND METHOD FOR INCREASING THRODUCTIVITY OF SLAUGHTER ANIMALS'
FT' 'PREMELANGE AROMATISANT DESTINE A ETRE UTILISE COMME ADDITIF DANS DES ALIMENTS COMPOSES POUR ANIMAUX D'ABATTAGE ET PEDE PERMETTANT D'AUGMENTER LA PRODUCTIVITE DES ANIMAUX D'ABATTAGE'
...

Of interest, note how the EPO has chosen to list individual inventors
twice, once as an assignee/applicant (the "PA" line), and again as an
inventor (the "IN" line). This explained why the Japanese see the
individual inventors in their bib-only "Applicant" section, but not in
their "Assignee(s)" section of the IPN full-text view.

We don't use this data source for EP data. We get our EP data directly
from WIPO.
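
If you want to check the PA/IN duplication on a saved essai dump (e.g. a
miwo*.data file) without eyeballing it, here is a rough Python sketch. It
is not part of the converters. It assumes every field line looks like
TAG' 'value' (as in the samples above) and that the input holds a single
patent record; the function names parse_essai_lines and
applicant_inventors are made up for this note.

```python
from collections import defaultdict


def parse_essai_lines(lines):
    """Group lines of the assumed form TAG' 'value' into {tag: [values]}."""
    fields = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Split at the first "' '" only, so apostrophes inside the value
        # (e.g. French text such as L'invention) are left alone.
        tag, sep, rest = line.partition("' '")
        if not sep or not rest.endswith("'"):
            continue  # shell prompt, "...", blank line, etc.
        fields[tag].append(rest[:-1])  # drop the closing quote
    return dict(fields)


def applicant_inventors(fields):
    """Return names that appear on both a PA line and an IN line."""
    inventors = set(fields.get("IN", []))
    return [name for name in fields.get("PA", []) if name in inventors]


# A few lines copied from the miwo2002008 example above.
sample = [
    "PN' 'WO 0213592 A2 20020221'",
    "PA' 'CREACIONES AROMATICAS INDUSTRIALES, SA'",
    "PA' 'MARTINEZ ORTEGA, Alberto'",
    "IN' 'MARTINEZ ORTEGA, Alberto'",
    "RP' 'MANRESA VAL, Manuel'",
]
fields = parse_essai_lines(sample)
print(applicant_inventors(fields))
```

For a dump with many patents you would have to split the input into
records first (presumably at each PN line, but that is a guess about the
essai output, not something verified here).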