To run smartSwitch by hand,
- cd /tmp (or any directory you have write permission in)
- /dfs/prod/ips/converters/smartSwitch -p /cdrom/mepa2003037 \
-fc /dfs/prod/ips/converters/config/smartSwitch.cfg mepa2003037
- echo $?
The return code (110 in this example) identifies the type of CD/DVD,
as explained in the cfg file (here, a Jouve EPA Text CD in American English).
You can look in that /dfs/prod/ips/converters/config/smartSwitch.cfg file
to see which rule smartSwitch used to identify the disc. In this example, it was
patentFormat = Jouve_EPA
returnCode = 110
searchType = File
fileName = contents
searchString = EUROPEAN
# and
searchString = A1
which says if the file "contents" exists, and contains both the strings
"EUROPEAN" and "A1", then return with returnCode=110.
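The rule above can be modelled as a small check. This is a hypothetical Python sketch built only from the cfg excerpt, not smartSwitch's own code: it assumes that a searchType = File rule matches when fileName exists under the directory given with -p and every searchString occurs in it (the "# and" comment). Case sensitivity, encoding and rule order in the real tool are unknown.

```python
import os

# Hypothetical model of the smartSwitch rule shown above (not smartSwitch's code).
RULE = {
    "patentFormat": "Jouve_EPA",
    "returnCode": 110,
    "fileName": "contents",
    "searchStrings": ["EUROPEAN", "A1"],  # every string must be present ("# and")
}


def match_file_rule(disc_root, rule):
    """Return rule["returnCode"] if the rule matches the disc, else None."""
    path = os.path.join(disc_root, rule["fileName"])
    if not os.path.isfile(path):
        return None  # searchType = File: the file must exist
    with open(path, encoding="latin-1") as f:  # latin-1 decodes any byte
        text = f.read()
    if all(s in text for s in rule["searchStrings"]):
        return rule["returnCode"]
    return None
```

If the model is right, match_file_rule("/cdrom/mepa2003037", RULE) would return 110 for the disc above.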
------------------------------------------------------------------------
To test the conversion of a week's worth of data in JAPIO,
log on to ips01i as ipsadmin (or as jasper, if you prefer), but not as root.
cd /dfs/convdata
mv usa01week12 usa01week12.save
mkdir usa01week12
cp -p usa01week12.save/01week12rb.zip usa01week12
cd usa01week12
/ips/converters/Convert2DB2 usa01week12
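If you do this often, the backup-and-copy part can be scripted. A minimal Python sketch, assuming the same directory layout as above; make_test_dir is a hypothetical helper name. It refuses to run if a .save backup already exists, so an earlier backup is never overwritten, and it does not run Convert2DB2 (do that by hand afterwards).

```python
import os
import shutil


def make_test_dir(convdata, week, zip_name):
    """Back up convdata/week to week.save, then recreate week holding only zip_name."""
    work = os.path.join(convdata, week)
    save = work + ".save"
    if os.path.exists(save):
        # Never overwrite an earlier backup (mv would move work *into* it).
        raise FileExistsError(save)
    shutil.move(work, save)                            # mv usa01week12 usa01week12.save
    os.mkdir(work)                                     # mkdir usa01week12
    shutil.copy2(os.path.join(save, zip_name), work)   # like cp -p: keeps mtime and mode
    return work
```

For example, make_test_dir("/dfs/convdata", "usa01week12", "01week12rb.zip") reproduces the mv, mkdir and cp -p lines above. shutil.copy2 preserves timestamps and permission bits but, unlike cp -p run with enough privilege, not ownership.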
------------------------------------------------------------------------
The steps /ips/prod/converters/ConvUspto goes through to convert a
usa*week* "CD" are:
1) Creates /ips/convdata/usa01week12 and cd's into it.
2) FTPs the zip file from the USPTO site, if needed:
ftp ftp.uspto.gov
login anonymous
bin
cd pub/patdata/01 (2-digit year)
get 01week12rb.zip
3) Unzips the file (04week14.zip in this example) to get the XML file, pgb040406.XML,
which describes the patents. A sample patent number in this file is D0487951.
Some lines in this XML file are too long for grep to handle, so to count the
number of patents, use GNU grep: gnugrep -c '^' pgb040406.XML
4) Runs /ips/prod/converters/XML2gb.pl *XML 20040406, which creates pgb040406.gbk
(what we want) and a huge pgb040406.sta error message file.
This takes about 23 minutes to run.
This Greenbook file has fixed-length, 80-byte lines. Patent numbers are
on WKU lines (e.g. WKU D04879511), so to count the patents, use grep -c ^WKU pgb040406.gbk
5) "Fixes" the file, i.e. converts it to Green Book format, by calling
/ips/prod/converters/txt2asc.pl <*.gbk > usa04week14.fix
This pads it to fixed-length, 80-byte records WITHOUT carriage returns
or newline characters (!!), so it's now difficult to read. To view it, set your
window width to 80 characters and then "more" it.
6) Copies /ips/converters/config/country.ld and /ips/converters/config/ila.dat into the current directory:
cp -p /ips/converters/config/country.ld .
cp -p /ips/converters/config/ila.dat .
7) Converts the "fixed" Green Book file, usa01week12.fix, by calling
/ips/converters/v4apsnpo usa01week12 usa01week12.fix v4rep -icnt=US -npo -ilang=EN_US >>scrout2
8) Erases usa01week12.fix, *.gbk, *.sgm, country.ld, and ila.dat files.
9) Erases all zero-length files.
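Steps 2, 3, 8 and 9 are simple enough to sketch in Python. This is not how ConvUspto is implemented (these notes don't say); it is a hedged illustration with hypothetical function names, and it assumes the ftp.uspto.gov layout above still holds, which may well have changed since these notes were written. count_lines reads fixed-size chunks, so unlike the grep that chokes on this XML it never needs a whole line in memory. Like gnugrep -c '^', it counts lines, which equals the number of patents only if each patent sits on its own line.

```python
import ftplib
import glob
import os


def patdata_dir(year):
    """Step 2: remote directory for a year, e.g. 2001 -> 'pub/patdata/01'."""
    return "pub/patdata/%02d" % (year % 100)


def fetch_zip(year, zip_name, host="ftp.uspto.gov"):
    """Step 2: anonymous FTP download in binary mode (login / bin / cd / get)."""
    with ftplib.FTP(host) as ftp:
        ftp.login()  # no arguments: logs in as 'anonymous'
        ftp.cwd(patdata_dir(year))
        with open(zip_name, "wb") as out:
            ftp.retrbinary("RETR " + zip_name, out.write)  # binary transfer


def count_lines(path, chunk_size=1 << 20):
    """Step 3: count lines like `gnugrep -c '^'`, one chunk at a time."""
    count = 0
    last = b""
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
            last = chunk[-1:]
    if last and last != b"\n":
        count += 1  # a final line without a trailing newline still counts
    return count


def cleanup(workdir, name):
    """Steps 8-9: delete the intermediate files, then all zero-length files."""
    for pattern in (name + ".fix", "*.gbk", "*.sgm", "country.ld", "ila.dat"):
        for path in glob.glob(os.path.join(workdir, pattern)):
            os.remove(path)
    empty = [e.path for e in os.scandir(workdir)
             if e.is_file() and e.stat().st_size == 0]
    for path in empty:
        os.remove(path)
```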
Prior to 10-23-200, the USPTO released SGML files, which the ConvUspto
converter processed in basically the same way, except for these steps:
...
3) 01week12rb.zip was unzipped to pgb010320.sgm.
4) A different Redbook-to-Greenbook converter, rb2gb.pl, was used
/ips/converters/rb2gb.pl *sgm 20010320
which created pgb010320.gbk (what we want) and pgb010320.sta,
which is an error message file.
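Whichever converter produced the .gbk file, the Greenbook output and the "fixed" file can be checked with a few lines of Python. In this hypothetical sketch, wku_numbers lists the patent numbers from the WKU lines (its length should match grep -c ^WKU), and to_fixed_records is only an assumed approximation of what txt2asc.pl does: pad each line to 80 bytes and drop the line endings. Truncating over-long lines is a guess. split_records reverses it for reading; from the shell, fold -w 80 usa04week14.fix | more has the same effect as setting an 80-column window.

```python
RECORD_LEN = 80


def wku_numbers(lines):
    """Patent numbers from Greenbook WKU lines, e.g. 'WKU D04879511'."""
    numbers = []
    for line in lines:
        if line.startswith("WKU"):  # same lines that grep ^WKU matches
            fields = line.split()
            if len(fields) > 1:
                numbers.append(fields[1])
    return numbers


def to_fixed_records(lines, record_len=RECORD_LEN):
    """Assumed 'fix' step: pad each line to record_len bytes (truncating
    longer ones) and concatenate them with no CR or LF."""
    out = bytearray()
    for line in lines:
        data = line.rstrip(b"\r\n")[:record_len]
        out += data.ljust(record_len, b" ")
    return bytes(out)


def split_records(blob, record_len=RECORD_LEN):
    """Cut a .fix file back into 80-byte lines for reading."""
    return [blob[i:i + record_len] for i in range(0, len(blob), record_len)]
```

For example, open pgb040406.gbk with encoding="latin-1" and pass the file object to wku_numbers to get the same count as grep -c ^WKU, plus the numbers themselves.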
------------------------------------------------------------------------
To see what's on one of the miwo20010xx CDs in Japan:
1) Log on as root on ipspat.
2) dce_login jasper
3) cd /dfs/prod/converters
4) ./essai /wo/miwo2001023/mimosa.rom | more   (or > miwo2001023.data to save the output)
   (Here in San Jose, you could run:
        cd /dfs/ips/converters
        essai /cd/wld92035.rom | more )
OR
1) Log on as root on ips01i.
2) dce_login jasper
3) cd /dfs/prod/converters
4) ./essai /cdrom/miwo2000029/mimosa.rom | more   (or > miwo2000029.data to save the output)
essai also generates a SpyFile in the current directory. If essai ran OK,
the SpyFile contains nothing of interest; if it did not, the SpyFile
contains debug information.
Below is some sample data for patent WO00139725A2, which has two priorities:
US199000164149, and a second one with a missing priority number, which the
IPS converter turned into 16 question marks ("????????????????").
The IPN converter turns missing APN numbers into a single dash ("-").
PN' 'WO 0139725 A2 20010607'
AN' 'US 0041961 20001108'
PR' 'US 60/164,149 19991108'
PR' 'US 20001108'
MC' 'A 61K '
DS' 'AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG'
RO' 'AP EA EP OA'
PA' 'L.A.M. PHARMACEUTICAL CORP.'
IN' 'DRIZEN, Alan'
IN' 'MICALIZZI, Michael'
RP' 'NATH, Gary, M.'
ET' 'DRUG PREPARATIONS'
FT' 'PREPARATIONS DE MEDICAMENTS'
ABE' 'Semisolid, sustained-release drug delivery compositions based on hyaluronic acid and its salts, and more particularly to the manufacture and use of such compositions.'
ABF' 'L'invention concerne des compositions semi-solides de médicaments à libération lente basé es sur l'acide hyaluronique et es sels, et concerne plus particulièrement la fabrication et l'utilisation de telles compositions.'
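The essai output above is easy to parse: each line is a tag, then ' ', then the value, then a closing quote. A hypothetical Python sketch, assuming one record per line (a long DS line wrapped by the terminal, as in the next example, would need rejoining first). priority_number shows how a missing priority number, i.e. a PR value with only a country and a date, could be mapped to the IPS placeholder (16 question marks) or the IPN one ("-"). It only extracts the number as printed; it does not reproduce how IPS normalizes it (e.g. US199000164149), and it is not the converters' code.

```python
IPS_MISSING = "?" * 16  # IPS placeholder for a missing priority number
IPN_MISSING = "-"       # IPN placeholder


def parse_essai(lines):
    """Turn essai lines of the form TAG' 'VALUE' into (tag, value) pairs."""
    records = []
    for line in lines:
        tag, sep, value = line.rstrip("\n").partition("' '")
        if not sep:
            continue  # not a TAG' 'VALUE' line (e.g. a "..." line)
        if value.endswith("'"):
            value = value[:-1]  # drop only the closing quote; inner quotes stay
        records.append((tag, value))
    return records


def priority_number(pr_value, missing=IPN_MISSING):
    """Number part of a PR value such as 'US 60/164,149 19991108'.
    A value with only a country and a date ('US 20001108') has no number."""
    fields = pr_value.split()
    if len(fields) < 3:
        return missing
    return " ".join(fields[1:-1])
```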
------------------------------------------------------------------------
Here's another example from the miwo2002008 DVD.
root@ipspat:/wo/miwo2002008>/dfs/prod/converters/essai mimosa.rom | head
PN' 'WO 0213592 A2 20020221'
AN' 'ES 0000313 20000807'
DS' 'AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ L
K LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW GH GM KE LS M
Z SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN G
L MR NE SN TD TG'
RO' 'AP EA EP OA'
PA' 'CREACIONES AROMATICAS INDUSTRIALES, SA'
PA' 'MARTINEZ ORTEGA, Alberto'
IN' 'MARTINEZ ORTEGA, Alberto'
RP' 'MANRESA VAL, Manuel'
ET' 'AROMATIZING PRE-MIXTURE TO BE INCLUDED AS ADDITIVE IN A COMPOUND FEED FOR SLAUGHTER ANIMALS AND METHOD FOR INCREASING THRODUCTIVITY OF SLAUGHTER ANIMALS'
FT' 'PREMELANGE AROMATISANT DESTINE A ETRE UTILISE COMME ADDITIF DANS DES ALIMENTS COMPOSES POUR ANIMAUX D'ABATTAGE ET PEDE PERMETTANT D'AUGMENTER LA PRODUCTIVITE DES ANIMAUX D'ABATTAGE'
...
Note that the EPO has chosen to list individual inventors twice: once as an applicant/assignee
(the "PA" line) and again as an inventor (the "IN" line). This explains why users in Japan see the individual
inventors in their bib-only "Applicant" section, but not in their "Assignee(s)" section of the IPN full-text
view. We don't use this data source for EP data; we get our EP data directly from WIPO.
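To find the individual inventors who are also listed as applicants in a batch of essai output, compare the PA and IN values. A minimal, hypothetical Python sketch, assuming the one-record-per-line TAG' 'VALUE' format shown above and exact string matches between the two lines:

```python
def applicant_inventors(lines):
    """Names that appear on both a PA (applicant) and an IN (inventor) line."""
    applicants, inventors = set(), set()
    for line in lines:
        tag, sep, value = line.rstrip("\n").partition("' '")
        if not sep:
            continue  # skip lines that are not TAG' 'VALUE'
        if value.endswith("'"):
            value = value[:-1]
        if tag == "PA":
            applicants.add(value)
        elif tag == "IN":
            inventors.add(value)
    return sorted(applicants & inventors)
```

For the miwo2002008 record above this returns ["MARTINEZ ORTEGA, Alberto"]; for the WO00139725A2 record it returns an empty list. It works on a file saved with essai ... > file.data (opened with encoding="latin-1", say), but it treats the whole file as one pool of names, so run it per patent if you need per-record results.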