When you're having search troubles, there are 3 programs you can invoke
from the command line to test: Verity, patquery, and patsearch.

patsearch is what the search forms (qadvanced.html, qbasic.html, or
qsimple.html) call on the main web server to do a search. patsearch is a
cgi-bin in NPO, but is now a fcgi-bin in Southbury.

patsearch calls patquery on the Verity web server (pointed to via the
$default_verity_server="ips05i"; line in cgi-bin/local.pl).

patquery runs on the Verity server as a fcgi-bin, does the search by
calling the equivalent of rcvdk, stores the hitlist in a DB/2 hitlist
(e.g. IPSRUN.HL_POOL0084), then returns.

patsearch then calls hitlist_sql.d2w to read the hitlist from DB/2 and
render it.

When tracking down a problem, you probably want to start at the bottom
with rcvdk, and work your way up to patquery, then patsearch.

===========================================================================

To test the Verity collection, see how to use rcvdk in my Verity help file.

===========================================================================

To run patquery from the command line, be logged onto the Verity server,

   cd /dfs/prod/ips/fcgi-bin        or /dfs/test/ips/fcgi-bin
   export PQDATABASE=patent
   export PQUSERID=ipsrun
   export PQPW=inst1_password
   export COLLPATH=/ips/coll
   export VDKHOME=/dfs/prod/verity/verity

Sample queries,

   ./patquery -g 5 -m 10 -c bibonly "thin film"
   ./patquery -g 7 -m 2000 -c coll_wo -i VdkVgwKey "pd=2000-11-09"
                                          or       "VdkVgwKey=WO00126442A2"

   -m        => Increases the maximum number of hits
   VdkVgwKey => The Patent Number Field. Other fields include
                CLAS = International Patent Class
                AD   = Application Date
                PD   = Publication Date
                DP   = Priority Date
                ASSG = First Assignee Name

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

If you get the following messages,

   CONNECT failed. db=patent, user=ipsrun
   ...
   Cannot connect to PATENT. SQLCODE = -1390
   Cannot disconnect from database. SQLCODE = -1390.

you can get rid of them by

   . /home/inst1/db2profile

The errors don't affect the search, but getting rid of them makes the
output cleaner. At least, I think this worked at one time, 'cause I wrote
it here, that's why. When we tried this on 8-15-2001, it didn't work,
though. Carol thought she knew why, but we didn't pursue it 'cause Sander
wasn't around.

===========================================================================

To run patsearch from the command line, be logged onto the main web server,

(Note: Since Carol's turned patsearch into a fcgi program, you would think
this no longer works. But ahhh, Carol is smart. She coded the fcgi
patsearch to work in both modes and if you look, you'll see
cgi-bin/patsearch -> ../fcgi-bin/patsearch.)

   cd /dfs/prod/ips/cgi-bin         or /dfs/test/ips/cgi-bin
   export REQUEST_METHOD=GET

   # For Japan, you'll have to authenticate via a web browser, and use
   #    http://ips06i.ips4db2.com/cgi-bin/echoit
   # to grab your session cookie, then use it below, e.g.
   export HTTP_COOKIE='USER_TYPE=F; SESSION_ID=44386,/gcLPZSPHltNHhYRmRI+xDj41WQyRD69/102gPLS2WL/GB9L4iNtqOoFcQsaMnVi'

Then export your query string. All of these sample queries work. Actually,
when patsearch runs, this query string comes in URL-encoded, so you'll
have a bunch of %28, %29, %2F, etc, but of course, it's easier to type
them in normally.

   export QUERY_STRING="-c=/ips/coll/coll_us&-e=&language=en&-g=4&-m=100&GENERAL=((system10function)ab)"
   export QUERY_STRING="-c=/ips/coll/coll_us&-g=4&GENERAL=systemab"
   export QUERY_STRING="-c=/ips/coll/coll_us&-e=&language=en&-g=4&-m=100&GENERAL=((05551212)UP)"

Then call patsearch,

   ./patsearch

===========================================================================

One way to debug patsearch on penguin (our development environment) is to

   cd /dfs/devel/ipn/htdocs
   cp advquery.html.en advquery.raj.html.en

modify line 24 from

   ... ACTION="/fcgi-bin/patsearch"
to
   ... ACTION="/cgi-bin/patsearch.raj"

Then

   cp /dfs/devel/ipn/fcgi-bin/patsearch /dfs/devel/ipn/cgi-bin/patsearch.raj

and add lines like these to /dfs/devel/ipn/cgi-bin/patsearch.raj

   `echo "At 22, \\\$position_of_field_value=>$position_of_field_value< and \\\$length_of_field_value=$length_of_field_value" >> /afs/d/u/jasper/Publicly-Writable/patsearch.log`;
   `echo "At 22, \\\$next_unconsumed=>$next_unconsumed< \\\$position_in_value=$position_in_value" >> /afs/d/u/jasper/Publicly-Writable/patsearch.log`;

To test, authenticate on penguin (https://penguin.delphion.com) and go to

   https://penguin.delphion.com/advquery.raj.html.en

From another window, you can monitor your logging by

   tail -f /afs/d/u/jasper/Publicly-Writable/patsearch.log

===========================================================================

Another debug technique is to use Carol's debug variable. On GENERAL
screens, you can append &d to the end of your search screen's URL. If it's
not a general screen though, you've got to set debug=1 in patsearch. If
set to 1, it won't erase the temporary files at /tmp/search.$PID.*

For example, you can see what patsearch sends to patquery in the
/tmp/search.44540.qs file (formatted for readability),

   &-a=
   &-c=bibonly
   &-g=1
   &-i1=VDKVGWKEY
   &-i2=PD
   &-i7=TITLE
   &-i8=SCORE
   &-l=advquery
   &-m=20
   &-o=SCORE
   &Field1_Type=RAW
   &Field1_Text=any+field+field
   &Field21_Type=IN
   &Field21_Text=jasper
   &Field22_Type=PA
   &Field22_Text=IBM
   &Field23_Type=TI
   &Field23_Text=ballistic-resistant+helmet
   &Field40_OP=%29+AND     \
   &Field44_Type=RAW        \
   &Field44_OP=%29+AND       \
   &Field45_Type=RAW          \   These lines
   &Field45_OP=%29+AND         \  look like
   &Field46_Type=RAW            \ they all
   &Field46_OP=%29+AND          / should be
   &Field47_Type=RAW           /  deleted.
   &Field47_OP=%29+AND        /
   &Field48_Type=RAW         /
   &Field48_OP=%29+AND      /
   &Field49_Type=RAW       /
   &Field49_OP=%29+AND    /
   &ft_type=

This was done from the Advanced Query page (advquery.html.en), typing
"any field field" in the "Any Field:" box, "jasper" in the Inventor box
(IN), "IBM" in the Assignee box (PA), and "ballistic-resistant+helmet" in
the Title box (TI).

You can set debug=2 to get even more stuff if you want.

===========================================================================

In patsearch, you'll find references to "GENERAL" versus "RAW" without
either term defined. Here's what they are, what they mean, and the
ramifications of each.

First of all, despite this confusing patsearch comment,

   # Whole question in GENERAL or RAW field

don't think that General is the opposite of Raw. It isn't. The two are
actually more alike than they are different.

There are three modes inside of patsearch,

   1) GENERAL, where your entire search string is in one variable called
      GENERAL,
   2) Specific (for lack of a better term), where there are sets of
      FieldNN_ variable names,
   3) and RAW, where we still have those same sets of FieldNN_ variable
      names, but the value of the FieldNN_Type variable is "RAW".

General: Used as an HTTP form variable name for the search string the user
specifies in some search boxes. For example, on the "Browse Codes" window,
there are two boxes, the first for IPC codes and the second for U.S.
codes, each looking like

   /------------------------------------------\    /--------\
   | baseball                                 |    | Search |
   \------------------------------------------/    \--------/
   This search string field is named GENERAL       This is the
   and has the value of your search string.        Action Button

The only other static example is the search-prior_art screen, but there
are many places where GENERAL is used in dynamic pages. For example,
patsearch returns your search-results with a generic input window, which
is already filled in with your filtered search string.

Specific: This mode isn't called "specific" in patsearch, but for lack of
a better term, I use it here. It is used extensively in the Advanced query
form (advquery.html.en), where you allow the users to enter their search
string in field-specific boxes, e.g.

             /------------------------------------------\
   Inventor: | jasper                                   |
             \------------------------------------------/
             /------------------------------------------\
   Assignee: | IBM                                      |
             \------------------------------------------/
             /------------------------------------------\
   Title:    | ballistic-resistant helmet               |
             \------------------------------------------/

Each box consists of 2-5 form variables. The required variables are

   Field1_Type   = The field to search, in our 3 example boxes above,
                   IN, PA, or TI.
   Field1_Text   = The search string to search.

The following variables are optional and are not often used.

   Field1_OP     = The Verity operator to use before this term if needed,
                   e.g. ") AND". Defaults to " AND " in patquery if it's
                   not at the beginning of the search string.
   Field1_Joiner = Usually null, but for date fields (not zones), can be
                   = > < >= or <=. Defaults to null in patquery.
   Field1_Prefix = Usually null, but might be "NOT", although I can't find
                   anywhere that this is used. Defaults to null in
                   patquery.

The idea behind this FieldNN approach (what I'm calling "specific") is
that we take the user's input and build up the properly-formatted search
string for them. This helps new users learn to build proper search
strings, because the search result screen they'll see next shows the
search string we've built for them.

The actual building of the search string from all these pieces is done in
patquery, and these comments from that code explain how the pieces are
combined:

   If FieldNN_Joiner is null, we join pieces like so to search zones,

      field.op field.prefix (field.text field.type)
      e.g.  AND NOT (ibm ASSIGNEE )

   else FieldNN_Joiner is not null, and we join pieces like so to search
   fields,

      field.op field.prefix (field.type field.joiner field.text)
      e.g.  OR (PD < 04/22/1996)

   Note the text/type reversal, which is used to search fields, e.g. dates.

Raw: For the input text windows that you want to allow more flexible
searches from, i.e. the full Verity syntax, you specify raw mode. Examples
include

   1) the main "Any Field:" window from the Advanced query form
      (advquery.html.en),
   2) the boolean screen (boolquery.html.en) shown below,
   3) a tricky example, the "Quick Text Search" window of the Quick/Number
      (simple.html.en) screen. If you don't check the "Limit to Inventors
      & Companies" checkbox, a piece of JavaScript sets the value of the
      Field1_Type variable to RAW.

For example, the boolean screen has the following sets of lines

            /-----------------------------\ /-----------------\
            | All Fields                  | | baseball        |
            \-----------------------------/ \-----------------/
   /-----\  /-----------------------------\ /-----------------\
   | and |  | All Fields                  | |                 |
   \-----/  \-----------------------------/ \-----------------/
   /-----\  /-----------------------------\ /-----------------\
   | and |  | All Fields                  | |                 |
   \-----/  \-----------------------------/ \-----------------/
   /-----\  /-----------------------------\ /-----------------\
   | and |  | All Fields                  | |                 |
   \-----/  \-----------------------------/ \-----------------/

Each row (except the first, which is missing the Field1_OP) consists of
three form variables, whose names are (NN is a number, e.g. 1 or 24)

   FieldNN_Type = The field to search or "RAW". This is a drop-down box
                  and if you select "All Fields" (or oddly, even one of
                  the section titles, e.g. "== US only fields follow ===")
                  then the value of this variable will be "RAW". This is
                  where the term "raw" comes from.
   FieldNN_Text = The search string to search.
   FieldNN_OP   = The "and", "or", or "and not" operator.
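
The joining rules quoted from patquery in the Specific section can be
sketched in a few lines. This is only an illustration of those two quoted
rules and the stated defaults, not patquery's actual code: the function
name join_piece and its "first" parameter are made up here, and the
spacing inside the parentheses is tidier than in the quoted example.

```python
def join_piece(field_type, text, op="", prefix="", joiner="", first=False):
    """Build one piece of a search string from a set of FieldNN_ values.

    Hypothetical helper: it mirrors only the two joining rules quoted
    from the patquery comments, nothing else patquery may do.
    """
    op = op.strip()
    # Stated default: OP falls back to AND unless this piece starts the string.
    if not op and not first:
        op = "AND"
    if joiner:
        # Joiner present: field search, type comes first, e.g. (PD < 04/22/1996)
        body = f"({field_type} {joiner} {text})"
    else:
        # Joiner null: zone search, text comes first, e.g. (ibm ASSIGNEE)
        body = f"({text} {field_type})"
    # Drop empty parts so a null prefix or a leading piece leaves no stray spaces.
    return " ".join(part for part in (op, prefix.strip(), body) if part)


if __name__ == "__main__":
    print(join_piece("ASSIGNEE", "ibm", op="AND", prefix="NOT"))
    print(join_piece("PD", "04/22/1996", op="OR", joiner="<"))
```

The two calls reproduce the zone example and the date-field example from
the patquery comments, which makes the text/type reversal easy to see side
by side.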

The Advanced & Boolean screens have other pull-down fields for date
searching that get quite tricky, using JavaScript to create sneaky -a
variables that bypass all of patsearch's filtering altogether. For
example, specifying a year of 2002 in the "Filed Date:" "From:" year field
of the advquery.html.en page generates these html variable name/value
pairs.

   Name           Value
   ===========    =================================
   Field46_Type   RAW
                  with its corresponding Field46_Text being undefined
                  (Critical that this is undefined, i.e. empty!)
   Field46_OP     ) AND
   and
   -a             &Field46_Text=AD%3e=2002-01-01
                  (Yes, with 3 trailing spaces)
                  This %3e is the > character, so this is AD>=2002-01-01

When patsearch comes across the Field46_Type variable, it fails the
subsequent "Is FieldN_Text string defined and not empty?" test (undefined
variables are treated as ""). Patsearch never even sees that the value of
the Field46_Type variable is RAW, and never gets to the filtering code.
Later on, it's patquery that puts all these pieces together.

This sneakiness to bypass patsearch's date filtering is quite amazing.
Amazing that it's built in, is used only to bypass patsearch's filtering,
and was NOT documented at all inside patsearch (it is now).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Note that "GENERAL" is a form variable's NAME, whereas "RAW" is a form
variable's VALUE.

For GENERAL, the search string is the value of the GENERAL variable.
For RAW, the search string is the value of the corresponding FieldNN_Text
variable.

Either GENERAL or RAW (just not "specific") indicates that the search
string can have the full Verity syntax and thus needs to be parsed more
carefully by patsearch. Unlike the "specific" mode (FieldNN_Type not equal
to "RAW"), you don't know which field or zone the user is searching in
until you parse ahead in the search string. Until you know which field is
being searched (if you ever do), patsearch cannot call the proper filter_*
subroutine.
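
The name-versus-value distinction, and the reason the -a date trick slips
past the filters, can be illustrated with a small sketch. It is not
patsearch's logic: the function name classify, the "skipped" label, and
the case-insensitive handling of variable names are all assumptions made
for the illustration.

```python
import re

# Matches FieldNN_Type names; treating them case-insensitively is an assumption.
FIELD_TYPE = re.compile(r"^Field(\d+)_Type$", re.IGNORECASE)


def classify(form):
    """Label each search input in a dict of form variables.

    Returns {"GENERAL": "GENERAL"} when a variable NAMED GENERAL exists,
    otherwise a dict like {"Field1": "RAW", "Field21": "specific"}.
    """
    lowered = {name.strip().lower(): value for name, value in form.items()}
    # GENERAL is recognised by the variable's NAME.
    if "general" in lowered:
        return {"GENERAL": "GENERAL"}
    modes = {}
    for name, value in lowered.items():
        match = FIELD_TYPE.match(name)
        if not match:
            continue
        number = match.group(1)
        text = lowered.get(f"field{number}_text") or ""
        if not text:
            # Undefined or empty FieldNN_Text: the type is never examined,
            # which is what lets the -a date variables bypass filtering.
            modes[f"Field{number}"] = "skipped"
        elif value.strip().upper() == "RAW":
            # RAW is recognised by the variable's VALUE.
            modes[f"Field{number}"] = "RAW"
        else:
            modes[f"Field{number}"] = "specific"
    return modes


if __name__ == "__main__":
    form = {
        "Field1_Type": "RAW", "Field1_Text": "any field field",
        "Field21_Type": "IN", "Field21_Text": "jasper",
        "Field46_Type": "RAW", "Field46_OP": ") AND",
    }
    print(classify(form))
    print(classify({"GENERAL": "baseball"}))
```

With the Advanced query values used earlier, Field1 comes out as RAW,
Field21 as specific, and Field46 as skipped, because its FieldNN_Text is
missing even though its type says RAW.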

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Don't get confused by the line in patsearch where it's checking the
variable's name for either GENERAL or RAW, viz

   if ( $input_name =~ / ^\s*GENERAL\s*$ | ^\s*RAW\s*$ /ix )

This is wrong. There is never a variable named RAW, so the only time this
test could be true is when you're in "GENERAL" mode, never "RAW". (I'll
fix this.)

Also confusing is the name of the filter_RAW subroutine. A better name
would be Parse_Full_Verity_Syntax.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Q: Why does the first, big, main "Any Field:" window from the Advanced
   query form (advquery.html.en) use the RAW approach and patquery use the
   GENERAL approach?

A: Because the stuff you type in the Advanced query form window can get
   all the extra stuff you might specify in those field-specific boxes
   tacked on.

Q: How about the "Browse Codes" and the search-prior_art windows? Why do
   they use GENERAL when they could use the FieldNN_ approach? You don't
   want to allow the full Verity syntax from these windows, do you?

A: Good question. They probably SHOULD be using the "specific" FieldNN_
   approach. My guess is that the original author didn't have the benefit
   of this wonderfully-lucid discussion and didn't understand all of this
   like you now do.

   And to be truthful, browse.html.en & search-prior_art.html.en don't
   actually use patsearch. They use their own patsearch knock-offs, e.g.
   cgi-bin/ipcsearch and cgi-bin/classsearch.

===========================================================================

This was in my change 1909 on March 12, 2004.

There are 5 search term filters in patsearch. All properly handle multiple
terms. These test cases are intended to test the parsing & filtering.

filter_KI for something KI:

   k*(ki)
      into  ((k,k?) (ki))
   (AA,BB,K*,CC)ki
      into  (AA,BB,(K,K?),CC)ki

filter_IC for something IC or MC or CLASS or MAINCLASS or INDEXCLASS:

   C09D 123/4IC
      into  (C09D 12304 IC)
   (c09d 123/*,c09d,c09d *1)ic and rose
      into  (((c09d 123?? c09d 123???),c09d,(c09d ??1?? c09d ??1???)) ic and rose)

filter_NC for something NC or CNC:

   084/423.Rnc
      into  084/423.Rnc
   junkclaims and (d5;ab*)(cnc)
      into  (junk claims and ((0d5???*,d05???*),(ab????*,0ab???*)) (cnc))

filter_UP for something UP:

   ((uspp3987 d1234 itmi26 mi26 de00026 04/1234567) up)
      into  ( (pp003987,d0001234,mi000026,mi000026,26,20041234567) up)

filter_PN for something PN or USREFS:

   us2003/1234A1pn
      into  (us20030001234A1pn)
   wo03/000204A2pn
      into  (wo03000204A2pn)
   wo2004/1pn
      into  (wo20040000001*wo00400001*wo04000001* pn)
   (usd123, uspp123_, wo123A)pn
      into  (usd0000123*,uspp000123_*,wo00000123A*)pn

===========================================================================
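
As a rough illustration of what one of these filters does, here is a
sketch of the wildcard expansion visible in the two filter_KI cases. The
rule is inferred from those two cases alone (a one-letter term followed by
* becomes the letter alone or the letter plus one more character); it is
not the real filter_KI, and the name expand_ki_wildcard is hypothetical.
It does not reproduce the extra parentheses and spacing that the real
filter adds in the first case.

```python
import re

# Inferred rule (an assumption): a single letter that is not part of a
# longer token, followed by "*", expands to "(X,X?)".
_KI_WILDCARD = re.compile(r"(?<![A-Za-z0-9])([A-Za-z])\*")


def expand_ki_wildcard(term):
    """Expand one-letter wildcards such as K* into (K,K?)."""
    return _KI_WILDCARD.sub(r"(\1,\1?)", term)


if __name__ == "__main__":
    for case in ("(AA,BB,K*,CC)ki", "k*(ki)"):
        print(case, "->", expand_ki_wildcard(case))
```

A table of input/expected pairs like the ones listed above is a natural
way to drive such a function when re-testing after a filter change.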