
March 05, 2010

Create biblio to research site node references

1) Query for biblio NIDs and associated research sites.

SELECT nid, biblio_custom3
FROM `alpha_biblio`
WHERE biblio_custom3 IS NOT NULL


2) Query for research site NIDs and names:

SELECT nid, title
FROM `alpha_node`
WHERE `type` LIKE 'research_site'
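Both result sets can be saved as CSV files for the matching step. The following is only a sketch of one way to do that from Python, not the method actually used here. The in-memory sqlite3 database and its sample rows are stand-ins for the real MySQL tables, and `export_query` is a hypothetical helper name.

```python
import csv
import io
import sqlite3


def export_query(conn, sql, out):
    """Run sql on a DB-API connection and write the result rows to out as CSV."""
    cur = conn.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    csv.writer(out).writerows(rows)
    return len(rows)


# Stand-in database: sqlite3 in memory, NOT the real MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alpha_biblio (nid INTEGER, biblio_custom3 TEXT)")
conn.execute("CREATE TABLE alpha_node (nid INTEGER, title TEXT, type TEXT)")
# Made-up rows for illustration only.
conn.executemany("INSERT INTO alpha_biblio VALUES (?, ?)",
                 [(101, "Site A / Site B"), (102, None)])
conn.executemany("INSERT INTO alpha_node VALUES (?, ?, ?)",
                 [(1, "Site A", "research_site"), (2, "About", "page")])

# In-memory buffers stand in for the two CSV files.
biblio_csv = io.StringIO()
sites_csv = io.StringIO()
n_biblio = export_query(
    conn,
    "SELECT nid, biblio_custom3 FROM alpha_biblio WHERE biblio_custom3 IS NOT NULL",
    biblio_csv)
n_sites = export_query(
    conn,
    "SELECT nid, title FROM alpha_node WHERE type LIKE 'research_site'",
    sites_csv)
```

With a real database, the connection would come from a MySQL driver instead of sqlite3, and the buffers would be replaced by open file handles.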

I am currently going back through the bibliography and fixing site names. Names containing parentheses are causing a problem, and I don't know why yet.

3) I used a Python script to match the biblio NIDs to the research site NIDs by name. In the process, I found and corrected a number of errors. It would have been more efficient to build the SQL into the Python script, but I didn't want to take the time to figure that out.


#!/usr/bin/python
# execfile("/blah/scripts/combineNIDS.py")
import sys, csv

try:
    fi1 = csv.reader(open('blah/biblioNID_withResearchSites.csv', 'r'), delimiter=',', quotechar='"')  # biblio NIDs
    fi2 = csv.reader(open('blah/researchSites_with_NID.csv', 'r'), delimiter=',', quotechar='"')  # research site NIDs
    outFile = open('blah/io/matchedNIDs.csv', 'w')
    fo = csv.writer(outFile)  # output document
    bibNID = list()
    pairedNID = [["bibNID", "siteName", "siteNID"]]
    siteList = list()
    pairedNIDCounter = 0

    # this loop pairs each research site with the NID of the respective bibliography entry
    for row in fi1:  # each row is a bib entry
        a = row[1].split('/')  # separate research sites for a given bib entry
        for site in a:
            bibNID.append([row[0], site.strip()])  # each site in the bib, with the biblio NID
    for row in fi2:  # create a research site list we can work with
        siteList.append(row)

    for bibSite in bibNID:  # grab a site from a publication
        pairedNIDCounterBeforeTest = pairedNIDCounter  # see below
        for aSite in siteList:  # grab a research site
            # print(aSite)
            if bibSite[1].strip() == aSite[1].strip():  # do the two sites match?
                pairedNID.append([bibSite[0], aSite[1], aSite[0]])
                pairedNIDCounter += 1
                # print(pairedNIDCounter)
        # if the counter did not change, this biblio site has no matching research site, which means something is wrong
        if pairedNIDCounter == pairedNIDCounterBeforeTest:
            print(bibSite)  # print the biblio site that doesn't match a research site
    fo.writerows(pairedNID)  # these are the matched NIDs
    outFile.close()  # make sure the output is flushed to disk

except IOError:
    print('Can\'t open file for reading.')
    sys.exit(1)
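The nested loops above compare every biblio site against every research site. As an alternative sketch (not what was run for this post), the same matching can use a dictionary keyed on the stripped site name. The function name `match_sites` and the sample rows are hypothetical, and the sketch assumes each research site title is unique.

```python
def match_sites(biblio_rows, site_rows):
    """Pair biblio NIDs with research site NIDs by exact (stripped) site name.

    biblio_rows: [biblio NID, '/'-separated site names] pairs
    site_rows:   [site NID, site title] pairs
    Returns (matched, unmatched).
    """
    # Assumes titles are unique; a duplicate title overwrites the earlier NID.
    site_nid_by_name = {}
    for nid, title in site_rows:
        site_nid_by_name[title.strip()] = nid

    matched, unmatched = [], []
    for bib_nid, site_field in biblio_rows:
        for name in site_field.split('/'):
            name = name.strip()
            if name in site_nid_by_name:
                matched.append([bib_nid, name, site_nid_by_name[name]])
            else:
                # No research site with this name: something needs fixing.
                unmatched.append([bib_nid, name])
    return matched, unmatched


# Made-up sample rows for illustration only.
biblio_rows = [["101", "Site A / Site B"], ["102", "Site C"]]
site_rows = [["1", "Site A"], ["2", "Site B "]]
matched, unmatched = match_sites(biblio_rows, site_rows)
```

Unlike the script above, this version returns the unmatched names as a list instead of printing them, and it writes the stripped site name rather than the raw title into each matched row.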

4) I then used the following SQL files to load the data into the database:
insert_into_bib_person_ref.sql
insert_into_search_node_lines.sql

Posted by kkwaiser at March 5, 2010 02:35 PM
