« Automatically creating nodes | Main | Tackling the Gazetteer »

February 12, 2010

Create Node References via database insertions

The goal here is to create a lot of node references that point from publications held within biblio to a CCK content type, Person. Here are some steps:


1) Upload the bibliography to Drupal. Add a node reference field to the biblio content type. I called this field "bib_person_reference".

2) Go to the database (I'm using phpMyAdmin) and run this query to pull out the author names. These will be used (with Node Import) to create the Person instances:

SELECT biblio_contributor_data.cid, biblio_contributor_data.name, biblio_contributor_data.lastname, biblio_contributor_data.firstname, biblio_contributor_data.initials

FROM biblio_contributor_data

3) Export the previous query to a csv file. Here are the first rows:


Table A.
"cid","name","lastname","firstname","initials"
"1","Anonymous","Anonymous",,
"2","Abdel-Malek, Emile Tadros","Abdel-Malek","Emile Tadros",
"3","Ackert, James E.","Ackert","James","E"
"4","Wadley, F. M.","Wadley",,"F M"

Note: I saved this file as "biblio_author_dump.csv"

4) Prep the author information for Node Import. The Node Import module has a CSV formatting requirement for inputs. Here's what an example of my formatting looks like in a spreadsheet. I then save this with comma delimiters and quotes all around.

Table B.
uid type Full name Last Name First Name Middle Name Comment settings Published Promoted to front page Sticky at top of lists Authored on
1 Person Anonymous Anonymous 0 1 0 0 2010-02-25 14:00:00
1 Person Abdel-Malek, Emile Tadros Abdel-Malek Emile Tadros 0 1 0 0 2010-02-25 14:00:00
1 Person Ackert, James E. Ackert James E 0 1 0 0 2010-02-25 14:00:00
1 Person Wadley, F. M. Wadley F M 0 1 0 0 2010-02-25 14:00:00

Note: I called this file person_create_import.csv. Also note that I tangled with the names a *bit* because the biblio module uses a different name schema than I do.

5) Import and create the Person nodes. A trick here is that I kept each person's full name (aka Node Title) exactly the same as the biblio contributor (author) full name. See the next step for why this matters.

6) Now, match the Person NID's with the Author NID's. I am able to use LEFT JOIN because the Person Title (i.e., full name) is identical to the contributor name (i.e., author name). I was extremely happy when I figured this out.


SELECT biblio_contributor.nid, biblio_contributor.vid, biblio_contributor_data.cid, biblio_contributor_data.name, node.nid, node.title

FROM biblio_contributor_data

LEFT JOIN node ON node.title = biblio_contributor_data.name
LEFT JOIN biblio_contributor ON biblio_contributor.cid = biblio_contributor_data.cid

7) I put the results from the last query in OpenOffice calc. Notice that I sorted this data by the biblioNID (it was sorted by the author name before.)

Table C.
nid vid cid name nid title
2824 2824 1 Anonymous 5643 Anonymous
2825 2825 2 Abdel-Malek, Emile 5644 Abdel-Malek, Emile
2826 2826 3 Ackert, James E. 5645 Ackert, James E.
2826 2826 4 Wadley, F. M. 5646 Wadley, F. M.

8) Next, format the data in Table C so that they can go into an SQL statment. In OO Calc, I created two additional sheets that match the tables I will insert data into:

 Table D.
Name:  alpha_search_node_links
       (biblioNID)              (personNID)   (Full Name)
            sid        type         nid        caption
            2824       node	   5643	      Anonymous
            2825       node        5644	      Abdel-Malek, Emile Tadros
            2826       node	   5645	      Ackert, James E.
Name: alpha_content_field_bib_person_reference
            (biblioNID)          	(personNID)
	vid	nid	delta	field_bib_person_reference_nid
	2824	2824	  0		5643
	2825	2825	  0 		5644
	2826	2826	  0		5646
	2826	2826	  1		5645
The delta denotes the number of authors a publication has. Delta = 0 denotes the first author, Delta = 1 denotes the second author. In this example, publication 2826 has two authors whose node ID numbers are 5646 and 5645.


9) Format the contents of Table D to create the SQL query and then run it (I used phpMyAdmin for this.) Here are two snippets of what the queries looked like in the end:

Insert data into the table content_field_bib_person_reference. This table was created when I added a node reference field to the biblio content type:


INSERT INTO `database`.`content_field_bib_person_reference` (
`vid` ,
`nid` ,
`delta` ,
`field_bib_person_reference_nid`
)

VALUES (

'2824' , '2824' , '0' , '5643' ),
( '2825' , '2825' , '0' , '5644' ),
( '2826' , '2826' , '1' , '5645' ),
( '2826' , '2826' , '0' , '5646' )

;


Insert data into the table search_node_links:

INSERT INTO `database`.`search_node_links` (
`sid` ,
`type` ,
`nid` ,
`caption`
)
VALUES (

'2824' , 'node' , '5643' , 'Anonymous' ),
( '2825' , 'node' , '5644' , 'Abdel-Malek, Emile Tadros' ),
( '2826' , 'node' , '5645' , 'Ackert, James E.' ),
( '2826' , 'node' , '5646' , 'Wadley, F. M.' )

;


Note: Some of the names have apostrophes (e.g. 0'Neill). To keep this from interfering with the SQL syntax I added an escape character (e.g., O\'Neill)

10) Finally, clear the memory cache (admin/settings/performance > Clear Cached Data)

11) View a couple of bibliography entries to make sure everything went smoothly.


Note:  The text below is scrap.  It's being saved just in case it comes in handy later.




I ran this query to pull out the person's name and NID (referred to as personNID below):

SELECT node.nid, node.title
FROM node
WHERE node.type = "person"

7) Run the following query to get the author names, their id's (CID) and the id (NID or biblioNID, below) of the publications they are associated with.

This query extracts data from two biblio tables: biblio_contributor_data and biblio_contributor.

SELECT biblio_contributor.nid, biblio_contributor_data.cid, biblio_contributor_data.name, biblio_contributor_data.lastname, biblio_contributor_data.firstname, biblio_contributor_data.initials

FROM biblio_contributor, biblio_contributor_data

WHERE biblio_contributor.cid = biblio_contributor_data.cid

8) Export the previous query to a csv file. Here are the first rows:

 
Table A.
"nid","cid","name","lastname","firstname","initials"
"2824","1","Anonymous","Anonymous",,
"2825","2","Abdel-Malek, Emile Tadros","Abdel-Malek","Emile Tadros",
"2826","3","Ackert, James E.","Ackert","James","E"


7) I also ran this query to get the NID's of the biblio entries (biblioNID) and their associated contributor (author) ID's, or CID's.

SELECT nid, vid, cid
FROM biblio_contributor
ORDER BY nid

This produced the following:

Table B.

nid vid cid
25 25 1
26 26 1
27 27 1
28 28 1
29 29 1
30 30 3
30 30 2
31 31 1
32 32 1
33 33 4
34 34 5

Notice below that I have associated the CID's (i.e., the author identifier used by biblio, with the personNID (i.e., the identifier for each person I created.) I got the personNID when I queried the test_person content type. Because the list of persons and the list of contributors are identical, all I did to associate the two was an alphabetical sort and then a paste!

Table C.

cid name lastname firstname initials personNID
1 Gates, Frank C. Gates Frank C 2982
3 Nichols, G. E. Nichols G E 2984
4 Barnes, B. V. Barnes B. V. 2985

For a quick summary, a unique personNID is associated with each cid (personNID -> cid) and the cid is associated with every publication for a given author (cid -> biblioNID). This tutorial was helpful in associating each personNID with the appropriate biblioNID's.

Using OpenOffice Calc, I put Tables B and C on the same worksheet. I added a column to the right of Table A labeled personNID. I filled that column with this equation:

=INDEX($F$2:$K$1746;MATCH(C2;$F$2:$F$1746;0);MATCH($D$1;$F$1:$K$1;0))

$F$2:$K$1746 -> This references the dimensions of Table C.

MATCH(C2;$F$2:$F$1746;0) -> This matches the CID from Table B to the CID from Table C and returns the spreadsheet row number

MATCH($D$1;$F$1:$K$1;0) -> This returns the column number from Table C that holds the personNID.

When the row number and column number are put together, INDEX returns the value of the given cell, the personNID. This matches the biblioNID with the correct personNID(s).

2) Run the following query to get the author names, their id's (CID) and the id (NID or biblioNID, below) of the publications they are associated with. I did this through PHPMyAdmin.

This query extracts data from two biblio tables: biblio_contributor_data and biblio_contributor.

SELECT biblio_contributor.nid, biblio_contributor_data.cid, biblio_contributor_data.name, biblio_contributor_data.lastname, biblio_contributor_data.firstname, biblio_contributor_data.initials

FROM biblio_contributor, biblio_contributor_data

WHERE biblio_contributor.cid = biblio_contributor_data.cid

3) Export the previous query to a csv file. Here are the first rows:

 
Table A.
"nid","cid","name","lastname","firstname","initials"
"2824","1","Anonymous","Anonymous",,
"2825","2","Abdel-Malek, Emile Tadros","Abdel-Malek","Emile Tadros",
"2826","3","Ackert, James E.","Ackert","James","E"

Posted by kkwaiser at February 12, 2010 09:01 AM

Comments

Login to leave a comment. Create a new account.