December 14, 2009
Plan to create a UMBS controlled (keyword) vocabulary
Here is an outline of the steps needed to establish a keyword list for the UMBS.
1. Extract keywords from the UMBS bibliography; this will be the starting point.
- This list is not comma-delimited, so multi-term keywords will need to be identified manually, and scripts will then be needed to reformat the terms and add commas.
2. Parse the raw keyword list into three parts:
- a) keywords redundant with the LTER list (including synonyms and lexical variants)
- b) taxonomic descriptors (Latin names and species-specific common names?)
- c) candidate-keywords for a UMBS keyword list
3. Build the UMBS keyword list from the candidate-keyword list:
- Decide how to treat hyphens, spaces and plurals
- Declare lexical variants equivalent (e.g. analyze vs. analyse)
- Identify synonyms
- Remove candidate-keywords that require context to make sense (e.g. "change", "description")
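The three steps above could be scripted roughly as follows. This is only a minimal sketch, not a finished tool. Every term list in it is a hypothetical stand-in (nothing is taken from the actual UMBS bibliography or the LTER list), the function names are made up for illustration, and turning hyphens into spaces is just one possible policy. Plurals and synonyms are left alone because the plan has not yet settled how to handle them.

```python
import re

# Hypothetical stand-in data; none of it comes from the real UMBS or LTER lists.
MULTI_TERM = ["climate change", "sugar maple"]   # multi-term keywords identified by hand
LTER_TERMS = {"climate change", "nitrogen"}      # placeholder for the LTER list
TAXA = {"acer saccharum", "sugar maple"}         # placeholder taxonomic descriptors
VARIANTS = {"analyse": "analyze"}                # lexical variants declared equivalent
CONTEXT_DEPENDENT = {"change", "description"}    # terms that need context to make sense


def split_keywords(raw, multi_term):
    """Step 1: split a space-delimited string, keeping known multi-term keywords whole."""
    text = raw.lower()
    # Protect longer phrases first by temporarily joining their words with "_".
    for phrase in sorted(multi_term, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, phrase.replace(" ", "_"), text)
    return [token.replace("_", " ") for token in text.split()]


def normalize(term):
    """Step 3 (partial): assumed policy of hyphens to spaces, single spacing, variant mapping."""
    term = re.sub(r"[-\s]+", " ", term.strip().lower())
    return VARIANTS.get(term, term)


def classify(term):
    """Step 2: decide which of the three parts a term belongs to."""
    if term in LTER_TERMS:
        return "lter"
    if term in TAXA:
        return "taxonomic"
    return "candidate"


def build_lists(raw):
    """Run steps 1 to 3 on a raw keyword string and return the three parts as sets."""
    parts = {"lter": set(), "taxonomic": set(), "candidate": set()}
    for term in split_keywords(raw, MULTI_TERM):
        term = normalize(term)
        parts[classify(term)].add(term)
    # Drop candidates that only make sense in context.
    parts["candidate"] -= CONTEXT_DEPENDENT
    return parts


def to_csv_line(terms):
    """Write a set of terms as a comma-delimited line."""
    return ", ".join(sorted(terms))


if __name__ == "__main__":
    raw = "Climate change sugar maple analyse description nitrogen phenology change"
    for name, terms in build_lists(raw).items():
        print(name + ": " + to_csv_line(terms))
```

Checking the LTER list before the taxonomic list is an arbitrary choice in this sketch; a term that appears in both would need a rule decided by whoever maintains the vocabulary.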
Savoy, J. (2005). Bibliographic database access using free-text and controlled vocabulary: An evaluation. Information Processing & Management, 41(4), 873-890.
Svenonius, E. (1986). Unanswered questions in the design of controlled vocabularies. Journal of the American Society for Information Science, 37(5), 331-340.
Svenonius, E. (2003). Design of controlled vocabularies. Taylor & Francis.
Posted by kkwaiser at December 14, 2009 10:59 AM