October 31, 2011

Image Transfer Fun

Time to move images from here to here. This here is a list (and a recursive link!):

Douglas Lake Region

- The data set Property Boundaries of UMBS - incorporates this image
- The data set Soils of UMBS - incorporates this image

Not transferred (yet):
- The UMBS Campus or siteuse images.
- Gorge trails
- Mapped trees of UMBS
- NAIP orthophoto - must boost file upload size!!

Not going to transfer:

- the satellite images of UMBS because of copyrights. This topo map is also copyrighted but it must apply only to the digital version used here.
- DEM of all of N. Michigan - need to get one subsetted to the UMBS property to accompany our DEM data set.

Sugar Island

- The Sugar Island Research Site received the following Osborn and Public Lands and Land Use

Not transferring

- Copyrighted topo map of SI

Ecosystem and Cover Type Maps

- The data set Major and Minor Landforms of UMBS - incorporates this
- The data set Vegetative Cover Types of UMBS - incorporates this UMBS image and a similar Colonial Point image.
- The data set Ecosystems of UMBS - incorporates this UMBS image and the Colonial Point image.

Not transferred (yet):
- Aerial photo of Colonial Point.

USGS and Historical Maps

- Not transferring these as of now.

Posted by kkwaiser at 01:32 PM | Comments (0) | TrackBack

October 28, 2011

Bibliography Criteria

I asked Bob to elaborate on how decisions to include a publication in our bibliography are made. Here is what he said:

The historical criteria for adding a publication is: 1) whether it resulted from research while that person was at the station and 2) whether it is from the Station area.

But as I was looking through all the publications there were quite a few that were about different areas of the world. So basically the criterion became whether that person worked on the publication here (wrote it, or did the statistics, etc.). So a publication wouldn't be part of our bibliography if it was just from our area (as Scott Herron's probably are). That is why I created the nmich bibliography (there were, for example, huge numbers of archaeological publications done by other people but in our area). A publication would be part of our bibl if someone was on sabbatical here (like Jordan Price) but wrote about research that they had done in other parts of the world.

Here is an additional criterion dealing with grey literature:

Articles like the Gough and Nave 2011 (a FluxNet Newsletter article) should be included in the UMBS Research Bibliography if they meet the following criteria:

- They add unique value to the Bibliography. In this case, a full accounting of a signature UMBS research project qualifies.

- The bibliographic entry can be distinguished from peer-reviewed entries within the bibliography. Although not ideal, I've categorized this entry as a Magazine Article as the grey literature implication is clear.

Here is the publication:

Posted by kkwaiser at 09:02 AM | Comments (0)

October 27, 2011

Remove (slice) pages from a pdf in Ubuntu

Taken from an Xmas post elsewhere on the internet:

$ pdftk [input.pdf] cat [pg-pg] [pg-pg] output [output.pdf]

$ pdftk FluxLetter_Vol4.pdf cat 4-7 output FluxLetter_Vol4_slice.pdf
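
Worth noting: `cat` keeps the page ranges you list, so removing pages really means listing everything else. Here is a minimal Python sketch of that bookkeeping. The helper names, the 12-page count and the output filename are made up for illustration, and the code only builds the argument list rather than running pdftk.

```python
def pages_to_keep(total_pages, remove_first, remove_last):
    """Return pdftk page ranges covering every page except remove_first..remove_last."""
    if not 1 <= remove_first <= remove_last <= total_pages:
        raise ValueError("pages to remove must lie within the document")
    ranges = []
    if remove_first > 1:
        ranges.append(f"1-{remove_first - 1}")
    if remove_last < total_pages:
        ranges.append(f"{remove_last + 1}-{total_pages}")
    if not ranges:
        raise ValueError("refusing to remove every page")
    return ranges


def pdftk_cat_args(input_pdf, ranges, output_pdf):
    """Build the argument list for: pdftk input.pdf cat <ranges> output output.pdf"""
    return ["pdftk", input_pdf, "cat", *ranges, "output", output_pdf]


# Hypothetical 12-page newsletter with pages 4-7 dropped instead of kept.
args = pdftk_cat_args(
    "FluxLetter_Vol4.pdf",
    pages_to_keep(12, 4, 7),
    "FluxLetter_Vol4_trimmed.pdf",
)
print(" ".join(args))
```

If pdftk is installed, the list could be handed to `subprocess.run(args, check=True)`.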

Posted by kkwaiser at 08:51 AM | Comments (0) | TrackBack

October 26, 2011

Information Specialist Internship

Position overview: Work on projects central to University of Michigan Biological Station's data management goals. Specifically, this position will work closely with the UMBS Information Manager and affiliated researchers to identify and archive completed data sets.

Responsibilities: Communicate with and receive metadata and data from researchers, harvest metadata from peer-reviewed literature and dissertations, enter metadata into a Drupal-based information management system, run quality control on incoming data sets.

Required skill sets: Knowledge of metadata standards, familiarity with data quality assurance and quality control practices, ability to communicate efficiently with scientists on a wide variety of research topics, exceptional organizing skills, ability to balance need for detail with overarching program goals.

Desired skill sets: Experience with the Drupal content management system, knowledge of a scripting language (e.g., PHP, Python, R).

Posted by kkwaiser at 02:36 PM | Comments (0) | TrackBack

IGERT-BART Data Project

IGERT-BART Data Legacy Project - UMBS received two successive rounds of IGERT funding for graduate student researchers. That means a lot of data sets were created and, lo-and-behold, where are they? Let's map how we might find this out:

1. Collate list of BART fellows, dates at UMBS, topics researched, publications produced, contact information, likely data sets, etc.

2. Build "most wanted data" target by prioritizing list 1.

3. Contact BART fellows to request data submission.

4. Conduct data interviews with BART fellows to establish number and scope of potential data sets.

5. Metadata entry - harvested from publications/dissertations and correspondence with researcher

6. Data quality control - iterate with researcher to develop a final, archivable version of the dataset

Posted by kkwaiser at 02:00 PM | Comments (0) | TrackBack

Prioritizing Data Management Goals

Question: If the UMBS Information Manager could have an intern/assistant for at least a semester, what would that person do? Which goal is the most pressing and would lead to the greatest payoff?

Potential Goals:

Data Forensics project - identify completed research projects with unarchived, high quality data sets and begin placing them into the Research Gateway

Target IGERT-BART Data - In some ways a subset of the Data Forensics Project but with a greatly restricted target population and timespan.

Research Gateway Rampage - Crank away on improvements (i.e., New Table Wizard module, Drupal 7 migration)

Current Research Metadata Entry - Work to increase contributions from ongoing research projects

Housework - Monitor incoming housing applications, update database, search for and add new publications

Great exercise. Now let us order by priority:

1. Target IGERT-BART Data - In some ways a subset of the Data Forensics Project but with a greatly restricted target population and timespan.

2. Current Research Metadata Entry - Work to increase contributions from ongoing research projects

3. Housework - Monitor incoming housing applications, update database, search for and add new publications

4. Research Gateway Rampage - Crank away on improvements (i.e., New Table Wizard module, Drupal 7 migration)

5. Data Forensics project - identify completed research projects with unarchived, high quality data sets and begin placing them into the Research Gateway

1-3 could conceivably be part of a single internship although 1 warrants more than a semester while 2 and 3 are ongoing.

Posted by kkwaiser at 01:12 PM | Comments (0) | TrackBack

Crop PDFs in Ubuntu

Another helpful snippet from the interwebs:


$ pdfcrop [input.pdf] [output.pdf]

I don't remember exactly how I installed pdfcrop but this command is in my terminal history:

$ sudo apt-get install texlive-extra-utils

Posted by kkwaiser at 11:20 AM | Comments (0) | TrackBack

October 11, 2011

Cool Things - Part I

Application of profiling buoys to determine whole lake metabolism in stratified lakes - P. Staehr

- Accounting for physical changes between depths
- Respiration and NPP higher in metalimnion
- High variability of biological processes driven by mixed-layer depth and light attenuation

Deep chlorophyll maxima - Kevin Rose
- Implications for ecosystem structure and metabolism estimates
- UV and PAR with manual profiler
- Chlorophyll peaks line up with 1.7% PAR and form a 1:1 linear relationship (R^2 = 0.98)
Would help to know what deep chlorophyll maxima are

- Looking for PAR and UV profile data

Lake Lillinonah - New Site - Jen Klug
- New buoy on reservoir
- Lake association based (funded) operation
- Hurricane Irene and Tropical Storm Lee and additional storms have completely mixed the lake

Unnamed presentation shall remain unnamed
- I'm going to skip this one but will bet $5 that the speaker goes way over the time limit. Any takers?
- Rainfall increase increases UV absorbance due to accompanying increase in fresh DOC.
- Moderator is standing and the speaker is going strong. Double or nothing?!?!?
- No time for questions, I win the day.

Earth Microbiome Project - Ashley Shade
- Interesting side note: Ashley made the shortlist for an EEB faculty position a few years ago. She's now post-docing at Yale University.
- EMP Goal: characterizing microbial life on earth
- Four products - Gene Atlas, Assembled Genomes, Metabolic Reconstruction, Visualization Portal
- Opportunity to get microbes at UMBS sequenced

Novel approaches for assessing ecosystem metabolism and greenhouse gas fluxes at the Experimental Lakes Area - Scott Higgins
- Plug for Experimental Lakes Area
- NW Ontario research area with ~40 years of data
- Have sediment flux chambers deployed
- Souped-up garden hose rack hooked to a data logger which controls raising and lowering of the sensor package

Analysis of future scenarios of simulated water temperature data using lake analyzer - Nihar Samal

- Use climate model predictions to guesstimate future water temps and feed into Lake Analyzer to model physical structure of the lake
- Interesting combination of climate change models and limnological models. However, models derived from models of models yield skepticism - error propagation, anyone?
- Schmidt Stability increases 15-20%

CALON project: Circumarctic Lakes Observation Network - John Lenters
- Note: Lenters also has projects on Lake Superior with collaborators from N. Michigan University
- This is a new site report, next GLEON should have data
- Arctic Observing Network - NSF/NOAA initiative, funding CALON
- One site in Barrow, Emaiksoun Lake and one other

Meta: We've two talks left and it is now lunchtime :|
I pledge to keep my talk very short - which probably won't matter because I am slated to go last tomorrow and we'll almost surely be 20 minutes behind by then.

Meta II: Nevermind, they're cancelling talks to get us to lunch.

Reconciling differences in the temperature-dependence of ecosystem respiration across time scales and ecosystem types

Complete aside: internet speed appears to decrease when boring talks are given - Gabriel Yvon-Durocher

- take home: this person is doing a nice job of linking terrestrial carbon to aquatic ecosystems

Posted by kkwaiser at 10:51 AM | Comments (0) | TrackBack

Working Group Reports

It's a live-blog people. Buckle up!

Lake Metabolism Working Group

- Comparing GPP, R and NEP among 25 lakes
- Mendota - night time metabolism
- Linking chlorophyll and metabolism
- Diel patterns in respiration (identifying different drivers)
- Physical effects on lake metabolism (Rose et al)
- Newest = high-frequency DO with Chl fluorescence data
- No UMBS data

Signal Processing Working Group
- Scales of variability in chl fluorescence
- inter-lake variability
- linking with biological variables
- Established QA/QC methods
- Day/night fluorescence relationships
- Chl A variance at monthly/daily scales (Jim Rusak)
- Disturbance impacts on Chl A
- No UMBS data

Climate and Lake Physics Working Group
- Lake Analyzer consumes temp profiles and met variables, and yields thermocline depth, Schmidt stability, water density and other variables
- expansion planned: QA/QC
- Physical variability of lakes (K. Rose)
- Convective/wind mixing - causes of turbulence (J. Read)
- Extreme events synthesis
- Yes UMBS data

Lake Ecosystem Modelling Working Group
- Multi-model comparisons
what kind of model?
I don't know
- Other stuff goes here.
- PhD student opportunity in Denmark, contact Dennis

The Theory Group Working Group
- Award for non-name name
- Phytoplankton dynamics across different types of lakes
- Use lake physical parameters and link to phytoplankton assemblage dynamics
- Use low-res manual sampling as well
Sounds like community ecology, no?
- Seems course/REU data from UMBS could fit into this working group
- Need phytoplankton assemblage on monthly scale, currently with 3 lakes and not ready to add more :|
- Will be assessing data strengths

IT Working Group
- received supplemental funds to improve information management practices
- Looking to form task force to prioritize resource allocation (not much else here, more tomorrow?)
- Existing infrastructure:
- Data from 44 sites (Douglas Lake included) -> Plotting tools via Vega Database
- LakeBase (note to self: get DL metadata into this) - basic descriptors
- People Database - GLEON membership info

Unnamed Working Group Shall Remain Unnamed Working Group
- Microbial Working Group
- Microbes under ice!
- Autonomous detection/sampling devices
- Need DNA samples from Lakes (they will analyze for you)
- Currently done at high altitude, western lakes (K. Rose)
- Will be developing sampling protocols/policies
- First samples in 2012
- Earth Microbiome Project (Ashley)

Episodic Events Virtual Group Working Group
- Manuscript in press: Impacts of weather related episodic events in lakes: an analysis based on high frequency data

Posted by kkwaiser at 08:38 AM | Comments (0) | TrackBack

October 07, 2011

Data Security Resources

A list of what a proper resource solution should provide in order to meet recommended data security practices. This is a proposed organization.

Data Classification

There are a host of documents available through IIA but I'm relatively familiar with them so will skip. The following are one-off documents that appear to be somewhat homeless but still may be useful.

A 2008 memo on sensitive data handling. To summarize, no sensitive data on desktop computers, removable storage or email.

Guidelines for the Contract for Obtaining Sensitive Data from the Toledo Adolescent Relationships Study - an example of a highly specific data handling agreement that covers collaborators, backups, replication, destruction, transmission and other facets of data storage.

Protecting Confidential Data on Personal Computers with Storage Capsules - A paper by UM researchers on a method for isolating sensitive data on a desktop computer from malware that may reside on the computer.

Criteria for sensitive data protection plans
- storage requirements for sensitive data, derived data is addressed, network solutions need not apply.

Research Data Strategy: Considerations of the Blue Ribbon Panel - Interesting snippet:

"Data is often not classified leading to data either being over protected because everything is treated like sensitive data or everything is under-protected by treating everything as public data"

Appropriate Data Storage Solution

Should cover:

Access Authentication
Access Authorization
Access/Activity Logging
Account Management
Password Management
Disaster Recovery/Business Continuity Plan

Available Resources:

East Hall's IT group has a pretty good list up. Sensitive data seem to be a deal breaker, however. Value Storage's FAQ states that "is not intended for data that is [sic] considered sensitive, private/confidential or critical to the operation of the university. Value Storage may be considered for such data when the customer environment is tightly managed according to the guidance provided below."

Mainstream Storage's Service Level Agreement recommends users "exercise caution when storing sensitive data in Mainstream Storage space."

Encryption Solution

Should cover:

Digital Media Protection

Available Resources:
SafeComputing on Mobile Device Security (MDS) appears to be the best single UM-derived resource. Includes webcast walkthroughs on protecting data in motion and at rest.

White Paper on MDS. See page 3 for practical recommendations.

A more exhaustive take on MDS targeting IT folks is also available from this site.

Backup Solution

Should Cover:
Backup Requirements
Disaster Recovery/Business Continuity Plan

Available Resources:
As I understand it, Tivoli Storage Manager (TSM) Backup Service will be available for researchers within LSA soon.

This should assist in meeting Disaster Recovery needs because TSM "has full UPS redundancy, enhanced electrical systems, fire protection, security systems, and environmental alarms...[and] is replicated"

Physical Security Solution

Should cover:

Physical Security - Mandatory
Physical Security - Recommended

Not much right now. A likely outdated document with contact information identifying who to contact if, for example, you want to put in a key request at the LSA. I have a feeling this doesn't apply at the unit level in all instances.

Don't Require Solution

Should cover:
Third Party Data Handling
Audit/Review (of applicable procedures)

Training Opportunity Solution

Should Cover:
Training and Awareness of Data Handling and Applicable Regulations



Notes from LSA IT on secure server configuration available here.

Posted by kkwaiser at 10:06 AM | Comments (0) | TrackBack

October 06, 2011

Vague post is vague

Protection Category -> e.g. Backup requirements, access authorization
Sensitivity Level -> Vocabulary -> High, Medium, Low -> Indicates the sensitivity level(s) a prescribed action addresses
Protection Requirement -> Prescribed security actions given a Sensitivity Level and Protection Category

Protection Unit -> The University of Michigan unit responsible for administering the referenced protection resource
Protection Resource -> The University of Michigan resource which can be used to meet protection requirements
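
To make this a little less vague, here is one way the five terms above could hang together as a data model. It is a rough Python sketch: the class and field names are my own guesses at a structure, and the sample values are placeholders rather than real U-M policy or services.

```python
from dataclasses import dataclass
from enum import Enum


class SensitivityLevel(Enum):
    """The controlled vocabulary: High, Medium, Low."""
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass(frozen=True)
class ProtectionResource:
    name: str  # resource that can be used to meet a requirement
    unit: str  # Protection Unit: the unit administering the resource


@dataclass(frozen=True)
class ProtectionRequirement:
    category: str          # Protection Category, e.g. "Backup requirements"
    levels: frozenset      # Sensitivity Levels the prescribed action addresses
    action: str            # the prescribed security action
    resources: tuple = ()  # Protection Resources that can satisfy it


def requirements_for(requirements, category, level):
    """Return the requirements that apply to one category at one sensitivity level."""
    return [r for r in requirements if r.category == category and level in r.levels]


# Placeholder data, for illustration only.
backup_service = ProtectionResource(name="Example backup service", unit="Example IT unit")
requirements = [
    ProtectionRequirement(
        category="Backup requirements",
        levels=frozenset({SensitivityLevel.HIGH, SensitivityLevel.MEDIUM}),
        action="Placeholder: back up to a managed service",
        resources=(backup_service,),
    ),
]
for requirement in requirements_for(requirements, "Backup requirements", SensitivityLevel.HIGH):
    print(requirement.action, [resource.name for resource in requirement.resources])
```

Keying the lookup on category plus sensitivity level matches the definition of Protection Requirement above; the resources hang off the requirement so a unit can be found from either direction.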

Posted by kkwaiser at 04:42 PM | Comments (0) | TrackBack

EML and DEIMS - Code-Definition Parsing

Whew. I am the world's slowest debugger but I am also doggedly persistent.

From the DEIMS Google code repository:

Code-Definition variable information not parsed
What steps will reproduce the problem?
1. Create an EML document of a dataset containing variables that have code-definitions

What is the expected output? What do you see instead?
These code-definitions are NOT included in the EML document

What version of the product are you using? On what operating system?
Patch is applied against the latest (Oct. 6th) Google version

The conditional statement that leads to code-definitions being parsed does not include a check on whether the $code_definitions array contains values. Adding this check allows downstream parsing to occur.

Here is the patch:

--- /home/data/Desktop/views_bonus_eml/export/views-bonus-eml-export-eml.tpl.php 2010-12-16 14:28:50.000000000 -0500
+++ /home/data/Desktop/views-bonus-eml-export-eml.tpl.php.patched 2011-10-06 11:36:42.000000000 -0400
@@ -313,7 +313,8 @@
$attribute_maximum[0]['value'] ||
$attribute_minimum[0]['value'] ||
$attribute_precision[0]['value'] ||
- $attribute_unit[0]['value']) {
+ $attribute_unit[0]['value'] ||
+ $code_definitions[0]['value']) {
if ($attribute_formatstring[0]['value']) {
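
The effect of the patch is easier to see outside the template. Below is a small Python illustration (not the module's code) of the same OR-chain. The function name and dictionary keys are stand-ins, and only the four fields visible in the hunk are modelled; the real condition has more terms above the lines shown.

```python
def attribute_details_parsed(attribute, check_code_definitions):
    """Mimic the template's OR-chain: True means the attribute block gets parsed."""
    fields = ["maximum", "minimum", "precision", "unit"]
    if check_code_definitions:
        # The patched condition also looks at the code-definitions value.
        fields.append("code_definitions")
    return any(attribute.get(field) for field in fields)


# A coded variable with no unit, precision or range: the case that was being dropped.
coded_variable = {"code_definitions": "1=male, 2=female"}

print(attribute_details_parsed(coded_variable, check_code_definitions=False))  # before the patch
print(attribute_details_parsed(coded_variable, check_code_definitions=True))   # after the patch
```

A variable that carries only code-definitions fails every other test in the chain, which is why the downstream parsing never ran for it.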

Posted by kkwaiser at 11:51 AM | Comments (0) | TrackBack

October 05, 2011

Devel snippets

Overview of the print statements is presented here.

# load a node given a specific NID

$mn = node_load(8503);

# pretty print the node array

# ugly print the node array

# load a CCK field array into a variable (a node reference in this case)
$dfNid = $mn->field_dataset_datafile_ref;

# load a portion of a CCK field array:
$dfNid = $mn->field_dataset_datafile_ref[0]['nid'];

# now load the reference node
$dfNode = node_load($dfNid);

Posted by kkwaiser at 02:40 PM | Comments (0) | TrackBack

EML and DEIMS - Mapping Attributes

How to treat variable information (called an Attribute in EML) is at the forefront of my mind for a few reasons:

1) Variables can vary widely (ha,ha) among and within data sets which makes the EML specification rather complex.
2) Portions of this complexity are ensconced within the DEIMS metadata structure but much of it is not. I tend to agree with this approach as encasing the entire specification would create a huge proliferation of fields that would rarely be used and would make direct metadata entry into the DEIMS system by non-expert researchers nearly impossible.

On with the show.

EML Specification*

<attributeName> is the official name of an attribute, typically the name of a field in a data table. This is often short and/or cryptic.

<attributeLabel> (optional): is used to provide a less ambiguous or cryptic alternative identification than what is provided in <attributeName>. This content may be used as a column or row header in an HTML display.

<attributeDefinition> gives a precise and complete definition of attribute being documented. It explains the contents of the attribute fully so that a data user can interpret the attribute accurately.

Corresponding DEIMS variable fields

Node Title -> attributeName
Variable Abbreviation (field_attribute_label) -> attributeLabel
Definition (field_var_definition) -> attributeDefinition

This is confusing. Variable Abbreviation maps to attributeLabel although the latter is designed to be the full variable name (i.e., Node Title.)
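
To see what that mapping produces, here is a rough Python sketch that builds the three elements from the three DEIMS fields. The function name and the sample variable are invented for illustration, and the result is a fragment, not a schema-valid EML attribute (it has no measurementScale, for one).

```python
import xml.etree.ElementTree as ET


def attribute_element(node_title, variable_abbreviation, definition):
    """Build a partial EML <attribute> using the DEIMS field mapping listed above."""
    attribute = ET.Element("attribute")
    # Node Title -> attributeName
    ET.SubElement(attribute, "attributeName").text = node_title
    # Variable Abbreviation (field_attribute_label) -> attributeLabel (optional in EML)
    if variable_abbreviation:
        ET.SubElement(attribute, "attributeLabel").text = variable_abbreviation
    # Definition (field_var_definition) -> attributeDefinition
    ET.SubElement(attribute, "attributeDefinition").text = definition
    return attribute


# Hypothetical variable, for illustration only.
element = attribute_element(
    node_title="Air temperature",
    variable_abbreviation="air_temp",
    definition="Air temperature recorded at the sampling site.",
)
print(ET.tostring(element, encoding="unicode"))
```

With values like these the short, cryptic string lands in attributeLabel and the readable one in attributeName, which is the reverse of what the specification describes.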

EML Specification* - yields 5 over-arching variable categories:

<measurementScale> indicates the type of scale from which values are drawn for the attribute. One of the 5 scale types must be used: nominal, ordinal, interval, ratio, or dateTime.

The <nominal> scale is used to represent named categories. Values are assigned to distinguish them from other observations. This would include a list of coded values (e.g. 1=male, 2=female), or plain text descriptions. Columns that contain strings or simple text are nominal. Example: plot1, plot2, plot3.

<ordinal> values are categories that have a logical or ordered relationship to one another, but the magnitude of the differences between the values is not defined or meaningful. Example: Low, Medium, High.

<interval> These measurements are ordinal, but in addition, use equal-sized units on a scale between values. The starting point is arbitrary, so a value of zero is not meaningful. Example: The Celsius temperature scale uses degrees which are equally spaced, but where zero does not represent “absolute zero” (i.e., the temperature at which molecular motion stops), and 20 Celsius is not “twice as hot” as 10 Celsius.

<ratio> measurements have a meaningful zero point, and ratio comparisons between values are legitimate. For example, the Kelvin scale reflects the amount of kinetic energy of a substance (i.e., zero is the point where a substance transmits no thermal energy), and so temperature measured in kelvin units is a ratio measurement. Concentration is also a ratio measurement because a solution at 10 micromolePerLiter has twice as much substance as one at 5 micromolePerLiter.

<dateTime> is a date-time value from the Gregorian calendar and it is recommended that these be expressed in a format that conforms to the ISO 8601 standard. An example of an allowable ISO date-time is “YYYY-MM-DD”, as in 2004-06-25, or, more fully, as “YYYY-MM-DDThh:mm:ssTZD” (e.g., 1997-07-16T19:20:30.45Z).
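
A quick way to check incoming date values against the two example formats is sketched below in Python. This is not part of DEIMS; the function name and the list of accepted patterns are assumptions, and ISO 8601 permits many more forms than the three tried here.

```python
from datetime import datetime

# Patterns for the examples in the guide: date only, and date-time with a
# time zone designator, with or without fractional seconds.
ISO_PATTERNS = (
    "%Y-%m-%d",
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S.%f%z",
)


def parse_iso8601(value):
    """Return a datetime for a supported ISO 8601 string, else raise ValueError."""
    for pattern in ISO_PATTERNS:
        try:
            return datetime.strptime(value, pattern)
        except ValueError:
            continue
    raise ValueError(f"not a supported ISO 8601 value: {value!r}")


for sample in ("2004-06-25", "1997-07-16T19:20:30.45Z"):
    print(sample, "->", parse_iso8601(sample).isoformat())
```

The `%z` directive accepts a literal `Z` from Python 3.7 onward; on older interpreters the second sample would be rejected.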

Corresponding DEIMS variable fields

The DEIMS implementation is simplified into the following groups:

Quantitative Variable: Interval/Ratio are clumped under ratio

Date Time Variable: dateTime

Text Based Variable: Nominal/Ordinal are clumped under nominal.

- Note: pattern here is /attribute/measurementScale/nominal/nonNumericDomain/enumeratedDomain - are the last two contradictory? Partial answer: possibly not because there is a numericDomain field
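
The clumping above boils down to a lookup table. Here it is as a minimal Python sketch; the dictionary and function names are mine, and the pairs simply restate the three DEIMS groups listed above.

```python
# EML scale -> (DEIMS variable type, EML scale DEIMS writes on export)
EML_TO_DEIMS = {
    "nominal": ("Text Based Variable", "nominal"),
    "ordinal": ("Text Based Variable", "nominal"),
    "interval": ("Quantitative Variable", "ratio"),
    "ratio": ("Quantitative Variable", "ratio"),
    "dateTime": ("Date Time Variable", "dateTime"),
}


def deims_group(eml_scale):
    """Return the DEIMS variable type and exported scale for an EML scale name."""
    try:
        return EML_TO_DEIMS[eml_scale]
    except KeyError:
        raise ValueError(f"unknown measurementScale: {eml_scale!r}") from None


for scale in ("ordinal", "interval", "dateTime"):
    variable_type, exported_scale = deims_group(scale)
    print(f"{scale} -> {variable_type} (exported as {exported_scale})")
```

The table makes the information loss visible: an ordinal variable (Low, Medium, High) comes back out as nominal, and an interval variable such as Celsius temperature comes back out as ratio.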

EML Specification* - "The <interval> and <ratio> scales require additional tags describing <unit>, the <precision>, and <numericDomain>."

<unit> Units should be described in correct physical units. Terms which describe data but are not units should be used in <attributeDefinition>. For example, for data describing “milligrams of Carbon per square meter”, “Carbon” belongs in the <attributeDefinition>, while the <unit> is “milligramPerMeterSquared”.

Corresponding DEIMS variable fields

Unit (field_attribute_unit) -> Unit within a customUnit tag

Notes to follow-up on:

Code-definition doesn't show up in the EML output. views-bonus-eml-export-eml.tpl.php indicates it should appear as

Although it seems this would be correct


*EML Best Practices Working Group. EML Best Practices for LTER Sites V2.0. August 1st, 2011. http://im.lternet.edu/sites/im.lternet.edu/files/emlbestpractices-2.0-FINAL-20110801_0.pdf

Posted by kkwaiser at 11:19 AM | Comments (0) | TrackBack

EML and DEIMS - Mapping Mission

My primary take-away from the EIM 2011 conference was that, while the metadata we (UMBS) are taking in is of sufficient quality for re-use by researchers, it is not rigorous enough to be machine-ingested and we (UMBS) therefore are not well-prepared to move data to third-party databases. The first step needed to remedy this situation is for me to gain a better - more rigorous, if you will - understanding of the EML specification and how our DEIMS fields translate to it.

Of course, I'm not entirely certain how in depth this process will go but I do anticipate a series of posts looking at particular portions of the EML specification and analysing the DEIMS fields and the Drupal2EML module that accomplishes the mapping.

Important resources:
The LTER's EML Best Practices Guide

The DEIMS code repository which contains the content types and modules we use.

The EML specification, but I hope to rely mostly on the Best Practices Guide.

Posted by kkwaiser at 10:56 AM | Comments (0) | TrackBack

October 03, 2011

Zend tutorial notes

Part 1 -

When you requested the script above, Apache intercepted your request and handed it off to PHP. PHP then parsed the script, executing the code between the <?php ... ?> marks and replacing it with the output of the code run. The result was then handed back to the server and transmitted to the client. Since the output contained valid HTML, the browser was able to render it for display to the user.

Posted by kkwaiser at 11:48 AM | Comments (0) | TrackBack

Online PHP Tutorials

W3 Schools

A wiki tutorial - a crowd-sourced tutorial seems fitting, but does it work?

PHP Buddy - doesn't look as complete

Zend Developer Zone - looks well organized


From the actual PHP website

I'm going to try Zend first.

Posted by kkwaiser at 11:34 AM | Comments (0) | TrackBack