Data Management in the
Research Environment
Data Management and Research Design
Library Workshop Series
https://bit.ly/2N7SbVt
Dr. Timothy Norris - Research Data Scientist - tnorris@miami.edu
some things to talk about
Resources at UM
  • Timothy Norris, Data Scientist, (305) 284-2826, tnorris@miami.edu
  • Cameron Riopelle, Head of Data Services, (305) 284-3257, criopelle@miami.edu
  • ??, Biomedical Data Librarian, newperson@miami.edu
  • Abraham Parrish, GIS Services Librarian, (305) 284-9488, aparrish@miami.edu
  • Jorge Quintella, GIS and Data Specialist, (305) 284-5729, jaq32@miami.edu


Working with Data Workshop Series
  • Data Analysis Software Instruction
    Dr. Cameron Riopelle, Head of Data Services, criopelle@miami.edu
    • Introduction to SPSS
      Designed for new SPSS users. It provides an introduction to the SPSS software program, including its software environment, importing data, descriptive statistics, transforming variables, selecting and splitting data, and visualization.
      Richter Library – Digital Scholars' Lab - Wednesday September 18 - 3 - 4:30pm (17 people)
    • Intermediate SPSS
      Designed for intermediate SPSS users. It covers common statistical methods in SPSS such as means comparisons, ANOVA, linear regression, and logistic regression models.
      Richter Library – Digital Scholars' Lab - Wednesday September 25 - 3 - 4:30pm (17 people)
    • SAS for Data Analysis
      Designed for both new and intermediate SAS users. It provides an overview of the SAS software program, including its programming language, software environment, importing data, descriptive statistics, transforming variables, selecting and splitting data, exploratory tests, regression models, and visualization.
      Richter Library – Digital Scholars' Lab - Wednesday October 2 - 3 - 4:30pm (17 people)
    • Introduction to R/RStudio
      Designed for new R and RStudio users. It provides an introduction to the R software program, including its programming language, software environment, importing data, descriptive statistics, transforming variables, selecting and splitting data, exploratory tests, and visualization.
      Richter Library – Digital Scholars' Lab - Wednesday October 23 - 3 - 4:30pm (17 people)
    • Intermediate R/RStudio
      Designed for intermediate R and RStudio users. It covers common statistical methods in R such as means comparisons, ANOVA, linear regression, and basic visualization.
      Richter Library – Digital Scholars' Lab - Wednesday October 30 - 3 - 4:30pm (17 people)
    • Data Visualization with Tableau
      Introduction to using the software program Tableau for data visualization. It covers making common graphs and tables, importing data, making sheets and dashboards, and exporting images.
      Richter Library – Digital Scholars' Lab - Wednesday November 6 - 3 - 4:30pm (17 people)
    • Data Visualization with R/RStudio
      Introduction to using the software program R for visualization. Prior experience with R required--this workshop assumes knowledge of the R language and environment.
      Richter Library – Digital Scholars' Lab - Wednesday November 20 - 3 - 4:30pm (17 people)
  • Research Data Management Series
    Dr. Timothy Norris, Data Scientist, tnorris@miami.edu
    • The Data Management Challenge: Wrangling Data in the Research Environment
      This is an introduction to topics in research data management designed to foster skills and encourage data management best practices for efficiency, compliance and security in the research environment. This is a discipline agnostic seminar. Specific learning goals include the identification of best practices for: file naming conventions, file system organization, data security, data privacy, backup strategies, data sharing, data documentation, and data publication. These topics introduce practical behaviors to ease the digital research process.
      Richter Library – Flex Space – Friday August 31, 10 – 11:30 am (30 people)
      RSMAS Library – Map Room – Friday September 6, 10 – 11:30 am (30 people)
      Calder Library (Miller) – First Floor Collaboratory – Friday September 13, 10 – 11:30 am (25 people)
    • Data Management and Research Design
      This is the first in a series of short seminars that explore topics in research data management (see intro above and other DM seminars in this list). This is a discipline-agnostic seminar. Specific learning goals are to understand the federal policy context for data management and sharing, explain the data lifecycle, critically evaluate existing Data Management Plans (DMPs), and identify key elements of a DMP. As a product of the seminar/workshop, attendees will outline a DMP for their research.
      Richter Library – Flex Space – Wednesday October 9, 10 – 11:30 am (30 people)
    • File Formats and System Organization for Research
      This is the second in a series of short seminars that explore topics in research data management (see intro above and other DM seminars in this list). This is a discipline-agnostic seminar. Specific learning goals are to understand file format choices and their implications for data sharing, data publication, and data re-use, and to identify best practices for file system organization and file naming conventions. As a product of the seminar/workshop, attendees will select file formats, choose a file naming convention, and design a file system architecture for their research.
      Richter Library – Flex Space – Tuesday October 22, 10 – 11:30 am (30 people)
    • Research Data Description and Documentation
      This is the third in a series of short seminars that explore topics in research data management (see intro above and other DM seminars in this list). This is a discipline-agnostic seminar. Specific learning goals are to identify disciplinary metadata standards (if they exist), describe discipline-agnostic metadata standards, and understand how metadata facilitates discovery, sharing and access to data resources. As a product of the seminar/workshop, attendees will create metadata for a selected subset of the data from their research.
      Richter Library – Flex Space – Wednesday November 6, 10 – 11:30 am (30 people)
    • Research Data Publication: Repositories and Sharing
      This is the fourth in a series of short seminars that explore topics in research data management (see intro above and other DM seminars in this list). This is a discipline-agnostic seminar. Specific learning goals are to identify discipline-specific repositories, understand data ownership in the context of research institutions, decode repository requirements for publishing data, and create correct citations for referencing data in publications. As a product of the seminar/workshop, attendees will create a deposit package from a selected subset of the data from their research for publication in a data repository.
      Richter Library – Flex Space – Tuesday November 19, 10 – 11:30 am (30 people)
  • GIS Software Instruction
    Dr. Jorge Quintela, GIS & Data Specialist, jaq32@miami.edu
    • Introduction to ArcGIS Online
      This workshop will introduce you to ArcGIS Online, Esri’s cloud-based mapping and analysis platform. You will learn how to create interactive maps, how to add, manage and share content, and how to perform basic spatial analysis procedures with your data. Participants will also learn how to create and share basic web apps. Participants are strongly encouraged to register in advance so that they receive their ArcGIS Online credentials before the session starts.
      Richter Library – Flex Space – Thursday October 10, 2 – 4 pm (17 people)
  • Software Carpentry - https://software-carpentry.org
    Dr. Timothy Norris, Data Scientist, tnorris@miami.edu
    Dr. Cameron Riopelle, Head of Data Services, criopelle@miami.edu
    • Python
      Scientific computing using the Python programming language. This workshop includes an introduction to command line computing, version control and Python.
      Calder Library - Downstairs Collaboratory - November 11-12, 9 am - 5 pm (35 people)
    • R/RStudio
      Scientific computing using the R programming language. This workshop includes an introduction to command line computing, version control and R.
      RSMAS Library - Map Room - December 2-3, 9 am - 4:30 pm (35 people)
The Data Deluge
Data Sharing Requirements
  • NIH: October 2003
Data Management Requirements
  • NSF: January 2011
  • NEH: June 2011
The 2013 OSTP Memo: Open Data
  • Federally funded research results should be made accessible to the public
  • Both peer-reviewed publications and data
The 2018 Federal Data Strategy
  • Governance, access, accountability, innovation
Federal Movement Towards Open Data

Adapted from: Whitmire, Amanda L. (2014). Research Data Management Curriculum, Lecture 2: Introduction to Research Data Management.
Oregon State University Libraries. http://figshare.com/articles/GRAD521_Research_Data_Management_Lectures/1003835
The 2013 OSTP Memo
  • Transparency and efficiency
  • Growth, security, value
  • Commercial re-use and innovation
Lots of sticks!!
https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
Agency Response
Funder | Link to Response | Timeline to Implement | Policy Coverage: Published Outputs | Policy Coverage: Data
AHRQ | http://www.ahrq.gov/funding/policies/publicaccess/index.html | Feb 2015 (A), Oct 2015 (D) | full | full
ASPR | http://www.phe.gov/Preparedness/planning/science/Pages/AccessPlan.aspx | Oct 2015 (A, D) | full | full
CDC | http://www.cdc.gov/od/science/docs/Final-CDC-Public-Access-Plan-Jan-2015_508-Compliant.pdf | Jul 2013 (A), Oct 2015 (D) | full | full
DOD | http://www.dtic.mil/dtic/pdf/DoD_PublicAccessPlan_Feb2015.pdf | estimate fiscal year 2015 | full | full
DOE | http://www.energy.gov/datamanagement/doe-policy-digital-research-data-management | Oct 2014 (A), Oct 2015 (D) | full | full
DOT | https://www.transportation.gov/open/official-dot-public-access-plan | Jan 2016 | full | full
FDA | http://www.fda.gov/downloads/ScienceResearch/AboutScienceResearchatFDA/UCM435418.pdf | Oct 2015 (A, D) | full | full
IES | https://ies.ed.gov/funding/researchaccess.asp ; http://ies.ed.gov/funding/datasharing_implementation.asp | In effect (A, D), FY 2016 (D) | full | partial
NASA | http://science.nasa.gov/media/medialibrary/2014/12/05 NASA_Plan_for_increasing_access_to_results_of_federally_funded_research.pdf | Oct 2015 (A, D) | full | full
NIH | http://grants.nih.gov/grants/NIH-Public-Access-Plan.pdf | In effect (A, D), Dec 2015 (D) | full | full
NIST | http://www.nist.gov/data/upload/NIST-Plan-for-Public-Access.pdf | Oct. 2015 (A, D) | partial | full
NOAA | http://docs.lib.noaa.gov/noaa_documents/NOAA_Research_Council/NOAA_PARR_Plan_v5.04.pdf | FY2016, Q2 (A, D) (Jan 2016) | full | full
NSF | http://www.nsf.gov/pubs/2015/nsf15052/nsf15052.pdf | Jan 2016 (A, D) | full | full
SI | http://public.media.smithsonianmag.com//file_upload_plugin/1f143b54-a9f9-4746-bef5-1c76151e3c7a.pdf | Oct. 2015 (A, D) | full | full
USDA | http://www.usda.gov/documents/USDA-Public-Access-Implementation-Plan.pdf | Jan 2016 (A) | full | partial
USAID | http://blog.usaid.gov/2014/10/announcing-usaids-open-data-policy/ | October 1, 2014 (D) | none | full
USGS | http://www.usgs.gov/usgs-manual/im/IM-OSQI-2015-01.html | Oct 1, 2015 (A, D) | full | full
VA | http://www.va.gov/ORO/Docs/Guidance/VA_RSCH_DATA_ACCESS_PLAN_07_23_2015.pdf | October 1, 2015 (A, D) | partial | partial

adapted from http://bit.ly/FedOASummary
The 2013 OMB Memorandum
  • Value – “manage information as an asset throughout its lifecycle”
  • Privacy, security, ownership
  • Data:
    “refers to all structured information, unless otherwise noted.”
  • Information Life Cycle
    “means the stages through which information passes, typically characterized as creation or collection, processing, dissemination, use, storage, and disposition.”
The 2013 OMB Memorandum
  • Open Data
    • Public
    • Accessible
    • Described
    • Reusable
    • Complete
    • Timely
    • Managed Post-Release



https://project-open-data.github.io/

The 2013 OMB Memorandum
  • Policy Requirements
    • Collect information in a way that supports downstream use
    • Machine readable formats
    • Use data standards
    • Open licenses
    • Common core and extensible metadata



https://project-open-data.cio.gov/v1.1/schema/
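
The "common core metadata" and "machine readable formats" requirements can be pictured as a small JSON record per dataset. The sketch below is illustrative only: the field names follow the Project Open Data schema v1.1 as commonly documented, `REQUIRED_FIELDS` is an assumed subset rather than the authoritative list, and the dataset, values, and function names are hypothetical.

```python
import json

# Assumed subset of required fields; check the schema page for the full list.
REQUIRED_FIELDS = ["title", "description", "keyword", "modified",
                   "publisher", "contactPoint", "identifier", "accessLevel"]


def make_dataset_record():
    """Return one hypothetical dataset description as a plain dict."""
    return {
        "title": "Example coastal water temperature readings",
        "description": "Hourly readings from a hypothetical sensor.",
        "keyword": ["temperature", "coastal", "example"],
        "modified": "2019-10-09",
        "publisher": {"name": "Example Research Lab"},
        "contactPoint": {"fn": "Example Contact",
                         "hasEmail": "mailto:contact@example.org"},
        "identifier": "example-dataset-001",
        "accessLevel": "public",
        "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    }


def missing_fields(record):
    """List the assumed-required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS
            if f not in record or record[f] in ("", [], None)]


record = make_dataset_record()
print(json.dumps(record, indent=2))      # machine-readable form
print("missing:", missing_fields(record))
```

Keeping the record as plain JSON means the same description can be read by a person and checked by a script before it is shared.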


The 2018 Federal Data Strategy
https://strategy.data.gov
  • Enterprise Data Governance
  • Access, Use, Augmentation
  • Decision Making & Accountability
  • Commercialization, Innovation, and Public Use
This is a collaborative effort that addresses the cross-agency priority (CAP) goal to Leverage Data as a Strategic Asset.
What is Data?
Numbers
Words
Citations / references
Notebooks / marginalia
Specimens
Field Samples
Images
Videos / sound recording
Relationships
Models
Code


What is Data?

“Examples of Research Data and Materials include laboratory notebooks, notes of any type, photographs, films, digital images, original biological and environmental samples, protocols, numbers, graphs, charts, numerical raw experimental results, instrumental outputs from which Research Data can be derived and other deliverables under sponsored agreements.”


Data and the University of Miami
Definitions: The term “Research Data” in this document refers to information recorded and/or collected for research performed at or under the auspices of the University regardless of the form or the media upon which it is recorded. This term includes, but is not limited to, computer programs (code and documentation), computer databases, instrumental outputs, raw numerical results, original biological or environmental samples, photographs, digital images, films, protocols, graphs, and other deliverables produced under sponsored agreements. Research Data also includes any records related to the design, conduct or reporting of the research that would be necessary to reconstruct the reported research results. Research data can be intangible (statistics, findings, conclusions, etc.) and tangible (notebooks, printouts, etc.).

From the data curation initiative at UM (2016), a combination of definitions from several peer institutions
Data is an Innovation?
Innovations: patentable or un-patentable inventions, discoveries, processes, compositions, research tools, data, ideas, databases, know-how, copyrightable works that are not scholarly or artistic Creations and tangible property, including biological organisms, engineering prototypes, drawings, and software created, conceived or made by Applicable Personnel within their normal duties (including clinical duties), course of studies, field of research or scholarly expertise or making more than Incidental Use of University’s resources. (UM Faculty Manual p. 136)
3.3 Innovations are owned by the University; revenues derived from commercialization of Innovations will be shared with the Applicable Personnel as detailed in Section VI. (UM Faculty Manual p. 138)



The 2013 OSTP Memo

  • Data (from OMB Circular A-110):
“Data is defined … as the digitally recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens.”
https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
What is DATA Management?

Data?
Numbers
Words
Citations / references
Notebooks / marginalia
Specimens
Field Samples
Images
Videos / sound recording
Relationships
Models
Code


Data Management?
File System Organization
File Naming Conventions
Privacy/Security Considerations
File Format Choice
Documentation and metadata
Roles and responsibilities in research environment
Storage and backup strategies
Acquiring and cleaning data
Sharing and collaboration strategies
Ownership of data
Access strategies / Access restrictions
Data publication / Data citation
¿TOOLS?
¿VISUALIZATION?
Some Useful Abstractions


“Information is not knowledge.
Knowledge is not wisdom.
Wisdom is not truth.
Truth is not beauty.
Beauty is not love.
Love is not music.
Music is THE BEST.”

― Frank Zappa  
UC Santa Cruz
http://guides.library.ucsc.edu/datamanagement/
University College Dublin
http://libguides.ucd.ie/data
The US Geological Survey

http://www.usgs.gov/datamanagement/images/figures/USGS-data-lifecycle-model.png
University of Virginia Library

https://data.library.virginia.edu/files/Research-Life-Cycle-LG.png
Data Curation Lifecycle
Digital Curation Centre (DCC)
http://www.dcc.ac.uk/resources/curation-lifecycle-model



University of Miami Libraries
http://library.miami.edu/datacuration/
Curation?
Research Data Management
Before: Data Management Planning / Grant Process
  • Privacy/Security Considerations
  • Storage and backup strategies
  • File System Organization
  • File Naming Conventions
  • File Format Choice
  • Documentation and metadata
  • Roles and responsibilities in research environment
  • Sharing and collaboration strategies
  • Ownership of data
  • Access strategies / Access restrictions

During: Compliance and Productivity
  • Follow file naming, organization and format conventions
  • Documentation and metadata
  • Acquiring and cleaning data
  • Regularly back up all data
  • Be mindful when sharing / version control
  • Access / privacy policy enforcement

After: Publication and/or Repository Deposit
  • Publish
  • Deposit in a repository
.... break for Data Management Plan ...

What is a Data Management Plan?

The Data Management Plan is a written document that describes the data you expect to acquire or collect throughout a research project, how you will collect, organize, document, and analyze the data, and finally how you will share, publish and preserve the data.




Data Management Plans

  1. Information about Data and Data Formats
  2. Metadata Content and Format
  3. Policies for access, sharing and re-use
  4. Long-term storage and preservation
  5. Budget




1. Data and Data Formats

  • Data Types
  • How will they be collected
  • How will the data be processed
  • File Formats
  • Quality Control
  • Already Existing Data
  • Short-term Management




2. Metadata Content and Format

  • Metadata is a description of your data for a future user (you perhaps?)
    • What does this person need to know to use the data properly?
    • Does this person need discipline specific knowledge? How much?
  • Two general kinds of metadata
    • Project level (contextual)?
    • Technical (data level, units, headers, etc.)?
  • How will the metadata be captured
    • Notebooks (electronic?)
    • Device capture
  • What format (with justification)
    • Discipline specific standard? Other standard?
    • Machine or human readable (both?)



Metadata you already know

Human Readable
TY - BOOK
DB - /z-wcorg/
DP - http://worldcat.org
ID - 702896
LA - English
T1 - Traces on the Rhodian shore : nature and culture in Western thought from ancient times to the end of the eighteenth century
AU - Glacken, Clarence J.
PB - University of California Press
CY - Berkeley
Y1 - 1973///
SN - 0520023676 9780520023673 0520032160 9780520032163
ER -
Metadata you already know

Machine Readable
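
The RIS record on the previous slide is already close to machine readable: each line is a two-letter tag, a hyphen, and a value. A minimal sketch of converting it to JSON follows. It assumes tags may repeat (so values are collected in lists) and that `ER` ends the record; `parse_ris` is a hypothetical helper, and real RIS files have edge cases this does not handle.

```python
import json

RIS_TEXT = """\
TY - BOOK
DB - /z-wcorg/
DP - http://worldcat.org
ID - 702896
LA - English
T1 - Traces on the Rhodian shore : nature and culture in Western thought from ancient times to the end of the eighteenth century
AU - Glacken, Clarence J.
PB - University of California Press
CY - Berkeley
Y1 - 1973///
SN - 0520023676 9780520023673 0520032160 9780520032163
ER -
"""


def parse_ris(text):
    """Parse a single RIS record into a dict of tag -> list of values."""
    record = {}
    for line in text.splitlines():
        # Split on the first hyphen only, so hyphens inside values survive.
        tag, sep, value = line.partition("-")
        tag = tag.strip()
        if not sep or len(tag) != 2:
            continue  # skip blank or malformed lines
        if tag == "ER":
            break     # end-of-record marker
        record.setdefault(tag, []).append(value.strip())
    return record


print(json.dumps(parse_ris(RIS_TEXT), indent=2))
```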

3. Policies for Access and Sharing

  • Data sharing details
    • How long, where, how accessed, rights of data collector
  • Ethical or Privacy Issues
    • Personally identifiable data, endangered species, fragile habitats
  • Ownership
    • Who owns the data, institutional or funder requirements, embargoes
  • Intended future audience
    • Who will want to use this data, why
  • Citation
    • Provide a citation, DOI, ARK
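
A suggested citation can be assembled from the same elements the DMP already asks for. The sketch below uses one widely used pattern (creators, year, title, publisher, identifier); the function name and all values, including the DOI, are hypothetical, and a repository or funder may require a different format.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Build a simple data citation string from its parts."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. https://doi.org/{doi}"


citation = format_data_citation(
    creators=["Doe, Jane", "Roe, Richard"],  # hypothetical authors
    year=2019,
    title="Example field survey data",       # hypothetical dataset
    publisher="Example Data Repository",
    doi="10.1234/example",                   # hypothetical DOI
)
print(citation)
```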




3. Policies for Re-use and Distribution

  • Permissions or restrictions
    • How can data be re-used
  • License type
    • Open Data?






4. Archival and Preservation Plans

  • What data will be preserved
    • Raw data? Data that was expensive to collect? Non-replaceable data?
  • Where will the data be preserved/archived
    • Identify the data repository that you will use
  • Are there data transformations necessary for archiving
    • What will the final format of the archived data be? Needed software for opening particular file formats?
  • Who will be responsible
    • Contact person(s)




5. Budget

  • Expected money needed for
    • Preparation of data and documentation
    • Hardware/software
    • Archival costs/rental of disk space
  • Who is going to pay these costs
    • Grant proportion
    • Other sources
    • In kind




NSF: the DMP may include
  • the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;
  • the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
  • policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
  • policies and provisions for re-use, re-distribution, and the production of derivatives; and
  • plans for archiving data, samples, and other research products, and for preservation of access to them.



http://www.nsf.gov/pubs/policydocs/pappguide/nsf15001/gpg_2.jsp#dmp
My Data Management Plan – a satire

Dear NSF,

I am happy to respond to your request for a 2-page Data Management Plan.

First of all, let me say how enthusiastic I am that you have embraced this new field of "large scale data analysis". Ever since I started working with large Avida data sets in 1993, then with large meteorological data sets in 1995, and then again with large sequence data sets in 1999, I have seen the need for a systematic plan to manage the data. It is nice to see NSF stepping up to the plate in such a timely manner, and I am happy to comply.

Now, as to my actual data management plan, here is how I plan to deal with research data in the future.




read more ... http://ivory.idyll.org/blog/data-management.html
Data Management Plans at UM
https://dmptool.org
https://dmponline.dcc.ac.uk/



https://portagenetwork.ca/
Evaluate a DMP
  • Describes what types of data will be captured, created or collected
  • Describes how data will be collected, captured, or created (observations, models, reuse, etc.)
  • Identifies how much data (volume) will be produced
  • Describes how the data will be made publicly available
  • Provides details on when the data will be made publicly available
Rating for each criterion: Complete/detailed | Addressed issue, but not complete | Did not address
Other criteria??
adapted from: The DART Project - using data management plans as a research tool
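
When reviewing several DMPs, it can help to record the ratings in a small structure and tally them. The sketch below is one possible way to do that, not part of the DART rubric itself: the three rating levels come from the slide, while the shortened criterion labels, the example ratings, and the function name are hypothetical.

```python
from collections import Counter

LEVELS = (
    "Complete/detailed",
    "Addressed issue, but not complete",
    "Did not address",
)


def summarize_ratings(ratings):
    """Tally ratings per level and list criteria rated below the top level."""
    for criterion, level in ratings.items():
        if level not in LEVELS:
            raise ValueError(f"unknown rating {level!r} for {criterion!r}")
    counts = Counter(ratings.values())
    needs_work = [c for c, level in ratings.items() if level != LEVELS[0]]
    return {level: counts.get(level, 0) for level in LEVELS}, needs_work


# Hypothetical ratings for one plan.
example = {
    "types of data": "Complete/detailed",
    "how data will be collected": "Complete/detailed",
    "volume of data": "Did not address",
    "how data will be made public": "Addressed issue, but not complete",
    "when data will be made public": "Did not address",
}

counts, needs_work = summarize_ratings(example)
print(counts)
print("revise:", needs_work)
```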

Quick Review: Data Management Plans

  • Information about Data and Data Formats
  • Metadata Content and Format
  • Policies for access, sharing and re-use
  • Long-term storage and preservation
  • Budget

BUT it will always depend on your goals and the funder’s goals



Further Reading:




DOIs and ORCIDs
  • Digital Object Identifiers (DOIs)
    • Permanent identifiers (links) to online resources
    • Provided by resolving service (https://doi.org/)
    • All repositories provide these for your data
    • UM is a member of DataCite, which provides our DOIs
  • ORCID
    • https://orcid.org/
    • like a Digital Object Identifier (DOI) for people
    • the authoritative ID for researchers



Work together to connect research to researcher
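
Both identifiers are easy to handle in scripts. The sketch below turns a DOI into its resolver link and checks the final character of an ORCID iD, which is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The function names are hypothetical, the DOI is a made-up example, and the ORCID iD used is the sample iD from ORCID's own documentation.

```python
def doi_to_url(doi):
    """Return the https://doi.org/ link for a DOI given in common forms."""
    doi = doi.strip()
    prefixes = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")
    for prefix in prefixes:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def orcid_check_digit(base_digits):
    """Compute the check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the length and check digit of an ORCID iD like 0000-0000-0000-0000."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


print(doi_to_url("doi:10.1234/example"))      # hypothetical DOI
print(is_valid_orcid("0000-0002-1825-0097"))  # ORCID's documented sample iD
```

A check like this only catches typos. It does not confirm that the iD belongs to a particular researcher.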