The Data Management Challenge
wrangling data in the research environment

https://bit.ly/2IY8eUi
Timothy Norris, PhD
Librarian Associate Professor - Data Science
University of Miami Libraries
Institute for Data Science and Computing
tnorris@miami.edu
The Data Management Nightmare
Karen Hanson, Alisa Surkis, and Karen Yacobucci, NYU Health Sciences Libraries
https://www.youtube.com/watch?v=N2zK3sAtr-4
Resources at UM
Timothy Norris
Data Scientist
(305) 284-2826 tnorris@miami.edu

Cameron Riopelle
Head of Data Services
(305) 284-3257 criopelle@miami.edu

Thilani Samarakoon
Biomedical Data Librarian
(305) 243-6403 thilani.samarakoon@miami.edu

Larissa Montas
GIS Services Librarian
l.montas@umiami.edu

Kineret Ben-Knaan
Research & Assessment Librarian
(305) 284-3077 kbenknaan@miami.edu

Erica Newcome
STEM Librarian
(305) 284-4509 exn297@miami.edu


Software Carpentry at UM

Two-day intensive Software Carpentry workshops introduce basic lab skills for research computing. They include an introduction to command-line computing (bash), git and GitHub (version control), and approaches to programming in R or Python.

Note: There are also opportunities to train and be certified as a Software Carpentry instructor. If interested, reach out to Tim Norris - tnorris@miami.edu.


The Data Deluge
Data Sharing Requirements
  • NIH: October 2003
Data Management Requirements
  • NSF: January 2011
  • NEH: June 2011
The 2013 OSTP Memo: Increasing Access to Federally Funded Research
  • Federally funded research results should be made accessible to the public
  • Both peer-reviewed publications and data
The 2018 Federal Data Strategy
The 2022 OSTP Nelson Memo: Ensuring Free, Immediate, and Equitable Access
  • Removes embargo periods on peer-reviewed publications
  • Both peer-reviewed publications and data
The 2023 NIH Data Management and Sharing Policy
  • Detailed policies for sharing, management, and privacy concerns
Federal Movement Towards Open Data

Adapted from: Whitmire, Amanda L. (2014). Research Data Management Curriculum, Lecture 2: Introduction to Research Data Management.
Oregon State University Libraries. http://figshare.com/articles/GRAD521_Research_Data_Management_Lectures/1003835
The 2013 OSTP Memo
All agencies with over $100 million in research and development expenditures.
All publications, including data and peer-reviewed academic articles.
  • Transparency and efficiency
  • Growth, security, value
  • Commercial re-use and innovation
Lots of sticks!!
Agency Response (as of 2015)
Funder | Link to Response | Timeline to Implement | Policy Coverage: Published Outputs | Policy Coverage: Data
AHRQ | http://www.ahrq.gov/funding/policies/publicaccess/index.html | Feb 2015 (A), Oct 2015 (D) | full | full
ASPR | http://www.phe.gov/Preparedness/planning/science/Pages/AccessPlan.aspx | Oct 2015 (A, D) | full | full
CDC | http://www.cdc.gov/od/science/docs/Final-CDC-Public-Access-Plan-Jan-2015_508-Compliant.pdf | Jul 2013 (A), Oct 2015 (D) | full | full
DOD | http://www.dtic.mil/dtic/pdf/DoD_PublicAccessPlan_Feb2015.pdf | estimate fiscal year 2015 | full | full
DOE | http://www.energy.gov/datamanagement/doe-policy-digital-research-data-management | Oct 2014 (A), Oct 2015 (D) | full | full
DOT | https://www.transportation.gov/open/official-dot-public-access-plan | Jan 2016 | full | full
FDA | http://www.fda.gov/downloads/ScienceResearch/AboutScienceResearchatFDA/UCM435418.pdf | Oct 2015 (A, D) | full | full
IES | https://ies.ed.gov/funding/researchaccess.asp and http://ies.ed.gov/funding/datasharing_implementation.asp | In effect (A, D), FY 2016 (D) | full | partial
NASA | http://science.nasa.gov/media/medialibrary/2014/12/05 NASA_Plan_for_increasing_access_to_results_of_federally_funded_research.pdf | Oct 2015 (A, D) | full | full
NIH | http://grants.nih.gov/grants/NIH-Public-Access-Plan.pdf | In effect (A, D), Dec 2015 (D) | full | full
NIST | http://www.nist.gov/data/upload/NIST-Plan-for-Public-Access.pdf | Oct. 2015 (A, D) | partial | full
NOAA | http://docs.lib.noaa.gov/noaa_documents/NOAA_Research_Council/NOAA_PARR_Plan_v5.04.pdf | FY2016, Q2 (A, D) (Jan 2016) | full | full
NSF | http://www.nsf.gov/pubs/2015/nsf15052/nsf15052.pdf | Jan 2016 (A, D) | full | full
SI | http://public.media.smithsonianmag.com//file_upload_plugin/1f143b54-a9f9-4746-bef5-1c76151e3c7a.pdf | Oct. 2015 (A, D) | full | full
USDA | http://www.usda.gov/documents/USDA-Public-Access-Implementation-Plan.pdf | Jan 2016 (A) | full | partial
USAID | http://blog.usaid.gov/2014/10/announcing-usaids-open-data-policy/ | October 1, 2014 (D) | none | full
USGS | http://www.usgs.gov/usgs-manual/im/IM-OSQI-2015-01.html | Oct 1, 2015 (A, D) | full | full
VA | http://www.va.gov/ORO/Docs/Guidance/VA_RSCH_DATA_ACCESS_PLAN_07_23_2015.pdf | October 1, 2015 (A, D) | partial | partial

adapted from http://bit.ly/FedOASummary
The 2018 Federal Data Strategy
  • Enterprise Data Governance
  • Access, Use, Augmentation
  • Decision Making & Accountability
  • Commercialization, Innovation, and Public Use
This is a collaborative effort that addresses the cross-agency priority (CAP) goal to Leverage Data as a Strategic Asset.
The 2022 OSTP Nelson Memo
Both academic publications and supporting data
  • Adapting 2013 OSTP memo to FAIR principles (and COVID-19)
  • All agencies must update their public access policies
  • Removes embargo period
  • Guidelines for coordination among agencies
What is Data?
Numbers
Words
Citations / references
Notebooks / marginalia
Specimens
Field Samples
Images
Videos / sound recording
Relationships
Models
Code


What is Data?

“Examples of Research Data and Materials include laboratory notebooks, notes of any type, photographs, films, digital images, original biological and environmental samples, protocols, numbers, graphs, charts, numerical raw experimental results, instrumental outputs from which Research Data can be derived and other deliverables under sponsored agreements.”


What is Data?
Definitions: The term “Research Data” in this document refers to information recorded and/or collected for research performed at or under the auspices of the University regardless of the form or the media upon which it is recorded. This term includes, but is not limited to, computer programs (code and documentation), computer databases, instrumental outputs, raw numerical results, original biological or environmental samples, photographs, digital images, films, protocols, graphs, and other deliverables produced under sponsored agreements. Research Data also includes any records related to the design, conduct or reporting of the research that would be necessary to reconstruct the reported research results. Research data can be intangible (statistics, findings, conclusions, etc.) and tangible (notebooks, printouts, etc.).

The 2013 OSTP Memo

  • Data (from OMB Circular A-110):
“Data is defined … as the digitally recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens.”

What is DATA Management?

Data?
Numbers
Words
Citations / references
Notebooks / marginalia
Specimens
Field Samples
Images
Videos / sound recording
Relationships
Models
Code


Data Management?
File System Organization
File Naming Conventions
Privacy/Security Considerations
File Format Choice
Documentation and metadata
Roles and responsibilities in research environment
Storage and backup strategies
Acquiring and cleaning data
Sharing and collaboration strategies
Ownership of data
Access strategies / Access restrictions
Data publication / Data citation
¿TOOLS?
¿VISUALIZATION?
Some Useful Abstractions


“Information is not knowledge.
Knowledge is not wisdom.
Wisdom is not truth.
Truth is not beauty.
Beauty is not love.
Love is not music.
Music is THE BEST.”

― Frank Zappa  
Data Curation Lifecycle
Digital Curation Centre (DCC)
Research Data Management
Before: Data Management Planning / Grant Process
  • Privacy/Security Considerations
  • Storage and backup strategies
  • File System Organization
  • File Naming Conventions
  • File Format Choice
  • Documentation and metadata
  • Roles and responsibilities in research environment
  • Sharing and collaboration strategies
  • Ownership of data
  • Access strategies / Access restrictions
During: Compliance and Productivity
  • Follow file naming, organization and format conventions
  • Documentation and metadata
  • Acquiring and cleaning data
  • Regularly back up all data
  • Be mindful when sharing / version control
  • Access / privacy policy enforcement
After: Publication and/or Repository Deposit
  • Publish
  • Deposit in a repository
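
As one illustration of the file naming conventions mentioned above, a convention can be written down as a pattern and checked automatically. The sketch below assumes a hypothetical convention of project_YYYYMMDD_description_vNN.ext; the pattern, the function name check_name, and the sample file names are illustrative assumptions, not a UM standard.

```python
import re

# Hypothetical convention: project_YYYYMMDD_description_vNN.ext
# (lowercase letters, digits, and hyphens only; no spaces).
# Adjust the pattern to whatever rules your own lab agrees on.
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<date>\d{8})_"
    r"(?P<description>[a-z0-9-]+)_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def check_name(filename):
    """Return the parsed parts of a conforming name, or None if it does not match.

    Note: the date part is only checked for being eight digits,
    not for being a valid calendar date.
    """
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in ["reef_20230114_temperature-raw_v01.csv", "Final data (2).xlsx"]:
        print(name, "->", check_name(name))
```

A check like this could run before files are shared or deposited, so that non-conforming names are caught early rather than after a project ends.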
Sensors and Data Levels

Active vs. Static | Data Storage | Example or Focus | Typical File Formats
ACTIVE | Raw Data | Temperature readings over time | Paper? Device-specific? .xlsx, …
ACTIVE | Processed Data | “Cleaned,” normalized temperature data compiled in spreadsheet | .xlsx, .sas, …
ACTIVE | Analyzed Data | Temperature data with averages computed, graphs charted | .xlsx, .sas, …
STATIC | Finalized, Published Data | Do the data support hypothesis? | .csv
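
The progression in the table above, from raw to processed to analyzed to a static published file, can be sketched in a few lines of Python. This is a minimal illustration only: the readings, the -999.0 error code, and the function names are invented for the example and do not come from any real sensor.

```python
import csv
import io
import statistics

# Hypothetical raw sensor readings (timestamp, degrees Celsius).
# -999.0 stands in for a device error code; None is a missed reading.
RAW = [
    ("2023-01-14T00:00", 24.1),
    ("2023-01-14T01:00", -999.0),
    ("2023-01-14T02:00", 23.7),
    ("2023-01-14T03:00", None),
    ("2023-01-14T04:00", 23.9),
]


def clean(readings, error_code=-999.0):
    """Processed data: drop missing values and device error codes."""
    return [(t, v) for t, v in readings if v is not None and v != error_code]


def analyze(readings):
    """Analyzed data: summary statistics computed from the processed values."""
    values = [v for _, v in readings]
    return {"n": len(values), "mean_c": round(statistics.mean(values), 2)}


def publish_csv(readings):
    """Finalized data: serialize to plain CSV, an open and widely readable format."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "temperature_c"])
    writer.writerows(readings)
    return buffer.getvalue()


if __name__ == "__main__":
    processed = clean(RAW)
    print(analyze(processed))
    print(publish_csv(processed))
```

Keeping each level as a separate step makes it possible to archive the raw readings untouched while documenting exactly how the processed and analyzed versions were derived.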


adapted from http://classguides.lib.uconn.edu/

Why Manage Data

  • Productivity
    • Publishing
    • Knowledge creation
    • Career advancement



  • Compliance
    • Grant writing
    • University policy
    • Research ethics




¿SOMETHING ELSE?
Why Manage Data
Researcher Perspective
  • Keep yourself organized
  • Track your science processes for reproducibility
  • Better control versions of data
  • Quality control your data more efficiently
  • Make backups to avoid data loss
  • Format your data for re-use (by yourself or others)
  • Be prepared: Document your data for your own recollection, accountability, and re-use (by yourself or others)
  • Prepare it to share it – gain credibility and recognition for your science efforts!
slide adapted from
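
For the backup point in the list above, one common way to confirm that a backup copy is intact is to compare checksums of the original and the copy. The following is a minimal sketch using only the Python standard library; the helper names sha256_of and backup_and_verify and the file readings.csv are hypothetical, and the example works entirely inside a temporary directory.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy source into backup_dir and confirm the copy matches the original."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)
    return sha256_of(source) == sha256_of(target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "readings.csv"  # hypothetical data file
        original.write_text("timestamp,temperature_c\n2023-01-14T00:00,24.1\n")
        print(backup_and_verify(original, Path(tmp) / "backup"))
```

Storing the digest alongside the data also gives a simple way to detect silent corruption later, by recomputing the checksum and comparing it with the recorded value.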
Well-managed, publicly accessible data is important: why?

Here are a few reasons (from the UK Data Archive):

  • Increases the impact and visibility of research
  • Promotes innovation and potential new data uses
  • Leads to new collaborations between data users and creators
  • Maximizes transparency and accountability
  • Enables scrutiny of research findings
  • Encourages improvement and validation of research methods
  • Reduces cost of duplicating data collection
  • Provides important resources for education and training



File Formats
.?Q? – files that are compressed, often by the SQ program.
7z – 7-Zip compressed file
AAPKG – ArchestrA IDE
AAC – Advanced Audio Coding
ace – ACE compressed file
ALZ – ALZip compressed file
APK – Android package: Applications installable on Android; package format of the Alpine Linux distribution
APPX – Microsoft Application Package (.appx)
AT3 – Sony's UMD data compression
.bke – BackupEarth.com data compression
ARC – pre-Zip data compression
ARC - Nintendo U8 Archive (mostly Yaz0 compressed)
ARJ – ARJ compressed file
ASS (also SAS) – a subtitles file created by Aegisub, a video typesetting application (also a Halo game engine file)
B – (B file) Similar to .a, but less compressed.
BA – Scifer Archive (.ba), Scifer External Archive Type
big – Special file compression format used by Electronic Arts to compress the data for many of EA's games
BIN – compressed archive, can be read and used by CD-ROMs and Java, extractable by 7-zip and WINRAR
bjsn – Used to store The Escapists saves on Android.
BKF (.bkf) – Microsoft backup created by NTBackup.c
bzip2 (.bz2) –
bld – Skyscraper Simulator Building
cab – A cabinet (.cab) file is a library of compressed files stored as one file. Cabinet files are used to organize installation files that are copied to the user's system.[2]
c4 – JEDMICS image files, a DOD system
cals – JEDMICS image files, a DOD system
CLIPFLAIR (.clipflair, .clipflair.zip) – ClipFlair Studio ClipFlair component saved state file (contains component options in XML, extra/attached files and nested components' state in child .clipflair.zip files – activities are also components and can be nested at any depth)
CPT, SEA – Compact Pro (Macintosh)
DAA – Closed-format, Windows-only compressed disk image
deb – Debian install package
DMG – an Apple compressed/encrypted format
DDZ – a file which can only be used by the "daydreamer engine" created by "fever-dreamer", a program similar to RAGS, it's mainly used to make somewhat short games.
DN – Adobe Dimension CC file format
DPE – Package of AVE documents made with Aquafadas digital publishing tools.
.egg – Alzip Egg Edition compressed file
EGT (.egt) – EGT Universal Document also used to create compressed cabinet files replaces .ecab
ECAB (.ECAB, .ezip) – EGT Compressed Folder used in advanced systems to compress entire system folders, replaced by EGT Universal Document
ESD – Electronic Software Distribution, a compressed and encrypted WIM File
ESS (.ess) – EGT SmartSense File, detects files compressed using the EGT compression system.
Flipchart file (.flipchart) – Used in Promethean ActivInspire Flipchart Software.
GBP – two types of files: 1. An archive index file created by Genie Timeline. It contains references to the files that the user has chosen to back up; the references can be to an archive file or a batch of files. These files can be opened using Genie-Soft Genie Timeline on Windows. 2. A data output file created by CAD Printed Circuit Board (PCB) software. This type of file can be opened on Windows using Autodesk EAGLE, Altium Designer, Viewplot, or Gerbv; on Mac using Autodesk EAGLE, Gerbv, or gEDA; and on Linux using Autodesk EAGLE, gEDA, or Gerbv.
GHO (.gho, .ghs) – Norton Ghost
GIF (.gif) – Graphics Interchange Format
gzip (.gz) – Compressed file
HTML (.html) HTML code file
IPG (.ipg) – Format in which Apple Inc. packages their iPod games. can be extracted through Winrar
jar – ZIP file with manifest for use with Java applications.
LBR (.Lawrence) – Lawrence Compiler Type file
LBR – Library file
LQR – LBR Library file compressed by the SQ program.
LHA (.lzh) – Lempel, Ziv, Huffman
lzip (.lz) – Compressed file
lzo
lzma – Lempel–Ziv–Markov chain algorithm compressed file
LZX
MBW (.mbw) – MBRWizard archive
MHTML – MIME HTML (Hyper-Text Markup Language) code file
MPQ Archives (.mpq) – Used by Blizzard Entertainment
BIN (.bin) – MacBinary
NTH (.nth) – Nokia Theme Used by Nokia Series 40 Cellphones
OAR (.oar) – OAR archive
OSK - Compressed osu! skin archive
OSZ – Compressed osu! beatmap archive
PAK – Enhanced type of .ARC archive
PAR (.par, .par2) – Parchive
PAF (.paf) – Portable Application File
PEA (.pea) – PeaZip archive file
PHP (.php) – PHP code file
PYK (.pyk) – Compressed file
PK3 (.pk3) – Quake 3 archive (See note on Doom³)
PK4 (.pk4) – Doom³ archive (Opens similarly to a zip archive.)
py / pyw – Python code file
RAR (.rar) – Rar Archive, for multiple file archive (rar to .r01-.r99 to s01 and so on)
RAG, RAGS – Game file, a game playable in the RAGS game-engine, a free program which both allows people to create games, and play games, games created have the format "RAG game file"
RaX – Archive file created by RaX
RPM – Red Hat package/installer for Fedora, RHEL, and similar systems.
sb – Scratch file
sb2 – Scratch 2.0 file
sb3 - Scratch 3.0 file
SEN – Scifer Archive (.sen) – Scifer Internal Archive Type
SIT (.sitx) – StuffIt (Macintosh)
SIS/SISX – Symbian Application Package
SKB – Google SketchUp backup File
SQ (.sq) – Squish Compressed Archive
SWM – Split WIM file, usually found on an OEM recovery partition to store the preinstalled Windows image, and to make recovery backup (to USB drive) easier (due to FAT32 limitations)
SZS – Nintendo Yaz0 Compressed Archive
TAR – group of files, packaged as one file
TGZ (.tar.gz) – gzipped tar file
TB (.tb) – Tabbery Virtual Desktop Tab file
TIB (.tib) – Acronis True Image backup
UHA – Ultra High Archive Compression
UUE (.uue) – unified utility engine – the generic and default format for all things UUe-related.
VIV – Archive format used to compress data for several video games, including Need For Speed: High Stakes.
VOL – video game data package.
VSA – Altiris Virtual Software Archive
WAX – Wavexpress – A ZIP alternative optimized for packages containing video, allowing multiple packaged files to be all-or-none delivered with near-instantaneous unpacking via NTFS file system manipulation.
WIM – A compressed disk image for installing Windows Vista or higher, Windows Fundamentals for Legacy PC, or restoring a system image made from Backup and Restore (Windows Vista/7)
XAP – Windows Phone Application Package
xz – xz compressed files, based on LZMA/LZMA2 algorithm
Z – Unix compress file
zoo – based on LZW
zip – popular compression format
Physical recordable media archiving
ISO – The generic format for most optical media, including CD-ROM, DVD-ROM, Blu-ray Disc, HD DVD and UMD.
NRG – The proprietary optical media archive format used by Nero applications.
IMG – For archiving DOS formatted floppy disks, larger optical media, and hard disk drives.
ADF – Amiga Disk Format, for archiving Amiga floppy disks
ADZ – The GZip-compressed version of ADF.
DMS – Disk Masher System, a disk-archiving system native to the Amiga.
DSK – For archiving floppy disks from a number of other platforms, including the ZX Spectrum and Amstrad CPC.
D64 – An archive of a Commodore 64 floppy disk.
SDI – System Deployment Image, used for archiving and providing "virtual disk" functionality.
MDS – DAEMON tools native disc image format used for making images from optical CD-ROM, DVD-ROM, HD DVD or Blu-ray Disc. It comes together with MDF file and can be mounted with DAEMON Tools.
MDX – New DAEMON Tools format that allows getting one MDX disc image file instead of two (MDF and MDS).
DMG – Macintosh disk image files
(MPEG-1 is found in a .DAT file on a video CD.)

CDI – DiscJuggler image file
CUE – CDRWrite CUE image file
CIF – Easy CD Creator .cif format
C2D – Roxio-WinOnCD .c2d format
DAA – PowerISO .daa format
B6T – BlindWrite 5/6 image file
Ceramics glaze recipes
File formats for software, databases, and websites used by potters and ceramic artists to manage glaze recipes, glaze chemistry, etc.

GlazeChem text format
GlazeMaster .tab xml (GlazeMaster software)
HyperGlaze .hgz (HyperGlaze software)
Insight .xml (DigitalFire Insight software)
Insight .rcp (deprecated, DigitalFire Insight software)
Insight .rcx (deprecated, DigitalFire Insight software)
Matrix (Matrix Glaze Software)
Computer-aided design
Computer-aided is a prefix for several categories of tools (e.g., design, manufacture, engineering) which assist professionals in their respective fields (e.g., machining, architecture, schematics).

Computer-aided design (CAD)
Computer-aided design (CAD) software assists engineers, architects and other design professionals in project design.

3DXML – Dassault Systemes graphic representation
3MF – Microsoft 3D Manufacturing Format[3]
ACP – VA Software VA – Virtual Architecture CAD file
AMF – Additive Manufacturing File Format
AEC – DataCAD drawing format[4]
AR – Ashlar-Vellum Argon – 3D Modeling
ART – ArtCAM model
ASC – BRL-CAD Geometry File (old ASCII format)
ASM – Solidedge Assembly, Pro/ENGINEER Assembly
BIN, BIM – Data Design System DDS-CAD
BREP – Open CASCADE 3D model (shape)
C3D – C3D Toolkit File Format
CCC – CopyCAD Curves
CCM – CopyCAD Model
CCS – CopyCAD Session
CAD – CadStd
CATDrawing – CATIA V5 Drawing document
CATPart – CATIA V5 Part document
CATProduct – CATIA V5 Assembly document
CATProcess – CATIA V5 Manufacturing document
cgr – CATIA V5 graphic representation file
ckd – KeyCreator CAD Modeling
ckt – KeyCreator CAD Modeling
CO – Ashlar-Vellum Cobalt – parametric drafting and 3D modeling
DRW – Caddie Early version of Caddie drawing – Prior to Caddie changing to DWG
DFT – Solidedge Draft
DGN – MicroStation design file
DGK – Delcam Geometry
DMT – Delcam Machining Triangles
DXF – ASCII Drawing Interchange file format, AutoCAD
DWB – VariCAD drawing file
DWF – Autodesk's Web Design Format; AutoCAD & Revit can publish to this format; similar in concept to PDF files; Autodesk Design Review is the reader
DWG – Popular file format for Computer Aided Drafting applications, notably AutoCAD, Open Design Alliance applications, and Autodesk Inventor Drawing files
EASM – SolidWorks eDrawings assembly file
EDRW – eDrawings drawing file
EMB – Wilcom ES Designer Embroidery CAD file
EPRT – eDrawings part file
EscPcb – "esCAD pcb" data file by Electro-System (Japan)
EscSch – "esCAD sch" data file by Electro-System (Japan)
ESW – AGTEK format
EXCELLON – Excellon file
EXP – Drawing Express format
F3D – Autodesk Fusion 360 archive file[5]
FCStd – Native file format of FreeCAD CAD/CAM package
FM – FeatureCAM Part File
FMZ – FormZ Project file
G – BRL-CAD Geometry File
GBR – Gerber file
GLM – KernelCAD model
GRB – T-FLEX CAD File
GTC – GRAITEC Advance format
IAM – Autodesk Inventor Assembly file
ICD – IronCAD 2D CAD file
IDW – Autodesk Inventor Drawing file
IFC – buildingSMART for sharing AEC and FM data
IGES – Initial Graphics Exchange Specification
Intergraph Standard File Formats – Intergraph
IPN – Autodesk Inventor Presentation file
IPT – Autodesk Inventor Part file
JT – Jupiter Tesselation
MCD – Monu-CAD (Monument/Headstone Drawing file)
MDG – Model of Digital Geometric Kernel
model – CATIA V4 part document
OCD – Orienteering Computer Aided Design (OCAD) file
PAR – Solidedge Part
PIPE – PIPE-FLO Professional Piping system design file
PLN – ArchiCad project
PRT – NX (recently known as Unigraphics), Pro/ENGINEER Part, CADKEY Part
PSM – Solidedge Sheet
PSMODEL – PowerSHAPE Model
PWI – PowerINSPECT File
PYT – Pythagoras File
SKP – SketchUp Model
RLF – ArtCAM Relief
RVM – AVEVA PDMS 3D Review model
RVT – Autodesk Revit project files
RFA – Autodesk Revit family files
S12 – Spirit file, by Softtech
SCAD – OpenSCAD 3D part model
SCDOC – SpaceClaim 3D Part/Assembly
SLDASM – SolidWorks Assembly drawing
SLDDRW – SolidWorks 2D drawing
SLDPRT – SolidWorks 3D part model
dotXSI – For Softimage
STEP – Standard for the Exchange of Product model data
STL – Stereo Lithographic data format used by various CAD systems and stereo lithographic printing machines.
STD – Power Vision Plus – Electricity Meter Data (Circutor)
TCT – TurboCAD drawing template
TCW – TurboCAD for Windows 2D and 3D drawing
UNV – I-DEAS (Integrated Design and Engineering Analysis Software)
VC6 – Ashlar-Vellum Graphite – 2D and 3D drafting
VLM – Ashlar-Vellum Vellum, Vellum 2D, Vellum Draft, Vellum 3D, DrawingBoard
VS – Ashlar-Vellum Vellum Solids
WRL – Similar to STL, but includes color. Used by various CAD systems and 3D printing rapid prototyping machines. Also used for VRML models on the web.
X_B – Parasolids binary format
X_T – Parasolids
XE – Ashlar-Vellum Xenon – for associative 3D modeling
ZOFZPROJ – ZofzPCB 3D PCB model, containing mesh, netlist and BOM
Electronic design automation (EDA)
Electronic design automation (EDA), or electronic computer-aided design (ECAD), is specific to the field of electrical engineering.

BRD – Board file for EAGLE Layout Editor, a commercial PCB design tool
BSDL – Description language for testing through JTAG
CDL – Transistor-level netlist format for IC design
CPF – Power-domain specification in system-on-a-chip (SoC) implementation (see also UPF)
DEF – Gate-level layout
DSPF – Detailed Standard Parasitic Format, Analog-level parasitics of interconnections in IC design
EDIF – Vendor neutral gate-level netlist format
FSDB – Analog waveform format (see also Waveform viewer)
GDSII – Format for PCB and layout of integrated circuits
HEX – ASCII-coded binary format for memory dumps
LEF – Library Exchange Format, physical abstract of cells for IC design
LIB – Library modeling (function, timing) format
MS12 – NI Multisim file
OASIS – Open Artwork System Interchange Standard
OpenAccess – Design database format with APIs
PSF – Cadence proprietary format to store simulation results/waveforms (2GB limit)
PSFXL – Cadence proprietary format to store simulation results/waveforms
SDC – Synopsys Design Constraints, format for synthesis constraints
SDF – Standard for gate-level timings
SPEF – Standard format for parasitics of interconnections in IC design
SPI, CIR – SPICE Netlist, device-level netlist and commands for simulation
SREC, S19 – S-record, ASCII-coded format for memory dumps
SST2 – Cadence proprietary format to store mixed-signal simulation results/waveforms
STIL – Standard Test Interface Language, IEEE1450-1999 standard for Test Patterns for IC
SV – SystemVerilog source file
S*P – Touchstone/EEsof Scattering parameter data file – multi-port blackbox performance, measurement or simulated
TLF – Contains timing and logical information about a collection of cells (circuit elements)
UPF – Standard for Power-domain specification in SoC implementation
V – Verilog source file
VCD – Standard format for digital simulation waveform
VHD, VHDL – VHDL source file
WGL – Waveform Generation Language, format for Test Patterns for IC
Test technology
Files output from Automatic Test Equipment or post-processed from such.

Standard Test Data Format
Database
4DB – 4D database Structure file
4DD – 4D database Data file
4DIndy – 4D database Structure Index file
4DIndx – 4D database Data Index file
4DR – 4D database Data resource file (in old 4D versions)
ACCDB – Microsoft Database (Microsoft Office Access 2007 and later)
ACCDE – Compiled Microsoft Database (Microsoft Office Access 2007 and later)
ADT – Sybase Advantage Database Server (ADS)
APR – Lotus Approach data entry & reports
BOX – Lotus Notes Post Office mail routing database
CHML – Krasbit Technologies Encrypted database file for 1 click integration between contact management software and the chameleon(tm) line of imaging workflow solutions
DAF – Digital Anchor data file
DAT – DOS Basic
DAT – Intersystems Caché database file
DB – Paradox
DB – SQLite
DBF – db/dbase II,III,IV and V, Clipper, Harbour/xHarbour, Fox/FoxPro, Oracle
DTA – Sage Sterling database file
EGT – EGT Universal Document, used to compress sql databases to smaller files, may contain original EGT database style.
ESS – EGT SmartSense is a database of files and its compression style. Specific to EGT SmartSense
EAP – Enterprise Architect Project
FDB – Firebird Databases
FDB – Navision database file
FP, FP3, FP5, and FP7 – FileMaker Pro
FRM – MySQL table definition
GDB – Borland InterBase Databases
GTABLE – Google Drive Fusion Table
KEXI – Kexi database file (SQLite-based)
KEXIC – shortcut to a database connection for a Kexi databases on a server
KEXIS – shortcut to a Kexi database
LDB – Temporary database file, only existing when database is open
LIRS – Layered Integer Storage. Stores integers with characters such as semicolons to create lists of data.
MDA – Add-in file for Microsoft Access
MDB – Microsoft Access database
ADP – Microsoft Access project (used for accessing databases on a server)
MDE – Compiled Microsoft Database (Access)
MDF – Microsoft SQL Server Database
MYD – MySQL MyISAM table data
MYI – MySQL MyISAM table index
NCF – Lotus Notes configuration file
NSF – Lotus Notes database
NTF – Lotus Notes database design template
NV2 – QW Page NewViews object oriented accounting database
ODB – LibreOffice Base or OpenOffice Base database
ORA – Oracle tablespace files sometimes get this extension (also used for configuration files)
PCONTACT – WinIM Contact file
PDB – Palm OS Database
PDI – Portable Database Image
PDX – Corel Paradox database management
PRC – Palm OS resource database
SQL – bundled SQL queries
REC – GNU recutils database
REL – Sage Retrieve 4GL data file
RIN – Sage Retrieve 4GL index file
SDB – StarOffice's StarBase
SDF – SQL Compact Database file
sqlite – SQLite
UDL – Universal Data Link
waData – Wakanda (software) database Data file
waIndx – Wakanda (software) database Index file
waModel – Wakanda (software) database Model file
waJournal – Wakanda (software) database Journal file
WDB – Microsoft Works Database
WMDB – Windows Media Database file – The CurrentDatabase_360.wmdb file can contain file name, file properties, music, video, photo and playlist information.
Desktop publishing
AI – Adobe Illustrator
AVE / ZAVE – Aquafadas
CDR – CorelDRAW
CHP / pub / STY / CAP / CIF / VGR / FRM – Ventura Publisher – Xerox (DOS / GEM)
CPT – Corel Photo-Paint
DTP – Greenstreet Publisher, GST PressWorks
FM – Adobe FrameMaker
GDRAW – Google Drive Drawing
ILDOC – Broadvision Quicksilver document
INDD – Adobe InDesign
MCF – FotoInsight Designer
PDF – Adobe Acrobat or Adobe Reader
PMD – Adobe PageMaker
PPP – Serif PagePlus
PSD – Adobe Photoshop
PUB – Microsoft Publisher
QXD – QuarkXPress
SLA / SCD – Scribus
XCF – File format used by the GIMP, as well as other programs
Document
These files store formatted text and plain text.

0 – Plain Text Document, normally used for licensing
1ST – Plain Text Document, normally preceded by the words "README" (README.1ST)
600 – Plain Text Document, used in UNZIP history log
602 – Text602 document
ABW – AbiWord document
ACL – MS Word AutoCorrect List
AFP – Advanced Function Presentation – IBM
AMI – Lotus Ami Pro
Amigaguide
ANS – American National Standards Institute (ANSI) text
ASC – ASCII text
AWW – Ability Write
CCF – Color Chat 1.0
CSV – ASCII text as comma-separated values, used in spreadsheets and database management systems
CWK – ClarisWorks-AppleWorks document
DBK – DocBook XML sub-format
DITA – Darwin Information Typing Architecture document
DOC – Microsoft Word document
DOCM – Microsoft Word macro-enabled document
DOCX – Office Open XML document
DOT – Microsoft Word document template
DOTX – Office Open XML text document template
DWD – DavkaWriter Heb/Eng word processor file
EGT – EGT Universal Document
EPUB – EPUB open standard for e-books
EZW – Reagency Systems easyOFFER document[6]
FDX – Final Draft
FTM – Fielded Text Meta
FTX – Fielded Text (Declared)
GDOC – Google Drive Document
HTML – HyperText Markup Language (.html, .htm)
HWP – Haansoft (Hancom) Hangul Word Processor document
HWPML – Haansoft (Hancom) Hangul Word Processor Markup Language document
LOG – Text log file
LWP – Lotus Word Pro
MBP – metadata for Mobipocket documents
MD – Markdown text document
ME – Plain text document normally preceded by the word "READ" (READ.ME)
MCW – Microsoft Word for Macintosh (versions 4.0–5.1)
Mobi – Mobipocket documents
NB – Mathematica Notebook
nb – Nota Bene Document (Academic Writing Software)
NBP – Mathematica Player Notebook
NEIS – 학교생활기록부 작성 프로그램 (Student Record Writing Program) Document
ODM – OpenDocument master document
ODOC – Synology Drive Office Document
ODT – OpenDocument text document
OSHEET – Synology Drive Office Spreadsheet
OTT – OpenDocument text document template
OMM – OmmWriter text document
PAGES – Apple Pages document
PAP – Papyrus word processor document
PDAX – Portable Document Archive (PDA) document index file
PDF – Portable Document Format
QUOX – Question Object File Format for Quobject Designer or Quobject Explorer
Radix-64
RTF – Rich Text document
RPT – Crystal Reports
SDW – StarWriter text document, used in earlier versions of StarOffice
SE – Shuttle Document
STW – OpenOffice.org XML (obsolete) text document template
Sxw – OpenOffice.org XML (obsolete) text document
TeX – TeX
INFO – Texinfo
Troff
TXT – ASCII or Unicode plain text file
UOF – Uniform Office Format
UOML – Unique Object Markup Language
VIA – Revoware VIA Document Project File
WPD – WordPerfect document
WPS – Microsoft Works document
WPT – Microsoft Works document template
WRD – WordIt! document
WRF – ThinkFree Write
WRI – Microsoft Write document
XHTML (xhtml, xht) – eXtensible HyperText Markup Language
XML – eXtensible Markup Language
XPS – Open XML Paper Specification
Financial records
MYO – MYOB Limited (Windows) File
MYOB – MYOB Limited (Mac) File
TAX – TurboTax File
YNAB – You Need a Budget (YNAB) File
Financial data transfer formats
Interactive Financial Exchange (IFX) – XML-based specification for various forms of financial transactions
Open Financial Exchange (.ofx) – open standard supported by CheckFree and Microsoft and partly by Intuit; SGML and later XML based
QFX – proprietary pay-only format used only by Intuit
Quicken Interchange Format (.qif) – open standard formerly supported by Intuit
Font file
ABF – Adobe Binary Screen Font
AFM – Adobe Font Metrics
BDF – Bitmap Distribution Format
BMF – ByteMap Font Format
BRFNT – Binary Revolution Font Format
FNT – Bitmapped Font – Graphics Environment Manager (GEM)
FON – Bitmapped Font – Microsoft Windows
MGF – MicroGrafx Font
OTF – OpenType Font
PCF – Portable Compiled Format
PostScript Font – Type 1, Type 2
PFA – Printer Font ASCII
PFB – Printer Font Binary – Adobe
PFM – Printer Font Metrics – Adobe
FOND – Font Description resource – Mac OS
SFD – FontForge spline font database Font
SNF – Server Normal Format
TDF – TheDraw Font
TFM – TeX font metric
TTF (.ttf, .ttc) – TrueType Font
UFO – Unified Font Object, a cross-platform, cross-application, human-readable format for storing font data
WOFF – Web Open Font Format
Geographic information system
ASC – ASCII point of interest (POI) text file
APR – ESRI ArcView 3.3 and earlier project file
DEM – USGS DEM file format
E00 – ARC/INFO interchange file format
GeoJSON – Geographically located data in object notation
GeoTIFF – Geographically located raster data
GML – Geography Markup Language file[7]
GPX – XML-based interchange format
ITN – TomTom Itinerary format
MXD – ESRI ArcGIS project file, 8.0 and higher
NTF – National Transfer Format file
OV2 – TomTom POI overlay file
SHP – ESRI shapefile
TAB – MapInfo Table file format
World TIFF – Geographically located raster data: text file giving corner coordinate, raster cells per unit, and rotation
DTED – Digital Terrain Elevation Data
KML – Keyhole Markup Language, XML-based
Graphical information organizers
3DT – 3D Topicscape database holding the metadata of a 3D Topicscape, a form of 3D concept map (like a 3D mind map) used to organize ideas, information, and computer files
ATY – 3D Topicscape file, produced when an association type is exported; used to permit round-trip (export Topicscape, change files and folders as desired, re-import to 3D Topicscape)
CAG – Linear Reference System
FES – 3D Topicscape file, produced when a fileless occurrence in 3D Topicscape is exported to Windows. Used to permit round-trip (export Topicscape, change files and folders as desired, re-import them to 3D Topicscape)
MGMF – MindGenius Mind Mapping Software file format
MM – FreeMind mind map file (XML)
MMP – Mind Manager mind map file
TPC – 3D Topicscape file, produced when an inter-Topicscape topic link file is exported to Windows; used to permit round-trip (export Topicscape, change files and folders as desired, re-import to 3D Topicscape)
Graphics
Color palettes
ACT – Adobe Color Table. Contains a raw color palette and consists of 256 24-bit RGB colour values.
ASE – Adobe Swatch Exchange. Used by Adobe Photoshop, Illustrator, and InDesign.[8]
GPL – GIMP palette file. Uses a text representation of color names and RGB values. Various open source graphical editors can read this format,[9] including GIMP, Inkscape, Krita,[10] KolourPaint, Scribus, CinePaint, and MyPaint.[11]
PAL – Microsoft RIFF palette file
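As a rough illustration of the ACT entry above, the following sketch splits a raw palette into 256 RGB triples. It assumes only the layout stated in the list (256 colours of 3 bytes each, 768 bytes in total); the function name `parse_act_palette` and the synthetic grayscale data are hypothetical and do not come from any Adobe tool.

```python
def parse_act_palette(data: bytes) -> list[tuple[int, int, int]]:
    """Split the first 768 bytes of a raw palette into 256 (R, G, B) tuples."""
    if len(data) < 768:
        raise ValueError("expected at least 256 * 3 = 768 bytes")
    # Each colour occupies three consecutive bytes: red, green, blue.
    return [tuple(data[i:i + 3]) for i in range(0, 768, 3)]


# Synthetic grayscale palette built in memory, standing in for a real .act file.
sample = bytes(value for level in range(256) for value in (level, level, level))
palette = parse_act_palette(sample)
print(len(palette), palette[0], palette[255])
```

To try this on a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function.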
Color management
ICC/ICM – Color profile conforming to the ICC specification.
Raster graphics
Raster or bitmap files store images as a group of pixels.

ART – America Online proprietary format
BLP – Blizzard Entertainment proprietary texture format
BMP – Microsoft Windows Bitmap formatted image
BTI – Nintendo proprietary texture format
CD5 – Chasys Draw IES image
CIT – Intergraph monochrome bitmap format
CPT – Corel PHOTO-PAINT image
CR2 – Canon camera raw format; produced by some Canon cameras when RAW quality is selected in the camera settings
CLIP – CLIP STUDIO PAINT format
CPL – Windows control panel file
DDS – DirectX texture file
DIB – Device-Independent Bitmap graphic
DjVu – DjVu for scanned documents
EGT – EGT Universal Document, used in EGT SmartSense to compress PNG files to yet a smaller file
Exif – Exchangeable image file format (Exif) is a specification for the image format used by digital cameras
GIF – CompuServe's Graphics Interchange Format
GRF – Zebra Technologies proprietary format
ICNS – format for icons in macOS. Contains bitmap images at multiple resolutions and bitdepths with alpha channel.
ICO – a format used for icons in Microsoft Windows. Contains small bitmap images at multiple resolutions and bitdepths with 1-bit transparency or alpha channel.
IFF (.iff, .ilbm, .lbm) – ILBM
JNG – a single-frame MNG using JPEG compression and possibly an alpha channel
JPEG, JFIF (.jpg or .jpeg) – Joint Photographic Experts Group; a lossy image format widely used to display photographic images
JP2 – JPEG2000
JPS – JPEG Stereo
LBM – Deluxe Paint image file
MAX – ScanSoft PaperPort document
MIFF – ImageMagick's native file format
MNG – Multiple-image Network Graphics, the animated version of PNG
MSP – a format used by old versions of Microsoft Paint; replaced by BMP in Microsoft Windows 3.0
NITF – A U.S. Government standard commonly used in Intelligence systems
OTB – Over The Air bitmap, a specification designed by Nokia for black and white images for mobile phones
PBM – Portable bitmap
PC1 – Low resolution, compressed Degas picture file
PC2 – Medium resolution, compressed Degas picture file
PC3 – High resolution, compressed Degas picture file
PCF – Pixel Coordination Format
PCX – a lossless format used by ZSoft's PC Paintbrush, popular for a time on DOS systems.
PDN – Paint.NET image file
PGM – Portable graymap
PI1 – Low resolution, uncompressed Degas picture file
PI2 – Medium resolution, uncompressed Degas picture file; also Portrait Innovations encrypted image format
PI3 – High resolution, uncompressed Degas picture file
PICT, PCT – Apple Macintosh PICT image
PNG – Portable Network Graphics (lossless, recommended for displaying and editing graphic images)
PNM – Portable anymap graphic bitmap image
PNS – PNG Stereo
PPM – Portable Pixmap (Pixel Map) image
PSB – Adobe Photoshop Big image file (for large files)
PSD, PDD – Adobe Photoshop Drawing
PSP – Paint Shop Pro image
PX – Pixel image editor image file
PXM – Pixelmator image file
PXR – Pixar Image Computer image file
QFX – QuickLink Fax image
RAW – General term for minimally processed image data (acquired by a digital camera)
RLE – a run-length encoding image
SCT – Scitex Continuous Tone image file
SGI, RGB, INT, BW – Silicon Graphics Image
TGA (.tga, .targa, .icb, .vda, .vst, .pix) – Truevision TGA (Targa) image
TIFF (.tif or .tiff) – Tagged Image File Format (usually lossless, but many variants exist, including lossy ones)
TIFF/EP (.tif or .tiff) – Tag Image File Format / Electronic Photography, ISO 12234-2; tends to be used as a basis for other formats rather than in its own right.
VTF – Valve Texture Format
XBM – X Window System Bitmap
XCF – GIMP image (from GIMP's origin at the eXperimental Computing Facility of the University of California)
XPM – X Window System Pixmap
ZIF – Zoomable/Zoomify Image Format (a web-friendly, TIFF-based, zoomable image format)
Vector graphics
Vector graphics use geometric primitives such as points, lines, curves, and polygons to represent images.

3DV – 3-D wireframe graphics by Oscar Garcia
AMF – Additive Manufacturing File Format
AWG – Ability Draw
AI – Adobe Illustrator Document
CGM – Computer Graphics Metafile, an ISO Standard
CDR – CorelDRAW Document
CMX – CorelDRAW vector image
DP – Drawing Program file for PERQ [12]
DXF – ASCII Drawing Interchange file Format, used in AutoCAD and other CAD-programs
E2D – 2-dimensional vector graphics used by the editor which is included in JFire
EGT – EGT Universal Document, EGT Vector Draw images are used to draw vector to a website
EPS – Encapsulated Postscript
FS – FlexiPro file
GBR – Gerber file
ODG – OpenDocument Drawing
MOVIE.BYU
RenderMan
SVG – Scalable Vector Graphics, employs XML
Scene description languages (3D vector image formats)
STL – Stereo Lithographic data format, used by various CAD systems and stereolithographic printing machines
VRML (.wrl) – Virtual Reality Modeling Language, for creating 3D images viewable on the web
X3D
SXD – OpenOffice.org XML (obsolete) Drawing
TGAX – Texture format used by Zwift
V2D – voucher design used by the voucher management included in JFire
VDOC – Vector format used in AnyCut, CutStorm, DrawCut, DragonCut, FutureDRAW, MasterCut, SignMaster, VinylMaster software by Future Corporation
VSD – Vector format used by Microsoft Visio
VSDX – Vector format used by MS Visio and opened by VSDX Annotator
VND – Vision numeric Drawing file used in TypeEdit, Gravostyle.
WMF – Windows Meta File
EMF – Enhanced (Windows) MetaFile, an extension to WMF
ART – Xara – Drawing (superseded by XAR)
XAR – Xara – Drawing
3D graphics
3D graphics files store 3D models for use in real-time or non-real-time 3D rendering.

3DMF – QuickDraw 3D Metafile (.3dmf)
3DM – OpenNURBS Initiative 3D Model (used by Rhinoceros 3D) (.3dm)
3MF – Microsoft 3D Manufacturing Format (.3mf)[3]
3DS – legacy 3D Studio Model (.3ds)
ABC – Alembic (computer graphics)
AC – AC3D Model (.ac)
AMF – Additive Manufacturing File Format
AN8 – Anim8or Model (.an8)
AOI – Art of Illusion Model (.aoi)
ASM – PTC Creo assembly (.asm)
B3D – Blitz3D Model (.b3d)
BLEND – Blender (.blend)
BLOCK – Blender encrypted blend files (.block)
BMD3 – Nintendo GameCube first-party J3D proprietary model format (.bmd)
BDL4 – Nintendo GameCube and Wii first-party J3D proprietary model format (2002, 2006–2010) (.bdl)
BRRES – Nintendo Wii first-party proprietary model format 2010+ (.brres)
BFRES – Nintendo Wii U and later Switch first-party proprietary model format
C4D – Cinema 4D (.c4d)
Cal3D – Cal3D (.cal3d)
CCP4 – X-ray crystallography voxels (electron density)
CFL – Compressed File Library (.cfl)
COB – Caligari Object (.cob)
CORE3D – Coreona 3D Virtual File (.core3d)
CTM – OpenCTM (.ctm)
DAE – COLLADA (.dae)
DFF – RenderWare binary stream, commonly used by Grand Theft Auto III-era games as well as other RenderWare titles
DPM – deepMesh (.dpm)
DTS – Torque Game Engine (.dts)
EGG – Panda3D Engine
FACT – Electric Image (.fac)
FBX – Autodesk FBX (.fbx)
G – BRL-CAD geometry (.g)
GLB – a binary form of glTF required to be loaded in Facebook 3D Posts. (.glb)
GLM – Ghoul Mesh (.glm)
glTF – the JSON standard developed by Khronos Group (.gltf)
IOB – Imagine (3D modeling software) (.iob)
JAS – Cheetah 3D file (.jas)
LWO – Lightwave Object (.lwo)
LWS – Lightwave Scene (.lws)
LXF – LEGO Digital Designer Model file (.lxf)
LXO – Luxology Modo (software) file (.lxo)
MA – Autodesk Maya ASCII File (.ma)
MAX – Autodesk 3D Studio Max file (.max)
MB – Autodesk Maya Binary File (.mb)
MD2 – Quake 2 model format (.md2)
MD3 – Quake 3 model format (.md3)
MD5 – Doom 3 model format (.md5)
MDX – Blizzard Entertainment's own model format (.mdx)
MESH – New York University (.m)
MESH – Meshwork Model (.mesh)
MM3D – Misfit Model 3d (.mm3d)
MPO – Multi-Picture Object, a JPEG-based standard used for 3D images, as with the Nintendo 3DS
MRC – voxels in cryo-electron microscopy
NIF – Gamebryo NetImmerse File (.nif)
OBJ – Wavefront .obj file (.obj)
OFF – OFF Object file format (.off)
OGEX – Open Game Engine Exchange (OpenGEX) format (.ogex)
PLY – Polygon File Format / Stanford Triangle Format (.ply)
PRC – Adobe PRC (embedded in PDF files)
PRT – PTC Creo part (.prt)
POV – POV-Ray document (.pov)
R3D – Realsoft 3D (Real-3D) (.r3d)
RWX – RenderWare Object (.rwx)
SIA – Nevercenter Silo Object (.sia)
SIB – Nevercenter Silo Object (.sib)
SKP – Google Sketchup file (.skp)
SLDASM – SolidWorks Assembly Document (.sldasm)
SLDPRT – SolidWorks Part Document (.sldprt)
SMD – Valve Studiomdl Data format (.smd)
U3D – Universal 3D format (.u3d)
USD – Universal Scene Description (.usd)
USDA – Universal Scene Description, human-readable text format (.usda)
USDC – Universal Scene Description, binary format (.usdc)
USDZ – Universal Scene Description Zip (.usdz)
VIM – Revizto visual information model format (.vimproj)
VRML97 – VRML Virtual reality modeling language (.wrl)
VUE – Vue scene file (.vue)
VWX – Vectorworks (.vwx)
WINGS – Wings3D (.wings)
W3D – Westwood 3D Model (.w3d)
X – DirectX 3D Model (.x)
X3D – Extensible 3D (.x3d)
Z3D – Zmodeler (.z3d)
Links and shortcuts
Alias (Mac OS)
JNLP – Java Network Launching Protocol, an XML file used by Java Web Start for starting Java applets over the Internet
LNK – binary-format file shortcut in Microsoft Windows 95 and later
APPREF-MS – File shortcut format used by ClickOnce
URL – INI-format file pointing to a URL (bookmark/Internet shortcut) in Microsoft Windows
WEBLOC – Property list file pointing to a URL (bookmark/Internet shortcut) in macOS
SYM – Symbolic link
.desktop – Desktop entry on Linux Desktop environments
Mathematical
Harwell-Boeing file format – a format designed to store sparse matrices
MML – MathML – Mathematical Markup Language
ODF – OpenDocument Math Formula
SXM – OpenOffice.org XML (obsolete) Math Formula
Object code, executable files, shared and dynamically linked libraries
.8BF files – plugins for some photo editing programs including Adobe Photoshop, Paint Shop Pro, GIMP and Helicon Filter.
.a – Objective C native static library
a.out – (no suffix for executable image, .o for object files, .so for shared object files) classic UNIX object format, now often superseded by ELF
APK – Android Application Package
APP – A folder found on macOS systems containing program code and resources, appearing as one file.
BAC – an executable image for the RSTS/E system, created using the BASIC-PLUS COMPILE command[13]
BPL – a Win32 PE file created with Borland Delphi or C++Builder containing a package.
Bundle – a Macintosh plugin created with Xcode or make which holds executable code, data files, and folders for that code.
.Class – used in Java
COFF (no suffix for executable image, .o for object files) – UNIX Common Object File Format, now often superseded by ELF
COM files – commands used in DOS
DCU – Delphi compiled unit
DLL – library used in Windows and OS/2 to store data, resources and code.
DOL – the format used by the GameCube and Wii, short for Dolphin, which was the codename of the GameCube.
.EAR – archives of Java enterprise applications
ELF – (no suffix for executable image, .o for object files, .so for shared object files) used in many modern Unix and Unix-like systems, including Solaris, other System V Release 4 derivatives, Linux, and BSD
expander (see bundle)
DOS executable (.exe – used in DOS)
.IPA – Apple iOS application archive; a form of ZIP file.
JEFF – a file format allowing execution directly from static memory[14]
.JAR – archives of Java class files
.XPI – PKZIP archive that can be run by Mozilla web browsers to install software.
Mach-O – (no suffix for executable image, .o for object files, .dylib and .bundle for shared object files) Mach-based systems, notably native format of macOS, iOS, watchOS, and tvOS
NetWare Loadable Module (.NLM) – the native 32-bit binaries compiled for Novell's NetWare Operating System (versions 3 and newer)
New Executable (.EXE – used in multitasking ("European") MS-DOS 4.0, 16-bit Microsoft Windows, and OS/2)
.o – un-linked object files directly from the compiler
Obb – a file that developers create along with some APK packages to support the application.
Portable Executable (.EXE – used in Microsoft Windows and some other systems)
Preferred Executable Format – (classic Mac OS for PowerPC applications; compatible with macOS via a classic (Mac OS X) emulator)
RLL – used in Microsoft operating systems together with a DLL file to store program resources
.s1es – Executable used for S1ES learning system.
.so – shared library, typically ELF
Value Added Process (.VAP) – the native 16-bit binaries compiled for Novell's NetWare Operating System (version 2, NetWare 286, Advanced NetWare, etc.)
.WAR – archives of Java Web applications
XBE – Xbox executable
.XAP – Windows Phone package
XCOFF – (no suffix for executable image, .o for object files, .a for shared object files) extended COFF, used in AIX
XEX – Xbox 360 executable
Object extensions
.VBX – Visual Basic extensions
.OCX – Object Control extensions
.TLB – Windows Type Library
Page description language
DVI – Device independent format
EGT – Universal Document can be used to store CSS type styles (*.egt)
PLD
PCL
PDF – Portable Document Format
PostScript (.ps, .ps.gz)
SNP – Microsoft Access Report Snapshot
XPS
XSL-FO (Formatting Objects)
Configurations, Metadata
CSS – Cascading Style Sheets
XSLT, XSL – XML Style Sheet (.xslt, .xsl)
TPL – Web template (.tpl)
Personal information manager
MSG – Microsoft Outlook task manager
ORG – Lotus Organizer PIM package
PST, OST – Microsoft Outlook email communication
SC2 – Microsoft Schedule+ calendar
Presentation
GSLIDES – Google Drive Presentation
KEY, KEYNOTE – Apple Keynote Presentation
NB – Mathematica Slideshow
NBP – Mathematica Player slideshow
ODP – OpenDocument Presentation
OTP – OpenDocument Presentation template
PEZ – Prezi Desktop Presentation
POT – Microsoft PowerPoint template
PPS – Microsoft PowerPoint Show
PPT – Microsoft PowerPoint Presentation
PPTX – Office Open XML Presentation
PRZ – Lotus Freelance Graphics
SDD – StarOffice's StarImpress
SHF – ThinkFree Show
SHOW – Haansoft(Hancom) Presentation software document
SHW – Corel Presentations slide show creation
SLP – Logix-4D Manager Show Control Project
SSPSS – SongShow Plus Slide Show
STI – OpenOffice.org XML (obsolete) Presentation template
SXI – OpenOffice.org XML (obsolete) Presentation
THMX – Microsoft PowerPoint theme template
WATCH – Dataton Watchout Presentation
Project management software
MPP – Microsoft Project
Reference management software
Formats of files used for bibliographic information (citation) management.

bib – BibTeX
enl – EndNote
ris – Research Information Systems RIS (file format)
Scientific data (data exchange)
FITS (Flexible Image Transport System) – standard data format for astronomy (.fits)
Silo – a storage format for visualization developed at Lawrence Livermore National Laboratory
SPC – spectroscopic data
EAS3 – binary format for structured data
EOSSA – Electro-Optic Space Situational Awareness format
OST (Open Spatio-Temporal) – extensible, mainly images with related data, or just pure data; meant as an open alternative for microscope images
CCP4 – X-ray crystallography voxels (electron density)
MRC – voxels in cryo-electron microscopy
HITRAN – spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit)
.root – hierarchical platform-independent compressed binary format used by ROOT
Simple Data Format (SDF) – a platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays.
MYD – Everfine LEDSpec software file for LED measurements
Multi-domain
NetCDF – Network Common Data Form
HDF, h4 or h5 – Hierarchical Data Format
SDXF – (Structured Data Exchange Format)
CDF – Common Data Format
CGNS – CFD General Notation System
FMF – Full-Metadata Format
Meteorology
GRIB – Grid in Binary, WMO format for weather model data
BUFR – WMO format for weather observation data
PP – UK Met Office format for weather model data
NASA-Ames – Simple text format for observation data. First used in aircraft studies of the atmosphere.
Chemistry
CML – Chemical Markup Language (CML) (.cml)
Chemical table file (CTab) (.mol, .sd, .sdf)
Joint Committee on Atomic and Molecular Physical Data (JCAMP) (.dx, .jdx)
Simplified molecular input line entry specification (SMILES) (.smi)
Mathematics
graph6, sparse6 – ASCII encoding of Adjacency matrices (.g6, .s6)
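To make the graph6 entry concrete, here is a minimal decoding sketch. It assumes the short form of graph6 for graphs with at most 62 vertices: every character stores six bits as its character code minus 63, the first character gives the vertex count, and the remaining characters hold the upper triangle of the adjacency matrix column by column. The function name `decode_graph6` is hypothetical, and the sketch does not handle the longer size encodings or sparse6.

```python
def decode_graph6(text: str) -> list[list[int]]:
    """Decode a short-form graph6 string into a symmetric adjacency matrix."""
    values = [ord(ch) - 63 for ch in text.strip()]
    n = values[0]
    if not 0 <= n <= 62:
        raise ValueError("this sketch only handles graphs with at most 62 vertices")
    # Unpack each 6-bit value, most significant bit first.
    bits = [(v >> shift) & 1 for v in values[1:] for shift in range(5, -1, -1)]
    adjacency = [[0] * n for _ in range(n)]
    k = 0
    # Bits are ordered (0,1), (0,2), (1,2), (0,3), (1,3), (2,3), ...
    for j in range(1, n):
        for i in range(j):
            if bits[k]:
                adjacency[i][j] = adjacency[j][i] = 1
            k += 1
    return adjacency


triangle = decode_graph6("Bw")  # three vertices, all three edges present
path = decode_graph6("Bg")      # three vertices, edges 0-1 and 1-2
```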
Biology
Molecular biology and bioinformatics:
AB1 – In DNA sequencing, chromatogram files used by instruments from Applied Biosystems
ACE – A sequence assembly format
ASN.1 – Abstract Syntax Notation One, an International Organization for Standardization (ISO) data representation format used to achieve interoperability between platforms. NCBI uses ASN.1 for the storage and retrieval of data such as nucleotide and protein sequences, structures, genomes, and PubMed records.
BAM – Binary Alignment/Map format (compressed SAM format)
BCF – Binary compressed VCF format
BED – The browser extensible display format is used for describing genes and other features of DNA sequences
CAF – Common Assembly Format for sequence assembly
CRAM – compressed file format for storing biological sequences aligned to a reference sequence
DDBJ – The flatfile format used by the DDBJ to represent database records for nucleotide and peptide sequences from DDBJ databases.
EMBL – The flatfile format used by the EMBL to represent database records for nucleotide and peptide sequences from EMBL databases.
FASTA – The FASTA format, for sequence data. Sometimes also given as FNA or FAA (Fasta Nucleic Acid or Fasta Amino Acid).
FASTQ – The FASTQ format, for sequence data with quality. Sometimes also given as QUAL.
GCPROJ – The Genome Compiler project. Advanced format for genetic data to be designed, shared and visualized.
GenBank – The flatfile format used by the NCBI to represent database records for nucleotide and peptide sequences from the GenBank and RefSeq databases
GFF – The General feature format is used to describe genes and other features of DNA, RNA, and protein sequences
GTF – The Gene transfer format is used to hold information about gene structure
MAF – The Multiple Alignment Format stores multiple alignments for whole-genome to whole-genome comparisons [6]
NCBI ASN.1 – Structured ASN.1 format used at National Center for Biotechnology Information for DNA and protein data
NEXUS – The Nexus file encodes mixed information about genetic sequence data in a block structured format
NeXML – XML format for phylogenetic trees
NWK – The Newick tree format is a way of representing graph-theoretical trees with edge lengths using parentheses and commas and useful to hold phylogenetic trees.
PDB – structures of biomolecules deposited in Protein Data Bank, also used to exchange protein and nucleic acid structures
PHD – Phred output, from the basecalling software Phred
PLN – Protein Line Notation used in proteax software specification
SAM – Sequence Alignment Map format, in which the results of the 1000 Genomes Project will be released
SBML – The Systems Biology Markup Language is used to store biochemical network computational models
SCF – Staden chromatogram files used to store data from DNA sequencing
SFF – Standard Flowgram Format
SRA – format used by the National Center for Biotechnology Information Short Read Archive to store high-throughput DNA sequence data
Stockholm – The Stockholm format for representing multiple sequence alignments
Swiss-Prot – The flatfile format used to represent database records for protein sequences from the Swiss-Prot database
VCF – Variant Call Format, a standard created by the 1000 Genomes Project that lists and annotates the entire collection of human variants (with the exception of approximately 1.6 million variants).
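The Newick entry in the list above describes trees written with parentheses, commas, and edge lengths. The sketch below shows one small example string and pulls out the named leaves with their branch lengths. It is not a full Newick parser: it assumes internal nodes are unnamed and every leaf has a numeric length, and the helper name `newick_leaves` is hypothetical.

```python
import re


def newick_leaves(tree: str) -> dict[str, float]:
    """Return {leaf name: branch length} for a simple Newick string."""
    # A leaf is written as name:length; unnamed internal nodes appear as "):length"
    # and are skipped because no name precedes the colon.
    pairs = re.findall(r"([A-Za-z_]\w*):([0-9.]+)", tree)
    return {name: float(length) for name, length in pairs}


# Four leaves; C and D share an unnamed parent with branch length 0.5.
tree = "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);"
leaves = newick_leaves(tree)
print(leaves)
```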
Biomedical imaging
Digital Imaging and Communications in Medicine (DICOM) (.dcm)
Neuroimaging Informatics Technology Initiative (NIfTI)
.nii – single-file (combined data and meta-data) style
.nii.gz – gzip-compressed, used transparently by some software, notably the FMRIB Software Library (FSL)
.gii – single-file (combined data and meta-data) style; NIfTI offspring for brain surface data
.img,.hdr – dual-file (separate data and meta-data, respectively) style
AFNI data, meta-data (.BRIK,.HEAD)
Massachusetts General Hospital imaging format, used by the FreeSurfer brain analysis package
.MGH – uncompressed
.MGZ – zip-compressed
Analyze data, meta-data (.img,.hdr)
Medical Imaging NetCDF (MINC) format, previously based on NetCDF; since version 2.0, based on HDF5 (.mnc)
Biomedical signals (time series)
ACQ – AcqKnowledge format for Windows/PC from Biopac Systems Inc., Goleta, CA, USA
ADICHT – LabChart format from ADInstruments Pty Ltd, Bella Vista NSW, Australia
BCI2000 – The BCI2000 project, Albany, NY, USA
BDF – BioSemi data format from BioSemi B.V. Amsterdam, Netherlands
BKR – The EEG data format developed at the University of Technology Graz, Austria
CFWB – Chart Data Format from ADInstruments Pty Ltd, Bella Vista NSW, Australia
DICOM Waveform – an extension of DICOM for storing waveform data
ecgML – A markup language for electrocardiogram data acquisition and analysis
EDF/EDF+ – European Data Format
FEF – File Exchange Format for Vital signs, CEN TS 14271
GDF v1.x – The General Data Format for biomedical signals, version 1.x
GDF v2.x – The General Data Format for biomedical signals, version 2.x
HL7aECG – Health Level 7 v3 annotated ECG
MFER – Medical waveform Format Encoding Rules
OpenXDF – Open Exchange Data Format from Neurotronics, Inc., Gainesville, FL, USA
SCP-ECG – Standard Communication Protocol for Computer assisted electrocardiography EN1064:2007
SIGIF – A digital SIGnal Interchange Format with application in neurophysiology
WFDB – Format of Physiobank
XDF – eXtensible Data Format
Other biomedical formats
Health Level 7 (HL7) – a framework for exchange, integration, sharing, and retrieval of health information electronically
xDT – a family of data exchange formats for medical records
Biometric formats
CBF – Common Biometric Format, based on CBEFF 2.0 (Common Biometric Exchange Formats Framework).
EBF – Extended Biometric Format, based on CBF but with S/MIME encryption support and semantic extensions
CBFX – XML Common Biometric Format, based upon XCBF 1.1 (OASIS XML Common Biometric Format)
EBFX – XML Extended Biometric Format, based on CBFX but with W3C XML Encryption support and semantic extensions
Programming languages and scripts
ADB – Ada body
ADS – Ada specification
AHK – AutoHotkey script file
APPLESCRIPT – AppleScript; see SCPT
AS – Adobe Flash ActionScript File
AU3 – AutoIt version 3
BAT – Batch file
BAS – QBasic & QuickBASIC
CLJS – ClojureScript
CMD – Batch file
Coffee – CoffeeScript
C – C
CPP – C++
INO – Arduino sketch (program)
EGG – Chicken
EGT – EGT Asterisk Application Source File, EGT Universal Document
ERB – Embedded Ruby, Ruby on Rails Script File
HTA – HTML Application
IBI – Icarus script
ICI – ICI
IJS – J script
.ipynb – IPython Notebook
ITCL – Itcl
JS – JavaScript and JScript
JSFL – Adobe JavaScript language
.kt – Kotlin
LUA – Lua
M – Mathematica package file
MRC – mIRC Script
NCF – NetWare Command File (scripting for Novell's NetWare OS)
NUC – compiled script
NUD – external module written in C++
NUT – Squirrel
pde – Processing (programming language), Processing script
PHP – PHP
PHP? – PHP (? = version number)
PL – Perl
PM – Perl module
PS1 – Windows PowerShell shell script
PS1XML – Windows PowerShell format and type definitions
PSC1 – Windows PowerShell console file
PSD1 – Windows PowerShell data file
PSM1 – Windows PowerShell module file
PY – Python
PYC – Python byte code files
PYO – Python optimized byte code files
R – R scripts
r – REBOL scripts
RB – Ruby
RDP – RDP connection
red – Red scripts
RS – Rust (programming language)
SB2 – Scratch
SCPT – Applescript
SCPTD – See SCPT.
SDL – State Description Language
SH – Shell script
SYJS – SyMAT JavaScript
SYPY – SyMAT Python
TCL – Tcl
TNS – Ti-Nspire Code/File
VBS – Visual Basic Script
XPL – XProc script/pipeline
ebuild – Gentoo Linux's Portage package.
Security
Authentication and general encryption formats are listed here.

OpenPGP Message Format – used by Pretty Good Privacy, GNU Privacy Guard, and other OpenPGP software; can contain keys, signed data, or encrypted data; can be binary or text ("ASCII armored")
Certificates and keys
GXK – Galaxkey, an encryption platform for authorized, private and confidential email communication
OpenSSH private key (.ssh) – Secure Shell private key; format generated by ssh-keygen or converted from PPK with PuTTYgen[15][16][17]
OpenSSH public key (.pub) – Secure Shell public key; format generated by ssh-keygen or PuTTYgen[15][16][17]
PuTTY private key (.ppk) – Secure Shell private key, in the format generated by PuTTYgen instead of the format used by OpenSSH[15][16][17]
X.509
Distinguished Encoding Rules (.cer, .crt, .der) – stores certificates
PKCS#7 SignedData (.p7b, .p7c) – commonly appears without main data, just certificates or certificate revocation lists (CRLs)
PKCS#12 (.p12, .pfx) – can store public certificates and private keys
PEM – Privacy-enhanced Electronic Mail: full format not widely used, but often used to store Distinguished Encoding Rules in Base64 format
PFX – Microsoft predecessor of PKCS#12
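The PEM entry above notes that PEM is often used to store DER data in Base64. A minimal sketch of that wrapping, assuming the common layout of BEGIN/END marker lines around Base64 text broken into 64-character lines, might look like this. The function names and the placeholder bytes are hypothetical; the bytes are not a real certificate.

```python
import base64


def der_to_pem(der: bytes, label: str = "CERTIFICATE") -> str:
    """Wrap raw DER bytes in Base64 with PEM-style BEGIN/END marker lines."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"


def pem_to_der(pem: str) -> bytes:
    """Strip the marker lines and decode the Base64 body back to bytes."""
    body = [ln for ln in pem.strip().splitlines() if not ln.startswith("-----")]
    return base64.b64decode("".join(body))


# Placeholder bytes standing in for DER-encoded data (not a valid certificate).
placeholder = bytes(range(100))
pem_text = der_to_pem(placeholder)
print(pem_text)
```

The round trip `pem_to_der(der_to_pem(x)) == x` is the property worth checking before relying on a conversion like this for archived key material.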
Encrypted files
This section shows file formats for encrypted general data, rather than a specific program's data.

AXX – Encrypted file, created with AxCrypt
EEA – An encrypted CAB, ostensibly for protecting email attachments
TC – Virtual encrypted disk container, created by TrueCrypt
KODE – Encrypted file, created with KodeFile
Password files
Password files (sometimes called keychain files) contain lists of other passwords, usually encrypted.

BPW – Encrypted password file created by Bitser password manager
KDB – KeePass 1 database
KDBX – KeePass 2 database
Signal data (non-audio)
ACQ – AcqKnowledge format for Windows/PC from Biopac
ADICHT – LabChart format from ADInstruments
BKR – The EEG data format developed at the University of Technology Graz
BDF, CFG – Configuration file for Comtrade data
CFWB – Chart Data format from ADInstruments
DAT – Raw data file for Comtrade data
EDF – European data format
FEF – File Exchange Format for Vital signs
GDF – General data formats for biomedical signals
GMS – Gesture And Motion Signal format
IROCK – intelliRock Sensor Data File Format
MFER – Medical waveform Format Encoding Rules
SAC – Seismic Analysis Code, earthquake seismology data format[18]
SCP-ECG – Standard Communication Protocol for Computer assisted electrocardiography
SEED, MSEED – Standard for the Exchange of Earthquake Data, seismological data and sensor metadata[19]
SEGY – Reflection seismology data format
SIGIF – SIGnal Interchange Format
WIN, WIN32 – NIED/ERI seismic data format (.cnt)[20]
Sound and music
Lossless audio
Uncompressed
8SVX – Commodore-Amiga 8-bit sound (usually in an IFF container)
16SVX – Commodore-Amiga 16-bit sound (usually in an IFF container)
AIFF, AIF, AIFC – Audio Interchange File Format
AU – Simple audio file format introduced by Sun Microsystems
BWF – Broadcast Wave Format, an extension of WAVE
CDDA – Compact Disc Digital Audio
RAW – Raw samples without any header or sync
WAV – Microsoft Wave
Compressed
RA, RM – RealAudio format
FLAC – Free lossless codec of the Ogg project
LA – Lossless Audio
PAC – LPAC
APE – Monkey's Audio
OFR, OFS, OFF – OptimFROG
RKA – RKAU
SHN – Shorten
TAK – Tom's Lossless Audio Kompressor[21]
THD – Dolby TrueHD
TTA – Free lossless audio codec (True Audio)
WV – WavPack
WMA – Windows Media Audio 9 Lossless
BRSTM – Binary Revolution Stream[22]
DTS, DTSHD, DTSMA – DTS (sound system)
AST – Nintendo Audio Stream
AW – Nintendo Audio Sample used in first-party games
PSF – Portable Sound Format, PlayStation variant (originally PlayStation Sound Format)
Lossy audio
AC3 – Usually used for Dolby Digital tracks
AMR – For GSM and UMTS based mobile phones
MP1 – MPEG Layer 1
MP2 – MPEG Layer 2
MP3 – MPEG Layer 3
SPX – Speex (Ogg project, specialized for voice, low bitrates)
GSM – GSM Full Rate, originally developed for use in mobile phones
WMA – Windows Media Audio
AAC – Advanced Audio Coding (usually in an MPEG-4 container)
MPC – Musepack
VQF – Yamaha TwinVQ
OTS – Audio File (similar to MP3, with more data stored in the file and slightly better compression; designed for use with OtsLabs' OtsAV)
SWA – Macromedia Shockwave Audio (same compression as MP3 with additional header information specific to Macromedia Director)
VOX – Dialogic ADPCM Low Sample Rate Digitized Voice
VOC – Creative Labs Soundblaster Creative Voice, 8-bit & 16-bit; also the output format of RCA Audio Recorders
DWD – DiamondWare Digitized
SMP – Turtlebeach SampleVision
OGG – Ogg Vorbis
Tracker modules and related
MOD – Soundtracker and Protracker sample and melody modules
MT2 – MadTracker 2 module
S3M – Scream Tracker 3 module
XM – Fast Tracker module
IT – Impulse Tracker module
NSF – NES Sound Format
MID, MIDI – Standard MIDI file; most often just notes and controls but occasionally also sample dumps (.mid, .rmi)
FTM – FamiTracker Project file
Sheet music files
ABC – ABC Notation sheet music file
DARMS – DARMS File Format also known as the Ford-Columbia Format
ETF – Enigma Transportation Format abandoned sheet music exchange format
GP* – Guitar Pro sheet music and tablature file
KERN – Kern File Format sheet music file
LY – LilyPond sheet music file
MEI – Music Encoding Initiative file format that attempts to encode all musical notations
MUS, MUSX – Finale sheet music file
MXL, XML – MusicXML standard sheet music exchange format
MSCX, MSCZ – MuseScore sheet music file
SMDL – Standard Music Description Language sheet music file
SIB – Sibelius sheet music file
Other file formats pertaining to audio
NIFF – Notation Interchange File Format
PTB – Power Tab Editor tab
ASF – Advanced Systems Format
CUST – DeliPlayer custom sound format
GYM – Genesis YM2612 log
JAM – Jam music format
MNG – Background music for the Creatures game series, starting from Creatures 2
RMJ – RealJukebox Media used for RealPlayer
SID – Sound Interface Device – Commodore 64 instructions to play SID music and sound effects
SPC – Super NES sound format
TXM – Track ax media
VGM – Stands for "Video Game Music", log for several different chips
YM – Atari ST/Amstrad CPC YM2149 sound chip format
PVD – Portable Voice Document used for Oaisys & Mitel call recordings
Playlist formats
AIMPPL – AIMP Playlist format
ASX – Advanced Stream Redirector
RAM – Real Audio Metafile, for RealAudio files only
XPL – HDi playlist
XSPF – XML Shareable Playlist Format
ZPL – Xbox Music (Formerly Zune) Playlist format from Microsoft
M3U – Multimedia playlist file
PLS – Multimedia playlist, originally developed for use with the museArc
Audio editing and music production
ALS – Ableton Live set
ALC – Ableton Live clip
ALP – Ableton Live pack
AUP – Audacity project file
BAND – GarageBand project file
CEL – Adobe Audition loop file (Cool Edit Loop)
CPR – Steinberg Cubase project file
CWP – Cakewalk Sonar project file
DRM – Steinberg Cubase drum file
DMKIT – Image-Line's Drumaxx drum kit file
ENS – Native Instruments Reaktor Ensemble
FLP – Image Line FL Studio project file
GRIR – Native Instruments Komplete Guitar Rig Impulse Response
LOGIC – Logic Pro X project file
MMP – LMMS project file (alternatively MMPZ for compressed formats)
MMR – MAGIX Music Maker project file
MX6HS – Mixcraft 6 Home Studio project file
NPR – Steinberg Nuendo project file
OMF, OMFI – Open Media Framework Interchange; OMFI succeeds OMF (Open Media Framework)
RIN – Soundways RIN-M file containing sound recording participant credits and song information
SES – Adobe Audition multitrack session file
SFL – Sound Forge sound file
SNG – MIDI sequence file (MidiSoft, Korg, etc.) or n-Track Studio project file
STF – StudioFactory project file. It contains all necessary patches, samples, tracks and settings to play the file
SND – Akai MPC sound file
SYN – SynFactory project file. It contains all necessary patches, samples, tracks and settings to play the file
VCLS – VocaListener project file
VSQ – Vocaloid 2 Editor sequence excluding wave-file
VSQX – Vocaloid 3 Editor sequence excluding wave-file
Recorded television formats
DVR-MS – Windows XP Media Center Edition's Windows Media Center recorded television format
WTV – Windows Vista's and up Windows Media Center recorded television format
Source code for computer programs
ADA, ADB, 2.ADA – Ada (body) source
ADS, 1.ADA – Ada (specification) source
ASM, S – Assembly language source
BAS – BASIC, FreeBASIC, Visual Basic, BASIC-PLUS source,[13] PICAXE basic
BB – Blitz Basic Blitz3D
BMX – Blitz Basic BlitzMax
C – C source
CLJ – Clojure source code
CLS – Visual Basic class
COB, CBL – COBOL source
CPP, CC, CXX, C, CBP – C++ source
CS – C# source
CSPROJ – C# project (Visual Studio .NET)
D – D source
DBA – DarkBASIC source
DBPro123 – DarkBASIC Professional project
E – Eiffel source
EFS – EGT Forever Source File
EGT – EGT Asterisk Source File, could be J, C#, VB.net, EF 2.0 (EGT Forever)
EL – Emacs Lisp source
FOR, FTN, F, F77, F90 – Fortran source
FRM – Visual Basic form
FRX – Visual Basic form stash file (binary form file)
FTH – Forth source
GED – Game Maker Extension Editable file as of version 7.0
GM6 – Game Maker Editable file as of version 6.x
GMD – Game Maker Editable file up to version 5.x
GMK – Game Maker Editable file as of version 7.0
GML – Game Maker Language script file
GO – Go source
H – C/C++ header file
HPP, HXX – C++ header file
HS – Haskell source
I – SWIG interface file
INC – Turbo Pascal included source
JAVA – Java source
L – lex source
LGT – Logtalk source
LISP – Common Lisp source
M – Objective-C source
M – MATLAB
M – Mathematica
M4 – m4 source
ML – Standard ML and OCaml source
MSQR – M² source file, created by Mattia Marziali
N – Nemerle source
NB – Nuclear Basic source
P – Parser source
PAS, PP, P – Pascal source (DPR for projects)
PHP, PHP3, PHP4, PHP5, PHPS, Phtml – PHP source
PIV – Pivot stickfigure animator
PL, PM – Perl
PLI, PL1 – PL/I
PRG – Ashton-Tate; dbII, dbIII and dbIV, db, db7, clipper, Microsoft Fox and FoxPro, harbour, xharbour, and Xbase
PRO – IDL
POL – Apcera Policy Language doclet
PY – Python source
R – R source
RED – Red source
REDS – Red/System source
RB – Ruby source
RESX – Resource file for .NET applications
RC, RC2 – Resource script files to generate resources for .NET applications
RKT, RKTL – Racket source
SCALA – Scala source
SCI, SCE – Scilab
SCM – Scheme source
SD7 – Seed7 source
SKB, SKC – Sage Retrieve 4GL Common Area (Main and Amended backup)
SKD – Sage Retrieve 4GL Database
SKF, SKG – Sage Retrieve 4GL File Layouts (Main and Amended backup)
SKI – Sage Retrieve 4GL Instructions
SKK – Sage Retrieve 4GL Report Generator
SKM – Sage Retrieve 4GL Menu
SKO – Sage Retrieve 4GL Program
SKP, SKQ – Sage Retrieve 4GL Print Layouts (Main and Amended backup)
SKS, SKT – Sage Retrieve 4GL Screen Layouts (Main and Amended backup)
SKZ – Sage Retrieve 4GL Security File
SLN – Visual Studio solution
SPIN – Spin source (for Parallax Propeller microcontrollers)
STK – Stickfigure file for Pivot stickfigure animator
SWG – SWIG source code
TCL – TCL source code
VAP – Visual Studio Analyzer project
VB – Visual Basic.NET source
VBG – Visual Studio compatible project group
VBP, VIP – Visual Basic project
VBPROJ – Visual Basic .NET project
VCPROJ – Visual C++ project
VDPROJ – Visual Studio deployment project
XPL – XProc script/pipeline
XQ – XQuery file
XSL – XSLT stylesheet
Y – yacc source
Spreadsheet
123 – Lotus 1-2-3
AB2 – Abykus worksheet
AB3 – Abykus workbook
AWS – Ability Spreadsheet
BCSV – Nintendo proprietary table format
CLF – ThinkFree Calc
CELL – Haansoft(Hancom) SpreadSheet software document
CSV – Comma-Separated Values
GSHEET – Google Drive Spreadsheet
numbers – An Apple Numbers Spreadsheet file
gnumeric – Gnumeric spreadsheet, a gzipped XML file
LCW – Lucid 3-D
ODS – OpenDocument spreadsheet
OTS – OpenDocument spreadsheet template
QPW – Quattro Pro spreadsheet
SDC – StarOffice StarCalc Spreadsheet
SLK – SYLK (SYmbolic LinK)
STC – OpenOffice.org XML (obsolete) Spreadsheet template
SXC – OpenOffice.org XML (obsolete) Spreadsheet
TAB – tab delimited columns; also TSV (Tab-Separated Values)
TXT – text file
VC – Visicalc
WK1 – Lotus 1-2-3 up to version 2.01
WK3 – Lotus 1-2-3 version 3.0
WK4 – Lotus 1-2-3 version 4.0
WKS – Lotus 1-2-3
WKS – Microsoft Works
WQ1 – Quattro Pro DOS version
XLK – Microsoft Excel worksheet backup
XLS – Microsoft Excel worksheet (97–2003)
XLSB – Microsoft Excel binary workbook
XLSM – Microsoft Excel Macro-enabled workbook
XLSX – Office Open XML worksheet
XLR – Microsoft Works version 6.0
XLT – Microsoft Excel worksheet template
XLTM – Microsoft Excel Macro-enabled worksheet template
XLW – Microsoft Excel worksheet workspace (version 4.0)
Tabulated data
TSV – Tab-separated values
CSV – Comma-separated values
db – databank format; accessible by many econometric applications
dif – accessible by many spreadsheet applications
Video
AAF – mostly intended to hold edit decisions and rendering information, but can also contain compressed media essence
3GP – the most common video format for cell phones
GIF – Animated GIF (simple animation; until recently often avoided because of patent problems)
ASF – container (enables any form of compression to be used; MPEG-4 is common; video in ASF-containers is also called Windows Media Video (WMV))
AVCHD – Advanced Video Codec High Definition
AVI – container (a shell, which enables any form of compression to be used)
BIK (.bik) – Bink Video file. A video compression system developed by RAD Game Tools
CAM – aMSN webcam log file
COLLAB – Blackboard Collaborate session recording
DAT – video standard data file (created automatically when a video file is burned to CD)
DSH
DVR-MS – Windows XP Media Center Edition's Windows Media Center recorded television format
FLV – Flash video (encoded to run in a flash animation)
M1V – MPEG-1 Video
M2V – MPEG-2 Video
FLA – Macromedia Flash (for producing)
FLR – (text file which contains scripts extracted from SWF by a free ActionScript decompiler named FLARE)
SOL – Adobe Flash shared object ("Flash cookie")
M4V – video container file format developed by Apple
Matroska (*.mkv) – Matroska is a container format, which enables any video format such as MPEG-4 ASP or AVC to be used along with other content such as subtitles and detailed meta information
WRAP – MediaForge (*.wrap)
MNG – mainly simple animation containing PNG and JPEG objects, often somewhat more complex than animated GIF
QuickTime (.mov) – container which enables any form of compression to be used; Sorenson codec is the most common; QTCH is the filetype for cached video and audio streams
MPEG (.mpeg, .mpg, .mpe)
THP – Nintendo proprietary movie/video format
MPEG-4 Part 14, shortened "MP4" – multimedia container (most often used for Sony's PlayStation Portable and Apple's iPod)
MXF – Material Exchange Format (standardized wrapper format for audio/visual material developed by SMPTE)
ROQ – used by Quake 3
NSV – Nullsoft Streaming Video (media container designed for streaming video content over the Internet)
Ogg – container, multimedia
RM – RealMedia
SVI – Samsung video format for portable players
SMI – SAMI Caption file (HTML like subtitle for movie files)
SMK (.smk) – Smacker video file. A video compression system developed by RAD Game Tools
SWF – Macromedia Flash (for viewing)
WMV – Windows Media Video (See ASF)
WTV – Windows Vista's and up Windows Media Center recorded television format
YUV – raw video format; resolution (horizontal x vertical) and sample structure 4:2:2 or 4:2:0 must be known explicitly
WebM – video file format for web video using HTML5
Video editing, production
BRAW – Blackmagic Design RAW video file name
FCP – Final Cut Pro project file
MSWMM – Windows Movie Maker project file
PPJ & PRPROJ – Adobe Premiere Pro video editing file
IMOVIEPROJ – iMovie project file
VEG & VEG-BAK – Sony Vegas project file
SUF – Sony camera configuration file (setup.suf) produced by XDCAM-EX camcorders
WLMP – Windows Live Movie Maker project file
KDENLIVE – Kdenlive project file
VPJ – VideoPad project file
MOTN – Apple Motion project file
IMOVIEMOBILE – iMovie project file for iOS users
WFP / WVE – Wondershare Filmora Project
Video game data
List of common file formats of data for video games on systems that support filesystems, most commonly PC games.

TrackMania United/Nations Forever Engine – Formats used by games based on the TrackMania engine.
XeX
CHALLENGE.GBX – (Edited) Challenge files.
CONSTRUCTIONCAMPAIGN.GBX – Construction campaigns files.
CONTROLEFFECTMASTER.GBX/CONTROLSTYLE.GBX – Menu parts.
FIDCACHE.GBX – Saved game.
GBX – Other TrackMania items.
REPLAY.GBX – Replays of races.
Doom engine – Formats used by games based on the Doom engine.
DEH – DeHackEd files to mutate the game executable (not officially part of the DOOM engine)
DSG – Saved game
LMP – A lump is an entry in a DOOM wad.
LMP – Saved demo recording
MUS – Music file (usually contained within a WAD file)
WAD – Data storage (contains music, maps, and textures)
Quake engine – Formats used by games based on the Quake engine.
BSP – (For Binary space partitioning) compiled map format
MAP – Raw map format used by editors like GtkRadiant or QuArK
MDL/MD2/MD3/MD5 – Model for an item used in the game
PAK/PK2 – Data storage
PK3/PK4 – used by the Quake II, Quake III Arena and Quake 4 game engines, respectively, to store game data, textures etc. They are actually .zip files.
.dat – not specific file type, often generic extension for "data" files for a variety of applications
sometimes used for general data contained within the .PK3/PK4 files
.fontdat – a .dat file used for formatting game fonts
.roq – Video format
.sav – Savegame format
Unreal Engine – Formats used by games based on the Unreal engine.
U – Unreal script format
UAX – Animations format for Unreal Engine 2
UMX – Map format for Unreal Tournament
UMX – Music format for Unreal Engine 1
UNR – Map format for Unreal
UPK – Package format for cooked content in Unreal Engine 3
USX – Sound format for Unreal Engine 1 and Unreal Engine 2
UT2 – Map format for Unreal Tournament 2003 and Unreal Tournament 2004
UT3 – Map format for Unreal Tournament 3
UTX – Texture format for Unreal Engine 1 and Unreal Engine 2
UXX – Cache format; these are files a client downloaded from server (which can be converted to regular formats)
Duke Nukem 3D Engine – Formats used by games based on this engine
DMO – Save game
GRP – Data storage
MAP – Map (usually constructed with BUILD.EXE)
Diablo Engine – Formats used by Diablo by Blizzard Entertainment.
SV – Save Game
ITM – Item File
Real Virtuality Engine – Formats used by Bohemia Interactive. Operation:Flashpoint, ARMA 2, VBS2
SQF – Format used for general editing
SQM – Format used for mission files
PBO – Binarized file used for compiled models
LIP – Format that is created from WAV files to create in-game accurate lip-synch for character animations.
Source Engine – Formats used by Valve. Half-Life 2, Counter-Strike: Source, Day of Defeat: Source, Half-Life 2: Episode One, Team Fortress 2, Half-Life 2: Episode Two, Portal, Left 4 Dead, Left 4 Dead 2, Alien Swarm, Portal 2, Counter-Strike: Global Offensive, Titanfall, Insurgency, Titanfall 2, Day of Infamy
VMF – Valve Hammer Map editor raw map file
BSP – Source Engine compiled map file
MDL – Source Engine model format
SMD – Source Engine uncompiled model format
PCF – Source Engine particle effect file
HL2 – Half-Life 2 save format
DEM – Source Engine demo format
VPK – Source Engine pack format
VTF – Source Engine texture format
VMT – Source Engine material format.
Other Formats
B – used for Grand Theft Auto saved game files
BOL – used for levels on Poing!PC
DBPF – The Sims 2, DBPF, Package
DIVA – Project DIVA timings, element coördinates, MP3 references, notes, animation poses and scores.
ESM, ESP – Master and Plugin data archives for the Creation Engine
HAMBU – format used by the Aidan's Funhouse game RGTW for storing map data[23]
HE0, HE2, HE4 – HE games file
GCF – format used by the Steam content management system for file archives
IMG – format used by Renderware-based Grand Theft Auto games for data storage
LOVE – format used by the LOVE2D Engine[24]
MAP – format used by Halo: Combat Evolved for archive compression, Doom³, and various other games
MCA – format used by Minecraft for storing data for in-game worlds[25]
MCADDON – format used by the Bedrock Edition of Minecraft for add-ons
MCFUNCTION – format used by Minecraft for storing functions
MCMETA – format used by Minecraft for storing data for customizable texture packs for the game
MCPACK – format used by the Bedrock Edition of Minecraft for in-game texture packs
MCR – format used by Minecraft for storing data for in-game worlds before version 1.2
MCTEMPLATE – format used by the Bedrock Edition of Minecraft for world templates
MCWORLD – format used by the Bedrock Edition of Minecraft for in-game worlds
NBT – format used by Minecraft for storing program variables along with their (Java) type identifiers
OEC – format used by OE-Cake for scene data storage
OSB – osu! storyboard data
OSC – osu!stream combined stream data
OSF2 – free osu!stream song file
OSR – osu! replay data
OSU – osu! beatmap data
OSZ2 – paid osu!stream song file
P3D – format for panda3d by Disney
PLAGUEINC – format used by Plague Inc. for storing custom scenario information[26]
POD – format used by Terminal Reality
RCT – Used for templates and save files in RollerCoaster Tycoon games
REP – used by Blizzard Entertainment for scenario replays in StarCraft.
Simcity 4, DBPF (.dat, .SC4Lot, .SC4Model) – All game plugins use this format, commonly with different file extensions
SMZIP – ZIP-based package for Stepmania songs, themes and announcer packs.
USLD – format used by Unison Shift to store level layouts.
VVVVVV – format used by VVVVVV
CPS – format used by The Powder Toy, Powder Toy save
STM – format used by The Powder Toy, Powder Toy stamp
PKG – format used by Bungie for the PC Beta of Destiny 2, for nearly all the game's assets.
CHR – format used by Team Salvato, for the character files of Doki Doki Literature Club!
Z5 – format used by Z-machine for story files in interactive fiction.
scworld – format used by Survivalcraft to store sandbox worlds.
scskin – format used by Survivalcraft to store player skins.
scbtex – format used by Survivalcraft to store block textures.
prison – format used by Prison Architect to save prisons
escape – format used by Prison Architect to save escape attempts
Video game storage media
List of the most common filename extensions used when a game's ROM image or storage medium is copied from an original read-only memory (ROM) device to an external memory such as hard disk for back up purposes or for making the game playable with an emulator. In the case of cartridge-based software, if the platform specific extension is not used then filename extensions ".rom" or ".bin" are usually used to clarify that the file contains a copy of a content of a ROM. ROM, disk or tape images usually do not consist of one file or ROM, rather an entire file or ROM structure contained within one file on the backup medium.[27]

A26 – Atari 2600 (.a26)
A52 – Atari 5200 (.a52)
A78 – Atari 7800 (.a78)
LNX – Atari Lynx (.lnx)
JAG,J64 – Atari Jaguar (.jag, .j64)
ISO, WBFS, WAD, WDF – Wii and WiiU (.iso, .wbfs, .wad, .wdf)
GCM, ISO – GameCube (.gcm, .iso)
NDS – Nintendo DS (.nds)
3DS – Nintendo 3DS (.3ds)
CIA – Installation File (.cia)
GB – Game Boy (.gb) (this applies to the original Game Boy and the Game Boy Color)
GBC – Game Boy Color (.gbc)
GBA – Game Boy Advance (.gba)
SAV – Game Boy Advance Saved Data Files (.sav)
SGM – Visual Boy Advance Save States (.sgm)
N64, V64, Z64, U64, USA, JAP, PAL, EUR, BIN – Nintendo 64 (.n64, .v64, .z64, .u64, .usa, .jap, .pal, .eur, .bin)
PJ – Project 64 Save States (.pj)
NES – Nintendo Entertainment System (.nes)
FDS – Famicom Disk System (.fds)
JST – Jnes Save States (.jst)
FC? – FCEUX Save States (.fc#, where # is any character, usually a number)
GG – Game Gear (.gg)
SMS – Master System (.sms)
SG – SG-1000 (.sg)
SMD,BIN – Mega Drive/Genesis (.smd or .bin)
32X – Sega 32X (.32x)
SMC,078,SFC – Super NES (.smc, .078, or .sfc) (.078 is for split ROMs, which are rare)
FIG – Super Famicom (Japanese releases are rarely .fig, above extensions are more common)
SRM – Super NES Saved Data Files (.srm)
ZST – ZSNES Save States (.zst, .zs1-.zs9, .z10-.z99)
FRZ – Snes9X Save States (.frz, .000-.008)
PCE – TurboGrafx-16/PC Engine (.pce)
NPC, NGP – Neo Geo Pocket (.npc, .ngp)
NGC – Neo Geo Pocket Color (.ngc)
VB – Virtual Boy (.vb)
INT – Intellivision (.int)
MIN – Pokémon Mini (.min)
VEC – Vectrex (.vec)
BIN – Odyssey² (.bin)
WS – WonderSwan (.ws)
WSC – WonderSwan Color (.wsc)
TZX – ZX Spectrum (.tzx) (for exact copies of ZX Spectrum games)
TAP – for tape images without copy protection
Z80,SNA – (for snapshots of the emulator RAM)
DSK – (for disk images)
TAP – Commodore 64 (.tap) (for tape images including copy protection)
T64 – (for tape images without copy protection, considerably smaller than .tap files)
D64 – (for disk images)
CRT – (for cartridge images)
ADF – Amiga (.adf) (for 880K diskette images)
ADZ – GZip-compressed version of the above.
DMS – Disk Masher System, previously used as a disk-archiving system native to the Amiga, also supported by emulators.
Virtual machines
Microsoft Virtual PC, Virtual Server
VFD – Virtual Floppy Disk (.vfd)
VHD – Virtual Hard Disk (.vhd)
VUD – Virtual Undo Disk (.vud)
VMC – Virtual Machine Configuration (.vmc)
VSV – Virtual Machine Saved State (.vsv)
EMC VMware ESX, GSX, Workstation, Player
LOG – Virtual Machine Logfile (.log)
VMDK, DSK – Virtual Machine Disk (.vmdk, .dsk)
NVRAM – Virtual Machine BIOS (.nvram)
VMEM – Virtual Machine paging file (.vmem)
VMSD – Virtual Machine snapshot metadata (.vmsd)
VMSN – Virtual Machine snapshot (.vmsn)
VMSS,STD – Virtual Machine suspended state (.vmss, .std)
VMTM – Virtual Machine team data (.vmtm)
VMX,CFG – Virtual Machine configuration (.vmx, .cfg)
VMXF – Virtual Machine team configuration (.vmxf)
VirtualBox
VDI – VirtualBox Virtual Disk Image (.vdi)
Vbox-extpack – VirtualBox extension pack (.vbox-extpack)
Parallels Workstation
HDD – Virtual Machine hard disk (.hdd)
PVS – Virtual Machine preferences/configuration (.pvs)
SAV – Virtual Machine saved state (.sav)
QEMU
COW – Copy-on-write
QCOW – QEMU copy-on-write Qcow
QCOW2 – QEMU copy-on-write – version 2 Qcow
QED – QEMU enhanced disk format
Web page
Static
DTD – Document Type Definition (standard), MUST be public and free
HTML (.html, .htm) – HyperText Markup Language
XHTML (.xhtml, .xht) – eXtensible HyperText Markup Language
MHTML (.mht, .mhtml) – Archived HTML; stores all data of one web page (text, images, etc.) in a single file
MAF (.maff) – web archive based on ZIP
Dynamically generated
ASP (.asp) – Microsoft Active Server Page
ASPX – (.aspx) – Microsoft Active Server Page .NET (ASP.NET)
ADP – AOLserver Dynamic Page
BML – (.bml) – Better Markup Language (templating)
CFM – (.cfm) – ColdFusion
CGI – (.cgi)
iHTML – (.ihtml) – Inline HTML
JSP – (.jsp) JavaServer Pages
Lasso – (.las, .lasso, .lassoapp) – A file created or served with the Lasso Programming Language
PL – Perl (.pl)
PHP – (.php, .php?, .phtml) – ? is version number (previously abbreviated Personal Home Page, later changed to PHP: Hypertext Preprocessor)
SSI – (.shtml) – HTML with Server Side Includes (Apache)
SSI – (.stm) – HTML with Server Side Includes (Apache)
Markup languages and other web standards-based formats
Atom – (.atom, .xml) – Another syndication format.
EML – (.eml) – Format used by several desktop email clients.
JSON-LD – (.jsonld) – A JSON-based serialization for linked data.
Metalink – (.metalink, .met) – A format to list metadata about downloads, such as mirrors, checksums, and other information.
RSS – (.rss, .xml) – Syndication format.
Markdown – (.markdown, .md) – Plain text formatting syntax, which is popularly used to format "readme" files.
Shuttle – (.se) – Another lightweight markup language.
Other
AXD – cookie extensions found in temporary internet folder
BDF – Binary Data Format – raw data from recovered blocks of unallocated space on a hard drive
CBP – CD Box Labeler Pro, CentraBuilder, Code::Blocks Project File, Conlab Project
CEX – SolidWorks Enterprise PDM Vault File
COL – Nintendo GameCube proprietary collision file (.col)
CREDX – CredX Dat File
DDB – Generating code for Vocaloid singers voice (see .DDI)
DDI – Vocaloid phoneme library (Japanese, English, Korean, Spanish, Chinese, Catalan)
DUPX – DuupeCheck database management tool project file
FTM – Family Tree Maker data file
FTMB – Family Tree Maker backup file
GA3 – Graphical Analysis 3
GEDCOM (.ged) – (GEnealogical Data COMmunication) format to exchange genealogy data between different genealogy software
HLP – Windows help file
IGC – flight tracks downloaded from GPS devices in the FAI's prescribed format
INF – similar format to INI file; used to install device drivers under Windows, inter alia.
JAM – JAM Message Base Format for BBSes
KMC – tests made with KatzReview's MegaCrammer
KCL – Nintendo GameCube/Wii proprietary collision file (.kcl)
LNK – Microsoft Windows format for Hyperlinks to Executables
LSM – LSMaker script file (program using layered .jpg to create special effects; specifically designed to render lightsabers from the Star Wars universe) (.lsm)
NARC – Archive format used in Nintendo DS games.
OER – AU OER Tool, Open Educational Resource editor
PA – Used to assign sound effects to materials in KCL files (.pa)
PIF – Used to run MS-DOS programs under Windows
POR – So called "portable" SPSS files, readable by PSPP
PXZ – Compressed file to exchange media elements with PSALMO
RISE – File containing RISE generated information model evolution
TOPC – TopicCrunch SEO Project file holding keywords, domain and search engine settings (ASCII)
XLF – Utah State University Extensible LADAR Format
XMC – Assisted contact lists format, based on XML and used in kindergartens and schools
ZED – My Heritage Family Tree
Zone file – a text file containing a DNS zone
Cursors
ANI – Animated cursor
CUR – Cursor file
Smes – Hawk's Dock configuration file
Generalized files
General data formats
These file formats are fairly well defined by long-term use or a general standard, but the content of each file is often highly specific to particular software or has been extended by further standards for specific uses.

Text-based
CSV – comma-separated values
HTML – hyper text markup language
CSS – cascading style sheets
INI – a configuration text file whose format is substantially similar between applications
JSON – JavaScript Object Notation is an openly used data format now used by many languages, not just JavaScript
TSV – tab-separated values
XML – an open data format
YAML – an open data format
ReStructuredText – an open text format for technical documents used mainly in the Python programming language
Markdown (.md) – an open lightweight markup language to create simple but rich text, often used to format README files
AsciiDoc – an open human-readable markup document format semantically equivalent to DocBook
Generic file extensions
These are filename extensions and broad types reused frequently with differing formats or no specific format by different programs.

Binary files
Bak file (.bak, .bk) – various backup formats: some just copies of data files, some in application-specific data backup formats, some formats for general file backup programs
BIN – binary data, often memory dumps of executable code or data to be re-used by the same software that originated it
DAT – data file, usually binary data proprietary to the program that created it, or an MPEG-1 stream of Video CD
DSK – file representations of various disk storage images
RAW – raw (unprocessed) data
Text files
configuration file (.cnf, .conf, .cfg) – substantially software-specific
logfiles (.log) – usually text, but sometimes binary
plain text (.asc or .txt) – human-readable plain text, usually no more specific
Partial files
Differences and patches
diff – text file differences created by the program diff and applied as updates by patch
Incomplete transfers
!UT (.!ut) – partly complete uTorrent download
CRDOWNLOAD (.crdownload) – partly complete Google Chrome download
OPDOWNLOAD (.opdownload) – partly complete Opera download
PART (.part) – partly complete Mozilla Firefox or Transmission download
PARTIAL (.partial) – partly complete Internet Explorer or Microsoft Edge download
Temporary files
Temporary file (.temp, .tmp, various others) – sometimes in a specific format, but often just raw data in the middle of processing
Pseudo-pipeline file – used to simulate a software pipe
            
retrieved from: http://en.wikipedia.org/wiki/List_of_file_formats - accessed Aug 12 2020
<<  back to TOC
File Formats

Recommended Formats for Long-term Access and Sharing

Non-proprietary: no software purchase needed to open the file
Lossless: uncompressed, or compressed without discarding any of the original data
Indexable: if possible, a plain text format that is both human and machine readable

Best file format?????

PAPER!
for example see: http://ollydbg.de/Paperbak/
<<  back to TOC
File Formats

  • Text: doc, docx, rtf, odt, pages
  • Tabular: xls, xlsx, numbers, dbf
  • Stat: spss, sas, jmp, rdata
  • Images: jpg, tiff, svg, png, gif, bmp
  • Geographic: shp, geotiff, kml, kmz, gdb
  • Video: mp4, mov, avi, ogg
  • Music: mp3, wav, m4a, aiff
  • Plain text: txt, csv, json, html, xml

General Formats
  • proprietary
  • mixed
  • open
Compression
  • lossy
  • depends
  • lossless
<<  back to TOC
OK - so what?

First, make sure your operating system lets you see the file extensions!!!!

  • Mac file extensions
    • Finder: Finder -> Preferences ... :
    • "Advanced" tab, check box next to "Show all filename extensions"
  • PC file extensions
    • Win 7 and below
      • File Explorer: Organize -> Folder and search options:
      • "View" tab, uncheck the box next to "Hide extensions for known file types"
    • Win 8 and above
      • File Explorer: "view" tab, check the box next to: "File name extensions"


<<  back to TOC
Quick note on statistics files and conversions

  • Often contain much metadata embedded in the file
    • For example, SPSS and SAS include data types (nominal, ordinal, interval, ratio) and data dictionaries (code keys for nominal data, units for interval and ratio data, etc.)
  • How to best share???
    • Option 1: keep in the proprietary format
    • Option 2: convert to text based format (csv, xml) and have either
      • A data dictionary in a text based format so that a user can reconstruct the data-metadata association
      • Some sort of ‘installer’ that contains the metadata and automatically reconstructs the data-metadata association

This also applies to relational databases, images, and some geographical data
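
Option 2 with a text-based data dictionary could look roughly like the sketch below. It uses only the Python standard library; the column names, codes, and file names are hypothetical and stand in for whatever the statistics package exports.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical coded data, as it might come out of a statistics package.
rows = [
    {"id": 1, "sex": 1, "height_cm": 172.5},
    {"id": 2, "sex": 2, "height_cm": 160.0},
]

# Hypothetical data dictionary: measurement level, units, and code keys per column.
dictionary = {
    "id": {"level": "nominal", "description": "participant identifier"},
    "sex": {"level": "nominal", "codes": {"1": "female", "2": "male"}},
    "height_cm": {"level": "ratio", "units": "centimeters"},
}


def export(folder):
    """Write the data as CSV and the data dictionary as JSON, both UTF-8 plain text."""
    folder = Path(folder)
    with open(folder / "survey.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(dictionary))
        writer.writeheader()
        writer.writerows(rows)
    with open(folder / "survey_dictionary.json", "w", encoding="utf-8") as f:
        json.dump(dictionary, f, indent=2)


def load_labeled(folder):
    """Read the CSV back and replace coded values with labels from the dictionary."""
    folder = Path(folder)
    with open(folder / "survey_dictionary.json", encoding="utf-8") as f:
        meta = json.load(f)
    with open(folder / "survey.csv", newline="", encoding="utf-8") as f:
        records = list(csv.DictReader(f))
    for record in records:
        for column, value in record.items():
            codes = meta[column].get("codes")
            if codes:
                record[column] = codes[value]
    return records


# Round trip in a throwaway folder: export, then rebuild the data-metadata association.
with tempfile.TemporaryDirectory() as tmp:
    export(tmp)
    labeled = load_labeled(tmp)
```

The point of the round trip is that a future user with nothing but the two text files can still recover what each code means.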


<<  back to TOC
Some things to remember


  • Text and numbers
    • Plain text - BUT STRUCTURED
    • Character encoding??? UTF-8
    • PDF - preferably not!! (hard to index/search UNLESS created with specific care)
  • Images (bitmap)
    • TIFF, JPEG2000 (??), PNG, JPEG


.html – hypertext markup language
.csv – comma separated values
.txt – plain text
.xml – extensible markup language
.json – javascript object notation
.pdf – portable document format
.jpg [ .jpeg, .jp2, .j2k ] – joint photographic experts group
.tif [ .tiff ] – tagged image file format
.png – portable network graphics
<<  back to TOC
File Formats

character encoding??? UTF-8

ASCII – American Standard Code for Information Interchange
[ old school, 128 characters in 7 bits ]
lowercase “j” would become binary 01101010 and decimal 106

UTF-8 – Universal Coded Character Set + Transformation Format – 8-bit
[ now the new standard, only since about 2007, first 128 characters are ASCII ]
[ encodes 1,112,064 “code points” or characters ]
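
The numbers on this slide can be checked in a few lines of Python. This is only an illustration; the two non-ASCII sample characters are chosen here and are written as escapes so the snippet itself stays plain ASCII.

```python
# ASCII: lowercase "j" is code point 106, or 01101010 in binary.
code_point = ord("j")
bits = format(code_point, "08b")

# UTF-8 stores the first 128 code points exactly as ASCII, one byte each ...
ascii_bytes = "j".encode("utf-8")

# ... and uses two to four bytes for everything else.
accented = "\u00e3".encode("utf-8")   # a with tilde, as in "Leitao" spelled properly
euro = "\u20ac".encode("utf-8")       # euro sign

# Valid code points: 17 planes of 65,536 minus the 2,048 surrogates.
valid_code_points = 17 * 65536 - 2048

print(code_point, bits, len(ascii_bytes), len(accented), len(euro), valid_code_points)
```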


<<  back to TOC
Bitmap and vector images

Raster – a “grid” of numeric color values, also known as a bitmap
[ .tiff, .jpg, .png ]

Vector – a collection of points that can be connected to make lines, polygons, and volumes
[ no standards yet, but common in Adobe Illustrator, AutoCAD, and many GIS applications ]
WATCH for .svg – scalable vector graphics
<<  back to TOC
Some things to remember


  • Cartographic (maps)
    • Raster: GeoTIFF
    • Vector: shapefile, AutoCAD, GeoJSON
    • Note: a shapefile needs .shp, .shx, and .dbf; optional (?!) files include .prj, .sbx, .sbn
  • Audio
    • AIFF, WAVE 44.1 kHz / 16 bit or higher
    • BUT compressed audio is OK if encoded with FLAC (Free Lossless Audio Codec); MP3 is lossy


.shp – shapefile
.shx – shapefile index
.dbf – data base format
.prj – projection (for maps)
.json – javascript object notation
.dxf – drawing exchange format
.aiff – audio interchange file format
.mp3 – moving picture experts group
.wav – wave
<<  back to TOC
Some things to remember

  • Video
    • MPEG-4
  • Documentation
    • Rich Text Format
    • Open Document text
    • html
    • Plain text


.mp4, .m4a – moving picture experts group
.rtf – rich text format
.odt – open document text
<<  back to TOC
Data Organization
Think about time and space
  • Directory Structure
  • File Naming Convention
Save time and space
<<  back to TOC
this will get personal





<<  back to TOC
So ....
  • Take a moment, think about your file naming for your articles that you save
    • Are you consistent?
    • Is it easy to find what you want several months later?
  • Now think about your file structure for your downloaded articles
    • Where are the actual files on the computer?
    • How many folders are you using?
    • Are they logically organized?



<<  back to TOC
The Bottom Line: File Naming Conventions
[ best practices ]
DO
  • useCamelCasing.docx
  • use_underscores.txt
  • 2015_put_The_Date_First.csv
  • 20150214_useTwoDigitDateNumbers.xls
  • startASeriesWithLeadingZeros_001.doc
  • 20150214_UM_date-place.shp
  • useFileExtensions.jpg




DON'T
  • Leave spaces in the file name.xls
  • Use the default save name from MS word that is simply the long first sentence in your file.doc
  • January 5 2015 Samples with the month first.xls
  • Label as final version.doc
  • Use special characters: & , * % # ; * ( ) ! @$ ^ ~ ' { } [ ] ? < > - + /
  • Use more than about 25 characters




Mac: Finder: Finder -> Preferences ... : "Advanced" tab, check box next to "Show all filename extensions"
Win 8 and above: File Explorer: "view" tab, check the box next to: "File name extensions"
Be Consistent
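
Rules like these are easy to check automatically. The sketch below is one possible checker, not a standard tool: the function name `check_name` is hypothetical, the hyphen is deliberately left out of the forbidden set because the DO list uses one, and "about 25 characters" is treated as an adjustable limit on the name without its extension.

```python
import re

# Space plus the special characters from the DON'T list (hyphen left out on purpose).
FORBIDDEN = set(" &,*%#;()!@$^~'{}[]?<>+/")


def check_name(filename, max_length=25):
    """Return a list of problems found in a file name (an empty list means it passes)."""
    stem, dot, extension = filename.rpartition(".")
    if not dot:                      # no "." at all, so there is no extension
        stem, extension = filename, ""
    problems = []
    if not extension:
        problems.append("no file extension")
    bad = sorted(set(stem) & FORBIDDEN)
    if bad:
        problems.append("spaces or special characters: " + "".join(bad))
    if len(stem) > max_length:
        problems.append("longer than about %d characters" % max_length)
    if re.search(r"final", stem, re.IGNORECASE):
        problems.append("labelled as final")
    return problems


print(check_name("20150214_UM_date-place.shp"))
print(check_name("Label as final version.doc"))
```

Run over a whole folder, a checker like this gives a quick list of names worth fixing before the files are shared.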
<<  back to TOC
File Tagging: a *new* approach?
Think about your music library
Think about article keywords
Part of the semantic web conversation


Mac
  • Native on Mac (colors)
  • Not really searchable

PC
  • Native in Office (file->info)
copyright © Gary Larson - used under "fair use"
<<  back to TOC
Lose a file?
Use file tagging systems to better keep track
  • for tagging files as they are created and then the tags are indexed
  • good for not losing things in the first place.


Mac:

PC:



<<  back to TOC
File Versioning
  • Turn on versioning or tracking in collaborative documents
    • Word documents, excel, etc
    • Learn by doing!
  • Turn on versioning for storage utilities
    • Wikis
    • Google Docs, BOX
  • Consider using version control software
    • Subversion (Apache Software Foundation), TortoiseSVN (Windows client for Subversion), git (Linus Torvalds), Bitbucket and/or GitHub (both commercial hosting services for git)
    • Mostly designed for collaborative coding, but . . .
    • Check with your lab/colleagues as to their preference



<<  back to TOC
Online Versioning and Sharing Services
<<  back to TOC
Batch Naming
  • Photos
  • Instrument data
  • Moving files across languages



<<  back to TOC
Tools for File Management: bulk rename
Windows

Mac

Linux

Unix: use the grep command to search file names with regular expressions
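
A bulk rename can also be scripted in a platform-independent way. The following is a minimal sketch, assuming a hypothetical helper `bulk_rename` and invented camera file names; it runs on throwaway files in a temporary folder, and `dry_run=True` returns the plan without touching anything.

```python
import tempfile
from pathlib import Path


def bulk_rename(folder, prefix, dry_run=True):
    """Rename every file in folder to prefix_001.ext, prefix_002.ext, ... in sorted order.

    Returns a list of (old name, new name) pairs. With dry_run=True nothing is changed.
    """
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    plan = []
    for number, path in enumerate(files, start=1):
        new_name = f"{prefix}_{number:03d}{path.suffix.lower()}"
        plan.append((path.name, new_name))
        if not dry_run:
            path.rename(path.with_name(new_name))
    return plan


# Demonstration on empty throwaway files (hypothetical camera names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ["IMG_9.JPG", "IMG_10.JPG", "IMG_11.JPG"]:
        (Path(tmp) / name).touch()
    plan = bulk_rename(tmp, "20150214_UM_keyBiscayne", dry_run=False)
    renamed = sorted(p.name for p in Path(tmp).iterdir())
```

Note that IMG_10 sorts before IMG_9 in the original names, which is exactly the problem that leading zeros solve.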



<<  back to TOC
Information in the Filename
  • Version number
  • Date of creation
  • Name of creator
  • Description of content
  • Name of research team/department associated with the data
  • Publication date
  • Project number
metadata
<<  back to TOC
Retraction Watch
“A problem with a malfunctioning computer and image storage and mislabeling led to the assembling by one of the co-authors of images that were previously published by our research group. I didn’t detect the problem when the manuscript was sent for publication. Although the conclusions were not compromised in any of the two papers, we retract the papers precisely because some images were wrongly used.”

Principal investigator Jorge Leitão
<<  back to TOC
Retraction Watch
“In the 2011 paper (http://jb.asm.org/content/196/22/3980), it was first submitted to other 2 journal (JBC and RNA Biology), whom requested a lot of modifications, and therefore, we accumulated a lot of processed data files. In between the process, the hard-drive of the computer that was used to store the data files (which is shared by 5 research groups) stopped working due data overloading. Nonetheless, we were able to retrieve the original data, or so we thought. At the time, I was responsible for composing the final figures of each paper that we produced, and asked the team members to give me the files. In Figure 8 of this paper, it seemed that there has been a labeling error in the source files, and I did not realize that some images where duplicated in the experiment that was being represented, neither that parts of the image had already been published. I should stress that that the images were produced in our lab and represent our data.”

First author Christian Ramos
<<  back to TOC
Quick Review
  • Organization
  • Context
  • Consistency
YYYYMMDD_projectID_place_001.ext
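
A consistent pattern means the file name itself can be read back as metadata. The sketch below shows one possible reading of the pattern above; the allowed characters for projectID and place, the three-digit series number, and the function names are all assumptions made for this example.

```python
import re
from datetime import date, datetime

# Assumed reading of YYYYMMDD_projectID_place_001.ext:
# projectID and place use letters, digits, or hyphens; the series number has three digits.
PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<project>[A-Za-z0-9-]+)_(?P<place>[A-Za-z0-9-]+)"
    r"_(?P<series>\d{3})\.(?P<ext>[A-Za-z0-9]+)$"
)


def build_name(day, project, place, series, ext):
    """Assemble a file name from its parts, with a zero-padded series number."""
    return f"{day:%Y%m%d}_{project}_{place}_{series:03d}.{ext}"


def parse_name(filename):
    """Return the parts of a conforming file name as a dict, or None if it does not conform."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    try:
        info["date"] = datetime.strptime(info["date"], "%Y%m%d").date()
    except ValueError:               # eight digits that are not a real calendar date
        return None
    info["series"] = int(info["series"])
    return info


name = build_name(date(2015, 2, 14), "UM", "miami", 1, "csv")
info = parse_name(name)
print(name, info)
```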
<<  back to TOC
The benefits of consistent data file labeling are:
  • Data files are distinguishable from each other within their containing folders
  • Data file naming prevents confusion when multiple people are working on shared files
  • Data files are easier to locate and browse
  • Data files can be retrieved not only by the creator but by other users
  • Data files can be sorted in logical sequence
  • Data files are not accidentally overwritten or deleted
  • Different versions of data files can be identified
  • If data files are moved to another storage platform their names will retain useful context
<<  back to TOC
Security and Privacy
University of Miami Human Subjects Research Office
https://hsro.uresearch.miami.edu

Data derived from human subjects research

Personally Identifiable Information (PII)


<<  back to TOC
UM Secure Storage Options
BOX is HIPAA compliant
OneDrive Enterprise is HIPAA compliant (likely your first choice)
IDSC storage services are HIPAA compliant
Qualtrics survey tool to build and manage secure online surveys
https://www.it.miami.edu/a-z-listing/survey-tools/index.html




<<  back to TOC
Tools from UM Information Technology
Velos: https://cas.it.miami.edu/research-it/velos-eresearch/
Velos eResearch Clinical Trials Management Software is a tool for managing clinical trials data.


REDCap: https://cas.it.miami.edu/research-it/redcap/
Research Electronic Data Capture is an application that allows users to build and manage online surveys and databases quickly and securely.




<<  back to TOC
Not all data has security and privacy "needs", BUT ...
  • Use automatic updates (mostly for virus issues)
  • Use anti-virus software (also anti-intrusion)
  • Use a firewall (also password protect file-sharing)
  • Never connect to untrusted wireless connections
  • Computer disposal?
  • Know HTTPS, SSH/SCP, sFTP
  • Understand certificate errors
  • Do not send confidential email
  • Public computers
    • Always log out
    • Never leave data/files


<<  back to TOC
Common sense (?)
Password lock all devices (in case of theft or loss)
Encryption

Empty trash securely
  • Mac: Finder->Preferences: Advanced Tab, “Empty Trash Securely”
  • PC: Eraser (open source GPL): http://eraser.heidi.ie/


<<  back to TOC
Password Management
Mistakes leading to weak passwords (Do not make these mistakes when choosing a password):
  • your username as a password (even backwards or mixed up).
  • using any name, or any word in any language.
  • obvious personal information (your year of birth, phone number, national insurance number, address, etc.).
  • all digits, or just one letter.
  • real words with only one or two obvious digit substitutions, like 'p4ssword' or '5ecret'.
  • fewer than eight characters ("brute force" attack cracks 7 letters in a few minutes).
  • characters from books, films, etc. (Gandalf, Sherlock), band names, song titles etc. (no matter how obscure).
  • passwords that are too easy or too difficult to type
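
One way to avoid most of these mistakes is to let the computer pick the password and to store it in a password manager. The sketch below uses Python's standard `secrets` module; the function names and the default length of 16 are choices made for this example, and the bit estimate is the usual rough measure (length times log2 of the alphabet size), not a guarantee.

```python
import math
import secrets
import string


def random_password(length=16):
    """Return a random password of letters and digits from a cryptographically secure source."""
    if length < 8:
        raise ValueError("use at least eight characters")
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def search_space_bits(alphabet_size, length):
    """Rough strength estimate in bits: log2(alphabet_size ** length)."""
    return length * math.log2(alphabet_size)


password = random_password()
seven_lowercase = search_space_bits(26, 7)     # the "7 letters" case from the list
sixteen_mixed = search_space_bits(62, 16)      # 16 letters and digits
print(len(password), round(seven_lowercase, 1), round(sixteen_mixed, 1))
```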



<<  back to TOC
Encryption

  $ openssl

  $ openssl des3 -in test.txt -out encrypted.txt

  $ openssl des3 -d -in encrypted.txt -out testout.txt
            


Singh, S. 1999. The Code Book: the science of secrecy from ancient Egypt to quantum cryptography. Fourth Estate, London.
NOTE: encryption mistakes are irreversible
<<  back to TOC
How important is your data?

take a moment of
silence to imagine
what would happen
if your computer
failed today



<<  back to TOC
The World of Data Around Us: Data Loss

  • Natural disaster
  • Facilities infrastructure failure
  • Storage failure
  • Server hardware/software failure
  • Application software failure
  • External dependencies (e.g. PKI failure)
  • Format obsolescence
  • Legal encumbrance
  • Human error
  • Malicious attack by human or automated agents
  • Loss of staffing competencies
  • Loss of institutional commitment
  • Loss of financial stability
  • Changes in user expectations and requirements
  • Upset boyfriend or girlfriend
slide from
<<  back to TOC
common data loss scenarios
"Researchers don't delete data, they lose it"
John Bixby - Vice Provost for Research at University of Miami





<<  back to TOC
Backup and Storage

Major Considerations
  • Who is responsible for backup ?
  • How often do you backup ?
  • Partial vs. full backups ?
  • Non-digital backups ?
  • Where (literally) will the backups be located ?
  • Do the backups need a description (metadata) ?
  • Manual vs automatic ?
  • Recovery procedures ?
  • Verification – how do you know the backup was successful ?
  • How long do you keep your backups ?
  • What happens when the project ends ?
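
Verification in particular can be automated by comparing checksums of the working files and their backup copies. This is a minimal sketch using only the standard library; the folder layout, file name, and the function name `verify_backup` are hypothetical, and real backup software usually offers its own verification step.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return relative paths of files that are missing from, or differ in, the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(source) != sha256_of(copy):
            problems.append(str(relative))
    return problems


# Demonstration: a good backup, then a deliberately damaged one.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    backup = Path(tmp) / "backup"
    work.mkdir()
    (work / "20150214_UM_samples_001.csv").write_text("id,value\n1,3.2\n", encoding="utf-8")
    shutil.copytree(work, backup)
    clean = verify_backup(work, backup)
    (backup / "20150214_UM_samples_001.csv").write_text("corrupted", encoding="utf-8")
    damaged = verify_backup(work, backup)
```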



<<  back to TOC
The Bottom Line: Storage and Backup
[ best practices ]
DO
  • RAID storage
  • External hard drives (exFAT)
  • Cloud storage and file-syncing
  • Duplicate computers or hard drives
  • Write down roles and responsibilities
  • Organize, file naming conventions, versioning
  • Have automatic backups
  • Verify backups
  • Open formats




The exFAT format is essential if you ever want to share a drive between a Mac and a PC
Mac: Applications:Utilities:Disk Utility
PC: right click in explorer -> Format
DON'T
  • USB thumb drives
  • Email files to yourself
  • Save files without knowing their location in the computer’s file structure
  • Backup when you remember




<<  back to TOC
The Bottom Line: Storage and Backup
[ best practices ]
Have all your work in at least three places at all times: working version + two backups

Drives fail, computers break, viruses happen, computers get stolen, USB thumb drives ALWAYS fail, you will make a mistake and delete your work by accident, ex-partners seek revenge, and the list goes on . . .




<<  back to TOC
Short-term solutions at UM [1]
Columns: Size Limits | HIPAA Compliant | Collaboration and Sharing | Relational Databases | Self Guided | No Costs
Box, OneDrive, Google [2]: 5 TB [3]
GPFS Storage (IDSC): > 10 TB [4]
File Server (UMIT): > 1 TB [4] [5] [6]

If one of these solutions does not meet your needs, you can consider self-managed solutions or please feel free to contact the UM Information Technology (UMIT) Service Desk, research data services at the Libraries, or the advanced computing services at IDSC for further assistance.


  1. None of these options are for long-term storage or preservation, please see our institutional repository or identify another disciplinary repository to meet this need.
  2. Google Drive has a 500MB storage limit.
  3. Box single file upload limit 15GB. Also note that network speed and congestion affects performance.
  4. Please see the advanced computing resources at IDSC or contact the Advanced Computing group directly for more information.
  5. To begin the request process, please contact the UMIT Service Desk: email itsupportcenter@miami.edu or call (305) 284-6565.
  6. Every request is evaluated on a case-by-case basis. Evaluations are based on the requested resource needs and the current resource allocations across campus. If the request is exceptionally large there may be cost sharing requirements.
<<  back to TOC
Short-term solutions at UM
For general sharing and collaboration needs please see the cloud storage solutions that Information Technology provides for students, staff and faculty:

box
Box
google
Google Drive
onedrive
OneDrive





<<  back to TOC
Short-term solutions at UM
    If you need more space at the University of Miami go to the Institute for Data Science and Computing (IDSC)






<<  back to TOC
The Cloud
The term “cloud computing” (or just “cloud”, in the context of computing) is a marketing buzzword with no coherent meaning. It is used for a range of different activities whose only common characteristic is that they use the Internet for something beyond transmitting files. Thus, the term spreads confusion. If you base your thinking on it, your thinking will be confused.

Richard Stallman - https://www.gnu.org/philosophy/words-to-avoid.html





<<  back to TOC
7 common cloud missteps
  1. You lost control of your data because of the fine print in a user agreement.
    • Do a Google search: “*theNewCloudService* shady user agreement”
  2. You sent out a public link to a Google Doc so others could view and edit.
    • Invite people through emails
  3. Your cloud account gets hacked (bad password).
    • Better password management
  4. You use the same password for every app on your phone.
    • See above
  5. Web trackers are storing information on the sites you visit online.
    • Private browsing? Don’t stay logged in to Google all the time?
  6. You granted an application (smartphone) every permission under the sun.
    • Be thoughtful when you install and run new apps
  7. A small mobile app startup you know nothing about has access to your banking data.
    • Let your bank manage your banking data


<<  back to TOC
Spinning (metal) disk or SSDs: Laptop or Desktop
  • Pros
    • High level of control over file system, naming, and physical location of disk
    • Easy to backup
    • Convenient
  • Cons
    • Risk of malware (virus)
    • Risk of theft, damage, loss, etc
    • System can eventually corrupt the disk (especially pcs)
    • Finite lifespan



SECURITY CONSIDERATIONS:
  • Virus Control
  • Theft or damage
  • Disk failure
<<  back to TOC
Spinning (metal) disk or SSDs: (Network) Server
  • Pros
    • High level of control over file system, naming, and physical location of disk
    • Likely has backup and maintenance schedule
    • Possible duplicate (mirror) images – RAID (Redundant Array of Independent Disks) systems
    • Safe physical location
  • Cons
    • Expensive to maintain
    • Migration can be difficult
    • Susceptible to catastrophic events





SECURITY CONCERNS:
  • Use SSL connections
  • Good control (physical location)
  • Encryption?
<<  back to TOC
External storage: SSDs and disk drives
  • Pros
    • Drives are cheap (sort of) and portable
    • Convenient
    • Memory is cheap and portable
  • Cons
    • Connection technologies change (USB, Firewire, SATA, and so on)
    • Drive failure (both spinning drives and SSD devices)
    • Easily damaged, stolen or lost
    • Finite space: for large projects multiple drives may be necessary
    • Malware can be propagated (think unsafe sex)



SECURITY CONCERNS:
  • Theft and damage
  • Good control (physical location), but you are responsible
  • Encryption?
  • Virus protection
Does anyone use CDs anymore??? ZIP disks??? NOT recommended!!
<<  back to TOC
Networked drives: personal cloud
  • Pros
    • Drives are cheap (sort of) and portable
    • Convenient access from anywhere
    • Easy to install and sync
    • Private: password protected
  • Cons
    • Upload/download bottlenecks
    • Susceptible to catastrophic events
    • Needs permanent power
    • Needs IP address



SECURITY CONCERNS:
  • Use SSL connections
  • Theft or damage
  • Encryption?
Western Digital, Seagate . . .
Perhaps buy online, out of state?
<<  back to TOC
Networked drives: the "cloud"
Includes: Repositories, Online Versioning and Sharing
  • Pros
    • No failure or backup worries (they do it)
    • Can be secure (depends)
    • Convenient
    • Good for catastrophic events
  • Cons
    • Upload/download bottlenecks
    • Fees?
    • Long-term? No standards?
    • How to get copies of all your data (try this for google drive)
    • No control




SECURITY CONCERNS:
  • Check the service agreement for encryption algorithm
  • Use SSL connections
  • No control
<<  back to TOC
Data Ownership
?
<<  back to TOC
2020-2021 UM Faculty Manual
Innovations: patentable or un-patentable inventions, discoveries, processes, compositions, research tools, data, ideas, databases, know-how, copyrightable works that are not scholarly or artistic Creations and tangible property, including biological organisms, engineering prototypes, drawings, and software created, conceived or made by Applicable Personnel within their normal duties (including clinical duties), course of studies, field of research or scholarly expertise or making more than Incidental Use of University’s resources. (p. 141)
3.3 Innovations are owned by the University; revenues derived from commercialization of Innovations will be shared with the Applicable Personnel as detailed in Section VI. (p. 142)


<<  back to TOC
Really??
yes (and no):
  • Data cannot be copyrighted, but code can (USA)
  • University may "own" data, but stewardship lies with researcher
  • funders can request to own data
  • research performed with federal dollars goes to the public domain
  • bottom line
    • I am not a lawyer
    • laws are for interpreting
    • currently being interpreted


<<  back to TOC
Licensing Things
<<  back to TOC
Some Open Source Licenses
  • Apache License, 2.0 (Apache-2.0)
  • BSD 3-Clause "New" or "Revised" license (BSD-3-Clause)
  • BSD 2-Clause "Simplified" or "FreeBSD" license (BSD-2-Clause)
  • GNU General Public License (GPL)
  • GNU Library or "Lesser" General Public License (LGPL)
  • MIT license (MIT)
  • Mozilla Public License 2.0 (MPL-2.0)
<<  back to TOC
Metadata
?
Data 'reporting'
  1. WHO created the data?
  2. WHAT is the content of the data?
  3. WHEN were the data created?
  4. WHERE is the data geographically?
  5. HOW were the data developed?
  6. WHY were the data developed?
<<  back to TOC
Metadata you already know

Human Readable
<<  back to TOC
TY - BOOK
DB - /z-wcorg/
DP - http://worldcat.org
ID - 702896
LA - English
T1 - Traces on the Rhodian shore : nature and culture in Western thought from ancient times to the end of the eighteenth century
AU - Glacken, Clarence J.
PB - University of California Press
CY - Berkeley
Y1 - 1973///
SN - 0520023676 9780520023673 0520032160 9780520032163
ER -
Metadata you already know

Machine Readable
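
"Machine readable" means a short program can pull the fields back out. The sketch below parses a shortened copy of the record shown here; the parser is a simplification written for this example, not a complete reader for the citation format.

```python
import re

# Two-character tag, a dash, then the value (the value may be empty, as on the ER line).
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s*(.*)$")

RECORD = """\
TY - BOOK
LA - English
T1 - Traces on the Rhodian shore : nature and culture in Western thought from ancient times to the end of the eighteenth century
AU - Glacken, Clarence J.
PB - University of California Press
CY - Berkeley
Y1 - 1973///
ER -
"""


def parse_record(text):
    """Parse one tagged record into a dict that maps each tag to a list of values."""
    record = {}
    for line in text.splitlines():
        match = TAG_LINE.match(line.strip())
        if not match:
            continue
        tag, value = match.groups()
        if tag == "ER":              # end of record
            break
        record.setdefault(tag, []).append(value.strip())
    return record


book = parse_record(RECORD)
year = book["Y1"][0].split("/")[0]
print(book["AU"], book["PB"], year)
```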
<<  back to TOC
A love letter to the future
"Scientific metadata provide the information necessary for investigators separated by time, space, institution or disciplinary norm to establish common ground."  -  Christine Borgman


Edwards, Mayernik, Batcheller, Bowker, and Borgman (2011). Science friction: Data, metadata, and collaboration. Social Studies of Science 41(5): 667-690. http://dx.doi.org/10.1177/0306312711413314
<<  back to TOC
Working with data
  • When you provide data to someone else, what types of information would you want to include with the data?




  • When you receive a dataset from an external source, what types of details do you want to know about the data?
DataONE Education Module: Lesson 07: Metadata. DataONE.
https://www.dataone.org/education-modules
<<  back to TOC
Working with data

Providing data

  • Why were the data created?
  • What limitations, if any, do the data have?
  • What do the data mean?
  • How should the data be cited?
Receiving data

  • What are the data gaps?
  • What processes were used for creating the data?
  • Are there any fees associated with the data?
  • In what scale were the data created (units)?
  • What do the values in the tables mean (data dictionary)?
  • What software do I need to use the data?
  • What projection are the data in (geospatial data)?
  • Can I give these data to someone else?
DataONE Education Module: Lesson 07: Metadata. DataONE.
https://www.dataone.org/education-modules
<<  back to TOC
Describing Metadata

Descriptive
  • Project: Describe the overall project (author, date, place, etc.)
  • Technical: Describe individual project elements (tables, column headers, data dictionary etc.)

Structural
  • Describe how different elements of the data(set) fit together

Administrative
  • Rights Management
  • Preservation
National Information Standards Organization (NISO) (2004). Understanding Metadata.
http://www.niso.org/publications/press/UnderstandingMetadata.pdf
<<  back to TOC
Standards and Schemas

Idea of standardized set of elements
  • Minimal to maximal, depends on purpose, audience, domain, and structure

Dublin Core
  • One of the most common (XML) – not Dublin Ireland
  • Used as a starting point for many other schema

The Dublin Core Metadata Element Set is a vocabulary of fifteen properties for use in resource description. The name "Dublin" is due to its origin at a 1995 invitational workshop in Dublin, Ohio; "core" because its elements are broad and generic, usable for describing a wide range of resources.
<<  back to TOC
DC

Title
Creator
Subject
Description
Publisher
Contributor
Date
Type
Format
Identifier
Source
Language
Relation
Coverage
Rights
DC Example

Title="Metadata Demystified"
Creator="Brand, Amy"
Creator="Daly, Frank"
Creator="Meyers, Barbara"
Subject="metadata"
Description="Presents an overview of metadata conventions in publishing."
Publisher="NISO Press"
Publisher="The Sheridan Press"
Date="2003-07"
Type="Text"
Format="application/pdf"
Identifier="http://www.niso.org/standards/resources/Metadata_Demystified.pdf"
Language="en"
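
Since Dublin Core is commonly written as XML, the example record could be serialized roughly as follows. The sketch uses Python's standard `xml.etree.ElementTree` and the Dublin Core element namespace; the wrapper element name `metadata` is an assumption, since repositories differ in how they wrap the elements.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Element/value pairs from the example record; elements may repeat.
fields = [
    ("title", "Metadata Demystified"),
    ("creator", "Brand, Amy"),
    ("creator", "Daly, Frank"),
    ("creator", "Meyers, Barbara"),
    ("subject", "metadata"),
    ("description", "Presents an overview of metadata conventions in publishing."),
    ("publisher", "NISO Press"),
    ("publisher", "The Sheridan Press"),
    ("date", "2003-07"),
    ("type", "Text"),
    ("format", "application/pdf"),
    ("identifier", "http://www.niso.org/standards/resources/Metadata_Demystified.pdf"),
    ("language", "en"),
]


def to_dublin_core(pairs):
    """Build an XML tree with one dc: element per (name, value) pair."""
    root = ET.Element("metadata")    # wrapper name is an assumption of this sketch
    for name, value in pairs:
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return root


root = to_dublin_core(fields)
xml_text = ET.tostring(root, encoding="unicode")
creators = [e.text for e in root.findall(f"{{{DC_NS}}}creator")]
print(xml_text)
```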
<<  back
  to TOC
Standards and Schemas

Project Open Data
  • US Government standard (json-ld)
  • 2013 data sharing requirements and open-data policy briefs

The White House developed Project Open Data – this collection of code, tools, and case studies – to help agencies adopt the Open Data Policy and unlock the potential of government data. Project Open Data will evolve over time as a community resource to facilitate broader adoption of open data practices in government. Anyone – government employees, contractors, developers, the general public – can view and contribute.
<<  back to TOC
Project Open Data Required Fields
Field | Label | Definition | Required
title | Title | Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery. | Always
description | Description | Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest. | Always
keyword | Tags | Tags (or keywords) help users discover your dataset; please include terms that would be used by technical and non-technical users. | Always
modified | Last Update | Most recent date on which the dataset was changed, updated or modified. | Always
publisher | Publisher | The publishing entity and optionally their parent organization(s). | Always
contactPoint | Contact Name and Email | Contact person’s name and email for the asset. | Always
identifier | Unique Identifier | A unique identifier for the dataset or API as maintained within an Agency catalog or database. | Always
accessLevel | Public Access Level | The degree to which this dataset could be made publicly-available, regardless of whether it has been made available. Choices: public (Data asset is or could be made publicly available to all without restrictions), restricted public (Data asset is available under certain use restrictions), or non-public (Data asset is not available to members of the public). | Always
bureauCode [USG] | Bureau Code | Federal agencies, combined agency and bureau code from OMB Circular A-11, Appendix C (PDF, CSV) in the format of 015:11. | Always
programCode [USG] | Program Code | Federal agencies, list the primary program related to this data asset, from the Federal Program Inventory. Use the format of 015:001. | Always
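
A record with these fields can be written as JSON and checked for completeness before it is shared. The sketch below is illustrative only: every value in the sample record is invented, the nested shapes of `publisher` and `contactPoint` are simplified assumptions, and treating the two USG fields as required only for federal agencies is this example's reading of the table, not a statement of the official schema.

```python
import json

ALWAYS_REQUIRED = [
    "title", "description", "keyword", "modified", "publisher",
    "contactPoint", "identifier", "accessLevel",
]
FEDERAL_ONLY = ["bureauCode", "programCode"]   # the fields marked USG in the table
ACCESS_LEVELS = {"public", "restricted public", "non-public"}


def missing_fields(dataset, federal=False):
    """Return the required fields that are absent or empty, plus a note on a bad accessLevel."""
    required = ALWAYS_REQUIRED + (FEDERAL_ONLY if federal else [])
    missing = [field for field in required if not dataset.get(field)]
    if "accessLevel" not in missing and dataset["accessLevel"] not in ACCESS_LEVELS:
        missing.append("accessLevel (invalid value)")
    return missing


# Hypothetical dataset entry; all values are invented for illustration.
dataset = {
    "title": "Water quality samples, Biscayne Bay, 2015",
    "description": "Weekly temperature and salinity readings from three sampling sites.",
    "keyword": ["water quality", "salinity", "Biscayne Bay"],
    "modified": "2015-02-14",
    "publisher": {"name": "Example University Libraries"},
    "contactPoint": {"fn": "Jane Researcher", "hasEmail": "mailto:jane@example.edu"},
    "identifier": "example-dataset-0001",
    "accessLevel": "public",
}

as_json = json.dumps(dataset, indent=2)
problems = missing_fields(dataset)
federal_problems = missing_fields(dataset, federal=True)
print(as_json)
```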
<<  back to TOC
Metadata Review
  • metadata is a description of your data for a future user (you perhaps?)
    • What does this person need to know to use the data properly?
    • Does this person need discipline specific knowledge? How much?
  • Two general kinds of descriptive metadata
    • Project level (contextual)
    • Technical (data level, units, headers, etc.)
  • How will the metadata be captured
    • Notebooks (electronic??)
    • Device Capture
  • What format (with justification)
    • Discipline specific standard? Other standard?
    • Machine or human readable (both?)
<<  back to TOC
DOIs and ORCIDs
  • Digital Object Identifiers (DOIs)
    • Permanent identifiers (links) to online resources
    • Resolved through a service (https://doi.org/)
    • All repositories provide these for your data
    • UM is a member of DataCite and Crossref, who provide our DOIs
  • ORCID




Work together to connect research to researcher
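
Both identifiers are easy to handle in code. The sketch below turns a DOI, in any of the spellings used in this deck, into a link through the resolver, and checks the final check character of an ORCID iD with the ISO 7064 MOD 11-2 algorithm that ORCID documents. The function names and the DOI pattern are assumptions of this example, and the iD used is a sample from ORCID's own documentation.

```python
import re
from urllib.parse import quote

# Common shape of a DOI: "10.", a registrant number, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
            "http://dx.doi.org/", "doi:")


def doi_to_url(doi):
    """Turn a DOI written in any common spelling into an https://doi.org/ link."""
    doi = doi.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + quote(doi, safe="/")


def orcid_is_valid(orcid):
    """Check the last character of an ORCID iD against its ISO 7064 MOD 11-2 checksum."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    total = 0
    for character in digits[:15]:
        total = (total + int(character)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[15].upper() == expected


link = doi_to_url("doi:10.1371/journal.pone.0118053.t005")
print(link, orcid_is_valid("0000-0002-1825-0097"))
```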
<<  back to TOC
Journal requests DOI for article review
-or-
you just want to share.
  • Find the right place
  • Create the correct package
“Sharing data from one laboratory to another—or even within a laboratory—takes time and effort, but there are also psychological, cultural and technological barriers to doing so.”
Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, Cain KE, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. https://doi.org/10.1371/journal.pbio.1001779
<<  back to TOC
Find the Right Place
Kinds of Repositories
<<  back to TOC
Create the correct deposit package
(a zip file or something similar)
  • best practices for naming, formats, organization
  • project metadata (best practices or standard)
  • technical metadata: README.txt
  • licensing information
  • checksum
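
Assembling such a package can be scripted so that the checksums are never forgotten. This is a sketch under stated assumptions: the file names, the manifest name `manifest-sha256.txt`, and the function `build_package` are hypothetical, and a repository may prescribe its own package layout.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def build_package(data_dir, package_path):
    """Zip every file in data_dir with a SHA-256 manifest; return the zip file's own checksum."""
    data_dir = Path(data_dir)
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    manifest_lines = []
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_DEFLATED) as package:
        for path in files:
            relative = path.relative_to(data_dir).as_posix()
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest_lines.append(f"{digest}  {relative}")
            package.write(path, arcname=relative)
        package.writestr("manifest-sha256.txt", "\n".join(manifest_lines) + "\n")
    return hashlib.sha256(Path(package_path).read_bytes()).hexdigest()


# Demonstration with placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "deposit"
    data.mkdir()
    (data / "README.txt").write_text("Technical metadata: column names, units, codes.\n", encoding="utf-8")
    (data / "LICENSE.txt").write_text("License terms go here.\n", encoding="utf-8")
    (data / "20150214_UM_samples_001.csv").write_text("id,value\n1,3.2\n", encoding="utf-8")
    package_path = Path(tmp) / "20150214_UM_samples.zip"
    package_checksum = build_package(data, package_path)
    with zipfile.ZipFile(package_path) as package:
        contents = sorted(package.namelist())
        manifest = package.read("manifest-sha256.txt").decode("utf-8")
```

The manifest lets a future user confirm each file is intact, and the package checksum can be recorded alongside the deposit.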


<<  back to TOC

Select archive location

Considerations

  • Costs
  • Size of dataset
  • Public vs. private access
  • Length of preservation
  • Hands-on vs. hands-off
  • Security of platform
  • Disciplinary standards
Locations

  • Individual
  • Department/college
  • University-wide
  • Discipline-specific
  • 3rd party
archive vs. sharing mechanism
adapted from Whitmire, Amanda L. (2014). Research Data Management Curriculum, Lecture 15: Data Preservation. Oregon State University Libraries. Retrieved 11/04/2015 from: http://figshare.com/articles/GRAD521_Research_Data_Management_Lectures/1003835
<<  back to TOC
Archive in discipline-specific repository
  • Replicated, archive-quality storage
  • Data curation throughout ingest & archive period
  • Data in context with other datasets


Costs (100 GB dataset)
$Depends
<<  back to TOC
Discipline Specific Repository Directories




<<  back to TOC
Archive @ University of Miami Institutional Repository
https://scholarship.miami.edu
  • Storage in Esploro (commercial service run by Ex Libris)
  • Remotely accessible (restrictions/embargoes possible)
  • OK for unrestricted data. Not compliant for sensitive (FERPA) and protected data


Costs (5 GB dataset)
$($0/year * 5 GB) = $0
NOTE: UM has a 5GB limit for cost-free archiving. For datasets larger than 5GB please contact researchdata@miami.edu.
<<  back to TOC
3rd party repository platforms
  • easy and hassle free
  • highly visible and publicly available
  • longevity is unknown


Costs (100 GB dataset)
$mostly free
<<  back to TOC
Archive w/department IT
  • Depends on department IT


Costs (100 GB dataset)
$Under development???
(check your dept.)
<<  back to TOC
Archive on your own
  • You buy & manage hardware, replication, backups and networking (if applicable, for offsite access)
  • OK for unrestricted, sensitive (FERPA), and protected data


Costs (100 GB dataset)
$Ranges
(but generally cheap)
<<  back to TOC
Researcher perspectives on sharing
Sharing analysis scripts and data sets      Frequency   Percent (valid)
1. Willing to share publicly                120         25.9%
2. Willing to share under access control    98          21.1%
3. Willing to share only on request         163         35.1%
4. Not willing to share                     83          17.9%
Sum                                         464         100%
Source: SOEP User Survey 2013, own calculations

adapted from doi:10.1371/journal.pone.0118053.t005
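The valid percentages in the table can be rechecked from the frequencies alone. A short sketch, assuming the four response counts shown on the slide; the variable names are illustrative.

```python
# Frequencies as reported on the slide (SOEP User Survey 2013).
counts = {
    "Willing to share publicly": 120,
    "Willing to share under access control": 98,
    "Willing to share only on request": 163,
    "Not willing to share": 83,
}

total = sum(counts.values())

# Valid percent = share of each response among all valid responses.
percents = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in percents.items():
    print(f"{label}: {pct}%")
```

Taken together, the first three rows show that a majority of respondents were willing to share in some form, but only about a quarter would do so publicly.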
[Figure labels: Age, Control, Resource, Returns, Discipline]
<<  back to TOC

Data Sharing

Since the initial publication of the 2003 NIH data sharing requirements, two sets of principles have emerged that guide how we share data:
  • the FAIR Principles
  • the CARE Principles


<<  back to TOC
FAIR Data Sharing
The principles provide guidelines to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets
  • The FAIR principles emphasize machine actionability (computers can find, access, interoperate, and reuse data)
  • The four main principles all have sub-sections that outline best practices
In 2016, the ‘FAIR Guiding Principles for scientific data management and stewardship’ were published in Scientific Data.
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
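One way to picture machine actionability is a metadata record that software can parse without human help. The sketch below builds such a record using property names from the schema.org Dataset vocabulary; the helper name and every value (title, identifier, creator, license) are hypothetical placeholders, and a given repository may expect a different schema.

```python
import json


def build_dataset_record(title, description, identifier, license_url,
                         creators, keywords):
    """Return a JSON-LD style description of a dataset (schema.org terms)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,                # Findable: descriptive title
        "description": description,
        "identifier": identifier,     # Findable: persistent identifier
        "license": license_url,       # Reusable: explicit usage terms
        "creator": [{"@type": "Person", "name": n} for n in creators],
        "keywords": keywords,
    }


# All values below are placeholders for illustration only.
record = build_dataset_record(
    title="Example survey dataset",
    description="Placeholder description of what the data contain.",
    identifier="https://doi.org/10.xxxx/example",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creators=["A. Researcher"],
    keywords=["example", "survey"],
)

print(json.dumps(record, indent=2))
```

A record like this would sit alongside the README.txt rather than replace it: the README serves human readers, while the structured record serves indexing and harvesting tools.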
<<  back to TOC


Carroll, S., Garba, I., Figueroa-Rodríguez, O., Holbrook, J., Lovett, R., Materechera, S., Parsons, M., Raseroka, K., Rodriguez-Lonebear, D., Rowe, R., Sara, R., Walker, J., Anderson, J., and Hudson, M. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19: XX, pp. 1–12. DOI: https://doi.org/10.5334/dsj-2020-042
The CARE Principles for Indigenous Data Governance were drafted at the International Data Week and Research Data Alliance Plenary co-hosted event “Indigenous Data Sovereignty Principles for the Governance of Indigenous Data Workshop,” 8 November 2018, Gaborone, Botswana.
"Existing principles within the open data movement (e.g. FAIR: findable, accessible, interoperable, reusable) primarily focus on characteristics of data that will facilitate increased data sharing among entities while ignoring power differentials and historical contexts."
<<  back to TOC

What is a Data Management Plan

The Data Management Plan is a written document that describes the data you expect to acquire or collect throughout a research project, how you will collect, organize, document, and analyze the data, and finally how you will share, publish and preserve the data.


<<  back to TOC

Data Management Plans

  • Information about Data and Data Formats
  • Metadata Content and Format
  • Policies for access, sharing, and re-use
  • Long-term storage and preservation
  • Budget


<<  back to TOC
Data Management Plans
  • Compliance vs Practice
  • Check all funder requirements (funder website or DMPTool)
  • Consider your data
    • How will the value of your data change with time?
    • Who will want to use your data? Why?
    • Who owns the data for your research?
    • Are there privacy concerns with your data?
    • Are there existing repositories and standards for your data?


<<  back to TOC
Evaluate a DMP
  • Describes what types of data will be captured, created or collected

  • Describes how data will be collected, captured, or created (observations, models, reuse, etc.)

  • Identifies how much data (volume) will be produced

  • Describes how the data will be made publicly available

  • Provides details on when the data will be made publicly available
Rating scale: Complete/detailed | Addressed issue, but not complete | Did not address
Other criteria??
<<  back to TOC

Quick Review: Data Management Plans

  • Information about Data and Data Formats
  • Metadata Content and Format
  • Policies for access, sharing, and re-use
  • Long-term storage and preservation
  • Budget

BUT it will always depend on your goals and the funder’s goals


<<  back to TOC

Electronic Lab Notebooks

Harvard's Notes on ELNs

  • Open Source or Proprietary?
  • Security?
  • Data Sharing Facilitation?
  • Backup?
  • Budget?




<<  back to TOC

Data Ethics

Data does not tell stories; humans tell stories with data.



Extreme care should be taken with curation of data. It is the stories we tell as scientists, reporters, and other interested parties that shape societal reaction.


<<  back to TOC

Data Ethics

  • Coding ethics
  • Privacy issues
  • Data governance and data sovereignty
    • Data ownership
    • Data access
    • Data control
    • Data possession
  • Telling stories with data




<<  back to TOC
Thanks
Timothy Norris, PhD
Librarian Associate Professor - Data Science
Fall 2022 UM Data Services Courses and Workshops
University of Miami Libraries
Institute for Data Science and Computing
https://bit.ly/2IY8eUi
tnorris@miami.edu

Get a good text editor



<<  back to TOC