Data Management in the
Research Environment
File Formats and File System Organization
Library Workshop Series
https://bit.ly/2QGtdPr
Dr. Timothy Norris - Research Data Scientist - tnorris@miami.edu
some things to talk about
Resources at UM

Timothy Norris

Data Scientist

(305) 284-2826 tnorris@miami.edu


Cameron Riopelle

Head of Data Services

(305) 284-3257 criopelle@miami.edu

??

Biomedical Data Librarian

newperson@miami.edu



Abraham Parrish

GIS Services Librarian

(305) 284-9488 aparrish@miami.edu


Jorge Quintela

GIS and Data Specialist

(305) 284-5729 jaq32@miami.edu

Working with Data Workshop Series
  • Data Analysis Software Instruction
    Dr. Cameron Riopelle, Head of Data Services, criopelle@miami.edu
    • Introduction to SPSS
      Designed for new SPSS users. It provides an introduction to the SPSS software program, including its software environment, importing data, descriptive statistics, transforming variables, selecting and splitting data, and visualization.
      Richter Library – Digital Scholars' Lab - Wednesday September 18 - 3 - 4:30pm (17 people)
    • Intermediate SPSS
      Designed for intermediate SPSS users. It covers common statistical methods in SPSS such as means comparisons, ANOVA, linear regression, and logistic regression models.
      Richter Library – Digital Scholars' Lab - Wednesday September 25 - 3 - 4:30pm (17 people)
    • SAS for Data Analysis
      Designed for both new and intermediate SAS users. It provides an overview of the SAS software program, including its programming language, software environment, importing data, descriptive statistics, transforming variables, selecting and splitting data, exploratory tests, regression models, and visualization.
      Richter Library – Digital Scholars' Lab - Wednesday October 2 - 3 - 4:30pm (17 people)
    • Introduction to R/RStudio
      Designed for new R and RStudio users. It provides an introduction to the R software program, including its programming language, software environment, importing data, descriptive statistics, transforming variables, selecting and splitting data, exploratory tests, and visualization.
      Richter Library – Digital Scholars' Lab - Wednesday October 23 - 3 - 4:30pm (17 people)
    • Intermediate R/RStudio
      Designed for intermediate R and RStudio users. It covers common statistical methods in R such as means comparisons, ANOVA, linear regression, and basic visualization.
      Richter Library – Digital Scholars' Lab - Wednesday October 30 - 3 - 4:30pm (17 people)
    • Data Visualization with Tableau
      Introduction to using the software program Tableau for data visualization. It covers making common graphs and tables, importing data, making sheets and dashboards, and exporting images.
      Richter Library – Digital Scholars' Lab - Wednesday November 6 - 3 - 4:30pm (17 people)
    • Data Visualization with R/RStudio
      Introduction to using the software program R for visualization. Prior experience with R required--this workshop assumes knowledge of the R language and environment.
      Richter Library – Digital Scholars' Lab - Wednesday November 20 - 3 - 4:30pm (17 people)
  • Research Data Management Series
    Dr. Timothy Norris, Data Scientist, tnorris@miami.edu
    • The Data Management Challenge: Wrangling Data in the Research Environment
      This is an introduction to topics in research data management designed to foster skills and encourage data management best practices for efficiency, compliance and security in the research environment. This is a discipline agnostic seminar. Specific learning goals include the identification of best practices for: file naming conventions, file system organization, data security, data privacy, backup strategies, data sharing, data documentation, and data publication. These topics introduce practical behaviors to ease the digital research process.
      Richter Library – Flex Space – Friday August 31, 10 – 11:30 am (30 people)
      RSMAS Library – Map Room – Friday September 6, 10 – 11:30 am (30 people)
      Calder Library (Miller) – First Floor Collaboratory – Friday September 13, 10 – 11:30 am (25 people)
    • Data Management and Research Design
      This is the first in a series of short seminars that explore topics in research data management (see intro above and other DM seminars in this list). This is a discipline-agnostic seminar. Specific learning goals are to understand the federal policy context for data management and sharing, explain the data lifecycle, critically evaluate existing Data Management Plans (DMPs), and identify key elements in a DMP. As a product of the seminar/workshop, attendees will outline a DMP for their research.
      Richter Library – Flex Space – Wednesday October 9, 10 – 11:30 am (30 people)
    • File Formats and System Organization for Research
      This is the second in a series of short seminars that explore topics in research data management (see intro above and other DM seminars in this list). This is a discipline-agnostic seminar. Specific learning goals are to understand file format choices and their implications for data sharing, data publication, and data re-use; identify best practices for file system organization; and identify best practices for file naming conventions. As a product of the seminar/workshop, attendees will select file formats, choose a file naming convention, and design a file system architecture for their research.
      Richter Library – Flex Space – Tuesday October 22, 10 – 11:30 am (30 people)
    • Research Data Description and Documentation
      This is the third in a series of short seminars that explore topics in research data management (see intro above and other DM seminars in this list). This is a discipline-agnostic seminar. Specific learning goals are to identify disciplinary metadata standards (if they exist), describe discipline-agnostic metadata standards, and understand how metadata facilitates discovery, sharing, and access to data resources. As a product of the seminar/workshop, attendees will create metadata for a selected subset of the data from their research.
      Richter Library – Flex Space – Wednesday November 6, 10 – 11:30 am (30 people)
    • Research Data Publication: Repositories and Sharing
      This is the fourth in a series of short seminars that explore topics in research data management (see intro above and other DM seminars in this list). This is a discipline-agnostic seminar. Specific learning goals are to identify discipline-specific repositories, understand data ownership in the context of research institutions, decode repository requirements for publishing data, and create correct citations for referencing data in publications. As a product of the seminar/workshop, attendees will create a deposit package from a selected subset of the data from their research for publication in a data repository.
      Richter Library – Flex Space – Tuesday November 19, 10 – 11:30 am (30 people)
  • GIS Software Instruction
    Dr. Jorge Quintela, GIS & Data Specialist, jaq32@miami.edu
    • Introduction to ArcGIS Online
      This workshop will introduce you to ArcGIS Online, Esri’s cloud-based mapping and analysis platform. You will learn how to create interactive maps, how to add, manage and share content, and how to perform basic spatial analysis procedures with your data. Participants will also learn how to create and share basic web apps. Participants are strongly encouraged to register in advance so that they receive their ArcGIS Online credentials before the session starts.
      Richter Library – Flex Space – Thursday October 10, 2 – 4 pm (17 people)
  • Software Carpentry - https://software-carpentry.org
    Dr. Timothy Norris, Data Scientist, tnorris@miami.edu
    Dr. Cameron Riopelle, Head of Data Services, criopelle@miami.edu
    • Python
      Scientific computing using the Python programming language. This workshop includes an introduction to command line computing, version control and Python.
      Calder Library - Downstairs Collaboratory - November 11-12, 9 am - 5 pm (35 people)
    • R/RStudio
      Scientific computing using the R programming language. This workshop includes an introduction to command line computing, version control and R.
      RSMAS Library - Map Room - December 2-3, 9 am - 4:30 pm (35 people)
On Data
Qualitative - Quantitative
Non-numeric
Text, Image, Sound
Nominal, Ordinal, Interval, Ratio
What is Data?

| Measurement Level | Definition | Example |
|---|---|---|
| Nominal | Categorical in nature, with observations recorded into discrete units. | Unmarried, married, divorced, widowed |
| Ordinal | Observations that are placed in a rank order, where certain observations are greater than others. | Low, medium, high |
| Interval | Measurements along a scale which possesses a fixed but arbitrary interval and an arbitrary origin. Addition or multiplication by a constant will not alter the interval nature of the observations. Data can either be continuous or discrete in nature. | Temperature along the Celsius scale |
| Ratio | Similar to interval data except the scale possesses a true zero origin, and multiplication by a constant will not alter the ratio nature of the observations. | Exam marks on a scale of 0–10 |
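A small Python sketch may help show why the measurement level matters in practice: it determines which summary operations are meaningful. The sample values and variable names below are made up for illustration and do not come from the workshop material.

```python
from statistics import mode

# Nominal: categories only, so counting and the mode are meaningful.
marital_status = ["married", "unmarried", "married", "widowed"]
most_common = mode(marital_status)

# Ordinal: categories have a rank order, so a median is meaningful.
ORDER = ["low", "medium", "high"]  # hypothetical ranking
ratings = ["low", "high", "medium", "high", "medium"]
ranks = sorted(ORDER.index(r) for r in ratings)
median_rating = ORDER[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, ratios are not.
# 20 degrees C is not "twice as hot" as 10 degrees C; the ratio of the
# same two temperatures in kelvin is only about 1.035.
celsius_a, celsius_b = 10.0, 20.0
celsius_diff = celsius_b - celsius_a
kelvin_ratio = (celsius_b + 273.15) / (celsius_a + 273.15)

# Ratio: a true zero exists, so ratios are meaningful.
marks_a, marks_b = 4, 8
marks_ratio = marks_b / marks_a  # 8 marks is twice 4 marks

print(most_common, median_rating, celsius_diff, round(kelvin_ratio, 3), marks_ratio)
```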


On Data
Qualitative - Quantitative
Non-numeric
Text, Image, Sound
Nominal, Ordinal, Interval, Ratio
Captured, Exhaust, Transient, Derived
Observed
Experimental,
Modeled
Modeled
Technical Metadata
Non-Observed
Not "Raw"
Levels (more in a moment)
Data Levels (as described by NASA)

| Data Level | Description |
|---|---|
| Level 0 | Reconstructed, unprocessed instrument and payload data at full resolution, with any and all communications artefacts (e.g., synchronisation frames, communications headers, duplicate data) removed. |
| Level 1A | Reconstructed, unprocessed instrument data at full resolution, time-referenced, and annotated with ancillary information, including radiometric and geometric calibration coefficients and georeferencing parameters computed and appended but not applied to Level 0 data. |
| Level 1B | Level 1A data that have been processed to sensor units. |
| Level 2 | Derived geophysical variables at the same resolution and location as Level 1 source data. |
| Level 3 | Variables mapped on uniform space-time grid scales, usually with some completeness and consistency. |
| Level 4 | Model output or results from analyses of lower-level data (e.g., variables derived from multiple measurements). |

https://earthdata.nasa.gov/user-resources/standards-and-references
http://science.nasa.gov/earth-science/earth-science-data/data-processing-levels-for-eosdis-data-products/
Sensors and Data Levels

| Active vs. Static | Data Storage | Example or Focus | Typical File Formats |
|---|---|---|---|
| ACTIVE | Raw Data | Temperature readings over time | Paper? Device-specific? .xlsx, … |
| ACTIVE | Processed Data | “Cleaned,” normalized temperature data compiled in spreadsheet | .xlsx, .sas, … |
| ACTIVE | Analyzed Data | Temperature data with averages computed, graphs charted | .xlsx, .sas, … |
| STATIC | Finalized, Published Data | Do the data support hypothesis? | .csv |


adapted from http://classguides.lib.uconn.edu/
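The raw, processed, analyzed, and published stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings, the cleaning rule (drop anything that is not a number), and the column names are hypothetical and would differ for a real instrument.

```python
import csv
import io
from statistics import mean

# Raw: readings as they might come off a device (hypothetical values).
raw_readings = ["21.5", "22.1", "", "err", "23.0"]


def clean(readings):
    """Processed: keep only values that parse as numbers."""
    cleaned = []
    for value in readings:
        try:
            cleaned.append(float(value))
        except ValueError:
            continue  # skip blanks and error codes
    return cleaned


processed = clean(raw_readings)

# Analyzed: compute a summary statistic.
average = round(mean(processed), 2)

# Finalized / published: write a plain .csv representation.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["reading_index", "temperature_c"])
for index, temperature in enumerate(processed):
    writer.writerow([index, temperature])
published_csv = buffer.getvalue()

print(average)
print(published_csv)
```

Keeping the raw list untouched and deriving each later stage from it mirrors the idea that raw data should never be overwritten by processing steps.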
On Data
Qualitative - Quantitative
Non-numeric
Text, Image, Sound
Nominal, Ordinal, Interval, Ratio
Captured, Exhaust, Transient, Derived
Observed
Experimental,
Modeled
Modeled
Technical Metadata
Non-Observed
Not "Raw"
Levels (more in a moment)
Structured, Semi-structured, Unstructured
Irregular, Flexible
Nested, Trees, Tagged
Data model, Schema,
Relational Database
Primary, Secondary, Tertiary
Incorporated
Re-used
Created, Collected
Released
Truncated
Data: Primary, Secondary, and Tertiary
  • Primary: research generated (from instruments or observations)
  • Secondary: acquired for research project from another source
  • Tertiary: derivative of primary or secondary data (anonymized, annotated, bundled, and so on)
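As a rough illustration of tertiary data, the sketch below derives a pseudonymized copy of a small primary data set. The records, the salt, and the `pseudonymize` helper are all hypothetical. Replacing names with hashes is pseudonymization rather than full anonymization, so treat this as a sketch, not a privacy guarantee.

```python
import hashlib

SALT = "project-specific-secret"  # hypothetical; keep out of shared files


def pseudonymize(name, salt=SALT):
    """Return a short, stable code in place of a respondent name."""
    digest = hashlib.sha256((salt + name).encode("utf-8")).hexdigest()
    return "P-" + digest[:8]


# Primary data: generated by the research itself (made-up records).
primary = [
    {"respondent": "Ana", "answer": "yes"},
    {"respondent": "Luis", "answer": "no"},
    {"respondent": "Ana", "answer": "maybe"},
]

# Tertiary data: a derivative of the primary data with names replaced.
tertiary = [
    {"respondent": pseudonymize(row["respondent"]), "answer": row["answer"]}
    for row in primary
]

for row in tertiary:
    print(row)
```

The same respondent always maps to the same code, so repeated answers can still be linked in the derived data set.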



On Data
Qualitative - Quantitative
Non-numeric
Text, Image, Sound
Nominal, Ordinal, Interval, Ratio
Kitchin, R (2014). “Conceptualizing Data” in Kitchin, R The Data Revolution. Washington DC: Sage.
http://uk.sagepub.com/sites/default/files/upm-binaries/63923_Kitchin_CH1.pdf
Captured, Exhaust, Transient, Derived
Observed
Experimental,
Modeled
Modeled
Technical Metadata
Non-Observed
Not "Raw"
Levels (more in a moment)
Structured, Semi-structured, Unstructured
Irregular, Flexible
Nested, Trees, Tagged
Data model, Schema,
Relational Database
Primary, Secondary, Tertiary
Incorporated
Re-used
Created, Collected
Released
Truncated
Indexical, Attribute, Metadata
Identifiers
Descriptions
Characteristics
Some Useful Abstractions


“Information is not knowledge.
Knowledge is not wisdom.
Wisdom is not truth.
Truth is not beauty.
Beauty is not love.
Love is not music.
Music is THE BEST.”

― Frank Zappa  

What data will you collect / create / wrangle ?

National Geographic Institute
Ministry of the Environment
Ministry of Energy and Mines
Previous Area Studies
Systemic Theories
Methods
USGS / NASA satellite imagery
SRTM topographic data
Six county governments
Ten communities
Three state governments
SURVEY INTERVIEWS
Pasture Transects
Water Quality Analysis
GPS Data Collected
Productivity Model
Disturbance Model
Topographic Model
Conservation Zoning Maps
Land Use Maps
Raw
Analyzed
Processed
Finalized / Published

What data will you collect / create / wrangle ?

  • Will you use sensors? - OBSERVATIONAL

    • Captured in situ?
    • Can’t be recreated, recaptured or replaced - VALUE
    • Includes survey instruments and hired research assistants
    • But, will you collect data, buy data from a provider, or receive data as a contracted service?




What data will you collect / create / wrangle ?

  • Will you conduct an experiment? - EXPERIMENTAL

    • In situ or laboratory based (also considered are natural experiments)?
    • Should be reproducible, but can be expensive
    • May include sensors and observations




What data will you collect / create / wrangle ?

  • Will you build models? - SIMULATED

    • Will you write code?
    • How will you parametrize the model?
    • Inputs may be more valuable than outputs
    • What software (or other tools) will you use?




What data will you collect / create / wrangle ?

  • Will you combine and analyze previously shared data to create new data? – DERIVED or COMPILED

    • Integration from several sources
    • Recreation can be very expensive
    • Again, software and tools?
    • Are there copyright concerns?




What data will you collect / create / wrangle ?

  • Will you draw from previously published materials? – REFERENCE or CANONICAL

    • Peer reviewed
    • Can be data or textual



"This is the most creative, important and valuable aspect of research data."

  • Do you agree?
  • Write a paragraph on why you agree or disagree with this statement




Your Turn
  • Think about your research project – if you don’t have one, partner with someone who does OR imagine your future internship

    • DRAW your RESEARCH LIFE CYCLE (example)
      • Remember: before, during, after (reminder)

    • CONSIDER DATA as
      • Qualitative and Quantitative
      • observational, experimental, derived, simulated, reference
      • raw, processed, analyzed, published
      • primary, secondary, tertiary

    • MATCH the STAGES of the research lifecycle with DATA TYPES
      • Think about management/wrangling at each research stage with each data type



File Formats
"The file formats you use have a direct impact on your ability to open those files at a later date and on the ability of other people to access those data."
Stanford University - Best practices for file formats
File Formats
“A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium.” - Wikipedia
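To make "how bits are used to encode information" concrete, here is a minimal Python sketch that stores the same number in two different encodings. The value and the choice of a little-endian 32-bit float are assumptions made for illustration only.

```python
import struct

value = 22.5  # hypothetical temperature reading

# Text encoding: the characters "2", "2", ".", "5" as ASCII bytes.
as_text = str(value).encode("ascii")

# Binary encoding: a little-endian IEEE 754 32-bit float.
as_binary = struct.pack("<f", value)

print(as_text, as_text.hex())
print(as_binary, as_binary.hex())

# Each encoding needs its own decoder to recover the number.
from_text = float(as_text.decode("ascii"))
from_binary = struct.unpack("<f", as_binary)[0]
print(from_text, from_binary)
```

Both encodings happen to take four bytes here, but the bytes differ. A reader who does not know which format was used cannot interpret them correctly, which is why format documentation matters for later access.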


Data type: Qualitative, tabular experimental data
Possible formats:
  • Excel spreadsheet (.xlsx)
  • Comma-delimited text (.csv)
  • Access database (.mdb/.accdb)
  • Google Spreadsheet
  • SPSS portable file (.por)
  • XML file
Whitmire, Amanda L. (2014). Research Data Management Curriculum, Lecture 3: Introduction to Research Data Management. Oregon State University Libraries.
Retrieved 11/04/2015 from: http://figshare.com/articles/GRAD521_Research_Data_Management_Lectures/1003835
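The choice among possible formats has practical consequences. As a hedged illustration, the Python sketch below writes the same small table to CSV and to JSON and reads it back. The rows and column names are invented for this example.

```python
import csv
import io
import json

# A tiny tabular data set (hypothetical values).
rows = [
    {"sample": "A1", "temperature_c": 21.5},
    {"sample": "A2", "temperature_c": 22.1},
]

# Write to CSV text, then read it back.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["sample", "temperature_c"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
from_csv = list(csv.DictReader(io.StringIO(csv_text)))

# Write to JSON text, then read it back.
json_text = json.dumps(rows)
from_json = json.loads(json_text)

# CSV returns every value as a string; JSON keeps the number as a number.
print(type(from_csv[0]["temperature_c"]).__name__)
print(type(from_json[0]["temperature_c"]).__name__)
```

CSV is simple and widely readable but carries no type information, so column types and units need to be documented alongside the file.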
File Formats
.cab – A cabinet (.cab) file is a library of compressed files stored as one file. Cabinet files are used to organize installation files that are copied to the user's system.
.?Q? – files compressed by the SQ program
7z – 7-Zip compressed file
AAC – Advanced Audio Coding
ace – ACE compressed file
ALZ – ALZip compressed file
APK – Applications installable on Android
APPX – Microsoft Application Package (.appx)
AT3 – Sony's UMD data compression
.bke – BackupEarth.com data compression
ARC – pre-Zip data compression
ARJ – ARJ compressed file
ASS (also SAS) – a subtitles file created by Aegisub, a video typesetting application (also a Halo game engine file)
B – (B file) Similar to .a, but a little less compressed...
BA – Scifer Archive (.ba), Scifer External Archive Type
big – Special file compression format used by Electronic Arts to compress the data for many of EA's games
BIN – compressed archive, can be read and used by CD-ROMs and Java, extractable by 7-zip and WINRAR
bjsn – Used to store The Escapists saves on Android.
BKF (.bkf) – Microsoft backup created by NTBackup
bzip2 (.bz2) –
bld – Skyscraper Simulator Building
c4 – JEDMICS image files, a DOD system
cab – Microsoft Cabinet
cals – JEDMICS image files, a DOD system
CLIPFLAIR (.clipflair, .clipflair.zip) – ClipFlair Studio component saved state file (contains component options in XML, extra/attached files and nested components' state in child .clipflair.zip files – activities are also components and can be nested at any depth)
CPT, SEA – Compact Pro (Macintosh)
DAA – Closed-format, Windows-only compressed disk image
deb – Debian install package
DMG – an Apple compressed/encrypted format
DDZ – a file which can only be used by the "daydreamer engine" created by "fever-dreamer", a program similar to RAGS, it's mainly used to make somewhat short games.
DPE – Package of AVE documents made with Aquafadas digital publishing tools.
.egg – Alzip Egg Edition compressed file
EGT (.egt) – EGT Universal Document also used to create compressed cabinet files replaces .ecab
ECAB (.ECAB, .ezip) – EGT Compressed Folder used in advanced systems to compress entire system folders, replaced by EGT Universal Document
ESS (.ess) – EGT SmartSense File, detects files compressed using the EGT compression system.
Flipchart file (.flipchart) – Used in Promethean ActivInspire Flipchart Software.
GFA – Graphical Fragment Assembly
GHO (.gho, .ghs) – Norton Ghost
GIF (.gif) – Graphics Interchange Format
gzip (.gz) – Compressed file
IPG (.ipg) – Format in which Apple Inc. packages their iPod games. can be extracted through Winrar
jar – ZIP file with manifest for use with Java applications.
LBR (.Lawrence) – Lawrence Compiler Type file
LBR – Library file
LQR – LBR Library file compressed by the SQ program.
LHA (.lzh) – Lempel, Ziv, Huffman
lzip (.lz) – Compressed file
lzo
lzma – Lempel–Ziv–Markov chain algorithm compressed file
LZX (algorithm)
MBW (.mbw) – MBRWizard archive
MPQ Archives (.mpq) – Used by Blizzard games
BIN (.bin) – MacBinary
NTH (.nth) – Nokia Theme Used by Nokia Series 40 Cellphones
OSZ – osu! compressed beatmap archive
PAK – Enhanced type of .ARC archive
PAR (.par, .par2) – Parchive
PAF (.paf) – Portable Application File
PYK (.pyk) – Compressed file
PK3 (.pk3) – Quake 3 archive (See note on Doom³)
PK4 (.pk4) – Doom³ archive (Opens similarly to a zip archive.)
RAR (.rar) – Rar Archive, for multiple file archive (rar to .r01-.r99 to s01 and so on)
RAG, RAGS – Game file, a game playable in the RAGS game-engine, a free program which both allows people to create games, and play games, games created have the format "RAG game file"
RPM – Red Hat package/installer for Fedora, RHEL, and similar systems.
SEN – Scifer Archive (.sen) – Scifer Internal Archive Type
SIT (.sitx) – StuffIt (Macintosh)
SKB – Google SketchUp backup File
SZS – Nintendo U8 archive
TAR – group of files, packaged as one file
TGZ (.tar.gz) – gzipped tar file
TB (.tb) – Tabbery Virtual Desktop Tab file
TIB (.tib) – Acronis True Image backup
UHA – Ultra High Archive Compression
UUE (.uue) – unified utility engine – the generic and default format for all things UUe-related.
VIV – Archive format used to compress data for several video games, including Need For Speed: High Stakes.
VOL – video game data package.
VSA – Altiris Virtual Software Archive
WAX – Wavexpress – A ZIP alternative optimized for packages containing video, allowing multiple packaged files to be all-or-none delivered with near-instantaneous unpacking via NTFS file system manipulation.
xz - xz compressed files, based on LZMA/LZMA2 algorithm
Z – Unix compress file
zoo – based on LZW
zip – popular compression format
Physical recordable media archiving
ISO – The generic format for most optical media, including CD-ROM, DVD-ROM, Blu-ray Disc, HD DVD and UMD.
NRG – The proprietary optical media archive format used by Nero applications.
IMG – For archiving DOS formatted floppy disks, larger optical media, and hard disk drives.
ADF – Amiga Disk Format, for archiving Amiga floppy disks
ADZ – The GZip-compressed version of ADF.
DMS – Disk Masher System, a disk-archiving system native to the Amiga.
DSK – For archiving floppy disks from a number of other platforms, including the ZX Spectrum and Amstrad CPC.
D64 – An archive of a Commodore 64 floppy disk.
SDI – System Deployment Image, used for archiving and providing "virtual disk" functionality.
MDS – DAEMON tools native disc image format used for making images from optical CD-ROM, DVD-ROM, HD DVD or Blu-ray Disc. It comes together with MDF file and can be mounted with DAEMON Tools.
MDX – New DAEMON Tools format that allows getting one MDX disc image file instead of two (MDF and MDS).
DMG – Macintosh disk image files
(MPEG-1 is found in a .DAT file on a video CD.)

CDI – DiscJuggler image file
CUE – CDRWrite CUE image file
CIF – Easy CD Creator .cif format
C2D – Roxio-WinOnCD .c2d format
DAA – PowerISO .daa format
B6T – BlindWrite 5/6 image file
Ceramics glaze recipes
File formats for software, databases, and websites used by potters and ceramic artists to manage glaze recipes, glaze chemistry, etc.

GlazeChem text format
GlazeMaster .tab xml (GlazeMaster software)
HyperGlaze .hgz (HyperGlaze software)
Insight .xml (DigitalFire Insight software)
Insight .rcp (deprecated, DigitalFire Insight software)
Insight .rcx (deprecated, DigitalFire Insight software)
Matrix
Computer-aided Design
Computer-aided is a prefix for several categories of tools (e.g., design, manufacture, engineering) which assist professionals in their respective fields (e.g., machining, architecture, schematics).

Computer-aided design (CAD)
Computer-aided design (CAD) software assists engineers, architects and other design professionals in project design.

3DXML – Dassault Systemes graphic representation
3MF – Microsoft 3D Manufacturing Format
ACP – VA Software VA – Virtual Architecture CAD file
AMF – Additive Manufacturing File Format
AEC – DataCAD drawing format
AR – Ashlar-Vellum Argon – 3D Modeling
ART – ArtCAM model
ASC – BRL-CAD Geometry File (old ASCII format)
ASM – Solidedge Assembly, Pro/ENGINEER Assembly
BIN, BIM – Data Design System DDS-CAD
BREP – Open CASCADE 3D model (shape)
C3D – C3D Toolkit File Format
CCC – CopyCAD Curves
CCM – CopyCAD Model
CCS – CopyCAD Session
CAD – CadStd
CATDrawing – CATIA V5 Drawing document
CATPart – CATIA V5 Part document
CATProduct – CATIA V5 Assembly document
CATProcess – CATIA V5 Manufacturing document
cgr – CATIA V5 graphic representation file
ckd – KeyCreator CAD Modeling
ckt – KeyCreator CAD Modeling
CO – Ashlar-Vellum Cobalt – parametric drafting and 3D modeling
DRW – Caddie Early version of Caddie drawing – Prior to Caddie changing to DWG
DFT – Solidedge Draft
DGN – MicroStation design file
DGK – Delcam Geometry
DMT – Delcam Machining Triangles
DXF – ASCII Drawing Interchange file format, AutoCAD
DWB – VariCAD drawing file
DWF – Autodesk's Web Design Format; AutoCAD & Revit can publish to this format; similar in concept to PDF files; Autodesk Design Review is the reader
DWG – Popular file format for Computer Aided Drafting applications, notably AutoCAD, Open Design Alliance applications, and Autodesk Inventor Drawing files
EASM – SolidWorks eDrawings assembly file
EDRW – eDrawings drawing file
EMB – Wilcom ES Designer Embroidery CAD file
EPRT – eDrawings part file
EscPcb – "esCAD pcb" data file by Electro-System (Japan)
EscSch – "esCAD sch" data file by Electro-System (Japan)
ESW – AGTEK format
EXCELLON – Excellon file
EXP – Drawing Express format
F3D – Autodesk Fusion 360 project file
FCStd – Native file format of FreeCAD CAD/CAM package
FM – FeatureCAM Part File
FMZ – FormZ Project file
G – BRL-CAD Geometry File
GBR – Gerber file
GLM – KernelCAD model
GRB – T-FLEX CAD File
GTC – GRAITEC Advance format
IAM – Autodesk Inventor Assembly file
ICD – IronCAD 2D CAD file
IDW – Autodesk Inventor Drawing file
IFC – buildingSMART for sharing AEC and FM data
IGES – Initial Graphics Exchange Specification
Intergraph Standard File Formats – Intergraph
IPN – Autodesk Inventor Presentation file
IPT – Autodesk Inventor Part file
JT – Jupiter Tesselation
MCD – Monu-CAD (Monument/Headstone Drawing file)
model – CATIA V4 part document
OCD – Orienteering Computer Aided Design (OCAD) file
PAR – Solidedge Part
PIPE – PIPE-FLO Professional Piping system design file
PLN – ArchiCad project
PRT – NX (recently known as Unigraphics), Pro/ENGINEER Part, CADKEY Part
PSM – Solidedge Sheet
PSMODEL – PowerSHAPE Model
PWI – PowerINSPECT File
PYT – Pythagoras File
SKP – SketchUp Model
RLF – ArtCAM Relief
RVM – AVEVA PDMS 3D Review model
RVT – Autodesk Revit project files
RFA – Autodesk Revit family files
S12 – Spirit file, by Softtech
SCAD – OpenSCAD 3D part model
SCDOC – SpaceClaim 3D Part/Assembly
SLDASM – SolidWorks Assembly drawing
SLDDRW – SolidWorks 2D drawing
SLDPRT – SolidWorks 3D part model
dotXSI – For Softimage
STEP – Standard for the Exchange of Product model data
STL – Stereo Lithographic data format used by various CAD systems and stereo lithographic printing machines.
TCT – TurboCAD drawing template
TCW – TurboCAD for Windows 2D and 3D drawing
UNV – I-DEAS (Integrated Design and Engineering Analysis Software)
VC6 – Ashlar-Vellum Graphite – 2D and 3D drafting
VLM – Ashlar-Vellum Vellum, Vellum 2D, Vellum Draft, Vellum 3D, DrawingBoard
VS – Ashlar-Vellum Vellum Solids
WRL – Similar to STL, but includes color. Used by various CAD systems and 3D printing rapid prototyping machines. Also used for VRML models on the web.
X_B – Parasolids binary format
X_T – Parasolids
XE – Ashlar-Vellum Xenon – for associative 3D modeling
Electronic design automation (EDA)
Electronic design automation (EDA), or electronic computer-aided design (ECAD), is specific to the field of electrical engineering.

BRD – Board file for EAGLE Layout Editor, a commercial PCB design tool
BSDL – Description language for testing through JTAG
CDL – Transistor-level netlist format for IC design
CPF – Power-domain specification in system-on-a-chip (SoC) implementation (see also UPF)
DEF – Gate-level layout
DSPF – Detailed Standard Parasitic Format, Analog-level parasitics of interconnections in IC design
EDIF – Vendor neutral gate-level netlist format
FSDB – Analog waveform format (see also Waveform viewer)
GDSII – Format for PCB and layout of integrated circuits
HEX – ASCII-coded binary format for memory dumps
LEF – Library Exchange Format, physical abstract of cells for IC design
LIB – Library modeling (function, timing) format
MS12 – NI Multisim file
OASIS – Open Artwork System Interchange Standard
OpenAccess – Design database format with APIs
SDC – Synopsys Design Constraints, format for synthesis constraints
SDF – Standard for gate-level timings
SPEF – Standard format for parasitics of interconnections in IC design
SPI, CIR – SPICE Netlist, device-level netlist and commands for simulation
SREC, S19 – S-record, ASCII-coded format for memory dumps
STIL – Standard Test Interface Language, IEEE1450-1999 standard for Test Patterns for IC
SV – SystemVerilog source file
S*P – Touchstone/EEsof Scattering parameter data file – multi-port blackbox performance, measurement or simulated
UPF – Standard for Power-domain specification in SoC implementation
V – Verilog source file
VCD – Standard format for digital simulation waveform
VHD, VHDL – VHDL source file
WGL – Waveform Generation Language, format for Test Patterns for IC
Test technology
Files output from Automatic Test Equipment or post-processed from such.

Standard Test Data Format
Database
4DB – 4D database Structure file
4DD – 4D database Data file
4DIndy – 4D database Structure Index file
4DIndx – 4D database Data Index file
4DR – 4D database Data resource file (in old 4D versions)
ACCDB – Microsoft Database (Microsoft Office Access 2007 and later)
ACCDE – Compiled Microsoft Database (Microsoft Office Access 2007 and later)
ADT – Sybase Advantage Database Server (ADS)
APR – Lotus Approach data entry & reports
BOX – Lotus Notes Post Office mail routing database
CHML – Krasbit Technologies Encrypted database file for 1 click integration between contact management software and the chameleon(tm) line of imaging workflow solutions
DAF – Digital Anchor data file
DAT – DOS Basic
DAT – Intersystems Caché database file
DB – Paradox
DB – SQLite
DBF – db/dbase II,III,IV and V, Clipper, Harbour/xHarbour, Fox/FoxPro, Oracle
EGT – EGT Universal Document, used to compress sql databases to smaller files, may contain original EGT database style.
ESS – EGT SmartSense is a database of files and its compression style. Specific to EGT SmartSense
EAP – Enterprise Architect Project
FDB – Firebird Databases
FDB – Navision database file
FP, FP3, FP5, and FP7 – FileMaker Pro
FRM – MySQL table definition
GDB – Borland InterBase Databases
GTABLE – Google Drive Fusion Table
KEXI – Kexi database file (SQLite-based)
KEXIC – shortcut to a database connection for a Kexi databases on a server
KEXIS – shortcut to a Kexi database
LDB – Temporary database file, only existing when database is open
MDA – Add-in file for Microsoft Access
MDB – Microsoft Access database
ADP – Microsoft Access project (used for accessing databases on a server)
MDE – Compiled Microsoft Database (Access)
MDF – Microsoft SQL Server Database
MYD – MySQL MyISAM table data
MYI – MySQL MyISAM table index
NCF – Lotus Notes configuration file
NSF – Lotus Notes database
NTF – Lotus Notes database design template
NV2 – QW Page NewViews object oriented accounting database
ODB – LibreOffice Base or OpenOffice Base database
ORA – Oracle tablespace files sometimes get this extension (also used for configuration files)
PCONTACT – WinIM Contact file
PDB – Palm OS Database
PDI – Portable Database Image
PDX – Corel Paradox database management
PRC – Palm OS resource database
SQL – bundled SQL queries
REC – GNU recutils database
REL – Sage Retrieve 4GL data file
RIN – Sage Retrieve 4GL index file
SDB – StarOffice's StarBase
SDF – SQL Compact Database file
sqlite – SQLite
UDL – Universal Data Link
waData – Wakanda (software) database Data file
waIndx – Wakanda (software) database Index file
waModel – Wakanda (software) database Model file
waJournal – Wakanda (software) database Journal file
WDB – Microsoft Works Database
WMDB – Windows Media Database file – The CurrentDatabase_360.wmdb file can contain file name, file properties, music, video, photo and playlist information.

Desktop publishing
AI – Adobe Illustrator
AVE / ZAVE – Aquafadas
CDR – CorelDRAW
CHP / PUB / STY / CAP / CIF / VGR / FRM – Ventura Publisher – Xerox (DOS / GEM)
CPT – Corel Photo-Paint
DTP – Greenstreet Publisher, GST PressWorks
GDRAW – Google Drive Drawing
ILDOC – Broadvision Quicksilver document
INDD – Adobe InDesign
PSD – Adobe Photoshop
MCF – FotoInsight Designer
PDF – Adobe Acrobat or Adobe Reader
PMD – Adobe PageMaker
PPP – Serif PagePlus
PUB – Microsoft Publisher
QXD – QuarkXPress
FM – Adobe FrameMaker
SLA / SCD – Scribus
WLMP – Windows Live Movie Maker project file

Document
These files store formatted text and plain text.
0 – Plain Text Document, normally used for licensing
1ST – Plain Text Document, normally preceded by the words "README" (README.1ST)
600 – Plain Text Document, used in UNZIP history log
602 – Text602 document
ABW – AbiWord document
ACL – MS Word AutoCorrect List
AFP – Advanced Function Presentation – IBM
AMI – Lotus Ami Pro
Amigaguide
ANS – American National Standards Institute (ANSI) text
ASC – ASCII text
AWW – Ability Write
CCF – Color Chat 1.0
CSV – ASCII text as comma-separated values, used in spreadsheets and database management systems
CWK – ClarisWorks-AppleWorks document
DBK – DocBook XML sub-format
DITA – Darwin Information Typing Architecture document
DOC – Microsoft Word document
DOCM – Microsoft Word macro-enabled document
DOCX – Office Open XML document
DOT – Microsoft Word document template
DOTX – Office Open XML text document template
EGT – EGT Universal Document
EPUB – EPUB open standard for e-books
EZW – Reagency Systems easyOFFER document[4]
FDX – Final Draft
FTM – Fielded Text Meta
FTX – Fielded Text (Declared)
GDOC – Google Drive Document
HTML – HyperText Markup Language (.html, .htm)
HWP – Haansoft (Hancom) Hangul Word Processor document
HWPML – Haansoft (Hancom) Hangul Word Processor Markup Language document
LOG – Text log file
LWP – Lotus Word Pro
MBP – metadata for Mobipocket documents
MD – Markdown text document
ME – Plain text document normally preceded by the word "READ" (READ.ME)
MCW – Microsoft Word for Macintosh (versions 4.0–5.1)
Mobi – Mobipocket documents
NB – Mathematica Notebook
NBP – Mathematica Player Notebook
NEIS – 학교생활기록부 작성 프로그램 (Student Record Writing Program) Document
ODM – OpenDocument master document
ODOC – Synology Drive Office Document
ODT – OpenDocument text document
OSHEET – Synology Drive Office Spreadsheet
OTT – OpenDocument text document template
OMM – OmmWriter text document
PAGES – Apple Pages document
PAP – Papyrus word processor document
PDAX – Portable Document Archive (PDA) document index file
PDF – Portable Document Format
QUOX – Question Object File Format for Quobject Designer or Quobject Explorer
Radix-64
RTF – Rich Text document
RPT – Crystal Reports
SDW – StarWriter text document, used in earlier versions of StarOffice
SE – Shuttle Document
STW – OpenOffice.org XML (obsolete) text document template
SXW – OpenOffice.org XML (obsolete) text document
TeX – TeX
INFO – Texinfo
Troff
TXT – ASCII or Unicode plain text file
UOF – Uniform Office Format
UOML – Unique Object Markup Language
VIA – Revoware VIA Document Project File
WPD – WordPerfect document
WPS – Microsoft Works document
WPT – Microsoft Works document template
WRD – WordIt! document
WRF – ThinkFree Write
WRI – Microsoft Write document
XHTML (.xhtml, .xht) – eXtensible HyperText Markup Language
XML – eXtensible Markup Language
XPS – Open XML Paper Specification
Financial records
MYO – MYOB Limited (Windows) File
MYOB – MYOB Limited (Mac) File
TAX – TurboTax File
YNAB – You Need a Budget (YNAB) File

Financial data transfer formats
Interactive Financial Exchange (IFX) – XML-based specification for various forms of financial transactions
Open Financial Exchange (.ofx) – open standard supported by CheckFree and Microsoft and partly by Intuit; SGML and later XML based
QFX – proprietary pay-only format used only by Intuit
Quicken Interchange Format (.qif) – open standard formerly supported by Intuit
Font file
ABF – Adobe Binary Screen Font
AFM – Adobe Font Metrics
BDF – Bitmap Distribution Format
BMF – ByteMap Font Format
FNT – Bitmapped Font – Graphics Environment Manager (GEM)
FON – Bitmapped Font – Microsoft Windows
MGF – MicroGrafx Font
OTF – OpenType Font
PCF – Portable Compiled Format
PostScript Font – Type 1, Type 2
PFA – Printer Font ASCII
PFB – Printer Font Binary – Adobe
PFM – Printer Font Metrics – Adobe
FOND – Font Description resource – Mac OS
SFD – FontForge spline font database Font
SNF – Server Normal Format
TDF – TheDraw Font
TFM – TeX font metric
TTF (.ttf, .ttc) – TrueType Font
UFO – Unified Font Object, a cross-platform, cross-application, human-readable format for storing font data
WOFF – Web Open Font Format
Geographic information system
ASC – ASCII point of interest (POI) text file
APR – ESRI ArcView 3.3 and earlier project file
DEM – USGS DEM file format
E00 – ARC/INFO interchange file format
GeoJSON – Geographically located data in object notation
GeoTIFF – Geographically located raster data
GML – Geography Markup Language file[5]
GPX – XML-based interchange format
ITN – TomTom Itinerary format
MXD – ESRI ArcGIS project file, 8.0 and higher
NTF – National Transfer Format file
OV2 – TomTom POI overlay file
SHP – ESRI shapefile
TAB – MapInfo Table file format
World TIFF – Geographically located raster data: text file giving corner coordinate, raster cells per unit, and rotation
DTED – Digital Terrain Elevation Data
KML – Keyhole Markup Language, XML-based
Graphical information organizers
3DT – 3D Topicscape, the database that holds the metadata of a 3D Topicscape; a form of 3D concept map (like a 3D mind map) used to organize ideas, information, and computer files
ATY – 3D Topicscape file, produced when an association type is exported; used to permit round-trip (export Topicscape, change files and folders as desired, re-import to 3D Topicscape)
CAG – Linear Reference System
FES – 3D Topicscape file, produced when a fileless occurrence in 3D Topicscape is exported to Windows. Used to permit round-trip (export Topicscape, change files and folders as desired, re-import them to 3D Topicscape)
MGMF – MindGenius Mind Mapping Software file format
MM – FreeMind mind map file (XML)
MMP – Mind Manager mind map file
TPC – 3D Topicscape file, produced when an inter-Topicscape topic link file is exported to Windows; used to permit round-trip (export Topicscape, change files and folders as desired, re-import to 3D Topicscape)

Graphics
Color palettes
ACT – Adobe Color Table. Contains a raw color palette of 256 24-bit RGB color values.
ASE – Adobe Swatch Exchange. Used by Adobe Photoshop, Illustrator, and InDesign.[6]
GPL – GIMP palette file. Uses a text representation of color names and RGB values. Various open source graphical editors can read this format,[7] including GIMP, Inkscape, Krita[8], KolourPaint, Scribus, CinePaint, and MyPaint.[9]
PAL – Microsoft RIFF palette file
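
The ACT entry above describes a raw palette of 256 24-bit RGB values, which works out to 256 × 3 = 768 bytes. A minimal sketch of reading such a palette in Python follows. The function name `parse_act` is hypothetical, the sample data is synthetic, and the sketch assumes the colors are stored as consecutive R, G, B bytes; any bytes beyond the first 768 are ignored.

```python
from typing import List, Tuple


def parse_act(data: bytes) -> List[Tuple[int, ...]]:
    """Return 256 (R, G, B) tuples read from the first 768 bytes of `data`."""
    if len(data) < 768:
        raise ValueError("expected at least 768 bytes (256 colors x 3 bytes)")
    # Each color occupies three consecutive bytes: red, green, blue.
    return [tuple(data[i:i + 3]) for i in range(0, 768, 3)]


# Synthetic grayscale palette used only for illustration (not a real file).
sample = bytes(v for i in range(256) for v in (i, i, i))
palette = parse_act(sample)
print(len(palette), palette[0], palette[255])
```

For a file on disk, the same function could be fed the result of `open(path, "rb").read()`.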
Color management
ICC/ICM – Color profile conforming to the ICC specification.
Raster graphics
Raster or bitmap files store images as a group of pixels.

ART – America Online proprietary format
BLP – Blizzard Entertainment proprietary texture format
BMP – Microsoft Windows Bitmap formatted image
BTI – Nintendo proprietary texture format
CD5 – Chasys Draw IES image
CIT – Intergraph monochrome bitmap format
CPT – Corel PHOTO-PAINT image
CR2 – Canon camera raw format; some Canon cameras save photos in this format when RAW quality is selected in the camera settings
CSP – CLIP STUDIO PAINT format
CUT – Dr. Halo image file
DDS – DirectX texture file
DIB – Device-Independent Bitmap graphic
DjVu – DjVu for scanned documents
EGT – EGT Universal Document, used in EGT SmartSense to compress PNG files to an even smaller file
Exif – Exchangeable image file format (Exif) is a specification for the image format used by digital cameras
GIF – CompuServe's Graphics Interchange Format
GRF – Zebra Technologies proprietary format
ICNS – format for icons in macOS. Contains bitmap images at multiple resolutions and bitdepths with alpha channel.
ICO – a format used for icons in Microsoft Windows. Contains small bitmap images at multiple resolutions and sizes
IFF (.iff, .ilbm, .lbm) – ILBM
JNG – a single-frame MNG using JPEG compression and possibly an alpha channel
JPEG, JFIF (.jpg or .jpeg) – Joint Photographic Experts Group; a lossy image format widely used to display photographic images
JP2 – JPEG2000
JPS – JPEG Stereo
LBM – Deluxe Paint image file
MAX – ScanSoft PaperPort document
MIFF – ImageMagick's native file format
MNG – Multiple-image Network Graphics, the animated version of PNG
MSP – a format used by old versions of Microsoft Paint; replaced by BMP in Microsoft Windows 3.0
NITF – A U.S. Government standard commonly used in Intelligence systems
OTB – Over The Air bitmap, a specification designed by Nokia for black and white images for mobile phones
PBM – Portable bitmap
PC1 – Low resolution, compressed Degas picture file
PC2 – Medium resolution, compressed Degas picture file
PC3 – High resolution, compressed Degas picture file
PCF – Pixel Coordination Format
PCX – a lossless format used by ZSoft's PC Paint, popular for a time on DOS systems.
PDN – Paint.NET image file
PGM – Portable graymap
PI1 – Low resolution, uncompressed Degas picture file
PI2 – Medium resolution, uncompressed Degas picture file; also Portrait Innovations encrypted image format
PI3 – High resolution, uncompressed Degas picture file
PICT, PCT – Apple Macintosh PICT image
PNG – Portable Network Graphic (lossless, recommended for display and editing of graphic images)
PNM – Portable anymap graphic bitmap image
PNS – PNG Stereo
PPM – Portable Pixmap (Pixel Map) image
PSB – Adobe Photoshop Big image file (for large files)
PSD, PDD – Adobe Photoshop Drawing
PSP – Paint Shop Pro image
PX – Pixel image editor image file
PXM – Pixelmator image file
PXR – Pixar Image Computer image file
QFX – QuickLink Fax image
RAW – General term for minimally processed image data (acquired by a digital camera)
RLE – a run-length encoding image
SCT – Scitex Continuous Tone image file
SGI, RGB, INT, BW – Silicon Graphics Image
TGA (.tga, .targa, .icb, .vda, .vst, .pix) – Truevision TGA (Targa) image
TIFF (.tif or .tiff) – Tagged Image File Format (usually lossless, but many variants exist, including lossy ones)
TIFF/EP (.tif or .tiff) – Tag Image File Format / Electronic Photography, ISO 12234-2; tends to be used as a basis for other formats rather than in its own right.
VTF – Valve Texture Format
XBM – X Window System Bitmap
XCF – GIMP image (from GIMP's origin at the eXperimental Computing Facility of the University of California)
XPM – X Window System Pixmap
ZIF – Zoomable/Zoomify Image Format (a web-friendly, TIFF-based, zoomable image format)
Vector graphics
Vector graphics use geometric primitives such as points, lines, curves, and polygons to represent images.

3DV – 3-D wireframe graphics by Oscar Garcia
AMF – Additive Manufacturing File Format
AWG – Ability Draw
AI – Adobe Illustrator Document
CGM – Computer Graphics Metafile, an ISO Standard
CDR – CorelDRAW Document
CMX – CorelDRAW vector image
DXF – ASCII Drawing Interchange file Format, used in AutoCAD and other CAD-programs
E2D – 2-dimensional vector graphics used by the editor which is included in JFire
EGT – EGT Universal Document; EGT Vector Draw images are used to draw vector graphics on a website
EPS – Encapsulated Postscript
FS – FlexiPro file
GBR – Gerber file
ODG – OpenDocument Drawing
MOVIE.BYU
RenderMan
SVG – Scalable Vector Graphics, employs XML
Scene description languages (3D vector image formats)
STL – Stereo Lithographic data format, used by various CAD systems and stereolithographic printing machines
VRML (.wrl) – Virtual Reality Modeling Language, for creating 3D images viewable on the web
X3D
SXD – OpenOffice.org XML (obsolete) Drawing
V2D – voucher design used by the voucher management included in JFire
VDOC – Vector format used in AnyCut, CutStorm, DrawCut, DragonCut, FutureDRAW, MasterCut, SignMaster, VinylMaster software by Future Corporation
VSD – Vector format used by Microsoft Visio
VSDX – Vector format used by MS Visio and opened by VSDX Annotator
VND – Vision numeric Drawing file used in TypeEdit, Gravostyle.
WMF – Windows Meta File
EMF – Enhanced (Windows) MetaFile, an extension to WMF
ART – Xara – Drawing (superseded by XAR)
XAR – Xara – Drawing
3D graphics
3D graphics files store 3D models for real-time or non-real-time rendering.

3DMF – QuickDraw 3D Metafile (.3dmf)
3DM – OpenNURBS Initiative 3D Model (used by Rhinoceros 3D) (.3dm)
3MF – Microsoft 3D Manufacturing Format (.3mf)[2]
3DS – Legacy 3D Studio Model (.3ds)
ABC – Alembic (.abc)
AC – AC3D Model (.ac)
AMF – Additive Manufacturing File Format
AN8 – Anim8or Model (.an8)
AOI – Art of Illusion Model (.aoi)
ASM – PTC Creo assembly (.asm)
B3D – Blitz3D Model (.b3d)
BLEND – Blender (.blend)
BLOCK – Blender encrypted blend files (.block)
BMD3 – Nintendo GameCube first-party proprietary model format (.bmd)
BDL (BMD4) – Nintendo Wii first-party proprietary model format 2006–2010 (.bdl)
BRRES – Nintendo Wii first-party proprietary model format 2010+ (.brres)
C4D – Cinema 4D (.c4d)
Cal3D – Cal3D (.cal3d)
CCP4 – X-ray crystallography voxels (electron density)
CFL – Compressed File Library (.cfl)
COB – Caligari Object (.cob)
CORE3D – Coreona 3D Virtual File (.core3d)
CTM – OpenCTM (.ctm)
DAE – COLLADA (.dae)
DFF – RenderWare binary stream, commonly used by Grand Theft Auto III-era games as well as other RenderWare titles
DPM – deepMesh (.dpm)
DTS – Torque Game Engine (.dts)
EGG – Panda3D Engine
FACT – Electric Image (.fac)
FBX – Autodesk FBX (.fbx)
G – BRL-CAD geometry (.g)
GLM – Ghoul Mesh (.glm)
IOB – Imagine 3D modeling software (.iob)
JAS – Cheetah 3D file (.jas)
LWO – Lightwave Object (.lwo)
LWS – Lightwave Scene (.lws)
LXF – LEGO Digital Designer Model file (.lxf)
LXO – Luxology Modo file (.lxo)
MA – Autodesk Maya ASCII File (.ma)
MAX – Autodesk 3D Studio Max file (.max)
MB – Autodesk Maya Binary File (.mb)
MD2 – Quake 2 model format (.md2)
MD3 – Quake 3 model format (.md3)
MD5 – Doom 3 model format (.md5)
MDX – Blizzard Entertainment's own model format (.mdx)
MESH – New York University (.m)
MESH – Meshwork Model (.mesh)
MM3D – Misfit Model 3d (.mm3d)
MPO – Multi-Picture Object, a JPEG-based standard used for 3D images, as with the Nintendo 3DS
MRC – voxels in cryo-electron microscopy
NIF – Gamebryo NetImmerse File (.nif)
OBJ – Wavefront .obj file (.obj)
OFF – OFF Object file format (.off)
OGEX – Open Game Engine Exchange (OpenGEX) format (.ogex)
PLY – Polygon File Format / Stanford Triangle Format (.ply)
PRC – Adobe PRC (embedded in PDF files)
PRT – PTC Creo part (.prt)
POV – POV-Ray document (.pov)
R3D – Realsoft 3D (Real-3D) (.r3d)
RWX – RenderWare Object (.rwx)
SIA – Nevercenter Silo Object (.sia)
SIB – Nevercenter Silo Object (.sib)
SKP – Google Sketchup file (.skp)
SLDASM – SolidWorks Assembly Document (.sldasm)
SLDPRT – SolidWorks Part Document (.sldprt)
SMD – Valve Studiomdl Data format (.smd)
U3D – Universal 3D format (.u3d)
USD – Universal Scene Description (.usd)
USDA – Universal Scene Description, human-readable text format (.usda)
USDC – Universal Scene Description, binary format (.usdc)
USDZ – Universal Scene Description Zip (.usdz)
VIM – Revizto visual information model format (.vimproj)
VRML97 – VRML Virtual reality modeling language (.wrl)
VUE – Vue scene file (.vue)
VWX – Vectorworks (.vwx)
WINGS – Wings3D (.wings)
W3D – Westwood 3D Model (.w3d)
X – DirectX 3D Model (.x)
X3D – Extensible 3D (.x3d)
Z3D – Zmodeler (.z3d)

Links and shortcuts
Alias (Mac OS)
JNLP – Java Network Launching Protocol, an XML file used by Java Web Start for starting Java applets over the Internet
LNK – binary-format file shortcut in Microsoft Windows 95 and later
APPREF-MS – File shortcut format used by ClickOnce
URL – INI file pointing to a URL bookmarks/Internet shortcut in Microsoft Windows
SYM – Symbolic link
.desktop – Desktop entry on Linux Desktop environments

Mathematical
Harwell-Boeing file format – a format designed to store sparse matrices
MML – MathML – Mathematical Markup Language
ODF – OpenDocument Math Formula
SXM – OpenOffice.org XML (obsolete) Math Formula
Object code, executable files, shared and dynamically linked libraries
.8BF files – plugins for some photo editing programs including Adobe Photoshop, Paint Shop Pro, GIMP and Helicon Filter.
.a – Objective C native static library
a.out – (no suffix for executable image, .o for object files, .so for shared object files) classic UNIX object format, now often superseded by ELF
APK – Android Application Package
APP – A folder found on macOS systems containing program code and resources, appearing as one file.
BAC – an executable image for the RSTS/E system, created using the BASIC-PLUS COMPILE command[10]
BPL – a Win32 PE file created with Borland Delphi or C++Builder containing a package.
Bundle – a Macintosh plugin created with Xcode or make which holds executable code, data files, and folders for that code.
.Class – used in Java
COFF (no suffix for executable image, .o for object files) – UNIX Common Object File Format, now often superseded by ELF
COM files – commands used in DOS
DCU – Delphi compiled unit
DLL – library used in Windows and OS/2 to store data, resources, and code.
DOL – the format used by the GameCube and Wii, short for Dolphin, which was the codename of the GameCube.
.EAR – archives of Java enterprise applications
ELF – (no suffix for executable image, .o for object files, .so for shared object files) used in many modern Unix and Unix-like systems, including Solaris, other System V Release 4 derivatives, Linux, and BSD
expander (see bundle)
DOS executable (.exe – used in DOS)
.IPA – Apple iOS application archive; a form of ZIP file.
JEFF – a file format allowing execution directly from static memory[11]
.JAR – archives of Java class files
.XPI – PKZIP archive that can be run by Mozilla web browsers to install software.
Mach-O – (no suffix for executable image, .o for object files, .dylib and .bundle for shared object files) Mach-based systems, notably native format of macOS, iOS, watchOS, and tvOS
NetWare Loadable Module (.NLM) – the native 32-bit binaries compiled for Novell's NetWare Operating System (versions 3 and newer)
New Executable (.EXE – used in multitasking ("European") MS-DOS 4.0, 16-bit Microsoft Windows, and OS/2)
.o – un-linked object files directly from the compiler
Portable Executable (.EXE – used in Microsoft Windows and some other systems)
Preferred Executable Format – (classic Mac OS for PowerPC applications; compatible with macOS via a classic (Mac OS X) emulator)
.s1es – Executable used for S1ES learning system.
.so – shared library, typically ELF
Value Added Process (.VAP) – the native 16-bit binaries compiled for Novell's NetWare Operating System (version 2, NetWare 286, Advanced NetWare, etc.)
.WAR – archives of Java Web applications
XBE – Xbox executable
.XAP – Windows Phone package
XCOFF – (no suffix for executable image, .o for object files, .a for shared object files) extended COFF, used in AIX
XEX – Xbox 360 executable
Object extensions
.VBX – Visual Basic extensions
.OCX – Object Control extensions
.TLB – Windows Type Library
Page description language
DVI – Device independent format
EGT – EGT Universal Document; can be used to store CSS-type styles (.egt)
PLD
PCL
PDF – Portable Document Format
PostScript (.ps, .ps.gz)
SNP – Microsoft Access Report Snapshot
XPS
XSL-FO (Formatting Objects)
Configurations, Metadata
CSS – Cascading Style Sheets
XSLT, XSL – XML Style Sheet (.xslt, .xsl)
TPL – Web template (.tpl)
Personal information manager
MSG – Microsoft Outlook task manager
ORG – Lotus Organizer PIM package
PST, OST – Microsoft Outlook email communication
SC2 – Microsoft Schedule+ calendar
Presentation
GSLIDES – Google Drive Presentation
KEY, KEYNOTE – Apple Keynote Presentation
NB – Mathematica Slideshow
NBP – Mathematica Player slideshow
ODP – OpenDocument Presentation
OTP – OpenDocument Presentation template
PEZ – Prezi Desktop Presentation
POT – Microsoft PowerPoint template
PPS – Microsoft PowerPoint Show
PPT – Microsoft PowerPoint Presentation
PPTX – Office Open XML Presentation
PRZ – Lotus Freelance Graphics
SDD – StarOffice's StarImpress
SHF – ThinkFree Show
SHOW – Haansoft(Hancom) Presentation software document
SHW – Corel Presentations slide show creation
SLP – Logix-4D Manager Show Control Project
SSPSS – SongShow Plus Slide Show
STI – OpenOffice.org XML (obsolete) Presentation template
SXI – OpenOffice.org XML (obsolete) Presentation
THMX – Microsoft PowerPoint theme template
WATCH – Dataton Watchout Presentation
Project management software
MPP – Microsoft Project
Reference management software
Formats of files used for bibliographic information (citation) management.

bib – BibTeX
enl – EndNote
ris – Research Information Systems (RIS) format
Scientific data (data exchange)
FITS (Flexible Image Transport System) – standard data format for astronomy (.fits)
Silo – a storage format for visualization developed at Lawrence Livermore National Laboratory
SPC – spectroscopic data
EAS3 – binary format for structured data
EOSSA – Electro-Optic Space Situational Awareness format
OST (Open Spatio-Temporal) – extensible, mainly images with related data, or just pure data; meant as an open alternative for microscope images
CCP4 – X-ray crystallography voxels (electron density)
MRC – voxels in cryo-electron microscopy
HITRAN – spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit)
.root – hierarchical platform-independent compressed binary format used by ROOT
Simple Data Format (SDF) – a platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays.
MYD – Everfine LEDSpec software file for LED measurements
Multi-domain
NetCDF – Network common data format
HDF, h4, or h5 – Hierarchical Data Format
SDXF – (Structured Data Exchange Format)
CDF – Common Data Format
CGNS – CFD General Notation System
FMF – Full-Metadata Format
Meteorology
GRIB – Grid In Binary, WMO format for weather model data
BUFR – WMO format for weather observation data
PP – UK Met Office format for weather model data
NASA-Ames – Simple text format for observation data. First used in aircraft studies of the atmosphere.

Chemistry
CML – Chemical Markup Language (CML) (.cml)
Chemical table file (CTab) (.mol, .sd, .sdf)
Joint Committee on Atomic and Molecular Physical Data (JCAMP) (.dx, .jdx)
Simplified molecular input line entry specification (SMILES) (.smi)

Mathematics
graph6, sparse6 – ASCII encoding of adjacency matrices (.g6, .s6)

Biology
Molecular biology and bioinformatics:
AB1 – In DNA sequencing, chromatogram files used by instruments from Applied Biosystems
ACE – A sequence assembly format
ASN.1 – Abstract Syntax Notation One, an International Organization for Standardization (ISO) data representation format used to achieve interoperability between platforms. NCBI uses ASN.1 for the storage and retrieval of data such as nucleotide and protein sequences, structures, genomes, and PubMed records.
BAM – Binary Alignment/Map format (compressed SAM format)
BCF – Binary compressed VCF format
BED – The browser extensible display format is used for describing genes and other features of DNA sequences
CAF – Common Assembly Format for sequence assembly
EMBL – The flatfile format used by the EMBL to represent database records for nucleotide and peptide sequences from EMBL databases
FASTA – The FASTA format, for sequence data. Sometimes also given as FNA or FAA (Fasta Nucleic Acid or Fasta Amino Acid).
FASTQ – The FASTQ format, for sequence data with quality. Sometimes also given as QUAL.
GCPROJ – The Genome Compiler project. Advanced format for genetic data to be designed, shared and visualized.
GenBank – The flatfile format used by the NCBI to represent database records for nucleotide and peptide sequences from the GenBank and RefSeq databases
GFF – The General feature format is used to describe genes and other features of DNA, RNA, and protein sequences
GTF – The Gene transfer format is used to hold information about gene structure
NCBI ASN.1 – Structured ASN.1 format used at National Center for Biotechnology Information for DNA and protein data
NEXUS – The Nexus file encodes mixed information about genetic sequence data in a block structured format
NeXML – XML format for phylogenetic trees
NWK – The Newick tree format is a way of representing graph-theoretical trees with edge lengths using parentheses and commas and useful to hold phylogenetic trees.
PDB – structures of biomolecules deposited in Protein Data Bank, also used to exchange protein and nucleic acid structures
PHD – Phred output, from the basecalling software Phred
PLN – Protein Line Notation used in proteax software specification
SAM – Sequence Alignment/Map format, in which the results of the 1000 Genomes Project will be released
SBML – The Systems Biology Markup Language is used to store biochemical network computational models
SCF – Staden chromatogram files used to store data from DNA sequencing
SFF – Standard Flowgram Format
SRA – format used by the National Center for Biotechnology Information Short Read Archive to store high-throughput DNA sequence data
Stockholm – The Stockholm format for representing multiple sequence alignments
Swiss-Prot – The flatfile format used to represent database records for protein sequences from the Swiss-Prot database
VCF – Variant Call Format, a standard created by the 1000 Genomes Project that lists and annotates the entire collection of human variants (with the exception of approximately 1.6 million variants).
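
As an illustration of the FASTA entry in the list above: FASTA is a plain-text format in which each record starts with a header line beginning with `>` and is followed by one or more lines of sequence. A minimal sketch of a reader in Python follows. The function name `read_fasta` and the sample records are hypothetical, and the sketch keeps only the first word of each header as the record name; established libraries handle many more edge cases.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Map each record name (first word after '>') to its joined sequence."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)  # sequence may span several lines
    return {key: "".join(value) for key, value in chunks.items()}


# Made-up sequences used only for illustration.
sample_fasta = ">seq1 demo record\nACGT\nTTGA\n>seq2\nGGCC\n"
records = read_fasta(sample_fasta)
print(records)
```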

Biomedical imaging
Digital Imaging and Communications in Medicine (DICOM) (.dcm)
Neuroimaging Informatics Technology Initiative (NIfTI)
.nii – single-file (combined data and meta-data) style
.nii.gz – gzip-compressed, used transparently by some software, notably the FMRIB Software Library (FSL)
.gii – single-file (combined data and meta-data) style; NIfTI offspring for brain surface data
.img,.hdr – dual-file (separate data and meta-data, respectively) style
AFNI data, meta-data (.BRIK,.HEAD)
Massachusetts General Hospital imaging format, used by the FreeSurfer brain analysis package
.MGH – uncompressed
.MGZ – gzip-compressed
Analyze data, meta-data (.img,.hdr)
Medical Imaging NetCDF (MINC) format, previously based on NetCDF; since version 2.0, based on HDF5 (.mnc)

Biomedical signals (time series)
ACQ – AcqKnowledge format for Windows/PC from Biopac Systems Inc., Goleta, CA, USA
ADICHT – LabChart format from ADInstruments Pty Ltd, Bella Vista NSW, Australia
BCI2000 – The BCI2000 project, Albany, NY, USA
BDF – BioSemi data format from BioSemi B.V. Amsterdam, Netherlands
BKR – The EEG data format developed at the University of Technology Graz, Austria
CFWB – Chart Data Format from ADInstruments Pty Ltd, Bella Vista NSW, Australia
DICOM Waveform – An extension of DICOM for storing waveform data
ecgML – A markup language for electrocardiogram data acquisition and analysis
EDF/EDF+ – European Data Format
FEF – File Exchange Format for Vital signs, CEN TS 14271
GDF v1.x – The General Data Format for biomedical signals, version 1.x
GDF v2.x – The General Data Format for biomedical signals, version 2.x
HL7aECG – Health Level 7 v3 annotated ECG
MFER – Medical waveform Format Encoding Rules
OpenXDF – Open Exchange Data Format from Neurotronics, Inc., Gainesville, FL, USA
SCP-ECG – Standard Communication Protocol for Computer assisted electrocardiography EN1064:2007
SIGIF – A digital SIGnal Interchange Format with application in neurophysiology
WFDB – Format of Physiobank
XDF – eXtensible Data Format

Other Biomedical Formats
Health Level 7 (HL7) – a framework for exchange, integration, sharing, and retrieval of health information electronically
xDT – a family of data exchange formats for medical records

Biometric Formats
CBF – Common Biometric Format, based on CBEFF 2.0 (Common Biometric Exchange Formats Framework).
EBF – Extended Biometric Format, based on CBF but with S/MIME encryption support and semantic extensions
CBFX – XML Common Biometric Format, based upon XCBF 1.1 (OASIS XML Common Biometric Format)
EBFX – XML Extended Biometric Format, based on CBFX but with W3C XML Encryption support and semantic extensions

Programming languages and scripts
AHK – AutoHotkey script file
APPLESCRIPT – AppleScript; see SCPT
AS – Adobe Flash ActionScript File
AU3 – AutoIt version 3
BAT – Batch file
BAS – QBasic & QuickBASIC
CLJS – ClojureScript
CMD – Batch file
Coffee – CoffeeScript
duino – Arduino IDE sketch (program)
EGG – Chicken
EGT – EGT Asterisk Application Source File, EGT Universal Document
ERB – Embedded Ruby, Ruby on Rails Script File
HTA – HTML Application
IBI – Icarus script
ICI – ICI
IJS – J script
.ipynb – IPython Notebook
ITCL – Itcl
JS – JavaScript and JScript
JSFL – Adobe JavaScript language
LUA – Lua
M – Mathematica package file
MRC – mIRC Script
NCF – NetWare Command File (scripting for Novell's NetWare OS)
NUC – compiled script
NUD – External module written in C++
NUT – Squirrel
PHP – PHP
PHP? – PHP (? = version number)
PL – Perl
PM – Perl module
PS1 – Windows PowerShell shell script
PS1XML – Windows PowerShell format and type definitions
PSC1 – Windows PowerShell console file
PSD1 – Windows PowerShell data file
PSM1 – Windows PowerShell module file
PY – Python
PYC – Python byte code files
PYO – Python
R – R scripts
RB – Ruby
RDP – RDP connection
SB2 – Scratch
SCPT – AppleScript
SCPTD – See SCPT.
SDL – State Description Language
SH – Shell script
SYJS – SyMAT JavaScript
SYPY – SyMAT Python
TCL – Tcl
VBS – Visual Basic Script
XPL – XProc script/pipeline
ebuild – Gentoo Linux's Portage package.

Security
Authentication and general encryption formats are listed here.

OpenPGP Message Format – used by Pretty Good Privacy, GNU Privacy Guard, and other OpenPGP software; can contain keys, signed data, or encrypted data; can be binary or text ("ASCII armored")

Certificates and keys
GXK – Galaxkey, an encryption platform for authorized, private and confidential email communication
OpenSSH private key (.ssh) – Secure Shell private key; format generated by ssh-keygen or converted from PPK with PuTTYgen[12][13][14]
OpenSSH public key (.pub) – Secure Shell public key; format generated by ssh-keygen or PuTTYgen[12][13][14]
PuTTY private key (.ppk) – Secure Shell private key, in the format generated by PuTTYgen instead of the format used by OpenSSH[12][13][14]
X.509
Distinguished Encoding Rules (.cer, .crt, .der) – stores certificates
PKCS#7 SignedData (.p7b, .p7c) – commonly appears without main data, just certificates or certificate revocation lists (CRLs)
PKCS#12 (.p12, .pfx) – can store public certificates and private keys
PEM – Privacy-enhanced Electronic Mail: full format not widely used, but often used to store Distinguished Encoding Rules in Base64 format
PFX – Microsoft predecessor of PKCS#12
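
The PEM entry above notes that PEM is often used to hold DER data in Base64 form. A minimal sketch of recovering the DER bytes in Python follows, assuming the common layout of a `-----BEGIN ...-----` line, Base64 text, and an `-----END ...-----` line. The function name `pem_to_der` and the `EXAMPLE` label are hypothetical, and the payload is a five-byte stand-in (a DER sequence holding the integer 5), not a real certificate.

```python
import base64


def pem_to_der(pem_text: str) -> bytes:
    """Strip the BEGIN/END marker lines and Base64-decode what remains."""
    lines = [ln.strip() for ln in pem_text.strip().splitlines()]
    body = [ln for ln in lines if ln and not ln.startswith("-----")]
    return base64.b64decode("".join(body))


# Stand-in payload for illustration: DER SEQUENCE { INTEGER 5 }.
payload = bytes([0x30, 0x03, 0x02, 0x01, 0x05])
pem = (
    "-----BEGIN EXAMPLE-----\n"
    + base64.b64encode(payload).decode("ascii")
    + "\n-----END EXAMPLE-----\n"
)
der = pem_to_der(pem)
print(der.hex())
```

This sketch does not validate the label or the decoded structure; for real certificates a dedicated cryptography library is the safer choice.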
Encrypted files
This section shows file formats for encrypted general data, rather than a specific program's data.
Name (Extension)	Description
AXX	Encrypted file, created with Axcrypt
EEA	An encrypted CAB, ostensibly for protecting email attachments
TC	Virtual encrypted disk container, created by TrueCrypt
Password files
Password files (sometimes called keychain files) contain lists of other passwords, usually encrypted.
Name (Extension)	Description
BPW	Encrypted password file created by Bitser password manager
KDB	KeePass 1 database
KDBX	KeePass 2 database

Signal data (non-audio)
Name (Extension)	Description
ACQ	AcqKnowledge format for Windows/PC from Biopac
ADICHT	LabChart format from ADInstruments
BKR	The EEG data format developed at the University of Technology Graz
BDF	BioSemi data format
CFG	Configuration file for Comtrade data
CFWB	Chart Data format from ADInstruments
DAT	Raw data file for Comtrade data
EDF	European data format
FEF	File Exchange Format for Vital signs
GDF	General data formats for biomedical signals
GMS	Gesture And Motion Signal format
IROCK	intelliRock Sensor Data File Format
MFER	Medical waveform Format Encoding Rules
SAC	Seismic Analysis Code, earthquake seismology data format[15]
SCP-ECG	Standard Communication Protocol for Computer assisted electrocardiography
SEED	Standard for the Exchange of Earthquake Data, seismological data and sensor metadata[16] (.seed, .mseed)
SEG Y	Reflection seismology data format (.segy)
SIGIF	SIGnal Interchange Format
WIN, WIN32	NIED/ERI seismic data format (.cnt)[17]

Sound and music
Lossless audio—Uncompressed
Name (Extension)	Description
8SVX	Commodore-Amiga 8-bit sound (usually in an IFF container)
16SVX	Commodore-Amiga 16-bit sound (usually in an IFF container)
AIFF, AIF, AIFC	Audio Interchange File Format
AU	Simple audio file format introduced by Sun Microsystems
BWF	Broadcast Wave Format, an extension of WAVE
CDDA	Compact Disc Digital Audio
RAW	Raw samples without any header or sync
WAV	Microsoft Wave
Lossless audio—Compressed
Name (Extension)	Description
RA, RM	RealAudio format
FLAC	Free lossless codec of the Ogg project
LA	Lossless Audio
PAC	LPAC
APE	Monkey's Audio
OFR, OFS, OFF	OptimFROG
RKA	RKAU
SHN	Shorten
TAK	Tom's Lossless Audio Kompressor[18]
TTA	Free lossless audio codec (True Audio)
WV	WavPack
WMA	Windows Media Audio 9 Lossless
BRSTM	Binary Revolution Stream[19]
DTS, DTSHD, DTSMA	DTS (sound system)
AST	Nintendo Audio Stream[20]
AW	Nintendo Audio Sample used in first-party games
PSF	Portable Sound Format, PlayStation variant (originally PlayStation Sound Format)
Lossy audio
Name (Extension)	Description
AMR	For GSM and UMTS based mobile phones
MP1	MPEG Layer 1
MP2	MPEG Layer 2
MP3	MPEG Layer 3
SPX	Speex (Ogg project, specialized for voice, low bitrates)
GSM	GSM Full Rate, originally developed for use in mobile phones
WMA	Windows Media Audio
AAC	Advanced Audio Coding (usually in an MPEG-4 container)
MPC	Musepack
VQF	Yamaha TwinVQ
OTS	Audio File (similar to MP3, with more data stored in the file and slightly better compression; designed for use with OtsLabs' OtsAV)
SWA	Macromedia Shockwave Audio (same compression as MP3, with additional header information specific to Macromedia Director)
VOX	Dialogic ADPCM Low Sample Rate Digitized Voice
VOC	Creative Labs Soundblaster Creative Voice, 8-bit and 16-bit; also the output format of RCA Audio Recorders
DWD	DiamondWare Digitized
SMP	Turtlebeach SampleVision
OGG	Ogg Vorbis
Tracker modules & Related
Name (Extension)	Description
MOD	Soundtracker and Protracker sample and melody modules
MT2	MadTracker 2 module
S3M	Scream Tracker 3 module
XM	Fast Tracker module
IT	Impulse Tracker module
NSF	NES Sound Format
MID, MIDI	Standard MIDI file; most often just notes and controls but occasionally also sample dumps
FTM	FamiTracker Project file
Sheet Music Files
Name (Extension)	Description
LY	LilyPond sheet music file
MUS, MUSX	Finale sheet music file
MXL, XML	MusicXML standard sheet music exchange format
MSCX, MSCZ	MuseScore sheet music file
SIB	Sibelius sheet music file
Other File Formats Pertaining to Audio
Name (Extension)	Description
NIFF	Notation Interchange File Format
PTB	Power Tab Editor tab
ASF	Advanced Systems Format
CUST	DeliPlayer custom sound format
GYM	Genesis YM2612 log
JAM	Jam music format
MNG	BGM for the Creatures game series, starting from Creatures 2
RMJ	RealJukebox Media used for RealPlayer
SID	Sound Interface Device – Commodore 64 instructions to play SID music and sound effects
SPC	Super NES sound format
TXM	Track ax media
VGM	Stands for "Video Game Music", log for several different chips
YM	Atari ST/Amstrad CPC YM2149 sound chip format
Playlist Formats
Name (Extension)	Description
AIMPPL	AIMP Playlist format
ASX	Advanced Stream Redirector
RAM	Real Audio Metafile (for RealAudio files only)
XPL	HDi playlist
XSPF	XML Shareable Playlist Format
ZPL	Xbox Music (Formerly Zune) Playlist format from Microsoft
M3U	Multimedia playlist file
PLS	Multimedia playlist, originally developed for use with the museArc
Audio Editing, Music Production
Name (Extension)	Description
ALS	Ableton Live set
ALC	Ableton Live clip
ALP	Ableton Live pack
AUP	Audacity project file
BAND	GarageBand project file
CEL	Adobe Audition loop file (Cool Edit Loop)
CPR	Steinberg Cubase project file
CWP	Cakewalk Sonar project file
DRM	Steinberg Cubase drum file
DMKIT	Image-Line's Drumaxx drum kit file
ENS	Native Instruments Reaktor Ensemble
GRIR	Native Instruments Komplete Guitar Rig Impulse Response
LOGIC	Logic Pro X project file
MMR	MAGIX Music Maker project file
MX6HS	Mixcraft 6 Home Studio project file
NPR	Steinberg Nuendo project file
OMF, OMFI	Open Media Framework Interchange; OMFI succeeds OMF (Open Media Framework)
RIN	Soundways RIN-M file containing sound recording participant credits and song information
SES	Adobe Audition multitrack session file
SFL	Sound Forge sound file
SNG	MIDI sequence file (MidiSoft, Korg, etc.) or n-Track Studio project file
STF	StudioFactory project file. It contains all necessary patches, samples, tracks and settings to play the file
SND	Akai MPC sound file
SYN	SynFactory project file. It contains all necessary patches, samples, tracks and settings to play the file
FLP	Image Line FL Studio Project file
VCLS	VocaListener project file
VSQ	Vocaloid 2 Editor sequence excluding wave-file
VSQX	Vocaloid 3 Editor sequence excluding wave-file
Recorded Television Formats
Name (Extension)	Description
DVR-MS	Windows XP Media Center Edition's Windows Media Center recorded television format
WTV	Windows Vista's and up Windows Media Center recorded television format
____________________________________________________________________________________

Source code for computer programs

(see also: Script)

ADA, ADB, 2.ADA – Ada (body) source
ADS, 1.ADA – Ada (specification) source
ASM, S – Assembly language source
BAS – BASIC, FreeBASIC, Visual Basic, BASIC-PLUS source,[10] PICAXE basic
BB – Blitz Basic Blitz3D
BMX – Blitz Basic BlitzMax
C – C source
CLJ – Clojure source code
CLS – Visual Basic class
COB, CBL – COBOL source
CPP, CC, CXX, C, CBP – C++ source
CS – C# source
CSPROJ – C# project (Visual Studio .NET)
D – D source
DBA – DarkBASIC source
DBPro123 – DarkBASIC Professional project
E – Eiffel source
EFS – EGT Forever Source File
EGT – EGT Asterisk Source File, could be J, C#, VB.net, EF 2.0 (EGT Forever)
EL – Emacs Lisp source
FOR, FTN, F, F77, F90 – Fortran source
FRM – Visual Basic form
FRX – Visual Basic form stash file (binary form file)
FTH – Forth source
GED – Game Maker Extension Editable file as of version 7.0
GM6 – Game Maker Editable file as of version 6.x
GMD – Game Maker Editable file up to version 5.x
GMK – Game Maker Editable file as of version 7.0
GML – Game Maker Language script file
GO – Go source
H – C/C++ header file
HPP, HXX – C++ header file
HS – Haskell source
I – SWIG interface file
INC – Turbo Pascal included source
JAVA – Java source
L – lex source
LGT – Logtalk source
LISP – Common Lisp source
M – Objective-C source
M – MATLAB
M – Mathematica
M4 – m4 source
ML – Standard ML and OCaml source
MSQR – M² source file, created by Mattia Marziali
N – Nemerle source
NB – Nuclear Basic source
P – Parser source
PAS, PP, P – Pascal source (DPR for projects)
PHP, PHP3, PHP4, PHP5, PHPS, Phtml – PHP source
pisrc − PiNET source code mains. Used with Python 3.0, Snap!, and UnrealEngine 4 files source- used by PiIT, Dangerous_Pi, and Silicon Alchemy
PIV – Pivot stickfigure animator
PL, PM – Perl
PLI, PL1 – PL/I
PRG – Ashton-Tate; dbII, dbIII and dbIV, db, db7, clipper, Microsoft Fox and FoxPro, harbour, xharbour, and Xbase
PRO – IDL
POL – Apcera Policy Language doclet
PY – Python source
R – R source
RED – Red source
REDS – Red/System source
RB – Ruby source
RESX – Resource file for .NET applications
RC, RC2 – Resource script files to generate resources for .NET applications
RKT, RKTL – Racket source
SCALA – Scala source
SCI, SCE – Scilab
SCM – Scheme source
SD7 – Seed7 source
SKB, SKC – Sage Retrieve 4GL Common Area (Main and Amended backup)
SKD – Sage Retrieve 4GL Database
SKF, SKG – Sage Retrieve 4GL File Layouts (Main and Amended backup)
SKI – Sage Retrieve 4GL Instructions
SKK – Sage Retrieve 4GL Report Generator
SKM – Sage Retrieve 4GL Menu
SKO – Sage Retrieve 4GL Program
SKP, SKQ – Sage Retrieve 4GL Print Layouts (Main and Amended backup)
SKS, SKT – Sage Retrieve 4GL Screen Layouts (Main and Amended backup)
SKZ – Sage Retrieve 4GL Security File
SLN – Visual Studio solution
SPIN – Spin source (for Parallax Propeller microcontrollers)
STK – Stickfigure file for Pivot stickfigure animator
SWG – SWIG source code
TCL – TCL source code
VAP – Visual Studio Analyzer project
VB – Visual Basic.NET source
VBG – Visual Studio compatible project group
VBP, VIP – Visual Basic project
VBPROJ – Visual Basic .NET project
VCPROJ – Visual C++ project
VDPROJ – Visual Studio deployment project
XPL – XProc script/pipeline
XQ – XQuery file
XSL – XSLT stylesheet
Y – yacc source

Spreadsheet
123 – Lotus 1-2-3
AB2 – Abykus worksheet
AB3 – Abykus workbook
AWS – Ability Spreadsheet
BCSV – Nintendo proprietary table format
CLF – ThinkFree Calc
CELL – Haansoft(Hancom) SpreadSheet software document
CSV – Comma-Separated Values
GSHEET – Google Drive Spreadsheet
numbers – An Apple Numbers Spreadsheet file
gnumeric – Gnumeric spreadsheet, a gzipped XML file
LCW - Lucid 3-D
ODS – OpenDocument spreadsheet
OTS – OpenDocument spreadsheet template
QPW – Quattro Pro spreadsheet
SDC – StarOffice StarCalc Spreadsheet
SLK – SYLK (SYmbolic LinK)
STC – OpenOffice.org XML (obsolete) Spreadsheet template
SXC – OpenOffice.org XML (obsolete) Spreadsheet
TAB – tab delimited columns; also TSV (Tab-Separated Values)
TXT – text file
VC – Visicalc
WK1 – Lotus 1-2-3 up to version 2.01
WK3 – Lotus 1-2-3 version 3.0
WK4 – Lotus 1-2-3 version 4.0
WKS – Lotus 1-2-3
WKS – Microsoft Works
WQ1 – Quattro Pro DOS version
XLK – Microsoft Excel worksheet backup
XLS – Microsoft Excel worksheet sheet (97–2003)
XLSB – Microsoft Excel binary workbook
XLSM – Microsoft Excel Macro-enabled workbook
XLSX – Office Open XML worksheet sheet
XLR – Microsoft Works version 6.0
XLT – Microsoft Excel worksheet template
XLTM – Microsoft Excel Macro-enabled worksheet template
XLW – Microsoft Excel worksheet workspace (version 4.0)

Tabulated data
TSV – Tab-separated values
CSV – Comma-separated values
db – databank format; accessible by many econometric applications
dif – accessible by many spreadsheet applications

Video
Main article: video file format
AAF – mostly intended to hold edit decisions and rendering information, but can also contain compressed media essence
3GP – the most common video format for cell phones
GIF – Animated GIF (simple animation; until recently often avoided because of patent problems)
ASF – container (enables any form of compression to be used; MPEG-4 is common; video in ASF-containers is also called Windows Media Video (WMV))
AVCHD – Advanced Video Codec High Definition
AVI – container (a shell, which enables any form of compression to be used)
BIK (.bik) – Bink Video file. A video compression system developed by RAD Game Tools
CAM – aMSN webcam log file
COLLAB – Blackboard Collaborate session recording
DAT – video standard data file (created automatically when a video file is burned to a CD)
DSH
DVR-MS – Windows XP Media Center Edition's Windows Media Center recorded television format
FLV – Flash video (encoded to run in a flash animation)
M1V – MPEG-1 Video
M2V – MPEG-2 Video
FLA – Macromedia Flash (for producing)
FLR – (text file which contains scripts extracted from SWF by a free ActionScript decompiler named FLARE)
SOL – Adobe Flash shared object ("Flash cookie")
M4V – video container file format developed by Apple
Matroska (*.mkv) – Matroska is a container format, which enables any video format such as MPEG-4 ASP or AVC to be used along with other content such as subtitles and detailed meta information
WRAP – MediaForge (*.wrap)
MNG – mainly simple animation containing PNG and JPEG objects, often somewhat more complex than animated GIF
QuickTime (.mov) – container which enables any form of compression to be used; Sorenson codec is the most common; QTCH is the filetype for cached video and audio streams
MPEG (.mpeg, .mpg, .mpe)
THP – Nintendo proprietary movie/video format
MPEG-4 Part 14, shortened "MP4" – multimedia container (most often used for Sony's PlayStation Portable and Apple's iPod)
MXF – Material Exchange Format (standardized wrapper format for audio/visual material developed by SMPTE)
ROQ – used by Quake 3
NSV – Nullsoft Streaming Video (media container designed for streaming video content over the Internet)
Ogg – container, multimedia
RM – RealMedia
SVI – Samsung video format for portable players
SMI – SAMI Caption file (HTML like subtitle for movie files)
SMK (.smk) – Smacker video file. A video compression system developed by RAD Game Tools
SWF – Macromedia Flash (for viewing)
WMV – Windows Media Video (See ASF)
WTV – Windows Vista's and up Windows Media Center recorded television format
YUV – raw video format; resolution (horizontal x vertical) and sample structure 4:2:2 or 4:2:0 must be known explicitly
WebM – video file format for web video using HTML5
Video editing, production
FCP – Final Cut Pro project file
MSWMM – Windows Movie Maker project file
PPJ & PRPROJ – Adobe Premiere Pro video editing file
IMOVIEPROJ – iMovie project file
VEG & VEG-BAK – Sony Vegas project file
SUF – Sony camera configuration file (setup.suf) produced by XDCAM-EX camcorders
WLMP – Windows Live Movie Maker project file
KDENLIVE – Kdenlive project file
VPJ – VideoPad project file
MOTN – Apple Motion project file
IMOVIEMOBILE – Apple iMovie for iOS users
Video game data
List of common file formats of data for video games on systems that support filesystems, most commonly PC games.

TrackMania United/Nations Forever Engine – Formats used by games based on the TrackMania engine.
XeX
CHALLENGE.GBX – (Edited) Challenge files.
CONSTRUCTIONCAMPAIGN.GBX – Construction campaigns files.
CONTROLEFFECTMASTER.GBX/CONTROLSTYLE.GBX – Menu parts.
FIDCACHE.GBX – Saved game.
GBX – Other TrackMania items.
REPLAY.GBX – Replays of races.
Doom engine – Formats used by games based on the Doom engine.
DEH – DeHackEd files to mutate the game executable (not officially part of the DOOM engine)
DSG – Saved game
LMP – A lump is an entry in a DOOM wad.
LMP – Saved demo recording
MUS – Music file (usually contained within a WAD file)
WAD – Data storage (contains music, maps, and textures)
Quake engine – Formats used by games based on the Quake engine.
BSP – (For Binary space partitioning) compiled map format
MAP – Raw map format used by editors like GtkRadiant or QuArK
MDL/MD2/MD3/MD5 – Model for an item used in the game
PAK/PK2 – Data storage
PK3/PK4 – used by the Quake II, Quake III Arena and Quake 4 game engines, respectively, to store game data, textures etc. They are actually .zip files.
.dat – not a specific file type; often a generic extension for "data" files in a variety of applications, sometimes used for general data contained within the .PK3/PK4 files
.fontdat – a .dat file used for formatting game fonts
.roq – Video format
.sav – Savegame format
Unreal Engine – Formats used by games based on the Unreal engine.
U – Unreal script format
UAX – Animations format for Unreal Engine 2
UMX – Map format for Unreal Tournament
UMX – Music format for Unreal Engine 1
UNR – Map format for Unreal
UPK – Package format for cooked content in Unreal Engine 3
USX – Sound format for Unreal Engine 1 and Unreal Engine 2
UT2 – Map format for Unreal Tournament 2003 and Unreal Tournament 2004
UT3 – Map format for Unreal Tournament 3
UTX – Texture format for Unreal Engine 1 and Unreal Engine 2
UXX – Cache format; these are files a client downloaded from server (which can be converted to regular formats)
Duke Nukem 3D Engine – Formats used by games based on this engine
DMO – Save game
GRP – Data storage
MAP – Map (usually constructed with BUILD.EXE)
Diablo Engine – Formats used by Diablo by Blizzard Entertainment.
SV – Save Game
ITM – Item File
Real Virtuality Engine – Formats used by Bohemia Interactive. Operation:Flashpoint, ARMA 2, VBS2
SQF – Format used for general editing
SQM – Format used for mission files
PBO – Binarized file used for compiled models
LIP – Format that is created from WAV files to create in-game accurate lip-synch for character animations.
Source Engine - Formats used by Valve Software. Half-Life 2, Counter-Strike: Source, Day of Defeat: Source, Half-Life 2: Episode One, Team Fortress 2, Half-Life 2: Episode Two, Portal, Left 4 Dead, Left 4 Dead 2, Alien Swarm, Portal 2, Counter-Strike: Global Offensive, Titanfall, Insurgency, Titanfall 2, Day of Infamy
VMF - Valve Hammer Map editor raw map file
BSP - Source Engine compiled map file
MDL - Source Engine model format
SMD - Source Engine uncompiled model format
PCF - Source Engine particle effect file
HL2 - Half-Life 2 save format
DEM - Source Engine demo format
VPK - Source Engine pack format
VTF - Source Engine texture format
VMT - Source Engine material format.
Other Formats
B – used for Grand Theft Auto saved game files
BOL – used for levels on Poing!PC
DBPF – The Sims 2, DBPF, Package
DIVA – Project DIVA timings, element coördinates, MP3 references, notes, animation poses and scores.
ESM, ESP - Master and Plugin data archives for the Creation Engine
HE0, HE2, HE4 – HE games file
GCF – format used by the Steam content management system for file archives
IMG – format used by Renderware-based Grand Theft Auto games for data storage
LOVE – format used by the LOVE2D Engine[21]
MAP – format used by Halo: Combat Evolved for archive compression, Doom³, and various other games
MCA – format used by Minecraft for storing data for in-game worlds
MCADDON – format used by the Windows 10 Edition of Minecraft for add-ons
MCFUNCTION – format used by Minecraft for storing functions
MCMETA – format used by Minecraft for storing data for customizable texture packs for the game
MCPACK – format used by the Windows 10 Edition of Minecraft for in-game texture packs
MCR - format used by Minecraft for storing data for in-game worlds before version 1.2
MCTEMPLATE – format used by the Windows 10 Edition of Minecraft for world templates
MCWORLD – format used by the Windows 10 Edition of Minecraft for in-game worlds
NBT – format used by Minecraft for storing program variables along with their (Java) type identifiers
OEC – format used by OE-Cake for scene data storage
OSK – osu! compressed skin data
OSR – osu! replay data
OSU – osu! beatmap data
P3D – format for panda3d by Disney
POD – format used by Terminal Reality
REP – used by Blizzard Entertainment for scenario replays in StarCraft.
Simcity 4, DBPF (.dat, .SC4Lot, .SC4Model) – All game plugins use this format, commonly with different file extensions
SMZIP – ZIP-based package for Stepmania songs, themes and announcer packs.
VVVVVV – format used by VVVVVV[22]
CPS – format used by The Powder Toy, Powder Toy save
STM – format used by The Powder Toy, Powder Toy stamp
PKG – format used by Bungie for the PC Beta of Destiny 2, for nearly all the game's assets.
Video game storage media
List of the most common filename extensions used when a game's ROM image or storage medium is copied from an original read-only memory (ROM) device to external storage such as a hard disk, either for backup or to make the game playable with an emulator. In the case of cartridge-based software, if the platform-specific extension is not used, the filename extensions ".rom" or ".bin" are usually used to indicate that the file contains a copy of the contents of a ROM. ROM, disk, or tape images usually do not consist of a single file or ROM; rather, an entire file or ROM structure is contained within one file on the backup medium.[23]

A26 – Atari 2600 (.a26)
A52 – Atari 5200 (.a52)
A78 – Atari 7800 (.a78)
LNX – Atari Lynx (.lnx)
JAG,J64 – Atari Jaguar (.jag, .j64)
BIN – Wii (.bin, .iso)
GCM – GameCube (.gcm, .iso)
NDS – Nintendo DS (.nds)
3DS – Nintendo 3DS (.3ds)
CIA – Installation File (.cia)
GB – Game Boy (.gb) (this applies to the original Game Boy and the Game Boy Color)
GBC – Game Boy Color (.gbc)
GBA – Game Boy Advance (.gba)
SAV – Game Boy Advance Saved Data Files (.sav)
SGM – Visual Boy Advance Save States (.sgm)
N64, V64, Z64, U64, USA, JAP, PAL, EUR, BIN – Nintendo 64 (.n64, .v64, .z64, .u64, .usa, .jap, .pal, .eur, .bin)
PJ – Project 64 Save States (.pj)
NES – Nintendo Entertainment System (.nes)
FDS – Famicom Disk System (.fds)
JST – Jnes Save States (.jst)
FC? – FCEUX Save States (.fc#, where # is any character, usually a number)
GG – Game Gear (.gg)
SMS – Master System (.sms)
SG – SG-1000 (.sg)
SMD,BIN – Mega Drive/Genesis (.smd or .bin)
32X – Sega 32X (.32x)
SMC,078,SFC – Super NES (.smc, .078, or .sfc) (.078 is for split ROMs, which are rare)
FIG – Super Famicom (Japanese releases are rarely .fig, above extensions are more common)
SRM – Super NES Saved Data Files (.srm)
ZST – ZSNES Save States (.zst, .zs1-.zs9, .z10-.z99)
FRZ – Snes9X Save States (.frz, .000-.008)
PCE – TurboGrafx-16/PC Engine (.pce)
NPC, NGP – Neo Geo Pocket (.npc, .ngp)
NGC – Neo Geo Pocket Color (.ngc)
VB – Virtual Boy (.vb)
INT – Intellivision (.int)
MIN – Pokémon Mini (.min)
VEC – Vectrex (.vec)
BIN – Odyssey² (.bin)
WS – WonderSwan (.ws)
WSC – WonderSwan Color (.wsc)
TZX – ZX Spectrum (.tzx) (for exact copies of ZX Spectrum games)
TAP – for tape images without copy protection
Z80,SNA – (for snapshots of the emulator RAM)
DSK – (for disk images)
TAP – Commodore 64 (.tap) (for tape images including copy protection)
T64 – (for tape images without copy protection, considerably smaller than .tap files)
D64 – (for disk images)
CRT – (for cartridge images)
ADF – Amiga (.adf) (for 880K diskette images)
ADZ – GZip-compressed version of the above.
DMS – Disk Masher System, previously used as a disk-archiving system native to the Amiga, also supported by emulators.

Virtual machines
Microsoft Virtual PC, Virtual Server
VFD – Virtual Floppy Disk (.vfd)
VHD – Virtual Hard Disk (.vhd)
VUD – Virtual Undo Disk (.vud)
VMC – Virtual Machine Configuration (.vmc)
VSV – Virtual Machine Saved State (.vsv)
EMC VMware ESX, GSX, Workstation, Player
LOG – Virtual Machine Logfile (.log)
VMDK, DSK – Virtual Machine Disk (.vmdk, .dsk)
NVRAM – Virtual Machine BIOS (.nvram)
VMEM – Virtual Machine paging file (.vmem)
VMSD – Virtual Machine snapshot metadata (.vmsd)
VMSN – Virtual Machine snapshot (.vmsn)
VMSS,STD – Virtual Machine suspended state (.vmss, .std)
VMTM – Virtual Machine team data (.vmtm)
VMX,CFG – Virtual Machine configuration (.vmx, .cfg)
VMXF – Virtual Machine team configuration (.vmxf)
VirtualBox
VDI – VirtualBox Virtual Disk Image (.vdi)
Parallels Workstation
Main article: Parallels Workstation
HDD – Virtual Machine hard disk (.hdd)
PVS – Virtual Machine preferences/configuration (.pvs)
SAV – Virtual Machine saved state (.sav)
QEMU
COW – Copy-on-write
QCOW – QEMU copy-on-write Qcow
QCOW2 – QEMU copy-on-write – version 2 Qcow
QED – QEMU enhanced disk format

Webpage
Static
DTD – Document Type Definition (standard), MUST be public and free
HTML (.html, .htm) – HyperText Markup Language
XHTML (.xhtml, .xht) – eXtensible HyperText Markup Language
MHTML (.mht, .mhtml) – Archived HTML, store all data on one web page (text, images, etc.) in one big file
MAF (.maff) – web archive based on ZIP
Dynamically generated
ASP (.asp) – Microsoft Active Server Page
ASPX – (.aspx) – Microsoft Active Server Page .NET
ADP – AOLserver Dynamic Page
BML – (.bml) – Better Markup Language (templating)
CFM – (.cfm) – ColdFusion
CGI – (.cgi)
iHTML – (.ihtml) – Inline HTML
JSP – (.jsp) JavaServer Pages
Lasso – (.las, .lasso, .lassoapp) – A file created or served with the Lasso Programming Language
PL – Perl (.pl)
PHP – (.php, .php?, .phtml) – ? is version number (previously abbreviated Personal Home Page, later changed to PHP: Hypertext Preprocessor)
RNA – (.rna) – Real Native Application File
R – (.r) – Real Native Application File (short alternative)
RNX – (.rnx) – Real Native Application File (using experimental version 6 of RNA/Karbon Language)
SSI – (.shtml) – HTML with Server Side Includes (Apache)
SSI – (.stm) – HTML with Server Side Includes (Apache)

Markup languages and other web standards-based formats
Atom – (.atom, .xml) – Another syndication format
EML – (.eml) – Format used by several desktop email clients
JSON-LD – (.jsonld) – A JSON-based Serialization for Linked Data
Metalink – (.metalink, .met) – A format to list metadata about downloads, such as mirrors, checksums, and other information.
RSS – (.rss, .xml) – Syndication format
Markdown – (.markdown, .md) – A light-weight, plain-text, easy to read and write markup language.
Shuttle – (.se) – lightweight markup language

Other
AXD – cookie extensions found in temporary internet folder
BDF – Binary Data Format – raw data from recovered blocks of unallocated space on a hard drive
CBP – CD Box Labeler Pro, CentraBuilder, Code::Blocks Project File, Conlab Project[24]
CEX – SolidWorks Enterprise PDM Vault File
COL – Nintendo GameCube proprietary collision file (.col)
CREDX – CredX Dat File
DDB – Generating code for Vocaloid singers voice (see .DDI)
DDI – Vocaloid phoneme library (Japanese, English, Korean, Spanish, Chinese, Catalan)
DUPX – DuupeCheck database management tool project file
FTM - Family Tree Maker data file
FTMB - Family Tree Maker backup file
GA3 – Graphical Analysis 3
GEDCOM (.ged) – (GEnealogical Data COMmunication) format to exchange genealogy data between different genealogy software
HLP – Windows help file
IGC – flight tracks downloaded from GPS devices in the FAI's prescribed format
INF – similar format to INI file; used to install device drivers under Windows, inter alia.
JAM – JAM Message Base Format for BBSes
KMC – tests made with KatzReview's MegaCrammer
KCL – Nintendo GameCube/Wii proprietary collision file (.kcl)
LNK - Microsoft Windows format for Hyperlinks to Executables
LSM – LSMaker script file (program using layered .jpg to create special effects; specifically designed to render lightsabers from the Star Wars universe) (.lsm)
NARC - Archive format used in Nintendo DS games.
OER – AU OER Tool, Open Educational Resource editor
PA – Used to assign sound effects to materials in KCL files (.pa)
PIF – Used to run MS-DOS programs under Windows
POR – So called "portable" SPSS files, readable by PSPP
PXZ – Compressed file to exchange media elements with PSALMO
RISE – File containing RISE generated information model evolution
TOPC – TopicCrunch SEO Project file holding keywords, domain and search engine settings (ASCII);
TOS – Character file from The Only Sheet
XLF – Utah State University Extensible LADAR Format
XMC – Assisted contact lists format, based on XML and used in kindergartens and schools
ZED – My Heritage Family Tree
Zone file – a text file containing a DNS zone

Cursors
ANI – Animated cursor
CUR – Cursor file
Smes – Hawk's Dock configuration file
Generalized files
General data formats
These file formats are fairly well defined by long-term use or a general standard, but the content of each file is often highly specific to particular software or has been extended by further standards for specific uses.

Text-based
CSV – comma-separated values
HTML – hyper text markup language
CSS – cascading style sheets
INI – a configuration text file whose format is substantially similar between applications
JSON – JavaScript Object Notation is an openly used data format now used by many languages, not just JavaScript
TSV – tab-separated values
XML – an open data format
YAML – an open data format
ReStructuredText – an open text format for technical documents used mainly in the Python programming language
Markdown (.md) – an open lightweight markup language to create simple but rich text, often used to format README files
AsciiDoc – an open human-readable markup document format semantically equivalent to DocBook
Generic file extensions
These are filename extensions and broad types reused frequently with differing formats or no specific format by different programs.

Binary files
Bak file (.bak, .bk) – various backup formats: some just copies of data files, some in application-specific data backup formats, some formats for general file backup programs
BIN – binary data, often memory dumps of executable code or data to be re-used by the same software that originated it
DAT – data file, usually binary data proprietary to the program that created it
DSK – file representations of various disk storage images
RAW – raw (unprocessed) data
Text files
configuration file (.cnf, .conf, .cfg) – substantially software-specific
logfiles (.log) – usually text, but sometimes binary
plain text (.asc or .txt) – human-readable plain text, usually no more specific
Partial files
Differences and patches
diff – text file differences created by the program diff and applied as updates by patch
Incomplete transfers
!UT (.!ut) – partly complete uTorrent download
CRDOWNLOAD (.crdownload) – partly complete Google Chrome download
OPDOWNLOAD (.opdownload) – partly complete Opera download
PART (.part) – partly complete Mozilla Firefox or Transmission download
PARTIAL (.partial) – partly complete Internet Explorer or Microsoft Edge download
Temporary files
Temporary file (.temp, .tmp, various others) – sometimes in a specific format, but often just raw data in the middle of processing
Pseudo-pipeline file – used to simulate a software pipe
                      
retrieved from: http://en.wikipedia.org/wiki/List_of_file_formats - accessed October 11 2018
File Formats

Recommended Formats for Long-term Access and Sharing

Non-proprietary – no software purchase to open the file
Lossless – uncompressed, or compressed without discarding any of the original data
Indexable – if possible a plain text format that is both human and machine readable

Best file format?????

PAPER!
File Formats

  • Text: doc, docx, rtf, odt, pages
  • Tabular: xls, xlsx, numbers, dbf
  • Stat: spss, sas, jmp, rdata
  • Images: jpg, tiff, svg, png, gif, bmp
  • Geographic: shp, geotiff, kml, kmz, gdb
  • Video: mp4, mov, avi, ogg
  • Music: mp3, wav, m4a, aiff
  • Plain text: txt, csv, json, html, xml



General Formats
  • proprietary
  • mixed
  • open



Compression
  • lossy
  • depends
  • lossless



Quick note on statistics files and conversions

  • Often contain much metadata embedded in the file
    • For example, SPSS and SAS include data types (nominal, ordinal, interval, ratio) and data dictionaries (code keys for nominal data, units for interval and ratio data, etc.)
  • How to best share???
    • Option 1: keep in the proprietary format
    • Option 2: convert to text based format (csv, xml) and have either
      • A data dictionary in a text based format so that a user can reconstruct the data-metadata association
      • Some sort of ‘installer’ that contains the metadata and automatically reconstructs the data-metadata association

This also applies to relational databases, images, and some geographical data
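Option 2 can be sketched in a few lines. This is only an illustration, assuming a made-up survey table and a made-up dictionary layout (the field names "level", "label", "codes" and "units" are hypothetical, not an SPSS or SAS convention): the data goes to CSV, the metadata goes to a JSON data dictionary, and a reader can put the two back together.

```python
import csv
import io
import json

# Hypothetical survey data: "region" is nominal (coded), "income" is ratio.
rows = [
    {"id": 1, "region": 2, "income": 41000},
    {"id": 2, "region": 1, "income": 52500},
]

# Hypothetical data dictionary holding what a statistics file would embed.
dictionary = {
    "id": {"level": "nominal", "label": "Respondent ID"},
    "region": {
        "level": "nominal",
        "label": "Region",
        "codes": {"1": "North", "2": "South"},
    },
    "income": {"level": "ratio", "label": "Annual income", "units": "USD"},
}


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def decode(record, data_dictionary):
    """Replace coded values in one CSV record with their labels."""
    decoded = {}
    for name, value in record.items():
        codes = data_dictionary[name].get("codes")
        decoded[name] = codes[value] if codes else value
    return decoded


csv_text = to_csv(rows)
dictionary_text = json.dumps(dictionary, indent=2)

# Round trip: read the CSV back and rebuild the data-metadata association.
restored = [
    decode(record, json.loads(dictionary_text))
    for record in csv.DictReader(io.StringIO(csv_text))
]
print(restored[0])
```

Note that CSV stores everything as text, so the dictionary is also the place to record which columns are numbers.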


OK - so what?

First, make sure your operating system lets you see the file formats!!!!

  • Mac file extensions
    • Finder: Finder -> Preferences ... :
      "Advanced" tab, check box next to "Show all filename extensions"
  • PC file extensions
    • Win 7 and below
      • File Explorer: Organize -> Folder and search options:
        "View" tab, uncheck the box next to "Hide extensions for known file types"
    • Win 8 and above
      • File Explorer: "view" tab, check the box next to: "File name extensions"


Some things to remember


  • Text and numbers
    • Plain text - BUT STRUCTURED
    • Character encoding??? UTF-8
    • PDF - preferably not!! (hard to index/search UNLESS created with specific care)
  • Images (bitmap)
    • TIFF, JPEG2000 (??), PNG, JPEG






hyper text markup language: .html
comma separated values: .csv
plain text: .txt
extensible markup language: .xml
javascript object notation: .json
portable document format: .pdf
joint photographic experts group: .jpg [ .jpeg, .jp2, .j2k ]
tagged image file format: .tif [ .tiff ]
portable network graphic: .png
File Formats

character encoding??? UTF-8

ASCII – American Standard Code for Information Interchange
[ old school, 128 characters in 7 bits ]
lowercase “j” would become binary 01101010 and decimal 106

UTF-8 – Universal Coded Character Set + Transformation Format – 8-bit
[ now the new standard, only since about 2007, first 128 characters are ASCII ]
[ encodes 1,112,064 “code points” or characters ]
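The numbers on this slide can be checked with a short sketch using only built-in Python functions. The sample characters are arbitrary choices for illustration.

```python
# The lowercase "j" from the slide.
letter = "j"
code_point = ord(letter)              # decimal value of the character
bits = format(code_point, "08b")      # the same value as 8 binary digits

# The first 128 characters are identical in ASCII and UTF-8.
same_bytes = letter.encode("ascii") == letter.encode("utf-8")

# Characters outside ASCII need more than one byte in UTF-8.
accented = "\u00f1"                   # n with tilde
accented_length = len(accented.encode("utf-8"))

# Unicode has 0x110000 code points; 2048 surrogates cannot be encoded.
encodable = 0x110000 - (0xE000 - 0xD800)

print(code_point, bits, same_bytes, accented_length, encodable)
```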


Bitmap and vector images

Raster – a “grid” of numeric color values, also known as a bitmap
[ .tif, .jpg, .png ]

Vector – a collection of points that can be connected to make lines, polygons, and volumes
[ no standards yet, but common in Adobe Illustrator, AutoCAD, and many GIS applications ]
WATCH for .svg – scalable vector graphics
Some things to remember


  • Cartographic (maps)
    • Raster: GeoTIFF
    • Vector: shapefile, AutoCAD, GeoJSON
    • Note: a shapefile is really several files: .shp, .shx, and .dbf are required; .prj, .sbx, and .sbn are optional (?!)
  • Audio
    • AIFF, WAVE 44.1 kHz / 16 bit or higher
    • BUT FLAC (Free Lossless Audio Codec) is OK; note that MP3 is a lossy format






shapefile: .shp
shapefile index: .shx
data base format: .dbf
projection (for maps): .prj
drawing exchange format: .dxf
audio interchange file format: .aiff
moving pictures expert group: .mp3
wave: .wav
Some things to remember

  • Video
    • MPEG-4
  • Documentation
    • Rich Text Format
    • Open Document text
    • html
    • Plain text






motion picture expert group: .mp4, .m4a
rich text format: .rtf
open document text: .odt
Others?

What do you use in your work?

NetCDF

Actually a collection of tools. NetCDF “is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data” – wikipedia
more?




Images: Color and Resolution

Color is entirely a creation of the mind

Red Green Blue - RGB

  • Additive color model based on ‘primary’ colors
    • Used on all electronic display devices
    • Primary colors are closely matched to three receptors in eye
  • The modern computer uses numbers from 0-255 to represent each primary color
    • 8 bits for each color, three colors, 24-bit true color
    • Exactly 16,777,216 colors (256 x 256 x 256) – more than we can see






Cyan Magenta Yellow Black - CMYK

  • Subtractive color model based on printer colors
    • Also known as "process" or "four-color" system
    • All printing devices use this model
    • More ink 'subtracts' lightness from the white paper
    • 'K' is for 'key' as the black plate in an offset press is the 'key' plate
  • 8-bit or 16-bit information in each color 'channel'
    • CMYK image files are noticeably larger than RGB image files
    • way more colors than we can see ...



Hexadecimal for the Internet

  • The hex color system is based on the RGB model
    • Used in HTML for the internet
    • Is the preferred representation for color by programmers
  • Hexadecimal is the name for counting in base 16
    • Good for computers: 2^4 = 16 and 16 x 16 = 256
    • Counted from 0-15 like this:

      0 1 2 3 4 5 6 7 8 9 A B C D E F


  • A hex color might look like this: #FF00FF



rgb(255,0,255)

Red: FF = 15*1 + 15*16 = 255



Green: 00 = 0*1 + 0*16 = 0

Blue: FF = 15*1 + 15*16 = 255
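The same arithmetic can be written as a short sketch. The function names hex_to_rgb and rgb_to_hex are made up for this illustration.

```python
def hex_to_rgb(hex_color):
    """Convert a color such as '#FF00FF' to a (red, green, blue) tuple."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits, e.g. #FF00FF")
    # Each pair of hex digits is one channel: first digit * 16 + second digit.
    return tuple(int(digits[start:start + 2], 16) for start in (0, 2, 4))


def rgb_to_hex(red, green, blue):
    """Convert three 0-255 channel values back to a hex color string."""
    return "#{:02X}{:02X}{:02X}".format(red, green, blue)


print(hex_to_rgb("#FF00FF"))
print(rgb_to_hex(255, 0, 255))
```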
[ Figure: Photoshop's color picker, showing the approximate screen color with its HEX, RGB, and CMYK values ]
A note on resolution

“Highest resolution available, not rescaled or interpolated” – LOC recommendation

Resolution is directly related to pixel dimensions
  • usually expressed as dots per inch (DPI): 300 dpi
  • can be megapixels (the product of height x width): 2.07 megapixels (1920 x 1080 = 2073600)
  • can be simple image dimensions in pixels ('width' X 'height'): 1920 x 1080


SOME NUMBERS TO REMEMBER
1000 DPI – standard number for greyscale reproduction on a printing press
300 DPI – standard minimum for color reproduction on a printing press
72 DPI – standard screen resolution
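A rough sketch of how these numbers relate to file size, assuming an uncompressed image and 8 bits per channel (24 bits per pixel for RGB, 32 for CMYK). The function names are made up for this illustration, and real files add headers and usually compression.

```python
def megapixels(width, height):
    """Pixel count in millions."""
    return width * height / 1_000_000


def uncompressed_bytes(width, height, bits_per_pixel=24):
    """Size of the raw pixel data, ignoring headers and compression."""
    return width * height * bits_per_pixel // 8


def print_size_inches(width, height, dpi):
    """Physical size of the image when printed at the given DPI."""
    return (width / dpi, height / dpi)


width, height = 1920, 1080
print(megapixels(width, height))               # pixel count in millions
print(uncompressed_bytes(width, height))       # RGB, 8 bits per channel
print(uncompressed_bytes(width, height, 32))   # CMYK, 8 bits per channel
print(print_size_inches(width, height, 300))   # inches at 300 DPI
```

The same 1920 x 1080 image that fills a screen is only about 6.4 x 3.6 inches at 300 DPI.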


So what?

  • How are file size and image quality related to:
    • Resolution (image dimensions)?
    • Color model?
    • Compression?
    • Vector or Raster?
  • What implications are there for
    • Short-term workflows?
    • Long-term preservation?



Data Organization
Think about time and space
  • Directory Structure
  • File Naming Convention
Save time and space
this will get personal





So ....
  • Take a moment, think about your file naming for your articles that you save
    • Are you consistent?
    • Is it easy to find what you want several months later?
  • Now think about your file structure for your downloaded articles
    • Where are the actual files on the computer?
    • How many folders are you using?
    • Are they logically organized?





http://phdcomics.com/comics.php?f=1531
The Bottom Line: File Naming Conventions
[ best practices ]
DO
  • useCamelCasing.docx
  • use_underscores.txt
  • 2015_put_The_Date_First.csv
  • 20150214_useTwoDigitDateNumbers.xls
  • startASeriesWithLeadingZeros_001.doc
  • 20150214_UM_date-place.shp
  • useFileExtensions.jpg




Mac: Finder: Finder -> Preferences ... : "Advanced" tab, check box next to "Show all filename extensions"
Win 8 and above: File Explorer: "view" tab, check the box next to: "File name extensions"
DON'T
  • Leave spaces in the file name.xls
  • Use the default save name from MS word that is simply the long first sentence in your file.doc
  • January 5 2015 Samples with the month first.xls
  • Label as final version.doc
  • Use special characters: & , * % # ; * ( ) ! @$ ^ ~ ' { } [ ] ? < > - + /
  • Use more than about 25 characters
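These rules can be turned into a small checker. This is a sketch under stated assumptions: the function name check_filename is made up, the 25-character limit is applied to the whole name including the extension, and the hyphen is allowed because one of the DO examples uses it.

```python
import re

MAX_LENGTH = 25  # "about 25 characters", applied here to the whole name


def check_filename(filename):
    """Return a list of naming problems; an empty list means none found."""
    issues = []
    stem, dot, extension = filename.rpartition(".")
    if not (stem and dot and extension):
        issues.append("missing file extension")
    if " " in filename:
        issues.append("contains spaces")
    # Anything other than letters, digits, underscore, hyphen, dot, space.
    if re.search(r"[^A-Za-z0-9_. -]", filename):
        issues.append("contains special characters")
    if len(filename) > MAX_LENGTH:
        issues.append("longer than about 25 characters")
    if re.search(r"final", filename, re.IGNORECASE):
        issues.append("labeled as final")
    return issues


for name in ["20150214_UM_map_001.shp", "Label as final version.doc", "notes"]:
    print(name, "->", check_filename(name) or "ok")
```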





http://assets.amuniversal.com/42ec27b03718012ea5cb00163e41dd5b
Be Consistent
File Tagging: a *new* approach?
Think about your music library
Think about article keywords
Part of the semantic web conversation


Mac
  • Native on Mac (colors)
  • Not really searchable

PC
  • Native in Office (file->info)
copyright © Gary Larson - used under "fair use"
Lose a file?
Use file tagging systems to better keep track
  • for tagging files as they are created and then the tags are indexed
  • good for not losing things in the first place.


Mac:

PC:



File Versioning
  • Turn on versioning or tracking in collaborative documents
    • Word documents, Excel, etc.
    • Learn by doing!
  • Turn on versioning for storage utilities
    • Wikis
    • Google Docs, BOX
  • Consider using version control software
    • Subversion (Apache Software Foundation), TortoiseSVN (a Windows client for Subversion), git (Linus Torvalds), Bitbucket and/or GitHub (both commercial hosting services for git)
    • Mostly designed for collaborative coding, but . . .
    • Check with your lab/colleagues as to their preference



Online Versioning and Sharing Services
Batch Naming
  • Photos
  • Instrument data
  • Moving files across languages



Tools for File Management: bulk rename
Windows

Mac

Linux

Unix: use the grep command to search with regular expressions
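Bulk renaming can also be scripted. A minimal sketch, assuming camera-style file names and a prefix that follows the date-first convention; the function names plan_renames and apply_renames and the sample names are made up. The plan is built first so it can be reviewed before any file is touched.

```python
from pathlib import Path


def plan_renames(filenames, prefix, width=3):
    """Map each old name to prefix_001.ext, prefix_002.ext, and so on."""
    plan = {}
    for number, old_name in enumerate(sorted(filenames), start=1):
        extension = Path(old_name).suffix.lower()
        plan[old_name] = f"{prefix}_{number:0{width}d}{extension}"
    return plan


def apply_renames(folder, plan):
    """Rename files inside folder, refusing to overwrite existing files."""
    folder = Path(folder)
    for old_name, new_name in plan.items():
        target = folder / new_name
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        (folder / old_name).rename(target)


# Review the plan before applying it to a real folder.
plan = plan_renames(["IMG 12.JPG", "IMG 3.JPG"], "20150214_UM_site")
for old_name, new_name in plan.items():
    print(old_name, "->", new_name)
```

The files are numbered in plain alphabetical order, so "IMG 12" comes before "IMG 3"; this is exactly the problem that leading zeros prevent.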



http://datalib.edina.ac.uk/mantra/organisingdata/
Information in the Filename
  • Version number
  • Date of creation
  • Name of creator
  • Description of content
  • Name of research team/department associated with the data
  • Publication date
  • Project number
http://datalib.edina.ac.uk/mantra/organisingdata/
metadata
Retraction Watch
“A problem with a malfunctioning computer and image storage and mislabeling led to the assembling by one of the co-authors of images that were previously published by our research group. I didn’t detect the problem when the manuscript was sent for publication. Although the conclusions were not compromised in any of the two papers, we retract the papers precisely because some images were wrongly used.”

Principal investigator Jorge Leitão
http://retractionwatch.com/2014/10/17/this-situation-left-me-ashamed-and-infuriated-with-myself-scientist-retracts-two-papers/
Retraction Watch
“In the 2011 paper (http://jb.asm.org/content/196/22/3980), it was first submitted to other 2 journal (JBC and RNA Biology), whom requested a lot of modifications, and therefore, we accumulated a lot of processed data files. In between the process, the hard-drive of the computer that was used to store the data files (which is shared by 5 research groups) stopped working due data overloading. Nonetheless, we were able to retrieve the original data, or so we thought. At the time, I was responsible for composing the final figures of each paper that we produced, and asked the team members to give me the files. In Figure 8 of this paper, it seemed that there has been a labeling error in the source files, and I did not realize that some images where duplicated in the experiment that was being represented, neither that parts of the image had already been published. I should stress that that the images were produced in our lab and represent our data.”

First author Christian Ramos
http://retractionwatch.com/2014/10/17/this-situation-left-me-ashamed-and-infuriated-with-myself-scientist-retracts-two-papers/
Quick Review
  • Organization
  • Context
  • Consistency
YYYYMMDD_projectID_place_001.ext
The benefits of consistent data file labeling are:
  • Data files are distinguishable from each other within their containing folders
  • Data file naming prevents confusion when multiple people are working on shared files
  • Data files are easier to locate and browse
  • Data files can be retrieved not only by the creator but by other users
  • Data files can be sorted in logical sequence
  • Data files are not accidentally overwritten or deleted
  • Different versions of data files can be identified
  • If data files are moved to other storage platform their names will retain useful context
http://datalib.edina.ac.uk/mantra/organisingdata/
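The YYYYMMDD_projectID_place_001.ext pattern is easy to generate rather than type by hand. A small sketch, with a made-up function name and made-up project and place values:

```python
from datetime import date


def build_filename(day, project_id, place, sequence, extension):
    """Build a name that follows YYYYMMDD_projectID_place_001.ext."""
    clean_extension = extension.lstrip(".")
    return f"{day:%Y%m%d}_{project_id}_{place}_{sequence:03d}.{clean_extension}"


names = [
    build_filename(date(2015, 12, 1), "UM", "miami", 2, "shp"),
    build_filename(date(2015, 2, 14), "UM", "miami", 1, "shp"),
]

# With the date first, alphabetical order is also chronological order.
print(sorted(names))
```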
How important is your data?

take a moment of
silence to imagine
what would happen
if your computer
failed today



The World of Data Around Us: Data Loss

  • Natural disaster
  • Facilities infrastructure failure
  • Storage failure
  • Server hardware/software failure
  • Application software failure
  • External dependencies (e.g. PKI failure)
  • Format obsolescence
  • Legal encumbrance
  • Human error
  • Malicious attack by human or automated agents
  • Loss of staffing competencies
  • Loss of institutional commitment
  • Loss of financial stability
  • Changes in user expectations and requirements
  • Upset boyfriend or girlfriend
slide from
common data loss scenarios
  • Hardware failure
    • Disk drive in the computer
    • Solid state memory devices (internal drives and external thumb drives)
  • Unexpected power problem
    • Power surge/drop
    • Something touches your computer too much (child/pet)
  • Catastrophic event
    • Hurricane
    • other
  • Device accidents
    • Stolen
    • Dropped/spilled/etc
    • Pressed format accidentally


common data loss scenarios
"Researchers don't delete data, they lose it"
John Bixby - Vice Provost for Research at University of Miami





a warning ...
this will get personal
it CAN happen to you
it is LIKELY that it will happen to you










in the Lifecycle?
http://library.miami.edu/datacuration/
Data in real life
  • A design firm was handling their own backups. The system was working fine and the backup software was reporting that the data was successfully backed up.
  • The administrator checked the backups immediately after they were done and confirmed they were good.
  • After a computer virus erased most of their files, they went back to their backups. Unfortunately they found that the backups were all blank and all of the data was gone. Only after some investigation did they discover that the computer tapes (which contained the backups) were placed against a wall that had an elevator on the other side of it. When the elevator went past, the magnets inside erased all of the tapes.
  • Had they checked their backups properly, they probably would have noticed this before there was an emergency.
slide from
Data Protection, Backups, Archiving, Preservation
Are They the Same Thing? Not Quite…
  • Data Protection
    • Includes topics such as backups, archives, and preservation
    • also includes physical security, encryption, and others not addressed here (for later)...
  • Terms “backups” and “archives” are often used interchangeably, but do have different meanings
    • Backups: a copy (or copies) of the original file is made before the original is overwritten
    • Archives: preservation of the file
  • Data Preservation
    • Includes archiving in addition to processes such as data rescue, data reformatting, data conversion, metadata
slide from
Backups vs. Archiving
  • Backups
    • Used to take periodic snapshots of data in case the current version is destroyed or lost
    • Backups are copies of files stored for short or near-long-term
    • Often performed on a somewhat frequent schedule
  • Archives
    • Used to preserve data for historical reference or potentially during disasters
    • Archives are usually the final version, stored for long-term, and generally not copied over
    • Often performed at the end of a project or during major milestones

    It is a good idea to have multiple copies of your backups and archives, in case one copy fails.
slide from
Backup and Storage

Major Considerations
  • Who is responsible for backup ?
  • How often do you backup ?
  • Partial vs. full backups ?
  • Non-digital backups ?
  • Where (literally) will the backups be located ?
  • Do the backups need a description (metadata) ?
  • Manual vs automatic ?
  • Recovery procedures ?
  • Verification – how do you know the backup was successful ?
  • How long do you keep your backups ?
  • What happens when the project ends ?



Don't forget
  • Data conversions and formats
  • Versioning
  • File Naming





Validation
Do you trust your computer?
  • Always check file sizes after backing up
    • Or at least check periodically
  • The MD5 checksum
    • With this you can monitor the integrity of your data over time
    • Other checksums are CRC and SHA
    • Like a fingerprint for a file
    • Command line: md5sum
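The same check can be scripted with Python's standard hashlib module. This is a sketch, not a replacement for md5sum; the function names are made up for illustration. The file is read in chunks so that large data files do not have to fit in memory.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hexadecimal checksum of a file, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original, backup, algorithm="sha256"):
    """True when the two files have identical checksums."""
    return file_checksum(original, algorithm) == file_checksum(backup, algorithm)
```

Store the checksum next to the file when the backup is made, then recompute it later: a different value means the file has changed or been corrupted.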



Synchronization
Do you trust your computer?
  • Cloud based services (box, dropbox, etc.) are based on Folder Synchronization
    • Only copies newer files (that have changed or been created)
    • Thus contains a "mirror image"
  • There are commercial and free folder synchronization tools
    • All have privacy and user data issues
    • All based on “command line” tools that already exist on your computer
  • Command line interface (CLI) syncing
    • Mac or Linux: rsync
    • Windows: xcopy or robocopy
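The idea behind folder synchronization, copying only files that are new or have changed, fits in a short sketch. This is an illustration of the principle using Python's standard library, not how rsync, xcopy, or robocopy are implemented; the function name sync_newer is made up.

```python
import shutil
from pathlib import Path


def sync_newer(source, destination):
    """Copy files that are missing from, or newer than, the destination."""
    source, destination = Path(source), Path(destination)
    copied = []
    for item in sorted(source.rglob("*")):
        if not item.is_file():
            continue
        target = destination / item.relative_to(source)
        # Skip files whose backup copy is at least as recent.
        if target.exists() and target.stat().st_mtime >= item.stat().st_mtime:
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(item, target)  # copy2 also preserves the time stamp
        copied.append(str(item.relative_to(source)))
    return copied
```

Like the command line tools, this never deletes anything from the destination, so files removed from the source stay in the backup.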



Storage and Backup
File syncing tools that exist on your machine already:
  • Mac and linux: rsync
  • PC: xcopy or robocopy


You will have to understand the command line first:
  • Mac: Applications/Utilities/Terminal
  • PC: Start Button->search “cmd”

Storage and Backup
  • PC: xcopy

C:\> xcopy <source> <destination> [<options>]

C:\> xcopy c:\Users\tnorris\Documents\MapData\*.* G:\MapData /D /S /Y

This copies all NEWER files from the MapData directory on the local machine
to the backup MapData folder on an external hard drive.

C:\> xcopy /?

This will show all of the options for the xcopy command. You can see that:
/D tells xcopy to only copy newer files
/S goes through all sub-directories
/Y tells xcopy to proceed without asking the user to confirm (be careful!!)
                    



Storage and Backup
  • PC:robocopy

C:\> robocopy <source folder> <destination folder>  <file list> [<options>]

C:\> robocopy c:\Users\tnorris\Documents\MapData\ G:\MapData *.* /XO /FFT /E /XD c:\Users\tnorris\Documents\MapData\BigData /XF .* .*.* Thumbs.db

This copies all NEWER files from the MapData directory on the local machine
to the backup MapData folder on an external hard drive. It will not copy the
folder "BigData", nor will it copy files called "Thumbs.db" or files whose
names start with a dot.

C:\> robocopy /?

This will show all of the options for the robocopy command. You can see that:
/XO excludes older files
/FFT use FAT file times (good for copies between mac and pc)
/E goes through all sub-directories
/XD tells robocopy to exclude a list of directories
/XF tells robocopy to exclude a list of files
                    


Storage and Backup
  • Mac or linux: rsync

% rsync [<options>] <source> <destination>

% rsync -arv /Users/tnorris/Documents/MapData/* /Volumes/MyDrive/MapDataFB
 
This copies all new or changed files from the MapData directory on the local machine to 
the backup MapDataFB folder on an external hard drive named “MyDrive”.

% man rsync

This will show all of the options for the rsync command. You can see that 
 -a tells rsync to preserve archival information (date stamps, owners, permissions)
 -r tells rsync to go through all sub-directories (already implied by -a)
 -v tells rsync to tell you what it is doing (which files it copied)
                    



Storage and Backup
System vs Disk syncing software
  • Mac
    • time machine for external disks - recommended for most users
  • Windows
    • no native system software for external disks
    • often external drives have their own software
    • you must choose how to perform the sync/backup/restore
  • Linux
    • you must choose how to perform the sync/backup/restore (rsync)
Backup Media Options
  • Local Machines
    • Hard disk in computer
    • External hard drives
  • Online Solutions
    • Networked drives (personal cloud)
    • Repositories
    • Versioning Tools
    • The "cloud"



2.5” 500 GB Western Digital SATA
Evan-Amos - CC BY-SA 3.0
Spinning (metal) disk: Laptop or Desktop
  • Pros
    • High level of control over file system, naming, and physical location of disk
    • Easy to backup
    • Convenient
  • Cons
    • Risk of malware (virus)
    • Risk of theft, damage, loss, etc
    • System can eventually corrupt the disk (especially PCs)
    • Finite lifespan



RECOMMENDATIONS:
  • Never have master copies on your computer
  • Not for long term storage
  • Have a backup plan for this storage option
Spinning (metal) disk: (Network) Server
  • Pros
    • High level of control over file system, naming, and physical location of disk
    • Likely has backup and maintenance schedule
    • Possible duplicate (mirror) images – RAID (Redundant Array of Independent Disks) systems
    • Safe physical location
  • Cons
    • Expensive to maintain
    • Migration can be difficult
    • Susceptible to catastrophic events





RECOMMENDATIONS:
  • Good for master copies
  • Good for up to 5 year storage
External storage: memory and drives
  • Pros
    • Drives are cheap (sort of) and portable
    • Convenient
    • Memory is cheap and portable
  • Cons
    • Connection technologies change (USB, Firewire, SATA, and so on)
    • Drive failure (both spinning drives and memory devices)
    • Easily damaged, stolen or lost
    • Finite space: for large projects, multiple drives may be necessary
    • Malware can be propagated (think unsafe sex)



RECOMMENDATIONS:
  • Not for master copies
  • Not for long term storage
  • Have a backup plan for this storage option
Does anyone use CDs anymore??? ZIP disks??? NOT recommended!!
External storage: magnetic tapes
  • Pros
    • Massive amounts of data, cheap
    • Fast backup
    • Reusable
  • Cons
    • Slow retrieval
    • Degradation over time
    • Installation and maintenance is expensive



RECOMMENDATIONS:
  • Excellent for rolling backups
Networked drives: personal cloud
  • Pros
    • Drives are cheap (sort of) and portable
    • Convenient access from anywhere
    • Easy to install and sync
    • Private: password protected
  • Cons
    • Upload/download bottlenecks
    • Susceptible to catastrophic events
    • Needs permanent power
    • Needs IP address



RECOMMENDATIONS:
  • Good "third" option
Western Digital, Seagate . . .
Perhaps buy online, out of state?
Data Repositories
  • Pros
    • Maintained by others
    • Your data is accessible, visible
    • Mostly cost free
  • Cons
    • Like journals, can be well recognized, but not necessarily
    • Takes time to format data and metadata correctly
    • Your data is accessible, visible



RECOMMENDATIONS:
  • Good preservation option
  • Not good for working data
Online Versioning and Sharing
  • Pros
    • Maintained by others
    • Often open text-based formats
    • Excellent version control and backup
  • Cons
    • Steep learning curve
    • Doesn’t handle proprietary data well
    • Your data is accessible, visible (sometimes)




RECOMMENDATIONS:
  • Depending on data type, can be good for working data
  • Privacy concerns vary
Online Versioning and Sharing Services
Networked drives: the "cloud"
  • Pros
    • No failure or backup worries (they do it)
    • Can be secure (depends)
    • Convenient
    • Good for catastrophic events
  • Cons
    • Upload/download bottlenecks
    • Fees?
    • Long-term? No standards?
    • How to get copies of all your data (try this for google drive)
    • No control, AND you are responsible if something is hacked (according to US government export laws)




RECOMMENDATIONS:
  • Good for quick and dirty collaborations
  • a good “third” option
  • Not good for large data
  • Privacy concerns vary
The Cloud
The term “cloud computing” (or just “cloud”, in the context of computing) is a marketing buzzword with no coherent meaning. It is used for a range of different activities whose only common characteristic is that they use the Internet for something beyond transmitting files. Thus, the term spreads confusion. If you base your thinking on it, your thinking will be confused.

Richard Stallman - https://www.gnu.org/philosophy/words-to-avoid.html





Storage and Backup
file-syncing software in the cloud
  • Google Drive? Dropbox? Sky Drive? iCloud?
    • Dropbox is good for temporary sharing
    • Google Drive is good for collaborative work: synchronous file editing with multiple users
    • What about security?? privacy??
  • BOX - https://www.box.com/
    • University of Miami has an affiliation
    • All content is encrypted
    • All platforms are supported including smart phones
    • Recently became “unlimited”

    http://www.miami.edu/it/index.php/about_it/aas/ps/documentation/box/

The Bottom Line: Storage and Backup
[ best practices ]
DO
  • RAID storage
  • External hard drives (exFAT)
  • Cloud storage and file-syncing
  • Duplicate computers or hard drives
  • Write down roles and responsibilities
  • Organize, file naming conventions, versioning
  • Have automatic backups
  • Verify backups
  • Open formats




DON'T
  • USB thumb drives
  • Email files to yourself
  • Save files without knowing their location in the computer’s file structure
  • Backup when you remember




The exFAT format is essential if you ever want to share a drive between a Mac and a PC
Mac: Applications:Utilities:Disk Utility
PC: right click in explorer -> Format
The Bottom Line: Storage and Backup
[ best practices ]
Have all your work in at least three places at all times: working version + two backups

Drives fail, computers break, viruses happen, computers get stolen, USB thumb drives ALWAYS fail, you will make a mistake and delete your work by accident, ex-partners seek revenge, and the list goes on . . .




Short-term solutions at UM [a]
Columns: Size Limits, HIPAA Compliant, Collaboration and Sharing, Relational Databases, Self Guided, No Costs
Box Cloud-Based Storage: unlimited [b]
Cloud Storage (CCS): > 10 TB [c]
File Server (UMIT): > 1 TB [d] [d] [e]

If one of these solutions does not meet your needs, you can consider self-managed solutions or please feel free to contact the UM Information Technology (UMIT) Service Desk, research data services at the Libraries, or the advanced computing services at CCS for further assistance.


  a. None of these options are for long-term storage; please see our institutional repository or identify another disciplinary repository to meet this need.
  b. Box’s single file upload limit is 15 GB. Also note that network speed and congestion affect performance.
  c. Please see the advanced computing resources at CCS or contact the Advanced Computing group directly for more information.
  d. To begin the request process, please contact the UMIT Service Desk: email itsupportcenter@miami.edu or call (305) 284-6565.
  e. Every request is evaluated on a case-by-case basis. Evaluations are based on the requested resource needs and the current resource allocations across campus. If the request is exceptionally large there may be cost sharing requirements.
Short-term solutions at UM
For general sharing and collaboration needs please see the cloud storage solutions that Information Technology provides for students, staff and faculty:

  • Box
  • Google Drive
  • OneDrive





Short-term solutions at UM
    If you need more space at the University of Miami go to the Center for Computational Science (CCS)







Further Reading:




DOIs and ORCIDs
  • Digital Object Identifiers (DOIs)
    • Permanent identifiers (links) to online resources
    • Provided by resolving service (https://doi.org/)
    • All repositories provide these for your data
    • UM is a member of DataCite, which provides our DOIs
  • ORCID
    • https://orcid.org/
    • like a Digital Object Identifier (DOI) for people
    • the authoritative ID for researchers



Work together to connect research to researcher