About

MetaMed is a metadata extraction tool for heterogeneous data in different file formats. It is a non-interactive, open-source command-line application.

MetaMed can also be used as a simple SPARQL query tool and even as an RDF serialization format converter.

Supported file formats: DASTA, DICOM and SITS (CSV, XML). Image/multimedia files and the file system directory structure can also be indexed to RDF.

Features

Supported operations:
- empty the graph,
- import RDF files into the graph,
- extract metadata into the graph,
- export RDF data from the graph,
- and query RDF data using SPARQL.
Operations are processed sequentially. You can activate one or more operations by selecting command-line options.
Extract metadata from selected files, or recursively from whole input data directories.
Use a list file to select the input files for metadata extraction. Each file's path (absolute or relative) goes on its own line.
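A list file in this one-path-per-line layout can be generated with a short script. The following is a minimal sketch and not part of MetaMed: the function name `write_file_list` and the extension filter are assumptions chosen for illustration only.

```python
import os


def write_file_list(root_dir, list_path, extensions=(".dcm", ".xml", ".csv")):
    """Write one matching file path per line; return the number of paths written.

    The extension filter is a hypothetical example, not MetaMed's own rule.
    """
    count = 0
    with open(list_path, "w", encoding="utf-8") as out:
        for folder, subdirs, names in os.walk(root_dir):
            subdirs.sort()  # walk subdirectories in a stable order
            for name in sorted(names):
                if name.lower().endswith(extensions):
                    out.write(os.path.join(folder, name) + "\n")
                    count += 1
    return count
```

Calling `write_file_list("data", "inputs.txt")` would write paths relative to the current directory, because `os.walk` keeps the form of the root it was given.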
The graph can be stored:
- in memory only,
- in memory, but automatically backed by a specified file system root directory (Note: if the graph already exists in the file system, it is loaded into memory automatically, and it is stored back to the file system before the application exits. The load/store delay is significant for a large graph.),
- in an OpenLink Virtuoso OpenSource database.
Export the application's built-in prefix map to a text file.
Set the graph name.
Empty the graph.
Print the graph size automatically before and after operations.
Import RDF file(s) into the graph.
Importing gzipped RDF files is possible with all data stores. MetaMed decompresses the file while reading it.
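Reading a file that may or may not be gzipped can be sketched as follows. This is an illustration of the idea only, not MetaMed's implementation: the function name `open_rdf_text` is hypothetical, and detecting compression by the gzip magic bytes (rather than, say, the file extension) is an assumption.

```python
import gzip


def open_rdf_text(path):
    """Open an RDF file as text, decompressing it when it is gzip-compressed."""
    # Every gzip stream starts with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```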
Set the directory containing SPARQL files, one query per file. Prefix declarations may be omitted when the prefixes are known to MetaMed: all queries are automatically extended with the built-in prefixes and then processed.
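The prefix extension step can be pictured roughly like this. It is a sketch under stated assumptions, not MetaMed's code: the `BUILT_IN_PREFIXES` map below is a hypothetical stand-in for the real built-in prefix map, and the function simply prepends every prefix the query has not declared itself (unused `PREFIX` declarations are valid in SPARQL).

```python
import re

# Hypothetical stand-in for the application's built-in prefix map.
BUILT_IN_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def add_missing_prefixes(query, prefixes=BUILT_IN_PREFIXES):
    """Prepend a PREFIX line for each known prefix the query does not declare."""
    declared = {m.group(1) for m in re.finditer(r"(?i)\bPREFIX\s+([\w-]*):", query)}
    header = "".join(
        f"PREFIX {name}: <{iri}>\n"
        for name, iri in prefixes.items()
        if name not in declared
    )
    return header + query
```

A prefix declared in the query file is left untouched, so a file can still override a built-in prefix with its own IRI.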
Select the output format for all queries.
Export the whole RDF graph to a file in a specified RDF serialization.
Merge RDF files using the import and export options (e.g. with an in-memory graph).
Use this tool for RDF serialization conversion (e.g. import into an in-memory graph, then use the RDF graph export option).
Pluggable file format support for extractors. You can implement your own plugins to support other file formats.
Parallel data processing: you can set the number of threads for:
- metadata extraction,
- querying data by SPARQL,
- graph export (when extracted by more threads).
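The effect of a configurable thread count on extraction can be illustrated with a small thread pool. This is a generic sketch, not MetaMed's implementation: `extract_stub` is a hypothetical placeholder for a format-specific extractor and only reports the file size.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def extract_stub(path):
    """Hypothetical stand-in for a real extractor: returns (path, size in bytes)."""
    return path, os.path.getsize(path)


def extract_all(paths, threads=4, extractor=extract_stub):
    """Run the extractor over all paths with a fixed number of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the same order as the input paths.
        return list(pool.map(extractor, paths))
```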
Configuration file order:
- the user-specific file ~/.mre/connection.properties,
- or the system configuration file /etc/mre/connection.properties,
- or your own file, specified by the -virt argument.
The content of the connection.properties file looks like this:
```
        url=jdbc:virtuoso://localhost:1111
        user=username
        password=password
        graph=https://mre.zcu.cz/dataset/example
```
The graph property sets the default graph name, which is used when connecting to or opening the graph. You can select a different graph with the --graph argument.
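The lookup and the file layout can be sketched in a few lines. This is an illustration, not MetaMed's code: the function names are hypothetical, giving an explicitly specified file precedence over the user and system files is an assumption, and the parser handles only simple `key=value` lines rather than the full Java properties syntax.

```python
import os


def find_connection_file(explicit_path=None, home=None, system_dir="/etc/mre"):
    """Return the first existing candidate file, or None when there is none."""
    candidates = []
    if explicit_path:
        # Assumption: an explicitly specified file wins over the defaults.
        candidates.append(explicit_path)
    home = home or os.path.expanduser("~")
    candidates.append(os.path.join(home, ".mre", "connection.properties"))
    candidates.append(os.path.join(system_dir, "connection.properties"))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def load_properties(path):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith(("#", "!")):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props
```

With the example file above, `load_properties(path)["graph"]` would give the default graph name.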

Input and output format support

Supported file formats:

DASTA – DAta STAndard (Czech language only), the national standard for medical data issued by the Ministry of Health of the Czech Republic,
HL7 – Health Level Seven – partial/development support only, because it is not yet a national standard in the Czech Republic,
DICOM – Digital Imaging and Communications in Medicine,
Stroke data – our internal XML format for diagnostic data and treatment results during the three months after a stroke,
PNG – Portable Network Graphics,
JFIF/JPEG – JPEG File Interchange Format/Joint Photographic Experts Group,
TIFF – Tag Image File Format.

Supported RDF serialization formats:

RDF/XML,
RDF/XML-ABBREV,
TURTLE,
N3,
N-TRIPLE.

Supported (query) output formats:

text,
csv, tsv,
rdf,
sse,
json,
xml and xmlstring.

SPARQL is supported for RDF data queries.