About

MetaMed is a meta data extraction tool for heterogeneous data in different file formats. It is a non-interactive, open source command line application.

MetaMed can also be useful as a simple SPARQL query tool and even as an RDF serialization format converter.

Supported file formats: DASTA, DICOM, SITS (CSV, XML); image/multimedia files and file system directory structure can be indexed to RDF, too.

Features

  • Support for operations:
    • empty the graph,
    • import RDF files to the graph,
    • extract meta data into the graph,
    • export RDF data from the graph,
    • and query RDF data using SPARQL.
  • Sequential processing of operations. You can activate one or more operations by selecting command line options.
  • Extract meta data from selected files, or recursively from whole input data directories.
  • Use a file with a list of input data files to select the files for meta data extraction. Each file's (absolute or relative) path goes on its own row.
  • Graph can be stored:
    • in-memory only,
    • in-memory, but automatically loaded from and saved to a specified file system root directory (Note: If the graph exists in the file system, it is loaded into memory automatically, and it is stored back to the file system before the application exits. The load/store delay is significant for a large graph.),
    • in OpenLink Virtuoso OpenSource database.
  • Export an application built-in prefix map into a text file.
  • Set the graph name.
  • Empty the graph.
  • Can print the graph size automatically before and after operations.
  • Import RDF file(s) into the graph.
  • Importing gzipped RDF files is possible with all data stores. MetaMed gunzips the file when reading.
  • Set the directory with SPARQL files. Each file holds one query, with prefixes or, when they are known to MetaMed, without them. All queries are automatically extended with built-in prefixes and processed.
  • Select output format for all queries.
  • Export the whole RDF graph to a file with specified RDF serialization.
  • Merge RDF files – using import and export options (e.g. with in-memory graph).
  • Use this tool for RDF serialization conversion (e.g. with an in-memory graph, you can use the RDF graph export option).
  • Pluggable file format support for extractors. You can implement your own plugins to support other file formats.
  • Parallel data processing - you can set the number of threads for:
    • meta data extraction
    • querying data by SPARQL
    • graph export (when extracted by more threads)
  • Configuration file lookup order:
    • the user-specific file ~/.mre/connection.properties,
    • or the system-wide file /etc/mre/connection.properties,
    • or your own file specified by the -virt argument.
  • The content of the connection.properties file looks like:
            url=jdbc:virtuoso://localhost:1111
            user=username
            password=password
            graph=https://mre.zcu.cz/dataset/example

    The graph property is the default graph name. It is used when connecting to or opening the graph. You can select a different graph with the --graph argument.
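
The configuration lookup described above can be illustrated with a minimal Python sketch. It is not MetaMed's implementation: the function names are hypothetical, the precedence (an explicitly given file first, then the user file, then the system file) is an assumption, and the parser handles only plain key=value rows rather than the full Java properties syntax.

```python
from pathlib import Path

# Locations named in this README; the order in which they are tried is an assumption.
USER_FILE = Path.home() / ".mre" / "connection.properties"
SYSTEM_FILE = Path("/etc/mre/connection.properties")


def pick_connection_file(explicit=None, candidates=(USER_FILE, SYSTEM_FILE)):
    """Return the explicit file if given, else the first existing candidate, else None."""
    if explicit is not None:
        return Path(explicit)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def parse_properties(text):
    """Parse simple key=value rows (a simplification of Java .properties files)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank rows and comments
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


# The example content shown in this README.
EXAMPLE = """\
url=jdbc:virtuoso://localhost:1111
user=username
password=password
graph=https://mre.zcu.cz/dataset/example
"""
```

Calling parse_properties(EXAMPLE) returns a dictionary with the four keys url, user, password and graph; splitting on the first "=" only keeps the colons in the JDBC URL intact.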

Input and output format support

Implemented support of file formats:

  • DASTA – DAta STAndard (in the Czech language only) is the national standard for medical data issued by the Ministry of Health of the Czech Republic,
  • HL7 – Health Level Seven – partial/development support only, because it is not (yet) a national standard in the Czech Republic,
  • DICOM – Digital Imaging and Communications in Medicine,
  • Stroke data – our internal XML format for diagnostic data and treatment results during three months after the stroke incident,
  • PNG – Portable Network Graphics,
  • JFIF/JPEG – JPEG File Interchange Format/Joint Photographic Experts Group,
  • TIFF – Tag Image File Format.
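
Input files in these formats can be selected through a list file (one path per row) or by walking input directories recursively, as the Features section describes. The Python sketch below illustrates that selection step only. The function names are hypothetical, and skipping blank rows is an assumption, not documented MetaMed behaviour.

```python
import os
from pathlib import Path


def read_file_list(list_file):
    """Read one path per row; blank rows are skipped (an assumption)."""
    with open(list_file, encoding="utf-8") as handle:
        return [Path(row.strip()) for row in handle if row.strip()]


def expand_inputs(paths):
    """Yield plain files as they are and walk directories recursively."""
    for path in paths:
        path = Path(path)
        if path.is_dir():
            for root, _dirs, files in os.walk(path):
                for name in sorted(files):
                    yield Path(root) / name
        elif path.is_file():
            yield path
```

A caller would pass the result of read_file_list to expand_inputs and hand each yielded path to an extractor for the matching file format.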

RDF serialization formats support:

  • RDF/XML,
  • RDF/XML-ABBREV,
  • TURTLE,
  • N3,
  • N-TRIPLE.
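
Files in any of these serializations may also be imported in gzipped form, since MetaMed gunzips them when reading. A minimal Python sketch of such transparent decompression follows. Detecting gzip by its two magic bytes is an assumption made for illustration (MetaMed may decide differently, for example by file extension), and open_rdf_text is a hypothetical name.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def open_rdf_text(path, encoding="utf-8"):
    """Open an RDF file as text, decompressing it on the fly if it is gzipped."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

The returned object is an ordinary text stream in both cases, so a parser reading from it does not need to know whether the file was compressed.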

Supported (query) output formats:

  • text,
  • csv, tsv,
  • rdf,
  • sse,
  • json,
  • xml and xmlstring.

SPARQL is supported for RDF data queries.
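
The Features section notes that queries are automatically extended with built-in prefixes. The Python sketch below shows one way such an extension could work. The prefix map is a hypothetical two-entry stand-in for MetaMed's real built-in map, and add_known_prefixes is an illustrative name, not part of the application.

```python
import re

# Hypothetical prefix map; MetaMed's actual built-in map is not listed in this README.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Matches "PREFIX name:" declarations; the name is empty for the default prefix.
DECLARED = re.compile(r"\bPREFIX\s+([A-Za-z][\w.-]*)?\s*:", re.IGNORECASE)


def add_known_prefixes(query, prefixes=PREFIXES):
    """Prepend PREFIX rows for every known prefix the query does not declare itself."""
    declared = {match.group(1) or "" for match in DECLARED.finditer(query)}
    missing = [
        f"PREFIX {name}: <{iri}>"
        for name, iri in prefixes.items()
        if name not in declared
    ]
    return "\n".join(missing + [query])
```

A query that already declares rdf but uses rdfs without declaring it would get only the rdfs declaration prepended, so existing declarations are never duplicated.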