About
MetaMed is a meta data extraction tool for heterogeneous data in different file formats. It is a non-interactive, open source command line application.
MetaMed can also be used as a simple SPARQL query tool and even as an RDF serialization format converter.
Supported file formats: DASTA, DICOM, SITS (CSV, XML); image/multimedia files and file system directory structure can be indexed to RDF, too.
Features
- Supported operations:
  - empty the graph,
  - import RDF files into the graph,
  - extract meta data into the graph,
  - export RDF data from the graph,
  - query RDF data using SPARQL.
- Sequential processing of operations. You activate one or more operations by selecting command line options.
- Extract meta data from selected files or from whole input data directories, processed recursively.
- Use a list file to select the input files for meta data extraction. It contains one (absolute or relative) file path per line.
- The graph can be stored:
  - in memory only,
  - in memory, but automatically backed by a specified file system root directory (Note: If the graph already exists in the file system, it is loaded into memory automatically, and it is stored back to the file system before the application exits. The load/store delay is significant for a large graph.),
  - in an OpenLink Virtuoso OpenSource database.
- Export an application built-in prefix map into a text file.
- Set the graph name.
- Empty the graph.
- Can print the graph size automatically before and after operations.
- Import RDF file(s) into the graph.
- Gzipped RDF files can be imported with all data stores. MetaMed gunzips the file while reading it.
- Set the directory with SPARQL query files, one query per file. A query may omit declarations for prefixes known to MetaMed: all queries are automatically extended with the built-in prefixes before they are processed.
- Select output format for all queries.
- Export the whole RDF graph to a file with specified RDF serialization.
- Merge RDF files using the import and export options (e.g. with an in-memory graph).
- Use this tool for RDF serialization conversion (e.g. with an in-memory graph, using the RDF graph export option).
- Pluggable file format support for extractors. You can implement your own plugins to support other file formats.
- Parallel data processing. You can set the number of threads for:
  - meta data extraction,
  - querying data with SPARQL,
  - graph export (when extracted by multiple threads).
- Configuration file lookup order:
  - the user-specific file ~/.mre/connection.properties,
  - or the system-wide file /etc/mre/connection.properties,
  - or your own file, specified by the -virt argument.
- The content of the connection.properties file looks like:
    url=jdbc:virtuoso://localhost:1111
    user=username
    password=password
    graph=https://mre.zcu.cz/dataset/example
The graph property sets the default graph name, which is used when connecting to or opening the graph. You can select a different graph with the --graph argument.
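The configuration handling described above can be illustrated with a minimal Python sketch. It is not MetaMed's implementation: the function names (parse_properties, pick_config, effective_graph) are hypothetical, and the precedence used here (an explicitly given file first, then the first existing candidate) is an assumption, since the text does not state which file wins when several exist.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines; blank lines and #/! comments are skipped."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split at the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def pick_config(explicit: Optional[Path], candidates: Iterable[Path]) -> Optional[Path]:
    """Assumed precedence: an explicit file wins, else the first existing candidate."""
    if explicit is not None:
        return explicit
    for path in candidates:
        if path.is_file():
            return path
    return None


def effective_graph(props: Dict[str, str], override: Optional[str] = None) -> Optional[str]:
    """The 'graph' property is the default; a command line value replaces it."""
    return override if override is not None else props.get("graph")


# The example content shown above, one property per line.
EXAMPLE = """\
url=jdbc:virtuoso://localhost:1111
user=username
password=password
graph=https://mre.zcu.cz/dataset/example
"""

# Candidate locations named in the text (user-specific, then system-wide).
CANDIDATES = [
    Path.home() / ".mre" / "connection.properties",
    Path("/etc/mre/connection.properties"),
]

props = parse_properties(EXAMPLE)
```

Calling effective_graph(props) returns the graph name from the file, while effective_graph(props, "urn:example:other") models overriding it on the command line.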
Input and output format support
Supported input file formats:
- DASTA – DAta STAndard (Czech language only), the national standard for medical data issued by the Ministry of Health of the Czech Republic,
- HL7 – Health Level Seven; partial/development support only, because it is not (yet) a national standard in the Czech Republic,
- DICOM – Digital Imaging and Communications in Medicine,
- Stroke data – our internal XML format for diagnostic data and treatment results during the three months after a stroke incident,
- PNG – Portable Network Graphics,
- JFIF/JPEG – JPEG File Interchange Format/Joint Photographic Experts Group,
- TIFF – Tag Image File Format.
Supported RDF serialization formats:
- RDF/XML,
- RDF/XML-ABBREV,
- TURTLE,
- N3,
- N-TRIPLE.
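
As an illustration of importing plain or gzipped RDF files in the serializations listed above, here is a small Python sketch. It is not MetaMed's code: the suffix-to-format table is an assumption based on conventional file extensions, and open_maybe_gzipped and guess_format are hypothetical helper names. Gzip detection uses the two-byte gzip magic number rather than the file name.

```python
import gzip
from pathlib import Path
from typing import IO, Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

# Assumed mapping from conventional file extensions to the format names above.
FORMAT_BY_SUFFIX = {
    ".rdf": "RDF/XML",
    ".xml": "RDF/XML",
    ".ttl": "TURTLE",
    ".n3": "N3",
    ".nt": "N-TRIPLE",
}


def open_maybe_gzipped(path: Path) -> IO[str]:
    """Open a text file, transparently decompressing it when it is gzipped."""
    with path.open("rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return path.open("r", encoding="utf-8")


def guess_format(path: Path) -> Optional[str]:
    """Guess the serialization from the extension, ignoring a trailing .gz."""
    suffixes = [s.lower() for s in path.suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes.pop()
    return FORMAT_BY_SUFFIX.get(suffixes[-1]) if suffixes else None
```

With these helpers, a file named data.ttl.gz would be read through gzip and treated as TURTLE, while a file with an unknown extension yields None and leaves the choice to the caller.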
Supported (query) output formats:
- text,
- csv, tsv,
- rdf,
- sse,
- json,
- xml and xmlstring.
SPARQL is supported for RDF data queries.
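
The automatic extension of queries with built-in prefixes, mentioned in the feature list, might work roughly as in the following Python sketch. This is an assumption about the behaviour, not MetaMed's implementation: the BUILTIN_PREFIXES table is a hypothetical stand-in for the application's prefix map, and the regular expression is a simplification that does not fully parse SPARQL.

```python
import re
from typing import Dict

# Hypothetical stand-in for the application's built-in prefix map.
BUILTIN_PREFIXES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

# Matches "PREFIX name:" (or "PREFIX :" for the default prefix), case-insensitively.
_DECLARED = re.compile(r"\bPREFIX\s+([A-Za-z][\w.-]*)?:", re.IGNORECASE)


def add_missing_prefixes(query: str, prefixes: Dict[str, str]) -> str:
    """Prepend a PREFIX line for every known prefix the query does not declare."""
    declared = {m.group(1) or "" for m in _DECLARED.finditer(query)}
    header = [
        f"PREFIX {name}: <{iri}>"
        for name, iri in sorted(prefixes.items())
        if name not in declared
    ]
    return "\n".join(header + [query]) if header else query


query = "SELECT ?s WHERE { ?s rdf:type rdfs:Class }"
extended = add_missing_prefixes(query, BUILTIN_PREFIXES)
```

Prefixes that the query already declares are left untouched, so a query file can still override a built-in prefix with its own IRI.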