ding2tei-haskell
================

A program to convert the Ding [0] dictionary into the TEI [1] format, to be
used within the FreeDict project [2].

This program was written as part of a Bachelor's thesis.  See doc/thesis.pdf.


Dependencies
------------

* main ding2tei program:
  * Haskell (Glasgow Haskell Compiler / GHC)
    * base
    * containers (Data.Map, Data.Set)
    * transformers (Control.Monad.Trans.{State,Writer})
    * pretty (Text.PrettyPrint)
    * haskell-xml (Text.XML.Light)
    * safe (Safe.Exact)
  * Happy
  * Alex
* preprocessing:
  * Bash
  * sed (with support for `-E`; e.g., GNU sed, OpenBSD's sed)
* `make` (optional)
  * GNU make
  * standard UNIX utilities (provided by, e.g., GNU coreutils)
  * curl (`make download-ding-1.9`)
  * xz (`make download-ding-1.9`)

On Debian-based distributions:

# apt install ghc libghc-xml-dev libghc-safe-dev happy alex bash sed make \
    coreutils curl xz-utils


Locale
------

Use a UTF-8 locale, otherwise some things might produce unexpected results or
even fail.  In particular, do not use the C locale.


Obtain the Ding dictionary
--------------------------

$ make download-ding-1.9

Alternatively, it can be manually downloaded from:

 https://ftp.tu-chemnitz.de/pub/Local/urz/ding/de-en/

and stored (uncompressed) at:

 dict/ding/de-en.txt


Build ding2tei
--------------

$ make

You may need to include `-dynamic` in HCFLAGS; e.g.:

$ make HCFLAGS='-O2 -Wall -dynamic'


Build TEI dictionaries
----------------------

$ make deu-eng
$ make eng-deu

Note: The main program (ding2tei) allocates a lot of memory, around 5.7 GiB,
      when applied to the full Ding dictionary.


Debug / Run ding2tei manually
-----------------------------

# TODO: make (alex, happy)
$ make haskell-source-files
$ ghci -i -ibuild:src -XRecursiveDo Test
>> parseLine $ scan "some dictionary :: line\n"
>> pretty $ parseLine $ scan (examples !! 0)
>> (Just ding, log) = parse $ scan $ header ++ (concat examples)
>> pretty ding
>> tei = ding2tei $ enrichDirected $ enrichUndirected ding
>> putStr $ prettyTEI $ tei
>> :q

$ ./ding2tei --help
$ ./ding2tei doc/examples/test.ding - | pager

See also:

 src/Main.hs
 GNUmakefile


Usage with the FreeDict tools
-----------------------------

To use the resulting TEI files with the FreeDict tools:

$ make -C dict/tei/lg1-lg2

Notes:
 * Above, lg1-lg2 should be either of deu-eng and eng-deu.
 * Memory usage of the FreeDict tools is even higher than that of ding2tei.
 * Running time is quite long (~ 1 day).
 * See <https://github.com/freedict/tools> for more information.


Licensing
---------

ding2tei-haskell is distributed under the GNU Affero General Public License
(AGPL), version 3 or later.

See the license headers in the source files and the license itself for details
(should be supplied with this program under the name COPYING).

The Ding source is licensed under the GNU General Public License (GPL), version
2 or later.  In the context of this program, version 3 of the license should be
considered, since it allows combining with the AGPLv3.  (Likely, later versions
will also allow combining.)

The combination of the Ding dictionary and this program form a combined work,
the resulting TEI dictionary is an "object form" of this combined work, in
terms of both the GPL and AGPL.


Author
------

ding2tei-haskell is written by Einhard Leichtfuß.
Feel free to contact me by e-mail to <alguien@respiranto.de>.


References
----------

[0] https://www-user.tu-chemnitz.de/~fri/ding/
[1] https://tei-c.org/
[2] https://freedict.org/
