As we wrote in our last blog post:
Finally, we are at the stage where we are starting to think about how we will publish our first drafts. Expensive typesetting is out of the question (at least until we produce a major release), and isn't really compatible with our aim to be open for everyone to use and benefit from (how many people have a license for Adobe InDesign and the knowledge of how to use it?)...
So, this week - in shā' Allāh - we'll be sharing what we've learnt so far. This is urgent because with every day that goes by, more translation content is produced (at least in theory!). Our current format, Markdown, is not good enough to produce high-quality output, and a lot of work would be needed for typesetting. Paying a professional would be expensive, and doing it in-house would be time-consuming. Worst of all, there would be no way to update the content as new sections are translated, at least not without redoing part of the work. Therefore, we need a solution that will take our text and produce a semi-professional book layout automatically. The current working theory: LaTeX.
Once again, we wanted to streamline the workflow, so we don't want to have to convert from one format into another. This is where the website sharelatex.com comes in. The service is very similar to Penflip, but uses LaTeX as the file format. It supports XeLaTeX and Polyglossia, which means that it has excellent right-to-left Arabic and font support, and you can edit the content directly online, as well as receive contributions from other users. The free account doesn't have git integration, so we settled on the student account (there's nothing in the terms and conditions that says that you have to be a student!), which costs $8 per month. It is possible to get an account with Dropbox support for free on sharelatex.com by recommending a few friends, but Dropbox doesn't give us the same amount of power when it comes to managing the large number of files and - in shā' Allāh - the large number of contributors that we are aiming for. We could host our own copy of the software without paying for a licence, but this would cost somewhere in the region of $40 per month in server fees, so right now, it isn't worth it.
This is our new workflow:
- Create a blank project on sharelatex.com (see our Qur'an Project here).
- Link the project to our account at github.com.
- "Clone" (i.e., download) the project using GitHub for Mac.
- Generate the project layout.
- Import the individual files into our translation software, SDL Trados Studio.
- Translate and save the files, with regular "commits" (i.e., saves) to the git repository.
- "Push" (i.e., upload) the changes using GitHub for Mac.
- Synchronise the changes with sharelatex.com, tweak and preview.
- Generate PDF and other formats using sharelatex.com and, in future - in shā' Allāh - custom tools for things like eBooks and online content.
This workflow allows us to do some pretty advanced things, including semi-professional typesetting, automatic PDF generation, user contributions (we get 6 'power users' who can contribute via sharelatex.com, but we can accept an unlimited number of contributions via github.com), selective undo (undo one change made three weeks ago and keep everything else since then), branches (where the new version of the book is worked on separately and doesn't affect the previous version), and version tagging (so everyone knows what version of the book they have, and can keep up to date with changes). Most importantly, we can translate directly within these files, so there is no need to copy and paste between translation software and typesetting software.
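For readers who prefer the command line to GitHub for Mac, selective undo, branches, and version tagging each correspond to a standard git command. The following Python sketch only builds and prints those commands rather than running them. The function name, commit ID, branch name, and tag name are hypothetical placeholders, not values from our repository.

```python
import shlex


def version_control_commands(commit_id, branch_name, tag_name):
    """Build (but do not run) the git commands for three common operations."""
    return {
        # Selective undo: add a new commit that reverses one earlier commit.
        "selective undo": ["git", "revert", "--no-edit", commit_id],
        # Branch: start a separate line of work for the next version of the book.
        "branch": ["git", "checkout", "-b", branch_name],
        # Version tag: attach an annotated label to the current commit.
        "version tag": ["git", "tag", "-a", tag_name, "-m", f"Release {tag_name}"],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    commands = version_control_commands("abc1234", "second-edition", "v0.1")
    for purpose, argv in commands.items():
        print(f"{purpose}: {shlex.join(argv)}")
```

Because a revert is itself a new commit, everything done since the unwanted change stays in place.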
Future research includes getting the most out of the LaTeX system (it's quite complex to learn), integrating the King Fahd Qur'an Printing Complex fonts into the output, as well as working on automatically generating formats other than PDF, such as automatically updating the website with the latest translation, and automatically producing eBooks. All of these things are - with the help of Allāh - possible, at least according to initial research, but there's a lot of work needed to actually get there.
The last topic for the blog post today is step 4 in the workflow above: generating the project layout. One of the nice things that LaTeX allows is building up a document out of lots of smaller files, as well as choosing parts of a file to include or exclude. If we look at the content that we have for the Qur'an project, we can summarise it as follows:
- The Arabic of each aayah, currently an image from quran.com.
- Our Qur'an translation of each aayah, which may include footnotes.
- Alternative translations for other styles of recitation, such as Warsh.
- Our Tafseer translation, for each aayah.
- Sūrah titles, including alternate names for each.
- Translators' comments, including the rationale for choosing particular words.
- Tags, so that you will be able to search aayaat by tag and topic.
- Some poetry (more on that another time!).
Now, it's pretty obvious that not all of this will be published in the same document, and some will not be published at all (at least not in the traditional way). We might even have different versions of the Qur'an translation, some of which will include things that others don't. The web version will need to include tags, and we could have two print versions, one with and one without footnotes. We might want to produce a translation only for Warsh, or a translation that compares both Warsh and Ḥafṣ. The book of tafseer is likely to include some parts of the Qur'an translation (at least the Arabic aayaat), but not all. Right now, the working theory is to have one file per aayah, and then use some LaTeX features to include or exclude content. Then, we can build up each sūrah from the smaller files, and also build up different versions, according to which content we want. A few large 'main' files bring all of these together into one book. We're not sure how far we need to take this, and at what point we might decide to make separate projects; but for now, we'd like to keep everything in one place.
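One LaTeX feature that could do the including and excluding is the TeX conditional created by `\newif`. The sketch below is in Python and simply renders the text of one aayah file, with each type of content wrapped in its own conditional. The flag names (`\ifshowtranslation` and so on), the function name, and the placeholder text are all assumptions for illustration, and this is not necessarily the mechanism we will settle on.

```python
def render_aayah_file(parts):
    """Return LaTeX source for one aayah file.

    `parts` maps a content type (letters only, e.g. "translation") to its text.
    Each part is wrapped in a TeX conditional so an edition can switch it on or off.
    """
    lines = []
    for kind, text in parts.items():
        lines.append(f"\\ifshow{kind}")  # flag assumed to be declared with \newif elsewhere
        lines.append(text)
        lines.append("\\fi")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder text only; real content would come from the translation files.
    sample = {
        "translation": "(translation of the aayah)",
        "tafseer": "(tafseer for the aayah)",
        "comments": "(translators' comments)",
    }
    print(render_aayah_file(sample))
```

With this approach, a document that wants the tafseer would declare the flag with `\newif\ifshowtafseer` and switch it on with `\showtafseertrue` before reading the aayah files; a flag left at its default of false hides that content.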
This is a sample of the current layout for Sūrah al-‘Aṣr:
- File: 103-001.tex
- File: 103-002.tex
- File: 103-003.tex
- File: 103.tex
The last file gathers together the content of files -001 to -003, along with the sūrah title and associated information. Then at a higher level, we have something like:
- File: main.tex
This gathers together 001.tex to 114.tex, including all of the content. Then, we have something like:
- File: quran-translation-hafs.tex
- File: tafseer-as-sadi.tex
These simply manipulate main.tex to include or exclude certain types of content. Finally, we have something like:
- File: quran-translation-hafs-for-pdf.tex
- File: quran-translation-hafs-for-ebook.tex
- File: quran-translation-hafs-for-web.tex
These apply some custom formatting, possibly with some further including or excluding of content. We're currently testing this system.
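The layering described above (aayah files, a file per sūrah, main.tex, then one file per edition) can be sketched as a few small Python functions that produce the LaTeX source for each level. This is an illustration under assumptions: the function names are made up, the sūrah file is assumed to be named like 103.tex, the flag names follow a hypothetical `\ifshow...` pattern, and the sūrah title information is left out.

```python
def surah_file(surah, aayah_count):
    """LaTeX source for one sūrah file, e.g. 103.tex reading 103-001 to 103-003."""
    inputs = [f"\\input{{{surah:03d}-{n:03d}}}" for n in range(1, aayah_count + 1)]
    return "\n".join(inputs) + "\n"


def main_file(surah_count=114):
    """LaTeX source for main.tex, reading 001.tex to 114.tex in order."""
    return "\n".join(f"\\input{{{s:03d}}}" for s in range(1, surah_count + 1)) + "\n"


def edition_file(flags):
    """LaTeX source for an edition file: declare and set each flag, then read main.tex.

    `flags` maps a hypothetical content type (letters only) to True or False.
    """
    lines = []
    for name, enabled in flags.items():
        lines.append(f"\\newif\\ifshow{name}")
        lines.append(f"\\show{name}{'true' if enabled else 'false'}")
    lines.append("\\input{main}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(surah_file(103, 3))
    # A print edition without footnotes, as one possible example.
    print(edition_file({"translation": True, "footnotes": False}))
```

Keeping the flags in the edition files means that the aayah files, the sūrah files, and main.tex never need to change when a new edition is added.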
However, this gives us another problem, albeit a small-ish one: how do we go about generating all of these files? Cutting and pasting over 6,000 aayaat of the Qur'an, along with 1,900 pages of tafseer, doesn't sound like too much fun. We need to auto-generate the files and, crucially, the basic content within each file. For this, we're using a Ruby script and the Nokogiri library. The idea is that the script will pull the data from King Saud University's Qur'an page and from quran.com. For those interested in how we're doing it, there are some scripts that can be found on our GitHub quran page, in the _scripts folder. We're also using this excellent image generation project from the people at quran.com.
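Our own scripts are written in Ruby, so the following Python sketch is not that code. It only illustrates the file-creation half of the job: given the number of aayaat in each sūrah, it writes one placeholder file per aayah using the 103-001.tex naming pattern. Fetching the text from the web is left out entirely, and the function name and the placeholder comment written into each file are assumptions.

```python
from pathlib import Path
import tempfile

# Aayah counts for a few short sūrahs; a full run would need all 114 entries.
AAYAH_COUNTS = {1: 7, 103: 3, 108: 3, 112: 4}


def generate_skeleton(out_dir, aayah_counts):
    """Create one placeholder .tex file per aayah and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for surah, count in sorted(aayah_counts.items()):
        for aayah in range(1, count + 1):
            path = out_dir / f"{surah:03d}-{aayah:03d}.tex"
            if path.exists():
                continue  # never overwrite a file that may already hold a translation
            path.write_text(f"% Surah {surah}, aayah {aayah}\n", encoding="utf-8")
            written.append(path)
    return written


if __name__ == "__main__":
    # Write into a temporary directory so that the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        files = generate_skeleton(tmp, AAYAH_COUNTS)
        print(len(files), "files written, first:", files[0].name)
```

Skipping files that already exist makes the script safe to run again after translation has started, which matters once people are committing work to the same folder.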
That's more than enough information for now!
Muhammad Tim & Basak Omar