Disclaimer: this is a long post - to be fair, it has been over 280 days since the last update!
It's been a long few months for the Humble Foundation. With Muhammad Tim moving to Dubai, we haven't been able to update you as often as we would have hoped. Having said that, we all feel that the charity is in a better position now than at any time previously, and all praise is for Allāh.
With Muhammad Tim now settled, and able to give a significant portion of his time to the Humble Foundation projects, we are on schedule to meet our 2015 deadlines, in shā' Allāh. However, we recognise that we need to improve our updates and communication, hence the new blog!
I'm working almost full-time on Humble Foundation projects for the next couple of months, since I'm not teaching. The primary goals are to finish volume eight of Tafseer as-Sa'dī (keep an eye out for that on Twitter, which I'm intending to start up again next week), to finish the first draft of Allah's Names, and to complete Welcome to Islam. All of these are scheduled for release before Hajj, and I very much hope that, with the help of Allāh, they will provide the critical mass needed to push the foundation to the next level. I'm quietly confident that, with the help of Allāh and then a strong push from the volunteers, we can meet our deadlines for the 2014/15 projects.
We also have some legal and regulatory things to tie up, with Basak Omar taking a lead on that during his summer break from the Islamic University of Madeenah.
Apart from keeping you updated, we thought it would be nice to share with you some of the technology that we are using to assist with translation. Looking at our projects for 2014/15, two involve major amounts of translation: The Qur'an Project, including the translation of Tafseer as-Sa'dī; and Allah's Names. As a small organisation with a limited budget, we needed to find a way to facilitate the translation of large amounts of Arabic text (Tafseer as-Sa'dī runs to almost 2000 pages) into English. We also wanted to achieve a high level of consistency despite allowing community contributions, and to keep the translation easy to maintain, as we expect that there will be several draft releases before an official release is made. Add the need for robust backup to the mix, and we have a challenge on our hands!
Our starting point was Penflip, a collaborative writing platform built on Git. The benefits for us included:
- Git is extremely robust and well tested for managing teams of people editing and contributing to text.
- Git is a popular open standard, meaning that should Penflip fail, it is easy to migrate to other services.
- We already use Git for managing our websites, so the learning curve is small.
- All changes are managed and preserved, and multiple backups are easily managed.
- The content is stored as plain text files, formatted with Markdown, meaning that we are not tied into any proprietary software or expensive licensing, and the files are open for all and easy to manipulate.
- Penflip provides a nice wrapper for all of this, in such a way that non-technical people can easily contribute, without being trained in the underlying technology.
However, like any tool, there are some limitations, including:
- Right-to-left support is poor (as is the case with plain text in general). This is partially the fault of Markdown, and partially because Penflip doesn't have any clever language detection to make up for it.
- It doesn't help with translation accuracy, consistency, and so on.
- Some small but annoying bugs/caveats.
So, we started looking at tools to help with translation, and came across SDL Trados Studio, an industry-leading CAT (computer-assisted translation) tool. The benefits for us included:
- An interface specifically designed for translation.
- Excellent consistency, through the use of terminology databases.
- A tool that 'learns' from the translator, and offers suggestions based on previously translated text.
- Something much more sophisticated than Google Translate, with access to multiple sources of translation and translation engines.
- A significant increase in productivity over time.
Once again, there were some concerns:
- It's not cheap - editions start at several hundred pounds per license, running to thousands of pounds for the top packages.
- It's proprietary, and so we are locked in to using the product.
- It doesn't support community collaboration, at least not without expensive server software.
- It doesn't handle all input file formats equally. Right-to-left Arabic PDF documents seem the worst of the lot, although that is most likely because they are badly produced in the first place.
- Some small but annoying bugs/caveats - for such expensive software, it isn't as polished as you would expect.
In the end, it was a special offer that sold it for us, with one of the top packages on sale for around £450, including a free upgrade to the 2015 version on release. That would give us a minimum of 18 months before wanting to upgrade - in shā' Allāh - and conceivably more than that. After a 30-day free evaluation, and doing some calculations on both current and future translation volumes, it was well worth the investment, especially if it plays a big role in producing the Qur'an and Tafseer translations. To put this in context, we were willing to pay up to £5,000 to purchase the rights to Tafseer as-Sa'dī from one of the existing organisations who were translating it, but were unable to reach a deal. If this software, after the help of Allāh, makes it possible to complete the translation within the allotted time, and with no additional cost other than volunteer time, and a better standard of translation than the one that we were trying to purchase, then the investment has paid for itself many times over. Of course, we can't be sure that this will be the case, but based on the evaluation, we were happy to proceed.
What followed was several days of training and the odd post to SDL's support forum. Since the software uses databases of existing translations to learn, as well as terminology databases, we needed to feed it something to get started. What better than some existing (copyright-free) translations of the Qur'an! At first we tried the inbuilt alignment tool, but soon gave up: it was a nightmare to get the Arabic and English aligned, so we needed another way to get the translations into a format that the software would recognise. An SDL engineer recommended Excel, with Arabic in one column and English in the other; a free add-on package called Glossary Converter does the rest. This is the process we settled upon:
- Download the text of the Qur'an. We used a combination of Quran.com, SearchTruth (because its HTML tables paste nicely into Word and Excel), and Tanzil, with quite a bit of cleanup needed.
- Open the text file in Microsoft Word.
- Use a wildcard replace "[0-9]* |[0-9]* |" to remove the sūrah and āyah numbers from Tanzil, as well as some Word macros in VBA to clean things up for the other site.
- Copy and paste into Excel, and check everything lines up properly.
- Use Go To...Special...Blank Cells to remove any blank rows.
- Confirm that the Arabic and English match throughout the document.
- Import the Excel file into Glossary Converter, and output to TMX.
- Import the TMX file into Trados Studio and use as a translation memory, build auto-suggest dictionaries, and so on.
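For anyone who would rather script the cleanup than do it by hand in Word and Excel, the steps above could be sketched roughly as follows. This is not the process we actually used, just an illustration. It assumes each downloaded line looks like `surah|ayah|text`, that lines starting with `#` are notes rather than text, and that the Arabic and English files contain the same āyāt in the same order. The function names are made up for this example.

```python
import csv
import re

# Assumed line layout: "<surah>|<ayah>|<text>", e.g. "2|255|...".
REFERENCE_PREFIX = re.compile(r"^\d+\|\d+\|")


def strip_reference(line):
    """Remove a leading 'surah|ayah|' prefix and surrounding whitespace."""
    return REFERENCE_PREFIX.sub("", line.strip()).strip()


def clean_lines(lines):
    """Strip prefixes, then drop blank lines and '#' note lines (assumed)."""
    cleaned = (strip_reference(line) for line in lines)
    return [text for text in cleaned if text and not text.startswith("#")]


def pair_segments(arabic_lines, english_lines):
    """Line up Arabic and English, refusing to continue if the counts differ."""
    arabic = clean_lines(arabic_lines)
    english = clean_lines(english_lines)
    if len(arabic) != len(english):
        raise ValueError(
            f"Mismatch: {len(arabic)} Arabic vs {len(english)} English segments"
        )
    return list(zip(arabic, english))


def write_two_column_csv(pairs, path):
    """Write Arabic in column one and English in column two.

    'utf-8-sig' adds a byte order mark, which helps Excel detect the encoding.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Arabic", "English"])
        writer.writerows(pairs)
```

The resulting CSV can be opened in Excel for the final visual check before it goes to Glossary Converter. The count check only catches missing or extra lines; it cannot tell whether a given Arabic line really matches its English neighbour, so the manual confirmation step still matters.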
This gave us another idea: that we should share these translation memory files and dictionaries once they are complete. These are a huge investment for any translator, and we are a not-for-profit organisation, so there's no reason not to share! While the software is expensive, the cheaper 'freelance' versions are not so bad, and by providing our dictionary files and translation memories, we hope to be able to contribute to a generally higher standard of specialist Islamic translation, as well as to encourage the use of professional translation tools, as opposed to having our best translators do everything in Microsoft Word.
For the next step in our project, we were not content to lose all of the benefits of Penflip/Git, and so we started looking for a way to integrate the two. Since Trados Studio is perfectly capable of handling plain text files, and Git (at least for most of the time) is nothing more than a collection of plain text files, we reasoned that it should be possible to take the input from our Penflip Git repository, pull it into Trados Studio, and then output to the same place. This is a work in progress, but so far, things are going well. In the end, we hope that we can automatically publish our updates on a regular basis to Penflip (or another Git-based solution), and get the best of both worlds, including community contributions and the robust change management that Git provides.
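To give a feel for what that round trip might look like, here is a minimal sketch. It is only an illustration of the idea, not our working setup: the folder layout, the `*.md` pattern and the function names are all assumptions, and the translation step in Trados Studio itself still happens by hand in between.

```python
import shutil
from pathlib import Path


def sync_text_files(source_dir, target_dir, pattern="*.md"):
    """Copy matching text files from one folder tree to another.

    Relative paths are preserved and Git's own '.git' folder is skipped.
    Returns the list of files written.
    """
    source_root, target_root = Path(source_dir), Path(target_dir)
    written = []
    for source in sorted(source_root.rglob(pattern)):
        relative = source.relative_to(source_root)
        if ".git" in relative.parts:
            continue
        target = target_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        written.append(target)
    return written


def publish_commands(message):
    """Return the Git commands that would publish the updated files."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]
```

The same function covers both directions: call it once to copy files out of the repository into a working folder for the translation tool, and again with the arguments swapped to bring the finished files back. Each command list could then be passed to `subprocess.run(command, cwd=repo_dir, check=True)` to commit and push.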
Finally, we are at the stage where we are starting to think about how we will publish our first drafts. Expensive typesetting is out of the question (at least until the first major release), and isn't really compatible with our aim to be open for everyone to use and benefit from (how many people have a license for Adobe InDesign and the knowledge of how to use it?). We're currently researching LaTeX (no, not the rubber material - the computer software for typesetting and document preparation), specifically XeLaTeX and Polyglossia. The idea is to eventually migrate away from Markdown (or at least to use some sort of hybrid), and move to storing the text using LaTeX formatting and templates. If all goes well, with the help of Allāh, we should be able to automatically generate PDF and other formats, with at least semi-professional typesetting and layout. Once again, any templates which are developed will be shared with the community, in shā' Allāh.
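As a rough idea of what automatic generation could involve, the sketch below builds a small XeLaTeX source file from pairs of Arabic and English text. It is an early illustration, not our template: the Amiri font, the `article` class and the function names are assumptions, and a real layout would need far more care.

```python
# Characters with a special meaning in LaTeX, and their escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one at a time (no double escaping)."""
    return "".join(LATEX_SPECIALS.get(char, char) for char in text)


def build_document(pairs, arabic_font="Amiri"):
    """Return XeLaTeX source with each Arabic passage above its English text.

    'arabic_font' must be a font installed on the machine; Amiri is assumed.
    """
    lines = [
        r"\documentclass{article}",
        r"\usepackage{polyglossia}",
        r"\setmainlanguage{english}",
        r"\setotherlanguage{arabic}",
        r"\newfontfamily\arabicfont[Script=Arabic]{" + arabic_font + "}",
        r"\begin{document}",
    ]
    for arabic, english in pairs:
        # polyglossia's Arabic environment switches to right-to-left text.
        lines.append(r"\begin{Arabic}")
        lines.append(escape_latex(arabic))
        lines.append(r"\end{Arabic}")
        lines.append("")
        lines.append(escape_latex(english))
        lines.append("")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

Saving the returned text as a `.tex` file and running `xelatex` on it should produce a PDF, provided the chosen Arabic font is installed.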
What's exciting about all of this from our point of view is that we are pushing boundaries and thinking outside of the box; we aren't content with just translating. Instead, we are aiming, with the help of Allāh, to develop a more efficient process and toolset for translation, that can be used and shared with others around the world. If successful, this could potentially have a greater impact than the translations themselves, particularly as commercial publishing of quality Islamic books is so rare, and so fraught with problems.
Looking forward to sharing some of our work with you, and wishing you all a successful and productive Ramaḍān!
Muhammad Tim & Basak Omar