Corpus Analysis – Part II

In the first part of the series, I covered the importance of corpus analysis and how a tool called AntConc can be used to learn more about your corpus in a smart, efficient way. This second part focuses on the Clusters/N-Grams feature in AntConc and includes tips and techniques for using it effectively. Clusters/N-Grams: This…

Corpus Analysis – Part I

The Introduction: As you probably know, Statistical Machine Translation (SMT) needs considerably large amounts of text data to produce good translations. We are talking about millions of words. At the same time, SMT can translate millions of words relatively fast (VERY fast, in comparison to human translators). In this scenario, and speaking…

Visualizing Translation Quality Data – Part I

There is no knowledge that is not power. We are, no doubt, living some of the most exciting days of the Information Age. Computers keep getting faster, smartphones are ubiquitous. Huge amounts of data are created daily by amazingly diverse sources. It is definitely easier than ever to gather data for language services buyers and…

Polysemy in Statistical MT – Tips for Linguists

A quick introduction: Perhaps Machine Translation would be solved by now if each word had only one meaning and, thus, only one possible translation. That is obviously not the case in the world we live in. Words have multiple meanings, and that is what we call polysemy. In the same way a concept can be…

Linguists: Create Your Own Customized Google Search Engine

Not long ago, I learned that Google lets you create your own search engine. Google Custom Search was basically designed to allow website owners to add a search engine for their sites and help users find the information they are looking for. Now, since you can customize this engine, you get to choose which site(s)…

The Basics of Quality Estimation

Quality Estimation is a method used to automatically provide a quality indication for machine translation output without depending on human reference translations. In simpler terms, it’s a way to find out how good or bad the translations produced by an MT system are, without human intervention.

Google Hacks for Linguists

Google has definitely changed the way in which translators and linguists work, and it has become our main tool for language research. It does make sense to know how to make the most out of our Google searches to find exactly what we need amidst mountains of information.

Better, Faster, and More Efficient Post-editing

Post-editing (PE) is correcting machine translation output. There isn’t a simpler way to put it. However, in spite of the simple definition, there is a lot that is still unclear about post-editing. The fact that there are different degrees of post-editing, and the lack of a general PE business model companies can use, certainly do not help. Let’s try to clarify some of these issues and share some tips for better PE.

5 Tools to Build Your Basic Machine Translation Toolkit

If you are a linguist working with Machine Translation (MT), your job will be a lot easier if you have the right tools at hand. Having a strong toolkit, and knowing how to use it, will save you loads of time and headaches. It will also help you work efficiently. As…