If I download and combine many translation memories, would the TM predictions become better?
Thread poster: Vitor Pinheiro
Vitor Pinheiro
Brazil
Local time: 14:50
English to Portuguese
Apr 21, 2022

I notice that I can keep copying translation memories from older projects to new projects in order to keep updating the database of one single TM file, and I think that would be useful even if the range of subjects varies, though I'm not sure.

So, wouldn't it also be useful to do this with as many TMs as I could find?

I'm not sure whether the prediction system would somehow become inaccurate if I increase the number of accurate but random translation units, as I'm trying to do. What are your thoughts on that?


 
Susan Welsh  Identity Verified
United States
Local time: 13:50
Russian to English
Doesn't make any sense to me Apr 22, 2022

I'm not the most knowledgeable person, but in my experience that's not what TMs are for. They don't make "predictions," but are used to retrieve similar or identical text that you already translated for another job. Adding a bunch of TMs on unrelated subjects would do nothing for you.

Maybe one of the more expert users will chime in.
You can also ask at https://sourceforge.net/projects/omegat/lists/omegat-users
 
Philippe Locquet  Identity Verified
Portugal
Local time: 18:50
English to French
Principles Apr 22, 2022

Having every species of wood in the world doesn't make you a carpenter.
It has often been said that a CAT tool is not supposed to translate for you.

As Susan stated, your TM helps you with similar text previously translated. To understand where you are going and what kind of expectations to have, here are some concepts to research and understand:
_TM is not MT
_Scores and TM matching (fuzzy, 100% etc.)
_Fuzzy Threshold
_Concordance (or TM lookup)

And maybe,
_Penalties

Once you understand this, you'll know better how you want to use/shape your TM to assist you.
Be well
 
Samuel Murray  Identity Verified
Netherlands
Local time: 19:50
Member (2006)
English to Afrikaans
Big momma TM Apr 22, 2022

Vitor Pinheiro wrote:
I notice that I can keep copying translation memories from older projects to new projects in order to keep updating the database of one single TM file, and I think that would be useful, even if the range of subjects varies, though I'm not sure.

What you describe is what's known in the industry as a big momma TM. It is a single TM that contains translations from multiple clients and multiple subject fields. Some translators prefer to use such a TM. There is nothing wrong with such an approach. I myself have used it for a number of years and I had no complaints. There are advantages, however, to having separate TMs for separate clients or separate subject fields.
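For anyone who wants to try this outside a CAT tool, here is a minimal sketch of combining several TMX files into one set of translation units. It is only an illustration, not how OmegaT or any other tool does it: the function names `read_units` and `merge_units` are made up for this example, language codes are matched exactly (so "en" will not match "en-US"), and the TMX content is passed in as strings to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_units(tmx_text, src_lang, tgt_lang):
    """Yield (source, target) pairs from one TMX document given as a string."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # Older TMX files use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also keeps text that sits around inline tags.
                segs[lang] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]

def merge_units(tmx_texts, src_lang="en", tgt_lang="pt"):
    """Combine several TMX documents, dropping exact duplicate pairs."""
    seen = set()
    merged = []
    for text in tmx_texts:
        for pair in read_units(text, src_lang, tgt_lang):
            if pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged

# Two tiny hypothetical memories that share one unit.
TM_A = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Save the file.</seg></tuv>
    <tuv xml:lang="pt"><seg>Salve o arquivo.</seg></tuv></tu>
</body></tmx>"""
TM_B = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Save the file.</seg></tuv>
    <tuv xml:lang="pt"><seg>Salve o arquivo.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Open the menu.</seg></tuv>
    <tuv xml:lang="pt"><seg>Abra o menu.</seg></tuv></tu>
</body></tmx>"""

merged = merge_units([TM_A, TM_B])
print(len(merged), "unique units")
```

Keeping a note of which client or subject each unit came from (not shown here) would let you split the big TM again later if you change your mind.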


 
Philippe Locquet  Identity Verified
Portugal
Local time: 18:50
English to French
as many TM i could find Apr 22, 2022

Samuel Murray wrote:
What you describe is what's known in the industry as a big momma TM.

True

Vitor Pinheiro wrote:
So, wouldn't it also be useful to do this with as many TMs as I could find?

However, putting in anything you can get your hands on just to make it big doesn't mean it will be useful.
If your new text is different from what's in your TM, you will still get very few suggestions (or suggestions that are quite different, if you lower your fuzzy threshold).
You may also end up depending on low-quality translations. Most translators are not keen on just handing over their TMs, so superior-quality TMs are not that easy to come by (even less so when they include content protected by an NDA, which in my view is a no-go for Big Momma TM strategies).
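To make the point about the fuzzy threshold concrete, here is a rough sketch using Python's standard `difflib`. The scoring is only a stand-in: real CAT tools compute match percentages with their own formulas, and the helper names `fuzzy_score` and `best_matches`, the sample sentences, and the threshold of 70 are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def fuzzy_score(a, b):
    """Rough similarity in percent; real CAT tools use their own formulas."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())

def best_matches(segment, tm, threshold=70):
    """Return (score, source, target) for TM entries at or above the threshold."""
    scored = [(fuzzy_score(segment, src), src, tgt) for src, tgt in tm]
    return sorted((m for m in scored if m[0] >= threshold), reverse=True)

# A tiny hypothetical TM mixing two unrelated subject fields.
tm = [
    ("Click the Save button.", "Clique no botão Salvar."),
    ("The patient reported mild headaches.",
     "O paciente relatou dores de cabeça leves."),
]

# A segment close to something already translated gets a suggestion...
related = best_matches("Click the Cancel button.", tm)
# ...while a segment from a new subject gets nothing above the threshold.
unrelated = best_matches("Tighten the bolt to 20 Nm.", tm)

print(related)
print(unrelated)
```

Adding more units only helps when some of them resemble the text you are about to translate; otherwise they stay below the threshold and never show up.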

I explain how to extract big TMs that are available for free in this video: https://youtu.be/wVeU9NKEYjM
DGT-TM has content of good and average quality; in my language pairs, I find that the files from later years tend to be of better quality.

Have fun experimenting!


 
tcordonniery
France
Local time: 19:50
RAM usage May 4, 2022

Hi all

I won't speak from a translator's point of view; other people here have already answered correctly. Here is a technical addition.

One potential problem with using a big memory is that OmegaT, unlike most other CAT tools, loads the full contents of the TMX files directly into the computer's RAM. If you have gigabytes of TMX in the tm/ folder, memory usage will be high and, since the data is not indexed, searches will become very slow.
Some people here will add that OmegaT also supports zipped TMX in the tm/ folder, but this only saves space on disk, not in memory, because the contents are unzipped in memory during loading.

What should be done instead:
1. Save all your TMX files in a database.
2. When you start a new translation, submit the source file to the database: as a result you will receive smaller TMX file(s) containing only the segments that have a chance of appearing in the results.
3. When your translation is finished, take the TMX file of the project (project_save.tmx or one of the level 1 or level 2 files, depending on whether you want tags or not) and add it to the database for next time.
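The three steps above can be sketched with Python's built-in `sqlite3`. This is a toy illustration of the workflow only, not how Elefas or Exilis are implemented: the table layout, the function names `open_store`, `add_units` and `extract_for`, and the 0.7 threshold are all assumptions, and the extraction does a naive full scan where a real tool would use an index.

```python
import sqlite3
from difflib import SequenceMatcher

def open_store(path=":memory:"):
    """Open (or create) the database that holds all translation units."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS units ("
               "source TEXT, target TEXT, UNIQUE(source, target))")
    return db

def add_units(db, pairs):
    """Steps 1 and 3: store translation units, ignoring exact duplicates."""
    db.executemany("INSERT OR IGNORE INTO units VALUES (?, ?)", pairs)
    db.commit()

def extract_for(db, segments, threshold=0.7):
    """Step 2: keep only units similar enough to some segment of the new file."""
    kept = []
    for source, target in db.execute("SELECT source, target FROM units"):
        if any(SequenceMatcher(None, source, seg).ratio() >= threshold
               for seg in segments):
            kept.append((source, target))
    return kept

# Step 1: load existing memories (hypothetical sample units).
db = open_store()
add_units(db, [
    ("Click the Save button.", "Clique no botão Salvar."),
    ("The patient reported mild headaches.",
     "O paciente relatou dores de cabeça leves."),
])

# Step 2: submit the segments of the new source file, get a small subset back.
subset = extract_for(db, ["Click the Cancel button.", "Close the window."])

# Step 3: after translating, feed the project's new units back in.
add_units(db, [("Click the Cancel button.", "Clique no botão Cancelar.")])
total = db.execute("SELECT COUNT(*) FROM units").fetchone()[0]
print(subset, total)
```

The subset returned in step 2 is what you would write out as a small TMX file and drop into the project, so the CAT tool only has to load the units that could actually produce a match.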

I did not find any convincing free/libre/open source database-based translation memory servers (there is MyMemory, which is free but not really open source; there are some other free servers like Wordfast TM, but they are not open source and OmegaT does not have a plugin to access them), so I wrote my own. Have a look at http://www.silvestris-lab.org/ and in particular at the projects Exilis and Elefas: these are two approaches to indexed translation memories (Elefas is a server, Exilis is a local index).

These tools are not linked to OmegaT (which has no notion of a translation memory plugin, although it does have the notion of a machine translation plugin), but since they work on the principle of batch extraction (the 3 steps described above) rather than segment-by-segment calls, they are potentially compatible with any kind of CAT tool, even without a plugin.

For the problem of "unrelated subjects" mentioned by Susan, Elefas has the notion of a collection: you can decide to have one collection per topic (or, as Samuel suggests, one per client), specify for each document which topic(s) it relates to, and include the collection as a filter during a search.

These two tools are at an early stage of development and probably hard to use. Don't hesitate to report the results of your tests and any problems you have with them; I am always happy if I can improve them. In any case they are open source (EUPL license). Note that these are only database tools; they do not contain any translation memory themselves (but you can try importing DGT-TM, mentioned in a previous post, for example).

Hope it helps...