An algorithm for segmentation?
Thread poster: CafeTran Trainer
CafeTran Trainer
Netherlands
Member (2006)
Feb 5, 2022

Since we are discussing segmentation at the Keyboard Maestro forum, I was wondering whether someone could provide an algorithm for segmentation, e.g. in VBA, BASIC, AppleScript, etc.

 
Joakim Braun (Identity Verified)
Sweden
Local time: 10:17
Swedish to German
Objective-C  Feb 5, 2022

Core Foundation and Cocoa on macOS. Not very portable, but it illustrates the general approach: an object that slices up a string at delimiters (in this case, the sentence-boundary rules built into CFStringTokenizer). Because the tokenizer takes a locale, this should work across many languages and writing systems. If locale were irrelevant, we wouldn't need the tokenizer and could reduce the code to a line or two.

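#import <Foundation/Foundation.h>

int main(void)
{
    @autoreleasepool
    {
        NSMutableArray<NSString*>* sentences = [NSMutableArray array];
        CFLocaleRef locale = CFLocaleCopyCurrent(); // owned reference, released below
        NSString* aStr = @"A string? With some sentences. In it!";

        // Let Core Foundation find sentence boundaries using the current locale's rules.
        CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, (__bridge CFStringRef) aStr, CFRangeMake(0, aStr.length), kCFStringTokenizerUnitSentence, locale);

        for(;;)
        {
            CFStringTokenizerTokenType tokenType = CFStringTokenizerAdvanceToNextToken(tokenizer);

            if(tokenType == kCFStringTokenizerTokenNone)
                break; // no more sentences

            // Token ranges are in UTF-16 units, the same units NSString uses.
            CFRange cfr = CFStringTokenizerGetCurrentTokenRange(tokenizer);
            [sentences addObject:[aStr substringWithRange:NSMakeRange(cfr.location, cfr.length)]];
        }

        // Both objects come from Create/Copy functions, so we must release them.
        CFRelease(tokenizer);
        CFRelease(locale);

        NSLog(@"%@", sentences);
    }
    return 0;
}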
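For comparison, here is a minimal locale-unaware sketch in plain C (which also compiles as Objective-C). It assumes that a sentence ends at a run of '.', '?' or '!' followed by whitespace or the end of the text. The function name `split_sentences` and the fixed buffer sizes are hypothetical choices for illustration; this is not how CFStringTokenizer works internally.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_SENTENCES 64     /* hypothetical limit for this sketch */
#define MAX_SENTENCE_LEN 256 /* hypothetical per-sentence buffer size */

/* Naive, locale-unaware rule: these characters may end a sentence. */
static int is_terminator(char c)
{
    return c == '.' || c == '?' || c == '!';
}

/* Splits text into at most max_out sentences and copies each one into out,
   without leading whitespace, truncated to MAX_SENTENCE_LEN - 1 bytes.
   A sentence ends at a run of terminators followed by whitespace or the end
   of the text. Returns the number of sentences written. */
size_t split_sentences(const char *text, char out[][MAX_SENTENCE_LEN], size_t max_out)
{
    assert(text != NULL && out != NULL);
    size_t count = 0;
    const char *p = text;

    while (*p != '\0' && count < max_out) {
        while (isspace((unsigned char)*p)) /* skip whitespace between sentences */
            p++;
        if (*p == '\0')
            break;

        const char *start = p;
        while (*p != '\0') {
            if (is_terminator(*p)) {
                while (is_terminator(p[1])) /* keep "..." or "?!" together */
                    p++;
                if (p[1] == '\0' || isspace((unsigned char)p[1])) {
                    p++; /* include the last terminator, then stop */
                    break;
                }
            }
            p++;
        }

        size_t len = (size_t)(p - start);
        if (len >= MAX_SENTENCE_LEN)
            len = MAX_SENTENCE_LEN - 1;
        memcpy(out[count], start, len);
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

Called on the sample string from the Objective-C example, it yields three segments: "A string?", "With some sentences." and "In it!". Its weakness is just as easy to see: "Ask Dr. Smith." comes back as "Ask Dr." and "Smith.", because nothing tells the splitter that "Dr." is an abbreviation. Rule-based segmenters typically handle such cases with exception lists, which this sketch deliberately leaves out.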

[Edited at 2022-02-05 20:35 GMT]