08 April 2010

Removing all accents (and other diacritics) from a String

Valid since: op4j 1.0

Remove the diacritics common in European languages from the characters in a String, converting the text into an ASCII-compatible String. This is a common operation in, for example, text search and comparison scenarios.

Our conts variable is an array containing the names of the Earth's continents in Castilian Spanish:
//conts == ARRAY [ "África", "América", "Antártida", "Asia", "Europa", "Oceanía" ]
...and, knowing that our users might forget to type accents in our application, we need to strip all the accents from those texts so that searches are not affected by their bad orthography:
//conts == ARRAY [ "Africa", "America", "Antartida", "Asia", "Europa", "Oceania" ]

Use the op4j asciify() function in the FnString function hub class, which transforms accented characters (and other diacritics) into their non-accented equivalents.
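To get a feel for the kind of transformation involved, here is a rough JDK-only sketch. It is not op4j's implementation: the AccentStripper class and its stripAccents method are hypothetical names, and the sketch assumes that decomposing the text (Unicode NFD) and then deleting the combining marks is a good enough approximation. Letters that do not decompose into a base letter plus a mark (such as "ß" or "ø") would pass through this sketch unchanged.

```java
import java.text.Normalizer;

// Hypothetical helper, for illustration only (not part of op4j).
class AccentStripper {

    // Decompose each character into base letter + combining marks (NFD),
    // then delete every combining mark (Unicode category M).
    static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "\u00c1frica" is "África", "Ocean\u00eda" is "Oceanía"
        System.out.println(stripAccents("\u00c1frica"));   // prints "Africa"
        System.out.println(stripAccents("Ocean\u00eda"));  // prints "Oceania"
    }
}
```

Unicode escapes are used in the literals only so that the example does not depend on the source file encoding.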

This function has to be applied to each element of the array, so a map(..) action is needed to execute it.

conts = Op.on(conts).map(FnString.asciify()).get();

This is, of course, a less verbose (but equivalent) way of writing:

conts = Op.on(conts).forEach().exec(FnString.asciify()).get();

But what if we also wanted uppercase output? Well, just throw in the FnString.toUpperCase() function:
conts = 
Or equivalently, we could chain both functions into one:
conts = 
    Op.on(conts).map(FnFunc.chain(FnString.asciify(), FnString.toUpperCase())).get();
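For readers who want to compare the chained form with plain JDK code, the following sketch composes two functions once and then maps the composite over the array. Everything in it is an assumption made for illustration: ChainedMapping, STRIP_ACCENTS, TO_UPPER and stripAndUpper are hypothetical names, and the accent stripping is a Normalizer-based stand-in rather than op4j's asciify().

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical JDK-only illustration (not part of op4j).
class ChainedMapping {

    // Stand-in for asciify(): NFD decomposition, then drop combining marks.
    static final Function<String, String> STRIP_ACCENTS =
            s -> Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");

    // Locale.ROOT keeps the uppercasing independent of the default locale.
    static final Function<String, String> TO_UPPER = s -> s.toUpperCase(Locale.ROOT);

    // Compose both steps into one function, then apply it to every element.
    static String[] stripAndUpper(String[] values) {
        Function<String, String> both = STRIP_ACCENTS.andThen(TO_UPPER);
        return Arrays.stream(values).map(both).toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] conts = {
            "\u00c1frica", "Am\u00e9rica", "Ant\u00e1rtida", "Asia", "Europa", "Ocean\u00eda"
        };
        // prints [AFRICA, AMERICA, ANTARTIDA, ASIA, EUROPA, OCEANIA]
        System.out.println(Arrays.toString(stripAndUpper(conts)));
    }
}
```

Composing first and mapping once traverses the array a single time, which is the same idea the chained op4j expression conveys.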
