Interface LanguageDetector
All Known Implementing Classes:
LanguageDetectorImpl
See website for details.
This detector does not handle the following cases well:
- Short input text. Detection may work, or it may give wrong results.
- Text written in multiple languages. The detector likely returns the language of the most prominent text; it is not designed for mixed-language input.
- Text written in a language for which the detector has no profile loaded. It may simply return another, similar language.
Method Summary
com.google.common.base.Optional<LdLocale> detect(CharSequence text)
    Returns the best detected language if the algorithm is very confident.
getProbabilities(CharSequence text)
    Returns all languages with at least some likeliness.
Method Details
detect
Returns the best detected language if the algorithm is very confident.
Note: you may want to use getProbabilities() instead. This method is very strict, and it sometimes returns absent even though the first choice in getProbabilities() is correct.
Parameters:
text - You probably want a TextObject.
Returns:
The language if confident, absent if unknown or not confident enough.
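The note above suggests falling back to getProbabilities() when detect() returns absent. A minimal sketch of that pattern follows. It does not use the library itself: Candidate, SimpleDetector, DetectWithFallback, and the minProbability parameter are hypothetical stand-ins introduced only for illustration, language codes are plain strings rather than LdLocale, and java.util.Optional is used in place of the Guava Optional that the real interface returns.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for one entry of the list returned by getProbabilities().
final class Candidate {
    final String language;
    final double probability;

    Candidate(String language, double probability) {
        this.language = language;
        this.probability = probability;
    }
}

// Hypothetical stand-in mirroring the two methods of LanguageDetector.
// Assumption: language codes are plain strings and Optional is java.util.Optional.
interface SimpleDetector {
    Optional<String> detect(CharSequence text);

    List<Candidate> getProbabilities(CharSequence text);
}

final class DetectWithFallback {
    private DetectWithFallback() {}

    // Tries the strict detect() first. If it is absent, accepts the first
    // (best) candidate from getProbabilities() when its probability reaches
    // the caller-chosen minProbability.
    static Optional<String> detectOrBestGuess(SimpleDetector detector,
                                              CharSequence text,
                                              double minProbability) {
        Optional<String> strict = detector.detect(text);
        if (strict.isPresent()) {
            return strict;
        }
        List<Candidate> candidates = detector.getProbabilities(text);
        if (candidates.isEmpty()) {
            return Optional.empty(); // nothing detected, or input was only noise
        }
        Candidate best = candidates.get(0); // list is sorted from better to worse
        return best.probability >= minProbability
                ? Optional.of(best.language)
                : Optional.empty();
    }
}
```

The threshold is left to the caller because this page does not state how strict detect() is; a suitable value would have to be found by experiment.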
getProbabilities
Returns all languages with at least some likeliness. A configurable cutoff is applied to languages with very low probability.
Because of the way the algorithm currently works, this method may, for example, return 0.99 for Danish and less than 0.01 for Norwegian even though both are almost equally likely. It would be nice if this could be improved in future versions.
Parameters:
text - You probably want a TextObject.
Returns:
Sorted from better to worse. May be empty. It's empty if the program failed to detect any language, or if the input text did not contain any usable text (just noise).
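Since the reported probabilities can be misleading for closely related languages, a caller may prefer to flag such results instead of trusting the top score. The sketch below shows one possible way to do that. ScoredLanguage, ConfusableCheck, and the DANISH_NORWEGIAN group are hypothetical names introduced for illustration, and the group itself is an assumption based only on the Danish/Norwegian example above; nothing here is part of the library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one entry of the list returned by getProbabilities().
final class ScoredLanguage {
    final String language;
    final double probability;

    ScoredLanguage(String language, double probability) {
        this.language = language;
        this.probability = probability;
    }
}

final class ConfusableCheck {
    // Illustrative group of easily confused languages, maintained by the caller.
    // Assumption: "da" and "no" are the codes used for Danish and Norwegian.
    static final Set<String> DANISH_NORWEGIAN =
            new HashSet<>(Arrays.asList("da", "no"));

    private ConfusableCheck() {}

    // Returns true when the best candidate belongs to the confusable group and
    // another member of that group also appears in the list, regardless of how
    // small that member's reported probability is.
    static boolean topIsAmbiguous(List<ScoredLanguage> sortedBestFirst,
                                  Set<String> confusable) {
        if (sortedBestFirst.isEmpty()) {
            return false; // nothing detected, so nothing to disambiguate
        }
        String top = sortedBestFirst.get(0).language;
        if (!confusable.contains(top)) {
            return false;
        }
        for (ScoredLanguage other : sortedBestFirst.subList(1, sortedBestFirst.size())) {
            if (confusable.contains(other.language)) {
                return true;
            }
        }
        return false;
    }
}
```

When topIsAmbiguous returns true, the caller could report both languages or apply its own tie-breaking instead of relying on the probability values alone.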