Default language change in Japanese

Robot 1

Why did the default language change when modeling Japanese text features?

Hi team, this is a question from a customer:

When modeling with Japanese text features, "language" used to be set to "english" by default. However, when I recently modeled the same data, the setting had changed to "language=japanese". From now on, if I input Japanese text, will it automatically be set to "language=japanese"?

I was able to reproduce this behavior with my own data. A model created on July 19, 2022 had language=english, but a model I created today with the same settings had language=japanese. Was this setting updated when the default changed from "Word N-Gram" to "Char N-Gram"?

Robot 2

Previously, we showed "english" for every dataset, which was incorrect. Now, after the NLP Heuristics Improvements, we dynamically detect and set the dataset's language.
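
To make "dynamically detect" concrete, here is a minimal sketch of one way a language could be guessed from text. It is not DataRobot's detection logic, which this thread does not describe. The function name `guess_language` and the 0.2 threshold are assumptions made up for illustration; the only fact relied on is that hiragana and katakana occupy the Unicode range U+3040 to U+30FF.

```python
def guess_language(text, kana_threshold=0.2):
    """Toy heuristic (hypothetical, not DataRobot's method).

    Labels text "japanese" when a sufficient share of its letters are
    hiragana or katakana, otherwise "english".
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unknown"
    # Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF).
    kana = [ch for ch in letters if "\u3040" <= ch <= "\u30ff"]
    if len(kana) / len(letters) >= kana_threshold:
        return "japanese"
    return "english"


print(guess_language("東京は晴れです"))
print(guess_language("The weather is sunny"))
```

A real detector would need to handle many more languages and mixed-language columns; this sketch only distinguishes the two values discussed in this thread.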

Additionally, we found that char-grams perform better than word-grams on Japanese datasets, so we switched to char-grams for better speed and accuracy. To keep the Text AI Word Cloud insights in good shape, we also train one word-gram-based blueprint so you can inspect both char-gram and word-gram word clouds.
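
For readers unfamiliar with the two tokenization styles, the sketch below contrasts character n-grams with naive whitespace-based word n-grams on a short Japanese sentence. It is an illustration only, not the tokenizer DataRobot uses; the helper names `char_ngrams` and `whitespace_word_ngrams` are hypothetical. Because Japanese is written without spaces, whitespace splitting returns the whole sentence as a single token, whereas character n-grams need no word segmentation at all.

```python
def char_ngrams(text, n):
    """Return all contiguous character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def whitespace_word_ngrams(text, n):
    """Return word n-grams using naive whitespace splitting."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "東京は晴れです"  # "It is sunny in Tokyo"

char_bigrams = char_ngrams(sentence, 2)
word_unigrams = whitespace_word_ngrams(sentence, 1)

print(char_bigrams)   # ['東京', '京は', 'は晴', '晴れ', 'れで', 'です']
print(word_unigrams)  # ['東京は晴れです']
```

Producing useful word-grams for Japanese requires a separate word segmentation step, which this sketch deliberately omits.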

Let me know if you have more questions, happy to help!

Robot 1

Robot 2, thank you for the explanation. I will tell the customer that the NLP heuristics have improved and the language is now set correctly. I was also able to confirm that the word-gram-based blueprint was created, as you mentioned. Thanks!


Updated February 28, 2023