[opensuse-factory] Sudden loss in ability to identify CJK characters
I am not sure which update this started occurring in, but I have suddenly lost the ability to properly view, navigate and open files and folders with Chinese or Japanese names in the terminal. I can see and write the characters quite clearly in Dolphin. This also seems to make it impossible to unzip or unrar archives whose files or folders contain CJK characters.

For example, this is how 王菲-心經-bAVWsXm1ToY.mkv and 般若波羅蜜多心經 - 黃思婷- rUnNUqLgjTI.webm now appear in the terminal, where there was no issue before:

??????-??????-bAVWsXm1ToY.mkv
???????????????????????? - ?????????-rUnNUqLgjTI.webm

I tried to youtube-dl a video with CJK characters and this was the warning that appeared:

WARNING: Assuming --restrict-filenames since file system encoding cannot encode all characters. Set the LC_ALL environment variable to fix this.

I then opened the /etc/sysconfig Editor in YaST, navigated to System > Environment > Language > RC_LC_ALL and filled in the empty option with en_GB.UTF-8, which I presume to be the right value. But the problem persists. In fact, I also get weird files like "???c_? .webm", and attempts to remove them from Dolphin show the message: "Unable to run the command specified. The file or folder /home/rewarp/Videos/���c_� .webm does not exist."

Is anyone able to help?
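The question marks are consistent with a terminal session running under a non-UTF-8 locale (for example "C"): listing tools then tend to treat every byte of a UTF-8 name as unprintable and show one "?" per byte. A minimal Python sketch of that effect, assuming this is what happened here (the helper name render_without_utf8 is made up for illustration):

```python
def render_without_utf8(name: str) -> str:
    """Mimic a listing in an ASCII-only locale: one '?' per non-ASCII byte."""
    raw = name.encode("utf-8")  # file names are stored on disk as bytes
    # Keep printable ASCII bytes, replace everything else with '?'.
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)


shown = render_without_utf8("王菲-心經-bAVWsXm1ToY.mkv")
print(shown)
```

Each CJK character in this name takes three bytes in UTF-8, which would explain why the two characters 王菲 show up as six question marks.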
On Friday, 26 May 2017 6:38:50 PM +08 Chan Ju Ping wrote:
Is anyone able to help?
I poked around a bit more and got youtube-dl to work like normal again by setting en_GB.UTF-8 in RC_LC_CTYPE. However, CJK characters still appear like ????? in the terminal.
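The youtube-dl warning refers to the file system encoding, which Python derives from the locale; that would explain why pointing LC_CTYPE at a UTF-8 locale made the warning go away. A rough way to check this from Python is sketched below. The helper can_encode is hypothetical and this is not youtube-dl's own code.

```python
import sys


def can_encode(name: str, encoding: str) -> bool:
    """Return True if every character of `name` fits into `encoding`."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The encoding Python uses for file names under the current locale.
fs_encoding = sys.getfilesystemencoding()
print(fs_encoding, can_encode("般若波羅蜜多心經.webm", fs_encoding))
```

Note that newer Python 3 releases may fall back to UTF-8 even when the locale is broken, so the result can differ from what a Python 2 based tool saw in 2017.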
On Friday 2017-05-26 12:54, Chan Ju Ping wrote:
I poked around a bit more and got youtube-dl to work like normal again by setting en_GB.UTF-8 in RC_LC_CTYPE. However, CJK characters still appear like ????? in the terminal.
Run the "locale" program from an xterm. What result do you get?
-- 
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
On Friday, 26 May 2017 7:11:36 PM +08 Jan Engelhardt wrote:
Run the "locale" program from an xterm. What result do you get?
I removed the setting in LC_ALL back to its original empty form. Output:
---
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_MY.UTF-8
LC_CTYPE=en_GB.UTF-8
LC_NUMERIC="en_MY.UTF-8"
LC_TIME="en_MY.UTF-8"
LC_COLLATE="en_MY.UTF-8"
LC_MONETARY=en_MY.UTF-8
LC_MESSAGES="en_MY.UTF-8"
LC_PAPER="en_MY.UTF-8"
LC_NAME="en_MY.UTF-8"
LC_ADDRESS="en_MY.UTF-8"
LC_TELEPHONE="en_MY.UTF-8"
LC_MEASUREMENT="en_MY.UTF-8"
LC_IDENTIFICATION="en_MY.UTF-8"
LC_ALL=
---
I can properly download and save files with CJK names again, and I figured out how to delete files by their inode number.
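The message does not say how the files were deleted by inode; on the command line this is commonly done with ls -i and find -inum. As a sketch of the same idea in Python (remove_by_inode is a hypothetical helper, and it only handles plain files, not directories):

```python
import os


def remove_by_inode(directory: str, inode: int) -> bool:
    """Delete the file in `directory` whose inode number matches."""
    # Passing a bytes path makes os.scandir return raw byte names, so the
    # name never has to be decoded with a possibly wrong locale encoding.
    with os.scandir(os.fsencode(directory)) as entries:
        for entry in entries:
            if entry.inode() == inode:
                os.unlink(entry.path)
                return True
    return False
```

Working on the inode sidesteps the name entirely, which is why it helps with files whose names cannot be typed or displayed.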
On Friday 2017-05-26 13:19, Chan Ju Ping wrote:
Run the "locale" program from an xterm. What result do you get?
I removed the setting in LC_ALL back to its original empty form.
Output:
---
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_MY.UTF-8
LC_CTYPE=en_GB.UTF-8
Yeah I would expect there to be problems if LC_CTYPE uses a definition which does not exist in glibc (en_MY).
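Whether the C library knows a given locale can be probed by trying to switch to it. The following is a minimal sketch using Python's standard locale module; locale_available is a hypothetical helper, and the answer depends on which locales are installed on the machine where it runs.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore


print("en_MY.UTF-8 available:", locale_available("en_MY.UTF-8"))
print("en_GB.UTF-8 available:", locale_available("en_GB.UTF-8"))
```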
On Friday, 26 May 2017 8:07:10 PM +08 Jan Engelhardt wrote:
Yeah I would expect there to be problems if LC_CTYPE uses a definition which does not exist in glibc (en_MY).
I was trying to set the locale settings as close to Malaysia as possible (I recently moved back there from the US), but use British English as the language for everything else. I probably messed up somewhere in the settings. I didn't even realise this was why I suddenly couldn't view CJK file names until I literally couldn't download and open files.
On Friday, 26 May 2017 8:22:37 PM +08 Chan Ju Ping wrote:
I found the module I was changing. It's the Format module for Numeric, Currency and Time Formats. There was an en_MY region setting. I am guessing it's not ready for my particular setup yet.
On 2017-05-26 14:30, Chan Ju Ping wrote:
There was an en_MY region setting. I am guessing it's not ready for my particular setup yet.
Not in Leap, either:
minas-tirith:~ # locate en_MY
minas-tirith:~ #
You have to find another locale. Unless it is packaged in something else I don't have installed :-?
-- 
Cheers / Saludos,
Carlos E. R.
(from 42.2 x86_64 "Malachite" (Minas Tirith))
Carlos E. R. wrote:
Not in Leap, either:
minas-tirith:~ # locate en_MY
minas-tirith:~ #
You have to find another locale. Unless it is packaged in something else I don't have installed :-?
It seems to exist at least - I get two matches on TW, but only from perl and python.
/usr/lib/perl5/vendor_perl/5.24.0/DateTime/Locale/en_MY.pod
/usr/lib/python2.7/site-packages/babel/locale-data/en_MY.dat
But that doesn't help xterm/konsole, of course....
On Sunday, 28 May 2017 11:04:58 PM +08 Peter Suetterlin wrote:
It seems to exist at least - I get two matches on TW, but only from perl and python.
/usr/lib/perl5/vendor_perl/5.24.0/DateTime/Locale/en_MY.pod
/usr/lib/python2.7/site-packages/babel/locale-data/en_MY.dat
But that doesn't help xterm/konsole, of course....
I have changed my locale and this is my output now:
--
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_GB.UTF-8
LC_CTYPE=en_GB.UTF-8
LC_NUMERIC=C
LC_TIME=en_MY.UTF-8
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY=en_MY.UTF-8
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
--
I still can't see the CJK characters in my terminal, and /etc/sysconfig Editor describes for LC_ALL, "This variable will override all LC-variables!! Again, ROOT_USES_LANG must be set to "yes", if an effect on the superuser account is desired." So that would make my LC_TIME setting meaningless, presumably. Or am I mistaken?
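In the output of locale, values shown in double quotes are implied by LANG or LC_ALL, while unquoted values were set explicitly. Two categories in this output are still explicitly set to en_MY, which may be why the LC_ALL warning remains. A small sketch that picks those out (explicit_settings is a hypothetical helper, and the sample is an abbreviated copy of the output above):

```python
def explicit_settings(locale_output: str) -> dict:
    """Return the categories that were set explicitly (unquoted values)."""
    explicit = {}
    for line in locale_output.splitlines():
        key, sep, value = line.partition("=")
        # Skip warning lines (no '='), empty values, and quoted values,
        # which are only implied by LANG or LC_ALL.
        if not sep or not value or value.startswith('"'):
            continue
        explicit[key.strip()] = value
    return explicit


sample = """locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_GB.UTF-8
LC_CTYPE=en_GB.UTF-8
LC_NUMERIC=C
LC_TIME=en_MY.UTF-8
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY=en_MY.UTF-8
LC_ALL=
"""
suspects = {k: v for k, v in explicit_settings(sample).items()
            if v.startswith("en_MY")}
print(suspects)
```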
On Monday, 29 May 2017 5:34:21 PM +08 Chan Ju Ping wrote:
So that would make my LC_TIME setting meaningless, presumably. Or am I mistaken?
I tried uncompressing a file with Japanese characters, and it failed. So it appears if it doesn't work in the terminal, it won't work for unrar or unzip. Any wild solutions to try?
On Monday, 29 May 2017 18:17:14 +08 Chan Ju Ping wrote:
I tried uncompressing a file with Japanese characters, and it failed. So it appears if it doesn't work in the terminal, it won't work for unrar or unzip.
Any wild solutions to try?
So this is really confusing, but it somehow worked. I opened up the Language module in YaST and set it to EN_US. This is my locale output now:
--
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=en_GB.UTF-8
--
And I can now view and unrar files containing Japanese characters. So, bug report?
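This output fits the usual lookup order for locale categories: a non-empty LC_ALL wins, then the specific LC_* variable, then LANG. It also answers the earlier question about LC_TIME: an empty LC_ALL overrides nothing. A minimal sketch of that order follows; effective_locale is a hypothetical helper, and the final fallback to "C" is an assumption, since the default is left to the implementation.

```python
def effective_locale(env: dict, category: str) -> str:
    """Resolve one locale category from environment-style settings."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:  # an empty value counts as unset
            return value
    return "C"  # assumed fallback when nothing is set


before = {"LANG": "en_GB.UTF-8", "LC_TIME": "en_MY.UTF-8", "LC_ALL": ""}
after = {"LANG": "en_GB.UTF-8", "LC_TIME": "en_MY.UTF-8", "LC_ALL": "en_GB.UTF-8"}
print(effective_locale(before, "LC_TIME"))
print(effective_locale(after, "LC_TIME"))
```

With LC_ALL empty the explicit LC_TIME value is used; once LC_ALL is set, every category reports the LC_ALL value, as in the output above.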
On 05/29/2017 06:17 AM, Chan Ju Ping wrote:
So that would make my LC_TIME setting meaningless, presumably. Or am I mistaken? I tried uncompressing a file with Japanese characters, and it failed. So it appears if it doesn't work in the terminal, it won't work for unrar or unzip.
Any wild solutions to try?
It does not seem as though en_MY is available for Linux. If you look in /usr/lib/locale/ for en_*, there are 37 entries, but no en_MY. Also, there is no output from: 'locale --all-locales | egrep -i en_MY'

It is recognized as a language combination in CLDR (http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_in...), but does not seem to be defined on Linux (openSUSE or Red Hat, so probably glibc). It does seem to be defined for Windows (https://www.microsoft.com/resources/msdn/goglobal/default.mspx?submitted=4409&OS=Windows%207). This may be why we can find references to it in Perl and Python.

Since en_MY does not seem to be defined for Linux, it would probably be best to just drop it. Or, if you are ambitious, you could write a definition, as described on https://sourceware.org/glibc/wiki/Locales#Charsets.

Patrick
-- 
Patrick McNeil
Université de Montréal - TI
Pav. Roger-Gaudry, X-205
Téléphone: (514) 343-6111, poste 5247
Courriel: patrick.mcneil@umontreal.ca
Télécopie/FAX: (514) 343-2155
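The same check can be scripted. The sketch below runs locale -a (the short form of --all-locales) and filters the result; matching_locales and installed_locales are hypothetical helper names, and the output is decoded leniently because some locale aliases are not valid UTF-8.

```python
import subprocess


def matching_locales(all_locales: str, wanted: str) -> list:
    """Case-insensitive filter over the lines printed by `locale -a`."""
    wanted = wanted.lower()
    return [line for line in all_locales.splitlines() if wanted in line.lower()]


def installed_locales() -> str:
    """Return the output of `locale -a`, or '' if the command is unavailable."""
    try:
        result = subprocess.run(
            ["locale", "-a"],
            capture_output=True,
            encoding="utf-8",
            errors="replace",  # some alias names are stored in Latin-1
            check=False,
        )
    except OSError:
        return ""
    return result.stdout


print(matching_locales(installed_locales(), "en_MY"))
```

An empty list here means the system locale database has no en_MY entry, regardless of what Perl or Python packages ship.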
* Patrick McNeil
It does not seem as though en_MY is available for Linux. If you look in /usr/lib/locale/ for en_*, there are 37 entries, but no en_MY. Also, there is no output from: 'locale --all-locales | egrep -i en_MY'
locate --all-locales |egrep -i en_MY
locate: unrecognized option '--all-locales'
locate en_MY
/usr/lib/python2.7/site-packages/babel/locale-data/en_MY.dat
rpm -qf /usr/lib/python2.7/site-packages/babel/locale-data/en_MY.dat
python2-Babel-2.4.0-2.1.noarch
It is recognized as a language combination in CLDR (http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_in...), but does not seem to be defined on Linux (Opensuse or Red Hat, so probably glibc).
it is on Tw
-- 
(paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri
http://en.opensuse.org openSUSE Community Member facebook/ptilopteri
Registered Linux User #207535 @ http://linuxcounter.net
Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet freenode
On 05/29/2017 08:04 AM, Patrick Shanahan wrote:
Any wild solutions to try?
It does not seem as though en_MY is available for Linux. If you look in /usr/lib/locale/ for en_*, there are 37 entries, but no en_MY. Also, there is no output from: 'locale --all-locales | egrep -i en_MY'
locate --all-locales |egrep -i en_MY
locate: unrecognized option '--all-locales'
locate en_MY
/usr/lib/python2.7/site-packages/babel/locale-data/en_MY.dat
rpm -qf /usr/lib/python2.7/site-packages/babel/locale-data/en_MY.dat
python2-Babel-2.4.0-2.1.noarch
It is recognized as a language combination in CLDR (http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_in...), but does not seem to be defined on Linux (Opensuse or Red Hat, so probably glibc).
it is on Tw
I think there was a small typo in your command line: 'locate --all-loc...' should be 'locale...'. The idea is to see what locales the locale DB thinks are available rather than looking for actual files.

On TW (and on openSUSE), some locale files are provided by particular programs. On TW "20170522", with Python and Perl DateTime::Locale installed, I have en_MY locale data provided by:
python2-Babel-2.4.0-2.1.noarch
perl-DateTime-Locale-1.050000-1.1.noarch

I believe this means that you could use this locale from a Python 2 script/program or from a Perl script/program (via DateTime::Locale), but I don't think the OS will see it. If it did, the 'locale --all-locales' command should see it. And if the OS does not see the locale, hardly any programs (terminals included) would be able to use it.
-- 
Patrick McNeil
* Patrick McNeil
I think there was a small typo in your command line: 'locate --all-loc...' should be 'locale...'. The idea is to see what locales the locale DB thinks are available rather than looking for actual files.
locate --all-locale |egrep -i en_MY
locate: unrecognized option '--all-locale'
On TW (and on openSUSE), some locale files are provided by particular programs. On TW "20170522", with Python and Perl DateTime::Locale installed, I have en_MY locale data provided by:
python2-Babel-2.4.0-2.1.noarch
perl-DateTime-Locale-1.050000-1.1.noarch
I believe this means that you could use this locale from a Python 2 script/program or from a Perl script/program (via DateTime::Locale), but I don't think the OS will see it. If it did, the 'locale --all-locales' command should see it.
And if the OS does not see the locale, very few programs (like terminals) would be able to use it.
you are correct, I mis-read/mis-typed and I get the same missing en_MY.
tks,
-- 
(paka)Patrick Shanahan
On Mon, 29 May 2017 18:17:14 +0800 Chan Ju Ping wrote:
I tried uncompressing a file with Japanese characters, and it failed. So it appears if it doesn't work in the terminal, it won't work for unrar or unzip.
Any wild solutions to try?
Install unzip-rcc instead of unzip.
-- 
WBR
Kyrill
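The thread does not say why unzip-rcc was suggested. One possible reason, stated here purely as an assumption, is that some archives store member names in a legacy encoding rather than UTF-8. Python's zipfile module decodes such names as cp437, and a commonly used workaround is to reverse that step and decode again with the real encoding. A sketch, where redecode_zip_name is a hypothetical helper and Shift_JIS is only an example encoding:

```python
def redecode_zip_name(name: str, real_encoding: str) -> str:
    """Undo a cp437 decode and re-decode with the archive's real encoding."""
    return name.encode("cp437").decode(real_encoding)


# Simulate what a cp437-decoding reader shows for a Shift_JIS encoded name.
garbled = "日本語.txt".encode("shift_jis").decode("cp437")
print(redecode_zip_name(garbled, "shift_jis"))
```

This only helps when the archive's encoding is known or can be guessed; it does not replace a working UTF-8 locale.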
participants (7)
- Carlos E. R.
- Chan Ju Ping
- Jan Engelhardt
- Kyrill Detinov
- Patrick McNeil
- Patrick Shanahan
- Peter Suetterlin