Apache Lounge

Apache 2.4 - FCGI - Perl - Broken Encoding

 
j_d



Joined: 25 Oct 2018
Posts: 2

Posted: Thu 25 Oct '18 14:25    Post subject: Apache 2.4 - FCGI - Perl - Broken Encoding

My setup:

Apache 2.4.29 running on a Linux machine, with mod_fcgid loaded and an index.pl script running under it. The script itself uses:

Code:

use utf8;
binmode(STDOUT,':utf8');
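For comparison: on an ordinary, untied filehandle, an output layer is what turns Perl character strings into UTF-8 bytes. Here is a minimal sketch. It assumes an in-memory filehandle in place of STDOUT and uses `:encoding(UTF-8)`, the validating counterpart of the `:utf8` layer above:

```perl
use strict;
use warnings;

# In-memory filehandle standing in for STDOUT (an assumption for this sketch).
open(my $out, '>', \my $buffer) or die "cannot open in-memory handle: $!";
binmode($out, ':encoding(UTF-8)');    # encode characters as UTF-8 on output

print {$out} "wei\x{DF}";    # four characters: w, e, i, U+00DF (ß)
close($out);

# $buffer now holds five bytes: "wei" followed by 0xC3 0x9F for U+00DF.
printf "%vX\n", $buffer;
```

This is the behaviour the `binmode(STDOUT, ...)` call is expected to produce when STDOUT is a plain file descriptor.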


I also have a MySQL server running, version 5.7.24. At considerable personal cost, I've not only managed to get rid of most traces of latin1 in my database, but actually replaced it with MySQL's proper UTF-8 encoding, utf8mb4:

Code:

mysql> SHOW variables LIKE 'character%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+


My Perl script uses the DBI module to connect to the database. Because I need proper UTF-8 support there (lc('ẞ') must equal 'ß'), I pass the option "mysql_enable_utf8 => 1" to the connect method.

Let's first see if my script returns the proper encoding headers:

Code:

$ perl index.pl
Content-Type: text/html; charset=UTF-8

<!DOCTYPE html>
<html>
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>weiß</body>
</html>


'weiß' is selected from the database here, so that looks good. Now let's look at the page in the browser:

Code:

wei�


Well, that was unexpected. For some reason my browser displays mojibake even though I make sure to tell the browser that my character set is UTF-8 (and it's actually recognised correctly, I just checked).

Hey, maybe Apache is doing something weird with my response? Let's find out what it does with characters that aren't in western 8-bit encodings, like ... Japanese?

Code:

$ perl index.pl
Content-Type: text/html; charset=UTF-8

<!DOCTYPE html>
<html>
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body><span id="文字化け">weiß</span></body>
</html>


OK, nothing too unexpected on my UTF-8 console. Let's have a look at it in the browser:

Code:

weiß


Whoah! Just by adding non-ISO-8859-1 characters to my HTML output, my browser suddenly displays the German 'ß' correctly. And since I cannot reproduce the issue when running the script directly, couldn't it be Apache itself that looks at the response and says, "if there are only ISO-8859-1 characters in it, I'm going to assume ISO-8859-1 encoding and re-encode it, even if that doesn't fit at all"?

It should be noted that "�" is pretty much the kind of mojibake you get when an ISO-8859-1 character (here ß, the single byte 0xDF) is displayed by a device that expects UTF-8. I already added "AddDefaultCharset utf-8" to my /etc/apache2/apache2.conf, but that didn't fix the issue. Now I could just add Japanese characters to every page body to force Apache to stop trying to act smart and failing miserably at it. But c'mon, that can't seriously be the solution?!
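The browser-side effect can be reproduced with the core Encode module. This is a minimal sketch that assumes ß leaves the server as the single ISO-8859-1 byte 0xDF while the page claims to be UTF-8:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed server output: "weiß" encoded as ISO-8859-1, so ß is the byte 0xDF.
my $latin1_bytes = encode('ISO-8859-1', "wei\x{DF}");

# A browser told "charset=UTF-8" decodes those bytes as UTF-8. 0xDF starts a
# two-byte sequence, but nothing follows it, so the decoder substitutes U+FFFD.
my $shown = decode('UTF-8', $latin1_bytes);

print join(' ', map { sprintf 'U+%04X', ord($_) } split //, $shown), "\n";
# prints: U+0077 U+0065 U+0069 U+FFFD
```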
j_d



Joined: 25 Oct 2018
Posts: 2

Posted: Tue 30 Oct '18 12:04

Well, I found out what the exact problem was. Because the XS part of the FCGI module uses tied handles, as this post suggests, binmode actually did jack-effing-shit, and Jack is out of town. Instead, for every value it was asked to print, the code would:

- check if the internal UTF-8 flag of the value it was supposed to print was set
- try to "downgrade" the string to ISO-8859-1
- and spit some idiotic error message into my logs if that failed miserably.
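The tied-handle part can be illustrated without FCGI at all. This is a minimal sketch in which a hypothetical class `RecordingHandle` stands in for FCGI's stream. On a tied handle, binmode() only calls the class's BINMODE method. Unless the class applies the layer itself, print() keeps handing it character strings:

```perl
use strict;
use warnings;

# Hypothetical tied-handle class standing in for FCGI::Stream: it accepts
# binmode() but ignores the layer, and records whatever print() passes it.
package RecordingHandle;

sub TIEHANDLE { my ($class) = @_; return bless { out => '', layers => [] }, $class }
sub BINMODE   { my ($self, @layer) = @_; push @{ $self->{layers} }, @layer; return 1 }
sub PRINT     { my ($self, @args) = @_; $self->{out} .= join '', @args; return 1 }

package main;

my $tied = tie *OUT, 'RecordingHandle';
binmode(OUT, ':utf8');    # dispatched to BINMODE; no PerlIO layer is pushed
print OUT "wei\x{DF}";    # PRINT receives the character string unchanged

# The handle saw characters, not UTF-8 bytes.
printf "%d characters, last is U+%04X\n", length $tied->{out}, ord substr($tied->{out}, -1);
```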

The error message in particular was:

Code:

Use of wide characters in FCGI::Stream::PRINT is deprecated and will stop wprking in a future version of FCGI (sic!)


That message led me to this commit. The offending code:

Code:
if (DO_UTF8(ST(n)) && !sv_utf8_downgrade(ST(n), 1) && ckWARN_d(WARN_UTF8))


This code checks whether the internal UTF-8 flag is set, tries to downgrade the string to ISO-8859-1, and complains like a crybaby if it can't, i.e. when the string contains characters that are not in ISO-8859-1 (such as Japanese characters).
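In pure Perl, that check can be sketched as follows. The function `bytes_sent_by_print` is a hypothetical rendition, not FCGI's code. When the downgrade fails, it assumes the stream writes the string's internal UTF-8 bytes, which matches the behaviour observed in the first post. Under that assumption, a Latin-1-only string goes out as Latin-1, and a string with Japanese in it goes out as UTF-8:

```perl
use strict;
use warnings;

# Hypothetical rendition of the XS check quoted above: if a string carries the
# UTF-8 flag, try to downgrade it to Latin-1 bytes; if that fails, warn and
# (assumption) emit the string's internal UTF-8 bytes instead.
sub bytes_sent_by_print {
    my ($s) = @_;    # work on a copy of the argument
    if (utf8::is_utf8($s) && !utf8::downgrade($s, 1)) {
        warn "wide character in PRINT (illustrative message)\n";
        utf8::encode($s);    # same bytes as the internal UTF-8 buffer
    }
    return $s;
}

my $plain = "wei\x{DF}";
utf8::upgrade($plain);    # flagged, as assumed for DBD::mysql with mysql_enable_utf8
my $mixed = "\x{6587}\x{5B57}\x{5316}\x{3051} wei\x{DF}";    # "文字化け weiß"

printf "%vX\n", bytes_sent_by_print($plain);    # 77.65.69.DF, a lone 0xDF
printf "%vX\n", bytes_sent_by_print($mixed);    # ends in 77.65.69.C3.9F, valid UTF-8
```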

My theory: UTF-8 came out in 1993, and Perl gloriously overslept its arrival until 2000. Then version 5.6 came out and added some, but not proper, support for UTF-8. Two years later version 5.8 came out, with some actual UTF-8 support. The FCGI module was written in 2003, and its programmer didn't give two f's about different encodings and just wanted to prevent people from doing sane things. In 2010 some wannabe programmer wanted to earn some open-source cred by adding useless crap to various projects, so that they'd be able to say with a straight face that they had been active in the open-source community, for instance when asked in a job interview. So he modified the warning instead of getting rid of it entirely.

What a fuckup.

My solution now is to route all output through my own printing routine, which clears the UTF-8 flag if it is set, prints the string, and then restores the flag:

Code:

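use Encode qw(is_utf8 _utf8_off _utf8_on);

# Takes a *reference* to the string, e.g. my_print(\$html);
sub my_print($)
{
        my($string) = @_;
        my $is_utf8 = is_utf8(${$string});
        # Hide the flag so FCGI's PRINT sees the raw internal UTF-8 bytes
        # and has nothing to downgrade to ISO-8859-1
        _utf8_off(${$string}) if($is_utf8);
        print ${$string};
        # Restore the flag so the caller's string is left as it was
        _utf8_on(${$string}) if($is_utf8);
}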
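For comparison, the usual alternative is explicit encoding, which the closing line of this post rejects. Here is a minimal sketch using `encode_utf8` from the core Encode module. The helper name `utf8_bytes` is hypothetical. The stream then receives only byte strings, so it has nothing left to downgrade:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: returns UTF-8 bytes for a character string, working on
# a copy so the caller's string (and its flag) stays untouched.
sub utf8_bytes {
    my ($string) = @_;
    return encode_utf8($string);
}

my $flagged = "wei\x{DF}";
utf8::upgrade($flagged);        # internal UTF-8 flag on
my $unflagged = "wei\x{DF}";    # same characters, flag off

# Both storage forms yield the same bytes: 77.65.69.C3.9F
printf "%vX %vX\n", utf8_bytes($flagged), utf8_bytes($unflagged);
```

Because it works on characters rather than on the flag, the result does not depend on how Perl happened to store the string. With the flag-toggling routine, an unflagged "weiß" would still go out as the Latin-1 byte 0xDF.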


And the people who are going to suggest encoding my strings instead of setting and clearing flags can go burn in hell for all I care. The thread can be closed. I'm done.