Quantcast

Clarification on encoding update

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Clarification on encoding update

Michael Osipov
Hi folks,

the default encoding has been changed recently to UTF-8 [1], though I
still think that it requires some more description from a user's point
of view and more code cleanup

* All templates (!) (anything else?) from ImputStream are read with that
new input encoding, unless other stated
* Isn't output.econding obsolete? All merge methods require a Writer anyway.
* UTF-8 is defined several times in RuntimeConstants and
StringResource*. Is that intended?

[1]
http://velocity.apache.org/engine/2.0/upgrading.html#dependencies-changes

Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Clarification on encoding update

Claude Brisson-2
Hi.

On 09/12/2016 15:34, Michael Osipov wrote:
> Hi folks,
>
> the default encoding has been changed recently to UTF-8 [1], though I
> still think that it requires some more description from a user's point
> of view and more code cleanup

Feel free to help!

>
> * All templates (!) (anything else?) from ImputStream are read with
> that new input encoding, unless other stated

That's all, except a resource can also be a static text file, as in
#include('file').

> * Isn't output.econding obsolete? All merge methods require a Writer
> anyway.

I removed it at first, and then put it back in because it is used by the
view tools, to set the HTTP response content type charset (as in:
"text/html; charset=UTF-8"). J2EE doc states that this charset will be
used when created the response writer. I know it's a bit of a pity to
have it defined in the engine while it is used in the view tools, but:
  - it is how it was done before
  - it makes some sense to have it defined here, symmetrically to
input.encoding, even if it's not used directly by the engine
  - it's still useful to distinguish input and output encodings, it let
someone serve UTF-8 files from ISO-8859-1 templates, for instance

Why the (!) ?

> * UTF-8 is defined several times in RuntimeConstants and
> StringResource*. Is that intended?

Nope. I fix that.


   Claude

>
> [1]
> http://velocity.apache.org/engine/2.0/upgrading.html#dependencies-changes
>
> Michael
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Clarification on encoding update

Michael Osipov
Am 2016-12-09 um 16:10 schrieb Claude Brisson:

> Hi.
>
> On 09/12/2016 15:34, Michael Osipov wrote:
>> Hi folks,
>>
>> the default encoding has been changed recently to UTF-8 [1], though I
>> still think that it requires some more description from a user's point
>> of view and more code cleanup
>
> Feel free to help!
>
>>
>> * All templates (!) (anything else?) from ImputStream are read with
>> that new input encoding, unless other stated
>
> That's all, except a resource can also be a static text file, as in
> #include('file').

Ah, and I expect input.encoding to be respected here as well?

>> * Isn't output.econding obsolete? All merge methods require a Writer
>> anyway.
>
> I removed it at first, and then put it back in because it is used by the
> view tools, to set the HTTP response content type charset (as in:
> "text/html; charset=UTF-8"). J2EE doc states that this charset will be
> used when created the response writer. I know it's a bit of a pity to
> have it defined in the engine while it is used in the view tools, but:
>  - it is how it was done before
>  - it makes some sense to have it defined here, symmetrically to
> input.encoding, even if it's not used directly by the engine
>  - it's still useful to distinguish input and output encodings, it let
> someone serve UTF-8 files from ISO-8859-1 templates, for instance

Are you really certain about this? I just checked the Tools 2.0 code and
VelocityView refers to
public static final String DEFAULT_PROPERTIES_PATH =
"/org/apache/velocity/tools/view/velocity.properties";

which is likely to be an overlay to the default velocity.properties.

Moreover, VelocityView defines UTF-8 already as default, even changed by
you.

I am somewhat confused here. If the Java code of Engine and Tools
contain default encoding UTF-8, why have another duplicate in the
properties file as well?

By defintion of ServletRespose#getWriter() [2] the writer is always
guaranteed to have an encoding set.

> Why the (!) ?

To indicate that is solely applies to templates, it doesn't. I simply
forgot about #include.

I am quite certain that Tools need some code cleanup pretty much the
same great cleanup you did to Engine.

Michael

[2]
https://tomcat.apache.org/tomcat-8.5-doc/servletapi/javax/servlet/ServletResponse.html#getWriter()

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Clarification on encoding update

Claude Brisson-2
On 09/12/2016 16:39, Michael Osipov wrote:
[...]
>>>
>>> * All templates (!) (anything else?) from ImputStream are read with
>>> that new input encoding, unless other stated
>>
>> That's all, except a resource can also be a static text file, as in
>> #include('file').
>
> Ah, and I expect input.encoding to be respected here as well?
>
#include() uses the encoding of the template it appears in.

>>> * Isn't output.econding obsolete? All merge methods require a Writer
>>> anyway.
>>
>> I removed it at first, and then put it back in because it is used by the
>> view tools, to set the HTTP response content type charset (as in:
>> "text/html; charset=UTF-8"). J2EE doc states that this charset will be
>> used when created the response writer. I know it's a bit of a pity to
>> have it defined in the engine while it is used in the view tools, but:
>>  - it is how it was done before
>>  - it makes some sense to have it defined here, symmetrically to
>> input.encoding, even if it's not used directly by the engine
>>  - it's still useful to distinguish input and output encodings, it let
>> someone serve UTF-8 files from ISO-8859-1 templates, for instance
>
> Are you really certain about this?

About what, more specifically? That the encoding can be changed? Pretty
sure, yes. But it could be the subject of a test case.

> I just checked the Tools 2.0 code and VelocityView refers to
> public static final String DEFAULT_PROPERTIES_PATH =
> "/org/apache/velocity/tools/view/velocity.properties";
>
> which is likely to be an overlay to the default velocity.properties.
>
And you mean that we could put out.encoding here? I'm not opposed to it.

> Moreover, VelocityView defines UTF-8 already as default, even changed
> by you.
>
Which default are you refering to?

velocity-tools-view/src/main/java/org/apache/velocity/tools/view/VelocityViewServlet.java:263:
request.setCharacterEncoding(getVelocityProperty("input.encoding",
"UTF-8"));

velocity-tools-view/src/main/java/org/apache/velocity/tools/view/VelocityView.java:110:
        public static final String DEFAULT_OUTPUT_ENCODING = "UTF-8";

The second is needed, the first one could be changed to
o.a.v.RuntimeConstants.ENCODING_DEFAULT, is that what you mean?

> I am somewhat confused here. If the Java code of Engine and Tools
> contain default encoding UTF-8, why have another duplicate in the
> properties file as well?

I agree it's redundant to have UTF-8 default both specified in the
properties file and in the code. Well, the code's default is a default
default... I don't know what usages will be made of the library, if the
default properties file will always be available or not... Redundancy is
one of the roots of evil, but this one is the kind of harmless
redundancy I can live with.

>
> By defintion of ServletRespose#getWriter() [2] the writer is always
> guaranteed to have an encoding set.
>
Which relies on the encoding given by getEncoding(), which in turn
relies on the encoding set by setEncoding() OR by setContentType(),
precisely the one we give.
>> Why the (!) ?
>
> To indicate that is solely applies to templates, it doesn't. I simply
> forgot about #include.
>
> I am quite certain that Tools need some code cleanup pretty much the
> same great cleanup you did to Engine.
>
In the lack of patches, remarks like yours are of course welcome,
because I am pretty sure I cannot afford a full code review.


   Claude



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Clarification on encoding update

Michael Osipov
Am 2016-12-09 um 18:26 schrieb Claude Brisson:

> On 09/12/2016 16:39, Michael Osipov wrote:
> [...]
>>>>
>>>> * All templates (!) (anything else?) from ImputStream are read with
>>>> that new input encoding, unless other stated
>>>
>>> That's all, except a resource can also be a static text file, as in
>>> #include('file').
>>
>> Ah, and I expect input.encoding to be respected here as well?
>>
> #include() uses the encoding of the template it appears in.

Perfect!

>>>> * Isn't output.econding obsolete? All merge methods require a Writer
>>>> anyway.
>>>
>>> I removed it at first, and then put it back in because it is used by the
>>> view tools, to set the HTTP response content type charset (as in:
>>> "text/html; charset=UTF-8"). J2EE doc states that this charset will be
>>> used when created the response writer. I know it's a bit of a pity to
>>> have it defined in the engine while it is used in the view tools, but:
>>>  - it is how it was done before
>>>  - it makes some sense to have it defined here, symmetrically to
>>> input.encoding, even if it's not used directly by the engine
>>>  - it's still useful to distinguish input and output encodings, it let
>>> someone serve UTF-8 files from ISO-8859-1 templates, for instance
>>
>> Are you really certain about this?
>
> About what, more specifically? That the encoding can be changed? Pretty
> sure, yes. But it could be the subject of a test case.

No, that only View Tools really need it. They will stick to default if
none ist provided. Put it in a different perspective, Engine does not
need to provide the ouput encoding of View Tools becaue it does not know
anything about it.

>> I just checked the Tools 2.0 code and VelocityView refers to
>> public static final String DEFAULT_PROPERTIES_PATH =
>> "/org/apache/velocity/tools/view/velocity.properties";
>>
>> which is likely to be an overlay to the default velocity.properties.
>>
> And you mean that we could put out.encoding here? I'm not opposed to it.

Yes, since View Tools use it and Engine does not.

>> Moreover, VelocityView defines UTF-8 already as default, even changed
>> by you.
>>
> Which default are you refering to?
>
> velocity-tools-view/src/main/java/org/apache/velocity/tools/view/VelocityViewServlet.java:263:
>
> request.setCharacterEncoding(getVelocityProperty("input.encoding",
> "UTF-8"));

INPUT_ENCODING instead of "input.encoding".

> velocity-tools-view/src/main/java/org/apache/velocity/tools/view/VelocityView.java:110:
>
>        public static final String DEFAULT_OUTPUT_ENCODING = "UTF-8";
>
> The second is needed, the first one could be changed to
> o.a.v.RuntimeConstants.ENCODING_DEFAULT, is that what you mean?

Exactly.

>> I am somewhat confused here. If the Java code of Engine and Tools
>> contain default encoding UTF-8, why have another duplicate in the
>> properties file as well?
>
> I agree it's redundant to have UTF-8 default both specified in the
> properties file and in the code. Well, the code's default is a default
> default... I don't know what usages will be made of the library, if the
> default properties file will always be available or not... Redundancy is
> one of the roots of evil, but this one is the kind of harmless
> redundancy I can live with.

Agreed.

>>> Why the (!) ?
>>
>> To indicate that is solely applies to templates, it doesn't. I simply
>> forgot about #include.
>>
>> I am quite certain that Tools need some code cleanup pretty much the
>> same great cleanup you did to Engine.
>>
> In the lack of patches, remarks like yours are of course welcome,
> because I am pretty sure I cannot afford a full code review.

I will keep that in mind. For starters, I am thinking how I can properly
test Velocity Engine 2.0-SNAPSHOT/Velocity Tools 3.0-SNAPSHOT with the
entire Maven Doxia chain.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Loading...