Discussion:
[AUCTeX] wrong encoding for *output* buffer
Werner LEMBERG
2018-01-26 07:57:10 UTC
[git commit 4b66b9f60e3ce4a552bd4f3230b659347add1446]


Folks,


I have the following in my master document.

%%% Local Variables:
%%% coding: utf-8
%%% mode: latex
%%% TeX-engine: xetex
%%% TeX-PDF-mode: t
%%% TeX-master: t
%%% End:

However, the `*xxx output*' buffer (showing the compilation results of
xetex) is in latin-1 encoding – I guess this is due to

(set-language-environment "latin-1")
(setq-default buffer-file-coding-system 'latin-1)

in my `~/.emacs' file...

What must I do so that auctex obeys the local encoding variables in my
master document, thus overriding `~/.emacs'? [AFAICS, xetex *can* be
forced to use non-UTF8 legacy encodings, so relying on `TeX-engine' is
probably not sufficient.]


Werner
Ikumi Keita
2018-01-26 13:33:59 UTC
Hi Werner,
Post by Werner LEMBERG
> [git commit 4b66b9f60e3ce4a552bd4f3230b659347add1446]
>
> Folks,
>
> I have the following in my master document.
>
> %%% coding: utf-8
> %%% mode: latex
> %%% TeX-engine: xetex
> %%% TeX-PDF-mode: t
> %%% TeX-master: t
>
> However, the `*xxx output*' buffer (showing the compilation results of
> xetex) is in latin-1 encoding – I guess this is due to
>
> (set-language-environment "latin-1")
> (setq-default buffer-file-coding-system 'latin-1)
>
> in my `~/.emacs' file...
>
> What must I do so that auctex obeys the local encoding variables in my
> master document, thus overriding `~/.emacs'? [AFAICS, xetex *can* be
> forced to use non-UTF8 legacy encodings, so relying on `TeX-engine' is
> probably not sufficient.]

AUCTeX determines the coding system for reading from the output of
asynchronous TeX process by the function
`TeX-adjust-process-coding-system'. It basically obeys the coding
system of the command buffer, i.e., the buffer in which `C-c C-c' or
something like it is issued. So I expect that it usually works well.

Does the symptom you described occur for all xelatex documents or only
for some particular documents? If the latter, one possible guess is that
(1) the document is divided into multiple files,
(2) the sub file of the master file has no multibyte characters, i.e.,
    contains only ASCII characters, and
(3) the command buffer is the one for that ASCII sub file.
In that case, the local value of `buffer-file-coding-system' of the sub
file buffer can be a kind of `undecided-*', and emacs eventually uses
the value of `default-process-coding-system' for reading from the output
of xelatex. With your settings, `default-process-coding-system' is
(iso-latin-1-* . iso-latin-1-*), so the utf-8 characters in the output
are not decoded correctly.

If this guess is right, putting the
%%% coding: utf-8
cookie in your ASCII sub file should do the trick. This sets the local
value of `buffer-file-coding-system' to the specified value, so the
utf-8 output from xelatex would be decoded correctly.
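
To check whether this guess applies, one could look at the two values
involved in the buffer from which `C-c C-c' is issued. The following is
only a sketch: `my/show-tex-coding-info' is a made-up name and not part
of AUCTeX; it merely prints two standard Emacs variables.

```elisp
;; Hypothetical helper (not an AUCTeX function).
;; Run it with M-x in the command buffer, i.e. the buffer in which
;; `C-c C-c' is issued.
(defun my/show-tex-coding-info ()
  "Show the coding systems that may decide how TeX output is decoded."
  (interactive)
  (message "buffer-file-coding-system: %S, default-process-coding-system: %S"
           ;; Buffer-local value; a kind of `undecided-*' for a
           ;; pure-ASCII file without a coding cookie.
           buffer-file-coding-system
           ;; Cons cell (DECODING . ENCODING) used as the fallback.
           default-process-coding-system))
```

If the first value is some `undecided-*' variant and the second one is
a latin-1 pair, the situation described above is the likely cause.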

If this is not the case, I'd be grateful if you could provide sample
xelatex documents to examine.

Regards,
Ikumi Keita
Werner LEMBERG
2018-01-31 12:31:31 UTC
Post by Ikumi Keita
> AUCTeX determines the coding system for reading from the output of
> asynchronous TeX process by the function
> `TeX-adjust-process-coding-system'. It basically obeys the coding
> system of the command buffer, i.e., the buffer in which `C-c C-c' or
> something like it is issued. So I expect that it usually works well.

Thanks for the explanation. Right now, I can't repeat the issue.

The only thing which looks strange to me is that the mode line in the
`*xxx output*' buffer starts with

1:**

(but the UTF-8 contents of the log file as emitted by XeTeX is
correctly displayed).
Post by Ikumi Keita
> [...] putting the
> %%% coding: utf-8
> cookie in your ASCII sub file would do the trick. This makes the
> local value of `buffer-file-coding-system' to be the specified
> value, so the utf-8 output from xelatex would be decoded correctly.

I will try that as soon as I encounter the problem again.
Post by Ikumi Keita
> If this is not the case, I'm grateful if you provide sample xelatex
> documents to examine.

Will do!


Werner
Ikumi Keita
2018-02-01 05:29:18 UTC
Hi Werner,
Post by Werner LEMBERG
> The only thing which looks strange to me is that the mode line in the
> `*xxx output*' buffer starts with
>
>   1:**
>
> (but the UTF-8 contents of the log file as emitted by XeTeX is
> correctly displayed).

In that case, latin-1 is the coding system for saving that buffer and
utf-8 is the one for decoding the output from the external process. Of
the several coding systems, the one for saving the buffer is the most
important in most cases, so usually only that one is displayed in the
mode line.

Emacs assigns several coding systems separately according to their
purposes: saving the buffer, decoding the output from process, encoding
the input to process, decoding the keyboard input from text terminal,
encoding the screen output to text terminal... You can see three of
them in the form like

EEE:**

when you run "emacs -nw" on a text terminal. See the doc string of the
variable `mode-line-mule-info' for details.

More detailed information about coding systems associated with the
buffer can be displayed via C-h C or M-x describe-coding-system.
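
For a quick look at just the two coding systems discussed here, a small
command along the following lines might help. This is a sketch under
assumptions: `my/show-output-buffer-coding' is a made-up name, and it
has to be run in the output buffer while the TeX process is still
alive, since otherwise there is no process to ask.

```elisp
;; Hypothetical helper (not an AUCTeX function).
(defun my/show-output-buffer-coding ()
  "Report the file and process coding systems of the current buffer.
Meant to be run in the `*xxx output*' buffer while TeX is running."
  (interactive)
  ;; `get-buffer-process' returns nil if no live process is attached.
  (let ((proc (get-buffer-process (current-buffer))))
    (message "file: %S, process (decoding . encoding): %S"
             ;; What the mode line mnemonic (e.g. "1") reflects.
             buffer-file-coding-system
             ;; What is actually used to decode the TeX output.
             (and proc (process-coding-system proc)))))
```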

Best,
Ikumi Keita
Werner LEMBERG
2018-02-01 06:13:12 UTC
Post by Ikumi Keita
> Post by Werner LEMBERG
>> The only thing which looks strange to me is that the mode line in
>> the `*xxx output*' buffer starts with
>>
>>   1:**
>>
>> (but the UTF-8 contents of the log file as emitted by XeTeX is
>> correctly displayed).
>
> In that case, latin-1 is the coding system for saving that buffer
> and utf-8 is for decoding the output from external process. Of
> several coding systems, the one for saving the buffer is the most
> important for most cases, so usually only that one is displayed in
> the mode line.

Thanks for the explanation, which I already knew :-)

My question was probably not precise enough: I wonder why auctex
doesn't set the buffer encoding also (derived from the master file's
local variables), given that auctex itself generates the *xxx output*
buffer.


Werner
David Kastrup
2018-02-01 08:56:05 UTC
Post by Werner LEMBERG
> Post by Ikumi Keita
>> Post by Werner LEMBERG
>>> The only thing which looks strange to me is that the mode line in
>>> the `*xxx output*' buffer starts with
>>>
>>>   1:**
>>>
>>> (but the UTF-8 contents of the log file as emitted by XeTeX is
>>> correctly displayed).
>>
>> In that case, latin-1 is the coding system for saving that buffer
>> and utf-8 is for decoding the output from external process. Of
>> several coding systems, the one for saving the buffer is the most
>> important for most cases, so usually only that one is displayed in
>> the mode line.
>
> Thanks for the explanation, which I already knew :-)
>
> My question was probably not precise enough: I wonder why auctex
> doesn't set the buffer encoding also (derived from the master file's
> local variables), given that auctex itself generates the *xxx output*
> buffer.

TeX is an 8-bit program wrapping its output, including output containing
quotes of the source as error locators, every 79 bytes, irrespective of
character boundaries? It also may encode some bytes in the middle of a
character as ^^xx. Interpreting its output thus relies on the output
actually being interpretable in the given encoding. We might have a bit
more leeway here with XEmacs out of the race (XEmacs' utf-8 encoding and
reencoding was not round-trippable).
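
The effect of a break inside a multibyte character can be illustrated
in Emacs itself. The snippet below is only an illustration and does not
involve TeX or AUCTeX at all: it encodes a single "é" as UTF-8, keeps
only the first of its two bytes (as if a line break had fallen there),
and shows that the truncated sequence no longer decodes to the
character.

```elisp
;; "\u00e9" is "é"; in UTF-8 it is the two bytes #xC3 #xA9.
(let* ((bytes (encode-coding-string "\u00e9" 'utf-8)) ; unibyte, length 2
       ;; Keep only the first byte, as if the output had been
       ;; wrapped in the middle of the character.
       (head  (substring bytes 0 1))
       (whole (decode-coding-string bytes 'utf-8))
       (cut   (decode-coding-string head 'utf-8)))
  (list (string= whole "\u00e9")   ; complete sequence round-trips
        (string= cut "\u00e9")))   ; truncated sequence does not
;; => (t nil)
```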
--
David Kastrup
jfbu
2018-02-01 12:28:24 UTC
Post by Werner LEMBERG
> My question was probably not precise enough: I wonder why auctex
> doesn't set the buffer encoding also (derived from the master
> file's local variables), given that auctex itself generates the
> *xxx output* buffer.
>
> TeX is an 8-bit program wrapping its output, [...]
>
> I was specifically asking for XeTeX.
>
> [...] including output containing quotes of the source as error
> locators, every 79 bytes, irrespective of character boundaries? It
> also may encode some bytes in the middle of a character as ^^xx.
>
> IIRC, XeTeX is going to fix that (or already has) so that UTF-8
> characters won't be broken in the middle of the sequence.
>
> even pdflatex does with option -8bit
> ---- file test.tex
> \documentclass{article}
> \begin{document}
> \typeout{éàù}
> \end{document}
> ----
>
> Lines are still getting wrapped after 79 bytes.
> --
> David Kastrup

yes, besides my test file is just crap

I wanted to do it but with 7bit ascii control characters rather
once they are given suitable catcodes

apologies

Jean-François
Ikumi Keita
2018-02-01 09:33:09 UTC
Hi Werner,
Post by Werner LEMBERG
> My question was probably not precise enough: I wonder why auctex
> doesn't set the buffer encoding also (derived from the master file's
> local variables), given that auctex itself generates the *xxx output*
> buffer.

Because the output buffer *xxx output* rarely needs to be saved in a
file. That buffer is only used for receiving outputs from TeX (and some
related programs such as makeindex), so AUCTeX just sets the coding
system for the process communication explicitly, leaving
`buffer-file-coding-system' of *xxx output* untouched.
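
If the mode line indicator matters to you, or you do want to save the
output buffer, the file coding system can still be changed by hand.
The following is a sketch only: `my/set-output-buffer-file-coding' is a
made-up name, nothing AUCTeX provides, and it has to be invoked
manually inside the output buffer.

```elisp
;; Hypothetical helper (not an AUCTeX function).
(defun my/set-output-buffer-file-coding (coding)
  "Set `buffer-file-coding-system' of the current buffer to CODING.
Meant to be run manually inside the `*xxx output*' buffer."
  ;; The "z" code letter prompts for a coding system name.
  (interactive "zCoding system: ")
  ;; The variable becomes buffer-local when set, so other buffers
  ;; and the process coding system are not affected.
  (setq buffer-file-coding-system coding)
  (force-mode-line-update))
```

This only changes what would be used for saving the buffer; decoding
of the process output stays as AUCTeX has set it.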

Best,
Ikumi Keita
Werner LEMBERG
2018-02-01 11:56:28 UTC
Post by Ikumi Keita
> Post by Werner LEMBERG
>> My question was probably not precise enough: I wonder why auctex
>> doesn't set the buffer encoding also (derived from the master
>> file's local variables), given that auctex itself generates the
>> *xxx output* buffer.
>
> Because the output buffer *xxx output* rarely needs to be saved in a
> file. That buffer is only used for receiving outputs from TeX (and
> some related programs such as makeindex), so AUCTeX just sets the
> coding system for the process communication explicitly, leaving
> `buffer-file-coding-system' of *xxx output* untouched.

OK, thanks.


Werner
