[AUCTeX] A problem with \parencite and fill-paragraph

Discussion:

a***@cock.li

2017-03-30 12:19:36 UTC

Hello everybody,

In auctex-mode, given

--8<---------------cut here---------------start------------->8---
asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf xxxx \parencite{asdf}
--8<---------------cut here---------------end--------------->8---

after calling `fill-paragraph` I get the following

--8<---------------cut here---------------start------------->8---
asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf
xxxx \parencite{asdf}
--8<---------------cut here---------------end--------------->8---

(note that xxxx gets wrapped to the new line even though it should stay
in the same line)

Here's the result when I use \cite instead of \parencite:

--8<---------------cut here---------------start------------->8---
asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf xxxx
\cite{asdf}
--8<---------------cut here---------------end--------------->8---

I guess it has something to do with the initial letters of \parencite
which are the same as \par and \par is treated in a special way by
fill-paragraph.

Is there any solution (or workaround) to this problem?

Thanks for help in advance!

Cheers,
Adam

PS I'm using Emacs version 25.1.1 and AUCTeX version 11.90.0 from ELPA.

Ikumi Keita

2017-03-30 15:48:53 UTC

Hi Adam and all,

Post by a***@cock.li
In auctex-mode, given
asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf xxxx \parencite{asdf}
after calling `fill-paragraph` I get the following
asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf
xxxx \parencite{asdf}
(note that xxxx gets wrapped to the new line even though it should stay
in the same line)

Indeed.

Post by a***@cock.li
asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf xxxx
\cite{asdf}
I guess it has something to do with the initial letters of \parencite
which are the same as \par and \par is treated in a special way by
fill-paragraph.

I haven't found a solution to this yet, but I managed to find out why it
happens. Adam guessed correctly. The function call chain on Emacs 25.1
is as follows:
fill-paragraph
-> LaTeX-fill-paragraph
-> LaTeX-fill-region-as-paragraph
-> LaTeX-fill-region-as-para-do
-> LaTeX-fill-move-to-break-point
-> fill-move-to-break-point
-> fill-nobreak-p

The last function, `fill-nobreak-p', is called with point just before
"\parencite" and returns t because of `(looking-at paragraph-start)':
the regexp `paragraph-start' matches the "\par" of "\parencite". So the
space just before "\parencite" isn't considered a valid position for a
line break.

The reason that `paragraph-start' matches "\par" of "\parencite" lies in
the following part of latex.el:
----------------------------------------------------------------------
(defvar LaTeX-paragraph-commands-internal
  '("[" "]" ; display math
    "appendix" "begin" "caption" "chapter" "end" "include" "includeonly"
    "label" "maketitle" "noindent" "par" "paragraph" "part" "section"
    "subsection" "subsubsection" "tableofcontents" "newpage" "clearpage")
  "Internal list of LaTeX macros that should have their own line.")

(defun LaTeX-paragraph-commands-regexp-make ()
  "Return a regular expression matching defined paragraph commands."
  (concat (regexp-quote TeX-esc) "\\("
          (regexp-opt (append LaTeX-paragraph-commands
                              LaTeX-paragraph-commands-internal)) "\\)"))

...

(defvar LaTeX-paragraph-commands-regexp (LaTeX-paragraph-commands-regexp-make)
  "Regular expression matching LaTeX macros that should have their own line.")

(defun LaTeX-set-paragraph-start ()
  "Set `paragraph-start'."
  (setq paragraph-start
        (concat
         "[ \t]*%*[ \t]*\\("
         LaTeX-paragraph-commands-regexp "\\|"
         (regexp-quote TeX-esc) "\\(" LaTeX-item-regexp "\\)\\|"
         "\\$\\$\\|" ; Plain TeX display math (Some people actually
                     ; use this with LaTeX.  Yuck.)
         "$\\)")))
----------------------------------------------------------------------
[summary]
Since the list `LaTeX-paragraph-commands-internal' includes "par",
`paragraph-start' is set to a regexp that matches every occurrence of
"\par" in the buffer, even when it is only the first four characters
of "\parencite".
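
To illustrate the summary above, here is a minimal sketch outside of
AUCTeX. The helper `demo-unanchored-regexp' is hypothetical and only
mimics the shape of the regexp built in latex.el (backslash, then a
group of macro names, with nothing required after the name):

```elisp
(defun demo-unanchored-regexp (macros)
  "Return a regexp matching a backslash followed by one of MACROS.
Hypothetical helper for illustration; nothing is required to
follow the macro name, so longer macro names match as well."
  (concat (regexp-quote "\\") "\\(" (regexp-opt macros) "\\)"))

(let ((re (demo-unanchored-regexp '("par" "section"))))
  (list (string-match re "\\par")             ; intended match
        (string-match re "\\parencite{asdf}") ; unintended prefix match
        (string-match re "\\cite{asdf}")))    ; no match
```

Evaluating the `let' form should show a match at position 0 for both
"\par" and "\parencite{asdf}", and nil for "\cite{asdf}".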

Post by a***@cock.li
Is there any solution (or workaround) to this problem?

A brief test suggests that this particular situation would be solved
by changing the definition of `LaTeX-paragraph-commands-regexp-make' to:
----------------------------------------------------------------------
(defun LaTeX-paragraph-commands-regexp-make ()
  "Return a regular expression matching defined paragraph commands."
  (concat (regexp-quote TeX-esc) "\\("
          (regexp-opt (append LaTeX-paragraph-commands
                              ;; add here \b
                              LaTeX-paragraph-commands-internal)) "\\)\\b"))
----------------------------------------------------------------------
However, it seems that this change breaks the filling of the constructs
involving "\[" and "\]".

Regards,
Ikumi Keita

Arash Esbati

2017-03-30 20:01:36 UTC

Post by Ikumi Keita
[summary]
Since the list `LaTeX-paragraph-commands-internal' includes "par",
`paragraph-start' is set to a regexp that matches every occurrence of
"\par" in the buffer, even when it is only the first four characters
of "\parencite".

Hi Keita,

your observation is correct (needless to say, as always!).

Post by Ikumi Keita

Post by a***@cock.li
Is there any solution (or workaround) to this problem?

A brief test suggests that this particular situation would be solved
by changing the definition of `LaTeX-paragraph-commands-regexp-make' to:
----------------------------------------------------------------------
(defun LaTeX-paragraph-commands-regexp-make ()
  "Return a regular expression matching defined paragraph commands."
  (concat (regexp-quote TeX-esc) "\\("
          (regexp-opt (append LaTeX-paragraph-commands
                              ;; add here \b
                              LaTeX-paragraph-commands-internal)) "\\)\\b"))
----------------------------------------------------------------------
However, it seems that this change breaks the filling of the constructs
involving "\[" and "\]".

You need \B to match the end of a "not-word". What do you think about a
change like this?

--8<---------------cut here---------------start------------->8---
\documentclass{article}
\begin{document}

\begin{verbatim}
Eval:
(defun LaTeX-paragraph-commands-regexp-make ()
  "Return a regular expression matching defined paragraph commands."
  (let (cmds symbs)
    (dolist (mac (append LaTeX-paragraph-commands
                         LaTeX-paragraph-commands-internal))
      (if (string-match "[a-zA-Z]" mac)
          (push mac cmds)
        (push mac symbs)))
    (concat (regexp-quote TeX-esc) "\\("
            (regexp-opt cmds)
            "\\b"
            "\\|"
            (regexp-opt symbs)
            "\\B"
            "\\)")))

eval-defun as well:
(defvar LaTeX-paragraph-commands-regexp
  (LaTeX-paragraph-commands-regexp-make))

Run:
(LaTeX-set-paragraph-start)
\end{verbatim}

Fill: %
asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf xxxx \parencite{asdf}

asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf asdf xxxx \cite{asdf}

\[
a+b
\]
\parencite{asdf}
\end{document}
--8<---------------cut here---------------end--------------->8---

Best, Arash

Mosè Giordano

2017-03-30 20:32:29 UTC

Hi all,

first of all, thanks to Adam for the report and the correct guess
about the culprit!

I think we all came up with similar solutions. I was concerned about
the performance of Arash's fix, but according to `benchmark-run' the
overhead isn't large (a slowdown on the order of a few percent) and
shouldn't be noticeable in normal situations.

Arash, if you're going to install your patch, please also add a test.
I know that writing tests is one of the most annoying parts of coding,
yet it's very important because it ensures that we won't break
something again in the future.

Bye,
Mosè

Arash Esbati

2017-03-31 07:38:23 UTC

Post by Mosè Giordano
I think we all came up with similar solutions. I was concerned about
the performance of Arash's fix, but according to `benchmark-run' the
overhead isn't large (a slowdown on the order of a few percent) and
shouldn't be noticeable in normal situations.

Hi Mosè,

thanks for looking at it. I think I can make the function faster by
increasing `gc-cons-threshold' and changing the string-match
condition. Here is what I get, with /1 being my original suggestion and
/2 the new one. Do you also want to give it a try?

--8<---------------cut here---------------start------------->8---
(defun LaTeX-paragraph-commands-regexp-make/1 ()
  "Return a regular expression matching defined paragraph commands."
  (let (cmds symbs)
    (dolist (mac (append LaTeX-paragraph-commands
                         LaTeX-paragraph-commands-internal))
      (if (string-match "[a-zA-Z]" mac)
          (push mac cmds)
        (push mac symbs)))
    (concat (regexp-quote TeX-esc) "\\("
            (regexp-opt cmds)
            "\\b"
            "\\|"
            (regexp-opt symbs)
            "\\B"
            "\\)")))

(defun LaTeX-paragraph-commands-regexp-make/2 ()
  "Return a regular expression matching defined paragraph commands."
  (let ((gc-cons-threshold most-positive-fixnum)
        cmds symbs)
    (dolist (mac (append LaTeX-paragraph-commands
                         LaTeX-paragraph-commands-internal))
      (if (string-match "[^a-zA-Z]" mac)
          (push mac symbs)
        (push mac cmds)))
    (concat (regexp-quote TeX-esc) "\\("
            (regexp-opt cmds)
            "\\b"
            "\\|"
            (regexp-opt symbs)
            "\\B"
            "\\)")))

(benchmark-run 10000 (LaTeX-paragraph-commands-regexp-make))
(benchmark-run 10000 (LaTeX-paragraph-commands-regexp-make/1))
(benchmark-run 10000 (LaTeX-paragraph-commands-regexp-make/2))

Results:
Original: (1.8593714000000001 135 1.421876200000007)
/1: (1.9218724 133 1.421879200000003)
/2 (w/o gc-tweak): (1.8749965 133 1.406252999999996)

Original: (1.8281209 133 1.3437572999999858)
/1: (1.9062489999999999 132 1.4375054999999968)
/2 (w/ gc-tweak): (1.8437484 131 1.3750041000000017)
--8<---------------cut here---------------end--------------->8---

Post by Mosè Giordano
Arash, if you're going to install your patch, please also add a test.
I know that writing tests is one of the most annoying parts of coding,
yet it's very important because it ensures that we won't break
something again in the future.

I will wait a day to see if Keita has another suggestion, then I will
install a patch. I'm fine with adding a test and will think about a
proper test case.

Best, Arash

Mosè Giordano

2017-03-31 12:11:43 UTC

Hi Arash,

Post by Arash Esbati

I don't think it's necessary to touch `gc-cons-threshold' here. When
I expressed concerns about performance I was thinking that the
function was called much more often, but it's probably only called once
per visited document, or something like that.

Bye,
Mosè

Ikumi Keita

2017-03-31 07:49:05 UTC

Hi Arash,

Post by Arash Esbati
You need \B to match the end of a "not-word". What do you think about a
change like this?

I think your approach is basically OK. (I vaguely remember
font-latex.el uses similar methods for building regexp.)

Let me make some comments on this code.

Post by Arash Esbati
\documentclass{article}
\begin{document}
\begin{verbatim}
(defun LaTeX-paragraph-commands-regexp-make ()
  "Return a regular expression matching defined paragraph commands."
  (let (cmds symbs)
    (dolist (mac (append LaTeX-paragraph-commands
                         LaTeX-paragraph-commands-internal))
      (if (string-match "[a-zA-Z]" mac)
          (push mac cmds)
        (push mac symbs)))
    (concat (regexp-quote TeX-esc) "\\("
            (regexp-opt cmds)
            "\\b"
            "\\|"
            (regexp-opt symbs)
            "\\B"
            "\\)")))

(1) I'm wondering whether this "\B" is really necessary. With this
"\B", "\[" followed by alphanumeric letters without separating spaces is
not considered as a start of a paragraph. Forgive me if I'm saying
something stupid, but this fills
----------------------------------------------------------------------
abc def ghi jkl
\[xyz=123\]
ABC DEF GHI JKL
----------------------------------------------------------------------
to
----------------------------------------------------------------------
abc def ghi jkl \[xyz=123\] ABC DEF GHI JKL
----------------------------------------------------------------------
when typing M-q with the point on the equation, while the original code
fills to
----------------------------------------------------------------------
abc def ghi jkl
\[xyz=123\] ABC DEF GHI JKL
----------------------------------------------------------------------
If the line "\\B" is omitted from the code above, the two results are
the same. Am I missing something?

(2) Considering compatibility with older Emacsen, the use of
`regexp-opt' seems to require some more tweaks. Without the optional
second argument, `regexp-opt' in XEmacs 21.4 does not enclose the
result in "\(" and "\)".

So I propose to change the `concat' part like this:
(concat (regexp-quote TeX-esc) "\\(?:"
        (regexp-opt cmds "\\(?:")
        "\\b"
        "\\|"
        (regexp-opt symbs "\\(?:")
        "\\)")

Regards,
Ikumi Keita

Arash Esbati

2017-04-01 12:01:25 UTC

Post by Ikumi Keita

Post by Arash Esbati
You need \B to match the end of a "not-word". What do you think about a
change like this?

I think your approach is basically OK. (I vaguely remember
font-latex.el uses similar methods for building regexp.)
(1) I'm wondering whether this "\B" is really necessary.

Hi Keita,

no, \B is not really necessary, I just wanted to say that appending \b
to the complete regexp would not work for \[; it would be \B or
nothing. And dropping \B is more versatile.
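
The difference between the three variants can be checked in isolation
with a small sketch. `demo-match-p' is a hypothetical helper, not part
of AUCTeX, and the standard syntax table is assumed so that the result
does not depend on the current buffer:

```elisp
(defun demo-match-p (regexp string)
  "Return non-nil if REGEXP matches at the start of STRING.
Hypothetical helper; the standard syntax table is used so that
the meaning of `\\b' and `\\B' does not depend on the current buffer."
  (with-syntax-table (standard-syntax-table)
    (eq 0 (string-match regexp string))))

(list
 ;; Control word followed by \b: the prefix of \parencite is rejected.
 (demo-match-p "\\\\par\\b" "\\parencite{asdf}")
 (demo-match-p "\\\\par\\b" "\\par text")
 ;; \[ followed by \b: rejected when a non-word character follows.
 (demo-match-p "\\\\\\[\\b" "\\[\n  a+b\n\\]")
 ;; \[ followed by \B: rejected when a word character follows.
 (demo-match-p "\\\\\\[\\B" "\\[xyz=123\\]")
 ;; \[ with no trailing assertion: accepted in both cases.
 (demo-match-p "\\\\\\[" "\\[\n  a+b\n\\]")
 (demo-match-p "\\\\\\[" "\\[xyz=123\\]"))
```

Under these assumptions the list should evaluate to (nil t nil nil t t),
which is why `\b' is kept for control words only and nothing is
appended after "[" and "]".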

Post by Ikumi Keita
(2) Considering compatibility with older Emacsen, the use of
`regexp-opt' seems to require some more tweaks. Without the optional
second argument, `regexp-opt' in XEmacs 21.4 does not enclose the
result in "\(" and "\)".
(concat (regexp-quote TeX-esc) "\\(?:"
        (regexp-opt cmds "\\(?:")
        "\\b"
        "\\|"
        (regexp-opt symbs "\\(?:")
        "\\)")

Thanks for raising the compat issue. I will install this (dropping
"\\(?:" for the second regexp):

--8<---------------cut here---------------start------------->8---
(defun LaTeX-paragraph-commands-regexp-make ()
  "Return a regular expression matching defined paragraph commands.
Regexp part containing TeX control words is postfixed with `\\b'
to avoid ambiguities (e.g. \\par vs. \\parencite)."
  (let (cmds symbs)
    (dolist (mac (append LaTeX-paragraph-commands
                         LaTeX-paragraph-commands-internal))
      (if (string-match "[^a-zA-Z]" mac)
          (push mac symbs)
        (push mac cmds)))
    (concat (regexp-quote TeX-esc) "\\(?:"
            (regexp-opt cmds "\\(?:")
            "\\b"
            "\\|"
            (regexp-opt symbs)
            "\\)")))
--8<---------------cut here---------------end--------------->8---
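
As a rough way to try this regexp shape without loading AUCTeX, the
following sketch uses a standalone variant. Both `demo-' functions are
hypothetical: the macro list is passed in explicitly, a literal
backslash stands in for `TeX-esc', and the standard syntax table is
assumed.

```elisp
(defun demo-paragraph-commands-regexp (macros)
  "Return a regexp matching a backslash followed by one of MACROS.
Hypothetical standalone variant: MACROS stands in for the AUCTeX
variables and a literal backslash stands in for `TeX-esc'."
  (let (cmds symbs)
    (dolist (mac macros)
      ;; Names containing a non-letter are control symbols; the rest
      ;; are control words and get the trailing word boundary.
      (if (string-match "[^a-zA-Z]" mac)
          (push mac symbs)
        (push mac cmds)))
    (concat (regexp-quote "\\") "\\(?:"
            (regexp-opt cmds "\\(?:")
            "\\b"
            "\\|"
            (regexp-opt symbs)
            "\\)")))

(defun demo-paragraph-command-at-start-p (string)
  "Return non-nil if STRING starts with one of the sample macros.
Hypothetical helper; the standard syntax table is used so that
the word boundary test does not depend on the current buffer."
  (with-syntax-table (standard-syntax-table)
    (eq 0 (string-match
           (demo-paragraph-commands-regexp '("[" "]" "par" "paragraph"))
           string))))

(list (demo-paragraph-command-at-start-p "\\parencite{asdf}")
      (demo-paragraph-command-at-start-p "\\par next")
      (demo-paragraph-command-at-start-p "\\paragraph{x}")
      (demo-paragraph-command-at-start-p "\\[xyz=123\\]"))
```

With the sample list used here, only "\parencite{asdf}" should fail to
match; a regression test for the real function could follow the same
pattern.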

Best, Arash

Ikumi Keita

2017-03-31 12:06:09 UTC

Sorry, I should have thought more flexibly.

Post by Ikumi Keita
(concat (regexp-quote TeX-esc) "\\(?:"
        (regexp-opt cmds "\\(?:")
        "\\b"
        "\\|"
        (regexp-opt symbs "\\(?:")
        "\\)")

The second `regexp-opt' does not require the optional argument. Just
(concat (regexp-quote TeX-esc) "\\(?:"
        (regexp-opt cmds "\\(?:")
        "\\b"
        "\\|"
        (regexp-opt symbs)
        "\\)")
would be sufficient.

Best,
Ikumi Keita

Ikumi Keita

2017-04-01 13:31:55 UTC

Hi Arash,

Post by Arash Esbati
no, \B is not really necessary, I just wanted to say that appending \b
to the complete regexp would not work for \[; it would be \B or
nothing. And dropping \B is more versatile.

Thank you for clarifying. That agrees with my thought.

Post by Arash Esbati
(defun LaTeX-paragraph-commands-regexp-make ()
  "Return a regular expression matching defined paragraph commands.
Regexp part containing TeX control words is postfixed with `\\b'
to avoid ambiguities (e.g. \\par vs. \\parencite)."
  (let (cmds symbs)
    (dolist (mac (append LaTeX-paragraph-commands
                         LaTeX-paragraph-commands-internal))
      (if (string-match "[^a-zA-Z]" mac)
          (push mac symbs)
        (push mac cmds)))
    (concat (regexp-quote TeX-esc) "\\(?:"
            (regexp-opt cmds "\\(?:")
            "\\b"
            "\\|"
            (regexp-opt symbs)
            "\\)")))

Now I fully agree with this proposal.

Best,
Ikumi Keita