WordPress’s wpautop() function messed up my semantic image captions (and how I fixed it)

Although I use WordPress for my blog and am a fan of Matt Mullenweg, I find that WordPress has a bloated core and does a lot of stuff I don’t really need or want it to do. Many people boycott WordPress without really using it, and that’s understandable, because it can be a nightmare sometimes.

I am a fan of getting WordPress to do things it wasn’t meant to do. By default, a lot of its output isn’t semantic. One example is the markup for image captions generated by the caption shortcode, which can be changed through the img_caption_shortcode filter.

Default HTML for WordPress image caption

<div id="attachment_54905" style="width: 1210px" class="wp-caption aligncenter">
   <img src="/wp-content/uploads/Kawaii-Box-18.jpg" alt="All the contents of the box" width="1200" height="800">
   <p class="wp-caption-text">All the contents of the box</p>
</div>

We can improve this. We can use more semantic elements such as figure and figcaption to give more meaning to these elements.

Improved WordPress caption

I can probably refine this code now as I have been using it since about May 2014, but this is the more semantic captioned image HTML that I wrote. I’ve named the elements using better class names and used HTML5 elements as I mentioned earlier.

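// Replace the default caption markup with a figure/figcaption pair.
function caption_shortcode($val, $attr, $content = null) {
   extract(shortcode_atts(array('id'=> '','align'=> 'aligncenter','width'=> '','caption' => ''), $attr));

   // Fall back to the default output when there is no usable width or caption.
   if ( 1 > (int) $width || empty($caption) ) return $val;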
   $capid = '';

   if ( $id ) {
      $id = esc_attr($id);
      $capid = 'id="figcaption_'. $id . '" ';
      $id = 'id="' . $id . '" aria-labelledby="figcaption_' . $id . '" ';
   }

   $figure = '<figure ' . $id . 'class="post-figure figure-' . esc_attr($align) . '" style="width: '. (int) $width . 'px">';
   $figure .= do_shortcode( $content ) . '<figcaption ' . $capid . 'class="post-figcaption">' . $caption . '</figcaption></figure>';

   return $figure;
}

add_filter( 'img_caption_shortcode', 'caption_shortcode', 10, 3 );

The dilemma with WordPress default formatting

WordPress core files can do some annoying things to our code, no matter how much we fill our functions file with our beautiful gunk. Some of these things are actually intended to make the common (wo)man’s life easier, but can be a nightmare for developers.

The wpautop() function

I love to write poetry. A lot of poetry relies on the formatting and punctuation of the piece – em dashes or en dashes, commas, quotation marks, indentation, and most importantly – line breaks.

This is something that the WordPress core has chosen to butcher in excerpts by way of the wpautop() function, a function that filters the content of WordPress posts and changes double line breaks into paragraph tags. This function unfortunately doesn’t do just that. It cleans up a lot of other things and essentially has a lot of conditionals that cause it to put line breaks elsewhere.
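To make the paragraph and line break behaviour concrete, here is a minimal sketch of the basic idea. It is not the core implementation: simple_autop() is a hypothetical stand-in I made up for illustration, and it ignores all of the block-level tag handling that the real wpautop() does.

```php
<?php
// Hypothetical, heavily simplified stand-in for wpautop(); NOT the core code.
function simple_autop( string $text, bool $br = true ): string {
    // Normalise line endings and trim surrounding whitespace.
    $text = trim( str_replace( array( "\r\n", "\r" ), "\n", $text ) );
    if ( $text === '' ) {
        return '';
    }

    // Two or more consecutive newlines separate paragraphs.
    $paragraphs = preg_split( '/\n{2,}/', $text );

    $html = '';
    foreach ( $paragraphs as $paragraph ) {
        if ( $br ) {
            // Remaining single newlines become <br /> tags.
            $paragraph = str_replace( "\n", "<br />\n", $paragraph );
        }
        $html .= '<p>' . $paragraph . "</p>\n";
    }

    return $html;
}

$poem = "Roses are red,\nviolets are blue.\n\nSecond stanza.";

echo simple_autop( $poem );        // line break inside the first stanza is kept as <br />
echo simple_autop( $poem, false ); // same paragraphs, but no <br /> tags
```

With the second argument set to false, the single newline inside the first stanza stays a plain newline, which a browser collapses into a space.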

NB: I actually haven’t figured out exactly why yet, but this seems to only be a bother in post excerpts.

The one spot where this function wreaks havoc is my beautiful, semantically correct figure element I have written for captioned images. Although my code above shows the figcaption element directly after the img element, WordPress likes to insert an arbitrary <br> just after the image.

An example of this annoying inserted line break.

Obviously, this drove me nuts.

I spent a long time trying to figure out how to fix the problem, before I gave up and used the following in my functions.php file to strip the line break from my post excerpts.

// Strip <br> from excerpt
function remove_br_excerpt( $content ) {
   return wpautop( $content, false );
}

remove_filter( 'the_excerpt', 'wpautop' );
add_filter( 'the_excerpt', 'remove_br_excerpt' );

Obviously, it affected my poetry too.

It worked perfectly for my captioned images. However, because it affected all excerpt content – not just those captioned images – it would also affect my poetry excerpts, so three lines could turn into a single long one.

Finding wpautop() and $pee in formatting.php

I found the function in formatting.php, a file in WordPress core. It looks a little something like this (at least half of it does). It’s a behemoth of a function, though I’m sure worse exists. I also don’t know why they chose to call the variable ‘pee’ apart from referring to the paragraph tag.

Finding the wpautop() function in formatting.php (a core file)

The function is preceded by this comment:

A group of regex replaces used to identify text formatted with newlines and
replace double line-breaks with HTML paragraph tags. The remaining line-breaks
after conversion become <<br />> tags, unless $br is set to '0' or 'false'.

I found that the figcaption element was included in the set of rules that added a line break before certain specified elements.
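Here is a rough sketch of how I understand that mechanism. It is a simplification under assumptions, not the code from formatting.php: sketch_block_breaks() is a hypothetical name, the tag list is shortened for illustration, and the real function does far more between these two steps.

```php
<?php
// Hypothetical two-step sketch; the real wpautop() has many more rules.
function sketch_block_breaks( string $html, string $allblocks ): string {
    // Step 1: put a newline in front of every listed opening tag.
    $html = preg_replace( '!(<' . $allblocks . '[\s/>])!', "\n$1", $html );

    // Step 2: turn the remaining newlines into <br /> tags.
    return str_replace( "\n", "<br />\n", trim( $html ) );
}

$figure = '<figure><img src="box.jpg" alt=""><figcaption>All the contents of the box</figcaption></figure>';

// With figcaption in the list, a <br /> appears between the image and the caption.
echo sketch_block_breaks( $figure, '(?:div|p|figure|figcaption)' ), "\n";

// Without figcaption in the list, the markup is left alone.
echo sketch_block_breaks( $figure, '(?:div|p|figure)' ), "\n";
```

The second call mirrors the effect of removing figcaption from $allblocks: no newline is added before the caption, so there is nothing to convert into a line break.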

A solution to stop WordPress from adding line breaks where the heck I don’t want it to

I had a solution for my issue with the lines of poetry and the line breaks in my captioned images: I could use wpautop() comfortably, as long as I could somehow remove that line break before figcaption.

Editing core files is not recommended, even though the issue disappeared as soon as I removed figcaption from the $allblocks variable. Ugh.

Just override the default.

It was quite simple, and thanks to Nick’s logical brain we could fix the issue. I just had to remove the wpautop filter from my post excerpts, and then throw in my own version.

I copied the code from formatting.php to declare a new function, and removed the offending code so that my captioned images would not be affected.

function dodge_default_formatting( $pee, $br = true ) {
   // original content of wpautop from formatting.php,
   // minus the code I did not want
}

remove_filter( 'the_excerpt', 'wpautop' );
// the_excerpt passes a single argument, so the default priority and argument count are enough
add_filter( 'the_excerpt', 'dodge_default_formatting' );
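For anyone who would rather not copy a whole core function, a different approach (not the one I used) might be to leave wpautop() in place and clean up after it. This is only a sketch: strip_br_before_figcaption() is a hypothetical helper name, and the priority of 11 assumes wpautop is attached to the_excerpt at the default priority of 10.

```php
<?php
// Hypothetical helper; meant to run AFTER wpautop() has already formatted the excerpt.
function strip_br_before_figcaption( string $html ): string {
    // Remove a <br>, <br/> or <br /> (plus surrounding whitespace)
    // only when it sits directly before an opening <figcaption> tag.
    return preg_replace( '#\s*<br\s*/?>\s*(?=<figcaption\b)#i', '', $html );
}

// Only register the filter when WordPress is actually loaded.
if ( function_exists( 'add_filter' ) ) {
    // Assumption: wpautop runs at priority 10, so 11 runs after it.
    add_filter( 'the_excerpt', 'strip_br_before_figcaption', 11 );
}
```

Because the pattern only matches a break that is immediately followed by figcaption, line breaks elsewhere, such as those between lines of a poem, should be left untouched.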

It seems pretty straightforward, but going through that entire process of figuring out where the mysterious <br> tag was coming from was painful. I can blame WordPress, but complaining about it will not stop the passing judgement from others, nor will it speed up my looking into Ghost or other possible platforms. :P

I hope this post has helped anyone else who may have had the same issue.

Fun fact: This blog post has the longest title of any post on my blog. Yay! :D

Comments on this post

I absolutely love your web development posts! I’m learning so much through them.

I’ve never used image captions on WordPress because I tried once using the default system and it messed everything up, so this is pretty interesting and handy.

I love WordPress and I think some bloggers think I’m lying when I go on about how great it is, but sometimes it’s the most frustrating thing ever.

I’m glad you are learning! :D

I would totally use the function I included to replace the default with what you want. Reach out if you need any help. :)

I agree! I feel a bit gross telling people I use WordPress because there is a bit of taboo around it. Yet, we can’t deny that over 20% of all websites on the internet use it! It’s just not the greatest option for us web developers when we really want to do things our way. It has become more beginner-friendly over the years and unfortunately some performance sacrifices have had to be made. :(

This is so helpful! The more I use images in posts, the more I’ve been wondering why there was a line break after all my images. It was annoying me, and I kept thinking it was something I did.

I am assuming, however, that you do manual upgrades?

It took me a while to find the issue, but I’m glad I did, and I am also glad that you now have an answer to your question!

I don’t upgrade manually, I upgrade automatically. I updated the core files once, and because I had done that, my changes were overridden when I upgraded. I do everything I want in my theme’s functions file now, so I don’t touch the core at all. This means that when I upgrade I won’t have any issues.

I only ever upgraded manually once, because I wanted to know what it was like. I’ve never had my website break over an automatic upgrade. :D

Thank you for this, great example.

I used it with the `the_content` hook and with the `the_excerpt` hook as per your example.