The SEO purist may ask why anyone would ever want to use PDF content on a website for search purposes. The reality, however, is that many businesses have a lot of PDF assets. These may include sell sheets, brochures, white papers, technical briefs, etc. The purist simply asks, why not convert these to HTML? In the real world, not everyone has the time, budget, and expertise to do that. There may also be other “marketing” reasons. Perhaps a company wants its prospects to experience the content along with all the other brand elements inherent in its print materials. Whatever the reason, there are lots of PDFs available on the web, and you can optimize PDFs to get high-ranking search results. Here are some tips on the right way to do it.
1. Make sure your PDFs are text-based. Okay, this first one is pretty obvious. However, we still find companies whose materials were designed in an image-based program. When a PDF is made using one of these programs, each page is saved as an image; there is no text for the search engines to read.
2. Complete the document properties. It seems like the vast majority of PDFs are without specified document properties, the most important of which is the Title. The Title property, if present, almost invariably represents the words that will be displayed as the heading of the search result. It’s the equivalent of the HTML title tag. If you don’t complete the Title property, the search engine is going to generate a title from the PDF’s content, and it may not be what you would choose. We’ve all seen some pretty goofy-looking titles on search results associated with PDFs. Not only do they look ridiculous, but they probably won’t get clicked. In the full version of Acrobat, go to File>Document Properties to specify the Title.
There are other document properties (metadata) you can supply, including Author, Subject, and Keywords, but presently these appear to have little search-related effect. It would be nice if Subject acted as the meta description to be displayed under the heading of the search result, but I haven’t seen this to be true. For now, however, I’d complete the Subject property as if it were a meta description. Perhaps in the future search engines will treat it as such.
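As a rough illustration of the advice above, here is a minimal Python sketch that audits a set of document properties before a PDF is posted. It assumes the properties have already been read into a plain dictionary by whatever tool you use; the function name `audit_properties`, the dictionary keys, and the list of file extensions are all hypothetical choices for this example, not part of Acrobat or any PDF library.

```python
def audit_properties(props: dict) -> list:
    """Return a list of warnings for missing or suspicious document properties.

    `props` is assumed to be a plain dict such as
    {"Title": "...", "Subject": "..."}; how it is extracted is out of scope.
    """
    warnings = []

    title = (props.get("Title") or "").strip()
    if not title:
        warnings.append(
            "Title is missing; search engines will generate one from the content."
        )
    elif title.lower().endswith((".doc", ".indd", ".qxd", ".pdf")):
        # Authoring tools sometimes leave the source file name in the Title.
        warnings.append("Title looks like a leftover file name.")

    subject = (props.get("Subject") or "").strip()
    if not subject:
        warnings.append(
            "Subject is empty; consider writing it like a meta description."
        )

    return warnings


if __name__ == "__main__":
    for warning in audit_properties({"Title": "brochure_v3.indd"}):
        print(warning)
```

An empty list means both properties are filled in; it says nothing about whether the Title is well written.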
3. Optimize the copy. Copy in text-based PDFs is no different than web-page copy. Optimize it.
4. Build links into PDFs. Make sure you include links in your PDFs, and pay attention to the anchor text used. Search engines do recognize these links. You won’t find backlinks in PDFs very often, but their limited occurrence is likely related to the fact that most people don’t put links into PDFs; most people treat PDFs as static print documents. In addition to including links in PDFs for search-related purposes, there’s also a good business reason. Often, PDFs are passed along to others via email. Accordingly, a reader may be viewing the PDF in isolation (i.e., not associated with your website). By placing links into PDFs, you give these readers an easy way to click back into your site, where you can further influence them.
5. Pay attention to the version. While search engines do “read” and index PDFs, search engines’ capabilities tend to lag new versions of Acrobat. Although Acrobat 8 is out, for now you should save your PDFs as version 1.6 (Acrobat 7) or lower to ensure search engines can index the content.
Not only is saving PDFs at a lower version good for the search engines, it’s also good for users. Not everyone has the latest versions of Acrobat Reader. Accordingly, I’d recommend saving PDFs as version 1.5 or lower. This way it will be good for search engines and most readers.
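If you have many PDFs to check, the version can be read from the file itself rather than by opening each one in Acrobat. The following Python sketch looks for the `%PDF-x.y` header at the start of the file and compares it with a ceiling. The ceiling of 1.5 comes from the recommendation above; the function names are made up for this example. One caveat: a PDF can also declare a newer version inside the document, which this simple header check does not look at, so treat the result as a first pass.

```python
import re
from typing import Optional, Tuple

# Ceiling taken from the recommendation above (PDF 1.5 or lower).
MAX_VERSION = (1, 5)


def header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the %PDF-x.y header, or None if not found."""
    # The header is expected near the start of the file.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_version_ok(data: bytes, ceiling: Tuple[int, int] = MAX_VERSION) -> bool:
    """True if a header was found and its version does not exceed the ceiling."""
    version = header_version(data)
    return version is not None and version <= ceiling


if __name__ == "__main__":
    import sys

    # Usage: python check_version.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            head = handle.read(1024)
        print(path, header_version(head), "ok" if is_version_ok(head) else "check")
```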
6. Optimize the file size for search. Don’t post a huge PDF for download. Not only is this annoying and unnecessary for site visitors, it’s also burdensome for the search engines. If it’s too big, the search engines may abandon the PDF before even getting access to its content. Using the full version of Acrobat, select Advanced>PDF Optimizer to “right-size” the document.
You may also want to enable the “Optimize for Fast Web View” option in the Preferences>General Settings panel. This allows the PDF to be “loaded” a page at a time, rather than waiting for the whole PDF to download.
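Both checks in this tip can be scripted as a sanity pass before posting. The Python sketch below reports a file's size against a budget and guesses whether Fast Web View is on by looking for the `/Linearized` marker that such files carry near the start. The 2 MB budget is an arbitrary placeholder (no specific limit is given here), the function names are hypothetical, and the marker test is a heuristic rather than a full validation.

```python
# Hypothetical size budget; pick a number that suits your own site.
MAX_BYTES = 2 * 1024 * 1024


def looks_linearized(data: bytes) -> bool:
    """Heuristic: Fast Web View files carry a /Linearized key near the start."""
    return b"/Linearized" in data[:1024]


def size_report(data: bytes, max_bytes: int = MAX_BYTES) -> dict:
    """Summarize file size and the Fast Web View guess for one PDF's bytes."""
    return {
        "bytes": len(data),
        "within_budget": len(data) <= max_bytes,
        "fast_web_view": looks_linearized(data),
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_size.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, size_report(handle.read()))
```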
7. Pay attention to placement. If you bury links to PDFs deep within your site’s file structure, they’re less likely to get indexed. If you want to use PDFs for high-ranking search results, links to those PDFs should be on web pages closer to the root level of the site’s file structure.
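To spot PDFs that are only linked from deep pages, you can measure how far each linking page sits from the root. This Python sketch counts directory levels in a page's URL and flags PDF links found on pages deeper than a threshold. The threshold of two levels, the function names, and the shape of the input (a dictionary mapping page URLs to the links found on them) are assumptions made for the example; gathering those links is left to your crawler of choice.

```python
from urllib.parse import urlparse


def path_depth(url: str) -> int:
    """Count directory levels between the site root and the page."""
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    if path.endswith("/") or not segments:
        # Every segment is a directory (or the URL is the root itself).
        return len(segments)
    # The last segment is the file, so it does not count as a level.
    return len(segments) - 1


def deep_pdf_links(links_by_page: dict, max_depth: int = 2) -> list:
    """Return (page, pdf_link) pairs where the linking page is deeper than max_depth."""
    flagged = []
    for page, links in links_by_page.items():
        if path_depth(page) > max_depth:
            for link in links:
                if urlparse(link).path.lower().endswith(".pdf"):
                    flagged.append((page, link))
    return flagged


if __name__ == "__main__":
    site = {
        "https://example.com/index.html": ["/docs/brochure.pdf"],
        "https://example.com/a/b/c/d.html": ["/files/spec.pdf", "/about.html"],
    }
    for page, pdf in deep_pdf_links(site):
        print(f"{pdf} is only linked from deep page {page}")
```

A flagged PDF is a candidate for an extra link from a page nearer the home page.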
8. Influence meta descriptions for PDFs. For web pages, the meta description is what is displayed under the title in a search result. With PDFs, the search engines search the copy of the PDF and select something to display. While with PDFs you have less control of what is displayed as the description to the search result, you can still influence this. The best way to do this is to make sure that you have a good, optimized sentence or two near the start of your PDF. If these sentences correspond to the search term used, it’s likely that these sentences are the ones that will be displayed as the description under the search result’s heading.
9. Specify the reading order. As noted above, search engines search the copy of the PDF and select something to display as a description under the search result’s heading. Depending on how the reading order of your PDF is specified, this may lead the search engine to select some pretty strange stuff to display.
In a previous column, Organic Landing Page: A Case Study, I noted a search result for “transit seating.” That search result is noted below:
Admittedly, this is not a very enticing description, and it’s not likely to get clicked even if it ranks highly in the search results. Why did Google select this text to display? Because it’s the first thing Google read in the PDF.
Every PDF has a reading order. Similar to properly optimized web pages, you want to make sure that valuable content is read first. How do you know the reading order? With the PDF open and while using the full version of Acrobat, select Advanced>Accessibility>Add Tags to Document. Then select Advanced>Accessibility>Touch Up Reading Order. Then the reading order of the PDF will be displayed.
You can see in the image above that the reading order of the transit seating PDF does not start with valuable content. Rather, many extraneous items are “read” before the valuable content. That’s why Google displayed what it did in the search result. If you want PDFs to be optimized for search, make sure you understand the reading order of the PDF and use the Touch Up Reading Order tool to manage what the search engine will read first.
10. Tag your PDFs. You can also add tags to your PDFs, similar to HTML tags. Again, with the PDF open and while using the full version of Acrobat, select Advanced>Accessibility>Add Tags to Document. Acrobat will give you a document report and recommend things you may want to consider changing. You’ll have the ability to tag headings, add alternate text for images, etc.
11. Pay attention. Every time you open a PDF, make even a small change, and save it once again, major unseen things may change. The reading order may change automatically. You may inadvertently save it as a higher version. It may get saved using the default size setting instead of a properly optimized size. If you’re going to further optimize existing PDFs, make sure you check all of these things before posting a new version of the PDF.