Part of a series on Information Warfare
25 Dec 2020
Video Title:
Tutorial for archiving web pages as dynamic PDF files - part of a series on Information Warfare
Runtime: 00:14:33 | Original Video Source:
http://www.trueworldpolitics.com/video-pages/tutorial-for-archiving-web-pages-as-dynamic-pdf-files.php
Downloadable mp4 Mirror:
http://www.trueworldpolitics.com/video-pages/images/tutorial-for-archiving-web-pages-as-dynamic-pdf-files-part-of-a-series-on-information-warfare.mp4
TWP Archive Date: 25 Dec 2020
Introduction:
When we as members of the public want to hold our institutions and
leaders accountable for their actions and public statements, or simply
want to preserve important information from the web before it disappears
or is "disappeared", we need to store those web pages offline in an
easily transportable and accessible format: a regular standalone file
that can be saved to disk or a USB drive, e-mailed, and even printed to
paper if needed.
Simply saving an
internet shortcut to the web page or copying and pasting a link
into an e-mail is not good enough, because of the increasingly ephemeral
nature of the internet.
This video tutorial will show you
how to save web pages as PDF documents to make effective offline
archives of web pages.
These saved PDF files will have copyable and searchable text, and
preserved embedded hyperlinks that can be clicked from the PDF document
to open the corresponding web page in your web browser.
We will go over best practices and learn how to scale the web image so
that it fits best on the PDF page, how to remove unwanted elements from
the page like advertising and pop-up windows, and how to add comments,
stamps and our own hyperlinks to the page.
System requirements: This process is designed for, and tested on,
Windows based desktop computers.
Browser requirements: This process works best with Google Chrome
Browser variants like "Dissenter" or "Brave" that have built-in ad
blockers. However, Google Chrome version 87 or newer will work fine for
most web pages. The use of Mozilla Firefox is not recommended because it
will not preserve hyperlinks.
PDF file editor requirements: This process works best with the "Nitro
Pro" PDF editor, but Adobe Acrobat X Pro or newer could also be used.
Step one:
Selecting the correct parameters and adjusting the image size
Locate the main browser
menu and select print, or press Ctrl P to bring up the print dialog
window.
Now select the print destination from the drop down
menu.
Your selections may vary in the print destination menu
depending on what printers and software you have installed on your
computer.
Select Save as PDF. This selection seems to work
better than the others when it comes to preserving hyperlinks and
some embedded page metadata. It also automatically generates a file
name for the PDF based on the web page title.
For Pages, select All.
For Layout, Portrait is recommended because most web pages are designed
to display best this way. This page orientation will also yield the best
results when printing to paper.
Next click on the More Settings menu to open it.
For Paper size, select Letter for American standard 8 ½ inch by 11 inch
paper, or select whatever paper size is most common where you operate.
For Pages per sheet, leave this set to one, as it maintains the proper
proportions of the web page. This can always be changed in the future
when re-printing these documents.
For Margins,
always leave this set at Default to preserve the standard margins
around the edges of the printed page that all printers are known
to be able to print. If you change this setting, some printers may
not be able to print the page on one sheet of paper correctly.
For Scale, select Custom. This allows you to scale the image from 10%
to 200% in order to optimize the display of a particular page, and also
lets you control where page breaks occur, in order to keep paragraphs
and graphics together. Many web pages will display adequately at 100%,
but adjusting this setting between 65% and 125% is often required to get
the best results on some web pages. It is not advisable to scale pages
below 60%, because doing so often makes the page difficult to read when
printed to paper.
For Options:
Headers and footers check box. Always,
always, keep the Headers and footers box checked. This option saves
the date, page title, and page URL to the headers and footers of
each printed page. This information is absolutely crucial for maintaining
good historical archives and identifying sources of information.
For the Background graphics check box: in most circumstances you would
keep this box checked to preserve the background graphics of a page.
However, some web pages are poorly designed and use graphics that are
not formatted to the page correctly, causing them to obscure text or
other page elements. Un-checking this box can eliminate this problem.
Now that you are familiar with the
page settings and parameters of the print dialog window, you can
make adjustments and observe the changes to the output in the print
preview window, before committing to saving the PDF file.
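The settings covered in this step can be summed up as a small checklist.
The Python sketch below is only an illustration: the PrintSettings class
and the review function are hypothetical names, not part of Chrome or of
any library, and the values simply mirror the print dialog options
described above.

```python
from dataclasses import dataclass


@dataclass
class PrintSettings:
    """Hypothetical record of the print dialog choices described above."""
    destination: str = "Save as PDF"
    layout: str = "Portrait"
    pages_per_sheet: int = 1
    margins: str = "Default"
    scale_percent: int = 100
    headers_and_footers: bool = True
    background_graphics: bool = True


def review(settings: PrintSettings) -> list[str]:
    """Return a note for every choice that departs from the tutorial's advice."""
    notes = []
    if settings.destination != "Save as PDF":
        notes.append("destination: Save as PDF preserves hyperlinks best")
    if settings.pages_per_sheet != 1:
        notes.append("pages per sheet: keep at 1 to maintain proportions")
    if settings.margins != "Default":
        notes.append("margins: Default is safest for printing on paper")
    if not 10 <= settings.scale_percent <= 200:
        notes.append("scale: the dialog only accepts 10% to 200%")
    elif settings.scale_percent < 60:
        notes.append("scale: below 60% is often hard to read on paper")
    if not settings.headers_and_footers:
        notes.append("headers and footers: needed for date, title and URL")
    return notes


# Example: a page shrunk to 50% with headers and footers switched off.
print(review(PrintSettings(scale_percent=50, headers_and_footers=False)))
```

Run as written, the example reports two notes, one for the scale and one
for the headers and footers. Layout and background graphics are left
unchecked by the sketch on purpose, since the tutorial treats both as
judgment calls.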
Best practices for PDF file names.
When naming PDF archive files you should avoid spaces, punctuation, and
special characters, and use only lower case letters and numbers, with
hyphens between words as in the web addresses above. Following this
UNIX style naming convention makes the document easier to find on your
own computer, and easier to publish and search for on the web.
The easiest and best way to choose such a file name for a PDF archive
file is to use the original server file name of the web page that you
printed it from. Go to the address bar of your browser, locate the last
slash in the URL, and copy everything after it up to the first dot.
Usually this dot is followed by one of the common page types, such as
html or php. If there is no dot, copy everything to the end of the URL.
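The naming rule above can be sketched in a few lines of Python. This is
a minimal illustration, not a required tool: the function name
archive_name is hypothetical, keeping hyphens as word separators is an
assumption based on the tutorial's own web addresses, and the "index"
fallback for URLs with no file name is also an assumption.

```python
import re
from urllib.parse import unquote, urlsplit


def archive_name(url: str) -> str:
    """Derive a PDF file name from the server file name in a page URL."""
    path = unquote(urlsplit(url).path).rstrip("/")
    segment = path.rsplit("/", 1)[-1]   # text after the last slash
    stem = segment.split(".", 1)[0]     # everything up to the first dot
    # Lower-case, and turn any run of other characters into one hyphen.
    stem = re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-")
    # "index" is an assumed fallback for URLs that end at the site root.
    return (stem or "index") + ".pdf"


print(archive_name(
    "http://www.trueworldpolitics.com/video-pages/"
    "tutorial-for-archiving-web-pages-as-dynamic-pdf-files.php"
))
```

For the address of this tutorial's own video page, the sketch prints
tutorial-for-archiving-web-pages-as-dynamic-pdf-files.pdf.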
Best practices for creating PDF files that are more likely to be
authenticated as true captures of a web page.
When you are trying to archive information, you need to think like an
archaeologist, in the way that a dig site is preserved and everything is
cataloged. Just saving the target information itself, without context,
makes it nearly impossible to authenticate. You need to preserve as much
of the page and its elements as possible. The metadata saved in
decorative images, advertisements and other non-targeted page elements
often gets encoded into the PDF file when it is saved. This metadata
from unwanted page elements can be used to corroborate the authenticity
of the PDF file as an accurate capture of the web page at a particular
time. You should make every effort to preserve the entire page.
However, if the non targeted elements of the web page greatly
obscure or confuse the target information, it may be desirable to
eliminate them.
There are a few tricks you can use to accomplish this.
We mentioned un-checking the Background graphics check box earlier.
Another effective trick is selected printing.
While pressing and holding the left mouse button, select and highlight
the contiguous text and images that you want saved in your PDF file.
When this is complete, move the mouse cursor over any part of the
highlighted area, click the right mouse button and select Print.
After the print dialog window opens you can
preview what will be saved to PDF and make adjustments.
Step two:
Post production editing.
Earlier we showed you how to remove unwanted page elements during
printing.
Now we will show you post production techniques in Nitro Pro PDF Editor
for removing unwanted page elements that obscure our target information.
Note that Adobe Acrobat Pro has similar PDF editing capabilities.
Open your PDF file and identify the page elements that you want to
remove.
Click on the Nitro Pro Home tab and click on Edit, or press Ctrl E, to
enter editing mode.
Then simply select the page element by clicking on it with
your left mouse button, then delete it by pressing the delete key
on your keyboard, or right clicking and selecting delete.
If you make a mistake, you can press Ctrl Z to undo the last change.
After removing all the unwanted page elements, remember
to re-save your PDF document before closing.
Step three:
Adding a custom stamp with website URL, highlighting text and other
post production PDF document edits.
Note: Use these editing techniques sparingly. The more changes that you
make to a PDF document, the more embedded metadata you alter or destroy,
which could render your document unable to be authenticated as a true
copy of a web page for archival purposes.
Open your PDF file in Nitro Pro PDF Editor and click on the Review tab.
Identify page elements that you want
to add to the document.
We will open the stamp dialog window
and select a custom logo stamp, which had already been configured
into this computer's installation of Nitro Pro PDF Editor.
We will insert it onto the page, and size it to fit in the desired
location in the upper right corner of the first page of the document.
Next we will click on the Page Layout tab and select Link. We then
position the mouse cursor, click and hold the left mouse button, and
drag out a target area for the link.
After releasing the
left mouse button, the link target is indicated and the Create Link
dialog box automatically opens.
Under Link appearance, for Link type, select Invisible Rectangle from
the drop-down menu.
For Link action, select Open a webpage from the radio buttons.
Click Next to proceed to the Edit Web Link dialog box.
Type or paste a full, correct website URL into the box and click
the OK button.
Now open the Home tab and select the hand
tool or press Ctrl H to exit editing mode.
Use the mouse
cursor to verify the link insertion by hovering over the link target
area.
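The hover check above can be complemented by listing the link targets
stored in the saved file. The Python sketch below is a rough
illustration under stated assumptions: the function name
find_link_targets is hypothetical, and it only finds web link entries
that are stored as plain, uncompressed text in the PDF. A PDF editor may
pack these entries into compressed streams, so an empty result is not
proof that the links are missing, and a target containing unescaped
parentheses will be cut short.

```python
import re

# A web link is stored in a PDF as "/URI (address)". Backslash escapes
# inside the parentheses are allowed for by the pattern.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\)])*)\)")


def find_link_targets(pdf_bytes: bytes) -> list[str]:
    """Return the web addresses found in uncompressed link entries."""
    targets = []
    for match in URI_PATTERN.finditer(pdf_bytes):
        raw = match.group(1)
        # Undo the escaping of parentheses and backslashes.
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        targets.append(raw.decode("latin-1"))
    return targets


# Example with a hand-written fragment shaped like a PDF link entry.
sample = b"<< /Type /Annot /Subtype /Link /A << /S /URI /URI (https://brave.com/) >> >>"
print(find_link_targets(sample))
```

To try it on a real archive, read the file in binary mode, for example
with open("my-archive.pdf", "rb").read(), and pass the result to
find_link_targets. The file name here is only a placeholder.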
Highlighting text:
Click on the Review tab and
select Highlight.
Position the mouse cursor near the text you want to highlight, then
click and hold the left mouse button while you drag across the text.
Release the left mouse button when the selection is complete.
When highlighting is complete, press Ctrl
H to exit editing mode.
Remember to save your edited PDF
document again to preserve the changes in the saved document.
Software Recommendations:
Note: we have not received any compensation from the makers of these
software products. We recommend them solely in the interest of properly
equipping other interested parties to make quality archives of web pages
and the information they contain, in order to hold our leaders and
institutions accountable for their actions.
We recommend Google Chrome
Browser Variants like Dissenter and Brave that automatically block
ads, and have excellent web page to PDF conversion capabilities.
https://brave.com/
https://dissenter.com/#download
https://www.google.com/chrome/
However, we also recommend viewing a web page with different browsers,
such as MS Internet Explorer and Mozilla Firefox, and comparing the
results. There have been documented cases of Google Chrome variant
browsers censoring web pages in real time. While this may be an errant
function of their ad blockers, it must be noted that they cannot be
completely trusted to display a web page accurately.
Additional recommendations for archiving information from the web:
The ability to archive video from the internet in standard file formats
like MP4, so that it can be stored and distributed offline, is also very
important to keeping the Powers That Be accountable.
We also recommend:
4K Video Downloader by Open Media: excellent for quickly and easily
downloading videos from YouTube and many other video sharing platforms.
https://www.4kdownload.com/products/product-videodownloader
Camtasia by TechSmith: if you can view a video on your computer,
Camtasia can record it from the screen and save it. However, it does
take some practice to set up and use.
https://www.techsmith.com/video-editor.html
Thank You for watching this tutorial on how to effectively archive
web pages as PDF documents.
Good luck and Good Hunting, for
the truth.