Flickr to WordPress Part 2: replacing old image references

Almost exactly a year ago, I shared my tool and process for moving a Flickr-hosted photo collection to a WordPress-powered photo website. That tool has been used by a few folks now and I'm so glad it was helpful.

The thing that I hadn't done yet, but now have and detail in this follow-up post, is to create an automated way to clean up references to Flickr-hosted photos that exist in an existing WordPress website or blog. Without this critical final step, one could have a lot of historical content that still references Flickr-hosted photos instead of the version being hosted on your shiny new WordPress-powered photo site.

So here's how I did it. As with my previous post, this information is geared toward a technical audience that is comfortable with the command line and possibly modifying PHP code to suit their own purposes.

Method

I thought about a couple of different ways I could handle this "find and replace" operation.

With 13 years of blog posts, many of which contain references to Flickr photos in some form, making the changes manually was not an option.

I thought about continuing to iterate on my existing PHP command line tool that takes the exported Flickr dataset and generates a WordPress import, which would mean creating some kind of lookup data file where a find-replace command would use the Flickr photo URL to find the new WordPress-hosted photo URL. But when I saw all of the different ways I had embedded Flickr photos in my blog post content:

  • on a line by itself for oembed rendering
  • as simple links to a Flickr photo page
  • as <a> plus <img> tag groups that displayed the images full size inline with a link
  • as <img> tag groups that displayed the images at various smaller sizes, aligned left or right

I realized that I would need to be able to lookup the proper image URL for each display scenario. And given that my WordPress-powered photo site generated different image sizes (and that some of these had changed since the original data migration), that was not going to be simple. No one-size-fits-all substitution would work.

The good news is that WordPress easily supports building a custom REST API endpoint that would support a dynamic lookup of the information I needed on the photo site, for use on any site where I was finding-replacing content. Once I realized I could decouple those operations, it was clear how to proceed.

Creating a "Find by Flickr URL" API Endpoint

The first step, then, was to create a REST API endpoint on my WordPress-powered photo site that would allow me to specify the original Flickr photo URL and find the related WordPress post that had been generated during the migration process.

If you look at the code of the original migration tool, you'll note that for each WordPress post it creates, it adds a post meta field _flickr_photopage where it stores the URL of the Flickr-hosted photo. That usually looks something like https://www.flickr.com/photos/myflickrusername/123456789/. We can use that post meta field to do a simple lookup of the equivalent WordPress post object.

Since I want to be able to retrieve an image URL at a specific size so that I'm not embedding full size, large image files in posts that only need, say, the 300 pixel wide version, I also needed to accept width and height parameters, and then do a lookup of the related attachment file URL in WordPress.

Here's the class that I created to do all of this. If you use it, you'll need to customize a few things, including the Flickr username.

With that class loaded into my WordPress photo site's theme or in a plugin, now I have access to this kind of API call:

https://my-wp-photo-website.com/wp-json/myphotos/v1/find-by-flickr-url/?flickr-url=https://www.flickr.com/photos/myflickrusername/123456789/

And the returned JSON response of that kind of API call would look something like this:

{
     result: "found",
     post_id: 2425,
     permalink: "https://photos.chrishardie.com/2014/08/updesk-setup/",
     thumbnail_id: 5424,
     thumbnail_url: "https://photos.chrishardie.com/wp-content/uploads/2014/08/15130177147_75d885dc2a_o.jpg",
     thumbnail_width: 2448,
     thumbnail_height: 3264
}

Nice! Now I have the permalink of the photo post that replaces the original Flickr photo page, the URL of the image media file I can use in an <img> tag, and some other meta info if I need it.

The Find-and-Replace WP CLI Plugin

Now, it's time to use that API endpoint in a big find-replace operation.

The clear choice was to make a WP CLI command that could be used to run this on the command line, where I could log and review warnings and errors and have better memory management.

In creating this command, I used this general approach:

  1. Get all the posts in the WordPress database that mentioned a Flickr URL with my username in it
  2. For each post found, look for specific kinds of Flickr link and image references that need to be replaced
  3. Extract the original Flickr photo page URL from those references
  4. Use the API endpoint created above to look up the corresponding information on the photo site
  5. Update the post content with the information retrieved from the API

It sounds fairly simple, but I ran into several challenges and opportunities for optimization:

  • Not only were there references to the Flickr URL structure above, there were variations to consider such as the internal version of my Flickr user account ID that was in wider use years ago, or the flic.kr shortened version of their domain.
  • The API lookups could generate a lot of activity on my photo site, so I added some caching since those responses should rarely be changing.
  • I found some photos that I had apparently set to "private" or "contacts only" on Flickr but had left referenced in my blog posts, so I had to manually address those.
  • My Flickr-to-WordPress migration tool didn't handle Flickr "sets" (although it preserved and stored the data needed to handle the), so I had to redirect those references.
  • I had to make sure not to replace Flickr references to other people's photos.
  • Flickr varied its use of http versus https in different embed code it generated over the years.

In the end, I had a working plugin that could do a dry-run operation to see what it was going to change, and then do a "for real" run to actually update the posts as stored in the database.

$ wp flickr-fixer fix-refs --dry_run=false
Getting all posts containing Flickr references...
Found 203 posts to process.
Success: 621 replacement(s) made across all posts

With the API lookup cache primed, on my site it only took a minute or two to run. YAY!

You can see the final plugin code here.

(If you use it, you'll need to find/replace my Flickr username and a few other hardcoded references accordingly.)

Lessons Learned

When I think about the time I put in to first creating the original Flickr-to-WordPress migration tool, and then the time put into this content cleanup tool, it turns out it was a non-trivial project. But it always felt like the right thing to do, since once I was moved fully into WordPress I would have absolute control over my photo collection without depending on the changing services or business model of Flickr or anyone else.

It also highlights a few important lessons for migrations and owning your data online:

  • Try to be consistent in the ways you reference third-party tools and services in your content or workflows. If you have a bunch of variations and inconsistencies in place, any future move to another tool is going to be that much more painful.
  • Hold on to as much metadata as you can. You never know when it's going to come in handy.
  • When tackling big migrations, break hard problems up into smaller, slightly easier problems.
  • Document your thinking and your work along the way. It's too easy to get stuck going in circles on longer projects if you forget where you've already been.
  • APIs are magical and user-facing services that don't have them should be avoided at all costs.

I think this concludes my 14.5 years of being a Flickr user. I've canceled my Pro subscription and plan to delete my account in the weeks ahead.

If you find any of these tools useful, or if you have a different approach, I'd love to hear about it.

Slides and links from my php[world] talk

Yesterday I gave a talk at php[world] 2019 about "Tips and Tools for Gluing Together the Open Web." Below are the slides, text and links from my talk. Where applicable, the slide images link to relevant websites and code.

A few months ago, I was helping a local non-profit organization with their new website. We were almost set to launch, but they had one more request: can we get the next five upcoming events we've put up on our organization's Facebook page to show up in a list in the website's footer? I initially thought that surely there would be a simple way to do this, but as I looked in to it, the short answer was no, no you can't.

There's an embeddable Facebook widget that can display events, but you don't get a lot of control over appearance and it's far from a helpful list that can be scanned quickly. This seemed like such a basic request - connecting Facebook page events to a website - and yet as I poured through Google and Stack Overflow search results all I could find were frustrated users and abandoned tools that had tried and failed to do this one thing.

Continue reading "Slides and links from my php[world] talk"

Moving photos from Flickr to WordPress

If you're ready to move your own Flickr photo collection to WordPress and feel comfortable on the command line, you can go straight to the Flickr to WordPress tool I built and get started.

Update on Feb 18, 2020: You can now also learn about my tool for finding/replacing old Flickr image references in your WordPress post content.

Here's some backstory:

I used to love Flickr as a place to store photos, and as a community for sharing and discussing photography. But as its ownership changed hands and its future became at times uncertain, I grew reluctant to trust that it could continue to be a permanent home for my own photos. My discomfort increased as I have become more engaged with the need to have full ownership over the things I create online.

So, I set out to migrate my 3.6GB collection of 2,481 Flickr photos, along with their tags, comments and other metadata, into a new home while I still could.

Continue reading "Moving photos from Flickr to WordPress"

Using MoveOn for petitions

It's been a long time since I started a petition to try to change something in my world. But in recent weeks my local City Council has been threatening to do some silly things related to funding the development of bike and pedestrian paths here, and I'd heard enough people say informally that they were concerned by those threats that I decided it was time to create a central spot where they could put all their names for Council members to see.

And that's how I ended up using the petitions.moveon.org service, which has turned out to be excellent for this purpose.

A few things in particular that I like about it:

  • While clearly scaled to support national and state level petitions, the MoveOn tool did a great job of enabling a smaller petition targeted at a local legislative body that might not otherwise be in their system. I was able to enter the names and email addresses of my local Council members as "targets" of the petition, and they then were set up to receive deliveries of signatures directly.
  • Related to that, the MoveOn system allowed the targets of the petition to respond directly to the petition signers with a message, without giving them direct access to each others` private contact information.
  • The system automatically picks small signature goals to start with and then scales them up as new milestones are hit. I think this helps avoid the awkward "WE'RE GOING TO HAVE A HUNDRED MILLION SIGNATURES HERE!" declarations by petition creators that quickly yield disappointment.
  • The system offered up interesting summary stats about where signatures were coming from and what activity on the petition looked like over time.
  • When I had to contact MoveOn's petition support (one of the signers had accidentally left out a word in a comment that significantly changed the meaning, and wanted it corrected) they were fast to respond and provided a quick solution.
  • Other features in the petition tool, like handling delivery via print and email, contacting petition signers, "declaring victory," and more seemed really well designed; simple, effective, built for bringing about real action.
  • The petition application is open source and available for anyone to contribute to it.

One of the things I wanted to do as the signature numbers climbed and as I prepared to present the petition to Council was create something that visualized the signing names in one place. The signature count on the petition itself was not super prominent, and in only displaying 10 names at a time it was easy to miss out on the sense of a large part of the local population making a clear statement about what they want.

So I sniffed the XMLHttpRequests being made by the MoveOn site and found the underlying API that was being used to load the signature names. I whipped up a simple PHP script that queries that API to fetch all the names, and then does some basic cleanup of the list: leaving out anything that doesn't look like a full name, making capitalization and spacing consistent, sorting, etc.

I published the tool online at https://github.com/ChrisHardie/moveon-petition-tools in case anyone else might find it useful. (I later learned that MoveOn makes a CSV export of signatures and comments available when you go to print your petition, so that's an option too.)

Using the output of my tool, I created this simple graphic that shows all of the signed names to date:

All in all the MoveOn petition platform has been great, and I think it's made a difference just the way I wanted it to. I highly recommend it.

Gluing the web together with a personal webhook server

I mentioned earlier that I'm using a personal webhook server as glue between various online tools and services, and this post expands on that.

Why It Matters

For the open web to work well enough to bring people back to it, information has to be able to come and go freely between different systems without relying on centrally controlled hubs.

Just like RSS feeds, REST APIs and other tools that allow software programs to talk to each other across disparate systems, webhooks are an important part of that mix. Whereas many of those tools are "pull" technologies - one system goes out and makes a request of another system to see what's been added or changed - webhooks are a "push" technology, where one system proactively notifies another about a change or event that's occurred.

You might be using webhooks right now! Popular services like IFTTT and Zapier are built in part on webhooks in order to make certain things happen over there when something else has happened over here. "Flash my lights when my toaster is done!" And while I have great respect for what IFTTT and Zapier do and how they do it, I get a little uncomfortable when I see (A) so much of the automated, programmable web flowing through a small number of commercially oriented services, and (B) software and service owners building private API and webhook access for IFTTT and Zapier that they don't make available to the rest of us:

So I decided that just as I host RSS feeds and APIs for many of the things I build and share online (mostly through WordPress, since it makes that so easy), I also wanted to have the ability to host my own webhook endpoints.

How I Set It Up

I used the free and open source webhook server software. (It's unfortunate that they chose the same name as the accepted term for the underlying technical concept, so just know that this isn't the only way to run a webhook server.) I didn't have a Go environment set up on the Ubuntu server I wanted to use, so I installed the precompiled version from their release directory.

I created a very simple webhook endpoint in my initial hooks.json configuration file that would tell me the server was working at all:

[
  {
    "id": "webhook-monitor",
    "execute-command": "/bin/true",
    "command-working-directory": "/var/tmp",
    "response-message": "webhook is running",
  }
]

Then I started up the server to test it out:

/usr/local/bin/webhook -hooks //cdn.chrishardie.com/path/to/hooks.json -verbose -ip 127.0.0.1

This tells the webhook server to listen on the server's localhost network interface, on port 9000. When I ping that endpoint I should see a successful result:

$ curl -I http://localhost:9000/hooks/webhook-monitor
HTTP/1.1 200 OK
...

From there I set up an nginx configuration to make the server available on the Internet. Here are some snippets from that config:

upstream webhook_server {
    server 127.0.0.1:9000;
    keepalive 300;
}

limit_req_zone $request_uri zone=webhooklimit:10m rate=1r/s;

server {
    ...

       location /hooks/ {
           proxy_pass http://webhook_server;
           ...

        limit_req zone=webhooklimit burst=20;
    }

    ...
}

This establishes a proxy setup where nginx passes requests for my public-facing server on to the internally running webhook server. It also limits them to a certain number of requests per second so that some poorly configured webhook client can't hammer my webhook server too hard.

I generated a Let's Encrypt SSL certificate for my webhook server so that all of my webhook traffic will be encrypted in transit. Then I added the webhook startup script to my server's boot time startup by creating /etc/init/webhook.conf:

description "Webhook Server"
author "Chris Hardie"

start on runlevel [2345]
stop on runlevel [!2345]

setuid myuser

console log

normal exit 0 TERM

kill signal KILL
kill timeout 5

exec /usr/local/bin/webhook -hooks //cdn.chrishardie.com/path/to/hooks.json -verbose -ip 127.0.0.1 -hotreload

The hotreload parameter tells the webhook server to monitor for and load any changes to the hooks.json file right away, instead of waiting for you to restart/reload the server. Just make sure you're confident in your config file syntax. 🙂

After that, service start webhook will get things up and running.

To add new webhook endpoints, you just add on to the JSON configuration file. There are some good starter examples in the software documentation.

I strongly recommend using a calling query string parameter that acts as a "secret key" before allowing any service to call one of your webhook endpoints. I also recommend setting up your nginx server to pass along the calling IP address of the webhook clients and then match against a whitelist of allowed callers in your hook config. These steps will help make sure your webhook server is secure.

Finally, I suggest setting up some kind of external monitoring for your webhook server, especially if you start to depend on it for important actions or notifications. I use an Uptimerobot check that ensures my testing endpoint above returns the "webhook is running" string that it expects, and alerts me if it doesn't.

If you don't want to go to the trouble of setting up and hosting your own webhook server, you might look at Hookdoo, which hosts the webhook endpoints for you and then still allows you to script custom actions that result when a webhook is called. I haven't used it myself but it's built by the same folks who released the above webhook server software.

How I Use It

So what webhook endpoints am I using?

My favorite and most frequently used is a webhook that is called by BitBucket when I merge changes to the master branch of one of the code repositories powering various websites and utilities I maintain. When the webhook is called it executes a script locally that then essentially runs a git pull on that repo into the right place, and then pings a private Slack channel to let me know the deployment was successful. This is much faster than manually logging into my webserver to pull down changes or doing silly things like SFTPing files around. This covers a lot of the functionality of paid services like DeployBot or DeployHQ, though obviously those tools offer a lot more bells and whistles for the average use case.

I use a version of this webhook app for SmartThings to send event information from my various connected devices and monitors at home to a database where I can run SQL queries on them. It's fun to ask your house what it's been up to in this way.

For the connected vehicle monitoring adapter I use, they have a webhook option that sends events about my driving activity and car trips to my webhook server. I also store this in a database for later querying and reporting. I have plans to extend this to create or modify items on my to-do list based on locations I've visited; e.g. if I go to the post office to pick up the mail, check off my reminder to go to the post office.

I've asked the Fastmail folks to add in support for a generic webhook endpoint as a notification target in their custom mail processing rules; right now they only support IFTTT. There are lots of neat possibilities for "when I receive an email like this, do this other thing with my to-do list, calendar, smart home or website."

Speaking of IFTTT, they do have a pretty neat recipe component that will let you call webhooks based on other triggers they support. It's a very thoughtful addition to their lineup.

Conclusion

I'm not expecting many folks to go out and set up a webhook server; there's probably a pretty small subset of the population that will find this useful, let alone act on it. But I'm glad it's an option for someone to glue things together like this if they want to, and we need to make sure that software makers and service providers continue to incorporate support for webhooks and other technologies of an open, connected web as they go.

My everyday apps

This may be one of those posts that no one other than me cares about, but I've been meaning to take a snapshot of the software tools I use on a daily basis right now, and how they're useful to me. (This comes at a time when I'm considering a major OS change in my life -- more on that later -- so as a part of that I need to inventory these tools anyway.)

So every day when I get up and stretch my arms in front of my energy inefficient, eye-strain-causing big blue wall screen with a cloud in the middle, grumbling to myself that we now call things "apps" instead of programs or software or really any other name, I see:

Airmail 3 - my main tool for reading, sorting, sending and finding email across a few different accounts. Replaced Thunderbird with it a few years back, really good choice.

Chrome - on the desktop, my daily browser of choice. I tried the new Firefox recently and liked it, but not enough to switch. I have lots of extensions installed but some favorites are Adblock Plus, JSONView and XML Tree, Momentum, Pinboard Tools, Postman, Signal Private Messenger, Vanilla Cookie Manager, Xdebug helper and Web Developer. On mobile I use iOS Safari.

macOS and iOS Calendar and Contacts - works with all of the various online calendars and address books I sync to, have stayed pretty intelligent and user-friendly without getting too cluttered.

Todoist - for task management and keeping my brain clear of things I care about but don't want to have to remember.

Slack - for communicating with my co-workers and others. I have accounts on 11 different Slack instances, and only stay logged in to about 4 of those at once.

Continue reading "My everyday apps"

Creating a private website with WordPress

When we became parents in 2015, Kelly and I talked about where and how we wanted to share the initial photos and stories of that experience with a small group of our family and friends. In case you haven't noticed, I feel pretty strongly about the principle of owning our digital homes. So I felt resistance to throwing everything up on Facebook in hopes that we'd always be able to make their evolving privacy and sharing settings and policies work for us, while also trusting that every single Facebook friend would honor our wishes about re-sharing that information.

I took some time to explore tools available for creating a private website that would be relatively easy for our users to access, relatively easy to maintain, and still limited in how accessible the content would be to the wider world. (I tend to assume that all information connected to the Internet will eventually become public, so I try to avoid ever thinking in terms of absolute privacy when it comes to websites of any kind.)

I thought about using WordPress.com, which offers the ability to quickly create a site that is private and viewable only by invited users while maintaining full ownership and control of the content. I passed on this idea in part because it didn't allow quite the level of feature customization that I wanted, and partly because it's a service of my employer, Automattic. While I fully trust my colleagues to be careful and sensitive to semi-private info stored there, it felt a little strange to think of creating something a bit vulnerable and intended for a small group of people within that context. I would still highly recommend the WordPress.com option for anyone looking for a simple, free/low-cost solution to get started.

Here are the WordPress tools I ended up using, with a few notes on my customizations:

Basic WordPress Configuration

For the basic WordPress installation and configuration, I made the following setup choices:

  • I put the site on a private, dedicated server so that I had control over the management and maintenance of the site software (as opposed to a shared server where my content, files or database may be accessible to others).
  • I used a Let's Encrypt SSL certificate and forced all traffic to the SSL version of the site, to ensure all communication and access would be encrypted.
  • I set up a child theme of a default WordPress theme so I could add a few customizations that would survive future parent theme updates.
  • I set "Membership" so that "Anyone can register" in the role of Subscriber (see more below on why this is okay).
  • For Search Engine Visibility I set "Discourage search engines from indexing this site".
  • For discussion I set "

Continue reading "Creating a private website with WordPress"

Cloud email, contacts & calendars without Google

Tricky situationI like Google and a lot of the things it does in the world. When people ask me what free mail, calendaring and contact syncing tools they should use, I usually include Google's services in my answer. But I always explain that they're trading some privacy and ownership of their information for the "free" part of that deal. "You're the product, not the customer" and all that.

For me, I've always tried to avoid having my own data and online activities become the product in someone else's business model. There are plenty of places where I can't or don't do that, and I mostly make those tradeoffs willingly. But so far, I've been able to avoid using Google (and Apple and Microsoft) for managing my personal email, calendaring and contact syncing.

Here's how.

Continue reading "Cloud email, contacts & calendars without Google"

Chrome extensions to manage online privacy

Privacy

There are a couple of extensions for Chrome that I've been using for a while now to try to maintain or improve my privacy online. Some have been helpful, others haven't. Some mini-reviews:

Terms of Service, Didn't Read

Most every modern website has a "Terms of Service" that governs your interactions with it. The document usually lays out how and when the site will use any data it collects about you - helpful, right? The document is also usually many pages long and would potentially take hours to fully absorb and understand. Terms of Service, Didn't Read is an extension that tries to give you a high-level view of the Terms of Service of the site you're on, based on their team's reading and interpretation of those documents on your behalf. If there are particular concerns related to privacy and personal data use, the extension will flag that when you arrive.

I used this extension for several months, finding it interesting at first to see how the sites I visited regularly measured up to TOSDR's evaluation. But after the initial curiosity wore off, I realized that for the most part, the information here wasn't changing my behavior. If TOSDR flagged something like "The copyright license is broader than necessary" or "This service tracks you on other websites," I'd still have to do some more digging to figure out exactly what that meant, and whether or not I was comfortable with it. So, the information provided by TOSDR is helpful, but not always conveniently actionable when it comes to protecting privacy. (There's a theme in all this: protecting privacy is rarely convenient.)

Continue reading "Chrome extensions to manage online privacy"