Slides and links from my php[world] talk

Yesterday I gave a talk at php[world] 2019 about "Tips and Tools for Gluing Together the Open Web." Below are the slides, text and links from my talk. Where applicable, the slide images link to relevant websites and code.

A few months ago, I was helping a local non-profit organization with their new website. We were almost set to launch, but they had one more request: can we get the next five upcoming events we've put up on our organization's Facebook page to show up in a list in the website's footer? I initially thought that surely there would be a simple way to do this, but as I looked in to it, the short answer was no, no you can't.

There's an embeddable Facebook widget that can display events, but you don't get a lot of control over appearance and it's far from a helpful list that can be scanned quickly. This seemed like such a basic request - connecting Facebook page events to a website - and yet as I poured through Google and Stack Overflow search results all I could find were frustrated users and abandoned tools that had tried and failed to do this one thing.

Continue reading "Slides and links from my php[world] talk"

Unlocking community event information from Facebook

It wasn't always this way.

It might be hard to remember, but there was a time when Facebook wasn't the only place to learn about upcoming events in our communities. Instead of having to scroll past silly memes and political rants to get the details of the next potluck, lecture or book club you cared about, the information was available in lots of other places. Events were available on websites and apps, and they were shared via constantly updated feeds that you could even integrate directly into your own personal calendaring system.

You could learn about what was happening down the street or across town, and you didn't have to give up your online privacy to do it.

At some point, this shifted. Weary of the duplication of efforts maintaining event information in multiple places, and increasing fragmentation of sources that you needed to consult to figure out what's happening, people longed for a more centralized, authoritative spot to enter and learn about community events. I was one of those people.

Enter Facebook

Facebook saw this opportunity and jumped on it.

They made it easy to enter and promote events, and they sprinkled in social features to make it extra compelling to do. Not only could you learn about the event itself, you could see who among your friends was planning on (or interested in) going, and what they had to say about it. If you were an event organizer you could see what kinds of events were catching the interest of your target audience, and you could more easily avoid scheduling your big fundraising gala on the same night as the art museum open house.

At first, Facebook made access to event data one of the most friction-free parts of its platform, and they didn't require you to be "on Facebook" to see it. You could get email notifications, RSS feeds, ICS feeds to bring event information into your own productivity tools and daily life. There was API access so that you could build tools and websites that incorporated the event data where you wanted it to be.

Over time, all of that access was removed. The APIs were shut off, the feeds were shut down, and the message was clear: you have to come to the Facebook website or app to learn about events that you might want to attend.

It's nothing new to note that Facebook's user interface and business model are built around keeping people inside its walled garden. Whatever you might want to contribute to or get from your Facebook experience, they want you to do it on their website or in their mobile app, and on their terms. Their ability to sell advertising space depends on it.

But as with many other aspects of Facebook culture and its grip on our personal and community data, there's a significant downside. Facebook's decision to lock up event information has real implications for how we encounter and experience public life in our real-world communities.

Why is this a problem?

It means that like our exposure to news or updates from our online friends, our awareness of community events is driven by a black-box algorithm optimized around profit over everything else. It means that one company's shifting views on what constitutes an acceptable event, and its sensitivity to the interests of paying advertisers and political organizations could determine whether we see the details of an upcoming protest, demonstration or other exercising of free speech.

And it means that anyone concerned about the privacy implications of having their interest in a given event tracked, sold and monetized may have less exposure to the events that may have traditionally shaped and defined public life in their community, and private life in their circles of friends and neighbors.

There are practical implications too. An organization or business that still chooses to have its own website is having to enter and maintain event information in multiple places, which is time consuming and inefficient. People who want to avoid using Facebook are either left out of the loop or forced into using it again.

I went looking for solutions to this in the context of a website project I was working on recently, where the request was simple: can we bring our Facebook event data into our WordPress site? I had naively assumed that Facebook would have an interest in making this easy: if people could enter their events in one place and have them pushed out to everywhere that mattered, it would be so much easier to see them as the natural place to maintain that information.

But again, they don't make it easy. There's no API available to fetch event data, even for a Facebook page on which I'm an administrator. Event data is not displayed on public-facing Facebook.com pages in any kind of structured ways, and in fact it is rendered in ways that make it resistant to traditional "scraping" tools. There are no other user- or developer-friendly tools for working with event data that I can find.

Yes, Facebook does have a "page widget" that lets you display event information from a Facebook page elsewhere via embedding, if you are an admin on that page. The layout and customization options are pretty limited, but more importantly this is not the same thing as having access to the event information itself for importing, displaying, searching, archiving or other actions that someone might want to take.

Asking page owners to initiate displaying the event information elsewhere also eliminates the chance for other interested parties to "slice and dice" event data in potentially useful or interesting ways. If I want to create a website where you can search and display all of the animal adoption-related events in the Facebook facebook happening within 50 miles of my zip code, I can't. I can't do this even if all of the animal adoption agencies enter their events on Facebook, make them public and put page widgets up on their own website.

Can software fix this?

Being a software developer who works all day long on tools that try to make publishing easier and more interconnected, as I thought about this issue I still felt like there must be some way to extract event data from publicly shared Facebook events. After all, if you can see them in your web browser even while not logged in to Facebook, then that means the data is by definition publicly accessible in some form.

After analyzing the structure of a Facebook web page, its Javascript and the asynchronous HTTP calls made to fully render the content on it, I found that there is a way. Facebook's own event display pages are making a POST to the URL https://www.facebook.com/api/graphql/  to retrieve relevant event details, which are then rendered in HTML and CSS for a normal user to see. I sniffed the request and removed all of the query parameters I could find that might be extraneous. Here's an example of a resulting logged-out POST request that retrieves a batch of the upcoming events for a given Facebook page (my local Parks and Recreation Department):

POST /api/graphql/ HTTP/1.1
Host: www.facebook.com
Content-Type: application/x-www-form-urlencoded
Origin: https://www.facebook.com
User-Agent: PostmanRuntime/7.15.2
Accept: */*
Cache-Control: no-cache
Host: www.facebook.com
Accept-Encoding: gzip, deflate
Content-Length: 255
Connection: keep-alive
cache-control: no-cache

fb_api_req_friendly_name=PageEventsTabUpcomingEventsCardRendererQuery&variables=%7B%22pageID%22%3A%2250650939390%22%7D&__req=8&__user=0&av=0&__a=1&__be=1&dpr=2&fb_api_caller_class=RelayModern&__pc=PHASED%3ADEFAULT&__comet_req=false&doc_id=2343886202319301

I can't tell you what all of the query parameters mean but I think doc_id is the Facebook-internal indicator of a shorthand for which query to run (so apparently "2343886202319301" maps to "all upcoming events" or something like that) and pageID is the internal ID of the Facebook page itself. I can imagine this query breaking later on if Facebook decided to update the doc_id mappings, but for the time being, it works.

The result of the request is a JSON object that contains all the upcoming event info one would need to import Facebook events into another system, display them on a non-Facebook website, and so on.

From this point, I could imagine creating a tool that, given a list of Facebook pages, automatically and regularly grabs all of the upcoming public events available in Facebook and does something useful with them. Here's a proof of concept PHP script that lays the groundwork for that.

Am I going to get in trouble for this?

Does Facebook care if we access event data in this way? Yes, yes they do. Their terms of service about Automated Data Collection explicitly states that if you try to extract data from the Facebook platform in an automated way, they can ban you forever. But more likely than anything is that they will change the way their platform works to make this kind of data retrieval even more difficult.

I think there's hope for some shifting expectations about what's considered fair here. A U.S. appeals court ruled just last month that web scraping does not constitute illegal activity, and went so far as to acknowledge that if a scraper is retrieving publicly available data owned not by the platform but by its users, the scraping can't be blocked.

In the case of Facebook event data, it's worth noting again that we're talking about publicly available information shared by organizations that want it to get more exposure online about events that they want the public to attend. Restricting access to that because it's a potential source of advertising revenue seems beyond greedy to me. In the end though, it's up to organizations, businesses and individuals to decide whether they want to have their event data locked up, or out on the open web.

You can help

If you are an event organizer, please consider posting your event data outside of Facebook on a publicly available website!

If you are someone who cares about building a healthy culture of civic engagement in your community, advocate for the organizations you're involved with to move away from tools that make this harder in the long term!

If you are a developer at Facebook or anywhere else that builds tools for people to share information, please don't lock up that information! Show your commitment to the open web. Provide APIs, RSS (or in this case, ICS) feeds, good documentation and facilitate easy exports of user-owned data. (A quick shout-out to the folks at Eventbrite who, at least for now, make available for free a very robust API to access community events shared on their platform.)

I'm glad for any feedback and suggestions about these challenges; please comment below. I've also explored the themes discussed here in various past posts, including:

 

New WordPress plugin: Printable PDF Newspaper

I've released a new WordPress plugin I've been working on for a while, "Printable PDF Newspaper."

The plugin does one thing: it generates a newspaper-style PDF document from the posts on your WordPress site.

Pick which posts you want to include, customize a few things about how they're displayed, decide which fields to display (including a QR code that links a print reader with a smartphone back to your site), and generate the PDF. You can download it for printing or save it to your media library for easy linking and sharing.

That's it. That's the plugin.

There have been other plugins that do this kind of thing, but they were usually either reliant on a third-party PDF generation service, some of which required subscriptions, or they hadn't been updated in quite a long time so the architecture was out of date. Some allow you to generate a printable version of your website, but without the focus on a newspaper-style format.

So why would you want to generate a printed thing from your online site content?

Maybe you're producing a newsletter for your campus, neighborhood or community, and you want to take the information you've already published online and hang it up on a message board. Maybe you're a small news organization that wants to tease would-be subscribers at the local diner about what you have to offer. Maybe you just want to have something to look at over your own coffee in the morning. Whatever it is, some people still enjoy encountering things and ideas through engagement with objects in the physical world, and I hope this tool helps facilitate that for WordPress publishers of all sorts.

I've been thinking about the concept for this plugin for over a year, maybe more. I hadn't made time to actually work on building it, so I assumed the idea would just fade away. When it didn't and I still found myself really wanting to see this thing in the world, I contracted out the building of a first rough version, and then spent some time reworking it to my liking. It has rough edges and there are plenty of things it could do better, but I'm proud of it as a version 1.0.

The Printable PDF Newspaper plugin is available for free on the WordPress.org Theme Directory, and the source is available on GitHub.

Features I hope to add in the future as my personal time allows include:

  • Saving of PDF configuration selections for easy re-use in future runs
  • More customizable header size and layout
  • Better support for additional formatting, fonts and styles
  • Filters to alter how content is selected and laid out
  • Generate QR Codes within the plugin instead of using the Google Chart API
  • Ability to auto-generate the PDF on a schedule
  • Better controls for limiting number of pages generated and controlling column breaks.

If you're a WordPress publisher and you get to try this out, I'd appreciate any feedback. Enjoy!

Better WordPress multisite image URLs

I've been running a personal WordPress multisite instance for several years now, and I use it to host a variety of personal and organizational sites, including this one. I really like the ways it allows me to standardize and consolidate my management of WordPress as a tool, while still allowing a lot of flexibility for customizing my sites just as though they were individual self-hosted sites.

For the most part, my use of WordPress in multisite/network mode doesn't have any user-facing implications, especially since I use the WordPress MU Domain Mapping plugin to map custom domain names to every site I launch. As far as anyone visiting my sites knows, it's a standalone WordPress site that looks and works like any other.

The one exception to this has been the URL structure for images and other attachments that I upload to any site hosted on this multisite instance. Whereas the typical WordPress image URL might look like this:

https://example.com/wp-content/uploads/2019/03/my_image.jpg

on a multisite instance, there is an directory structure added in to separate each site's uploads into its own subdirectory:

https://example.com/wp-content/uploads/sites/25/2019/03/my_image.jpg

where 25 might be the site's unique site ID within that multisite setup.

There's nothing wrong with this approach and it certainly makes technical sense if you have lots of sites on your multisite instance that are either subdirectories or subdomains of the main multisite domain.

Continue reading "Better WordPress multisite image URLs"

Moving photos from Flickr to WordPress

If you're ready to move your own Flickr photo collection to WordPress and feel comfortable on the command line, you can go straight to the Flickr to WordPress tool I built and get started. Here's some backstory:

I used to love Flickr as a place to store photos, and as a community for sharing and discussing photography. But as its ownership changed hands and its future became at times uncertain, I grew reluctant to trust that it could continue to be a permanent home for my own photos. My discomfort increased as I have become more engaged with the need to have full ownership over the things I create online.

So, I set out to migrate my 3.6GB collection of 2,481 Flickr photos, along with their tags, comments and other metadata, into a new home while I still could.

Continue reading "Moving photos from Flickr to WordPress"

Hiding authors and users in WordPress

As an advanced publishing tool, WordPress typically defaults to displaying information about the author behind a given post or page on a WordPress site. But sometimes you want to build a website that has a more "singular" editorial identity, and that doesn't appear to be authored and managed by multiple people, even if it is. I see this regularly with corporate brands, political organizations, larger not-for-profits, and advocacy groups where the identity of a post or page's author could distract from the content being shared.

So how do you keep WordPress from revealing the author information - names, usernames and more - for the administrative users of your site? Here are a few tips, aimed at WordPress developers comfortable customizing their sites through changing the theme and plugin code.

Continue reading "Hiding authors and users in WordPress"

Using MoveOn for petitions

It's been a long time since I started a petition to try to change something in my world. But in recent weeks my local City Council has been threatening to do some silly things related to funding the development of bike and pedestrian paths here, and I'd heard enough people say informally that they were concerned by those threats that I decided it was time to create a central spot where they could put all their names for Council members to see.

And that's how I ended up using the petitions.moveon.org service, which has turned out to be excellent for this purpose.

A few things in particular that I like about it:

  • While clearly scaled to support national and state level petitions, the MoveOn tool did a great job of enabling a smaller petition targeted at a local legislative body that might not otherwise be in their system. I was able to enter the names and email addresses of my local Council members as "targets" of the petition, and they then were set up to receive deliveries of signatures directly.
  • Related to that, the MoveOn system allowed the targets of the petition to respond directly to the petition signers with a message, without giving them direct access to each others` private contact information.
  • The system automatically picks small signature goals to start with and then scales them up as new milestones are hit. I think this helps avoid the awkward "WE'RE GOING TO HAVE A HUNDRED MILLION SIGNATURES HERE!" declarations by petition creators that quickly yield disappointment.
  • The system offered up interesting summary stats about where signatures were coming from and what activity on the petition looked like over time.
  • When I had to contact MoveOn's petition support (one of the signers had accidentally left out a word in a comment that significantly changed the meaning, and wanted it corrected) they were fast to respond and provided a quick solution.
  • Other features in the petition tool, like handling delivery via print and email, contacting petition signers, "declaring victory," and more seemed really well designed; simple, effective, built for bringing about real action.
  • The petition application is open source and available for anyone to contribute to it.

One of the things I wanted to do as the signature numbers climbed and as I prepared to present the petition to Council was create something that visualized the signing names in one place. The signature count on the petition itself was not super prominent, and in only displaying 10 names at a time it was easy to miss out on the sense of a large part of the local population making a clear statement about what they want.

So I sniffed the XMLHttpRequests being made by the MoveOn site and found the underlying API that was being used to load the signature names. I whipped up a simple PHP script that queries that API to fetch all the names, and then does some basic cleanup of the list: leaving out anything that doesn't look like a full name, making capitalization and spacing consistent, sorting, etc.

I published the tool online at https://github.com/ChrisHardie/moveon-petition-tools in case anyone else might find it useful. (I later learned that MoveOn makes a CSV export of signatures and comments available when you go to print your petition, so that's an option too.)

Using the output of my tool, I created this simple graphic that shows all of the signed names to date:

All in all the MoveOn petition platform has been great, and I think it's made a difference just the way I wanted it to. I highly recommend it.

Gluing the web together with a personal webhook server

I mentioned earlier that I'm using a personal webhook server as glue between various online tools and services, and this post expands on that.

Why It Matters

For the open web to work well enough to bring people back to it, information has to be able to come and go freely between different systems without relying on centrally controlled hubs.

Just like RSS feeds, REST APIs and other tools that allow software programs to talk to each other across disparate systems, webhooks are an important part of that mix. Whereas many of those tools are "pull" technologies - one system goes out and makes a request of another system to see what's been added or changed - webhooks are a "push" technology, where one system proactively notifies another about a change or event that's occurred.

You might be using webhooks right now! Popular services like IFTTT and Zapier are built in part on webhooks in order to make certain things happen over there when something else has happened over here. "Flash my lights when my toaster is done!" And while I have great respect for what IFTTT and Zapier do and how they do it, I get a little uncomfortable when I see (A) so much of the automated, programmable web flowing through a small number of commercially oriented services, and (B) software and service owners building private API and webhook access for IFTTT and Zapier that they don't make available to the rest of us:

So I decided that just as I host RSS feeds and APIs for many of the things I build and share online (mostly through WordPress, since it makes that so easy), I also wanted to have the ability to host my own webhook endpoints.

How I Set It Up

I used the free and open source webhook server software. (It's unfortunate that they chose the same name as the accepted term for the underlying technical concept, so just know that this isn't the only way to run a webhook server.) I didn't have a Go environment set up on the Ubuntu server I wanted to use, so I installed the precompiled version from their release directory.

I created a very simple webhook endpoint in my initial hooks.json configuration file that would tell me the server was working at all:

[
  {
    "id": "webhook-monitor",
    "execute-command": "/bin/true",
    "command-working-directory": "/var/tmp",
    "response-message": "webhook is running",
  }
]

Then I started up the server to test it out:

/usr/local/bin/webhook -hooks /path/to/hooks.json -verbose -ip 127.0.0.1

This tells the webhook server to listen on the server's localhost network interface, on port 9000. When I ping that endpoint I should see a successful result:

$ curl -I http://localhost:9000/hooks/webhook-monitor
HTTP/1.1 200 OK
...

From there I set up an nginx configuration to make the server available on the Internet. Here are some snippets from that config:

upstream webhook_server {
    server 127.0.0.1:9000;
    keepalive 300;
}

limit_req_zone $request_uri zone=webhooklimit:10m rate=1r/s;

server {
    ...

       location /hooks/ {
           proxy_pass http://webhook_server;
           ...

        limit_req zone=webhooklimit burst=20;
    }

    ...
}

This establishes a proxy setup where nginx passes requests for my public-facing server on to the internally running webhook server. It also limits them to a certain number of requests per second so that some poorly configured webhook client can't hammer my webhook server too hard.

I generated a Let's Encrypt SSL certificate for my webhook server so that all of my webhook traffic will be encrypted in transit. Then I added the webhook startup script to my server's boot time startup by creating /etc/init/webhook.conf:

description "Webhook Server"
author "Chris Hardie"

start on runlevel [2345]
stop on runlevel [!2345]

setuid myuser

console log

normal exit 0 TERM

kill signal KILL
kill timeout 5

exec /usr/local/bin/webhook -hooks /path/to/hooks.json -verbose -ip 127.0.0.1 -hotreload

The hotreload parameter tells the webhook server to monitor for and load any changes to the hooks.json file right away, instead of waiting for you to restart/reload the server. Just make sure you're confident in your config file syntax. 🙂

After that, service start webhook will get things up and running.

To add new webhook endpoints, you just add on to the JSON configuration file. There are some good starter examples in the software documentation.

I strongly recommend using a calling query string parameter that acts as a "secret key" before allowing any service to call one of your webhook endpoints. I also recommend setting up your nginx server to pass along the calling IP address of the webhook clients and then match against a whitelist of allowed callers in your hook config. These steps will help make sure your webhook server is secure.

Finally, I suggest setting up some kind of external monitoring for your webhook server, especially if you start to depend on it for important actions or notifications. I use an Uptimerobot check that ensures my testing endpoint above returns the "webhook is running" string that it expects, and alerts me if it doesn't.

If you don't want to go to the trouble of setting up and hosting your own webhook server, you might look at Hookdoo, which hosts the webhook endpoints for you and then still allows you to script custom actions that result when a webhook is called. I haven't used it myself but it's built by the same folks who released the above webhook server software.

How I Use It

So what webhook endpoints am I using?

My favorite and most frequently used is a webhook that is called by BitBucket when I merge changes to the master branch of one of the code repositories powering various websites and utilities I maintain. When the webhook is called it executes a script locally that then essentially runs a git pull on that repo into the right place, and then pings a private Slack channel to let me know the deployment was successful. This is much faster than manually logging into my webserver to pull down changes or doing silly things like SFTPing files around. This covers a lot of the functionality of paid services like DeployBot or DeployHQ, though obviously those tools offer a lot more bells and whistles for the average use case.

I use a version of this webhook app for SmartThings to send event information from my various connected devices and monitors at home to a database where I can run SQL queries on them. It's fun to ask your house what it's been up to in this way.

For the connected vehicle monitoring adapter I use, they have a webhook option that sends events about my driving activity and car trips to my webhook server. I also store this in a database for later querying and reporting. I have plans to extend this to create or modify items on my to-do list based on locations I've visited; e.g. if I go to the post office to pick up the mail, check off my reminder to go to the post office.

I've asked the Fastmail folks to add in support for a generic webhook endpoint as a notification target in their custom mail processing rules; right now they only support IFTTT. There are lots of neat possibilities for "when I receive an email like this, do this other thing with my to-do list, calendar, smart home or website."

Speaking of IFTTT, they do have a pretty neat recipe component that will let you call webhooks based on other triggers they support. It's a very thoughtful addition to their lineup.

Conclusion

I'm not expecting many folks to go out and set up a webhook server; there's probably a pretty small subset of the population that will find this useful, let alone act on it. But I'm glad it's an option for someone to glue things together like this if they want to, and we need to make sure that software makers and service providers continue to incorporate support for webhooks and other technologies of an open, connected web as they go.

My everyday apps

This may be one of those posts that no one other than me cares about, but I've been meaning to take a snapshot of the software tools I use on a daily basis right now, and how they're useful to me. (This comes at a time when I'm considering a major OS change in my life -- more on that later -- so as a part of that I need to inventory these tools anyway.)

So every day when I get up and stretch my arms in front of my energy inefficient, eye-strain-causing big blue wall screen with a cloud in the middle, grumbling to myself that we now call things "apps" instead of programs or software or really any other name, I see:

Airmail 3 - my main tool for reading, sorting, sending and finding email across a few different accounts. Replaced Thunderbird with it a few years back, really good choice.

Chrome - on the desktop, my daily browser of choice. I tried the new Firefox recently and liked it, but not enough to switch. I have lots of extensions installed but some favorites are Adblock Plus, JSONView and XML Tree, Momentum, Pinboard Tools, Postman, Signal Private Messenger, Vanilla Cookie Manager, Xdebug helper and Web Developer. On mobile I use iOS Safari.

macOS and iOS Calendar and Contacts - works with all of the various online calendars and address books I sync to, have stayed pretty intelligent and user-friendly without getting too cluttered.

Todoist - for task management and keeping my brain clear of things I care about but don't want to have to remember.

Slack - for communicating with my co-workers and others. I have accounts on 11 different Slack instances, and only stay logged in to about 4 of those at once.

Continue reading "My everyday apps"