Yesterday I gave a talk at php[world] 2019 about "Tips and Tools for Gluing Together the Open Web." Below are the slides, text and links from my talk. Where applicable, the slide images link to relevant websites and code.
A few months ago, I was helping a local non-profit organization with their new website. We were almost set to launch, but they had one more request: can we get the next five upcoming events we've put up on our organization's Facebook page to show up in a list in the website's footer? I initially thought that surely there would be a simple way to do this, but as I looked in to it, the short answer was no, no you can't.
There's an embeddable Facebook widget that can display events, but you don't get a lot of control over appearance and it's far from a helpful list that can be scanned quickly. This seemed like such a basic request - connecting Facebook page events to a website - and yet as I poured through Google and Stack Overflow search results all I could find were frustrated users and abandoned tools that had tried and failed to do this one thing.
It might be hard to remember, but there was a time when Facebook wasn't the only place to learn about upcoming events in our communities. Instead of having to scroll past silly memes and political rants to get the details of the next potluck, lecture or book club you cared about, the information was available in lots of other places. Events were available on websites and apps, and they were shared via constantly updated feeds that you could even integrate directly into your own personal calendaring system.
You could learn about what was happening down the street or across town, and you didn't have to give up your online privacy to do it.
At some point, this shifted. Weary of the duplication of efforts maintaining event information in multiple places, and increasing fragmentation of sources that you needed to consult to figure out what's happening, people longed for a more centralized, authoritative spot to enter and learn about community events. I was one of those people.
Facebook saw this opportunity and jumped on it.
They made it easy to enter and promote events, and they sprinkled in social features to make it extra compelling to do. Not only could you learn about the event itself, you could see who among your friends was planning on (or interested in) going, and what they had to say about it. If you were an event organizer you could see what kinds of events were catching the interest of your target audience, and you could more easily avoid scheduling your big fundraising gala on the same night as the art museum open house.
At first, Facebook made access to event data one of the most friction-free parts of its platform, and they didn't require you to be "on Facebook" to see it. You could get email notifications, RSS feeds, ICS feeds to bring event information into your own productivity tools and daily life. There was API access so that you could build tools and websites that incorporated the event data where you wanted it to be.
Over time, all of that access was removed. The APIs were shut off, the feeds were shut down, and the message was clear: you have to come to the Facebook website or app to learn about events that you might want to attend.
It's nothing new to note that Facebook's user interface and business model are built around keeping people inside its walled garden. Whatever you might want to contribute to or get from your Facebook experience, they want you to do it on their website or in their mobile app, and on their terms. Their ability to sell advertising space depends on it.
But as with many other aspects of Facebook culture and its grip on our personal and community data, there's a significant downside. Facebook's decision to lock up event information has real implications for how we encounter and experience public life in our real-world communities.
Why is this a problem?
It means that like our exposure to news or updates from our online friends, our awareness of community events is driven by a black-box algorithm optimized around profit over everything else. It means that one company's shifting views on what constitutes an acceptable event, and its sensitivity to the interests of paying advertisers and political organizations could determine whether we see the details of an upcoming protest, demonstration or other exercising of free speech.
And it means that anyone concerned about the privacy implications of having their interest in a given event tracked, sold and monetized may have less exposure to the events that may have traditionally shaped and defined public life in their community, and private life in their circles of friends and neighbors.
There are practical implications too. An organization or business that still chooses to have its own website is having to enter and maintain event information in multiple places, which is time consuming and inefficient. People who want to avoid using Facebook are either left out of the loop or forced into using it again.
I went looking for solutions to this in the context of a website project I was working on recently, where the request was simple: can we bring our Facebook event data into our WordPress site? I had naively assumed that Facebook would have an interest in making this easy: if people could enter their events in one place and have them pushed out to everywhere that mattered, it would be so much easier to see them as the natural place to maintain that information.
But again, they don't make it easy. There's no API available to fetch event data, even for a Facebook page on which I'm an administrator. Event data is not displayed on public-facing Facebook.com pages in any kind of structured ways, and in fact it is rendered in ways that make it resistant to traditional "scraping" tools. There are no other user- or developer-friendly tools for working with event data that I can find.
Yes, Facebook does have a "page widget" that lets you display event information from a Facebook page elsewhere via embedding, if you are an admin on that page. The layout and customization options are pretty limited, but more importantly this is not the same thing as having access to the event information itself for importing, displaying, searching, archiving or other actions that someone might want to take.
Asking page owners to initiate displaying the event information elsewhere also eliminates the chance for other interested parties to "slice and dice" event data in potentially useful or interesting ways. If I want to create a website where you can search and display all of the animal adoption-related events in the Facebook facebook happening within 50 miles of my zip code, I can't. I can't do this even if all of the animal adoption agencies enter their events on Facebook, make them public and put page widgets up on their own website.
Can software fix this?
Being a software developer who works all day long on tools that try to make publishing easier and more interconnected, as I thought about this issue I still felt like there must be some way to extract event data from publicly shared Facebook events. After all, if you can see them in your web browser even while not logged in to Facebook, then that means the data is by definition publicly accessible in some form.
I can't tell you what all of the query parameters mean but I think doc_id is the Facebook-internal indicator of a shorthand for which query to run (so apparently "2343886202319301" maps to "all upcoming events" or something like that) and pageID is the internal ID of the Facebook page itself. I can imagine this query breaking later on if Facebook decided to update the doc_id mappings, but for the time being, it works.
The result of the request is a JSON object that contains all the upcoming event info one would need to import Facebook events into another system, display them on a non-Facebook website, and so on.
From this point, I could imagine creating a tool that, given a list of Facebook pages, automatically and regularly grabs all of the upcoming public events available in Facebook and does something useful with them. Here's a proof of concept PHP script that lays the groundwork for that.
Am I going to get in trouble for this?
Does Facebook care if we access event data in this way? Yes, yes they do. Their terms of service about Automated Data Collection explicitly states that if you try to extract data from the Facebook platform in an automated way, they can ban you forever. But more likely than anything is that they will change the way their platform works to make this kind of data retrieval even more difficult.
I think there's hope for some shifting expectations about what's considered fair here. A U.S. appeals court ruled just last month that web scraping does not constitute illegal activity, and went so far as to acknowledge that if a scraper is retrieving publicly available data owned not by the platform but by its users, the scraping can't be blocked.
In the case of Facebook event data, it's worth noting again that we're talking about publicly available information shared by organizations that want it to get more exposure online about events that they want the public to attend. Restricting access to that because it's a potential source of advertising revenue seems beyond greedy to me. In the end though, it's up to organizations, businesses and individuals to decide whether they want to have their event data locked up, or out on the open web.
You can help
If you are an event organizer, please consider posting your event data outside of Facebook on a publicly available website!
If you are someone who cares about building a healthy culture of civic engagement in your community, advocate for the organizations you're involved with to move away from tools that make this harder in the long term!
If you are a developer at Facebook or anywhere else that builds tools for people to share information, please don't lock up that information! Show your commitment to the open web. Provide APIs, RSS (or in this case, ICS) feeds, good documentation and facilitate easy exports of user-owned data. (A quick shout-out to the folks at Eventbrite who, at least for now, make available for free a very robust API to access community events shared on their platform.)
I'm glad for any feedback and suggestions about these challenges; please comment below. I've also explored the themes discussed here in various past posts, including:
The plugin does one thing: it generates a newspaper-style PDF document from the posts on your WordPress site.
Pick which posts you want to include, customize a few things about how they're displayed, decide which fields to display (including a QR code that links a print reader with a smartphone back to your site), and generate the PDF. You can download it for printing or save it to your media library for easy linking and sharing.
That's it. That's the plugin.
There have been other plugins that do this kind of thing, but they were usually either reliant on a third-party PDF generation service, some of which required subscriptions, or they hadn't been updated in quite a long time so the architecture was out of date. Some allow you to generate a printable version of your website, but without the focus on a newspaper-style format.
So why would you want to generate a printed thing from your online site content?
Maybe you're producing a newsletter for your campus, neighborhood or community, and you want to take the information you've already published online and hang it up on a message board. Maybe you're a small news organization that wants to tease would-be subscribers at the local diner about what you have to offer. Maybe you just want to have something to look at over your own coffee in the morning. Whatever it is, some people still enjoy encountering things and ideas through engagement with objects in the physical world, and I hope this tool helps facilitate that for WordPress publishers of all sorts.
I've been thinking about the concept for this plugin for over a year, maybe more. I hadn't made time to actually work on building it, so I assumed the idea would just fade away. When it didn't and I still found myself really wanting to see this thing in the world, I contracted out the building of a first rough version, and then spent some time reworking it to my liking. It has rough edges and there are plenty of things it could do better, but I'm proud of it as a version 1.0.
I'll be presenting a session at php[world] in Washington, D.C. on October 23rd, "Tools and Tips for Gluing Together the Open Web." I get to combine a bunch of topics I'm interested in: the programming language that powers 83% of the web, tools and ideas for helping people own their online homes and content, the principles behind (and discussions happening around) strengthening the open web, the publishing platform that powers over a third of websites out there, and hacky little bits of "glue" software I've written to get data in or out of a given service.
I'm excited and it looks like a great conference. (My employer, Automattic, is an event sponsor.) If you're interested in PHP, software development and/or the open web and will be in the D.C. area then, I hope to see you there.
I've been running a personal WordPress multisite instance for several years now, and I use it to host a variety of personal and organizational sites, including this one. I really like the ways it allows me to standardize and consolidate my management of WordPress as a tool, while still allowing a lot of flexibility for customizing my sites just as though they were individual self-hosted sites.
The one exception to this has been the URL structure for images and other attachments that I upload to any site hosted on this multisite instance. Whereas the typical WordPress image URL might look like this:
where 25 might be the site's unique site ID within that multisite setup.
There's nothing wrong with this approach and it certainly makes technical sense if you have lots of sites on your multisite instance that are either subdirectories or subdomains of the main multisite domain.
If you're ready to move your own Flickr photo collection to WordPress and feel comfortable on the command line, you can go straight to the Flickr to WordPress tool I built and get started. Here's some backstory:
I used to love Flickr as a place to store photos, and as a community for sharing and discussing photography. But as its ownership changed hands and its future became at times uncertain, I grew reluctant to trust that it could continue to be a permanent home for my own photos. My discomfort increased as I have become more engaged with the need to have full ownership over the things I create online.
So, I set out to migrate my 3.6GB collection of 2,481 Flickr photos, along with their tags, comments and other metadata, into a new home while I still could.
As an advanced publishing tool, WordPress typically defaults to displaying information about the author behind a given post or page on a WordPress site. But sometimes you want to build a website that has a more "singular" editorial identity, and that doesn't appear to be authored and managed by multiple people, even if it is. I see this regularly with corporate brands, political organizations, larger not-for-profits, and advocacy groups where the identity of a post or page's author could distract from the content being shared.
So how do you keep WordPress from revealing the author information - names, usernames and more - for the administrative users of your site? Here are a few tips, aimed at WordPress developers comfortable customizing their sites through changing the theme and plugin code.
It's been a long time since I started a petition to try to change something in my world. But in recent weeks my local City Council has been threatening to do some silly things related to funding the development of bike and pedestrian paths here, and I'd heard enough people say informally that they were concerned by those threats that I decided it was time to create a central spot where they could put all their names for Council members to see.
And that's how I ended up using the petitions.moveon.org service, which has turned out to be excellent for this purpose.
A few things in particular that I like about it:
While clearly scaled to support national and state level petitions, the MoveOn tool did a great job of enabling a smaller petition targeted at a local legislative body that might not otherwise be in their system. I was able to enter the names and email addresses of my local Council members as "targets" of the petition, and they then were set up to receive deliveries of signatures directly.
Related to that, the MoveOn system allowed the targets of the petition to respond directly to the petition signers with a message, without giving them direct access to each others` private contact information.
The system automatically picks small signature goals to start with and then scales them up as new milestones are hit. I think this helps avoid the awkward "WE'RE GOING TO HAVE A HUNDRED MILLION SIGNATURES HERE!" declarations by petition creators that quickly yield disappointment.
The system offered up interesting summary stats about where signatures were coming from and what activity on the petition looked like over time.
When I had to contact MoveOn's petition support (one of the signers had accidentally left out a word in a comment that significantly changed the meaning, and wanted it corrected) they were fast to respond and provided a quick solution.
Other features in the petition tool, like handling delivery via print and email, contacting petition signers, "declaring victory," and more seemed really well designed; simple, effective, built for bringing about real action.
One of the things I wanted to do as the signature numbers climbed and as I prepared to present the petition to Council was create something that visualized the signing names in one place. The signature count on the petition itself was not super prominent, and in only displaying 10 names at a time it was easy to miss out on the sense of a large part of the local population making a clear statement about what they want.
So I sniffed the XMLHttpRequests being made by the MoveOn site and found the underlying API that was being used to load the signature names. I whipped up a simple PHP script that queries that API to fetch all the names, and then does some basic cleanup of the list: leaving out anything that doesn't look like a full name, making capitalization and spacing consistent, sorting, etc.
I published the tool online at https://github.com/ChrisHardie/moveon-petition-tools in case anyone else might find it useful. (I later learned that MoveOn makes a CSV export of signatures and comments available when you go to print your petition, so that's an option too.)
Using the output of my tool, I created this simple graphic that shows all of the signed names to date:
All in all the MoveOn petition platform has been great, and I think it's made a difference just the way I wanted it to. I highly recommend it.
I mentioned earlier that I'm using a personal webhook server as glue between various online tools and services, and this post expands on that.
Why It Matters
For the open web to work well enough to bring people back to it, information has to be able to come and go freely between different systems without relying on centrally controlled hubs.
Just like RSS feeds, REST APIs and other tools that allow software programs to talk to each other across disparate systems, webhooks are an important part of that mix. Whereas many of those tools are "pull" technologies - one system goes out and makes a request of another system to see what's been added or changed - webhooks are a "push" technology, where one system proactively notifies another about a change or event that's occurred.
You might be using webhooks right now! Popular services like IFTTT and Zapier are built in part on webhooks in order to make certain things happen over there when something else has happened over here. "Flash my lights when my toaster is done!" And while I have great respect for what IFTTT and Zapier do and how they do it, I get a little uncomfortable when I see (A) so much of the automated, programmable web flowing through a small number of commercially oriented services, and (B) software and service owners building private API and webhook access for IFTTT and Zapier that they don't make available to the rest of us:
Dear app/tool/service developers: if you're going to offer "integrations" with folks like IFTTT, Zapier PLEASE also take the time to directly support the open standards those connections are built on (e.g. REST APIs, webhooks, RSS, CalDav, CardDav, etc).
So I decided that just as I host RSS feeds and APIs for many of the things I build and share online (mostly through WordPress, since it makes that so easy), I also wanted to have the ability to host my own webhook endpoints.
How I Set It Up
I used the free and open source webhook server software. (It's unfortunate that they chose the same name as the accepted term for the underlying technical concept, so just know that this isn't the only way to run a webhook server.) I didn't have a Go environment set up on the Ubuntu server I wanted to use, so I installed the precompiled version from their release directory.
I created a very simple webhook endpoint in my initial hooks.json configuration file that would tell me the server was working at all:
This establishes a proxy setup where nginx passes requests for my public-facing server on to the internally running webhook server. It also limits them to a certain number of requests per second so that some poorly configured webhook client can't hammer my webhook server too hard.
I generated a Let's Encrypt SSL certificate for my webhook server so that all of my webhook traffic will be encrypted in transit. Then I added the webhook startup script to my server's boot time startup by creating /etc/init/webhook.conf:
description "Webhook Server"
author "Chris Hardie"
start on runlevel 
stop on runlevel [!2345]
normal exit 0 TERM
kill signal KILL
kill timeout 5
exec /usr/local/bin/webhook -hooks /path/to/hooks.json -verbose -ip 127.0.0.1 -hotreload
The hotreload parameter tells the webhook server to monitor for and load any changes to the hooks.json file right away, instead of waiting for you to restart/reload the server. Just make sure you're confident in your config file syntax. 🙂
After that, service start webhook will get things up and running.
I strongly recommend using a calling query string parameter that acts as a "secret key" before allowing any service to call one of your webhook endpoints. I also recommend setting up your nginx server to pass along the calling IP address of the webhook clients and then match against a whitelist of allowed callers in your hook config. These steps will help make sure your webhook server is secure.
Finally, I suggest setting up some kind of external monitoring for your webhook server, especially if you start to depend on it for important actions or notifications. I use an Uptimerobot check that ensures my testing endpoint above returns the "webhook is running" string that it expects, and alerts me if it doesn't.
If you don't want to go to the trouble of setting up and hosting your own webhook server, you might look at Hookdoo, which hosts the webhook endpoints for you and then still allows you to script custom actions that result when a webhook is called. I haven't used it myself but it's built by the same folks who released the above webhook server software.
How I Use It
So what webhook endpoints am I using?
My favorite and most frequently used is a webhook that is called by BitBucket when I merge changes to the master branch of one of the code repositories powering various websites and utilities I maintain. When the webhook is called it executes a script locally that then essentially runs a git pull on that repo into the right place, and then pings a private Slack channel to let me know the deployment was successful. This is much faster than manually logging into my webserver to pull down changes or doing silly things like SFTPing files around. This covers a lot of the functionality of paid services like DeployBot or DeployHQ, though obviously those tools offer a lot more bells and whistles for the average use case.
I use a version of this webhook app for SmartThings to send event information from my various connected devices and monitors at home to a database where I can run SQL queries on them. It's fun to ask your house what it's been up to in this way.
For the connected vehicle monitoring adapter I use, they have a webhook option that sends events about my driving activity and car trips to my webhook server. I also store this in a database for later querying and reporting. I have plans to extend this to create or modify items on my to-do list based on locations I've visited; e.g. if I go to the post office to pick up the mail, check off my reminder to go to the post office.
I've asked the Fastmail folks to add in support for a generic webhook endpoint as a notification target in their custom mail processing rules; right now they only support IFTTT. There are lots of neat possibilities for "when I receive an email like this, do this other thing with my to-do list, calendar, smart home or website."
I'm not expecting many folks to go out and set up a webhook server; there's probably a pretty small subset of the population that will find this useful, let alone act on it. But I'm glad it's an option for someone to glue things together like this if they want to, and we need to make sure that software makers and service providers continue to incorporate support for webhooks and other technologies of an open, connected web as they go.
It's been about a year since I left Facebook, and I'm still glad I did. (I guess there were those thirty years before Facebook existed that I somehow managed without it, too.)
People in my circles generally continue to assume that I've seen their event invitations and life updates on Facebook, and so it's still a regular occurrence that I find out about something well after everyone else, or not at all. This is most annoying when it's something really time sensitive that I would have liked to have been a part of.
Some of my published writings have been shared extensively on Facebook, generating hundreds or even thousands of views on my various websites, but I don't have a way of knowing where that activity is coming from or what kind of conversation it might be generating there. I've had people tell me in person that they saw and liked something via Facebook, which is nice, but of course I wish they'd leave their likes and comments on my site where it's closer to the original writing, visible to the world, and not subject to later deletion by some corporate entity. (This comes up for any social network, not just Facebook, but it tends to be the one generating the most traffic for me.)
I won't make a claim that the hours I've saved by not looking at Facebook have freed me up to accomplish some amazing other thing. I will say that I felt a nice release from the self-created pressure to keep up with my interactions and profile there, and that in turn has contributed to an increase in my overall creative energy for other things.
I had one time where I needed to use the Facebook sharing debugger for a work project. I signed up for a new account to do this, but Facebook clearly found my lack of interest in populating a real-looking profile to be suspicious, and closed down the account soon after. In the end it was faster to ask a colleague with an active account to do the debugging for me and share the results. As I've said before, I think it's ridiculous and irresponsible that Facebook doesn't make that tool available to logged-out users.
I'm still surprised at how many organizations and businesses use Facebook as their one and only place for posting content; some even do it in a way that I just can't see it as a logged-out user, and others don't seem to realize that they're giving Facebook 80% of any screen real estate on the links I can see. I am now much more likely to avoid doing business with or offering my support to these entities if they don't bother offering non-Facebook ways for me to engage.
I've accepted that people will not necessarily seek out the open version of the web on their own. Being off Facebook has reinforced that there are big gaps to close in the user experiences that other tools and services offer (the WordPress/blogging ecosystem not least among them). My own efforts to migrate my content that still exists on other services like Flickr into a digital home that I fully control are slow-going, so I don't expect other people to even bother. Facebook is still the path of least resistance for now.
When the actions of Cambridge Analytica were in the news, it was tempting to feel smug about not being an active Facebook user. But I know they still have tons of information about me that is of value to advertisers and others, and that even as I use browser plugins to try to prevent Facebook from accumulating an even larger profile of my online activity, it is a losing battle until there are larger shifts in the culture and business models of technology companies.