Unlocking community event information from Facebook

It wasn't always this way.

It might be hard to remember, but there was a time when Facebook wasn't the only place to learn about upcoming events in our communities. Instead of having to scroll past silly memes and political rants to get the details of the next potluck, lecture or book club you cared about, the information was available in lots of other places. Events were available on websites and apps, and they were shared via constantly updated feeds that you could even integrate directly into your own personal calendaring system.

You could learn about what was happening down the street or across town, and you didn't have to give up your online privacy to do it.

At some point, this shifted. Weary of duplicating the effort of maintaining event information in multiple places, and of the growing number of sources they had to consult to figure out what was happening, people longed for a more centralized, authoritative spot to enter and learn about community events. I was one of those people.

Enter Facebook

Facebook saw this opportunity and jumped on it.

They made it easy to enter and promote events, and they sprinkled in social features that made doing so extra compelling. Not only could you learn about the event itself, you could see which of your friends were planning on (or interested in) going, and what they had to say about it. If you were an event organizer, you could see what kinds of events were catching the interest of your target audience, and you could more easily avoid scheduling your big fundraising gala on the same night as the art museum open house.

At first, Facebook made access to event data one of the most friction-free parts of its platform, and it didn't require you to be "on Facebook" to see it. You could get email notifications, RSS feeds, and ICS feeds that brought event information into your own productivity tools and daily life. There was API access so that you could build tools and websites that incorporated the event data wherever you wanted it.

Over time, all of that access was removed. The APIs were shut off, the feeds were shut down, and the message was clear: you have to come to the Facebook website or app to learn about events that you might want to attend.

It's nothing new to note that Facebook's user interface and business model are built around keeping people inside its walled garden. Whatever you might want to contribute to or get from your Facebook experience, they want you to do it on their website or in their mobile app, and on their terms. Their ability to sell advertising space depends on it.

But as with many other aspects of Facebook culture and its grip on our personal and community data, there's a significant downside. Facebook's decision to lock up event information has real implications for how we encounter and experience public life in our real-world communities.

Why is this a problem?

It means that, like our exposure to news or updates from our online friends, our awareness of community events is driven by a black-box algorithm optimized for profit above everything else. It means that one company's shifting views on what constitutes an acceptable event, along with its sensitivity to the interests of paying advertisers and political organizations, could determine whether we see the details of an upcoming protest, demonstration or other exercise of free speech.

And it means that anyone concerned about the privacy implications of having their interest in a given event tracked, sold and monetized may end up with less exposure to the events that have traditionally shaped and defined public life in their community, and private life in their circles of friends and neighbors.

There are practical implications too. An organization or business that still chooses to have its own website has to enter and maintain event information in multiple places, which is time-consuming and inefficient. People who want to avoid using Facebook are either left out of the loop or forced back onto it.

I went looking for solutions to this in the context of a website project I was working on recently, where the request was simple: can we bring our Facebook event data into our WordPress site? I had naively assumed that Facebook would have an interest in making this easy: if people could enter their events in one place and have them pushed out to everywhere that mattered, it would be that much easier for them to see Facebook as the natural place to maintain that information.

But again, they don't make it easy. There's no API available to fetch event data, even for a Facebook page on which I'm an administrator. Event data is not displayed on public-facing Facebook.com pages in any kind of structured way, and in fact it is rendered in ways that make it resistant to traditional "scraping" tools. There are no other user- or developer-friendly tools for working with event data that I can find.

Yes, Facebook does have a "page widget" that lets you display event information from a Facebook page elsewhere via embedding, if you are an admin on that page. The layout and customization options are pretty limited, but more importantly this is not the same thing as having access to the event information itself for importing, displaying, searching, archiving or other actions that someone might want to take.

Asking page owners to initiate displaying the event information elsewhere also eliminates the chance for other interested parties to "slice and dice" event data in potentially useful or interesting ways. If I want to create a website where you can search and display all of the animal adoption-related events listed on Facebook within 50 miles of my zip code, I can't. I can't do this even if all of the animal adoption agencies enter their events on Facebook, make them public and put page widgets up on their own websites.

Can software fix this?

As a software developer who works all day long on tools that try to make publishing easier and more interconnected, I still felt, as I thought about this issue, that there must be some way to extract event data from publicly shared Facebook events. After all, if you can see them in your web browser even while not logged in to Facebook, then the data is by definition publicly accessible in some form.

After analyzing the structure of a Facebook web page, its JavaScript and the asynchronous HTTP calls made to fully render the content on it, I found that there is a way. Facebook's own event display pages make a POST request to the URL https://www.facebook.com/api/graphql/ to retrieve relevant event details, which are then rendered in HTML and CSS for a normal user to see. I sniffed the request and removed all of the parameters I could find that might be extraneous. Here's an example of a resulting logged-out POST request that retrieves a batch of upcoming events for a given Facebook page (my local Parks and Recreation Department):

POST /api/graphql/ HTTP/1.1
Host: www.facebook.com
Content-Type: application/x-www-form-urlencoded
Origin: https://www.facebook.com
User-Agent: PostmanRuntime/7.15.2
Accept: */*
Cache-Control: no-cache
Accept-Encoding: gzip, deflate
Content-Length: 255
Connection: keep-alive

fb_api_req_friendly_name=PageEventsTabUpcomingEventsCardRendererQuery&variables=%7B%22pageID%22%3A%2250650939390%22%7D&__req=8&__user=0&av=0&__a=1&__be=1&dpr=2&fb_api_caller_class=RelayModern&__pc=PHASED%3ADEFAULT&__comet_req=false&doc_id=2343886202319301

I can't tell you what all of the parameters mean, but I think doc_id is a Facebook-internal shorthand identifier for which query to run (so apparently "2343886202319301" maps to "all upcoming events" or something like that), and pageID is the internal ID of the Facebook page itself. I can imagine this request breaking later on if Facebook decides to update the doc_id mappings, but for the time being, it works.
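To make the shape of that request body easier to see, here is a minimal Python sketch that rebuilds it from a page ID using only the standard library. It constructs the request without sending it. The parameter values are copied from the sniffed request above; the function name build_events_request is hypothetical, and there's no guarantee Facebook still honors this doc_id or parameter set.

```python
import json
import urllib.parse
import urllib.request

GRAPHQL_URL = "https://www.facebook.com/api/graphql/"


def build_events_request(page_id: str, doc_id: str = "2343886202319301") -> urllib.request.Request:
    """Build (but do not send) the logged-out POST observed in the browser.

    Every value here mirrors the sniffed request; Facebook may change the
    doc_id mapping or the required fields at any time.
    """
    # The "variables" field is compact JSON, which urlencode then percent-encodes.
    variables = json.dumps({"pageID": page_id}, separators=(",", ":"))
    form = {
        "fb_api_req_friendly_name": "PageEventsTabUpcomingEventsCardRendererQuery",
        "variables": variables,
        "__req": "8",
        "__user": "0",
        "av": "0",
        "__a": "1",
        "__be": "1",
        "dpr": "2",
        "fb_api_caller_class": "RelayModern",
        "__pc": "PHASED:DEFAULT",
        "__comet_req": "false",
        "doc_id": doc_id,
    }
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen() would actually send it. Keep in mind the terms-of-service concerns discussed further on before doing that in an automated way.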

The result of the request is a JSON object that contains all the upcoming event info one would need to import Facebook events into another system, display them on a non-Facebook website, and so on.

From this point, I could imagine creating a tool that, given a list of Facebook pages, automatically and regularly grabs all of the upcoming public events available in Facebook and does something useful with them. Here's a proof of concept PHP script that lays the groundwork for that.

Am I going to get in trouble for this?

Does Facebook care if we access event data in this way? Yes, yes they do. Their terms of service on Automated Data Collection explicitly state that if you try to extract data from the Facebook platform in an automated way, they can ban you forever. More likely, though, they will change the way their platform works to make this kind of data retrieval even more difficult.

I think there's hope for some shifting expectations about what's considered fair here. A U.S. appeals court ruled just last month that scraping publicly available data likely isn't illegal under federal anti-hacking law, and went so far as to suggest that if a scraper is retrieving publicly available data owned not by the platform but by its users, the platform may not be able to block the scraping.

In the case of Facebook event data, it's worth noting again that we're talking about publicly available information shared by organizations that want more online exposure for events they want the public to attend. Restricting access to that because it's a potential source of advertising revenue seems beyond greedy to me. In the end, though, it's up to organizations, businesses and individuals to decide whether they want their event data locked up or out on the open web.

You can help

If you are an event organizer, please consider posting your event data outside of Facebook on a publicly available website!

If you are someone who cares about building a healthy culture of civic engagement in your community, advocate for the organizations you're involved with to move away from tools that make this harder in the long term!

If you are a developer at Facebook or anywhere else that builds tools for people to share information, please don't lock up that information! Show your commitment to the open web. Provide APIs, RSS (or in this case, ICS) feeds and good documentation, and make it easy to export user-owned data. (A quick shout-out to the folks at Eventbrite who, at least for now, make available for free a very robust API to access community events shared on their platform.)

I'm glad for any feedback and suggestions about these challenges; please comment below. I've also explored the themes discussed here in various past posts, including:

 

Speaking at php[world]

I'll be presenting a session at php[world] in Washington, D.C. on October 23rd, "Tools and Tips for Gluing Together the Open Web." I get to combine a bunch of topics I'm interested in: the programming language that powers 83% of the web, tools and ideas for helping people own their online homes and content, the principles behind (and discussions happening around) strengthening the open web, the publishing platform that powers over a third of websites out there, and hacky little bits of "glue" software I've written to get data in or out of a given service.

I'm excited and it looks like a great conference. (My employer, Automattic, is an event sponsor.) If you're interested in PHP, software development and/or the open web and will be in the D.C. area then, I hope to see you there.

Moving photos from Flickr to WordPress

If you're ready to move your own Flickr photo collection to WordPress and feel comfortable on the command line, you can go straight to the Flickr to WordPress tool I built and get started. Here's some backstory:

I used to love Flickr as a place to store photos, and as a community for sharing and discussing photography. But as its ownership changed hands and its future became at times uncertain, I grew reluctant to trust that it could continue to be a permanent home for my own photos. My discomfort increased as I became more engaged with the need to have full ownership over the things I create online.

So, I set out to migrate my 3.6GB collection of 2,481 Flickr photos, along with their tags, comments and other metadata, into a new home while I still could.


Gluing the web together with a personal webhook server

I mentioned earlier that I'm using a personal webhook server as glue between various online tools and services, and this post expands on that.

Why It Matters

For the open web to work well enough to bring people back to it, information has to be able to come and go freely between different systems without relying on centrally controlled hubs.

Just like RSS feeds, REST APIs and other tools that allow software programs to talk to each other across disparate systems, webhooks are an important part of that mix. Whereas many of those tools are "pull" technologies - one system goes out and makes a request of another system to see what's been added or changed - webhooks are a "push" technology, where one system proactively notifies another about a change or event that's occurred.
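To make the "push" model concrete, here is a minimal Python sketch using only the standard library. The push() function plays the system that has something to report, and a tiny HTTP receiver plays the webhook endpoint that gets notified. The function names and the /hooks/toaster-done path are hypothetical, and this is not how the webhook server software I use works internally.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # notifications pushed to our endpoint, in arrival order


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON payload the sender pushed to us and record it.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        received.append({"path": self.path, "payload": payload})
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass


def start_receiver(port: int = 0) -> HTTPServer:
    """Run the receiver in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), HookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def push(url: str, payload: dict):
    """The push side: proactively notify another system that something happened."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data,
                                 headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status, resp.read()
```

In the pull model, the receiver would instead have to poll the sender on a schedule and compare results to see what changed.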

You might be using webhooks right now! Popular services like IFTTT and Zapier are built in part on webhooks in order to make certain things happen over there when something else has happened over here. "Flash my lights when my toaster is done!" And while I have great respect for what IFTTT and Zapier do and how they do it, I get a little uncomfortable when I see (A) so much of the automated, programmable web flowing through a small number of commercially oriented services, and (B) software and service owners building private API and webhook access for IFTTT and Zapier that they don't make available to the rest of us.

So I decided that just as I host RSS feeds and APIs for many of the things I build and share online (mostly through WordPress, since it makes that so easy), I also wanted to have the ability to host my own webhook endpoints.

How I Set It Up

I used the free and open source webhook server software. (It's unfortunate that they chose the same name as the accepted term for the underlying technical concept, so just know that this isn't the only way to run a webhook server.) I didn't have a Go environment set up on the Ubuntu server I wanted to use, so I installed the precompiled version from their release directory.

I created a very simple webhook endpoint in my initial hooks.json configuration file that would tell me the server was working at all:

[
  {
    "id": "webhook-monitor",
    "execute-command": "/bin/true",
    "command-working-directory": "/var/tmp",
    "response-message": "webhook is running"
  }
]

Then I started up the server to test it out:

/usr/local/bin/webhook -hooks /path/to/hooks.json -verbose -ip 127.0.0.1

This tells the webhook server to listen on the server's localhost network interface (127.0.0.1), on its default port, 9000. When I request that endpoint, I should see a successful result:

$ curl -I http://localhost:9000/hooks/webhook-monitor
HTTP/1.1 200 OK
...

From there I set up an nginx configuration to make the server available on the Internet. Here are some snippets from that config:

upstream webhook_server {
    server 127.0.0.1:9000;
    keepalive 300;
}

limit_req_zone $request_uri zone=webhooklimit:10m rate=1r/s;

server {
    ...

    location /hooks/ {
        proxy_pass http://webhook_server;
        ...

        limit_req zone=webhooklimit burst=20;
    }

    ...
}

This establishes a proxy setup where nginx passes requests for my public-facing server on to the internally running webhook server. It also limits each request URI to one request per second, with a burst allowance of 20, so that a poorly configured webhook client can't hammer my webhook server too hard.
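To see what rate=1r/s with burst=20 means for a single key (here, a single request URI), here is a simplified Python model of the "leaky bucket" admission check that nginx's limit_req uses. It only models which requests are admitted or rejected. Real nginx also delays queued excess requests unless nodelay is set, and rejects with a 503 by default. The class name is hypothetical.

```python
class LeakyBucket:
    """Simplified model of nginx limit_req admission for one key.

    rate:  requests per second that drain ("leak") out of the bucket.
    burst: how many requests beyond the steady rate may be outstanding.
    """

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.excess = 0.0
        self.last = None  # time (seconds) of the last admitted request

    def allow(self, now: float) -> bool:
        if self.last is None:
            self.last = now
            return True
        # Drain the bucket for the time elapsed, then count this request.
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1)
        if excess > self.burst:
            return False  # nginx would reject this one (503 by default)
        self.excess, self.last = excess, now
        return True
```

With these settings, a misbehaving client firing requests all at once gets 21 through (one plus the burst of 20), and after that roughly one more per second.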

I generated a Let's Encrypt SSL certificate for my webhook server so that all of my webhook traffic is encrypted in transit. Then I set the webhook server to start at boot time by creating an Upstart job file, /etc/init/webhook.conf:

description "Webhook Server"
author "Chris Hardie"

start on runlevel [2345]
stop on runlevel [!2345]

setuid myuser

console log

normal exit 0 TERM

kill signal KILL
kill timeout 5

exec /usr/local/bin/webhook -hooks /path/to/hooks.json -verbose -ip 127.0.0.1 -hotreload

The -hotreload flag tells the webhook server to watch for and load any changes to the hooks.json file right away, instead of waiting for you to restart or reload the server. Just make sure you're confident in your config file syntax. 🙂
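Because a hot-reloaded typo takes effect immediately, it can help to sanity-check hooks.json before saving it into place. Here is a minimal Python sketch of such a pre-flight check. It catches invalid JSON (for example, a trailing comma after the last property of an object, which JSON does not allow), entries missing the id and execute-command fields used throughout this post, and duplicate ids. The function names are hypothetical, and this is not a full validator for every option the webhook server supports.

```python
import json


def check_hooks_text(text: str) -> list:
    """Return a list of problems found in hooks.json content (empty if none)."""
    try:
        hooks = json.loads(text)
    except json.JSONDecodeError as e:
        return [f"invalid JSON at line {e.lineno}, column {e.colno}: {e.msg}"]
    if not isinstance(hooks, list):
        return ["top level should be a list of hook definitions"]
    problems, seen = [], set()
    for i, hook in enumerate(hooks):
        if not isinstance(hook, dict):
            problems.append(f"entry {i} is not an object")
            continue
        for key in ("id", "execute-command"):
            if key not in hook:
                problems.append(f"entry {i} is missing {key!r}")
        hook_id = hook.get("id")
        if hook_id is not None and hook_id in seen:
            problems.append(f"duplicate id {hook_id!r}")
        seen.add(hook_id)
    return problems


def check_hooks_file(path: str) -> list:
    with open(path, encoding="utf-8") as f:
        return check_hooks_text(f.read())
```

One way to use it: edit a copy of the file, run check_hooks_file() on the copy, and only move it over the live hooks.json when the result is an empty list.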

After that, service webhook start will get things up and running.

To add new webhook endpoints, you just add on to the JSON configuration file. There are some good starter examples in the software documentation.

I strongly recommend requiring a query string parameter that acts as a "secret key" before allowing any service to call one of your webhook endpoints. I also recommend setting up your nginx server to pass along the calling IP address of the webhook clients, and then matching it against a whitelist of allowed callers in your hook config. These steps will help keep your webhook server secure.
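The webhook server's own hook rules are where you'd actually configure these checks (see its documentation). The Python sketch below just spells out the logic those rules need to express. It assumes a hypothetical query parameter named token and example allowed networks, and that the real client IP has already been recovered from the header nginx forwards.

```python
import hmac
import ipaddress
from urllib.parse import parse_qs, urlsplit


def is_authorized(request_url: str, client_ip: str, secret: str, allowed_networks) -> bool:
    """Allow a call only if it carries the shared secret AND comes from an allowed network."""
    params = parse_qs(urlsplit(request_url).query)
    supplied = params.get("token", [""])[0]  # "token" is a hypothetical parameter name
    # Constant-time comparison avoids leaking the secret through timing differences.
    if not hmac.compare_digest(supplied.encode(), secret.encode()):
        return False
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(net) for net in allowed_networks)
```

Requiring both conditions means a leaked secret alone isn't enough, and neither is a request that merely originates from an allowed address.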

Finally, I suggest setting up some kind of external monitoring for your webhook server, especially if you start to depend on it for important actions or notifications. I use an UptimeRobot check that ensures my testing endpoint above returns the expected "webhook is running" string, and alerts me if it doesn't.
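If you'd rather roll your own, the core of such a keyword check is small. Here is a minimal Python sketch of it. The function name and defaults are hypothetical. To be useful as monitoring, it should run from a different machine than the webhook server, on a schedule, and alert on a False result.

```python
import urllib.request


def webhook_is_healthy(url: str, expected: str = "webhook is running", timeout: float = 10.0) -> bool:
    """Keyword check: HTTP 200 and the expected text somewhere in the response body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and expected in body
    except OSError:  # covers URLError, HTTPError, timeouts and refused connections
        return False
```

For example, webhook_is_healthy("https://your-server.example/hooks/webhook-monitor") would exercise the whole path through nginx to the webhook server; the hostname there is a placeholder.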

If you don't want to go to the trouble of setting up and hosting your own webhook server, you might look at Hookdoo, which hosts the webhook endpoints for you and then still allows you to script custom actions that result when a webhook is called. I haven't used it myself but it's built by the same folks who released the above webhook server software.

How I Use It

So what webhook endpoints am I using?

My favorite and most frequently used is a webhook that is called by Bitbucket when I merge changes to the master branch of one of the code repositories powering various websites and utilities I maintain. When the webhook is called, it executes a script locally that essentially runs a git pull on that repo into the right place, and then pings a private Slack channel to let me know the deployment was successful. This is much faster than manually logging into my webserver to pull down changes or doing silly things like SFTPing files around. This covers a lot of the functionality of paid services like DeployBot or DeployHQ, though obviously those tools offer a lot more bells and whistles for the average use case.
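Here is a hypothetical stand-in for that deploy script, written as a minimal Python sketch. It assumes the repository is already cloned on the server and that Slack notifications go through an incoming webhook URL, which accepts a JSON body with a "text" field. The function names and the choice of --ff-only (refuse anything that isn't a fast-forward) are illustrative assumptions, not a description of the exact script I run.

```python
import json
import subprocess
import urllib.request


def notify_slack(webhook_url: str, text: str) -> None:
    """Post a message to a Slack incoming webhook: a push in the other direction."""
    data = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(webhook_url, data=data,
                                 headers={"Content-Type": "application/json"}, method="POST")
    urllib.request.urlopen(req, timeout=10).close()


def deploy(repo_path: str, slack_url: str, run=subprocess.run, notify=notify_slack) -> bool:
    """Pull the latest merged changes into repo_path, then report the outcome."""
    result = run(["git", "-C", repo_path, "pull", "--ff-only"],
                 capture_output=True, text=True)
    ok = result.returncode == 0
    detail = result.stdout.strip() or result.stderr.strip()
    notify(slack_url, f"Deploy of {repo_path} {'succeeded' if ok else 'FAILED'}: {detail}")
    return ok
```

The run and notify parameters are there so the logic can be exercised without touching a real repository or Slack channel.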

I use a version of this webhook app for SmartThings to send event information from my various connected devices and monitors at home to a database where I can run SQL queries on them. It's fun to ask your house what it's been up to in this way.
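As a rough idea of what "asking your house" can look like, here is a minimal Python sketch that stores incoming device events in SQLite and answers one question with SQL. The payload keys (device, attribute, value) and the table layout are hypothetical. The raw JSON is kept as well, so nothing from the real payload is lost.

```python
import json
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS device_events (
        received_at TEXT NOT NULL,
        device      TEXT NOT NULL,
        attribute   TEXT NOT NULL,
        value       TEXT,
        raw         TEXT NOT NULL)""")
    return db


def store_event(db: sqlite3.Connection, received_at: str, payload: dict) -> None:
    # Field names are assumptions about the payload; the raw JSON is kept verbatim.
    value = payload.get("value")
    db.execute(
        "INSERT INTO device_events VALUES (?, ?, ?, ?, ?)",
        (received_at, payload.get("device", "unknown"), payload.get("attribute", "unknown"),
         None if value is None else str(value), json.dumps(payload)),
    )
    db.commit()


def count_events(db: sqlite3.Connection, device: str, attribute: str, value: str) -> int:
    """E.g. 'how many times did the back door open?'"""
    (n,) = db.execute(
        "SELECT COUNT(*) FROM device_events WHERE device = ? AND attribute = ? AND value = ?",
        (device, attribute, value),
    ).fetchone()
    return n
```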

The connected vehicle monitoring adapter I use has a webhook option that sends events about my driving activity and car trips to my webhook server. I also store these in a database for later querying and reporting. I plan to extend this to create or modify items on my to-do list based on locations I've visited; e.g. if I go to the post office to pick up the mail, check off my reminder to go to the post office.
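The location-matching part of that plan could be as simple as a distance check between where a trip ended and a list of places tied to to-do items. Here is a minimal Python sketch using the haversine great-circle formula. The to-do texts, coordinates, 150 m radius and function names are all hypothetical, and actually checking items off would depend on your to-do app's API.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def todos_completed_by_trip(trip_end, places: dict, radius_m: float = 150) -> list:
    """Return to-do items whose associated place is within radius_m of where a trip ended."""
    lat, lon = trip_end
    return [todo for todo, (plat, plon) in places.items()
            if haversine_m(lat, lon, plat, plon) <= radius_m]
```

A radius of a hundred meters or so leaves room for GPS jitter and parking lots without matching the building down the street.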

I've asked the Fastmail folks to add in support for a generic webhook endpoint as a notification target in their custom mail processing rules; right now they only support IFTTT. There are lots of neat possibilities for "when I receive an email like this, do this other thing with my to-do list, calendar, smart home or website."

Speaking of IFTTT, they do have a pretty neat recipe component that will let you call webhooks based on other triggers they support. It's a very thoughtful addition to their lineup.

Conclusion

I'm not expecting many folks to go out and set up a webhook server; there's probably a pretty small subset of the population that will find this useful, let alone act on it. But I'm glad it's an option for someone to glue things together like this if they want to, and we need to make sure that software makers and service providers continue to incorporate support for webhooks and other technologies of an open, connected web as they go.

A year without Facebook

It's been about a year since I left Facebook, and I'm still glad I did. (I guess there were those thirty years before Facebook existed that I somehow managed without it, too.)

Some observations:

People in my circles generally continue to assume that I've seen their event invitations and life updates on Facebook, and so it's still a regular occurrence that I find out about something well after everyone else, or not at all. This is most annoying when it's something really time sensitive that I would have liked to have been a part of.

Some of my published writings have been shared extensively on Facebook, generating hundreds or even thousands of views on my various websites, but I don't have a way of knowing where that activity is coming from or what kind of conversation it might be generating there. I've had people tell me in person that they saw and liked something via Facebook, which is nice, but of course I wish they'd leave their likes and comments on my site, where they're closer to the original writing, visible to the world, and not subject to later deletion by some corporate entity. (This comes up for any social network, not just Facebook, but Facebook tends to generate the most traffic for me.)

I won't make a claim that the hours I've saved by not looking at Facebook have freed me up to accomplish some amazing other thing. I will say that I felt a nice release from the self-created pressure to keep up with my interactions and profile there, and that in turn has contributed to an increase in my overall creative energy for other things.

I had one time where I needed to use the Facebook sharing debugger for a work project. I signed up for a new account to do this, but Facebook clearly found my lack of interest in populating a real-looking profile to be suspicious, and closed down the account soon after. In the end it was faster to ask a colleague with an active account to do the debugging for me and share the results. As I've said before, I think it's ridiculous and irresponsible that Facebook doesn't make that tool available to logged-out users.

I'm still surprised at how many organizations and businesses use Facebook as their one and only place for posting content; some even do it in a way that I can't see as a logged-out user, and others don't seem to realize that they're giving Facebook 80% of the screen real estate on the links I can see. I am now much more likely to avoid doing business with or offering my support to these entities if they don't bother offering non-Facebook ways for me to engage.

I've accepted that people will not necessarily seek out the open version of the web on their own. Being off Facebook has reinforced that there are big gaps to close in the user experiences that other tools and services offer (the WordPress/blogging ecosystem not least among them). My own efforts to migrate my content that still exists on other services like Flickr into a digital home that I fully control are slow-going, so I don't expect other people to even bother. Facebook is still the path of least resistance for now.

When the actions of Cambridge Analytica were in the news, it was tempting to feel smug about not being an active Facebook user. But I know they still have tons of information about me that is of value to advertisers and others, and that even as I use browser plugins to try to prevent Facebook from accumulating an even larger profile of my online activity, it is a losing battle until there are larger shifts in the culture and business models of technology companies.

Scoring sites on their commitment to the open web?

A month ago in a tweet related to my post about bringing people back to the open web, I casually proposed a resource that would score tools, services and other websites on their commitment to being a part of the open web. I'm back to flesh that idea out a little more.

Crude mockup of a score badge

I'm imagining a simple site that displays a score or grade for each major user-facing tool or service on the web.

The score would help users of the site know at a glance what to expect from the service when it comes to the practices and mechanics of maintaining openness on the web. A badge with the score on it could be voluntarily displayed by the sites themselves, or the score could be incorporated into a browser extension and similar tools that give visibility to the information as users explore the web.

If a site has a high score, users could confidently invest time and energy in it knowing that they'd benefit from clear ownership of their data, easy interoperability with other tools, and no proprietary lock-in. If a site has a low score, users would know that they are entering a walled garden where their data and access to it is the product.

The score or grade would be based on some easily digestible criteria. In my initial proposal these would look at the robustness of the site's API offering, the availability of standard feed options, the usefulness of export tools, the focus on user empowerment, and the level of transparency about how the service works and makes use of user data:
