Slides and links from my php[world] talk

Yesterday I gave a talk at php[world] 2019 about "Tips and Tools for Gluing Together the Open Web." Below are the slides, text and links from my talk. Where applicable, the slide images link to relevant websites and code.

A few months ago, I was helping a local non-profit organization with their new website. We were almost set to launch, but they had one more request: can we get the next five upcoming events we've put up on our organization's Facebook page to show up in a list in the website's footer? I initially thought that surely there would be a simple way to do this, but as I looked into it, the short answer was no, no you can't.

There's an embeddable Facebook widget that can display events, but you don't get a lot of control over appearance and it's far from a helpful list that can be scanned quickly. This seemed like such a basic request - connecting Facebook page events to a website - and yet as I pored over Google and Stack Overflow search results all I could find were frustrated users and abandoned tools that had tried and failed to do this one thing.

Unlocking community event information from Facebook

It wasn't always this way.

It might be hard to remember, but there was a time when Facebook wasn't the only place to learn about upcoming events in our communities. Instead of having to scroll past silly memes and political rants to get the details of the next potluck, lecture or book club you cared about, the information was available in lots of other places. Events were available on websites and apps, and they were shared via constantly updated feeds that you could even integrate directly into your own personal calendaring system.

You could learn about what was happening down the street or across town, and you didn't have to give up your online privacy to do it.

At some point, this shifted. Weary of duplicating effort to maintain event information in multiple places, and of the growing number of sources you had to consult to figure out what was happening, people longed for a more centralized, authoritative spot to enter and learn about community events. I was one of those people.

Enter Facebook

Facebook saw this opportunity and jumped on it.

They made it easy to enter and promote events, and they sprinkled in social features to make it extra compelling to do. Not only could you learn about the event itself, you could see who among your friends was planning on (or interested in) going, and what they had to say about it. If you were an event organizer you could see what kinds of events were catching the interest of your target audience, and you could more easily avoid scheduling your big fundraising gala on the same night as the art museum open house.

At first, Facebook made access to event data one of the most friction-free parts of its platform, and they didn't require you to be "on Facebook" to see it. You could get email notifications, RSS feeds and ICS feeds to bring event information into your own productivity tools and daily life. There was API access so that you could build tools and websites that incorporated the event data where you wanted it to be.

Over time, all of that access was removed. The APIs were shut off, the feeds were shut down, and the message was clear: you have to come to the Facebook website or app to learn about events that you might want to attend.

It's nothing new to note that Facebook's user interface and business model are built around keeping people inside its walled garden. Whatever you might want to contribute to or get from your Facebook experience, they want you to do it on their website or in their mobile app, and on their terms. Their ability to sell advertising space depends on it.

But as with many other aspects of Facebook culture and its grip on our personal and community data, there's a significant downside. Facebook's decision to lock up event information has real implications for how we encounter and experience public life in our real-world communities.

Why is this a problem?

It means that, like our exposure to news or updates from our online friends, our awareness of community events is driven by a black-box algorithm optimized around profit over everything else. It means that one company's shifting views on what constitutes an acceptable event, and its sensitivity to the interests of paying advertisers and political organizations, could determine whether we see the details of an upcoming protest, demonstration or other exercise of free speech.

And it means that anyone concerned about the privacy implications of having their interest in a given event tracked, sold and monetized may have less exposure to the events that may have traditionally shaped and defined public life in their community, and private life in their circles of friends and neighbors.

There are practical implications too. An organization or business that still chooses to have its own website has to enter and maintain event information in multiple places, which is time-consuming and inefficient. People who want to avoid using Facebook are either left out of the loop or forced into using it again.

I went looking for solutions to this in the context of a website project I was working on recently, where the request was simple: can we bring our Facebook event data into our WordPress site? I had naively assumed that Facebook would have an interest in making this easy: if people could enter their events in one place and have them pushed out to everywhere that mattered, it would be so much easier to see them as the natural place to maintain that information.

But again, they don't make it easy. There's no API available to fetch event data, even for a Facebook page on which I'm an administrator. Event data is not displayed on public-facing Facebook.com pages in any kind of structured way, and in fact it is rendered in ways that make it resistant to traditional "scraping" tools. There are no other user- or developer-friendly tools for working with event data that I can find.

Yes, Facebook does have a "page widget" that lets you display event information from a Facebook page elsewhere via embedding, if you are an admin on that page. The layout and customization options are pretty limited, but more importantly this is not the same thing as having access to the event information itself for importing, displaying, searching, archiving or other actions that someone might want to take.

Asking page owners to initiate displaying the event information elsewhere also eliminates the chance for other interested parties to "slice and dice" event data in potentially useful or interesting ways. If I want to create a website where you can search and display all of the animal adoption-related events on Facebook happening within 50 miles of my zip code, I can't. I can't do this even if all of the animal adoption agencies enter their events on Facebook, make them public and put page widgets up on their own websites.

Can software fix this?

Being a software developer who works all day long on tools that try to make publishing easier and more interconnected, as I thought about this issue I still felt like there must be some way to extract event data from publicly shared Facebook events. After all, if you can see them in your web browser even while not logged in to Facebook, then that means the data is by definition publicly accessible in some form.

After analyzing the structure of a Facebook web page, its JavaScript and the asynchronous HTTP calls made to fully render the content on it, I found that there is a way. Facebook's own event display pages make a POST request to the URL https://www.facebook.com/api/graphql/ to retrieve relevant event details, which are then rendered in HTML and CSS for a normal user to see. I sniffed the request and removed all of the query parameters I could find that might be extraneous. Here's an example of a resulting logged-out POST request that retrieves a batch of the upcoming events for a given Facebook page (my local Parks and Recreation Department):

POST /api/graphql/ HTTP/1.1
Host: www.facebook.com
Content-Type: application/x-www-form-urlencoded
Origin: https://www.facebook.com
User-Agent: PostmanRuntime/7.15.2
Accept: */*
Cache-Control: no-cache
Accept-Encoding: gzip, deflate
Content-Length: 255
Connection: keep-alive

fb_api_req_friendly_name=PageEventsTabUpcomingEventsCardRendererQuery&variables=%7B%22pageID%22%3A%2250650939390%22%7D&__req=8&__user=0&av=0&__a=1&__be=1&dpr=2&fb_api_caller_class=RelayModern&__pc=PHASED%3ADEFAULT&__comet_req=false&doc_id=2343886202319301

I can't tell you what all of the query parameters mean, but I think doc_id is a Facebook-internal shorthand for which query to run (so apparently "2343886202319301" maps to "all upcoming events" or something like that), and pageID is the internal ID of the Facebook page itself. I can imagine this query breaking later on if Facebook decides to update the doc_id mappings, but for the time being, it works.
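
To make the shape of that request body concrete, here is a minimal sketch of how it could be assembled in PHP. The function name build_events_query_body is hypothetical; the parameter names and values are simply copied from the captured request above, and nothing here is documented by Facebook. The detail worth noticing is that "variables" is a small JSON document that then gets URL-encoded along with everything else.

```php
<?php
/**
 * Build the form-encoded body for the logged-out upcoming-events query.
 * Hypothetical helper: the parameters are copied from the captured request,
 * not taken from any documented API.
 */
function build_events_query_body( string $page_id, string $doc_id ): string {
    $params = array(
        'fb_api_req_friendly_name' => 'PageEventsTabUpcomingEventsCardRendererQuery',
        // "variables" is itself JSON; http_build_query() URL-encodes it below.
        'variables'                => json_encode( array( 'pageID' => $page_id ) ),
        '__req'                    => '8',
        '__user'                   => '0',
        'av'                       => '0',
        '__a'                      => '1',
        '__be'                     => '1',
        'dpr'                      => '2',
        'fb_api_caller_class'      => 'RelayModern',
        '__pc'                     => 'PHASED:DEFAULT',
        '__comet_req'              => 'false',
        // Which query to run, passed as a string to avoid integer overflow.
        'doc_id'                   => $doc_id,
    );

    // Explicit separator and RFC 1738 encoding, independent of php.ini settings.
    return http_build_query( $params, '', '&', PHP_QUERY_RFC1738 );
}

echo build_events_query_body( '50650939390', '2343886202319301' ) . PHP_EOL;
```

Passing the page ID and doc_id as strings keeps the page ID quoted inside the JSON, the same way it appears in the captured request.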

The result of the request is a JSON object that contains all the upcoming event info one would need to import Facebook events into another system, display them on a non-Facebook website, and so on.
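
As a rough illustration of working with that JSON, the sketch below flattens a response into simple rows. The key paths (data, page, upcoming_events, edges, node and the per-event fields) are the same ones the proof of concept script later in this post reads; the function name extract_events and the sample payload are made up for the example, and the real response contains many more fields than this.

```php
<?php
/**
 * Flatten a decoded events response into simple associative arrays.
 * Hypothetical helper; missing keys become null instead of raising notices.
 */
function extract_events( string $json ): array {
    $decoded = json_decode( $json, true );
    $edges   = $decoded['data']['page']['upcoming_events']['edges'] ?? array();
    if ( ! is_array( $edges ) ) {
        return array();
    }

    $events = array();
    foreach ( $edges as $edge ) {
        $node = $edge['node'] ?? null;
        // Skip anything that does not look like an event.
        if ( ! is_array( $node ) || empty( $node['id'] ) ) {
            continue;
        }
        $events[] = array(
            'id'      => (string) $node['id'],
            'name'    => $node['name'] ?? '',
            'start'   => $node['time_range']['start'] ?? null,
            'place'   => $node['event_place']['contextual_name'] ?? null,
            'city'    => $node['event_place']['city']['contextual_name'] ?? null,
            'tickets' => $node['event_buy_ticket_url'] ?? null,
        );
    }

    return $events;
}

// Made-up sample data, shaped like the fields described above.
$sample_json = json_encode(
    array(
        'data' => array(
            'page' => array(
                'upcoming_events' => array(
                    'edges' => array(
                        array(
                            'node' => array(
                                'id'          => '1',
                                'name'        => 'Sample Trail Walk',
                                'time_range'  => array( 'start' => '2019-11-02T10:00:00-04:00' ),
                                'event_place' => array( 'contextual_name' => 'Sample Park' ),
                            ),
                        ),
                        // An edge without an ID is skipped.
                        array( 'node' => array( 'name' => 'Incomplete event' ) ),
                    ),
                ),
            ),
        ),
    )
);

foreach ( extract_events( $sample_json ) as $event ) {
    echo $event['name'] . ' at ' . $event['place'] . PHP_EOL;
}
```

Treating every field as optional matters here because the response format is undocumented and could change without notice.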

From this point, I could imagine creating a tool that, given a list of Facebook pages, automatically and regularly grabs all of the upcoming public events available in Facebook and does something useful with them. Here's a proof of concept PHP script that lays the groundwork for that.


<?php
/**
 * Proof of concept, retrieve publicly-available Facebook page event data.
 * Chris Hardie <chris@chrishardie.com>
 *
 * To use, first add Guzzle as a dependency:
 * $ composer require guzzlehttp/guzzle
 */

require 'vendor/autoload.php';

use GuzzleHttp\Client;

// Stored as a string so the 16-digit ID is never converted to a float.
define( 'FB_UPCOMING_EVENT_DOC_ID', '2343886202319301' );

// Pages to query, keyed by Facebook page ID.
$facebook_pages = array(
    '50650939390' => array(
        'page_name' => 'Richmond Parks Department',
        'page_url'  => 'https://www.facebook.com/richmondparks/',
        'category'  => 'outdoor',
    ),
);

// Parameters copied from the captured browser request.
$facebook_base_params = array(
    'fb_api_req_friendly_name' => 'PageEventsTabUpcomingEventsCardRendererQuery',
    '__req'                    => '8',
    '__user'                   => '0',
    'av'                       => '0',
    '__a'                      => '1',
    '__be'                     => '1',
    'dpr'                      => '2',
    'fb_api_caller_class'      => 'RelayModern',
    '__pc'                     => 'PHASED:DEFAULT',
    '__comet_req'              => 'false',
);

$client = new Client(
    [
        // Base URI is used with relative requests
        'base_uri'    => 'https://www.facebook.com',
        // You can set any number of default request options.
        'timeout'     => 2.0,
        // Return 4xx/5xx responses instead of throwing, so the status check below is reached.
        'http_errors' => false,
    ]
);

$headers = [
    'User-Agent'   => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
    'Content-Type' => 'application/x-www-form-urlencoded',
    'Origin'       => 'https://www.facebook.com',
];

foreach ( $facebook_pages as $facebook_page_id => $facebook_page ) {
    // Reset per page so a failed request never reuses the previous page's events.
    $fb_events = array();

    // Numeric array keys become integers in PHP; cast back so pageID is sent as a JSON string.
    $facebook_variable_value = json_encode( array( 'pageID' => (string) $facebook_page_id ) );

    $facebook_form_params = array_merge(
        $facebook_base_params,
        array(
            'variables' => $facebook_variable_value,
            'doc_id'    => FB_UPCOMING_EVENT_DOC_ID,
        )
    );

    $upcoming_event_response = $client->request(
        'POST',
        '/api/graphql/',
        [
            'headers'     => $headers,
            'form_params' => $facebook_form_params,
        ]
    );

    if ( 200 === $upcoming_event_response->getStatusCode() ) {
        // The body is a stream, so cast it to a string before decoding.
        $fb_response = json_decode( (string) $upcoming_event_response->getBody(), true );
        $fb_events   = $fb_response['data']['page']['upcoming_events']['edges'] ?? array();
    }

    foreach ( $fb_events as $fb_event ) {
        $event = $fb_event['node'];

        // Optional fields fall back to an empty string when absent.
        echo $event['id'] . PHP_EOL;
        echo $event['name'] . PHP_EOL;
        echo ( $event['time_range']['start'] ?? '' ) . PHP_EOL;
        echo ( $event['event_place']['contextual_name'] ?? '' ) . PHP_EOL;
        echo ( $event['event_place']['city']['contextual_name'] ?? '' ) . PHP_EOL;
        echo ( $event['event_buy_ticket_url'] ?? '' ) . PHP_EOL;
        echo PHP_EOL;
    }
}

exit;
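
As one example of "something useful," the retrieved events could be turned back into the kind of ICS feed that calendar apps can subscribe to. The sketch below is an illustration only: the function names, the PRODID and UID values and the sample event are all made up, and it assumes each event's start time is a string that PHP's DateTimeImmutable can parse, which I have not verified against every Facebook response. It also skips iCalendar line folding for long lines.

```php
<?php
/**
 * Escape a value for an iCalendar TEXT property (RFC 5545):
 * backslashes first, then semicolons, commas and newlines.
 */
function ics_escape( string $text ): string {
    $text = str_replace( '\\', '\\\\', $text );
    $text = str_replace( array( ';', ',' ), array( '\\;', '\\,' ), $text );
    return str_replace( array( "\r\n", "\n" ), '\\n', $text );
}

/**
 * Render event rows (id, name, start, optional place) as an ICS calendar.
 * Hypothetical helper; events without a start time are skipped.
 */
function events_to_ics( array $events, DateTimeImmutable $now ): string {
    $utc    = new DateTimeZone( 'UTC' );
    $format = 'Ymd\THis\Z';
    $lines  = array(
        'BEGIN:VCALENDAR',
        'VERSION:2.0',
        'PRODID:-//example//facebook-events-poc//EN',
    );

    foreach ( $events as $event ) {
        if ( empty( $event['start'] ) ) {
            continue;
        }
        // Assumption: start is a parseable date string with a timezone offset.
        $start = new DateTimeImmutable( $event['start'] );

        $lines[] = 'BEGIN:VEVENT';
        $lines[] = 'UID:fb-event-' . $event['id'] . '@example.invalid';
        $lines[] = 'DTSTAMP:' . $now->setTimezone( $utc )->format( $format );
        $lines[] = 'DTSTART:' . $start->setTimezone( $utc )->format( $format );
        $lines[] = 'SUMMARY:' . ics_escape( $event['name'] );
        if ( ! empty( $event['place'] ) ) {
            $lines[] = 'LOCATION:' . ics_escape( $event['place'] );
        }
        $lines[] = 'END:VEVENT';
    }

    $lines[] = 'END:VCALENDAR';

    // iCalendar lines are terminated with CRLF.
    return implode( "\r\n", $lines ) . "\r\n";
}

// Made-up sample event for illustration.
$sample_events = array(
    array(
        'id'    => '1',
        'name'  => 'Walk; talk, repeat',
        'start' => '2019-11-02T10:00:00-04:00',
        'place' => 'Sample Park',
    ),
);

echo events_to_ics( $sample_events, new DateTimeImmutable( 'now' ) );
```

Converting every time to UTC keeps the feed simple, at the cost of dropping the original timezone information.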

Am I going to get in trouble for this?

Does Facebook care if we access event data in this way? Yes, yes they do. Their terms of service on Automated Data Collection explicitly state that if you try to extract data from the Facebook platform in an automated way, they can ban you forever. More likely, though, they will change the way their platform works to make this kind of data retrieval even more difficult.

I think there's hope for some shifting expectations about what's considered fair here. A U.S. appeals court ruled just last month that web scraping does not constitute illegal activity, and went so far as to acknowledge that if a scraper is retrieving publicly available data owned not by the platform but by its users, the scraping can't be blocked.

In the case of Facebook event data, it's worth noting again that we're talking about publicly available information shared by organizations that want it to get more exposure online about events that they want the public to attend. Restricting access to that because it's a potential source of advertising revenue seems beyond greedy to me. In the end though, it's up to organizations, businesses and individuals to decide whether they want to have their event data locked up, or out on the open web.

You can help

If you are an event organizer, please consider posting your event data outside of Facebook on a publicly available website!

If you are someone who cares about building a healthy culture of civic engagement in your community, advocate for the organizations you're involved with to move away from tools that make this harder in the long term!

If you are a developer at Facebook or anywhere else that builds tools for people to share information, please don't lock up that information! Show your commitment to the open web. Provide APIs, RSS (or in this case, ICS) feeds and good documentation, and facilitate easy exports of user-owned data. (A quick shout-out to the folks at Eventbrite who, at least for now, make available for free a very robust API to access community events shared on their platform.)

I'm glad for any feedback and suggestions about these challenges; please comment below. I've also explored the themes discussed here in various past posts, including: