Under the Hood: Approaches to Algorithmic Transparency | IGF 2023
Disclaimer: It should be noted that the reporting, analysis and chatbot answers are generated automatically by DiploGPT from the official UN transcripts and, in case of just-in-time reporting, the audiovisual recordings on UN Web TV. The accuracy and completeness of the resources and results can therefore not be guaranteed.
Full session report
Zoe Darme
The panel provided an in-depth examination of Google’s inner workings, the mechanics of its search algorithms, and the subtleties of the user experience. A noteworthy aspect was the insightful comparison of the search operation to a vending machine. This analogy aptly described each stage: ‘crawling’ – identifying the various ‘drink’ options, or webpages; ‘indexing’ – organising those options; and ‘serving’ – retrieving the option requested.
This commentary then emphasised the importance of ‘algorithmic transparency’. It highlighted the necessity for visibility and understanding of how algorithms operate, the inherent data bias, and how results are generated and output, thereby indicating a push for increased openness in these processes.
The discussion delved into the subject of search personalisation. A personalised search, influenced by past usage and habits, was contrasted with a generic location-based result, which is not adjusted to an individual’s tastes. This led to intriguing questions about Google’s transparency, given that its personalisation feature does not clarify why particular results are prioritised. Despite these concerns, Google’s Zoe Darme suggested that personalisation, when based on a user’s historic activity and personal preferences, could significantly enhance search result quality.
The panel also highlighted ‘Search Quality Raters’, and the revelation that Google applies hundreds of algorithms to evaluate webpage quality. Worries were voiced about the deterioration of web content quality due to a trend dubbed ‘SEOification’: a considerable shift towards manipulating search engine algorithms, often at the expense of content authenticity and originality.
A notable observation was the apparent movement of the internet’s content ecosystem from open-web platforms to closed environments – referred to as ‘walled gardens’. This trend seems to have instigated a decrease in open web content creation, leading to an interesting proposition – potentially incentivising content creation on the open web to preserve a diversely vibrant internet ecosystem.
Considerable attention was devoted to Google’s digital advertising practices. While it’s clear that there is a limit on the number of ads displayed at the top of Google search results, this limit isn’t explicitly defined. Commercial searches, such as those related to shopping or making bookings, were observed to have a larger volume of ads.
Finally, the utility and limitations of incognito mode were analysed, clarifying several misunderstandings. Whilst Google remains aware of the location and time of searches conducted in incognito mode, it does not access the user’s search history in that mode. Users also retain the ability to manage their personalisation settings independently of incognito mode. This emphasises the nuanced control Google users have over personalisation and privacy.
Farzaneh Badii
The dialogue under examination centres on the multi-faceted involvement of algorithms in internet governance, with particular emphasis on the operational management of search engines such as Google and their levels of accountability. Crucially, the discussion highlights the wide array of algorithms deployed during each individualised search query, underscoring the extensive and complex nature of their application.
This segues into a robust call for enhanced transparency surrounding the utilisation of these algorithms. The importance of this becomes apparent when contemplating the societal and regulatory drive to hold corporations like Google to a heightened level of accountability. It is not merely about unveiling the concealed layers behind each query, but also about comprehending the ramifications of algorithmic operations in shaping public communication.
Moreover, the dialogue underscores a need for discussion of a more granular nature. Essentially, this means delving deeper into the specifics of how algorithms function and are employed, rather than a superficial overview, in order to promote fairness, justice, and innovation within the digital sector.
Interestingly, the push towards transparency is construed as potentially a covert demand for data access. Therefore, clarifying what form ‘transparency’ takes, and what the end goal of this transparency is, becomes a critical point of discussion.
There is also an articulated need to solicit more feedback on the usefulness of the explained processes serving the industry and the public, raising several pertinent questions. For instance, how can this illustrative case study be utilised most effectively? What can be learnt from it? What additional information or tools are requisite? These open-ended inquiries underline the constant need for innovation and improvement in the internet infrastructure.
Despite delving into complex issues, the dialogue is deemed a beneficial exercise, proving advantageous in sparking conversations around accountability and transparency in the digital arena.
In relation to broader global implications, the conversation aligns with the ethos of several of the United Nations’ Sustainable Development Goals (SDGs), notably SDG 9 which underscores the importance of Industry, Innovation and Infrastructure, and SDG 16 that advocates for promoting Peace, Justice and Strong Institutions. Both these goals, vast in their individual mandates, intersect at the necessity for transparent and accountable technological entities.
To sum up, the dialogue illuminates the pivotal role of algorithms in internet search queries, urges for heightened transparency concerning their operations, necessitates detailed, granular dialogues, and calls for more feedback on the efficacy of the explained processes. Above all, despite the nuanced topics, the discourse is regarded as an invaluable dialogue, contributing towards the realisation of key Sustainable Development Goals.
Audience
The discussions underscored ongoing concerns about the quality of web content, highly influenced by advertising strategies and manoeuvres aimed at exploiting Google’s search algorithm. These actions have reportedly led to misleading search results and a noticeable degradation in the quality of content available to users. Alongside this, there are prevalent business trends pushing for content creation within so-called ‘walled gardens’ – private platforms controlling access to content. This trend has incited apprehension about the sustainability of an open web environment, raising questions about the ethical stewardship of the information ecosystem.
In-depth dialogue ensued on personalisation in search results, elucidating the difference between personalisation and customisation. Personalisation is a distinctive feature based on an individual user’s past searches, and the examples given highlighted how this leads to varied search results for individuals with different interests. However, Google needs to clarify how it communicates this personalisation process to its users. The delicate equilibrium between personalised and non-personalised search results influences user satisfaction and affects the overall quality of content.
Google’s authority over the quantity and positioning of sponsored adverts appearing ahead of the actual search results was analysed. Suspicions that Google may favour commercial queries were sparked by an article by Charlie Warzel, emphasising the need for greater transparency in this area. While the limit on advert placement is not publicly defined, adverts often appear in response to queries where users demonstrate an intention to make a purchase or booking.
The discussion evolved to demonstrate how users could gauge Google’s search personalisation by comparing outcomes in incognito mode versus normal browsing mode. While incognito mode restricts Google’s access to a user’s search history, it still captures details such as the location and time of the search. Interestingly, Google assures user control over personalisation settings, accessible with a simple click.
A significant portion of the conversation focused on transparency in handling search queries and algorithms. Misconceptions about Google manipulating search queries were dispelled. Google’s publication of an extensive 160-page document of search quality rater guidelines was praised as a commendable move towards fostering transparency. However, demands emerged for verifiable evidence, accountability and third-party audits of Google’s narratives.
The potential efficiency of the Digital Services Act with its proposed audit mechanisms was seen as a forward stride to enhance transparency. However, doubts over the reliability of third-party assessments remain, along with issues related to the apt interpretation and utilisation of transparency information. A recurring sentiment was that transparency can only be realised through adequate funding and resources.
The recommendation to create a centralised Transparency Initiatives Portal for efficient access to all disclosures was regarded as a practical solution. This move would arguably benefit all parties involved in the comprehension and verification of data related to transparency. In sum, these discussions reflect the need for increased vigilance, clarity and public involvement in the control and management of online content, putting an emphasis on data privacy, fair business practices, transparency and user satisfaction.
Charles Bradley
Charles Bradley, renowned for his insightful commentary on diverse digital technologies, provides his perspectives on a number of significant issues. On the topic of personalisation in internet search, Bradley proposes an inclusive view, defining it as a system’s capability to deliver results tailored to a user’s pre-existing knowledge base. This approach implies that personalisation goes beyond simply catering to preferences, and instead appreciates the user’s comprehension of a specific subject.
Moreover, Bradley underlines the importance of code audits, suggesting these security checks should ideally be performed by trusted third parties. The objective is to nurture stronger trust between technology companies and journalists, a relationship often strained due to contentious issues surrounding data privacy and source protection. However, Bradley acknowledges the challenge in this area due to the sparse pool of qualified personnel capable of conducting such intricate audits.
Remaining on the theme of accountability, Bradley emphasises the significance of external checks and measures for maintaining system accountability. Solely relying on self-assurances from tech giants, as exemplified by companies like Google, regularly falls short of providing adequate assurance or satisfaction to users. Here, Bradley questioned whether the Digital Services Act (DSA) could effectively accommodate the implementation of these external audits, displaying a cautious and investigative stance on the proposed legislation.
Additionally, Bradley exhibits a keen interest in integrating audience feedback into the information sphere about company activities. Audience feedback can proffer valuable insights for companies aiming to ascertain public sentiment or identify areas for improvement. Acknowledging the challenges of striking the appropriate balance in terms of information dissemination, Bradley underscores the necessity for transparency for industry stakeholders, government entities, and advocates. The struggle resides in soliciting information that companies may have previously been reticent to share, and ensuring that the initial company impressions coincide with stakeholder needs.
In conclusion, even though most of Bradley’s sentiments were neutral, his call for audience feedback was perceived as a positive endeavour towards enhancing transparency and improving stakeholder communication. This comprehensive analysis embodies Bradley’s profound understanding of the digital landscape, accentuating the intricacies of personalisation, the need for informed security measures, and the challenges in achieving transparency in an ever-evolving digital environment.
Session transcript
Zoe Darme:
I didn’t realize it was going to be an art class. A baseline, then, to have a deeper conversation about algorithmic transparency. We have another group. Okay. So what I see here, from what algorithmic transparency means to people, are a bunch of different responses. So making visible the way that algorithms work; knowing why a result is showing up when I provide an input. To me, algorithmic transparency means openly knowing the quality and biases of datasets that feed the algorithm; a way to understand why and how your experience on a service is influenced, displayed, organized or shaped. To me, algorithmic transparency means being able to understand how the results are output from the algorithm. So lots of different things. Very hard to display on a screen about this big. So how do we actually do algorithmic transparency? On the other side here, we’ve got some amazing grade A art that I’m going to make NFTs out of, become very rich, quit Google. But I think Charles is going to try to show what some of you art majors drew. There are a lot of squiggly lines, but my particular favorite one here, I’ll just describe it to you, says user, arrow, query, arrow, magic with asterisks, arrow, and then results. And we do talk a lot about Google magic, but it shouldn’t be necessarily so mystifying that it feels like magic all the time. So hopefully maybe you guys can see some of this. I encourage you to come up afterwards to take a look at all of your contributions. So this is how search actually works. We drew it kind of like a vending machine. And so here, if you can think of you in front of the search results page, that’s like the vending machine. And then that is what we call front end, what the user sees and experiences. And then back behind, this is the Google magic. You’ve got natural language processing and query understanding. You’ve got crawling because you’ve got to find all those webpages. You’ve got data retrieval.
Once you’ve crawled, how do you go get something from that index? And then you have all of the webpages, which here we are pretending to be soda cans. So let’s build together Gurgle, the search engine for the thirsty. And so we are going to use this lovely vending machine to really quickly go over how search actually works. So the first step when you’re building Gurgle is you have to find as many drinks as possible. And so to do that, you have to crawl. And so how would you find as many drink options as possible in the world? So returning to our trusty vending machine, this is where we are at the process. We used to have this as a spider web. And my very, very senior software engineer said, that’s gross, Zoe. You’re going to gross people out that they’re going to think spiders are in their soda. So now it’s a shopping cart. And if I were to ask you to find as many drink options as humanly possible, you might come up with 10, 20, 15. But in one day, we at Google are able to crawl millions and millions of pages. So group question. Anybody can shout out. No drawing involved. Just raise your hand. If you were going to start crawling and building your search engine, where would you start? How would you start crawling? Great. Past searches. Anybody else? Where would you start crawling the web? DNS. Okay. Anybody else? At the juice bar. Yeah.
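The crawl step described here, start anywhere and follow links outward, is essentially a breadth-first graph traversal. A minimal sketch in Python, using an invented toy link graph in place of the real web (all page names are made up for illustration):

```python
from collections import deque

# A toy link graph standing in for the web; every page name is invented.
LINKS = {
    "juicebar.example": ["sodapedia.example", "fizzreviews.example"],
    "sodapedia.example": ["fizzreviews.example", "seltzerblog.example"],
    "fizzreviews.example": ["juicebar.example"],
    "seltzerblog.example": [],
}

def crawl(seed, max_pages=100):
    """Breadth-first crawl: start anywhere, follow links, record what we find."""
    seen, queue, found = {seed}, deque([seed]), []
    while queue and len(found) < max_pages:
        page = queue.popleft()
        found.append(page)
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return found

# Starting from any well-connected page reaches most of the graph.
print(crawl("juicebar.example"))
```

This is why the question is a trick one: once pages are a few links away from one another, the choice of starting point matters far less than one might expect.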
Audience:
This is kind of content specific. So I’m probably trying to feed it something that was content specific. So maybe a magazine or something that was specifically about beverages. Because that’s what we’re looking at. Or maybe blogs or something about the topic.
Zoe Darme:
So those are all great answers. They’re all correct answers. And this was also sort of a trick question. Because you can start anywhere. With the web, once you get to a few links away from any other page, you’re basically able to get to other pages. And eventually you’re able to crawl and index a large proportion of the World Wide Web. All right. So next we are going to swiftly go through crawling and talk about indexing. So once we have all of our drink options, what do we do next? You have to organize all of them. Next one. Great. So now we have all of our cans. And we’re here in the data back ends. This is the inside of our vending machine. And it’s rows and columns of drinks organized perfectly. But they’re not organized when we get them, right? Because if we’re having a Pepsi and then the next person says, I like Coke. And then the next person says, oh, soda is bad for you. I only drink seltzer water from La Croix. Then eventually you’re going to get a collection of drinks, but they’re not going to be well organized. So indexing is what actually organizes the content that we found. And so I’ll skip over a lot of this because I think oh, actually. Sorry. So before we go on to serving, what do you think it’s important to put in your index for each drink? What would you want to know about the drink? Color? Nutritional value? Price? Is it artificial or is it organic? The result is not sponsored, Jim. It is organic. Anything else that you would want to know? Is it alcoholic? That’s a good one. Past recalls. Do I really want to drink this poison drink? Yeah. Great. So in this analogy, this is kind of what we’re doing with pages. We want to know what is the page about. We read the metadata, find out what the title of the page is. We kind of group it. Are these pages all about soda or are all these pages all about seltzer water? Or are all of these pages about alcoholic beverages of the type that hopefully we’re all going to enjoy later? Great. Great. Great. Amazing. 
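The indexing step described above, annotating each crawled item and filing it into organised rows and columns, can be sketched as building a simple lookup keyed on an attribute. All names and data below are invented for illustration:

```python
# Toy index entries: each "drink" (page) annotated with the attributes the
# audience suggested -- category, alcoholic or not, price. All data invented.
DRINKS = [
    {"name": "Fizzy Cola", "category": "soda", "alcoholic": False, "price": 1.50},
    {"name": "Plain Seltzer", "category": "seltzer", "alcoholic": False, "price": 1.00},
    {"name": "Pale Ale", "category": "beer", "alcoholic": True, "price": 4.00},
]

def build_index(items, key):
    """Group items by one attribute: the 'rows and columns' of the vending machine."""
    index = {}
    for item in items:
        index.setdefault(item[key], []).append(item["name"])
    return index

by_category = build_index(DRINKS, "category")
print(by_category["soda"])  # which drinks are filed under "soda"
```

The same structure can be keyed on any annotation (alcoholic, price band, recall history), which is what makes serving fast: the organising work is done before anyone asks for a drink.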
Search engineer here. So we have how many reviews, how many people have been talking about this particular drink? Similarly, how many people are linking to it? That’s a signal that we can use, although that’s also a signal that can be gamed pretty easily through fake links. All right. So then let’s go on to serving. All right. So we’ve made it now to the part of the vending machine that provides the drink we asked for. So now that we have all of our drinks indexed, we can serve the user the results that they’d like based on their query. But the computer inside the machine is registering your input and helping you find what you want. But in a search engine, this looks a bit different. While a vending machine is responding to an alphanumeric code that matches exactly one drink, you want a Mountain Dew, I don’t know why. You want a Mountain Dew, you press the Mountain Dew button. But for a search engine, you have to take your query and match it with annotations of indexed pages. So is it about a drink that’s 24 ounces? Is it alcoholic or nonalcoholic? All those things that we mentioned before. And then the claw here is doing the data retrieval. It’s getting our drink and depositing it in the tray. So let’s take a specific query here. So this query is soda. And without any additional context, when someone puts in the word soda, what do you think that they would want? Kate wants a fizzy drink. What else? Okay. Nick is baking some delicious cake and wants baking soda. Anything else? Okay. Most people aren’t even looking for soda. They would call it pop or soft drink. Yeah. Yeah. So it really depends on where you are. So that is why we do use coarse location, for example, to try to do that query understanding. So, for example, my favorite example of this is Super Trouper. Am I going to age myself? Please tell me somebody knows. Anybody? Super Trouper? Super Trouper? Okay. It’s an ABBA song. So most people searching for Super Trouper are looking for ABBA.
But ABBA actually named their song after a type of light. So if you are a Broadway lighting designer, you may actually be looking for a specific type of stage light. And so synonyms are one of the things that are really hard for us and actually started before all of this AI, before Gen AI, before all of this, we really were starting with synonyms. And so if you have one person looking for pop and one person looking for baking soda, what are the type of signals that we should be using? You kind of got there already. How do you know if somebody wants a drink or baking soda? Location? Previous search. Somebody wants personalization. Time of day, also a good one. Yep. Habits. So what I’m hearing is that actually some personalization would be really useful in some cases to understand the difference between what somebody is looking for. But we actually don’t personalize a ton on the search results page at the moment. The amount of personalization changes over time. So what we try to do is actually use advancements in language understanding and query understanding. So the signals are the bits of information that help us understand a page’s relevance to a query. And this can be about how recent a page is. So for a query like soda, that may not be as important. Another signal that we may use is the location of the word. Is the word soda the title of the page? That might make us think it’s more about soda. If the word soda is down there in the penultimate sentence of the third paragraph, it may not actually be as relevant. So those are the types of signals that we use. And they help us get as close to what we call user intent as possible. But we don’t actually have a set weighting system for queries. And I think this is what’s really hard when people say, I want algorithmic transparency from you, Google. You need to tell me the exact, you know, A, B, C, D, and E for how you’re returning results. Because it’s dynamic and it changes over time.
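The signals described here, whether the query term appears in the title, how deep in the body it occurs, how fresh the page is, can be combined into a relevance score. The sketch below uses invented pages and invented weights; as the discussion stresses, Google does not use a fixed weighting system, so this is purely illustrative:

```python
def score(page, query, weights):
    """Combine a few toy relevance signals; the weights are invented, not Google's."""
    s = 0.0
    if query in page["title"].lower():
        s += weights["title"]  # query term appears in the title
    body = page["body"].lower()
    if query in body:
        # earlier mentions count for more than the penultimate sentence
        position = body.index(query) / max(len(body), 1)
        s += weights["body"] * (1.0 - position)
    s += weights["freshness"] * page["freshness"]  # 0.0 = stale, 1.0 = just published
    return s

# Two invented pages competing for the ambiguous query "soda".
pages = [
    {"title": "All About Soda", "body": "soda history and brands", "freshness": 0.2},
    {"title": "Baking Tips", "body": "cakes rise thanks to baking soda", "freshness": 0.9},
]
# A "stable" weighting favours title matches; a newsy one would boost freshness.
stable = {"title": 2.0, "body": 1.0, "freshness": 0.5}
ranked = sorted(pages, key=lambda p: score(p, "soda", stable), reverse=True)
print(ranked[0]["title"])
```

Changing the weights reorders the results, which is one concrete way to see why a single published weighting could never describe a system that is tuned dynamically over time.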
So can anyone give me an example of where something like freshness might be important? A query where you might want something more recent. Game scores. Great example. If you are searching for… I’m trying to think of soccer teams to draw a connection with you all, but it’s eluding me. Arsenal versus Chelsea. Oh, thank God. Arsenal versus Chelsea. You want the game that was played last night and not the game that was played last week. What about you, Jim? You said time of day. What’s a type of query where time of day might be more important? Weather. Right. What’s the temperature going to be? Because if you’re searching in the morning, you might want to know earlier. Elections. For elections, you’d want something fresher, right? The elections that are happening sooner. Yeah, these are all great examples. Or the actual results. Great. So for newsy types of queries, freshness is more important. For other types of more stable queries, like hair braiding, I tried to… I failed, but I tried to braid my hair today. If you look that up, you may not need the latest hairstyle. And so freshness might not be as important. All right. So you guys gave us all sorts of great signals that we could use. And if we used only those, one thing that we would miss is spam. So the reason that these calls for algorithmic transparency get complicated really fast is because we are constantly in an adversarial space with about 40 billion pieces of spam a day. Next slide, please. Great. So I am almost done. So despite everything, search is a total work in progress. We are always launching updates, sometimes huge ones to core ranking, sometimes small tune-ups, to make sure that we’re giving the right soda at the right time. All designed to make search work better so that we can find the most relevant, highest quality results possible. And the last thing I want to mention is page quality. Because that’s one thing we haven’t really talked about.
And that’s actually one thing that’s very important in discussions about algorithmic transparency. How are you actually rating what the quality of the page is to determine how the highest quality results can float to the top? It is not magic. It’s actually through a combination of hundreds of algorithms, at least, plus on top of that our search quality rater program. So we do use humans not to rate individual pages, but to take a sample and to say, your systems are working as intended here. They’re not working as intended there. And so our page quality is summed up by the acronym EEAT, experience, expertise, authoritativeness, and trustworthiness. And so I think a lot of people want algorithmic transparency to be like, step into this closet, show me the code, I will know how it works. But actually, we’ve been really transparent about this and put it in a 160-page document on how we understand page quality. How many of you have read the search quality rater guidelines? Fars gets a free drink at the bar. Kate gets a free drink at the bar. That is what we consider to be a huge effort towards algorithmic transparency because it tells you exactly what our biases are. Our biases are for high page quality and experience, expertise, authoritativeness, and trustworthiness. But people are busy. They don’t have time, unlike Fars, for whom it was a research project, it was her job, to read 160 pages. But a lot of times, based on what we saw you guys put up for what you want algorithmic transparency to be, it really is about reading 160 pages about what a company says they are biased towards in terms of quality. But I think what people really want is something simpler, something easier, like the about this result feature that we showed at the beginning. So thank you very much for indulging me. Thank you for your art. That was amazing. Really, really good. And I’m going to throw it back to Fars and Nick so that we can have a discussion with the rest of our time. Thanks.
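The rater programme described here does not rate individual pages; raters check a sample of results to confirm the systems are working as intended. A toy sketch of that aggregation step, with an invented threshold and invented sample data:

```python
def rater_verdict(sample_ratings, threshold=0.7):
    """Aggregate rater judgements over a sample of result sets.

    Each rating is True if raters found that set of results 'working as
    intended'. The 0.7 threshold is invented for illustration; the real
    programme's evaluation criteria are not public in this form.
    """
    share_ok = sum(sample_ratings) / len(sample_ratings)
    return "working as intended" if share_ok >= threshold else "needs tuning"

# Invented sample: raters reviewed 10 result sets after a ranking change.
print(rater_verdict([True] * 8 + [False] * 2))
```

The point of the sketch is the shape of the process: human judgement enters as a sampled quality check on the system, not as a per-page score fed into ranking.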
Audience:
So when you’re looking at page quality for rankings, and this is horizontally across time.
If you look at the kind of, let’s say, advertising that is coming in and a lot of the search links becoming oriented towards gaming the Google algorithm, as you rightly said, and the quality of content on the web itself degenerates, how do you handle that? I mean, your quality of results will drop because the quality of content on the web is gamed towards selling and not towards the kind of internet you saw in the early 2010s. I mean, it’s a general user’s perspective, but happy to hear your thoughts on it.
Zoe Darme:
I mean, that’s the heart of it. How do you, because it is super adversarial, and if people know, OK, the moment people realized it was about how many links are linking to your page, then everybody was just linking to their own pages, right? And now I think what you’re talking about is what’s often called the SEOification of web results. I think the other thing here is a shift in the information ecosystem, where a lot of content creation is happening more in hosted content platforms or walled gardens. And so if people are incentivized to interact either in a platform, a closed platform, or a chat group, or whatever, they’re not creating on the open web. And so I think there is a larger ecosystem question to be had about, do we want to incentivize? And also, how do we incentivize content creation? In the golden age of the internet, when I grew up in the 90s, I had an X-Files page, maybe more than one X-Files page, hosted on Lycos and Angelfire. So yeah, great question. Yeah.
Audience:
The words personalization and customization, do you use it interchangeably? For me, if I give an example, I’m not a soccer person or a cricket person, whereas my husband is that. So if I search about football or cricket, some things will come up, which is more preliminary things about cricket, like this is cricket, this is football. But if my husband searches, it will be more of an advanced source of search results. So do you call this personalization or customization?
Charles Bradley:
I mean, Zoe might answer that, but I would think of that as personalization. If the system knows enough about you to know that you don’t know much about cricket, then that’s a personalized result, and the changes are personalized.
Zoe Darme:
Yeah, the way we think about personalization is, is it something unique to you? Is it your search history? Is it because something about your past searches seems to imply that you’re a cricket person over a soccer person? That, to us, is personalization. That’s why we say coarse location doesn’t count: if I’m searching in Queens for best pizza near me, everybody else is going to get the same result. There’s not something about me that Google knows, that I really like Soto Listele over Domino’s. Yeah. Do we have microphones online?
Audience:
Hi, John from the AI Foundation for the Record. So we were talking about potentially personalized results. We didn’t see any example on the search that we did before, but if it was personalized, would the interface tell you how did they get to that conclusion? What was the process that it went through?
Zoe Darme:
So for right now, this particular feature doesn’t say we know X, Y, and Z about you, and that’s why it says it’s personalized or not. A good example of where a search might be personalized is for more like a feature like what to watch, what to read, what to eat. And so if you’re searching for a lot of recipes for cassoulet, maybe after a while, and you have personalization on, that’s the type of search where we might give you Julia Child over, say, oh, who’s that? Am I on the record? I won’t say anything bad. Rachael Ray, for example. So you might want a higher quality cassoulet recipe if you’re a true French chef connoisseur. I’m sure Rachael Ray’s recipes are great. Yeah.
Audience:
Depending on the Google search I do, I get a number of sponsored ads that come up first. Sometimes a whole page will be sponsored ads, like if it’s a hotel. How did Google decide how many sponsored ads to show me before I get to the actual search?
Zoe Darme:
You’ve caught me out. I am an organic search person. We have people who work specifically on ads. There is a limit for the number of ads that you’ll see on the top of the search results page. I won’t give you the exact number because I’m going to forget. I think it’s something, Kate, do you know?
Kate:
Well, there’s a limit, but also ads appear on queries, like only certain sets of queries. So if you’re searching for a product or searching for a hotel. So certain queries will have more ads than other queries just because they are ones where people are looking to shop or to buy something or book a hotel, et cetera. And so it’s different depending on the query.
Zoe Darme:
Yeah, like for example, the first query that you saw me show, which was internet governance forum, there were no ads on that one. But there are, even for those that are more shopping journeys, there are limits for the number of ads that can be shown.
Audience:
Hi, I’m Ricky from the Brazilian Network Information Center.
So if you use incognito window, Google still knows your location and time of your search, but does not have access to your search history, right? Could you use the difference between the search, the incognito window and the normal window as a measure of personalization you get on the search?
Zoe Darme:
I mean, you could, or you can just turn personalization off. So if you go into the three dots, it’ll show you personalized for you or not personalized for you. And then there’s a link that says, manage my personalization settings. So even if you don’t want to go into incognito, you can use that link to turn personalization on or off. And then it’s the same as comparing between an incognito window and a regular browser window.
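The comparison the questioner proposes — diffing results between an incognito window and a regular one — is essentially how academic personalisation-measurement studies work. A minimal sketch of how two result lists could be compared quantitatively; the URLs below are placeholders, not real results:

```python
def jaccard(a, b):
    """Set overlap between two result lists, ignoring rank (1.0 = identical sets)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def mean_rank_shift(a, b):
    """Average absolute rank displacement of URLs that appear in both lists."""
    pos_b = {url: i for i, url in enumerate(b)}
    shared = [url for url in a if url in pos_b]
    if not shared:
        return None
    return sum(abs(a.index(url) - pos_b[url]) for url in shared) / len(shared)

# Placeholder top-5 result lists for the same query in two windows.
personalised = ["a.com", "b.com", "c.com", "d.com", "e.com"]
incognito    = ["b.com", "a.com", "c.com", "f.com", "g.com"]

print(round(jaccard(personalised, incognito), 3))          # → 0.429
print(round(mean_rank_shift(personalised, incognito), 3))  # → 0.667
```

A low overlap or a large average rank shift between the two windows would suggest heavier personalisation for that query; identical lists would suggest little or none.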
Audience:
Charlie Warzel published a piece recently about the fact that Google might be boosting commercial queries. What would transparency look like to make that more publicly available, if it’s true?
Zoe Darme:
Great question. Maybe what we’ll do is pull this up. Charles, do you mind pulling up Danny Sullivan’s tweets on this? Because our search liaison actually provided a public response and rebuttal, because it was a misconception of how we would handle those queries. There’s not a magic Google fairy in the back end where you put in the query Kate Sheeran and, without telling you, we’re really putting in Kate Sheeran Nike, for example.
Audience:
Yeah, go ahead. Let’s say I didn’t want to take your word for it. How would you demonstrate that to me? What sort of information would I need to answer that question?
Zoe Darme:
Algorithmic transparency. That’s a really great question, Nick. And I would love to know what we can do beyond what we’ve already done in product transparency through publishing 160 pages of search quality rater guidelines, through having Danny Sullivan put out regular tweets saying, no, we do not do x, y, and z. It would be interesting to know what we could do to show people that. If us saying it over and over and emphatically is not enough, then what can we do? And that’s why I made the joke, do we open a door and let people into the closet? I don’t mean to be facetious, but I’m not really quite sure what is that thing that we can do to prove it.
Audience:
What about closed independent third party audits? Would that be a mechanism? That’s a great question, Nick.
Zoe Darme:
Maybe since you’re at the oversight board, you might want to answer about what you know of the DSA or some of the other content regulations about third party audits. I know that’s one thing that the ecosystem is thinking about.
Charles Bradley:
Yeah, so I think it seems to me that in many cases, what you would want is someone you trust to have done a thorough audit of the code. So what do they need? You’ve got, one, the problems of how do we define the people that we trust? We can do that. We can sort of figure that out. How do we ensure that the personal information doesn’t leak? And probably as part of the same assurance process, you can figure that out. Then you’re pretty quickly whittling down the pool of people that are going to be in a position to be able to do that. I think there’s a lot of room for civil society organizations to work with journalists to provide that level of support. But we haven’t seen it because typically, we don’t yet have those relationships between technology companies and journalists that would allow a technology company to be assured that the results were not going to be misconstrued, that the results would not result in a leak of personal information, while at the same time allowing the journalist to actually publish what they want. Will the DSA fix that? I’m happy to defer to other people in the room on that. But I do think that fundamentally, what you are probably looking for for systems accountability is some sort of external measure. I think you’re probably right on this point that when Zoe says Google can provide you all of these assurances, you’re not going to be satisfied usually until you’ve had someone actually go through and spend a whole heap of time understanding what’s going on, which is pretty expensive.
Audience:
Hi, everybody. My name’s Tom. I’m from New Zealand, so apologies for my accent in advance. I’m project lead for something called the Action Coalition on Meaningful Transparency. And I think I just wanted to offer a couple of observations. I do think the audit mechanisms under the Digital Services Act are going to be something that is really important for effective transparency. I think the kinds of materials that auditors can access are extremely detailed. If you look at a laundry list for the different kinds of things you might want from companies in order to provide meaningful transparency, the auditors essentially have access to all of those things. They can ask people questions. They can ask to look at models, all these kinds of things. So I do think audit is going to be a really important component of the DSA. I think probably the next conversation we might be having, though, if we’re relying on independent third-party assessments is going to be, how do we have any confidence in those assessments? So at IGF next year, we might be looking at the first round of DSA audit reports. But everybody will be saying, how do we know these are any good? And how do we know that the auditors have said the right things and seen the right things? So it’s going to be sort of an ongoing issue, I think, as to how we get this kind of transparency. I just want to flag that one of the things we’re thinking about as well is, given that we have so much transparency information, and I think you’re referring to that from a sort of Google perspective, the question of how people can effectively make use of that is a whole other question. And I think that’s going to require things like funding, for example, because having the time and the expertise to look into all of these things takes a lot of time and money, basically. So I think another component of meaningful transparency is going to be funding. 
And then I’ll just call out one other thing, which is being able to find all of those disclosures in one place. So one thing we’ve heard through our conversations with various stakeholders is that it can be quite hard to find the information that’s already being disclosed. So what we’ve tried to do is pull together a portal that anybody can submit information to. It’s called the Transparency Initiatives Portal. And the idea there is that we will try to have a useful piece of community infrastructure for accessing various kinds of transparency information and initiatives. So hopefully, that will be something that we can talk about in the future.
Farzaneh Badii:
Thanks so much. Any other questions? We have about five minutes left. If not, I’ll pass back to Farz for closing thoughts. Sure, thank you. When you were going through the presentation, I was thinking that for each of the steps that you explained, there are hundreds of algorithms involved. So when we are asking for algorithmic transparency in that kind of context, in the life of a query, what do we actually mean by that? Why are we asking for transparency? This is why it’s so important to know why we are asking for transparency: to hold Google accountable, or to do certain research. And sometimes, when we are asking for transparency, we might actually want access to data, and not necessarily transparency in the sense of clarity of instruction. So I think that talking about transparency at a granular level might also help civil society and policymakers with their efforts to govern the internet, search engines, and Google. I think this was a useful exercise, but we need to provide more feedback on how we can use this example, and on what more we need and want from these processes, in order to feed into our conversations about internet governance in general. Do you have any closing remarks? Really? Why not? You were going on.
Charles Bradley:
I’m just really keen to hear from those in the audience. We don’t have any more time now, but I invite you to write down on a Post-it and leave it with us. Because I think this is a really tough challenge, pitching this at exactly the right level. Do you want more detail? Do you want the two-day workshop version? Do you want something that is a little bit more technical but a smaller component? Or are you happy with the sort of level of abstraction for your own uses as advocates, as people in industry and government, other stakeholders? What would actually be useful? I think that’s always the hardest part when you’re asking for information from a company that they’ve never given before. Their first impression of what you might be looking for probably doesn’t align with exactly what you need. So I think gathering that data, and if you want to write some feedback, we’d love to hear it, would be incredibly, incredibly useful. OK. Thank you very much. Thank you, guys. Thanks.
Speakers
Audience
Speech speed
184 words per minute
Speech length
1172 words
Speech time
382 secs
Arguments
Advertising and the gaming of the Google algorithm have led to a decrease in the quality of content on the web
Supporting facts:
- The increase in advertising and businesses trying to game the Google algorithm has led to skewed search results.
- There has been a reported decrease in the quality of web content from the user’s perspective.
Topics: Google algorithm, Advertising, Web content, Quality control
The current information ecosystem incentivizes interaction within platforms or walled gardens, rather than on the open web.
Supporting facts:
- Content creation is majorly happening in hosted platforms or walled gardens
- Society has shifted from open web content creation
Topics: Information Ecosystem, Web Results, Content Creation
Personalization in search results is unique to individual users based on their past search history
Supporting facts:
- Zoe Darme describes personalization as something unique to a user, based on their past searches
Topics: Personalization, Search history, Google
Obtaining personalized results depends on whether the user’s unique preferences are known
Supporting facts:
- Zoe Darme uses the example of a user’s preference for a specific pizza place over another to illustrate this concept
Topics: Personalized results, User preferences
Google decides how many sponsored ads to show before actual search results appear
Topics: Google Search, Sponsored Ads
Using Incognito mode, Google knows your location and time of search but doesn’t have access to your search history.
Topics: Google, Incognito Mode, Personal Data
Google lets you control personalization settings
Supporting facts:
- By going into the three dots, one can choose between personalized or not personalized settings.
- Managing personalization settings can be achieved via a link, turning personalization on or off.
Topics: Google, Personalization, Data Privacy
There seems to be a misconception about how Google handles queries.
Supporting facts:
- Zoe Darme mentioned that there is no magic Google fairy manipulating search queries.
- Charles is asked to pull up Danny Sullivan’s tweets as a reference to explain Google’s query handling.
Topics: Google, Search Query
Algorithmic transparency and product transparency at Google
Supporting facts:
- Google has published 160 pages of search quality rater guidelines
Topics: Google, Transparency, Algorithm
The need for additional measures to prove transparency
Topics: Transparency, Trust, Proof
Audit mechanisms under the Digital Services Act will be crucial for effective transparency
Supporting facts:
- The kinds of materials that auditors can access are extremely detailed, including models and personal interviews
Topics: Digital Services Act, transparency, audit mechanisms
Confidence in third-party assessments is going to be another ongoing issue
Topics: transparency, third-party assessments, Digital Services Act
Effective use of transparency information is a separate challenge
Supporting facts:
- Expertise and resources are necessary to make sense of transparency information
Topics: transparency, information use
Meaningful transparency will require funding
Topics: transparency, funding
A centralised portal to access all disclosures
Supporting facts:
- The Transparency Initiatives Portal is an effort towards making transparency information accessible
Topics: transparency, disclosures, Transparency Initiatives Portal
Report
The discussions underscored ongoing concerns about the quality of web content, highly influenced by advertising strategies and manoeuvres aimed at exploiting Google’s search algorithm. These actions have reportedly led to misleading search results and a noticeable degradation in the quality of content available to users.
Alongside this, there are prevalent business trends pushing for content creation within so-called ‘walled gardens’ – private platforms controlling access to content. This trend has incited apprehension about the sustainability of an open web environment, raising questions about the ethical stewardship of the information ecosystem.
In-depth dialogues surrounding the facet of personalisation in search results ensued, elucidating the difference between personalisation and customisation. Personalisation is a distinctive feature based on an individual user’s past searches, and the examples given highlighted how this leads to varied search results for individuals with different interests.
However, Google needs to clarify how it communicates this personalisation process to its users. The delicate equilibrium between personalised and non-personalised search results influences user satisfaction and affects the overall quality of results. Google’s control over the quantity and positioning of sponsored adverts appearing ahead of the actual search results was also analysed.
Suspicions that Google may favour commercial queries were sparked by an article by Charlie Warzel, emphasising the need for greater transparency in this area. Rather than being arbitrary, adverts tend to appear in response to queries where users demonstrate an intention to make a purchase or booking.
The discussion evolved to demonstrate how users could gauge Google’s search personalisation by comparing outcomes in Incognito mode versus normal browsing mode. While Incognito mode restricts Google’s access to a user’s search history, it still captures details such as location and time of the search.
Interestingly, Google assures users of control over personalisation, accessible with a simple click via the personalisation settings. A significant portion of the conversation focused on transparency in the handling of search queries and algorithms, and misconceptions about Google manipulating search queries were dispelled.
Google’s publication of an extensive 160-page document of search quality rater guidelines was praised as a commendable move towards fostering transparency. However, demands emerged for verifiable evidence, accountability and third-party audits of Google’s assertions. The Digital Services Act, with its proposed audit mechanisms, was seen as a forward stride in enhancing transparency.
However, doubts over the reliability of third-party assessments remain, along with issues related to the apt interpretation and utilisation of transparency information. A recurring sentiment was that transparency can only be realised through adequate funding and resources. The recommendation to create a centralised Transparency Initiatives Portal for efficient access to all disclosures was regarded as a practical solution.
This move would arguably benefit all parties involved in the comprehension and verification of data related to transparency. In sum, these discussions reflect the need for increased vigilance, clarity and public involvement in the control and management of online content, putting an emphasis on data privacy, fair business practices, transparency and user satisfaction.
Charles Bradley
Speech speed
131 words per minute
Speech length
622 words
Speech time
285 secs
Arguments
System knowing enough about the user to provide results based on their knowledge about a subject is personalization
Supporting facts:
- Charles Bradley’s view about personalization is that if a system is providing results according to user’s knowledge on the subject, then it can be called personalization
Topics: Personalization, Internet Search
Third party audits of code should be conducted by trusted parties, assuring no leak of personal information.
Supporting facts:
- There’s need of establishing trust between technology companies and journalists.
- The pool of qualified people to conduct audits is small.
Topics: DSA, Code audits, Technology companies, Journalists
External measure is fundamental for systems accountability.
Supporting facts:
- Google’s self-assurances are usually not enough for satisfaction, indicating need of third party measures.
Topics: Systems accountability, External measures
Keen to hear from audience
Supporting facts:
- Invites audience to write feedback and opinions
- Challenges in pitching information at the right level
- Gathering data from audience feedback can be useful
Topics: Audience Feedback, Company Information, Transparency
Report
Charles Bradley, renowned for his insightful commentary on diverse digital technologies, provides his perspectives on a number of significant issues. On the topic of personalisation in internet search, Bradley proposes an inclusive view, defining it as a system’s capability to deliver results tailored to a user’s pre-existing knowledge base.
This approach implies that personalisation goes beyond simply catering to preferences and instead appreciates the user’s comprehension of a specific subject. Moreover, Bradley underlines the importance of code audits, suggesting these security checks should ideally be performed by trusted third parties.
The objective is to nurture stronger trust between technology companies and journalists, a relationship often strained due to contentious issues surrounding data privacy and source protection. However, Bradley acknowledges the challenge in this area due to the sparse pool of qualified personnel capable of conducting such intricate audits.
Remaining on the theme of accountability, Bradley emphasises the significance of external checks and measures for maintaining system accountability. Solely relying on self-assurances from tech giants, as exemplified by companies like Google, regularly falls short of providing adequate assurance or satisfaction to users.
Here, Bradley questioned whether the Digital Services Act (DSA) could effectively accommodate the implementation of these external audits, displaying a cautious and investigative stance on the proposed legislation. Additionally, Bradley exhibits a keen interest in integrating audience feedback into the information sphere about company activities.
Audience feedback can proffer valuable insights for companies aiming to ascertain public sentiment or identify areas for improvement. Acknowledging the challenges of striking the appropriate balance in terms of information dissemination, Bradley underscores the necessity for transparency for industry stakeholders, government entities, and advocates.
The struggle resides in soliciting information that companies may have previously been reticent to share, and ensuring that the initial company impressions coincide with stakeholder needs. In conclusion, even though most of Bradley’s sentiments were neutral, his call for audience feedback was perceived as a positive endeavour towards enhancing transparency and improving stakeholder communication.
This comprehensive analysis embodies Bradley’s profound understanding of the digital landscape, accentuating the intricacies of personalisation, the need for informed security measures, and the challenges in achieving transparency in an ever-evolving digital environment.
Farzaneh Badii
Speech speed
126 words per minute
Speech length
265 words
Speech time
126 secs
Arguments
There are hundreds of algorithms involved in the steps explained
Supporting facts:
- Each step in the life of a query involves numerous algorithms
Topics: algorithm transparency, internet governance, search engine
Report
The dialogue under examination centres on the multi-faceted involvement of algorithms in internet governance, with particular emphasis on the operational management of search engines such as Google and their accountability levels. Crucially, the discussion highlights the wide array of algorithms deployed during each individualised search query, underscoring the extensive and complex nature of their application.
This segues into a robust call for enhanced transparency surrounding the utilisation of these algorithms. The importance of this becomes apparent when contemplating the societal and regulatory drive to hold corporations like Google to a heightened level of accountability. It is not merely about unveiling the concealed layers underpinning each inquiry, but also about comprehending the ramifications of algorithmic operations in shaping public communication.
Moreover, the dialogue underscores a need for discussion of a more granular nature. Essentially, this means delving deeper into the specifics of how algorithms function and are employed, rather than a superficial overview, in order to promote fairness, justice, and innovation within the digital sector.
Interestingly, the push towards transparency is construed as potentially a covert demand for data access. Therefore, clarifying what form ‘transparency’ takes, and what the end goal of this transparency is, becomes a critical point of discussion. There is also an articulated need to solicit more feedback on the usefulness of the explained processes serving the industry and the public, raising several pertinent questions.
For instance, how can this illustrative case study be utilised most effectively? What can be learnt from it? What additional information or tools are requisite? These open-ended inquiries underline the constant need for innovation and improvement in the internet infrastructure.
Despite delving into complex issues, the dialogue is deemed a beneficial exercise, proving advantageous in sparking conversations around accountability and transparency in the digital arena. In relation to broader global implications, the conversation aligns with the ethos of several of the United Nations’ Sustainable Development Goals (SDGs), notably SDG 9 which underscores the importance of Industry, Innovation and Infrastructure, and SDG 16 that advocates for promoting Peace, Justice and Strong Institutions.
Both these goals, vast in their individual mandates, intersect at the necessity for transparent and accountable technological entities. To sum up, the dialogue illuminates the pivotal role of algorithms in internet search queries, urges for heightened transparency concerning their operations, necessitates detailed, granular dialogues, and calls for more feedback on the efficacy of the explained processes.
Above all, despite the nuanced topics, the discourse is regarded as an invaluable dialogue, contributing towards the realisation of key Sustainable Development Goals.
Zoe Darme
Speech speed
153 words per minute
Speech length
3588 words
Speech time
1405 secs
Arguments
Algorithmic transparency means making visible how algorithms work, knowing the quality and biases of datasets, and understanding how the output or results are obtained
Supporting facts:
- Different responses on algorithmic transparency were collected, ranging from making visible the workings of algorithms, knowing why a result is showing up when an input is provided, to understanding how results are output from the algorithm
Topics: Algorithm, Transparency, Data Bias
The process of search works like a vending machine with crawling, natural language processing, query understanding, data retrieval, and user interface being the main components
Supporting facts:
- The user interface is described as the vending machine, behind which lies the ‘Google magic’ consisting of natural language processing, query understanding, crawling, and data retrieval
Topics: Search Engine, Natural Language Processing, Crawling
Search engines are complex systems, containing several processes such as crawling, indexing, organizing, and serving data.
Supporting facts:
- The speaker talked about these processes using the analogy of a vending machine, with each phase constituting a different part of its operation.
- Crawling is like searching for different drink options, indexing is the organization of these drinks, and serving is the retrieval of the requested drink.
Topics: Web crawling, Indexing, Search engine, Data organization
An exact weighting system for queries does not exist, thus making algorithmic transparency complex.
Supporting facts:
- Different signals such as the freshness or location of the searched words can influence the search results.
- The usage of these signals is dynamic and changes over time.
Topics: Query weighting, Algorithmic transparency
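The point above — that there is no fixed weighting system, and that signals such as freshness or location carry different weight for different queries — can be illustrated with a toy scoring function. All signal names, values and weights here are invented for illustration and do not reflect Google’s actual signals:

```python
def score(doc, weights):
    """Weighted sum of a document's signal values; weights vary per query type."""
    return sum(weights[s] * doc.get(s, 0.0) for s in weights)

# Invented signal values for two candidate documents.
docs = {
    "news_article":  {"relevance": 0.7, "freshness": 0.9, "locality": 0.1},
    "local_listing": {"relevance": 0.6, "freshness": 0.2, "locality": 0.9},
}

# For a breaking-news query, freshness might dominate;
# for a "near me" query, locality might dominate instead.
news_weights  = {"relevance": 0.5, "freshness": 0.4, "locality": 0.1}
local_weights = {"relevance": 0.5, "freshness": 0.1, "locality": 0.4}

rank_news  = sorted(docs, key=lambda d: score(docs[d], news_weights),  reverse=True)
rank_local = sorted(docs, key=lambda d: score(docs[d], local_weights), reverse=True)
print(rank_news)   # → ['news_article', 'local_listing']
print(rank_local)  # → ['local_listing', 'news_article']
```

The same two documents rank in opposite orders under the two weightings, which is why a single static “weighting table” cannot be published as a complete account of how results are ordered.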
Google uses Search Quality Raters and hundreds of algorithms to determine page quality.
Supporting facts:
- Google’s biases are towards high page quality, summed up by the acronym E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness).
- The Search Quality Rater Guidelines are a 160-page document detailing these biases.
Topics: Page Quality, Search Quality Raters, Algorithm
The quality of web content has been affected by the ‘SEOification’ of web results, where content is created with a focus on gaming search engine algorithms rather than providing quality content.
Supporting facts:
- The audience member mentioned a degeneration in web content quality due to the shift towards gaming search engine algorithms, rather than focusing on quality content.
- Zoe Darme agreed and mentioned the ‘SEOification’ of web results as a concerning phenomenon.
Topics: SEO, content quality
The internet’s content ecosystem has shifted, with more content creation happening in closed platforms or ‘walled gardens’, lessening the amount of content created in the open web.
Supporting facts:
- Zoe Darme noted that if people are incentivized to interact in a closed platform or a chat group, they’re not creating on the open web.
Topics: internet ecosystem, walled gardens, content creation
Personalization occurs when a system tailors results based on one’s unique characteristics or past activities.
Supporting facts:
- Consideration of a people’s search history for generating results.
- When someone’s preferences, inferred from past searches, affect the results displayed.
Topics: System Customization, Information Retrieval
Google’s personalization feature does not explain why it provides particular results.
Supporting facts:
- This particular feature doesn’t say we know X, Y, and Z about you, and that’s why it says it’s personalized or not.
Topics: Google Search, Personalization
Personalization in Google’s search can be more prominent in searches like ‘what to watch’, ‘what to read’, or ‘what to eat’.
Supporting facts:
- A good example of where a search might be personalized is for more like a feature like what to watch, what to read, what to eat.
Topics: Google Search, Personalization, User Behavior
There is a limit to the number of ads shown at the top of Google search results
Supporting facts:
- Specific number for ad limit is not mentioned
- The limit varies as some queries have more ads than others
Topics: Google, Digital Advertising, Search Engine Optimization
Google still knows your location and time of your search in incognito mode
Topics: Incognito Mode, Google Search, Location Tracking
Google does not have access to your search history in incognito mode
Topics: Incognito Mode, Google Search, Privacy
Personalization in search can be turned on or off manually
Topics: Google Search, Personalization
Google isn’t boosting commercial queries artificially.
Supporting facts:
- The search liaison provided a public response negating this notion.
- Claims of Google boosting commercial queries are misconceptions.
Topics: Google, Commercial queries, Search Transparency
Zoe Darme underscores the need for algorithmic transparency
Supporting facts:
- Google has published 160 pages of search quality rater guidelines
- Danny Sullivan puts out regular tweets insisting Google does not engage in x,y,z activities
Topics: Algorithmic Transparency, Product Transparency
Report
The panel provided an in-depth examination of Google’s inner workings, the mechanics of its search algorithms, and the subtleties of the user experience. A noteworthy aspect was the insightful comparison of the search operation to a vending machine. This analogy aptly described each stage: ‘crawling’ – identifying the various ‘drink’ options, or webpages; ‘indexing’ – organising these options; and ‘serving’ – retrieving them.
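The vending-machine analogy maps onto the classic inverted-index model of search. A deliberately tiny sketch of the crawl, index and serve stages, using made-up pages (a real search stack is vastly more complex):

```python
from collections import defaultdict

# "Crawling": a toy corpus standing in for fetched web pages.
pages = {
    "page1": "internet governance forum kyoto",
    "page2": "governance of search algorithms",
    "page3": "recipe for cassoulet",
}

# "Indexing": build an inverted index mapping each term to the pages containing it.
index = defaultdict(set)
for url, text in pages.items():
    for term in text.split():
        index[term].add(url)

# "Serving": retrieve the pages that contain every term in the query.
def serve(query):
    results = None
    for term in query.lower().split():
        matches = index.get(term, set())
        results = matches if results is None else results & matches
    return sorted(results or set())

print(serve("governance"))           # → ['page1', 'page2']
print(serve("internet governance"))  # → ['page1']
```

In the analogy, the inverted index is the organised rack of drinks, and `serve` is the retrieval arm; everything the panel called ‘Google magic’ (ranking, query understanding, quality signals) sits on top of this basic structure.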
This commentary then emphasised the importance of ‘algorithmic transparency’. It highlighted the necessity for visibility and understanding of how algorithms operate, the inherent data bias, and how results are generated and output, thereby indicating a push for increased openness in these processes.
The discussion delved into detail on the subject of search personalisation. A personalised search, influenced by past usage and habits, was contrasted with a generic location-based result, which isn’t adjusted to an individual’s tastes. This led to intriguing questions about Google’s transparency, given that its personalisation feature doesn’t clarify why particular results are prioritised.
Despite these concerns, Google’s Zoe Darme suggested that personalisation, when based on a user’s historic activity and personal preferences, could significantly enhance search result quality. ‘Search Quality Raters’ were highlighted in the panel. The revelation that Google applies hundreds of algorithms for evaluating webpage quality was emphasised.
Worries were voiced about the deterioration of web content quality due to a trend named ‘SEOification’. This phenomenon implies a considerable shift towards manipulating search engine algorithms, often at the expense of content authenticity and originality. A notable observation was the apparent movement of the internet’s content ecosystem from open-web platforms to closed environments – referred to as ‘walled gardens’.
This trend seems to have instigated a decrease in open web content creation, leading to an interesting proposition – potentially incentivising content creation on the open web to preserve a diversely vibrant internet ecosystem. Considerable attention was devoted to Google’s digital advertising practices.
While it’s clear that there is a limit on the number of ads displayed at the top of Google search results, this limit isn’t explicitly defined. Commercial searches, such as those related to shopping or making bookings, were observed to have a larger volume of ads.
Finally, the utility and limitations of incognito mode were analysed, clarifying several misunderstandings. Whilst Google does maintain awareness of the location and time of a search conducted in incognito mode, it does not access the user’s search history in that mode.
However, users retain the ability to manage their personalisation settings independently of using incognito mode. This interpretation emphasises the nuanced control Google users have over personalisation and privacy.