Social and Political Impacts of Web Search Techniques — An Overview
1.0 Introduction
In today’s world, a search engine is the primary means by which an average user discovers information about almost anything. People all over the world depend on search engine results to form opinions, fill information gaps, check facts, survey markets, and serve numerous other everyday purposes. The search algorithms used in search engines influence how users acquire information and can bias their views on societal and political issues, thus affecting their attitudes and shaping their opinions. This paper attempts to summarize the social, economic, and cultural impacts of present web search techniques. To narrow the scope, the paper is limited to the impacts of general-purpose search engines.
2.0 Impact of Web Search Techniques
In this section, a systematic review of the impact of web search engines on social, economic, and cultural issues is presented.
2.1 Social and Political Impact of Search Engines
Could a search engine influence the elections in a democracy? Research suggests that Google Search has the power to sway democratic elections [2]. There are growing concerns over the power popular web search engines hold over the political outcomes of an election, with the recent finding that bias or favoritism in search rankings can significantly influence voting behavior.
Robertson et al. [1] audited the impact that the composition of search engine result pages (SERPs) and user-related personalization have on politically related searches. The study suggests that the diverse presentation of information in SERPs through components such as knowledge-box, people-ask, twitter, and news-card, when combined with the trust users place in top-ranked results, likely increases the dissemination of misinformation or “fake news”. The study also attempted to quantify the personalization of search results based on the following participant characteristics: whether they were logged in to Google, the number and types of Alphabet (Google) products they used, their ratings of Donald Trump, and their political party. It also tried to understand the dynamics between social media such as Twitter and web search results. The study concluded that, given the influence search engines have, personalization in politics-related searches could have a significant effect on a searcher’s voting behavior. It also noted that searchers with a weak political inclination towards the left or right are more susceptible to being swayed by biased search results. A related experiment with 2,150 people during the 2014 Indian elections indicated that 24.5% of undecided voters could be swayed by biased rankings in search results [8].
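To make the idea of quantifying personalization concrete, the following is a minimal sketch that compares the results two users receive for the same query using Jaccard similarity. The choice of Jaccard similarity, the function name, and the example URLs are assumptions made for illustration; they are not taken from the study [1].

```python
def jaccard(results_a, results_b):
    """Jaccard similarity of two result lists, ignoring rank order."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty pages are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical URLs returned for the same query in two browser profiles.
personal_serp = ["a.example/1", "b.example/2", "c.example/3", "d.example/4"]
control_serp = ["a.example/1", "b.example/2", "e.example/5", "d.example/4"]

similarity = jaccard(personal_serp, control_serp)
print(f"Jaccard similarity: {similarity:.2f}")
```

A value of 1.0 would mean both profiles saw the same set of results; lower values indicate that the pages differ more. Because the measure ignores order, a fuller audit would also need a rank-sensitive comparison.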
Robertson et al. [2] quantified partisan bias in Google Search after President Donald Trump’s inauguration. Partisan bias has been shown to influence voting behaviors through newspapers, television (e.g., the “Fox News Effect”), social media (see also “digital gerrymandering”), and search engines (e.g., the “Search Engine Manipulation Effect (SEME)”). Partisan bias in election-related search rankings has been found to shift the preferences of undecided voters by 20% or more. According to this study, the results placed toward the bottom of Google SERPs were more left-leaning than the results placed toward the top. The direction and magnitude of the overall lean varied widely by search query, component type, and other factors. Further, Google’s ranking algorithm shifted the average lean of SERPs slightly to the right of their unweighted average. Embedded tweets in Google’s search results likely amplified the reach of Donald Trump’s Twitter account because of their prominence near the top of search results [2].
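The difference between a rank-weighted and an unweighted average lean can be illustrated with a short sketch. The lean scores and the 1/rank weighting below are hypothetical choices made only for illustration; the weighting scheme actually used in [2] may differ.

```python
def unweighted_lean(scores):
    """Plain average of the lean scores on one page."""
    return sum(scores) / len(scores)


def rank_weighted_lean(scores):
    """Average lean in which rank r carries weight 1/r (an assumed scheme)."""
    weights = [1.0 / rank for rank in range(1, len(scores) + 1)]
    weighted_sum = sum(w * s for w, s in zip(weights, scores))
    return weighted_sum / sum(weights)


# Hypothetical lean scores in rank order: negative = left, positive = right.
serp_lean = [0.2, 0.1, -0.1, -0.3, -0.4]

print(f"unweighted:    {unweighted_lean(serp_lean):+.3f}")
print(f"rank-weighted: {rank_weighted_lean(serp_lean):+.3f}")
```

In this made-up page the left-leaning results sit toward the bottom, so weighting by rank moves the average to the right of the unweighted value, which mirrors the pattern described above.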
Though the study of misinformation spread has focused primarily on social media, it has been observed that social media in combination with trust in search engines could increase exposure to and consumption of misinformation. Metaxa et al. [3] coined the term “search media” to describe algorithmically curated content meant to be consumed as media by search engine users. Their study highlights both the workings of the search algorithm and real-world events as factors affecting search media. Search media functions as “metamedia”, reflecting the state of the real-world media ecosystem. The algorithms used to curate search media are non-transparent and act as gatekeepers of information. The study strongly suggests that there is a high risk of users consuming search results as they would traditional media sources, resulting in the propagation of misinformation, political bias, and campaign agendas.
2.2 Impact of Search Engines on News
When assessing the role search engines play in forming preferences and bias and in making information available to the user, it is important to consider not only how search results are ranked but also how they are formatted and displayed. Formatting and display guide attention and behavior towards items placed at certain locations and marked up with elements that add semantic meaning. With this in view, Trielli and Diakopoulos [8] focus on one of Google’s prominent search components, the “Top Stories box”, and its role in shaping attention to and availability of news information. The study found that the top 20% of news sources account for 86% of all impressions (an impression being the appearance of a link in the Top Stories box, aggregated by root domain). CNN and NYT accounted for 17.4% of the impressions observed [8], leading to the conclusion that the diversity of news sources appears to be limited. Source diversity is especially important for queries that serve the purpose of providing public information and have social consequences.
The study also found that the Top Stories box is more inclined to have left-leaning impressions than right-leaning ones, which could mean one of two things: (1) the Google algorithm is biased towards selecting left-leaning sources; or (2) there is more left/liberal news content being published online. The algorithm also appears to favor more recent news as top-ranked results, which could mean that news sources that refresh their news more often receive better visibility even though they may not have better-quality news. The news sources in the Top Stories box are observed to receive significantly more traffic from Google than others.
An earlier study on Google’s knowledge panel component, conducted by Lurie and Mustafaraj [9], corroborates these results on the impact that the search engine algorithm and human-computer interaction have on how search users receive their news information. SERPs influence users’ decision making and news literacy. Notably, Google’s SERP is found to have become an arena where algorithms, humans, and publishers meet and try to influence one another [9]. Because users already place trust in search platforms such as Google, these platforms play a crucial role in influencing users’ decisions about the authenticity and trustworthiness of a news source.
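The source-concentration figure reported for the Top Stories box (the share of impressions captured by the top 20% of sources) can be computed along the following lines. This is a sketch under stated assumptions: the function name, the example domains, and the counts are hypothetical and are not data from [8].

```python
import math
from collections import Counter


def top_share(impressions, top_fraction=0.2):
    """Share of all impressions captured by the most frequent sources.

    `impressions` holds one root domain per observed link appearance.
    """
    counts = Counter(impressions)
    if not counts:
        raise ValueError("no impressions to analyze")
    ranked = sorted(counts.values(), reverse=True)
    # Number of sources in the top fraction, with at least one source.
    k = max(1, math.ceil(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical impression log with five sources and 100 impressions.
impressions = (
    ["a.example"] * 50
    + ["b.example"] * 20
    + ["c.example"] * 15
    + ["d.example"] * 10
    + ["e.example"] * 5
)

share = top_share(impressions)
print(f"Top 20% of sources hold {share:.0%} of impressions")
```

A share close to the top fraction itself would indicate evenly spread attention, while a much larger share signals that a few sources dominate.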
2.3 Impact of Search Engines on Health
Search engine users increasingly rely on web search results to diagnose and research medical conditions and health issues. From the searcher’s point of view, web search provides a quick, no-cost way to investigate and treat medical issues, but health experts have found the information obtained this way to be often medically inaccurate and sometimes even a health risk. The bias of top-ranked search results, and the possible misinformation contained in them, poses a great health risk for a naïve user. First, search engines return relevant documents irrespective of whether they are correct, and incorrect information might be harmful. Second, research has shown that searches about the effectiveness of medical treatments are biased towards results stating that those treatments are effective [4]. Search results therefore do not always provide the most accurate picture of medical treatments. Search engines are also used for exploratory medical queries. It has been observed that people might develop unjustified health concerns when search results explain benign symptoms in terms of serious illnesses. This behavior has been termed “cyberchondria”. Different users may have specific preferences in how they formulate their queries. High-level query formulation features as well as individual word choices reveal information about the searcher, and SERPs vary considerably between different but similar word choices. Search engines serve more concerning results to users with a history of medical searching. Concerning SERPs might reduce HUI queries and hence real-world health-seeking in the short term [4].
Puspitasari [6] shows that familiarity with health topics, which varies by user, affects health information search behavior. Misunderstandings of health information can potentially have fatal consequences. The study found that, during various stages of the search process, participants with more familiarity with health information had more success in obtaining search results with higher efficacy and medical correctness.
SERPs often contain incorrect and misleading results that can have detrimental effects on users who have a content bias. Even with the think-aloud strategy used by Ghenai et al. [7], in which participants said out loud what information they expected to gain from a search activity, it was clear that the cognitive bias associated with search still significantly altered the user’s interpretation of and learning from a SERP. These implications are profound when users with little to no familiarity with medical concepts search for serious illnesses such as cancer, with a typical relevance algorithm producing both correct and incorrect results.
2.4 Impact of Search Engines on Privacy
Search engine algorithms typically use relevance feedback from users to build a user profile and tailor search to their frequent searches and topics of interest. This helps search engines like Google provide a more effective search experience for their users. Collecting explicit relevance feedback is often impractical from a user experience perspective, so search engines rely on collecting information about users discreetly in the background without interrupting them. Although useful in some scenarios, this strategy raises serious privacy concerns for a growing number of search users. A survey conducted as part of the study [10] in India, which ranks third in the world in number of internet users, concluded that an increasing number of users feel that they have no control over their privacy. Privacy and personalization features are often at odds. Thus, there is a growing need for search companies to pay attention to the privacy of users.
3.0 Observations and Discussion
Across the domains discussed in Section 2, the negative impacts of web search engines share some common features. Algorithmically generated content on the web is receiving increasing attention all over the world. Its role in molding and warping public opinion to the point of bias is alarming, even more so when it has a deep social and political impact on nations. This process is aided in a number of ways, a few of which are outlined here.
· Rank Bias- The cognitive bias of search users towards top-ranked results being more accurate and trustworthy. A disproportionate number of clicks and attention go to the top results [1].
Another study indicated that college students trust Google’s ranking of SERP results and tend to click on the first couple of results even when more relevant links were ranked towards the bottom [9].
· Trust Bias- The unjustified trust search users have in the authenticity and accuracy of SERPs. It is observed that biased content leads users to believe that search results reflect real-life opinions. In particular, results can be interpreted as a consensus at a larger scale even when they reflect only a certain point of view [7].
· Source Bias- It is the social obligation of a search engine to provide its users with a range of perspectives, viewpoints, and socio-political positions. Source bias is much more pronounced in the case of news sources, as observed in the previous section. SERPs seem to default to certain result sources, one prominent example being Wikipedia links.
· Misinformation- Search engines are inherently designed to produce the documents/results that are algorithmically the most relevant, irrespective of whether these results contain correct or incorrect information. In news and politics, incorrect information translates to “fake news”. It has much more dire consequences for average users with little health knowledge who seek life-altering medical treatments and information online. It is found that users are highly influenced by misinformation, demonstrating the degree to which search biases can impact individual decision-making [7]. The direct answer box of Google has been shown to be prone to manipulation, thus transmitting misleading and false information [9].
· Search Components/Visual Markers- Search components such as Google’s knowledge component, embedded Twitter results, top stories box, people-ask, news-card, people-search, and related-search, along with markup elements that add semantic meaning, provide a good user experience through quick and clear delivery of information. However, these elements have been found to construct bias and to limit the sources of information presented to the user. This was highlighted by Robertson et al. [1], who found that, across all component types, the top 20% of domains accounted for 96.1% of the sources appearing in search components. This inequality is also paralleled among individual components [1]. In terms of news, it is found that publishers that had news articles in the Top Stories box received a significant boost in traffic (up to 1/6th more) compared with those placed in organic results in the SERP [8].
· Personalization- It is in the nature of search engine recommendation algorithms to learn user behavior and interests in order to suggest content based on the user’s profile. This provides a tailored search experience to each user and also helps to produce top results that may be more relevant to the user. Since information relevance is highly subjective and depends largely on the perception of the user of the information retrieval system, search engines seek markers about users that help them increase the recall and precision of retrieved documents. However, this process may be counterproductive when the user is a learner and the goal of information retrieval is knowledge discovery.
It may also be speculated that personalization creates a “filter bubble”, where only supporting information is retrieved, creating a form of selective exposure to information. This can be especially troublesome for health searchers. Schoenherr and White [5] highlighted that past user queries have a direct impact on producing search results that may be medically more concerning and serious. Political personalization can entrench users’ existing political beliefs by limiting exposure to cross-cutting information and alternative views. The study [1] illustrates measures of personalization with respect to political party inclination, ratings of President Trump, and Google account sign-in.
4.0 Critical Analysis
The majority of the studies referred to in our discussion examine one major search engine, Google. The algorithmic analysis is therefore limited to Google’s logic, functioning, and behavior, and the resulting findings on user behavior cannot be generalized to other search engines such as Bing.
In addition to the choice of search engine, the surveys and audits were conducted on desktop browsers and capture desktop results only, despite evidence that the majority of user search activity takes place on handheld mobile devices. Further, there is limited research on how search activity performed through Internet of Things (IoT) devices such as smart assistants impacts search engine users, and on whether the audits and analyses of traditional search correlate with those of IoT devices. It can therefore be concluded that these studies draw on a restricted diversity of sources.
Major search engines like Google perform very sophisticated information retrieval that involves the execution of complex algorithms. Any attempt to encompass the entirety of how their algorithms function is a difficult and non-standardized pursuit. Moreover, the non-transparency of Google’s source code and inner workings calls into question the reliability of the audits and studies conducted so far, which appear to have limited technical coverage.
There appears to be insufficient study of the relationship between social media and web search and of how they influence each other. With the increasing number of social media search components appearing on SERPs, it is important to study the algorithms behind their rankings and availability to better understand their implications for user search biases. For instance, there is no analysis of how results are ranked in Google’s twitter-card component or of what causes certain tweets to be given prominence over others. In addition, there is no formal study of how the visual design and placement of information within these search components affect user behavior on screen.
The data sample of any study plays a major role in determining its outcomes and may not always present an accurate picture. For instance, in the Robertson et al. study [2], results were aggregated, the participant sample was imbalanced in terms of demographics and political preferences, and the data was collected at different times of the day, even though web traffic can vary drastically over the course of a day. Similarly, studies performed around a major political event might yield different results from those performed under normal conditions, and analysis of this difference is limited. Another key data point is the set of search terms used, which is chosen at the discretion of the researchers rather than by the general population.
One of the key factors in the personalization employed by search engines is the searcher’s location. Search results in one part of the world may differ vastly from those in another, even on the same search platform. Of the studies discussed here, five [1,2,3,8,9] focused on the U.S. version of Google with U.S.-centric search terms. It is therefore unclear whether the results of these studies would hold across the world. In addition, there are no established ways to ensure de-personalization of search; the Robertson et al. study [1], for example, relies on Chrome’s incognito mode for this purpose.
In the Ghenai et al. study [7], the think-aloud method fails when a user’s information need is unconscious, which may be due to various factors outside the user’s control.
5.0 Suggestions
With an increasingly large number of people depending on search engines each day, every design and algorithmic decision made by the search platforms carries a broad impact. This impact is not just on the individual information searcher, influencing what information they find and absorb, but also on society in general, affecting our culture and politics by steering people toward certain information and perspectives.
The analysis presented here of existing research on the current and future impacts of search techniques on society, economy, and culture makes it evident that modern search platforms are not audited sufficiently or regularly. Periodic, in-depth algorithmic audits of a broad range of search platforms are suggested. Regular audits would provide a means to track the constantly changing features, composition, and ranking factors that produce search results, and to track how their impact on users varies.
Broader frameworks for the study of the impacts of search techniques, incorporating design elements such as search components, might reveal new insights not just into the algorithms but also into human-computer interaction. Such frameworks should also consider expanding the choice of search engine platform by including other major market players such as Bing. Frameworks might benefit from incorporating user-centered methods such as surveys to generate a more realistic and thorough index of search key terms. Data scraping methods for analyzing whether personalization alters the news displayed to users might be limited in their application; open-source data collection plugins, written in JavaScript for example, might aid in this regard. For a study on news searches, computational methods for categorizing articles and news sources might provide a more comprehensive foundation to work with. In addition, a longitudinal investigation of news searches might help to show how news searchers are affected over time. The study of news searches also appears to be limited to nationally recognized news. Since local news outlets are underrepresented in SERPs overall [8], further analysis concentrated on local news may provide a better understanding of the impact of news searches. In order to account for the billions of mobile device users and the increasingly large number of users of IoT devices (e.g., smart assistants), it is strongly suggested that future research frameworks include cross-platform search as one of the key factors when studying the effects of search techniques on users and society. Lastly, the impact of SERP ranking and composition on the user’s future behavior needs to be taken into account when designing the research framework.
From the point of view of search engine researchers, given the amount of misinformation that is prevalent in SERPs, more robust algorithms that consider not only relevance but also the correctness, authenticity, authority, and truthfulness of results when evaluating pages are highly warranted. Just as non-relevant documents are given zero gain value, incorrect documents must be assigned negative gain in order to shape their document ranking. An alternative approach that might mitigate some of the same problems is to use visual markup elements that add semantic meaning to results with respect to their correctness, in addition to their author and source. Lastly, tools can be designed to monitor the quality of SERPs with respect to social elements such as politics and news, in order to detect misinformation even before it spreads.
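The suggestion of assigning negative gain to incorrect documents can be sketched with discounted cumulative gain (DCG), a standard gain-based evaluation measure. The choice of DCG, the three labels, and the gain values (+1, 0, -1) are illustrative assumptions; the text above does not prescribe a particular measure or scale.

```python
import math

# Assumed gain values: correct results add gain, non-relevant results add
# nothing, and incorrect results subtract gain.
GAIN = {"correct": 1.0, "non_relevant": 0.0, "incorrect": -1.0}


def dcg(labels):
    """Discounted cumulative gain of a ranking given per-result labels."""
    return sum(
        GAIN[label] / math.log2(rank + 1)
        for rank, label in enumerate(labels, start=1)
    )


# Two hypothetical rankings of the same three documents.
ranking_a = ["correct", "incorrect", "non_relevant"]
ranking_b = ["correct", "non_relevant", "incorrect"]

print(f"DCG with incorrect result at rank 2: {dcg(ranking_a):.3f}")
print(f"DCG with incorrect result at rank 3: {dcg(ranking_b):.3f}")
```

Under this scoring, a ranking that places an incorrect document higher is penalized more heavily, so a ranker tuned against such a measure would be pushed to demote incorrect results rather than merely ignore them.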
Acronyms:
SERP: search engine result page
SERPs: search engine result pages
HUI: healthcare utilization intent
IoT: Internet of Things
References:
1. Robertson, Ronald E., David Lazer, and Christo Wilson. “Auditing the personalization and composition of politically-related search engine results pages.” Proceedings of the 2018 World Wide Web Conference. 2018.
2. Robertson, Ronald E., et al. “Auditing partisan audience bias within Google search.” Proceedings of the ACM on Human-Computer Interaction 2.CSCW (2018): 1–22.
3. Metaxa, Danaë, et al. “Search media and elections: A longitudinal investigation of political search results.” Proceedings of the ACM on Human-Computer Interaction 3.CSCW (2019): 1–17.
4. Ghenai, Amira. “Health misinformation in search and social media.” Proceedings of the 2017 International Conference on Digital Health. 2017.
5. Schoenherr, Georg P., and Ryen W. White. “Interactions between health searchers and search engines.” Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. 2014.
6. Puspitasari, Ira. “The impacts of consumer’s health topic familiarity in seeking health information online.” 2017 IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA). IEEE, 2017.
7. Ghenai, Amira, Mark D. Smucker, and Charles L. A. Clarke. “A Think-Aloud Study to Understand Factors Affecting Online Health Search.” Proceedings of the 2020 Conference on Human Information Interaction and Retrieval. 2020.
8. Trielli, Daniel, and Nicholas Diakopoulos. “Search as news curator: The role of Google in shaping attention to news information.” Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 2019.
9. Lurie, Emma, and Eni Mustafaraj. “Investigating the Effects of Google’s Search Engine Result Page in Evaluating the Credibility of Online News Sources.” Proceedings of the 10th ACM Conference on Web Science. 2018.
10. Punagin, Saraswathi, and Arti Arya. “Privacy and Personalization Perceptions of the Indian Demographic with respect to Online Searches.” Proceedings of the Third International Symposium on Women in Computing and Informatics. 2015.