{"id":108372,"date":"2025-04-15T16:17:08","date_gmt":"2025-04-15T20:17:08","guid":{"rendered":"https:\/\/cdt.org\/?post_type=insight&p=108372"},"modified":"2025-04-15T18:09:58","modified_gmt":"2025-04-15T22:09:58","slug":"automated-tools-for-social-media-monitoring-irrevocably-chill-millions-of-noncitizens-expression","status":"publish","type":"insight","link":"https:\/\/cdt.org\/insights\/automated-tools-for-social-media-monitoring-irrevocably-chill-millions-of-noncitizens-expression\/","title":{"rendered":"Automated Tools for Social Media Monitoring Irrevocably Chill Millions of Noncitizens’ Expression"},"content":{"rendered":"\n
Last week, USCIS stated its plans to routinely screen applicants\u2019 social media activity<\/a> for alleged antisemitism when making immigration decisions in millions of cases, and announced that it is scouring the social media accounts of foreign students<\/a> for speech that it deems potential grounds to revoke their legal status. Simultaneously, the Department of State has started using AI to enforce its \u201cCatch and Revoke<\/a>\u201d policy and weed out \u201cpro-Hamas\u201d views among visa-holders, particularly students who have protested against Israel\u2019s war in Gaza. <\/p>\n\n\n\n This isn\u2019t USCIS\u2019s first time conducting some form of social media monitoring; in fact, its first foray into social media data collection was in 2014<\/a>. But it is the first time the government has used a previously obscure provision of immigration law<\/a> to target a large group of noncitizens for removal based on political opinions and activism that the Secretary of State has determined could have \u201cpotentially serious adverse foreign policy consequences.\u201d The current Administration\u2019s broad definitions of speech that could lead to visa revocation or application denial, and the questionable constitutionality of making immigration decisions based on viewpoint, raise concerns that will only be exacerbated by the use of flawed, error-prone social media monitoring technologies.<\/p>\n\n\n\n The American immigration system already subjects applicants to disproportionate<\/a> invasions of privacy and surveillance, some applicants more than others<\/a>. 
In the current Administration, immigration enforcement has been particularly aggressive and gone beyond the bounds of previous enforcement efforts, with agents bringing deportation proceedings against applicants on valid visas on the basis of their legally-protected speech, including authorship of op-eds<\/a>, participation in protests<\/a>, and, according to a real albeit now-deleted social media post by the Immigration and Customs Enforcement agency, their ideas<\/a>. Noncitizens have long been aware of the government\u2019s surveillance of their speech<\/a> and their social media activity, which has deterred them from accessing essential services<\/a> and speaking freely on a wide range of topics, including their experience with immigration<\/a> authorities, labor conditions in their workplace, or even domestic violence.<\/p>\n\n\n\n What is happening now, however, is an unprecedented and calculated effort by the U.S. government to conduct surveillance of public speech and use the results to target for removal those who disagree with government policy. At the time of writing, over 1,000 student visas have been revoked<\/a> according to the State Department, some of which have been for participation in First Amendment-protected activities<\/a>. For example, one post-doctoral student<\/a> at Georgetown reportedly had his visa revoked for posting in support of Palestine on social media, posts that were characterized<\/a> as \u201cspreading Hamas propaganda\u201d by a DHS spokesperson. In a high-profile case from earlier this year, the former President of Costa Rica received an email from the U.S. government revoking his visa to the United States<\/a> a few weeks after he criticized the government on social media, saying, \u201cIt has never been easy for a small country to disagree with the U.S. 
government, and even less so, when its president behaves like a Roman emperor, telling the rest of the world what to do.\u201d All signs indicate that disagreement with this Administration\u2019s viewpoints could lead to negative consequences for noncitizens seeking to enter or remain in this country in any capacity.<\/p>\n\n\n\n This expansion of ideological targeting is cast against the backdrop of an immigration system that faces, at times, a Sisyphean backlog<\/a> of applications and insufficient oversight<\/a> of enforcement decisions, problems that are only growing in this political climate. Mistakes are routinely made, and they have devastating consequences. To the extent oversight bodies did exist, such as the Department of Homeland Security\u2019s Office for Civil Rights and Civil Liberties, they have been shuttered<\/a> or undermined, which will make it all the more difficult to identify and fix errors and failures to provide due process.<\/p>\n\n\n\n Applicants have little recourse to seek remedy or appeal mistakes when they are made, instead having to choose among cautious over-compliance in the form of silence, potential retaliation, or self-deportation<\/a> to avoid it all. Increased social media surveillance of noncitizens against this backdrop will compound existing inequities within the system, and will almost certainly further chill noncitizens\u2019 ability to speak and participate freely in society for fear of running afoul of the Administration.<\/p>\n\n\n\n And that\u2019s all before accounting for the problems with the tools that the government will use to conduct this monitoring. The automated tools<\/a> used for this type of social media surveillance are likely to be based on keyword filters<\/a> and machine learning models<\/a>, including large language models<\/a> like those that underlie chatbots such as ChatGPT. 
These tools are subject to various flaws and limitations that will exacerbate the deprivation of individuals\u2019 fundamental rights to free expression and due process. This litany of problems with automated social media analysis is so pronounced that DHS opted against using such a system<\/a> during the first Trump administration. DHS\u2019s concerns about erroneous enforcement and deportations<\/a> may have disappeared, but the risks from this technology have not.<\/p>\n\n\n\n First, models may be trained with a particular bias. Social media monitoring systems are generally trained on selected keywords and data easily found on the web, such as data scraped from Reddit, Wikipedia, and other largely open-access sources<\/a>, which over-index<\/a> on the views and perspectives of a few. Keywords may be added to the training corpus to fit the domain of use<\/a>, such as offering examples of what constitutes \u201cantisemitism\u201d or threats to national security. Should the training data over-represent a particular set of views or designations<\/a> of \u201cforeign terrorists,\u201d the model may over-flag speech by some individuals more than others. The Administration\u2019s over-capacious definition<\/a> of the term \u201cantisemitic\u201d may be weaponized during the training of these social media monitoring models, subjecting to greater scrutiny anyone who has engaged in speech with which the Administration disagrees on topics such as Israel-Palestine or campus protests related to military actions against Gaza, even where the speech is protected by the First Amendment.<\/p>\n\n\n\n Second, and relatedly, these automated tools struggle to parse context. While keyword filters and machine learning models may be able to identify words or phrases they\u2019ve been tasked to detect, they are unable to parse the context in which a term is used, including such essential human expressions as humor, sarcasm, irony, and reclaimed language. 
We\u2019ve written previously about how the use of automated content analysis tools<\/a> by Facebook to enforce its Dangerous Organizations and Individuals policy erroneously flagged and took down all posts containing the word \u201cshaheed\u201d (which means martyr in Arabic), even when an individual was named Shaheed or the term was used in contexts that did not glorify or approve of violence<\/a>. Noncitizen journalists<\/a> who cover protests or federal policy and post their articles on social media may be flagged and surveilled simply for doing their job. People named Isis<\/a> have long been caught up in the fray and flagged by these automated technologies. Posts by individuals citing the \u201csoup nazi<\/a>\u201d episode of Seinfeld may also be swept up in this analysis. Models\u2019 inability to parse context will also limit their ability to conduct predictive analysis. Vendors procured by USCIS<\/a> to conduct social media monitoring assert that they use AI to scan for \u201crisky keywords\u201d and identify persons of interest, but promises of predictive analysis likely rest on untested and discriminatory assumptions<\/a> and burden the fundamental rights of all individuals swept up by these social media monitoring tools. <\/p>\n\n\n\n Finally, these systems will be especially error-prone in multilingual settings. New multilingual language models purport to work better in more languages, yet are still trained primarily on English-language<\/a> data, some machine-translated non-English data, and other available texts such as religious or government documents<\/a>, all imperfect proxies for how individuals speak their languages online. 
Multilingual training data for models is likely to underinclude terms frequently used by native speakers, including spoken regional dialects, slang, code-mixed terms, and \u201calgospeak<\/a>.\u201d As a result, most models are unable to parse the more informal ways people speak online, leading to erroneous outcomes when they analyze non-English speech.<\/p>\n\n\n\n There have already been countless instances in which U.S. immigration enforcement agencies have used digital translation technologies in problematic ways, preventing individuals from accessing a fair process and even safety. For example, an automated translation tool led to a woman being erroneously denied asylum because it obscured the fact that she was seeking safety from parental abuse, literally translating her description of her abuser, \u201cel jefe,\u201d as her boss rather than her father. An individual from Brazil was detained<\/a> for six months over an incomplete asylum application because the translation tool ICE used rendered \u201cBelo Horizonte\u201d literally as \u201cbeautiful horizon\u201d instead of identifying it as a city in which the applicant had lived. Another automated system used to conduct content analysis mistranslated \u201cgood morning\u201d in Arabic<\/a> to \u201cattack them.\u201d Widespread use of these error-prone systems to detect disfavored ideas will only exacerbate the discriminatory treatment of those who speak English as a second language.<\/p>\n\n\n\n Ultimately, the adoption of automated technologies to scan social media data will punish people for engaging in legal speech and result in more errors in an already flawed system. It will also chill the speech of millions of people in this country and abroad, impoverishing the global conversations that happen online. An applicant seeking to adjust their status or become a U.S. 
citizen seeking to communicate with a noncitizen, will reasonably think twice before speaking freely or engaging in constitutionally-protected activities like protesting, simply because of the specter of social media surveillance. They<\/a> already<\/a> are<\/a>. <\/p>\n","protected":false},"featured_media":86099,"template":"","content_type":[7251],"area-of-focus":[77,806,799],"class_list":["post-108372","insight","type-insight","status-publish","has-post-thumbnail","hentry","content_type-blog","area-of-focus-free-expression","area-of-focus-government-surveillance","area-of-focus-us-surveillance"],"acf":[],"_links":{"self":[{"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/insight\/108372","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/insight"}],"about":[{"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/types\/insight"}],"version-history":[{"count":5,"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/insight\/108372\/revisions"}],"predecessor-version":[{"id":108377,"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/insight\/108372\/revisions\/108377"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/media\/86099"}],"wp:attachment":[{"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/media?parent=108372"}],"wp:term":[{"taxonomy":"content_type","embeddable":true,"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/content_type?post=108372"},{"taxonomy":"area-of-focus","embeddable":true,"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/area-of-focus?post=108372"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}