willmeyers' blog

Last night, >20% of Reddit links on Google weren't reachable

Insomnia strikes again!

I wrote a script that fetches the Reddit links shown on the first page of Google search results and checks whether each linked subreddit is private.

I asked ChatGPT to generate a sample of 100 common Google searches and appended "reddit" to the end of each one.
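Preparing the search terms can be sketched roughly as follows. This is only an illustration: `build_terms` is a hypothetical helper, and the two sample searches are taken from the results table rather than from the full ChatGPT list.

```python
def build_terms(searches, suffix="reddit"):
    """Return each non-empty search with the suffix appended (hypothetical helper)."""
    return [f"{s.strip()} {suffix}" for s in searches if s.strip()]


# Two sample searches; the real list had about 100 entries.
sample = ["best chair", "funny meme"]
terms = build_terms(sample)

# One term per line, the layout the script expects in terms.txt.
print("\n".join(terms))
```

Writing the joined output to terms.txt would give the script its input file.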

Results of Survey

index	total	private	percentage	term
0	10	4	40.0	django or ruby reddit
1	10	3	30.0	best chair reddit
2	9	0	0.0	funny meme reddit
3	10	0	0.0	reverse string reddit
4	10	6	60.0	iOS or Android reddit
5	10	4	40.0	best laptop for coding reddit
6	10	2	20.0	Python or Java reddit
7	10	4	40.0	web development framework comparison reddit
8	5	5	100.0	best budget headphones reddit
9	10	0	0.0	programming language comparison reddit
10	10	4	40.0	JavaScript or TypeScript reddit
11	10	0	0.0	best travel destinations reddit
12	12	0	0.0	book recommendations reddit
13	10	0	0.0	best online coding courses reddit
14	10	4	40.0	fitness tips for beginners reddit
15	10	1	10.0	WordPress or Drupal reddit
16	10	2	20.0	best gaming mouse reddit
17	5	3	60.0	top programming subreddits
18	10	0	0.0	best coding bootcamps reddit
19	9	4	44.44	career advice for developers reddit
20	5	4	80.0	best budget smartphone reddit
21	10	1	10.0	comparison between Mac and Windows reddit
22	10	0	0.0	best hiking trails reddit
23	9	0	0.0	dieting tips reddit
24	10	2	20.0	React or Vue reddit
25	9	0	0.0	best podcast recommendations reddit
26	10	0	0.0	self-improvement tips reddit
27	10	0	0.0	best coffee brewing methods reddit
28	11	0	0.0	fiction book recommendations reddit
29	10	6	60.0	DIY home improvement projects reddit
30	5	3	60.0	best budget DSLR camera reddit
31	10	1	10.0	traveling with pets tips reddit
32	10	0	0.0	best vegetarian recipes reddit
33	10	1	10.0	coding bootcamp reviews reddit
34	10	1	10.0	top cities for remote work reddit
35	10	0	0.0	best board games for game nights reddit
36	10	1	10.0	music production software comparison reddit
37	7	1	14.29	home workout routines reddit
38	5	0	0.0	best budget travel destinations reddit
39	10	0	0.0	motorcycle vs car for commuting reddit
40	10	0	0.0	best productivity apps reddit
41	4	0	0.0	cryptocurrency investment advice reddit
42	10	10	100.0	anime recommendations reddit
43	10	0	0.0	best hiking gear reddit
44	10	6	60.0	remote job interview tips reddit
45	10	1	10.0	coding challenges for beginners reddit
46	10	0	0.0	plant-based diet tips reddit
47	10	5	50.0	best gaming keyboards reddit
48	10	1	10.0	freelancing tips and tricks reddit
49	14	0	0.0	foreign language learning resources reddit
50	5	1	20.0	budget-friendly home decor ideas reddit
51	5	5	100.0	best budget wireless headphones reddit
52	13	0	0.0	healthy meal prep ideas reddit
53	10	8	80.0	photography equipment recommendations reddit
54	10	0	0.0	web design vs web development reddit
55	10	0	0.0	book suggestions for self-improvement reddit
56	10	0	0.0	creative writing tips reddit
57	5	1	20.0	best budget fitness trackers reddit
58	10	3	30.0	advice for first-time homebuyers reddit
59	10	1	10.0	top meditation apps reddit
60	10	0	0.0	best hiking trails near me reddit
61	9	0	0.0	cooking tips and tricks reddit
62	10	1	10.0	programming languages comparison reddit
63	5	4	80.0	best budget-friendly smartphones reddit
64	10	4	40.0	self-care routine ideas reddit
65	10	1	10.0	best podcasts for personal development reddit
66	8	0	0.0	home office setup inspiration reddit
67	10	5	50.0	top indie games recommendations reddit
68	10	0	0.0	tips for starting a small business reddit
69	5	0	0.0	best budget-friendly laptops for students reddit
70	10	3	30.0	fitness motivation and tips reddit
71	9	0	0.0	gardening hacks and tips reddit
72	10	0	0.0	best sci-fi book series reddit
73	5	1	20.0	budget-friendly meal planning reddit
74	10	7	70.0	tips for mastering a musical instrument reddit
75	10	0	0.0	career change advice reddit
76	5	3	60.0	best budget-friendly home workout equipment reddit
77	10	7	70.0	professional networking tips reddit
78	4	0	0.0	top documentaries to watch reddit
79	10	9	90.0	outdoor photography tips and tricks reddit
80	5	0	0.0	best budget-friendly skincare products reddit
81	10	10	100.0	yoga for beginners resources reddit
82	10	0	0.0	cooking with limited ingredients recipes reddit
83	9	2	22.22	top DIY craft ideas reddit
84	10	2	20.0	car buying tips and negotiation strategies reddit
85	9	0	0.0	best personal finance books reddit
86	10	2	20.0	coding bootcamp success stories reddit
87	10	0	0.0	tips for improving focus and concentration reddit
88	5	5	100.0	home renovation ideas on a budget reddit
89	10	4	40.0	top podcasts for true crime enthusiasts reddit
90	10	0	0.0	beginner-friendly painting techniques reddit
91	5	0	0.0	best budget-friendly wireless routers reddit
92	10	7	70.0	tips for sustainable living reddit
93	10	2	20.0	finding remote freelance gigs reddit
94	10	0	0.0	book recommendations for historical fiction lovers reddit
95	5	0	0.0	budget-friendly travel gear reddit
96	5	3	60.0	best budget-friendly noise-canceling headphones reddit
97	9	3	33.33	tips for improving sleep quality reddit
98	10	2	20.0	italy hotels reddit

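Taking the mean of the percentage column gives 23.8%. This is an unweighted mean across search terms, so a term that returned 5 links counts as much as one that returned 10. Potentially over a fifth of the Reddit links Google serves to users lead to private subreddits.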
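The 23.8% figure is a mean of per-term percentages. Here is a minimal sketch of that calculation next to a pooled alternative (private links divided by all links). It uses only the first five rows of the table, so its output will not match 23.8%, and the function names are made up for illustration.

```python
# (total, private) pairs copied from the first five rows of the table.
rows = [(10, 4), (10, 3), (9, 0), (10, 0), (10, 6)]


def mean_of_percentages(rows):
    """Unweighted mean of each term's private percentage."""
    percentages = [private / total * 100 for total, private in rows if total]
    return sum(percentages) / len(percentages)


def pooled_percentage(rows):
    """Private links as a share of all links, pooled across terms."""
    total_links = sum(total for total, _ in rows)
    private_links = sum(private for _, private in rows)
    return private_links / total_links * 100


print(f"mean of percentages: {mean_of_percentages(rows):.2f}%")
print(f"pooled percentage:   {pooled_percentage(rows):.2f}%")
```

The two numbers differ whenever terms return different numbers of links, which is why it helps to say which one is being reported.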

Think or do whatever you wish with this info.

The script

import urllib.parse
import requests

cache = {}

def search_and_check_reddit():
    with open("terms.txt", "r") as f:
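        for term in f.readlines():
            # Drop the trailing newline so it is neither sent to Google nor printed.
            term = term.strip()
            if not term:
                continue
            headers = {
                "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            }
            # Pass the query through params so requests URL-encodes spaces and symbols.
            page = requests.get("https://www.google.com/search", params={"q": term}, headers=headers)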

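            # Take every href value, cut it at the first '&', and drop its first 7
            # characters; this assumes result links use Google's "/url?q=" prefix.
            links = [link.split('&')[0][7:] for link in page.text.split('href="')[1:]]
            # Keep only absolute URLs that point at Reddit.
            links = [link for link in links if link[:4] == "http" and "reddit.com" in link]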

            total = len(links)
            private = 0
            for link in links:
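                path_parts = urllib.parse.urlparse(link).path.split('/')
                # Paths normally look like /r/<subreddit>/...; fall back to the
                # whole link as the cache key when the path is shorter than that.
                subreddit = path_parts[2] if len(path_parts) > 2 else link

                # Fetch a subreddit only the first time it is seen, then reuse the result.
                if subreddit not in cache:
                    response = requests.get(link, allow_redirects=True)
                    # The script treats this marker in the page as the sign of a private subreddit.
                    cache[subreddit] = "private-community-modal" in response.text
                if cache[subreddit]:
                    private += 1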
                
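            # Avoid dividing by zero when a search returned no Reddit links.
            percentage = private / total * 100 if total else 0.0
            print(f"{total}\t{private}\t{percentage:.2f}%\t{term}")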

search_and_check_reddit()
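Not every reddit.com link has the /r/<name>/... shape the script expects. As a sketch, a hypothetical helper (`extract_subreddit` is not part of the original script) could return None for such links so the caller can decide whether to skip them:

```python
from urllib.parse import urlparse


def extract_subreddit(url):
    """Return the subreddit name from a reddit.com URL, or None if there is none."""
    # Drop empty segments produced by leading, trailing, or doubled slashes.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0].lower() == "r":
        return parts[1]
    return None


print(extract_subreddit("https://www.reddit.com/r/django/comments/abc/title/"))
print(extract_subreddit("https://www.reddit.com/user/someone"))
```

Whether links without a subreddit should count toward the total is a judgment call; the original script counts every Reddit link it finds.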