Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:10):
- JohnHello, and welcome to another episode of Search Off the Record, a podcast coming to you from the Google Search team. My name is John, and today we have Lizzi and Gary. Say hi.
- LizziDon't tell us what to do.
- GaryYeah. Hi.
(00:31):
- JohnThank you. Thank you. So nice to have you here. Last time we talked with Dave Smart, and apparently we also talked about crawling, but I was not here.
- GaryFor the listeners, John is trying to figure out Lizzi's notes because Lizzi started reading this, or wanted to read this, and then John was like, "No, I'll do it."
(00:58):
- LizziHe would not let me do the intro so that we are left with this intro, which is very confusing.
- JohnOkay, go for it, Lizzi.
- LizziOkay, so this is supposed to be a part two for people who are not following along, I guess. We had episode one with Dave Smart to talk about what is crawling, and we sort of did a background, I don't know, set-the-stage episode. Since then, Gary has posted too many times about crawling on LinkedIn, so we thought maybe we could talk about that. What?
(01:29):
- GaryWhat? What do you mean? A) Why was I not told that Dave was part one? 2) What does it mean I'm posting too much or too many things about crawling? What does two mean?
- Lizzi2222. T-W-O.
- GaryYour English construction is weird.
- LizziI heard that you posted about crawling, but I actually didn't--
(01:50):
- GaryYou heard?
- Lizzi Yes, I heard. You told me that you posted about crawling on LinkedIn and you got some surprising responses from people. Surprising in more senses than one.
- GaryAre you sure?
- LizziI'm pretty sure it was you.
- GaryOh.
- LizziI also heard that this year you were going to work on crawling.
(02:15):
- GaryWhat?
- LizziIs that a true statement?
- GaryYeah.
- LizziAt the beginning of the year, you thought maybe you would do something with crawling?
- GaryWell, yeah. I mean, we've already done some things, I think. But, in general, yes, I think we should do more on crawling in the sense that we should make it more... Well, we should crawl somehow less, which would mean that we crawl more.
(02:40):
- LizziI think you did post about that on LinkedIn, and then Barry cross-posted that "Google wants to crawl less." And then the internet broke because they were like, "What?"
- GaryAh, Barry from--
- LizziThis is like Barry from Search Engine Roundtable.
- Gary Yeah. Right.
- LizziYes. Barry Schwartz.
- JohnOh, cool. I mean, it's something I hear from SEOs a lot where they think, well, Google usually crawls more when he thinks my site is good.
(03:06):
- LizziHe?
- JohnGoogle, the Googlebot.
- GaryThey/them.
- LizziGooglebot accepts all pronouns.
- JohnOkay, then that was fine. I'm sorry.
- GaryAre you a spokesperson for Googlebot?
- LizziYes.
- JohnOkay, so. So, people thought that Googlebot usually crawls more when Googlebot thinks that something is good. The assumption is that you can turn it around as well and be like, "Well, I will push Googlebot to crawl more and then Googlebot will think my site is actually good." Which...
(03:46):
- GaryNo.
- LizziI mean, is that like a chicken and an egg thing though?
- JohnWhat?
- LizziLike, does your site have to be good first for Google to then crawl it more? Or does Google crawling more mean your site is good?
- JohnI don't know, Gary. What do you think?
- GaryWhy me?
- JohnIf I can make Googlebot crawl my site more because of my fancy robots.txt file, does that mean that my site will be better in Search?
(04:11):
- GaryNo. I mean, why would it?
- LizziI mean, it sounds like people are using this as a proxy. Like, if Google is interested in my site more often, then that means my stuff is good.
- GaryIt could also mean that there's an infinite space on the site.
- JohnOh, that's a cool hack. I'll put a calendar script on my site.
(04:33):
- GaryNo, no. No. Sit down, please.
- LizziHas this always been a thing, that people think that more crawling equals good?
- GaryYes, I think so. I mean, one of the presentations that we keep doing at Search Central Live events is actually about myth busting. And it has at least one or two questions about crawling. And then it's like, "Oh, Google is crawling my site a lot, so my site must be very good." And it's like, "Nah, not really." It can mean many things, but generally if the content of a site is of high quality and it's helpful and people like it in general, then Googlebot--well, Google--tends to crawl more from that site. But it can also mean that, I don't know, the site was hacked, and then there's a bunch of new URLs that Googlebot gets excited about, and then it goes out and it's crawling like crazy. Or we discover John's calendar script, and then we try to crawl every single URL for every day until 2077. It can mean other things as well than just quality. But then, on the flip side, if we are not crawling much or we are gradually slowing down with crawling, that might be a sign of low-quality content or that we rethought the quality of the site.
(06:08):
- LizziBut what if it's not changing, like the content? We go and crawl it, and they haven't made a change. Why would we need to go crawl that often again if they're not making a lot of changes?
- GaryI mean, we have to go back and see if it changed, right?
- LizziBut if we notice that it's not changing, do we then slow it down?
(06:29):
- GaryWell, we still have to go back.
- LizziBut would that result in, over time, less?
- GaryProbably, but I don't know. John has a site that he hasn't updated in like 72 years. I'm looking at the logs here.
- JohnSure.
- GaryAnd he could say:
- JohnIt still gets crawled. Yeah. I think it's challenging with those kind of sites because maybe it didn't get updated in the last couple of months, but maybe it gets updated in five minutes.
(07:00):
- LizziOkay, so Google still wants to check, just in case.
- JohnThat's my understanding at least. Yeah. I think with regards to the amount of crawling and the external perception, there's also the aspect of like a lot of sites have a lot of different pages, and then it's not so much that Google crawls one page very often. It's sometimes just like, well, if you have all of these pages and Google has never crawled them, then Google wouldn't be able to know what to do with it. So some of that perception of like, well, if only Google could crawl more, then it would see that I actually have some good content. I can kind of understand that.
(07:40):
- LizziIs it more about crawling more often?
- JohnMy assumption is that a lot of people just look at the Crawl Stats report in Search Console or server logs and just look at the number of requests over time. And then you don't necessarily see it's like, "Oh, it's looking at my home page every day," but more like, "It's looking at 500 pages every day." But which ones?
(08:05):
- LizziAre they hoping to see that just increasing over time? What's the ideal state from a site owner's perspective?
- GaryI think so.
- LizziBecause that also seems like it may be bad.
- GaryYou know that form that we link to on onesie, on developers.google.com/search, where you can report issues with Googlebot?
(08:25):
- LizziYeah.
- GaryThose reports end up in our inboxes. There we see sometimes that people are like, "Increase our crawl over time." And it doesn't work. We are not going to increase anyone's crawling if they write in through that form. Like, if there's some crawling emergency, then we would decrease the crawl volume for that site. But it's obvious that they want increased crawling over time. Some people do, anyway.
(09:00):
- LizziOkay, so you're saying that the form is there and you're supposed to use it only to report like too much, like your servers are being overloaded.
- GaryYeah, but, I mean, it's a form.
- LizziBut people are filling it out anyway and they're like, "Give me more!"
- GaryYeah, but it's a form. We are quite explicit about what you should use that form for. But then it's a form. So it's like people are going to people anyway. We get other requests as well, which we cannot satisfy, but we still get them.
(09:28):
- LizziHow would that work? Or have we ever considered a method like that, where people can ask?
- JohnAutomatically?
- LizziYeah.
- JohnWe had the setting in Search Console, but that was about limiting, so reducing the amount of crawl.
- LizziStill about a limit.
- GaryBut it's always about limiting, because the upper bound has to be determined by what the server tells us about how much it can handle.
(09:55):
- LizziWhat if it says, "I can handle everything?"
- GaryWell, it would not be able to. At one point, we would crash the server and we wouldn't be able to connect to it. That would be a very clear signal that we have to slow down.
- LizziOkay, so is it more of a site owners not understanding that dynamic, like what it means to request more, that that effect will then be that their servers crash?
(10:26):
- GaryI think the confusing part is that there are two parts to this. One is what the server can handle, and then there's the quality aspect to it. The content on the site has to be of high quality and useful for users or helpful for users. And then the Search demand for crawling would increase and then we would crawl more potentially. And then the technical part comes into play, like how much can we actually crawl without harming the server?
(11:05):
- LizziOkay.
- GaryBut it's not infinite. There has to be a limit because the server doesn't have infinite resources.
- LizziRight. But this year you thought we can optimize there, that there's something that we can do?
- GaryI mean, we were thinking about this for a long time. There were always crawl optimizations going on. If you look at the early blog posts on onesie, even in the early days, 2006, 2007, they--Vanessa Fox, former product manager for the old Webmaster Tools, and the team--were already thinking about how to optimize crawling more.
(11:52):
- LizziIs it usually the same sort of approach, like we want to be more efficient about what we're doing, or is it like a timing thing? Is there something new that we could be doing that we haven't thought of before?
- GaryIt's a combination, I guess. Sitemaps, I don't know. John was involved with sitemaps early on, but sitemaps were one of those optimizations. And, on our side, I don't know, like 304 and If-Modified-Since, that was something that had to be implemented on our side, the support for it, I mean.
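For listeners who haven't seen one, a minimal sitemap file looks roughly like this (the URL is a placeholder); the optional lastmod value is the kind of hint crawlers can use when deciding whether a page is worth re-fetching:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/post-1</loc>
    <lastmod>2024-05-01</lastmod>
  </url>
</urlset>
```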
(12:32):
- JohnCool. And, with If-Modified-Since, is that something that you see people are doing correctly or is that something others should be doing differently?
- GaryWait. If-Modified-Since, that's a request header, so it's us doing it correctly.
(12:54):
- JohnWell, it could be that the site says, "Oh, yes, everything changed today."
- GaryOh I see.
- JohnIt's like we asked, "Has it changed since yesterday?" and the site said, "Yes, yes." It's like, "You must take a look."
- LizziI see. Because it could be something that's automatically in place, like, "Yes, I update a link," but then my CMS says, "Okay, today is the new date that I published content." And so therefore it gets interpreted that I made a change, therefore come look at it.
(13:21):
- GaryThe response to an If-Modified-Since would be a 304, right?
- JohnI think a 304 is not modified. I don't know offhand. I would have to ask my friend Gemini.
- Lizzi304. Not modified. HTTP server response code.
- JohnOkay, so 304 would be like, "No, Google. Nothing has changed here." And a 200, I think would be the response then if it's like, "Okay, here is actually the new version."
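To make the exchange concrete, here is a minimal sketch (not Googlebot's behavior or any real server's implementation) of how a server could decide between 304 and 200 when a conditional request arrives:

```python
from datetime import datetime, timezone

HTTP_DATE = "%a, %d %b %Y %H:%M:%S GMT"  # e.g. "Wed, 01 May 2024 12:00:00 GMT"

def respond(last_modified, if_modified_since):
    """Sketch of server-side conditional GET handling.

    Returns (status, body): 304 with an empty body when the resource has
    not changed since the client's timestamp, else 200 with the full page.
    """
    if if_modified_since:
        try:
            since = datetime.strptime(if_modified_since, HTTP_DATE).replace(
                tzinfo=timezone.utc)
        except ValueError:
            since = None  # malformed header: fall through to a full response
        # HTTP dates have one-second resolution, so compare at that granularity.
        if since and last_modified.replace(microsecond=0) <= since:
            return 304, b""  # Not Modified: headers only, no body
    return 200, b"<html>...full page...</html>"
```

If-Modified-Since has one-second resolution, which is why the comparison drops microseconds before comparing.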
(13:50):
- GaryRight. I think there are also caching directives that you can respond with. I don't remember the name of the Apache server module, but there are other caching directives as well that you can respond with. I think, on our side, it's implemented externally. It doesn't seem to be used enough, I think. Basically, even if we send out the If-Modified-Since request header, servers are responding with just 200, basically just ignoring it. I don't think that's necessarily a good thing. But then, at least at Google, there are a few products that probably prefer that. Probably.
(14:41):
- LizziHow so?
- GaryLike, for example, news. I would imagine that they don't want, especially for live news, live blog stuff.
- LizziLike really time sensitive things that are happening, like as a cricket match is happening or something.
- GaryWe don't want to cache those, I guess. I don't know, but this is exactly what I want to analyze: how much 304 is used by external sites, how many If-Modified-Since headers we are sending out with our fetches, and then try to encourage people to use it more, because it can save quite a bit of bandwidth and, by definition, also resources for the servers.
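A first pass at that kind of analysis, on the site-owner side, could be as simple as counting status codes in access logs. The log lines and format below are invented for illustration; real log formats vary:

```python
# Hypothetical access-log lines; the status code follows the quoted request.
LOG_LINES = [
    '203.0.113.7 - - "GET /a HTTP/1.1" 200 51234',
    '203.0.113.7 - - "GET /b HTTP/1.1" 304 180',
    '203.0.113.7 - - "GET /a HTTP/1.1" 304 180',
]

def status_share(lines, status="304"):
    """Fraction of requests answered with the given status code."""
    statuses = [line.split('" ')[1].split()[0] for line in lines]
    return statuses.count(status) / len(statuses)
```

Here two of the three requests were answered with 304, so the share is two thirds; a low share on a mostly static site suggests conditional requests are being ignored.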
(15:25):
- LizziI see.
- GaryLike, on our side, we don't particularly care about the resources for crawling.
- LizziHow does it save resources? Is it because we can just do a little quick check and then we don't have to fully look at everything?
- GaryYeah, exactly. With a 304 response, if I remember correctly, the RFC, the standard, says that you don't put the HTTP response body in it. There should not be a response body. It's just the headers. So, basically, you send back, what, like 1,000 bytes instead of 100,000 bytes or whatever it is.
(16:01):
- LizziIt's a lot smaller coming back, and therefore not taking up as much bandwidth on our side.
- JohnYeah, and I guess the server doesn't need to compile the full page.
- GaryYeah.
- JohnThe server can just do the lookup in a database and be like, "Oh, nothing new. Move along," without having to actually compile the whole thing. It makes it more efficient, I imagine, for both sides.
(16:21):
- GaryIf you are thinking about the CMS that we are using for onesie, there are lots of moving parts on onesie. Like, for example, if you go to, I don't know, the blog home page, then you have the TOC on the left, or whatever we call it, the book on the left; you have the title; you have the metadata that we have in the HTML. We have the metadata from dev site, the CMS that we use, and then you have the content. And then, for all of those, you have to make these weird calls to pull things in and compile them. And all those calls cost resources. But then, you could just make that one call that John said, that just checks whether anything changed. Just one call. Just one call.
(17:06):
- LizziAnd it doesn't matter--that's step number two, to figure out whether or not something actually did change. We're just checking anyway. It doesn't matter if the change is big or not. I assume the next step would be to see, like, "Okay, well, what changed?"
- JohnWell that's, I think on the server side, the server basically just says, "Something changed. Here's everything." It's not like, "Here's a part of the page that has changed."
(17:34):
- GaryWhich would be interesting.
- LizziIs that a theoretical space that we could look at? Like, if we could say, "Hey, actually it was just this one paragraph. That's where I made the change. You don't need to look at everything. Just this one thing changed." Would that be helpful, if that could be compartmentalized somehow?
- JohnFrom my point of view probably, but implementing it sounds like a nightmare. I don't know, maybe Gary wants to do it anyway.
(18:03):
- GaryWhat?
- LizziI mean, is this something that you would be thinking about or is this like, nope, crazy?
- GaryNo, it's not. I mean, it's crazy, but it's the kind of crazy that we actually like. What?
- JohnGood. Okay.
- GaryIt's a challenging task that could save lots of resources for the internet. Not on our side because, again, I wouldn't say that we have infinite resources, but especially with crawling, it's a tiny, tiny, tiny fraction of our resource usage. I ran out of air. Crawling is a tiny fraction of our resource usage. But from an external perspective, where they have to render the pages and make all those calls to make one page, just sending back the part that actually changed, that sounds like a cool thing. Especially since, even in older HTTP versions, I think starting from 1.1, there was chunked transfer. Basically, you could just say, "From this segment to this segment, this is the part," and then you could just give that to the client from the server. But it was more complicated then. I think it was slightly broken. Every now and then, the chunks would get messed up. But then someone pointed out on LinkedIn that, at the IETF, the Internet Engineering Task Force, which is the standards body where the Robots Exclusion Protocol also lives, someone submitted a proposal for a new kind of chunked transfer. I'm watching that closely to see where it's going.
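The classic HTTP/1.1 chunked transfer coding Gary is describing frames the body as length-prefixed segments. A toy round-trip, not a production parser, and not the newer IETF proposal he mentions:

```python
def encode_chunked(parts):
    """Encode byte strings with HTTP/1.1 chunked transfer coding: each chunk
    is a hex length line, CRLF, the data, CRLF; a zero-length chunk ends it."""
    out = b""
    for part in parts:
        out += f"{len(part):x}\r\n".encode() + part + b"\r\n"
    return out + b"0\r\n\r\n"

def decode_chunked(body):
    """Decode a chunked body back into the original byte stream."""
    data, i = b"", 0
    while True:
        j = body.index(b"\r\n", i)          # end of the hex length line
        size = int(body[i:j], 16)
        if size == 0:
            return data                      # terminating zero-length chunk
        data += body[j + 2 : j + 2 + size]   # chunk payload
        i = j + 2 + size + 2                 # skip payload and trailing CRLF
```

Round-tripping a body split into two chunks reproduces the original bytes, which is all this framing promises.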
(19:59):
- LizziHow are they currently thinking about it? Is it like navigation, up here, and then the middle of the page is here? Or is it something more like, "This stuff changes really frequently."
- GaryThat's my naive thinking. I think it's more complex than that. I would need to check the current draft to tell you how it actually works. But my naive thinking was that: like, "Here's the header. Here's the sidebar." I'm fairly certain it's not that simple.
(20:33):
- JohnI imagine that's tricky because you almost have to render the page to understand the DOM if you're saying like, "Oh, the header changed." whereas, from a technical point of view, if you can say, "Oh, bytes 500 to 700 are now this thing," then that's easier.
- LizziBut people don't reliably put it in that same spot because it's free-form.
(20:57):
- GaryIt's more interesting, and more reliable most likely, because it's not up to the person. It's down to the server. And, of course, you can hack around with the server, like both John and I did stupid things with our servers to fool people. Okay.
- LizziInteresting.
- GaryApparently John didn't. Okay. I take it back.
- JohnNever. Never.
(21:17):
- GaryYou can make the server do stupid things, but you need quite a bit of knowledge about, well, in my case, I was on Apache about server modules, like Apache modules, and especially C to be able to modify modules enough to make them do something stupid.
(21:40):
- JohnI think it's also challenging because it mixes the content with the infrastructure. It's almost like different levels of interaction. But I think it would be cool if people could say, "Oh, actually, only this news item changed."
- LizziYeah, or like, on a product page, my pricing, "This little area is the thing that is changing all the time, but the description of this pair of shoes is the same."
(22:09):
- JohnExactly. Yeah, I don't know, from a personal point of view, I think that would be cool. Yeah, and the chunked transfer, I think is pretty common. It's also done for videos, I think, or large files where you have to--
- GaryFor large files, for sure.
- JohnYeah.
- GaryAlso I think POSTs, like POST methods.
(22:31):
- JohnYeah. I don't know that sounds pretty cool. What other kinds of optimizations do you see happening with regards to crawling?
- GaryMaybe better URL parameter handling.
- LizziWhat?
- JohnOh, okay.
- LizziLike hashtags.
(22:52):
- GaryOh, hashtags. Hashtags are complicated, and we have a very complicated relationship with them, I think.
- JohnDo you mean hashtags or like, what is it, anchors, like the pound symbol?
- LizziOh, sorry. The pound symbol. The hash symbol?
- GaryYeah. I just assumed that you meant that.
- LizziSorry, I did mean that.
- GaryThe problem with them is that they only live on the client side.
(23:17):
- LizziOkay. And why is it a problem?
- JohnOh, this is because you hate JavaScript, right?
- GaryWhat? I mean, yeah, but what?
- JohnThey're used for JavaScript sites, right?
- LizziFor the whole client side / server side, why is it a problem that it's on the client side? It's harder for us to get there?
(23:38):
- GaryPretty much.
- LizziOkay. It's further away from us.
- GaryWell, technically Googlebot cannot get there.
- JohnWithout rendering.
- GaryWithout rendering.
- LizziI see. Okay.
- JohnAnd the URL parameters that you mentioned, that would be something like the URL Parameters tool that we used to have, but more in a protocol format where you say, "This parameter is optional"?
(24:06):
- GaryOh, that's a good idea.
- LizziCan you give me like a real example? Like, what do we mean by URL parameter handling?
- GaryLike hl=en and whatever parameters that we have on onesie and on support.google.com.
- LizziOkay, but what would make it hard, I guess, the fact that we're using those?
(24:26):
- GaryBecause, technically, you can add an almost infinite--well, de facto infinite--number of parameters to any URL, and the server will just ignore the ones that don't alter the response. Basically, it will just discard them. But that also means that, for every single URL that's on the internet, you have an infinite number of versions.
(24:52):
- LizziBecause all of this stuff is tacked on?
- GaryBecause you can just add URL parameters to it.
- LizziOkay.
- GaryThe server is instructed to ignore them. It will not alter the content that it returns. But it also means that when you are crawling--and crawling in the proper sense, like "following links," and I'm air quoting here--then everything-- Why are you laughing? Like, we are not following links properly. It's just that we are collecting links and then we are going back.
(25:22):
- LizziWell, you imply that there's an improper use of crawling or an improper way to crawl.
- GaryWell, yeah, it's my pet peeve. On onesie, we keep saying Googlebot is following links, like, no, it's not following links. It's collecting links, and then it goes back to those links. It's not like properly following links. The picture that we are painting is that Googlebot is like hopping from--
(25:44):
- LizziIs it because it's going into the anthropomorphic territory where Googlebot thinks, Googlebot sees, Googlebot--
- GaryUnderstands.
- LizziUnderstands, follows, walking around on all eight legs.
- GaryWait!
- LizziSix legs. How many legs?
- JohnEight? Don't judge.
- LizziWhat do you mean? There's got to be a correct answer for this for spiders.
(26:06):
- GaryNine.
- LizziNo, spiders have an even number of legs. URL parameters: why is this a problem in terms of crawling efficiently? It sounds like it's because we're maybe wasting time looking at parameter versions of the links when it could be the same thing, but sometimes it is different.
(26:26):
- GaryExactly. Sometimes it is different, and that's the problem.
- LizziWe don't know based off of the URL.
- GaryWe basically have to crawl first to know that something is different, and we have to have a large sample of URLs to make the decision that, "Oh, these parameters are useless."
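One way a crawler, or a site's own tooling, can collapse those parameter variants is to canonicalize URLs against a list of known-irrelevant parameters. The parameter names here are hypothetical; as Gary says, a real crawler has to learn them per site by sampling:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters this site's server ignores (tracking, session IDs).
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonicalize(url):
    """Drop parameters known not to alter the response, so the infinite
    parameter variants of one page collapse to a single crawlable URL."""
    scheme, netloc, path, query, _frag = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if k not in IGNORED_PARAMS]
    kept.sort()  # stable ordering, so a=1&b=2 and b=2&a=1 dedupe together
    # Fragments never reach the server, so they are dropped as well.
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))
```

Deduplicating a crawl frontier on the canonicalized form, rather than the raw URL, is what keeps the parameter space from exploding.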
- LizziOkay, and there's no way for site owners to tell us how they're grouped now?
(26:51):
- GaryDo you know how we like to remove features from Search Console?
- LizziYes, I remember that we took it away because it was not used, I think.
- GaryI mean, it was not used.
- LizziYes. And now it seems like there's a need to be able to control this. But they weren't using the tool, so maybe there needs to be some other kind of solution that would be--
(27:11):
- GaryRight. But like, if someone is complaining that we are over crawling them because they have one of these weird URL spaces with an infinite number of URL parameters, then we could just tell them that, "Okay, use this method to block that URL space."
(27:34):
- LizziWhat kind of method?
- GaryEven robots.txt could be used. It doesn't have to be like--
- LizziOh, like, "Anything that is after this symbol, don't look at it"?
- GaryOr this combination or something like that.
- LizziInteresting.
- GaryBecause robots.txt is surprisingly flexible in what you can do with it.
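As an illustration of that flexibility, hypothetical robots.txt rules blocking a session parameter and an auto-generated calendar space might look like this (the wildcard patterns follow the Robots Exclusion Protocol, RFC 9309):

```
# Hypothetical rules: block any URL carrying a sessionid parameter,
# and everything under an auto-generated calendar.
User-agent: *
Disallow: /*?*sessionid=
Disallow: /calendar/
Allow: /
```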
- LizziAnd that's something that we could do now?
- GaryYeah, we just have to figure out what to say.
- LizziOh, interesting.
(27:55):
- GaryAnd I don't have brains to think about it.
- LizziOkay.
- JohnOh, so the solution to crawling is more documentation.
- LizziOh.
- GaryJob security.
- LizziDarn. Wait wait, wait. We haven't asked John enough questions about what his ideas are.
- GaryYeah. John, what are your ideas?
(28:15):
- LizziYou keep asking Gary.
- GaryTell us your ideas.
- LizziHave you had any harebrained ideas?
- JohnHair-brained ideas?
- LizziIt's top of mind for me.
- JohnTop of mind?
- LizziI'm so sorry. Oh my God. What's top of mind for you?
(28:40):
- JohnI think it's challenging because I like sitemaps, for example, and apparently people also like sitemaps and they submit them in lots of really weird and broken ways. So that makes me a little bit jaded, almost, in the sense that it's like, "We will come up with a new method to make crawling more optimal for you." And then everyone's like, "Well, I will just use it incorrectly."
(29:06):
- GaryYeah.
- JohnSo that's kind of the challenge. On the other hand, I also would like to make it so that Google or other search engines don't have to guess how to crawl optimally.
- LizziLike it should be more clear and easy for other search engines to follow. Why do we need to go reinvent the wheel?
(29:28):
- JohnMaybe. Maybe, I don't know. But I think also just the awareness of everything around crawling, I think that makes a big difference. I noticed that, for example, when I launched my first crawler back in the year 1822, it ran on this obscure operating system called Windows. When I initially launched it, I noticed that almost every site you put in there to crawl, it goes crazy, like finds all of this crazy stuff. And it essentially shows how complicated the web is, like all of these weird links, and they go in all different places, and some of them are broken, some of them are infinitely long. I think just generally the awareness of how crawling works has gotten a lot better over that time. People use common content management systems, like WordPress, now, which make crawling a lot easier. And maybe some of that awareness just has to go a little bit further, so that more people understand potential pitfalls and then think, "Oh, this parameter that I want to add for tracking, maybe I shouldn't, or maybe I should do it in a different way so that it doesn't affect crawling."
(30:53):
- LizziLike, what could be the consequence of my actions of implementing this thing? It could cause a domino effect somewhere else.
- JohnYeah, I think for smaller sites, you can do a lot of things wrong and you have a thousand URLs instead of ten, that doesn't change anything. But, if you're a giant e-commerce site and suddenly you have 100 billion URLs instead of one million, then that's kind of a big difference. So some amount of awareness from both sides I think is important.
(31:25):
- GaryAlso the thing about, "Okay, but I have enough resources, so just go ahead and crawl them anyway." But then it's like we could spend that time on URLs that will actually help your site, because, sure, I don't like when people think about crawl budget, but we are still spending time on crawling.
(31:54):
- LizziAnd you could apply it in a productive way. It's not just an exponential, just everything, firehose, and you will catch also the garbage stuff that doesn't matter. It's not helping anyone.
- GaryYeah.
- LizziIf you had to say one thing that you wish people wouldn't do or your pet peeve, what would it be? John, you can go first.
(32:17):
- JohnMy pet peeve at the moment--and "at the moment" means I recently received some messages from folks about this--is people who don't look at the Crawl Stats report in Search Console, because there's a lot of information in there if you just look at it. For example, response time is in there, average response time.
(32:48):
- LizziAre they just coming to your inbox and saying, "John, what is my average response time?"
- JohnNo.
- LizziAnd you're like, "Hello, you can just go look it up," or what kind of question?
- GaryOh no, he actually answers like, "792 milliseconds."
- JohnNo. Well, the problem, for me, is when it's not milliseconds anymore, they're like, "Oh, why are you not crawling my site enough?" And then I look at the stats and it's like, "Oh, it takes, on average, three seconds to get a page from your server. That's actually a very long time." We don't really tell people like what they should be aiming for there.
(33:18):
- LizziI see. Is it an on-and-off thing? Like, it's either working or it's not, and if it takes two seconds versus ten seconds, we're not showing it as broken.
- JohnWell, I mean several seconds is actually fairly long. If you want us to crawl a million URLs from your website and, instead of 100 milliseconds, it takes like ten times as much or 20 times as much, that's a big difference. And that's something where, if you looked at those stats, then you could go to whoever's running your server and be like, "Look at these numbers. These numbers are objectively bad. You can improve them." And then they have something that they can work on, which is very different from a lot of other SEO things where it's like, "Oh, my relevance is not great." And then someone else on the server side is like, "Well, okay, I can't change that."
(34:10):
- LizziThis is more like a clear, black-and-white sort of number that you can take back and say, "Things are bad, please fix it."
- JohnExactly. And you can multiply the number of pages on your site by the response time. You're like, "This is a lot of time that is being wasted."
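John's multiplication is easy to sanity-check; the page counts and response times here are illustrative only:

```python
def total_crawl_hours(num_pages, avg_response_seconds):
    """Rough serialized fetch time: pages times average response time.
    Real crawlers fetch in parallel, so read this as relative cost,
    not wall-clock time."""
    return num_pages * avg_response_seconds / 3600

fast = total_crawl_hours(1_000_000, 0.1)  # about 28 hours of fetch time
slow = total_crawl_hours(1_000_000, 3.0)  # about 833 hours, a 30x difference
```

The absolute numbers matter less than the ratio: a 30x slower server means roughly 30x more fetch time spent per crawled page.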
- LizziOkay, so open the Crawl Stats report.
(34:30):
- JohnSo look at Search Console. Yeah.
- LizziAnd Gary.
- JohnWhat do you think, Gary?
- GaryWhat?
- JohnYou mentioned your pet peeve was people anthropomorphizing. That's your pet peeve that I do maybe.
- GaryYes.
- LizziBut, for the rest of the people, or in general, a pet peeve that you have about crawling that you wish that people either knew or a misconception that you see, like, "What the heck? If people would just do this, or stop doing this."
(34:59):
- GaryI don't know if I have a pet peeve really.
- LizziOr a hill you will die on.
- GaryI kind of want hosting companies to help their customers more when things go wrong. Because, I wouldn't say very often, but every now and then, we see sites complaining to us that Googlebot is not crawling them. And then we look at what's happening, and it's their DNS server blocking us, or their server is blocking us, or their network is blocking us, and then we are like, "We have no idea where it's blocking, but it's blocking, and it's on your side." And they are like, "No, because the hosting company was like, 'It must be you.'" But it cannot be us: we see that we cannot connect to your server. Why would we not want to connect to your server or your DNS or whatever? And it's like, "No, but the hosting company was like, 'It's on your side.'" I understand that, because of how hosting companies are set up nowadays, they are behind a CDN, which also eats up some of the trace information, or they are on elastic clusters that grow and shrink and, again, some of the traces are lost. But, still, if we--those of us who worked on networking or server management--could just spend more time telling people how connections are made, and then help people understand and also debug their problems, that would be fantastic. Because, if you know how a connection is made between a client and a server, then saying that the problem is on the client side when a client cannot connect to a server, that's a stretch.
(37:20):
- JohnSo you're saying more Search Console?
- GaryWhat's a Search Console?
- JohnMore features in Search Console.
- LizziI was hearing education ideas.
- JohnTell you when you're doing something wrong, or tell the site owner or the hoster.
- GaryWe should send more messages, but we should send all the messages on a single day.
- JohnOn a single day.
- GaryYeah, pile them up, and then on, I don't know, first day of the month, just send out all the messages.
(37:47):
- JohnI have a better idea. We post the messages on social media and then anyone can fix any site's problem.
- GaryI know. And then we tag.
- JohnWe tag people.
- GaryPeople. Yeah. "Hey, this is your site."
- JohnThis is your site, and we tag all the hosting companies.
- LizziOh, to come fix it, like hello, we can @ them directly, like the companies. No, that's too much.
(38:10):
- JohnI mean, sometimes the crawling problem is also on our side. We kind of have to accept that they will do the same thing.
- LizziMaybe it's a last resort. We were not able to contact you via this message, so we are now broadcasting.
- GaryOh, we did that before.
- JohnWe've done that before. We've also sent faxes before.
- LizziReally?
- JohnFaxes? Yes.
(38:32):
- LizziIs this like a setting? This would be great actually.
- JohnA great setting in Search Console?
- LizziIn Search Console. So, instead of email notification: what method would you like to be notified by? A fax option.
- JohnA fax. A fax number.
- LizziYes. Handwritten from John.
- JohnHandwritten from John.
- GaryWait. We want people to be able to read that.
(38:52):
- LizziYou have bad handwriting. I don't think I've ever seen your handwriting. I can't confirm.
- JohnSee.
- LizziI've never seen you write. Maybe it's only speech to text. All right. I think we are way over time, potentially. My timekeeper didn't gesture anything, so I'm not sure.
(39:12):
- JohnWe gestured a little bit.
- LizziA little bit, and I missed it because I can't see.
- JohnThat's fine.
- LizziOkay.
- JohnIt was fun. It was a good discussion.
- GaryOh, it was?
- JohnYeah.
- GaryOh.
- LizziWell, it was supposed to be painful. This was supposed to be--
- GaryWell, it was painful to me.
- LizziGood. Okay, well, that's it for this episode. Next time on Search Off the Record, we'll be talking with Mihai, another product expert, about working with the Search Console API. Thank you folks for listening, and goodbye.
(39:42):
- JohnGoodbye.
- GaryBuh-bye.