3/30/2023

Pricetracker system

The prices on online shopping sites such as Amazon and Flipkart change a lot, and quite frequently. This cuts both ways: if you're vigilant you can grab some lucrative deals, but if you're not careful you might end up buying something at a higher price than normal. Tools like Keepa can be handy in this regard. Keepa shows you a graph of the price history of a particular product so that you can make a good decision, and it can even send you an alert when the price goes down! But I've found Keepa can be a bit unreliable: it seems to track prices at a much coarser granularity, and as a result it can miss some short-lived deals. Also, it doesn't track all products, only the popular ones. So I decided to build something like it myself, just the bare bones, mostly for my own usage. In this post I'm going to share in detail how I built it, including the architecture choices I made and why I made them.

The Architecture Overview

Before I start explaining the application, here's a simple architecture diagram I made of the whole setup. I'm putting it at the start rather than the end because I just want to show what's to come; the article will go into detail about everything.

Note - You can find the full source code of the application on GitHub, and you can refer to it as you go through the article.

From a functional standpoint, the application would do the following:

- Track the prices of only the products the users have requested tracking for.
- Send alerts to the users whenever the prices of the products they're tracking change.

I decided not to implement any user authentication, as that would increase the complexity a lot. It can always be added later if I feel like it.

As for the technical architecture, the product would be:

- Built with AWS using a serverless architecture.

I decided to use Python as the programming language, along with the serverless framework to ease deployment. So without further ado, let's dive into the core of the discussion.

My go-to tools for scraping are requests and lxml. You must have come across requests already; it might be the most popular Python library of all time. But in case you didn't know, it's an HTTP library with a very elegant and easy-to-use interface, and even the Python core team recommends it over their native HTTP library urllib.request. Using requests we can download the HTML of an Amazon or Flipkart product page, and then we'll use lxml to parse the HTML and extract the relevant information from it. There are some alternatives to lxml we could've used (beautifulsoup comes to mind), but lxml is much faster than its competition, mostly because it is a compiled C library at its core.

I'm not going to cover the scraping code I wrote in detail; instead I'm going to cover some interesting problems I faced and how I solved them. This article is an excellent quickstart guide to scraping using requests and lxml. If you're unfamiliar with these libraries or scraping in general, check it out and you'll be up to speed in no time.

I figured I'd need a unique identifier for every product I'll be tracking. I had noticed that both Amazon and Flipkart use a product id in their URLs, so I can either extract that id from the URL or use the entire URL itself as the identifier. But there is a problem with both of these approaches: I can't expect a user to give a clean URL as input to the application. They'll most likely paste URLs with lots of query parameters, tracking codes and whatnot. In a typical input URL, for example, the part starting with ref is redundant, despite it not being a query parameter. Also, as you might have noticed, sometimes the URL contains the product name as a slug and sometimes it doesn't. If we were to do the parsing manually, we would have to take care of all of these fringe cases ourselves, and even then we could not be sure whether we had covered all of them.

There is a very simple solution to this problem of getting a unique product URL. You see, search engines faced this problem a long time ago: as soon as websites started using dynamic routing, the URL was no longer a reliable way of fingerprinting a webpage. That's why it became an industry standard to include a canonical link in every page, which is just a link tag with the rel=canonical attribute in the head section of the webpage. Amazon product pages include such a canonical link tag.
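The canonical-link idea from this post can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the class and function names, the sample HTML, and the shop.example.com URL are all made up for the example. To keep the sketch dependency-free it parses with html.parser from the standard library rather than lxml; in the real scraper the HTML string would come from a requests download of the product page.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalLinkParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one is kept.
        if tag != "link" or self.canonical_url is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, and may be missing.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical_url = attributes["href"].strip()


def extract_canonical_url(html: str) -> Optional[str]:
    """Return the page's canonical URL, or None if the page declares none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical_url


# Made-up page standing in for a downloaded product page.
SAMPLE_PAGE = """
<html><head>
<title>Example product</title>
<link rel="stylesheet" href="/static/site.css">
<link rel="canonical" href="https://shop.example.com/example-product/p/ABC123">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_canonical_url(SAMPLE_PAGE))
```

Whatever messy URL the user pastes in, the page it resolves to should report the same canonical URL, so that value can serve as the product's unique identifier. The function returns None when no canonical link is present, and the caller would need a fallback for that case.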