{"id":4045,"date":"2019-04-30T21:53:07","date_gmt":"2019-04-30T19:53:07","guid":{"rendered":"https:\/\/wolf-of-seo.de\/?post_type=glossary&#038;p=4045"},"modified":"2023-08-23T11:45:49","modified_gmt":"2023-08-23T09:45:49","slug":"scraping","status":"publish","type":"glossary","link":"https:\/\/wolf-of-seo.de\/en\/what-is\/scraping","title":{"rendered":"Scraping"},"content":{"rendered":"<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2019\/06\/Was-ist-37.png\" alt=\"Scraping\" class=\"wp-image-5240\"\/><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"h-scraping-was-ist-das\"><strong>Scraping - <\/strong><strong>What is it?<\/strong><\/h2>\n\n\n\n<p>Web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> goes by several other names, depending on who is describing it: Screen <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span>, Web Data Extraction, Web Harvesting and more. Whatever it is called, it is a technique for extracting large amounts of data from websites. The data is pulled from various websites and stored locally, either for immediate use or, more commonly, for later analysis.<\/p>\n\n\n\n<p>The data is stored in a local file system or in database tables, depending on the structure of the extracted data. Most websites we visit regularly only let us view their content and generally offer no way to copy or download it. Copying data manually is about as efficient as cutting articles out of newspapers and can take days or weeks. 
Web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span>, by contrast, is the technique of automating this process: a script extracts data from the web pages of your choice and saves it in a structured format.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-scraping-wie-funktioniert-eine-web-scraping-software\"><strong>Scraping - How does web scraping software work?<\/strong><\/h2>\n\n\n\n<p>Web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> software automatically loads multiple web pages one after the other and extracts data as required. It is either built for one particular website or can be configured, through a set of parameters, to work with any website. With the click of a button, you can save the data available on a website to a file on your computer.<\/p>\n\n\n\n<p>Today, intelligent bots handle most web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span>. Unlike screen <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span>, which merely copies whatever is displayed on the screen, these bots extract the underlying HTML code and, with it, the data that the site serves from its database in the background.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-scraping-was-sie-dabei-beachten-sollten\"><strong>Scraping - <\/strong><strong>What you should consider<\/strong><\/h2>\n\n\n\n<p>While scraping is a great tool for gaining all sorts of insights, there are some legal aspects you should be aware of so you don't get into trouble.<\/p>\n\n\n\n<p><strong>1. 
Respect the robots.txt file.<\/strong><br>Always check the robots.txt file of any website you want to scrape. It contains a set of rules that define how bots should interact with the website. If you use <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> in a way that violates these rules, you may be operating in a legal gray area.<\/p>\n\n\n\n<p><strong>2. Be careful not to hit servers too frequently.<\/strong><br>Don't scrape continuously. Some web servers go down when the load is very high. Bots add load to a website's server, and once that load exceeds a certain point, the server can slow down or crash, ruining the website's user experience.<\/p>\n\n\n\n<p><strong>3. Scrape during off-peak hours where possible.<\/strong><br>To avoid adding to peak web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Traffic<\/span> and contributing to server downtime, scrape at night or at other times when you can see that <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Traffic<\/span> to the website is lower.<\/p>\n\n\n\n<p><strong>4. Handle the scraped data responsibly.<\/strong><br>Follow the website's guidelines: publishing copyrighted data can have serious consequences, so use the collected data responsibly.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-vorteile-des-scrapings\">Advantages of scraping<\/h2>\n\n\n\n<p><span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> can collect an enormous amount of data in a very short time. 
It can be used to extract a wide range of information at once, and the data can then be further processed and analyzed to provide useful insights. <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> is an efficient solution that allows people to quickly and easily extract data from the web without having to copy and paste it manually.<\/p>\n\n\n\n<p>It therefore offers a variety of advantages, such as:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Time savings:<\/strong> <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> makes it possible to collect large amounts of data quickly and efficiently without having to enter it manually.<\/li>\n\n\n\n<li><strong>Accuracy:<\/strong> Automation reduces the errors that manual copying can introduce.<\/li>\n\n\n\n<li><strong>Access to large amounts of data:<\/strong> <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> allows data from many different websites to be collected and aggregated, providing a more comprehensive data set.<\/li>\n\n\n\n<li><strong>Integration with other systems:<\/strong> The collected data can be easily integrated with other applications or systems for further analysis or reporting.<\/li>\n\n\n\n<li><strong>Cost savings:<\/strong> <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> makes it possible to collect data at lower cost or even for free, compared to other methods such as buying databases or paying for subscriptions.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-nachteile-des-scrapings\">Disadvantages of scraping<\/h2>\n\n\n\n<p><span class=\"\" 
data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> can be difficult, especially if the page you want to scrape relies on many queries or complex data structures. The page may also be protected by a CAPTCHA or other security measures that hinder scraping. In addition, scraping a page can put you in a legal gray area if you do not query it properly.<\/p>\n\n\n\n<p><strong><span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> therefore also has some disadvantages that should be taken into account:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Legally questionable:<\/strong> In some cases, <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> websites without the owner's consent is considered illegal. Before <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> a website, check the applicable laws and regulations and make sure that you have the owner's permission or that the <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> is legal.<\/li>\n\n\n\n<li><strong>Violation of the terms of use:<\/strong> Some websites have terms of use that prohibit <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span>. 
Scraping websites without adhering to these terms can lead to formal legal warnings.<\/li>\n\n\n\n<li><strong>Difficulty in processing unstructured data:<\/strong> Web pages are often unstructured and contain many different types of content, such as images, videos, and tables. This can complicate both the <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> itself and the processing of the collected data.<\/li>\n\n\n\n<li><strong>Changes to the website:<\/strong> When the structure or layout of a web page changes, the <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> tools used for data extraction may stop working. This may require updating the <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> tools or creating new ones to support the changed website.<\/li>\n\n\n\n<li><strong>Performance issues:<\/strong> <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> large amounts of data can affect the performance of websites and lead to problems such as slow loading times or even website downtime. It is therefore important to run the <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> in a way that does not degrade the performance of the target website.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-anwendungsfalle\">Use cases<\/h2>\n\n\n\n<p><span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> can be useful in many different industries. 
It can be helpful in price research, trend tracking, competitive analysis, online market research, search engine optimization (SEO), and lead generation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-beispiele\">Examples<\/h2>\n\n\n\n<p>One example of a use case is price monitoring. A company can use a <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> tool to automatically monitor the prices of its competitors. This helps it ensure that it always offers the lowest prices.<\/p>\n\n\n\n<p>Another example is online market research. A company can use a <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> tool to gather data about its customers, competitors, and industry to make informed decisions.<\/p>\n\n\n\n<p><strong>More examples of <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> activities:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The extraction of <strong>price information<\/strong> from <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">E-commerce<\/span> websites to track <strong>price trends<\/strong> or to compare competitors' prices.<\/li>\n\n\n\n<li>The collection of <strong>customer reviews<\/strong> and feedback from online rating platforms to assess the reputation of a company.<\/li>\n\n\n\n<li>The extraction of <strong>job offers<\/strong> from career websites to create a database of <strong>job opportunities<\/strong>.<\/li>\n\n\n\n<li>The collection of <strong>weather data<\/strong> from weather services to produce <strong>weather forecasts<\/strong> or to study the behavior of weather phenomena.<\/li>\n\n\n\n<li>The extraction of <strong>contact 
information<\/strong> of business executives from company profiles on sites like LinkedIn to find potential customers or business partners.<\/li>\n\n\n\n<li>The collection of <strong>news articles<\/strong> from news websites to build a news database or track news trends.<\/li>\n\n\n\n<li>The extraction of <strong>traffic information<\/strong> from traffic websites or apps to create traffic forecasts or identify traffic patterns.<\/li>\n\n\n\n<li>The collection of <strong>data from social media platforms<\/strong> to gain insights into users' opinions and preferences or to assess the performance of brands.<\/li>\n\n\n\n<li>The extraction of <strong>product information<\/strong> from comparison or rating websites to compare products or competitors' offerings.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-wie-plant-man-die-entwicklung-eines-web-crawlers\"><strong>How do you plan the development of a web crawler?<\/strong><\/h2>\n\n\n\n<p>Developing a web crawler is a process that is divided into several steps. Here are the main steps you should follow to develop your own web crawler:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Planning:<\/strong> Before you start the actual development process, think about what kind of data you want to collect and what kind of websites you want to crawl. You should also create a list of the URLs you want to crawl, as well as a list of the URLs you do not want to crawl (e.g. login pages).<\/li>\n\n\n\n<li><strong>Technical preparation:<\/strong> Before you start developing the crawler, you should make sure that you have the necessary tools and technologies. 
You'll probably use a programming language like Python, JavaScript, Java or C#, together with libraries like BeautifulSoup or Scrapy (both Python) or Cheerio (JavaScript).<\/li>\n\n\n\n<li><strong>Developing the crawler:<\/strong> Once you have completed the planning and technical preparation, you can start developing the crawler. This step includes implementing the code that requests the URLs on your crawl list, downloads the content of the pages and extracts the data you want to collect.<\/li>\n\n\n\n<li><strong>Testing the crawler:<\/strong> Once the crawler is developed, you should test it to make sure that it works as expected. Run it on a small number of websites and check that it collects the right data without errors.<\/li>\n\n\n\n<li><strong>Optimizing the crawler:<\/strong> Once the crawler is tested and verified, you should optimize it so that it works faster and more efficiently. For example, you can cache responses to reduce download time, or use multiple threads to increase the crawler's throughput.<\/li>\n\n\n\n<li><strong>Deploying the crawler:<\/strong> Once the crawler is optimized, you can deploy it to a server and run it regularly to collect the data you want.<\/li>\n<\/ol>\n\n\n\n<p>Note that developing a web crawler is usually an ongoing process: it will keep needing tweaks and adjustments depending on what kind of data you want to collect and what kind of websites you want to crawl.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-bekannte-websites-die-auf-webscraping-basieren\"><strong>Well-known websites based on web scraping<\/strong><\/h2>\n\n\n\n<p><strong>1. Google:<\/strong> The mother of all scrapers! Google crawls billions of web pages every day in order to keep its <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Index<\/span> up to date. 
The <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Googlebot<\/span> collects information from websites, and Google uses it to determine the order of the <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Search results<\/span>. Imagine a huge pile of books, with Google reading every one of them so that you can find the exact page you're looking for.<\/p>\n\n\n\n<p><strong>2. <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Wayback Machine<\/span>:<\/strong> It's like a time machine for the Internet! The <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Wayback Machine<\/span> of the Internet Archive (archive.org) archives billions of web pages so users can see what they looked like in the past. It's like having a photo album for each web page and being able to flip back in time to see what it looked like years ago.<\/p>\n\n\n\n<p><strong>3. Price comparison sites:<\/strong> Sites like idealo or Geizhals regularly scrape online stores to gather the latest prices and deals. It's like asking every store in town for the best price without taking a step.<\/p>\n\n\n\n<p><strong>4. Travel booking sites:<\/strong> Platforms like Skyscanner or Kayak scrape flight, hotel, and rental car data from various providers to give users an overview of the best deals. It's like having a personal travel advisor who checks all the options for you and presents the best deals.<\/p>\n\n\n\n<p><strong>5. Job portals:<\/strong> Some job portals pull job ads from various company websites and other job boards. This way, they make sure they always have the most up-to-date listings. 
It's like searching all the newspaper ads and company websites for the latest jobs, but finding everything on a single platform.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-tools-um-website-scraper-zu-erstellen\"><strong>Tools for creating website scrapers<\/strong><\/h2>\n\n\n\n<p>There are many different websites and tools that you can use to create web crawlers. Some of the most popular ones are:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Scrapy:<\/strong> An open source web crawling and web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> framework written in Python. It is very powerful and can be used to extract large amounts of data from websites.<\/li>\n\n\n\n<li><strong>BeautifulSoup:<\/strong> Another open source library, written in Python and designed for web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span>. It allows you to <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">parse<\/span> the HTML code of a website and extract the desired data.<\/li>\n\n\n\n<li><strong>Selenium:<\/strong> A tool that allows automated testing of web applications. 
It can also be used to automate interactions with a website in a real browser and extract the results.<\/li>\n\n\n\n<li><strong>Octoparse:<\/strong> A visual web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> tool that allows you to extract data from websites without writing code.<\/li>\n\n\n\n<li><strong>Parsehub:<\/strong> Another visual web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> tool that allows you to extract data from complex websites.<\/li>\n\n\n\n<li><strong>Common Crawl:<\/strong> A nonprofit web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Crawling<\/span> service that regularly crawls a huge number of web pages and makes the data publicly available.<\/li>\n<\/ol>\n\n\n\n<p>There are many other websites and tools that you can use to create a <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Webcrawler<\/span>. Which one is best suited to your requirements depends on your specific project.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-wie-du-scrapy-nutzt-um-einen-website-scraper-zu-erstellen\"><strong>How to use Scrapy to create a website scraper<\/strong><\/h3>\n\n\n\n<p>Scrapy is an open source web crawling and web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> framework written in Python that allows you to extract large amounts of data from websites.<\/p>\n\n\n\n<p>To use Scrapy, you must first make sure it is installed on your computer. 
You can install it with the following command in your command line:<\/p>\n\n\n\n<div class=\"bg-black mb-4 rounded-md\">\n<div class=\"p-4 overflow-y-auto\"><code class=\"!whitespace-pre-wrap hljs language-bash\">pip install scrapy<br>\n<\/code><\/div>\n<\/div>\n\n\n\n<p>Next, you need to create a new Scrapy project. You can do this with the following command:<\/p>\n\n\n\n<div class=\"bg-black mb-4 rounded-md\">\n<div class=\"p-4 overflow-y-auto\"><code class=\"!whitespace-pre-wrap hljs language-bash\">scrapy startproject [projectname]<br>\n<\/code><\/div>\n<\/div>\n\n\n\n<p>This will create a new directory with the name of your project and in it you will find a basic structure for your project.<\/p>\n\n\n\n<p>Now you need to create a \"spider\". A spider is what Scrapy uses to extract data from a web page. You can create a new spider by executing the following command in your command line:<\/p>\n\n\n\n<div class=\"bg-black mb-4 rounded-md\">\n<div class=\"p-4 overflow-y-auto\"><code class=\"!whitespace-pre-wrap hljs language-bash\">scrapy genspider [spidername] [domainname]<br>\n<\/code><\/div>\n<\/div>\n\n\n\n<p>This creates a new file with the name of your spider in the \"spiders\" directory of your project.<\/p>\n\n\n\n<p>In this file, you must now define the URLs that you want to crawl and how Scrapy should extract the data from these URLs. Scrapy uses \"XPath\" or \"CSS Selectors\" to find and extract certain parts of the HTML page. You can extract the desired information from the HTML pages by defining the corresponding XPath or CSS selectors in your spider.<\/p>\n\n\n\n<p>Once you have everything set up, you can start your spider with the following command:<\/p>\n\n\n\n<div class=\"bg-black mb-4 rounded-md\">\n<div class=\"p-4 overflow-y-auto\"><code class=\"!whitespace-pre-wrap hljs language-bash\">scrapy crawl [spidername]<br>\n<\/code><\/div>\n<\/div>\n\n\n\n<p>Scrapy will now crawl the URLs you defined and extract the data you specified in your spider. 
You can then save the extracted data to a file or feed it directly into your application.<\/p>\n\n\n\n<p>This was a rough overview of how to use Scrapy for web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Crawling<\/span>. There are many settings and extensions that can be used, depending on your project. It is worth reading the Scrapy documentation thoroughly to use the full power of the framework.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-fazit\">Conclusion<\/h2>\n\n\n\n<p><span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> can be a useful way to extract data from the web. It is applicable in a variety of industries and use cases, and can help people gather a large amount of data in a short amount of time. However, it is possible to get into legal gray areas, so it is important to understand and take into account the legal implications of using <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span> tools.<\/p>","protected":false},"excerpt":{"rendered":"<p><span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\"><span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span><\/span> - What is it? 
Web <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\"><span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span><\/span> is known by many other names, depending on what a company wants to call it: Screen <span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\"><span class=\"\" data-gt-translate-attributes='[{\"attribute\":\"data-cmtooltip\", \"format\":\"html\"}]' tabindex=\"0\" role=\"link\">Scraping<\/span><\/span>, Web Data Extraction, Web Harvesting and more. Whatever you call it, it is a technique used to extract large amounts of data from websites. The data is extracted from [...]<\/p>","protected":false},"author":3,"featured_media":0,"menu_order":0,"template":"","meta":{"_acf_changed":false,"footnotes":""},"class_list":["post-4045","glossary","type-glossary","status-publish","hentry"],"acf":{"show_faq":true,"faq_q_1":"What is scraping?","faq_antwort_1":"Scraping is a technique in which automated tools are used to collect information from websites. This information is usually downloaded from the websites by web crawlers or web bots and stored in a structured form.","faq_q_2":"What is scraping used for?","faq_antwort_2":"Scraping is often used to collect and analyze large amounts of data from the internet. For example, it can be used to collect price data from various online retailers, to analyze trends, or to gather large amounts of product data for display in a search engine. 
Scraping can also be used to collect information from social media platforms, to analyze social networks, or to conduct market research.","faq_q_3":"How does scraping work?","faq_antwort_3":"Scraping works by using automated tools to search websites and collect the desired information. These tools, also called web crawlers or web bots, follow the links on a website and collect the desired information by analyzing the HTML code of the page. The collected information is then stored in a structured form, e.g. in a database or an Excel spreadsheet.","faq_q_4":"How can I use scraping for my own purposes?","faq_antwort_4":"If you want to use scraping for your own purposes, there are several options. One option is to use a web crawler or web bot to collect the desired information from one or more websites. You can also use dedicated scraping software developed specifically for scraping websites. Note, however, that you may need to obtain the consent of the website owners and comply with the applicable laws if you want to use scraping for your own purposes.","faq_q_5":"What types of tools are used for scraping?","faq_antwort_5":"There are various types of tools that can be used for scraping, including web crawlers, web bots and dedicated scraping software. Web crawlers are tools used by search engines to search the internet and index web pages. Web bots are special tools developed for scraping websites and are often used for automated tasks such as collecting price data or monitoring social media platforms. 
There is also dedicated scraping software, developed specifically for scraping websites, that companies or individuals often use to collect and analyze large amounts of data.","faq_q_6":"What risks are associated with scraping?","faq_antwort_6":"Scraping can involve some risks, including violating the privacy policy of the website in question, overloading the server, and the risk that the collected data is incomplete or incorrect.","faq_q_7":"What are the best ways to get started with scraping?","faq_antwort_7":"If you want to get started with scraping, it is best to first familiarize yourself with the basics, such as using programming languages and setting up a scraping bot. You should also read the privacy policy and terms of use of the website in question before scraping.","faq_q_8":"Why should you use scraping?","faq_antwort_8":"Scraping can help many companies by quickly collecting and examining relevant data and information. It can also help increase productivity by automating routine tasks.","faq_q_9":"How do you protect yourself from damage caused by scraping?","faq_antwort_9":"There are some strategies you can use to avoid damage caused by scraping. 
These include complying with the privacy policy and terms of use of the website in question, setting up a reliable and unique user name, and keeping to a reasonable scraping rate.","faq_q_10":"Which programming languages are used for scraping?","faq_antwort_10":"Many programming languages can be used for scraping, including Python, Ruby, PHP, Java and JavaScript."},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.4 (Yoast SEO v26.6) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Was ist Scraping? (Und wie funktioniert es?) | WOLF OF SEO<\/title>\n<meta name=\"description\" content=\"Web Scraping ist eine Technik, um gro\u00dfe Datenmengen von Websites zu extrahieren! Erfahre Beitrag wof\u00fcr Scraping genutzt werden kann\u2714\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/wolf-of-seo.de\/en\/what-is\/scraping\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Scraping\" \/>\n<meta property=\"og:description\" content=\"Web Scraping ist eine Technik, um gro\u00dfe Datenmengen von Websites zu extrahieren! 
Erfahre Beitrag wof\u00fcr Scraping genutzt werden kann\u2714\" \/>\n<meta property=\"og:url\" content=\"https:\/\/wolf-of-seo.de\/en\/what-is\/scraping\/\" \/>\n<meta property=\"og:site_name\" content=\"WOLF OF SEO\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/wolf.of.seo.ns\" \/>\n<meta property=\"article:modified_time\" content=\"2023-08-23T09:45:49+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2019\/06\/Was-ist-37.png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@wolf_of_seo\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/\",\"url\":\"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/\",\"name\":\"Was ist Scraping? (Und wie funktioniert es?) | WOLF OF SEO\",\"isPartOf\":{\"@id\":\"https:\/\/wolf-of-seo.de\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2019\/06\/Was-ist-37.png\",\"datePublished\":\"2019-04-30T19:53:07+00:00\",\"dateModified\":\"2023-08-23T09:45:49+00:00\",\"description\":\"Web Scraping ist eine Technik, um gro\u00dfe Datenmengen von Websites zu extrahieren! 
Erfahre Beitrag wof\u00fcr Scraping genutzt werden kann\u2714\",\"breadcrumb\":{\"@id\":\"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/#primaryimage\",\"url\":\"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2019\/06\/Was-ist-37.png\",\"contentUrl\":\"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2019\/06\/Was-ist-37.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/wolf-of-seo.de\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Scraping\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/wolf-of-seo.de\/#website\",\"url\":\"https:\/\/wolf-of-seo.de\/\",\"name\":\"WOLF OF SEO\",\"description\":\"Die E-Commerce SEO-Agentur\",\"publisher\":{\"@id\":\"https:\/\/wolf-of-seo.de\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/wolf-of-seo.de\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/wolf-of-seo.de\/#organization\",\"name\":\"WOLF OF SEO\",\"url\":\"https:\/\/wolf-of-seo.de\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/wolf-of-seo.de\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2021\/11\/logo_wos_beitragsbild3.jpg\",\"contentUrl\":\"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2021\/11\/logo_wos_beitragsbild3.jpg\",\"width\":1,\"height\":1,\"caption\":\"WOLF OF 
SEO\"},\"image\":{\"@id\":\"https:\/\/wolf-of-seo.de\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/wolf.of.seo.ns\",\"https:\/\/x.com\/wolf_of_seo\"]}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"What is scraping? (And how does it work?) | WOLF OF SEO","description":"Web scraping is a technique to extract large amounts of data from websites! Learn what scraping can be used for\u2714","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/wolf-of-seo.de\/en\/what-is\/scraping\/","og_locale":"en_US","og_type":"article","og_title":"Scraping","og_description":"Web Scraping ist eine Technik, um gro\u00dfe Datenmengen von Websites zu extrahieren! Erfahre Beitrag wof\u00fcr Scraping genutzt werden kann\u2714","og_url":"https:\/\/wolf-of-seo.de\/en\/what-is\/scraping\/","og_site_name":"WOLF OF SEO","article_publisher":"https:\/\/www.facebook.com\/wolf.of.seo.ns","article_modified_time":"2023-08-23T09:45:49+00:00","og_image":[{"url":"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2019\/06\/Was-ist-37.png","type":"","width":"","height":""}],"twitter_card":"summary_large_image","twitter_site":"@wolf_of_seo","twitter_misc":{"Est. reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/","url":"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/","name":"What is scraping? (And how does it work?) 
| WOLF OF SEO","isPartOf":{"@id":"https:\/\/wolf-of-seo.de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/#primaryimage"},"image":{"@id":"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/#primaryimage"},"thumbnailUrl":"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2019\/06\/Was-ist-37.png","datePublished":"2019-04-30T19:53:07+00:00","dateModified":"2023-08-23T09:45:49+00:00","description":"Web scraping is a technique to extract large amounts of data from websites! Learn what scraping can be used for\u2714","breadcrumb":{"@id":"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/wolf-of-seo.de\/was-ist\/scraping\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/#primaryimage","url":"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2019\/06\/Was-ist-37.png","contentUrl":"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2019\/06\/Was-ist-37.png"},{"@type":"BreadcrumbList","@id":"https:\/\/wolf-of-seo.de\/was-ist\/scraping\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/wolf-of-seo.de\/"},{"@type":"ListItem","position":2,"name":"Scraping"}]},{"@type":"WebSite","@id":"https:\/\/wolf-of-seo.de\/#website","url":"https:\/\/wolf-of-seo.de\/","name":"WOLF OF SEO","description":"The e-commerce SEO agency","publisher":{"@id":"https:\/\/wolf-of-seo.de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/wolf-of-seo.de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/wolf-of-seo.de\/#organization","name":"WOLF OF 
SEO","url":"https:\/\/wolf-of-seo.de\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/wolf-of-seo.de\/#\/schema\/logo\/image\/","url":"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2021\/11\/logo_wos_beitragsbild3.jpg","contentUrl":"https:\/\/wolf-of-seo.de\/wp-content\/uploads\/2021\/11\/logo_wos_beitragsbild3.jpg","width":1,"height":1,"caption":"WOLF OF SEO"},"image":{"@id":"https:\/\/wolf-of-seo.de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/wolf.of.seo.ns","https:\/\/x.com\/wolf_of_seo"]}]}},"_links":{"self":[{"href":"https:\/\/wolf-of-seo.de\/en\/wp-json\/wp\/v2\/glossary\/4045","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wolf-of-seo.de\/en\/wp-json\/wp\/v2\/glossary"}],"about":[{"href":"https:\/\/wolf-of-seo.de\/en\/wp-json\/wp\/v2\/types\/glossary"}],"author":[{"embeddable":true,"href":"https:\/\/wolf-of-seo.de\/en\/wp-json\/wp\/v2\/users\/3"}],"version-history":[{"count":0,"href":"https:\/\/wolf-of-seo.de\/en\/wp-json\/wp\/v2\/glossary\/4045\/revisions"}],"wp:attachment":[{"href":"https:\/\/wolf-of-seo.de\/en\/wp-json\/wp\/v2\/media?parent=4045"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}