How to use proxies to collect data from marketplaces (parsing, analytics, competitive intelligence)

In the world of e-commerce, data is the new oil. Whoever owns information about prices, assortment, and competitor strategies rules the market. Marketplaces such as Amazon, Ozon, Wildberries, or Alibaba are giant, constantly updated databases containing this valuable information. Obtaining it means gaining a decisive competitive advantage.

In practice, the main way to extract this data on an industrial scale is parsing (web scraping). But there is a problem: marketplaces are well aware of this and actively defend themselves.

In this article, we will look at how to build an effective, scalable data collection system for analytics and competitive intelligence using the correct proxy configurations.

Important Note: When automating data collection, ensure that your actions comply with legislation (including GDPR and DMCA) and do not violate the Terms of Service (ToS) of the target platforms. Use proxies responsibly: avoid creating critical loads on servers and adhere to web scraping ethics.


Why don't marketplaces want to be parsed?

Collecting data manually is slow and inefficient, whereas automated collection (parsing) retrieves huge amounts of data in a short time. This is exactly why marketplaces build multiple layers of defense:

  • IP Blocking. The most basic and effective protection method. If an abnormally high number of requests comes from a single IP address, it is immediately hit with a temporary or permanent ban.
  • Rate Limiting. The system allows, for example, no more than 30 requests per minute from one IP. Everything above the limit is blocked.
  • CAPTCHA. If the system notices signs of automation, it presents the user with a captcha that a standard parser cannot pass.
  • Geo-blocking. Prices, assortment, and delivery conditions on the same marketplace can differ significantly for users from the USA and Germany. Without an IP address from the required region, you simply won't see relevant data.
  • Fingerprint Analysis. Advanced systems analyze hundreds of parameters of your browser. Examples of what exactly marketplaces check:

    • Canvas and WebGL fingerprinting: websites force the browser to invisibly draw a hidden shape. The way your graphics card and drivers render pixels creates a unique device identifier.

    • Audio fingerprints: checking how your system processes audio signals.

    • Technical headers and consistency checks: a mismatch between the User-Agent and other observed properties (such as installed fonts or screen resolution) can mark you as a bot.
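The rate-limiting point in the list above suggests pacing requests on the client side. Below is a minimal sketch, assuming the 30-requests-per-minute figure used as an example above (real limits vary by site and are rarely published). The class name MinIntervalThrottle is hypothetical and not part of any library.

```python
import time


class MinIntervalThrottle:
    """Spaces calls so that no more than `max_per_minute` happen per minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first

    def wait(self):
        """Block until `min_interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call throttle.wait() before every request sent through the same IP. With max_per_minute=30, consecutive requests end up at least 2 seconds apart.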


Proxies — your key to data. But not just any proxy.

A proxy server is the technological foundation of any professional parser. It acts as an intelligent intermediary: routing your requests through various IP addresses to ensure high-load data collection and maintain privacy.

However, even the highest-quality proxies need correct integration into your architecture to deliver stable results under heavy load. If your IP is a "clean" residential address but the request parameters (for example, the headers) are set incorrectly, the target system may still reject the connection.

To achieve maximum results, proxies must be combined with proper header configuration and request frequency management to ensure a stable connection.
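As an illustration of combining a proxy with explicit headers, here is a minimal sketch using only Python's standard urllib. The proxy endpoint, credentials, and header values are placeholders chosen for the example, not recommended settings.

```python
import urllib.request

# Placeholder values: substitute your own proxy endpoint and credentials.
PROXY_URL = "http://USER:PASS@proxy.example.com:8080"

# Illustrative headers; keep them consistent with each other (for example,
# Accept-Language should be plausible for the proxy's country).
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def make_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def build_request(url, headers=DEFAULT_HEADERS):
    """Return a Request object carrying a copy of the given headers."""
    return urllib.request.Request(url, headers=dict(headers))


def fetch(url, proxy_url=PROXY_URL, timeout=15):
    """Fetch `url` through the proxy and return the response body as bytes."""
    opener = make_proxy_opener(proxy_url)
    with opener.open(build_request(url), timeout=timeout) as response:
        return response.read()
```

Request frequency is deliberately left out of this sketch; it is a separate concern that can wrap calls to fetch.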

Why does the proxy type matter?

Not all types of connections are suitable for parsing marketplaces. Below we will break down the main types and determine which tasks each will be most effective for.

Proxy types and their applicability:

Residential rotating proxies  — choice #1 for mass parsing

These are dynamic IP addresses of real home users.

  • Advantages: Huge pools (millions of IPs) worldwide. A request from such an address looks to a marketplace like a visit from a regular customer via home Wi-Fi.

  • Verdict: Ideal for collecting large datasets: monitoring prices, stock levels, and product card content.

  • Flexible session setup: Depending on your tasks, you can choose one of three operating modes:

    1. Random IP: Automatic address change for every new request.

    2. Short session: Holding one IP for a period of up to 1 minute (convenient for quick action chains).

    3. Long session (Sticky): Fixing an IP for a long term — strictly up to 6 hours (necessary for simulating a long user stay on a site).

Static residential proxies (ISP)  — for the "long haul"

These are clean IPs from home providers that are assigned to you for the entire rental period.

  • Advantages: They combine the trust of a residential address with the stability of a server channel. The IP does not change, which is critical for protection systems.

  • Verdict: Indispensable for managing seller and advertising accounts, and for any personal account where a constant IP address is critical for secure, uninterrupted access.

Mobile private proxies  — the ultimate solution

Utilize IP addresses of cellular operators (4G/5G).

  • Advantages: The highest level of trust. Thanks to CGNAT technology, one IP is shared by thousands of real people, so marketplaces almost never block such addresses.

  • Dedicated ports: For a high connection success rate in high-load, demanding parsing architectures, we recommend mobile dedicated ports. They provide an individual channel with maximum speed and stability, without "neighbors".

Datacenter proxies

  • Advantages: High speed and low price.

  • Verdict: Suitable mainly for small sites or for work through official APIs. Major platforms readily recognize datacenter IP ranges and restrict them, which makes these proxies less effective for large-scale data collection.


Specifics of working with Mobile proxies in the interface

Mobile proxies are managed slightly differently in the dashboard. Unlike the other types, the mobile proxy card provides a special API link for rotation (IP change). Locate this link in the interface: your script or software calls this address to change the IP automatically.

Fig. 1. Location of the link for automatic rotation in the Mobile Proxy card.
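A script can trigger the rotation link along the following lines. This is only a sketch: it assumes the link responds to a plain GET request, which the text above does not specify. The URL is a hypothetical placeholder, and the response is returned as-is because its format is not described here.

```python
import urllib.request

# Hypothetical placeholder: paste the rotation link from your Mobile Proxy card.
ROTATION_URL = "https://example.invalid/rotate?token=YOUR_TOKEN"


def rotate_mobile_ip(rotation_url=ROTATION_URL, open_url=urllib.request.urlopen, timeout=30):
    """Call the rotation link and return (HTTP status, response text).

    Assumes the link is triggered by a plain GET request; the response
    body is returned unparsed for inspection.
    """
    with open_url(rotation_url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

After a successful call it is worth confirming the new address with an IP echo request before resuming data collection.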

In addition to programmatic automation, the CyberYozh App implements the possibility of manual management. If you need to update the IP address instantly without waiting for a script to trigger, you can do it with one click directly in the control panel.

Fig. 2. Button for forced manual IP change in the personal account.


Technical subtleties: Sessions, rotation, and infrastructure

Choosing the proxy type is just the beginning. For professional parsing, other parameters are also important.

Parsing infrastructure. Remember that proxies are only part of the system. Effective parsing also requires:

  • A reliable parser: A script or program (e.g., in Python using Scrapy, BeautifulSoup, or Selenium) capable of fetching pages and processing their HTML.
  • User-Agent and header management: Your parser should rotate User-Agent strings and send headers that are consistent with them, so that requests stay compatible with what the site expects from a real browser.
  • Error handling: A mechanism that correctly handles timeouts and errors and retries failed requests through a different proxy.
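The error-handling and User-Agent points can be sketched together as follows. The function name fetch_with_retries and the fetch callback signature are assumptions made for this example; the User-Agent strings are illustrative only.

```python
import itertools
import random

# Illustrative User-Agent strings; keep a current, realistic list in practice.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def fetch_with_retries(url, proxies, fetch, max_attempts=3):
    """Try `url` through successive proxies until one attempt succeeds.

    `fetch(url, proxy, headers)` is a caller-supplied function that returns
    the page body or raises OSError on timeouts and connection failures.
    """
    proxy_cycle = itertools.cycle(proxies)
    last_error = None
    for _ in range(max_attempts):
        proxy = next(proxy_cycle)  # move on to the next proxy in the list
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            return fetch(url, proxy, headers)
        except OSError as error:  # URLError and socket timeouts subclass OSError
            last_error = error
    raise RuntimeError(f"all {max_attempts} attempts failed for {url}") from last_error
```

Passing the fetch function in as a parameter keeps the retry logic independent of the HTTP client, so the same loop works with urllib, Scrapy middleware, or a browser driver.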

Residential rotating proxies can be managed in two ways: configure the parameters manually via login prefixes, or use the built-in generator in your personal account.

Management via personal account (Recommended method)

To get ready-made settings, simply go to the "My Proxies" section and click the "Generate credentials" button on the card of the purchased package.

In the menu that opens, you can visually select:

  • Geolocation: country, region/state, and specific city (only country for long sessions).

  • Session type: random IP, short session (session ID - up to 1 minute), or long session (long session ID - up to 6 hours).

  • Protocol: HTTP or SOCKS5.

  • Output format: three formats are available in our generator for easy copying into any software:

    • IP:PORT (IP:PORT:USER:PASS)

    • USER:PASS (USER:PASS@IP:PORT)

    • PROTOCOL (http://USER:PASS@IP:PORT)

The generator will automatically form the correct connection string with all necessary prefixes.
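If a tool expects a different format from the one you copied, the conversion is mechanical. A minimal sketch, assuming the input is in the IP:PORT:USER:PASS form listed above; the function names are hypothetical.

```python
from urllib.parse import quote


def parse_ip_port_user_pass(line):
    """Split 'IP:PORT:USER:PASS' into (host, port, user, password)."""
    # maxsplit=3 keeps any colons inside the password intact
    host, port, user, password = line.strip().split(":", 3)
    return host, int(port), user, password


def to_user_pass_at_host(line):
    """Convert 'IP:PORT:USER:PASS' to 'USER:PASS@IP:PORT'."""
    host, port, user, password = parse_ip_port_user_pass(line)
    return f"{user}:{password}@{host}:{port}"


def to_proxy_url(line, protocol="http"):
    """Convert 'IP:PORT:USER:PASS' to 'PROTOCOL://USER:PASS@IP:PORT'.

    Credentials are percent-encoded so that characters such as '@' or ':'
    in a password do not break URL parsing.
    """
    host, port, user, password = parse_ip_port_user_pass(line)
    return f"{protocol}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
```

Hyphens and underscores in the login prefixes are left untouched by the encoding, so generated logins pass through unchanged.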

Fig. 3. Navigating to the configuration and connection parameters interface (credential generator).

 

Fig. 4. Using the generator to configure the sid parameter responsible for creating new unique sessions.

 

Fig. 5. Configuring parameters for forming credentials using long (Sticky) sessions.

 

Fig. 6. Result of the credential generator's work.

Session types and manual prefix management

If you are configuring the IP change logic directly in your script's code, use the prefix system:

Session type  | Login prefix          | Geo-targeting                    | IP lifespan
Random IP     | -res-any              | Country                          | New IP for every request
Short session | -res-any-sid-XXXXXXXX | City, Region, Country            | Up to 1 minute
Long (Sticky) | -resfix-XX-nnid-TOKEN | Country (XX is the country code) | Up to 6 hours

Important nuances of manual configuration:

  • Short sessions: In the -sid-47551677 prefix, you can use any random number of the same length to instantly create a new session.

  • Geo-prefix in short sessions: For example, -res_sc-us_georgia_macon-sid-12345 will route your traffic through Macon, Georgia.

  • Long sessions (Sticky): To work manually, you need to obtain the X-NN-LLS token via a trial curl request and substitute it into the login instead of 0 after -nnid-. Through the generator in the dashboard, this token is inserted automatically.
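In a script, the prefixes described above can be assembled with a few small helpers. This is a sketch with hypothetical function names; base_login stands for your account login. Geo-targeted short-session prefixes are left out because the text gives only a single example of their format.

```python
import random


def random_ip_login(base_login):
    """Login that requests a new IP for every request."""
    return f"{base_login}-res-any"


def short_session_login(base_login, session_id=None):
    """Login for a short session; a fresh 8-digit ID starts a new session."""
    if session_id is None:
        # Random 8-digit number without leading zeros
        session_id = str(random.randrange(10**7, 10**8))
    return f"{base_login}-res-any-sid-{session_id}"


def sticky_session_login(base_login, country_code, token="0"):
    """Login for a long session; token '0' is the trial value used to obtain a token."""
    return f"{base_login}-resfix-{country_code.lower()}-nnid-{token}"
```

For example, sticky_session_login("LOGIN", "us") yields LOGIN-resfix-us-nnid-0, the login used for the trial request that returns the X-NN-LLS token.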


Checking proxies via terminal (curl)

The fastest way to ensure everything is set up correctly is to run a request in the console. This allows you to see the server's technical headers and check the correctness of the prefixes.

1. Checking a random residential IP

Use this format if you need high rotation (IP change for every request):

curl -v -x http://LOGIN-res-any:PASSWORD@51.77.190.247:5959 https://ipv4.icanhazip.com
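To confirm that rotation actually happens, the same check can be repeated several times from a script while collecting the reported addresses. A minimal sketch using Python's standard urllib and the same echo service as the curl command above; the function names are hypothetical.

```python
import urllib.request

CHECK_URL = "https://ipv4.icanhazip.com"  # same IP echo service as the curl example


def fetch_exit_ip(proxy_url, timeout=15):
    """Return the public IP address reported by the echo service via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(CHECK_URL, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


def count_distinct_ips(proxy_url, attempts=5, fetch_ip=fetch_exit_ip):
    """Make several requests and return the set of exit IPs observed."""
    return {fetch_ip(proxy_url) for _ in range(attempts)}
```

With a random-IP login, the returned set would normally contain several different addresses; with a session login it should contain one.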

2. Working with a long session (Sticky up to 6 hours)

To activate a long session manually, you must go through two stages:

Stage A: Obtaining the session token

Execute a request, specifying 0 in the nnid parameter:

curl -v -x http://LOGIN-resfix-us-nnid-0:PASSWORD@51.77.190.247:5959 https://ipv4.icanhazip.com

Here us is the country prefix (USA), which can be replaced with the code of any other available country.

Stage B: Extracting and using the token

In the server response, find the line with the X-NN-LLS header:

HTTP/1.1 200 Connection established
X-NN-LLS: 9d016e262509d3827293

Copy the resulting token (9d016e262509d3827293) and substitute it for 0 in the login for all subsequent requests to hold the same IP:

51.77.190.247:5959:LOGIN-resfix-us-nnid-9d016e262509d3827293:PASSWORD
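Stage B can also be automated by parsing the saved verbose output of the trial request. The sketch below assumes the output contains the header in the form shown above, optionally with the "< " prefix that curl -v adds to response lines. The function names are hypothetical.

```python
import re

# Matches the X-NN-LLS header with or without curl's "< " line prefix.
TOKEN_PATTERN = re.compile(r"^(?:<\s*)?X-NN-LLS:\s*(\S+)", re.IGNORECASE | re.MULTILINE)


def extract_session_token(verbose_output):
    """Return the X-NN-LLS token found in the text, or None if it is absent."""
    match = TOKEN_PATTERN.search(verbose_output)
    return match.group(1) if match else None


def with_token(login, token):
    """Replace the trailing '0' of a '-nnid-0' login with the real token."""
    if not login.endswith("-nnid-0"):
        raise ValueError("login does not end with '-nnid-0'")
    return login[:-1] + token
```

Note that curl writes its verbose output to stderr, so that is the stream to capture when running the trial request from a script.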

💡 Tip: To avoid doing these actions manually, use the Credential Generator in the CyberYozh App personal account. When selecting "Long session ID", the system will automatically generate and provide you with a ready-made login with an already active token for the selected country.


Conclusion: From data to strategy

Competitive intelligence on marketplaces is not magic; it is technology. It rests on a well-built data collection process, and the foundation of that process is high-quality, correctly selected proxies.

Skimping on proxies is one of the most expensive mistakes in parsing: it leads to incomplete data, blocked tools, and ultimately incorrect business decisions. Invest in reliable infrastructure, and you will gain access to information that becomes your main trump card against competitors.

👉 Looking for a reliable parsing solution? Our rotating residential proxies provide access to millions of clean IP addresses worldwide with flexible session management. This is the ideal tool for collecting data from any marketplace, even the most protected ones.
