While developing the AstrBot Douyin parsing plugin, I ran into a common but tricky problem: the server's IP was blocked by Douyin's risk control, so API requests came back with empty data. This article records the whole process, from a simple retry mechanism to the final solution: a Cloudflare Workers reverse proxy.
When the plugin parses Douyin links, dysk.py returns None, manifesting as:
Douyin's risk control mechanism detects the request source IP. When an abnormal request pattern is detected, it returns an empty response (HTTP 200 but the body is empty).
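The "HTTP 200 but empty body" pattern can be checked explicitly instead of being left to surface as a JSON parse error. A minimal sketch, assuming the caller already has the status code and raw body; the function name `looks_risk_controlled` is hypothetical and not part of the plugin:

```python
from typing import Optional


def looks_risk_controlled(status_code: int, body: Optional[bytes]) -> bool:
    """Return True for the 'HTTP 200 with an empty body' pattern.

    Hypothetical helper: it only flags the symptom described in this
    article and cannot tell why the upstream sent nothing.
    """
    return status_code == 200 and not (body or b"").strip()


print(looks_risk_controlled(200, b""))                      # True
print(looks_risk_controlled(200, b'{"aweme_detail": {}}'))  # False
```

Treating this case separately from network errors makes it easier to tell in the logs whether a retry is likely to help.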
Idea: When parsing fails, recreate the DouyinDownloader instance and retry.
Implementation Points:
Code Snippet:
Effect: The retry mechanism improves the success rate, but it treats the symptoms rather than the root cause. Once the IP is flagged by risk control, parsing still fails.
Idea:
Expected Success Rate: 75-93%
Phenomenon:
Cause: CF Workers' automatic gzip compression caused the response body transmission to fail.
Attempted Solutions:
- fetch() result directly: response body lost
- response.arrayBuffer(): returned an empty ArrayBuffer
- Content-Length header: CF still forced compression

Phenomenon:
This is an empty gzip file, indicating the Douyin server returned an empty response.
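The claim can be verified offline with Python's standard `gzip` module. The sketch below decodes the 20-byte hex dump from the log in this article; the helper name `inspect_gzip` is an illustrative assumption:

```python
import gzip


def inspect_gzip(hex_dump: str) -> tuple:
    """Return (compressed length, decompressed length) for a hex-encoded gzip stream."""
    raw = bytes.fromhex(hex_dump)
    if raw[:2] != b"\x1f\x8b":  # gzip magic number
        raise ValueError("not a gzip stream")
    return len(raw), len(gzip.decompress(raw))


print(inspect_gzip("1f8b08000000000000ff03000000000000000000"))  # (20, 0)
```

The 20 bytes are a 10-byte gzip header, a 2-byte empty deflate block, and an 8-byte trailer (CRC32 and size, both zero), so the compression layer itself is intact and the payload is what is missing.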
Cause Analysis:
- One proxied request returned a body (Response text length: 205)
- Another returned an empty body (Response text length: 0)
- The Worker log showed Request Cookie: No Cookie

Root Cause: Although Python's requests.Session() manages cookies, they were not sent automatically when the request went through CF Workers.
Solution: Manually build the Cookie request header.
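As an illustration of building the header by hand, the sketch below parses a `Set-Cookie` value and assembles a `Cookie` request header using only the standard library. It swaps in `http.cookies.SimpleCookie` where the plugin uses a regular expression, and the function name and sample values are assumptions made for the example:

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie: str, extra: dict) -> str:
    """Build a Cookie request header from a Set-Cookie value plus extra cookies."""
    jar = SimpleCookie()
    jar.load(set_cookie)  # attributes such as Path or HttpOnly are parsed and dropped
    pairs = {name: morsel.value for name, morsel in jar.items()}
    pairs.update(extra)   # e.g. the locally generated msToken
    return "; ".join(f"{k}={v}" for k, v in pairs.items())


header = cookie_header_from_set_cookie(
    "ttwid=abc%7C123; Path=/; Domain=bytedance.com; HttpOnly",  # made-up value
    {"msToken": "tok"},
)
print(header)  # ttwid=abc%7C123; msToken=tok
```

The resulting string can then be passed as an explicit `Cookie` header on the proxied request.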
Problem: CF Workers automatically gzip-compresses the response, and setting the Content-Length header does not disable this.
Solution: Use Base64 encoding to transmit data.
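On the client side, the Base64 envelope has to be unwrapped before the Douyin JSON can be used. A minimal sketch, assuming the `{"data": ..., "encoding": "base64"}` shape used in this article; the function name `unwrap_proxy_payload` is hypothetical:

```python
import base64
import json


def unwrap_proxy_payload(body_text: str) -> dict:
    """Decode the Worker's Base64 envelope, or pass plain JSON through unchanged."""
    envelope = json.loads(body_text)
    if isinstance(envelope, dict) and envelope.get("encoding") == "base64":
        decoded = base64.b64decode(envelope["data"]).decode("utf-8")
        return json.loads(decoded)
    return envelope


# Round trip with a made-up payload standing in for the real API response.
inner = {"aweme_detail": {"desc": "demo"}}
wrapped = json.dumps({
    "data": base64.b64encode(json.dumps(inner).encode("utf-8")).decode("ascii"),
    "encoding": "base64",
})
print(unwrap_proxy_payload(wrapped) == inner)  # True
```

Passing unwrapped JSON through unchanged keeps the same code path usable for direct connections.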
Principle:
Problem: requests.Session() Cookie management fails in proxy scenarios.
Solution:
- Extract Set-Cookie from the response headers.
- Manually build the Cookie request header.

Key Code:
| Scenario | Success rate without CF proxy | Success rate with CF proxy |
|---|---|---|
| Video Parsing | 60-70% | 95%+ |
| Image Parsing | 0-10% | 95%+ |
| Live Photo Parsing | 0-10% | 95%+ |
CF Workers Limits:
Base64 Encoding Overhead:
Cookie Management:
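On the Base64 overhead: Base64 maps every 3 input bytes to 4 output characters, so the encoded payload is roughly 33% larger than the original before the JSON wrapper is added. A quick check in Python (the helper name is illustrative only):

```python
import base64


def base64_size(raw_len: int) -> int:
    """Length in characters of the Base64 encoding of raw_len bytes."""
    return len(base64.b64encode(bytes(raw_len)))


print(base64_size(300))     # 400
print(base64_size(30_000))  # 40000
```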
In AstrBot WebUI:
Send a Douyin link to the bot and observe the log output:
Symptom: Response text length: 0
Troubleshooting Steps:
- Check Request Cookie in the CF Workers logs.
- Confirm the Cookie contains ttwid and msToken.

Symptom: JSON parse failed: Expecting value
Troubleshooting Steps:
- Confirm the Worker response has the form {"data": "...", "encoding": "base64"}.

Symptom: Proxy request failed: timeout
Cause: The Douyin API responded slowly, or the Worker exceeded its CPU time limit.
Solution:
Using a Cloudflare Workers reverse proxy resolved the Douyin API risk control problem. Key technical points:
This solution is not only applicable to Douyin but can also be extended to other platforms with risk control mechanisms (such as Xiaohongshu, Bilibili, etc.).
import time

# Method of the plugin class; DouyinDownloader is defined in dysk.py.
def _parse_douyin_sync(self, url):
    max_retries = 5
    retry_delay = 5  # seconds between attempts

    for attempt in range(max_retries):
        try:
            current_time = time.time()
            if attempt == 0:
                # First attempt: Reuse instance (within 5 minutes)
                if self.dy_downloader is None or (current_time - self.dy_downloader_time) > 300:
                    self.dy_downloader = DouyinDownloader()
                    self.dy_downloader_time = current_time
            else:
                # On retry: Force recreate instance
                self.dy_downloader = DouyinDownloader()
                self.dy_downloader_time = current_time

            result = self.dy_downloader.get_detail(url)
            if result is not None:
                return (result, self.dy_downloader)

            # Empty result: wait before the next attempt, if any remain
            if attempt < max_retries - 1:
                time.sleep(retry_delay)
        except Exception:
            # Re-raise only after the final attempt
            if attempt < max_retries - 1:
                time.sleep(retry_delay)
            else:
                raise

    return (None, self.dy_downloader)
Response status: 200
Response headers: {'content-length': '0', 'content-type': 'text/plain'}
Response text length: 0
Original response length: 20
Original response content (hex): 1f8b08000000000000ff03000000000000000000
Decompressed length: 0
┌─────────────┐      ┌───────────────────┐      ┌─────────────┐
│   Python    │─────▶│    CF Workers     │─────▶│ Douyin API  │
│   Client    │      │(RevProxy + Base64)│      │             │
└─────────────┘      └───────────────────┘      └─────────────┘
       │                                               │
       │                                               │
       └───────────────────────────────────────────────┘
                  Direct download video/image
// cloudflare_worker.js
// Cloudflare Workers Reverse Proxy Script - Proxy Douyin API
export default {
  async fetch(request) {
    // Handle OPTIONS preflight requests
    if (request.method === "OPTIONS") {
      return new Response(null, {
        status: 200,
        headers: {
          "Access-Control-Allow-Origin": "*",
          "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE, OPTIONS",
          "Access-Control-Allow-Headers": "*",
          "Access-Control-Max-Age": "86400",
        },
      });
    }

    const url = new URL(request.url);

    // Support multiple target domains
    const targetHosts = {
      "douyin": "www.douyin.com",
      "ttwid": "ttwid.bytedance.com"
    };

    // Extract target type from path
    // Example: /douyin/aweme/v1/web/aweme/detail/ or /ttwid/ttwid/union/register/
    const pathMatch = url.pathname.match(/^\/(douyin|ttwid)(\/.*)/);
    if (!pathMatch) {
      return new Response("Invalid path. Use /douyin/* or /ttwid/*", { status: 400 });
    }

    const targetType = pathMatch[1];
    const targetPath = pathMatch[2];
    const targetHost = targetHosts[targetType];

    // Build target request URL
    const targetUrl = `https://${targetHost}${targetPath}${url.search}`;

    try {
      // Copy request headers
      const headers = new Headers(request.headers);
      headers.set("Host", targetHost);

      // Debug: Print Cookie in request (comment out in production)
      // console.log('Request Cookie:', headers.get('Cookie') || 'No Cookie');

      // Remove CF related headers
      headers.delete("cf-connecting-ip");
      headers.delete("cf-ipcountry");
      headers.delete("cf-ray");
      headers.delete("cf-visitor");

      // Initiate request
      const response = await fetch(targetUrl, {
        method: request.method,
        headers: headers,
        body: request.body,
        redirect: 'follow'
      });

      // Debug: Print response status (comment out in production)
      // console.log('Response status:', response.status);
      // console.log('Response Content-Type:', response.headers.get('Content-Type'));

      // Read response completely
      const responseText = await response.text();

      // Debug: Print response length (comment out in production)
      // console.log('Response text length:', responseText.length);

      // Base64 encode to bypass CF auto-compression
      const base64Data = btoa(unescape(encodeURIComponent(responseText)));

      // Return Base64 encoded data
      return new Response(JSON.stringify({
        data: base64Data,
        encoding: 'base64'
      }), {
        status: response.status,
        headers: {
          'Content-Type': 'application/json; charset=utf-8',
          'Access-Control-Allow-Origin': '*',
        },
      });
    } catch (error) {
      return new Response(
        JSON.stringify({
          error: "Proxy request failed",
          message: error.message,
        }),
        {
          status: 500,
          headers: {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*",
          },
        }
      );
    }
  },
};
import base64
import json
import random
import re
import string

import requests

# USERAGENT, ABogus and Extractor are defined elsewhere in the plugin (not shown here).


class DouyinDownloader:
    def __init__(self, enable_cf_proxy=False, cf_proxy_url=""):
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": USERAGENT,
            "Referer": "https://www.douyin.com/",
        })
        self.ab = ABogus(USERAGENT)
        self.extractor = Extractor()
        self.enable_cf_proxy = enable_cf_proxy
        self.cf_proxy_url = cf_proxy_url.rstrip("/") if cf_proxy_url else ""
        print("Initializing (getting ttwid/msToken)...")
        self._init_tokens()

    def _init_tokens(self):
        base_str = string.digits + string.ascii_letters
        ms_token = "".join(random.choice(base_str) for _ in range(156))
        self.session.cookies.set("msToken", ms_token, domain=".douyin.com")
        data = {"region": "cn", "aid": 1768, "needFid": False, "service": "www.ixigua.com",
                "migrate_info": {"ticket": "", "source": "node"}, "cbUrlProtocol": "https", "union": True}

        # Use CF proxy or direct connection
        if self.enable_cf_proxy and self.cf_proxy_url:
            url = f"{self.cf_proxy_url}/ttwid/ttwid/union/register/"
            resp = self.session.post(url, json=data, timeout=30)
            if resp.status_code == 200:
                # Extract ttwid cookie from response and set to session
                if 'set-cookie' in resp.headers or 'Set-Cookie' in resp.headers:
                    cookie_header = resp.headers.get('set-cookie') or resp.headers.get('Set-Cookie')
                    # Debug: Print Set-Cookie (comment out in production)
                    # print(f"Received Set-Cookie: {cookie_header[:100] if cookie_header else 'None'}")
                    if cookie_header and 'ttwid=' in cookie_header:
                        ttwid_match = re.search(r'ttwid=([^;]+)', cookie_header)
                        if ttwid_match:
                            ttwid_value = ttwid_match.group(1)
                            self.session.cookies.set("ttwid", ttwid_value, domain=".douyin.com")
                            # Debug: Print set cookie (comment out in production)
                            # print(f"Set ttwid cookie: {ttwid_value[:50]}...")
        else:
            url = "https://ttwid.bytedance.com/ttwid/union/register/"
            resp = self.session.post(url, json=data, timeout=30)

        if resp.status_code != 200:
            raise Exception(f"Failed to initialize ttwid: HTTP {resp.status_code}")

    def get_detail(self, url_input):
        url = url_input.strip()
        aweme_id = self._resolve_short_url(url)
        if not aweme_id:
            return None
        print(f"Resolved ID: {aweme_id}")

        params = {
            "device_platform": "webapp",
            "aid": "6383",
            "channel": "channel_pc_web",
            "aweme_id": aweme_id,
            "update_version_code": "170400",
            "pc_client_type": "1",
            "version_code": "190500",
            "version_name": "19.5.0",
            "cookie_enabled": "true",
            "platform": "PC",
            "downlink": "10",
            "msToken": self.session.cookies.get("msToken")
        }
        params["a_bogus"] = self.ab.get_value(params)

        try:
            # Use CF proxy or direct connection
            if self.enable_cf_proxy and self.cf_proxy_url:
                api = f"{self.cf_proxy_url}/douyin/aweme/v1/web/aweme/detail/"
            else:
                api = "https://www.douyin.com/aweme/v1/web/aweme/detail/"
            self.session.headers.update({"User-Agent": USERAGENT})

            # If using CF proxy, manually add Cookie to request headers
            if self.enable_cf_proxy and self.cf_proxy_url:
                # Get all cookies and build Cookie header
                cookie_str = "; ".join([f"{k}={v}" for k, v in self.session.cookies.items()])
                # Debug: Print sent Cookie (comment out in production)
                # print(f"Sending Cookie: {cookie_str[:100]}...")
                headers_with_cookie = {"Cookie": cookie_str}
                resp = self.session.get(api, params=params, timeout=30, headers=headers_with_cookie)
            else:
                resp = self.session.get(api, params=params, timeout=30)

            # Debug: Print request info (comment out in production)
            # print(f"API Request: {api}")
            # print(f"Response Status Code: {resp.status_code}")
            # print(f"Response Headers: {dict(resp.headers)}")

            if resp.status_code == 200:
                try:
                    resp_json = resp.json()
                    # If using CF proxy, response will be Base64 encoded
                    if self.enable_cf_proxy and self.cf_proxy_url and isinstance(resp_json, dict) and 'encoding' in resp_json:
                        if resp_json.get('encoding') == 'base64':
                            decoded_text = base64.b64decode(resp_json['data']).decode('utf-8')
                            data = json.loads(decoded_text)
                            # Debug: Print decode success info (comment out in production)
                            # print(f"Base64 decode success, JSON keys: {list(data.keys())}")
                        else:
                            data = resp_json
                    else:
                        data = resp_json

                    if data.get("aweme_detail"):
                        return self.extractor.extract_data(data["aweme_detail"])
                    else:
                        print("Failed to get aweme_detail")
                except Exception as e:
                    print(f"JSON parse failed: {e}")
                    # Debug: Print response content (comment out in production)
                    # print(f"Response content first 500 chars: {resp.text[:500]}")
            else:
                print(f"API request failed: {resp.status_code}")
                # Debug: Print response content (comment out in production)
                # print(f"Response content: {resp.text[:500]}")
        except Exception as e:
            print(f"Request exception: {e}")
        return None
{
  "enable_cf_proxy": {
    "description": "Whether to enable Cloudflare proxy",
    "type": "bool",
    "hint": "Enabling this will proxy requests via CF Workers, effectively avoiding IP risk control",
    "default": false
  },
  "cf_proxy_url": {
    "description": "Cloudflare Workers Proxy Address",
    "type": "string",
    "hint": "Enter your deployed CF Workers address, e.g.: https://your-worker.workers.dev",
    "default": ""
  }
}
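The two settings above decide which URL each request goes to. The sketch below shows one way to map them to an endpoint, mirroring the /douyin and /ttwid path prefixes the Worker expects; `resolve_endpoint` and `DIRECT_HOSTS` are hypothetical names, not part of the plugin:

```python
# Direct origins, matching the Worker's targetHosts mapping.
DIRECT_HOSTS = {
    "douyin": "https://www.douyin.com",
    "ttwid": "https://ttwid.bytedance.com",
}


def resolve_endpoint(config: dict, target: str, path: str) -> str:
    """Return the proxied URL when the proxy is enabled and configured, else the direct URL."""
    proxy = (config.get("cf_proxy_url") or "").rstrip("/")
    if config.get("enable_cf_proxy") and proxy:
        return f"{proxy}/{target}{path}"
    return f"{DIRECT_HOSTS[target]}{path}"


cfg = {"enable_cf_proxy": True, "cf_proxy_url": "https://your-worker.workers.dev/"}
print(resolve_endpoint(cfg, "douyin", "/aweme/v1/web/aweme/detail/"))
# https://your-worker.workers.dev/douyin/aweme/v1/web/aweme/detail/
```

Falling back to the direct URL when the address is empty means enabling the switch without a Worker URL does not break parsing.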
# Extract Cookie
cookie_header = resp.headers.get('set-cookie')
ttwid_match = re.search(r'ttwid=([^;]+)', cookie_header)
self.session.cookies.set("ttwid", ttwid_match.group(1), domain=".douyin.com")
# Send Cookie
cookie_str = "; ".join([f"{k}={v}" for k, v in self.session.cookies.items()])
headers_with_cookie = {"Cookie": cookie_str}
resp = self.session.get(api, headers=headers_with_cookie)
# 1. Login to Cloudflare Dashboard
# 2. Go to Workers & Pages
# 3. Create a new Worker
# 4. Paste cloudflare_worker.js code
# 5. Deploy and get Worker URL