Discuss Scratch

KatyPerryLinux
Scratcher
31 posts

Why does my PHP code show sites in French or German?

This code fetches the HTML of a given URL and displays it inside my webpage. My server has a French IP address, but most sites come back in English. The exceptions are google.com (French) and samsung.com (German). I send an Accept-Language header of "en" and I'm not using a VPN.

<?php
function add_scheme_if_missing($url) {
    if (!preg_match('#^https?://#', $url)) {
        $url = 'http://' . $url;
    }
    return $url;
}
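// Defaults so that every branch below leaves these variables defined
// (assumption: the page markup after this script prints them).
$pageTitle = "Fetch Tool";
$faviconUrl = "";
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_POST['url'])) {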
    $url = filter_var($_POST['url'], FILTER_SANITIZE_URL);
    $url = add_scheme_if_missing($url);
    if (filter_var($url, FILTER_VALIDATE_URL)) {
        header("Location: ?url=" . urlencode($url));
        exit();
    } else {
        $htmlContent = "Invalid URL";
    }
} elseif (isset($_GET['url'])) {
    $url = filter_var($_GET['url'], FILTER_SANITIZE_URL);
    $url = add_scheme_if_missing($url);
    if (filter_var($url, FILTER_VALIDATE_URL)) {
        // Send a browser-like User-Agent and request English via Accept-Language
        $options = [
            'http' => [
                'header' => [
                    'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0',
                    'Accept-Language: en'
                ]
            ]
        ];
        $context = stream_context_create($options);
        $htmlContent = @file_get_contents($url, false, $context);
        if ($htmlContent === FALSE) {
            $htmlContent = "Failed to retrieve content";
        } else {
            // Extract the title from the HTML content
            $dom = new DOMDocument;
            @$dom->loadHTML($htmlContent);
            $titleNodes = $dom->getElementsByTagName('title');
            $pageTitle = ($titleNodes->length > 0) ? $titleNodes->item(0)->textContent : 'No Title';
            // Extract the favicon URL
            $linkNodes = $dom->getElementsByTagName('link');
            $faviconUrl = "";
            foreach ($linkNodes as $linkNode) {
                if ($linkNode->getAttribute('rel') === 'icon' || $linkNode->getAttribute('rel') === 'shortcut icon') {
                    $faviconUrl = $linkNode->getAttribute('href');
                    if (!parse_url($faviconUrl, PHP_URL_SCHEME)) {
                        $faviconUrl = rtrim($url, '/') . '/' . ltrim($faviconUrl, '/');
                    }
                    break;
                }
            }
            if (empty($faviconUrl)) {
                $faviconUrl = rtrim($url, '/') . '/favicon.ico';
            }
            $htmlContent = convert_links_to_get($htmlContent, $url);
        }
    } else {
        $htmlContent = "Invalid URL";
    }
} else {
    $url = "";
    $htmlContent = "";
    $pageTitle = "Fetch Tool";
    $faviconUrl = "";
}
function convert_links_to_get($html, $baseUrl) {
    $dom = new DOMDocument;
    @$dom->loadHTML($html);
    $tags = [
        'a' => 'href',
        'img' => 'src',
        'link' => 'href',
        'script' => 'src',
    ];
    foreach ($tags as $tag => $attribute) {
        $elements = $dom->getElementsByTagName($tag);
        foreach ($elements as $element) {
            $attrValue = $element->getAttribute($attribute);
            if ($attrValue && !parse_url($attrValue, PHP_URL_SCHEME)) {
                $attrValue = rtrim($baseUrl, '/') . '/' . ltrim($attrValue, '/');
                $element->setAttribute($attribute, $attrValue);
            } elseif ($attrValue && parse_url($attrValue, PHP_URL_HOST) === $_SERVER['HTTP_HOST']) {
                // HTTP_HOST is a bare host name, so compare it directly:
                // parse_url() returns null as the host of a bare name like example.com
                $attrValue = str_replace($_SERVER['HTTP_HOST'], parse_url($baseUrl, PHP_URL_HOST), $attrValue);
                $element->setAttribute($attribute, $attrValue);
            }
        }
    }
    return $dom->saveHTML();
}
?>
DifferentDance8
Scratcher
1000+ posts

You have set the Accept-Language variable to “en”, which means that sites won't look at the IP and go “hmm, this is a French IP so maybe the language should be set to French” but rather look at the Accept-Language and see that it's “en” and assume that you want it in English.
Mrcomputer1
Scratcher
500+ posts

DifferentDance8 wrote:

You have set the Accept-Language variable to “en”, which means that sites won't look at the IP and go “hmm, this is a French IP so maybe the language should be set to French” but rather look at the Accept-Language and see that it's “en” and assume that you want it in English.
That is exactly their problem: most sites respect the header (or just default to English regardless of location or header), but Google and Samsung don't.
DifferentDance8
Scratcher
1000+ posts

Mrcomputer1 wrote:

DifferentDance8 wrote:

You have set the Accept-Language variable to “en”, which means that sites won't look at the IP and go “hmm, this is a French IP so maybe the language should be set to French” but rather look at the Accept-Language and see that it's “en” and assume that you want it in English.
That is exactly their problem: most sites respect the header (or just default to English regardless of location or header), but Google and Samsung don't.
In the case of Google, maybe they geolocated the IP, saw that it was French, and defaulted to French?
In the case of Samsung, I don't know either.
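
To see which language a site actually served, one option is to read the lang attribute on the page's html element. This is only a rough sketch: detect_page_language is a made-up helper name, the sample markup stands in for a real fetched page, and not every site sets lang at all.

```php
<?php
// Hypothetical helper: report the language a fetched page declares for itself.
function detect_page_language(string $html): ?string {
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return null;
    }
    $dom = new DOMDocument;
    // Suppress warnings from sloppy real-world markup, as the original code does.
    @$dom->loadHTML($html);
    $htmlNodes = $dom->getElementsByTagName('html');
    if ($htmlNodes->length === 0) {
        return null;
    }
    $lang = $htmlNodes->item(0)->getAttribute('lang');
    return $lang !== '' ? $lang : null;
}

// Inline sample standing in for the result of file_get_contents().
$sample = '<!DOCTYPE html><html lang="fr"><head><title>Accueil</title></head><body></body></html>';
echo detect_page_language($sample), "\n";
```

Running this against the HTML returned for google.com and samsung.com would at least confirm what each site thinks it sent back, independent of how the page looks.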
dumorando
Scratcher
100+ posts

DifferentDance8 wrote:

Accept-Language variable to “en”,
ermmm ackchually it's a header
you can do this in php with header("Accept-Language: en"); i think
Steve0Greatness
Scratcher
1000+ posts

dumorando wrote:

ermmm ackchually it's a header
you can do this in php with header("Accept-Language: en"); i think
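header() is for responding to an incoming request: it sets a header on the response your script sends back to the visitor's browser. The code above is sending an outgoing request, so the header has to go in the stream context, which is what it already does.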
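
To make the difference concrete, here is a small sketch. build_request_context is a hypothetical helper name and example.com is a placeholder; the network call and the header() call are left commented out so that nothing is actually sent.

```php
<?php
// Hypothetical helper: build a stream context whose headers go out with the request.
function build_request_context(string $language) {
    return stream_context_create([
        'http' => [
            'header' => [
                'Accept-Language: ' . $language,
            ],
        ],
    ]);
}

$context = build_request_context('en');

// Inspect the headers the context would send, without touching the network.
print_r(stream_context_get_options($context)['http']['header']);

// Outgoing request (what the original code does); example.com is a placeholder:
// $html = file_get_contents('https://example.com/', false, $context);

// By contrast, header() adds a header to the response this script returns
// to the visitor's browser:
// header('Content-Language: en');
```

The remote site only ever sees what is in the stream context. Anything passed to header() goes the other way, to whoever loaded your page.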