Python’s Requests Library (Guide) – Real Python

Body content workflow

By default, when you make a request, the body of the response is downloaded
immediately. You can override this behaviour and defer downloading the response
body until you access the Response.content attribute with the stream parameter:
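A minimal sketch of a deferred download. To stay runnable offline it starts a throwaway local HTTP server; PayloadHandler, PAYLOAD and the local URL are assumptions made for illustration, not part of Requests. In real code you would point the request at your own endpoint.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = b"x" * 50_000  # hypothetical response body


class PayloadHandler(BaseHTTPRequestHandler):
    """Tiny local server standing in for a real endpoint."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the example quiet


server = HTTPServer(("127.0.0.1", 0), PayloadHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for the local demo

# With stream=True the headers are available right away,
# but the body is not read until you ask for it.
with session.get(url, stream=True, timeout=5) as response:
    declared_length = int(response.headers["Content-Length"])
    # The body is pulled from the connection piece by piece as we iterate.
    received = b"".join(response.iter_content(chunk_size=8192))

server.shutdown()
```

Using the response as a context manager closes the connection even if the body is only partly read.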

Chunk-encoded requests

Requests also supports Chunked transfer encoding for outgoing and incoming requests.
To send a chunk-encoded request, simply provide a generator (or any iterator without
a length) for your body:
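As a hedged illustration, the sketch below builds (but does not send) a request whose body is a generator. The URL is a placeholder and body_chunks is a hypothetical helper; prepare() is used only so the resulting headers can be inspected without network access.

```python
import requests


def body_chunks():
    """Yield the request body piece by piece; the total length is unknown."""
    yield b"hello, "
    yield b"chunked "
    yield b"world"


# example.org is a placeholder; prepare() builds the request without sending it.
request = requests.Request("POST", "https://example.org/upload", data=body_chunks())
prepared = request.prepare()

# Because the body has no length, Requests marks the request as chunked
# instead of setting a Content-Length header.
transfer_encoding = prepared.headers.get("Transfer-Encoding")
has_content_length = "Content-Length" in prepared.headers
```

To actually send it, pass the generator directly, for example requests.post(url, data=body_chunks()).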

Content

The response of a GET request often has some valuable information, known as a payload, in the message body. Using the attributes and methods of Response, you can view the payload in a variety of different formats.

To see the response’s content in bytes, you use .content:

>>> response = requests.get('https://api.github.com')
>>> response.content
b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'

While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. response will do that for you when you access .text:

>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'

Because the decoding of bytes to a str requires an encoding scheme, requests will try to guess the encoding based on the response’s headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text:

>>> response.encoding = 'utf-8'  # Optional: requests infers this internally
>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'

If you take a look at the response, you’ll see that it is actually serialized JSON content. To get a dictionary, you could take the str you retrieved from .text and deserialize it using json.loads(). However, a simpler way to accomplish this task is to use .json():

>>> response.json()
{'current_user_url': 'https://api.github.com/user', 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}', 'authorizations_url': 'https://api.github.com/authorizations', 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}', 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}', 'emails_url': 'https://api.github.com/user/emails', 'emojis_url': 'https://api.github.com/emojis', 'events_url': 'https://api.github.com/events', 'feeds_url': 'https://api.github.com/feeds', 'followers_url': 'https://api.github.com/user/followers', 'following_url': 'https://api.github.com/user/following{/target}', 'gists_url': 'https://api.github.com/gists{/gist_id}', 'hub_url': 'https://api.github.com/hub', 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}', 'issues_url': 'https://api.github.com/issues', 'keys_url': 'https://api.github.com/user/keys', 'notifications_url': 'https://api.github.com/notifications', 'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}', 'organization_url': 'https://api.github.com/orgs/{org}', 'public_gists_url': 'https://api.github.com/gists/public', 'rate_limit_url': 'https://api.github.com/rate_limit', 'repository_url': 'https://api.github.com/repos/{owner}/{repo}', 'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}', 'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}', 'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}', 'starred_gists_url': 'https://api.github.com/gists/starred', 'team_url': 'https://api.github.com/teams', 'user_url': 'https://api.github.com/users/{user}', 'user_organizations_url': 'https://api.github.com/user/orgs', 'user_repositories_url': 'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}', 'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}'}

The type of the return value of .json() is a dictionary, so you can access values in the object by key.

You can do a lot with status codes and message bodies. But, if you need more information, like metadata about the response itself, you’ll need to look at the response’s headers.

Custom authentication

Requests allows you to specify your own authentication mechanism.

Any callable which is passed as the auth argument to a request method will
have the opportunity to modify the request before it is dispatched.

Event hooks

Requests has a hook system that you can use to manipulate portions of
the request process, or signal event handling.

Available hooks:

response:
The response generated from a Request.

You can assign a hook function on a per-request basis by passing a
{hook_name: callback_function} dictionary to the hooks request
parameter:

hooks={'response': print_url}

That callback_function will receive a chunk of data as its first
argument.

def print_url(r, *args, **kwargs):
    print(r.url)

Your callback function must handle its own exceptions. Any unhandled exception won’t be passed silently and thus should be handled by the code calling Requests.

If the callback function returns a value, it is assumed that it is to
replace the data that was passed in. If the function doesn’t return
anything, nothing else is affected.

def record_hook(r, *args, **kwargs):
    r.hook_called = True
    return r

Let’s print some request method arguments at runtime:
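The sketch below attaches both hooks shown above to a single request. It runs against a throwaway local server so it needs no network; OkHandler, seen_urls and the local URL are assumptions made for this illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class OkHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that always answers 'ok'."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the example quiet


server = HTTPServer(("127.0.0.1", 0), OkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

seen_urls = []


def print_url(r, *args, **kwargs):
    """Response hook: print and remember the URL of each response."""
    print(r.url)
    seen_urls.append(r.url)


def record_hook(r, *args, **kwargs):
    """Response hook: mark the response and hand it back."""
    r.hook_called = True
    return r


session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for the local demo

# A list registers several callbacks for the same hook; they run in order.
response = session.get(url, hooks={"response": [print_url, record_hook]}, timeout=5)

server.shutdown()
```

print_url returns nothing, so the response passes through unchanged; record_hook returns the response it modified.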

Example: specific ssl version

The Requests team has made a specific choice to use whatever SSL version is
default in the underlying library (urllib3). Normally this is fine, but from
time to time, you might find yourself needing to connect to a service-endpoint
that uses a version that isn’t compatible with the default.
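One way to control this is a transport adapter. The sketch below is a variation on that idea rather than the library's prescribed recipe: instead of naming one exact protocol version, it passes an ssl.SSLContext with a minimum TLS version to urllib3's PoolManager. The class name MinTLS12Adapter and the example.org URL are hypothetical.

```python
import ssl

import requests
from requests.adapters import HTTPAdapter
from urllib3.poolmanager import PoolManager


class MinTLS12Adapter(HTTPAdapter):
    """Transport adapter that refuses anything older than TLS 1.2."""

    def init_poolmanager(self, connections, maxsize, block=False, **pool_kwargs):
        context = ssl.create_default_context()
        context.minimum_version = ssl.TLSVersion.TLSv1_2
        # Hand the customised context to the underlying urllib3 pool manager.
        self.poolmanager = PoolManager(
            num_pools=connections,
            maxsize=maxsize,
            block=block,
            ssl_context=context,
            **pool_kwargs,
        )


session = requests.Session()
# Every https:// URL requested through this session now uses the adapter.
session.mount("https://", MinTLS12Adapter())

adapter = session.get_adapter("https://example.org/")
pinned_context = adapter.poolmanager.connection_pool_kw["ssl_context"]
```

Mounting on a narrower prefix, such as a single host, limits the override to the one service that needs it.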

Getting started with requests

Let’s begin by installing the requests library. To do so, run the following command:

$ pip install requests

If you prefer to use Pipenv for managing Python packages, you can run the following:

$ pipenv install requests

Once requests is installed, you can use it in your application. Importing requests looks like this:

import requests

Now that you’re all set up, it’s time to begin your journey through requests. Your first goal will be learning how to make a GET request.

Header ordering

In unusual circumstances you may want to provide headers in an ordered manner. If you pass an OrderedDict to the headers keyword argument, that will provide the headers with an ordering. However, the ordering of the default headers used by Requests will be preferred, which means that if you override default headers in the headers keyword argument, they may appear out of order compared to other headers in that keyword argument.
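A small sketch of that behaviour, using hypothetical X-First, X-Second and X-Third headers and a placeholder URL. The request is only prepared, not sent, so the final header order can be inspected directly.

```python
from collections import OrderedDict

import requests

session = requests.Session()

# Hypothetical custom headers, in the order we would like them sent.
headers = OrderedDict([("X-First", "1"), ("X-Second", "2"), ("X-Third", "3")])

request = requests.Request("GET", "https://example.org/", headers=headers)
prepared = session.prepare_request(request)

names = list(prepared.headers)
custom_order = [name for name in names if name.startswith("X-")]

# The session's default headers (User-Agent and friends) come first,
# followed by the custom headers in the order given.
print(names)
```

If a custom header overrides one of the defaults, it keeps the default's position, which is how headers can end up out of order relative to the rest of the dictionary.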

Keep-alive

Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session!
Any requests that you make within a session will automatically reuse the appropriate
connection!

Note that connections are only released back to the pool for reuse once all body
data has been read; be sure to either set stream to False or read the
content property of the Response object.

Log in to website using python requests module

As @hlt commented, you must name the fields exactly as they are named in the form.

The server may also validate the “Remember Username” checkbox, so it is better to include it in your request.

payload_login = {
    'proxyUsername': username,
    'proxyPassword': password,
    'proxyRememberUser': True  # Python's boolean literal is True, not true
}

If this does not work for you, the site sends its auth data in a different way. For example, a JavaScript script may add hidden data to the request or encode some fields.

To find out, inspect this HTTP request in your browser’s developer panel or in an external HTTP sniffer (like Fiddler).

Performance

When using requests, especially in a production application environment, it’s important to consider performance implications. Features like timeout control, sessions, and retry limits can help you keep your application running smoothly.
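As a rough sketch of how these three features fit together, the snippet below configures one shared session with a retry policy and applies a timeout on every call. The retry numbers, the status codes and the fetch helper are illustrative assumptions, not recommended values.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times with a growing pause between attempts,
# and also retry on these (assumed) transient server errors.
retry_policy = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
adapter = HTTPAdapter(max_retries=retry_policy)

# One session reuses connections across requests.
session = requests.Session()
session.mount("https://", adapter)
session.mount("http://", adapter)


def fetch(url, timeout=(3.05, 10)):
    """GET url through the shared session; timeout is (connect, read) in seconds."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response
```

A call such as fetch('https://api.github.com') would then share the session's connection pool, retry policy and timeout.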

Post multiple multipart-encoded files

You can send multiple files in one request. For example, suppose you want to
upload image files to an HTML form with a multiple file field ‘images’:

To do that, just set files to a list of tuples of (form_field_name, file_info):
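A minimal sketch of that structure. In-memory buffers stand in for real image files, the file names and URL are placeholders, and the request is only prepared so the multipart body can be inspected offline.

```python
import io

import requests

# Each entry is (form_field_name, (filename, file_object, content_type)).
multiple_files = [
    ("images", ("foo.png", io.BytesIO(b"fake png bytes 1"), "image/png")),
    ("images", ("bar.png", io.BytesIO(b"fake png bytes 2"), "image/png")),
]

request = requests.Request("POST", "https://example.org/post", files=multiple_files)
prepared = request.prepare()

content_type = prepared.headers["Content-Type"]
# Both files are encoded as separate parts under the same field name.
parts_for_images = prepared.body.count(b'name="images"')
```

With real files you would pass open('foo.png', 'rb') instead of the BytesIO objects and send the request with requests.post(url, files=multiple_files).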

Proxies

If you need to use a proxy, you can configure individual requests with the
proxies argument to any request method:
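A short sketch of the mapping. The proxy addresses are placeholders, and get_through_proxy is a hypothetical wrapper that is defined but not called here. select_proxy from requests.utils is used only to show which entry would apply to a given URL.

```python
import requests
from requests.utils import select_proxy

# Placeholder proxy addresses, keyed by URL scheme.
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}


def get_through_proxy(url):
    """Send a GET request for url through the proxies configured above."""
    return requests.get(url, proxies=proxies, timeout=10)


# Which proxy would be used for each kind of URL?
http_proxy = select_proxy("http://example.org/", proxies)
https_proxy = select_proxy("https://example.org/", proxies)
```

To apply the same proxies to every request made through a session, session.proxies.update(proxies) is an alternative to passing the argument each time.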

Python requests and persistent sessions

The other answers help to understand how to maintain such a session. Additionally, I want to provide a class which keeps the session maintained over different runs of a script (with a cache file). This means a proper “login” is only performed when required (on timeout or when no session exists in the cache). It also supports proxy settings over subsequent calls to ‘get’ or ‘post’.

It is tested with Python 3.

Use it as a basis for your own code. The following snippets are released under the GPL v3.

import pickle
import datetime
import os
from urllib.parse import urlparse
import requests    

class MyLoginSession:
    """
    A class which handles and saves login sessions. It also keeps track of proxy settings.
    It also maintains a cache file for restoring session data from earlier
    script executions.
    """
    def __init__(self,
                 loginUrl,
                 loginData,
                 loginTestUrl,
                 loginTestString,
                 sessionFileAppendix = '_session.dat',
                 maxSessionTimeSeconds = 30 * 60,
                 proxies = None,
                 userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20220101 Firefox/40.1',
                 debug = True,
                 forceLogin = False,
                 **kwargs):
        """
        save some information needed to login the session

        you'll have to provide 'loginTestString' which will be looked for in the
        responses html to make sure, you've properly been logged in

        'proxies' is of format { 'https' : 'https://user:pass@server:port', 'http' : ...
        'loginData' will be sent as post data (dictionary of id : value).
        'maxSessionTimeSeconds' will be used to determine when to re-login.
        """
        urlData = urlparse(loginUrl)

        self.proxies = proxies
        self.loginData = loginData
        self.loginUrl = loginUrl
        self.loginTestUrl = loginTestUrl
        self.maxSessionTime = maxSessionTimeSeconds
        self.sessionFile = urlData.netloc + sessionFileAppendix
        self.userAgent = userAgent
        self.loginTestString = loginTestString
        self.debug = debug

        self.login(forceLogin, **kwargs)

    def modification_date(self, filename):
        """
        return last file modification date as datetime object
        """
        t = os.path.getmtime(filename)
        return datetime.datetime.fromtimestamp(t)

    def login(self, forceLogin = False, **kwargs):
        """
        login to a session. Try to read last saved session from cache file. If this fails
        do proper login. If the last cache access was too old, also perform a proper login.
        Always updates session cache file.
        """
        wasReadFromCache = False
        if self.debug:
            print('loading or generating session...')
        if os.path.exists(self.sessionFile) and not forceLogin:
            time = self.modification_date(self.sessionFile)

            # only load if the cache file is younger than maxSessionTime
            # (total_seconds() also counts whole days, unlike .seconds)
            lastModification = (datetime.datetime.now() - time).total_seconds()
            if lastModification < self.maxSessionTime:
                with open(self.sessionFile, "rb") as f:
                    self.session = pickle.load(f)
                    wasReadFromCache = True
                    if self.debug:
                        print("loaded session from cache (last access %ds ago) "
                              % lastModification)
        if not wasReadFromCache:
            self.session = requests.Session()
            self.session.headers.update({'user-agent' : self.userAgent})
            res = self.session.post(self.loginUrl, data = self.loginData, 
                                    proxies = self.proxies, **kwargs)

            if self.debug:
                print('created new session with login' )
            self.saveSessionToCache()

        # test login
        res = self.session.get(self.loginTestUrl, proxies = self.proxies)
        if res.text.lower().find(self.loginTestString.lower()) < 0:
            raise Exception("could not log into provided site '%s'"
                            " (did not find successful login string)"
                            % self.loginUrl)

    def saveSessionToCache(self):
        """
        save session to a cache file
        """
        # always save (to update timeout)
        with open(self.sessionFile, "wb") as f:
            pickle.dump(self.session, f)
            if self.debug:
                print('updated session cache-file %s' % self.sessionFile)

    def retrieveContent(self, url, method = "get", postData = None, **kwargs):
        """
        return the content of the url with respect to the session.

        If 'method' is not 'get', the url will be called with 'postData'
        as a post request.
        """
        if method == 'get':
            res = self.session.get(url , proxies = self.proxies, **kwargs)
        else:
            res = self.session.post(url , data = postData, proxies = self.proxies, **kwargs)

        # the session has been updated on the server, so also update in cache
        self.saveSessionToCache()            

        return res

A code snippet for using the above class may look like this:

if __name__ == "__main__":
    # proxies = {'https' : 'https://user:pass@server:port',
    #           'http' : 'http://user:pass@server:port'}

    loginData = {'user' : 'usr',
                 'password' :  'pwd'}

    loginUrl = 'https://...'
    loginTestUrl = 'https://...'
    successStr = 'Hello Tom'
    s = MyLoginSession(loginUrl, loginData, loginTestUrl, successStr, 
                       #proxies = proxies
                       )

    res = s.retrieveContent('https://....')
    print(res.text)

    # if, for instance, the login requires JSON values, try this:
    s = MyLoginSession(loginUrl, None, loginTestUrl, successStr, 
                       #proxies = proxies,
                       json = loginData)

Socks

New in version 2.10.0.

Streaming uploads

Requests supports streaming uploads, which allow you to send large streams or
files without reading them into memory. To stream and upload, simply provide a
file-like object for your body:
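The following sketch shows the idea with a temporary file standing in for a large upload. The URL is a placeholder, and the request is only prepared so the result can be checked without a server.

```python
import os
import tempfile

import requests

# Create a stand-in "large" file for the demonstration.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789" * 1000)
    path = tmp.name

with open(path, "rb") as body:
    # Passing the open file as data lets Requests read it lazily
    # instead of loading the whole file into memory.
    request = requests.Request("POST", "https://example.org/upload", data=body)
    prepared = request.prepare()
    content_length = prepared.headers["Content-Length"]
    body_is_file = prepared.body is body

os.remove(path)
```

Open the file in binary mode so that the Content-Length derived from the file size matches the bytes actually sent.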

Timeouts

Most requests to external servers should have a timeout attached, in case the
server is not responding in a timely manner. By default, requests do not time
out unless a timeout value is set explicitly. Without a timeout, your code may
hang for minutes or more.

The connect timeout is the number of seconds Requests will wait for your
client to establish a connection to a remote machine (corresponding to the
connect() call on the socket). It’s a good practice to set connect timeouts
to slightly larger than a multiple of 3, which is the default TCP packet
retransmission window.
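A hedged sketch of a (connect, read) timeout tuple in action. SlowHandler is a hypothetical local server that answers too slowly, and the 0.1 second read timeout is chosen only to trigger the error quickly.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that takes too long to answer."""

    def do_GET(self):
        time.sleep(0.5)  # longer than the read timeout used below
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the example quiet


server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for the local demo

try:
    # timeout=(connect, read): 3.05 s to connect, 0.1 s to wait for data
    session.get(url, timeout=(3.05, 0.1))
    outcome = "completed"
except requests.exceptions.ReadTimeout:
    outcome = "read timeout"
except requests.exceptions.ConnectTimeout:
    outcome = "connect timeout"

server.shutdown()
print(outcome)
```

A single number, such as timeout=5, applies the same value to both the connect and the read phase.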

How do I log in to a site that uses a pop-up login window?

You should check which values need to be sent with the POST method. If you are using Google Chrome, press F12 or choose Inspect. Then go to the Network tab and log in to the site. After logging in, click the entry in the Name column on the left (most often it is the very first entry; it may be called login/). Scroll to the very bottom and find the Form Data block. There you can find all the data that must be sent to log in successfully, including the unique token you asked about in another answer.

In the example below, the values are taken for the site https://www.pythonanywhere.com/ (you should send the ones shown in your own Form Data block).

...
data = {
        'csrfmiddlewaretoken': session.cookies.get('csrftoken'), 
        'auth-username': 'your_login', 
        'auth-password': 'your_password', 
        'login_view-current_step': 'auth'
}

session.headers.update({'Referer': url})
post_request = session.post(url, data=data)
... 

In addition, you can open the file hh_success.html (located in the same directory as the .py script) and inspect its contents to check whether you logged in correctly, and so on.

Custom authentication mechanism.

Any callable passed as the auth argument to a request method has the opportunity to modify the request before it is sent.

Imagine that we have a web service that responds only if the X-Pizza header is set to a user value. That is unlikely, but let's just imagine such a scheme.

Then you can make a request using the PizzaAuth() class you created:
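A minimal sketch of such a class and its use. The pizzabin.org URL and the user name are placeholders, and the request is only prepared so the effect of the auth callable can be seen without contacting a server.

```python
import requests
from requests.auth import AuthBase


class PizzaAuth(AuthBase):
    """Attaches the made-up X-Pizza authentication header to a request."""

    def __init__(self, username):
        self.username = username

    def __call__(self, r):
        # Modify the outgoing request, then hand it back to Requests.
        r.headers["X-Pizza"] = self.username
        return r


request = requests.Request("GET", "http://pizzabin.org/admin", auth=PizzaAuth("kenneth"))
prepared = request.prepare()
pizza_header = prepared.headers["X-Pizza"]
```

In everyday use the same object is passed straight to a request method, for example requests.get(url, auth=PizzaAuth('kenneth')).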

Basic authentication example.

The requests library makes it easy to use many forms of authentication, including the very common basic authentication.
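A short sketch of basic authentication. The URL and the credentials are placeholders; the requests are prepared rather than sent so the generated Authorization header can be inspected.

```python
import requests
from requests.auth import HTTPBasicAuth

url = "https://example.org/user"  # placeholder endpoint

# Explicit form: wrap the credentials in HTTPBasicAuth.
explicit = requests.Request("GET", url, auth=HTTPBasicAuth("user", "pass")).prepare()

# Shorthand: a (username, password) tuple is treated as basic auth.
shorthand = requests.Request("GET", url, auth=("user", "pass")).prepare()

authorization = explicit.headers["Authorization"]
same_header = shorthand.headers["Authorization"] == authorization
```

The header carries the credentials base64-encoded, not encrypted, so basic authentication should only be used over HTTPS.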

The requests-oauthlib extension.

The requests-oauthlib extension lets you perform OAuth1 and OAuth2 authentication automatically from the requests library without any hassle.

The extension is useful for the large number of websites that use OAuth1/2 to provide quick authentication. It also provides many settings that handle the ways in which specific OAuth providers deviate from the standard specifications.

To start using requests-oauthlib, you need to install it.

$ pip install requests_oauthlib

The extension uses the Python requests and OAuthlib libraries to provide an easy-to-use interface for building OAuth1 and OAuth2 clients.

The example below shows a web application built with the Flask web framework that connects to the GitHub OAuth2 API. This example should be easy to port to any web framework.

Although the flow stays the same across most providers, GitHub is special in that the redirect_uri parameter is optional. This means that it may be necessary to pass redirect_uri explicitly to the OAuth2Session object (for example, when creating a custom OAuthProvider with flask-oauthlib).

from requests_oauthlib import OAuth2Session
from flask import Flask, request, redirect, session, url_for
from flask.json import jsonify
import os

app = Flask(__name__)

# This information is obtained when registering a new GitHub OAuth
# application here: https://github.com/settings/applications/new
client_id = "<your client key>"
client_secret = "<your client secret>"
authorization_base_url = 'https://github.com/login/oauth/authorize'
token_url = 'https://github.com/login/oauth/access_token'


@app.route("/")
def demo():
    """Step 1: User authorization.

    Redirect the user/resource owner to the OAuth provider (e.g. GitHub)
    using a URL with a few key OAuth parameters.
    """
    github = OAuth2Session(client_id)
    authorization_url, state = github.authorization_url(authorization_base_url)

    # State is used to prevent CSRF, keep it for later.
    session['oauth_state'] = state
    return redirect(authorization_url)


# Step 2: User authorization, this happens on the provider.

@app.route("/callback", methods=["GET"])
def callback():
    """Step 3: Retrieving an access token.

    The user has been redirected back from the provider to the registered
    callback URL. With this redirection comes an authorization code included
    in the redirect URL. We use that to obtain an access token.
    """
    github = OAuth2Session(client_id, state=session['oauth_state'])
    token = github.fetch_token(token_url, client_secret=client_secret,
                               authorization_response=request.url)

    # At this point protected resources can already be fetched; save the
    # token and show how this is done from a stored token in /profile.
    session['oauth_token'] = token
    return redirect(url_for('.profile'))


@app.route("/profile", methods=["GET"])
def profile():
    """Fetching a protected resource using an OAuth 2 token."""
    github = OAuth2Session(client_id, token=session['oauth_token'])
    return jsonify(github.get('https://api.github.com/user').json())


if __name__ == "__main__":
    # This allows us to use a plain HTTP callback
    os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = "1"

    app.secret_key = os.urandom(24)
    app.run(debug=True)

Conclusion

You’ve come a long way in learning about Python’s powerful requests library.

You’re now able to:

Because you learned how to use requests, you’re equipped to explore the wide world of web services and build awesome applications using the fascinating data they provide.
