A Hasty Introduction to Web Development

Definitions

Yeah, some of these might be silly, but let's do this!

What's the difference between the internet and the web (I'm really asking this). →

  • the internet - global system of interconnected computer networks; a network of networks
    • internet's underlying protocol for communication is TCP/IP
    • TCP/IP dictates how data should be packetized, addressed, transmitted, routed and received
  • the web - a collection of interconnected documents (web pages) and other resources (images, video, etc.), retrievable by url and connected by hyperlinks
    • HTTP is the protocol used to allow documents and resources to be requested over a network

Other Services

The web is just one of many services available on the internet… what are some other services and protocols on the internet? →

  • email (SMTP)
  • chat (XMPP, OSCAR, IRC)
  • file transfer (FTP)
  • voice (SIP, Skype protocol)
  • these are all examples of network protocols - ways of communicating over a network
 

Protocols

Hm. All this talk about protocols but … what exactly is a protocol? →

It's a bunch of rules and conventions for communication. Really. That's it.

For computers and communication between them, these rules may define:

  • the format for exchanging messages
  • a meaning (semantics) and syntax for these messages
  • the process for synchronizing the communication

A Protocol Example

Eloquent JavaScript describes a simple chat protocol. For two computers to communicate with this protocol:

  1. one computer sends bits that represent the text, 'CHAT?', to another computer
  2. the other computer responds with 'OK!' to show that it accepts and understands the protocol
  3. from there, they can:
    • proceed to send each other strings of text
    • read the text sent by the other from the network
    • display the received text
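Here's a minimal sketch of that handshake in Python. It assumes both "computers" are just two connected sockets inside one process (socket.socketpair), and the names responder, initiator and demo are made up for illustration. A real chat program would also need a way to mark where one message ends and the next begins.

```python
import socket
import threading

def responder(conn, inbox):
    """Wait for 'CHAT?', accept with 'OK!', then read one chat message."""
    with conn:
        if conn.recv(1024) != b"CHAT?":
            return                          # not our protocol: hang up
        conn.sendall(b"OK!")                # step 2: accept the protocol
        text = conn.recv(1024)              # step 3: read text from the network
        inbox.append(text.decode("utf-8"))  # "display" it by storing it

def initiator(conn, message):
    """Open with 'CHAT?', expect 'OK!', then send one chat message."""
    with conn:
        conn.sendall(b"CHAT?")              # step 1: propose the protocol
        if conn.recv(1024) != b"OK!":
            raise ConnectionError("peer does not speak the chat protocol")
        conn.sendall(message.encode("utf-8"))

def demo(message="hello"):
    """Run both sides and return what the responder received."""
    a, b = socket.socketpair()              # two connected sockets, one process
    inbox = []
    worker = threading.Thread(target=responder, args=(b, inbox))
    worker.start()
    initiator(a, message)
    worker.join()
    return inbox

print(demo("hello"))
```

Both sides only agree on three things: the opening text, the accepting reply, and that everything afterwards is plain text. That agreement is the protocol.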

A Slightly Closer Look at TCP/IP

The previous slides described Application protocols … (chat, mail, specific applications). However, these protocols don't define how data/messages actually get from one computer to another in a networked environment →

  • how does a message get translated from (for example) plain text to electronic signals… and how is it sent over the Internet, and translated back to plain text?
  • welp! there are other protocols - a stack of protocols that describe how this communication works
  • this stack of protocols is often referred to as the TCP/IP stack
    • (mainly because TCP and IP are two of the major protocols involved)
 

TCP/IP Continued

The TCP/IP stack consists of 4 layers:

  1. Application Layer - application level protocols such as HTTP, SMTP, etc.
  2. Transport Layer - protocols involved in communication (connection establishment, flow-control) between applications (on the same host/computer or on different hosts), such as TCP or UDP
  3. Network Layer - the protocol responsible for routing packets of data across network boundaries - directs data to a specific computer / host, which is IP or Internet Protocol
  4. Physical (hardware) Layer / Link Layer - converts data to network signals and back (wi-fi, ethernet)

Sending a Message Over the Internet

Check out the diagram from this whitepaper on how the internet works (!). The whitepaper describes sending data from one host (computer) to another through the internet →

  1. messages start at the top of the stack and work downward
  2. each layer that the message passes through may break the message up into smaller chunks of more manageable data called packets
  3. packets go through the Application Layer and continue to the Transport Layer where each packet is assigned a port number (loosely speaking a number that specifies which program on the destination computer needs to receive the message)
  4. packets then proceed to the Network Layer, where each packet receives its destination IP address (number that identifies a computer on the network)
 

Sending a Message Over the Internet Continued

Starting from the hardware layer of this diagram, our message continues its journey! →

  1. with a port number and an IP address, the hardware layer turns packets of data into electronic signals and transmits them
  2. these packets eventually arrive at the other host (often going through intermediary routers in the process), and work their way back up the stack
  3. as the packets go upwards through the stack, all routing data that the sending computer's stack added (such as IP address and port number) is stripped from the packets
  4. when the data reaches the top of the stack, the packets have been re-assembled into their original form


Again, all of this comes from this whitepaper. Although it's nearly a couple of decades old, the networking aspects are still very relevant.
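To make the trip down and back up the stack a little more concrete, here's a toy simulation in Python. It is only a sketch: the dictionaries standing in for packets, the 8-character chunk size, the seq field and the function names send_down and receive_up are all assumptions made for illustration, and the hardware layer is skipped entirely.

```python
def send_down(message, port, ip, size=8):
    """Split a message into chunks and wrap each one on its way down the stack."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    packets = []
    for seq, chunk in enumerate(chunks):
        # Transport Layer: add the port number (and an order marker)
        segment = {"port": port, "seq": seq, "data": chunk}
        # Network Layer: add the destination IP address
        packets.append({"ip": ip, "payload": segment})
    return packets

def receive_up(packets):
    """Strip the routing data on the way up and re-assemble the message."""
    segments = [packet["payload"] for packet in packets]   # drop the IP address
    segments.sort(key=lambda segment: segment["seq"])      # restore the order
    return "".join(segment["data"] for segment in segments)

packets = send_down("GET / HTTP/1.1", port=80, ip="192.0.2.1")
print(len(packets))                       # number of packets on the wire
print(receive_up(list(reversed(packets))))  # arrives out of order, still readable
```

The receiving side never sees the port number or IP address in the final message; they only exist to get the packets to the right program on the right machine.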

 

It All Starts With a URL

Each document or resource on the web is retrievable by a name, a URL (Uniform Resource Locator). What are the parts to a URL? →

  • scheme/protocol - http (er, browsers accept scheme-less URLs)
  • domain or actual ip address - pizzaforyou.com
  • port (optional) - 80 (default if http)
  • path - /search
  • query_string (optional) - ?type=vegan
  • fragment_id (optional) - #top_result


scheme://domain:port/path?query_string#fragment_id

http://pizzaforyou.com:80/search?type=vegan#top_result
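If you'd like to see those parts pulled out programmatically, Python's standard library can split the example URL. This is just a quick sketch using urllib.parse; nothing here talks to the network.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://pizzaforyou.com:80/search?type=vegan#top_result"
parts = urlsplit(url)

print(parts.scheme)           # scheme/protocol
print(parts.hostname)         # domain
print(parts.port)             # port (None if the URL leaves it out)
print(parts.path)             # path
print(parts.query)            # query string, without the leading "?"
print(parts.fragment)         # fragment id, without the leading "#"
print(parse_qs(parts.query))  # query string as a dict of lists
```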

Domains and IP Addresses

Each machine connected to the Internet gets a unique IP address.

We can map domains to IP addresses through DNS (Domain Name System).

  • both IP Addresses and domains are acceptable in a URL.
  • on OSX, Linux (and Windows), there's a file that allows you to map names to IP addresses (it's checked before using DNS)
  • typically /etc/hosts (on Windows, C:\Windows\System32\drivers\etc\hosts)
  • localhost maps to 127.0.0.1… which essentially is your computer
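A hosts file is just lines of "address name name…", so a lookup table is easy to sketch. The parser below is a simplified illustration, not how the operating system actually does it; the sample text and the pizzaforyou.test entry are made up.

```python
def parse_hosts(text):
    """Turn hosts-file text into a {name: ip} dictionary."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)       # keep the first mapping seen
    return mapping

SAMPLE = """\
# sample hosts file (hypothetical entries)
127.0.0.1   localhost
192.0.2.10  pizzaforyou.test pizza   # two names for one address
"""

hosts = parse_hosts(SAMPLE)
print(hosts["localhost"])
print(hosts["pizza"])
```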

HTTP

To retrieve documents on the web, we use HTTP (Hypertext Transfer Protocol). The computer/application asking for the document is the client or user-agent, and the computer responding to requests for documents is the server.

  • generally, the server is going to be some sort of web server, like Apache or Nginx
  • the client (or the user-agent) is usually some sort of browser, like Chrome or Safari (there are clients other than browsers too)


HTTP is a request-response protocol, a very basic text-based (at least for version 1.1) communication method between computers:

  • the client sends a request for some data
  • the server responds to the request
 

HTTP Continued

The interaction between your browser and a web server goes something like this:

  1. the browser attempts to connect to the address of the server
  2. if the server is listening and reachable, a TCP connection is made between the server and the client on port 80 (the default port for HTTP traffic)
  3. the browser sends a request message
  4. on the same connection, the web server gives back a response message
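The four steps above can be sketched with nothing but sockets. To keep it self-contained, this example starts a tiny throwaway web server on your own machine (127.0.0.1, on whatever free port the OS hands out instead of port 80) and then plays the browser's role by hand. The Hello handler, fetch and demo are hypothetical names for this illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    """A stand-in web server that answers every GET with the same page."""
    def do_GET(self):
        body = b"<h2>Check out my fancy header!</h2>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the demo quiet
        pass

def fetch(host, port, path="/"):
    """Play the browser: connect, send a request, read the response."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    # steps 1 and 2: connect to the server's address over TCP
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request.encode("ascii"))      # step 3: request message
        chunks = []
        while True:                                # step 4: response message
            data = conn.recv(4096)
            if not data:                           # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")

def demo():
    server = HTTPServer(("127.0.0.1", 0), Hello)   # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()

print(demo())
```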

A Request Message

A request consists of:

  • a request line … which includes a request method and a path - GET /teaching HTTP/1.1
  • request headers - Host: jvers.com and User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20100101
  • an empty line
  • an optional body
  • note that a new line is represented by a carriage return followed by a line feed: \r\n


Here's a list of request header fields.
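Since a request is only text, you can assemble one by hand. The helper below is a minimal sketch (build_request is a made-up name) that glues together the request line, the headers, the empty line and an optional body using \r\n.

```python
def build_request(method, path, headers, body=""):
    """Assemble an HTTP/1.1 request message as a string."""
    lines = [f"{method} {path} HTTP/1.1"]                     # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # "\r\n\r\n" ends the last header and adds the empty line
    return "\r\n".join(lines) + "\r\n\r\n" + body

message = build_request("GET", "/teaching", {"Host": "jvers.com"})
print(repr(message))   # repr makes the \r\n characters visible
```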

 

Request Methods

Here's a list of available request methods. A request method (sometimes called a verb) tells the server what action to perform on the identified resource. A couple of common ones are … →

  • GET - retrieve specified resource/data without any other effect… (reading data)
    • data is passed through query string parameters / url
  • POST - requests that the server accept the data in the request for storage (creating data)
    • data is passed in body of request
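Here's a small sketch of where the data ends up in each case, using Python's urllib. The requests are only constructed, never sent, and POSTing to /search on pizzaforyou.com is a hypothetical example.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"type": "vegan"}
encoded = urlencode(params)                     # "type=vegan"

# GET: the data rides along in the URL's query string
get_req = Request("http://pizzaforyou.com/search?" + encoded)

# POST: the data goes in the body of the request (as bytes)
post_req = Request("http://pizzaforyou.com/search", data=encoded.encode("ascii"))

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

urllib picks the method for you here: a request with a body defaults to POST, and one without defaults to GET.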

A Response Message

A response consists of:

  • a status-line … which includes a status code and reason - HTTP/1.1 200 OK
  • response header fields - Content-Type: text/html
  • an empty line
  • an optional message body - usually an HTML document!
  • note that a new line is represented by a carriage return followed by a line feed: \r\n


And of course, a list of response header fields.
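Going the other way, a response can be picked apart with a few string operations. This parser is a simplified sketch (parse_response is a made-up name, and it ignores details such as repeated headers), and the raw text it parses is a hand-written sample.

```python
def parse_response(raw):
    """Split a raw HTTP response into (code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")        # the empty line ends the headers
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names ignore case
    return int(code), reason, headers, body

raw = ("HTTP/1.1 200 OK\r\n"
       "Content-Type: text/html\r\n"
       "\r\n"
       "<h2>Check out my fancy header!</h2>")
print(parse_response(raw))
```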

 

Status Codes

The status code that a server responds with is a numeric code that indicates the result of the request. Some typical status codes are →

  • 200 OK - request was successful!
  • 404 Not Found - resource was not found, but may be available again in the future
  • 500 Internal Server Error - generic server error

Status Codes Continued

There are 5 different classes of status codes →

  • 1xx - Informational, request received
  • 2xx - Success, request was received, understood and accepted
  • 3xx - Redirection, additional action must be taken to complete request
  • 4xx - Client Error
  • 5xx - Server Error
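The class of a status code is just its first digit, which makes for a short sketch. CLASSES and describe are names invented for this example; the reason phrases come from Python's built-in http.HTTPStatus, which only knows registered codes.

```python
from http import HTTPStatus

CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe(code):
    """Return the code, its reason phrase and its class as one string."""
    status_class = CLASSES[code // 100]     # first digit picks the class
    reason = HTTPStatus(code).phrase        # raises ValueError for unknown codes
    return f"{code} {reason} ({status_class})"

for code in (200, 404, 500):
    print(describe(code))
```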

A Sample Interaction

A request (again, using \r\n as newlines):

GET /teaching/ HTTP/1.1
Host: jvers.com
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36
Accept-Language: en-US,en;q=0.8
 

A Sample Interaction Continued

A response (again, using \r\n as newlines):

HTTP/1.1 200 OK
Date: Thu, 18 Feb 2016 15:23:39 GMT
Server: Apache/2.2.15 (Red Hat)
Accept-Ranges: bytes
Content-Length: 163
Content-Type: text/html; charset=UTF-8
Set-Cookie: STATICSERVERID=s3; path=/
Cache-control: private

<h2>Check out my fancy header!</h2>
 

Some Tooling (in increasing order of ease-of-use)

 

netcat

nc is a commandline utility for connection and communication through TCP or UDP. It can take a host and port as arguments:

nc cs.nyu.edu 80

Then… start typing! Let's try to retrieve the root document (/).

GET / HTTP/1.1
Host: cs.nyu.edu

Or… the document /home/index.html

GET /home/index.html HTTP/1.1
Host: cs.nyu.edu
 

curl

curl is a command line tool to transfer data to and from a server.

  • the -I (uppercase I) flag retrieves headers only
  • the -i (lowercase i) flag retrieves headers and body

 

# get the response headers for google.com only
curl -I google.com

# get the response headers for www.google.com only
curl -I www.google.com

# get the body only
curl www.google.com

# get the entire response (headers and body)
curl -i www.google.com

Note that the examples above without -I use GET as the HTTP method; with -I, curl sends a HEAD request instead

curl and POST

curl can also be used to send a request body along with a POST request. To do this →

  1. use the --data or -d flag to send the data that follows the option as an HTTP POST
    • curl -d username=foo
    • multiple -d options are merged: curl -d username=foo -d password=bar results in username=foo&password=bar
  2. use the -F flag to emulate sending form data (including sending files!)
    • causes curl to POST data using the Content-Type multipart/form-data header
    • (emulates pressing the submit button in a form)
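To see roughly what ends up on the wire for the -d case, here's a sketch that builds the request text by hand. The form_post helper, the /login path and the pizzaforyou.com host are hypothetical; the Content-Type shown is the one curl uses by default with -d.

```python
from urllib.parse import urlencode

def form_post(host, path, fields):
    """Build the text of a form-style POST request from (name, value) pairs."""
    body = urlencode(fields)                 # pairs are joined with "&"
    return (f"POST {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n"
            f"{body}")

request = form_post("pizzaforyou.com", "/login",
                    [("username", "foo"), ("password", "bar")])
print(request)
```

The -F case looks different: the body is split into parts separated by a boundary string, which is what lets it carry files.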

 

posted @ 2023-02-13 05:24  M1stF0rest