Introduction
In this assignment, we will use Haskell to develop a transpiler that converts Markdown strings into HyperText Markup Language (HTML). This task involves parsing Markdown syntax and generating corresponding HTML output. A web page is provided, in which Markdown will be sent through an HTML-based websocket connection to a Haskell backend server. The Haskell server will need to convert this Markdown into the corresponding HTML and return it to the website. Skeleton code is provided which handles the basic communication between the web page and your assignment code.
You are encouraged to utilise materials covered in previous weeks, including solutions for tutorial questions, to aid in the development of your transpiler. You must reference or cite ideas and code constructs obtained from external sources, as well as anything else you might find in your independent research, for this assignment.
The assignment is split up into Part A (parsing), Part B (pretty printing) and Part C (extras). However, we do recommend completing Part A and Part B in tandem.
The language you will parse is based on the Markdown specification, but with additional restrictions to reduce ambiguity. It is important that you read the requirements of each exercise carefully to avoid unnecessary work.
Goals / Learning Outcomes
The purpose of this assignment is to highlight and apply the skills you have learned to a practical exercise (parsing):
- Use functional programming and parsing effectively
- Understand and be able to use key functional programming principles (higher order functions, pure functions, immutable data structures, abstractions)
- Apply Haskell and FP techniques to parse non-trivial Markdown text
Scope of assignment
You are only required to parse an expression into the necessary data types and convert the result to an HTML string such that it can be rendered using an existing interpreter. You will not be required to render the Markdown or HTML strings.
Exercises (24 marks)
These exercises provide a structured approach for creating the beginnings of a transpiler.
- Part A (12 marks): Parsing Markdown strings
- Part B (6 marks): Conversion between Markdown and HTML
- Part C (6 marks): Adding extra functionality to the webpage.
- (Extension) Part D Part E: extensions for bonus marks!
You must parse the input into an intermediary representation (ADT), such as an Abstract Syntax Tree, to receive marks. This will allow easy conversion between your ADT and HTML. You must add deriving Show to your ADT and all custom types your ADT contains. (Note that the skeleton code already has deriving Show on the ADT type for you, which you must not remove.) You must not override this default Show instance, as this will help us test your code. Your Assignment.hs file must export the following functions:
- markdownParser :: Parser ADT
- convertADTHTML :: ADT -> String
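As a rough illustration of what such an intermediary representation could look like, here is a minimal sketch. Every type and constructor name below (Inline, Block, Document, and so on) is hypothetical and is not taken from the skeleton code; the ADT type in your Assignment.hs may reasonably be shaped quite differently.

```haskell
-- Hypothetical sketch only: none of these names come from the skeleton.

-- Inline content: plain text or one of the six text modifiers.
data Inline
  = Plain String
  | Italic String
  | Bold String
  | Strike String
  | Link String String       -- link text, URL
  | InlineCode String
  | Footnote Int
  deriving (Show, Eq)

-- A few of the block-level elements from the specification.
data Block
  = Heading Int [Inline]     -- level (1 to 6), heading text
  | Paragraph [Inline]       -- one line of free text
  | Blockquote [[Inline]]    -- one entry per quoted line
  | CodeBlock (Maybe String) String  -- optional language, raw code
  deriving (Show, Eq)

-- A document is a sequence of blocks.
newtype Document = Document [Block]
  deriving (Show, Eq)

-- "# **Bolded Heading 1**" followed by a line of free text.
example :: Document
example =
  Document
    [ Heading 1 [Bold "Bolded Heading 1"]
    , Paragraph [Plain "Text"]
    ]
```

Separating inline content from block content keeps the modifiers in one place, so headings, blockquotes, list items and table cells can all reuse the same inline type.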
Example Scripts
For each of these exercises, there will be a series of provided Markdown files. Running stack test will try to parse the Markdown and save the output to a folder. This will generate HTML which you can manually view for correctness in a browser.
During marking, we will be running your transpiler on more complex examples than the provided example scripts; therefore, it is important that you devise your own test cases to ensure your parser is valid on more complex Markdown. stack test will also aim to produce a git diff, which is the difference between your output and the expected output. However, this requires the git command line tool, so ensure that it is installed.
Furthermore, the recommended way to test your code is to use npm run dev in combination with stack run main, which runs the webpage with a live editor, running your code in real time.
Part A (12 marks): Parsing Markdown
The first part of this task requires you to parse a Markdown string into an Algebraic Data Type (ADT). This requires you to define your own Algebraic Data Type and define a series of functions that parse everything in the requirements. Consider that you will need to convert the result to HTML; therefore, your ADT should have enough information to assist you in converting to HTML.
Aside - Text Modifiers (2 marks)
There are six different modifiers for inline text, which can change the way a Markdown string will be rendered. You do not have to worry about any escape characters. All text modifiers must be strictly non-empty.
- Italic Text: Specified by a single underscore character, _ , around a word. For example, _italics_
- Bold Text: Specified by a set of two asterisks, ** , around a word. For example, **bold**
- Strikethrough: Specified by two tilde characters, ~~ , around a word. For example, ~~strikethrough~~
- Link: Users can include a link to an external page using [link text](URL) . For example, [click here](www.google.com) . You do not need to consider links inside links.
- Inline Code: Users can include code in the middle of sentences, using a backtick character, ` . For example, there is `code` here
- Footnotes: Users can indicate a footnote with [^ℤ⁺], where ℤ⁺ = {1,2,3,…}, i.e., any positive integer. For example, [^1] , [^2] and so forth. Note that you do not need to validate any sort of ordering on these numbers, e.g., the markdown may only contain one footnote [^10] . You also do not need to validate that the footnote comes with an appropriate reference (see Footnote References).
○ Note that there must not be any whitespace inside the [ and ] . For example, [^ 1] , [^2 ] , and [ ^3] are all not valid footnotes.
You do not need to consider text with nested modifiers, such as **_bold and italics_** . Unless specified otherwise, the text inside the modifiers can include any amount of whitespace (excluding new lines). For example, _ italics _ , **bold ** , ~~ strikethrough~~ , ` inline code ` , and [ link text](example.com) , and [link text] (example.com) are all valid.
Images (0.5 marks)
An image is specified with three parts:
- The Alt Text is the alternative text for the image, which is displayed if the
image fails to load or for accessibility purposes.
- The URL is the URL or path to the image file. This can be a web URL or a local
file path. The URL cannot contain any whitespace.
- The Caption Text is the caption for the image.
![Alt Text](URL "Caption Text")
The alternative text, caption text, and URL should not consider the text modifiers .
An image must be at the beginning of a line, and the exclamation mark ( ! ) character
may be preceded by zero or more (non-newline) whitespace characters.
There must be at least one non-newline whitespace character between the URL and
the caption text. For example, ![Alt Text](URL"Caption Text") is not a valid
image.
There must not be any spaces after the ! and before the [ .
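The image rules above can be sketched as a small parser. This example uses Text.ParserCombinators.ReadP from base purely so that it is self-contained; the Parser type in the course skeleton is a different type, and the Image type and the names image and parseImage are hypothetical. The sketch also assumes that the alt text and caption are non-empty and that URLs contain no quote or closing parenthesis characters, none of which the specification states.

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.ReadP

-- Hypothetical result type: alt text, URL, caption text.
data Image = Image String String String
  deriving (Show, Eq)

-- Whitespace that is not a newline.
isInlineSpace :: Char -> Bool
isInlineSpace c = isSpace c && c /= '\n'

-- Parses one line of the form: ![Alt Text](URL "Caption Text")
image :: ReadP Image
image = do
  _       <- munch isInlineSpace     -- optional leading whitespace
  _       <- string "!["             -- no space between ! and [
  alt     <- munch1 (\c -> c /= ']' && c /= '\n')
  _       <- string "]("
  -- Assumption: a URL has no whitespace, quotes or closing parenthesis.
  url     <- munch1 (\c -> not (isSpace c) && c /= '"' && c /= ')')
  _       <- munch1 isInlineSpace    -- at least one space before caption
  caption <- between (char '"') (char '"')
               (munch1 (\c -> c /= '"' && c /= '\n'))
  _       <- char ')'
  pure (Image alt url caption)

-- Succeeds only if the whole input is a single valid image.
parseImage :: String -> Maybe Image
parseImage s = case readP_to_S (image <* eof) s of
  ((img, _) : _) -> Just img
  []             -> Nothing
```

Under these assumptions, parseImage accepts ![Alt Text](URL "Caption Text") and rejects both ![Alt Text](URL"Caption Text") and any input with a space between the ! and the [ .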
Footnote References (0.5 marks)
Similarly to footnotes, footnote references consist of the following, at the beginning of a line:
- zero or more (non-newline) whitespace characters, followed by
- [^ℤ⁺] , where ℤ⁺ = {1,2,3,…}, i.e., any positive integer, followed by
- a colon ( : ), followed by
- some text. Note that this text will not include the text modifiers . Leading
whitespace before the text should be ignored.
[^1]: My reference.
[^2]:Another reference.
[^3]:  The 2 spaces after the colon should be ignored
 [^4]: space before the [
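A footnote reference line can be parsed along the following lines. As a self-contained stand-in this sketch uses Text.ParserCombinators.ReadP from base rather than the course's Parser type, and the names positiveInt, footnoteRef and parseFootnoteRef are hypothetical. It assumes the reference text is non-empty and treats a number with leading zeros, such as 01, as the integer 1; the specification does not say how either case should be handled.

```haskell
import Data.Char (isDigit, isSpace)
import Text.ParserCombinators.ReadP

-- Whitespace that is not a newline.
isInlineSpace :: Char -> Bool
isInlineSpace c = isSpace c && c /= '\n'

-- A positive integer: one or more digits, rejecting zero.
positiveInt :: ReadP Int
positiveInt = do
  ds <- munch1 isDigit
  let n = read ds :: Int
  if n > 0 then pure n else pfail

-- Parses one line of the form: [^N]: some text
footnoteRef :: ReadP (Int, String)
footnoteRef = do
  _    <- munch isInlineSpace       -- optional leading whitespace
  n    <- between (string "[^") (char ']') positiveInt
  _    <- char ':'
  _    <- munch isInlineSpace       -- ignore spaces after the colon
  text <- munch1 (/= '\n')          -- raw text, no modifiers
  pure (n, text)

-- Succeeds only if the whole input is a single footnote reference.
parseFootnoteRef :: String -> Maybe (Int, String)
parseFootnoteRef s = case readP_to_S (footnoteRef <* eof) s of
  ((r, _) : _) -> Just r
  []           -> Nothing
```

Because between places positiveInt directly after [^ and directly before ] , inputs such as [^ 1]: or [^2 ]: fail without any extra whitespace checks.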
Free Text (1 mark)
Free text is any amount of text which does not match any of the other types described in this specification.
This text may contain the modifiers. For example:
Here is some **markdown**
More lines here
Text
Leading and trailing whitespace, including blank lines, of the whole Markdown input
should be trimmed. For example, if the entire Markdown input ends in a new line, that
should be ignored.
Headings (1 mark)
Markdown headings are denoted by zero or more (non-newline) whitespace characters
followed by one or more hash symbols ( # ) at the beginning of a line, and then at least
one whitespace character (excluding new lines). There can be up to 6 # ’ s, producing a
heading up to level 6.
# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6
 # This heading has a space before the hash
Note that because at least one non-newline whitespace character is required, this is not
a valid heading: #Heading 1 . Also, because the line must start with the hash
characters (or whitespace), the following is not a valid heading: abc # Heading .
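The hash-style heading rules can be sketched as follows. This sketch uses Text.ParserCombinators.ReadP from base as a self-contained stand-in for the course's Parser type; the names hashHeading and parseHeading are hypothetical. It returns the heading text as a raw string (parsing the text modifiers inside it is left out) and assumes the heading text is non-empty.

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.ReadP

-- Whitespace that is not a newline.
isInlineSpace :: Char -> Bool
isInlineSpace c = isSpace c && c /= '\n'

-- Parses a hash-style heading line into (level, raw heading text).
hashHeading :: ReadP (Int, String)
hashHeading = do
  _      <- munch isInlineSpace     -- optional leading whitespace
  hashes <- munch1 (== '#')         -- one or more hash symbols
  let level = length hashes
  if level > 6 then pfail else pure ()   -- at most level 6
  _      <- munch1 isInlineSpace    -- at least one space is required
  text   <- munch1 (/= '\n')
  pure (level, text)

-- Succeeds only if the whole input is a single heading line.
parseHeading :: String -> Maybe (Int, String)
parseHeading s = case readP_to_S (hashHeading <* eof) s of
  ((h, _) : _) -> Just h
  []           -> Nothing
```

With this sketch, #Heading 1 is rejected because no whitespace follows the hash, and abc # Heading is rejected because the line does not start with whitespace or a hash.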
Alternatively , Heading 1 and Heading 2 can be specified with an alternative syntax
(shown below). On the line below the text, add at least 2 equals sign ( = ) characters for
heading level 1 or at least 2 dash ( - ) characters for heading level 2. The line below the
text must not contain any other characters. There is no alternative syntax for any other
heading levels. The heading text, equals sign ( = ) characters and dash ( - ) characters
may be preceded by zero or more (non-newline) whitespace characters.
Alternative Heading 1
======
Heading level 2
---------------
 A heading 1 with a space in front of it
====
Importantly , headings may include any of the previously mentioned text modifiers , for
example, a heading can be bolded, by surrounding it with a double asterisk.
# **Bolded Heading 1**
Blockquotes (1 mark)
To create a block quote in Markdown, you use the greater than symbol ( > ) at the
beginning of a line followed by the text you want to quote. The greater than symbol ( > )
may be preceded by zero or more (non-newline) whitespace characters on the
same line. You can also include multiple lines of text within the same block quote by
starting each consecutive line with the greater than symbol ( > ). Leading whitespace
after the greater than symbol ( > ) and before the text should be ignored. The text inside
the block quote may have text modifiers. You do not need to consider nested block
quotes. For example:
> This is a block quote.
> It can **span** multiple lines.
 > This has a space before > and is also a block quote
Code (1 mark)
A code block in Markdown starts with three backticks ( ``` ) on a line by themselves,
followed by an optional language identifier . The code block ends with another three
backticks on a line by themselves. The code block should not consider the text
modifiers . The first three backticks may have zero or more (non-newline) whitespace
characters preceding it. An example code block is:
```haskell
main :: IO ()
main = do
    putStrLn "Never gonna give you up"
    putStrLn "Never gonna let you down"
    putStrLn "Never gonna run around and desert you"
```
```
Never gonna let you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
```
Ordered Lists (2 marks)
An ordered list consists of at least one ordered list item, with items separated by exactly 1 new line
character. An ordered list item starts with a positive number at the beginning of a line, followed by a
. (full stop) character and at least one whitespace character (excluding new lines). An
ordered list must start with the number 1, and any number after that can appear. You do
not have to consider any other numbering system or an unordered list.
Ordered lists may contain sublists, where there will be exactly 4 spaces before each
ordered list item. Each sublist must also start with the number 1. Similar to previous
sections, list items may also contain text modifiers .
1. Item 1
    1. Sub Item 1
    2. Sub Item 2
    3. Sub Item 3
2. **Bolded Item 2**
3. Item 3
4. Item 4
You do not have to handle unordered lists.
Ordered lists must not have any whitespace before the number, unless it is the 4
spaces of indentation for a sublist. For example:
1. This is an ordered list
  1. This is not an ordered list (starts with 2 spaces)
Tables (3 marks)
To create a table in Markdown, you use pipes ( | ) to separate columns and at least
three dashes ( - ) between each column to separate the header row from the content
rows. Each column may contain varying amounts of dashes. Each row is written on a
separate line. The beginning and ending pipes ( | ) are compulsory. Each row must have
the same number of columns. Each cell may also contain text with the text modifiers.
Leading and trailing whitespace before and after the text in each cell should be ignored.
Each row in the table may be preceded by zero or more (non-newline) whitespace
characters.
| Tables        | Are           | Cool  |
| ------------- | ------------- | ----- |
| here          | is            | data  |
| here          | is            | data  |
| here | is also | **bolded data** |
| also | part of the | table |
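One way to approach the table rules is to parse each line as a row of cells and then check the separator row and the column counts. The sketch below again uses Text.ParserCombinators.ReadP from base as a stand-in for the course's Parser type. The Table type and all function names are hypothetical, cells are kept as raw strings (text modifiers are not parsed), and the sketch assumes a table has at least one content row, which the specification does not state.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Text.ParserCombinators.ReadP

-- Hypothetical table type: header cells, then content rows.
data Table = Table [String] [[String]]
  deriving (Show, Eq)

-- Whitespace that is not a newline.
isInlineSpace :: Char -> Bool
isInlineSpace c = isSpace c && c /= '\n'

-- Removes leading and trailing whitespace from a cell.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- One row: | cell | cell | ... | with compulsory outer pipes.
row :: ReadP [String]
row = do
  _     <- munch isInlineSpace
  _     <- char '|'
  cells <- many1 (munch (\c -> c /= '|' && c /= '\n') <* char '|')
  _     <- munch isInlineSpace
  pure (map trim cells)

-- Every separator cell must be at least three dashes.
isSeparator :: [String] -> Bool
isSeparator = all (\c -> length c >= 3 && all (== '-') c)

-- Header row, separator row, then one or more content rows.
table :: ReadP Table
table = do
  header <- row <* char '\n'
  sep    <- row <* char '\n'
  body   <- sepBy1 row (char '\n')
  let sameWidth r = length r == length header
  if isSeparator sep && sameWidth sep && all sameWidth body
    then pure (Table header body)
    else pfail

-- Succeeds only if the whole input is a single valid table.
parseTable :: String -> Maybe Table
parseTable s = case readP_to_S (table <* eof) s of
  ((t, _) : _) -> Just t
  []           -> Nothing
```

Checking the column counts after parsing keeps the row parser itself simple, at the cost of rejecting a malformed table only once all of its rows have been read.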
Part B (6 marks): HTML Conversion
The second part of this task requires you to convert your ADT into an HTML
representation. The resulting HTML file must be formatted such that it is indented with 4
spaces at the correct level to reflect the tree structure of HTML, ensuring that the HTML
is valid and correctly renders the provided markdown. You do not need to indent the text
modifiers, but other nested objects should be indented correctly.
All HTML generated must be a self-contained webpage, i.e., including the following
information, placing all generated HTML within the <body> tags.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Test</title>
</head>
<body>
    GENERATED CONTENT GOES HERE
</body>
</html>
As a reference for the conversion between markdown and HTML, the conversions of all
of the examples above are listed here.
Text Modifiers (1 mark)
- Italics: <em>italics</em>
- Bold: <strong>bold</strong>
- Strikethrough: <del>strikethrough</del>
- Link: <a href="URL">link text</a>
- Inline Code: <code>code</code>
- Footnotes: <sup><a id="fn1ref" href="#fn1">1</a></sup> . It is
important that you follow this convention precisely, where 1 is the number
specified with the footnote, to ensure the footnotes work.
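The conversions listed above map naturally onto a function that pattern matches on each kind of inline element. In this sketch the Inline type and the names tag, inlineHTML and lineHTML are hypothetical, and no HTML escaping of the text is attempted, since the specification does not ask for it.

```haskell
-- Hypothetical inline type; constructor names are illustrative only.
data Inline
  = Plain String
  | Italic String
  | Bold String
  | Strike String
  | Link String String   -- link text, URL
  | InlineCode String
  | Footnote Int
  deriving (Show, Eq)

-- Wraps a body in an opening and closing tag with no attributes.
tag :: String -> String -> String
tag name body = "<" ++ name ++ ">" ++ body ++ "</" ++ name ++ ">"

-- Converts one inline element to its HTML string.
inlineHTML :: Inline -> String
inlineHTML (Plain s)      = s
inlineHTML (Italic s)     = tag "em" s
inlineHTML (Bold s)       = tag "strong" s
inlineHTML (Strike s)     = tag "del" s
inlineHTML (Link txt url) = "<a href=\"" ++ url ++ "\">" ++ txt ++ "</a>"
inlineHTML (InlineCode s) = tag "code" s
inlineHTML (Footnote n)   = tag "sup" anchor
  where
    k      = show n
    anchor = "<a id=\"fn" ++ k ++ "ref\" href=\"#fn" ++ k ++ "\">"
               ++ k ++ "</a>"

-- A line of inline elements, rendered in order.
lineHTML :: [Inline] -> String
lineHTML = concatMap inlineHTML
```

For example, lineHTML [Plain "Here is some ", Bold "markdown"] produces Here is some <strong>markdown</strong> , which can then be wrapped in whichever block-level tag applies.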
Images (0.5 marks)
The image must be in an image tag, with the appropriate attributes filled.
<img src="URL" alt="Alt Text" title="Caption Text">
Footnote References (0.5 marks)
A footnote reference must be encased in a <p> tag, and have the appropriately
numbered id .
<p id="fn1">My reference.</p>
<p id="fn2">Another reference.</p>
<p id="fn3">The 2 spaces after the colon should be ignored</p>
<p id="fn4">space before the [</p>
Free Text (0.5 marks)
Every line of free text must be encased in <p> tags. You do not need to consider how to
handle newlines.
<p>Here is some <strong>markdown</strong></p>
<p>More lines here</p>
<p>Text</p>
Headings (0.5 marks)
Headings use the <h1> to <h6> tags, where the number after the h is the level of the
heading. For example:
<h1>Heading 1</h1>
<h2>Heading 2</h2>
<h3>Heading 3</h3>
<h4>Heading 4</h4>
<h5>Heading 5</h5>
<h6>Heading 6</h6>
<h1>This heading has a space before the hash</h1>
<h1>Alternative Heading 1</h1>
<h2>Heading level 2</h2>
<h1>A heading 1 with a space in front of it</h1>
Blockquotes (0.5 marks)
Each blockquote must be encased by <blockquote> , while each line within the
blockquote must be encased with a <p> tag.
<blockquote>
    <p>This is a block quote.</p>
    <p>It can <strong>span</strong> multiple lines.</p>
</blockquote>
Code (0.5 marks)
The code block must be encased in both the <pre> and the <code> tags. If there is a
language identifier (e.g., haskell ), it must be included within the class attribute,
prefixed by language- . Otherwise, there should not be any class attribute. The
newlines and code indentation must remain.
<pre><code class="language-haskell">main :: IO ()
main = do
    putStrLn "Never gonna give you up"
    putStrLn "Never gonna let you down"
    putStrLn "Never gonna run around and desert you"
</code></pre>
<pre><code>Never gonna let you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
</code></pre>
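The optional class attribute can be handled with a Maybe value for the language identifier. The function name codeBlockHTML is hypothetical, and the sketch assumes the code string is passed through verbatim, including its newlines, indentation and trailing newline.

```haskell
-- Renders a code block. The language identifier is optional; when it
-- is absent, no class attribute is emitted at all.
codeBlockHTML :: Maybe String -> String -> String
codeBlockHTML lang code =
  "<pre><code" ++ attr ++ ">" ++ code ++ "</code></pre>"
  where
    attr = maybe "" (\l -> " class=\"language-" ++ l ++ "\"") lang
```

Using maybe here avoids a separate case for the empty language and keeps the two outputs structurally identical apart from the attribute.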
Ordered Lists (1 mark)
Ordered lists must begin and end with the <ol> tag, and each list item must begin and
end with the opening/closing <li> tag.
<ol>
    <li>Item 1
        <ol>
            <li>Sub Item 1</li>
            <li>Sub Item 2</li>
            <li>Sub Item 3</li>
        </ol>
    </li>
    <li><strong>Bolded Item 2</strong></li>
    <li>Item 3</li>
    <li>Item 4</li>
</ol>
Tables (1 mark)
The HTML convention for representing tables involves using the <table> , <tr> ,
<th> , and <td> elements. <table> represents the entire table, <tr> represents a
row within the table, <th> represents a header cell within a table row, used for the
header row, and <td> represents a data cell within a table row, used for the content
rows.
You may optionally include <thead> and <tbody> tags. Either of these outputs is
acceptable:
<table>
    <tr>
        <th>Tables</th>
        <th>Are</th>
        <th>Cool</th>
    </tr>
    <tr>
        <td>here</td>
        <td>is</td>
        <td>data</td>
    </tr>
    <tr>
        <td>here</td>
        <td>is</td>
        <td>data</td>
    </tr>
    <tr>
        <td>here</td>
        <td>is also</td>
        <td><strong>bolded data</strong></td>
    </tr>
    <tr>
        <td>also</td>
        <td>part of the</td>
        <td>table</td>
    </tr>
</table>
<table>
    <thead>
        <tr>
            <th>Tables</th>
            <th>Are</th>
            <th>Cool</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>here</td>
            <td>is</td>
            <td>data</td>
        </tr>
        <tr>
            <td>here</td>
            <td>is</td>
            <td>data</td>
        </tr>
        <tr>
            <td>here</td>
            <td>is also</td>
            <td><strong>bolded data</strong></td>
        </tr>
        <tr>
            <td>also</td>
            <td>part of the</td>
            <td>table</td>
        </tr>
    </tbody>
</table>
Part C (6 marks): Adding extra functionality to the webpage
This task involves changing the webpage to include extra capabilities, providing a more
fully featured UI. You will not be marked on the layout or ease of use of features, as long
as they are clearly visible to your marker, e.g., a button should be clearly visible on the
screen. This task will involve some light additions to both the HTML page and
TypeScript code. This will likely involve creating an observable stream for the data,
merging it into the subscription stream, and sending the information to the Haskell
backend. The communicated information between the Haskell backend and the
webpage will need to be updated to include additional information that the user wants
the engine to achieve.
- A button must be added to the webpage for saving, where the converted HTML
is saved using Haskell. The user does not need to be prompted for a file name,
and the HTML should be saved according to the current time, formatted in ISO
8601 format for the current date and time: YYYY-MM-DDTHH:MM:SS . The
function getTime is provided which will provide you this time in an IO String
format.
○ If you are on Windows, file names cannot contain colons ( : ). You may
replace these with any reasonable and sensible character you want, such as
an underscore or hyphen.
- A separate input box, to allow the user to change the title of the page, instead of
the default Converted HTML .
Part D (up to 6 bonus marks): Extension
Implement anything that is interesting, impressive, or otherwise “shows off” your
understanding of Haskell, Functional Programming, and/or Parsing.
To achieve the maximum amount of bonus marks, the feature should be similar in
complexity to Part C (6 marks).
The bonus marks only apply to this assignment, and the final mark for this assignment
is capped at 30 marks (100%). This means you cannot score more than 30 marks or
100%.
Some suggestions for extensions of varying complexity and difficulty:
- Markdown validation
○ E.g., enforce all table columns have the same width
- Correct BNF for the Markdown you are parsing in report (worth 2 marks)
○ For any part of the parser which is not context-free, you may simplify the
parsing rules to be context-free.
- Further extensions to the webpage for extra features, using RxJS
- Parse nested text modifiers, such as **_bold and italics_** and [click
**here**](https://example.com)
- Parse further parts of the markdown specification which make use of interesting
parsers, which you have not used in other parts of the assignment.
- Comprehensive test cases over the parser and pretty printing
○ Warning: It is super hard to be comprehensive, stay away unless you love testing.
(Choosing one of the simpler suggestions to implement may not receive the maximum available marks.)
Report (2 marks)
You are required to provide a report in PDF format of max. 600 words (markers will not mark beyond this word limit). Descriptions of extensions can use up to 200 words per extension feature.
Make sure to summarise the intention of the code, and highlight the interesting parts and difficulties you encountered. Focus on the "why" not the "how". Additionally, just posting screenshots of code is heavily discouraged, unless it contains something of particular importance. Remember, markers will be looking at your code alongside your report, so we do not need to see your code twice.
Importantly, this report must include a description of why and how parser combinators helped you complete the parsing. In summary, your report should include the following sections:
- Design of the code (including data structures)
○ High-level description of approach
○ High-level structure of code
○ Code architecture choices
- Parsing
○ Usage of parser combinators
○ Choices made in creating parsers and parser combinators
○ How parsers and parser combinators were constructed using the Functor, Applicative, and Monad typeclasses
- Functional Programming (focusing on the why)
○ Small modular functions
○ Composing small functions together
○ Declarative style (including point free style)
- Haskell Language Features Used (focusing on the why)
○ Typeclasses and Custom Types
○ Higher order functions, fmap, apply, bind
○ Function composition
- Description of Extensions (if applicable)
○ What you intended to implement
○ What you did implement
○ What is cool/interesting/complex about it
○ This may include using Haskell features that are not covered in course content
There is some overlap between the sections. You should avoid repeating descriptions or ideas in the report.
Code Quality (4 marks)
Code quality will relate more to how understandable your code is. You must have readable and functional code, commented when necessary. Readable code means that you keep your lines at a reasonable length (< 80 characters), that you provide comments above non-trivial functions, and that you comment sections of your code whose function may not be clear.
Your functions should all be small and modular, building up in complexity, and taking advantage of built-in functions or self-defined utility functions when possible. It should be easy to read and understand what each piece of your code is doing, and why it is useful. Do not reimplement library functions, such as map, and use the appropriate library function when possible.
Your code should aim to re-use previous functions as much as possible, and not repeat work when possible.
Code quality includes your ADT and whether it is well structured, i.e., does not have a bunch of repeated data types and follows a logical manner (the JSON example from the applied session is a good example of what an ADT should look like).
Marking breakdown
The main marking criteria for each parsing and pretty printing exercise consist of two parts: correctness and FP style. Correctness and FP style will each be worth 50% of the marks for each of the exercises, i.e., if your code passes all tests, you will get at least half marks for Part A and Part B.
You will be provided with some sample input and tests for determining the validity of the outputted HTML files. The sample inputs provided will not be exhaustive; you are heavily encouraged to add your own, perhaps covering edge cases.
Correctness
We will be running a series of tests which test each exercise, and depending on how
many of the tests you pass, a proportion of marks will be awarded.
FP Style
FP style relates to whether the code is written in a way that aligns with the unit content and
functional programming.
You must apply concepts from the course. The important thing here is that you need to
use what we have taught you effectively. For example, defining a new type and its
Monad instance, but then never actually needing to use it, will not give you marks. Note:
using bind (>>=) for the sake of using the Monad when it is not needed will not count
as "effective usage."
Most importantly, code that does not utilise Haskell's language features, and that
attempts to code in a more imperative style, will not be awarded high marks.
Minimum Requirements:
An estimate of a passing grade will be parsing up to and including code blocks, but not lists or tables, where the difficulty and the marks step up. However, this will need to be accompanied by high code quality and a good report.
A higher mark will require parsing of the more difficult data structures, and modifications of the HTML page.
Changelog
- Add note that text modifiers must be non-empty
- Add note that the BNF can simplify the parser, if and only if the parser is not context-free.
- 18 Sep: Remove the requirement to parse nested text modifiers and instead make that an extension
- 18 Sep: Fix issue in scaffold where frontend would show output HTML with a leading and trailing quote
- 20 Sep: Changed “Abstract Data Type” to “Algebraic Data Type” (under Part A)
- 24 Sep: Clarify that URLs in images should not consider text modifiers
- 25 Sep: Clarify that there should be no spaces after ! and before [ in images
- 25 Sep: Clarify whitespace rules for images, footnote references, headings, blockquotes, code blocks, and tables
- 25 Sep: Specify how to convert a code block with no language identifier to HTML
- 28 Sep: Allow optionally including <thead> and <tbody> when rendering tables
- 29 Sep: Fix indentation in ordered list HTML output
- 6 Oct: Allow replacing colons with another character in file name on Windows